Migrate posts from WordPress to Hugo for blogdown

Once I found out how to build websites in R with the blogdown package, I decided to migrate another blog of mine from WordPress to Hugo.

Here are the steps I followed to migrate the old posts.
I used a MacBook with Python installed.

1) Export XML from WordPress

Go to WordPress Admin –> Tools –> Export –> select Posts.

2) In Terminal:

  • change directory to the one where you want to save the files
  • git clone the exitwp repository
  • sudo easy_install pip
  • sudo pip install pyyaml
  • sudo pip install beautifulsoup4
  • sudo pip install html2text

In the directory created by the exitwp git clone:

  • Move the WordPress export XML into the wordpress-xml directory.

And back to Terminal:

  • Run xmllint YOUR_WORDPRESS_EXPORT.xml to check that the export parses as well-formed XML
  • Back in the exitwp folder, run python2

3) Clean the markdown files created in the above step

The steps above create a folder containing one markdown file for each old post.
Those files then need to be adapted for the new Hugo blog.

Change the file extension from .markdown to .md
folder_path <- "~/Desktop/posts"
files <- list.files(folder_path, full.names = TRUE)
file.rename(files, sub("[.]markdown$", ".md", files))
Remove the date in front of the name for each file
files <- list.files(folder_path, full.names = TRUE)
# 40 is the position where the post name starts in the full path;
# change this number according to the length of your folder path
file.rename(files, paste0(folder_path, "/", substring(files, 40)))
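Counting characters breaks as soon as the folder path changes. A minimal sketch of an alternative is shown here, assuming the exported files are named with a `YYYY-MM-DD-` prefix (an assumption, so check your own file names first). `strip_date_prefix` is a hypothetical helper name, not part of blogdown or exitwp.

```r
# Hypothetical helper: drop a leading "YYYY-MM-DD-" from each file name,
# leaving the directory part of the path untouched.
# Assumes the date prefix has exactly this format.
strip_date_prefix <- function(paths) {
  new_names <- sub("^[0-9]{4}-[0-9]{2}-[0-9]{2}-", "", basename(paths))
  file.path(dirname(paths), new_names)
}

# Try it on a made-up path before touching any real file.
print(strip_date_prefix("posts/2015-03-01-my-first-post.md"))

# To rename the real files (uncomment once the output above looks right):
# files <- list.files("~/Desktop/posts", full.names = TRUE)
# file.rename(files, strip_date_prefix(files))
```

Because the pattern is anchored at the start of the file name, files without a date prefix keep their names.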
Remove and replace text in the YAML for each post

This step depends on your own posts. For mine, I only needed a few replacements in each file.

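files <- list.files(folder_path, full.names = TRUE)

for (i in seq_along(files)) {
  tx <- readLines(files[i])
  # Swap the exported "layout: post" line for "type: post"
  tx <- gsub("layout: post", "type: post", x = tx)
  # Blank out lines that start with "[Amazon" and end with "NoScript)",
  # with and without two trailing tabs
  tx <- gsub("^\\[Amazon.*NoScript\\)\\t\\t$", "", tx)
  tx <- gsub("^\\[Amazon.*NoScript\\)$", "", tx)
  writeLines(tx, con = files[i])
}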
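Since the loop overwrites every file in place, it can help to try the replacements on a few lines in memory first. The sketch below wraps them in a hypothetical function, `clean_post`, and runs it on made-up sample lines (not taken from a real export). It also merges the two Amazon patterns into one that allows any number of trailing tabs, which is a small generalisation of the original two.

```r
# Hypothetical helper: apply the replacements to a character vector of lines
# so they can be checked before any file is overwritten.
clean_post <- function(tx) {
  tx <- gsub("layout: post", "type: post", tx, fixed = TRUE)
  # Blank out lines that start with "[Amazon" and end with "NoScript)",
  # followed by zero or more tabs.
  tx <- gsub("^\\[Amazon.*NoScript\\)\\t*$", "", tx)
  tx
}

# Made-up sample lines for a dry run.
sample_post <- c("---", "layout: post", "title: Example", "---",
                 "Some text.",
                 "[Amazon link](http://example.com/NoScript)\t\t")
print(clean_post(sample_post))
```

Once the output looks right, the same function can replace the three `gsub` calls inside the loop.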

4) Final manual step

Copy the markdown files created above into the new blogdown project in the folder content/post.
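
The copy can also be scripted from R instead of done by hand. This is only a sketch: `copy_posts` is a hypothetical helper, and both paths in the example call are assumptions that you should replace with your own folders.

```r
# Hypothetical helper: copy every .md file from one folder into another
# existing folder and report which copies succeeded.
# Files already present in the destination are not overwritten.
copy_posts <- function(from, to) {
  files <- list.files(from, pattern = "[.]md$", full.names = TRUE)
  ok <- file.copy(files, to, overwrite = FALSE)
  setNames(ok, basename(files))
}

# Example call (both paths are assumptions; adjust to your setup):
# copy_posts("~/Desktop/posts", "~/my-blog/content/post")
```

The returned logical vector is named by file, so any post that was skipped because it already exists in content/post shows up as FALSE.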