got some markdown files exported from weird proprietary platforms like wordpress, tumblr, or livejournal?
my issue was that my .md files had lots of weird, inconsistent header formats, plus disgusting inline HTML that had been languishing inside the blog posts since the mid 00s.
this is how you clean them up, to remove all the weird HTML that they dump inline into your otherwise beautiful writing:
convert the markdown files to pure HTML using pandoc
convert them back to markdown
use these sed programmes to remove the headers, which in my case were delimited by '+++':
sed 's/+++/&\n/g' file.txt (pushes all the text AFTER every instance of '+++' onto a new line, to make the next command effective)
sed '/+++/,/+++/d' file.txt (deletes everything between the '+++' header delimiters, INCLUDING the delimiters)
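here's a rough sketch of the steps above glued into one shell script. the function names (strip_header, clean_post) and the file names are made up for illustration, and it assumes pandoc and GNU sed are installed (as far as i know, the \n in the sed replacement only works in GNU sed). the two sed commands are piped together so the second one sees the output of the first:

```shell
#!/bin/sh
# sketch only: strip_header and clean_post are hypothetical names, not from any tool.

# strip_header: reads markdown on stdin and drops the '+++' ... '+++' header block.
# assumes GNU sed, since it relies on '\n' in the replacement text.
strip_header() {
    # 1) put a newline after every '+++' so the delimiters end their own lines
    # 2) delete from the first '+++' line through the next '+++' line, inclusive
    sed 's/+++/&\n/g' | sed '/+++/,/+++/d'
}

# clean_post: pandoc round trip (markdown -> html -> markdown), then strip the header.
# $1 is the path to one markdown file; the cleaned text goes to stdout.
clean_post() {
    pandoc -f markdown -t html "$1" \
        | pandoc -f html -t markdown \
        | strip_header
}

# example use (file names are placeholders):
#   clean_post post.md > post.clean.md
```

writing to a new file rather than editing in place means the original export stays untouched if something goes wrong.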
done (: cleaned up markdown files!!
i am now ready to import ~850 markdown blog posts from august 2006 up until present day into Logarion, an OCaml static site generator!