Auto-converted old news

Hehe, I suddenly have 240 posts on my blog! They are old news: the very first “news items” I put on my site back in May 2000, long before there was anything called “blogs” around, or at least before I knew anything about them.

All the new old posts are filed under “uncategorized” until I have looked them through, fixed whatever is missing or broken, and assigned each of them a category.

It was my good friend Kristian who got me started with Linux and with programming in PHP and HTML back when we were both in gymnasiet (high school). It was an exciting and enormous world that opened up to me. Since then my computer has been my favorite toy!

5 Comments

  1. Kristian Kristensen:

    Yeah, those were the days ;-) I too remember our first stabs at PHP codin’

    On another note, how did you do the migration? ‘Coz I’m considering converting all my stuff to WordPress and hosting it on my own server…

  2. Thomas Mølhave:

    Are you going to import your posts from the wiki as well?

  3. Martin:

    “Are you going to import your posts from the wiki as well?”

    Yes, that’s the plan. I don’t know how difficult that will be, but it would certainly be cool to have all five years covered in full.

    The conversion would have to extract the content, parse the Wiki markup enough to translate it into Markdown markup (hehe :-) and then insert it into WordPress. Since I’ve only used pretty basic Wiki markup I think it can be done.

  4. Martin:

    “On another note, how did you do the migration?”

    Through the magic powers of regular expressions! :-) Really, I just converted links written in HTML markup into the corresponding Markdown markup. I then “unwrapped” the paragraphs (removed the <p> tags) and, presto, I had lots of posts.

    I didn’t bother converting lists and more advanced HTML into Markdown since I only had a handful of posts using such things.

  5. Martin:

    Ehh, the actual conversion code might be interesting, so here it is. Imagine that the body of the post is an HTML fragment stored in $body. I then ran it through this code:

    // Callback: turn one <p>...</p> paragraph into a word-wrapped text
    // block preceded by a blank line.
    function wordwrap_p($matches) {
      return "\n\n" . wordwrap($matches[1]);
    }
    
    // Collapse line breaks (and the spaces around them) into single spaces.
    $body = preg_replace("| *\n *|",
                         ' ', $body);
    // Translate inline HTML markup into the corresponding Markdown.
    $body = preg_replace('|<a href="(.*?)">(.*?)</a>|',
                         '[$2]($1)', $body);
    $body = preg_replace('|<acronym>(.*?)</acronym>|',
                         '$1', $body);
    $body = preg_replace('|<i>(.*?)</i>|',
                         '*$1*', $body);
    $body = preg_replace('|<b>(.*?)</b>|',
                         '**$1**', $body);
    $body = preg_replace('|<code>(.*?)</code>|',
                         '`$1`', $body);
    // Decode the handful of HTML entities used in the posts.
    $body = str_replace('&quot;', '"', $body);
    $body = str_replace('&eacute;',  'é', $body);
    $body = str_replace('&aelig;',  'æ', $body);
    $body = str_replace('&oslash;', 'ø', $body);
    $body = str_replace('&aring;',  'å', $body);
    $body = str_replace('&AElig;',  'Æ', $body);
    $body = str_replace('&Oslash;', 'Ø', $body);
    $body = str_replace('&Aring;',  'Å', $body);
    // Turn a spaced hyphen into a proper dash.
    $body = str_replace(' - ', ' — ', $body);
    // Unwrap the paragraphs.
    $body = preg_replace_callback('| *<p>(.*?)</p>|',
                                  'wordwrap_p', $body);
    $body = trim($body);
    

    After that treatment it was a matter of using the wp_insert_post() function. I included wp-config.php to get access to it.
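
The insertion step described in the last comment might look roughly like the sketch below. It is not the original script: the helper build_post_array(), the sample title, text and date, and the include path are all made up for illustration. It assumes a WordPress version where wp_insert_post() accepts an array of post fields. Since no category is passed, WordPress should fall back to its default category.

```php
<?php
// Hypothetical helper (not from the original script): collect the
// fields of one old news item in the array form wp_insert_post() takes.
function build_post_array($title, $body, $date) {
    return array(
        'post_title'   => $title,
        'post_content' => $body,      // Markdown produced by the conversion
        'post_date'    => $date,      // 'Y-m-d H:i:s', keeps the old date
        'post_status'  => 'publish',
    );
}

// Made-up sample values standing in for one converted news item.
$post = build_post_array(
    'First news item',
    'Some *converted* text with a [link](http://example.org/).',
    '2000-05-01 12:00:00'
);

// WordPress has to be loaded before the call is available, e.g. with
//   require '/path/to/wordpress/wp-config.php';   // path is an assumption
// The guard keeps this sketch harmless when run outside WordPress.
if (function_exists('wp_insert_post')) {
    // By default this returns the new post ID, or 0 on failure.
    $post_id = wp_insert_post($post);
}
```

Run on its own, the script only builds the array. The guarded call does nothing until WordPress has actually been loaded.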
