Kristian’s new blog

So, Kristian also decided to move his blog to WordPress: http://zianet.dk/blog/ — the link in the sidebar is updated! He has his old posts there too, pretty cool.

Having gone from a simple home-made system based on HTML fragments, to a system with Wiki markup, and now to a dedicated blogging system using Markdown, I now have posts in three formats. No, make that two, since I’ve converted the HTML fragments.

I think we need a simple format to store posts in to avoid such stupid situations. This format could very well be an XML dialect, but that’s not the important part; the important part is to have a single format for blog posts. Posts could then be translated into whatever format your favorite blog system requires, be it XHTML, Markdown, or a Wiki markup format.

If this common format is an XML dialect it would be easy to parse, but tedious to edit — XML is not meant to be edited by humans. (Not that it’s impossible; using nxml-mode for Emacs it’s not that bad.) So to make it efficient to edit, we need to be able to map back and forth between the canonical XML format and a Wiki-like (or Markdown, call it what you want) variant.
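
Just to make the idea a bit more concrete, here is a minimal sketch of one direction of that mapping, assuming PHP 5 with SimpleXML and a made-up XML dialect — the tag names are placeholders, not any real standard. Going the other way, from Markdown back to XML, would of course need a real Markdown parser.

<?php
// A tiny, hypothetical canonical post.
$xml = <<<XML
<post>
  <title>Kristian's new blog</title>
  <para>So, Kristian also decided to move his blog to WordPress.</para>
  <para>He has his old posts there too, pretty cool.</para>
</post>
XML;

$post = simplexml_load_string($xml);

// Translate the canonical form into Markdown: the title becomes a
// heading, each <para> becomes a paragraph.
$markdown = '# ' . $post->title . "\n\n";
foreach ($post->para as $para) {
    $markdown .= trim($para) . "\n\n";
}
echo $markdown;
?>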

Given such a format and solid conversion tools I hope that we can avoid parse errors like those I’ve written about before. And we would have a more versatile tool than what XHTML, Markdown, Wiki markup, and all the other markup languages give us separately.

Problematic special characters

Using the Wiki-like markup system PHP Markdown is not without issues. Or rather, testing a little on the PHP Markdown Web Dingus (I wonder what a “dingus” is?) shows that it must be the way WordPress handles the data that causes the problem.

Take a look at this comment. The double-quotes are escaped with a backslash. Why?! My guess is that WordPress is broken with regard to special characters. It reminds me of the problems I had when I was just starting out learning PHP. But please don’t tell me that WordPress has such embarrassing mistakes! A system with more than 12,126 installations (as of February 17th, 2005) should be better tested than this.
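
If my guess is right, this smells like PHP’s magic_quotes_gpc feature, which adds backslashes before quotes in incoming form data. The variable name below is just for illustration, but the kind of workaround a well-behaved application would use looks something like this:

<?php
// Incoming form data, e.g. a comment field.
$comment = $_POST['comment'];

// With magic_quotes_gpc enabled, PHP has already added backslashes
// before quotes, so they have to be stripped again before the text
// is stored or displayed -- otherwise the backslashes end up in the
// published comment.
if (get_magic_quotes_gpc()) {
    $comment = stripslashes($comment);
}

// Escaping should then happen where it is actually needed, for
// example with mysql_real_escape_string() just before a database
// insert, or htmlspecialchars() just before output.
?>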

Auto-converted old news

Hehe — I suddenly have 240 posts in my blog! They are old news, actually the very first “news items” I put on my site back in May 2000, long before there was anything called “blogs” around. Or at least before I knew anything about them.

All the new old posts are categorized as “uncategorized” until I have looked them through, cleaned up any cruft, and assigned a category to each of them.

It was my good friend Kristian who got me started with Linux, programming in PHP, and HTML back when we were both in gymnasiet (high school). It was an exciting and enormous world that opened itself to me. Since then my computer has been my favorite toy!

Redirections in place

I’ve just added a bunch of mod_rewrite rules to my .htaccess file so that people trying to access old content won’t be totally lost. Many of them will still look in vain, because the Wiki pages are offline — I still have the database, so they might reappear some day when I get a lot of time.

Redirection is cool like that, and because I send back “301 Moved Permanently” headers, intelligent software (like Google) will automatically know to update its links.
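
The rules themselves are nothing fancy; something along these lines, where the old and new paths are only examples and not my actual rules:

# Enable the rewrite engine for this directory.
RewriteEngine On

# Send requests for the old Wiki pages to the blog with a permanent
# redirect, so clients know to update their links.
RewriteRule ^wiki/(.*)$ /blog/ [R=301,L]

# Map an old news page to its new home on the blog.
RewriteRule ^news\.html$ /blog/ [R=301,L]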

Securing my data

My machine is now running with /home on my RAID-1 mirror. When I booted the machine with one of the drives turned off (by pulling its power cable), it didn’t make a fuss. Putting the “faulty” disk back online was a simple matter of adding it back to the array. The RAID then resynced and was back to normal status after 40 minutes. Pretty cool!
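
For reference, putting a disk back into a Linux software RAID looks roughly like this with mdadm — the device names here are just examples, not necessarily my exact setup:

# Add the "failed" disk back into the mirror; md then starts a resync.
mdadm /dev/md0 --add /dev/sdb1

# Watch the resync progress.
cat /proc/mdstat

# Or get a more verbose status report.
mdadm --detail /dev/md0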

I ran bonnie++ to compare the performance of my regular /dev/sda drive with that of the /dev/md0 RAID mirror. The results from my normal disk:

------Sequential Output------
-Per Chr- --Block-- -Rewrite-
K/sec %CP K/sec %CP K/sec %CP
33601  94 49338  15 17551   4
--Sequential Input- --Random-
-Per Chr- --Block-- --Seeks--
K/sec %CP K/sec %CP  /sec %CP
16533  46 42633   6 184.3   0
------Sequential Create------
-Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP
22150  87 +++++ +++ 22595  99
--------Random Create--------
-Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP
21485  87 +++++ +++ 21210  99

compared to the results from my RAID mirror:

------Sequential Output------
-Per Chr- --Block-- -Rewrite-
K/sec %CP K/sec %CP K/sec %CP
32071  91 54533  16 24885   6
--Sequential Input- --Random-
-Per Chr- --Block-- --Seeks--
K/sec %CP K/sec %CP  /sec %CP
22946  62 52236   6 382.1   0
------Sequential Create------
-Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP
26386  92 +++++ +++ 24380  99
--------Random Create--------
-Create-- --Read--- -Delete--
 /sec %CP  /sec %CP  /sec %CP
27755  99 +++++ +++ 22607  99

The performance is better in all areas, except that the CPU utilization is a tad higher. The read performance went up from 42 MiB/s to 52 MiB/s, an increase of roughly 23%. I expected an increase, but it could have been bigger considering that read requests are balanced over the two drives. But then again, the main goal of the RAID was to make sure that my data is kept safe, so the increase in performance is just an added bonus!

Even the write performance went up, from 49 MiB/s to 54 MiB/s. This is a bit strange, since every RAID HOWTO I’ve read explains that the write performance should drop with RAID-1, because each block is put on the bus twice, once for each disk. But who am I to complain? :-)
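
For the curious, results like the ones above come from bonnie++ runs along these lines; the directory, size, and user are only placeholders:

# Run bonnie++ on the filesystem being tested, with a data set (in MB)
# larger than RAM so the page cache doesn't skew the numbers.
bonnie++ -d /home/test -s 2048 -u someuser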

With my data backed by two disks I’m feeling fairly safe on that front. Of course my computer could still be stolen, hit by lightning, or the data could simply be deleted. To protect against the latter I’ve installed dirvish to take backups of /home to my normal disk.

These backups are made daily, and rotated so that I have images for the last two weeks. The nice feature of dirvish is that the backups are live — they exist on my disk as a normal filesystem tree.

Normally it would require a hideous amount of space to keep two weeks’ worth of full backups, but since dirvish only uses space for new and changed files it should do just fine with my 80 GB disk. The trick is using hard links for files which haven’t changed between backups — hard links take up almost no space, or rather, they take up inodes, but ReiserFS (which is the filesystem I use) allocates inodes dynamically as needed, so I won’t suddenly run out of them.
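
Under the hood dirvish drives rsync, and the hard-link trick boils down to roughly this (the paths are just examples, and this is a sketch of the idea rather than dirvish’s actual invocation):

# Make today's image, hard-linking unchanged files against yesterday's
# image so they take up no extra space on the backup disk.
rsync -a --link-dest=/backup/yesterday /home/ /backup/today/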

So with my data spread over no less than three disks I can sleep with ease at night :-)