Parse errors…

While trying to put my PHP tutorial back online I have just spent at least half an hour fighting the markup language! The problem is the lack of tables in [Markdown][].

I wanted a table listing the operators in one column and their meaning in another: a fairly simple task, one would think, and something I was able to do in PhpWiki with only a minor problem. I could not use || when describing the or operator, since || is also part of PhpWiki's syntax for defining the table itself.

With Markdown I had to write the table myself in HTML. No problem, I've written lots of tables by hand! But no: every time I saved the page, the closing table tag somehow disappeared, which messed up the page quite severely.

Next idea: make the table in plain ASCII. It won't be as nice, but it ought to work. But no, even within a code block, where the lines are supposed to be taken literally, I got into trouble: on the line containing a <, the spaces following it were swallowed.

These things are what I hate most about all those PHP content-management systems: they are all so fragile! The parsing done in [Markdown][] is based on regular expressions, and so is the parsing in every other system I've seen. This just doesn't work reliably. In my experience you either get strange results like I did, or silly limitations, or both.

One example of the limitations I'm talking about is that PhpWiki won't let you apply formatting markup to a link. A quick test to show that it works with Markdown.

I believe that using stronger tools for the parsing would help with these problems; in particular, defining a proper grammar and writing a lexer and parser would make things more robust. When people submit a comment with parse errors, it would then be up to the compiler to flag them as such. It won't be easy to build a compiler with good error recovery for such a system, though.
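To make the idea a bit more concrete, here is a minimal sketch (in Python rather than PHP, and hand-written rather than generated by Flex/Bison) of a lexer and a recursive-descent parser for a hypothetical toy markup with only `*emphasis*` and `**strong**`. The grammar, the class names and the error message format are my own assumptions for illustration, not how Markdown or PhpWiki work. The point is that an unclosed delimiter is reported as a parse error with a position, instead of quietly producing strange output.

```python
from dataclasses import dataclass, field


class ParseError(Exception):
    """Raised when the input does not match the toy grammar."""


@dataclass
class Token:
    kind: str  # "TEXT", "STAR", "DSTAR" or "EOF"
    value: str
    pos: int   # offset into the source, used in error messages


@dataclass
class Node:
    kind: str  # "doc", "text", "em" or "strong"
    text: str = ""
    children: list = field(default_factory=list)


def lex(src):
    """Turn the source into tokens, preferring '**' over '*' (longest match)."""
    tokens, i = [], 0
    while i < len(src):
        if src.startswith("**", i):
            tokens.append(Token("DSTAR", "**", i))
            i += 2
        elif src[i] == "*":
            tokens.append(Token("STAR", "*", i))
            i += 1
        else:
            # Everything up to the next '*' is plain text.
            j = i
            while j < len(src) and src[j] != "*":
                j += 1
            tokens.append(Token("TEXT", src[i:j], i))
            i = j
    tokens.append(Token("EOF", "", len(src)))
    return tokens


class Parser:
    """Recursive-descent parser for the hypothetical grammar

        doc    := inline* EOF
        inline := TEXT | em | strong
        em     := STAR inline* STAR
        strong := DSTAR inline* DSTAR
    """

    def __init__(self, src):
        self.tokens = lex(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse(self):
        # At the top level there is no closing delimiter, so we stop only at EOF.
        return Node("doc", children=self.inlines(closer=None))

    def inlines(self, closer):
        """Collect inline nodes until EOF or the token kind that closes the caller."""
        out = []
        while self.peek().kind not in ("EOF", closer):
            kind = self.peek().kind
            if kind == "TEXT":
                out.append(Node("text", text=self.advance().value))
            elif kind == "STAR":
                out.append(self.delimited("em", "STAR"))
            else:  # "DSTAR"
                out.append(self.delimited("strong", "DSTAR"))
        return out

    def delimited(self, node_kind, delim):
        """Parse an opening delimiter, its contents and the matching closer."""
        opener = self.advance()
        children = self.inlines(closer=delim)
        if self.peek().kind != delim:
            # Report the problem instead of guessing what the author meant.
            raise ParseError(f"unclosed {opener.value!r} opened at offset {opener.pos}")
        self.advance()
        return Node(node_kind, children=children)
```

For example, `Parser("a *b **c** d*").parse()` gives a document whose second child is an `em` node containing a `strong` node, while `Parser("a *b").parse()` raises `ParseError` pointing at offset 2. This sketch simply stops at the first error; real error recovery would be considerably more work.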

But if it were done, we would in effect have a way of writing valid [XHTML][] without all the tags, and that is a worthy goal! And given such a precise understanding of the structure of the text, one could easily convert it into all sorts of interesting formats such as [LaTeX][] (for later conversion into good-looking PDFs) and ASCII (for inclusion in README files and such).
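Once the text has been parsed into a tree, each output format is just a separate walk over that tree. The sketch below assumes a hypothetical `Node` type with `doc`, `text`, `em` and `strong` nodes (not Markdown's actual internals) and shows an XHTML, a LaTeX and a plain ASCII renderer. Writing emphasis as `/word/` and strong as `*word*` in ASCII is an arbitrary choice on my part.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # hypothetical node kinds: "doc", "text", "em" or "strong"
    text: str = ""
    children: list = field(default_factory=list)


def to_xhtml(node):
    if node.kind == "text":
        # Escape &, < and > so text can never be mistaken for markup.
        return html.escape(node.text, quote=False)
    inner = "".join(to_xhtml(c) for c in node.children)
    tags = {"em": "em", "strong": "strong"}
    return f"<{tags[node.kind]}>{inner}</{tags[node.kind]}>" if node.kind in tags else inner


# The ten characters that are special in ordinary LaTeX text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def to_latex(node):
    if node.kind == "text":
        return "".join(LATEX_SPECIALS.get(ch, ch) for ch in node.text)
    inner = "".join(to_latex(c) for c in node.children)
    commands = {"em": r"\emph", "strong": r"\textbf"}
    return f"{commands[node.kind]}{{{inner}}}" if node.kind in commands else inner


def to_ascii(node):
    if node.kind == "text":
        return node.text
    inner = "".join(to_ascii(c) for c in node.children)
    marks = {"em": "/", "strong": "*"}  # arbitrary plain-text conventions
    return f"{marks[node.kind]}{inner}{marks[node.kind]}" if node.kind in marks else inner


tree = Node("doc", children=[
    Node("text", text="fish & chips, "),
    Node("strong", children=[Node("text", text="100%")]),
    Node("text", text=" tasty"),
])
print(to_xhtml(tree))  # fish &amp; chips, <strong>100%</strong> tasty
print(to_latex(tree))  # fish \& chips, \textbf{100\%} tasty
print(to_ascii(tree))  # fish & chips, *100%* tasty
```

Escaping happens only on text nodes, in one place per format, which is exactly the kind of bookkeeping that is hard to get right when several regular expressions rewrite the same string.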

Looking at the source code for the [GNU][] Flex and Bison tools, one sees that they are not exactly trivial to reimplement in PHP; far from it. But I still hope that some day either they will support PHP natively or we will get another lexer/parser framework for PHP.
