Parse errors…

While trying to put my PHP tutorial back online I have just spent at least half an hour fighting the markup language! The problem is the lack of tables in Markdown.

I wanted a table listing the operators in one column and their meaning in another column. A quite simple task, one would think, and something which I was able to do in PhpWiki with only a minor problem: I could not use || when describing the or operator, since || is also used in the definition of the table itself.

With Markdown I had to write the table myself in HTML. No problem, I’ve written lots of tables by hand! But no, every time I saved the page the closing table tag somehow disappeared. This messed the page up quite severely.

Next idea: make the table in plain ASCII. It won’t be as nice, but it ought to work. But no, even within a code block, where the lines are supposed to be interpreted literally, I got into trouble. On the line containing the <, the spaces following it were eaten.

These things are what I hate the most about all those PHP content-management systems: they are all so fragile! The parsing done in Markdown is based on regular expressions, and so is the parsing in all the other systems I’ve seen. This just doesn’t work reliably. My experience is that you either get strange results like I did, or you get silly limitations, or you end up with both.
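To make the fragility concrete, here is a toy sketch (in Python, for brevity) of a renderer built purely from successive regular-expression passes. It is not Markdown’s actual code, and `naive_render` is a name made up for this illustration. It only shows the general failure mode, where a later rule rewrites text that an earlier rule had already declared literal.

```python
import re


def naive_render(text):
    """Toy regex-only renderer: each rule rewrites the whole string in turn."""
    # Rule 1: `code` becomes <code>code</code>.
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    # Rule 2: *word* becomes <em>word</em>. This pass has no idea that part
    # of the string is now the output of rule 1 and should be left alone.
    text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
    return text


print(naive_render("Compute `a * b * c` here"))
# prints: Compute <code>a <em> b </em> c</code> here
```

The multiplication signs inside the code span were meant literally, yet the second pass turns them into emphasis. A regex-based system can patch around this with placeholders and careful ordering of the rules, but every such patch is one more place where things can go wrong.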

The kind of limitation I’m talking about is that in PhpWiki you cannot apply formatting markup to a link. A quick test to show that it works with Markdown.

I believe that using stronger tools for the parsing would help with these problems. In particular, defining a proper grammar and writing a lexer and parser would make things more robust. When people submit a comment with parse errors it would be up to the compiler to flag them as such. It won’t be easy to make a compiler with good error recovery for such a system.
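As a rough idea of what I mean, here is a minimal sketch in Python of a lexer and a hand-written parser for a tiny, made-up subset of inline markup: plain text, `*emphasis*` and backtick code spans. The grammar, the token names and the `ParseError` class are all assumptions for this illustration and not taken from any existing system. There is no error recovery at all; the parser simply stops at the first error and reports where it is.

```python
import re


class ParseError(Exception):
    """Raised with the character offset of the offending markup."""

    def __init__(self, message, pos):
        super().__init__(f"{message} at offset {pos}")
        self.pos = pos


# Lexer: regular expressions are used only to cut the input into tokens.
# A complete code span is a single token, so its contents stay literal.
TOKEN_RE = re.compile(
    r"(?P<CODE>`[^`]*`)|(?P<STAR>\*)|(?P<TICK>`)|(?P<TEXT>[^*`]+)"
)


def lex(src):
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "TICK":  # a backtick with no closing partner
            raise ParseError("unterminated code span", m.start())
        tokens.append((m.lastgroup, m.group(), m.start()))
    return tokens


def leaf(token):
    kind, value, _ = token
    if kind == "CODE":
        return ("code", value[1:-1])  # strip the two backticks
    return ("text", value)


def parse(src):
    """Grammar: inline := (TEXT | CODE | emphasis)*
                emphasis := STAR (TEXT | CODE)+ STAR"""
    tokens = lex(src)
    nodes = []
    i = 0
    while i < len(tokens):
        kind, _, pos = tokens[i]
        if kind == "STAR":
            children = []
            j = i + 1
            while j < len(tokens) and tokens[j][0] != "STAR":
                children.append(leaf(tokens[j]))
                j += 1
            if j == len(tokens):
                raise ParseError("emphasis opened but never closed", pos)
            if not children:
                raise ParseError("empty emphasis", pos)
            nodes.append(("em", children))
            i = j + 1
        else:
            nodes.append(leaf(tokens[i]))
            i += 1
    return nodes


print(parse("Compute `a * b * c` and *really* check"))
try:
    parse("an *unclosed emphasis")
except ParseError as err:
    print("rejected:", err)
```

The point of the design is that malformed input is rejected with a position instead of being silently mangled, and that the stars inside the code span never reach the emphasis rule because the lexer has already consumed them as part of one token.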

But if it were done, then we would in effect have a system of writing valid XHTML without all the tags — that is a worthy goal! And given such a precise understanding of the structure of the text, one could easily convert it into all sorts of interesting formats such as LaTeX (for later conversion into good-looking PDFs) and ASCII (for inclusion in README files and such).
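A small sketch of that idea, again in Python: once the text exists as a tree, each output format is just another walk over it. The tree shape used here (tuples of a kind and a payload) is a hypothetical one chosen for the example, and the LaTeX escape table is simplified rather than complete.

```python
from html import escape

# Hypothetical parse tree: ("text", str), ("code", str) or ("em", [children]).
tree = [
    ("text", "Use "),
    ("code", "a < b"),
    ("text", " with "),
    ("em", [("text", "care")]),
]

# Simplified table of characters that need escaping in LaTeX text mode.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "{": r"\{", "}": r"\}", "&": r"\&",
    "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "<": r"\textless{}", ">": r"\textgreater{}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(s):
    # Escape character by character so nothing is escaped twice.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in s)


def to_xhtml(nodes):
    out = []
    for kind, payload in nodes:
        if kind == "text":
            out.append(escape(payload))
        elif kind == "code":
            out.append("<code>" + escape(payload) + "</code>")
        elif kind == "em":
            out.append("<em>" + to_xhtml(payload) + "</em>")
    return "".join(out)


def to_latex(nodes):
    out = []
    for kind, payload in nodes:
        if kind == "text":
            out.append(latex_escape(payload))
        elif kind == "code":
            out.append(r"\texttt{" + latex_escape(payload) + "}")
        elif kind == "em":
            out.append(r"\emph{" + to_latex(payload) + "}")
    return "".join(out)


def to_ascii(nodes):
    out = []
    for kind, payload in nodes:
        if kind == "em":
            out.append("*" + to_ascii(payload) + "*")
        else:
            out.append(payload)  # text and code are emitted as-is
    return "".join(out)


print(to_xhtml(tree))  # Use <code>a &lt; b</code> with <em>care</em>
print(to_latex(tree))  # Use \texttt{a \textless{} b} with \emph{care}
print(to_ascii(tree))  # Use a < b with *care*
```

Each backend knows its own escaping rules, so the < that caused me so much trouble above is handled correctly in every format without the author having to think about it.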

Looking at the source code for the GNU Flex and Bison tools one sees that they are not exactly trivial to reimplement in PHP. Far from it. But I still hope that some day either they will support PHP natively, or we will get another lexer/parser framework for PHP.

