[wp-hackers] WP issues

Sam Angove sam at rephrase.net
Sun Jun 3 15:06:24 GMT 2007


On 6/3/07, Geoffrey Sneddon <foolistbar at googlemail.com> wrote:
>
> Parse it as HTML, then serialise it. You'll have well-formed and
> valid output. Trying to parse HTML using regular expressions will
> never work ...

A serialiser approaches irrelevancy if we have a sufficiently good
HTML parser. The single most important and problematic part of the
system is the part that handles unexpected input, which is to say,
errors. That part is presumably in the parser.

If we had such a parser, well-formedness would be a trivial problem.
That has *nothing* to do with a serialiser: if we can guarantee good
input, good output is easy.
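
To make that division of labour concrete, here is a minimal sketch in
Python. It is an illustration only, not code anyone in this thread has
proposed. It leans on the standard library's html.parser as a lenient
tokenizer; the names Balancer, tidy and VOID are made up for the
example. All of the error handling (unclosed elements, stray end tags)
happens on the parsing side, and the "serialiser" is little more than
string concatenation. It balances tags and escapes text, which gives
well-formed output; it does nothing about validity, and it is nowhere
near a real error-recovering HTML parser.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag (a deliberately short, assumed list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Balancer(HTMLParser):
    """Toy error-recovering parser that emits balanced, escaped markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialised pieces, joined at the end
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive unescaped; re-escape them on output.
        attr_text = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Error case 1: an end tag with no matching open element is dropped.
        if tag not in self.stack:
            return
        # Error case 2: close any elements left open inside the one ending.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text arrives with character references decoded; escape it again.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()  # flush any buffered text first
        # Error case 3: input ended with elements still open.
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())


def tidy(html):
    """Parse leniently, then serialise the balanced result."""
    parser = Balancer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, tidy("<p>Hello <b>world</p>") returns
"<p>Hello <b>world</b></p>". Everything interesting is in the three
marked error cases, and a real parser would need far more of them.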

But we have never had such a parser, and no one has ever offered one.
If you have one, that's a game-changer. I would very much like the
technical details or, better yet, the code.


> I was advocating either ensuring that the output is valid (as any
> XHTML parser can throw a fatal error on invalid XHTML (and yes, I do
> mean invalid, not malformed)) or not using something where fatal
> errors can be thrown (such as HTML).

I agree in theory, but there are practical considerations. If you have
a workable solution to the XHTML validity problem, hooray; I'll wait
for the details. (This comes right back to the HTML parser.) If not,
there are good reasons for maintaining the status quo.

