[wp-hackers] WP issues

Sam Angove sam at rephrase.net
Tue Jun 5 00:58:14 GMT 2007


On 6/4/07, Geoffrey Sneddon <foolistbar at googlemail.com> wrote:
>
> How do you propose you get from the parser output to XHTML, if not a
> serialiser?

I was addressing your insistence that an XML serialiser would solve
all of our validity woes. The hard part is generating a good input
stream for it. Once that's done, serialising is comparatively easy:
it's far from impossible to produce valid output through string
concatenation and other nasties. (I'm not saying it's better, just
that it's possible.)
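
To make that concrete, here is a rough sketch in Python (not anything
in WordPress) of serialising by plain string concatenation. The event
tuples, the VOID set and the serialise() name are all made up for
illustration, and it assumes the events are already well-nested and
that attribute values are strings:

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical event format (an assumption for this sketch):
#   ("start", name, [(attr, value), ...]), ("end", name), ("text", data)
# Elements written in the self-closing form; an illustrative subset only.
VOID = {"br", "hr", "img", "input", "meta", "link"}


def serialise(events):
    """Concatenate a well-nested event stream into an XHTML fragment."""
    out = ""
    for event in events:
        kind = event[0]
        if kind == "start":
            _, name, attrs = event
            out += "<" + name
            for key, value in attrs:
                # quoteattr() escapes the value and adds the quotes
                out += " " + key + "=" + quoteattr(value)
            out += " />" if name in VOID else ">"
        elif kind == "end":
            # Void elements were already closed by their start event
            if event[1] not in VOID:
                out += "</" + event[1] + ">"
        elif kind == "text":
            # escape() handles &, < and >
            out += escape(event[1])
    return out
```

Nothing here checks nesting: if the events are wrong, the output is
wrong. All of the correctness has to come from whatever produced the
event stream.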


> The rules for HTML and XHTML differ (though if serving as text/html
> they follow neither specification), and you must therefore convert
> between the two (which is harder than it sounds).

The declared content-type is irrelevant. Most of the markup we'd be
likely to encounter is more or less common to both -- quoted
attributes, no missing end tags, no xmlns, whatever.

Again, the issue is the part of the system that can recognize, say, a
missing end tag, and generate a parsing event for it. (Especially
given that some of the input is guaranteed to be invalid HTML too.)
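
As a toy illustration of what "generate a parsing event for a missing
end tag" means, here is a sketch on top of Python's standard
html.parser. The RepairingParser class, the repair() helper, the event
tuples and the two small rule sets are all assumptions made up for
this example. Real tag-soup handling needs far more rules than this:

```python
from html.parser import HTMLParser

# Illustrative subsets only; a real parser needs much fuller rules.
VOID = {"br", "hr", "img", "input", "meta", "link"}
AUTO_CLOSE = {"p", "li"}  # a new <p> or <li> closes an open one of the same name


class RepairingParser(HTMLParser):
    """Collects start/end/text events, synthesising missing end events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_elements = []  # stack of currently open element names
        self.events = []

    def _close_through(self, name):
        # Pop the stack down to and including `name`, emitting an end
        # event for every element popped on the way.
        while self.open_elements:
            top = self.open_elements.pop()
            self.events.append(("end", top))
            if top == name:
                break

    def handle_starttag(self, name, attrs):
        if name in AUTO_CLOSE and self.open_elements[-1:] == [name]:
            self._close_through(name)
        self.events.append(("start", name, attrs))
        if name not in VOID:
            self.open_elements.append(name)

    def handle_endtag(self, name):
        if name in self.open_elements:
            self._close_through(name)
        # An end tag with no matching open element is silently dropped.

    def handle_data(self, data):
        self.events.append(("text", data))

    def close(self):
        super().close()  # flush any buffered text first
        while self.open_elements:
            self.events.append(("end", self.open_elements.pop()))


def repair(html):
    parser = RepairingParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

Feeding it something like "<p>one<p>two<br><b>bold" gives a balanced
event stream, with end events for the first p, the b and the second p
that never appeared in the input. Deciding which rules to apply, and
when, is where the real work is.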


> And there's several HTML5 parsers under development (HTML5 aims to be
> compatible with classic HTML parsers, and current web content):
> http://php-html5lib.dashslot.net/
> http://jero.net/lab/ph5p/

The HTML5 parsers were never an option before (they didn't exist), but
they're definitely worth looking into.

