[wp-hackers] WP issues

Sun Jun 3 11:05:16 GMT 2007

On 2 Jun 2007, at 16:44, Sam Angove wrote:

> On 6/3/07, Geoffrey Sneddon <foolistbar at googlemail.com> wrote lots  
> of things.
>>
>> Well, why not use HTML? That gets around both this and the above.
>
> Judging by your responses, we seem to be talking at cross purposes. I
> assume you mean, use @type="html" instead of "xhtml", in which case,
> yes -- there's no reason at all to be producing malformed Atom feeds.

I mean using HTML anywhere where we use XHTML.

> In your original point 5, you complained that WordPress offers no
> guarantee of producing well-formed, valid XHTML. I assumed you were
> advocating such a guarantee.

I was advocating either ensuring that the output is valid (any XHTML
parser can throw a fatal error on invalid XHTML, and yes, I do mean
invalid, not malformed) or switching to a format in which fatal errors
cannot be thrown (such as HTML).

> To avoid further confusion, it would be helpful if you could outline
> exactly what it is that you would like to happen. I am specifically
> interested in points 1, 5 and 6. Jargon is better than imprecise
> simplification.

I'll do so later.

>> And suffer how browser makers have for years? "This feed works in x
>> aggregator, but it doesn't work in your aggregator. Please fix this
>> bug." — So then you go off and do more reverse-engineering of
>> malformed XML.
>
> To be fair, you'd have to do it anyway. :) WordPress isn't the only
> fish in the sea.

I don't really want to get into a long discussion about this, but I  
will say that with many major feed readers not doing any error  
handling, more minor ones can get away with it quite easily.

>
>> Using SAX would allow us to behave in similar ways as we already do.
>> Tag-balancing issues would never arise with a serialiser.
>
> The tag balancing problem arose in the context of a function to
> correct constructions like `<p><blockquote></p></blockquote>`. I don't
> see how a serialiser is relevant.

Parse it as HTML, then serialise it. You'll have well-formed and
valid output. Trying to parse HTML using regular expressions will
never work, simply because there are de facto UA (and yes, a parsing
library is a UA) parsing rules that cannot be expressed as regular
expressions. balanceTags() has bugs even within the limited subset of
SGML that can be safely used in the real world (e.g.,
`<img src="test.gif" alt="This is a test which is > *">`).
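
To make "parse, then serialise" concrete, here is a rough sketch. It is
in Python rather than PHP purely for illustration, and it leans on the
standard library's html.parser, which is a tolerant tokeniser and not a
full implementation of browser parsing rules. The names Reserialiser,
reserialise and NAIVE_TAG are made up for this example, and NAIVE_TAG is
a deliberately simple pattern, not the one balanceTags() uses. Comments
and doctypes are dropped to keep it short.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Deliberately naive: treats the first ">" as the end of the tag.
NAIVE_TAG = re.compile(r"<[^>]*>")


class Reserialiser(HTMLParser):
    """Tokenise a fragment and emit it again with balanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []     # serialised pieces
        self.stack = []   # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def reserialise(fragment):
    parser = Reserialiser()
    parser.feed(fragment)
    parser.close()
    while parser.stack:  # close anything left open at the end
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)


img = '<img src="test.gif" alt="This is a test which is > *">'

print(NAIVE_TAG.match(img).group(0))
# <img src="test.gif" alt="This is a test which is >
print(reserialise(img))
# <img src="test.gif" alt="This is a test which is &gt; *">
print(reserialise("<p><blockquote></p></blockquote>"))
# <p><blockquote></blockquote></p>
```

The nesting it produces for the mis-nested case is not what a browser
would build, but the output is always balanced, and the ">" inside the
attribute value is never mistaken for the end of the tag.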

- Geoffrey Sneddon