[wp-hackers] WP issues

Geoffrey Sneddon foolistbar at googlemail.com
Sun Jun 3 11:13:37 GMT 2007

On 2 Jun 2007, at 22:16, Peter Westwood wrote:

> Geoffrey Sneddon wrote:
>> On 2 Jun 2007, at 14:46, Sam Angove wrote:
>>> The term "serialiser" is vague (what are you serialising from?), but I
>>> assume you meant that the output should be built as, say, a DOM
>>> object, then serialised from it to a text|application/xml document. If
>>> so, then I disagree. It's not a magic bullet.
>> Any XML structure, whether it be SAX, DOM, or something else.
> SAX is not relevant in the context of generating XML - SAX is all about
> parsing XML in a simple way without having to build an in-memory model
> of the document to navigate like you do with a DOM parser.

Try telling that to the developers of gnu.xml.util.XMLWriter, a SAX  
based serialiser.
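
As a rough illustration of SAX-style serialisation (a Python sketch using
the standard library's xml.sax.saxutils.XMLGenerator, not
gnu.xml.util.XMLWriter itself; the function and element names are made up
for the example), the caller emits events and the writer takes care of
escaping character data:

```python
import io
from xml.sax.saxutils import XMLGenerator


def serialise_post(title, body):
    """Write a tiny XML document by firing SAX events at a writer.

    'post', 'title' and 'body' are hypothetical element names chosen
    only for this illustration.
    """
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("post", {})
    gen.startElement("title", {})
    gen.characters(title)   # '&', '<' and '>' are escaped by the writer
    gen.endElement("title")
    gen.startElement("body", {})
    gen.characters(body)
    gen.endElement("body")
    gen.endElement("post")
    gen.endDocument()
    return out.getvalue()


print(serialise_post("Fish & chips", "a < b"))
```

No in-memory tree is built here; output is written as each event arrives.
This particular class escapes character data but does not itself check
that start and end events are balanced, so a stricter writer would have
to add that check.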

> The biggest problem with any XML output serialiser that wants to ensure
> the document is well-formed before providing it is the fact that they
> just don't scale well.  Especially not in a web context.

Then cache the output. The output doesn't change on every load.

> The memory usage of a DOM model of the page and the delay introduced
> before sending any content just don't seem worth anything for the user
> when you consider the fact that all it can do is stop you sending the
> invalid XML; it can't actually fix the problem.

Almost every serialiser I've come across _ALWAYS_ gives valid output  
even with broken input.
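
One way a serialiser can do this for bad character data is to drop
anything outside the Char production of XML 1.0 before writing it out. A
minimal Python sketch, assuming stripping (rather than rejecting or
replacing) is the behaviour wanted; the function name is made up for the
example:

```python
import re

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


# NUL and vertical tab are removed; the ordinary tab is kept.
print(repr(strip_invalid_xml_chars("a\x00b\x0bc\td")))
```

This only deals with illegal characters. Markup characters such as '&'
and '<' still need escaping by whatever writes the text node.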

>>> Most errors occur when users save posts and comments full of malformed
>>> markup and bad character data. Building output as an XML DOM won't
>>> help with that at all, because the broken input comes in as a string
>>> and will need to be corrected beforehand. If that problem can be
>>> solved, the class of errors that a serialiser would catch is
>>> comparatively easy to handle.
>> The serialiser will ensure that it is well-formed, so would
>> therefore strip invalid characters.
> The problem here is that if the output of your generator is invalid
> then you need to fix the generator - wrapping it in a box and hiding the
> fact that it doesn't work doesn't help anyone - the user still has to
> fix what is being generated in order to get the output they want!

But a bug in a single library, as opposed to twenty places where  
strings are being put together, is less likely, as it receives twenty  
times the amount of testing.

>> Using SAX would allow us to behave in similar ways as we already do.
>> Tag-balancing issues would never arise with a serialiser. You're never
>> going to have test suites to test everything. Something explicitly
>> designed to avoid these errors would avoid them happening. There are
>> literally thousands of places in WP where I can insert content that'll
>> cause a fatal error.
> As I have said above, SAX doesn't help. Serialisers only ensure valid
> HTML in a purely technical sense.

As I have said above, there are SAX serialisers. Surely, if we have a
focus on web standards, making sure output is valid is a focus in itself?

If you really want to ensure WP's output is fine, regardless of the  
input, develop with XHTML sent as application/xhtml+xml AND use a  
DOCTYPE-aware XML parser. I guarantee it won't be hard to break.
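
For a quick approximation of that test, a strict parser can be pointed at
the generated markup. The Python sketch below is only a well-formedness
check using the standard library's expat-based parser; it does not load
the XHTML DTD, so it is not the DOCTYPE-aware parser described above, and
the function name is made up for the example:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return the parser's error message, or None if markup is well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# An unclosed <b> and a bare ampersand are both fatal to an XML parser.
for sample in ("<p>ok</p>", "<p><b>bad</p>", "<p>a & b</p>"):
    print(sample, "->", well_formedness_error(sample))
```

Because the DTD is not read, named entities such as &nbsp; will be
reported as errors here even when the DOCTYPE would define them.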

- Geoffrey Sneddon
