[wp-hackers] WP issues
Geoffrey Sneddon
foolistbar at googlemail.com
Sun Jun 3 11:13:37 GMT 2007
On 2 Jun 2007, at 22:16, Peter Westwood wrote:
>
> Geoffrey Sneddon wrote:
>> On 2 Jun 2007, at 14:46, Sam Angove wrote:
>>> The term "serialiser" is vague (what are you serialising from?), but I
>>> assume you meant that the output should be built as, say, a DOM
>>> object, then serialised from it to a text|application/xml document. If
>>> so, then I disagree. It's not a magic bullet.
>>
>> Any XML structure, whether it be SAX, DOM, or something else.
>>
>
> SAX is not relevant in the context of generating XML; SAX is all about
> parsing XML in a simple way without having to build an in-memory model
> of the document to navigate, as you do with a DOM parser.
Try telling that to the developers of gnu.xml.util.XMLWriter, a SAX
based serialiser.
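For illustration, Python's standard library ships a writer of the same kind, xml.sax.saxutils.XMLGenerator: the caller fires startElement/characters/endElement events and the generator writes escaped markup straight to the output stream without building a tree. A minimal sketch (Python stands in for WP's PHP here; write_entry and the element names are hypothetical):

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def write_entry(title: str, body: str) -> str:
    """Emit an <entry> element purely from SAX events; no DOM is kept in memory."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()  # writes the <?xml ...?> declaration
    gen.startElement("entry", AttributesImpl({}))
    gen.startElement("title", AttributesImpl({}))
    gen.characters(title)  # characters() escapes &, < and >
    gen.endElement("title")
    gen.startElement("content", AttributesImpl({"type": "text"}))
    gen.characters(body)
    gen.endElement("content")
    gen.endElement("entry")
    gen.endDocument()
    return out.getvalue()
```

Because each event is written as soon as it arrives, memory use does not grow with the size of the document.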
> The biggest problem with any XML output serialiser that wants to ensure
> the document is well-formed before providing it is that such
> serialisers just don't scale well, especially not in a web context.
Then cache the output. The output doesn't change on every load.
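A minimal sketch of the caching idea, assuming a post only needs re-rendering when its modification timestamp changes. POSTS, render_post and the timestamp key are hypothetical, and functools.lru_cache stands in for whatever persistent cache WP would actually use:

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

# Hypothetical data source; in WP this would come from the database.
POSTS = {1: ("Hello", "First post"), 2: ("Again", "Second <b>post</b>")}


@lru_cache(maxsize=256)
def render_post(post_id: int, modified: str) -> bytes:
    """Serialise one post. `modified` is part of the cache key, so editing
    the post (which changes its timestamp) forces a fresh render; repeat
    loads with the same key reuse the cached bytes."""
    title, body = POSTS[post_id]
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "content").text = body
    return ET.tostring(entry, encoding="utf-8")
```

The cost of building and serialising the tree is then paid once per edit rather than once per page load.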
> The memory usage of a DOM model of the page and the delay introduced
> before sending any content just don't seem worth it for the user when
> you consider that all it can do is stop you sending invalid XML; it
> can't actually fix the problem.
Almost every serialiser I've come across _ALWAYS_ gives valid output
even with broken input.
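As a sketch of that claim using Python's xml.etree.ElementTree (an assumption; any tree-based serialiser behaves similarly): text assigned to an element is escaped on output, so a stray "<", an unclosed tag or a bare "&" in a comment cannot break the document. It does not make the markup render as the commenter intended, which is the point raised above. comment_to_xml is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def comment_to_xml(author: str, text: str) -> str:
    """Wrap user-supplied text in a <comment> element.

    The text is treated as character data, so broken markup inside it is
    escaped rather than breaking the surrounding document.
    """
    comment = ET.Element("comment", {"author": author})
    comment.text = text
    return ET.tostring(comment, encoding="unicode")
```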
>>> Most errors occur when users save posts and comments full of malformed
>>> markup and bad character data. Building output as an XML DOM won't
>>> help with that at all, because the broken input comes in as a string
>>> and will need to be corrected beforehand. If that problem can be
>>> solved, the class of errors that a serialiser would catch is
>>> comparatively easy to handle.
>>
>> The serialiser will ensure that it is well-formed, and would
>> therefore strip invalid characters.
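One caveat: not every serialiser does this stripping. Python's ElementTree, for example, escapes &, < and > but passes through characters that XML 1.0 forbids outright (most C0 control characters, lone surrogates, U+FFFE, U+FFFF), so with it the stripping would have to be a separate step in front of the serialiser. A minimal sketch, filtering against the XML 1.0 Char production; strip_invalid_xml_chars is a hypothetical name:

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# can never appear in a well-formed document, not even as a character reference.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that XML 1.0 forbids, leaving everything else intact."""
    return _INVALID_XML_CHARS.sub("", text)
```

In use, the cleaned string is what gets assigned to the element, e.g. element.text = strip_invalid_xml_chars(raw_comment).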
>>
>
> The problem here is that if the output of your generator is invalid XML
> then you need to fix the generator. Wrapping it in a box and hiding the
> fact that it doesn't work doesn't help anyone; the user still has to
> fix what is being generated in order to get the output they want!
But a bug in a single library, as opposed to twenty places where
strings are being put together, is less likely, as it receives twenty
times the amount of testing.
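To illustrate the one-library-versus-twenty-call-sites argument, here is a hypothetical byline template written both ways in Python. The concatenated version forgets </a> in one branch and never escapes the author name; the tree-built version cannot make either mistake, because ElementTree writes every end tag and escapes text and attributes. Both function names are made up for the example:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def byline_by_concatenation(author: str, url: Optional[str]) -> str:
    # Hand-assembled markup: the closing </a> is missing from the link
    # branch, and the author name is inserted unescaped. Both mistakes
    # are easy to repeat across many templates.
    html = "<p class='byline'>"
    if url:
        html += "<a href='" + url + "'>" + author
    else:
        html += author
    return html + "</p>"


def byline_by_tree(author: str, url: Optional[str]) -> str:
    # The serialiser emits every end tag and escapes text and attribute
    # values, so the output is balanced whichever branch runs.
    p = ET.Element("p", {"class": "byline"})
    if url:
        ET.SubElement(p, "a", {"href": url}).text = author
    else:
        p.text = author
    return ET.tostring(p, encoding="unicode")
```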
>> Using SAX would allow us to behave in a similar way to how we already
>> do. Tag-balancing issues would never arise with a serialiser. You're
>> never going to have test suites to test everything. Something
>> explicitly designed to avoid these errors would prevent them. There are
>> literally thousands of places in WP where I can insert content that'll
>> cause a fatal error.
>
> As I have said above, SAX doesn't help. Serialisers only ensure valid
> HTML in a purely technical sense.
As I have said above, there are SAX serialisers. Surely, if we have a
focus on web standards, making sure output is valid is a focus in itself?
If you really want to ensure WP's output is fine, regardless of the
input, develop with XHTML sent as application/xhtml+xml AND use a
DOCTYPE-aware XML parser. I guarantee it won't be hard to break.
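A development-time approximation of that test, assuming rendered pages can be captured as strings: run each one through a strict XML parser and report where the first error is. Note that ElementTree's expat-based parser only checks well-formedness; it does not load the XHTML DTD, so it is weaker than the DOCTYPE-aware parser suggested above, and DTD-only entities such as &nbsp; will be reported as errors. first_xml_error is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_xml_error(markup: str) -> Optional[Tuple[int, int]]:
    """Return None if `markup` is well-formed XML, otherwise the
    (line, column) of the first error reported by the parser."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple; lines count from 1.
        return err.position
    return None
```

In a test suite this could be asserted for every template's output, e.g. assert first_xml_error(page) is None.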
- Geoffrey Sneddon