[wp-hackers] WordPress, web standards, and (X)HTML

Benjamin Hawkes-Lewis bhawkeslewis at googlemail.com
Thu Nov 30 19:08:53 GMT 2006


It's been suggested that we move this discussion
[ http://trac.wordpress.org/ticket/3406 ] from Trac to this mailing
list.

WordPress bills itself on its homepage as producing standard markup.

By default, WordPress produces a variation on XHTML 1.0 markup which is
always served as text/html. There appears to be no effective enforcement
of conformance to protect against user "stupidity" or misbehaving
plugins. As a result, /in practice/, many (if not most) WordPress blogs
include non-conformant, even non-validating, pages.

XHTML 1.0 should be served as application/xhtml+xml to supporting user
agents. Serving XHTML 1.0 as text/html was envisaged by the XHTML 1.0
specification writers as a compatibility hack for legacy browsers, not
as the main MIME type for XHTML 1.0 content.

There is no specification for how user agents should parse or render
XHTML 1.0 served as text/html, except that they should try to mimic
rival user agents' error recovery:

http://www.w3.org/TR/xhtml1/#guidelines

http://www.apps.ietf.org/rfc/rfc2854.html#page-3

Most user agents hand such XHTML over to the same tag soup parsers that
parse any other HTML. These parsers are only capable of rendering XHTML
1.0 acceptably because of their historical failure to conform to the
HTML 4.01 specification in the first place. Worse, certain JavaScript
and CSS techniques that work with documents served as text/html will not
work when the same documents are served as application/xhtml+xml. For
this and other reasons, many consider serving XHTML 1.0 as text/html at
all to be bad practice:

http://www.hixie.ch/advocacy/xhtml

http://www.webdevout.net/articles/beware_of_xhtml.php

I like XHTML. But I believe that best practice is to either:

1) Serve XHTML 1.1 or modular XHTML to user agents that can handle
application/xhtml+xml correctly and HTML 4.01 Strict to user agents that
cannot.

OR

2) More simply, serve HTML 4.01 Strict to everyone.

I can understand why WordPress developers would be loath to break
existing templates. Rather than breaking them, I suggest that a
standards-compliance mode be implemented in which either 1) or 2)
takes place.

At the /very least/, WordPress should serve XHTML 1.x as
application/xhtml+xml to supporting user agents and reserve text/html
for non-supporting user agents. 
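
To make the negotiation concrete, here is a minimal sketch of the
decision, written in Python purely for illustration (it is not WordPress
code). The function names and the simplified Accept parsing are my own
assumptions. A user agent counts as "supporting" only if it lists
application/xhtml+xml explicitly, with a quality value above zero and at
least as high as the one it gives text/html; a bare */* wildcard is not
treated as support.

```python
def parse_accept(header):
    """Map each media type in an Accept header to its q-value (simplified)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Pick the MIME type for an XHTML page (hypothetical helper)."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Only an explicit, non-zero preference counts; wildcards are ignored.
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

A response chosen this way should also carry a "Vary: Accept" header so
that caches keep the two variants apart.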

Such a change will be crucial if WordPress wishes to eventually migrate
to Web Applications 1.0 or XHTML 2.0.

If this forces WordPress to double-check markup for well-formedness,
validity, and conformance, so much the better. People put a lot of
effort into writing their blogs. They deserve markup that any user agent
following the specifications can read correctly; markup that will still
be parsable in a hundred years' time, when Internet Explorer and Firefox
will be long forgotten.
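
As a rough illustration of the cheapest of those checks, the sketch
below tests a piece of markup for well-formedness only, using the XML
parser in Python's standard library. The function name is hypothetical,
and this says nothing about validity against a DTD or conformance to
the XHTML specification, which would need further checks.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Raised for unclosed tags, mismatched tags, empty input, etc.
        return False
    return True
```

For example, "<p>Hello <em>world</em></p>" passes, while
"<p>Hello <br></p>" fails because the br element is never closed.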

P.S. XHTML 1.1 should NEVER be served as text/html. To do so with the
WordPress blog itself [ http://wordpress.org/development/ ] radically
undermines WordPress's claim to be the choice for those who want
standard markup. (Yes, it validates. The W3C Validator pretends all
XHTML documents were received with the proper application/xhtml+xml MIME
type.)

--
Benjamin Hawkes-Lewis


