[wp-hackers] WordPress, web standards, and (X)HTML

Benjamin Hawkes-Lewis bhawkeslewis at googlemail.com
Tue Dec 5 21:26:16 GMT 2006


Just wanted to add a couple of points to what Sam Angrove has already said.

On Tue, 2006-12-05 at 11:06 +0200, Computer Guru wrote: 
> I know my blog doesn't validate - I don't have time for it :)

Yes. One of the things I'm arguing is that WordPress ought to be
ensuring that your markup validates. The fact that you'd have to spend a
lot of time making sure your markup validates is a good indication that
something is broken with the current WordPress design.

> My main site, however, is:
> http://validator.w3.org/check?uri=http%3A%2F%2Fneosmart.net%2F
> 
> HTML 4 is ancient. If you go with HTML 4, when (in the future obviously) IE
> and the rest of the world becomes XHTML-compliant, you're going to have to
> do a lot of work just to make it work with XHTML.

If browsers ever emerge that cannot render HTML 4.01 Strict, then there
are programs that can convert from HTML to XHTML 1.0 Strict, like Tidy.
And if WordPress develops an engine for serializing content to multiple
document formats, serving XHTML instead would involve no additional work
for the author. 
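
To make the "serialize to multiple formats" idea concrete, here is a
minimal sketch in Python. Nothing in it is WordPress code: Node,
VOID and serialize are names I have made up for illustration, and the
only assumption is that content is held as a tree rather than as a
string of markup. The same tree is then written out as HTML 4.01 or as
XHTML 1.0, and the two differ only in how empty elements are closed.

```python
from dataclasses import dataclass, field
from html import escape

# Elements declared EMPTY in HTML 4.01 Strict (they take no end tag).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


@dataclass
class Node:
    """Hypothetical content tree node: a tag, its attributes, its children."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # each child is a Node or a str


def serialize(node, xhtml):
    """Write the tree as XHTML 1.0 if xhtml is true, otherwise as HTML 4.01."""
    if isinstance(node, str):
        # Text content: escape &, < and > in both formats.
        return escape(node, quote=False)
    attrs = "".join(f' {name}="{escape(value)}"' for name, value in node.attrs.items())
    if node.tag in VOID:
        # The only format-specific decision in this sketch.
        return f"<{node.tag}{attrs} />" if xhtml else f"<{node.tag}{attrs}>"
    inner = "".join(serialize(child, xhtml) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = Node("p", {"class": "intro"}, ["Fish & chips", Node("br"), Node("img", {"src": "a.png", "alt": "A"})])
print(serialize(doc, xhtml=False))
print(serialize(doc, xhtml=True))
```

A real engine would also have to handle doctypes, namespaces, character
encodings and script/style content, none of which this sketch attempts.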

Of course, even real XHTML 1.0 Strict is a bit pointless compared to
compound document formats, XHTML 2, and WHATWG's XHTML serialization.
Moving from either HTML 4.01 or XHTML 1.0 to those may well involve
author work, because all three introduce new semantics. But developing
an engine to serialize content to multiple document formats should help
minimize such work.

> HTML 4 has many issues, it's not as strict (even HTML 4.01 STRICT), and
> there is no point to *going back to it* after already switching to
> XHTMl-compliant code.

Can you offer an example of such "strictness" benefiting visitors?

> AS a matter of fact though, EVEN if you were to use HTML 4.01 strict, *XHTML
> STILL VALIDATES*
> Here:
> http://validator.w3.org/check?uri=http%3A%2F%2Fneosmart.net%2F&charset=%28de
> tect+automatically%29&doctype=HTML+4.01+Strict
> Basically, revalidating my XHTML-compatible homepage as HTML 4. The only
> error is the XML-descriptor at the top - something that HTML doesn't use.

Even if it validated completely, that wouldn't help since the same
markup means something different in HTML 4.01, due to different
treatment of />.

> So if XHTML is 100% backwards-compatible with HTML 4 - why the hell would
> you take a giant step back and make it ONLY compatible with HTML 4 and kill
> off XHTML compatibility?

XHTML 1.0 is in no sense 100% backwards-compatible with either HTML 4.01
or even our broken user agents. Instead, a certain subset of XHTML will
work most of the time because of bugs in our user agents.
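
As a rough illustration of how far apart the two parsing models are,
the following Python sketch feeds the same fragments to a real XML
parser and to a tolerant HTML parser. It uses only the standard
library; parses_as_xml, TagCollector and html_start_tags are
hypothetical helper names, and Python's html.parser is only standing in
for a browser's tag-soup parser, not reproducing one.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Tolerant parser that simply records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


html_style = "<p>one<br>two</p>"     # fine as HTML 4.01, not well-formed XML
xhtml_style = "<p>one<br />two</p>"  # well-formed XML

print(parses_as_xml(html_style), html_start_tags(html_style))
print(parses_as_xml(xhtml_style), html_start_tags(xhtml_style))
```

The tolerant parser reports the same tags for both fragments, while the
XML parser rejects the first outright. Content that has only ever been
through the tolerant path has never been tested as XML.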

What evidence do you have that using HTML 4.01 would be a "giant step
back"? Anyhow, XHTML-devotees (in which party I include myself) should
be arguing for a system that can serialize to both, not a system that
churns out faux XHTML. Because we want to take a leap forward.

>  4. Author decides to send the same content as application/xhtml+xml,
>     because it is, after all, XHTML.
> [snip] 
> We have number 1 down just fine, and 2, and 3.
> So long as you *don't* do number 4, ABSOLOUTELY NOTHING WRONG OR BAD OR
> HARMFUL HAPPENS.

You began this email by claiming (wrongly) that when mainstream browsers
can parse and render application/xhtml+xml, it would involve a lot of
work to switch content from HTML to XHTML. But now you seem to be
suggesting that we should never switch to application/xhtml+xml, and
that we ignore the view of the W3C HTML working group that we should
serve that MIME type to supporting browsers:

http://www.w3.org/TR/xhtml-media-types

So which is it?
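
For what it's worth, serving that MIME type only to supporting browsers
need not be much work. Below is a minimal sketch in Python, assuming
the server can see the request's Accept header; choose_media_type is a
hypothetical name, and the logic is deliberately simplified (it checks
only whether application/xhtml+xml is listed with a non-zero q value,
and does not compare q values against text/html).

```python
def choose_media_type(accept_header):
    """Pick application/xhtml+xml only when the client lists it explicitly.

    Simplified on purpose: wildcards such as */* are not treated as
    support, so browsers that do not name the type get text/html.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        return "application/xhtml+xml" if q > 0 else "text/html"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_media_type("image/gif, image/jpeg, */*"))
```

A server doing this should also send "Vary: Accept" so that caches do
not hand one browser the response meant for another.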

> My point is, HTML4 is old and decrepit. Whether in a year or 5, we *will*
> switch to pure XHTML. 

Internet Explorer 7 will still be around in 5 years' time. While
people /might/ start making more use of real XHTML if Internet Explorer
8 adds support, it is unlikely that commercial sites will be able
to avoid still sending some content as text/html.

It's faux XHTML that is "old and decrepit". HTML 4.01 became a
recommendation in December 1999. XHTML 1.0 became a recommendation a
month later in January 2000. But since then we've seen a new version of
XHTML which MUST NOT be served as text/html: XHTML 1.1 (May 2001).
Compound documents including SVG or MathML (i.e. the only documents
which at this juncture will actually gain from using XHTML) should use
+xml media types not text/html.

XHTML 1.0 itself, and the HTML compatibility appendix especially, were
intended as transitional technologies. The editors certainly didn't
envisage that in 2006 people would be fighting a rearguard battle to
make this underspecified hack into the mainstream web markup. They
thought people would move on to an XML-based markup world. Systems like
WordPress are standing directly in the way of the transition, while
talking up their standards compliance. XHTML has been turned into a mere
gimmick, at the expense of future authors and readers.

Now the race is on to replace XHTML 1.1 with XHTML 2 or Web Applications
1.0. Serving faux XHTML ensures WordPress will be in no position to move
to either.

> Using HTML4 now simply means having to do a lot more
> work later. If we can continue to use our XHTML-as-HTML implementation
> without problems (this is all "theoretical" problems in the here and now, no
> actual, real-world issues) and ease the transition later, whyever not?

There should be tangible benefits to balance "theoretical" problems.

> As is mentioned, this document WAS WRITTEN IN 2002!!! THINGS CHANGE.

Note the document says that "It has since been regularly updated to
correct errors that have been brought up in various mailing lists and
other discussion forums. As of late 2004, it is still just as relevant
as when it was originally written." What exactly do you think has
changed since late 2004?

> Today, XHTML *IS* correct. Browsers have improved. They won't render this
> stuff wrong. XHTML is the way to go, and HTML4 is dead. Think about it -
> 2002?! 

Sorry, this was too vague for me to work out what you mean. Which XHTML?
Correct how? Improved how? What is "this stuff"? And why do you think
representatives from Opera, Mozilla, and Apple are working on a new HTML
specification, and even the W3C is getting back in on the act:

http://dig.csail.mit.edu/breadcrumbs/node/166

> (erm.. I kinda got carried away, please, no offense intended to anyone
> pushing this idea on the list... I just feel strongly about taking huge
> steps backward :)

Definitely no offence taken. :) I feel strongly about these things too. 

But I'm left a little bit unclear about exactly what you're advocating,
and still more unclear about /why/. You don't seem to have any
particular use for XML-based markup, otherwise you'd be pushing for
alternatives 1) or 2). The only advantages of faux XHTML you've
suggested are "strictness" and novelty. 

Please tell us, what need of yours would not be met by /either/ a
"stricter" lint checker for HTML 4.01 Strict or by WHATWG's HTML
serialization?

--
Benjamin Hawkes-Lewis


