[wp-hackers] WordPress eXtended RSS DTD

Geoffrey Sneddon foolistbar at googlemail.com
Mon Jul 21 18:06:15 GMT 2008

On 17 Jul 2008, at 15:27, Otto wrote:

> On Thu, Jul 17, 2008 at 9:13 AM, Geoffrey Sneddon
> <foolistbar at googlemail.com> wrote:
>> If the idea of WXR is to
>> allow easy movement between different blogging packages, then it has
>> miserably failed
> That was never the idea behind WXR, as far as I know. The idea was to
> make it easy to move between different WordPress installations and
> WordPress.com. Other blogging platforms were not considered.
> WXR is basically just RSS with some additional fields added to it for
> things like comments and such.
> Basically, it is simple enough that I think nobody has bothered to
> document it yet. Your request for high levels of documentation seems,
> to me, to be more than a little silly. Just *looking* at the damn
> thing would show you that it doesn't need an extreme amount of effort
> to produce.
> If you want it documented, then look at it and write a document for  
> it.

Ah, then I must've been misled, in part at least by Matt's comment,
"In fact it’d be nice if they [MT4] could export to WXR as well as  
it’s pretty semantically rich"[1]. He also said, in a comment,  
"However I still do plan to get a spec doc up for it one day." — this  
was almost a year ago.

The problem with "just looking at the damn thing" is that I don't know
whether the title element is text/plain or text/html (this is
ambiguous in RSS 2.0: different aggregators implement different things,
some using complex algorithms to determine which it is), and I don't
know whether the comment author's name is text/plain or text/html.
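To make the ambiguity concrete, here is a minimal Python sketch. The title value and both helper functions are hypothetical, and the HTML branch is a deliberately crude tag-strip plus entity decode, not what any particular aggregator does. The same string yields two different titles depending on which interpretation a consumer picks:

```python
import html
import re

# Hypothetical <title> content, as it might come out of an XML parser.
raw_title = "Tom &amp; Jerry <b>live</b>"

def as_plain_text(value: str) -> str:
    """Interpret the content as text/plain: every character is literal."""
    return value

def as_html(value: str) -> str:
    """Interpret the content as text/html (crudely): drop tags, decode entities."""
    without_tags = re.sub(r"<[^>]*>", "", value)
    return html.unescape(without_tags)

print(as_plain_text(raw_title))  # Tom &amp; Jerry <b>live</b>
print(as_html(raw_title))        # Tom & Jerry live
```

Without a spec saying which of the two applies, an importer for another platform has to guess.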

So, I finally looked at the code you use to implement it for the first  
time in years: it isn't even XML! It's an XML-like text format! '<title
foo:bar="heh>" xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My  
amazing title</title>' should give 'My amazing title' as the content  
if parsed with an XML parser, yet your non-XML parser gives '"  
xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title'.  
You just strip out <![CDATA[ and ]]> from the data, giving them  
absolutely no meaning. <category><![CDATA[&gt;foo>]]></category> and  
<category>&gt;foo></category> are two totally different categories  
(the former is '&gt;foo>', the latter is '>foo>').
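A minimal Python sketch of the difference, for anyone who wants to reproduce it. `naive_extract` is a hypothetical regex-based extractor written to mimic the behaviour described above; it is an assumption, not WordPress's actual importer code. `xml_extract` uses the standard library's `xml.etree.ElementTree`:

```python
import re
import xml.etree.ElementTree as ET

TITLE = ('<title foo:bar="heh>" '
         'xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">'
         'My amazing title</title>')
CDATA_CAT = "<category><![CDATA[&gt;foo>]]></category>"
ENTITY_CAT = "<category>&gt;foo></category>"

def naive_extract(tag: str, doc: str) -> str:
    """Hypothetical non-XML extractor (NOT WordPress's code): skip to the
    first '>' after the tag name, capture up to the closing tag, then
    delete the CDATA markers so they carry no meaning."""
    m = re.search(rf"<{tag}.*?>(.*?)</{tag}>", doc, re.S)
    content = m.group(1) if m else ""
    return content.replace("<![CDATA[", "").replace("]]>", "")

def xml_extract(doc: str) -> str:
    """Real XML parse: a '>' inside an attribute value is just data,
    CDATA content is literal, and entities are decoded."""
    return ET.fromstring(doc).text or ""
```

With these definitions, `xml_extract(TITLE)` returns `My amazing title` while `naive_extract("title", TITLE)` returns the string starting with `" xmlns:foo=`. For the categories, the XML parser returns `&gt;foo>` and `>foo>` respectively, whereas the naive extractor returns `&gt;foo>` for both, so the distinction is lost.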

It takes more effort than you seem to think, as I can't use a
pre-existing XML toolchain to manipulate it. When (seeing as the
intention is to spec it, hopefully not "if") you define it, I hope it
isn't defined as an XML format. It plainly isn't. Either define it to
be what it is (a vaguely XML-like format), or completely rewrite WP's
currently normative implementation to create a backwards-incompatible
new version that is XML. It differs from XML in a great many ways.

On 18 Jul 2008, at 09:48, Xavier Borderie wrote:

> Yet Geoffrey is right too. The good thing with using XML is that it
> helps portability, yet without a proper DTD, WXR is as useful to other
> projects as a straight file-dump would be.

A DTD does next to nothing to actually document a format: it merely
defines which elements are allowed in the document and which attributes
are allowed on which elements. It does nothing to define what the
content of an element actually is. DTDs don't even support namespaces!
Regardless, there is no point in having a DTD for WXR, as it isn't XML.
DOCTYPEs are all but dead anyway, for reasons that are irrelevant here.
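A minimal Python sketch of the namespace point. The namespace URI `urn:example:wxr` and both prefixes are made up for illustration (they are not WXR's real namespace), and the hypothetical `literal_name` only stands in for the name a DTD would compare against; it is not a validator. A namespace-aware parser sees both elements as the same name, whereas a DTD declaration such as `<!ELEMENT wp:comment_author (#PCDATA)>` matches only the literal string `wp:comment_author`, so the second document would be reported as using an undeclared element:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical documents: same namespace URI, different prefixes.
DOC_A = '<wp:comment_author xmlns:wp="urn:example:wxr">Alice</wp:comment_author>'
DOC_B = '<x:comment_author xmlns:x="urn:example:wxr">Alice</x:comment_author>'

def expanded_name(doc: str) -> str:
    """Namespace-aware view: ElementTree reports the tag as '{uri}localname'."""
    return ET.fromstring(doc).tag

def literal_name(doc: str) -> str:
    """Prefix-included qualified name of the root element, i.e. the string
    a DTD <!ELEMENT ...> declaration is matched against."""
    match = re.match(r"<([^\s/>]+)", doc)
    return match.group(1) if match else ""

# The same element to a namespace-aware consumer...
assert expanded_name(DOC_A) == expanded_name(DOC_B) == "{urn:example:wxr}comment_author"
# ...but two different names as far as a DTD is concerned.
assert literal_name(DOC_A) != literal_name(DOC_B)
```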

Geoffrey Sneddon

[1]: http://ma.tt/2007/08/movabletype-4-vs-wordpress-22/
