[wp-hackers] WordPress eXtended RSS DTD
foolistbar at googlemail.com
Mon Jul 21 18:06:15 GMT 2008
On 17 Jul 2008, at 15:27, Otto wrote:
> On Thu, Jul 17, 2008 at 9:13 AM, Geoffrey Sneddon
> <foolistbar at googlemail.com> wrote:
>> If the idea of WXR is to
>> allow easy movement between different blogging packages, then it has
>> miserably failed
> That was never the idea behind WXR, as far as I know. The idea was to
> make it easy to move between different WordPress installations and
> WordPress.com. Other blogging platforms were not considered.
> WXR is basically just RSS with some additional fields added to it for
> things like comments and such.
> Basically, it is simple enough that I think nobody has bothered to
> document it yet. Your request for high levels of documentation seems,
> to me, to be more than a little silly. Just *looking* at the damn
> thing would show you that it doesn't need an extreme amount of effort
> to produce.
> If you want it documented, then look at it and write a document for
Ah, then I must've been misled, in part at least by Matt's comment,
"In fact it’d be nice if they [MT4] could export to WXR as well as
it’s pretty semantically rich". He also said, in a comment,
"However I still do plan to get a spec doc up for it one day." — this
was almost a year ago.
The problem with "just looking at the damn thing" is that I don't know
whether the title element is text/plain or text/html (this is
ambiguous in RSS 2.0: different aggregators implement different things,
some using complex algorithms to determine which it is), and I don't
know whether the comment author's name is text/plain or text/html.
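To make the ambiguity concrete, here is a minimal sketch in Python. The
sample title is made up for illustration, and html.unescape only stands
in for "treat the value as HTML" (a real consumer would also parse
tags). The same parsed string yields two different titles depending on
which content type the reader assumes:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample element; the title text is invented for illustration.
source = "<title>Using &amp;lt;b&amp;gt; in posts</title>"

# What any conforming XML parser hands back as the element's text.
raw = ET.fromstring(source).text

# Reading 1: the value is text/plain, so it is shown as-is.
as_plain = raw

# Reading 2: the value is text/html, so entities are decoded once more.
as_html = html.unescape(raw)

print(as_plain)  # Using &lt;b&gt; in posts
print(as_html)   # Using <b> in posts
```

Without a spec saying which reading is intended, both consumers are
equally "right", and they disagree about what the title is.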
So, I finally looked at the code you use to implement it for the first
time in years: it isn't even XML! It's an XML-like text format! '<title
amazing title</title>' should give 'My amazing title' as the content
if parsed with an XML parser, yet your non-XML parser gives '"
xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title'.
You just strip out <![CDATA[ and ]]> from the data, giving them
absolutely no meaning. <category><![CDATA[&lt;foo&gt;]]></category> and
<category>&lt;foo&gt;</category> are two totally different categories
(the former is '&lt;foo&gt;', the latter is '<foo>').
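A minimal sketch of the CDATA point in Python. The naive_text function
is a hypothetical stand-in for a string-based reader that deletes the
CDATA markers without interpreting them; it is not WordPress's actual
code. The sample input assumes an entity-escaped category value:

```python
import xml.etree.ElementTree as ET

def xml_text(fragment):
    """Text content as a conforming XML parser reports it."""
    return ET.fromstring(fragment).text

def naive_text(fragment):
    """Hypothetical string-based reader: cut out the element body and
    delete the CDATA markers without giving them any meaning."""
    start = fragment.index(">") + 1
    end = fragment.rindex("</")
    body = fragment[start:end]
    return body.replace("<![CDATA[", "").replace("]]>", "")

cdata = "<category><![CDATA[&lt;foo&gt;]]></category>"
escaped = "<category>&lt;foo&gt;</category>"

# An XML parser sees two different values.
print(xml_text(cdata))    # &lt;foo&gt;
print(xml_text(escaped))  # <foo>

# The naive reader cannot tell them apart.
print(naive_text(cdata) == naive_text(escaped))  # True
```

Inside a CDATA section the ampersand is literal text, so the two
elements carry different strings under XML's rules; a reader that
simply drops the markers collapses them into one.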
It takes more effort than you seem to think, as I can't use a
pre-existing XML tool-chain to manipulate it. When (seeing as the
intention is to spec it, hopefully not "if") you define it, I hope it
isn't defined as an XML format. It plainly isn't. Either define it to be
what it is (a vaguely XML-like format), or completely rewrite WP's
currently normative implementation to create a backwards-incompatible
new version that is XML. There are a great many differences from XML.
On 18 Jul 2008, at 09:48, Xavier Borderie wrote:
> Yet Geoffrey is right too. The good thing with using XML is that it
> helps portability, yet without a proper DTD, WXR is as useful to other
> projects as a straight file-dump would be.
A DTD does next to nothing to actually document a format: it merely
defines what elements are allowed in the document, and what attributes
are allowed on what elements. It does nothing to define what the
content of an element actually is. DTDs don't even support
namespaces! Regardless, there is no point in having a DTD for WXR as
it isn't XML. DOCTYPEs are all but dead anyway, for reasons that are
entirely irrelevant.