[wp-trac] [WordPress Trac] #12137: Wordpress import module does not correctly parse XML

WordPress Trac wp-trac at lists.automattic.com
Fri Feb 5 10:56:37 UTC 2010


#12137: Wordpress import module does not correctly parse XML
--------------------------+-------------------------------------------------
 Reporter:  greggman      |       Owner:            
     Type:  defect (bug)  |      Status:  new       
 Priority:  normal        |   Milestone:  Unassigned
Component:  Import        |     Version:  2.9.1     
 Severity:  normal        |    Keywords:            
--------------------------+-------------------------------------------------
 I'm not sure if I can say this well. Basically, the WordPress import module
 claims to read a modified form of RSS, which is based on XML. But the
 import module is not actually parsing XML; it is just matching text against
 hardcoded rules. This means you can give it a perfectly valid XML file and
 the import will fail.

 For example, in XML the following two lines represent exactly the same data:

 <content:encoded>hello world</content:encoded>
 <content:encoded><![CDATA[hello world]]></content:encoded>

 Yet the WordPress importer is hardcoded to require the second form.
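
 As a rough illustration of that equivalence, here is a minimal sketch using
 Python's standard xml.etree.ElementTree as a stand-in for any conforming XML
 parser. It is not the importer's code, and the PHP libraries would be called
 differently. The encoded_text helper and the <item> wrapper are hypothetical;
 the wrapper exists only to bind the content: prefix so the fragment is
 well-formed.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS content module that the content: prefix refers to.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def encoded_text(fragment):
    """Return the text of <content:encoded> found in the given fragment.

    The fragment is wrapped in a hypothetical <item> element that declares
    the content: prefix, because an unbound prefix is not well-formed XML.
    """
    doc = '<item xmlns:content="%s">%s</item>' % (CONTENT_NS, fragment)
    root = ET.fromstring(doc)
    return root.find("{%s}encoded" % CONTENT_NS).text


plain = encoded_text("<content:encoded>hello world</content:encoded>")
cdata = encoded_text("<content:encoded><![CDATA[hello world]]></content:encoded>")

# A real XML parser hands back the same character data for both forms.
print(plain == cdata)  # prints True
```

 The parser resolves the CDATA section before the caller ever sees the text,
 so code built on it cannot tell the two forms apart.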

 Another example: the following two snippets represent exactly the same data
 in XML (they differ only in whitespace between elements)

 --example 1--
 <wp:category><wp:cat_name>news</wp:cat_name></wp:category>
 --example 2--
 <wp:category>
 <wp:cat_name>news</wp:cat_name>
 </wp:category>

 Yet the WordPress importer is hardcoded to accept only the first form.

 There are many other examples.

 The suggestion is to use the built-in PHP XML libraries to read the files
 and then get the data from the parsed result. They will correctly parse XML
 data regardless of whitespace, entity or CDATA differences.
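
 To make the suggestion concrete, here is a minimal sketch of the parser-based
 approach. It is written in Python with the standard xml.etree.ElementTree
 module purely for illustration; an actual fix would use PHP's XML libraries
 instead. The SAMPLE document, the read_export function, and the wp:
 namespace URI shown are assumptions made for this example, not taken from
 the importer.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups. The wp: URI is an assumption about
# the export format; the content: URI is the RSS content module.
NS = {
    "wp": "http://wordpress.org/export/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical export file mixing the forms described in the ticket:
# compact vs. multi-line categories, and entity vs. CDATA content.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <wp:category><wp:cat_name>news</wp:cat_name></wp:category>
    <wp:category>
      <wp:cat_name>sports</wp:cat_name>
    </wp:category>
    <item>
      <content:encoded>fish &amp; chips</content:encoded>
    </item>
    <item>
      <content:encoded><![CDATA[fish & chips]]></content:encoded>
    </item>
  </channel>
</rss>
"""


def read_export(xml_text):
    """Return (category names, item bodies) from an export document."""
    root = ET.fromstring(xml_text)
    # Element lookups ignore the whitespace between tags entirely.
    categories = [
        (el.text or "").strip()
        for el in root.findall("./channel/wp:category/wp:cat_name", NS)
    ]
    # Entities and CDATA sections are already resolved to plain text.
    bodies = [
        el.text or ""
        for el in root.findall("./channel/item/content:encoded", NS)
    ]
    return categories, bodies


categories, bodies = read_export(SAMPLE)
print(categories)
print(bodies)
```

 Both categories are found whether or not they span several lines, and both
 item bodies come back as the same string, because the layout and escaping
 differences are handled by the parser rather than by hand-written rules.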

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/12137>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

