[wp-trac] [WordPress Trac] #15197: WXR export/import umbrella ticket

Sun Oct 24 11:00:19 UTC 2010

#15197: WXR export/import umbrella ticket
--------------------------+-------------------------------------------------
 Reporter:  duck_         |       Owner:                       
     Type:  defect (bug)  |      Status:  new                  
 Priority:  normal        |   Milestone:  Awaiting Review      
Component:  Export        |     Version:                       
 Severity:  normal        |    Keywords:  has-patch ux-feedback
--------------------------+-------------------------------------------------

Old description:

> Umbrella ticket for a number of upgrades to the WXR export/import
> process.
>
> == Export ==
>  * Bump WXR version to 1.1
>  * Removed filtering ''for now'' (see explanation below)
>  * Removed `wxr_missing_parents` (local function), seems to be a remnant
> from pre-`get_categories`
>  * Added author information to export (for better import UX) - #11118
>  * Greater usage of slug-like identifiers, e.g. login instead of name in
> <dc:creator>
>  * Don't export auto-drafts
>  * Filled in docs
>  * Ignore _edit_lock and _edit_last meta keys
>  * Only use the 'forward compatible' term tags, `<category domain="foo"
> nicename="bar">`, within post items
>
> == Import ==
>  * Use an XML parser (where available). 3 parser options:
> [http://www.php.net/manual/en/book.simplexml.php SimpleXML] (yay!),
> [http://www.php.net/manual/en/book.xml.php XML Parser] (yay!), or regular
> expressions (boo!)
>  * Proper import support for nav menus - #14750
>    * Menu items for missing content will be skipped, there ''should'' be
> no problems when an associated object is further down the import file
> than the menu item
>    * Orphaned menu items (e.g. their parent was skipped due to above
> point) will become top-level
>  * Greater usage of slug-like identifiers, e.g. Use `<category
> domain="..." nicename="...">` tags to fix a bunch of category issues
>  * Either import author as is (i.e. from information stored in WXR file,
> this allows us to create a user with more data by default) or map to an
> existing user - #10319
>  * Less direct feedback (ignoring errors, currently none :( !), as it is
> unwieldy for a large import.
>
> All accompanied by a number of smaller changes and anything I forgot to
> write down.
>
> == Further work ==
>
> === Backwards Compatibility ===
> The main problem for now is ensuring backwards compatibility with WXR 1.0
> files. That said, no major faults ''should'' occur when importing a 1.0
> file. Excluding all the problems you will come across already in an
> export/import in 3.0.1:
>  * No author import (the current importer takes author data from each
> post)[[BR]]'''Possible solution:''' if we get an empty author array then
> loop through posts grabbing unique authors and offering to map them (but
> not to import)
>  * I think (not tested properly yet) that all term menu items will be
> skipped due to missing term_id XML tags so no way of mapping old ID to
> new[[BR]]'''Possible solution:''' off the top of my head, maybe slugs
> instead of IDs for processed_terms mapping (?)
>  * Probably some indexes and vars which need to be checked with isset and
> fallback provided (for when the XML tag doesn't exist in 1.0 files)
>  * ... and possibly more with further testing
>
> How far should this go back?
> Example: 3 years ago [6375] introduced forwards compatible category tags
> including the slug and taxonomy. These are the only category tags the
> parsers currently read, is it worth checking the really old style XML
> tags if no terms are found for a post (should be easy for SimpleXML and
> regular expressions, but I think will be harder for XML Parser)?
>
> === The problem of filtering ===
>  * Potential to export a pretty useless file, e.g. choose Category:
> Uncategorized and Content Type: Pages
>  * Makes reliable importing of nav menus harder (worse UX when importer
> is creating half made menus)
>
> Moving forward I am currently imagining some sort of grid of post types
> selectable by checkbox. Each post type lists its taxonomies below, these
> are
> only activated/recognised if the post type is selected. But what filters
> to include and how to show them are probably for another ticket.
>
> === Other ===
>
> The feedback from the importer needs to be completed (see above), I was
> thinking of listing errors (default hidden with JS show?) and a table of
> results showing the number of successes and failures for each of authors,
> posts, terms, ...
>
> The `can_export` property of a post type only enables it to show up in
> the Content Types dropdown for export filtering, but if "All Content" is
> selected then all post types are exported including those with can_export
> set to false. Fix based on export patch here could be something like:
> {{{
> $post_types = get_post_types( array( 'public' => true, 'can_export' =>
> true ) );
> $where = "post_type IN ('" . implode( "','", $post_types ) . "') AND
> post_status != 'auto-draft'";
> // grab a snapshot of post IDs, just in case it changes during the export
> $post_ids = $wpdb->get_col( "SELECT ID FROM $wpdb->posts WHERE $where
> ORDER BY post_date_gmt ASC" );
> }}}
> (NB: would need to look into exactly which builtin posts are and should
> be can_export => false)
>
> Docs in the importer.
>
> Currently I have unit tests for the parsers and hopefully coming soon
> will be more for the whole process (need to think up a full checklist of
> tests for edge and problem cases)
>
> ----
>
> This is still partly a work in progress so feedback and a lot of testing
> please. Thank you.
>
> This ticket aims to fix the following:
> #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574
> #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750
> #15055 #15091 #15108

New description:

 Umbrella ticket for a number of upgrades to the WXR export/import process.

 == Export ==
  * Bump WXR version to 1.1
  * Removed filtering ''for now'' (see explanation below)
  * Removed `wxr_missing_parents` (local function); it seems to be a
 remnant from before `get_categories`
  * Added author information to export (for better import UX) - #11118
  * Greater usage of slug-like identifiers, e.g. login instead of name in
 `<dc:creator>`
  * Don't export auto-drafts
  * Filled in docs
  * Ignore _edit_lock and _edit_last meta keys
  * Only use the 'forward compatible' term tags, `<category domain="foo"
 nicename="bar">`, within post items
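
 As a rough illustration of the 'forward compatible' term tags above, the
 sketch below reads `domain` and `nicename` from `<category>` elements with
 SimpleXML. The sample item and the helper name `wxr_item_terms` are made
 up for this example and are not taken from the patch; it also assumes the
 SimpleXML extension is loaded.

```php
<?php
// Assumed sample item: the layout is illustrative, not copied from a real export.
$sample_item = <<<XML
<item>
  <title>Hello</title>
  <category domain="category" nicename="news"><![CDATA[News]]></category>
  <category domain="post_tag" nicename="wxr"><![CDATA[WXR]]></category>
</item>
XML;

// Hypothetical helper: collect taxonomy, slug and name for each term tag.
function wxr_item_terms( $item_xml ) {
	$item = simplexml_load_string( $item_xml );
	if ( false === $item ) {
		return array(); // not well-formed XML
	}

	$terms = array();
	foreach ( $item->category as $cat ) {
		// Only read tags that carry both attributes.
		if ( ! isset( $cat['domain'], $cat['nicename'] ) ) {
			continue;
		}
		$terms[] = array(
			'domain' => (string) $cat['domain'],
			'slug'   => (string) $cat['nicename'],
			'name'   => (string) $cat,
		);
	}
	return $terms;
}

$terms = wxr_item_terms( $sample_item );
```

 Keying on the slug (`nicename`) rather than the display name is what lets
 terms with identical names in different taxonomies stay distinct.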

 == Import ==
  * Use an XML parser (where available). 3 parser options:
 [http://www.php.net/manual/en/book.simplexml.php SimpleXML] (yay!),
 [http://www.php.net/manual/en/book.xml.php XML Parser] (yay!), or regular
 expressions (boo!)
  * Proper import support for nav menus - #14750
    * Menu items for missing content will be skipped; there ''should'' be
 no problems when an associated object is further down the import file than
 the menu item
    * Orphaned menu items (e.g. their parent was skipped due to above
 point) will become top-level
  * Greater usage of slug-like identifiers, e.g. use `<category
 domain="..." nicename="...">` tags to fix a bunch of category issues
  * Either import the author as is (i.e. from the information stored in the
 WXR file, which allows us to create a user with more data by default) or
 map to an existing user - #10319
  * Less direct feedback (ignoring errors, currently none :( !), as it is
 unwieldy for a large import.
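
 The nav menu rules above could be sketched roughly as follows. This is not
 the importer's code: the function name, the array keys (`id`, `parent`,
 `object_id`) and the idea of resolving menu items only after all other
 content has been processed are assumptions made for illustration.

```php
<?php
// Hypothetical helper: decide which menu items survive an import.
// $menu_items: list of arrays with 'id', 'parent' (0 = top-level), 'object_id'.
// $imported_object_ids: IDs of the content that was actually imported.
function wxr_resolve_menu_items( array $menu_items, array $imported_object_ids ) {
	$kept = array();

	// Pass 1: skip menu items whose associated object is missing.
	foreach ( $menu_items as $item ) {
		if ( in_array( $item['object_id'], $imported_object_ids, true ) ) {
			$kept[ $item['id'] ] = $item;
		}
	}

	// Pass 2: orphans (parent was skipped) become top-level.
	foreach ( $kept as $id => $item ) {
		if ( 0 !== $item['parent'] && ! isset( $kept[ $item['parent'] ] ) ) {
			$kept[ $id ]['parent'] = 0;
		}
	}

	return $kept;
}

$resolved = wxr_resolve_menu_items(
	array(
		array( 'id' => 1, 'parent' => 0, 'object_id' => 10 ),
		array( 'id' => 2, 'parent' => 1, 'object_id' => 99 ), // object missing
		array( 'id' => 3, 'parent' => 2, 'object_id' => 11 ), // parent skipped
	),
	array( 10, 11 )
);
```

 Running the menu pass after everything else is one way to avoid caring
 whether an associated object appears before or after the menu item in the
 file.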

 All accompanied by a number of smaller changes and anything I forgot to
 write down.

 == Further work ==

 === Backwards Compatibility ===
 The main problem for now is ensuring backwards compatibility with WXR 1.0
 files. That said, no major faults ''should'' occur when importing a 1.0
 file. Leaving aside the problems you will already come across in an
 export/import in 3.0.1, the known issues are:
  * No author import (the current importer takes author data from each
 post)[[BR]]'''SOLVED:''' if we get an empty author array then loop through
 posts grabbing unique authors and offering to map them (but not to import)
  * ~~All term menu items will be skipped due to missing term_id XML tags
 '''Possible solution:''' slugs instead of IDs for processed_terms
 mapping?~~ In fact, as far as I can see, filling imported menus is
 actually impossible with WXR 1.0: the file doesn't contain custom terms
 for post items (see #13453 and #14306), so we don't know which menu to
 assign the menu items to
  * Probably some indexes and vars which need to be checked with isset and
 fallback provided (for when the XML tag doesn't exist in 1.0 files)
  * ... and possibly more with further testing
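
 A minimal sketch of the author fallback and the `isset` checks described
 above, assuming parsed data arrives as plain arrays. The function name and
 the keys `post_author` and `author_login` are hypothetical and chosen only
 for this example.

```php
<?php
// Hypothetical helper: build the list of authors to offer for mapping.
// With a WXR 1.0 file the parsed author array is empty, so fall back to the
// unique author logins found on the posts themselves.
function wxr_authors_for_mapping( array $authors, array $posts ) {
	if ( ! empty( $authors ) ) {
		return $authors;
	}

	$found = array();
	foreach ( $posts as $post ) {
		// The tag may not exist in older files, so check before reading it.
		$login = isset( $post['post_author'] ) ? trim( $post['post_author'] ) : '';
		if ( '' === $login ) {
			continue;
		}
		// Keying by login keeps each author only once.
		$found[ $login ] = array( 'author_login' => $login );
	}
	return $found;
}

$fallback_authors = wxr_authors_for_mapping(
	array(),
	array(
		array( 'post_author' => 'alice' ),
		array( 'post_author' => 'bob' ),
		array( 'post_author' => 'alice' ),
		array( 'post_title' => 'No author tag' ),
	)
);
```

 Authors recovered this way carry only a login, which is why they can be
 offered for mapping but not for a full import.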

 How far should this go back?
 Example: 3 years ago [6375] introduced forward-compatible category tags
 including the slug and taxonomy. These are the only category tags the
 parsers currently read. Is it worth checking the really old-style XML tags
 if no terms are found for a post? (This should be easy for SimpleXML and
 regular expressions, but I think it will be harder for XML Parser.)

 === The problem of filtering ===
  * Potential to export a pretty useless file, e.g. choose Category:
 Uncategorized and Content Type: Pages
  * Makes reliable importing of nav menus harder (worse UX when importer is
 creating half made menus)

 ~~Moving forward I am currently imagining some sort of grid of post types
 selectable by checkbox. Each post type lists its taxonomies below, these
 are only activated/recognised if the post type is selected. But what
 filters to include and how to show them are probably for another ticket.~~

 See [comment:ticket:15197:3 nacin's comment] and
 [attachment:ticket:15197:15197.filtering.png mockup] for the current plan
 for export filtering.

 === Other ===

 The feedback from the importer needs to be completed (see above). I was
 thinking of listing errors (hidden by default, with a JS toggle to show
 them?) and a table of results showing the number of successes and
 failures for each of authors, posts, terms, ...
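
 One possible shape for the results table, purely as a sketch: each import
 step records an object type and whether it succeeded, and the counts are
 tallied at the end. The function name and the event format are assumptions
 made for illustration, not part of the patch.

```php
<?php
// Hypothetical helper: count successes and failures per object type.
// $events: list of array( $type, $succeeded ) pairs recorded during import.
function wxr_tally_results( array $events ) {
	$table = array();
	foreach ( $events as $event ) {
		list( $type, $succeeded ) = $event;
		if ( ! isset( $table[ $type ] ) ) {
			$table[ $type ] = array( 'success' => 0, 'failure' => 0 );
		}
		$table[ $type ][ $succeeded ? 'success' : 'failure' ]++;
	}
	return $table;
}

$results = wxr_tally_results(
	array(
		array( 'authors', true ),
		array( 'posts', true ),
		array( 'posts', false ),
		array( 'terms', true ),
	)
);
```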

 The `can_export` property of a post type only controls whether it shows
 up in the Content Types dropdown for export filtering; if "All Content" is
 selected then all post types are exported, including those with
 `can_export` set to false. A fix based on the export patch here could be
 something like:
 {{{
 // only post types that are both public and flagged as exportable
 $post_types = get_post_types( array(
     'public'     => true,
     'can_export' => true,
 ) );
 $where = "post_type IN ('" . implode( "','", $post_types ) . "')"
     . " AND post_status != 'auto-draft'";
 // grab a snapshot of post IDs, just in case it changes during the export
 $post_ids = $wpdb->get_col(
     "SELECT ID FROM $wpdb->posts WHERE $where ORDER BY post_date_gmt ASC"
 );
 }}}
 (NB: would need to look into exactly which built-in post types are, and
 which should be, `can_export => false`)

 The importer still needs docs.

 Currently I have unit tests for the parsers, and hopefully more will be
 coming soon for the whole process (need to think up a full checklist of
 tests for edge and problem cases).

 ----

 This is still partly a work in progress so feedback and a lot of testing
 please. Thank you.

 This ticket aims to fix the following:
 #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574
 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750
 #15055 #15091 #15108

--

Comment(by duck_):

 New import patch coming soon with WXR 1.0 author fix and a few other
 things.

 The current todo:
  * Give better feedback to the user at the end of import
  * Re-implement export filtering
  * Double check for undefined variables/indexes on WXR 1.0 import

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/15197#comment:4>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software