[wp-hackers] GSoC proposal for blog import/export using XML-RPC

Shannon Quinn squinn at gatech.edu
Thu Apr 1 01:01:16 UTC 2010


Hi all,

My name is Shannon :) I'm a long-time lurker, first-time poster, so 
please ask for clarification on anything that might be confusing. And I 
apologize in advance for the length.

I'm finishing up my master's degree in computational biology (graduated 
with a bachelor's in computer science in 2008), and I'm hoping to put my 
PHP skills to work for WordPress by augmenting the current WordPress
import/export functionality with over-the-wire XML-RPC options, built as
plugins so that any other type of blog one might want to import from or
export to can be supported. Here are some of the details.

* Move the currently supported blog platforms out of the core importer
and make each one a plugin to a core import interface (the WordPress
importer itself would obviously be installed by default, and could even
ship with the core install). Many of the importers already use remote
procedure calls to pull content from those blogs, so that code would
move from its current location into a plugin whose calls mesh with the
core's API.

* Add an option to the current export functionality to send content
over-the-wire via XML-RPC. As with importing, the destinations available
to the user would depend on which plugins they have installed. The user
could choose to export everything, only published posts, pages, or
drafts, or even pick individual items to export.

* In both cases, the process would need to cope robustly with large
amounts of content as well as with unplanned failures and user actions.
The WordPress core already handles this well by automatically resuming
an import if the user navigates away or the connection drops, and that
behavior would carry over. I'd also like to investigate batching these
processes, or even handing them off to a cron job. Finally, I'd like to
look at the XML-RPC implementation itself to see whether any
optimizations are possible, since previous work I've done has shown that
exporting a few thousand entries at once, even just the titles and
bodies, can take quite a while. If possible, I'd love to work with
Joseph Scott on that investigation, along with anyone else on the
WordPress team who has more experience with XML-RPC than I do (which is
likely everyone).

* An interesting point a few previous posters raised was the transfer of
binary data such as images and videos. According to the XML-RPC
specification and the implementation WordPress uses, this is certainly
possible, but binary data has to be base64-encoded inside the XML body.
That adds overhead that could make transfer times balloon as the number
and size of items grows. At the very least I'd like to make this
user-configurable, perhaps with a screen that lists the entries
containing locally stored media along with their sizes, so the user can
pick which are sent and which are kept. Again, batch processing and/or
cron jobs could make this transfer as invisible to the user as possible.
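A rough sketch of the pluggable import interface from the first point,
written in Python for brevity rather than the PHP WordPress actually
uses. Every name here (Importer, register_importer, run_import,
FakeBloggerImporter) is hypothetical and not an existing WordPress API.
The only idea it shows is that the core owns the import loop and each
platform plugin supplies a way to fetch posts.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class Importer(ABC):
    """Hypothetical interface each platform plugin would implement."""

    # Short identifier shown on the import screen (assumed field).
    slug: str = ""

    @abstractmethod
    def fetch_posts(self) -> Iterator[dict]:
        """Yield posts from the remote blog, one dict per post."""


# Hypothetical core registry: plugins add themselves here when loaded.
IMPORTERS: Dict[str, Importer] = {}


def register_importer(importer: Importer) -> None:
    if not importer.slug:
        raise ValueError("importer needs a slug")
    IMPORTERS[importer.slug] = importer


def run_import(slug: str) -> List[dict]:
    """Core-side loop: look up the plugin and collect what it yields."""
    importer = IMPORTERS[slug]
    # A real importer would insert each post into the database here.
    return list(importer.fetch_posts())


class FakeBloggerImporter(Importer):
    """Stand-in plugin; a real one would make remote calls instead."""

    slug = "blogger"

    def fetch_posts(self) -> Iterator[dict]:
        yield {"title": "Hello", "status": "publish"}
        yield {"title": "Notes", "status": "draft"}


register_importer(FakeBloggerImporter())
print([p["title"] for p in run_import("blogger")])  # ['Hello', 'Notes']
```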
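The export options and the batching/resume ideas above could fit
together roughly as follows. This sketch rests on several assumptions.
Posts are plain dicts with id, type and status keys. The send_batch
callable stands in for whatever remote call a destination plugin would
make, perhaps a system.multicall wrapping several metaWeblog.newPost
calls. The checkpoint is an in-memory dict; a real version would persist
it somewhere durable so a later request or cron run could resume. All
function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def select_posts(posts: Iterable[dict],
                 types: Optional[Set[str]] = None,
                 statuses: Optional[Set[str]] = None,
                 ids: Optional[Set[int]] = None) -> List[dict]:
    """Apply the user's export choices; None means 'no restriction'."""
    chosen = []
    for post in posts:
        if types is not None and post["type"] not in types:
            continue
        if statuses is not None and post["status"] not in statuses:
            continue
        if ids is not None and post["id"] not in ids:
            continue
        chosen.append(post)
    # Sort by id so "last id sent" is a valid resume point.
    return sorted(chosen, key=lambda p: p["id"])


def export_in_batches(posts: List[dict],
                      send_batch: Callable[[List[dict]], None],
                      checkpoint: Dict[str, int],
                      batch_size: int = 50) -> int:
    """Send posts in batches, recording progress after each success.

    If a run is interrupted, calling this again with the same checkpoint
    skips everything up to the last confirmed id. Returns the number of
    posts sent during this call.
    """
    last_done = checkpoint.get("last_id", 0)
    pending = [p for p in posts if p["id"] > last_done]
    sent = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        send_batch(batch)  # raises on failure, leaving the checkpoint as-is
        checkpoint["last_id"] = batch[-1]["id"]
        sent += len(batch)
    return sent


# Usage: export only published posts, with a connection that drops once.
posts = [{"id": i, "type": "post",
          "status": "publish" if i % 2 else "draft"} for i in range(1, 11)]
published = select_posts(posts, types={"post"}, statuses={"publish"})

received: List[int] = []
calls = {"n": 0}


def flaky_send(batch: List[dict]) -> None:
    calls["n"] += 1
    if calls["n"] == 2:
        raise ConnectionError("connection dropped")
    received.extend(p["id"] for p in batch)


checkpoint: Dict[str, int] = {}
try:
    export_in_batches(published, flaky_send, checkpoint, batch_size=2)
except ConnectionError:
    pass  # the user navigated away or the link died; progress is saved
export_in_batches(published, flaky_send, checkpoint, batch_size=2)
print(received)  # [1, 3, 5, 7, 9]
```

The second call picks up after id 3, so nothing is sent twice. Committing
the checkpoint only after a batch succeeds is what makes the resume safe.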
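To put a number on the binary-transfer overhead: XML-RPC has no raw
binary type, so media travels as base64 text inside the XML body. The
sketch below uses Python's standard xmlrpc.client module to serialize a
metaWeblog.newMediaObject request, the MetaWeblog API's upload call,
whose struct carries name, type and bits. It then compares the request
size with the raw file size, which is the kind of figure a "pick which
media to send" screen could show. The blog id, credentials and file
contents are placeholders, and nothing is sent over the network.

```python
import os
import xmlrpc.client


def media_request_size(name: str, mime: str, data: bytes) -> int:
    """Bytes in a serialized metaWeblog.newMediaObject request body."""
    params = (
        "1",       # placeholder blog id
        "user",    # placeholder username
        "secret",  # placeholder password
        {"name": name, "type": mime, "bits": xmlrpc.client.Binary(data)},
    )
    body = xmlrpc.client.dumps(params, methodname="metaWeblog.newMediaObject")
    return len(body.encode("utf-8"))


raw = os.urandom(300_000)  # stand-in for a ~300 KB image
encoded = media_request_size("photo.jpg", "image/jpeg", raw)
ratio = encoded / len(raw)
print(f"raw={len(raw)} request={encoded} ratio={ratio:.2f}")
```

The request comes out at about 1.35 times the file size. Base64 alone
accounts for a factor of 4/3, and Python's encoder also adds a newline
every 76 characters. The whole body is typically built and parsed in
memory on each side, which is one more reason to let the user choose
which media to send and to batch large files separately.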

I'll make a post about this soon on my WordPress blog 
(magsol.wordpress.com) as my official GSoC application is just about 
ready to go, but I wanted to go ahead and throw in my ideas for 
consideration. I'd welcome any feedback, as I would absolutely love the
opportunity to work on this project over the summer :)

Thank you!

Regards,
Shannon

