[wp-hackers] GSoC proposal for blog import/export using XML-RPC
squinn at gatech.edu
Thu Apr 1 01:01:16 UTC 2010
My name is Shannon :) I'm a long-time lurker, first-time poster, so
please ask for clarification on anything that might be confusing. And I
apologize in advance for the length.
I'm finishing up my master's degree in computational biology (graduated
with a bachelor's in computer science in 2008), and I'm hoping to put my
PHP skills to work for WordPress by augmenting the current WordPress
import/export functionality with over-the-wire XML-RPC options that are
pluggable for every other type of blog one would want to import from /
export to. Here are some of the details.
* Remove the currently supported blogs from the import and make each one
a plugin to the core import interface (though obviously the WordPress
importer would be installed by default, possibly even included in the
core install).
Many of the importers already use remote procedure calls to pull content
from those blogs, so that functionality would be moved from its current
location to a plugin whose calls would mesh with the core's API.
* Add an option to the current export functionality to send content
over-the-wire, via XML-RPC. This, like importing, would allow the user
to export content to blogs based on what plugins they have installed.
Options would be available to the user to decide whether they want to
export everything, or just published posts, or pages, or drafts, or even
pick individual items to export.
* In both cases, a robust ability to handle large amounts of information
as well as unplanned failures or user actions would need to be
implemented. Currently the WordPress core handles this well by resuming
import automatically if a user navigates away or the connection is
terminated, and this will be included. I'd also like to investigate the
possibility of batching these processes, or even sending them to a
cronjob. Furthermore, I'd like to investigate the XML-RPC implementation
itself to see if any optimizations can be made (if possible I'd love to
work with Joseph Scott on the XML-RPC investigations to find out if any
of that is plausible, as well as anyone else on the WordPress team who
has a greater depth of experience with XML-RPC than I do... which is
likely everyone), as previous work that I've done has shown that
exporting a few thousand entries at once - even just the titles and
bodies - can take quite a while.
* An interesting point a few previous posters made was about the
transfer of binary information, e.g. images and videos. According to the
XML-RPC spec and the implementation used by WordPress, this is certainly
possible over XML-RPC, but it would also incur an overhead that could
easily make transfer time grow faster than linearly with the number of
items being sent. At the very least I'd like to make this an option that
is user-configurable - perhaps something which highlights the entries
containing multimedia content stored internally and its size, allowing
the user to pick and choose which are sent and which are kept. Again,
this could also possibly be optimized using batch processing and/or cronjobs
to make this transfer as invisible to the user as possible.
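To make the batching and resume ideas above a bit more concrete, here is
a minimal sketch. It is written in Python purely for brevity (the real
work would be in PHP), and everything named in it is an assumption for
illustration: the `send` callable stands in for whatever XML-RPC call a
given blog plugin would make, `example.importBatch` is a made-up method
name, and the post structure is hypothetical. It only shows the shape of
the approach, not how WordPress does or would implement it.

```python
import base64
import xmlrpc.client
from typing import Callable, Iterator, List, Optional, Tuple

Post = dict  # hypothetical shape, e.g. {"title": "...", "description": "..."}


def batches(items: List[Post], size: int,
            start: int = 0) -> Iterator[Tuple[int, List[Post]]]:
    """Yield (next_offset, chunk) pairs, beginning at a saved offset."""
    for offset in range(start, len(items), size):
        chunk = items[offset:offset + size]
        yield offset + len(chunk), chunk


def export_in_batches(items: List[Post],
                      send: Callable[[List[Post]], None],
                      size: int = 50,
                      start: int = 0,
                      save_checkpoint: Optional[Callable[[int], None]] = None) -> int:
    """Send items batch by batch and return the offset of the first unsent item.

    If send() raises, the exception propagates, but the last checkpoint
    already records how far the export got, so a later run (or a cron job)
    can pass that offset back in as `start`.
    """
    done = start
    for next_offset, chunk in batches(items, size, start):
        send(chunk)  # hypothetical: one XML-RPC request per batch
        done = next_offset
        if save_checkpoint is not None:
            save_checkpoint(done)
    return done


def base64_size(raw_bytes: int) -> int:
    """Length of raw_bytes bytes of binary data after base64 encoding."""
    return 4 * ((raw_bytes + 2) // 3)


def payload_size(chunk: List[Post]) -> int:
    """Bytes in the XML-RPC request body for one batch.

    The method name is made up; dumps() only serializes, nothing is sent.
    """
    body = xmlrpc.client.dumps((chunk,), methodname="example.importBatch")
    return len(body.encode("utf-8"))
```

Binary attachments would be wrapped in `xmlrpc.client.Binary`, which is
serialized as base64, so the encoding alone adds roughly a third to each
file before any parsing or memory costs on either end. Something like
`payload_size` could drive the per-entry size display mentioned above.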
I'll make a post about this soon on my WordPress blog
(magsol.wordpress.com) as my official GSoC application is just about
ready to go, but I wanted to go ahead and throw in my ideas for
consideration. I'm happy for any sort of feedback as I would absolutely
love the opportunity to work on this project over the summer :)