[wp-hackers] Porting static content

Bill Dennen dennen at gmail.com
Wed Feb 23 01:29:35 UTC 2011

>> I've used http://sourceforge.net/projects/simplehtmldom/ a number of times.

I've had good luck with that, and phpQuery.

phpQuery is a server-side, chainable, CSS3 selector driven Document
Object Model (DOM) API based on the jQuery JavaScript library.

In our case, we first build a sitemap of the pages we want to
import. It's basically a hierarchy written as a multi-level unordered
list of links. This lets us maintain parent-child relationships between
pages when they are imported into WordPress. We loop over that
list of links, scrape each page, and use phpQuery to select
different parts of the page with jQuery-style selectors. We can also
add custom fields to the imported pages using add_post_meta.
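
To illustrate the sitemap step, here is a minimal sketch of walking a
nested unordered list and recording each link's parent. It is not the
code from our import: it is written in Python with only the standard
library as a stand-in for the PHP/phpQuery version, and the names
SitemapParser, parse_sitemap and SITEMAP, along with the sample URLs,
are made up for the example.

```python
from html.parser import HTMLParser


class SitemapParser(HTMLParser):
    """Collect url/title/parent rows from a nested <ul> of links."""

    def __init__(self):
        super().__init__()
        self.rows = []        # pages in document order, parents first
        self._li_stack = []   # URL of the link owned by each open <li>
        self._current = None  # row whose link text is being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_stack.append(None)
        elif tag == "a" and self._li_stack:
            url = dict(attrs).get("href")
            # Parent is the nearest enclosing <li> that already has a link.
            enclosing = reversed(self._li_stack[:-1])
            parent = next((u for u in enclosing if u), None)
            self._li_stack[-1] = url
            self._current = {"url": url, "title": "", "parent": parent}
            self.rows.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace once the whole link text has been read.
            title = " ".join(self._current["title"].split())
            self._current["title"] = title
            self._current = None
        elif tag == "li" and self._li_stack:
            self._li_stack.pop()


def parse_sitemap(html):
    """Return the pages of a nested-list sitemap, parents before children."""
    parser = SitemapParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sitemap; a real one would list the pages to import.
SITEMAP = """
<ul>
  <li><a href="/about/">About</a>
    <ul>
      <li><a href="/about/team/">Team</a></li>
      <li><a href="/about/history/">History</a></li>
    </ul>
  </li>
  <li><a href="/contact/">Contact</a></li>
</ul>
"""

rows = parse_sitemap(SITEMAP)
```

Because parents always come before their children in rows, an importer
can create pages in that order and keep a map from source URL to the new
post ID. On the WordPress side the parent would then be passed as
post_parent to wp_insert_post, with any extra fields attached afterwards
through add_post_meta.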

