[wp-hackers] Porting static content

Christopher Ross cross at thisismyurl.com
Tue Feb 22 22:55:22 UTC 2011


Scot, this sounds like a perfect use for the wp_insert_post() function:


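	// $posttitle, $content, $excerpt and $tags come from the parsed static
	// page; $authorID, $cat and $blogpoststatus are set earlier in the script.
	$post = array(
	  'comment_status' => 'closed',
	  'ping_status'    => 'closed',
	  'post_author'    => $authorID,
	  'post_category'  => $cat,            // array of category IDs
	  'post_content'   => $content,        // no manual SQL escaping needed
	  'post_excerpt'   => $excerpt,
	  'post_status'    => $blogpoststatus, // e.g. 'publish' or 'draft'
	  'post_title'     => $posttitle,
	  'post_type'      => 'post',
	  'tags_input'     => $tags            // comma-separated string or array
	);
	// wp_insert_post() sanitizes and unslashes its input itself, so slash the
	// array (wp_slash(), WP 3.6+) instead of calling mysql_real_escape_string().
	// Passing true as the second argument returns a WP_Error on failure.
	$wpid = wp_insert_post(wp_slash($post), true);
	if (is_wp_error($wpid)) {
	  error_log($wpid->get_error_message());
	}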


I did a government site a while back with similar restrictions. After downloading the content into a directory with an offline viewer, I ran regular expressions over it until I had 50,000 usable documents. Then I ran a PHP script that pulled in 100 pages at a time, posted them to the WordPress database, and moved the processed files.
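Roughly, the batch step looks like the sketch below. It is not my original script: the directory names, the batch size, renaming unparseable pages to *.failed, and the $extract/$insert callables are assumptions for illustration. Keeping the WordPress call behind a callable means the loop can be tried outside WordPress first.

```php
<?php
// Sketch only: imports up to $batchSize static pages per run and moves each
// successfully imported file into $doneDir, so a rerun continues where the
// previous batch stopped. Paths, batch size and callables are assumptions.

/**
 * @param callable $extract fn(string $html): ?array  post fields, or null if the page can't be parsed
 * @param callable $insert  fn(array $fields): int    new post ID, or 0 on failure
 * @return int[] IDs of the posts created in this run
 */
function import_batch(string $srcDir, string $doneDir, int $batchSize,
                      callable $extract, callable $insert): array
{
    $files = glob(rtrim($srcDir, '/') . '/*.html') ?: array();
    sort($files); // same order on every run

    $ids = array();
    foreach (array_slice($files, 0, $batchSize) as $file) {
        $fields = $extract(file_get_contents($file));
        if ($fields === null) {
            // Rename so later batches skip it; fix these pages by hand.
            rename($file, $file . '.failed');
            continue;
        }
        $id = $insert($fields);
        if ($id > 0) {
            $ids[] = $id;
            // Move only after a successful insert; failed inserts stay
            // in place and are retried on the next run.
            rename($file, rtrim($doneDir, '/') . '/' . basename($file));
        }
    }
    return $ids;
}

// Inside WordPress, $insert might wrap wp_insert_post() like this
// ($parsePage is a hypothetical parser for your page template):
//
// $ids = import_batch('/path/to/pages', '/path/to/done', 100, $parsePage,
//     function (array $f): int {
//         $id = wp_insert_post(wp_slash(array(
//             'post_title'   => $f['title'],
//             'post_content' => $f['content'],
//             'post_status'  => 'draft',
//             'post_type'    => 'post',
//         )));
//         return is_wp_error($id) ? 0 : (int) $id;
//     });
```

Importing as drafts first lets you spot-check a batch before publishing it.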


On 2011-02-22, at 6:42 PM, Scot Hacker wrote:

> I have a client (an Ethiopian in exile) who has created a very popular static site comprised of 6,000 (!) pages... all hand-created in Notepad (yes, the wheels turn differently in some parts of the world). Amazing, I know. 
> 
> I'm building a WordPress site for him, but the question is how to get all that old static content into the site. Fortunately he's based all the old articles on the same original file, so the document structure is highly regular. In Python/Django I'd write a BeautifulSoup script to crawl the directory, scrape content into objects, and pump it in through the Django API. I'm sure similar solutions exist for PHP/WordPress but don't know where to start. Has anyone done a project like this? Do you have a skeleton script to share, or pointers on the best way to proceed?
> 
> Thanks,
> Scot
> 
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
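Since you mentioned BeautifulSoup: PHP's bundled DOM extension (DOMDocument plus DOMXPath) can fill the same role, and it tends to cope better with inconsistent hand-written markup than regular expressions do. A sketch, assuming UTF-8 files and a template that puts the title in <title> and the article in a div with id "content"; those selectors and the function name extract_fields() are hypothetical, so adjust them to the real shared file.

```php
<?php
// Sketch: pull the title and body out of one hand-written page.
// The XPath selectors are placeholders for the real shared template.

function extract_fields(string $html): ?array
{
    $doc = new DOMDocument();
    // Notepad-era HTML is rarely valid; collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    // Assumes UTF-8 files; the XML declaration makes libxml decode them as UTF-8.
    $ok = $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    if (!$ok) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $title = trim($xpath->evaluate('string(//title)'));
    $nodes = $xpath->query('//div[@id="content"]');
    $body  = $nodes ? $nodes->item(0) : null;
    if ($title === '' || $body === null) {
        return null; // page does not follow the template
    }

    // Serialize the container's children back to HTML for post_content.
    $content = '';
    foreach ($body->childNodes as $child) {
        $content .= $doc->saveHTML($child);
    }
    return array('title' => $title, 'content' => trim($content));
}
```

Returning null for pages that don't match the template makes it easy to list the odd ones out and handle them by hand.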


__

Christopher Ross

http://christopherross.ca
http://www.thisismyurl.com


