[wp-hackers] Getting every page ID, URL, and Title on site

Mike Schinkel mikeschinkel at newclarity.net
Fri Apr 8 06:18:01 UTC 2011


On Apr 8, 2011, at 1:05 AM, Philip Walton wrote:
> Mike, your second approach still loads the content.

By "content", do you mean post_content and post_excerpt?  If so, you are correct; WordPress doesn't provide an easy hook for skipping post_content and post_excerpt when it loads a post.

That said, WordPress uses a permalink structure that, to be generic, must be able to see every field in a post. Yes, for a specific site you can optimize, but in general you cannot bypass the hooks that build the permalink if you want the permalinks to be 100% correct, and those hooks may choose to inspect the post object.  By omitting post_content or post_excerpt we could trigger an incorrect permalink, but my guess is that would be unlikely, so if it were me I'd be willing to risk it.

That's just one of the things that one must deal with in exchange for WordPress' flexibility with permalinks.
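To make the risk concrete, here is a standalone sketch (plain PHP, no WordPress loaded) of the kind of callback a plugin might attach to WordPress' 'post_link' filter, which receives the post object. The function name example_gallery_permalink() and its rule (insert /gallery when the content contains a [gallery shortcode) are hypothetical, and posts are plain arrays here rather than WordPress post objects.

```php
<?php
// Hypothetical permalink callback, written as a plain function so it runs
// without WordPress. In a plugin it would be hooked to 'post_link'.
function example_gallery_permalink(string $permalink, array $post): string
{
    // This rule reads post_content, so its result silently changes
    // when post_content was never loaded.
    $content = $post['post_content'] ?? '';
    if (strpos($content, '[gallery') !== false) {
        // Insert "/gallery" right after the scheme and host.
        return preg_replace('#^(https?://[^/]+)#', '$1/gallery', $permalink);
    }
    return $permalink;
}

$full = [
    'ID' => 7,
    'post_name' => 'holiday-photos',
    'post_content' => 'Some text [gallery ids="1,2,3"]',
];
// The same row with post_content and post_excerpt stripped, as the export script does.
$trimmed = array_diff_key($full, array_flip(['post_content', 'post_excerpt']));

$base = 'http://example.com/holiday-photos/';
echo example_gallery_permalink($base, $full), "\n";    // http://example.com/gallery/holiday-photos/
echo example_gallery_permalink($base, $trimmed), "\n"; // http://example.com/holiday-photos/
```

With the full row the callback produces one URL; with the trimmed row it quietly produces another, which is exactly the kind of incorrect permalink described above.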

At the bottom of this email is another version of the script that omits the post_content and post_excerpt columns.

> If you run an SQL log of your second method, you'll notice at least one query per post, which is incredibly wasteful, but unfortunately what most archiving/sitemap plugins do.

It's only wasteful if it takes too long, runs out of memory, or costs significantly more in computing power.  Otherwise, it's like the tree that falls in the forest with nobody around to hear it; does the noise of its falling really matter?

If you really need only one query, you could pre-populate the 'posts' cache with objects that lack post_content and post_excerpt, but then you are running another large query that may run out of memory if you have too many posts. When you are talking scale, things are rarely easy.
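A rough, standalone sketch of the cache-priming idea, with no WordPress involved: FakePostSource, primeCache() and getPost() are hypothetical names, the "database" is an in-memory array, and the query counter only simulates SQL round trips. As far as I know, get_post() in WordPress 3.x checks the 'posts' group of the object cache (wp_cache_get($id, 'posts')) before querying, so the real equivalent would be adding trimmed rows to that group; treat that mapping as an assumption.

```php
<?php
// Standalone sketch (no WordPress): a hypothetical stand-in for the posts
// table plus an in-memory 'posts' cache.
final class FakePostSource
{
    /** @var array<int, array<string, mixed>> "database" rows keyed by ID */
    private $rows;
    /** @var array<int, array<string, mixed>|null> cache keyed by ID */
    private $cache = [];
    /** @var int number of simulated SQL queries */
    public $queries = 0;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // One bulk query that loads only the requested columns for many IDs.
    public function primeCache(array $ids, array $columns): void
    {
        $this->queries++;
        $keep = array_flip($columns);
        foreach ($ids as $id) {
            if (isset($this->rows[$id])) {
                $this->cache[$id] = array_intersect_key($this->rows[$id], $keep);
            }
        }
    }

    // Mirrors the get_post() pattern: cache first, otherwise one query per post.
    public function getPost(int $id): ?array
    {
        if (!array_key_exists($id, $this->cache)) {
            $this->queries++;
            $this->cache[$id] = $this->rows[$id] ?? null;
        }
        return $this->cache[$id];
    }
}

// Ten fake posts, each with a large post_content.
$rows = [];
for ($id = 1; $id <= 10; $id++) {
    $rows[$id] = ['ID' => $id, 'post_type' => 'post', 'post_title' => "Post $id",
                  'post_content' => str_repeat('x', 10000)];
}

// Without priming: one query per post.
$cold = new FakePostSource($rows);
foreach (array_keys($rows) as $id) {
    $cold->getPost($id);
}

// With priming: one bulk query, then every lookup is a cache hit.
$warm = new FakePostSource($rows);
$warm->primeCache(array_keys($rows), ['ID', 'post_type', 'post_title']);
foreach (array_keys($rows) as $id) {
    $warm->getPost($id);
}

echo "cold: {$cold->queries} queries, warm: {$warm->queries} query\n"; // cold: 10 queries, warm: 1 query
```

The trade-off from the paragraph above shows up directly: the primed cache holds every trimmed row at once, so on a very large site the memory cost moves into that single bulk query.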

-Mike


<?php 
/*
	/tsv-export.php
*/

	include('wp-load.php');
	add_filter('query','tsv_export_modify_query');
	header('Content-type: text/tab-separated-values');
	global $wpdb;
	$posts = $wpdb->get_results("SELECT ID,post_type FROM {$wpdb->posts} WHERE post_status='publish'");
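	foreach($posts as $post) {
		// get_the_title() and get_permalink() both load the post through
		// get_post(); the 'query' filter below trims the columns of that SELECT.
		$title = get_the_title($post->ID);
		$permalink = get_permalink($post->ID);
		echo "{$post->ID}\t\"{$post->post_type}\"\t\"{$title}\"\t\"{$permalink}\"\n";
	}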

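function tsv_export_modify_query($query) {
	static $fields;
	global $wpdb;
	// Only touch the single-post SELECT that get_post() issues.
	$pattern = '#^SELECT \* FROM ' . preg_quote($wpdb->posts, '#') . ' WHERE ID = ([0-9]+) LIMIT 1$#';
	if (preg_match($pattern, $query)) {
		if (!isset($fields)) {
			$fields = true; // prevent recursion: get_row() below passes through this filter too
			// Learn the column names once from a full row, minus the two large text columns.
			$post = (array)$wpdb->get_row($query);
			unset($post['post_content']);
			unset($post['post_excerpt']);
			// If no row came back, reset to null so the next call tries again.
			$fields = empty($post) ? null : implode(',',array_keys($post));
		}
		if (is_string($fields))
			$query = preg_replace('#^SELECT \*#',"SELECT {$fields}",$query);
	}
	return $query;
}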
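The rewrite step in tsv_export_modify_query() can be checked without loading WordPress. The sketch below is a hypothetical, self-contained version: rewrite_single_post_select() takes the table name and the column list as arguments (assumed here to be wp_posts and a shortened column list) instead of reading $wpdb and learning the columns from a live row.

```php
<?php
// Hypothetical standalone version of the query rewrite used by the export
// script: it narrows get_post()'s single-row "SELECT *" to explicit columns.
function rewrite_single_post_select(string $query, string $table, array $columns): string
{
    $pattern = '#^SELECT \* FROM ' . preg_quote($table, '#') . ' WHERE ID = ([0-9]+) LIMIT 1$#';
    if (!preg_match($pattern, $query)) {
        return $query; // any other query passes through untouched
    }
    // Drop the two potentially large text columns.
    $keep = array_values(array_diff($columns, ['post_content', 'post_excerpt']));
    return preg_replace('#^SELECT \*#', 'SELECT ' . implode(',', $keep), $query);
}

$columns = ['ID', 'post_title', 'post_content', 'post_excerpt', 'post_name', 'post_type'];

echo rewrite_single_post_select('SELECT * FROM wp_posts WHERE ID = 42 LIMIT 1', 'wp_posts', $columns), "\n";
// SELECT ID,post_title,post_name,post_type FROM wp_posts WHERE ID = 42 LIMIT 1
echo rewrite_single_post_select("SELECT ID FROM wp_posts WHERE post_status='publish'", 'wp_posts', $columns), "\n";
// SELECT ID FROM wp_posts WHERE post_status='publish'
```

Anchoring the pattern with ^ and $ keeps the filter from touching the bulk ID/post_type query or any other SELECT that happens to mention the posts table.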
