[wp-hackers] WPM(U|S) and table partitions

Andy Skelton skeltoac at gmail.com
Tue Apr 6 16:15:38 UTC 2010

Hi, Elizabeth,

> Has anyone explored table partitioning with wordpress?

You haven't said it in these words, but I believe you want all of the
blogs' comments and posts in monolithic comments/posts tables. This
would allow you to run analytical queries against the entire dataset,
avoiding foreaches. Unless you are acting in an environment where
economics can be ignored, I would advise against it.

There was a project called Lyceum http://lyceum.ibiblio.org/ that
basically took WordPress MU in that direction. It has been defunct for
almost two years. It proved to be uneconomical to maintain a version
of WordPress that diverged at such a deep level.

My advice is to denormalize any statistical data that needs to be used
for real-time business decisions, and continue to use foreach for
periodic statistics. By denormalize I mean duplicate the data in a
separate table designed for the analytical need.


> I understand that for sites like wordpress.com and edublogs, having a
> separation of content is really important.  Not to mention that since what
> they have works fine, there's no reason to switch.  The site we're
> developing for my client is not for users to make blogs but for managing
> multiple sites.  So far wordpress mu has worked great.  Unfortunately, I've
> been running into various foreach($blog_id) $wpdb->get_results('wp_' .
> $blog_id . '_tablename'); issues. And having to create statistic tables just
> to be able to do otherwise simple joings.  It seems that with proper
> partitions that this would become less of an issue.
> I was thinking a basic range partition on blog_id for most tables, with a
> subpartition on date (month or year seeing how traffic goes) for comments
> and posts.  I have various plugins that would use the blog_id partition as
> well.
> Categories makes for an interesting design problem.  There exists already,
> wp_sitecategories, and the ids are the same ids used in wp_$blog_id_terms.
>  I have some ideas, but nothing concrete yet.
> From what I've seen of the internals, the table data is pretty abstracted.
>  With some if statements and reg exp, I think I can just add a ' WHERE
> blog_id=' . $blog_id to the query functions in the db class.  I'd allow for
> an 'all' => true or 'blog_id' => array(1..n) argument for easy sitewide
> searches.
> I'm still relatively new to Wordpress hacking, so I'd really love to get
> your opinion.  Are people interested in this as a contribution or option? Do
> you think wordpress's data abstraction is good enough to handle this?  There
> doesn't seem to be any performance related data with regards to Wordpress
> specifically, so if you are interested in performance data, what statements
> would you like me to run?
> (We have 60 blogs, with 40,000 posts with 2 or 3 more blogs with 100-200
> posts each a day.  I will be transferring that data to the test db to play
> around with.  I won't be activating any plugins at first.  I have a custom
> plugin with a bunch of user statistics data that I will then turn on and
> test)
> --
> Thanks,
> Elizabeth Buckwalter
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers

More information about the wp-hackers mailing list