[wp-testers] Wordpress scaling problems

Otto otto at ottodestruct.com
Thu Jan 29 20:23:32 GMT 2009


On Thu, Jan 29, 2009 at 1:43 PM, matthijs <matthijsenator at gmail.com> wrote:
> It's a good explanation of how WordPress does things currently. Thanks for
> that.
> I still don't agree about the performance issue, MySQL vs. PHP. The first page
> in any PHP/MySQL book explains how you should build your code and queries in
> such a way that you retrieve out of the database exactly what you need,
> instead of getting everything out and then trying to "query" that within PHP
> to get the real result you need.
>
> But let's not derail the thread into that. We're probably thinking about
> different scenarios.

Yes, we are. Your idea is correct in the general sense, but not in the
specific. WordPress tends to be geared towards the lowest common
denominator, which is shared hosting. One database handling lots and
lots of sites. In such a scenario, the database is doing tons of work.
Past experience shows that relying on the database for actual work
instead of mere storage is problematic at best.

In a broader perspective, the biggest bottleneck in a database is the
actual query processing. Once the rows have been retrieved into memory
by the database server, sending them across the wire is relatively
fast. This is simply another way of saying that fewer queries are
better (obviously), but it also implies that simpler queries are
better as well. The notion of "getting everything out" and then
querying locally is generally not a good idea, but when what you get
out is already in a format that makes "querying" it unnecessary, that
changes things.

WordPress stores a lot of data as serialized arrays, rewrite_rules
included. A serialized array is simply a text version of PHP's own
array structure, so all WordPress has to do is grab the serialized
data, unserialize it into an array, and then run the normal, highly
optimized array functions on it. So you have two cases: a) a complex
select to get exactly what you need, or b) a simple select to grab
the array, followed by indexed array lookups on it. Option b) turns
out to be faster when the database is the bottleneck.
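
To make option b) concrete, here is a minimal sketch. It is not
WordPress code: it uses Python with an in-memory SQLite database and
JSON in place of PHP, MySQL and serialize(), and the table name, the
rules and the helper functions are all made up for illustration. Real
rewrite rules are regular expressions, so the exact-match lookup below
is a simplification. The point is only that the database does one
trivial key lookup and everything else happens in memory.

```python
import json
import sqlite3

# Stand-in for an options table: one row per named setting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")

# Hypothetical rule set: URL path -> internal query string.
rules = {
    "about": "pagename=about",
    "contact": "pagename=contact",
    "news/hello-world": "name=hello-world",
}
conn.execute(
    "INSERT INTO options (name, value) VALUES (?, ?)",
    ("rewrite_rules", json.dumps(rules)),
)


def load_rules(conn):
    """One simple primary-key select, then deserialize into a dict."""
    row = conn.execute(
        "SELECT value FROM options WHERE name = ?", ("rewrite_rules",)
    ).fetchone()
    return json.loads(row[0]) if row else {}


def resolve(rules, path):
    """Plain in-memory hash lookup; no further database work."""
    return rules.get(path.strip("/"))


loaded = load_rules(conn)
print(resolve(loaded, "/about/"))
```

The database only ever sees the same cheap select, no matter which URL
is being resolved.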

> Now that I am aware of the problem, it might be possible to choose different
> permalink strategies for my sites (even though I might not want to, good
> URLs are very important to me). But it's something else if your client calls
> you and asks why their site is so slow suddenly. And I can't change around
> the permalinks of their site now that it's been live for almost 2 years.

Two things to say here.

First, category based URLs are bad mojo for a lot of reasons, the main
one being that posts can have more than one category. So which
category gets in the permalink URL is unclear. If you stick another
category in there, you get the same post. Duplicate content = bad for
SEO.

Second, you can change the permalinks. WordPress should redirect old
permalinks to the correct place afterwards, due to canonical
redirection. Might want to test that though.

> What configuration are you thinking about? How could I research that? I'm no
> server administrator and have no control over most of the servers my sites
> are on. But sometimes it's possible to override configuration settings with
> htaccess. For the site I have the problem with I still need a long term
> solution (of which changing the permalinks is not one).

I doubt you can modify MySQL settings unless you're the server admin.
Figuring out where the problem lies is not a particularly easy task.

> Yes, it's probably a bit of a rewrite of some code to adapt it to such a new
> situation. But I don't think maintaining the table would be more problematic
> than it is now to maintain the single serialized array. It's just a
> different kind of storage, isn't it?

Maintaining the array is easy by comparison. Just shove new data into
the array, serialize it, and stick it in the database. Doing lots of
selects and updates and inserts, not so much.
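
As a rough sketch of what "maintaining the array" amounts to (again
Python, SQLite and JSON standing in for PHP, MySQL and serialize();
the table and the add_rule helper are hypothetical, not WordPress
functions): read the one row, change the array in memory, write the
one row back.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")


def add_rule(conn, pattern, target, option="rewrite_rules"):
    """Load the stored array, add one entry, and store it again."""
    row = conn.execute(
        "SELECT value FROM options WHERE name = ?", (option,)
    ).fetchone()
    rules = json.loads(row[0]) if row else {}

    # Shove the new data into the array.
    rules[pattern] = target

    # Serialize it and stick it back in the database as a single row.
    conn.execute(
        "INSERT OR REPLACE INTO options (name, value) VALUES (?, ?)",
        (option, json.dumps(rules)),
    )
    conn.commit()
    return rules


add_rule(conn, "about", "pagename=about")
rules = add_rule(conn, "contact", "pagename=contact")
print(sorted(rules))
```

However many rules there are, the table still holds exactly one row
for them.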

> I'm actually surprised that not that many people encounter this scaling
> issue (yet). Many sites have the category as the base for their permalinks
> and there are probably enough sites out there with more than a few hundred
> or thousand pages.

I think that the default max query size in MySQL (the
max_allowed_packet setting) is 1 MB. You'd need a lot of Posts to hit
that limit. I think you might hit it faster if you had a ton of Pages
as well. Not sure, there may be a bug there somewhere. Hard to say.
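
For a very rough feel of how many rules it takes to reach 1 MB, here
is a back-of-the-envelope sketch. Everything in it is an assumption:
the rule shape is invented, JSON is used instead of PHP's serialize()
format (which is somewhat more verbose), and the 1 MB figure is just
the number guessed at above. Treat the result as an order of
magnitude, nothing more.

```python
import json

LIMIT_BYTES = 1024 * 1024  # assumed 1 MB limit, per the guess above


def make_rules(count):
    """Build `count` synthetic rules; the pattern shape is hypothetical."""
    return {
        f"category/page-{i}/?$": f"index.php?pagename=page-{i}"
        for i in range(count)
    }


def serialized_size(count):
    """Size in bytes of `count` synthetic rules once serialized as JSON."""
    return len(json.dumps(make_rules(count)).encode("utf-8"))


def rules_until_limit(limit=LIMIT_BYTES, step=1000):
    """Smallest multiple of `step` whose serialized size exceeds `limit`."""
    count = 0
    while serialized_size(count) <= limit:
        count += step
    return count


print(rules_until_limit())
```

Keep in mind that one post or Page can produce more than one rule, so
the number of rules is not the same thing as the number of pages.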

I know WP doesn't scale well with lots of Pages. Like Austin said, it
assumes that you have more Categories than Pages. WordPress Pages are
sort of an afterthought, in my opinion. They're meant to be in the
tens, not the hundreds.

-Otto

