[wp-testers] Wordpress scaling problems

Thu Jan 29 18:54:39 GMT 2009

On Thu, Jan 29, 2009 at 5:43 AM, matthijs <matthijsenator at gmail.com> wrote:
> Again, I understand that in the - current - wordpress design this is how it
> goes. But if you forget the current design for a moment, and think about how
> you could otherwise design a system handling permalinks. I can hardly
> believe there is no other way to do this. How are all those other websites
> with more then a few hundred pages doing it then?

Actually, I think this deserves a bit more discussion... Let's
consider a permalink like %category%/%postname%.

So you're handed a URL like /mycat/mypost. You start by parsing it
into mycat and mypost. You don't know what these are. They're just
strings to you. So, first, you have to consider what "mycat" is.

First, you query to see if "mycat" is a pagename. This is a select
from wp_posts where post_slug = mycat and post_type = page. No joy
there.

Next, you query to see if "mycat" is a category. This is a select from
wp_terms join wp_term_taxonomy on (term_id = term_id) where term =
mycat and taxonomy = category. Hey, we found a mycat, so that's good.
Unfortunately, this just tells us that it's a category, which is
rather useless in retrieving the actual post we're looking for. So we
ignore the category.

Now, we move on to the "mypost". Again, we start querying:
1. Is it a page?  select from wp_posts where post_slug = mypost and
post_type = page. Nope.
2. Is it a category?  select from wp_terms join wp_term_taxonomy on
(term_id = term_id) where term = mypost and taxonomy = category. Nope.
3. Is it a post? select from wp_posts where post_slug = mypost and
post_type = post. Bingo.

The whole goal is to determine the specific post being asked for. The
category is not helpful in this respect, and we have to do a couple
queries just to figure out that we need to ignore it. Five queries to
determine what the post is with this structure. Five queries, two of
them expensive (joins ain't cheap). And these have to happen on every
load of a post on your site.

With the pre-built list of rewrite rules, we already have the data in
memory, in a big array. All that need be done with the current case is
an array lookup on the url. And the query to get that list was one
simple one, just get the rewrite_rules option. In fact, most all the
options are preloaded with one big query at the beginning load anyway,
so it's a no-query operation.

The only reason you're running 2400 queries is because your
rewrite_rules are not being saved in the first place. Now, I grant you
that 2400 queries is a heck of a lot to build the rewrite rule list in
the first place, and no doubt some optimization can be done there, but
if that data was saved properly when you built the rewrite rules, then
the pre-calculation saves time and queries and makes it much faster.
The only reason it's not working is that the data isn't being saved.

It's certainly possible to conceive of an alternate strategy. What if
we stored all the rewrite rules in a separate table? Now we can pull
from that table as needed, and each row is quite small by comparison.
This would certainly work for this specific case, but it makes other
cases require more queries. Since now we don't have them in memory any
more, we have to pull a new row for each different rewrite case.

There's no particularly good all-around solution, unless you want to
precalculate every possible URL and store them in a separate table
with that URL as the index. Then it's one query for all cases, but
maintaining that table becomes difficult and problematic. It also
becomes static, without the ability to adapt to new cases. Perhaps
this is the way to go for the future, perhaps a plugin could make this
sort of optimization by overriding the rewrite system. But it seems
like more trouble than it's worth, really.

-Otto