[wp-testers] Wordpress scaling problems

Thu Jan 29 19:43:18 GMT 2009

/me takes hat off in respect...

I agree that is a totally good read. Now to take some of your logic and 
apply it to the WP e-Commerce Plugin (yeah!!!)...

Ciao,
Dan

Paul Robinson wrote:
> I'd just like to chip in here and say bravo to Otto. In one fell swoop you
> have just explained to me how WP does a lot of things with it'd DB & why you
> shouldn't use the category, post name combo for your permalink. You should
> be a teacher. :P
>
> Paul.
>
> 2009/1/29 Otto <otto at ottodestruct.com>
>
>   
>> On Thu, Jan 29, 2009 at 5:43 AM, matthijs <matthijsenator at gmail.com>
>> wrote:
>>     
>>> Again, I understand that in the - current - wordpress design this is how
>>>       
>> it
>>     
>>> goes. But if you forget the current design for a moment, and think about
>>>       
>> how
>>     
>>> you could otherwise design a system handling permalinks. I can hardly
>>> believe there is no other way to do this. How are all those other
>>>       
>> websites
>>     
>>> with more then a few hundred pages doing it then?
>>>       
>> Actually, I think this deserves a bit more discussion... Let's
>> consider a permalink like %category%/%postname%.
>>
>> So you're handed a URL like /mycat/mypost. You start by parsing it
>> into mycat and mypost. You don't know what these are. They're just
>> strings to you. So, first, you have to consider what "mycat" is.
>>
>> First, you query to see if "mycat" is a pagename. This is a select
>> from wp_posts where post_slug = mycat and post_type = page. No joy
>> there.
>>
>> Next, you query to see if "mycat" is a category. This is a select from
>> wp_terms join wp_term_taxonomy on (term_id = term_id) where term =
>> mycat and taxonomy = category. Hey, we found a mycat, so that's good.
>> Unfortunately, this just tells us that it's a category, which is
>> rather useless in retrieving the actual post we're looking for. So we
>> ignore the category.
>>
>> Now, we move on to the "mypost". Again, we start querying:
>> 1. Is it a page?  select from wp_posts where post_slug = mypost and
>> post_type = page. Nope.
>> 2. Is it a category?  select from wp_terms join wp_term_taxonomy on
>> (term_id = term_id) where term = mypost and taxonomy = category. Nope.
>> 3. Is it a post? select from wp_posts where post_slug = mypost and
>> post_type = post. Bingo.
>>
>> The whole goal is to determine the specific post being asked for. The
>> category is not helpful in this respect, and we have to do a couple
>> queries just to figure out that we need to ignore it. Five queries to
>> determine what the post is with this structure. Five queries, two of
>> them expensive (joins ain't cheap). And these have to happen on every
>> load of a post on your site.
>>
>> With the pre-built list of rewrite rules, we already have the data in
>> memory, in a big array. All that need be done with the current case is
>> an array lookup on the url. And the query to get that list was one
>> simple one, just get the rewrite_rules option. In fact, most all the
>> options are preloaded with one big query at the beginning load anyway,
>> so it's a no-query operation.
>>
>> The only reason you're running 2400 queries is because your
>> rewrite_rules are not being saved in the first place. Now, I grant you
>> that 2400 queries is a heck of a lot to build the rewrite rule list in
>> the first place, and no doubt some optimization can be done there, but
>> if that data was saved properly when you built the rewrite rules, then
>> the pre-calculation saves time and queries and makes it much faster.
>> The only reason it's not working is that the data isn't being saved.
>>
>> It's certainly possible to conceive of an alternate strategy. What if
>> we stored all the rewrite rules in a separate table? Now we can pull
>> from that table as needed, and each row is quite small by comparison.
>> This would certainly work for this specific case, but it makes other
>> cases require more queries. Since now we don't have them in memory any
>> more, we have to pull a new row for each different rewrite case.
>>
>> There's no particularly good all-around solution, unless you want to
>> precalculate every possible URL and store them in a separate table
>> with that URL as the index. Then it's one query for all cases, but
>> maintaining that table becomes difficult and problematic. It also
>> becomes static, without the ability to adapt to new cases. Perhaps
>> this is the way to go for the future, perhaps a plugin could make this
>> sort of optimization by overriding the rewrite system. But it seems
>> like more trouble than it's worth, really.
>>
>> -Otto
>> _______________________________________________
>> wp-testers mailing list
>> wp-testers at lists.automattic.com
>> http://lists.automattic.com/mailman/listinfo/wp-testers
>>
>>     
> _______________________________________________
> wp-testers mailing list
> wp-testers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-testers
>