[wp-hackers] Meta tables: Take 5
mikeschinkel at newclarity.net
Sat Jul 25 06:48:19 UTC 2009
On Jul 25, 2009, at 1:29 AM, Otto wrote:
> I considered enums as well, but the alter table problem I think makes
> a big difference. We don't have anything doing alters except on
> upgrades at present, making it alter the table like this seems
> problematic to me.
The only time ALTER would be needed would be in the case a plugin
needed an ENUM value that doesn't already exist in the wp_meta table
and then only on plugin activation.
We can further minimize the need for ALTER by including ENUM values
for all the obvious objects that might need meta, i.e. comments,
links, categories, tags, etc.
That would make it rare that ALTER would be needed; much more rare in
fact than upgrades which *practically* everyone will need to run.
> I'd be happier with strings. We already use strings for types
> elsewhere (comment_type is comment, ping, etc.. post_type is page,
> post, attachment...).
> I don't think the overhead is going to be enough to make any
> difference except in huge scales. Perhaps some testing is in order?
That comfort comes with less than trivial performance implications.
Given the index I proposed, an ENUM(2)+BIGINT(20)+4 bytes, each key is
26 bytes long. With a 4K page filled 2/3s full you get 105 records
per index page. That allows MySQL to retrieve record pointers for
around 11,030 records in two index reads. Assuming 10 meta fields per
object, that allows you to have 1000 records and stay in two index
page reads. However, you can get to 1.16 million records in only
three index page reads, or about 115,900 object records (10 meta
records per object record.) And for 4 index reads you can get 127.6
million meta records, or 12.17 million object records. Not bad....
Now, let's look the option we'd choose it we didn't challenge the
default assumptions: VARCHAR(255)+BIGINT(20)+4, or 279 bytes per index
key, or only 9 meta records per index page, or less than 1 object
record per index page!!! That means for two index page reads, you only
get 95 meta records, or 9 object records. Three index page reads
gives you only 937 meta records, or 93 object records. Four index
pages reads give you 9,176 meta records, or 917 object records. With
five index page reads you can get up to 89,809 meta records, but only
8980 object records. To get to 87,899 object records you need 6 index
page reads, and to get to over 1 million object records you have to
have 7 index page reads. UGH!
Holy crap! ENUMs give you 115,900 object records in only 3 index page
reads yet strings require 6 index page reads for only 87,899 object
records. That's < 1/2 the performance, and you'll find it's actually
a lot worse than that (int indexes always perform better than string
That overhead from strings makes WordPress a non-option in high
traffic sites in the 100k record range (without *lots* of expensive
tuning), and noticeably slow for even moderate size sites (>1000
object records.) And what benefit for? All to avoid having to do an
ALTER? The more I research this and write about it, the more clear
that moving to ENUMs could be a huge win for WordPress scalability
My guess the reason most people are not screaming about the current
performance is that tables that use strings like the taxonomy tables
have so few records for most sites that it rarely becomes an issue.
But we are not architecting WordPress for sites with 25 blog posts,
are we? We are trying to ensure that it can scale to be able to be
used for even the largest sites, right?
WordPress Custom Plugins
P.S. BTW, I just ran came across this: "Disadvantages of Creating
Many Tables in the Same Database": "If you have many MyISAM tables in
the same database directory, open, close, and create operations are slow
More information about the wp-hackers