[wp-hackers] Meta tables: Take 5

Casey Bisson casey.bisson at gmail.com
Thu Aug 6 02:41:53 UTC 2009


Contents[1]:
 > Overview of current arguments
 > The role of indexes
 > Actual use


 >_Overview of Current Arguments_
The argument for ENUMs is that they're (mostly) treated as INTs  
internally, so selects and joins should theoretically be faster.

The argument against ENUMs is that (a) the association between the  
string value that applications are using and the INT value that  
represents it internally is rarely consistent across databases and can  
lead to painful problems. Further, (b) it's a MYSQL only trick.

The argument for VARCHARs is that they're "fast enough for most  
people, most of the time" and the risk of string collisions is about  
the same as the risk of function name collisions (i.e.: safe enough  
for most people, most of the time).

The argument against is that lookups and joins are theoretically  
slower than for INTs or ENUMs.


 >_The Role of Indexes_
However, none of this discussion so far has considered how these  
different types behave when indexed.

Assuming you've got a good index, that you can keep the table open and  
indexes/keys cached, and that you're querying/joining against an  
indexed column, the column type doesn't much matter.

A table with a million rows and six distinct string values for the  
meta_type (or whatever) should have exactly six entries in the index  
for that column. Querying the by meta_type starts at the index, which  
lists all the matching row numbers. MySQL 5 eliminated the prior  
restriction of one index per query, so even complex queries are all  
done as booleans against the indexes.


 >_Actual Use_
And, this discussion hasn't considered that the current post_meta  
table isn't very useful as a tool for querying posts. That's what the  
taxonomy tables are for. It's very difficult (some might say foolish)  
to build a table that's optimized both for mass storage (such as the  
serialized content that often appears in post_meta) _and_ fast  
retrieval (such as identifying all the posts with a matching taxonomy  
value).

WordPress already has a good taxonomy system that's well optimized for  
search and retrieval. And it's extensible to all the data types WP now  
supports. What we don't have is a good place to store arbitrary data  
related to objects other than posts and users. Following that usage  
scenario, the only queries the proposed meta table should be optimized  
for is query by (int) object_id and object_type (where object type in  
post, comment, or user). By the time that the database engine is  
looking at the meta_type, it's already eliminated all but a couple/few  
rows, making arguments about ENUM vs. VARCHAR rather pointless.

[1]: Putting a table of contents in an email is a little ridiculous. I  
guess I'm feeling overly officious at this moment.

--Casey
http://maisonbisson.com/
http://about.scriblio.net/


On Aug 4, 2009, at 6:06 PM, Mike Schinkel wrote:

> On Aug 4, 2009, at 5:20 PM, Peter Westwood wrote:
>> On 27 Jul 2009, at 17:47, Mike Schinkel wrote:
>>>> Why so large? I was thinking CHAR(20) at most.
>>> Take a look at wp_postmeta and wp_usermeta; meta_key is  
>>> VARCHAR(255).  I presumed the same for this.
>>> Even so, ENUM is still a lot more efficient and it provides nice  
>>> "documentation in code" for what types are valid.
>>
>> Please don't use ENUM as it is a mysqlism
>> We switched away from it [1] to make our DB schema more portable  
>> and to make it easier for people to write alternative database  
>> classes.
>> [1] http://core.trac.wordpress.org/ticket/4778
>
> Note that Otto said it was a very bad idea, but was overruled by  
> Matt.  Unfortunately VARCHAR is a poor choice for an index key on  
> any SQL database.
>
> I must ask, is database portability for WordPress ever a reasonable  
> and realistic goal?  I think we may be kidding ourselves if we think  
> so, and worse I think adding database portability would bloat  
> WordPress core significantly and make it much harder to make  
> conforming plugins.
>
> Ideally I'd like to see us give up on the (false?) hope of database  
> portability and return to ENUMs. Failing that, a portable alternate  
> would be to move from VARCHAR(20) to SMALLINT(2) with a table that  
> defines the keys. That would make a big difference in scalability as  
> the number of rows increase and additional metadata is added.
>
> I've added a ticket into track for discussion: http://core.trac.wordpress.org/ticket/10546



More information about the wp-hackers mailing list