[wp-hackers] Meta tables: Take 5
Casey Bisson
casey.bisson at gmail.com
Thu Aug 6 02:41:53 UTC 2009
Contents[1]:
> Overview of current arguments
> The role of indexes
> Actual use
>_Overview of Current Arguments_
The argument for ENUMs is that they're (mostly) treated as INTs
internally, so selects and joins should theoretically be faster.
The argument against ENUMs is that (a) the association between the
string value that applications are using and the INT value that
represents it internally is rarely consistent across databases and can
lead to painful problems. Further, (b) it's a MYSQL only trick.
The argument for VARCHARs is that they're "fast enough for most
people, most of the time" and the risk of string collisions is about
the same as the risk of function name collisions (i.e.: safe enough
for most people, most of the time).
The argument against is that lookups and joins are theoretically
slower than for INTs or ENUMs.
>_The Role of Indexes_
However, none of this discussion so far has considered how these
different types behave when indexed.
Assuming you've got a good index, that you can keep the table open and
indexes/keys cached, and that you're querying/joining against an
indexed column, the column type doesn't much matter.
A table with a million rows and six distinct string values for the
meta_type (or whatever) should have exactly six entries in the index
for that column. Querying the by meta_type starts at the index, which
lists all the matching row numbers. MySQL 5 eliminated the prior
restriction of one index per query, so even complex queries are all
done as booleans against the indexes.
>_Actual Use_
And, this discussion hasn't considered that the current post_meta
table isn't very useful as a tool for querying posts. That's what the
taxonomy tables are for. It's very difficult (some might say foolish)
to build a table that's optimized both for mass storage (such as the
serialized content that often appears in post_meta) _and_ fast
retrieval (such as identifying all the posts with a matching taxonomy
value).
WordPress already has a good taxonomy system that's well optimized for
search and retrieval. And it's extensible to all the data types WP now
supports. What we don't have is a good place to store arbitrary data
related to objects other than posts and users. Following that usage
scenario, the only queries the proposed meta table should be optimized
for is query by (int) object_id and object_type (where object type in
post, comment, or user). By the time that the database engine is
looking at the meta_type, it's already eliminated all but a couple/few
rows, making arguments about ENUM vs. VARCHAR rather pointless.
[1]: Putting a table of contents in an email is a little ridiculous. I
guess I'm feeling overly officious at this moment.
--Casey
http://maisonbisson.com/
http://about.scriblio.net/
On Aug 4, 2009, at 6:06 PM, Mike Schinkel wrote:
> On Aug 4, 2009, at 5:20 PM, Peter Westwood wrote:
>> On 27 Jul 2009, at 17:47, Mike Schinkel wrote:
>>>> Why so large? I was thinking CHAR(20) at most.
>>> Take a look at wp_postmeta and wp_usermeta; meta_key is
>>> VARCHAR(255). I presumed the same for this.
>>> Even so, ENUM is still a lot more efficient and it provides nice
>>> "documentation in code" for what types are valid.
>>
>> Please don't use ENUM as it is a mysqlism
>> We switched away from it [1] to make our DB schema more portable
>> and to make it easier for people to write alternative database
>> classes.
>> [1] http://core.trac.wordpress.org/ticket/4778
>
> Note that Otto said it was a very bad idea, but was overruled by
> Matt. Unfortunately VARCHAR is a poor choice for an index key on
> any SQL database.
>
> I must ask, is database portability for WordPress ever a reasonable
> and realistic goal? I think we may be kidding ourselves if we think
> so, and worse I think adding database portability would bloat
> WordPress core significantly and make it much harder to make
> conforming plugins.
>
> Ideally I'd like to see us give up on the (false?) hope of database
> portability and return to ENUMs. Failing that, a portable alternate
> would be to move from VARCHAR(20) to SMALLINT(2) with a table that
> defines the keys. That would make a big difference in scalability as
> the number of rows increase and additional metadata is added.
>
> I've added a ticket into track for discussion: http://core.trac.wordpress.org/ticket/10546
More information about the wp-hackers
mailing list