[wp-hackers] A "terms" table

Robert Deaton false.hopes at gmail.com
Sun Apr 15 20:18:57 GMT 2007


On 4/15/07, Matt Mullenweg <m at mullenweg.com> wrote:
> WordPress is like a sandwich.

I would swear you were highly overweight if I didn't know any better.

I will be snipping heavily.

> We also have condiments which are currently handled by two tables:
> wp_categories and wp_post2cat. On the taxonomy/condiment side, right now
> we really only allow ketchup aka categories, and users for at least a
> year have been asking for more. In 2.2 we decided to satiate their
> appetites.

I wouldn't say we only "allow" ketchup, I'd say in the past we only
provided ketchup on our sandwich, and if you wanted mayonnaise aka
tags, you needed to put them on yourself. I still don't necessarily
think this is a bad idea, but I won't argue this point, as removing
tags entirely is almost certainly not going to happen.

> Everyone agrees that ketchup and mayonnaise are totally different, even
> though they're both condiments and you put them both on sandwiches. No
> one is trying to create some horrible pink mixture of the two tastes.

You could've fooled me in the past. :)

> Let me do my best to make the case for putting category data and tag
> data in separate tables, and feel free to chime in if you think I've
> missed any points.
>
> * We shouldn't ship anything with a data schema people disagree on,
> because plugins and themes will be written against it.
> * They're different things, so we should have them in different tables.
> * Tags can have things like synonyms, and don't need things like hierarchy.
> * There are ugly legacy field names in the category table like
> category_nicename, cat_name, cat_ID (wtf capitals) and we can clean
> those up in new tables
> * With separate tables our queries on the admin side become WAY easier
> and cleaner to do, with no bitwise or _count nonsense
> * Plugins for tagging have implemented it this way.
>
> The code currently in SVN does something different. It uses the
> categories table for names of the tags and then adds fields to hint how
> those names are being used for the admin section. If I wanted to make
> everyone happy and be popular I would just go with the above since there
> seems to be good consensus there, but I think this is an important
> long-term decision for WP so let me spell out some reasons why I think
> the current design has legs not just for 2.2 but beyond.
>
> 1. It performs faster.
>
> On front-end display, we have added ZERO QUERIES to support tags. The
> query that grabs categories is also grabbing tags and we're sorting them
> out in the code.

One of the first rules of SQL has always been to not fetch more than
you need, to allow sorting and searching to be done on the SQL side,
etc. This breaks that fundamental rule.

The bitfield also introduces new things. Let's face it, the APIs for
fetching things from the database are horribly incomplete and will
never ever possibly cover all the things that plugins will want to do.
So, how about a plugin that wants to fetch all the tags to do some
magic? Well, with a bitfield, that plugin now has to fetch all the
categories and all the tags, and then sort through with bitwise
operations and discard the categories which they don't care about.
Efficiency? Iffy.
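
As a rough illustration of that concern, here is a minimal sketch in Python
against an in-memory SQLite database. The `terms` table, its `type_flags`
column, and the IS_CATEGORY / IS_TAG constants are hypothetical stand-ins,
not the actual schema in SVN. It contrasts fetching every row and discarding
the categories in application code with asking the database for tags only.

```python
import sqlite3

# Hypothetical flag values; the real bitfield layout in SVN may differ.
IS_CATEGORY = 1
IS_TAG = 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT, type_flags INTEGER)"
)
conn.executemany(
    "INSERT INTO terms (name, type_flags) VALUES (?, ?)",
    [
        ("News", IS_CATEGORY),
        ("dogs", IS_TAG),
        ("Pets", IS_CATEGORY | IS_TAG),  # one name used as both
        ("cats", IS_TAG),
    ],
)


def tags_filtered_in_code(conn):
    """Fetch every row, then discard non-tags with a bitwise test."""
    rows = conn.execute(
        "SELECT name, type_flags FROM terms ORDER BY id"
    ).fetchall()
    return [name for name, flags in rows if flags & IS_TAG]


def tags_filtered_in_sql(conn):
    """Let the database apply the same bitwise test in the WHERE clause."""
    rows = conn.execute(
        "SELECT name FROM terms WHERE (type_flags & ?) != 0 ORDER BY id",
        (IS_TAG,),
    ).fetchall()
    return [name for (name,) in rows]
```

Both functions return the same names, but the first transfers every category
row only to throw it away. Even the SQL-side version has a cost: a bitwise
predicate like this generally cannot use an ordinary index, so the database
still has to scan the table.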

> In the dashboard some of the queries are more complicated (though not
> really any different than what we deal with for link categories) and a
> few milliseconds slower than the old ones. However, that really doesn't
> matter because 1) we only need to write them once and more importantly
> 2) they're run several orders of magnitude fewer times than the ones
> that display the blog on the front-end. A mantra has always been that
> user time is more important than developer time.

> A separate tag naming table and post2tag table would require at least 2
> additional queries and/or joins to the front page, which already think
> does too many queries and is too heavy.

I think you missed a noun in the last clause of the sentence. Who
thinks it is? I think that the number of queries shouldn't necessarily
make a _huge_ difference; it's the queries themselves, how much they
fetch, how much is actually used, etc. WP takes a "fetch everything
with the assumption it will be used" approach; perhaps we need to
rethink that.

> 2. It's a better long-term foundation.
>
> I think there are a lot of benefits to having a single ID that maps to a
> term and a slug. Let's pretend we had perfect foresight 5 years ago and
> instead of wp_categories we had wp_terms.

I still think we'd be having this conversation today.

> Regardless of the UI and philosophy behind categories, tags, and ooga
> booga, on a data level they're still mapping a set of terms to an item
> in post_content.

Are you gradually moving us toward dropping this table altogether and
moving it all into postmeta?

> In WP a term has three important things: an ID, a human-entered name,
> and a URL-friendly slug. We use the ID in our relations instead of the
> slug because it's more efficient and slugs are not necessarily unique
> (because of hierarchy).
>
> Having "dogs" in a category table have one ID and "dogs" in a tag table
> have a different ID is a long-term deck of cards that we will seriously
> regret later. It's MUCH harder to reconcile items with internally
> different IDs than it is to split out unique IDs into different tables.

But like you said earlier, ketchup and mayonnaise are two different
things. Why would we be trying to put them together?

> As for some of the bit and count fields currently causing grief, I would
> argue the solution for that isn't a separate tags table, but a separate
> table specifically for that type of data. In Drupal for this
> infrastructure they have a term_data, term_hierarchy, term_node,
> term_relation, term_synonym, vocabulary, and vocabulary_node_types
> tables. I think that might be a little more than we need, but there are
> some concepts there we could pretty cleanly combine into a single extra
> table that isn't called categories or tags, and will provide a good and
> scalable foundation for years to come.
>
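
To picture the "single extra table" idea quoted above, here is a minimal
sketch in Python with SQLite. The table and column names (`terms`,
`term_usage`, `kind`, `parent_id`) are hypothetical and chosen only for
illustration; nothing here is a schema proposed in this thread. Each term
gets one ID, and a second table records how that term is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE terms (
        term_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        slug    TEXT NOT NULL
    );
    -- Hypothetical extra table: one row per (term, kind of use).
    CREATE TABLE term_usage (
        term_id   INTEGER NOT NULL REFERENCES terms(term_id),
        kind      TEXT NOT NULL,
        parent_id INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (term_id, kind)
    );
    INSERT INTO terms (term_id, name, slug) VALUES (1, 'Dogs', 'dogs');
    INSERT INTO terms (term_id, name, slug) VALUES (2, 'News', 'news');
    INSERT INTO term_usage (term_id, kind) VALUES (1, 'category');
    INSERT INTO term_usage (term_id, kind) VALUES (1, 'tag');
    INSERT INTO term_usage (term_id, kind) VALUES (2, 'category');
    """
)


def terms_of_kind(conn, kind):
    """Return (term_id, slug) pairs for terms used as the given kind."""
    return conn.execute(
        "SELECT t.term_id, t.slug FROM terms t "
        "JOIN term_usage u ON u.term_id = t.term_id "
        "WHERE u.kind = ? ORDER BY t.term_id",
        (kind,),
    ).fetchall()
```

In this sketch "dogs" keeps the same ID whether it is looked up as a category
or as a tag, and the filtering happens in the WHERE clause rather than with a
bitfield. The trade-off is the extra join on every lookup.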

> 3. There should be no user- or plugin-facing problems with how it's
> currently implemented, or if we decide to change it.

I disagree. Especially because the implementation is lacking so many
features that other tagging plugins have done well for some time now,
new plugins are bound to be forced to write their own queries to
interface with this table. And when they don't work anymore in the
next version, someone's gonna be pissed.

>
> Now this isn't to suggest for a second there aren't bugs; many have been
> fixed already and I'm sure there are many still left, but that is going
> to be true of ANY code we put in WP and anyone who suggests otherwise is
> not very familiar with software development. From a point of view of
> plugin authors, they shouldn't have to think or care if we're storing it
> in a categories table or a turkey, the function they use should remain
> consistent no matter what we change or gymnastics we do behind the
> curtain. No matter what we do in 2.2 or 2.3, that's not going to change.
>
> I do think there is something intrinsically better about shipping and
> iterating than noodling without release in search of the "perfect"
> implementation.

I agree, there is something better about shipping and iterating, as
long as the initial shipment has something that is not fundamentally
broken.

I snipped the rest cause I have more to say but I have to run out for
lunch. I'll come back later.


-- 
--Robert Deaton
http://lushlab.com

