[wp-hackers] A "terms" table

Chris chris.hearn01 at ntlworld.com
Sun Apr 15 20:42:49 GMT 2007


You're the boss in this, I guess. FWIW, why the rush? Because some other 
blogging systems put out releases more often? Bad reason!
The documentation still needs sorting out for version 2.1. Stability is good; 
DB changes are bad unless very carefully implemented and documented. So 
keep tags out of the way, and don't rush to another release: May/June, 
later, whatever!
Chris

Matt Mullenweg wrote:
> WordPress is like a sandwich.
>
> Assuming we've scared off all the vegetarians with all the talk of 
> BBQ, the core is the meat. Our meat is the wp_posts table, which 
> stores what I would refer to as the primary points of content. 
> Currently for us this is posts, pages, and attachments, though in the 
> future I could see it expanding to support new post types such as 
> externals, galleries, and hopefully things we can't even imagine yet.
>
> On the side you have chips (good comments), vegetables (idiot 
> comments), and that funny stuff your cousin brought that you're going 
> to move around on the plate but never eat (spam comments). I think 
> comments are okay right now, maybe they could use a meta table but we 
> can talk about that later.
>
> Meat alone is only a real meal at rboren's house, so most people put 
> things on the sandwich to add flavor and spice it up. Some add other 
> types of meat, in the WP world this is postmeta, which we call custom 
> fields in polite company.
>
> We also have condiments which are currently handled by two tables: 
> wp_categories and wp_post2cat. On the taxonomy/condiment side, right 
> now we really only allow ketchup aka categories, and users for at 
> least a year have been asking for more. In 2.2 we decided to satiate 
> their appetites.
>
> Everyone agrees that ketchup and mayonnaise are totally different, 
> even though they're both condiments and you put them both on 
> sandwiches. No one is trying to create some horrible pink mixture of 
> the two tastes.
>
> However there are currently two schools of thought on how we should 
> store the data for categories and tags at a very low level in our DB.
>
> Let me do my best to make the case for putting category data and tag 
> data in separate tables, and feel free to chime in if you think I've 
> missed any points.
>
> * We shouldn't ship anything with a data schema people disagree on, 
> because plugins and themes will be written against it.
> * They're different things, so we should have them in different tables.
> * Tags can have things like synonyms, and don't need things like 
> hierarchy.
> * There are ugly legacy field names in the category table like 
> category_nicename, cat_name, cat_ID (wtf capitals) and we can clean 
> those up in new tables
> * With separate tables our queries on the admin side become WAY easier 
> and cleaner to do, with no bitwise or _count nonsense
> * Plugins for tagging have implemented it this way.
>
> The code currently in SVN does something different. It uses the 
> categories table for names of the tags and then adds fields to hint 
> how those names are being used for the admin section. If I wanted to 
> make everyone happy and be popular I would just go with the above 
> since there seems to be good consensus there, but I think this is an 
> important long-term decision for WP so let me spell out some reasons 
> why I think the current design has legs not just for 2.2 but beyond.
>
> 1. It performs faster.
>
> On front-end display, we have added ZERO QUERIES to support tags. The 
> query that grabs categories is also grabbing tags and we're sorting 
> them out in the code.
>
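A minimal sketch of the "zero extra queries" idea, assuming a hypothetical shared `terms` table and a `post2term` relation table with a `rel_type` column. These names are illustrative only and are not the schema in SVN; the point is just that one query returns both kinds of term and the split happens in code.

```python
import sqlite3


def fetch_terms_for_post(conn, post_id):
    """Run ONE query for every term on a post, then split by kind in code."""
    rows = conn.execute(
        "SELECT t.name, r.rel_type "
        "FROM post2term r JOIN terms t ON t.id = r.term_id "
        "WHERE r.post_id = ? ORDER BY t.name",
        (post_id,),
    ).fetchall()
    grouped = {"category": [], "tag": []}
    for name, rel_type in rows:
        grouped[rel_type].append(name)
    return grouped


# Hypothetical schema and sample data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE post2term (post_id INTEGER, term_id INTEGER, rel_type TEXT);
INSERT INTO terms VALUES (1, 'Pets', 'pets'), (2, 'dogs', 'dogs'), (3, 'bbq', 'bbq');
INSERT INTO post2term VALUES (10, 1, 'category'), (10, 2, 'tag'), (10, 3, 'tag');
""")

result = fetch_terms_for_post(conn, 10)
```

With separate category and tag tables, the same page would need a second query (or an extra join) to fetch the tags.
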
> In the dashboard some of the queries are more complicated (though not 
> really any different than what we deal with for link categories) and a 
> few milliseconds slower than the old ones. However, that really 
> doesn't matter because 1) we only need to write them once and more 
> importantly 2) they're run several orders of magnitude fewer times 
> than the ones that display the blog on the front-end. A mantra has 
> always been that user time is more important than developer time.
>
> A separate tag naming table and post2tag table would require at least 
> 2 additional queries and/or joins to the front page, which I already 
> think does too many queries and is too heavy.
>
> 2. It's a better long-term foundation.
>
> I think there are a lot of benefits to having a single ID that maps to 
> a term and a slug. Let's pretend we had perfect foresight 5 years ago 
> and instead of wp_categories we had wp_terms.
>
> Regardless of the UI and philosophy behind categories, tags, and ooga 
> booga, on a data level they're still mapping a set of terms to an item 
> in post_content.
>
> In WP a term has three important things: an ID, a human-entered name, 
> and a URL-friendly slug. We use the ID in our relations instead of the 
> slug because it's more efficient and slugs are not necessarily unique 
> (because of hierarchy).
>
> Having "dogs" in a category table have one ID and "dogs" in a tag 
> table have a different ID is a long-term deck of cards that we will 
> seriously regret later. It's MUCH harder to reconcile items with 
> internally different IDs than it is to split out unique IDs into 
> different tables.
>
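To illustrate the single-ID argument, here is a rough sketch in which "dogs" gets one term ID no matter whether it is used as a category or a tag. `Term`, `TermRegistry`, and `attach` are hypothetical names, and the sketch ignores hierarchy, so it treats slugs as unique, which the message above notes is not generally true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: int
    name: str   # human-entered name
    slug: str   # URL-friendly form


class TermRegistry:
    """One pool of terms; how a term is used lives on the relation."""

    def __init__(self):
        self._by_slug = {}
        self._next_id = 1
        self.relations = set()  # (post_id, term_id, kind)

    def get_or_create(self, name):
        # Simplistic slug rule, assumed for the example only.
        slug = name.strip().lower().replace(" ", "-")
        if slug not in self._by_slug:
            self._by_slug[slug] = Term(self._next_id, name, slug)
            self._next_id += 1
        return self._by_slug[slug]

    def attach(self, post_id, name, kind):
        term = self.get_or_create(name)
        self.relations.add((post_id, term.term_id, kind))
        return term


registry = TermRegistry()
as_category = registry.attach(1, "Dogs", "category")
as_tag = registry.attach(2, "dogs", "tag")
```

Both calls resolve to the same `term_id`, so nothing needs reconciling later; only the relation records whether the term was used as a category or a tag.
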
> As for some of the bit and count fields currently causing grief, I 
> would argue the solution for that isn't a separate tags table, but a 
> separate table specifically for that type of data. For this 
> infrastructure Drupal has term_data, term_hierarchy, term_node, 
> term_relation, term_synonym, vocabulary, and vocabulary_node_types 
> tables. I think that might be a little more than we need, but there 
> are some concepts there we could pretty cleanly combine into a single 
> extra table that isn't called categories or tags, and will provide a 
> good and scalable foundation for years to come.
>
> 3. There should be no user- or plugin-facing problems with how it's 
> currently implemented, or if we decide to change it.
>
> Now this isn't to suggest for a second there aren't bugs, many have 
> been fixed already and I'm sure there are many still left, but that is 
> going to be true of ANY code we put in WP and anyone who suggests 
> otherwise is not very familiar with software development. From a point 
> of view of plugin authors, they shouldn't have to think or care if 
> we're storing it in a categories table or a turkey, the function they 
> use should remain consistent no matter what we change or gymnastics we 
> do behind the curtain. No matter what we do in 2.2 or 2.3, that's not 
> going to change.
>
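A small sketch of why the storage choice need not leak to plugin authors, assuming a hypothetical `get_post_tags` function and two made-up storage backends. None of these names come from WordPress; they only show the same call returning the same answer over either layout.

```python
class SharedTableStore:
    """Tags stored beside categories, distinguished by a kind flag."""

    def __init__(self):
        self.rows = []  # (post_id, name, kind)

    def add(self, post_id, name, kind):
        self.rows.append((post_id, name, kind))

    def tags_for(self, post_id):
        return sorted(n for p, n, k in self.rows if p == post_id and k == "tag")


class SeparateTableStore:
    """Tags and categories stored in two separate lists ("tables")."""

    def __init__(self):
        self.categories = []  # (post_id, name)
        self.tags = []        # (post_id, name)

    def add(self, post_id, name, kind):
        target = self.tags if kind == "tag" else self.categories
        target.append((post_id, name))

    def tags_for(self, post_id):
        return sorted(n for p, n in self.tags if p == post_id)


def get_post_tags(store, post_id):
    """The only function a plugin would call; storage stays hidden."""
    return store.tags_for(post_id)


shared, separate = SharedTableStore(), SeparateTableStore()
for store in (shared, separate):
    store.add(7, "News", "category")
    store.add(7, "dogs", "tag")
    store.add(7, "bbq", "tag")

tags_shared = get_post_tags(shared, 7)
tags_separate = get_post_tags(separate, 7)
```
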
> I do think there is something intrinsically better about shipping and 
> iterating than noodling without release in search of the "perfect" 
> implementation.
>
> More importantly from a user's point of view, all that really matters 
> is that they have a box they can type tags in and that their host 
> doesn't tell them not to upgrade to 2.2 because it does more queries.
>
> 4. I'm open
>
> I'm not personally tied to any code written thus far and if I think 
> the best thing is.
>
> There is a separate but related decision around what to do about the 
> release date. Based on the discussion here I'm going to make a go/no-go 
> decision on Tuesday.
>
> If we do delay I think we should laser-focus on tags and not allow 
> other pet issues to creep in, and I will fully expect people to put in 
> as much time writing code and fixing bugs as they have arguing points 
> on mailing lists, IRC, and trac. At the very least I hope we've 
> learned a bit more about getting these things out of the way early 
> rather than a week or two before a release. Also if something is 
> sitting in trac, take it to the hackers list early.
>
> I think if we stick with the current implementation we can hit it with 
> a very stable release next Monday, but if we decide to replace it we 
> need to push it back at least into mid-May.
>