[wp-hackers] Time for a wp_post_relationships table?

Tue Aug 3 08:05:25 UTC 2010

On Aug 3, 2010, at 12:34 AM, Otto wrote:
> However, I'll play devil's advocate.

I expect no less from you, though I have come to appreciate it. :)

In the same vein I'll respond. I currently see value but am open to having my opinion changed. So it's not a "fight to the death" on the topic but to have an honest debate. 

>> 1.) Post-to-post relationships are a special pattern that has extremely broad applicability across many use-cases.  Having a special purpose functionality to handle common use-cases makes sense much like why foreach() exists in PHP even though for() would have addressed every use-case where foreach() is used. Same here, we could use other alternatives but post-to-post relationships are so fundamental why delegate that use-case to generic functionality?
> 
> There is no use case in core that requires post-to-post relationships
> in this way.

Not yet, anyway though I could easily see one being added at some point in the future such as a related posts selection metabox that could be triggered by adding something like "related-posts-{$post_type}" in the "supports" section of register_post_type().  This would be helpful for every single site I've worked on since 3.0 was in beta (and before, even.)

> Even if comments become a post type, it's a many-to-one
> relationship that can easily use the post_parent field.

Yes, post_parent works well for one-to-many, but can't handle many-to-many.  It also can't handle it when a post can be on the many end of one-to-many but related to more than one post type (see examples below.)

> For special plugin cases, they should devise their own methods for
> doing what they want. Hell, they will anyway.

Why would we want to devise their own methods rather than use a standard approach built into core?  (It's an honest question.)

>> 2.) If we delegate this to a case of "just use post meta" then we'll find different plugins will implement relationships different, i.e. using different meta_keys and thus won't interoperate.
> 
> They won't interoperate anyway. The proposed table doesn't specify
> what the given relationships *mean*. What does it mean if I have a
> bunch of posts related to other posts and to taxonomies? No idea. It's
> just a big table of numbers, you can interpret it any way you choose.

You are getting too use-case specific; they don't need to specify what the relationships mean only that they are related. Plugins can add the "meaning" layer but WordPress should probably offer the core relationship support.

There are many generic functionalities that would have value.  For example, you could have a plugin that managed related posts in a grid editor and a use case for that would be actors in a movie and the grid would let the user add/edit actors for the movie and add annotate the name of their character from the movie.  Then another plugin could provide a "related posts" widget which could be used to display the actors on the movie page and the movies on the actor page.  With a standardized table like this those plugins could (and often would) be developed separately and unaware of the other.  Without such a table the only way to get those functionalities would be if one plugin author developed it all, if the two collaborated, or if one built on the other; all of which are much less likey scenarios.

>> 3.) With the inclusion of a wp_post_relationships table we'll lay the groundwork for being able to add functions to core like but not limited to get_related_posts($post_id) as well as admin UI functionality to interact with related posts. We'll also lay groundwork for plugins to interoperate.
> 
> Generally we don't put things in core just so plugins can use them.
> There needs to be some kind of core use of this table. Making a table
> that will be empty unless you use a specific plugin is silly.

> 
> If plugins want to interoperate, they can do that now by having plugin
> developers talk to one another.

Well, I don't think it is silly but our opinion differ.  Taken in a vacuum, that's like saying that if electric appliance vendors want their appliances to work they can collaborate with homebuilders to make sure that the homes are built with their proprietary connector of powering the appliances.  Fortunately for us that need not be the case because the industry had the forethought to standardize plugs and outlets (at least within the US, can't speak for everywhere else.) 

Or another analogy would be to say "If the browser vendor wants to interoperate with the web server, they can talk to one another."  If they had used the above logic, where would the web be now?

I think maybe I simply value standardization as a stepping stone for greater things and you don't (right?) and that difference in how we value standardization colors any discussion we have related to it. 

> I don't see how taxonomy has anything to do with post-to-post
> relationships, but then I also admit that I don't understand any
> significant use for the the table as proposed. A fuller and more
> complete explanation would be preferred. The links you gave were light
> on detail.

Michael Pretty implemented taxonomy mirrored posts in a project and mentioned it on the list.  I then asked for some help implementing from anyone who had done so (Michael didn't offer help but then I didn't directly ask him and he was under no obligation to help anyway.) So I implemented a taxonomy-mirrored posts plugin asked a question about it on the list to which Michael replied that he had come to the conclusion that it caused two many problems and that he was next going to use a custom table even though typically he would never add a new table.

The use-cases for post-to-post are probably not very apparent when using WordPress only as a blog but become much more apparent when one starts using it as a CMS with lots of different data types.  I'll give you numerous examples where I've needed it in the past several months (#1 thru #3 are actual projects I've worked on, #4 & #5 are hypothetical):

1.) A law firm website has attorneys, practice areas, articles, case studies, and events/presentations.  Each attorney is in one or more practice areas.  Each article and each case study can be attached to one or more attorneys and relevant to one or more practice areas.  Each event/presentation can have one or more attorneys presenting and can be represented one one or more practice areas' calendars.

2.) A conference website can have sessions, sponsors, presenters, time slots and rooms. A session can have one or more presenters and be presented in one or more time slots and one room per time slot ,and a presenter can represent a sponsor or not.

3.) A website listing restaurant menus can have restaurants, locations, menus. A restaurant can have one or more locations and one or more menus (breakfast, lunch, dinner, catering, etc.) A menu can apply to all locations for a restaurant or only a specific location.  

4.) A movie website could have movies and person-roles where a person-role could be an actor, producer, cinematographer, casting, film editing, costumer, set designer, composer, etc.

5.) A hospital website has doctors and departments.  Each department can have multiple doctors and each doctor can work in multiple departments.

6.) A radio station website could have stations and disc jockey's where a station could have multiple disc jockeys and each disc jockey could appear on multiple stations (given how radio works these days.)

I could actually give you many other examples if you need it; there is an example for almost every business that could potentially use WordPress for it's website (vs. just it's blog.) 

>> 5.) Using postmeta to store post-to-post relationships means storing foreign keys in non-indexed longtext meta_value field or embedded in the meta_key field. That's not too bad if you are looking up via post_id but it's awful if you have a large database with an even larger number of relationships and you need to do a select or worse join to the post->ID that is stored in meta_value.  It's absolutely not scalable and while #1 thru #4 are all debatable I'm pretty sure everyone can understand how this is unworkable for sites that may need to scale one day?
> 
> Post meta could be used for the majority of cases, and I don't see
> that there's any scaling issues whatsoever there. The meta_value
> doesn't need to be indexed to join on it, when you're using the
> meta_key as a selection mechanism. How many post relationships can a
> post have? 1000 to 1000? What would be the point?

It's not the number of post relationships that a *specific* post has, it's the *total* number of records with the matching meta_key in the wp_postmeta table.  Let's assume an IMDB.COM clone with only 1,000 movies listed (According to Google by searching "inurl:www.imdb.com/title/" IMDB.com has a jaw dropping 73,500,000 movies. Really, there are that many movies?  Wow.)  

1000 is trivial, but I didn't want to say 10,000+ because I didn't want you to argue with the number.   

So we add a postmeta field to relate post_id to the movie's post->ID, meta_key='actor', and meta_value relates to actor's post->ID.  Now let's assume that each post has 10 post_meta fields for "actor" on average. So that's 10,000 postmeta records.  Want a list of actors for a movie?    Here's the SQL:

$sql = $wpdb->prepare("
SELECT
	actor.post_title AS actor
FROM
	wp_posts actor 
	INNER JOIN wp_postmeta movie_actor ON actor.ID=movie_actor.meta_value AND movie_actor.meta_key='actor'
WHERE
	movie_actor.post_id=$d", $actor_id);

MySQL EXPLAIN gives us the following. Not too bad; we have to do a 10,000 record scan but at least it is on an 8 byte key:

*************************** 1. row ***************************
id           : 1
select_type  : SIMPLE
table        : movie_actor
type         : ref
possible_keys: post_id,meta_key
key          : post_id
key_len      : 8
ref          : const
rows         : 10,000
Extra        : Using where
*************************** 2. row ***************************
id           : 1
select_type  : SIMPLE
table        : actor
type         : eq_ref
possible_keys: PRIMARY
key          : PRIMARY
key_len      : 8
ref          : movie_actor.meta_value
rows         : 1
Extra        : Using where
2 rows in set 

OTOH, want a list of movies for an actor? Here's the query (almost the same):

$sql = $wpdb->prepare("
SELECT
	movie.post_title AS movie
FROM
	wp_posts movie 
	INNER JOIN wp_postmeta movie_actor ON movie.ID=movie_actor.post_id AND movie_actor.meta_key='actor'
WHERE
	movie_actor.meta_value=$d", $movie_id);

And here's the MySQL EXPLAIN.   10,000 records scanning a 768 byte key (and "Using where.")  YEOWCH!

*************************** 1. row ***************************
id           : 1
select_type  : SIMPLE
table        : movie_actor
type         : ref
possible_keys: post_id,meta_key
key          : meta_key
key_len      : 768
ref          : const
rows         : 10,000
Extra        : Using where
*************************** 2. row ***************************
id           : 1
select_type  : SIMPLE
table        : movie
type         : eq_ref
possible_keys: PRIMARY
key          : PRIMARY
key_len      : 8
ref          : movie_actor.post_id
rows         : 1
Extra        : 
2 rows in set 

Now what if we have 10,000 movies and 25 postmeta fields per movie?  Now we are talking abut scanning 768 byte fields for 250,000 records for *each* query like the latter one. 

To make matters worse, what if we now need to do a join?  What if we want a query that shows which actors have worked with other actors?  Now we have to do a table scan during a join.

But what if we had the second query using the proposed wp_post_relationships table:

SELECT
	movie.post_title AS movie
FROM
	wp_posts movie 
	INNER JOIN wp_post_relationships movie_actor ON movie.ID=movie_actor.parent_id
WHERE
	movie_actor.post_id=22

And MySQL EXPLAIN tells us we match one row with one row and no "Using where." This is about as good of MySQL performance as you can get.  For even moderate sized data sets this will be orders of magnitude faster than scanning a 768 byte key using where.  So we are talking about sites that will start slowing down significantly with almost trivial number of posts.

*************************** 1. row ***************************
id           : 1
select_type  : SIMPLE
table        : movie_actor
type         : ref
possible_keys: PRIMARY,post_id
key          : post_id
key_len      : 8
ref          : const
rows         : 1
Extra        : 
*************************** 2. row ***************************
id           : 1
select_type  : SIMPLE
table        : movie
type         : eq_ref
possible_keys: PRIMARY
key          : PRIMARY
key_len      : 8
ref          : wp30.movie_actor.parent_id
rows         : 1
Extra        : 
2 rows in set (0.02 sec)

Still think that is "no issue whatsoever?"  

If yes, here's some good reading: 

-- http://www.slideshare.net/techdude/how-to-kill-mysql-performance
-- http://joinfu.com/presentations/understanding_query_execution.pdf
-- http://www.devshed.com/c/a/MySQL/MySQL-Optimization-part-1/2/

> What *is* the point, anyway? What does this table *mean*? What data
> does it hold? Speaking abstract is fine, but I don't see the use for
> the data itself, because I see no use cases here at all. What is a
> specific use case that I can point to and say "this is why we need
> this"?

In a word if it's all about blogging only, most of this is unimportant.  

However, once CMS requirements are added into the mix it becomes important.  Examples #1 thru #6 above are specifics, not abstracts.  Not all of sites #1 thru #6 will need to scale but some of them will and especially #3 and #4.  

And all of them would benefit from standardized post-to-post relationships whether you appreciate the value of such or not. :)

-Mike