[wp-trac] [WordPress Trac] #30335: Splitting shared terms on term update breaks backward compatibility when using an old term_id

Sat Nov 15 14:40:43 UTC 2014

#30335: Splitting shared terms on term update breaks backward compatibility when
using an old term_id
-----------------------------------+--------------------
 Reporter:  boonebgorges           |       Owner:
     Type:  defect (bug)           |      Status:  new
 Priority:  high                   |   Milestone:  4.1
Component:  Taxonomy               |     Version:  trunk
 Severity:  major                  |  Resolution:
 Keywords:  has-patch 2nd-opinion  |     Focuses:
-----------------------------------+--------------------
Changes (by boonebgorges):

 * keywords:  needs-patch => has-patch 2nd-opinion

Comment:

 [attachment:30335.3.patch] is a first pass at providing backward
 compatibility for the use of the pre-split term_id in a number of cases.
 The cases covered are `get_term()`, `WP_Tax_Query` (when 'field=term_id'),
 and `get_terms()` (when using a param that accepts term IDs, like
 'include' or 'parent'). In brief: when a term is split in
 `_split_shared_term()`, the affected term_taxonomy_id is stored (in the
 '_split_terms' option, keyed by the old term_id). In `get_term()`, if an
 integer term_id is provided and no term is found in the cache or the DB by
 that term, the split term store is checked as a fallback. In `get_terms()`
 and `WP_Tax_Query::transform_query()`, a fallback query won't work, so we
 parse the passed term_ids *before* running the query, and swap out old
 term IDs with new ones, if found.

 The implementation I chose for storing and fetching split term data is
 designed for performance on large sites, where split terms are most likely
 to occur. The only thing stored in the  '_split_terms' option is
 term_taxonomy_ids, both to keep the option as small as possible (so it
 won't overload a cache block). `_get_split_term()` bails quickly if the
 passed term_id does not have a match in this array. If it *does* find a
 match, `get_term_by()` is run, which as of 4.1 is properly cached. So, on
 a site with a persistent object cache, no additional database queries
 should be triggered.

 I guess I have a couple concerns/points for discussion:

 * This system has a slight risk of false positives. If term 5 is shared
 between terms in 'post_tag' and 'category', and then the category is split
 to term_id=13, a later lookup `get_term( 5, 'category' )` will return term
 13. It's possible that someone could choose to pass 5 to `get_term()` in
 this way without intending to reference the old term_id, and getting back
 term 13 would be a pretty odd surprise. The chances of this are very
 small, but worth thinking about (would have to happen with a shared term
 that's been split - rare to begin with - and where someone just happens to
 query for a term with that ID, in which case they should be expecting bad
 (empty) results anyway).
 * I've chosen to provide silent fallback support. But maybe we should
 throw a `_doing_it_wrong()`?
 * How many cases does this actually cover? From some discussions with
 mboynes and a few others who have built custom systems that cache
 'term_id', I think it will cover quite a large percentage. If you have
 written something that has cached term IDs in this way, your eyes on this
 patch would be most appreciated.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/30335#comment:15>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform