[buddypress-trac] [BuddyPress] #4060: Improve performance of BP_Core_User::get_users()

buddypress-trac at lists.automattic.com buddypress-trac at lists.automattic.com
Sat Aug 25 15:19:03 UTC 2012


#4060: Improve performance of BP_Core_User::get_users()
-------------------------------------------------+-------------------------
 Reporter:  shanebp                              |       Owner:
     Type:  enhancement                          |      Status:  new
 Priority:  highest                              |   Milestone:  1.7
Component:  Members                              |     Version:  1.2.9
 Severity:  critical                             |  Resolution:
 Keywords:  2nd-opinion has-patch needs-testing  |
  dev-feedback                                   |
-------------------------------------------------+-------------------------
Changes (by boonebgorges):

 * keywords:  2nd-opinion => 2nd-opinion has-patch needs-testing dev-
               feedback


Comment:

 4060.01.patch is a first rough draft of overhauling our user queries.

 First, a brief summary of the issues at hand. Our current user queries run
 through BP_Core_User::get_users(). The queries built by this method do not
 scale very well at all, due to a couple of fundamental issues:
 - Directional joins against global tables. The query constructor always
 joins (at least) wp_users against wp_usermeta, and almost always joins
 against some BP tables too
 - Directional joins against improperly indexed tables. wp_usermeta does
 not have an index on meta_value (which is used for last_activity - every
 single query - as well as a number of other query types)
 - SELECT * and other greedy queries. With very large datasets, queries
 like SELECT * and other multiple-column SELECT statements increase the
 likelihood that temp tables will be required.
 - Unflexible COUNT(*) queries. At a certain point, there's nothing you can
 really do to optimize these kinds of queries. Our current user query setup
 prevents admins from easily overriding them (with, eg, SQL_CALC_FOUND_ROWS
 and FOUND_ROWS()) or caching them or skipping them altogether

 My proposed solution 4060.01 introduces BP_User_Query and deprecates
 BP_Core_User::get_users(). This solution eliminates some of the issues
 above, mitigates some of those that it can't eliminate outright, and has a
 couple of added benefits. I'll outline the general strategy and its
 benefits, and then pose some questions for feedback.

 BP_User_Query is intentionally modeled on WP_User_Query, with similar
 method and property names, and a similar structure. This should make it
 easier for WP devs to approach. It improves our user queries in a couple
 of key ways. First and foremost, it splits our big multiple JOIN into
 several steps:
 1) A custom query for getting matching IDs only. This parallels the
 split_the_query option recently introduced in WP_Query, for similar
 reasons: when your initial WHERE query (which does all the heavy lifting
 of filtering) only loads IDs, you are loading far less information into
 memory than if you do SELECT(*) or something like that, with the result
 that you have to resort far less to temporary tables.
 2) Once we have the resulting user IDs, we use WP_User_Query to get some
 of the WP core data that BP needs (user_email, etc)
 3) Finally, we use the user IDs to run another set of queries, to fetch
 the BP-specific data corresponding to 'populate_extras': friend counts,
 is_friend, last_activity, etc. Much of this data comes from wp_usermeta,
 and was previously fetched in separate standalone queries; I was able to
 refactor it to roughly halve the number of queries required for
 populate_extras.

 Some sample preliminary results, on a dataset with ~113K rows in wp_users
 and ~1.35M rows in wp_usermeta:

 = TYPE: 'newest' =
 BP_Core_User::get_users
   - Main query: 3,050ms
   - Count query: 2,372ms

 BP_User_Query
   - Main query: 1,577ms
   - Count query: 185ms

 = TYPE: 'alphabetical' (xprofile disabled or BP-WP profile sync enabled) =
 BP_Core_User::get_users
   - Main query: 3,464ms
   - Count query: 2,491ms

 BP_User_Query
   - Main query: 88ms
   - Count query: 68ms

 = TYPE: 'alphabetical' (xprofile enabled and BP-WP profile sync disabled)
 =
 BP_Core_User::get_users
   - Main query: 3,464ms
   - Count query: 2,491ms

 BP_User_Query
   - Main query: 509ms
   - Count query: 151ms

 As you add more filters to the query (like search_terms), they get slower,
 of course. But the speed improvements are roughly linear to what you see
 above. As you can see, avoiding unnecessary JOINs results in huge
 improvements. Moreover, the restructuring means that BP tables are never
 joined against WP tables during user queries (except in one or two filter
 cases that I haven't gotten around to rewriting yet - see
 meta_key/meta_value).

 A couple of questions and directions for further development:
 - 'last_activity' - The current bottleneck of the whole setup is the
 'last_activity' usermeta value. We use this to order and filter many of
 our user queries. But storing this data in wp_usermeta means that there's
 no index, so that our queries are quite slow (see my 'newest' numbers
 above). Moving this to a properly indexed transactional table would speed
 things up enormously. wp_bp_activity is the obvious choice, since it's
 built for this purpose. But such a move would mean that the BP Activity
 component could not be shut off (or would have to be split into two parts,
 so that the core couldn't be shut off), and it would also cause backward
 compatibility issues. For BP 1.7, I propose that we do *not* pursue any
 change along these lines - one thing at a time - but that we revisit for a
 future release.
 - Backward compatibility - I have chosen to leave
 BP_Core_User::get_users() untouched, and to introduce BP_User_Query as a
 new class. That means that anyone continuing to use
 BP_Core_User::get_users() directly will see a deprecated notice, but their
 code will still work. The point at which I'm choosing to swap out the new
 class is in functions like bp_core_get_users(). (There are 6-8 other
 places throughout BP where a similar change will have to be made; I
 haven't done it yet, because I want to settle on a strategy first.) For
 the vast majority of uses, this changeover will be totally seamless.
 However, anyone who is directly filtering the get_users() sql queries
 ('bp_core_get_paged_users_sql', 'bp_core_get_total_users_sql') will be
 affected by the move to BP_User_Query, because these filters disappear.
 (There's simply no way to salvage them, given that the entire query
 structure is changed.) My proposed backpat strategy, as you can see in the
 patch, is to provide a 'bp_use_legacy_user_queries' filter, which takes
 the __FUNCTION__ as a parameter, so that developers can opt to fall back
 on the old method in selected cases. What do you think of this strategy?
 - Ongoing status of BP_Core_User - If we're no longer using
 BP_Core_User::get_users(), the only thing that BP_Core_User is really used
 for is instantiating a *single* user object. This, IMO, is a good thing -
 it more closely parallels the relationship between WP_User and
 WP_User_Query. However, as it stands, there is currently (both in the
 current codebase and my own patch) some reduplication of efforts. For
 example, it doesn't make sense for there to be two different ways for
 'populate_extras' to be run, one for multiple users and one for single
 users. This makes for multiple points of failure. In the future it would
 be nice to abstract this somewhat. Perhaps BP_Core_User could be
 refactored somewhat so that it can be instantiated either for single user
 IDs (the current setup) or for multiple ones; then, instead of doing
 populate_extras and WP_User_Query business inside of BP_User_Query, the
 latter class would instead use BP_Core_User to call up the additional data
 required, once it'd come up with a list of user ids. This is similar to
 how WP_User_Query handles all_with_meta queries - it uses WP_User
 internally. This is probably not required for 1.7, but is a good direction
 for future parity with WP.

 Whew. Sorry for all the words - I've thought a lot about this, and want to
 make sure it's done right, as it's a major step toward scaling BP.
 Feedback welcome.

-- 
Ticket URL: <https://buddypress.trac.wordpress.org/ticket/4060#comment:25>
BuddyPress <http://buddypress.org/>
BuddyPress


More information about the buddypress-trac mailing list