[buddypress-trac] [BuddyPress] #4060: Improve performance of BP_Core_User::get_users()
buddypress-trac at lists.automattic.com
buddypress-trac at lists.automattic.com
Sat Aug 25 15:19:03 UTC 2012
#4060: Improve performance of BP_Core_User::get_users()
-------------------------------------------------+-------------------------
Reporter: shanebp | Owner:
Type: enhancement | Status: new
Priority: highest | Milestone: 1.7
Component: Members | Version: 1.2.9
Severity: critical | Resolution:
Keywords: 2nd-opinion has-patch needs-testing |
dev-feedback |
-------------------------------------------------+-------------------------
Changes (by boonebgorges):
* keywords: 2nd-opinion => 2nd-opinion has-patch needs-testing dev-
feedback
Comment:
4060.01.patch is a first rough draft of overhauling our user queries.
First, a brief summary of the issues at hand. Our current user queries run
through BP_Core_User::get_users(). The queries built by this method do not
scale very well at all, due to a couple of fundamental issues:
- Directional joins against global tables. The query constructor always
joins (at least) wp_users against wp_usermeta, and almost always joins
against some BP tables too
- Directional joins against improperly indexed tables. wp_usermeta does
not have an index on meta_value (which is used for last_activity - every
single query - as well as a number of other query types)
- SELECT * and other greedy queries. With very large datasets, queries
like SELECT * and other multiple-column SELECT statements increase the
likelihood that temp tables will be required.
- Unflexible COUNT(*) queries. At a certain point, there's nothing you can
really do to optimize these kinds of queries. Our current user query setup
prevents admins from easily overriding them (with, eg, SQL_CALC_FOUND_ROWS
and FOUND_ROWS()) or caching them or skipping them altogether
My proposed solution 4060.01 introduces BP_User_Query and deprecates
BP_Core_User::get_users(). This solution eliminates some of the issues
above, mitigates some of those that it can't eliminate outright, and has a
couple of added benefits. I'll outline the general strategy and its
benefits, and then pose some questions for feedback.
BP_User_Query is intentionally modeled on WP_User_Query, with similar
method and property names, and a similar structure. This should make it
easier for WP devs to approach. It improves our user queries in a couple
of key ways. First and foremost, it splits our big multiple JOIN into
several steps:
1) A custom query for getting matching IDs only. This parallels the
split_the_query option recently introduced in WP_Query, for similar
reasons: when your initial WHERE query (which does all the heavy lifting
of filtering) only loads IDs, you are loading far less information into
memory than if you do SELECT(*) or something like that, with the result
that you have to resort far less to temporary tables.
2) Once we have the resulting user IDs, we use WP_User_Query to get some
of the WP core data that BP needs (user_email, etc)
3) Finally, we use the user IDs to run another set of queries, to fetch
the BP-specific data corresponding to 'populate_extras': friend counts,
is_friend, last_activity, etc. Much of this data comes from wp_usermeta,
and was previously fetched in separate standalone queries; I was able to
refactor it to roughly halve the number of queries required for
populate_extras.
Some sample preliminary results, on a dataset with ~113K rows in wp_users
and ~1.35M rows in wp_usermeta:
= TYPE: 'newest' =
BP_Core_User::get_users
- Main query: 3,050ms
- Count query: 2,372ms
BP_User_Query
- Main query: 1,577ms
- Count query: 185ms
= TYPE: 'alphabetical' (xprofile disabled or BP-WP profile sync enabled) =
BP_Core_User::get_users
- Main query: 3,464ms
- Count query: 2,491ms
BP_User_Query
- Main query: 88ms
- Count query: 68ms
= TYPE: 'alphabetical' (xprofile enabled and BP-WP profile sync disabled)
=
BP_Core_User::get_users
- Main query: 3,464ms
- Count query: 2,491ms
BP_User_Query
- Main query: 509ms
- Count query: 151ms
As you add more filters to the query (like search_terms), they get slower,
of course. But the speed improvements are roughly linear to what you see
above. As you can see, avoiding unnecessary JOINs results in huge
improvements. Moreover, the restructuring means that BP tables are never
joined against WP tables during user queries (except in one or two filter
cases that I haven't gotten around to rewriting yet - see
meta_key/meta_value).
A couple of questions and directions for further development:
- 'last_activity' - The current bottleneck of the whole setup is the
'last_activity' usermeta value. We use this to order and filter many of
our user queries. But storing this data in wp_usermeta means that there's
no index, so that our queries are quite slow (see my 'newest' numbers
above). Moving this to a properly indexed transactional table would speed
things up enormously. wp_bp_activity is the obvious choice, since it's
built for this purpose. But such a move would mean that the BP Activity
component could not be shut off (or would have to be split into two parts,
so that the core couldn't be shut off), and it would also cause backward
compatibility issues. For BP 1.7, I propose that we do *not* pursue any
change along these lines - one thing at a time - but that we revisit for a
future release.
- Backward compatibility - I have chosen to leave
BP_Core_User::get_users() untouched, and to introduce BP_User_Query as a
new class. That means that anyone continuing to use
BP_Core_User::get_users() directly will see a deprecated notice, but their
code will still work. The point at which I'm choosing to swap out the new
class is in functions like bp_core_get_users(). (There are 6-8 other
places throughout BP where a similar change will have to be made; I
haven't done it yet, because I want to settle on a strategy first.) For
the vast majority of uses, this changeover will be totally seamless.
However, anyone who is directly filtering the get_users() sql queries
('bp_core_get_paged_users_sql', 'bp_core_get_total_users_sql') will be
affected by the move to BP_User_Query, because these filters disappear.
(There's simply no way to salvage them, given that the entire query
structure is changed.) My proposed backpat strategy, as you can see in the
patch, is to provide a 'bp_use_legacy_user_queries' filter, which takes
the __FUNCTION__ as a parameter, so that developers can opt to fall back
on the old method in selected cases. What do you think of this strategy?
- Ongoing status of BP_Core_User - If we're no longer using
BP_Core_User::get_users(), the only thing that BP_Core_User is really used
for is instantiating a *single* user object. This, IMO, is a good thing -
it more closely parallels the relationship between WP_User and
WP_User_Query. However, as it stands, there is currently (both in the
current codebase and my own patch) some reduplication of efforts. For
example, it doesn't make sense for there to be two different ways for
'populate_extras' to be run, one for multiple users and one for single
users. This makes for multiple points of failure. In the future it would
be nice to abstract this somewhat. Perhaps BP_Core_User could be
refactored somewhat so that it can be instantiated either for single user
IDs (the current setup) or for multiple ones; then, instead of doing
populate_extras and WP_User_Query business inside of BP_User_Query, the
latter class would instead use BP_Core_User to call up the additional data
required, once it'd come up with a list of user ids. This is similar to
how WP_User_Query handles all_with_meta queries - it uses WP_User
internally. This is probably not required for 1.7, but is a good direction
for future parity with WP.
Whew. Sorry for all the words - I've thought a lot about this, and want to
make sure it's done right, as it's a major step toward scaling BP.
Feedback welcome.
--
Ticket URL: <https://buddypress.trac.wordpress.org/ticket/4060#comment:25>
BuddyPress <http://buddypress.org/>
BuddyPress
More information about the buddypress-trac
mailing list