[buddypress-trac] [BuddyPress] #4060: Improve performance of BP_Core_User::get_users() (was: slow queries in bp-core-classes)
buddypress-trac at lists.automattic.com
buddypress-trac at lists.automattic.com
Wed Mar 14 15:40:41 UTC 2012
#4060: Improve performance of BP_Core_User::get_users()
-----------------------------------+-----------------------------
Reporter: shanebp | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Future Release
Component: Members | Version: 1.2.9
Severity: critical | Resolution:
Keywords: 1.7-early 2nd-opinion |
-----------------------------------+-----------------------------
Changes (by boonebgorges):
* keywords: => 1.7-early 2nd-opinion
* type: defect (bug) => enhancement
* severity: normal => critical
* milestone: 1.6 => Future Release
Comment:
Hi everyone. I've had a chance to do a bit of experimentation with my
wp_bp_activity suggestion. It doesn't work out of the box, but if I do a
bit of query restructuring, it works really beautifully. Here are some
sample stats, using a database of about 1,000,000 activity items and
235,000 users.
*CURRENT BP TRUNK*
{{{
mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename,
u.user_login,
u.display_name, u.user_email , um.meta_value as last_activity FROM
wp_users u LEFT JOIN
wp_usermeta um ON um.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND
u.user_status = 0
AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 20;
------------+------------------------------------------------+---------------------+
20 rows in set (11.31 sec)
}}}
{{{
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_usermeta
um ON um.user_id =
u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND
um.meta_key =
'last_activity' ORDER BY um.meta_value DESC;
+----------------------+
| COUNT(DISTINCT u.ID) |
+----------------------+
| 237511 |
+----------------------+
1 row in set (4.00 sec)
}}}
*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, WITH SIMILAR QUERY STRUCTURE*
I wrote a script that rewrote last_activity usermeta to the activity
table, with type = 'last_activity' and user_id = $user_id. Then I changed
the JOIN so it was against activity instead of usermeta, but left the
basic query structure the same.
{{{
mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename,
u.user_login,
u.display_name, u.user_email , a.date_recorded as last_activity FROM
wp_users u LEFT JOIN
wp_bp_activity a ON a.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0
AND u.user_status = 0
AND a.type = 'last_activity' ORDER BY a.date_recorded DESC LIMIT 0, 20;
------------+------------------------------------------------+---------------------+
20 rows in set (12.22 sec)
}}}
{{{
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN
wp_bp_activity a ON a.user_id =
u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND a.type
= 'last_activity'
ORDER BY a.date_recorded DESC;
+----------------------+
| COUNT(DISTINCT u.ID) |
+----------------------+
| 237511 |
+----------------------+
1 row in set (4.07 sec)
}}}
As you can see, there is not any benefit from this move in itself, because
the way in which the items are joined prevents MySQL from taking advantage
of the date_recorded index.
*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, AND CHANGED QUERY TO A SUBQUERY*
In order to take advantage of the date_recorded index, I tested with a
subquery, with extremely promising results. Because the data is structured
differently, is requires three queries instead of two (one to get the user
IDs, one to get the user IDs, another to get the total count, and a third
(this is the new one) to get the missing data out of wp_users that used to
come from the join.
{{{
mysql> select distinct user_id from wp_bp_activity where user_id not in
(select ID from
wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and type =
'last_activity'
order by date_recorded desc limit 20;
+---------+
20 rows in set (0.01 sec)
}}}
{{{
mysql> select count(distinct user_id) from wp_bp_activity where user_id
not in (select ID from wp_users where spam != 0 OR deleted != 0 OR
user_status != 0) and type = 'last_activity';
+-------------------------+
| count(distinct user_id) |
+-------------------------+
| 237511 |
+-------------------------+
1 row in set (4.15 sec)
}}}
{{{
mysql> SELECT ID, user_registered, user_nicename, user_login,
display_name, user_email FROM wp_users WHERE ID IN
(1,156563,31438,18959,115038,95468,81552,217731,180316,50888,129014,214293,80522,16344,31461,9046
,1035,51964,176472,127686);
------------+------------------------------------------------+
20 rows in set (0.00 sec)
}}}
Obviously, this is a dramatic improvement. Just for fun, I decided to try
a similar subquery structure using the current last_activity data in
wp_usermeta:
{{{
mysql> select distinct user_id from wp_usermeta where user_id not in
(select ID from wp_users where spam != 0 OR deleted != 0 OR user_status !=
0) and meta_key = 'last_activity' order by meta_value desc limit 20;
+---------+
20 rows in set (6.09 sec)
}}}
6.09 seconds is much faster than the 11-12 seconds for a join, but we are
still forced to use filesort to do the DESC sort due to the lack of an
index on wp_usermeta.meta_value, which is the real source of pain.
====
TAKEAWAYS:
Moving to subqueries for member queries has the potential for huge
benefits, so I think we should pursue it. And I think that moving
last_activity to the activity table is hugely beneficial as well, so we
should also pursue that. (For group last_activity too, though usually
groups are not such a pressure point since there are generally fewer of
them.) But there are a couple of caveats that will make this a non-trivial
task:
- The query methods will need to be rewritten for the three-query
structure mentioned above, rather than the current two-query structure.
This is straightfoward.
- We are currently using the very same query structure for all different
"types" of user queries: last active, alphabetical, newest registered,
online. 'last active' is the default, and 'online' is really a subset of
'last active', which is why I've focused on it above. Though this needs
testing, I have a feeling that 'alphabetical' and 'newest registered' will
need to be structured quite differently. 'newest registered' is pretty
easy - wp_users has an index on user_registered, so we query that and then
use a subquery or a second query to get last active data. There's a good
chance that we're never going to be able to optimize 'alphabetical'
without some major modifications, because we store that data in
wp_users.display_name and wp_bp_xprofile_data.data, neither of which are
indexed. Regardless, it will not work well with the wp_bp_activity
subquery method.
- Backward compatibility. Do we keep 'last_activity' data mirrored in
wp_usermeta? Generally mirroring data is bad, but (1) it's nice to leave
it for plugins that depend on it (though it would be problematic for
plugins that *modify* it, and (2) moving to the activity table will add a
small amount of overhead for the loggedin/displayed user, since usermeta
is loaded in the get_userdata() call, while the activity data will take a
separate query. On balance, I think it's best to just move it, and to urge
plugin authors to use the existing bp_core_record_activity() and
bp_core_get_last_activity() functions, which need to be refactored of
course.
- The COUNT queries are equally bad in all the scenarios above. There's no
good way around this. So we should probably stop doing a live query for
total member counts, store it as an option, and then bust and recalculate
when new members join/become active or when accounts are deleted.
- Backward compatibility: We have filters on the sql queries, which will
more or less break if the queries are radically restructured. I think it
makes sense to remove the filters and add new ones; see
https://core.trac.wordpress.org/ticket/18536#comment:14 and surrounding
discussion for WP reasoning on a similar issue
====
I'm moving this ticket to 1.7-early, because IMO it is going to require a
large amount of work and especially user/plugin testing.
--
Ticket URL: <https://buddypress.trac.wordpress.org/ticket/4060#comment:9>
BuddyPress <http://buddypress.org/>
BuddyPress
More information about the buddypress-trac
mailing list