[buddypress-trac] [BuddyPress] #4060: Improve performance of BP_Core_User::get_users() (was: slow queries in bp-core-classes)

buddypress-trac at lists.automattic.com buddypress-trac at lists.automattic.com
Wed Mar 14 15:40:41 UTC 2012


#4060: Improve performance of BP_Core_User::get_users()
-----------------------------------+-----------------------------
 Reporter:  shanebp                |       Owner:
     Type:  enhancement            |      Status:  new
 Priority:  normal                 |   Milestone:  Future Release
Component:  Members                |     Version:  1.2.9
 Severity:  critical               |  Resolution:
 Keywords:  1.7-early 2nd-opinion  |
-----------------------------------+-----------------------------
Changes (by boonebgorges):

 * keywords:   => 1.7-early 2nd-opinion
 * type:  defect (bug) => enhancement
 * severity:  normal => critical
 * milestone:  1.6 => Future Release


Comment:

 Hi everyone. I've had a chance to do a bit of experimentation with my
 wp_bp_activity suggestion. It doesn't work out of the box, but if I do a
 bit of query restructuring, it works really beautifully. Here are some
 sample stats, using a database of about 1,000,000 activity items and
 235,000 users.

 *CURRENT BP TRUNK*
 {{{
 mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename,
 u.user_login,
 u.display_name, u.user_email , um.meta_value as last_activity FROM
 wp_users u LEFT JOIN
 wp_usermeta um ON um.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND
 u.user_status = 0
 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 20;
 ------------+------------------------------------------------+---------------------+
 20 rows in set (11.31 sec)
 }}}

 {{{
 mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_usermeta
 um ON um.user_id =
  u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND
 um.meta_key =
 'last_activity' ORDER BY um.meta_value DESC;
 +----------------------+
 | COUNT(DISTINCT u.ID) |
 +----------------------+
 |               237511 |
 +----------------------+
 1 row in set (4.00 sec)
 }}}

 *MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, WITH SIMILAR QUERY STRUCTURE*

 I wrote a script that rewrote last_activity usermeta to the activity
 table, with type = 'last_activity' and user_id = $user_id. Then I changed
 the JOIN so it was against activity instead of usermeta, but left the
 basic query structure the same.

 {{{
 mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename,
 u.user_login,
 u.display_name, u.user_email , a.date_recorded as last_activity FROM
 wp_users u LEFT JOIN
 wp_bp_activity a ON a.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0
 AND u.user_status = 0
  AND a.type = 'last_activity' ORDER BY a.date_recorded DESC LIMIT 0, 20;
 ------------+------------------------------------------------+---------------------+
 20 rows in set (12.22 sec)
 }}}


 {{{
 mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN
 wp_bp_activity a ON a.user_id =
  u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND a.type
 = 'last_activity'
  ORDER BY a.date_recorded DESC;
 +----------------------+
 | COUNT(DISTINCT u.ID) |
 +----------------------+
 |               237511 |
 +----------------------+
 1 row in set (4.07 sec)
 }}}

 As you can see, there is not any benefit from this move in itself, because
 the way in which the items are joined prevents MySQL from taking advantage
 of the date_recorded index.

 *MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, AND CHANGED QUERY TO A SUBQUERY*

 In order to take advantage of the date_recorded index, I tested with a
 subquery, with extremely promising results. Because the data is structured
 differently, is requires three queries instead of two (one to get the user
 IDs, one to get the user IDs, another to get the total count, and a third
 (this is the new one) to get the missing data out of wp_users that used to
 come from the join.

 {{{
 mysql> select distinct user_id from wp_bp_activity where user_id not in
 (select ID from
 wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and type =
 'last_activity'
 order by date_recorded desc limit 20;
 +---------+
 20 rows in set (0.01 sec)
 }}}

 {{{
 mysql> select count(distinct user_id) from wp_bp_activity where user_id
 not in (select ID from wp_users where spam != 0 OR deleted != 0 OR
 user_status != 0) and type = 'last_activity';
 +-------------------------+
 | count(distinct user_id) |
 +-------------------------+
 |                  237511 |
 +-------------------------+
 1 row in set (4.15 sec)
 }}}

 {{{
 mysql> SELECT ID, user_registered, user_nicename, user_login,
 display_name, user_email FROM wp_users WHERE ID IN
 (1,156563,31438,18959,115038,95468,81552,217731,180316,50888,129014,214293,80522,16344,31461,9046
 ,1035,51964,176472,127686);
 ------------+------------------------------------------------+
 20 rows in set (0.00 sec)
 }}}

 Obviously, this is a dramatic improvement. Just for fun, I decided to try
 a similar subquery structure using the current last_activity data in
 wp_usermeta:

 {{{
 mysql> select distinct user_id from wp_usermeta where user_id not in
 (select ID from wp_users where spam != 0 OR deleted != 0 OR user_status !=
 0) and meta_key = 'last_activity' order by meta_value desc limit 20;
 +---------+
 20 rows in set (6.09 sec)
 }}}

 6.09 seconds is much faster than the 11-12 seconds for a join, but we are
 still forced to use filesort to do the DESC sort due to the lack of an
 index on wp_usermeta.meta_value, which is the real source of pain.

 ====

 TAKEAWAYS:

 Moving to subqueries for member queries has the potential for huge
 benefits, so I think we should pursue it. And I think that moving
 last_activity to the activity table is hugely beneficial as well, so we
 should also pursue that. (For group last_activity too, though usually
 groups are not such a pressure point since there are generally fewer of
 them.) But there are a couple of caveats that will make this a non-trivial
 task:

 - The query methods will need to be rewritten for the three-query
 structure mentioned above, rather than the current two-query structure.
 This is straightfoward.
 - We are currently using the very same query structure for all different
 "types" of user queries: last active, alphabetical, newest registered,
 online. 'last active' is the default, and 'online' is really a subset of
 'last active', which is why I've focused on it above. Though this needs
 testing, I have a feeling that 'alphabetical' and 'newest registered' will
 need to be structured quite differently. 'newest registered' is pretty
 easy - wp_users has an index on user_registered, so we query that and then
 use a subquery or a second query to get last active data. There's a good
 chance that we're never going to be able to optimize 'alphabetical'
 without some major modifications, because we store that data in
 wp_users.display_name and wp_bp_xprofile_data.data, neither of which are
 indexed. Regardless, it will not work well with the wp_bp_activity
 subquery method.
 - Backward compatibility. Do we keep 'last_activity' data mirrored in
 wp_usermeta? Generally mirroring data is bad, but (1) it's nice to leave
 it for plugins that depend on it (though it would be problematic for
 plugins that *modify* it, and (2) moving to the activity table will add a
 small amount of overhead for the loggedin/displayed user, since usermeta
 is loaded in the get_userdata() call, while the activity data will take a
 separate query. On balance, I think it's best to just move it, and to urge
 plugin authors to use the existing bp_core_record_activity() and
 bp_core_get_last_activity() functions, which need to be refactored of
 course.
 - The COUNT queries are equally bad in all the scenarios above. There's no
 good way around this. So we should probably stop doing a live query for
 total member counts, store it as an option, and then bust and recalculate
 when new members join/become active or when accounts are deleted.
 - Backward compatibility: We have filters on the sql queries, which will
 more or less break if the queries are radically restructured. I think it
 makes sense to remove the filters and add new ones; see
 https://core.trac.wordpress.org/ticket/18536#comment:14 and surrounding
 discussion for WP reasoning on a similar issue

 ====

 I'm moving this ticket to 1.7-early, because IMO it is going to require a
 large amount of work and especially user/plugin testing.

-- 
Ticket URL: <https://buddypress.trac.wordpress.org/ticket/4060#comment:9>
BuddyPress <http://buddypress.org/>
BuddyPress


More information about the buddypress-trac mailing list