[wp-hackers] MySQL: Union vs global table

Morgan Tocker morgan.tocker at oracle.com
Mon Feb 24 15:11:59 UTC 2014


Hi wp-hackers,

> thank you so much for your availability. I will contact you in private shortly.

With Dino’s permission I’m sharing the email he sent me and my reply - 

———

Hi Dino,

> thank you again for your time. Attached you can find the definition for the tables used by my plugin, for your reference. 
> It's basically a real time analytics tool for WordPress. The main table is wp_slim_stats, which contains all the pageviews.
> 
> In a MU environment, the plugin will create its tables for each new site (wp_1_slim_stats, wp_2_slim_stats, etc). Site admins can access reports for their own blogs. 
> 
> An activity log in reverse chronological order would be something like:
> 
> SELECT t1.*, tb.*,tci.* 
> FROM wp_slim_stats t1 
>    INNER JOIN wp_slim_browsers tb ON t1.browser_id = tb.browser_id
>    INNER JOIN wp_slim_content_info tci ON t1.content_info_id = tci.content_info_id
> ORDER BY t1.dt desc
> LIMIT 0, 50
> 
> The corresponding explain is attached as explain-select-1.png.

So here is how this is working:

MySQL examines table t1 (wp_slim_stats) first and applies a sort on dt (will be expensive if there are a lot of rows).
From the largest dt value until 50 is reached, it will very efficiently do a primary key lookup on tb (wp_slim_browsers) and then tci (wp_slim_content_info).  As long as there is full participation (INNER JOIN always finds rows in this table, the joining is efficient).

Short story:
I think you need an index on wp_slim_stats.dt to avoid the sort - but otherwise it’s an efficient query.

> Now, some users have requested to get a Network View of their reports, which basically combines all the pageviews to see what post is the most popular network-wide, etc.
> 
> In order to do that, I was thinking about two approaches: 1) do a UNION ALL of all those selects or 2) replace all the blog-specific tables with a global wp_slim_stats that includes a new column blog_id.
> 
> The latter seems to be easier to implement, but what about performances? Think about a network with 100 blogs that make 100k pageviews/month. The new global table would have 100x100k = 10M records.

You would not be UNION’ing ALL millions of rows though, you’d be UNIONing top 50 from all tables right?  As long as you can use the indexes (like the query you specified), then the number of rows in the table doesn’t matter.  I think as long as it is efficient enough, this is probably the more desired option?

In MySQL until 5.7 (currently in developer preview) UNION ALL needs to creates a temporary table to buffer results before sending them to the client (so there is a small level of inefficiency), so make sure that each query being unioned doesn’t have too many rows.  More information on this optimization on my blog here: 

http://www.tocker.ca/2013/12/10/testing-the-union-all-optimization-in-mysql-5.7-dmr3.html


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: table-definitions.txt
URL: <http://lists.automattic.com/pipermail/wp-hackers/attachments/20140224/e2e43515/attachment.txt>
-------------- next part --------------

-------------- next part --------------





More information about the wp-hackers mailing list