[wp-hackers] Time for a wp_post_relationships table?

Otto otto at ottodestruct.com
Wed Aug 4 20:59:56 UTC 2010


On Wed, Aug 4, 2010 at 2:57 PM, Mike Schinkel
<mikeschinkel at newclarity.net> wrote:
> On the surface that goes against everything I've learned about databases and SQL in both developing and professional instruction over 20+ years but I'm not going to summarily dismiss it out of hand.

You're right, it does fly in the face of conventional thinking.
Context is what matters here.

In a conventional networked database application, you're looking at a
multi-user environment with a lot of people doing things
simultaneously. DB servers can scale pretty well in that respect, but
generally they are limited in some manner on the other end. Only a few
types of applications probably access the database, the queries are
all relatively similar, etc, etc. DBs are designed to support that
specific type of load very well (query caching and other tricks are
commonplace), and thus they can handle lots of extra processing on the
back end. In DB design of this sort, you're generally encouraged to
use complex queries, because your bottleneck isn't the processing
power of the DB server, the bottleneck is often the data transfer
itself, as you're sending data down the wire to all the applications
accessing it. So getting exactly what you want is a good thing.

When it comes to web applications though, the game changes a bit. Now
you're looking at a whole other magnitude of simultaneous users. And
those users (web applications) are all sorts of different things,
executing a larger variety of queries and getting all sorts of
different types of data. Plus, in a lot of hosting environments, one
DB server may be supporting hundreds or thousands of websites. It's a
heavy load to carry. On the other hand, these websites are usually
connected directly to the DB with a fat network connection, so the
data transfer rate isn't an issue. All the DB data transfer is local
to the webservers, which then serve it off to the web browsers.

The point being here that your processing power on the websites is a
whole lot greater than your processing power on the DB server serving
all those sites. So forcing the DB server to do your work for you is a
losing game. Why have a dozen webservers sitting idle, waiting in line
for the DB to do all this extra work? Better to get the data and
offload it over that fat pipe to the webservers, where they can the
majority of your processing.

This sort of a scenario is also the starting point for the whole NoSQL
movement, where traditional relational database design is being
ditched for a different storage system entirely. Usually ones based on
storing key/value pairs. BigTable, Cassandra, even memcached take this
approach.

Web apps are fundamentally different enough from traditional database
applications that you have to factor that in as well when designing
them.

> Can you provide any benchmarks or commentary from people at MySQL to show this to be true other and not just opinion?  Any other evidence?

Not off the top of my head, but look into the whole NoSQL thing and
you'll find a lot of commentary along these lines.

> That said, if it is true then you'd really want to avoid using the taxonomy system to link posts (because it requires 3 to 4 table joins) and instead prefer the proposed wp_post_relationships table I proposed (because it would require only 1 to 2 table joins), right?

It depends on the specifics of what you're trying to do. In the
examples we've discussed, I don't think it's absolutely necessary to
have a way to connect post to other posts directly. All of these I've
looked at look more like you'd be selecting terms from posts and then
getting posts based on that. One, maybe two joins only, since you
start out by selecting a term, not by selecting a post and finding a
new post from it..

-Otto


More information about the wp-hackers mailing list