[HyperDB] Sharding scheme
Brian Layman
hyperdb at thecodecave.com
Tue Jan 25 18:03:41 UTC 2011
I have an install that will likely grow to 10,000 sites by year's end.
There's potential in this market space for that number to be
significantly higher. In the last week, it has doubled in size to 2200
sites. The database now has over 20,000 tables. Obviously, there is a
need for sharding here. There were extenuating circumstances that
prevented sharding at the initial launch, but I would like to activate
it now.
I was wondering what you thought would be a good breakdown of the
sites. I was thinking 100 sites per dataset would be a fair number. At
the current ratio, that puts each database at roughly 1,000 tables per
dataset. It would result in my having 100 databases in the end, which
has a nice even ring to it and is easily comprehended at a glance. My
worry is that I am creating too many database connections and will
create a bottleneck of another kind. Is this a concern? I don't want to
create a problem where one doesn't currently exist. The hardware is
keeping up just fine at the moment, and I am getting load times under
two or three seconds.
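(For the arithmetic, going by the current numbers: 20,000 tables across
2,200 sites is roughly 9 tables per site, so 100 sites per dataset works
out to about 900-1,000 tables per database, and 10,000 sites at 100
sites per dataset gives the 100 datasets.)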
So I was thinking of functions like this in db-config.php:
// Allow for 3,000 sites for now (blog IDs up to 3,099 map to datasets 0-30).
for ( $counter = 0; $counter <= 30; $counter++ ) {
    // Add the connection to the master.
    $wpdb->add_database( array(
        'host'     => DB_HOST,
        'user'     => DB_USER,
        'password' => DB_PASSWORD,
        'name'     => DB_NAME . '_' . $counter,
        'write'    => 1,
        'read'     => 1,
        'dataset'  => DB_NAME . '_' . $counter,
        'timeout'  => 0.2,
    ) );

    // Add the connection to the slave(s).
    $wpdb->add_database( array(
        'host'     => DB_HOST2,
        'user'     => DB_USER,
        'password' => DB_PASSWORD,
        'name'     => DB_NAME . '_' . $counter,
        'write'    => 0,
        'read'     => 1,
        'dataset'  => DB_NAME . '_' . $counter,
        'timeout'  => 0.2,
    ) );
}
$wpdb->add_callback( 'ehi_dbname' );

function ehi_dbname( $query, $wpdb ) {
    // Per-blog tables look like "{$base_prefix}{$blog_id}_*"; route each
    // one to the dataset for its hundred-block of blog IDs.
    if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
        $dbid        = intval( $wpdb->blogid / 100 );
        $NEW_DB_NAME = DB_NAME . '_' . $dbid;
        error_log( $wpdb->table . ' would use: ' . $NEW_DB_NAME );
    } else {
        // Global (non-blog) tables go to a shared dataset. Note that this
        // "_common" dataset must also be registered via add_database().
        $NEW_DB_NAME = DB_NAME . '_common';
        error_log( $wpdb->table . ' would use: ' . $NEW_DB_NAME );
    }
    return $NEW_DB_NAME;
}
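For completeness, the "_common" dataset that the callback falls back to
for the global tables would presumably also need its own add_database()
entry -- something like this, assuming those tables stay on the same
master:

$wpdb->add_database( array(
    'host'     => DB_HOST,
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME . '_common',
    'write'    => 1,
    'read'     => 1,
    'dataset'  => DB_NAME . '_common',
    'timeout'  => 0.2,
) );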
Other than the fact that I wouldn't create all 100 database connections
immediately, does that make sense? I'll probably make a foreach loop to
make adding read servers an easy task (something like the sketch below),
but I think I've got the basic concepts all there. Does this seem to be
a reasonable configuration with one read and one read/write server?
These are currently running on hand-built Amazon m1.large EC2 servers,
though I am considering switching them over to RDS during this process.
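Here is roughly the foreach I had in mind. It is only a sketch:
$read_slaves and DB_HOST3 are placeholders for however the slave
hostnames end up being defined.

// Sketch: register every read slave for every dataset.
$read_slaves = array( DB_HOST2, DB_HOST3 ); // placeholder hostnames
for ( $counter = 0; $counter <= 30; $counter++ ) {
    foreach ( $read_slaves as $slave_host ) {
        $wpdb->add_database( array(
            'host'     => $slave_host,
            'user'     => DB_USER,
            'password' => DB_PASSWORD,
            'name'     => DB_NAME . '_' . $counter,
            'write'    => 0,
            'read'     => 1,
            'dataset'  => DB_NAME . '_' . $counter,
            'timeout'  => 0.2,
        ) );
    }
}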
One more thought... Moving forward, as the need arises to create more
read servers, does it make sense to have all of the datasets active on
all of the read servers in order to keep reads evenly distributed? Or
would it be better to assign odd datasets to odd-numbered read servers
and even datasets to even-numbered read servers, thus dividing the
connections needed and evenly reducing the number of databases being
accessed on each machine? Would this perhaps improve caching?
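To make that second option concrete, a rough sketch of the odd/even
assignment (again with placeholder hostnames):

// Sketch: each dataset is registered on only the read slave whose
// index matches the dataset number's parity.
$read_slaves = array( DB_HOST2, DB_HOST3 ); // placeholders; 0 = even, 1 = odd
for ( $counter = 0; $counter <= 30; $counter++ ) {
    $wpdb->add_database( array(
        'host'     => $read_slaves[ $counter % 2 ],
        'user'     => DB_USER,
        'password' => DB_PASSWORD,
        'name'     => DB_NAME . '_' . $counter,
        'write'    => 0,
        'read'     => 1,
        'dataset'  => DB_NAME . '_' . $counter,
        'timeout'  => 0.2,
    ) );
}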
I'd love to gain from the expertise of people who have been down this
road before.
Thanks!
-Brian Layman
The eHermit
eHermitsInc.com