top | item 4339595

(no title)

zxoq | 13 years ago

That explains how you serve the same data to that user, but the more interesting problem is: how do you propagate the changes to the rest of the users over time?

Do you run some sort of backlog daemon that moves stuff between the different shards? And how is that faster than doing so immediately (the same data would be moved either way...)?

discuss

mechanical_fish|13 years ago

The generic answer must of course be: It depends on your architecture.

My answer assumed that we were talking about replication lag, where you are copying data among the back-end data stores "immediately", which is to say "quickly, hopefully on the order of milliseconds but you can't count on that". Unfortunately, depending on the design of your database, replication is often not "immediate" enough to guarantee consistent results to one user even under the best conditions, let alone when the replication slows down or breaks, and that's why you might build something like the affinity system I just outlined.

The actual mechanism of replication depends on the data store. MySQL replication involves having a background daemon stream the binary log of database writes to another daemon on a slave server which replays the changes in order. (Of course, multiple clients can write to MySQL at a time, and ordering all those writes into a consistent serial stream is MySQL's job.) Other replication schemes involve having the client contact multiple servers and write directly to each one, aborting with an error if a critical number of those writes are not acknowledged as complete - and then there presumably needs to be some kind of janitorial process that cleans up any inconsistencies afterward. (I'm no CS genius; consult your local NoSQL guru for more information. Needless to say, you should strive to buy this stuff wrapped up in a box rather than build it yourself.)

As for "shards", that's different. My understanding of the notion of a "shard", as distinct from a "mirror" or a "slave", is that it's a mechanism for partitioning writes. If you've built a system with N servers but every write has to happen on each one, the throughput will be no higher than on a system with one server and one write, and you're doing "sharding" wrong. What you wanted was mirroring, and hopefully you weren't expecting that to help scale your write traffic.

If data does need to be spread around from shard to shard, hopefully it has to happen eventually rather than quickly, and some temporary inconsistencies from shard to shard are not a big problem. So you can use something like a queueing system: If you have some piece of data that needs to be mailed to N databases, you put it on the queue, and later a worker comes along and takes it off the queue and tries to write it to all N databases, and when it fails with one or two of the databases it puts the task back on the queue to try again later. (Or, perhaps, you put N messages on the queue, one per database, and run one worker for each database.) You'll also need a procedure or tool to deal with stale tasks, and overflowing queues, and the occasional permanent removal of databases from the system.

This soliloquy is looking long enough, now, so it seems like a good time to reiterate patio11's point: Don't build more scaling than you need right now. Hire someone to convince you not to design and build these things.

loumf|13 years ago

Scalability isn't about it being faster, just "able to be scaled". Sometimes it's about being able to make it faster by adding hardware and configuration.