We need to shard the keys for the Redis server. For that, we need to define a good sharding strategy and then implement it. What we want is a consistent key hashing mechanism, so we can add or remove shards without having to move data around.
Redis itself doesn't provide a stable sharding mechanism. The Redis documentation has a section about partitioning that I spent some time reading: http://redis.io/topics/partitioning They point at different possibilities:

1. Use a proxy to route each request to the right database; they point to a proxy written by Twitter named Twemproxy (https://github.com/twitter/twemproxy);
2. Have the client do the sharding directly. They don't recommend this and prefer using a proxy.

These two techniques are quite similar in concept. One really important thing to keep in mind is that it's not currently possible, with Redis, to add / remove shards as the data grows, unless we consider re-balancing the data across the nodes a solution (that is, taking the time to move the data to the right nodes).

One technique they describe is "pre-sharding": adding new shards beforehand, even if we're not using all the memory on them. My belief is that it could work pretty well for us, if we find a way to copy the data from one Redis instance to another, bigger one. Loop currently provides that, but only when we're using one database, not shards. With shards, this is possible (and explained in their documentation) by creating a new Redis instance, registering it as a slave of the old (master) instance, and once the data is copied, turning the old one off. I'm not sure how that matches what's implemented on the Amazon side, but I foresee this being our only problem.

It's probably also worth noting that Redis provides a beta-quality "cluster" mode: http://redis.io/topics/cluster-tutorial It's not ready yet and probably not good enough for us, nor available on Amazon, but it's worth keeping an eye on.
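To make the rebalancing problem concrete, here is a small sketch (hypothetical key names, not Loop's actual scheme) of why naive hash-modulo sharding forces almost all keys to move when a shard is added — which is exactly what consistent hashing is meant to avoid:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a plain hash-mod scheme."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"session:{i}" for i in range(10_000)]

# With naive modulo sharding, growing from 10 to 11 shards remaps
# roughly 10/11 (~90%) of all keys to a different shard.
moved = sum(1 for k in keys if shard_for(k, 10) != shard_for(k, 11))
print(f"{moved / len(keys):.0%} of keys moved")
```

So without a smarter placement function, adding a single node means migrating nearly the whole dataset.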
I don't know what's the best option to use at the AWS level, because I believe AWS imposes some constraints on how we can use Redis. Bob, Dean, do you know more about that maybe?
(In reply to Alexis Metaireau (:alexis) from comment #2)
> I don't know what's the best option to use at the AWS level, because I
> believe AWS imposes us some use of Redis. Bob, Dean, do you know more about
> that maybe?

I'm not sure. At the AWS re:Invent ElastiCache deep-dive the presenter recommended using a consistent hashing algorithm (http://dl.acm.org/citation.cfm?id=258660) to support data sharding. I'm not against re-balancing, but I'm not sure it's a wheel we want to re-invent. It might be that the TTLs are sufficient to handle purging data after a re-balance.
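For reference, here is a minimal sketch of what such a consistent hashing algorithm does (a simplified ketama-style ring; node names and the virtual-node count are made up for illustration). Adding an 11th node only remaps roughly 1/11 of the keys, instead of ~90% with plain modulo sharding:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (simplified ketama-style)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self._add(node)

    def _hash(self, s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def _add(self, node: str) -> None:
        # Each node owns many points on the ring, so load spreads evenly.
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def add_node(self, node: str) -> None:
        self._add(node)

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash
        # (wrapping around to the start of the ring).
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

keys = [f"session:{i}" for i in range(10_000)]
ring = HashRing([f"redis-{n}" for n in range(10)])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("redis-10")
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/11 of the keys
```

Only the keys that land on the new node's arcs of the ring move; everything else stays put, which is what makes growing the cluster cheap.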
After discussion, we agree with :bobm here. The plan is to go with a 10-node shard cluster, which allows a total size from 10 GB to 2.37 TB, and allows consistent hashing as well. We will start working on an implementation for it.
The plan is roughly as follows:

- we definitely want to use consistent hashing, as it greatly reduces rebalancing (only a small fraction of the keys move when we add more nodes);
- we want to reuse Twemproxy, which already does that for us: https://github.com/twitter/twemproxy
- we need to find a way to do rebalancing (I believe we already have that with the migration backend ;))

We'll work on sharding next week, and the plan is to test this solution on stage and see how it behaves.
I've considered a few options to make our Redis setup sharded. The best option we have (well, see the note below) is to use a proxy to distribute our key lookups, using consistent hashing with the pre-sharding technique.

The principle is simple: each key goes through a function which associates it with a unique shard. Pre-sharding means you choose a number of shards up front and you're stuck with that number until you rebalance the keys. The hashing function we're using supports consistent hashing, which means that if we need to add / remove nodes, only some of the keys would need to be rebalanced onto the new shard layout. Not ideal, but feasible.

In order to avoid these rebalances, you choose a high number of shards from the beginning, and you increase the size of each shard as traffic shows up (or decrease it if usage shrinks). If we choose to have 10 shards, the maximum size of the database would be 10 * 254 GB, i.e. about 2.54 TB (representing approx 400 000 000 active users with 3 rooms, if my math is correct), considering the maximum instance size available in AWS right now.

The proposal
============

We should use Twemproxy, a really fast proxy which supports consistent hashing. It's considered the state of the art for what we're trying to accomplish.

Twemproxy doesn't support all Redis operations. For instance, transactions aren't supported. We've modified the loop-server to bypass transactions (which we were only using to minimize the number of calls to the database), and that's the only unsupported operation we were using in our code, so we're good.

We will still need to do the initial migration, and it will be exactly like what we've done in the past when we increased the size of the database, except that rather than migrating from one Redis instance to another, we will use the ports exposed by Twemproxy as the new database.

Ops questions
=============

I'm mainly looking for ops feedback here, since the feedback loop with the devs is already done.
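For ops context, a Twemproxy pool for this layout would be configured roughly like the fragment below. This is a sketch only: the pool name, IP addresses, weights and server names are placeholders, not our actual topology. The `distribution: ketama` line is what enables consistent hashing, and `redis: true` switches the proxy to the Redis protocol:

```yaml
loop_redis:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  redis: true
  auto_eject_hosts: false
  server_retry_timeout: 2000
  servers:
   - 10.0.0.1:6379:1 shard01
   - 10.0.0.2:6379:1 shard02
   - 10.0.0.3:6379:1 shard03
```

The loop-server would then simply point at `127.0.0.1:22121` as if it were a single Redis.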
I think having one proxy per web-head would allow us to do what we want. The *worst-case* overhead of Twemproxy is 20%, but it should be quite low for the operations we're currently doing.

When scaling up our nodes, we would need to do the following:

- Start new, empty instances;
- Move the data by configuring these new instances as slaves of the source instances;
- Update the Twemproxy configuration with the new server IP addresses;
- Send the SLAVEOF NO ONE command to the slaves on the new servers;
- Restart the proxy;
- Finally, shut down the instances that are no longer used.

I believe the work on your side would be to:

- Package Twemproxy;
- Add the configuration files to puppet (I can help with this);
- Create a script (?) to handle the scaling up of our nodes.

Does this look correct to you?

For richard
===========

So, once we have everything we need from ops, the plan is to test this. We should be able to do it the same way we did for the migration. Things to check:

- We're able to do the initial migration of the data from the single Redis to our sharded cluster;
- We're able to scale up the cluster by increasing the size of all the nodes.

Notes
=====

Actually, the real best option is something baked by antirez (the guy who wrote Redis), named Redis Cluster. Problem is, this software is still in beta (RC4 in the past weeks), and as such we cannot use it as of now. I believe it will be available in the next few months with the new Redis release, and a bit after that on ElastiCache, but we're not there yet. I'll keep an eye on it.

Here are a few numbers I computed a week ago:

- Redis CPU percentage is above 1%;
- Starting Feb 24th, database storage went from 5.5% to 17% (+11.5%) in 4 days;
- The pace slowed down a bit after that;
- As a side note, the CPU and MEM usage of our web-heads is pretty low (14% and 12%).

More on consistent hashing — https://en.wikipedia.org/wiki/Consistent_hashing
Loop sharding bug — https://bugzilla.mozilla.org/show_bug.cgi?id=1114986
Twemproxy — https://github.com/twitter/twemproxy/
Redis operations supported by the proxy — https://github.com/twitter/twemproxy/blob/master/notes/redis.md
Pull request to enable sharding on loop-server — https://github.com/mozilla-services/loop-server/pull/315
Redis partitioning docs — http://redis.io/topics/partitioning
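The scale-up steps described above could be scripted along these lines. This is a dry-run sketch only: host names, the config path and the service name are placeholders, and `run` just prints each command instead of executing it, so the sequence can be reviewed before anything touches production:

```shell
# Placeholders -- substitute the real hosts when scripting this for real.
OLD_HOST=redis-old.internal
NEW_HOST=redis-new.internal

# Dry-run helper: print the command instead of executing it.
run() { echo "WOULD RUN: $*"; }

# 1. Replicate the data from the old shard onto the new, empty instance.
run redis-cli -h "$NEW_HOST" SLAVEOF "$OLD_HOST" 6379

# 2. Point Twemproxy at the new instance and restart the proxy.
run sed -i "s/$OLD_HOST/$NEW_HOST/" /etc/nutcracker/nutcracker.yml
run service nutcracker restart

# 3. Promote the new instance to master once replication has caught up.
run redis-cli -h "$NEW_HOST" SLAVEOF NO ONE

# 4. Shut down the instance that is no longer used.
run redis-cli -h "$OLD_HOST" SHUTDOWN NOSAVE
```

In a real run, step 3 should only happen after checking replication lag (e.g. via INFO replication), otherwise recent writes could be lost.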
We decided not to enable this for now. Closing.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX