Closed Bug 1114986 Opened 10 years ago Closed 9 years ago

Add a sharding mechanism for redis

Categories

(Hello (Loop) :: Server, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: alexis+bugs, Unassigned)

Details

We need to shard the keys for the redis server.

For that, we need to define a good strategy for the sharding keys and then implement it. What we want is a consistent key hashing mechanism, so we can add or remove shards without having to move data around.
Redis itself doesn't provide a stable sharding mechanism.

The Redis documentation has a section about partitioning that I spent some time reading: http://redis.io/topics/partitioning

So, they point at different possibilities:

1. Use a proxy to route each request to the right database. They point to a proxy used by Twitter named Twemproxy (https://github.com/twitter/twemproxy);
2. Have the client do the sharding directly. They don't recommend using this directly and prefer using a proxy.

These two techniques are quite similar in concept, but one really important thing to keep in mind is that it's not currently possible, with Redis, to add / remove shards as the data grows, unless we consider re-balancing the data across the nodes a solution (that is, taking the time to move the data to the right nodes).

One technique they consider is to do what they call "Pre Sharding". That is, adding new shards beforehand, even if we're not using all the memory on them.

My belief is that it could work pretty well for us, if we find a way to copy the data from one redis to another bigger one. Loop currently provides that, but only when we're using one database, not using shards.

Using shards, this is possible (and explained in their documentation) by creating a new redis instance, registering it as a slave of the old (master) redis instance, and turning the old one off once the data is copied. I'm not sure how that matches what's implemented on the Amazon side, but I foresee this being our only problem.

It's probably also worth noting that Redis provides a beta-quality feature called "cluster": http://redis.io/topics/cluster-tutorial. It's not ready yet, probably not good enough for us, and not available on Amazon, but it's worth keeping an eye on.
I don't know what the best option is at the AWS level, because I believe AWS imposes some constraints on how we use Redis. Bob, Dean, do you know more about that?
Flags: needinfo?(dwilson)
Flags: needinfo?(bobm)
(In reply to Alexis Metaireau (:alexis) from comment #2)
> I don't know what's the best option to use at the AWS level, because I
> believe AWS imposes us some use of Redis. Bob, Dean, do you know more about
> that maybe?

I'm not sure. At the AWS re:Invent ElastiCache deep-dive, the presenter recommended using a consistent hashing algorithm (http://dl.acm.org/citation.cfm?id=258660) to support data sharding. I'm not against re-balancing, but I'm not sure it's a wheel we want to re-invent. It might be that the TTLs are sufficient to handle purging data after a re-balance.
Flags: needinfo?(bobm)
After discussion, we agree with :bobm here.

The plan is to go with a 10-node sharded cluster, which allows a size from 10GB to 2.37TB.
It allows consistent hashing as well.

We will start to work on an implementation for it.
Flags: needinfo?(dwilson)
The plan is roughly as follows:

- we definitely want to use consistent hashing, as it greatly reduces rebalancing (only the keys that land on a new node have to move when we add nodes);
- we want to reuse twemproxy, which already does that for us: https://github.com/twitter/twemproxy
- we need to find a way to do rebalancing (I believe we already have that with the migration backend ;))

We'll work on sharding next week and the plan is to test this solution on stage and see how it behaves.
I've considered a few options to make our Redis setup sharded.

The best option we have (well, see the note below) is to use a proxy
to distribute our key lookups and use consistent hashing with the pre-sharding
technique.

The principle is simple: each key goes through a function which maps it to a
single shard.

Pre-sharding means you choose a number of shards and then you're stuck with
this number of shards, until you rebalance the keys.

The hashing function we're using allows consistent hashing, which means that if
we need to add / remove nodes, only some keys would need to be rebalanced into
the new shard setup. Not ideal, but feasible.
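
To make the "only some keys" claim concrete, here is a minimal sketch of a
consistent hash ring compared with plain modulo hashing. It is an illustration
only: the MD5-based hash, the 160 points per shard, the `room:` key names and
the `redis-N` shard names are all assumptions, and this is not Twemproxy's
actual ketama implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash taken from MD5 (an illustrative choice, not Twemproxy's).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent hash ring with several points per shard."""

    def __init__(self, shards, points_per_shard=160):
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(points_per_shard)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # A key belongs to the first ring point after its hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    # Share of keys whose shard changes between two mapping functions.
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"room:{i}" for i in range(20000)]  # hypothetical key names
shards10 = [f"redis-{n}" for n in range(10)]  # hypothetical shard names
ring10 = HashRing(shards10)
ring11 = HashRing(shards10 + ["redis-10"])

ring_moved = moved_fraction(ring10.shard_for, ring11.shard_for, keys)
modulo_moved = moved_fraction(
    lambda k: _hash(k) % 10, lambda k: _hash(k) % 11, keys
)

print(f"consistent hashing: {ring_moved:.1%} of keys moved")
print(f"modulo hashing:     {modulo_moved:.1%} of keys moved")
```

Going from 10 to 11 shards, the ring should move roughly one key in eleven, and
every moved key lands on the new shard, while modulo hashing remaps most of the
keyspace.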

In order to avoid these rebalances, you choose a high number of shards from the
beginning, and you increase the size of each shard as traffic shows up
(or decrease it if you get less and less usage).

If we choose to have 10 shards, the maximum size of the database would be
10 * 254GB (which is 2.54 terabytes, representing approx 400 000 000 active
users with 3 rooms if my math is correct) considering the maximum size
available in AWS right now.
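
As a quick check of the capacity arithmetic, using only the shard count and
the per-node maximum quoted above (decimal units assumed, 1 TB = 1000 GB):

```python
SHARDS = 10
MAX_NODE_GB = 254  # per-shard maximum quoted above

total_gb = SHARDS * MAX_NODE_GB
total_tb = total_gb / 1000  # decimal terabytes
print(f"{SHARDS} shards x {MAX_NODE_GB} GB = {total_gb} GB ({total_tb:.2f} TB)")
# prints "10 shards x 254 GB = 2540 GB (2.54 TB)"
```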

The proposal
============

We should use Twemproxy [2], a really fast proxy which supports consistent
hashing. It's considered to be the state of the art for what we're trying to
accomplish.
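
For reference, a Twemproxy pool for this setup might look like what the small
helper below renders. This is a sketch under assumptions: the pool name, listen
address, host names and the `render_pool` helper are hypothetical, and the
option values (`fnv1a_64`, `ketama`, `auto_eject_hosts: false`) are one
plausible choice rather than a tested configuration.

```python
def render_pool(name, listen, servers):
    """Render one Twemproxy (nutcracker) pool as YAML text.

    `servers` is a list of (host, port) tuples; each gets weight 1 and a
    stable name, so its address can change without changing the key mapping.
    """
    lines = [
        f"{name}:",
        f"  listen: {listen}",
        "  hash: fnv1a_64",
        "  distribution: ketama",  # consistent hashing
        "  redis: true",  # speak the Redis protocol instead of memcached
        "  auto_eject_hosts: false",  # Redis is a datastore here, not a cache
        "  servers:",
    ]
    for index, (host, port) in enumerate(servers):
        lines.append(f"    - {host}:{port}:1 shard{index}")
    return "\n".join(lines) + "\n"


# Hypothetical hosts, for illustration only.
config = render_pool(
    "loop",
    "127.0.0.1:22121",
    [(f"redis-{n}.example.internal", 6379) for n in range(10)],
)
print(config)
```

Naming each server (`shard0`, `shard1`, ...) matters for the scaling procedure:
Twemproxy then hashes on the name, so pointing a shard at a new IP address does
not redistribute keys.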

Twemproxy doesn't support all Redis operations [3]. For instance, transactions
aren't supported. We've modified loop-server to bypass transactions [4]
(which we were only using to minimize the number of calls to the database),
and that's the only unsupported operation we're currently using in our code,
so we're good.

We will still need to do the initial migration, and it will be exactly like
what we've done in the past when we increased the size of the database, but
rather than migrating from one Redis instance to another, we will use the
port exposed by Twemproxy as the new database.

Ops questions
=============

I'm mainly looking for ops feedback here, since the feedback loop with devs is
already done.

I think having one proxy per web-head would allow us to do what we want. The
*worst-case* overhead of Twemproxy is 20%, but it should be quite low for the
operations we're currently doing.

When scaling up our nodes, we would need to do the following [5]:

  - Start empty new instances.
  - Move the data by configuring these new instances as slaves of the source
    instances.
  - Update the Twemproxy configuration with the new server IP addresses.
  - Send the SLAVEOF NO ONE command to the slaves on the new servers.
  - Restart the proxy.
  - Finally, shut down the instances that are no longer used.
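
The steps above could be scripted roughly as follows. This sketch only builds
the ordered list of commands and manual steps, it does not run anything. The
`scale_up_plan` helper, the addresses and the config file name are hypothetical,
and it assumes we are allowed to issue SLAVEOF ourselves, which may not hold on
the Amazon side.

```python
def scale_up_plan(pairs, proxy_config="nutcracker.yml"):
    """Return the ordered steps for moving each shard to a new instance.

    `pairs` maps each source "host:port" to its (already started, empty)
    replacement "host:port".
    """
    steps = []
    # 1. Make every new instance a slave of the instance it replaces.
    for old, new in pairs.items():
        old_host, old_port = old.rsplit(":", 1)
        new_host, new_port = new.rsplit(":", 1)
        steps.append(
            f"redis-cli -h {new_host} -p {new_port} SLAVEOF {old_host} {old_port}"
        )
    steps.append(
        "# wait until INFO replication shows master_link_status:up on each slave"
    )
    # 2. Point Twemproxy at the new addresses.
    steps.append(f"# edit {proxy_config}: replace each old address with the new one")
    # 3. Promote the new instances to masters.
    for new in pairs.values():
        new_host, new_port = new.rsplit(":", 1)
        steps.append(f"redis-cli -h {new_host} -p {new_port} SLAVEOF NO ONE")
    # 4. Restart the proxy, then retire the old instances.
    steps.append("# restart Twemproxy")
    for old in pairs:
        steps.append(f"# shut down old instance {old}")
    return steps


# Hypothetical addresses, for illustration only.
for step in scale_up_plan({"10.0.0.1:6379": "10.0.1.1:6379"}):
    print(step)
```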

I believe the work on your side would be to:
- Package Twemproxy
- Add the configuration files in puppet (I can help with this)
- Create a script (?) to handle the scaling up of our nodes

Does this look correct to you?

For Richard
===========

So, it seems that once we have everything we need from ops, the plan is to test
this. We should be able to do it the same way we did for the migration.

Things to check:

- We're able to do the initial migration of the data from the single redis
  to our sharded cluster;
- We're able to scale up the cluster by increasing the size of all nodes.

Notes
=====

Actually, the real best option is something built by antirez (the author of
Redis), named redis-cluster. The problem is that this software is still in beta
(RC4 as of the past few weeks), so we cannot use it for now. I believe it will
be available in the coming months with the new Redis release, and a bit after
that on ElastiCache, but we're not there yet. I'll keep an eye on it.

Here are a few numbers I computed a week ago:

- Redis CPU usage is above 1%
- Starting Feb 24th, database storage went from 5.5% to 17% (+11.5 points) in 4 days.
- The pace slowed down a bit after that.
- As a side note, the CPU and memory usage of our webheads is pretty low (14% and 12%)

[0] More on consistent hashing: https://en.wikipedia.org/wiki/Consistent_hashing
[1] Loop sharding bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1114986
[2] Twemproxy: https://github.com/twitter/twemproxy/
[3] Redis operations supported by the proxy: https://github.com/twitter/twemproxy/blob/master/notes/redis.md
[4] Pull request to enable sharding on loop-server: https://github.com/mozilla-services/loop-server/pull/315
[5] Redis partitioning docs: http://redis.io/topics/partitioning
We decided not to enable this so far. Closing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX