Closed Bug 487691 Opened 15 years ago Closed 15 years ago

Put together a model for predicting scalability requirements

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hello, Assigned: zandr)

Details

We need to have a working model (even if it's rough at first) so we can predict the scalability performance of the Weave servers.
Component: Weave Server → Server
Product: Mozilla Labs → Weave
Target Milestone: -- → ---
QA Contact: weaveserver → server
What we understand intrinsically (or have no better way to test):

* We understand the volume of data per user *at this stage*. Obviously, this will grow as we add new features to the Weave client. It will grow unpredictably as other people start to use Weave as storage. However, at this time, we have solid numbers for average data size per user.

* We understand how much space that occupies and what that means for the db server, which is an obvious bottleneck. However, if a cluster can't scale to that many users in other ways, there may be no point in leveraging bigger (spacewise) dbs.

* We are getting a picture of user/active-user ratio and can see whether that ratio remains constant over time.


What we understand intuitively:

* Database queries are not taxing to the system. It should be able to scale up to a pretty high number of qps.

* Scaling out the api/proxy layer is pretty easy, as it's just a matter of adding more boxes (or stunnel).

* We have a lot of capacity for growth, especially at the current rate.

* We *believe* that the bottleneck on the current cluster will be disk size, but a belief is not a sufficient level of data to help us model the optimal cluster.


What we don't understand as well as we could/should:

* What does the average payload look like? We can construct payloads abstractly, but we don't really have a good picture of what they look like in the wild.

* How do the proxies scale with increased load? Is performance linear? At what point do response times start to deteriorate to an unacceptable level?

* Same thing for the dbs and disk i/o. Will it start to show load faster or slower than the proxies? Does the read db or the write db have issues first? (scaling the read db is much, much easier than scaling the write db)

* At what level of capacity should we be building out a new cluster? That's not an instantaneous or cheap process, so understanding an average rate of growth would be important.



In an attempt to get a little more clarity on some of these questions, I would propose we do the following:


1) Add a line of code to one of the frontends and gather about 6 hours' worth of payloads. Gather the access logs for that time as well.

2) Install a second Weave instance on the current cluster. This is easy to do from a Weave code standpoint, and we'd just use a different url prefix.

3) Write some code to loop-replay the logs, randomly sampling payloads for any PUT or POST request (a rough sketch follows after this list). Since we don't actually care about db size (that's basically a known), we can do this with only a dozen or so test accounts.

4) Start running that code on some clients. We'll need a stack of them in order to get some serious load.

5) Periodically sample qps, proxy load, db reads/writes and load, and iostat on the boxes. Plot graphs of these vs. qps. Steadily bring more clients online. We shouldn't need to do a long-term stress test, so this should be doable in a matter of hours. There's also no need, at this point, to push the machines to the point of failover - just understanding the characteristics may be enough and would save us from having to degrade live users' experience.

6) um... Profit? Well, at least drop the second weave db to claim back the space.
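
To make step (3) a little more concrete, here is a minimal sketch of the replay loop. The host, url prefix, file names and test accounts are all placeholders, and a real version would also need to rewrite the username portion of each path onto one of the test accounts:

    <?php
    // Rough replay sketch for step (3). All names below (host, "loadtest"
    // prefix for the second install, file names, test accounts) are
    // placeholders, not the real values.
    $base     = 'https://weave.example.com/loadtest';   // hypothetical second install
    $payloads = file('payloads.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $accounts = array('loadtest01:password01', 'loadtest02:password02');

    $log = fopen('access.log', 'r');
    while (($line = fgets($log)) !== false) {
        // crude parse of a combined-format line: "METHOD /path HTTP/1.x"
        if (!preg_match('/"(GET|PUT|POST|DELETE) (\S+) HTTP/', $line, $m)) {
            continue;
        }
        $method = $m[1];
        $path   = $m[2];   // a real run would swap the username in $path for a test account

        $ch = curl_init($base . $path);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERPWD, $accounts[array_rand($accounts)]);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
        if ($method == 'PUT' || $method == 'POST') {
            // don't replay the original body; substitute a randomly sampled payload
            curl_setopt($ch, CURLOPT_POSTFIELDS, $payloads[array_rand($payloads)]);
        }
        curl_exec($ch);
        curl_close($ch);
    }
    fclose($log);

A client box would run several of these loops in parallel (step 4); one loop on its own won't generate interesting load.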
Assignee: nobody → zandr
We'll need to add two lines of code to one of the machines:

In the file /var/www/weaveserver/server/index.php (path may vary on the prod machines)

after line 201:

		while ($data = fread($putdata,2048)) {$json .= $data;};

add 

	error_log("put payload: " . $json);


and after line 253:

		while ($data = fread($putdata,2048)) {$jsonstring .= $data;}

add

		error_log("post payload: " . $jsonstring);


We should also preserve the access logs for that time period so that we can replay the appropriate kind of read queries.
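
Once the payloads are sitting in the error log, something like the following (assuming a stock Apache error_log path - adjust for the prod machines) would turn them into a one-payload-per-line sample file for the replay code to pick from:

    <?php
    // Hypothetical helper: extract the logged payloads into payloads.txt.
    // The error log path is an assumption and will differ per machine.
    $in  = fopen('/var/log/httpd/error_log', 'r');
    $out = fopen('payloads.txt', 'w');
    while (($line = fgets($in)) !== false) {
        if (preg_match('/(put|post) payload: (.*)$/', $line, $m)) {
            fwrite($out, $m[2] . "\n");
        }
    }
    fclose($in);
    fclose($out);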
The approach you've laid out will allow us to derive a model from collected data.  That is valid and good, but we can also construct parts of the model by treating some quantities as variables.  For example:

* if median number of records is n and average payload size is s, how much disk space will we need?
* if required disk space is x, how many backend servers will we need? (for current and other potential storage options we might try)
* if user base expands to y, how much bandwidth do we need to reserve?
* how many ssl frontend machines do we need for x bandwidth?
* given x available disk space, how many new users can we accept?

These are just a few off the top of my head.  They are inter-related (e.g. if average payload size goes up, then required bandwidth probably goes up too), but we should be able to come up with a model where we can plug in the values we know and get answers to questions like the above.
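
To show the shape of what I mean for the first two questions, here is a throwaway sketch - every number in it is a made-up placeholder, not a measurement:

    <?php
    // Minimal plug-in model sketch. All inputs are placeholders.
    $users           = 100000;                      // hypothetical user count
    $records         = 800;                         // median records per user (placeholder)
    $payload_bytes   = 500;                         // average payload size in bytes (placeholder)
    $overhead        = 2.0;                         // db/index/replication overhead factor (placeholder)
    $disk_per_server = 500 * 1024 * 1024 * 1024;    // usable bytes per backend server (placeholder)

    $disk_needed = $users * $records * $payload_bytes * $overhead;
    $backends    = (int) ceil($disk_needed / $disk_per_server);

    printf("disk needed: %.1f GB, backend servers: %d\n",
           $disk_needed / (1024 * 1024 * 1024), $backends);

The bandwidth and frontend questions would follow the same shape; only the inputs change.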

I suggest these steps for this alternate method:

1) think about what questions are interesting to us (e.g. like the above).
2) guess some formulas/basic algorithms (no code necessary) that describe the solution to each question.  This can be as simple as multiplying variables, or something more complicated if we want to take into account things like maximum data per server plus the cost of setting up a new server.
3) think about dependencies between each set of equations, so we're aware.
4) use the resulting doc to inform what data we want to collect, both to validate the equations empirically as well as to use the data to look at current status & make predictions.
5) return to (1) and come up with more abstract questions that build upon the current set.
Fine questions

* if median number of records is n and average payload size is s, how much disk
space will we need?

This is pretty much answerable right now (given a third variable for number of users), with the huge caveat that we expect the average user profile to change substantially and possibly unpredictably as we roll out new services.


* if required disk space is x, how many backend servers will we need? (for
current and other potential storage options we might try)

I'm not sure I follow this one. There's a disk cost per machine (size of disk doesn't scale linearly with price), but I think we should be reversing this question - given the cpu capacity of a server to handle requests, what size disk is appropriate to support the users that the cpu can handle?
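
Purely for illustration, the reversed question could be sketched like this, with every input a placeholder to be filled in from the load test:

    <?php
    // How much disk should one db box carry, given what its cpu can serve?
    // All numbers are placeholders to be replaced with measured values.
    $qps_per_server    = 400;          // sustainable qps for one db box (to be measured)
    $peak_qps_per_user = 0.002;        // peak qps one active user generates (placeholder)
    $bytes_per_user    = 400 * 1024;   // storage per user (placeholder)

    $users_per_server = $qps_per_server / $peak_qps_per_user;
    $disk_per_server  = $users_per_server * $bytes_per_user;

    printf("users per db box: %d, disk to provision: %.0f GB\n",
           $users_per_server, $disk_per_server / (1024 * 1024 * 1024));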


* if user base expands to y, how much bandwidth do we need to reserve?
* how many ssl frontend machines do we need for x bandwidth?

These two questions are basically what we're going to try to examine with this test.

* given x available disk space, how many new users can we accept?

Again, if disk space turns out to be the bounding factor on a cluster, then this calculation is easy. 

The real goal here is to find the pain points. Is it ssl termination? disk i/o on the db boxes? Physical disk space? Some of those can be addressed cheaply by altering the parameters of the cluster (e.g. more ssl termination boxes). Others will suggest a reasonable limit for each cluster.

Fundamentally, we're asking one question: What is the $cost per user? Intertwined with that is figuring out the optimal configuration for $cost per user given a user's profile.
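
As a trivial sketch of that framing (placeholder numbers only):

    <?php
    // $cost per user, once we know what a cluster costs and how many users it supports.
    $cluster_cost  = 50000;    // hardware + hosting for one cluster, in dollars (placeholder)
    $cluster_users = 250000;   // users one cluster can support (what this test should tell us)

    printf('cost per user: $%.2f' . "\n", $cluster_cost / $cluster_users);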
(In reply to comment #4)
> * if median number of records is n and average payload size is s, how much disk
> space will we need?
> 
> This is pretty much answerable right now

Right, we have answerable questions that we should already be using to create our model.  We need to start a page with this stuff and iterate on it.

> users), with the huge caveat that we expect the average user profile to change
> substantially and possibly unpredictably as we roll out new services.

So we put down an initial solution and then refine it or replace it as needed.

> * if required disk space is x, how many backend servers will we need? (for
> current and other potential storage options we might try)
> 
> I'm not sure I follow this one. There's a disk cost per machine (size of disk
> doesn't scale linearly with price), but I think we should be reversing this
> question - given the cpu capacity of a server to handle requests, what size
> disk is appropriate to support the users that the cpu can handle?

What I meant is, if we follow the above question ("how much space will we need?"), we'll get an answer in terms of total disk space.  We'll then need to know if we can scale to that given our current hardware/bandwidth, or if we'll need more, and how much more.

For example, say we decide we want to support storage of icons, and expect that to take 1.5MB/user.  I want to have a model (a document somewhere) where I can go in and figure out what 1.5MB extra would mean.  Does it mean we can't grow beyond N users unless we put up another cluster?  Or does it mean we'll just need more servers to handle the load? etc.
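
As an illustration of the kind of lookup I mean - everything except the 1.5MB figure is a placeholder, and this deliberately ignores the icons that existing users would also start storing:

    <?php
    // What does 1.5MB/user extra mean for headroom? Placeholder inputs.
    $current_per_user = 400 * 1024;                        // bytes stored per user today (placeholder)
    $extra_per_user   = 1.5 * 1024 * 1024;                 // the hypothetical icon storage
    $free_disk        = 2 * 1024 * 1024 * 1024 * 1024;     // free space left on the cluster (placeholder)

    $headroom_before = floor($free_disk / $current_per_user);
    $headroom_after  = floor($free_disk / ($current_per_user + $extra_per_user));

    printf("new users we can still absorb: %d before icons, %d after\n",
           $headroom_before, $headroom_after);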

> Fundamentally, we're asking one question: What is the $cost per user?
> Intertwined with that is figuring out the optimal configuration for $cost per
> user given a user's profile.

Cost per user is interesting, but not the main thing I am after.  I am after a model where I can plug in data for hypothetical scenarios and get approximations based on that (including cost of course)--I want the apparatus that we use to derive the number, not the number itself.
(In reply to comment #5)
> What I meant is, if we follow the above question ("how much space will we
> need?"), we'll get an answer in terms of total disk space.  We'll then need to
> know if we can scale to that given our current hardware/bandwidth, or if we'll
> need more, and how much more.

Ah, I see. You're asking a very different question here, and I'm not sure that the answer is all that useful. After all, if space was the only variable, we could answer that by slapping a few terabyte arrays on and calling it a day. I think there are two questions to be answered that, when they are, basically produce what you are looking for. They are:

1) What is the optimal configuration (frontends, backends, CPUs, disk sizes) of a cluster? Failing this, do we have reasonable confidence that our current ratios are appropriate?

2) Given the optimal configuration, how many users can a cluster support?


The first one is difficult, and is really shaped by what our traffic looks like. The second one is a lot more testable. But the answer to "how much capacity (not just space) do we need" is entirely dependent on how many users we need to support.
(In reply to comment #6)
> (In reply to comment #5)
> > What I meant is, if we follow the above question ("how much space will we
> > need?"), we'll get an answer in terms of total disk space.  We'll then need to
> > know if we can scale to that given our current hardware/bandwidth, or if we'll
> > need more, and how much more.
> 
> Ah, I see. You're asking a very different question here, and I'm not sure that
> the answer is all that useful. After all, if space was the only variable, we
> could answer that by slapping a few terabyte arrays on and calling it a day.

Space is not the only variable at all.  When I said "For example" I really meant it :)  The questions are examples of the kinds of things I think we might want to know.

> think there are two questions to be answered that, when they are, basically
> produce what you are looking for. They are:
> 
> 1) What is the optimal configuration (frontends, backends, CPUs, disk sizes) of
> a cluster? Failing this, do we have reasonable confidence that our current
> ratios are appropriate

The optimal configuration for what?  If there were one cluster configuration that was best for every application, no one would bother setting up anything else.

The reason the model is useful is because we don't know what applications we will implement.  An "optimal" cluster for our current application will not necessarily work well if we vary what we do (e.g. vary payload size, frequency of reads or writes, etc).

> 2) Given the optimal configuration, how many users can a cluster support?

This *is* useful, it's just not a static number (not even close).  That's why the model is much more important than the actual answer.

Just to be super-clear: Collecting empirical data is great, I'm not knocking that at all.  I view that as a parallel task to this bug.
(In reply to comment #7)
> (In reply to comment #6)

> > 1) What is the optimal configuration (frontends, backends, CPUs, disk sizes) of
> > a cluster? Failing this, do we have reasonable confidence that our current
> > ratios are appropriate
> 
> The optimal configuration for what?  If there was one cluster configuration
> that was better for any application no one would bother setting up anything
> else.
> 

Sure, there are lots of possible configurations. But there is a correlation between, say, the traffic pattern of our app and the ratio of cpu to actual disk space needed. These aren't numbers that can really be derived mathematically, since in practice, the graph is much more of a step function. 

That's the core problem with a model: it's essentially unconstructible as an abstract concept. We can make some good guesses from experience, and we can try to identify likely bottlenecks, but actually producing a system of equations where we can plug in a series of values for all our parameters and get a result out... that's not something I've ever encountered in the real world! (I wish it existed - writing load tests is boring, and then I wouldn't have to do them :) )
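
To illustrate the step-function point with placeholder numbers - capacity stays flat and then jumps a whole box at a time, so there's no smooth curve to fit an equation to:

    <?php
    // Hardware comes in whole boxes, so "how many backends?" is a step function.
    // Both inputs are placeholders.
    $bytes_per_user   = 400 * 1024;                   // storage per user (placeholder)
    $disk_per_backend = 500 * 1024 * 1024 * 1024;     // usable bytes per db box (placeholder)

    foreach (array(500000, 1000000, 1500000, 2000000) as $users) {
        $backends = max(1, (int) ceil(($users * $bytes_per_user) / $disk_per_backend));
        printf("%8d users -> %d backend box(es)\n", $users, $backends);
    }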



> The reason the model is useful is because we don't know what applications we
> will implement.  An "optimal" cluster for our current application will not
> necessarily work well if we vary what we do (e.g. vary payload size, frequency
> of reads or writes, etc).

Right. But we futureproof by speculating on where the data model might go, by building a system flexible enough to allow us to shift traffic around, by not getting locked into a particular technology or approach, and by making sure we allow for lots of extra capacity. 

Don't get me wrong, I'd love to be able to generate a model, but I think the best we're going to get is informed guesswork based on empirical evidence!
(In reply to comment #2)

Thinking about this some more, I think a little extra data would be useful to log, so

> 
> after line 201:
> 
>         while ($data = fread($putdata,2048)) {$json .= $data;};
> 
> add 
> 
>     error_log("put payload: " . $json);
> 
> 

error_log("put payload (" . time() . "): " . $json);


> and after line 253:
> 
>         while ($data = fread($putdata,2048)) {$jsonstring .= $data;}
> 
> add
> 
>         error_log("post payload: " . $jsonstring);
> 


error_log("post payload (" . time() . "): " . $jsonstring);

That saves us having to know the definitive start and end times, and also lets us see how bursty the write traffic is.
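
Afterwards, a quick throwaway script (assuming a stock Apache error_log path) could bucket the timestamped lines per minute to show that burstiness:

    <?php
    // Count logged writes per minute from the timestamped payload lines.
    // The error log location is an assumption.
    $buckets = array();
    $in = fopen('/var/log/httpd/error_log', 'r');
    while (($line = fgets($in)) !== false) {
        if (preg_match('/(put|post) payload \((\d+)\):/', $line, $m)) {
            $minute = (int) floor($m[2] / 60);
            $buckets[$minute] = isset($buckets[$minute]) ? $buckets[$minute] + 1 : 1;
        }
    }
    fclose($in);
    ksort($buckets);
    foreach ($buckets as $minute => $count) {
        printf("%s  %d writes\n", date('Y-m-d H:i', $minute * 60), $count);
    }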
(In reply to comment #8)
> Sure, there are lots of possible configurations. But there is a correlation
> between, say, the traffic pattern of our app and the ratio of cpu to actual
> disk space needed. These aren't numbers that can really be derived
> mathematically, since in practice, the graph is much more of a step function. 

Surely there is some rough ballpark calculation that works.  Like, if we increase payload sizes to 5MB average per user, will we need one extra server or one thousand?  Will they mostly be SSL servers or mysql servers?  Right now we have no idea (at least I don't).
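
Just to sketch what even a crude ballpark could look like - every input below is a placeholder, and the split into mysql boxes (disk) vs ssl boxes (bandwidth) is the interesting part:

    <?php
    // Ballpark for the 5MB question. All inputs are placeholders.
    $users             = 100000;
    $extra_per_user    = 5 * 1024 * 1024;            // the hypothetical 5MB average
    $disk_per_backend  = 500 * 1024 * 1024 * 1024;   // usable bytes per mysql box (placeholder)
    $syncs_per_day     = 5;                          // full re-uploads per user per day (pessimistic placeholder)
    $peak_to_avg       = 4;                          // peak/average traffic ratio (placeholder)
    $mbit_per_frontend = 400;                        // Mbit/s one ssl box can terminate (placeholder)

    $extra_backends = (int) ceil(($users * $extra_per_user) / $disk_per_backend);

    $avg_mbit        = ($users * $extra_per_user * $syncs_per_day * 8) / 86400 / (1024 * 1024);
    $peak_mbit       = $avg_mbit * $peak_to_avg;
    $extra_frontends = (int) ceil($peak_mbit / $mbit_per_frontend);

    printf("extra mysql boxes: %d, extra ssl boxes: %d (peak %.0f Mbit/s)\n",
           $extra_backends, $extra_frontends, $peak_mbit);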

> That's the core problem with a model. It's essentially unconstructable as an
> abstract concept. We can make some good guesses from experience, and we can try
> to identify likely bottlenecks, but actually producing a system of equations
> where we can plug in a series of values for all our parameters and output some
> result... not that I've ever encountered in the real world! (I wish, writing
> load tests is boring, and I wouldn't have to do them :) )

You are arguing for this bug to be wontfixed, but I heartily disagree.  The model does not need to be perfect, and it does not in any way replace load testing requirements.  It is a parallel process which feeds off of load test results (and also informs which load tests need to be written in the future).

> Don't get me wrong, I'd love to be able to generate a model, but I think the
> best we're going to get is informed guesswork based on empirical evidence!

Informed guesswork *is* a model.
I started a page to build the model here:

https://intranet.mozilla.org/Labs/Weave/Cluster_Scalability
Load testing is done, wiki page built... time to close the bug.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard