1365677 - Start load testing Download and Symbolication with QA

Reporter

Description

7 years ago

I'm not entirely convinced that this should block https://bugzilla.mozilla.org/show_bug.cgi?id=1365672 since we're not expecting any high load on the service any time soon. 

We *might* some day later this year, and then the load would be extremely high.
It would be good to get the ball rolling on this as soon as possible, though, if only to find some low-hanging fruit by figuring out whether things break under load.

Our goal is to test against an environment other than Dev.

Peter Bengtsson [:peterbe]

Reporter

Comment 1

7 years ago

Miles, 
Two questions for you to decide:

1. Should we use prod or stage to do the load testing? If they're identical in terms of machine resources, it would be nice to use stage since it'll be around and ready to try-to-break once we start using/depending on prod. 

2. What is the URL (domain) we point our load testing to?

Flags: needinfo?(miles)

Peter Bengtsson [:peterbe]

Reporter

Updated

7 years ago

Summary: Start loading testing Download and Symbolication with QA → Start load testing Download and Symbolication with QA

Peter Bengtsson [:peterbe]

Reporter

Comment 2

•

7 years ago

:grumpy,

I have some tooling [0] that I've been using to bombard the service locally. I've mainly used it to optimize the service, because it's easier to see what needs to be faster when the service is under load.

The two services we want to load test are:

1. Symbolication (sending in a JSON blob with hex addresses and expecting them to be replaced with C++ signatures from symbol files stored in S3). This is what symbolication.py does in tecken-loader.

2. Download (a GET for a symbol file redirects to the file's canonical public URL in S3). This is what download.py does in tecken-loader.
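As a rough illustration of the two request types, here is a minimal Python sketch. It assumes a symbolication payload modeled on the `memoryMap`/`stacks` layout of Mozilla's symbolication API and a `/<debug_filename>/<debug_id>/<symbol_filename>` download path. The helper names `build_symbolication_payload` and `download_path`, and the module and debug ID values, are hypothetical and not taken from tecken-loader.

```python
import json


def build_symbolication_payload(modules, frames):
    """Build a symbolication request body (hypothetical helper).

    modules: list of (debug_filename, debug_id) pairs.
    frames: list of (module_index, hex_offset) pairs, e.g. (0, "0xb2e3f7").
    Assumes a memoryMap/stacks layout like Mozilla's symbolication API.
    """
    return {
        "memoryMap": [[name, debug_id] for name, debug_id in modules],
        # One stack; each frame's hex address becomes an integer module offset.
        "stacks": [[[index, int(offset, 16)] for index, offset in frames]],
    }


def download_path(debug_filename, debug_id):
    """Path of a symbol file; the service is expected to redirect it to S3."""
    # "xul.pdb" -> "xul.sym", while e.g. "libxul.so" -> "libxul.so.sym".
    if debug_filename.endswith(".pdb"):
        stem = debug_filename[: -len(".pdb")]
    else:
        stem = debug_filename
    return f"/{debug_filename}/{debug_id}/{stem}.sym"


# Example usage with made-up values.
modules = [("xul.pdb", "44E4EC8C2F41492B9369D6B9A059577C2")]
body = json.dumps(build_symbolication_payload(modules, [(0, "0xb2e3f7")]))
path = download_path(*modules[0])
```

A load test would POST `body` to the symbolication endpoint and GET `path` on the download endpoint; both endpoint details are assumptions here.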

I've never used molotov before, but if you think it's good, perhaps we can join forces and write some scripts together, and you, grumpy, can be responsible for running them. Where and how do we start?


[0] https://github.com/peterbe/tecken-loader

Flags: needinfo?(chartjes)

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 3

7 years ago

Just to clarify, the purpose of a load test is not solely to find the point at which your app falls over under contrived, unrealistic load. The purpose of a load test is multifold:

1. it helps establish what infrastructure is needed to run the app

2. it helps establish indicators for how the app should scale

3. it gives us a baseline for how the app behaves under increasing load, so that later on we have a good feel for how the app will behave as its requirements and purpose change

4. assuming we run load tests in such a way that they're repeatable, we then have what we need to test architecture changes and anything else that could heavily affect the performance of the app
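Point 4 depends on having comparable numbers from one run to the next. A minimal sketch of one way to get them, assuming each run yields a list of latencies for successful requests plus an error count; `summarize_run`, `regressed`, and the tolerance values are illustrative choices, not part of any existing tooling:

```python
import statistics


def summarize_run(latencies_ms, errors):
    """Reduce one load-test run to a few numbers that can be compared later.

    latencies_ms: latencies of successful requests (needs at least 2 values).
    errors: number of failed requests in the same run.
    """
    total = len(latencies_ms) + errors
    # With n=100, quantiles() returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {
        "requests": total,
        "error_rate": errors / total,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
    }


def regressed(baseline, current, latency_tolerance=0.10, error_tolerance=0.01):
    """True if p95 latency grew by more than 10% or the error rate rose by more than 1 point."""
    slower = current["p95_ms"] > baseline["p95_ms"] * (1 + latency_tolerance)
    flakier = current["error_rate"] > baseline["error_rate"] + error_tolerance
    return slower or flakier
```

Storing the summary of each run alongside the code and infrastructure version it ran against is what makes architecture changes comparable later.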

We definitely want some kind of load test before going to prod because otherwise we just have no idea how to answer questions related to the above things.

Peter Bengtsson [:peterbe]

Reporter

Comment 4

7 years ago

PS. (attention :grumpy) We *don't* need to wait for a stage and/or production environment to start on this bug. We can write the tests now and run them against Dev or local Docker environments on our laptops.

Chris Hartjes [:grumpy][:chartjes]

Assignee

Comment 5

7 years ago

:peterbe,

I definitely think we can take some of the stuff that you wrote in download.py and symbolication.py and make them work with molotov.

The quick-start docs are pretty good

https://molotov.readthedocs.io/en/latest/tutorial/
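molotov lets each scenario carry a weight, so a run mixes request types in fixed proportions. The stdlib-only sketch below imitates that idea without molotov so it can be tried anywhere; the coroutines are stand-ins for real HTTP calls, and the 70/30 split is a made-up example rather than a measured traffic mix.

```python
import asyncio
import random
from collections import Counter


async def fake_download():
    """Stand-in for a GET against the download endpoint."""
    await asyncio.sleep(0)  # yield to the event loop like a real request would
    return "download"


async def fake_symbolicate():
    """Stand-in for a POST of a symbolication payload."""
    await asyncio.sleep(0)
    return "symbolication"


# Hypothetical weights: roughly 70% downloads, 30% symbolication requests.
SCENARIOS = [fake_download, fake_symbolicate]
WEIGHTS = [70, 30]


async def worker(requests, rng, counts):
    """Run `requests` scenarios, each picked at random according to WEIGHTS."""
    for _ in range(requests):
        scenario = rng.choices(SCENARIOS, weights=WEIGHTS)[0]
        counts[await scenario()] += 1


async def run(workers=10, requests_per_worker=100, seed=1):
    """Run several concurrent workers and tally which scenarios were executed."""
    rng = random.Random(seed)
    counts = Counter()
    await asyncio.gather(
        *(worker(requests_per_worker, rng, counts) for _ in range(workers))
    )
    return counts


if __name__ == "__main__":
    print(asyncio.run(run()))
```

In molotov itself, the equivalent is decorating each coroutine with `@scenario(weight=...)` and letting the molotov runner manage workers and the HTTP session.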

Flags: needinfo?(chartjes)

Peter Bengtsson [:peterbe]

Reporter

Comment 6

•

7 years ago

(In reply to Chris Hartjes [:grumpy][:chartjes] from comment #5)
> :peterbe,
> 
> I definitely think we can take some of the stuff that you wrote in
> download.py and symbolication.py and make them work with molotov.
> 
> The quick-start docs are pretty good
> 
> https://molotov.readthedocs.io/en/latest/tutorial/

Do you want to take a stab at it or should I?

Chris Hartjes [:grumpy][:chartjes]

Assignee

Comment 7

7 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #6)
> (In reply to Chris Hartjes [:grumpy][:chartjes] from comment #5)
> > :peterbe,
> > 
> > I definitely think we can take some of the stuff that you wrote in
> > download.py and symbolication.py and make them work with molotov.
> > 
> > The quick-start docs are pretty good
> > 
> > https://molotov.readthedocs.io/en/latest/tutorial/
> 
> Do you want to take a stab at it or should I?

Given that I don't have a running version of tecken on my laptop, it's probably better if you give it a try.

Miles Crabill [:miles]

Comment 8

7 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #1)
> 1. Should we use prod or stage to do the load testing? If they're identical
> in terms of machine resources, it would be nice to use stage since it'll be
> around and ready to try-to-break once we start using/depending on prod. 
Stage and prod will be identical - same AMIs (so same code). They will be in different regions (stage is us-east-1, prod is us-west-2). Stage is the standard environment to use for this sort of thing.

> 2. What is the URL (domain) we point our load testing to?
Though not available yet, the domain will be symbols.stage.mozaws.net. I'll check back in when we have a functional stage environment for symbols/tecken.

Flags: needinfo?(miles)

Peter Bengtsson [:peterbe]

Reporter

Comment 9

•

7 years ago

Update:
https://github.com/mozilla-services/tecken-loadtests/pull/2
Comments within.

Peter Bengtsson [:peterbe]

Reporter

Comment 10

•

7 years ago

This is now in. https://github.com/mozilla-services/tecken-loadtests/blob/master/loadtest.py

I'm not sure what to do next. 

Miles, are you ready to set up a Stage instance so that :grumpy can start bombarding?

:grumpy, will you take ownership of this bug now?


Note to self: I'm not entirely convinced the test is good. The business logic that decides whether a symbol download returns 404 or 200 depends on time, and the expected values come from a snapshot I took. We might have to remove the test [0] and just make sure the status is EITHER 200 or 404, but nothing else.

[0]  https://github.com/mozilla-services/tecken-loadtests/blob/ceb7a0773e756a7f23f165bb77fcbbe515eec733/loadtest.py#L149-L152
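The relaxed check proposed in the note could look something like this minimal sketch; `check_download_status` is a hypothetical helper, not the code at the linked lines.

```python
# Either outcome is legitimate, because whether a symbol exists depends on
# when the test runs relative to the snapshot the expectations came from.
EXPECTED_DOWNLOAD_STATUSES = frozenset({200, 404})


def check_download_status(status):
    """Fail on anything other than 200 or 404; return True if the symbol was found."""
    if status not in EXPECTED_DOWNLOAD_STATUSES:
        raise AssertionError(f"unexpected status {status}, expected 200 or 404")
    return status == 200
```

Returning the found/not-found flag instead of asserting on it keeps the hit ratio available for reporting without making it a pass/fail condition.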

Flags: needinfo?(miles)

Chris Hartjes [:grumpy][:chartjes]

Assignee

Updated

7 years ago

Assignee: nobody → chartjes

Chris Hartjes [:grumpy][:chartjes]

Assignee

Comment 11

•

7 years ago

:peterbe I'm happy to take ownership. I noticed that there are some features in the latest release of molotov that can help, so I will refactor the load test code to use them.

Miles Crabill [:miles]

Comment 12

•

7 years ago

Symbols is now ready for load testing in stage. Here is some relevant info:

APM <= single node running new relic
app <= autoscaled nodes not running new relic

symbols.stage.mozaws.net <= main endpoint, hits both APM and app instances
symbols-loadtest-apm.stage.mozaws.net <= specifically for load testing, hits only APM instance
symbols-loadtest-as.stage.mozaws.net <= specifically for load testing, hits both APM and app instances
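Separate hostnames make it easy to point the same test at the single New Relic node or at the whole pool. A minimal sketch, assuming HTTPS and using hypothetical labels (`main`, `apm`, `autoscaled`) for the three hosts listed above:

```python
from urllib.parse import urljoin

# Hostnames from the stage setup above; the https:// scheme and the dictionary
# keys are assumptions made for this sketch.
STAGE_TARGETS = {
    "main": "https://symbols.stage.mozaws.net",
    "apm": "https://symbols-loadtest-apm.stage.mozaws.net",
    "autoscaled": "https://symbols-loadtest-as.stage.mozaws.net",
}


def target_url(target, path):
    """Join a request path onto the chosen stage host."""
    return urljoin(STAGE_TARGETS[target] + "/", path.lstrip("/"))
```

Hitting `apm` isolates per-request behavior on one instrumented node, while `autoscaled` exercises scaling across the pool.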

New Relic and Datadog are configured for Symbols.

https://rpm.newrelic.com/accounts/1402187/applications/52227224 <= New Relic
https://app.datadoghq.com/dash/286319/tecken <= Datadog

Logging isn't quite working yet, coming soon.

Other than that, you're ready to go!

Flags: needinfo?(miles)

Peter Bengtsson [:peterbe]

Reporter

Comment 13

•

7 years ago

Any news on this? It would be nice to know when it's going to happen so I can be on standby with the graphs and stuff.

Flags: needinfo?(chartjes)

Chris Hartjes [:grumpy][:chartjes]

Assignee

Comment 14

•

7 years ago

I just need a node on the same network as the symbol server staging instance and I will be ready to do it.

Flags: needinfo?(chartjes) → needinfo?(miles)

Miles Crabill [:miles]

Comment 15

•

7 years ago

The node is up and afaik we are good to go on load testing.

Flags: needinfo?(miles)

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 16

•

7 years ago

What are the load numbers for the current symbols server? We can probably use those as a 1x target.

Also, it's worth finding the load numbers for when Durst was hitting the symbols server with his python script. That's probably also a good 1x target number.

For Antenna, we put the 1x, 3x, and 10x numbers into some of the graphs so that we knew what our goals were for load testing and health.
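Turning a baseline into 1x, 3x, and 10x goals is simple arithmetic. A minimal sketch, assuming the baseline is available as a daily request count; the figure in the usage line is a placeholder, not a measurement of the current symbols server:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MULTIPLIERS = (1, 3, 10)


def requests_per_second(requests_per_day):
    """Average request rate implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def load_targets(baseline_rps):
    """Request rates to aim for at each multiple of the baseline."""
    return {f"{m}x": baseline_rps * m for m in MULTIPLIERS}


# Placeholder baseline: 1,080,000 requests/day is 12.5 requests/second on average.
targets = load_targets(requests_per_second(1_080_000))
```

A daily average understates peak traffic, so a peak-hour rate, if available, is the safer number to multiply.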

Peter Bengtsson [:peterbe]

Reporter

Comment 17

•

7 years ago

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #16)
> What's the load numbers for the current symbols server? We can probably use
> that as a 1x target number.
> 
> Also, it's worth finding the load numbers for when Durst was hitting the
> symbols server with his python script. That's probably also a good 1x target
> number.
> 
> For Antenna, we put the 1x, 3x, and 10x numbers into some of the graphs so
> that we knew what our goals were for load testing and health.

https://docs.google.com/document/d/1UGz4sY-WESTr_x0_6j9duKhfJHSPsRrKnhCY1PyskLQ

Peter Bengtsson [:peterbe]

Reporter

Comment 18

•

7 years ago

We have to add symbols-loadtest-apm.stage.mozaws.net and symbols-loadtest-as.stage.mozaws.net to ALLOWED_HOSTS. Right now they're returning 400 Bad Request.
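Django rejects requests whose `Host` header does not match `ALLOWED_HOSTS` with a 400, which is what the two load-test hostnames were hitting. The fragment below sketches the missing entries plus a simplified stand-in for Django's matching rule (exact name, leading-dot pattern for a domain and its subdomains, or `*`). `host_allowed` is hypothetical, and how Tecken actually sets `ALLOWED_HOSTS` in deployment is not shown here.

```python
# settings.py fragment (sketch): every hostname the load balancers forward
# to the app must be listed, or Django answers with 400 Bad Request.
ALLOWED_HOSTS = [
    "symbols.stage.mozaws.net",
    "symbols-loadtest-apm.stage.mozaws.net",
    "symbols-loadtest-as.stage.mozaws.net",
]


def host_allowed(host, allowed_hosts=ALLOWED_HOSTS):
    """Simplified imitation of Django's host validation (ignores IPv6 literals)."""
    domain = host.split(":")[0].lower()  # drop any port, compare case-insensitively
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or domain == pattern:
            return True
        # A leading dot matches the domain itself and any subdomain.
        if pattern.startswith(".") and (domain.endswith(pattern) or domain == pattern[1:]):
            return True
    return False
```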

Flags: needinfo?(miles)

Miles Crabill [:miles]

Comment 19

•

7 years ago

Yikes. That's my bad. Making the changes and pushing now.

Flags: needinfo?(miles)

Miles Crabill [:miles]

Comment 20

•

7 years ago

To update, those hosts are now allowed properly.

Chris Hartjes [:grumpy][:chartjes]

Assignee

Comment 21

•

7 years ago

Finished up my load testing yesterday, summary is in https://docs.google.com/document/d/1UGz4sY-WESTr_x0_6j9duKhfJHSPsRrKnhCY1PyskLQ/edit#

Peter Bengtsson [:peterbe]

Reporter

Comment 22

•

7 years ago

This is no longer actionable. We have results (not great, but that's another story) and we have a framework to do load testing. And the Google doc that covers needs, targets, and baselines is done and still useful.

After this we'll work on new optimizations (infra and code) and start new load testing.

Also, one technical detail we learned from the results is that Tecken is not yet ready to handle the load from Socorro's processors. We'll deal with that after we go to prod.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Categories

(Socorro :: Symbols, task)

Tracking

(Not tracked)

People

(Reporter: peterbe, Assigned: grumpy)

Security

(public)
