Closed
Bug 1365677
Opened 7 years ago
Closed 7 years ago
Start load testing Download and Symbolication with QA
Categories
(Socorro :: Symbols, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: peterbe, Assigned: grumpy)
I'm not entirely convinced that this should block https://bugzilla.mozilla.org/show_bug.cgi?id=1365672 since we're not expecting any high load on the service any time soon. We *might* see it some day later this year, and then it would be extremely high load. It would be good to get the ball rolling on this as soon as possible though, at least to find some low-hanging fruit by figuring out whether things break under load. Our goal is to test against an environment other than Dev.
Reporter
Comment 1•7 years ago
Miles,

Two questions for you to decide:

1. Should we use prod or stage to do the load testing? If they're identical in terms of machine resources, it would be nice to use stage since it'll be around and ready to try-to-break once we start using/depending on prod.
2. What is the URL (domain) we point our load testing to?
Flags: needinfo?(miles)
Reporter
Updated•7 years ago
Summary: Start loading testing Download and Symbolication with QA → Start load testing Download and Symbolication with QA
Reporter
Comment 2•7 years ago
:grumpy, I have some tooling [0] I've been using to bombard the service locally. I've been using this primarily to optimize the service, because it's easier to see what to make fast when it's being hit.

The two services we want to load test are:

1. Symbolication (sending in a JSON blob with hex addresses and expecting them to be replaced with C++ signatures from symbol files stored in S3). This is what symbolication.py does in tecken-loader.
2. Download (doing a GET on a symbol file will redirect to the symbol's canonical public URL in S3). This is what download.py does in tecken-loader.

I've never used molotov before, but if you think that's good stuff, perhaps we can join forces and write some scripts together, and you, grumpy, can be responsible for running them. Where/how do we start?

[0] https://github.com/peterbe/tecken-loader
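As a rough illustration of the two request types described in this comment, here is a minimal standard-library sketch. The payload layout (`memoryMap`, `stacks`, `version`), the example module name and debug ID, and both helper names are assumptions made for illustration; they are not taken from tecken-loader.

```python
import json


def build_symbolication_payload(modules, frames):
    """Build a JSON-serializable symbolication request body.

    modules: list of (debug_filename, debug_id) pairs (hypothetical values).
    frames:  list of (module_index, hex_offset_string) pairs.
    The key names below are an assumed payload shape, not a confirmed API.
    """
    return {
        "memoryMap": [list(module) for module in modules],
        # One stack; each frame is [module index, integer offset].
        "stacks": [[[index, int(offset, 16)] for index, offset in frames]],
        "version": 4,
    }


def is_download_redirect(status, headers):
    """Return True if a download response looks like a redirect to S3.

    Only checks for a redirect status and a Location header; it does not
    verify where the redirect points.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    return status in (301, 302) and "location" in lowered


payload = build_symbolication_payload(
    [("xul.pdb", "44E4EC8C2F41492B9369D6B9A059577C2")],
    [(0, "0xb2e3f7")],
)
body = json.dumps(payload)  # what a load test would send in the POST body
```

Keeping the payload builder separate from the HTTP call makes it easy to reuse the same request shapes from a standalone script or from a molotov scenario.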
Flags: needinfo?(chartjes)
Comment 3•7 years ago
Just to clarify, the purpose of a load test is not solely to know at what point your app falls over under contrived, unrealistic load conditions. The purpose of a load test is multi-fold:

1. it helps establish what infrastructure is needed to run the app
2. it helps establish indicators for how the app should scale
3. it gives us a baseline for how the app behaves under increasing load so that later on down the line we have good feels for how the app will behave as the requirements and purpose change
4. assuming we run load tests in such a way that they're repeatable, we then have what we need to test architecture changes and anything else that could heavily affect the performance of the app

We definitely want some kind of load test before going to prod because otherwise we just have no idea how to answer questions related to the above things.
Reporter
Comment 4•7 years ago
PS. (attention :grumpy) We *don't* need to wait (to start on this bug) for a stage and/or production environment. We can write the tests now and use Dev or local docker laptop environments.
Assignee
Comment 5•7 years ago
:peterbe, I definitely think we can take some of the stuff that you wrote in download.py and symbolication.py and make them work with molotov. The quick-start docs are pretty good https://molotov.readthedocs.io/en/latest/tutorial/
Flags: needinfo?(chartjes)
Reporter
Comment 6•7 years ago
(In reply to Chris Hartjes [:grumpy][:chartjes] from comment #5)
> :peterbe,
>
> I definitely think we can take some of the stuff that you wrote in
> download.py and symbolication.py and make them work with molotov.
>
> The quick-start docs are pretty good
>
> https://molotov.readthedocs.io/en/latest/tutorial/

Do you want to take a stab at it or should I?
Assignee
Comment 7•7 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> (In reply to Chris Hartjes [:grumpy][:chartjes] from comment #5)
> > :peterbe,
> >
> > I definitely think we can take some of the stuff that you wrote in
> > download.py and symbolication.py and make them work with molotov.
> >
> > The quick-start docs are pretty good
> >
> > https://molotov.readthedocs.io/en/latest/tutorial/
>
> Do you want to take a stab at it or should I?

Given that I don't have a running version of tecken on my laptop, it's probably better if you give it a try.
Comment 8•7 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #1)
> 1. Should we use prod or stage to do the load testing? If they're identical
> in terms of machine resources, it would be nice to use stage since it'll be
> around and ready to try-to-break once we start using/depending on prod.

Stage and prod will be identical - same AMIs (so same code). They will be in different regions (stage is us-east-1, prod is us-west-2). Stage is the standard environment to use for this sort of thing.

> 2. What is the URL (domain) we point our load testing to?

Though not available yet, the domain will be symbols.stage.mozaws.net. I'll check back in when we have a functional stage environment for symbols/tecken.
Flags: needinfo?(miles)
Reporter
Comment 9•7 years ago
Update: https://github.com/mozilla-services/tecken-loadtests/pull/2 Comments within.
Reporter
Comment 10•7 years ago
This is now in. https://github.com/mozilla-services/tecken-loadtests/blob/master/loadtest.py

I'm not sure what to do next. Miles, are you ready to set up a Stage instance so that :grumpy can start bombarding? :grumpy, will you take ownership of this bug now?

Note to self: I'm not entirely convinced the test is good. The business logic for whether a symbol download should be 404 or 200 depends on time, and I took a snapshot. We might have to remove the test [0] and just make sure it's EITHER 200 or 404 but nothing else.

[0] https://github.com/mozilla-services/tecken-loadtests/blob/ceb7a0773e756a7f23f165bb77fcbbe515eec733/loadtest.py#L149-L152
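The relaxed check suggested in the note above could look roughly like the sketch below. The function and constant names are hypothetical and this is not the code in loadtest.py; it only shows the idea of accepting either status instead of comparing against a time-dependent snapshot.

```python
# Statuses a symbol download is allowed to return in the relaxed check.
ACCEPTABLE_DOWNLOAD_STATUSES = frozenset({200, 404})


def check_download_status(status):
    """Accept 200 or 404; fail loudly on anything else (e.g. 500 or 400)."""
    if status not in ACCEPTABLE_DOWNLOAD_STATUSES:
        raise AssertionError(f"unexpected download status: {status}")
    return status
```

The trade-off is that this no longer catches a symbol flipping between found and not found, but it still fails on server errors, which is what matters most under load.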
Flags: needinfo?(miles)
Assignee
Updated•7 years ago
Assignee: nobody → chartjes
Assignee
Comment 11•7 years ago
:peterbe I'm happy to take ownership. I noticed that there are some features in the latest release of molotov that can help, so I will refactor the load test code to use them.
Comment 12•7 years ago
Symbols is now ready for load testing in stage. Here is some relevant info:

APM <= single node running new relic
app <= autoscaled nodes not running new relic

symbols.stage.mozaws.net <= main endpoint, hits both APM and app instances
symbols-loadtest-apm.stage.mozaws.net <= specifically for load testing, hits only APM instance
symbols-loadtest-as.stage.mozaws.net <= specifically for load testing, hits both APM and app instances

New Relic and Datadog are configured for Symbols.

https://rpm.newrelic.com/accounts/1402187/applications/52227224 <= New Relic
https://app.datadoghq.com/dash/286319/tecken <= Datadog

Logging isn't quite working yet, coming soon. Other than that, you're ready to go!
Flags: needinfo?(miles)
Reporter
Comment 13•7 years ago
Any news on this? It would be nice to know when it's going to happen so I can be on standby with the graphs and stuff.
Flags: needinfo?(chartjes)
Assignee
Comment 14•7 years ago
I just need a node on the same network as the symbol server staging instance and I will be ready to do it.
Flags: needinfo?(chartjes) → needinfo?(miles)
Comment 15•7 years ago
The node is up and afaik we are good to go on load testing.
Flags: needinfo?(miles)
Comment 16•7 years ago
What are the load numbers for the current symbols server? We can probably use those as a 1x target number.

Also, it's worth finding the load numbers for when Durst was hitting the symbols server with his python script. That's probably also a good 1x target number.

For Antenna, we put the 1x, 3x, and 10x numbers into some of the graphs so that we knew what our goals were for load testing and health.
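For illustration, the 1x, 3x, and 10x targets can be derived from a single baseline figure as in the sketch below. The baseline of 40 requests per second is a made-up placeholder, not a measured number for the symbols server, and the helper name is hypothetical.

```python
def load_targets(baseline_rps, multipliers=(1, 3, 10)):
    """Map each multiplier label (e.g. "3x") to its target requests/second."""
    return {f"{m}x": baseline_rps * m for m in multipliers}


# Placeholder baseline; substitute the real 1x number once it is known.
targets = load_targets(40)
for label, rps in targets.items():
    print(f"{label}: {rps} requests/second")
```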
Reporter
Comment 17•7 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #16)
> What's the load numbers for the current symbols server? We can probably use
> that as a 1x target number.
>
> Also, it's worth finding the load numbers for when Durst was hitting the
> symbols server with his python script. That's probably also a good 1x target
> number.
>
> For Antenna, we put the 1x, 3x, and 10x numbers into some of the graphs so
> that we knew what our goals were for load testing and health.

https://docs.google.com/document/d/1UGz4sY-WESTr_x0_6j9duKhfJHSPsRrKnhCY1PyskLQ
Reporter
Comment 18•7 years ago
We have to add symbols-loadtest-apm.stage.mozaws.net and symbols-loadtest-as.stage.mozaws.net to ALLOWED_HOSTS. Right now they're returning 400 Bad Request.
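Django rejects requests whose Host header is not listed in ALLOWED_HOSTS with a 400 Bad Request, which matches the symptom above. A minimal sketch of the setting with the load-test hosts added follows. How Tecken actually populates this setting (for example from an environment variable) is not shown in this bug, so the literal list is an assumption, and `host_is_allowed` is a simplified stand-in for Django's real host validation, which also handles wildcard patterns.

```python
# Assumed literal form of the Django setting with the load-test hosts added.
ALLOWED_HOSTS = [
    "symbols.stage.mozaws.net",
    "symbols-loadtest-apm.stage.mozaws.net",
    "symbols-loadtest-as.stage.mozaws.net",
]


def host_is_allowed(host_header, allowed_hosts=ALLOWED_HOSTS):
    """Simplified exact-match check: strip any port, compare case-insensitively.

    Not Django's implementation; it ignores wildcards and IPv6 literals.
    """
    host = host_header.split(":", 1)[0].lower()
    return host in allowed_hosts
```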
Flags: needinfo?(miles)
Comment 19•7 years ago
Yikes. That's my bad. Making the changes and pushing now.
Flags: needinfo?(miles)
Comment 20•7 years ago
To update, those hosts are now allowed properly.
Assignee
Comment 21•7 years ago
Finished up my load testing yesterday; the summary is in https://docs.google.com/document/d/1UGz4sY-WESTr_x0_6j9duKhfJHSPsRrKnhCY1PyskLQ/edit#
Reporter
Comment 22•7 years ago
This is no longer actionable. We have results (not great, but that's another story) and we have a framework to do load testing. The Google doc that talks about needs, targets and baselines is done and still useful.

After this we'll work on new optimizations (infra and code) and start new load testing.

Also, one technical detail we learned is that Tecken is not yet ready to handle the load from Socorro's processors. We'll deal with that after we go to prod.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED