Closed Bug 1186091 Opened 9 years ago Closed 9 years ago

Perform initial "Smoke Test" of MDN staging to measure baseline performance for MDN code

Categories

Product: developer.mozilla.org Graveyard
Component: Performance
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Unassigned)


Details

(Keywords: in-triage)

User Story

As an MDN dev, I need to run a performance smoke test against the MDN stage server, so I know if new code will cause performance regression in production.
Pick a stress/load-testing tool
Write stress test based on catalog of page types in bug 1186085
Coordinate with WebOps to load-test stage server
Blocks: 1186093
Depends on: 1182182
No longer blocks: 1186081
Keywords: in-triage
:cyliang - is there any way we can run something like a bees-with-machine-guns [1] or locust.io [2] on some node/vm (new or existing) that would hit the stage web-heads directly? i.e. - not the load balancer in front?

[1] https://github.com/mostlygeek/beeswithmachineguns
[2] https://github.com/locustio/locust
User Story: (updated)
Flags: needinfo?(cliang)
Summary: Perform "Stress Test" of MDN staging to learn max load limit of MDN code → Perform initial "Smoke Test" of MDN staging to measure baseline performance for MDN code
We don't currently have a setup to do something like that but it should be possible to spin up some VMs in the same VLAN as the web heads to do that kind of work.  I have a mild preference here to set up Locust because 1) I've not interacted with it before and 2) I don't know if beeswithmachineguns is so AWS-centric that it won't work inside the datacenter.  (If you want to bypass the load balancers, the load testing framework probably needs to be in the datacenter.)

Do you have any parameters yet WRT how many concurrent connections, etc. that you'd like to throw at the web-heads?  If you don't have any, I can look at spinning up a small, five-node cluster (mostly 1-CPU servers); we can then see what numbers we can pull out of that.
Flags: needinfo?(cliang)
I like locust too. :)

Back-of-the-napkin parameter math: averaged over a 24-hour period, the prod back-end gets 1.43k rpm (requests per minute) on 3 web-heads with 8GB memory and 4 CPUs each. Stage has 1 web-head with 8GB memory and 2 CPUs, i.e. half the CPUs of a single prod web-head, so how about 1.43k / 3 / 2 ≈ 240 rpm?
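The back-of-the-napkin scaling can be written out as a small function. This is a sketch under the same assumptions as the estimate: CPU count is the limiting resource (memory is 8GB on both prod and stage web-heads), and 1.43k rpm is an average over the 24-hour period. The function name and parameters are illustrative only.

```python
def stage_target_rpm(prod_rpm: float, prod_web_heads: int, prod_cpus_per_head: int,
                     stage_web_heads: int, stage_cpus_per_head: int) -> float:
    """Scale an average prod request rate to stage by total CPU count (hypothetical helper)."""
    # Requests per minute handled by each prod CPU, assuming load is spread evenly.
    per_cpu_rpm = prod_rpm / (prod_web_heads * prod_cpus_per_head)
    # Give stage the same per-CPU load.
    return per_cpu_rpm * stage_web_heads * stage_cpus_per_head


if __name__ == "__main__":
    rpm = stage_target_rpm(1430, 3, 4, 1, 2)
    print(f"{rpm:.0f} rpm")  # 238 rpm, i.e. the ~240 rpm estimate
```

Dividing by total CPUs is the same as 1.43k / 3 / 2 here, because each prod web-head has 4 CPUs and stage has 2.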

Assuming the web-heads get all requests from the load-balancer, we should be able to simulate realistic back-end request load by sending all 240 rpm from a single locust node, right?

In that case, we could start as simply as a single locust node (1 CPU vm) with just enough power to send 240 rpm to the stage server. Maybe even use the admin node for it, exposing the locust web app for devs to use like we expose the chief web app?
Flags: needinfo?(cliang)
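One rough way to turn ~240 rpm into a number of simulated Locust users is Little's law: each user sends a request, waits for the response, then sleeps for its wait time, so the required concurrency is roughly target requests/second times that cycle time. The 1.0 s mean wait and 0.5 s mean response time below are assumed purely for illustration, not MDN measurements, and `users_needed` is a hypothetical helper.

```python
import math


def users_needed(target_rpm: float, mean_wait_s: float, mean_response_s: float) -> int:
    """Estimate how many simulated users yield target_rpm (hypothetical helper).

    Each user completes one request every (mean_wait_s + mean_response_s)
    seconds on average, so by Little's law the required concurrency is the
    target requests/second multiplied by that cycle time.
    """
    target_rps = target_rpm / 60.0
    cycle_s = mean_wait_s + mean_response_s
    return math.ceil(target_rps * cycle_s)


if __name__ == "__main__":
    # Assumed values: 1.0 s mean wait between requests, 0.5 s mean response.
    print(users_needed(240, mean_wait_s=1.0, mean_response_s=0.5))  # 6
```

240 rpm is only 4 requests per second, so the user count stays small; how much headroom a 1-CPU node has on top of that still depends on the framework's own overhead.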
I think it's going to depend on how much overhead is needed by the framework itself and the nature of the jobs that are creating load.  This makes it sort of hard to guess ahead of time.

I've put in a bug for 4 VMs: 1 that is sized so it can hold the web app and 3 to act as slaves.  They are on the smallish side (4GB, 1 CPU).  It should be easy enough to reboot if we find we have too much or too little of one resource or another.
Flags: needinfo?(cliang)
Note: I sent https://github.com/mozilla/kuma/pull/3400 for this, which includes docs for how to run the first locust test. [1] I noticed after an initial spike of requests hitting the backend, the subsequent requests did not reach the back-end; presumably they were returned by the load balancer.

(I don't know if I caused a celery chord job to get stuck too. :( NR is reporting thousands of "celery.chord_unlock celery.exceptions.Retry" errors.)

[1] https://github.com/mozilla/kuma/pull/3400/files#diff-463fe8ae3b0066f435dcef833237b911
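One possible way to stop a smoke test's repeat requests from being answered by the load balancer is to make each URL unique with a throwaway query parameter. This is a hypothetical sketch, not taken from the kuma PR: it assumes the caching layer keys on the full URL including the query string, and that the back-end ignores the extra `_cb` parameter. The stage host name is a placeholder.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "_cb") -> str:
    """Return url with a unique throwaway query parameter (hypothetical helper).

    Assumes the caching layer keys on the full URL, including the query
    string, and that the application ignores the extra parameter.
    """
    parts = urlsplit(url)
    # Keep any existing query parameters, then append a random one.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(cache_busted("https://stage.example/en-US/docs/Web"))
```

Note that this turns every request into a cache miss, which is heavier than real production traffic, so it suits measuring raw back-end capacity more than reproducing prod behavior.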
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/9cbb82a999e693ecebc02053422e9460bd30e7d5
bug 1186091 - performance smoke test with locust

https://github.com/mozilla/kuma/commit/d7e5520fc5180cc28d0bf24ee5af5799417775e6
Merge pull request #3400 from mozilla/start-perf-smoke-test-1186091

bug 1186091 - performance smoke test with locust
http://locustmaster1.webapp.scl3.mozilla.com:8089/ is now running the initial performance smoke/regression test on a continual basis against the stage environment. I watched it during today's stage deployment. Now it's just a matter of enhancing our smoke/perf test. (bug 1186093)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard