Closed Bug 1174817 Opened 9 years ago Closed 9 years ago

Please deploy shavar v0.6.2 to STAGE

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rpapa, Unassigned)

Some minor fixes going out.

Per ckolos, this has already been deployed to STAGE so marking fixed.
I'll begin verification today.

ELB
12:38:24 PM <ckolos> shavar-stage-2-ELB-112EQV5AWKQWO-1276938218.us-west-2.elb.amazonaws.com

r53 record
12:39:08 PM <ckolos> 29-shavar.stage.mozaws.net
12:39:28 PM <ckolos> and, of course, tracking.stage.mozaws.net
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
QA Contact: rpappalardo
------------------------------------
SUMMARY
------------------------------------

Since Shumway has again opted to use shavar, I'll need to restart the stack sizing activity (once the workweek is over) to determine the ideal configuration for carrying maximum load.
For now, this release will be deployed with an ELB stack of m3.medium instances (the same as currently in production for 0.6.1).

I've created a #shavar channel on IRC to coordinate deployments.

For easier communication going forward, I'll also be creating all PROD deployment tickets once STAGE has been verified.

------------------------------------
E2E TEST
------------------------------------

End-to-end tests have been performed both manually and using mwobensmith's tool for this release.

------------------------------------
LOADTEST
------------------------------------

Four consecutive loadtests were performed using beeswithmachineguns.

duration: 30 mins
bee size: m3.xlarge
number of bees: 60
conc. connections: 512
connections / bee: 9
avg. completed reqs: 1,793,231
total failures: 0 (first run had 329)
% failures:  0.00% (all runs)
max reqs/sec: ~1,000
time/req (ms):  ~500
CPU% (avg.max): < 60% (on latter runs)
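
As a rough cross-check, the figures above can be tested against each other. The sketch below is illustrative only: it assumes "~500" ms is the mean latency seen by each connection, and that the bees behave as a closed loop (each connection sends its next request only after the previous one completes). The function names are hypothetical and not part of beeswithmachineguns.

```python
# Figures copied from the load-test summary above.
DURATION_S = 30 * 60               # "duration: 30 mins"
NUM_BEES = 60                      # "number of bees: 60"
CONCURRENT_CONNECTIONS = 512       # "conc. connections: 512"
AVG_COMPLETED_REQUESTS = 1_793_231 # "avg. completed reqs"
TIME_PER_REQUEST_S = 0.5           # "time/req (ms): ~500" (assumed mean latency)


def connections_per_bee(connections: int, bees: int) -> float:
    """Connections each bee has to carry on average."""
    return connections / bees


def average_rps(completed: int, duration_s: int) -> float:
    """Mean requests per second over the whole run."""
    return completed / duration_s


def closed_loop_rps(connections: int, latency_s: float) -> float:
    """Little's law for a closed loop: throughput = concurrency / latency."""
    return connections / latency_s


if __name__ == "__main__":
    print(f"connections/bee: {connections_per_bee(CONCURRENT_CONNECTIONS, NUM_BEES):.2f}")
    print(f"average reqs/sec: {average_rps(AVG_COMPLETED_REQUESTS, DURATION_S):.1f}")
    print(f"closed-loop reqs/sec: {closed_loop_rps(CONCURRENT_CONNECTIONS, TIME_PER_REQUEST_S):.0f}")
```

Under those assumptions the three numbers agree with the reported "~1,000" reqs/sec and 9 connections per bee, which suggests the summary is internally consistent.
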
I noticed that the 0.6.2 stack is handling about 1/3 of the requests/sec that 0.6.1 was previously handling when subjected to the same load on stage. Could this be because I'm running the bees from us-east-1 (this time shavar is staged in us-west-1)? It doesn't look like there are any changes in this tag that would have decreased its scalability. Also, since CPU is running significantly lower (< 60% vs. ~90% previously), it seems that running from a different region may have diminished the load impact.

Also, could you confirm what our current load (reqs/second) is on production? I recall last time we checked it was still quite low, but that may change soon.
Flags: needinfo?(ckolos)
That is a very likely candidate. Cross-region traffic also implies cross-VPC traffic. In short, you're running the loadtest from a completely different datacenter.
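
To put a number on that hypothesis, here is a back-of-the-envelope sketch. It assumes the same 512 concurrent connections were used for both releases ("the same load"), that 0.6.1 handled roughly three times the "~1,000" reqs/sec seen here, and that the load generator is a closed loop, so throughput = concurrency / latency. The helper name is hypothetical.

```python
def latency_for_throughput(concurrency: int, rps: float) -> float:
    """Per-request latency implied by a closed-loop load (Little's law)."""
    return concurrency / rps


CONCURRENCY = 512          # "conc. connections: 512", assumed identical for both runs
RPS_062 = 1000.0           # "~1,000" max reqs/sec reported for 0.6.2
RPS_061 = 3 * RPS_062      # 0.6.2 handled "about 1/3" of what 0.6.1 did

latency_062 = latency_for_throughput(CONCURRENCY, RPS_062)
latency_061 = latency_for_throughput(CONCURRENCY, RPS_061)
extra_latency = latency_062 - latency_061

if __name__ == "__main__":
    print(f"implied latency, 0.6.2 run: {latency_062 * 1000:.0f} ms")
    print(f"implied latency, 0.6.1 run: {latency_061 * 1000:.0f} ms")
    print(f"extra per-request latency to explain: {extra_latency * 1000:.0f} ms")
```

In this model, the drop would be explained by roughly a third of a second of added latency per request. Whether the cross-region path actually adds that much would need to be measured, for example by re-running the bees from the same region as the stack.
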

You can check the numbers for prod/stage using the datadog dashboards that I have set up. I am still working on some of the detailed graphs, but the overall picture is there.

https://app.datadoghq.com/dash/46191/shavar

Use the $ShavarEnv dropdown in the upper-left to switch between stage and prod.

As it stands right now, we are averaging 34.63 nginx r/s on the prod stack.
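
For scale, that production rate can be compared with the "~1,000" reqs/sec reached in the stage loadtest. This is only a rough sketch: the stage figure came from bees in a different region, so it is an assumed sustainable rate rather than a measured ceiling, and the function name is hypothetical.

```python
def headroom_factor(sustained_rps: float, current_rps: float) -> float:
    """How many times the current load fits into the tested rate."""
    return sustained_rps / current_rps


PROD_NGINX_RPS = 34.63     # average nginx r/s on the prod stack
STAGE_LOADTEST_RPS = 1000  # "~1,000" max reqs/sec from the stage loadtest

if __name__ == "__main__":
    factor = headroom_factor(STAGE_LOADTEST_RPS, PROD_NGINX_RPS)
    print(f"tested rate is about {factor:.0f}x current production load")
```

On these numbers the tested rate is a bit under 30 times the current production load.
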
Flags: needinfo?(ckolos)
Please use this timeboard instead of the one linked above:

https://app.datadoghq.com/dash/53348
Thanks, Chris.  That's very helpful.  I'll try to re-run a few loadtests from us-west-1 today, but given the light traffic on production, even with the tests I did run, things look good.  Would you have time for a prod deployment tomorrow afternoon or thurs?
Flags: needinfo?(ckolos)
:rpapa I can deploy tomorrow, but I'd prefer Thursday...
Flags: needinfo?(ckolos)
Actually, thurs works best for me as well.  I'll send a calendar invite.
Do you prefer morning or afternoon?
Flags: needinfo?(ckolos)
:rpapa - no pref.
Flags: needinfo?(ckolos)