Closed Bug 1026644 · Opened 10 years ago · Closed 10 years ago

Deploy BrowserID-Verifier 0.2.2 to Stage

Category: Cloud Services :: Operations: Deployment Requests - DEPRECATED (task)
Tracking: Not tracked
Status: VERIFIED FIXED
Reporter: jbonacci · Assignee: mostlygeek
Whiteboard: [qa+]

Description • 10 years ago (Reporter)

Based on this fix: https://github.com/mozilla/browserid-verifier/pull/55
And the comments here: https://github.com/mozilla-services/puppet-config/issues/600

Updated • 10 years ago
Whiteboard: [qa+]

Comment 1 • 10 years ago

Version 0.2.2 tagged and ready to go; bug title updated accordingly.

Summary: Deploy latest BrowserID-Verifier to Stage → Deploy BrowserID-Verifier 0.2.2

Updated • 10 years ago (Assignee)
Assignee: nobody → bwong

Updated • 10 years ago (Assignee)
Summary: Deploy BrowserID-Verifier 0.2.2 → Deploy BrowserID-Verifier 0.2.2 to Stage

Updated • 10 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED

Comment 2 • 10 years ago (Reporter)

Verified the physical deployment to Stage. We are now running on two c3.large instances:
i-2fb8137d
i-ca6fbde1

Verified the version bump:
rpm -qa | grep verifier
fxa-browserid-verifier-svcops 0.2.2-1 x86_64 29249727

We don't have any quick deployment verification tests at this time, so moving on to load testing, keeping the following fixes/updates/comments in mind:
https://github.com/mozilla/browserid-verifier/pull/55
https://github.com/mozilla-services/puppet-config/issues/600

Comment 3 • 10 years ago (Reporter)

The 30-minute load test looked good for the first 20 minutes, then it started showing errors in the Loads dashboard:
Tests over: 337881
Successes: 337743
Failures: 138

I will investigate once the test is over...

Comment 4 • 10 years ago (Reporter)

Final stats:
Test was launched by: jbonacci
Run Id: 0721432e-b4b4-4db4-ab5a-ee121c3c2381
Duration: 30 min and 22 sec
Started: 2014-06-18 21:04:07 UTC
Ended: 2014-06-18 21:34:29 UTC
State: Ended
Users: [20]
Hits: None
Agents: 5
Duration: 1800
Server URL: https://verifier.stage.mozaws.net
Tests over: 442362
Successes: 442100
Failures: 0
Errors: 0
TCP Hits: 442362
Opened web sockets: 0
Total web sockets: 0
Bytes/websockets: 0
Requests / second (RPS): 242
addFailure: 262

REF: https://loads.services.mozilla.com/run/0721432e-b4b4-4db4-ab5a-ee121c3c2381

Comment 5 • 10 years ago (Reporter)

OK, so that addFailure count of 262 matches exactly the number of 503s on instance ec2-54-81-41-31. The other instance is clean.
:mostlygeek, can you take a look at this?

Comment 6 • 10 years ago (Reporter)

Forgot to add: in the verifier_err.log file on the same server, there are the following messages (262 of them ;-) ):

{"op":"bid.server","name":"bid.server","time":"2014-06-18T21:26:37.964Z","pid":2250,"v":1,"hostname":"ip-10-203-170-59","message":"too busy"}
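
For reference, these bunyan-style JSON lines are easy to tally with a few lines of Node. A minimal sketch, assuming one JSON object per line (as in the sample above) and a placeholder log path:

```js
// Count "too busy" entries in the verifier error log.
// Minimal sketch: the log path is a placeholder, and we assume
// one bunyan-style JSON object per line.
var fs = require('fs');

var lines = fs.readFileSync('verifier_err.log', 'utf8').split('\n');
var count = lines.filter(function (line) {
  if (!line.trim()) return false;   // skip blank lines
  try {
    return JSON.parse(line).message === 'too busy';
  } catch (e) {
    return false;                   // skip any non-JSON lines
  }
}).length;

console.log(count + ' "too busy" entries');
```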

Comment 7 • 10 years ago

> We don't have any quick deployment verification tests

That's what the `make test` target in the loadtest scripts is for :-)
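
Failing that, a quick smoke test can be as simple as POSTing a bogus assertion and checking that the service answers with well-formed JSON rather than a 503. A minimal sketch assuming the standard BrowserID verify interface; the endpoint path here is an assumption, not taken from this deployment's docs:

```js
// Hypothetical smoke test against the stage verifier: POST a bogus
// assertion and confirm we get a well-formed JSON "failure" response
// (a 503 here would suggest the host is unhealthy).
var https = require('https');

var body = JSON.stringify({
  assertion: 'not-a-real-assertion',
  audience: 'https://example.com'
});

var req = https.request({
  host: 'verifier.stage.mozaws.net',
  path: '/verify',                  // assumed endpoint path
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, function (res) {
  var data = '';
  res.on('data', function (chunk) { data += chunk; });
  res.on('end', function () {
    var answer = JSON.parse(data);
    // A bogus assertion should yield {"status":"failure", ...}
    console.log(res.statusCode, answer.status);
  });
});

req.on('error', function (err) { console.error('request failed:', err); });
req.end(body);
```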

Comment 8 • 10 years ago

This is reminiscent of Bug 996763 Comment 44: 503s with only 40% CPU utilization. I'm also surprised to see it failing on "toobusy" rather than from the computecluster. Any chance we've busted some configuration here? I'll dig in...

Comment 9 • 10 years ago

Config seems to be OK. Checking the log, all of the toobusy errors occurred in the space of <1 second, from 2014-06-18T21:26:37.861Z to 2014-06-18T21:26:38.616Z. The loadtest then continued successfully until 2014-06-18T21:33:25.008Z.

The toobusy module polls the event loop every 500ms, so this is likely a single latency spike triggering toobusy and then recovering quickly on the next poll. It's quite possible we just got unlucky here, dealt with it gracefully, and recovered quickly.

It looks like we're using the default toobusy maxLag of 70ms, which is quite small. Given the generally compute-heavy workload here, maybe we should consider bumping it upwards a little to smooth out these spikes.

CPU utilization of 40% is not awesome, though. Is this due to e.g. having more or bigger instances than in the previous test?
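
For context, this is roughly how toobusy-based load shedding is wired into an Express app. A generic sketch based on the toobusy module's documented API, not the verifier's actual source, and the maxLag value shown is purely illustrative:

```js
var express = require('express');
var toobusy = require('toobusy');   // samples event-loop lag every 500ms

// The default maxLag is 70ms; raising it lets the process tolerate
// longer latency spikes before it starts shedding load with 503s.
toobusy.maxLag(120);                // illustrative value, not the deployed setting

var app = express();

// Reject requests up front when the event loop is lagging; this is
// the kind of path that produces "too busy" 503 responses.
app.use(function (req, res, next) {
  if (toobusy()) {
    res.status(503).send('too busy');
  } else {
    next();
  }
});

app.listen(8080);
```

The tradeoff: a higher maxLag smooths over one-off spikes like this one, but under sustained overload it lets request latency climb higher before the server starts refusing work.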

Comment 10 • 10 years ago (Reporter)

And why only on one of the two instances?

Comment 11 • 10 years ago (Reporter)

Teamwork:
16:18 < rfkelly> jbonacci mostlygeek I think we can ship it
16:19 < mostlygeek> ship.it. :-)

Status: RESOLVED → VERIFIED

Comment 12 • 10 years ago

> And why only on one of the two instances?

Well, that's part of why I think it was just a random latency spike putting us over the line: it happened only once, on only one of the instances, and recovered straight away. I will file a puppet-config PR for a slightly higher maxLag setting.