Bug 803389
Opened 13 years ago
Closed 13 years ago
sync: server-storage stage deploy: server_storage -> 1.15-1, server_core -> 2.12-1
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Assigned: bobm)
Please deploy server-storage 1.15-1 and server-core 2.12-1 to the sync server stage environment. Build command:
make build PYPI=http://pypi.build.mtv1.svc.mozilla.com/simple PYPIEXTRAS=http://pypi.build.mtv1.svc.mozilla.com/extras PYPISTRICT=1 SERVER_STORAGE=rpm-1.15-1 SERVER_CORE=rpm-2.12-1 CHANNEL=prod RPM_CHANNEL=prod build_rpms
This version includes the memory usage fix from Bug 802486, as well as updated dependencies as described in the 1.14-1 deployment request from Bug 800254.
I would like to take it all the way through to production to see how much this helps with the memory-usage issues reported in Bug 799727. Relevant bugs that will be on their way to production for the first time:
* Bug 802486 - fix memory leak in SQLStorage._temp_cache
* Bug 693896 - use DELETE /storage in the loadtest
* Bug 624791 - MySQL "query execution interrupted" errors should produce a 503
* Bug 784567 - log reason for all authentication failures
* Bug 648607 - special-case handling of "lock wait timeout" errors
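For reference, the error handling named in Bug 624791 and Bug 648607 can be pictured roughly as follows. This is only a sketch, not the server-storage code: the function name `status_for_db_error` is hypothetical, and treating "lock wait timeout" the same way as "query execution interrupted" (both as a 503) is an assumption made for illustration. The numeric codes are the standard MySQL server error codes.

```python
# Hypothetical sketch: map transient MySQL errors to "503 Service Unavailable"
# so that clients back off and retry, instead of seeing a generic 500.

# Standard MySQL server error codes.
ER_LOCK_WAIT_TIMEOUT = 1205  # "Lock wait timeout exceeded; try restarting transaction"
ER_QUERY_INTERRUPTED = 1317  # "Query execution was interrupted"

# Assumption: both conditions are treated as temporary backend unavailability.
TRANSIENT_ERRORS = frozenset({ER_LOCK_WAIT_TIMEOUT, ER_QUERY_INTERRUPTED})


def status_for_db_error(errno):
    """Return the HTTP status to report for a MySQL error number."""
    if errno in TRANSIENT_ERRORS:
        return 503
    return 500
```

A caller would catch the database driver's exception, extract its error number, and pass it to this function when building the response.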
Ops notes:
* Please revert the gunicorn-syncstorage daemontools change from Bug 800254, so that it is using "-k gevent" like in production
QA notes:
* this is targeted for production, so we will need to arrange a full 48-hour loadtest.
* the loadtest script has been changed since last deployment, so we may see new failure modes that weren't exercised previously (Bug 693896)
Comment 2 • 13 years ago (Reporter)
Oh, also an additional config change please.
In /etc/sync/sync.conf, section [storage], there is currently a key named "hosts" which lists all the individual hostnames in use. Remove it.
This setting is no longer necessary, as the app loader scans through and finds all config sections named [host:BLAH] to build up the list of hostnames dynamically.
Assuming it works as intended, we should make a similar change in prod when we push out this version.
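To illustrate the idea of discovering hostnames from the section names, here is a minimal sketch. It uses Python's standard `configparser` as a stand-in for the app's own config loader, and the function name `discover_hosts`, the sample hostnames, and the `backend` and `sqluri` keys are all hypothetical; the actual loader may differ.

```python
import configparser

HOST_PREFIX = "host:"


def discover_hosts(config_text):
    """Collect hostnames from every section named [host:NAME]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [
        section[len(HOST_PREFIX):]
        for section in parser.sections()
        if section.startswith(HOST_PREFIX)
    ]


# Hypothetical config fragment: note there is no "hosts" key in [storage].
SAMPLE_CONF = """
[storage]
backend = sql

[host:stage-sync1.example.com]
sqluri = mysql://example/one

[host:stage-sync2.example.com]
sqluri = mysql://example/two
"""

hosts = discover_hosts(SAMPLE_CONF)
```

With this approach, adding or removing a [host:NAME] section is the only change needed; there is no separate list to keep in step.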
Comment 3 • 13 years ago (Reporter)
I'm adding two known issues with the stage environment as blockers for this bug.
They're not hard blockers, just pieces of the infra that are temporarily out-of-commission. Easy to work around during loadtesting. However, their absence will increase the load on the other machines and might lead to unnecessary loadtest failures.
Bob, please triage as you see fit, and I will add any necessary workarounds for loadtesting.
Depends on: 801470
Comment 4 • 13 years ago (Assignee)
Beginning build phase.
Comment 5 • 13 years ago (Assignee)
RPMs built. Investigating configuration file changes required in puppet.
Comment 6 • 13 years ago (Assignee)
Deployed.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 7 • 13 years ago (Reporter)
Bug 784098 reports that sync1.db.scl2.stage is back in service, but I still receive "server issue: database marked as down" from requests that target it. Can you please check if it needs to be re-enabled in zeus or something like that?
Comment 8 • 13 years ago
It looks like stage-sync[1-10] still had downed=1 set in mysql. I've set downed=0 for all 10.
Comment 9 • 13 years ago (Reporter)
Loadtesting of this push reveals a handful of 503s with the message:
"server error: converted 502"
"502 Bad Gateway" is nginx reporting that it can't connect to the gunicorn processes. I'll need to dig into what's going on here.
Comment 11 • 11 years ago
Well I guess we are beyond this.
Bounce if you need to...
Status: RESOLVED → VERIFIED