Closed Bug 803389 Opened 13 years ago Closed 13 years ago

sync: server-storage stage deploy: server_storage -> 1.15-1, server_core -> 2.12-1

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: bobm)

Please deploy server-storage 1.15-1 and server-core 2.12-1 to the sync server stage environment.

Build command:

    make build PYPI=http://pypi.build.mtv1.svc.mozilla.com/simple PYPIEXTRAS=http://pypi.build.mtv1.svc.mozilla.com/extras PYPISTRICT=1 SERVER_STORAGE=rpm-1.15-1 SERVER_CORE=rpm-2.12-1 CHANNEL=prod RPM_CHANNEL=prod build_rpms

This version includes the memory usage fix from Bug 802486, as well as updated dependencies as described in the 1.14-1 deployment request from Bug 800254. I would like to take it all the way through to production to see how much this helps with the memory-usage issues reported in Bug 799727.

Relevant bugs that will be on their way to production for the first time:

* Bug 802486 - fix memory leak in SQLStorage._temp_cache
* Bug 693896 - use DELETE /storage in the loadtest
* Bug 624791 - MySQL "query execution interrupted" errors should produce a 503
* Bug 784567 - log reason for all authentication failures
* Bug 648607 - special-case handling of "lock wait timeout" errors

Ops notes:

* Please revert the gunicorn-syncstorage daemontools change from Bug 800254, so that it is using "-k gevent" like in production.

QA notes:

* This is targeted for production, so we will need to arrange a full 48-hour loadtest.
* The loadtest script has been changed since the last deployment, so we may see new failure modes that weren't exercised previously (Bug 693896).
Taking this bug.
Assignee: nobody → bobm
Status: NEW → ASSIGNED
Blocks: 799727
Oh, also an additional config change please. In /etc/sync/sync.conf, section [storage], there is currently a key named "hosts" which lists all the individual hostnames in use. Remove it. This setting is no longer necessary, as the app loader scans through and finds all config sections named [host:BLAH] to build up the list of hostnames dynamically. Assuming it works as intended, we should make a similar change in prod when we push out this version.
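To illustrate why the "hosts" key becomes redundant, here is a minimal sketch of discovering hostnames from [host:BLAH] section names. It is not the actual app loader code; the function name discover_hosts, the sample hostnames, and the sample keys are all hypothetical, and Python's standard configparser stands in for whatever config parser the server really uses.

```python
import configparser

# Hypothetical sample resembling /etc/sync/sync.conf; hostnames and keys
# are made up for illustration and are not taken from the real config.
SAMPLE_CONF = """
[storage]
backend = sql

[host:stage-sync1.example.com]
sqluri = mysql://user@db1/sync

[host:stage-sync2.example.com]
sqluri = mysql://user@db2/sync
"""

HOST_PREFIX = "host:"


def discover_hosts(conf_text):
    """Return the hostnames named by [host:NAME] sections, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [
        section[len(HOST_PREFIX):]
        for section in parser.sections()
        if section.startswith(HOST_PREFIX)
    ]


if __name__ == "__main__":
    for host in discover_hosts(SAMPLE_CONF):
        print(host)
```

With this approach the list of hostnames is derived entirely from the per-host sections, so an explicit "hosts" key in [storage] would only duplicate information that can drift out of date.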
I'm adding two known issues with the stage environment as blockers for this bug. They're not hard blockers, just pieces of the infra that are temporarily out-of-commission. Easy to work around during loadtesting. However, their absence will increase the load on the other machines and might lead to unnecessary loadtest failures. Bob, please triage as you see fit, and I will add any necessary workarounds for loadtesting.
Depends on: 801470
Beginning build phase.
RPMs built. Investigating configuration file changes required in puppet.
Deployed.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Bug 784098 reports that sync1.db.scl2.stage is back in service, but I still receive "server issue: database marked as down" from requests that target it. Can you please check if it needs to be re-enabled in zeus or something like that?
It looks like stage-sync[1-10] still had downed=1 set in mysql. I've changed downed=0 for all 10.
Loadtesting of this push reveals a handful of 503s with the message "server error: converted 502". "502 Bad Gateway" is nginx reporting that it can't connect to the gunicorn processes. I'll need to dig into what's going on here.
What is the status here?
Priority: -- → P1
Well I guess we are beyond this. Bounce if you need to...
Status: RESOLVED → VERIFIED