Closed Bug 786569 Opened 13 years ago Closed 13 years ago

sync: server-storage stage deploy: server_storage -> 1.13-7, server_core -> 2.10-7

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Unassigned)

Details

(Whiteboard: [qa+])

Please deploy server-storage 1.13-7 and server-core 2.10-7 to stage sync server environments.
Build command:
    make build PYPI=http://pypi.build.mtv1.svc.mozilla.com/simple PYPIEXTRAS=http://pypi.build.mtv1.svc.mozilla.com/extras PYPISTRICT=1 SERVER_STORAGE=rpm-1.13-7 SERVER_CORE=rpm-2.10-7 CHANNEL=prod RPM_CHANNEL=prod build_rpms
This includes an experimental umemcache-based backend so that we can try to do some basic connection pooling, and is otherwise unchanged from 1.13-6. Please deploy it with the MozSvcWorker gunicorn worker enabled as described in Bug 786479 Comment 1.
Built and deployed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Light load is running, with a couple of failures which I will dig into. Interestingly, the greenlet-blocking detector is reporting lots of instances of blocking in os-level functions like this:
  File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 442, in spawn_worker
    worker.init_process()
  File "/usr/lib/python2.6/site-packages/services/gunicorn_worker.py", line 93, in init_process
    super(MozSvcWorker, self).init_process()
  File "/usr/lib/python2.6/site-packages/gunicorn/workers/ggevent.py", line 105, in init_process
    super(GeventWorker, self).init_process()
  File "/usr/lib/python2.6/site-packages/gunicorn/workers/base.py", line 102, in init_process
    self.run()
  File "/usr/lib/python2.6/site-packages/gunicorn/workers/ggevent.py", line 77, in run
    self.notify()
  File "/usr/lib/python2.6/site-packages/gunicorn/workers/base.py", line 66, in notify
    self.tmp.notify()
  File "/usr/lib/python2.6/site-packages/gunicorn/workers/workertmp.py", line 34, in notify
    os.fchmod(self._tmp.fileno(), self.spinner)
There is not much we can do about these and they don't seem to be causing any problems. We may just have to up the checking interval enough to exclude them.
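To illustrate why a longer checking interval would hide these reports, here is a minimal heartbeat-style sketch. It is not the detector shipped in the services package; `StallDetector`, `beat` and `stalled_for` are hypothetical names, and the injectable clock is only there to make the behaviour easy to check.

```python
import time


class StallDetector:
    """Hypothetical sketch: flag gaps between heartbeats longer than `interval`."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval      # seconds of silence tolerated before reporting
        self._clock = clock           # injectable so the sketch can be exercised without sleeping
        self._last_beat = clock()

    def beat(self):
        # Called by the event loop each time it regains control.
        self._last_beat = self._clock()

    def stalled_for(self):
        # Return the stall duration if it exceeds the interval, else None.
        elapsed = self._clock() - self._last_beat
        return elapsed if elapsed > self.interval else None
```

With a short interval, a brief blocking call between two heartbeats is reported. With a longer interval the same call goes unnoticed, at the cost of also missing genuinely problematic stalls shorter than the new threshold.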
Testing with no connection pooling produced very similar results to the ones encountered with python-memcached, as reported in Bug 786536 Comment 2. This is as expected since it would be using approximately the same number of connections to couchbase. In addition I am seeing some requests time out after approx 40 seconds. This is new, and at a first guess I'd say it's probably a bug in the connection manager I implemented on top of umemcache. Next I will try some connection pooling to see if I can remove the "proxy downstream timeout" errors.
Limiting things to one memcache connection per worker process, this version has sustained 10mins of loadtest, serving ~300qps with no errors. That's a very good start...
With three connections per worker process I also see no errors. With ten connections per worker process, I see regular occurrences of the "proxy downstream timeout" error. This seems to support our hypothesis that the loadtest errors from yesterday were caused by too many concurrent connections to couchbase, especially since the graphs of couchbase activity are pretty much identical between these runs. (This seems like a very low number of total connections: 10 connections x 4 workers x 4 webheads = 160. But it's hard to argue with the results.)
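For reference, the per-worker connection cap being tested here can be sketched roughly as below. This is an illustration only, not the connection manager in server-storage: `BoundedPool`, `reserve`, `connect` and `total_connections` are hypothetical names, and under gevent the stdlib queue would need to be monkey-patched or swapped for gevent's own queue.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical sketch: hand out at most `size` connections per process."""

    def __init__(self, connect, size, timeout=None):
        self._connect = connect              # factory for a new connection (assumed)
        self._timeout = timeout              # seconds to wait for a free slot; None waits forever
        self._slots = queue.Queue(maxsize=size)
        for _ in range(size):
            self._slots.put(None)            # None marks a slot with no open connection yet

    @contextmanager
    def reserve(self):
        conn = self._slots.get(timeout=self._timeout)  # raises queue.Empty on timeout
        try:
            if conn is None:
                conn = self._connect()       # open lazily, on first use of the slot
            yield conn
        except Exception:
            conn = None                      # discard a possibly broken connection
            raise
        finally:
            self._slots.put(conn)            # always give the slot back


def total_connections(per_worker, workers, webheads):
    # Upper bound on connections the whole tier can open to the backend.
    return per_worker * workers * webheads
```

With the figures from this comment, `total_connections(1, 4, 4)`, `total_connections(3, 4, 4)` and `total_connections(10, 4, 4)` give 16, 48 and 160 respectively, so the errors appear somewhere between 48 and 160 total connections.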
Reopening this one.... :bobm and :rfkelly to be consistent, we should be deploying to Dev as well... Dev Sync webheads (sync{2..5}.web.mtv1.dev) are running the following: python26-services-2.10-6 python26-syncstorage-1.13-6 or older (inconsistent)! Also, looks like sync1.web.mtv1.dev.svc is down?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Alternately (rather than follow the ts/aitc model), I can open a separate bug for Sync Dev Env...
Whiteboard: [qa+]
IIRC sync dev env is currently blocked due to some dependency problems, pending a rebuild with a new OS. In any case, I'm about to prep a new deploy request so no point hassling :bobm to get this one into dev :-)
Sounds good. Let's go with a fresh ticket then.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED