976729 - [apk] memcache for signer

It looks like django.core.cache.backends.memcached.MemcachedCache is not playing nicely with uwsgi. After handling 210 requests, each worker locks up and stops responding, and uwsgi reports the worker as 'busy'. Here is part of the gdb backtrace; I've attached the whole backtrace.

Frame 0x3a73f30, for file /opt/apk-signer/apk-signer/venv/lib/python2.6/site-packages/django/dispatch/dispatcher.py, line 270, in _remove_receiver (self=<Signal(use_caching=False, lock=<thread.lock at remote 0x7fc6c20bf6a8>, providing_args=set([]), sender_receivers_cache={}, receivers=[((38228048, 7195216), <weakref at remote 0x247db50>), (((49751120, 49833936), 7195216), <BoundMethodWeakref(weakFunc=<weakref at remote 0x2f87050>, deletionMethods=[<instancemethod at remote 0x2f6c460>], weakSelf=<weakref at remote 0x2f87158>, funcName='close', selfName='<django.core.cache.backends.memcached.MemcachedCache object at 0x2f72450>', key=(49751120, 49833936)) at remote 0x2f25750>), (((57907344, 49833936), 7195216), <BoundMethodWeakref(weakFunc=<weakref at remote 0x3735a48>, deletionMethods=[<instancemethod at remote 0x3731500>], weakSelf=<weakref at remote 0x3735af8>, funcName='close', selfName='<django.core.cache.backends.memcached.MemcachedCache object at 0x3739890>', key=(57907344, 49833936)) at remote 0x37398d0>...(truncated)

Strangely, if I run the worker as root, so that it is able to create bytecode files, the issue disappears. I've tested with pylibmc and that works without an issue. Pull request for it: https://github.com/mozilla/apk-signer/pull/19

Jason Thomas [:jason]

Assignee

Comment 7

11 years ago

Attached file gdb.txt

gdb bt full

Kumar McMillan [:kumar]

Reporter

Comment 8

11 years ago

This is the pylibmc patch: https://github.com/mozilla/apk-signer/commit/ba248a9ab116e018188137596ef94ef421e14c0b This should fix the issue.

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED
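For reference, here is a sketch of what moving from python-memcached to pylibmc typically looks like in a Django settings file. It is not the contents of the linked commit. The memcached address is a hypothetical placeholder. `MemcachedCache` and `PyLibMCCache` are both memcached backends that ship with Django 1.6, and the latter requires the pylibmc package to be installed.

```python
# Hypothetical settings.py fragment (not taken from the apk-signer commit).
CACHES = {
    "default": {
        # Previously: "django.core.cache.backends.memcached.MemcachedCache"
        # (backed by python-memcached).
        "BACKEND": "django.core.cache.backends.memcached.PyLibMCCache",
        # Hypothetical address; point this at the real memcached host(s).
        "LOCATION": "127.0.0.1:11211",
    }
}
```

Only the `BACKEND` string changes. Django's cache API (`cache.get`, `cache.set`, ...) stays the same, so application code should not need edits.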

Jason Thomas [:jason]

Assignee

Comment 9

11 years ago

The issues in comment 6 look related to this Django bug: https://code.djangoproject.com/ticket/21952
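The backtrace in comment 6 stops inside `Signal._remove_receiver`, which is a weakref callback. This sketch assumes that the failure mode is such a callback trying to take the signal's non-reentrant lock while the same thread already holds it. That reading is an assumption, not a summary of the ticket. `ToySignal` and `FakeCache` are hypothetical, simplified classes and are not Django's code. They show one way to avoid that kind of self-deadlock: the callback only sets a flag, and dead receivers are pruned later, while the lock is held normally.

```python
import threading
import weakref


class ToySignal:
    """Hypothetical, simplified signal dispatcher (not Django's actual code).

    Receivers are held through weak references, so connecting a bound method
    (for example a cache object's close()) does not keep that object alive.
    """

    def __init__(self):
        # threading.Lock is not reentrant: a thread that already holds it
        # blocks forever if it tries to acquire it again.
        self.lock = threading.Lock()
        self.receivers = []  # list of weakref.WeakMethod objects
        self._dead_receivers = False

    def _remove_receiver(self, ref):
        # Weakref callback. It can run whenever the referent is collected,
        # including while this same thread is inside `with self.lock:`.
        # Acquiring self.lock here could self-deadlock, so the callback only
        # records that cleanup is needed.
        self._dead_receivers = True

    def _clear_dead_receivers(self):
        # Must be called with self.lock held.
        if self._dead_receivers:
            self._dead_receivers = False
            self.receivers = [r for r in self.receivers if r() is not None]

    def connect(self, bound_method):
        ref = weakref.WeakMethod(bound_method, self._remove_receiver)
        with self.lock:
            self._clear_dead_receivers()
            self.receivers.append(ref)

    def send(self):
        with self.lock:
            self._clear_dead_receivers()
            live = [r() for r in self.receivers]
        # Call the receivers outside the lock.
        return [method() for method in live if method is not None]


class FakeCache:
    """Hypothetical stand-in for a cache backend whose close() is connected."""

    def close(self):
        return "closed"


if __name__ == "__main__":
    signal = ToySignal()
    cache = FakeCache()
    signal.connect(cache.close)
    print(signal.send())  # ['closed']
    del cache  # in CPython the callback fires here and only sets the flag
    print(signal.send())  # []
```

The design choice is to keep the weakref callback trivial and defer all list mutation to code paths that already acquire the lock in the normal order.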

Jason Thomas [:jason]

Assignee

Comment 10

11 years ago

Unfortunately, pylibmc only seems to delay the issue in comment 6: after about 2-3 days of uptime, the worker still becomes unresponsive.

Kumar McMillan [:kumar]

Reporter

Comment 11

11 years ago

Re-opening since it's still not fixed. Here are some solutions I see:

- get a dev to spend time backporting the Django 1.7 code to 1.6. It looks like someone else already started on this, but it will probably take a good chunk of time.
- try to use redis instead of memcache.
- disable caching altogether. This opens us up to replay attacks in Hawk, but that should be ok for what the signer does.
- downgrade to Django 1.5. This will be tricky since the signer was built for 1.6.
- periodically restart the workers :) I guess you're already doing that.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---
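Of the options in comment 11, disabling caching is the easiest to express in code. This is a minimal sketch of a settings.py fragment. It assumes that the signer gets its cache from the standard Django `CACHES` setting, since its actual settings module is not shown in this bug. Django's `DummyCache` implements the cache interface but stores nothing. Anything that relies on the cache would stop working, which includes the Hawk replay protection mentioned in comment 11.

```python
# Hypothetical settings.py fragment: disable caching entirely.
CACHES = {
    "default": {
        # DummyCache accepts get/set calls but never stores anything,
        # so every cache.get() is a miss.
        "BACKEND": "django.core.cache.backends.dummy.DummyCache",
    }
}
```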

Kumar McMillan [:kumar]

Reporter

Comment 12

11 years ago

Another workaround might be to switch to a multi-process WSGI setup rather than a multithreaded one. I guess it would use more memory.

Kumar McMillan [:kumar]

Reporter

Comment 13

11 years ago

Never mind comment #12; we were already running the APK signer as processes.

Jason Thomas [:jason]

Assignee

Comment 14

11 years ago

(In reply to Kumar McMillan [:kumar] (needinfo for quickness) from comment #11)
> - periodically restart the workers :) I guess you're already doing that

Yes, this is what we have in place right now, and it seems to be working okay.

Jason Thomas [:jason]

Assignee

Comment 15

11 years ago

Spoke with :kumar yesterday about this. I am going to configure Heka to push nginx logs to Kibana, so we have an idea of how many ISE 500 requests are being served because of uwsgi harakiri. We are also aiming to move APK Signer to Django 1.7 once it is available.

Status: REOPENED → ASSIGNED
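Heka and Kibana do the aggregation in this setup. As a rough local cross-check, this is a minimal sketch that tallies HTTP status codes (500, 499, 400, ...) from raw nginx access-log lines. It assumes nginx's default `combined` log format, and a custom `log_format` would need a different pattern. The function name `count_statuses` is hypothetical.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log_format:
# $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_STATUS = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Tally HTTP status codes from nginx access-log lines.

    Lines that do not match the combined format are skipped.
    Note that 499 is nginx's non-standard "client closed request" code.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED_STATUS.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Usage would be along the lines of `count_statuses(open("access.log"))[500]`, where the log path is a placeholder.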

Jeremy Orem [:oremj]

Updated

11 years ago

Summary: Memcache for APK Signer → [apk] memcache for signer

Jeremy Orem [:oremj]

Updated

11 years ago

Component: Server Operations: AMO Operations → Operations: Marketplace

Product: mozilla.org → Mozilla Services

Version: other → unspecified

Jason Thomas [:jason]

Assignee

Comment 16

11 years ago

controller.apk.firefox.com nginx logs are available at https://kibana.shared.us-west-2.prod.mozaws.net/#/dashboard/elasticsearch/PROD%20-%20APK%20HTTP%20Status
Signer nginx logs are available at https://kibana.shared.us-west-2.prod.mozaws.net/#/dashboard/elasticsearch/PROD%20-%20APK%20Signer%20HTTP%20Status

I don't see any ISE 500 requests, but there are a few 400 and 499 requests in the controller nginx logs. The signer nginx logs look okay.

Jason Thomas [:jason]

Assignee

Comment 17

11 years ago

Signer nginx logs look okay, no 500s. Closing this request out.

Status: ASSIGNED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED