Closed
Bug 976729
Opened 11 years ago
Closed 11 years ago
[apk] memcache for signer
Categories
(Cloud Services :: Operations: Marketplace, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kumar, Assigned: jason)
Attachments
(1 file: 45.24 KB, text/plain)
Let's add memcache to the APK Signer so we can check nonce values (bug 963141)
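For context, a nonce check against a shared cache can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the signer's actual code: `FakeCache`, `seen_nonce`, the key layout, and the 60-second window are all hypothetical, and `FakeCache.add` only mimics the add-if-absent behavior that a memcache client provides.

```python
import time


class FakeCache:
    """In-memory stand-in for a memcache client (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout):
        """Store the key only if it is absent or expired; return True if stored."""
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._store[key] = (value, now + timeout)
        return True


def seen_nonce(cache, sender_id, nonce, timestamp, window=60):
    """Return True if this (sender, nonce, timestamp) triple was already used."""
    key = "nonce:%s:%s:%s" % (sender_id, nonce, timestamp)
    # add() fails when the key already exists, which signals a replayed request.
    return not cache.add(key, True, timeout=window)


cache = FakeCache()
first = seen_nonce(cache, "client-a", "abc123", 1393000000)   # new nonce
second = seen_nonce(cache, "client-a", "abc123", 1393000000)  # same nonce again
```

The point of a shared cache here is that every worker process sees the same set of used nonces; a per-process dictionary like the stand-in above would not catch a replay handled by a different worker.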
Assignee
Updated•11 years ago
Assignee: server-ops-amo → jthomas
Updated•11 years ago
Priority: -- → P1
Assignee
Comment 1•11 years ago
Assignee
Comment 2•11 years ago
Memcache is configured on -dev. This will be enabled on stage and prod on the next push.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter
Comment 3•11 years ago
Jason, let us know when you're able to stabilize memcache on dev. I'm re-opening this so we don't lose track of it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 4•11 years ago
Assignee
Comment 6•11 years ago
It looks like django.core.cache.backends.memcached.MemcachedCache is not playing nicely with uwsgi. Each worker locks up and stops responding after it handles 210 requests, and uwsgi claims that the worker is 'busy'. Here is part of the gdb backtrace; I've attached the whole backtrace.
Frame 0x3a73f30, for file /opt/apk-signer/apk-signer/venv/lib/python2.6/site-packages/django/dispatch/dispatcher.py, line 270, in _remove_receiver (self=<Signal(use_caching=False, lock=<thread.lock at remote 0x7fc6c20bf6a8>, providing_args=set([]), sender_receivers_cache={}, receivers=[((38228048, 7195216), <weakref at remote 0x247db50>), (((49751120, 49833936), 7195216), <BoundMethodWeakref(weakFunc=<weakref at remote 0x2f87050>, deletionMethods=[<instancemethod at remote 0x2f6c460>], weakSelf=<weakref at remote 0x2f87158>, funcName='close', selfName='<django.core.cache.backends.memcached.MemcachedCache object at 0x2f72450>', key=(49751120, 49833936)) at remote 0x2f25750>), (((57907344, 49833936), 7195216), <BoundMethodWeakref(weakFunc=<weakref at remote 0x3735a48>, deletionMethods=[<instancemethod at remote 0x3731500>], weakSelf=<weakref at remote 0x3735af8>, funcName='close', selfName='<django.core.cache.backends.memcached.MemcachedCache object at 0x3739890>', key=(57907344, 49833936)) at remote 0x37398d0>...(truncated)
Strangely, if I run the worker as root and it is able to create bytecode files, the issue disappears.
I've tested with pylibmc and that works without an issue. See https://github.com/mozilla/apk-signer/pull/19 for the change.
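For readers unfamiliar with the two backends, switching Django from the python-memcached backend to the pylibmc one is normally a settings change along these lines. This is only a sketch of the general shape, not the contents of the pull request above; the server address is a placeholder assumption.

```python
# Sketch of a Django settings fragment; the server address is a placeholder.
CACHES = {
    "default": {
        # pylibmc-based backend that ships with Django, used in place of
        # django.core.cache.backends.memcached.MemcachedCache (python-memcached).
        "BACKEND": "django.core.cache.backends.memcached.PyLibMCCache",
        "LOCATION": "127.0.0.1:11211",  # assumed address, not from this bug
    }
}
```

The pylibmc package has to be installed in the virtualenv for this backend to load.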
Assignee
Comment 7•11 years ago
gdb bt full
Reporter
Comment 8•11 years ago
This is the pylibmc patch: https://github.com/mozilla/apk-signer/commit/ba248a9ab116e018188137596ef94ef421e14c0b
This should fix the issue.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee
Comment 9•11 years ago
The issues in comment 6 look related to this Django bug: https://code.djangoproject.com/ticket/21952
Assignee
Comment 10•11 years ago
Unfortunately, pylibmc only seems to delay the issue in comment 6. After about 2-3 days of uptime, the worker becomes unresponsive.
Reporter
Comment 11•11 years ago
Re-opening since it's still not fixed.
Here are some solutions I see:
- get a dev to spend time backporting the Django 1.7 code to 1.6. It looks like this got started by someone else but it probably will take a good chunk of time.
- try to use redis instead of memcache
- disable caching altogether. This opens us up to replay attacks in Hawk but that should be ok for what the signer does
- downgrade to Django 1.5. This will be tricky since the signer was built for 1.6
- periodically restart the workers :) I guess you're already doing that
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 12•11 years ago
Another workaround might be to switch to a multi-process wsgi setup rather than multithreaded. I guess it would use more memory.
Reporter
Comment 13•11 years ago
Never mind comment #12; we were already running the APK signer as processes.
Assignee
Comment 14•11 years ago
(In reply to Kumar McMillan [:kumar] (needinfo for quickness) from comment #11)
> - periodically restart the workers :) I guess you're already doing that
Yes, this is what we have in place right now, and it seems to be working okay.
Assignee
Comment 15•11 years ago
Spoke with :kumar yesterday about this. I am going to configure heka to push nginx logs to kibana so we have an idea about how many ISE 500 requests are being served due to uwsgi harakiri. We are also aiming to move APK Signer to Django 1.7 once it is available.
Status: REOPENED → ASSIGNED
Updated•11 years ago
Summary: Memcache for APK Signer → [apk] memcache for signer
Updated•11 years ago
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services
Version: other → unspecified
Assignee
Comment 16•11 years ago
controller.apk.firefox.com nginx logs are available at https://kibana.shared.us-west-2.prod.mozaws.net/#/dashboard/elasticsearch/PROD%20-%20APK%20HTTP%20Status
signer nginx logs are available at https://kibana.shared.us-west-2.prod.mozaws.net/#/dashboard/elasticsearch/PROD%20-%20APK%20Signer%20HTTP%20Status
I don't see any ISE 500 requests but there are a few 400 and 499 requests in the controller nginx logs. The signer nginx logs look okay.
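The same kind of check can be done by hand on a raw access log. The sketch below is not the heka/kibana pipeline described in this bug; it assumes nginx's default "combined" log format, and the sample lines, client addresses, and the `/sign` path are invented purely for illustration.

```python
import re
from collections import Counter

# Matches the start of an nginx "combined" log line and captures the status code.
STATUS_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def tally_statuses(lines):
    """Count HTTP status codes across access log lines; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines (not real log data from this service).
sample = [
    '10.0.0.1 - - [01/Apr/2014:10:00:00 +0000] "POST /sign HTTP/1.1" 200 512 "-" "curl/7.30"',
    '10.0.0.2 - - [01/Apr/2014:10:00:05 +0000] "POST /sign HTTP/1.1" 499 0 "-" "curl/7.30"',
    '10.0.0.1 - - [01/Apr/2014:10:00:09 +0000] "POST /sign HTTP/1.1" 200 512 "-" "curl/7.30"',
]
counts = tally_statuses(sample)
```

A 499 is nginx's own code for a client that closed the connection before the response was sent, which is why it can show up alongside timeouts without a matching 500.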
Assignee
Comment 17•11 years ago
Signer nginx logs look okay, no 500s. Closing this request out.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED