Closed
Bug 1247736
Opened 9 years ago
Closed 9 years ago
Please deploy tokenserver 1.2.20
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Unassigned)
References
Details
(Whiteboard: [qa+])
This version of tokenserver includes handling for a new "reset" event type from the FxA SQS stream, to help disconnect devices in a more timely manner after password reset:
Bug 1226094 - Notify tokenserver of password reset events in FxA
Please deploy. For QA'ing this in stage, we'll want to reproduce the steps in Bug 1206325 to test that devices are indeed disconnected after password reset in a timely manner.
(No particular urgency here though, if you want to wait until after the dockerization work in Bug 1245385 that's a-oh-kay)
Updated•9 years ago
QA Contact: kthiessen
Comment 1•9 years ago
Notice: I am in London doing Kinto work for the week of 15-19 February; my bandwidth for other tasks will be pretty limited.
Happy to schedule this work sometime after next week.
Comment 2•9 years ago
Looked at the changes. If the message's event type is not recognized, then it is deleted from the queue.
I propose these changes:
1. rename process_account_deletions.py to process_account_events.py
- requires puppet changes, but removes a YTF? in the future
2. if the message type is unrecognized, log it and ignore it.
- It will go back on the queue to be processed again.
- logging: write an event to stdout (mozlog), we'll get these into kibana
- increment a (datadog) statsd tokenserver.account_events.ignored metric
- this will be monitored and we'll be alerted
The reason for #2 is that servers will run different versions during a deployment. A server still on the old version could delete important messages it doesn't recognize, and we won't know why.
If we don't care about losing events, just ignore my above comment. :)
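For illustration, proposal #2 could look roughly like the sketch below. This is not the actual tokenserver code: `handle_message`, `KNOWN_EVENTS`, the `"event"` field name, and the in-memory `metrics` counter (standing in for a statsd client) are all assumptions. The idea is that the handler returns False for unrecognized types so the caller does not delete the message, and it becomes visible on the queue again.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("tokenserver.account_events")

# Hypothetical stand-in for a statsd client; a real deployment would
# ship this counter to datadog rather than keep it in memory.
metrics = Counter()

# Assumed event names for illustration; only "reset" is named in this bug.
KNOWN_EVENTS = {"delete", "reset"}


def handle_message(body):
    """Return True if the caller may delete the message from the queue."""
    event = json.loads(body)
    event_type = event.get("event")  # field name is an assumption
    if event_type not in KNOWN_EVENTS:
        # Log and count the message, but leave it on the queue so a
        # server running a newer version can still pick it up.
        logger.warning("ignoring unrecognized event type: %r", event_type)
        metrics["tokenserver.account_events.ignored"] += 1
        return False
    # ... the real work for a recognized event would go here ...
    return True
```

A caller would delete the SQS message only when `handle_message` returns True.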
Reporter
Comment 3•9 years ago
See:
* https://github.com/mozilla-services/tokenserver/pull/80/files
* https://github.com/mozilla-services/puppet-config/pull/1782
(Although I haven't tested the latter in combination with the former)
> If we don't care about losing events,
I don't care about losing these events (in fact they're already being sent by the fxa-auth-server, and being silently dropped by the current version of tokenserver).
Reporter
Comment 4•9 years ago
v1.2.19 includes the renaming linked above
Summary: Please deploy tokenserver 1.2.18 → Please deploy tokenserver 1.2.19
Comment 5•9 years ago
Now that bug 1245385 is done I can focus on rolling this out. Should be pretty straightforward.
Comment 6•9 years ago
Planning on rolling this out ASAP. Waiting on https://github.com/mozilla/browserid-verifier/pull/78 to merge to get some security patches from node 0.10.42.
This release also updates openresty to 1.9.7.3 which includes the glibc security fix.
Comment 7•9 years ago
New tokenserver boxes rolled out with:
- tokenserver 1.2.19 with the new SQS event handling code
- browserid-verifier 0.5.1
- openresty 1.9.7.3
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 8•9 years ago
I had to roll back this deployment. About an hour ago the new account SQS event processor started crashing. It looks like it doesn't sanitize data correctly, so Python throws an exception, and systemd eventually stops restarting the service.
I filed an issue against it here: https://github.com/mozilla-services/tokenserver/issues/85
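As a rough sketch of the kind of defensive handling that avoids this failure mode (not the actual fix that went into tokenserver): validate each message body before using it, and skip malformed ones instead of letting the exception kill the process. The names `parse_event` and `process_all`, and the `"event"` and `"email"` fields, are assumptions for illustration only.

```python
import json
import logging

logger = logging.getLogger("tokenserver.account_events")


def parse_event(body):
    """Return (event_type, email), or None if the body is malformed."""
    try:
        event = json.loads(body)
    except (ValueError, TypeError):
        # Not valid JSON, or not a string at all.
        logger.error("could not decode message body: %r", body)
        return None
    if not isinstance(event, dict):
        logger.error("message body is not a JSON object: %r", body)
        return None
    event_type = event.get("event")  # field names are assumptions
    email = event.get("email")
    if not isinstance(event_type, str) or not isinstance(email, str):
        logger.error("message is missing required fields: %r", body)
        return None
    return event_type, email


def process_all(bodies):
    """Parse every body, skipping bad ones so the loop keeps running."""
    handled = []
    for body in bodies:
        parsed = parse_event(body)
        if parsed is None:
            continue  # one bad message must not stop the processor
        handled.append(parsed)
    return handled
```

With this shape, a single bad message produces a log line rather than a crash loop under systemd.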
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 9•9 years ago
I've tagged 1.2.20 with a fix for the above issue.
Summary: Please deploy tokenserver 1.2.19 → Please deploy tokenserver 1.2.20
Comment 10•9 years ago
I think this is ready for another shot. :mostlygeek, want anything else before we deploy?
Flags: needinfo?(bwong)
Comment 11•9 years ago
Don't need anything. In my deploy queue. Planning on launching it today or tomorrow.
Flags: needinfo?(bwong)
Comment 12•9 years ago
I just rolled this out and it looks a lot better. I'm going to run a canary server in prod for a day to make sure no strange behavior pops up. After that I shall mark it verified.
Updated•9 years ago
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Comment 13•9 years ago
Just checked 1.2.20 in prod and no more crashes processing events.
No longer blocks: 1254734
Status: RESOLVED → VERIFIED
Comment 14•9 years ago
Note that 1.2.20 has not been fully rolled out yet. There's a mix of 1.2.17 and 1.2.20 servers in the wild right now. Once Bug 1254734 is deployed, 1.2.20 will be fully rolled out.