Please deploy tokenserver 1.4.4
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)
Tracking
(Not tracked)
People
(Reporter: rfkelly, Assigned: jrgm)
References
Details
Attachments
(1 file)
|
36.22 KB,
text/plain
|
Details |
As part of our effort to accurately report and track the timestamp at which a users keys were rotated, this release of tokenserver loosens some restrictions on the data that must be provided by OAuth sync clients. It's step (1) of 3 in the above-linked github issue.
To effectively QA this change, we should confirm that Desktop and the Lockwise mobile app can sync successfully with the user's account, both before and after changing the account password.
| Reporter | ||
Comment 1•6 years ago
|
||
Cross-linking for context, we need to have this stable in production before merging https://github.com/mozilla/fxa/pull/2570 on the FxA side of things.
| Reporter | ||
Comment 2•6 years ago
|
||
Whoops, I somehow managed to merge the intended commit into a different branch rather than into master, stand by for a v1.4.4 to fix it...
| Reporter | ||
Updated•6 years ago
|
Comment 3•6 years ago
|
||
tokenserver v1.4.4 is up on docker hub now
Updated•6 years ago
|
| Assignee | ||
Comment 4•6 years ago
|
||
Tokenserver v1.4.5 (note extra version bump to pick up a startup bug) is in stage. I've run a small loadtest on it, and that appears normal. Is there QA assigned to test the points in comment #0?
$ curl -s https://token.stage.mozaws.net/__version__ | python -mjson.tool
{
"build": "https://circleci.com/gh/mozilla-services/tokenserver/268",
"commit": "8d0df2b26a4a7bd3d47fa36284ee26aef9c39de2",
"source": "https://github.com/mozilla-services/tokenserver",
"version": "1.4.5"
}
$ curl -s https://token.stage.mozaws.net/ | python -mjson.tool
{
"auth": "https://token.stage.mozaws.net",
"browserid": {
"allowed_issuers": [
"api.accounts.firefox.com",
"api-accounts.stage.mozaws.net",
"api-accounts-dev.stage.mozaws.net",
"api-accounts.dev.lcip.org",
"api-accounts-latest.dev.lcip.org",
"nightly.dev.lcip.org",
"latest.dev.lcip.org",
"mockmyid.com",
"mockmyid.s3-us-west-2.amazonaws.com"
],
"trusted_issuers": [
"api.accounts.firefox.com",
"api-accounts.stage.mozaws.net",
"api-accounts-dev.stage.mozaws.net",
"api-accounts.dev.lcip.org",
"api-accounts-latest.dev.lcip.org",
"nightly.dev.lcip.org",
"latest.dev.lcip.org"
]
},
"oauth": {
"default_issuer": "api-accounts.stage.mozaws.net",
"scope": "https://identity.mozilla.com/apps/oldsync",
"server_url": "https://mock-oauth-stage.dev.lcip.org/v1/"
},
"services": {
"sync": [
"1.5"
]
}
}
$
| Reporter | ||
Comment 5•6 years ago
|
||
Is there QA assigned to test the points in comment #0?
I'm a bit out of the loop on tokenserver QA these days; Rebecca, is this still something you can help us wrangle?
Comment 6•6 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #5)
Is there QA assigned to test the points in comment #0?
I'm a bit out of the loop on tokenserver QA these days; Rebecca, is this still something you can help us wrangle?
Rebecca's on PTO.
Peter, can you help with this?
Comment 7•6 years ago
|
||
Hi, I can take this bug for verification. Please let me know what settings (prefs) should be made for testing on stage environment.
Comment 8•6 years ago
•
|
||
Thanks, Vasilica.
To test:
-
For Desktop - Follow these instructions for testing against staging.
-
For Lockwise iOS and Android - at the moment you'll need to do a custom local build of each one to test, with the staging sync url hardcoded into the builds. We're putting together more clear steps for you on how to do this, which we'll get to you shortly. In the meantime however, you should be able to start testing Desktop. Since they use the same components, we're comfortable moving forward testing one of these - not necessarily both.
Comment 9•6 years ago
|
||
Tested around Lockwise feature on Firefox 70.0.1 between Windows 10 64-bit and Mac OS X 10.14.6 profiles and all the passwords were successfully synced, both before and after changing the password (verified using two methods: changed from FxA settings and via forgot password flow using the recovery key)
Waiting for further instructions for android and ios, in order to verify also across this platforms.
Comment 10•6 years ago
|
||
Hi Vasilica,
Alrighty, so we've got some updated instructions for you to test iOS against staging. That should be enough to confirm we're good to go with this release, since both iOS and Android use the same shared rust components here.
And the good news here is you don't need to do any custom local builds. If you refer back to this document, you'll find there's updated instructions under "Using staging on ios". Follow those instructions to point firefox iOS to staging, and we should be good to go here.
Let me know if you need anything else on our end, and thanks so much!
| Assignee | ||
Comment 11•6 years ago
|
||
Adding ni? for rfk at his request: question is, is running with a canary of 1.4.5 and the rest on lower version a compatible mode, or is it preferable to not remain in "dual version" for very long?
| Reporter | ||
Comment 12•6 years ago
|
||
is running with a canary of 1.4.5 and the rest on lower version a compatible mode, or is it preferable to not remain in "dual version" for very long?
Thank :jrgm; I expect it will be OK do run in "dual mode". The main change here removes some consistency checks around generation numbers, because we expect that OAuth clients will soon start failing those checks. But that won't happen until we ship https://github.com/mozilla/fxa/pull/2570 to production in FxA, so I don't think there's any risk to have some tokenserver servers with the checks and some without in the meantime.
Comment 13•6 years ago
|
||
Seems that the password syncing does not work correctly when running the following steps:
- Make some changes on desktop
- Sync on desktop
- Change password on desktop (The user is not automatically logged out from fennec and there is no notification displayed)
- Go to Manage account on iOS - > relogin
- Sync on iOS
Actual results: passwords do not match
Notes: the passwords are synced after a log out/login on fennec + sync on iOS
I’ve also received a sync-error but I’m not sure if it is related to the above scenario.
Tested between Windows 10 64-bit and iOS 13.1.2.
Comment 14•6 years ago
|
||
Adding ni? for rfk here; Ryan, I can take a look too to try and confirm this is also an issue later today if that'd be helpful...
Vasilica; are you seeing this for passwords only, or all sync data between iOS and Desktop? And, just to confirm; could you paste your about:sync-log here as well to help us investigate this one?
Comment 15•6 years ago
|
||
Vasilica,
A few more clarifying questions for you here:
- When you say, "change password on desktop", are you talking about your FxA password?
- When you're testing against iOS, can you confirm you've followed the steps to point iOS to staging?
Comment 16•6 years ago
•
|
||
Ok, so I think I figured out what's going on here.
Some of the sync engines in mobile (the newer ones, using the Rust components) aren't wired up to allow pointing to stage. We at least still do have a few older sync engines ("history" being the one I've confirmed so far) that does allow overwriting the server in iOS to point to stage. So...what I think needs to be tested here (:rfk, would love your input as well) is that you can Sync across Desktop <-> say, iOS using say, History.
I've confirmed that syncing History DOES work against staging in iOS after changing FxA password (while say "bookmarks" do not, which would be explained by the fact that they're using Rust components in iOS).
Comment 17•6 years ago
|
||
(In reply to Rachel Tublitz [:tublitzed] from comment #16)
I've confirmed that syncing History DOES work against staging in iOS after changing FxA password (while say "bookmarks" do not, which would be explained by the fact that they're using Rust components in iOS).
I think this observation is correct. The code only changes the endpoints for those engines that are written in Swift and not for those in Rust. It is unclear if that was something we missed or if the Rust code is missing the option to set different endpoints. Looping in :garvan to take a peek at the code for a final conclusion.
How urgent is this?
| Reporter | ||
Comment 18•6 years ago
|
||
How urgent is this?
I don't think we should block this bug on fixing an issue in Firefox for iOS; leaving ni? myself to suggest an alternate QA path for the behaviour in question (and to try it to confirm that it works!)
Comment 19•6 years ago
|
||
In Firefox iOS, this is where the staging server URLs are specified
https://github.com/mozilla-mobile/firefox-ios/blob/master/Account/FirefoxAccountConfiguration.swift#L30
| Reporter | ||
Comment 20•6 years ago
|
||
:garvan did a bit of testing on Firefox for iOS and it appears that it can in fact successfully talk to the stage tokenserver, so I'm not sure what's going on here and will need to dig in further.
Some of the sync engines in mobile (the newer ones, using the Rust components) aren't wired up to allow pointing to stage
To add some more detail here: the rust sync-and-storage components themselves are capable of talking to stage or to self-hosted servers, but they have to be told explicitly to do so by the app. Most of the consumers of these components currently don't expose any facility for doing that, but it looks like Firefox for iOS does.
That said, it appears both Rachel and Garvan tried to sync bookmarks from Firefox iOS to stage, and the sync failed, so there's something going on here. I'll see if I can repro myself.
| Assignee | ||
Comment 21•6 years ago
|
||
So I'm holding on proceeding until some of the issues noted above are resolved or determined to be ok to proceed with. Let me know how things look, or if there is something you need from me in the stage environment.
| Assignee | ||
Comment 22•6 years ago
|
||
We can look at deploying tomorrow morning if it's all ready to go.
| Reporter | ||
Comment 23•6 years ago
|
||
I was able to configure our command-line logins sync demo client to talk to stage, and using that, was able to successfully sync logins between a rust-components client and desktop Firefox. I'm going to attach a screengrab of the session for completeness.
Everything synced correctly and demonstrated the behaviour I was expecting. In particular, I was able to manually edit the credentials being used by the rust-components client in order to give it an incorrect generation number, and tokenserver allowed it to proceed through to sync. This is the intended effect of the change being deployed here, and if I try the same trick on the production servers, tokenserver rejects it.
So I'm happy that functionality is working as expected here.
However, during my test session I did manage to trigger a single 503 Service Unavailabe from the tokenserver. It might have been a database blip, or it might have been some unhandled edge-case in the server code. :jrgm, could you please check the server logs for any corresponding traceback, at around 5 or 10 minutes prior to the timestamp of this comment?
| Reporter | ||
Updated•6 years ago
|
| Reporter | ||
Comment 24•6 years ago
|
||
Here's a recording of what I did for completeness: https://drive.google.com/open?id=1fXZUZnNncg8GKmNH42-53-1XVDExocK1
Comment 25•6 years ago
|
||
(In reply to Rachel Tublitz [:tublitzed] from comment #14)
Vasilica; are you seeing this for passwords only, or all sync data between iOS and Desktop?
This happens for all sync data and it seems more like a FxA issue. I also filed an issue in GitHub ( https://github.com/mozilla/fxa/issues/3283 ) for more investigations.
And, just to confirm; could you paste your
about:sync-loghere as well to help us investigate this one?
I already attached the about:sync-log file is Comment #13
When you say, "change password on desktop", are you talking about your FxA password?
Yes. The FxA password changed from Manage Account page
When you're testing against iOS, can you confirm you've followed the steps to point iOS to staging?
The environment was set following the instructions on "Using staging on ios" paragraph from https://docs.google.com/document/d/1KnFdE1ee-yv8En5QaPoRj3x2s9kTESoW0lGNh2LwIrI/edit#heading=h.q75o2k4vzjg
| Assignee | ||
Comment 26•6 years ago
|
||
The 503 details are from a failure on a request to https://mock-oauth-stage.dev.lcip.org
Nov 08 07:12:47 ip-172-31-44-212.ec2.internal docker-tokenserver[6010]: {"error": "ConnectionError(u"Verification request to https://mock-oauth-stage.dev.lcip.org/v1/ failed; reason: ('Connection aborted.', error(104, 'Connection reset by peer'))",)", "traceback": "Uncaught exception:
File "/app/tokenserver/views.py", line 202, in _validate_oauth_token
token = verifier.verify(token)
File "/app/tokenserver/verifiers.py", line 231, in verify
raise ConnectionError(msg)
<class 'browserid.errors.ConnectionError'>
ConnectionError(u"Verification request to https://mock-oauth-stage.dev.lcip.org/v1/ failed; reason: ('Connection aborted.', error(104, 'Connection reset by peer'))",)
", "time": "2019-11-08T07:12:47.550036Z", "v": 1, "message": "Unexpected verification error", "hostname": "ip-172-31-44-212.ec2.internal", "pid": 15, "op": "tokenserver", "name": "tokenserver"}
Nov 08 07:12:47 ip-172-31-44-212.ec2.internal docker-tokenserver[6010]: {"token.oauth.verify_failure": 1, "code": 503, "v": 1, "pid": 15, "agent": "reqwest/0.9.22", "token.oauth.connection_error": 1, "path": "https://token.stage.mozaws.net/1.0/sync/1.5", "tokenserver.oauth.verify": 0.00575709342956543, "name": "mozsvc.metrics", "request_time": 0.0065190792083740234, "remoteAddressChain": ["121.200.7.201", "172.31.42.210", "127.0.0.1"], "method": "GET", "time": "2019-11-08T07:12:47.550758Z", "hostname": "ip-172-31-44-212.ec2.internal", "op": "mozsvc.metrics"}
| Assignee | ||
Comment 27•6 years ago
|
||
So, I can't explain why but the above was a network error in invoking the api gateway for that endpoint. I don't see anything in the metrics for that api gateway or in the lambda handler that indicates it was having a problem. I can't explain why there'd be a network level error in this case. But I'm thinking this transient error is not a blocker to what rfk was demonstrating.
| Reporter | ||
Comment 28•6 years ago
|
||
But I'm thinking this transient error is not a blocker to what rfk was demonstrating.
I agree, this seems like a spurious error unrelated to the change in question.
| Assignee | ||
Comment 29•6 years ago
|
||
After chatting with some folks, this deploy to production will happen Monday morning PST.
Comment 30•6 years ago
|
||
Sounds good, thanks :jrgm!
| Reporter | ||
Comment 31•6 years ago
|
||
I already attached the about:sync-log file is Comment #13
The file attached in Comment 13 appears to be a copy of services/sync/modules/policies.js from mozilla-central rather than a sync error log - Vasilica any chance you still have the profile and are able to double-check whether this is indeed what was in the sync error logs for that run?
| Reporter | ||
Updated•6 years ago
|
Comment 32•6 years ago
|
||
Stefan, since Vasi is out PTO today, any chance you can get us the sync-log here?
| Assignee | ||
Comment 33•6 years ago
|
||
This was put in production this afternoon about 1530 PST. Note: that's with a purge task disabled, and I'll have to work out how to get that caught up but the main functionality is in place.
Comment 34•6 years ago
|
||
thanks, John
Comment 35•6 years ago
|
||
Thanks for covering QA Ica
Comment 36•6 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #20)
That said, it appears both Rachel and Garvan tried to sync bookmarks from Firefox iOS to stage, and the sync failed, so there's something going on here. I'll see if I can repro myself.
I tested the bookmarks using the spanner account in the production version of FF for iOS and I can't sync any bookmarks. It seems that the iOS bookmarks are not synced with other OS's and vice-versa.
Is there any ticket about this issue or should I open a new one?
Notes:
- History and Passwords are working properly.
Updated•6 years ago
|
| Reporter | ||
Comment 37•6 years ago
|
||
Is there any ticket about this issue or should I open a new one?
This is consistent with what Garvan observed above as well. Please do file a separate issue for us to dig in to this more. Thanks!
Description
•