Closed Bug 1480335 Opened 6 years ago Closed 8 days ago

Silently being disconnected from Sync

Categories

(Cloud Services :: Server: Firefox Accounts, defect)

defect
Not set
normal

Tracking

(firefox63 affected)

RESOLVED WORKSFORME
Tracking Status
firefox63 --- affected

People

(Reporter: markh, Unassigned)

Details

Attachments

(1 file)

Attached file sync-log.txt
Alex reported that he appeared to have been unexpectedly kicked out of Sync - completely disconnected rather than in a "please reconnect" state. His logs just show a startOver completing and we now delete old logs on disconnect, so haven't got a lot to go on.

Coincidentally, just a couple of hours later, one of my test profiles appeared to have the same thing happen. I'm not 100% sure of that the state of that test profile was, although I'm sure I didn't accidentally disconnect via the UI.

However, I did capture a log, which I'm attaching. It also doesn't have much to go on. It shows sync being initialized very quickly after startup, which implies session restore restored either about:preferences or about:sync.

Also worth noting:
* We've no external bug reports on this, so it's probably not widespread.
* It's vaguely possible about:sync causes this by triggering some race condition if it forces sync to init (eg, if loaded as part of session restore). I'm going to investigate this a little more.

This isn't particularly actionable at the moment, but thought I'd get it on file before I lose my log.

Log shows:

> console.log:  *** aboutsync:  starting up
> console.log:  *** aboutsync:  not installing aboutsync extension into window of type null
> 1533014484997   Sync.Service    INFO    Loading Weave 1.65.0
<snip sync initializing log entries, until>
> 1533014485076   Sync.Engine.Tabs.Tracker        TRACE   Registering tab listeners in window
> 1533014485076   Sync.Engine.Bookmarks.Tracker   TRACE   start().
> 1533014485076   FirefoxAccounts DEBUG   Notifying observers of fxaccounts:onlogout

oh oh - we've started disconnecting - even though sync only just finished initialzing. There's nothing in the log to indicate what kicked the disconnect off.

then snip lots of logs from the reset, then:
> 1533014493207   FirefoxAccounts DEBUG   FxAccountsWebChannel message received: fxaccounts:logout
> 1533014493207   FirefoxAccounts DEBUG   FxAccountsWebChannel message details: {"command":"fxaccounts:logout","data":{},"messageId":null}
> 1533014493664   FirefoxAccounts DEBUG   FxAccountsWebChannel message received: fxaccounts:loaded
> 1533014493664   FirefoxAccounts DEBUG   FxAccountsWebChannel message details: {"command":"fxaccounts:loaded","messageId":null}

so the webchannel also told us to log out - which implies accounts.firefox.com was also open. This seems odd and I can't reproduce it when manually disconnecting - but while there's nothing in the log to imply this was initiated remotely, it is a little strange.
> * We've no external bug reports on this, so it's probably not widespread.

Being logged out of a service is not usually a cause for alarm for people so I wouldn't expect many reports of this. 

If I was a user experiencing this, I'd just think that the session expiry lengths changed or something like that. I'd sign back in and make nothing of it. I'd be just surprised that it wasn't more obvious that I was signed out.
I'm struggling to reproduce this, but I have done it again while I had some diagnostics in place. The disconnection is definitely coming from a webchannel. I was starting the browser after changing the password on a different device (so was expecting the browser to enter a "please reauth" state), and the active tab being loaded was (I think) https://accounts.firefox.com/signup - it was certainly on accounts.firefox.com. I saw the logout happen with the following stack:

> async*signOut@resource://gre/modules/FxAccounts.jsm:853:12
> Async*logout/<@resource://gre/modules/FxAccountsWebChannel.jsm:338:16
> promise callback*logout@resource://gre/modules/FxAccountsWebChannel.jsm:334:12
> _receiveMessage@resource://gre/modules/FxAccountsWebChannel.jsm:161:9
> listener@resource://gre/modules/FxAccountsWebChannel.jsm:247:11
> deliver@resource://gre/modules/WebChannel.jsm:321:9
> _listener@resource://gre/modules/WebChannel.jsm:94:13
> MessageListener.receiveMessage*registerChannel@resource://gre/modules/WebChannel.jsm:43:7
> listen@resource://gre/modules/WebChannel.jsm:256:7
> BG__onFirstWindowLoaded@file:///C:/src/moz/gecko/obj-release/dist/bin/browser/components/nsBrowserGlue.js:983:5
> BG_observe@file:///C:/src/moz/gecko/obj-release/dist/bin/browser/components/nsBrowserGlue.js:434:9
> async*_delayedStartup@chrome://browser/content/browser.js:1584:5
> EventListener.handleEvent*onLoad@chrome://browser/content/browser.js:1388:5
> onload@chrome://browser/content/browser.xul:1:1

There are a number of complications here:

* There's often a race during browser startup between the browser webchannel mechanism initializing (which is in the above stack), the FxA webchannel initializing (which happens "on idle" after browser startup), and the content page being restored. So often, when restoring a session with accounts.firefox.com, the content page often sends a message before we are prepared to handle it.

* Similarly, the sync logs often aren't setup when this happens, so we don't see webchannel logging at this time. I added a dump() with an Error().stack to get the above dump.

I'm not sure which interactions are necessary to trigger this. Things I've rules out:

* I'm confident that on this profile, only one FxA user has ever been signed in. IOW, I don't think it's possible there is a user mismatch between local storage and fxa, for example.

* I instrumented the webchannel code so the fxaccounts:fxa_status response always said there was no user signed in, to rule out a race at startup where we may not yet know who is logged in. I didn't see the issue there.

Shane, can you please give me some idea about in what other circumstances we'd get a logout() webchannel message?

(And more generally, I wonder if we should have a discussion about the ability for a webchannel message to log us out at any time? But we can have that later at our leasure)
Flags: needinfo?(stomlinson)
:markh - the scenario you describe makes sense. 

Let's say you have 2 browsers, browser 1 and browser 2. Browser 1 has session restore enabled.

* In browser 1, open the /settings page from about:preferences#sync. Close the browser.
* In browser 2, you sign into the same FxA account as browser 1 and change the password.
* In browser 1, you open the browser. We do an `fxaccounts:fxa_status`. Now imagine
the web channel is ready, the browser responds with its view of the signed in user.
* FxA tries to check the sessionToken returned by the browser in the previous step, see the sessionToken is no longer valid, and sends the browser an "fxaccounst:logout" message.

In the 3rd step, imagine a 2nd scenario, the web channel is not ready. We'll wait 90 seconds for a response [1], showing the spinner during this time. Since you didn't mention the spinner being displayed and a blank screen, I'm assuming this didn't happen.

In the 3rd step, imagine a 3rd scenario, the web channel is ready but not set up correctly, the browser responds with an "invalid web channel" error. I think we should already be handling this [2][3], though there is a good chance we missed some cases. What should happen is that the browser responds with "invalid web channel" and we try to get the previous session from localStorage. Similar to the first scenario, it gets the session from localStorage, sees the session is no longer valid, and sends an "fxaccounts:logout" message, but the browser should reject the message because the channel isn't set up correctly.

I feel like the most likely scenario is the first, though I have no data to back this up.

[1] - https://github.com/mozilla/fxa-content-server/blob/379dc872480372dd3e767d6bfe849b13d88c2d97/app/scripts/lib/channels/duplex.js#L37
[2] - https://github.com/mozilla/fxa-content-server/blob/379dc872480372dd3e767d6bfe849b13d88c2d97/app/scripts/models/auth_brokers/base.js#L154
[3] - https://github.com/mozilla/fxa-content-server/pull/5117/files
Flags: needinfo?(stomlinson)
(In reply to Shane Tomlinson [:stomlinson|pto] from comment #3)
> :markh - the scenario you describe makes sense. 
> 
> Let's say you have 2 browsers, browser 1 and browser 2. Browser 1 has
> session restore enabled.
> 
> * In browser 1, open the /settings page from about:preferences#sync. Close
> the browser.
> * In browser 2, you sign into the same FxA account as browser 1 and change
> the password.
> * In browser 1, you open the browser. We do an `fxaccounts:fxa_status`. Now
> imagine
> the web channel is ready, the browser responds with its view of the signed
> in user.
> * FxA tries to check the sessionToken returned by the browser in the
> previous step, see the sessionToken is no longer valid, and sends the
> browser an "fxaccounst:logout" message.

Right - but that's very bad - we shouldn't kick the user out of sync in that case. The only time we should kick a user out of Sync via the web channel that I can think of is if the account was deleted.
Component: Sync → Server: Firefox Accounts
Product: Firefox → Cloud Services
We've talked in the past about adding more nuanced state management to sessionTokens, such as allowing for a state where the session token still is exists but is marked as "disconnected".  This bug could be another case where that would help us provide a better user experience.
(In reply to Mark Hammond [:markh] from comment #4)

> Right - but that's very bad - we shouldn't kick the user out of sync in that
> case. The only time we should kick a user out of Sync via the web channel
> that I can think of is if the account was deleted.

TBH, I'm trying to recall why we did it this way, some sleuthing needed.
Mark and I were discussion and wondered if it may be an accidental side-effect of FxA web content tearding down its own state upon discovering the invalid session token, rather than a deliberate intention to disconnect the browser.
> Mark and I were discussion and wondered if it may be an accidental side-effect of FxA web content tearding down its own state upon discovering the invalid session token, rather than a deliberate intention to disconnect the browser.

I reckon its a side effect of trying to keep two tabs open to /settings in sync. A logout in one tab should cause the other to logout too.
I *think* this behavior is the result of trying to handle the case of a user having a session on one device, they go to another device and delete the account. On the first device, the user goes to about:preferences#sync, see they are no longer signed in, load FxA. We see the session is invalid and notify the browser of the logout, then allow the user to sign up again. We did this because at the time the browser was unable to handle an fxaccounts:login message w/ a different uid.

Here's the background: 

* https://github.com/mozilla/fxa-content-server/issues/3057
* https://github.com/mozilla/fxa-content-server/issues/3283

Maybe with push notifications this is no longer a problem and the browser handles deleted accounts itself and we no longer have to send "logout" messages when we detect an invalid session.

Remembering this made my head explode. Too much history.
(In reply to Shane Tomlinson [:stomlinson|pto] from comment #9)
> I *think* this behavior is the result of trying to handle the case of a user
> having a session on one device, they go to another device and delete the
> account. On the first device, the user goes to about:preferences#sync, see
> they are no longer signed in, load FxA. We see the session is invalid and
> notify the browser of the logout, then allow the user to sign up again. We
> did this because at the time the browser was unable to handle an
> fxaccounts:login message w/ a different uid.

Thanks Shane - that makes sense, although that doesn't seem to apply here - while the session is invalid, the account remains valid.

> Maybe with push notifications this is no longer a problem and the browser
> handles deleted accounts itself and we no longer have to send "logout"
> messages when we detect an invalid session.

I don't think we should rely on push for this, but desktop already does a few checks for a deleted account. However, as above, I'm still not quite sure how that applies here as we aren't talking about an account deletion.

@mark - do you think it's worth pursuing this further? If so, do you have a recommendation on what should change?

Flags: needinfo?(markh)

I haven't seen this since, so let's close it.

Status: NEW → RESOLVED
Closed: 8 days ago
Flags: needinfo?(markh)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: