Open Bug 1469653 Opened 4 years ago Updated 11 months ago

Custom UA header causes FxA login to fail - red bar says "Working...", but never completes login.

Categories

(Cloud Services :: Server: Firefox Accounts, defect)

60 Branch
defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: trishaxuk, Unassigned)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393

Steps to reproduce:

Tried to login/sync my profile.  It's possible I may have deleted my device from the account (not sure if this was the beginning of my problems), as it was working before that.

Despite clearing all devices and starting from scratch, even going so far as completely deleting the sync account and creating a new one; I am unable to complete the sync login process.  

I tried some of the checks in the "about:support" page (I did quite a bit of digging around for a resolution before filing this).  The profile does not appear to have any obvious problems, but appears to mess up the sync login process.




Actual results:

It begins; then after about 3-4 seconds, A red "Working..." status, appears above the credentials fields, and a circle egg-timer animation continues in place of the login button.  The login/sync never completes (I gave it over an hour on more than one occasion).

Unsanitised sync-log attached (please sanitise if there is anything sensitive that can be extracted from it).

I'm guessing that the following is the crux of the issue; but unsure how to dig deeper.

If possible; can all methods of integrity checking and/or repairing the profile, be listed in one go (or one tool that does it all be identified), to save time, if this is the prescribed course).  - many thanks in advance.


Expected results:

The sync should have completed, and all content should have replicated to other linked devices, but through this experience, I realised that sync had stopped working a little while back (I think sometime in May if I was reading the logs correctly).
Duplicate of this bug: 1469658
https://token.services.mozilla.com/1.0/sync/1.5

Can you load the above link in either a new profile [1] or a different browser, or does the request time out?
Are you connecting to the Internet through a proxy?

[1] https://support.mozilla.com/kb/profile-manager-create-and-remove-firefox-profiles
Component: Untriaged → Operations
Flags: needinfo?(trishaxuk)
Product: Firefox → Cloud Services
(In reply to Gingerbread Man from comment #2)
> https://token.services.mozilla.com/1.0/sync/1.5
> 
> Can you load the above link in either a new profile [1] or a different
> browser, or does the request time out?
> Are you connecting to the Internet through a proxy?
> 
> [1]
> https://support.mozilla.com/kb/profile-manager-create-and-remove-firefox-
> profiles

No proxy.  This is direct.
The link opened (no timeout).

It showed a weird JSON thing:
{"status": "error", "errors": [{"location": "body", "name": "", "description": "Unauthorized"}]}

Headers tab thing showed:
Connection: keep-alive
Content-Length: 96
Content-Security-Policy: default-src 'none' ; script-src resource:; 
Content-Type: application/json
Date: Wed, 20 Jun 2018 09:12:13 GMT
X-Content-Type-Options: nosniff
X-Timestamp: 1529485933

Accept: text/html,application/xhtml+xml,image/jxr,*/*
Accept-Encoding: gzip, deflate, br
Accept-Language: en
Connection: keep-alive
Host: token.services.mozilla.com
Referer: https://bugzilla.mozilla.org/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393
Flags: needinfo?(trishaxuk)
BTW:  The raw of the weird JSON thing I pasted above, was the same as when the link was loaded in Chrome.

(The output above was from visiting the link you gave, in FF in the problem profile.)

Additionally - the other profiles are syncing to respective sync profiles without issue from the same browser install.
(In reply to Trisha Xuk from comment #3)
> The link opened (no timeout).
> 
> It showed a weird JSON thing:

That's the expected result. Then the NS_ERROR_NET_TIMEOUT error code in the log is puzzling and this doesn't belong in Operations. Moving to Firefox Sync: Backend for investigation.
Component: Operations → Firefox Sync: Backend
I've not seen a report like this and it's strange that visiting the URL manually works while it doesn't work correctly from Sync.

The 'red "Working..." status' mentioned in comment 0 is also strange - that's almost certainly from the login process and may be another symptom of the same problem.

(In reply to Trisha Xuk from comment #3)
> User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
> (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393

That's a strange user-agent string - could you please arrange for this to be reset to the default (https://support.mozilla.org/en-US/kb/how-reset-default-user-agent-firefox might help) and see if your problem goes away?

Another thing you could consider is a profile refresh - https://support.mozilla.org/en-US/kb/refresh-firefox-reset-add-ons-and-settings

Ryan, it seems unlikely, but can you see any reason a non-default UA string might cause problems in the login or token-fetch flows?
Flags: needinfo?(trishaxuk)
Flags: needinfo?(rfkelly)
> Ryan, it seems unlikely, but can you see any reason a non-default UA string might cause problems
> in the login or token-fetch flows?

It's unlikely; I've seen them cause trouble in the past when they included e.g. high unicode characters, but this one doesn't have anything too concerning in it that I can see.

> It begins; then after about 3-4 seconds, A red "Working..." status, appears above the credentials fields,
> and a circle egg-timer animation continues in place of the login button.  The login/sync never completes
> (I gave it over an hour on more than one occasion).

On the login page, you should be able to open the web console [1] to see more detailed logging from the login process.  If you do so, does it show any warnings or errors?

[1] https://developer.mozilla.org/en-US/docs/Tools/Web_Console/Opening_the_Web_Console
Flags: needinfo?(rfkelly)
> 1525944261092	Sync.Service	INFO	Starting sync at 2018-05-10 10:24:21 in browser session ILg_EGHc-HRp

Is that sync log from May 10th? Or is your client clock off by 40ish days?
(In reply to John Morrison [:jrgm] from comment #8)
> > 1525944261092	Sync.Service	INFO	Starting sync at 2018-05-10 10:24:21 in browser session ILg_EGHc-HRp
> 
> Is that sync log from May 10th? Or is your client clock off by 40ish days?

That seems to be the last (most recent) log.  I'm guessing it was recording it because it was still thinking it was linked/connected.  Maybe Once I disconnected/logged out; it's not creating a log (at least in this place associated to this profile)?
(In reply to Ryan Kelly [:rfkelly] from comment #7)
> > Ryan, it seems unlikely, but can you see any reason a non-default UA string might cause problems
> > in the login or token-fetch flows?
> 
> It's unlikely; I've seen them cause trouble in the past when they included
> e.g. high unicode characters, but this one doesn't have anything too
> concerning in it that I can see.
> 
> > It begins; then after about 3-4 seconds, A red "Working..." status, appears above the credentials fields,
> > and a circle egg-timer animation continues in place of the login button.  The login/sync never completes
> > (I gave it over an hour on more than one occasion).
> 
> On the login page, you should be able to open the web console [1] to see
> more detailed logging from the login process.  If you do so, does it show
> any warnings or errors?
> 
> [1]
> https://developer.mozilla.org/en-US/docs/Tools/Web_Console/
> Opening_the_Web_Console

raven.js was the script.  If I understand what I'm seeing; the hang was at line 360 
(screenshot here on ImgBB: https://ibb.co/cWOFN8)
Flags: needinfo?(trishaxuk)
Thanks!  The error message from the web console is "Response not received for: fxaccounts:can_link_account" which would definitely explain the "Working..." spinner on FxA.  Basically, the signin page asked the browser to confirm that the user wants to sign in, but didn't receive a reply back from the browser.

I've never seen this before - Mark, maybe profile corruption that's somehow preventing the browser from doing its "another user was previously logged in" check?
Thanks Ryan - You beat me to my additional comment...

Mark! - the UA was the guilty party (well the trigger here) - Thank you!!! 
I reset "general.useragent.override", which had the weird string from back when I was troubleshooting something for a legacy NHS app. a couple of years ago.  Strange that it caused an issue now.  Updated code somewhere?  Maybe raven.js:360 will help someone that knows the code?
Thanks to all of you for this.  If I can help further, I don't mind running additional tests of submitting some more data.

Stellar effort :)
I can't explain how the client would fail to send a response in this case. I do know the FxA server code has some funky checks to identify the UA at the other side of the login, but can't explain from these symptoms how that could be at fault (but I'll CC Shane anyway!).

I think we should keep this open to reproduce and get to the bottom of and see if there's anything we can do to mitigate it.
Summary: Some kind of minor corruption in a profile, is preventing firefox sync from completing. Red bar says "Working...", but never completes login. → Custom UA header causes FxA login to fail - red bar says "Working...", but never completes login.
It seems strange to me that the WebChannel request would fail because of the UA string. We do parse UA strings to determine UI next steps, but `context` should be all that's needed to determine which method to use to communicate with the browser. I can confirm this is the behavior though, I just tested with the user agent string provided in the first comment.

It's problems like this that make me extremely leery of trying to auto-determine the context query parameter [1] unless we can get access to Fx Desktop's UITour API to get the true browser version.


[1] - https://github.com/mozilla/fxa-content-server/issues/6317
Even more strange, in local development, there is no problem signing in. I tried to debug on prod using sourcemaps, a difficult proposition. Setting one breakpoint causes the breakpoint to trigger on every statement, I suspect the minified JS is causing the debugger to go awry.
Here is a clue from the Browser Console:

> WebChannelMessageToChrome sent with an object from a non-whitelisted principal  browser-content.js:1046
>	_onMessageToChrome chrome://global/content/browser-content.js:1046:11
>	init/< chrome://global/content/browser-content.js:1019:7
>	send/< https://accounts-static.cdn.mozilla.net/bundle-16f418bdf2737c0ef115e9eab723e924b7d94dab/app.bundle.en.js:1:257186

So the WebChannel message is being sent but being rejected because of a disallowed principal. I wonder if the principal checking code somehow depends upon the UA string?
The webchannel should only be getting strings - we removed FxA from that whitelist in bug 1346072.
(In reply to Mark Hammond [:markh] from comment #19)
> The webchannel should only be getting strings - we removed FxA from that
> whitelist in bug 1346072.

Oh, this is the golden nugget needed for debugging!

The problem lies at [1]. We check whether the user is in 
Fx Desktop >= 50 or Fx for Android >= 50 and then stringify
the WebChannel contents. If we can't tell which browser
is being used, we do not stringify the details.

I knew UA sniffing would bite us in the behind at some point.

We have no good way of getting the canonical UA version
from within FxA, having access to UITour or similar functionality
within all of the Foxes would allow us to get that info.

Maybe until that golden age arrives, we should reverse the
way we decide whether to stringify the detail field. Since
Fennec and Desktop both expect stringified objects by default,
make that the normal flow. If we detect Fx Desktop or Fx for
Android < 50, then send the original object through. 

This approach is still brittle, a UA munger such as the
one used for this could send a UA string that matches
Fx < 50 and the same problem would happen.

[1] - https://github.com/mozilla/fxa-content-server/blob/d95118004500ecd522061b4de08ace068e21ae37/app/scripts/lib/channels/senders/web-channel.js#L114
Status: UNCONFIRMED → NEW
Component: Firefox Sync: Backend → Server: Firefox Accounts
Ever confirmed: true
Duplicate of this bug: 1664059
You need to log in before you can comment on or make changes to this bug.