Closed Bug 1528562 Opened 5 years ago Closed 5 years ago

browser.identity.launchWebAuthFlow hangs after authentication

Categories

(WebExtensions :: General, defect, P1)

64 Branch
defect

Tracking

(firefox67 fixed, firefox68 verified)

RESOLVED FIXED
mozilla68
Tracking Status
firefox67 --- fixed
firefox68 --- verified

People

(Reporter: scott, Assigned: mixedpuppy)

Details

(Whiteboard: [qa-triaged])

Attachments

(7 files)

Attached file oauth-test.zip

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36

Steps to reproduce:

  1. Create a browser extension with a browser action
  2. Generate redirect URL for extension and output to console
  3. Create a new app for some service that allows OAuth2 authentication.
  4. Add URL as a valid redirect URL for some service that allows OAuth2 authentication (I was using LinkedIn)
  5. Add Client secret/Client ID to background script for use by launchWebAuthFlow()
  6. Add a background script listening for clicks on a browserAction
  7. When click on toolbar button, run a content script that adds a button to the current tab. (This step is unnecessary but I did it to simulate how my actual plugin is working)
  8. When button is clicked, send a message to the background script to perform authentication
  9. Use launchWebAuthFlow() to launch OAuth2 authentication, e.g.

const redirectUrl = browser.identity.getRedirectURL(),
url = 'https://www.linkedin.com/oauth/v2/authorization?'
+ 'response_type=code'
+ '&client_id=77nvyvxfkiobho'
+ '&redirect_uri=' + encodeURIComponent(redirectUrl)
+ '&state=asdfiou12345u6'
+ '&scope=w_share';

Actual results:

Authentication window pops up just fine and I am able to authenticate with the external service (LinkedIn). When the external service redirects back to the redirect_uri, the status bar shows activity - first it says "Waiting for linkedin...." and after about another 60s it shows "Connecting to 12341234.extensions.allizom.org..." and finally after about another 60s, the browser window finally closes and the promise is resolved successfully.

Expected results:

After authenticating successfully with the provider, the redirect back to the extension should be near immediate. The window should close and the promise should be resolved or rejected. It should not take 2+ minutes for this process. The same exact plugin works fantastic in Chrome.

I also tried this entire setup with the google-userinfo sample plugin and it does the same thing. The promise eventually resolves successfully and the extension is able to then get an access token and display my user info but it takes several minutes.

I have attached a zip of a sample plugin I wrote to show the problem. I used the exact same plugin in both Chrome and FF and it works great in Chrome but not in FF.

I also attached a full networking trace from about:networking. In this trace, I clicked the "Allow" button in the web auth window with LinkedIn (where the 2+ minutes starts) around 19:55. I am no expert in reading these traces but you will notice that it seems that what it is taking forever to do is resolve the 6b58f8236c5fabbc2af9a2329831c9e097dce470.extensions.allizom.org hostname with DNS. It does the following several times:

2019-02-16 19:57:53.566004 UTC - [Parent 59553: Socket Thread]: D/nsHostResolver Adding address to blacklist for host [6b58f8236c5fabbc2af9a2329831c9e097dce470.extensions.allizom.org], host record [0x135bf2160].used trr=0
2019-02-16 19:57:53.566020 UTC - [Parent 59553: Socket Thread]: D/nsHostResolver Successfully adding address [63.245.213.11] to blacklist for host [6b58f8236c5fabbc2af9a2329831c9e097dce470.extensions.allizom.org].
2019-02-16 19:57:53.566039 UTC - [Parent 59553: Socket Thread]: D/nsSocketTransport nsSocketInputStream::OnSocketReady [this=0x124002a88 cond=804b000e]
2019-02-16 19:57:53.566054 UTC - [Parent 59553: Socket Thread]: D/nsSocketTransport nsSocketOutputStream::OnSocketReady [this=0x124002ac0 cond=804b000e]

This final time above is right around when the window finally closes and the promise is resolved.

This bug is exactly the condition that is also described in this form entry: https://discourse.mozilla.org/t/browser-identity-launchwebauthflow-hangs-after-authentication/33186

I noticed that if I use the BrowserToolbox and put a breakpoint in the ext-identity.js on the first conditional in the state change listener below I am able to see that browser displays a "12341234.extensions.allizom.org... was taking too long to respond" message. I didn't see this originally b/c the state change listener immediately closes the window. But with the breakpoint in I was able to see it. So I think for sure that Firefox is trying to connect to this host rather than immediately intercepting the redirect before it ever gets the the networking engine and instead sending it straight back to the extension. Attached a screenshot.

Lastly I was also able to catch the network trace using Browser Toolbox and having the breakpoint in. You can see here that it is no linkedin that is the problem. It is the redirect to extensions.allizom.org that is the issue.

Here is the code where I was putting the breakpoint:

wpl = {
  onStateChange(progress, request, flags, status) {
    // "request" is now a RemoteWebProgressRequest and is not cancelable
    // using request.cancel.  We can however, stop everything using
    // webNavigation.
    if (request && request.URI &&   (<---- put breakpoint here)
      request.URI.spec.startsWith(redirectURI)) {
      window.gBrowser.webNavigation.stop(Ci.nsIWebNavigation.STOP_ALL);
      window.removeEventListener("unload", unloadlistener);
      window.gBrowser.removeProgressListener(wpl);
      window.close();
      resolve(request.URI.spec);
    }
  },
};

promiseDocumentLoaded(window.document).then(() => {
  window.gBrowser.addProgressListener(wpl);
  window.addEventListener("unload", unloadlistener);
});

I noticed that the first time this was called it was called with flags 0xF0001 and it is called for the initial POST to linkedin.com/authorization/... However, it is NEVER called on the redirect to 12341234.extensions.allizom.org. In other words the onStateChange is only called twice in this flow. The first time it is called with the START state 0xF0001 and after the 2+ minutes, it is finally called with 0xC0010. So it is never called when linkedin.com sends the redirect to extensions.allizom.org.

Component: Untriaged → DOM: Web Authentication
Product: Firefox → Core

This isn't using WebAuthn, rather having a problem doing OAuth from a WebExtension. I'm not really sure where to put this, so let me first go into WebExtensions:General and I'll poke on IRC.

Component: DOM: Web Authentication → General
Product: Core → WebExtensions

If you change the pref extensions.webextensions.identity.redirectDomain to use extensions.mozilla.org is your issue resolved?

Flags: needinfo?(scott)

We're experiencing the same issue, but with a different service provider. I can confirm that changing the extensions.webextensions.identity.redirectDomain pref to extensions.mozilla.org fixes the issue. Tested on both Ubuntu and Windows 10.

Will the fix require a change in the Firefox code or are there any other options (maybe some configuration changes in the manifest or some OAuth flow parameters) to make it work without the pref change?

You can work around this simply by using a get arg "redirect_uri" in the url you use for the launchWebAuthFlow call.

(In reply to Shane Caraveo (:mixedpuppy) from comment #13)

You can work around this simply by using a get arg "redirect_uri" in the url you use for the launchWebAuthFlow call.

Well, slight correction on that...if the service will user redirect_uri you can do that. The auth result must redirect to that uri, so the server has to work with the argument.

that works, but what is the plan for fixing this on the Firefox side? Is the extensions.allizom.org address going to replace extensions.mozilla.org at some point? Or is that just a typo in the Firefox code?

i.e. we can easily do this

browser.identity.getRedirectURL().replace('extensions.allizom.org','extensions.mozilla.org')

but I'm now sure how how reliable that is as a fix in production code

It was a work-around to get people unstuck.

Is DNS resolving for you on extensions.allizom.org and anything.extensions.allizom.org?

Flags: needinfo?(bridget.almas)

(In reply to Shane Caraveo (:mixedpuppy) from comment #16)

It was a work-around to get people unstuck.

Is DNS resolving for you on extensions.allizom.org and anything.extensions.allizom.org?

both resolve to 63.245.213.11 but traceroute requests timeout

traceroute to anything.extensions.allizom.org (63.245.213.11), 30 hops max, 60 byte packets
1 * * *
2 142.254.211.65 (142.254.211.65) 10.244 ms 10.927 ms 11.719 ms
3 agg62.schnny5202h.northeast.rr.com (24.58.200.117) 179.978 ms 180.489 ms 181.205 ms
4 24.58.34.234 (24.58.34.234) 14.057 ms 15.764 ms 14.860 ms
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

traceroute to extensions.allizom.org (63.245.213.11), 30 hops max, 60 byte packets
1 * * *
2 142.254.211.65 (142.254.211.65) 19.868 ms 20.824 ms 21.613 ms
3 agg62.schnny5202h.northeast.rr.com (24.58.200.117) 532.184 ms 532.820 ms 533.603 ms
4 24.58.34.234 (24.58.34.234) 22.323 ms 23.130 ms 23.958 ms
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

whereas extensions.mozilla.org resolves to 63.245.208.213 and traceroute completes

traceroute to extensions.mozilla.org (63.245.208.213), 30 hops max, 60 byte packets
1 * * *
2 142.254.211.65 (142.254.211.65) 11.309 ms 10.550 ms 12.061 ms
3 agg62.schnny5202h.northeast.rr.com (24.58.200.117) 28.403 ms 29.205 ms 30.019 ms
4 24.58.34.234 (24.58.34.234) 13.399 ms 14.218 ms 15.052 ms
5 be26.rochnyei01r.northeast.rr.com (24.58.32.52) 24.742 ms 20.740 ms 23.809 ms
6 bu-ether15.chcgildt87w-bcr00.tbone.rr.com (66.109.6.72) 36.830 ms 36.816 ms 39.979 ms
7 66.109.5.137 (66.109.5.137) 36.585 ms 40.414 ms 41.181 ms
8 * * *
9 ae-0-11.bar1.SanFrancisco1.Level3.net (4.69.140.145) 82.121 ms 83.926 ms 83.497 ms
10 QUALITY-TEC.bar1.SanFrancisco1.Level3.net (4.14.111.246) 84.257 ms 84.100 ms 84.495 ms
11 173.225.175.154 (173.225.175.154) 80.654 ms 173.225.175.158 (173.225.175.158) 82.328 ms 80.788 ms
12 65.74.145.155 (65.74.145.155) 82.987 ms 65.74.145.153 (65.74.145.153) 80.737 ms 80.542 ms
13 ethernet1-13_4094.fw1.untrust.mdc1.mozilla.net (63.245.208.18) 81.465 ms 63.245.208.22 (63.245.208.22) 81.015 ms ethernet1-13_4094.fw1.untrust.mdc1.mozilla.net (63.245.208.18) 80.767 ms
14 redirects-http-only.public.mdc1.mozilla.com (63.245.208.213) 82.123 ms 81.235 ms 81.115 ms

Flags: needinfo?(bridget.almas)

There are some services that are redirecting in a fashion that we are not catching. Github works fine, linkedin does not. I can see the redirect via webRequest.onBeforeRedirect, and the redirect is happening in the same document as main_frame. The webProgressListener does not.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(scott)
Priority: -- → P1

A little more detail.

Github does a 302 redirect from a get request to a get request.

Linkedin does a 303 redirect from a post request to a get request.

nsIChannelEventSink in WebRequest gets the 303, nsIWebProgressListener does not get a onStateChange call (thus the failure).

This seems to be aggravated by our use of allizom.org (where there is no server response) and (at least from one report) changing to mozilla.org works. That may indicate a server response of some kind happened and the progress listener ended up getting the onStateChange call.

These seem to be handled slightly different in a couple places, but I'm not clear on a lot of the docloader/nsidoc/webprogress interactions.

Lets see if :smaug can provide some guidance.

Flags: needinfo?(bugs)

I don't really know what is being asked here.
Could you perhaps give searchfox links to the relevant places possibly causing the issue or something?
Is this about https://searchfox.org/mozilla-central/rev/8ff2cd0a27e3764d9540abdc5a66b2fb1e4e9644/uriloader/base/nsDocLoader.cpp#1409 not calling nsIWebProgressListener object the way webextensions expect? If so, someone need to debug what is happening.

Flags: needinfo?(bugs)
Assignee: nobody → mixedpuppy

I've tracked it down a bit further now, hoping for some guidance on what to do here.

Handling the oauth authorization window fails in a particular case: the final redirect is a 303 with POST.

  • nsDocLoader is in fact calling the progress listener onStateChange for the redirect.
  • our listener uses gBrowser.addProgressListener, which in the end relies on nsBrowserStatusFilter.cpp
  • nsBrowserStatusFilter only fires onStateChange if STATE_IS_NETWORK [1]
  • in this case, flags do not have STATE_IS_NETWORK, but looks to have:
    • STATE_START
    • STATE_IS_REQUEST
    • STATE_IS_REDIRECTED_DOCUMENT [2]
  • If I remove the STATE_IS_NETWORK check in nsBrowserStatusFilter, the progress listener (in ext-identity.js) catches the redirect and the oauth authentications finishes.

I'm not sure if this should be fixed in docloader or the statusfilter, nor what the right fix would be (e.g. just adding STATE_IS_NETWORK where we set STATE_IS_REDIRECTED_DOCUMENT would work for my case).

I'm not clear who owns nsBrowserStatusFilter, so ni?ehsan since he's done resent work there.

[1] https://searchfox.org/mozilla-central/rev/3d469329a42644b41e66944d8da22d78d8c0f5ec/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp#164

[2] https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/uriloader/base/nsDocLoader.cpp#447

Flags: needinfo?(ehsan)
Flags: needinfo?(bugs)

I've only done some refactoring in this code and don't know much about it.

STATE_IS_REDIRECTED_DOCUMENT was added relatively(!) recently in bug 1342989. Based on that bug, it doesn't seem unreasonable to make nsBrowserStatusFilter not eat that status change if you need it. The whole purpose of nsBrowserStatusFilter is to compress unnecessary webprogress status updates that hurt performance, and I'd say if it's eating something that is causing a bug as a result, it is totally fine to relax its rules (as long as we're careful with performance regressions.) This status change message in particular doesn't seem like one that could result in a perf regression to me.

Flags: needinfo?(ehsan)

QE: STR is to install an extension I have that connects to linkedin (contact me directly for the extension). Login, then allow permission. At that point, we currently "hang" (ie. the window never closes). With the fix, the window is closed successfully.

Flags: qe-verify+
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da3353f475f6
support POST with 303 redirect in identity.launchWebAuthFlow r=rpl,Ehsan
Flags: needinfo?(bugs)
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ca9b52a04b08
support POST with 303 redirect in identity.launchWebAuthFlow r=rpl,Ehsan
Flags: needinfo?(mixedpuppy)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Comment on attachment 9057339 [details]
Bug 1528562 support POST with 303 redirect in identity.launchWebAuthFlow

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: identity api
  • User impact if declined: The webextensions identity api fails to work with oauth providers that use certain redirects (e.g. 303 after POST request), so extensions cannot use the api to request access with sites like LinkedIn
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Comment 25 has an str
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It's a late request, but the code change is very minimal.
  • String changes made/needed: none
Attachment #9057339 - Flags: approval-mozilla-beta?
Flags: qe-verify+ → qe-verify?

Let's have QA verify on Nightly first before evaluating an uplift to beta.

Flags: qe-verify? → qe-verify+
Whiteboard: [qa-triaged]

Verified using Nightly 68.0a1 (20190414214746) on Windows 10 x64 with the Linkedin extension provided by Shane, the login/close process is nearly instant, works as expected.

Flags: qe-verify+

Comment on attachment 9057339 [details]
Bug 1528562 support POST with 303 redirect in identity.launchWebAuthFlow

Low risk patch with tests, verified by QA on Nightly, uplift approved for 67 beta 12, thanks.

Attachment #9057339 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.