Closed Bug 1768429 Opened 3 years ago Closed 2 years ago

New tab => type internal site URL => hit enter => reopens in Work container => infinite spinner

Categories

(WebExtensions :: Developer Outreach, defect, P5)

Firefox 100
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: amos, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36

Steps to reproduce:

I'm using the official Multi-Account Containers extension to be able to visit both work sites and personal sites from the same browser window.

Because I often do Ctrl+T to open a new tab, and type in part of a URL (until it auto-completes) then press enter, I've accepted the "Always open this site in this (Work) container" option in the prompt that was shown to me after I right-clicked on a tab and chose "Open in New container tab => Work".

Most work sites work. No issues with our Slack subdomain for example. Or our Discourse instance. However, we have a site that's a Rails app, only accessible over VPN (happens to be Tailscale), that I often cannot open in Firefox.

I do Ctrl+T (this creates a tab "not in any container" afaict), type the URL, press enter.

Actual results:

This immediately closes the tab and re-opens a "Work container" tab (I can tell because the tab with no decorations disappears, and I can see the new tab animation for a tab with an orange overline). Then I see a loading spinner forever. The tab name is internal-domain/, the address bar shows internal-domain, and the dev tools don't show anything (no requests).

I can leave it to "load" for minutes with no changes. I suspect it's not doing any loading at all. I searched for ways to enable more verbose logging but the only promising thing I could find on Windows was the -console command-line flag, which opens a console but doesn't print anything. I captured Windows API calls with ProcMon and nothing jumped out.

Expected results:

If closing & opening a new tab in the UI is what it takes to transition from "no container" to "Work container", then so be it. But then, the site should load. It loads in Google Chrome, it loads in Microsoft Edge, and I'm able to curl it with no issues.

At this point I believe it's a race condition between the UI and the browser engine, because what the page is trying to do (as far as I can tell from the curl I ran) is /immediately/ redirect to a Google login page.

So I imagine what's happening is that the redirect starts getting processed /before/ Multi-Account Containers even has a chance to re-open it into a new tab. Then in the process of closing the original tab / opening a new one, maybe the redirect is "cancelled"? But the UI still thinks the new tab is loading, so it shows a loading spinner, when nothing is actually loading.

It's very hard for me to tell much of anything else, otherwise I would've opened a bug earlier. You can probably reproduce this by having a very fast local site that redirects to some slow page, and reproduce the same Work container set up.
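For anyone trying the reproduction suggested above, here is a minimal sketch of a local site that answers every request with an immediate 302, using only Python's standard library. It is a hypothetical fixture, not the reporter's Rails app. The port and `REDIRECT_TARGET` are placeholders, and the 302 status mirrors what the reporter says later in the thread about the real site.

```python
"""Minimal local site that immediately answers every GET with a 302.

Hypothetical fixture for trying to reproduce the bug; REDIRECT_TARGET is a
placeholder, not the real login page.
"""
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder target: point this at whatever (slow) HTTPS page you want to test.
REDIRECT_TARGET = "https://accounts.example.com/login"


class ImmediateRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply straight away with a bodiless 302, like the internal site did.
        self.send_response(302)
        self.send_header("Location", REDIRECT_TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        # Silence per-request logging; delete this override to see requests.
        pass


def make_server(port=8080):
    """Bind to localhost only; port=0 picks a free ephemeral port."""
    return HTTPServer(("127.0.0.1", port), ImmediateRedirectHandler)
```

Run `make_server().serve_forever()` and open `http://127.0.0.1:8080/` from a tab that Multi-Account Containers reopens in a container. As noted later in the thread, the real behaviour may depend on the domain names involved, so a localhost fixture may well not trigger it.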

Note that if I first open one of the "Work container only" sites that /work/, and /then/ navigate to that internal site from that tab, it loads just fine. If I do that once, then the session is "fixed" and I'm able to close all tabs and open the internal site as usual. Presumably because it sets session cookies.

That's all I got.

The Bugbug bot thinks this bug should belong to the 'Firefox::Tabbed Browser' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Tabbed Browser

(In reply to Amos Wenger from comment #0)

Most work sites work. No issues with our Slack subdomain for example. Or our Discourse instance. However, we have a site that's a Rails app, only accessible over VPN (happens to be Tailscale), that I often cannot open in Firefox.

Does this site not have a . in the hostname? And if so, if you instead:

I do Ctrl+T (this creates a tab "not in any container" afaict), type the URL, press enter.

Ctrl+T, followed by http://internal-domain/ (or https if supported I guess), followed by enter, does that behave any differently/better?

This immediately closes the tab and re-opens a "Work container" tab (I can tell because the tab with no decorations disappears, and I can see the new tab animation for a tab with an orange overline). Then I see a loading spinner forever.

loading spinner where? In the content area? In the tabstrip? Both?

The tab name is internal-domain/, the address bar shows internal-domain, and the dev tools don't show anything (no requests).

If you open the devtools console and execute window.location?.href and document.documentURI, what's the output?

I can leave it to "load" for minutes with no changes. I suspect it's not doing any loading at all. I searched for ways to enable more verbose logging but the only promising thing I could find on Windows was the -console command-line flag, which opens a console but doesn't print anything. I captured Windows API calls with ProcMon and nothing jumped out.

You probably want https://firefox-source-docs.mozilla.org/networking/http/logging.html . That would at least clarify if (a) there is more than 1 request, and (b) if there is any difference between the redirect that results from reopening the page in the "work" container, and a request made directly in the "work" container (in a new session).
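As a sketch of what that page describes: HTTP logging is switched on through the `MOZ_LOG` and `MOZ_LOG_FILE` environment variables before Firefox starts. The launcher below assumes a default Windows install path (`FIREFOX_PATH` is a placeholder) and uses the module preset from the HTTP logging docs; check the docs for the current recommendation.

```python
"""Launch Firefox with HTTP logging enabled via MOZ_LOG / MOZ_LOG_FILE.

Assumptions: FIREFOX_PATH is a placeholder install location; LOG_MODULES is
the preset suggested in Mozilla's HTTP logging documentation.
"""
import os
import subprocess
import tempfile

FIREFOX_PATH = r"C:\Program Files\Mozilla Firefox\firefox.exe"  # adjust for your install

# timestamp: prefix lines with times; rotate:200 caps total log size (in MB);
# the nsHttp/cache2/socket/DNS modules are logged at verbose level 5.
LOG_MODULES = "timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5"


def logging_env(log_file):
    """Copy the current environment and add the two logging variables."""
    env = dict(os.environ)
    env["MOZ_LOG"] = LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch(log_file=None):
    """Start Firefox with logging; defaults the log to the temp directory."""
    log_file = log_file or os.path.join(tempfile.gettempdir(), "log.txt")
    return subprocess.Popen([FIREFOX_PATH], env=logging_env(log_file))
```

Fully quit Firefox before calling `launch()`, otherwise the already-running instance may be reused and the variables ignored. The same docs also describe `about:logging`, which can enable logging in a running browser.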

because what the page is trying to do (as far as I can tell from the curl I ran) is /immediately/ redirect to a Google login page.

Redirect how? <meta refresh, or 301/302, or <script>location.href = ... or something else?
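One way to answer that question without a browser is to fetch the page once without following redirects and look at what comes back. The sketch below uses `http.client`, which never follows redirects on its own; the regex checks for `<meta refresh>` and script-based navigation are rough heuristics, not an HTML parser.

```python
"""Fetch a URL once (no redirect following) and guess how it redirects."""
import http.client
import re
from urllib.parse import urlsplit

# Heuristics only: they can miss unusual markup or flag unrelated scripts.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)
SCRIPT_NAV = re.compile(r'location(\.href)?\s*=|location\.(replace|assign)\(', re.IGNORECASE)


def classify_redirect(status, location, body):
    """Return a short description of the redirect mechanism, if any."""
    if 300 <= status < 400 and location:
        return f"HTTP {status} -> {location}"
    if META_REFRESH.search(body):
        return "meta refresh"
    if SCRIPT_NAV.search(body):
        return "script navigation (heuristic)"
    return "no redirect detected"


def fetch_once(url):
    """Issue a single GET and return (status, Location header, body prefix)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read(65536).decode("utf-8", errors="replace")
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location, body
```

For example, `classify_redirect(*fetch_once("http://internal-domain/"))`, where `internal-domain` stands in for the real host.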

Note that if I first open one of the "Work container only" sites that /work/, and /then/ navigate to that internal site from that tab, it loads just fine. If I do that once, then the session is "fixed" and I'm able to close all tabs and open the internal site as usual. Presumably because it sets session cookies.

If you long-press or right-click the new tab button and open a fresh "work" tab directly and directly open the problematic site, does it work?

For the session cookies, do things work if you pin a tab with this site before closing and reopening Firefox (so the pinned tab will load in the correct container immediately)? If so, that might be a workaround in terms of your current experience while we figure out what is causing the original issue.

Flags: needinfo?(amos)

I can answer some of these immediately, I'll answer in more detail later:

  • The domain has four parts (a.b.c.net). Adding http:// before the domain name by hand (before pressing enter) doesn't change anything.
  • Loading spinner in the tabstrip (where the favicon would be). The content area is blank, although a darker shade of white than the real about:blank.
  • Both window.location?.href and document.documentURI are set to about:blank
  • I'll try HTTP logging later
  • Redirect via HTTP 302 response
  • Opening a work tab manually /then/ navigating to a.b.c.net doesn't work as a workaround.

Also, I tried my workaround again:

Note that if I first open one of the "Work container only" sites that /work/, and /then/ navigate to that internal site from that tab, it loads just fine. If I do that once, then the session is "fixed" and I'm able to close all tabs and open the internal site as usual. Presumably because it sets session cookies.

And I can't get it to work again, so it doesn't seem like a reliable workaround. (I expected the "opening a work tab manually" workaround to also work, given that).

Flags: needinfo?(amos)

Here's a video of what I see:

In the video I first:

  • Do what I usually do: Ctrl+T, type domain, press enter
  • Then open a Work container tab manually (you can't see the context menu as it's a different window and I used OBS's window capture) then type domain and press enter. I show that the URL in this case is about:newtab
  • I do what I usually do again and show that the URL in this case is about:blank

Sorry for the delay, I had to check with work that it was ok to post a video with this domain, and take the time to redact other completions.

OK, if this doesn't work from the URL bar in a tab that's already in the right container, I think something is going wrong either in networking code or in process switching to get the right process for the target domain. Moving to networking as a first stab; I'm not sure how best to investigate further - I'm hoping Kershaw or Nika can make further suggestions. Some more high-level troubleshooting ideas follow, in case they bring up anything relevant:

  • anything in the browser console when this happens? (NOT the regular devtools, the browser one - ctrl-shift-j to open on Windows. You could also potentially see more of the network shenanigans with the browser toolbox if you were so inclined)

  • I assume you've already tried to reproduce with all your add-ons except the multi-account containers one disabled (ie excluding ublock / privacy badger / whatever you've got installed anti-tracking/adblock-wise from the list of potential culprits)?

  • is the VPN/tailscale thing enabled 100% of the time (ie for the entire machine / all of Firefox), or container/site-specific ?

  • is the target, internal site using any "interesting" networking stuff (http2/http3 not done by apache or other mainstream webservers, perhaps?)

And when looking at this:

At this point I believe it's a race condition between the UI and the browser engine, because what the page is trying to do (as far as I can tell from the curl I ran) is /immediately/ redirect to a Google login page.

vs.

Note that if I first open one of the "Work container only" sites that /work/, and /then/ navigate to that internal site from that tab, it loads just fine. If I do that once, then the session is "fixed" and I'm able to close all tabs and open the internal site as usual. Presumably because it sets session cookies.

vs.

And I can't get it to work again, so it doesn't seem like a reliable workaround. (I expected the "opening a work tab manually" workaround to also work, given that).

This is all pretty mysterious - is it possible that it only starts working when you've signed in within the "work" container? (not explicitly listed in your steps as navigating from an internal site also including login - maybe it's implicit, just checking!) And are there other sites that depend on this sign-on that do work, in the same container? Do they feature the same redirect-to-google thing when not signed in?

Is the google login page definitely not set up to be forced into a different container, and/or is the "work" container limited to domains which excludes the google login one? (you can check by clicking the container toolbar icon, then "manage containers", then the container(s) in question, then check the state of "Limit to Designated Sites" toggle and/or "Manage site list...")

Have you tried clearing cookies for the target site and/or the google signin domains? (if you decide to do this now, it may be useful to back up your profile dir if you're interested in further figuring out what was tripping this, in case it fixes it - after all, once cleared, the cookies are gone so it might be tricky to get back into this state)

Component: Tabbed Browser → Networking
Flags: needinfo?(nika)
Flags: needinfo?(kershaw)
Flags: needinfo?(amos)
Product: Firefox → Core

(In reply to :Gijs (he/him) from comment #5)

  • anything in the browser console when this happens? (NOT the regular devtools, the browser one - ctrl-shift-j to open on Windows. You could also potentially see more of the network shenanigans with the browser toolbox if you were so inclined)

Nothing in the browser console, nothing in the browser toolbox's console either.

  • I assume you've already tried to reproduce with all your add-ons except the multi-account containers one disabled (ie excluding ublock / privacy badger / whatever you've got installed anti-tracking/adblock-wise from the list of potential culprits)?

I have not 🙃 Disabling everything /but/ multi-account containers seems to fix it.

If I re-enable, in order:

  • uBlock origin: still working
  • 1password: still working
  • Dark Reader: still working
  • DuckDuckGo Privacy Essentials: broken

Re-enabling all except DDG Privacy Essentials seems to let me load the site consistently. With DDG PE enabled, it fails most of the time. (I'm still suspecting a race condition somewhere, there's maybe ~20% of sessions that do work from the get-go).

  • is the VPN/tailscale thing enabled 100% of the time (ie for the entire machine / all of Firefox), or container/site-specific ?

Entire machine all of the time.

  • is the target, internal site using any "interesting" networking stuff (http2/http3 not done by apache or other mainstream webservers, perhaps?)

I wish! Boring old HTTP/1.1.

This is all pretty mysterious - is it possible that it only starts working when you've signed in within the "work" container? (not explicitly listed in your steps as navigating from an internal site also including login - maybe it's implicit, just checking!) And are there other sites that depend on this sign-on that do work, in the same container? Do they feature the same redirect-to-google thing when not signed in?

(Re-enabling DDG PE to test)

There are other sites that require Google login, but they don't seem to have any effect on admin.vpn.flyio.net failing to load. In fact, I was able to log into another site, then from that same Work tab type admin.vpn.flyio.net and press enter, and it's stuck on an infinite spinner again. So that workaround is not reliable either.

Is the google login page definitely not set up to be forced into a different container, and/or is the "work" container limited to domains which excludes the google login one? (you can check by clicking the container toolbar icon, then "manage containers", then the container(s) in question, then check the state of "Limit to Designated Sites" toggle and/or "Manage site list...")

Definitely not forced. Another site (internal Grafana) requires Google login and the full flow happens in the same Work container, no switching/new tabs involved.

Have you tried clearing cookies for the target site and/or the google signin domains? (if you decide to do this now, it may be useful to back up your profile dir if you're interested in further figuring out what was tripping this, in case it fixes it - after all, once cleared, the cookies are gone so it might be tricky to get back into this state)

I'm not sure exactly how to clear all of Google login's cookies. Signing out doesn't seem to make a difference either way.

The new best explanation just became DDG Privacy Essentials: with it enabled, sometimes I can reproduce my issue, and sometimes I can't. With it disabled, I cannot reproduce the issue. (Also, the internal site in question is HTTP, so I guess the full flow is HTTP (internal site) => HTTPS (Google login) => HTTP (internal site), which might be what trips up DDG PE.)
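To make the suspected HTTP => HTTPS => HTTP flow concrete, a recorded redirect chain (for example, one reconstructed from an HTTP log) can be scanned for scheme changes. This is a hypothetical helper; the URLs in the test are placeholders, not the real internal site or Google's actual login endpoints.

```python
"""Scan a recorded redirect chain for scheme changes (hypothetical helper)."""
from urllib.parse import urlsplit


def scheme_hops(chain):
    """List (from_scheme, to_scheme, target_url) for every hop that changes scheme."""
    hops = []
    for src, dst in zip(chain, chain[1:]):
        src_scheme = urlsplit(src).scheme
        dst_scheme = urlsplit(dst).scheme
        if src_scheme != dst_scheme:
            hops.append((src_scheme, dst_scheme, dst))
    return hops


def downgrades(chain):
    """Return the target URLs reached through an HTTPS -> HTTP hop."""
    return [dst for frm, to, dst in scheme_hops(chain) if frm == "https" and to == "http"]
```

For a placeholder chain `http://internal.example/`, `https://accounts.google.com/signin`, `http://internal.example/callback`, `downgrades()` flags only the final hop back to plain HTTP, which is the kind of hop an HTTPS-upgrading extension might plausibly interfere with.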

Given that the problem most likely lies in an extension, do y'all still consider it worth looking into? Or do you consider it DDG's bug/problem now? The extension page says it has 1.5M users 🤷 (But also, HTTP=>HTTPS=>HTTP is probably pretty infrequent.)

Flags: needinfo?(amos)

Dropping Nika and Kershaw given we have a better diagnosis.

Mike, do we have contacts at DDG who we can advise of this? At a minimum the add-on should probably redirect to an error page so the user isn't left thinking their browser is broken.

Component: Networking → Developer Outreach
Flags: needinfo?(nika)
Flags: needinfo?(mconnor)
Flags: needinfo?(kershaw)
Product: Core → WebExtensions

(In reply to Amos Wenger from comment #6)

(In reply to :Gijs (he/him) from comment #5)

  • I assume you've already tried to reproduce with all your add-ons except the multi-account containers one disabled (ie excluding ublock / privacy badger / whatever you've got installed anti-tracking/adblock-wise from the list of potential culprits)?

Re-enabling all except DDG Privacy Essentials seems to let me load the site consistently. With DDG PE enabled, it fails most of the time. (I'm still suspecting a race condition somewhere, there's maybe ~20% of sessions that do work from the get-go).

Fascinating. Well, I'm glad that you figured it out - sorry for the false starts, I had indeed assumed you'd tried this already.

<snip>

The new best explanation just became DDG Privacy Essentials: with it enabled, sometimes I can reproduce my issue, and sometimes I can't. With it disabled, I cannot reproduce the issue. (Also, the internal site in question is HTTP, so I guess the full flow is HTTP (internal site) => HTTPS (Google login) => HTTP (internal site), which might be what trips up DDG PE.)

Given that the problem most likely lies in an extension, do y'all still consider it worth looking into? Or do you consider it DDG's bug/problem now? The extension page says it has 1.5M users 🤷 (But also, HTTP=>HTTPS=>HTTP is probably pretty infrequent.)

I think we've established a correlation and that will help work out what's going on; it might be a bug in the add-on, or in principle it might still be a bug in Firefox's webextension implementation. The add-on is open-source (github link) but from a skim of the feature list I'm not sure what would cause the issue - unless the https upgrade thing includes all subdomains (would be wrong...) and flyio.net is on their https upgrade list or something? But then it working some of the time is still surprising.
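The subdomain theory above can be illustrated with a small matching function. Everything here is hypothetical: it is not DDG's actual upgrade-list logic, and nothing in this thread establishes that flyio.net is on any such list. It only shows why an include-subdomains rule would catch a host like admin.vpn.flyio.net even though only the parent domain was listed.

```python
"""Hypothetical HTTPS-upgrade list matching, with optional subdomain coverage."""


def matches_upgrade_list(host, listed_domains, include_subdomains):
    """True if `host` would be upgraded under this (made-up) matching rule."""
    host = host.lower().rstrip(".")
    for domain in listed_domains:
        domain = domain.lower().rstrip(".")
        if host == domain:
            return True
        # Requiring a leading dot prevents "notflyio.net" matching "flyio.net".
        if include_subdomains and host.endswith("." + domain):
            return True
    return False
```

With `include_subdomains=False`, only exact hostname matches would be upgraded, so an internal, HTTP-only host under a listed parent domain would be left alone.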

Either way, the easiest way to move forward now and find a real root cause is likely for someone with knowledge of their add-on, or someone able to reproduce (sorry!), to dig further into what the add-on is doing. It's quite likely the behaviour is impacted by domain names (given the tracking and https upgrade lists etc. it uses), so even if I or other Mozilla folks produced a dummy site with the same redirect behaviour, I expect it's unlikely we'd immediately be able to reproduce. :-(

Per twitter, Jonathan is looking at this. :-)

Flags: needinfo?(mconnor) → needinfo?(jonathan)

Hello,

I’m from QA and I’m attempting to reproduce the issue in order to confirm it, but I’m having trouble with this so far.

Reading through the comments, it seems DDG PE has been identified as the culprit, so I've installed the add-on and tried connecting to an internal tool we have.

To access the tool, the same flow as described so far is used: internal site => login => internal site. The only difference is that it’s all HTTPS. No HTTP. So the flow is HTTPS (internal site) => HTTPS (Microsoft login) => HTTPS (internal site).
However, the issue does not occur at all after several attempts. I'm not sure how this fits into the whole situation, but I thought some additional details wouldn't hurt.

Hey! I said on twitter I was able to reproduce, but now I'm no longer certain. I was seeing what is in the video without the extension, and in the Browser Toolbox I was seeing a timed-out network request. However, I'm now just seeing 'Unable to connect' instantly.

If you toggle the extension's protections for the website, does it fix the issue? (Click the browser action icon, click the 'Site Privacy Protection' toggle, and then either report or don't; if the page doesn't reload, reload it manually.)

  • I suspect this is the extension's HTTP->HTTPS redirection changes, but it's hard to say for certain. If the toggle doesn't fix it, I think this is the most likely cause.

I'm interested do you also have HTTPS-Only mode enabled in about:preferences? Would you be willing to share the raw data from about:support?

Has something changed on the site overnight?

Alex: However, the issue does not occur at all after several attempts.

I'm fairly sure the issue only happens with HTTP => HTTPS => HTTP. We have other HTTPS => HTTPS => HTTPS sites that work fine under the exact same scenario.

Jonathan: If you toggle the extensions protections for the website does it fix the issue?

Yes, looks like it! (Interestingly, even though the location.href / documentURI are about:blank, the DDG Privacy Extension UI shows the right domain name).

Jonathan: I'm interested do you also have HTTPS-Only mode enabled in about:preferences?

No, I don't remember changing that setting; it's at "Don't enable HTTPS-Only Mode" for now.

Would you be willing to share the raw data from about:support?

I'd rather share pieces of it that you think are useful, at the risk of more back-and-forth!

Severity: -- → S4
Priority: -- → P5

Clear a needinfo that is pending on an inactive user.

Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug should perhaps be closed as INCOMPLETE.

For more information, please visit the auto_nag documentation.

Flags: needinfo?(jonathan)

I'm not able to reproduce this issue anymore, since I don't work at that job with the internal site anymore.

From what I can recall, disabling the DDG Privacy Extension fixed it, so it might just be an unfortunate interaction there. Anyway, I feel like this is safe to close unless someone else shows up with a very similar problem.

(In reply to Amos Wenger from comment #14)

I'm not able to reproduce this issue anymore, since I don't work at that job with the internal site anymore.

From what I can recall, disabling the DDG Privacy Extension fixed it, so it might just be an unfortunate interaction there. Anyway, I feel like this is safe to close unless someone else shows up with a very similar problem.

Thanks for the update! I'll close this out; we can always reopen when something similar appears.

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME