Closed Bug 1626362 Opened 4 years ago Closed 4 years ago

DocumentLoadListener should not perform process switching for HTTP 204 requests or canceled requests

Categories

(Core :: DOM: Content Processes, defect, P2)

74 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- verified

People

(Reporter: juraj.masiar, Assigned: mattwoodrow)

References

(Regression)

Details

(Keywords: regression)

Attachments

(13 files)

1.79 MB, image/gif
Details
2.47 MB, image/gif
Details
610 bytes, application/zip
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

  1. Assign some domain to a specific container
  2. open any extension page
  3. change URL to open the domain

Actual results:

New tab is opened with the URL in the container.
Existing tab with the extension is now broken - empty.

Expected results:

It should not break the extension page!
This does not affect normal pages, only extension pages.
This mostly affects all Speed Dial extensions that has links to pages assigned to a container.

This is a regression (but I can't say which version). ESR 68 is not affected - opening page assigned to a container will open the page in new tab but close the extension page.

So far I have managed to reproduce this consistently on Windows pr 10 64- bit on on FF 76.0a1 (20200406092400) but not on FF 74.0.1(20200403064753) by using the following STR.

  1. In https://addons.mozilla.org/ search and install Speed Dial and Multi account Containers extensions.
  2. Open new tab and Save using Speed dial the "https://addons.mozilla.org/en-US/firefox/" url.
  3. In a new tab open an url, for example "https://en.wikipedia.org/wiki/Main_Page" and right click and select to re-open it one of the assigned containers.
  4. While on that tab open multi account containers popup and confirm "Always open in [ContainerName]". To ensure that open the same url in a new tab and confirm once more that the URl should always open in the chose container and that the decision should be remembered.
  5. Close all open tabs and using and open page saved in speed dial (. "https://addons.mozilla.org/en-US/firefox/" url.)
  6. In the url bar type in or search for the page that was assigned to a multi account container. (https://en.wikipedia.org/wiki/Main_Page).

Actual result:
The page is loaded in a new tab. The speed dial page becomes blank but keeps the typed in address in the url bar.

Can you confirm that this is the behavior you were referring to?
I do not manage to reproduce this with any random urls; right now these are the only two that consistently provide the same result when testing on Nightly.

Flags: needinfo?(juraj.masiar)

Yes that's it.
Yes it affects only extension pages, not normal pages.

Very common use case I'm facing is that I open new tab, type facebook, press Enter and now my New tab page is broken - because my New tab page is Group Speed Dial extension an my Facebook is assigned to a container.

Flags: needinfo?(juraj.masiar)

Reproduced this issue on Windows 10 Pro 64-bit and macOS catalina 10.15 on FF 75.0 (20200403170909) and FF 77.0a1 (20200406214750).

Found simpler repro steps:

  1. In https://addons.mozilla.org/ search and install Speed Dial and Multi account Containers extensions.
  2. Open a new tab and choose to "Keep changes" made by speed dial.
  3. In a new tab open and proceed with assigning 'https://en.wikipedia.org/wiki/Main_Page' to alywas open in one of the multi account Default containers.
  4. Return to tab opened in step two and start typing in and select the address saved in step 3.

Actual results:
The multi account container page opens in a new tab and the speed dial default page is left blank with the selected url left in the address bar.

Expected results:
The multi account container page should load in the speed dial tab.

Regression range:
app_name: firefox
build_date: 2020-01-10 01:44:49.366000
build_type: integration
build_url: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/FHTK318TSmCIa2QvIYGhAQ/runs/0/artifacts/public%2Fbuild%2Ftarget.zip
changeset: a9b06feb5942c5e16bcf9adefe84e081edc402c7
pushlog_url: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=325da10270e4bbc644df0fd29a6c3f6ef536d262&tochange=a9b06feb5942c5e16bcf9adefe84e081edc402c7

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Regressed by: 1601779

Hi Luke,
it seems that the changes linked in the regression window from comment 1 may have changed some behavior and the multi-account-containers feature that "opens a certain url always in a given container" is not working as it used to be? (at least when used in combination with speed dial)

Can you help us to investigate this issue? (or point us to another multi-accounts-container developer that could)

in particular we are interested to double-check how that feature is implemented in multi-account-containers and how the change has affected it, so that we can more easily decide if this has to be fixed on the Firefox side or something we'll need to fix on the extension side instead.

Flags: needinfo?(lcrouch)

Hi Shane, can you please (let) triage this bug? Thank you!

Flags: needinfo?(mixedpuppy)

Jens, I'm not sure what you're asking, but this but is in triage already, we've asked for input from someone familiar with one of the involved addons.

I think it's worth probably asking Matt to look at this as well, since the regression is due to the process switching changes.

Flags: needinfo?(mixedpuppy) → needinfo?(matt.woodrow)

BTW, this still reproduces on a nightly build from yesterday.

This seems like a bad interaction between the two extensions.

My reading of the Multi Account Containers code is that it always opens a new tab when we need to switch containers, and it sometimes closes the old tab too.

The old tab is closed if it's a new tab page (which it shouldn't be in this case, since it's the speed dial moz-extension: page, not about:home/newtab etc), or if the old tab was opened in the last two seconds (much longer ago, following the STR in comment 3).

Given that, it seems expected that we'd still have the old tab around in this case.

Before the regressing change, we'd have triggered a process switch (from the extension process to the web process) before the extension code got a onBeforeRequest listener callback. Now the http load is initiated from the old process, and we process switch during the response.

Jonathan, does that match your understanding of what's supposed to happen here?

Flags: needinfo?(matt.woodrow) → needinfo?(jonathan)

Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is -- (Backlog,) indicating it has has not been previously triaged, the bug's Severity is being updated to -- (default, untriaged.)

Severity: normal → --
Flags: needinfo?(tomica)

Sorry for the delay, I'm not reading ni's at the moment (I'll update my status accordingly).

Yeah that is how I implemented the heuristics for tab switching in MaC (I say heuristics because there are a fair few other edge cases that are a pain to fix), I suspect this could be rectified in the addon (especially if more info from platform is provided through the APIs). I vaguely remember it was difficult to determine if this was a default Tab URL which would mean the extension could close it without worrying about data loss.

A better fix would be making contextual identity mutable in the extension APIs for the same tab; that way the extension wouldn't have to be hopping between tabs with interactions.

Flags: needinfo?(lcrouch)
Flags: needinfo?(jonathan)

Rob will investigate further

Severity: -- → S3
Flags: needinfo?(tomica) → needinfo?(rob)
Priority: -- → P3

It appears that canceling a navigation (away from addons.mozilla.org) results in an unusable tab (no content, context menu not working).

This is a minimal STR using extensions (in case we need to verify it again). I will shortly post STR without extensions.

STR:

  1. Load attached extension at about:debugging.
  2. Visit https://addons.mozilla.org
  3. Visit example.com/?cancelme in the same tab

Expected:

  • Request at step 3 is canceled, the tab should still show the content from step 2.

Actual:

  • Tab is completely unusable. The title still shows the value of the previous page, but it's not useful.
Flags: needinfo?(rob)

STR without extensions:

  1. Visit https://addons.mozilla.org
  2. Visit https://204.robwu.nl in the same tab

Expected:

  • Content from step 1 should still be shown.

Actual:

  • Page becomes completely unusable. The context menu is not working for example.
  • The tab title is still showing the previous title.
  • When I focus the location bar with Ctrl-L, the cursor appears at the end of the location bar (instead of selecting it). When I press Esc, the location bar becomes empty.
  • I cannot drag and drop URLs to the location bar of the broken tab any more.

Given a non-extension str in comment 13 and the regression range, I'm moving this over to dom navigation.

Severity: S3 → --
Component: Frontend → DOM: Navigation
Flags: needinfo?(matt.woodrow)
Priority: P3 → --
Product: WebExtensions → Core

I tried mozregression with 73, 74, 75 and could not reproduce good behavior.

However, when I started with good=68, I was able to narrow down the regression range to https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=7844dd0167b33057843784328291488bc3ccd2d7&tochange=c6cdd18f14022f0f808da0142b2180b3bef9b266
i.e. bug 1557074 (Firefox 71).

Kicking this bug over to that component for triage.

Component: DOM: Navigation → DOM: Content Processes
Flags: needinfo?(matt.woodrow)
Regressed by: 1557074
Has Regression Range: --- → yes
Summary: Opening page assigned to a Firefox Container will break opened addon page → A navigation away from addons.mozilla.org to a URL that is canceled (e.g. HTTP 204) results in an unusable tab

Matt, the STR in comment 13 can reproduce this bug without any extensions installed. Nika thinks this is a DocumentLoadListener bug. Can you look into this? I don't know how severe this bug.

Nika says DocumentLoadListener should not perform process switching for HTTP 204 requests or canceled requests. OnStartRequest is selecting a process to switch to, but doesn't check for HTTP 204 cancels. We should never complete this request. HTTP 204 is considered a successful HTTP response code.

Severity: -- → S3
Flags: needinfo?(matt.woodrow)
Priority: -- → P2
Summary: A navigation away from addons.mozilla.org to a URL that is canceled (e.g. HTTP 204) results in an unusable tab → DocumentLoadListener should not perform process switching for HTTP 204 requests or canceled requests

I'm worried that there are multiple different bugs here.

  • Firefox container switching addon doesn't use the existing tab and opens a new one instead - I debugged this using the STR from comment 3, and wrote up my explanation in comment 8. This seems like mostly a bug with the addon, where it's using heuristics to decide when it can re-use the tab, and the platform changed a bit and broke them.

  • The old tab is left in an unusable state when this happens. Seems like a frontend bug to me, if we can't open context menus etc.

  • Cancelling a load from an extension results in a broken tab. Again, seems like a frontend issue, possibly the same one.

  • 204 responses trigger a process switch, and then a broken tab. This will be easy to prevent in DocumentLoadListener, but I don't think doing so addresses any of the other bugs here.

Flags: needinfo?(matt.woodrow)

Looks like extensions canceling loads happens with NS_ERROR_ABORT, so we should probably avoid process switching in that case too.

Gijs, do you have any ideas why we end up in a broken state here?

We also probably need a test for some of these.

Flags: needinfo?(gijskruitbosch+bugs)

I'm still worried that there might be other cases that decide not to load anything, and we don't want to have a broken tab.

Assignee: nobody → matt.woodrow

Attached a patch that fixes the STR from comment 12 and comment 13.

(In reply to Matt Woodrow (:mattwoodrow) from comment #17)

I'm worried that there are multiple different bugs here.

  • Firefox container switching addon doesn't use the existing tab and opens a new one instead - I debugged this using the STR from comment 3, and wrote up my explanation in comment 8. This seems like mostly a bug with the addon, where it's using heuristics to decide when it can re-use the tab, and the platform changed a bit and broke them.

The Multi-Account Containers add-on cancels the load and retries with a new tab to switch the container / userContextId. That looks intentional, and unless it's possible to change the container state for an existing tab, this looks optimal. Comment 12 is a minimal reduction that triggers the same relevant code.

Possibly relevant: bug 1174493, bug 1610523 (esp bug 1610523 comment 4).

Basically, 204s in toplevel browsing contexts have confusing behaviour, even without process switches involved.

(In reply to Matt Woodrow (:mattwoodrow) from comment #19)

Gijs, do you have any ideas why we end up in a broken state here?

We also probably need a test for some of these.

We seem to be loading the initial (about:blank) document with a null principal and then doing nothing else (based on what the browsing context is saying in the parent, that is). Right-clicking doesn't create the context menu child, which indicates that the relevant event is not arriving in the child process's jswindowactor. Given that the <browser> is visible, and it has a(n apparently valid) frameLoader and remoteTab pointing to the new process, I think we've successfully switched processes, but then stop painting anything or responding to events, or something? I'm not 100% convinced that's a frontend issue - DOM code should be creating our actor for the event, I think?

You're definitely right that this can occur without add-ons or an explicit process switch; you can see the same thing if you e.g. set Firefox to open about:blank as the default new tab / new window (in the prefs, "Home" section), then copy-paste data:text/html,<a href="https://204.robwu.nl/">Click into the URL bar, hit enter to load it, then middle-click / right click + open in new tab the link.

In terms of updating the UI (tab title etc.), I think we might be getting a start request and then mis-processing the stop request. I think an issue we've had in the past here is that when users trigger a load, especially in an empty tab, and the load never completes because of network issues and we end up not showing an error page, they don't want to lose the URL that they typed/copied (x-ref bug 610357 ; there were a lot of bugs like this at one point, e.g. bug 1354796, bug 798249, various bugs linked from there). We do revert the URL if something is currently loaded in the tab, given that the URL bar needs to be usable as a security indicator of what is being shown - see https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/browser/base/content/tabbrowser.js#6068-6079 - but here, we don't because there is nothing loaded, the tab is empty.

What should happen here? Probably load about:blank "correctly", with a null principal, and context menu + browser UI should work. But I actually don't know why the context menu doesn't work, and I also don't know if/how we could distinguish the 204 case from the "the network failed us" case, by the time we get to onStateChange... Does the request object have information we can use here?

Not sure this comment is helping / what you are asking about - please feel free to re-ping if I'm misunderstanding the question...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(matt.woodrow)

I don't think we need these to be done at different times, since the set of checks that happen before uri fixup are error codes that won't be affected by fixup.

Depends on D80106

We shouldn't need to handle this with an early return in docshell, since the classifier failure codes won't display an error page anyway.

Depends on D80107

This is only used by Thunderbird, and is always true for Firefox. I've made CanSet only allow the embedder process, which is the desired behaviour, and should work for non-e10s.

Depends on D80108

This also fixes a bug where we were setting mOriginalUriString in docshell before InternalLoad (which clears it), instead of after.

Depends on D80109

Attachment #9156155 - Attachment description: Bug 1626362 - Don't process switch for loads that aren't going to display anything. r?nika → Bug 1626362 - Don't forward loads to DocumentLoadListener if nsDocumentOpenInfo returned a failure, since doing so would lose the error code. r?nika
Flags: needinfo?(matt.woodrow)
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/66a9ba0f87cb
Add test for not process switching when we get a 204 response. r=Gijs
https://hg.mozilla.org/integration/autoland/rev/4d31c2ede058
Move docshell uri fixup code into a static helper. r=nika
https://hg.mozilla.org/integration/autoland/rev/5b170666991e
Move checks for deciding if we should load an error page into helpers. r=nika
https://hg.mozilla.org/integration/autoland/rev/5f3ee46032b6
Combine two load error page functions into one. r=nika
https://hg.mozilla.org/integration/autoland/rev/a627870e35fb
Move url classifier notifications to parent process. r=nika,necko-reviewers,JuniorHsu
https://hg.mozilla.org/integration/autoland/rev/1d31061f4421
Move nsDocShell::mUserErrorPages to BrowsingContext. r=nika
https://hg.mozilla.org/integration/autoland/rev/69b5b01da9c1
Make mOriginalUriString available in DocumentLoadListener and docshell. r=nika,necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/5ba5cf98a95d
Let DocumentLoadListener determine if docshell will display content (from the channel or from loading an error page), and only allow switching to a new process if so. r=nika
https://hg.mozilla.org/integration/autoland/rev/d29ac651a4fd
Don't forward loads to DocumentLoadListener if nsDocumentOpenInfo returned a failure, since doing so would lose the error code. r=nika

Backed out for failures on a-download-click-404.html

backout: https://hg.mozilla.org/integration/autoland/rev/dce41fe38c1ba37aaf629abff6d9d97b4f8830c7

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=wpt&revision=d29ac651a4fd75e4eb32a8dbd26cb07ce0f44b23&group_state=expanded

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307443058&repo=autoland&lineNumber=4246

[task 2020-06-24T23:32:46.899Z] 23:32:46 INFO - TEST-START | /html/semantics/text-level-semantics/the-a-element/a-download-click-404.html
[task 2020-06-24T23:32:46.914Z] 23:32:46 INFO - Closing window 20
[task 2020-06-24T23:32:47.469Z] 23:32:47 INFO -
[task 2020-06-24T23:32:47.469Z] 23:32:47 INFO - TEST-UNEXPECTED-FAIL | /html/semantics/text-level-semantics/the-a-element/a-download-click-404.html | Do not navigate to 404 for anchor with download - assert_unreached: Navigated instead of downloading Reached unreachable code
[task 2020-06-24T23:32:47.470Z] 23:32:47 INFO - TEST-OK | /html/semantics/text-level-semantics/the-a-element/a-download-click-404.html | took 575ms

Flags: needinfo?(matt.woodrow)

The WPT a-download-click-404.html requires this behaviour for links with the download attribute, and this is also the current behaviour for Content-Disposition: Attachment (see bug 1604308).

This previously worked because the parent process version of nsDocumentOpenInfo failed (with NS_ERROR_FILE_NOT_FOUND), but the error code was discarded and we forwarded the channel to the content process. The content process version then would then return NS_ERROR_WONT_HANDLE_CONTENT since the load requires downloading, but we don't allow that in the content process. This new error code is one that doesn't have an associated error page (unlilke the original error), so was silently discarded.

Depends on D79409

See Also: → 1647460
See Also: 1647460
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/811cf28bf5f3
Add test for not process switching when we get a 204 response. r=Gijs
https://hg.mozilla.org/integration/autoland/rev/721b94e6e3ce
Move docshell uri fixup code into a static helper. r=nika
https://hg.mozilla.org/integration/autoland/rev/a9e250dce7ed
Move checks for deciding if we should load an error page into helpers. r=nika
https://hg.mozilla.org/integration/autoland/rev/741edfcbb68d
Combine two load error page functions into one. r=nika
https://hg.mozilla.org/integration/autoland/rev/5c59b040c35a
Move url classifier notifications to parent process. r=nika,necko-reviewers,JuniorHsu
https://hg.mozilla.org/integration/autoland/rev/4e545131f22a
Move nsDocShell::mUserErrorPages to BrowsingContext. r=nika
https://hg.mozilla.org/integration/autoland/rev/45ec43c2ad0f
Make mOriginalUriString available in DocumentLoadListener and docshell. r=nika,necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/53cb6ca4a42f
Let DocumentLoadListener determine if docshell will display content (from the channel or from loading an error page), and only allow switching to a new process if so. r=nika
https://hg.mozilla.org/integration/autoland/rev/b6efc270e9a0
Don't forward loads to DocumentLoadListener if nsDocumentOpenInfo returned a failure, since doing so would lose the error code. r=nika
https://hg.mozilla.org/integration/autoland/rev/30cde47f9364
Don't return a failure from nsDocumentOpenInfo::OnStartRequest if we decide to handle the response with the external helper app and then fail. r=nika
Flags: needinfo?(matt.woodrow)

Is this a bug we can live with on ESR78 for the next year or is this something we should consider backporting there?

Flags: needinfo?(matt.woodrow)

It's a fairly large set of changes, not sure we want to uplift that right at the end of the cycle.

Flags: needinfo?(matt.woodrow)
QA Whiteboard: [qa-79b-p2]

Reproduced with 79.0a1 (2020-06-10) on Windows 10.
Verified fixed with Fx 80.0a1 (2020-07-20) and Fx 79.0 on Windows 10 and Ubuntu 18.04

Status: RESOLVED → VERIFIED
Regressions: 1656310
Regressions: 1662925
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: