An increase in the number of tabs not correctly connecting to content processes is being reported for recent Nightlies
Categories
(Firefox :: Tabbed Browser, defect, P1)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox-esr68 | --- | unaffected |
| firefox74 | --- | unaffected |
| firefox75 | --- | verified |
| firefox76 | --- | verified |
People
(Reporter: mconley, Assigned: mconley)
References
(Regression)
Details
(Keywords: regression, regressionwindow-wanted)
Attachments
(8 files)
Both tbabos and shorlander have been seeing this behaviour over the past two days or so.
From what I can gather, tabs don't have nsIRemoteTab available off of their browser's frameloaders, so a bunch of stuff in the front-end breaks because it's not designed to handle the case where the frameloader didn't successfully connect to a content process.
Putting this in DOM :: Content Processes for now, but perhaps this could also belong in Firefox :: Tabbed Browser, in the event that we're not handling a failure case properly here.
Assignee
Comment 2•5 years ago
Apparently, both tbabos and shorlander are experiencing this on Windows - version 10, presumably.
Comment 3•5 years ago
Mike, is there an existing bug to improve frontend error handling of process launch failures?
This bug might be a regression from Yoric's async process launching changes (in bug 1602712), but it's unclear whether the bug is caused by actual process launch failures or frontend code getting confused by launching processes asynchronously.
Comment 4•5 years ago
I have not encountered this again so far on Windows 10. Attaching the screenshot I took when it happened.
Comment 5•5 years ago
I also encountered this issue. At the time, I was connected via NordVPN and loaded amazon.com using the latest Nightly 75.0a1 on Windows 10 x64. I'm attaching a screenshot of the browser console.
Assignee
Comment 7•5 years ago
I'm not aware of a pre-existing bug, no. Presuming content process launch is at fault here, I'm also not 100% certain what the appropriate response should be from the parent - should we retry? Show an error message? Something else?
Comment 8•5 years ago
This bug can happen without Fission (because e10s content process launching can fail), but this bug becomes more likely with Fission's many iframe processes and async process launching.
The frontend code needs to handle tab process launch failure more robustly. Moving to the Firefox frontend component.
Assignee
Comment 9•5 years ago
(In reply to Chris Peterson [:cpeterson] from comment #8)
The frontend code needs to handle tab process launch failure more robustly.
Thanks, cpeterson. Do you know how we should be handling this case? Who do we talk to about that?
Comment 10•5 years ago
(In reply to Mike Conley (:mconley) (:⚙️) from comment #9)
(In reply to Chris Peterson [:cpeterson] from comment #8)
The frontend code needs to handle tab process launch failure more robustly.
Thanks, cpeterson. Do you know how we should be handling this case? Who do we talk to about that?
Nika will know. She recommended a new test be written to verify the frontend's handling of process launch failures doesn't regress.
Comment 11•5 years ago
The root of many of these issues which cause frontend code to lock up is that a browser element may both be considered "remote" and not have any remoteTab associated with it, as the remote tab has already crashed. There is then code in important code paths, such as the tab switching code, which doesn't null-check this value before accessing it, causing an exception.
There are 2 main ways that frontend code can handle this better:
- Catch all of the places where we access remoteTab without null-checking it, and add null-checks to them.
- Ensure that any browser elements which contain a crashed tab are promptly replaced with an error document, so that general frontend code doesn't have to deal with the potentially-broken state.
The second option seems to be the most likely solution, as it can be done locally, and doesn't require writing tests for every frontend functionality operating correctly on a crashed tab. It also lets us load browser crashed UI into the crashed tab, which is a nicer UX than a blank document.
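As a rough illustration of the two options, here is a minimal TypeScript sketch. The types are simplified stand-ins, not Firefox's real interfaces, and the names BrowserElement, getRemoteTab, isRemoteWithoutTab and ensureUsable are hypothetical helpers invented for this example.

```typescript
// Simplified, hypothetical shapes; the real browser/frameLoader objects are richer.
interface RemoteTab { tabId: number }
interface FrameLoader { remoteTab: RemoteTab | null }
interface BrowserElement {
  isRemoteBrowser: boolean;
  frameLoader: FrameLoader | null;
}

// Option 1: every access site goes through a null-safe accessor.
function getRemoteTab(browser: BrowserElement): RemoteTab | null {
  return browser.frameLoader?.remoteTab ?? null;
}

// The broken state described above: considered "remote", but no remoteTab.
function isRemoteWithoutTab(browser: BrowserElement): boolean {
  return browser.isRemoteBrowser && getRemoteTab(browser) === null;
}

// Option 2: detect the broken state once and swap in an error document,
// so that general frontend code never has to deal with it.
// Returns true if the browser is safe to use as-is.
function ensureUsable(
  browser: BrowserElement,
  showCrashedPage: (b: BrowserElement) => void
): boolean {
  if (isRemoteWithoutTab(browser)) {
    showCrashedPage(browser);
    return false;
  }
  return true;
}
```

With the second option, only the code that calls something like ensureUsable needs a test for the broken state; everything downstream can keep assuming a remote browser has a remoteTab.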
Right now, frontend code tries to show a crashed UI when the "oop-browser-crashed" or "oop-browser-buildid-mismatch" events are fired. If the crashing browser is selected, the onSelectedBrowserCrash method adds the browser to a queue, and doesn't properly swap it to a tab crashed page until an "ipc:content-shutdown" event is fired with the childID from browser.frameLoader.childID. I think this is the wrong behaviour in the case where the process failed to start at all, rather than crashing, however. In that case, the childID will be 0, and no "ipc:content-shutdown" observer notification will be fired, leaving the browser permanently in the queue. We probably want to immediately switch to a tab crashed page if childID was 0.
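To make the stuck-queue case concrete, here is a small TypeScript model of the queueing behaviour with the proposed childID check added. It is a sketch under assumptions, not the actual tabbrowser code: CrashedBrowserQueue, the string browser ids and the showCrashedPage callback are all hypothetical.

```typescript
// Hypothetical model; browsers are represented by plain string ids.
type ShowCrashedPage = (browserId: string) => void;

interface PendingEntry { browserId: string; childID: number }

class CrashedBrowserQueue {
  // Selected browsers waiting for a shutdown notification for their childID.
  private pending: PendingEntry[] = [];
  private showCrashedPage: ShowCrashedPage;

  constructor(showCrashedPage: ShowCrashedPage) {
    this.showCrashedPage = showCrashedPage;
  }

  // Models the crash event arriving for the selected browser.
  onSelectedBrowserCrash(browserId: string, childID: number): void {
    if (childID === 0) {
      // The process never launched, so no shutdown notification will ever
      // arrive. Queueing here would leave the browser stuck forever.
      this.showCrashedPage(browserId);
      return;
    }
    this.pending.push({ browserId, childID });
  }

  // Models the "ipc:content-shutdown" notification for a given childID.
  onContentShutdown(childID: number): void {
    const stillWaiting: PendingEntry[] = [];
    for (const entry of this.pending) {
      if (entry.childID === childID) {
        this.showCrashedPage(entry.browserId);
      } else {
        stillWaiting.push(entry);
      }
    }
    this.pending = stillWaiting;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Without the childID === 0 branch, a browser whose process never started would sit in pending indefinitely, which matches the permanently blank tab being reported.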
I worry there may also be some issues if the crashing browser is not selected. In the case of a normal "oop-browser-crashed", it seems we immediately try to restore it in the background, but in an "oop-browser-buildid-mismatch" case, we appear to do nothing. I worry this could also lead to issues if the tab which required starting a new process was in the background, and then was switched to the foreground.
Comment 12•5 years ago
I now get this behavior on the latest Nightly; I was just switching tabs. Hovering over the tab shows "New Tab (pid 19060)".
Comment 13•5 years ago
Comment 14•5 years ago
Reproduction steps:
Set up:
dom.payments.request.enabled to "true"
region: US or CA
- Open several tabs
- Go to https://rsolomakhin.github.io/pr/us/
- Click on Buy
  Payment widget is not displayed
- Start switching tabs and occasionally refresh the Payments test page
For this one, I get a different error in the browser console, but it's basically the same behavior. Attached the recording for the repro steps.
Comment 15•5 years ago
Assignee
Comment 16•5 years ago
Hm, no luck reproducing this on my Windows 10 machine with a recent Nightly.
Hey tbabos, if you're able to reproduce this semi-reliably, any chance you could help us find a regression range?
Assignee
Comment 17•5 years ago
I'm able to reproduce if I happen to cause an update to install by opening a separate profile using the same instance of Firefox, and then having a background tab attempt to migrate from one process to another.
Presuming something like this is the underlying cause, I'm taking Nika's advice here - my plan is to immediately switch to the error pages regardless of foreground state if:
- It's a build-id mismatch crash
- It's a crash where we never had a childID (so likely that we didn't correctly launch a content process)
These are speculative fixes, because we're theorizing that this is what's causing this behaviour out in the wild, but it's the best we can do without more clues.
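Roughly, the decision I have in mind looks like the TypeScript sketch below. This is not the patch itself: CrashInfo, the action names and chooseAction are hypothetical, and the two fallback branches simply restate the existing behaviour described in comment 11.

```typescript
type CrashEvent = "oop-browser-crashed" | "oop-browser-buildid-mismatch";

// Hypothetical summary of what the front-end knows when the event fires.
interface CrashInfo {
  event: CrashEvent;
  childID: number; // 0 is taken to mean no content process was ever launched
  isSelected: boolean; // whether the browser is in the foreground tab
}

// Hypothetical action names for this sketch.
type Action =
  | "show-error-page-now"
  | "wait-for-shutdown"
  | "restore-in-background";

function chooseAction(info: CrashInfo): Action {
  // New behaviour: these two cases go straight to the error page,
  // regardless of whether the tab is in the foreground.
  if (info.event === "oop-browser-buildid-mismatch" || info.childID === 0) {
    return "show-error-page-now";
  }
  // Existing behaviour for an ordinary crash with a real childID.
  return info.isSelected ? "wait-for-shutdown" : "restore-in-background";
}
```

For example, a background tab with a build-id mismatch, which previously got no handling at all, would now resolve to "show-error-page-now".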
Assignee
Comment 18•5 years ago
Comment 19•5 years ago
I am getting different regression ranges and quite different behaviors on different Windows machines. :(
The only common thing is that the regression ranges are both from mid-February 2020.
However, both behaviors have the console error mentioned in Comment 14; I can't reproduce the error mentioned in Comment 4.
I will check whether that fix solves my scenario once the patch has landed in Nightly, though. Fingers crossed.
Comment 20•5 years ago
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 21•5 years ago
Comment 22•5 years ago
bugherder
Assignee
Comment 23•5 years ago
The speculative patch has been in Nightly for a few days now. tbabos, are you still seeing this?
Comment 24•5 years ago
Do we have any idea of when this regressed?
Comment 26•5 years ago
As for the timeline, it is somewhere in mid-February; I first saw it on 02-28, and Mike filed the bug for it.
As good news, I can't reproduce it anymore using the steps from Comment 14 on latest Nightly!
As bad news, saw it yesterday on Beta but couldn't reproduce it again: new profile, open private window, close private window, open new private window.
Leaving this open a bit until I get to work more around Nightly and be confident about it as it is very intermittent but annoying when it happens.
Comment 27•5 years ago
This problem is older than February. I posted on reddit about this issue 4 months ago.
https://www.reddit.com/r/firefox/comments/doetiz/firefox_is_not_loading_pages/
Unfortunately I don't remember which version I was running back then.
I posted about the same issue a month ago and finally opened a bug. That bug was closed as a duplicate of this one.
Comment 28•5 years ago
Hi Mathieu,
Thanks for reaching out to us again! Could you also check it on Nightly and see if it happens? We are hopeful that the fix that landed in Nightly will solve this issue. You can download it from here: https://www.mozilla.org/en-US/firefox/channel/desktop/
Comment 29•5 years ago
Fresh installation of nightly on Fedora 31. "First Run" tab didn't load.
Comment 30•5 years ago
I installed the latest nightly 76.0a1 on Fedora 31. The same problem occurred in the first 10 minutes of usage. On a new start of Firefox a new tab "Firefox Nightly First Run Page" opened. I noticed the following:
- this tab remains empty
- there is no loading animation
- the "Home" button does nothing
- typing a URL does nothing
- other tabs are working correctly
I could not get the content of the browser console after recording my screen.
A few minutes before that I had the same problem with the stable release from the Fedora repo (v74.0): one of my pinned tabs didn't load.
Comment 31•5 years ago
Did that happen before you signed in to sync? The addons that were synced could cause this too, please disable them and check it out once more.
Comment 32•5 years ago
Comment 33•5 years ago
Yesterday it did happen after I signed in to sync.
Today I started Nightly with a new profile. This time I did not sign in to sync and did not install any addon. It took less than 5 minutes before I got a tab that was stuck with the loading animation.
Comment 34•5 years ago
The patch landed in nightly and beta is affected.
:mconley, is this bug important enough to require an uplift?
If not, please set status_beta to wontfix.
For more information, please visit auto_nag documentation.
Assignee
Comment 35•5 years ago
Hi mathieu.carpentier,
I suspect that this patch then is not solving things for you, and that you're likely experiencing a slightly different issue.
Do you happen to have the ESET Nod32 antivirus product installed and enabled on your Linux box?
Assignee
Comment 36•5 years ago
Comment on attachment 9133606 [details]
Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws!
Beta/Release Uplift Approval Request
- User impact if declined: Background tabs that fail to get a content process associated with them due to launch failures (build ID mismatch, or other launch failures), might result in a broken tab.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The code being changed is nicely isolated, and also has automated tests. There is enough coverage here to make me confident that this can be uplifted safely.
- String changes made/needed: None.
Comment 37•5 years ago
(In reply to Mike Conley (:mconley) (:⚙️) from comment #35)
Hi mathieu.carpentier,
I suspect that this patch then is not solving things for you, and that you're likely experiencing a slightly different issue.
Do you happen to have the ESET Nod32 antivirus product installed and enabled on your Linux box?
Yes! I do have ESET Nod32 v4 installed and running on my Linux machine.
Assignee
Comment 38•5 years ago
(In reply to mathieu.carpentier from comment #37)
Yes! I do have ESET Nod32 v4 installed and running on my Linux machine.
In that case, I suspect you're hitting bug 1604218. According to bug 1604218 comment 34, ESET is shipping an update that will fix this issue.
Comment 39•5 years ago
Comment on attachment 9133606 [details]
Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws!
approved for 75 rc1
Comment 40•5 years ago
bugherder uplift
Comment 41•5 years ago
I spent a whole day opening and loading a lot of tabs with heavy content and didn't experience this issue anymore on Windows. I feel comfortable saying it is now fixed on the latest Nightly 76.0a1 (2020-03-30) (64-bit).
Comment 42•5 years ago
Verified-fixed on latest Beta 75.0 (64-bit) on Windows 10 x64 as well. Didn't encounter the issue during a day of surfing around on Beta.