Closed Bug 1618936 Opened 4 years ago Closed 4 years ago

An increase in the number of tabs not correctly connecting to content processes are being reported for recent Nightlies

Tracking


Status:

VERIFIED FIXED

Milestone:

Firefox 76

Tracking Flags:

Tracking

Status

firefox-esr68

---

unaffected

firefox74

---

unaffected

firefox75

---

verified

firefox76

---

verified

People

(Reporter: mconley, Assigned: mconley)

References

(Regression)

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(8 files)

Browser Console error 4 years ago Timea Cernea [:tbabos][inactive] 174.43 KB, image/png		Details
2020-03-06_16h20_53.png 4 years ago Simona Badau, Desktop QA 85.29 KB, image/png		Details
76.0a1 (2020-03-13) (64-bit) 4 years ago Timea Cernea [:tbabos][inactive] 879.91 KB, video/mp4		Details
Browser Console error 2 4 years ago Timea Cernea [:tbabos][inactive] 45.51 KB, image/png		Details
Recording for repro steps 4 years ago Timea Cernea [:tbabos][inactive] 2.43 MB, video/mp4		Details
Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws! 4 years ago Mike Conley (:mconley) (:⚙️) 47 bytes, text/x-phabricator-request	jcristau : approval-mozilla-release+	Details \| Review
nightly 76.0a1 tab not loading 4 years ago mathieu.carpentier 1.63 MB, video/mp4		Details
76.0a1 (2020-03-25) (64-bit) 4 years ago mathieu.carpentier 2.46 MB, video/mp4		Details

Mike Conley (:mconley) (:⚙️)

Assignee

Description

•

4 years ago

Both tbabos and shorlander have been seeing this behaviour over the past two days or so.

From what I can gather, tabs don't have nsIRemoteTab available off of their browser's frameloaders, so a bunch of stuff in the front-end breaks because it's not designed to handle the case where the frameloader didn't successfully connect to a content process.

Putting this in DOM :: Content Processes for now, but perhaps this could also belong in Firefox :: Tabbed Browser, in the event that we're not handling a failure case properly here.

Comment hidden (obsolete)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 2

•

4 years ago

Apparently, both tbabos and shorlander are experiencing this on Windows - version 10, presumably.

Chris Peterson [:cpeterson]

Comment 3

•

4 years ago

Mike, is there an existing bug to improve frontend error handling of process launch failures?

This bug might be a regression from Yoric's async process launching changes (in bug 1602712), but it's unclear whether the bug is caused by actual process launch failures or frontend code getting confused by launching processes asynchronously.

Flags: needinfo?(mconley)

Comment 4

•

4 years ago

Attached image Browser Console error — Details

I have not encountered this again so far on Windows 10. Attaching the screenshot I took when it happened.

Simona Badau, Desktop QA

Comment 5

•

4 years ago

Attached image 2020-03-06_16h20_53.png — Details

I also encountered this issue. At the time, I was connected via NordVPN and had loaded amazon.com using the latest Nightly 75.0a1 on Windows 10 x64. I'm attaching a screenshot of the browser console.

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 7

•

4 years ago

I'm not aware of a pre-existing bug, no. Presuming content process launch is at fault here, I'm also not 100% certain what the appropriate response should be from the parent - should we retry? Show an error message? Something else?

Flags: needinfo?(mconley)

Chris Peterson [:cpeterson]

Comment 8

•

4 years ago

This bug can happen without Fission (because e10s content process launching can fail), but this bug becomes more likely with Fission's many iframe processes and async process launching.

The frontend code needs to handle tab process launch failure more robustly. Moving to the Firefox frontend component.

Component: DOM: Content Processes → Tabbed Browser

Product: Core → Firefox

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 9

•

4 years ago

(In reply to Chris Peterson [:cpeterson] from comment #8)

The frontend code needs to handle tab process launch failure more robustly.

Thanks, cpeterson. Do you know how we should be handling this case? Who do we talk to about that?

Flags: needinfo?(cpeterson)

Chris Peterson [:cpeterson]

Comment 10

•

4 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #9)

(In reply to Chris Peterson [:cpeterson] from comment #8)

The frontend code needs to handle tab process launch failure more robustly.

Thanks, cpeterson. Do you know how we should be handling this case? Who do we talk to about that?

Nika will know. She recommended a new test be written to verify the frontend's handling of process launch failures doesn't regress.

Flags: needinfo?(cpeterson) → needinfo?(nika)

Dão Gottwald [:dao]

Updated

•

4 years ago

Priority: -- → P1

Summary: An increase in the number of tabs not correctly connecting to content processes are being reported for recent Nightly's → An increase in the number of tabs not correctly connecting to content processes are being reported for recent Nightlies

Nika Layzell [:nika] (ni? for response)

Comment 11

•

4 years ago

The root of many of these issues which cause frontend code to lock up is that a browser element may be considered "remote" and yet not have any remoteTab associated with it, because the remote tab has already crashed. Code in important code paths, such as the tab switching code, then accesses this value without null-checking it, causing an exception.

There are 2 main ways that frontend code can handle this better:

1. Catch all of the places where we access remoteTab without null-checking it, and add null-checks to them.
2. Ensure that any browser elements which contain a crashed tab are promptly replaced with an error document, so that general frontend code doesn't have to deal with the potentially-broken state.

The second case seems to be the most likely solution, as it can be done locally, and doesn't require writing tests for every frontend functionality operating correctly on a crashed tab. It also lets us load browser crashed UI into the crashed tab, which is a nicer UX than a blank document.
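To make the two options concrete, here is a minimal TypeScript sketch. `RemoteTabLike`, `BrowserLike`, `setRemoteTabActive`, and `replaceIfCrashed` are hypothetical names that only mimic the shape described in this comment; they are not the real tabbrowser code or its API.

```typescript
// Hypothetical stand-ins for the real browser element and its remote tab.
interface RemoteTabLike {
  active: boolean;
}

interface BrowserLike {
  isRemote: boolean;
  // null models a remote browser whose content process crashed or never launched.
  remoteTab: RemoteTabLike | null;
  showingCrashPage: boolean;
}

// Option 1: each call site guards its own access to remoteTab.
// Returns true if the remote tab was updated, false if there was nothing to update.
function setRemoteTabActive(browser: BrowserLike, active: boolean): boolean {
  if (!browser.isRemote || browser.remoteTab === null) {
    // Without this check, the write below would throw and abort the caller.
    return false;
  }
  browser.remoteTab.active = active;
  return true;
}

// Option 2: normalize the broken state once, so later code never sees it.
// Returns true if the browser was switched to the crash page by this call.
function replaceIfCrashed(browser: BrowserLike): boolean {
  if (browser.isRemote && browser.remoteTab === null && !browser.showingCrashPage) {
    // Stands in for loading the tab-crashed document into the browser.
    browser.showingCrashPage = true;
    return true;
  }
  return false;
}
```

The first function has to be repeated at every access site, while the second runs in one place. That is the locality argument made above for preferring the second option.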

Right now, frontend code tries to show a crashed UI when the "oop-browser-crashed" or "oop-browser-buildid-mismatch" events are fired. If the crashing browser is selected, the onSelectedBrowserCrash method adds the browser into a queue, and doesn't properly swap it to a tab crashed page until an "ipc:content-shutdown" event is fired with the ChildID from browser.frameLoader.childID. I think this is the wrong behaviour in the case where the process failed to start at all, rather than crashing, however. In that case, the childID will be 0, and no "ipc:content-shutdown" observer notification will be fired, leaving the browser permanently in the queue. We probably want to immediately switch to a tab crashed page if childID was 0.
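The stuck-queue problem and the suggested childID check can be modelled with a small TypeScript sketch. The class `SelectedCrashQueue`, the `CrashedBrowser` shape, and the method `onContentShutdown` are assumptions made for illustration; only the name onSelectedBrowserCrash, the childID of 0, and the shutdown notification come from the description above.

```typescript
// Hypothetical model of a browser that has reported a crash.
interface CrashedBrowser {
  id: string;
  childID: number; // 0 models a content process that never launched
  showingCrashPage: boolean;
}

class SelectedCrashQueue {
  private pending: CrashedBrowser[] = [];

  // Called when the selected browser reports a crash.
  onSelectedBrowserCrash(browser: CrashedBrowser): void {
    if (browser.childID === 0) {
      // No process ever started, so no shutdown notification will arrive.
      // Queueing here would leave the browser waiting forever.
      browser.showingCrashPage = true;
      return;
    }
    this.pending.push(browser);
  }

  // Called when a shutdown notification arrives for the given child ID.
  onContentShutdown(childID: number): void {
    const stillWaiting: CrashedBrowser[] = [];
    for (let i = 0; i < this.pending.length; i++) {
      const browser = this.pending[i];
      if (browser.childID === childID) {
        browser.showingCrashPage = true;
      } else {
        stillWaiting.push(browser);
      }
    }
    this.pending = stillWaiting;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Removing the `childID === 0` branch reproduces the failure mode described above: the browser is pushed onto `pending`, and no later call to `onContentShutdown` ever matches it.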

I worry there may also be some issues if the crashing browser is not selected. In the case of a normal "oop-browser-crashed", it seems we immediately try to restore it in the background, but in an "oop-browser-buildid-mismatch" case, we appear to do nothing. I worry this could also lead to issues if the tab which required starting a new process was in the background, and then was switched to the foreground.

Flags: needinfo?(nika) → needinfo?(mconley)

Timea Cernea [:tbabos][inactive]

Comment 12

•

4 years ago

Attached video 76.0a1 (2020-03-13) (64-bit) — Details

Now I get this behavior on the latest Nightly; I was just switching tabs. Hovering over the tab shows "New Tab(pid 19060)".

Timea Cernea [:tbabos][inactive]

Comment 13

•

4 years ago

Attached image Browser Console error 2 — Details

Timea Cernea [:tbabos][inactive]

Comment 14

•

4 years ago

•

Edited

Reproduction steps:
Set up:
dom.payments.request.enabled to "true"
region: US or CA

Open several tabs
Go to https://rsolomakhin.github.io/pr/us/
Click on Buy
Payment widget is not displayed
Start switching tabs and occasionally refresh the Payments test page

For this one, I get a different error in the browser console, but it's basically the same behavior. Attached the recording for the repro steps.

Timea Cernea [:tbabos][inactive]

Comment 15

•

4 years ago

Attached video Recording for repro steps — Details

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 16

•

4 years ago

Hm, no luck reproducing this on my Windows 10 machine with a recent Nightly.

Hey tbabos, if you're able to reproduce this semi-reliably, any chance you could help us find a regression range?

Flags: needinfo?(mconley) → needinfo?(tbabos)

Keywords: regressionwindow-wanted

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 17

•

4 years ago

I'm able to reproduce if I happen to cause an update to install by opening a separate profile using the same instance of Firefox, and then having a background tab attempt to migrate from one process to another.

Presuming something like this is the underlying cause, I'm taking Nika's advice here - my plan is to immediately switch to the error pages regardless of foreground state if:

It's a build-id mismatch crash
It's a crash where we never had a childID (so likely that we didn't correctly launch a content process)

These are speculative fixes, because we're theorizing that this is what's causing this behaviour out in the wild, but it's the best we can do without more clues.
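The plan above can be read as a small decision function. The TypeScript sketch below is one interpretation, not the landed patch: the event names come from this bug, while `chooseCrashAction` and the three action labels are hypothetical. The two fallback branches follow the existing behaviour described in comment 11 (queue a selected browser until shutdown, restore a background one).

```typescript
type CrashEvent = "oop-browser-crashed" | "oop-browser-buildid-mismatch";

// Hypothetical labels for what the front-end does with the crashed browser.
type CrashAction =
  | "show-crash-page-now"
  | "wait-for-shutdown"
  | "restore-in-background";

function chooseCrashAction(
  event: CrashEvent,
  childID: number,
  isSelected: boolean
): CrashAction {
  // Build-id mismatches and launches that never produced a childID go
  // straight to the error page, regardless of foreground state.
  if (event === "oop-browser-buildid-mismatch" || childID === 0) {
    return "show-crash-page-now";
  }
  // Ordinary crashes keep the pre-existing behaviour.
  return isSelected ? "wait-for-shutdown" : "restore-in-background";
}
```

For example, `chooseCrashAction("oop-browser-crashed", 0, false)` yields `"show-crash-page-now"`, covering a background tab whose process never launched.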

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 18

•

4 years ago

Attached file Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws! — Details

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → mconley

Status: NEW → ASSIGNED

Timea Cernea [:tbabos][inactive]

Comment 19

•

4 years ago

I am getting different regression ranges and quite different behaviors on different Windows machines. :(
The only common thing is that the regression ranges are both from mid-February 2020.
However, both behaviors show the console error mentioned in Comment 14; I can't reproduce the error mentioned in Comment 4.
I will check whether that fix solves my scenario once the patch has landed in Nightly, though. Fingers crossed.

Flags: needinfo?(tbabos)

BugBot [:suhaib / :marco/ :calixte]

Comment 20

•

4 years ago

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Pulsebot

Comment 21

•

4 years ago

Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59cef5b69286
Make the front-end more robust in how it handles content process launch failures. r=dao

Cristina Coroiu [:ccoroiu]

Comment 22

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/59cef5b69286

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox76: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 76

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 23

•

4 years ago

The speculative patch has been in Nightly for a few days now. tbabos, are you still seeing this?

Flags: needinfo?(tbabos)

Ryan VanderMeulen [:RyanVM]

Comment 24

•

4 years ago

Do we have any idea of when this regressed?

Flags: needinfo?(mconley)

Flags: in-testsuite+

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 25

•

4 years ago

I don't. Perhaps tbabos has a timeline?

Flags: needinfo?(mconley)

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox74: --- → unaffected

status-firefox75: --- → affected

status-firefox-esr68: --- → unaffected

Timea Cernea [:tbabos][inactive]

Comment 26

•

4 years ago

•

Edited

As for the timeline, it is somewhere in mid-February; I first saw it on 02-28 and Mike filed the issue for it.
As good news, I can't reproduce it anymore using the steps from Comment 14 on latest Nightly!

As bad news, saw it yesterday on Beta but couldn't reproduce it again: new profile, open private window, close private window, open new private window.

Leaving this open a bit until I get to work more around Nightly and be confident about it as it is very intermittent but annoying when it happens.

Flags: needinfo?(tbabos)

mathieu.carpentier

Comment 27

•

4 years ago

This problem is older than February. I posted on reddit about this issue 4 months ago.
https://www.reddit.com/r/firefox/comments/doetiz/firefox_is_not_loading_pages/

Unfortunately I don't remember which version I was running back then.

I posted about the same issue a month ago and finally opened a bug. That bug was closed as a duplicate of this one.

Timea Cernea [:tbabos][inactive]

Comment 28

•

4 years ago

•

Edited

Hi Mathieu,
Thanks for reaching out to us again! Could you also check it on Nightly and see if it happens? We are hopeful that the fix that landed in Nightly could solve this issue. You can download it from here: https://www.mozilla.org/en-US/firefox/channel/desktop/

Flags: needinfo?(mathieu.carpentier)

mathieu.carpentier

Comment 29

•

4 years ago

Attached video nightly 76.0a1 tab not loading — Details

Fresh installation of nightly on Fedora 31. "First Run" tab didn't load.

Flags: needinfo?(mathieu.carpentier)

mathieu.carpentier

Comment 30

•

4 years ago

I installed the latest nightly 76.0a1 on Fedora 31. The same problem occurred in the first 10 minutes of usage. On a new start of Firefox a new tab "Firefox Nightly First Run Page" opened. I noticed the following:

this tab remains empty
there is no loading animation
the "Home" button does nothing
typing a URL does nothing
other tabs are working correctly

I could not get the content of the browser console after recording my screen.

A few minutes before that I had the same problem with the stable release from the Fedora repo (v74.0): one of my pinned tabs didn't load.

Emma Malysz

Updated

•

4 years ago

Regressed by: 1242912

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

Timea Cernea [:tbabos][inactive]

Comment 31

•

4 years ago

Did that happen before you signed in to sync? The addons that were synced could cause this too, please disable them and check it out once more.

Flags: needinfo?(mathieu.carpentier)

mathieu.carpentier

Comment 32

•

4 years ago

Attached video 76.0a1 (2020-03-25) (64-bit) — Details

Flags: needinfo?(mathieu.carpentier)

mathieu.carpentier

Comment 33

•

4 years ago

Yesterday it did happen after I signed in to sync.

Today I started Nightly with a new profile. This time I did not sign in to sync and did not install any addon. It took less than 5 minutes before I got a tab that was stuck with the loading animation.

BugBot [:suhaib / :marco/ :calixte]

Comment 34

•

4 years ago

The patch landed in nightly and beta is affected.
:mconley, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(mconley)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 35

•

4 years ago

Hi mathieu.carpentier,

I suspect that this patch then is not solving things for you, and that you're likely experiencing a slightly different issue.

Do you happen to have the ESET Nod32 antivirus product installed and enabled on your Linux box?

Flags: needinfo?(mconley) → needinfo?(mathieu.carpentier)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 36

•

4 years ago

Comment on attachment 9133606 [details]
Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws!

Beta/Release Uplift Approval Request

User impact if declined: Background tabs that fail to get a content process associated with them due to launch failures (build ID mismatch, or other launch failures), might result in a broken tab.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The code being changed is nicely isolated, and also has automated tests. There is enough coverage here to make me confident that this can be uplifted safely.
String changes made/needed: None.

Attachment #9133606 - Flags: approval-mozilla-beta?

mathieu.carpentier

Comment 37

•

4 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #35)

Hi mathieu.carpentier,

I suspect that this patch then is not solving things for you, and that you're likely experiencing a slightly different issue.

Do you happen to have the ESET Nod32 antivirus product installed and enabled on your Linux box?

Yes ! I do have ESET Nod32 v4 installed and running on my Linux machine.

Flags: needinfo?(mathieu.carpentier)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 38

•

4 years ago

(In reply to mathieu.carpentier from comment #37)

Yes ! I do have ESET Nod32 v4 installed and running on my Linux machine.

In that case, I suspect you're hitting bug 1604218. According to bug 1604218 comment 34, ESET is shipping an update that will fix this issue.

Julien Cristau [:jcristau] (back April 22)

Comment 39

•

4 years ago

Comment on attachment 9133606 [details]
Bug 1618936 - Make the front-end more robust in how it handles content process launch failures. r?jaws!

approved for 75 rc1

Attachment #9133606 - Flags: approval-mozilla-beta? → approval-mozilla-release+

Julien Cristau [:jcristau] (back April 22)

Comment 40

•

4 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-release/rev/d3a9a22636ab

status-firefox75: affected → fixed

Timea Cernea [:tbabos][inactive]

Comment 41

•

4 years ago

Spent a whole day opening and loading a lot of tabs with heavy content and didn't experience this issue anymore on Windows. I feel more comfortable saying it is fixed now on the latest Nightly 76.0a1 (2020-03-30) (64-bit).

status-firefox76: fixed → verified

Timea Cernea [:tbabos][inactive]

Comment 42

•

4 years ago

Verified-fixed on latest Beta 75.0 (64-bit) on Windows 10 x64 as well. Didn't encounter the issue during a day of surfing around on Beta.

Status: RESOLVED → VERIFIED

status-firefox75: fixed → verified

Chris Peterson [:cpeterson]

Updated

•

4 years ago