Open Bug 1463960 Opened 7 years ago Updated 2 years ago

When Linux distro's update distro-built Firefox, Firefox can no longer spawn new processes

Categories

(Core :: IPC, defect, P2)

defect

Tracking

()

People

(Reporter: cpearce, Unassigned)

References

Details

Attachments

(1 file, 3 obsolete files)

Linux distro's like Fedora and Ubuntu build and package their own Firefox builds. They push out updates to these packages using their own package managers. They update Firefox while Firefox is running. This means the Firefox on disk no longer matches the Firefox in memory. When Firefox in memory tries to spawn a new child process, it does so by executing a command line. This means the new child process is instantiated with a different version of Firefox/libxul than what's in memory. We then check that the buildID in the child process matches the parent [1]. If they don't match, we abort creating the process. This means that when Firefox is updated by the distro, we can't create any new content, plugin, or other kind of processes. This basically cripples Firefox. Since September last year, every 6 weeks when we ship a new Firefox version, I get an email from Netflix saying they've had a very dramatic spike in failures to play on Linux in the outgoing version of Firefox, and I finally tracked it down to this issue. The plugin-container process we're launching after the distro updater has run is failing to start due to this, and this manifests as (what appears to be) a CDM plugin crash. STR: (Ubuntu 17.10 x64) 1. Open Firefox release, as distributed by Ubuntu's package manager. 2. While Firefox is still running, downgrade Firefox with `sudo apt-get install firefox=56.0+build6-0ubuntu1`. 3. Login to Netflix and try to play a video. Observe failure to play. [1] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/ipc/glue/MessageChannel.cpp#1033
As discussed with Anthony, I think the failure should go back to the user via the UI, inviting them to restart. CDM is only a small part of what can go wrong in those situations.
This sounds like a dupe of bug 1366808, which recently landed and changes the behavior to warn the user to restart their browser rather than simply crash. For what it is worth, the reason I added this check was because we were getting crashes in this situation anyways, because we do not support backwards compatibility for IPDL messages. This simply consolidated the expected crashes into a single signature.
Stephen: In bug 1366808 you handled the case where something updates the Firefox binaries on disk causing problems for running Firefox instances. You added a prompt to encourage the user to restart when something updates the binary while Firefox is running. Does this prompt also show when creating a GMP/CDM process fails due to buildID mismatch? Netflix sees a spike of failures every time we ship a new Firefox version due to a buildID mismatch in the GMP/CDM process, caused by Linux distro updaters updating Firefox while we're running.a
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Chris Pearce (:cpearce) from comment #3) > Does this prompt also show when creating a GMP/CDM process fails due to buildID mismatch? Possibly. When you say "GMP/CDM process", are we talking about ContentProcess.cpp or GPUProcessImpl.cpp code? Or is this another kind of process? Attachment 8968929 [details] [diff] is what added code to recognize the buildID mismatch, and TabParent.cpp is dispatching the buildid-mismatch error. This error is then handled by tabbrowser.js, which makes the docshell display the about:restartrequired page via ContentCrashHandlers.jsm.
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Stephen A Pohl [:spohl] from comment #4) > Possibly. When you say "GMP/CDM process", are we talking about > ContentProcess.cpp or GPUProcessImpl.cpp code? Or is this another kind of > process? Another kind of process: GeckoProcessType_GMPlugin, top-level actor PGMP, process host class GMPProcessParent. (GMP in this context is “Gecko Media Plugin”.)
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #5) > (In reply to Stephen A Pohl [:spohl] from comment #4) > > Possibly. When you say "GMP/CDM process", are we talking about > > ContentProcess.cpp or GPUProcessImpl.cpp code? Or is this another kind of > > process? > > Another kind of process: GeckoProcessType_GMPlugin, top-level actor PGMP, > process host class GMPProcessParent. (GMP in this context is “Gecko Media > Plugin”.) Thank you. From taking a quick look, it seems like GMPParent and GMPChild could mimic ContentParent and ContentChild, i.e. pass the buildID from parent to child and the child exit if there is a mismatch. Depending on the desired behavior, we might want to dispatch the "oop-browser-buildid-mismatch" event, which would trigger the about:restartrequired page to be displayed via ContentCrashHandlers.jsm. This may have to be tweaked however, since this would most likely replace the contents of a tab with the about:restartrequired page. In contrast, in bug 1366808, the about:restartrequired page is displayed when a user opens a new tab which launches a new content process rather than replacing any existing content in a tab.
Attachment #8980485 - Attachment is obsolete: true
Attachment #8980486 - Attachment is obsolete: true
Stephen: The patch here send the notification to tabbrowser.js. That gets into ContentCrashHandlers.jsm, but since the tab hasn't actually crashed, we don't display about:restartrequired. What needs to change in the front end code to make about:restartrequired display when the tab hasn't crashed? I don't know much about how the front end code works...
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Chris Pearce (:cpearce) from comment #10) > Stephen: The patch here send the notification to tabbrowser.js. That gets > into ContentCrashHandlers.jsm, but since the tab hasn't actually crashed, we > don't display about:restartrequired. What needs to change in the front end > code to make about:restartrequired display when the tab hasn't crashed? I > don't know much about how the front end code works... This is most likely due to the fact that ContentCrashHandlers.jsm observes "ipc:content-shutdown", which is currently only dispatched by ContentParent. You may prefer to simply do a loadURI call to about:restartrequired from tabbrowser.js instead of routing this through ContentCrashHandlers.jsm.
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Stephen A Pohl [:spohl] from comment #11) > (In reply to Chris Pearce (:cpearce) from comment #10) > > Stephen: The patch here send the notification to tabbrowser.js. That gets > > into ContentCrashHandlers.jsm, but since the tab hasn't actually crashed, we > > don't display about:restartrequired. What needs to change in the front end > > code to make about:restartrequired display when the tab hasn't crashed? I > > don't know much about how the front end code works... > > This is most likely due to the fact that ContentCrashHandlers.jsm observes > "ipc:content-shutdown", which is currently only dispatched by ContentParent. > You may prefer to simply do a loadURI call to about:restartrequired from > tabbrowser.js instead of routing this through ContentCrashHandlers.jsm. ... although this would display the about:restartrequired page after a user clicks the restart button. So we might want to route this through ContentCrashHandlers.jsm after all. In terms of loading the page after a restart, note that we're currently running into bug 1464441 for content processes. But I suspect that if the page has loaded before the GMP process causes the page to switch to about:restartrequired, the page would probably reload and work correctly after a restart.
Stephen: Are you able to take over this bug? Most of the stuff requiring knowledge of how GMP works is hooked up here. You can instantiate a GMP process (and repro the version mismatch) by pointing Firefox at https://cpearce.github.io/mse-eme/ We also need to check whether the "Extension process" is affected.
Flags: needinfo?(spohl.mozilla.bugs)
I plan to look into this, but I have a large backlog of other bugs and reviews to work through first. Keeping n-i set.
Attached patch Patch (wip)Splinter Review
Updated patch to current trunk to start investigating. Leaving n-i set.
Assignee: nobody → spohl.mozilla.bugs
Attachment #8980487 - Attachment is obsolete: true
Status: NEW → ASSIGNED

Clearing n-i until this bug has bubbled back to the top of the backlog.

Flags: needinfo?(spohl.mozilla.bugs)
Assignee: spohl.mozilla.bugs → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: