Daemonize the crash helper process on Linux and macOS
Categories
(Toolkit :: Crash Reporting, enhancement)
Tracking
()
People
(Reporter: gsvelto, Assigned: gsvelto)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(12 files, 1 obsolete file)
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
|
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr140+
|
Details | Review |
The crash helper process was designed to go away as soon as the main process dies, however there's been reports of the process lingering around after the main process died. The simplest way to avoid this issue is to daemonize the crash helper so that it will be re-parented to PID 1 and the main process won't have to deal with reaping it.
| Assignee | ||
Comment 1•9 months ago
|
||
This is the second step towards daemonizing the crash helper process. The code
used to listen to incoming connections and messages is moved directly into the
server for better coupling. New and existing connections are now categorized
as belonging to Firefox parent process, a child process or an external process
(typically the WER service on Windows). The permission checks on the various
messages are now based on these categories, instead of the PIDs of the
crash helper clients.
| Assignee | ||
Comment 2•9 months ago
|
||
This paves the way for the daemonization of the crash helper. First of all the
main process won't know the PID of the crash helper process directly, as it
won't be a child of the main process anymore. Additionally, child processes'
IPC channels will be created in the main process, and then both ends will be
handed to the crash helper and child respectively. Because of this it won't be
possible for a child to fetch the PID of the other endpoint of the pipe (all
operating systems record the process identifier of the process that created
the pipe as the endpoint, regardless of what process one endpoint is handed
to afterwards).
This patch removes all references to the crash helper PID and ways to fetch it
from the crash helper IPC machinery. See the following patches for how we
deal with authorizing certain operations now that we don't have the PIDs
anymore.
| Assignee | ||
Comment 3•9 months ago
|
||
This is the second step towards daemonizing the crash helper process. The code
used to listen to incoming connections and messages is moved directly into the
server for better coupling. New and existing connections are now categorized
as belonging to Firefox parent process, a child process or an external process
(typically the WER service on Windows). The permission checks on the various
messages are now based on these categories, instead of the PIDs of the
crash helper clients.
Updated•9 months ago
|
| Assignee | ||
Comment 4•9 months ago
|
||
| Assignee | ||
Comment 5•9 months ago
|
||
| Assignee | ||
Comment 6•9 months ago
|
||
| Assignee | ||
Comment 7•9 months ago
|
||
This channel is currently only used on Linux to rendez-vous with the
crash helper, obtain its PID and use it to enable the process to be
dumped when the Yama LSM is enabled. It will be used to actually request
a dump in the future, when the breakpad exception handler is removed.
| Assignee | ||
Comment 8•9 months ago
|
||
This is almost ready, it just needs a bit of testing and a tiny bit of code in the Linux implementation. It has grown significantly though (the patch set is probably north of 1k lines worth of changes) but it paves the way for future changes.
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
| Assignee | ||
Comment 9•8 months ago
|
||
I was ready to land this but further testing on TC shows that the changes might impact the number of intermittent failures on macOS. Specifically it seems like this makes a known problem with utility processes worse. Under some conditions we fail to generate a minidump when we kill a utility process. See this error for how it looks like in the wild. This is a known issue and a long standing one, but it's been under the pain threshold for the longest time so we haven't addressed it yet. It is also possible that it is a Breakpad issue, which made me loathe looking at it given that we're phasing out breakpad. I'll dig into it because I cannot land this with this kind of failure rate, it'd make the macOS test runs very orange.
| Assignee | ||
Comment 10•8 months ago
|
||
More digging into the test results shows that the failures are happening largely on 10.15. I probably hit some nasty corner-case in 10.15 with the new code I've introduced.
| Assignee | ||
Comment 11•8 months ago
|
||
This appears to be an issue with AF_UNIX sockets on macOS 10.15 specifically. This might be a lot of hassle given even the existing code path for more recent versions of macOS contains two workarounds around an unfixed kernel bug.
Comment 12•8 months ago
|
||
Comment 13•8 months ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/e792d735168b
https://hg.mozilla.org/mozilla-central/rev/6eefae4581af
https://hg.mozilla.org/mozilla-central/rev/65d1352abcb1
https://hg.mozilla.org/mozilla-central/rev/b8224aa4253e
https://hg.mozilla.org/mozilla-central/rev/96c59acf4e43
https://hg.mozilla.org/mozilla-central/rev/c9486da0aa0b
Updated•7 months ago
|
Comment 14•7 months ago
|
||
(In reply to Pulsebot from comment #12)
Pushed by gsvelto@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/0b2f2877c6f2
https://hg.mozilla.org/integration/autoland/rev/e792d735168b
Remove the crash helper PID from the IPC connectors r=afranchuk
https://github.com/mozilla-firefox/firefox/commit/8d6e237319bf
https://hg.mozilla.org/integration/autoland/rev/6eefae4581af
Move the IPC waiting and connection logic directly into the server
r=afranchuk
https://github.com/mozilla-firefox/firefox/commit/d8a08af8244f
https://hg.mozilla.org/integration/autoland/rev/65d1352abcb1
Allow macOS crash helper IPC to pass file descriptors r=afranchuk
https://github.com/mozilla-firefox/firefox/commit/ef48704b7a7b
https://hg.mozilla.org/integration/autoland/rev/b8224aa4253e
Allow Windows crash helper IPC to pass handles r=afranchuk
https://github.com/mozilla-firefox/firefox/commit/767696d4884e
https://hg.mozilla.org/integration/autoland/rev/96c59acf4e43
Daemonize the crash helper process on Linux and macOS r=afranchuk
https://github.com/mozilla-firefox/firefox/commit/846cb99e6fab
https://hg.mozilla.org/integration/autoland/rev/c9486da0aa0b
Introduce an IPC channel between child processes and the crash helper
r=afranchuk
Perfherder has detected a browsertime performance change from push c9486da0aa0b91e203bd0ffc33d65f799f61cbee.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 15% | speedometer3 Perf-Dashboard/SelectingRange/Async | windows11-64-24h2-shippable | fission webrender | 1.30 -> 1.11 | Before/After |
| 3% | speedometer3 NewsSite-Next/NavigateToUS/Async | windows11-64-24h2-shippable | fission webrender | 21.12 -> 20.53 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 45932
The following documentation link provides more information about this command.
| Assignee | ||
Comment 15•6 months ago
|
||
This paves the way for the daemonization of the crash helper. First of all the
main process won't know the PID of the crash helper process directly, as it
won't be a child of the main process anymore. Additionally, child processes'
IPC channels will be created in the main process, and then both ends will be
handed to the crash helper and child respectively. Because of this it won't be
possible for a child to fetch the PID of the other endpoint of the pipe (all
operating systems record the process identifier of the process that created
the pipe as the endpoint, regardless of what process one endpoint is handed
to afterwards).
This patch removes all references to the crash helper PID and ways to fetch it
from the crash helper IPC machinery. See the following patches for how we
deal with authorizing certain operations now that we don't have the PIDs
anymore.
Original Revision: https://phabricator.services.mozilla.com/D251558
Updated•6 months ago
|
| Assignee | ||
Comment 16•6 months ago
|
||
This is the second step towards daemonizing the crash helper process. The code
used to listen to incoming connections and messages is moved directly into the
server for better coupling. New and existing connections are now categorized
as belonging to Firefox parent process, a child process or an external process
(typically the WER service on Windows). The permission checks on the various
messages are now based on these categories, instead of the PIDs of the
crash helper clients.
Original Revision: https://phabricator.services.mozilla.com/D251559
Updated•6 months ago
|
| Assignee | ||
Comment 17•6 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D252664
Updated•6 months ago
|
| Assignee | ||
Comment 18•6 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D254045
Updated•6 months ago
|
| Assignee | ||
Comment 19•6 months ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D254046
Updated•6 months ago
|
| Assignee | ||
Comment 20•6 months ago
|
||
This channel is currently only used on Linux to rendez-vous with the
crash helper, obtain its PID and use it to enable the process to be
dumped when the Yama LSM is enabled. It will be used to actually request
a dump in the future, when the breakpad exception handler is removed.
Original Revision: https://phabricator.services.mozilla.com/D254047
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Comment 21•6 months ago
|
||
| uplift | ||
Description
•