Content processes never call SetJitExceptionHandler

RESOLVED FIXED in Firefox 54

Status

()

Core
JavaScript Engine: JIT
RESOLVED FIXED
8 months ago
8 months ago

People

(Reporter: dmajor, Assigned: dmajor)

Tracking

unspecified
mozilla55
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox53 wontfix, firefox54+ fixed, firefox55 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

8 months ago
While debugging something else, I noticed that we never take the |if (sJitExceptionHandler)| paths in content processes.

The parent process sets up the handler here:

00 0000002c`fc9ff000 00007ffb`683cbebc xul!CrashReporter::SetExceptionHandler+0x496
01 0000002c`fc9ff240 00007ffb`683cb6c1 xul!XREMain::XRE_mainInit+0x36c
02 0000002c`fc9ff6f0 00007ffb`683cb240 xul!XREMain::XRE_main+0x449
03 0000002c`fc9ff770 00007ff6`a2da189c xul!XRE_main+0x48
04 0000002c`fc9ff920 00007ff6`a2da13cc firefox!do_main+0x204
05 0000002c`fc9ffae0 00007ff6`a2da1b53 firefox!NS_internal_main+0x120
06 0000002c`fc9ffb50 00007ff6`a2dcc4c1 firefox!wmain+0x163
07 (Inline Function) --------`-------- firefox!invoke_main+0x22
08 0000002c`fc9ffbb0 00007ffb`b3752774 firefox!__scrt_common_main_seh+0x11d
09 0000002c`fc9ffbf0 00007ffb`b41e0d61 KERNEL32!BaseThreadInitThunk+0x14
0a 0000002c`fc9ffc20 00000000`00000000 ntdll!RtlUserThreadStart+0x21

which doesn't run in content processes because NS_internal_main at frame 05 diverges into content_process_main.

Ted: Can you remind me how crash reporting works for child processes? Surely we must set up a Breakpad handler _somehow_, even if it's not through CrashReporter::SetExceptionhandler. Wherever that code is for the child, I assume we should add another call to SetJitExceptionHandler.
Flags: needinfo?(ted)
Child processes call `SetRemoteExceptionHandler`:
https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#3763

Note that there are 3 implementations of it, one for each of Windows/Mac/Linux. We could certainly call this for child processes, it shouldn't harm anything and probably will help with JIT crashes. We also don't hook `SetUnhandledExceptionFilter` in child processes like we do in the browser process:
https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#1779

I'm not sure if that's important or not. We added that in response to third-party code calling it and breaking our crash reporting.
Flags: needinfo?(ted)
(Assignee)

Comment 2

8 months ago
Thanks Ted! Not hooking SetUnhandledExceptionFilter shouldn't be a problem.
(Assignee)

Comment 3

8 months ago
Just confirmed with crash-stats that this is actually affecting reporting.

In the past three months of Win64 builds:
1868 crashes in EnterBaseline in e10s-off processes
701 crashes in EnterBaseline in parent processes
12 crashes in EnterBaseline in content processes

(Note that the jit crash categorization doesn't work on Win64 builds so we still keep the EnterBaseline signature.)
(Assignee)

Comment 4

8 months ago
Created attachment 8860495 [details] [diff] [review]
childjit
Assignee: nobody → dmajor
Attachment #8860495 - Flags: review?(ted)
(Assignee)

Comment 5

8 months ago
Oops - I'll put the function return type on its own line before landing.
Attachment #8860495 - Flags: review?(ted) → review+

Comment 6

8 months ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7e83c4c31ee5
Call SetJitExceptionHandler in child processes too. r=ted
https://hg.mozilla.org/mozilla-central/rev/7e83c4c31ee5
Status: NEW → RESOLVED
Last Resolved: 8 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Comment 8

8 months ago
[Tracking Requested - why for this release]: This issue prevents us from having accurate crash numbers for Win64.
status-firefox53: --- → wontfix
status-firefox54: --- → affected
tracking-firefox54: --- → ?
(Assignee)

Comment 9

8 months ago
Comment on attachment 8860495 [details] [diff] [review]
childjit

I assume I don't have to do aurora approval anymore?

Approval Request Comment
[Feature/Bug causing the regression]: e10s
[User impact if declined]: We lose some crash reports on Win64. The folks managing the rollout need to have accurate crash numbers.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: partially. I do see some initial crashes rolling in but I'll want to look again when we have more data to make sure the volume is right.
[Needs manual test from QE? If yes, steps to reproduce]: no, but crash-stats people should also watch the numbers to verify
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: The JIT exception code has been around for a very long time in parent processes and non-e10s processes. This patch makes us use that code in child processes too.
[String changes made/needed]: none
Attachment #8860495 - Flags: approval-mozilla-beta?
(Assignee)

Updated

8 months ago
Blocks: 558448
Blocks: 1340936
Track 54+ as we need to have accurate crash numbers for Win64.
tracking-firefox54: ? → +
Comment on attachment 8860495 [details] [diff] [review]
childjit

In order to get accurate crash numbers for win64. Beta54+. Should be in 54 beta 3.
Attachment #8860495 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 12

8 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/bc5340535ce2
status-firefox54: affected → fixed
(In reply to David Major [:dmajor] from comment #9)
> [Is this code covered by automated tests?]: no
> [Has the fix been verified in Nightly?]: partially. I do see some initial
> crashes rolling in but I'll want to look again when we have more data to
> make sure the volume is right.
> [Needs manual test from QE? If yes, steps to reproduce]: no, but crash-stats
> people should also watch the numbers to verify

Setting qe-verify- based on David's assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.