While debugging something else, I noticed that we never take the |if (sJitExceptionHandler)| paths in content processes. The parent process sets up the handler here: 00 0000002c`fc9ff000 00007ffb`683cbebc xul!CrashReporter::SetExceptionHandler+0x496 01 0000002c`fc9ff240 00007ffb`683cb6c1 xul!XREMain::XRE_mainInit+0x36c 02 0000002c`fc9ff6f0 00007ffb`683cb240 xul!XREMain::XRE_main+0x449 03 0000002c`fc9ff770 00007ff6`a2da189c xul!XRE_main+0x48 04 0000002c`fc9ff920 00007ff6`a2da13cc firefox!do_main+0x204 05 0000002c`fc9ffae0 00007ff6`a2da1b53 firefox!NS_internal_main+0x120 06 0000002c`fc9ffb50 00007ff6`a2dcc4c1 firefox!wmain+0x163 07 (Inline Function) --------`-------- firefox!invoke_main+0x22 08 0000002c`fc9ffbb0 00007ffb`b3752774 firefox!__scrt_common_main_seh+0x11d 09 0000002c`fc9ffbf0 00007ffb`b41e0d61 KERNEL32!BaseThreadInitThunk+0x14 0a 0000002c`fc9ffc20 00000000`00000000 ntdll!RtlUserThreadStart+0x21 which doesn't run in content processes because NS_internal_main at frame 05 diverges into content_process_main. Ted: Can you remind me how crash reporting works for child processes? Surely we must set up a Breakpad handler _somehow_, even if it's not through CrashReporter::SetExceptionhandler. Wherever that code is for the child, I assume we should add another call to SetJitExceptionHandler.
Child processes call `SetRemoteExceptionHandler`: https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#3763 Note that there are 3 implementations of it, one for each of Windows/Mac/Linux. We could certainly call this for child processes, it shouldn't harm anything and probably will help with JIT crashes. We also don't hook `SetUnhandledExceptionFilter` in child processes like we do in the browser process: https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#1779 I'm not sure if that's important or not. We added that in response to third-party code calling it and breaking our crash reporting.
Thanks Ted! Not hooking SetUnhandledExceptionFilter shouldn't be a problem.
Just confirmed with crash-stats that this is actually affecting reporting. In the past three months of Win64 builds: 1868 crashes in EnterBaseline in e10s-off processes 701 crashes in EnterBaseline in parent processes 12 crashes in EnterBaseline in content processes (Note that the jit crash categorization doesn't work on Win64 builds so we still keep the EnterBaseline signature.)
Created attachment 8860495 [details] [diff] [review] childjit
Oops - I'll put the function return type on its own line before landing.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7e83c4c31ee5 Call SetJitExceptionHandler in child processes too. r=ted
[Tracking Requested - why for this release]: This issue prevents us from having accurate crash numbers for Win64.
Comment on attachment 8860495 [details] [diff] [review] childjit I assume I don't have to do aurora approval anymore? Approval Request Comment [Feature/Bug causing the regression]: e10s [User impact if declined]: We lose some crash reports on Win64. The folks managing the rollout need to have accurate crash numbers. [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: partially. I do see some initial crashes rolling in but I'll want to look again when we have more data to make sure the volume is right. [Needs manual test from QE? If yes, steps to reproduce]: no, but crash-stats people should also watch the numbers to verify [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: The JIT exception code has been around for a very long time in parent processes and non-e10s processes. This patch makes us use that code in child processes too. [String changes made/needed]: none
Track 54+ as we need to have accurate crash numbers for Win64.
Comment on attachment 8860495 [details] [diff] [review] childjit In order to get accurate crash numbers for win64. Beta54+. Should be in 54 beta 3.
(In reply to David Major [:dmajor] from comment #9) > [Is this code covered by automated tests?]: no > [Has the fix been verified in Nightly?]: partially. I do see some initial > crashes rolling in but I'll want to look again when we have more data to > make sure the volume is right. > [Needs manual test from QE? If yes, steps to reproduce]: no, but crash-stats > people should also watch the numbers to verify Setting qe-verify- based on David's assessment on manual testing needs.