Closed
Bug 1352496
Opened 7 years ago
Closed 7 years ago
CrashManager only permits (non-main) crash pings from content process types
Categories
(Toolkit :: Crash Reporting, defect)
Toolkit
Crash Reporting
Tracking
()
VERIFIED
FIXED
mozilla55
People
(Reporter: ddurst, Assigned: Dexter)
References
Details
(Whiteboard: [measurement:client])
Attachments
(2 files, 1 obsolete file)
59 bytes,
text/x-review-board-request
|
ted
:
review+
benjamin
:
review+
gchang
:
approval-mozilla-beta+
|
Details |
7.86 KB,
patch
|
Details | Diff | Splinter Review |
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c10, only PROCESS_TYPE_CONTENT crash pings are being sent. [https://dxr.mozilla.org/mozilla-central/source/toolkit/components/crashes/CrashManager.jsm#472] Which means that anything else (PROCESS_TYPE_GPU, for example) is not. Leaving it up to someone more responsible to decide whether we should discretely add process types to control inclusion, or to just accept all things that are not main or null/undefined.
Reporter | ||
Comment 1•7 years ago
|
||
NI Alessio, since I know he's been looking at (admittedly some different) parts of this machinery lately. Alessio -- I would say we should patch this and uplift it as far as we can. Is that something you can do easily in gsvelto's absence?
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → alessio.placitelli
Flags: needinfo?(alessio.placitelli)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
Benjamin mentioned that Gabriele added this constraint deliberately. Should we still change it? I have no background for this functionality, so I might be missing some important piece of the puzzle
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(ddurst)
Assignee | ||
Updated•7 years ago
|
Attachment #8854004 -
Flags: review?(ted)
Reporter | ||
Comment 5•7 years ago
|
||
It was added deliberately when we first introduced the content process (https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c10). We absolutely should change it, the question is just about how. Should we: a) introduce each process type's inclusion to sending its crash ping there, so that they're gated in one place until we're satisfied with them and then refactor it to be all process types that are not main? b) just constrain it to not main and that way anything that is a valid declared type could be included? c) add some more process type control so that we could easily toggle those off if necessary (that's scope creep, but seemed worth suggesting)?
Flags: needinfo?(ddurst)
Reporter | ||
Comment 6•7 years ago
|
||
(Oh, and I should add that Milan needs GPU process crashes yesterday, which is the driver for this.)
Comment 7•7 years ago
|
||
Realistically I think we probably should send all process types. I imagine the following concerns: * bandwidth/storage: make sure that this won't blow telemetry budgets (P1) * analysis: make sure that our various aggregates and analyses are expecting this new data (P1) * quality control: ** make sure that this actually works for GPU/plugin/GMP (P1) ** and the submitted data is reasonably correct (P1) ** and that it doesn't break stuff in the browser (P1) * analysis part 2: make sure that our aggregates are actively presenting this data (P2) In terms of solving the immediate problem, can we do this for GPU right away and then come back to plugins? There are a lot of plugin crashes, and I'm not sure we care about them much. Making Flash CtP will dramatically reduce that crash rate where it won't affect budget as much.
Updated•7 years ago
|
Flags: needinfo?(ddurst)
Flags: needinfo?(alessio.placitelli)
Comment 8•7 years ago
|
||
mozreview-review |
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review128656 Do we know if `addCrash` gets called for crashes from these other process types? Is this a thing that happens by default for new process types or does someone need to wire it up? Given the discussion of new process types we should probably get these sorts of things documented so people don't miss steps and not get crash data when they were expecting to. ::: toolkit/components/crashes/tests/xpcshell/test_crash_manager.js:529 (Diff revision 2) > + [["payload", "crashId"], id], > + ]); > + Assert.ok(!found, "No telemetry ping must be submitted for invalid process types"); > + } > +}); > +/* Did you intend to leave this test commented out?
Attachment #8854004 -
Flags: review?(ted) → review+
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Realistically I think we probably should send all process types. I imagine > the following concerns: > > * bandwidth/storage: make sure that this won't blow telemetry budgets (P1) > * analysis: make sure that our various aggregates and analyses are expecting > this new data (P1) > * quality control: > ** make sure that this actually works for GPU/plugin/GMP (P1) > ** and the submitted data is reasonably correct (P1) > ** and that it doesn't break stuff in the browser (P1) > * analysis part 2: make sure that our aggregates are actively presenting > this data (P2) > > In terms of solving the immediate problem, can we do this for GPU right away > and then come back to plugins? There are a lot of plugin crashes, and I'm > not sure we care about them much. Making Flash CtP will dramatically reduce > that crash rate where it won't affect budget as much. Clearing my ni?, as ddurst is probably the best person to answer this :)
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Updated•7 years ago
|
Whiteboard: [measurement:client]
Reporter | ||
Comment 10•7 years ago
|
||
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #7)
> > Realistically I think we probably should send all process types. I imagine
> > the following concerns:
> >
> > * bandwidth/storage: make sure that this won't blow telemetry budgets (P1)
> > * analysis: make sure that our various aggregates and analyses are expecting
> > this new data (P1)
> > * quality control:
> > ** make sure that this actually works for GPU/plugin/GMP (P1)
> > ** and the submitted data is reasonably correct (P1)
> > ** and that it doesn't break stuff in the browser (P1)
> > * analysis part 2: make sure that our aggregates are actively presenting
> > this data (P2)
> >
> > In terms of solving the immediate problem, can we do this for GPU right away
> > and then come back to plugins? There are a lot of plugin crashes, and I'm
> > not sure we care about them much. Making Flash CtP will dramatically reduce
> > that crash rate where it won't affect budget as much.
I agree -- we should enable GPU explicitly right now and complete the analysis (both phases) listed above.
Then we should do a thorough look through the bandwidth and repeat for other types once verified OK.
Flags: needinfo?(ddurst)
Reporter | ||
Comment 11•7 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8) > Do we know if `addCrash` gets called for crashes from these other process > types? Is this a thing that happens by default for new process types or does > someone need to wire it up? Given the discussion of new process types we > should probably get these sorts of things documented so people don't miss > steps and not get crash data when they were expecting to. I assumed that the specific case for PROCESS_TYPE_CONTENT proved that it did, but that's probably too trusting. Milan -- is this something your team would have wired, or are you using default behavior (in which case, I will track it down)?
Flags: needinfo?(milan)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 13•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review128656 Given the conversation on the bug, I've changed CrashManager.jsm to only allow content and GPU crashes. > Did you intend to leave this test commented out? Whoops, sorry, no. Leftover :-)
David, question in comment 11.
Flags: needinfo?(milan) → needinfo?(dvander)
(In reply to David Durst [:ddurst] from comment #11) > (In reply to Ted Mielczarek [:ted.mielczarek] from comment #8) > > Do we know if `addCrash` gets called for crashes from these other process > > types? Is this a thing that happens by default for new process types or does > > someone need to wire it up? Given the discussion of new process types we > > should probably get these sorts of things documented so people don't miss > > steps and not get crash data when they were expecting to. > > I assumed that the specific case for PROCESS_TYPE_CONTENT proved that it > did, but that's probably too trusting. > > Milan -- is this something your team would have wired, or are you using > default behavior (in which case, I will track it down)? If I understand what you're asking correctly, AddCrash is called unconditionally when crashes are reported in Gecko: http://searchfox.org/mozilla-central/rev/381a7b8f8a78b481336cfa384cb0a5536e217e4a/ipc/glue/CrashReporterHost.cpp#160
Flags: needinfo?(dvander)
Reporter | ||
Comment 16•7 years ago
|
||
This should answer ted's question from #c8. Alessio, it's all you.
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to David Durst [:ddurst] from comment #16) > This should answer ted's question from #c8. Alessio, it's all you. Thanks David. By looking at what :dvander said, I noticed that a few lines above [1] Gabriele added this: > if (mProcessType == nsICrashService::PROCESS_TYPE_CONTENT) { > CrashReporter::RunMinidumpAnalyzer(mChildDumpID); > } Which, by looking at [2], seems to "extract the stack traces from the dump and store them in JSON format as an annotation in the extra file associated with the crash". Is this something that we need to do for the GPU process in this bug? [1] - http://searchfox.org/mozilla-central/rev/381a7b8f8a78b481336cfa384cb0a5536e217e4a/ipc/glue/CrashReporterHost.cpp#144 [2] - http://searchfox.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#3096
Flags: needinfo?(ddurst)
Reporter | ||
Comment 18•7 years ago
|
||
I assume so, based on the content process crash handling being the prototype for new/additional process crash handling. NI ted to confirm that [2] would be necessary for GPU (any, really) process crashes as well. I believe the original intent holds for all non-main process crashes (https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c11).
Flags: needinfo?(ddurst) → needinfo?(ted)
Comment 19•7 years ago
|
||
Yes, I had forgotten about that, but we'll need to do that. The minidump-analyzer is what produces the JSON containing the threads and modules etc that we submit in the crash ping.
Flags: needinfo?(ted)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19) > Yes, I had forgotten about that, but we'll need to do that. The > minidump-analyzer is what produces the JSON containing the threads and > modules etc that we submit in the crash ping. No problem, I've updated the patch to only allow crash pings on content + GPU crashes and made it run the minidum-analyzer for gpu crashes as well! Does that need a new review? @Benjamin, I'm flagging you because I'm assuming this needs a data review. I've updated the documentation bits to mention that the crash ping is sent for the gpu process as well.
Flags: needinfo?(ted)
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review131616 data-r=me
Attachment #8854004 -
Flags: review?(benjamin) → review+
Comment 23•7 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #21) > No problem, I've updated the patch to only allow crash pings on content + > GPU crashes and made it run the minidum-analyzer for gpu crashes as well! > Does that need a new review? If those are the only changes then you can carry forward my r+.
Flags: needinfo?(ted)
Comment 24•7 years ago
|
||
Pushed by alessio.placitelli@gmail.com: https://hg.mozilla.org/integration/autoland/rev/0788b396df3c Enable sending other child processes crash pings from the CrashManager. r=bsmedberg,ted
Comment 25•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0788b396df3c
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 26•7 years ago
|
||
Hi :milan! Before flagging for QA validation on this bug, could you suggest any way to trigger a "GPU crash" in Firefox? Is this even possible?
Flags: needinfo?(milan)
Nightly has a "Terminate GPU Process" button in about:support. The other option is to look up the GPU process ID (also in about:support), find it in the task manager, and terminate the process there.
Flags: needinfo?(milan)
Comment 28•7 years ago
|
||
Terminating the GPU process from the task manager doesn't cause it to crash and won't trigger a crash report. I don't know how "terminate GPU process" works so I can't say whether that triggers a crash or not. But Alessio you should be able to use crashfirefox.exe to trigger a crash; you'll need to specify the PID of the GPU process.
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #28) > Terminating the GPU process from the task manager doesn't cause it to crash > and won't trigger a crash report. I don't know how "terminate GPU process" > works so I can't say whether that triggers a crash or not. > > But Alessio you should be able to use crashfirefox.exe to trigger a crash; > you'll need to specify the PID of the GPU process. Grand! Thanks Benjamin, I was getting crazy, "crashfirefox.exe" finally did the trick and I can see crash pings being sent. @Abe, could you please validate this bug? Let's make sure that: - Crash pings are generated when a GPU crash happens. They should be sent when Firefox restarts. - They validate correctly against our schema To crash the GPU process you need to build "crashfirefox.exe" from [1]. After its built, you want to execute "crashfirefox.exe <gpu pid>". To get the PID of the GPU process, go to about:support and look for the "GPUProcessPid" entry: the number is the PID. Please note that GPU acceleration might not be available if you're running Firefox in a virtual machine (didn't work for me in Firefox on Ubuntu in a VM). [1] - https://github.com/bsmedberg/crashfirefox-intentionally [2] - https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/telemetry/crash.schema.json
Flags: needinfo?(amasresha)
Comment 30•7 years ago
|
||
Prebuilt crashfirefox binary is available at https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/
Comment 31•7 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #30) > Prebuilt crashfirefox binary is available at > https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/ Benjamin, what tools do you suggest to crash the GPU process in Mac and Linux?
Flags: needinfo?(amasresha) → needinfo?(benjamin)
Comment 32•7 years ago
|
||
See instructions at https://developer.mozilla.org/en-US/docs/Mozilla/How_to_report_a_hung_Firefox
Flags: needinfo?(benjamin)
Comment 33•7 years ago
|
||
n.b.: killing the GPU process with SIGABRT on OS X won't currently generate a crash report, but bug 1350917 landed on autoland yesterday so it should be in Nightly soon.
Comment 34•7 years ago
|
||
I have tested this on 64 bits of Windows-10 with 32 bits of Nightly and it works as expected. Here are the test cases: https://public.etherpad-mozilla.org/p/Bug_1352496#lineNumber=151 However, I have two concerns: First, it seems that we do not support GPU processes in Mac and Linux. So the testing will not cover those operating systems. Second, "crashfirefox.exe" only crashes 32 bits of Firefox, not 64 bits of Firefox. So what other tool do you recommend to crash the GPU process in 64 bits of Firefox? Thanks
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Comment 35•7 years ago
|
||
(In reply to Abe - QA (:Abe_LV) from comment #34) > I have tested this on 64 bits of Windows-10 with 32 bits of Nightly and it > works as expected. Here are the test cases: > https://public.etherpad-mozilla.org/p/Bug_1352496#lineNumber=151 > > However, I have two concerns: > First, it seems that we do not support GPU processes in Mac and Linux. So > the testing will not cover those operating systems. Ok! Good! > Second, "crashfirefox.exe" only crashes 32 bits of Firefox, not 64 bits of > Firefox. So what other tool do you recommend to crash the GPU process in 64 > bits of Firefox? > Thanks I've compiled the "crashfirefox.exe" executable on my own (x64), and it correctly crashes my x64 Firefox. I've uploaded it for you here: https://drive.google.com/a/mozilla.com/file/d/0B1RACVTBwZ94VENQbFRDbTlkMHM/view?usp=sharing
Flags: needinfo?(alessio.placitelli)
Comment 36•7 years ago
|
||
Verified-fixed. We have completed testing on windows 10 with 32 bits and 64 bits of latest nightly, and all is green. Test cases and runs are here:https://public.etherpad-mozilla.org/p/Bug_1352496 Please needinfo or email me if you have questions about the testing. Thanks
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 37•7 years ago
|
||
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. Approval Request Comment [User impact if declined]: The GFX team won't have access to crash ping Telemetry triggered by GPU crashes [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Low risk [Why is the change risky/not risky?]: The changes to enable GPU code are trivial and could only compromise sending GPU crash pings, which were not available prior to this patch. The biggest chunk of the patch is to improve automated test coverage. [String changes made/needed]: None
Attachment #8854004 -
Flags: approval-mozilla-aurora?
Comment 38•7 years ago
|
||
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. 54 was merged to Beta today.
Attachment #8854004 -
Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Assignee | ||
Comment 39•7 years ago
|
||
(In reply to [Out of Office Until 24-April] Ryan VanderMeulen [:RyanVM] from comment #38) > Comment on attachment 8854004 [details] > Bug 1352496 - Enable sending other child processes crash pings from the > CrashManager. > > 54 was merged to Beta today. Whoops, thanks Ryan.
Updated•7 years ago
|
status-firefox54:
--- → affected
Comment 40•7 years ago
|
||
Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. Fix a crash ping telemetry issue and was verified. Beta54+.
Attachment #8854004 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 41•7 years ago
|
||
needs rebasing for beta because of problems like : warning: conflicts while merging toolkit/components/crashes/tests/xpcshell/test_crash_manager.js! (edit, then use 'hg resolve --mark') abort: unresolved conflicts, can't continue
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Comment 42•7 years ago
|
||
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Updated•7 years ago
|
Attachment #8859479 -
Attachment is patch: true
Comment 43•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/c8b0d74c8c41
Flags: in-testsuite+
Comment 44•7 years ago
|
||
Backed out for eslint failures in test_crash_manager.js. https://hg.mozilla.org/releases/mozilla-beta/rev/22a981da3926 https://treeherder.mozilla.org/logviewer.html#?job_id=92845452&repo=mozilla-beta
Flags: needinfo?(alessio.placitelli)
Comment 45•7 years ago
|
||
Not sure if tied to the eslint failures or not, but the test was failing too. https://treeherder.mozilla.org/logviewer.html#?job_id=92863062&repo=mozilla-beta
Assignee | ||
Comment 46•7 years ago
|
||
Gah, sorry, I made a mistake during the rebase, I've tested this locally and it works. Yes, the eslint failures were related. Sorry for the mess. Tomcat, would you mind relanding this patch on beta?
Attachment #8859479 -
Attachment is obsolete: true
Flags: needinfo?(alessio.placitelli) → needinfo?(cbook)
Comment 47•7 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/c1ef9597ea513c00df914df119d146c45a5cc784 landed :)
Flags: needinfo?(cbook)
Comment 48•7 years ago
|
||
Alessio, should Release QA manually verify this on Beta 54 as well?
Flags: needinfo?(alessio.placitelli)
Assignee | ||
Comment 49•7 years ago
|
||
(In reply to Andrei Vaida [:avaida], Desktop Release QA – please ni? me from comment #48) > Alessio, should Release QA manually verify this on Beta 54 as well? David, do you think that's useful?
Flags: needinfo?(alessio.placitelli) → needinfo?(ddurst)
Comment 51•7 years ago
|
||
(In reply to David Durst [:ddurst] from comment #50) > Yes! I'm always in favor of verification. Roger that. We can use Comment 36 for context.
Flags: qe-verify+
Comment 52•7 years ago
|
||
Verified as fixed using Firefox 54 beta 7 32- and 64-bit under Win 10 64-bit.
You need to log in
before you can comment on or make changes to this bug.
Description
•