1352496 - CrashManager only permits (non-main) crash pings from content process types

Reporter

Description

•

9 years ago

Per https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c10, only PROCESS_TYPE_CONTENT crash pings are being sent. [https://dxr.mozilla.org/mozilla-central/source/toolkit/components/crashes/CrashManager.jsm#472] Which means that anything else (PROCESS_TYPE_GPU, for example) is not. Leaving it up to someone more responsible to decide whether we should discretely add process types to control inclusion, or to just accept all things that are not main or null/undefined.

David Durst [:ddurst]

Reporter

Comment 1

•

9 years ago

NI Alessio, since I know he's been looking at (admittedly some different) parts of this machinery lately. Alessio -- I would say we should patch this and uplift it as far as we can. Is that something you can do easily in gsvelto's absence?

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Updated

•

9 years ago

Assignee: nobody → alessio.placitelli

Flags: needinfo?(alessio.placitelli)

Comment hidden (mozreview-request)

Alessio Placitelli [:Dexter]

Assignee

Comment 4

•

9 years ago

Benjamin mentioned that Gabriele added this constraint deliberately. Should we still change it? I have no background for this functionality, so I might be missing some important piece of the puzzle

Alessio Placitelli [:Dexter]

Assignee

Updated

•

9 years ago

Flags: needinfo?(ddurst)

Alessio Placitelli [:Dexter]

Assignee

Updated

•

9 years ago

Attachment #8854004 - Flags: review?(ted)

David Durst [:ddurst]

Reporter

Comment 5

•

9 years ago

It was added deliberately when we first introduced the content process (https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c10). We absolutely should change it, the question is just about how. Should we: a) introduce each process type's inclusion to sending its crash ping there, so that they're gated in one place until we're satisfied with them and then refactor it to be all process types that are not main? b) just constrain it to not main and that way anything that is a valid declared type could be included? c) add some more process type control so that we could easily toggle those off if necessary (that's scope creep, but seemed worth suggesting)?

Flags: needinfo?(ddurst)

David Durst [:ddurst]

Reporter

Comment 6

•

9 years ago

(Oh, and I should add that Milan needs GPU process crashes yesterday, which is the driver for this.)

Benjamin Smedberg

Comment 7

•

9 years ago

Realistically I think we probably should send all process types. I imagine the following concerns: * bandwidth/storage: make sure that this won't blow telemetry budgets (P1) * analysis: make sure that our various aggregates and analyses are expecting this new data (P1) * quality control: ** make sure that this actually works for GPU/plugin/GMP (P1) ** and the submitted data is reasonably correct (P1) ** and that it doesn't break stuff in the browser (P1) * analysis part 2: make sure that our aggregates are actively presenting this data (P2) In terms of solving the immediate problem, can we do this for GPU right away and then come back to plugins? There are a lot of plugin crashes, and I'm not sure we care about them much. Making Flash CtP will dramatically reduce that crash rate where it won't affect budget as much.

Benjamin Smedberg

Updated

•

9 years ago

Flags: needinfo?(ddurst)

Flags: needinfo?(alessio.placitelli)

(not currently active) Ted Mielczarek

Comment 8

•

9 years ago

mozreview-review

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review128656 Do we know if `addCrash` gets called for crashes from these other process types? Is this a thing that happens by default for new process types or does someone need to wire it up? Given the discussion of new process types we should probably get these sorts of things documented so people don't miss steps and not get crash data when they were expecting to. ::: toolkit/components/crashes/tests/xpcshell/test_crash_manager.js:529 (Diff revision 2) > + [["payload", "crashId"], id], > + ]); > + Assert.ok(!found, "No telemetry ping must be submitted for invalid process types"); > + } > +}); > +/* Did you intend to leave this test commented out?

Attachment #8854004 - Flags: review?(ted) → review+

Alessio Placitelli [:Dexter]

Assignee

Comment 9

•

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Realistically I think we probably should send all process types. I imagine > the following concerns: > > * bandwidth/storage: make sure that this won't blow telemetry budgets (P1) > * analysis: make sure that our various aggregates and analyses are expecting > this new data (P1) > * quality control: > ** make sure that this actually works for GPU/plugin/GMP (P1) > ** and the submitted data is reasonably correct (P1) > ** and that it doesn't break stuff in the browser (P1) > * analysis part 2: make sure that our aggregates are actively presenting > this data (P2) > > In terms of solving the immediate problem, can we do this for GPU right away > and then come back to plugins? There are a lot of plugin crashes, and I'm > not sure we care about them much. Making Flash CtP will dramatically reduce > that crash rate where it won't affect budget as much. Clearing my ni?, as ddurst is probably the best person to answer this :)

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Updated

•

9 years ago

Whiteboard: [measurement:client]

David Durst [:ddurst]

Reporter

Comment 10

•

9 years ago

> (In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > > Realistically I think we probably should send all process types. I imagine > > the following concerns: > > > > * bandwidth/storage: make sure that this won't blow telemetry budgets (P1) > > * analysis: make sure that our various aggregates and analyses are expecting > > this new data (P1) > > * quality control: > > ** make sure that this actually works for GPU/plugin/GMP (P1) > > ** and the submitted data is reasonably correct (P1) > > ** and that it doesn't break stuff in the browser (P1) > > * analysis part 2: make sure that our aggregates are actively presenting > > this data (P2) > > > > In terms of solving the immediate problem, can we do this for GPU right away > > and then come back to plugins? There are a lot of plugin crashes, and I'm > > not sure we care about them much. Making Flash CtP will dramatically reduce > > that crash rate where it won't affect budget as much. I agree -- we should enable GPU explicitly right now and complete the analysis (both phases) listed above. Then we should do a thorough look through the bandwidth and repeat for other types once verified OK.

Flags: needinfo?(ddurst)

David Durst [:ddurst]

Reporter

Comment 11

•

9 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8) > Do we know if `addCrash` gets called for crashes from these other process > types? Is this a thing that happens by default for new process types or does > someone need to wire it up? Given the discussion of new process types we > should probably get these sorts of things documented so people don't miss > steps and not get crash data when they were expecting to. I assumed that the specific case for PROCESS_TYPE_CONTENT proved that it did, but that's probably too trusting. Milan -- is this something your team would have wired, or are you using default behavior (in which case, I will track it down)?

Flags: needinfo?(milan)

Comment hidden (mozreview-request)

Alessio Placitelli [:Dexter]

Assignee

Comment 13

•

9 years ago

mozreview-review-reply

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review128656 Given the conversation on the bug, I've changed CrashManager.jsm to only allow content and GPU crashes. > Did you intend to leave this test commented out? Whoops, sorry, no. Leftover :-)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 14

•

9 years ago

David, question in comment 11.

Flags: needinfo?(milan) → needinfo?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 15

•

9 years ago

(In reply to David Durst [:ddurst] from comment #11) > (In reply to Ted Mielczarek [:ted.mielczarek] from comment #8) > > Do we know if `addCrash` gets called for crashes from these other process > > types? Is this a thing that happens by default for new process types or does > > someone need to wire it up? Given the discussion of new process types we > > should probably get these sorts of things documented so people don't miss > > steps and not get crash data when they were expecting to. > > I assumed that the specific case for PROCESS_TYPE_CONTENT proved that it > did, but that's probably too trusting. > > Milan -- is this something your team would have wired, or are you using > default behavior (in which case, I will track it down)? If I understand what you're asking correctly, AddCrash is called unconditionally when crashes are reported in Gecko: http://searchfox.org/mozilla-central/rev/381a7b8f8a78b481336cfa384cb0a5536e217e4a/ipc/glue/CrashReporterHost.cpp#160

Flags: needinfo?(dvander)

David Durst [:ddurst]

Reporter

Comment 16

•

9 years ago

This should answer ted's question from #c8. Alessio, it's all you.

Alessio Placitelli [:Dexter]

Assignee

Comment 17

•

9 years ago

(In reply to David Durst [:ddurst] from comment #16) > This should answer ted's question from #c8. Alessio, it's all you. Thanks David. By looking at what :dvander said, I noticed that a few lines above [1] Gabriele added this: > if (mProcessType == nsICrashService::PROCESS_TYPE_CONTENT) { > CrashReporter::RunMinidumpAnalyzer(mChildDumpID); > } Which, by looking at [2], seems to "extract the stack traces from the dump and store them in JSON format as an annotation in the extra file associated with the crash". Is this something that we need to do for the GPU process in this bug? [1] - http://searchfox.org/mozilla-central/rev/381a7b8f8a78b481336cfa384cb0a5536e217e4a/ipc/glue/CrashReporterHost.cpp#144 [2] - http://searchfox.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#3096

Flags: needinfo?(ddurst)

David Durst [:ddurst]

Reporter

Comment 18

•

9 years ago

I assume so, based on the content process crash handling being the prototype for new/additional process crash handling. NI ted to confirm that [2] would be necessary for GPU (any, really) process crashes as well. I believe the original intent holds for all non-main process crashes (https://bugzilla.mozilla.org/show_bug.cgi?id=1293656#c11).

Flags: needinfo?(ddurst) → needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 19

•

9 years ago

Yes, I had forgotten about that, but we'll need to do that. The minidump-analyzer is what produces the JSON containing the threads and modules etc that we submit in the crash ping.

Flags: needinfo?(ted)

Comment hidden (mozreview-request)

Alessio Placitelli [:Dexter]

Assignee

Comment 21

•

9 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19) > Yes, I had forgotten about that, but we'll need to do that. The > minidump-analyzer is what produces the JSON containing the threads and > modules etc that we submit in the crash ping. No problem, I've updated the patch to only allow crash pings on content + GPU crashes and made it run the minidum-analyzer for gpu crashes as well! Does that need a new review? @Benjamin, I'm flagging you because I'm assuming this needs a data review. I've updated the documentation bits to mention that the crash ping is sent for the gpu process as well.

Flags: needinfo?(ted)

Benjamin Smedberg

Comment 22

•

9 years ago

mozreview-review

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. https://reviewboard.mozilla.org/r/126002/#review131616 data-r=me

Attachment #8854004 - Flags: review?(benjamin) → review+

(not currently active) Ted Mielczarek

Comment 23

•

9 years ago

(In reply to Alessio Placitelli [:Dexter] from comment #21) > No problem, I've updated the patch to only allow crash pings on content + > GPU crashes and made it run the minidum-analyzer for gpu crashes as well! > Does that need a new review? If those are the only changes then you can carry forward my r+.

Flags: needinfo?(ted)

Pulsebot

Comment 24

•

9 years ago

Pushed by alessio.placitelli@gmail.com: https://hg.mozilla.org/integration/autoland/rev/0788b396df3c Enable sending other child processes crash pings from the CrashManager. r=bsmedberg,ted

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 25

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0788b396df3c

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Alessio Placitelli [:Dexter]

Assignee

Comment 26

•

9 years ago

Hi :milan! Before flagging for QA validation on this bug, could you suggest any way to trigger a "GPU crash" in Firefox? Is this even possible?

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 27

•

9 years ago

Nightly has a "Terminate GPU Process" button in about:support. The other option is to look up the GPU process ID (also in about:support), find it in the task manager, and terminate the process there.

Flags: needinfo?(milan)

Benjamin Smedberg

Comment 28

•

9 years ago

Terminating the GPU process from the task manager doesn't cause it to crash and won't trigger a crash report. I don't know how "terminate GPU process" works so I can't say whether that triggers a crash or not. But Alessio you should be able to use crashfirefox.exe to trigger a crash; you'll need to specify the PID of the GPU process.

Alessio Placitelli [:Dexter]

Assignee

Comment 29

•

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #28) > Terminating the GPU process from the task manager doesn't cause it to crash > and won't trigger a crash report. I don't know how "terminate GPU process" > works so I can't say whether that triggers a crash or not. > > But Alessio you should be able to use crashfirefox.exe to trigger a crash; > you'll need to specify the PID of the GPU process. Grand! Thanks Benjamin, I was getting crazy, "crashfirefox.exe" finally did the trick and I can see crash pings being sent. @Abe, could you please validate this bug? Let's make sure that: - Crash pings are generated when a GPU crash happens. They should be sent when Firefox restarts. - They validate correctly against our schema To crash the GPU process you need to build "crashfirefox.exe" from [1]. After its built, you want to execute "crashfirefox.exe <gpu pid>". To get the PID of the GPU process, go to about:support and look for the "GPUProcessPid" entry: the number is the PID. Please note that GPU acceleration might not be available if you're running Firefox in a virtual machine (didn't work for me in Firefox on Ubuntu in a VM). [1] - https://github.com/bsmedberg/crashfirefox-intentionally [2] - https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/telemetry/crash.schema.json

Flags: needinfo?(amasresha)

Benjamin Smedberg

Comment 30

•

9 years ago

Prebuilt crashfirefox binary is available at https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/

Abe - QA (:Abe_LV)

Comment 31

•

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #30) > Prebuilt crashfirefox binary is available at > https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/ Benjamin, what tools do you suggest to crash the GPU process in Mac and Linux?

Flags: needinfo?(amasresha) → needinfo?(benjamin)

Benjamin Smedberg

Comment 32

•

9 years ago

See instructions at https://developer.mozilla.org/en-US/docs/Mozilla/How_to_report_a_hung_Firefox

Flags: needinfo?(benjamin)

(not currently active) Ted Mielczarek

Comment 33

•

9 years ago

n.b.: killing the GPU process with SIGABRT on OS X won't currently generate a crash report, but bug 1350917 landed on autoland yesterday so it should be in Nightly soon.

Abe - QA (:Abe_LV)

Comment 34

•

9 years ago

I have tested this on 64 bits of Windows-10 with 32 bits of Nightly and it works as expected. Here are the test cases: https://public.etherpad-mozilla.org/p/Bug_1352496#lineNumber=151 However, I have two concerns: First, it seems that we do not support GPU processes in Mac and Linux. So the testing will not cover those operating systems. Second, "crashfirefox.exe" only crashes 32 bits of Firefox, not 64 bits of Firefox. So what other tool do you recommend to crash the GPU process in 64 bits of Firefox? Thanks

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Comment 35

•

9 years ago

(In reply to Abe - QA (:Abe_LV) from comment #34) > I have tested this on 64 bits of Windows-10 with 32 bits of Nightly and it > works as expected. Here are the test cases: > https://public.etherpad-mozilla.org/p/Bug_1352496#lineNumber=151 > > However, I have two concerns: > First, it seems that we do not support GPU processes in Mac and Linux. So > the testing will not cover those operating systems. Ok! Good! > Second, "crashfirefox.exe" only crashes 32 bits of Firefox, not 64 bits of > Firefox. So what other tool do you recommend to crash the GPU process in 64 > bits of Firefox? > Thanks I've compiled the "crashfirefox.exe" executable on my own (x64), and it correctly crashes my x64 Firefox. I've uploaded it for you here: https://drive.google.com/a/mozilla.com/file/d/0B1RACVTBwZ94VENQbFRDbTlkMHM/view?usp=sharing

Flags: needinfo?(alessio.placitelli)

Abe - QA (:Abe_LV)

Comment 36

•

9 years ago

Verified-fixed. We have completed testing on windows 10 with 32 bits and 64 bits of latest nightly, and all is green. Test cases and runs are here:https://public.etherpad-mozilla.org/p/Bug_1352496 Please needinfo or email me if you have questions about the testing. Thanks

Status: RESOLVED → VERIFIED

Alessio Placitelli [:Dexter]

Assignee

Comment 37

•

9 years ago

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. Approval Request Comment [User impact if declined]: The GFX team won't have access to crash ping Telemetry triggered by GPU crashes [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Low risk [Why is the change risky/not risky?]: The changes to enable GPU code are trivial and could only compromise sending GPU crash pings, which were not available prior to this patch. The biggest chunk of the patch is to improve automated test coverage. [String changes made/needed]: None

Attachment #8854004 - Flags: approval-mozilla-aurora?

Ryan VanderMeulen [:RyanVM]

Comment 38

•

9 years ago

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. 54 was merged to Beta today.

Attachment #8854004 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?

Alessio Placitelli [:Dexter]

Assignee

Comment 39

•

9 years ago

(In reply to [Out of Office Until 24-April] Ryan VanderMeulen [:RyanVM] from comment #38) > Comment on attachment 8854004 [details] > Bug 1352496 - Enable sending other child processes crash pings from the > CrashManager. > > 54 was merged to Beta today. Whoops, thanks Ryan.

Gerry Chang [:gchang]

Updated

•

9 years ago

status-firefox54: --- → affected

Gerry Chang [:gchang]

Comment 40

•

9 years ago

Comment on attachment 8854004 [details] Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. Fix a crash ping telemetry issue and was verified. Beta54+.

Attachment #8854004 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Carsten Book [:Tomcat]

Comment 41

•

9 years ago

needs rebasing for beta because of problems like : warning: conflicts while merging toolkit/components/crashes/tests/xpcshell/test_crash_manager.js! (edit, then use 'hg resolve --mark') abort: unresolved conflicts, can't continue

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Comment 42

•

9 years ago

Attached patch [BETA ONLY] Rebased patch for beta. (obsolete) — Details — Splinter Review

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Updated

•

9 years ago

Attachment #8859479 - Attachment is patch: true

Ryan VanderMeulen [:RyanVM]

Comment 43

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/c8b0d74c8c41

status-firefox54: affected → fixed

Flags: in-testsuite+

Ryan VanderMeulen [:RyanVM]

Comment 44

•

9 years ago

Backed out for eslint failures in test_crash_manager.js. https://hg.mozilla.org/releases/mozilla-beta/rev/22a981da3926 https://treeherder.mozilla.org/logviewer.html#?job_id=92845452&repo=mozilla-beta

status-firefox54: fixed → affected

Flags: needinfo?(alessio.placitelli)

David Durst [:ddurst]

Reporter

Updated

•

9 years ago

Updated

•

9 years ago

Blocks: 1310716

Ryan VanderMeulen [:RyanVM]

Comment 45

•

9 years ago

Not sure if tied to the eslint failures or not, but the test was failing too. https://treeherder.mozilla.org/logviewer.html#?job_id=92863062&repo=mozilla-beta

Alessio Placitelli [:Dexter]

Assignee

Comment 46

•

9 years ago

Attached patch [BETA ONLY] Rebased patch for beta. — Details — Splinter Review

Gah, sorry, I made a mistake during the rebase, I've tested this locally and it works. Yes, the eslint failures were related. Sorry for the mess. Tomcat, would you mind relanding this patch on beta?

Attachment #8859479 - Attachment is obsolete: true

Flags: needinfo?(alessio.placitelli) → needinfo?(cbook)

Carsten Book [:Tomcat]

Comment 47

•

9 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/c1ef9597ea513c00df914df119d146c45a5cc784 landed :)

status-firefox54: affected → fixed

Flags: needinfo?(cbook)

Andrei Vaida [:avaida]

Comment 48

•

9 years ago

Alessio, should Release QA manually verify this on Beta 54 as well?

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Assignee

Comment 49

•

9 years ago

(In reply to Andrei Vaida [:avaida], Desktop Release QA – please ni? me from comment #48) > Alessio, should Release QA manually verify this on Beta 54 as well? David, do you think that's useful?

Flags: needinfo?(alessio.placitelli) → needinfo?(ddurst)

David Durst [:ddurst]

Reporter

Comment 50

•

9 years ago

Yes! I'm always in favor of verification.

Flags: needinfo?(ddurst)

Andrei Vaida [:avaida]

Comment 51

•

9 years ago

(In reply to David Durst [:ddurst] from comment #50) > Yes! I'm always in favor of verification. Roger that. We can use Comment 36 for context.

Flags: qe-verify+

Petruta Horea [:phorea]

Comment 52

•

9 years ago

Verified as fixed using Firefox 54 beta 7 32- and 64-bit under Win 10 64-bit.

status-firefox54: fixed → verified

status-firefox55: fixed → verified

Flags: qe-verify+

Bug 1352496 - Enable sending other child processes crash pings from the CrashManager. 9 years ago Alessio Placitelli [:Dexter] 59 bytes, text/x-review-board-request	ted : review+ benjamin : review+ gchang : approval-mozilla-beta+	Details
[BETA ONLY] Rebased patch for beta. 9 years ago Alessio Placitelli [:Dexter] 7.68 KB, patch		Details \| Diff \| Splinter Review
[BETA ONLY] Rebased patch for beta. 9 years ago Alessio Placitelli [:Dexter] 7.86 KB, patch		Details \| Diff \| Splinter Review