Closed Bug 1401400 Opened 7 years ago Closed 7 years ago

Shutdown hangs don't produce useful crash pings

Categories

(Toolkit :: Crash Reporting, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: agashlin, Assigned: agashlin)

Details

Attachments

(5 files, 8 obsolete files)

4.94 KB, patch
gsvelto
: review+
Details | Diff | Splinter Review
2.33 KB, patch
gsvelto
: review+
Details | Diff | Splinter Review
1.20 KB, patch
gsvelto
: review+
Details | Diff | Splinter Review
50.67 KB, patch
gsvelto
: review+
Details | Diff | Splinter Review
3.03 KB, patch
agashlin
: review+
Details | Diff | Splinter Review
This is somewhat related to bug 1398830 but I think it is different from hangs in general as it deals with the main process and doesn't use the "PluginHang" annotation mechanism.

RunWatchdog crashes [1] on shutdown hang, and Socorro specifically looks for this function on the stack in order to instead report the main thread stack [2]. Crash ping analysis will likely want to do the same, but as crash pings currently (since bug 1380252) only include one thread (the crashing/requesting thread), it won't be able to look at other threads. Thus all shutdown hangs will be grouped together as far as crash pings are concerned.

I think we can fix this with a crash annotation just before the MOZ_CRASH() to indicate that we should pass --full to minidump-analyzer, which will include all thread stacks.

[1] http://searchfox.org/mozilla-central/rev/1c13d5cf85f904afb8976c02a80daa252b893fca/toolkit/components/terminator/nsTerminator.cpp#160
[2] https://github.com/mozilla-services/socorro/blob/d4b8836a8db7ef05c81dc94518e677319327dd2a/socorro/signature/signature_utilities.py#L579-L600
Assignee: nobody → agashlin
Status: NEW → ASSIGNED
As observed back in bug 1280477 comment 21 there isn't a test of the crashreporter client in order to make sure the AllThreads annotation is being handled by the crashreporter, but I have tested it manually. The test included in this patch at least ensures that we have the annotation.
Attachment #8911967 - Attachment is obsolete: true
Fix executable mode added by accident.
Attachment #8911985 - Attachment is obsolete: true
Attachment #8911965 - Flags: review?(gsvelto)
Attachment #8911998 - Flags: review?(gsvelto)
Attachment #8912007 - Flags: review?(gsvelto)
Attachment #8912033 - Flags: review?(gsvelto)
Sorry for the delay, after inspecting the code I think that we have a better alternative to implement this:

- For process hangs it should be sufficient to gather all threads when the crash type is set CRASH_TYPE_HANG [1]. From what I can tell this covers plugin processes only, hang detection doesn't seem to cover content processes.

- For main process hangs - both shutdown and regular ones [2][3] - an easier way than to add an annotation is to provide a global flag in the crash reporter that will instruct the client to gather all stack traces from the dump. Something like SetMinidumpAnalysisAllThreads() exposed from nsExceptionHandler.h. You'll then need a way to communicate this to the client, possibly by adding an optional parameter to it (like we do with the minidump-analyzer) and then adding it when the client is invoked via LaunchProgram() [4].

Does this sound reasonable? It might require a little more code on the crashreporter client side but we'll avoid having to read the .extra file twice.

[1] https://dxr.mozilla.org/mozilla-central/rev/97efdde466f18cf580fda9673cf4c38ee21fc7b7/toolkit/components/crashes/CrashService.js#169
[2] https://dxr.mozilla.org/mozilla-central/rev/97efdde466f18cf580fda9673cf4c38ee21fc7b7/toolkit/components/terminator/nsTerminator.cpp#160
[3] https://dxr.mozilla.org/mozilla-central/rev/97efdde466f18cf580fda9673cf4c38ee21fc7b7/xpcom/threads/HangMonitor.cpp#122
[4] https://dxr.mozilla.org/mozilla-central/rev/97efdde466f18cf580fda9673cf4c38ee21fc7b7/toolkit/crashreporter/nsExceptionHandler.cpp#821
Comment on attachment 8911965 [details] [diff] [review]
Part 1. Support AllThreads crash annotation

Clearing review flags for now.
Attachment #8911965 - Flags: review?(gsvelto)
Attachment #8911998 - Flags: review?(gsvelto)
Attachment #8912007 - Flags: review?(gsvelto)
Attachment #8912033 - Flags: review?(gsvelto)
I'm not clear on who's got the next step here.
Flags: needinfo?(gsvelto)
We discussed the approach described in comment 8 last week (IIRC) so Adam will pick it up again soon.
Flags: needinfo?(gsvelto)
Attachment #8911965 - Attachment is obsolete: true
Attachment #8911998 - Attachment is obsolete: true
Attachment #8912007 - Attachment is obsolete: true
Attachment #8912033 - Attachment is obsolete: true
Attachment #8920398 - Flags: review?(gsvelto)
Comment on attachment 8920399 [details] [diff] [review]
Part 2. Analyze all threads for "hang" type crashes

Review of attachment 8920399 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8920399 - Flags: review?(gsvelto) → review+
Comment on attachment 8920400 [details] [diff] [review]
Part 3. Process all threads when crashing for hangs

Review of attachment 8920400 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8920400 - Flags: review?(gsvelto) → review+
Comment on attachment 8920398 [details] [diff] [review]
Part 1. Global setting to analyze all threads

LGMT though I'd really love to see this covered with tests. Adding a test for the CrashService functionality should be trivial in toolkit/components/crashes/tests/xpcshell/test_crash_service.js. As for actual shutdown hangs I'm not sure if we have integration tests covering those. If we do then we should extend them, if we don't then let's open a bug to write them.
Attachment #8920398 - Flags: review?(gsvelto) → review+
Right, I've only manually tested the shutdown hangs, across Windows, Linux and Mac. We do have a test for the shutdown terminator [1] but I'm not sure how this could be extended to explicitly test the new functionality.

I'll look into adding the test for CrashService, that should be pretty similar to what I had done for the previous patch series.

[1] http://searchfox.org/mozilla-central/source/toolkit/crashreporter/test/unit/test_crash_terminator.js
(In reply to Adam Gashlin [:agashlin] from comment #18)
> Right, I've only manually tested the shutdown hangs, across Windows, Linux
> and Mac. We do have a test for the shutdown terminator [1] but I'm not sure
> how this could be extended to explicitly test the new functionality.

It should be rather easy, the `extra` parameter in the after_crash() function holds the contents of the .extra file, so it should have the StackTraces field populated. You'd have to check that `StackTraces.threads.length` is larger than 1.

> I'll look into adding the test for CrashService, that should be pretty
> similar to what I had done for the previous patch series.

Yep.
Attachment #8922492 - Flags: review?(gsvelto) → review+
Comment on attachment 8922493 [details] [diff] [review]
Part 5. Test analyzing all threads of a hang through crash service

Excellent, let's land this.
Attachment #8922493 - Flags: review?(gsvelto) → review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0290bb0af376
Part 1: Option to analyze all threads. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa28abe68df1
Part 2: Dump all threads for "hang" type crashes. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5860d5128fb
Part 3: Process all threads when crashing for hang. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ccd39a02c21
Part 4: Test real crash dump, multiple threads. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/788f5831a14d
Part 5: Test processing all threads for hang. r=gsvelto
Keywords: checkin-needed
Fixed the eslint semicolon issue, running a try suite in case I missed something else: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de2618667452533090e77ec5e3cbd6d4b4371b13
Attachment #8922493 - Attachment is obsolete: true
Flags: needinfo?(agashlin)
Attachment #8922897 - Flags: review+
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fbd7462750c6
Part 1. Option to analyze all threads. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5219bf056ec
Part 2. Dump all threads for "hang" type crashes. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/687bbdd7afaf
Part 3. Process all threads when crashing for hang. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/b5b0c49cc271
Part 4. Test real crash dump, multiple threads. r=gsvelto
https://hg.mozilla.org/integration/mozilla-inbound/rev/6873f266d7a6
Part 5. Test processing all threads for hang r=gsvelto
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: