Closed
Bug 555309
Opened 15 years ago
Closed 15 years ago
Hang crash reports have (no signature)
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: benjamin, Assigned: cjones)
References
Details
(Whiteboard: [land lorentz])
Attachments
(5 files)
4.27 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
19.53 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
3.90 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
7.35 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
8.88 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
Crash reporets that are a result of a plugin hang currently end up with (no signature). This is not useful at all.
e.g. parent-side http://crash-stats.mozilla.com/report/index/7b2399aa-1ef1-48ab-be59-35a6a2100326
child-side http://crash-stats.mozilla.com/report/index/e50ae8e1-9381-4181-b4ea-11bdc2100326
"No thread was identified as the cause of the crash; No signature could be created because we do not know which thread crashed".
What we want is the signature of the main thread (for the parent process) and the plugin thread (in the child process). My thought was simply to annotate (in the .extra file) from which thread socorro should generate the signature. If we can just fix the dump file itself, that's fine too.
If we go the .extra route, we need to file a socorro bug too, so let me know how you want to do it.
Comment 1•15 years ago
|
||
I have doubts that this is going to produce anything really useful either way. You're dumping live code in both processes, so even for the same hang the chances of getting the same signature don't seem really likely.
Anyway, if you want this to show up we'll have to put a breakpad info stream in the dump (which I asked cjones not to when he wrote that patch, bug 544936 comment 47). The processor uses the crashed or requesting thread from the exception stream or breakpad info stream respectively, and Socorro follows that:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/minidump_processor.cc#73
Reporter | ||
Comment 2•15 years ago
|
||
If the process is dead-hung, which is by far the most common thing we've seen, the stack in the child and parent are both likely to be relatively stable. It seemed to me that modifying the minidump was the hard path, and just telling socorro "take the signature from thread N" would be easier.
The stacks from the parent are going to require additional work, I think. e.g. with the parent crash above you want the frame below CrashReporter::CreatePairedMinidumps, and you probably want to continue all the way below mozilla::ipc::RPCChannel::Call which is where the variability is really going to start.
Reporter | ||
Comment 3•15 years ago
|
||
FWIW, I think that getting usable statistical data out of these reports is vital for the beta. I think we can do it manually if necessary, but I'd really like to end up with socorro doing it for us pretty quickly.
Assignee | ||
Comment 4•15 years ago
|
||
I suspect the best solution would be extracting this information from fully-processed dumps on crash-stats, but we might be able to hack some exception info in the mean time.
Assignee | ||
Comment 5•15 years ago
|
||
Hacking this into the linux code wasn't /too/ bad. Here's a thread dump with the new code
http://crash-stats.mozilla.com/report/index/aebaa3e3-eb15-48ea-a590-90c062100329
(No stack, huh. This is a debug build, but I guess socorro has been switched to eh_frame-based stackwalking exclusively?) Anyway, on linux I'm marking the plugin thread as having crashed on a SIGSTOP, which should easily distinguish it from "real" crashes.
The plugin subprocess reports the plugin thread ID back during NP_Initialize(). If for some reason the plugin thread has died before we dump it, then we get the same kind of minidumps as the current code: "No crashing thread could be identified".
Looking into windows now.
Assignee | ||
Comment 6•15 years ago
|
||
From what I can tell, the best we'll be able to do on windows is identify the plugin thread in the dump, but not set its "crashing address". To do that, we'd need access to the plugin thread's CONTEXT while it's paused by MiniDumpWriteDump(). I'll see what socorro does with a crash address of "0"; if it looks at the top of the crashing thread's stack, we're back in business. Otherwise we'll need another approach for windows. (ISTR it just uses the 0 address.)
One hacky option that requires socorro cooperation is to set a magic crash address, say 42, that says "look at the top of the crashed thread's stack."
Assignee | ||
Comment 7•15 years ago
|
||
(In reply to comment #6)
> One hacky option that requires socorro cooperation is to set a magic crash
> address, say 42, that says "look at the top of the crashed thread's stack."
Actually, this wouldn't need socorro cooperation, just a change to minidump_processor.
Assignee | ||
Comment 8•15 years ago
|
||
Looks like another option is to |OpenThread(pluginThreadId); SuspendThread(); GetThreadContext();| and hope that the thread is actually hung, i.e. the context doesn't change until MiniDumpWriteDump() finishes. MiniDumpWriteDump() might go ahead and properly dump an already-attached thread too; that's another option.
Assignee | ||
Comment 9•15 years ago
|
||
Went with the SuspendThread()/GetThreadContext()/MiniDumpWriteDump() approach; working quite well indeed. Here's what the plugin-side crashes look like
http://crash-stats.mozilla.com/report/index/bba7e1e4-13c6-46ac-8f6e-101932100329
I used EXCEPTION_BREAKPOINT to distinguish them from real crashes. Looks like we'll need some new filters to chomp the kernelbase/ntdll frames.
Interestingly enough, I didn't touch the browser-side dumping code, but see a signature there nonetheless
http://crash-stats.mozilla.com/report/index/500bb5ed-8c21-48e1-8df5-c351a2100329
Assignee | ||
Comment 10•15 years ago
|
||
OK, this is a weird interface to expose through crashreporter. Granted. The reason it's done is that on linux, the gettid() syscall isn't exposed through libc, and breakpad goes to great lengths to do so correctly on several platforms. This patch is my attempt to not duplicate that work.
reviews -> :bs b/c of ted's upcoming paternity leave.
Attachment #435831 -
Flags: review?(benjamin)
Assignee | ||
Comment 11•15 years ago
|
||
Attachment #435833 -
Flags: review?(benjamin)
Assignee | ||
Comment 12•15 years ago
|
||
Separated into parts (a) and (b) for easier reviewing. They don't compile separately, so I'll qfold them before pushing.
Attachment #435834 -
Flags: review?(benjamin)
Assignee | ||
Comment 13•15 years ago
|
||
Attachment #435835 -
Flags: review?(benjamin)
Comment 14•15 years ago
|
||
(In reply to comment #5)
> (No stack, huh. This is a debug build, but I guess socorro has been switched
> to eh_frame-based stackwalking exclusively?)
That's an x86-64 build. We can't walk the stack at all without symbols on the server.
Reporter | ||
Updated•15 years ago
|
Attachment #435831 -
Flags: review?(benjamin) → review+
Reporter | ||
Updated•15 years ago
|
Attachment #435833 -
Flags: review?(benjamin) → review+
Reporter | ||
Updated•15 years ago
|
Attachment #435834 -
Flags: review?(benjamin) → review+
Reporter | ||
Comment 15•15 years ago
|
||
Comment on attachment 435834 [details] [diff] [review]
Part 2b: Allow a particular subprocess thread to be "blamed" in an OOP dump
The indentation in exception_handler.cc is a little weird, but other than that this looks great.
Reporter | ||
Comment 16•15 years ago
|
||
Comment on attachment 435835 [details] [diff] [review]
Part 3: Grab the plugin thread's ID on startup and blame it in hang dumps
Is `0` a good default for mPluginThread?
Attachment #435835 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 17•15 years ago
|
||
Here's a windows browser-side dump with Part 4
http://crash-stats.mozilla.com/report/index/9dada40d-86b1-43a0-ade9-459b82100330
and linux
http://crash-stats.mozilla.com/report/index/bp-38fa0e43-137c-49fb-9303-461fd2100330
I took the same approach for these as for the plugin-side dumps, identifying the crash reason as EXCEPTION_BREAKPOINT/SIGSTOP. Patch here in a tick.
Assignee | ||
Comment 18•15 years ago
|
||
Relatively simple patch, so I stuffed the windows and linux changes together.
Attachment #436000 -
Flags: review?(benjamin)
Assignee | ||
Comment 19•15 years ago
|
||
(In reply to comment #15)
> (From update of attachment 435834 [details] [diff] [review])
> The indentation in exception_handler.cc is a little weird, but other than that
> this looks great.
Thanks, my windows emacs isn't configured correctly.
(In reply to comment #16)
> (From update of attachment 435835 [details] [diff] [review])
> Is `0` a good default for mPluginThread?
It is on linux, and per jimm's test on IRC windows treats 0 as an invalid thread id.
Reporter | ||
Updated•15 years ago
|
Attachment #436000 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 20•15 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/3232c9725125
http://hg.mozilla.org/mozilla-central/rev/8202b93d6396
http://hg.mozilla.org/mozilla-central/rev/17fc9010307d
http://hg.mozilla.org/mozilla-central/rev/e8b52ed02f99
Whiteboard: [land lorentz]
Reporter | ||
Updated•15 years ago
|
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 21•15 years ago
|
||
http://crash-stats.mozilla.com/report/index/a94540ca-746f-403f-935e-666a72100331
http://crash-stats.mozilla.com/report/index/57d6a823-1feb-47d4-9f30-be9392100331
From today's nightly.
Status: RESOLVED → VERIFIED
Updated•3 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•