Closed Bug 544936 Opened 15 years ago Closed 15 years ago

Kill hung plugins in a way that generates minidumps, and grab browser minidumps around the same time

Categories

(Core Graveyard :: Plug-ins, enhancement)

enhancement
Not set
normal

Tracking

(status1.9.2 .4-fixed)

RESOLVED FIXED
Tracking Status
status1.9.2 --- .4-fixed

People

(Reporter: cjones, Assigned: cjones)

References

Details

(Whiteboard: [fixed-lorentz])

Attachments

(12 files, 15 obsolete files)

11.41 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
22.75 KB, patch
Dolske
: review+
jaas
: review+
Gavin
: review+
Details | Diff | Splinter Review
48.38 KB, application/zip
Details
24.78 KB, patch
Details | Diff | Splinter Review
24.18 KB, patch
ted
: review+
Details | Diff | Splinter Review
17.23 KB, patch
Details | Diff | Splinter Review
15.29 KB, patch
ted
: review+
Details | Diff | Splinter Review
17.70 KB, patch
ted
: review+
Details | Diff | Splinter Review
14.72 KB, application/octet-stream
Details
51.48 KB, application/zip
Details
1.54 KB, patch
Details | Diff | Splinter Review
775 bytes, patch
benjamin
: review+
Details | Diff | Splinter Review
Currently this is pretty easy, because hang detection is only done in the browser process. It's more complicated if the plugin process starts doing hang detection too.
Blocks: 544940
Based on the revised UI direction, I think bug 544095 would be most easily implemented by re-using all the plugin-crashed machinery we'll soon have in place. So, we'll need to "make it look like an accident" when the plugin hangs.
Blocks: 544095
Summary: Get plugin and browser minidumps when users choose to kill a hung plugin → Kill hung plugins in a way that generates minidumps, and grab browser minidumps around the same time
Assignee: nobody → jones.chris.g
I looked into using breakpad APIs for this so as to avoid intentionally crashing the plugin process, but no simple solution presented itself. It would have been relatively straightforward on linux, but windows looked to be a pain.
Attachment #430548 - Flags: review?(benjamin)
I'm not what I would need to do to get the "BrowserCorrelate" info into the socorro (?) UI, but I'm also not sure it's a high priority.
Attachment #430551 - Flags: review?(benjamin)
Will look into what's required to get the new browser dumps submitted. If it's beyond me I'll file a followup or pass the baton.
Comment on attachment 430548 [details] [diff] [review] Part 3: Give toplevel IPDL parent actors a KillForMinidump() interface that takes down the child side >diff --git a/ipc/glue/AsyncChannel.cpp b/ipc/glue/AsyncChannel.cpp > void > AsyncChannel::OnMessageReceived(const Message& msg) > { > AssertIOThread(); > NS_ASSERTION(mChannelState != ChannelError, "Shouldn't get here!"); > >+ // intercept this message on the IO thread because the worker may >+ // be blocked >+ if (DIE_WITH_MINIDUMP_MESSAGE_TYPE == msg.type()) { >+ NS_ABORT_IF_FALSE(mChild, "supposed to only be sent from parent!"); >+ DieWithMinidump_SEE_WORKER_THREAD(); The NS_ABORT_IF_FALSE isn't strong enough. What if the child process is owned, or really confused, and sends this message anyway? The parent process shouldn't die (in release builds) because of it. Also it seems that we could just call NS_RUNTIMEABORT here, especially if there was a way to get the abort string to the crash report output. Finally, will the fact that this is a hang be visible in the crash reporter minidump other than the stack info itself? E.g. do we send anything in the .extra file (comments/custom annotations)?
(In reply to comment #7) > Also it seems that we could just call NS_RUNTIMEABORT here, especially if there > was a way to get the abort string to the crash report output. > I originally was using NS_RUNTIMEABORT(), but it wasn't generating a minidump in my debug build. After a good night's sleep, I realize now that this might have something to do with SIGTRAP. > Finally, will the fact that this is a hang be visible in the crash reporter > minidump other than the stack info itself? E.g. do we send anything in the > .extra file (comments/custom annotations)? Sort of, see Part 4. On a hang, I set a "BrowserCorrelate" field in the minidump with the (non-empty) browser-side crash ID. Do you have something like a "CrashReason" field in mind?
Comment on attachment 430551 [details] [diff] [review] Part 4: Collect paired browser/plugin minidumps when the hang timeout expires The correlation requirements are in bug 544940... basically we should generate a HangID and submit it with both the child and browser .extra file, and socorro will associate the two. I suppose that could be one of the existing dump IDs, if we wanted. I'm disappointed that we can't just get a minidump directly from the child process. It seems like it should be possible!
The "Notes" field is human-readable data which will be displayed by Socorro (see nsICrashReporter.appendAppNotesToCrashReport. As noted in one of these bugs (oops, forgot to submit that review), you should use HangID because that's what we specced for socorro, but that won't show up in the crash UI until they implement it.
Comment on attachment 430548 [details] [diff] [review] Part 3: Give toplevel IPDL parent actors a KillForMinidump() interface that takes down the child side I'll give generating "friendly" dumps OOP another shot.
Attachment #430548 - Flags: review?(benjamin)
Comment on attachment 430547 [details] [diff] [review] Part 2: Add a CrashReporter::WriteMinidump() API for dumping the current process Per IRC, will try using a custom callback instead of going through the common path, since we're in a "safe" context.
Attachment #430547 - Flags: review?(ted.mielczarek)
Comment on attachment 430546 [details] [diff] [review] Part 1: Split MinidumpCallback logic into .extra writing and crashreporter launching, and sprinkle some XP pixie dust Clearing review per IRC discussion, cjones is working up new patches.
Attachment #430546 - Flags: review?(ted.mielczarek)
Comment on attachment 430551 [details] [diff] [review] Part 4: Collect paired browser/plugin minidumps when the hang timeout expires This API is likely to change.
Attachment #430551 - Flags: review?(benjamin)
Quick note: one issue we discussed with this patch on linux was possibly dumping or killing a reincarnated PID if a plugin crash raced with the hang detector. We won't: we only start "watching" a subprocess for the opportunity to clean it up after ~PluginModuleParent, so that can't race with the hang detector. That said, a plugin crash racing with hang detection might cause either the crash generation server or the main-thread minidump generator to fail, but (i) in either case we should get a minidump; (ii) there's no way around that; (iii) breakpad appears to deal already. I did come across bug 551392 while checking this though.
Giving feedback? a try. The idea is to SIGSTOP the subprocess's main thread, grab its context, dump the subprocess as if the main thread had crashed, then SIGCONT the main thread. The rest of the new code required is pretty straightforward: essentially just converting between register struct formats.
Attachment #430546 - Attachment is obsolete: true
Attachment #430547 - Attachment is obsolete: true
Attachment #430548 - Attachment is obsolete: true
Attachment #430551 - Attachment is obsolete: true
Attachment #431581 - Flags: feedback?(ted.mielczarek)
Attachment #431581 - Flags: feedback?(jim)
Attachment #431581 - Attachment is obsolete: true
Attachment #431581 - Flags: feedback?(ted.mielczarek)
Attachment #431581 - Flags: feedback?(jim)
Comment on attachment 431581 [details] Abstraction of linux implementation of ExceptionHandler::WriteMinidumpForChild() I suspect, but didn't check, that delivering SIGSTOP will put the thread in a signal context, which would mess up the dump. I think it'll be easier to directly PTRACE_ATTACH the main thread and refactor the dumper code to deal with that.
I have the linux/breakpad and CrashReporter::+PluginModuleParent work done. Windows/breakpad remains. Because even after that's done a possibly nontrivial amount of work (UI?) remains to propagate browser dumps up into firefox land, I'm going to go ahead and request review on what I have.
I inadvertently confirmed that this doesn't mess with the browser crashreporter by crashing while testing some other code :). Firefox still went into the normal exception handler.
Attachment #432083 - Flags: review?(ted.mielczarek)
Part 2 will be the windows/breakpad work.
Attachment #432084 - Flags: review?(ted.mielczarek)
->:bs to distribute the review load, please reassign if appropriate.
Attachment #432086 - Flags: review?(benjamin)
I don't know where the browser dumps will go from here. Someone else is probably going to have to take on parts 6+.
Wow, like, easiest patch evar. I wrote this on linux while waiting for my windows build, so it may not compile, but this is the general idea. The biggest snag (not that big) is that it appears to be difficult to get a process ID from a process handle on windows; the code I included for this I copied verbatim from chromium.
Attachment #432101 - Flags: review?(ted.mielczarek)
Some windows fixes filtered up the patch series, so to keep things saner, I'll repost from here on up.
Attachment #432101 - Attachment is obsolete: true
Attachment #432285 - Flags: review?(ted.mielczarek)
Attachment #432101 - Flags: review?(ted.mielczarek)
Attachment #432084 - Attachment is obsolete: true
Attachment #432286 - Flags: review?(ted.mielczarek)
Attachment #432084 - Flags: review?(ted.mielczarek)
I've got the s/minidumpID/pluginDumpId, browserDumpID/ patch finished up, but I'm getting an error from here function writeSubmittedReport(crashID, viewURL) { let directoryService = Cc["@mozilla.org/file/directory_service;1"]. getService(Ci.nsIProperties); let reportFile = directoryService.get("UAppData", Ci.nsIFile); reportFile.append("Crash Reports"); reportFile.append("submitted"); reportFile.append(crashID + ".txt"); var fstream = Cc["@mozilla.org/network/file-output-stream;1"]. createInstance(Ci.nsIFileOutputStream); // open, write, truncate fstream.init(reportFile, -1, -1, 0); ************************************************************ * Call to xpconnect wrapped JSObject produced this error: * [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFileOutputStream.init]" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: file:///home/cjones/mozilla/ff-dbg/dist/bin/modules/CrashSubmit.jsm :: writeSubmittedReport :: line 173" data: no] ************************************************************ which doesn't make sense because the fstream.init() is supposed to be using O_CREAT. I'll sort this and get part 6 up for review tomorrow.
That's probably bug 517404, where this fails if the "submitted" directory doesn't exist yet (e.g. standalone crashreporter has never run). Please fix!
Comment on attachment 432286 [details] [diff] [review] Part 3: Add a CrashReporter API to create a child/parent minidump pair with its own GUID, v2 Need more work.
Attachment #432286 - Flags: review?(ted.mielczarek)
Comment on attachment 432287 [details] [diff] [review] Part 4: Move some common code for dealing with minidumps and .extra files into nsExceptionHandler, v2 Needs more work.
Attachment #432287 - Flags: review?(ted.mielczarek)
Sorry, part 5 is going to be out of order.
Attachment #432286 - Attachment is obsolete: true
Attachment #432923 - Flags: review?(ted.mielczarek)
(In reply to comment #35) > Created an attachment (id=432925) [details] > Part 6: Submit a browser minidump along with the plugin's, if it exists Ted, with this patch, minidump submission seems to be working fine from the browser's perspective, but the dumps don't seem to "stick" in socorro. I get the "Oh noes! Report not found" message. Any suggestions for debugging?
Hopefully there's something obviously wrong with the .dmp's or .extra's.
Comment on attachment 432925 [details] [diff] [review] Part 6: Submit a browser minidump along with the plugin's, if it exists Plugin module changes look fine to me.
Attachment #432925 - Flags: review?(joshmoz) → review+
I again tried to submit a mozilla-runtime SIGSEGV crash report, and although that worked a few days ago (though didn't display anything useful in crash-stats because it was from an x86-64 machine), today it didn't appear within 5 minutes. I wonder if crash-stats is just under high load? Another option is that it previously was accepting x86-64 crash reports and now isn't. I'll try from an x86 machine.
Comment on attachment 432925 [details] [diff] [review] Part 6: Submit a browser minidump along with the plugin's, if it exists >- submitReport : function(minidumpID) { >+ submitReport : function(dumpIDs) { >+ let { pluginDumpID: pluginDumpID, browserDumpID: browserDumpID } = dumpIDs; Looks fine to me, though having to deal with this 2-property object seems like a hassle. Would probably be easier to make submitReport take two args, and change addLinkClickCallback to have callbackArg1 and callbackArg2 (or be clever with |arguments| to support any number...). Over to gavin since I'm not a browser reviewer.
Attachment #432925 - Flags: review?(gavin.sharp)
Attachment #432925 - Flags: review?(dolske)
Attachment #432925 - Flags: review+
Comment on attachment 432925 [details] [diff] [review] Part 6: Submit a browser minidump along with the plugin's, if it exists I concur with dolske re: submitReport's arguments. Also need to remove those dump()s before landing.
Attachment #432925 - Flags: review?(gavin.sharp) → review+
Attachment #432288 - Flags: review?(benjamin) → review+
Quick note: the nsExceptionHandler patches have bitrotted somewhat from the backout of bug 551392, and now that we build --enable-ipc on OS X they'll need stub impls there. But I don't foresee these changes being major enough to request re-review. Please advise.
(In reply to comment #43) > and now that we build --enable-ipc on OS X they'll need > stub impls there. Nm, my patches busted on tryserver for an unrelated reason. We'll need these but it shouldn't block this bug.
Comment on attachment 432083 [details] [diff] [review] Add the ability to generate a minidump of a child process at any time (linux only) >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc >+// static >+bool ExceptionHandler::WriteMinidumpForChild(pid_t child, >+ const std::string &dump_path, >+ MinidumpCallback callback, >+ void *callback_context) >+{ >+ // This function is not run in a compromised context. >+ ExceptionHandler eh(dump_path, NULL, NULL, NULL, false); >+ if (!google_breakpad::WriteMinidump(eh.next_minidump_path_c_, child)) >+ return false; I think you need to UpdateNextID() here, otherwise you'll wind up reusing the GUID the next time a dump is generated. >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h >+ // Assign a new GUID to |guid| if creation was successful. |guid| >+ // is unchanged if unsuccessful. Return true if successful. >+ static bool CreateGUID(std::string& guid); This is a weird thing to expose from this class, since it doesn't really have anything to do with exception handling. I think you should just use the other GUID code directly, or use the gecko platform UUID generation instead of exposing this from this class. >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc >@@ -54,68 +54,96 @@ > #include <algorithm> > > #include "client/linux/minidump_writer/directory_reader.h" > #include "client/linux/minidump_writer/line_reader.h" > #include "common/linux/file_id.h" > #include "common/linux/linux_libc_support.h" > #include "common/linux/linux_syscall_support.h" > >-// Suspend a thread by attaching to it. >-static bool SuspendThread(pid_t pid) { >+namespace google_breakpad { >+ >+bool AttachThread(pid_t pid) { These renamings are a little weird considering they're still called from Threads{Suspend,Resume}. >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h >+// Suspend a thread by attaching to it. >+bool AttachThread(pid_t pid); >+ >+// Resume a thread by detaching from it. >+bool DetachThread(pid_t pid); >+ >+// Fill |info| with the register state of |info->tid|. The thread >+// must be attached to the calling process. Return true on success. >+bool GetThreadRegisters(ThreadInfo* info); Is there any reason not to just make these static member functions of LinuxDumper? >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc > // Juggle an x86 user_(fp|fpx|)regs_struct into minidump format > // out: the minidump structure > // info: the collection of register structures. >-static void CPUFillFromThreadInfo(MDRawContextX86 *out, >- const google_breakpad::ThreadInfo &info) { >+static void CPUFillFromThreadInfo(MDRawContextX86 *out, ThreadInfo &info) { You dropped a 'const' here. >+static uintptr_t InstructionPointer(const ThreadInfo& info) { >+ return info.regs.eip; >+} FYI, I landed the Linux/ARM support on trunk, so you'll have to implement these for ARM as well. >-static void CPUFillFromThreadInfo(MDRawContextAMD64 *out, >- const google_breakpad::ThreadInfo &info) { >+static void CPUFillFromThreadInfo(MDRawContextAMD64 *out, ThreadInfo &info) { Dropped the const here too. >+bool WriteMinidump(const char* filename, pid_t process) { >+ // The scheme is >+ // - attach to the main thread of |process| (aka |(tid)process|) >+ // - grab its context from whereever ptrace stopped it >+ // - proceed dumping |process| as if the main thread had crashed >+ // - detach from the main thread of |process| >+ // There are many race conditions here, but all should manifest as >+ // failed syscalls. >+ >+ if (!AttachThread(process)) >+ return false; >+ >+ // from here on, we have to ensure that the process is detached before >+ // returning >+ struct AutoDetach { >+ AutoDetach(pid_t proc) : proc_(proc) { } >+ ~AutoDetach() { DetachThread(proc_); } >+ pid_t proc_; >+ } detach(process); >+ >+ ThreadInfo info; >+ my_memset(&info, 0, sizeof(info)); >+ >+ info.tid = process; >+ if (!GetThreadRegisters(&info)) >+ return false; >+ >+ // we'll pretend that the main thread "crashed" on SIGSTOP >+ siginfo_t siginfo; >+ my_memset(&siginfo, 0, sizeof(siginfo)); >+ siginfo.si_signo = SIGSTOP; >+ siginfo.si_addr = reinterpret_cast<void*>(InstructionPointer(info)); I'd prefer to have us just write a minidump without an exception stream. It's valid to do so, we just won't identify any thread as having crashed. Since I don't think we can get reliable signatures here anyway (since the process you're dumping is still running), I don't think we should worry about this for now. r- for a few things there, but overall it looks like a good approach.
Attachment #432083 - Flags: review?(ted.mielczarek) → review-
(In reply to comment #45) > (From update of attachment 432083 [details] [diff] [review]) > >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc > >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc > >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc > >+// static > >+bool ExceptionHandler::WriteMinidumpForChild(pid_t child, > >+ const std::string &dump_path, > >+ MinidumpCallback callback, > >+ void *callback_context) > >+{ > >+ // This function is not run in a compromised context. > >+ ExceptionHandler eh(dump_path, NULL, NULL, NULL, false); > >+ if (!google_breakpad::WriteMinidump(eh.next_minidump_path_c_, child)) > >+ return false; > > I think you need to UpdateNextID() here, otherwise you'll wind up reusing the > GUID the next time a dump is generated. > |eh|'s dump ID is private and local, created during its constructor, and |eh| is thrown away after this function returns. > >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc > >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc > >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.cc > >@@ -54,68 +54,96 @@ > > #include <algorithm> > > > > #include "client/linux/minidump_writer/directory_reader.h" > > #include "client/linux/minidump_writer/line_reader.h" > > #include "common/linux/file_id.h" > > #include "common/linux/linux_libc_support.h" > > #include "common/linux/linux_syscall_support.h" > > > >-// Suspend a thread by attaching to it. > >-static bool SuspendThread(pid_t pid) { > >+namespace google_breakpad { > >+ > >+bool AttachThread(pid_t pid) { > > These renamings are a little weird considering they're still called from > Threads{Suspend,Resume}. > I renamed them because the new google_breakpad::WriteMinidump() depends on the thread being attached with ptrace, and "Suspend" doesn't make that clear. Are you OK with this change if I update ThreadsSuspend/Resume too? > > >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h > >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h > >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/linux_dumper.h > >+// Suspend a thread by attaching to it. > >+bool AttachThread(pid_t pid); > >+ > >+// Resume a thread by detaching from it. > >+bool DetachThread(pid_t pid); > >+ > >+// Fill |info| with the register state of |info->tid|. The thread > >+// must be attached to the calling process. Return true on success. > >+bool GetThreadRegisters(ThreadInfo* info); > > Is there any reason not to just make these static member functions of > LinuxDumper? > No, but why do you prefer that? > >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc > >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc > >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc > > // Juggle an x86 user_(fp|fpx|)regs_struct into minidump format > > // out: the minidump structure > > // info: the collection of register structures. > >-static void CPUFillFromThreadInfo(MDRawContextX86 *out, > >- const google_breakpad::ThreadInfo &info) { > >+static void CPUFillFromThreadInfo(MDRawContextX86 *out, ThreadInfo &info) { > > You dropped a 'const' here. > Hmm, ISTR a reason for that at some point, but don't remember anymore. I'll try to jigger things to restore the |const|.
Comment on attachment 432285 [details] [diff] [review] Part 2: Add the ability to generate a minidump of a child process at any time (windows), v2 >diff --git a/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc b/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc >--- a/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc >+++ b/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc >+// Helper for GetProcId() >+bool GetProcIdViaGetProcessId(HANDLE process, DWORD* id) { This seems like a lot of work just to support pre-XPSP1, but I guess we still technically support Win2k. >+// static >+bool ExceptionHandler::WriteMinidumpForChild(HANDLE child, >+ const wstring &dump_path, >+ MinidumpCallback callback, >+ void *callback_context) { >+ DWORD childId = GetProcId(child); >+ if (0 == childId) >+ return false; You have inconsistent indentation here. >+bool ExceptionHandler::WriteMinidumpWithExceptionForProcess( >+ DWORD requesting_thread_id, >+ EXCEPTION_POINTERS* exinfo, >+ MDRawAssertionInfo* assertion, >+ HANDLE process, >+ DWORD processId) { >+ bool success = false; >+ if (minidump_write_dump_) { >+ HANDLE dump_file = CreateFile(next_minidump_path_c_, >+ GENERIC_WRITE, >+ 0, // no sharing >+ NULL, >+ CREATE_NEW, // fail if exists >+ FILE_ATTRIBUTE_NORMAL, >+ NULL); >+ if (dump_file != INVALID_HANDLE_VALUE) { >+ MINIDUMP_EXCEPTION_INFORMATION except_info; >+ except_info.ThreadId = requesting_thread_id; >+ except_info.ExceptionPointers = exinfo; >+ except_info.ClientPointers = FALSE; >+ >+ // Add an MDRawBreakpadInfo stream to the minidump, to provide additional >+ // information about the exception handler to the Breakpad processor. The >+ // information will help the processor determine which threads are >+ // relevant. The Breakpad processor does not require this information but >+ // can function better with Breakpad-generated dumps when it is present. >+ // The native debugger is not harmed by the presence of this information. >+ MDRawBreakpadInfo breakpad_info; >+ breakpad_info.validity = MD_BREAKPAD_INFO_VALID_DUMP_THREAD_ID | >+ MD_BREAKPAD_INFO_VALID_REQUESTING_THREAD_ID; >+ breakpad_info.dump_thread_id = GetCurrentThreadId(); >+ breakpad_info.requesting_thread_id = requesting_thread_id; I don't think it makes sense to put thread ids of threads external to the dumped process here. The processor code only uses this so it can a) skip printing a stack for the live breakpad thread in the process, and b) determine which thread requested the dump. You should just leave this stream out when dumping an external process. >diff --git a/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.h b/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.h >--- a/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.h >+++ b/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.h >+ // Write a minidump of |child| immediately. This can be used to >+ // capture the execution state of |child| independently of a crash. >+ static bool WriteMinidumpForChild(HANDLE child, >+ const wstring &dump_path, >+ MinidumpCallback callback=NULL, >+ void *callback_context=NULL); Google style guide says no default parameters. >+ // Assign a new GUID to |guid| if creation was successful. |guid| >+ // is unchanged if unsuccessful. Return true if successful. >+ static bool CreateGUID(wstring& guid); As I said in the review for the other patch, this doesn't belong here. >- // This function does the actual writing of a minidump. It is called >- // on the handler thread. requesting_thread_id is the ID of the thread >+ // This function is called on the handler thread. If calls into s/If/It/ ? >+ // WriteMinidumpWithExceptionForProcess() with a handle to the >+ // current process. requesting_thread_id is the ID of the thread > // that requested the dump. If the dump is requested as a result of > // an exception, exinfo contains exception information, otherwise, > // it is NULL. > bool WriteMinidumpWithException(DWORD requesting_thread_id, r=me with those things fixed.
Attachment #432285 - Flags: review?(ted.mielczarek) → review+
Ted, are you waiting on linux v2 before reviewing the second two patches? I was waiting on answers to my questions and the second two reviews before picking this up again. I'll post a speculative v2 and bubbled updates to avoid deadlock and more delay.
(In reply to comment #45) > (From update of attachment 432083 [details] [diff] [review]) > >diff --git a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h > >--- a/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h > >+++ b/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.h > >+ // Assign a new GUID to |guid| if creation was successful. |guid| > >+ // is unchanged if unsuccessful. Return true if successful. > >+ static bool CreateGUID(std::string& guid); > > This is a weird thing to expose from this class, since it doesn't really have > anything to do with exception handling. I think you should just use the other > GUID code directly, or use the gecko platform UUID generation instead of > exposing this from this class. I changed this to use nsIUUIDGenerator, and the format is slightly different than breakpad's GUIDs e01ed135-6cf2-4e59-bd24-2bb769bbacbd <- gen UUID 2f37a51e-bac6-1166-0ae9b438-40252e63 <- breakpad GUID Does socorro understand both or will this require some string hacking?
Use nsIUUIDGenerator instead of r-'d ExceptionHandler API.
Attachment #432083 - Attachment is obsolete: true
Attachment #432285 - Attachment is obsolete: true
Attachment #432923 - Attachment is obsolete: true
Attachment #434171 - Flags: review?(ted.mielczarek)
Attachment #432923 - Flags: review?(ted.mielczarek)
(In reply to comment #39) > I again tried to submit a mozilla-runtime SIGSEGV crash report, and although > that worked a few days ago Rechecking this with (presumably) lighter socorro load. Tried each test twice, 1-2-3 both times. firefox-bin segfault: appears on crash-stats quickly (<=2 refreshes) mozilla-runtime segfault: "Oh noes!" timeout hang dump pair: "Oh noes!" timeout, both Not sure what this implies, looking more.
OK, just tried submitting the dump pair again, but monitoring my network activity while submitting. They definitely went over the wire, and I got "Queue Info" for the first time. I wonder if I was hitting some kind of race condition earlier that monitoring network activity slowed me down enough to avoid. Still getting "Oh Noes!" too, but avg. wait is getting pretty high now.
Attachment #434169 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 434169 [details] [diff] [review] Part 1: Add the ability to generate a minidump of a child process at any time (linux only), v2 This looks good, thanks!
Comment on attachment 434171 [details] [diff] [review] Part 3: Add a CrashReporter API to create a child/parent minidump pair with its own GUID, v4 >diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp >--- a/toolkit/crashreporter/nsExceptionHandler.cpp >+++ b/toolkit/crashreporter/nsExceptionHandler.cpp >+struct EnumerateAnnotationsContext { >+ const Blacklist* blacklist; Nit: if you declare this as const Blacklist&, you can avoid the ugly *ctx-> down below. > static bool >+WriteExtraForMinidump(nsILocalFile* minidump, >+ const Blacklist& blacklist, >+ nsILocalFile** extraFile) >+{ ... >+ // Now write out the annotations to it >+ PRFileDesc* fd; >+ rv = extra->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE | PR_TRUNCATE, >+ 0600, &fd); Your indentation is screwed up here. > static void > OnChildProcessDumpRequested(void* aContext, ... >+ >+ { >+#ifdef XP_WIN >+ pid = aClientInfo->pid(); The lexical block here is just to scope the MutexAutoLock, right? > #else >+ pid = aClientInfo->pid_; > #endif Can you add that accessor to the Linux version of this class so you can avoid the ifdef? >+//----------------------------------------------------------------------------- >+// CreatePairedMinidumps() and helpers >+// >+struct PairedDumpContext { >+ nsCOMPtr<nsILocalFile>* minidump; >+ nsCOMPtr<nsILocalFile>* extra; >+ const Blacklist* blacklist; Same comment here as above WRT pointer vs. reference. >+static bool >+PairedDumpCallback(const XP_CHAR* dump_path, >+ const XP_CHAR* minidump_id, >+ void* context, >+#ifdef XP_WIN32 >+ EXCEPTION_POINTERS* /*unused*/, >+ MDRawAssertionInfo* /*unused*/, >+#endif >+ bool succeeded) >+{ >+ PairedDumpContext* ctx = static_cast<PairedDumpContext*>(context); >+ nsCOMPtr<nsILocalFile>& minidump = *ctx->minidump; >+ nsCOMPtr<nsILocalFile>& extra = *ctx->extra; >+ const Blacklist& blacklist = *ctx->blacklist; >+ >+ xpstring dumpFilename(dump_path); >+ dumpFilename += XP_PATH_SEPARATOR; >+ dumpFilename += minidump_id; >+ dumpFilename += dumpFileExtension; This is sort of awkward, but I guess doing anything else (like using nsILocalFile) is a pain cross-platform? >+bool >+CreatePairedMinidumps(ProcessHandle childPid, >+ nsAString* pairGUID, >+ nsILocalFile** childDump, >+ nsILocalFile** parentDump) >+{ >+ if (!GetEnabled()) >+ return false; >+ >+ // create the UUID for the hang dump as a pair >+ nsresult rv; >+ nsCOMPtr<nsIUUIDGenerator> uuidgen = >+ do_GetService("@mozilla.org/uuid-generator;1", &rv); >+ NS_ENSURE_SUCCESS(rv, false); >+ >+ nsID id; >+ rv = uuidgen->GenerateUUIDInPlace(&id); >+ NS_ENSURE_SUCCESS(rv, false); >+ >+ char chars[NSID_LENGTH]; >+ id.ToProvidedString(chars); >+ CopyASCIItoUTF16(chars, *pairGUID); >+ >+ // trim off braces >+ pairGUID->Cut(0, 1); >+ pairGUID->Cut(pairGUID->Length()-1, 1); Bleh, that sucks. WTF does that put in braces for? >+ >+ // dump the child >+ static const char* sChildBlacklist[] = { >+ "FramePoisonBase", >+ "FramePoisonSize", >+ "StartupTime", >+ "URL" >+ }; Is there a reason this is duplicated from the other call site? Might as well just make it file-global. >+ // dump the parent >+ static const char* sParentBlacklist[] = { >+ // anything to blacklist here? >+ "" >+ }; I don't think there's any reason to blacklist anything in the parent, no, since all that data does apply to the parent process. (Change the comment to indicate that.) >+ nsCOMPtr<nsILocalFile> parentMinidump; >+ nsCOMPtr<nsILocalFile> parentExtra; >+ Blacklist parentBlacklist(sParentBlacklist, >+ NS_ARRAY_LENGTH(sParentBlacklist)); >+ PairedDumpContext parentCtx = >+ { &parentMinidump, &parentExtra, &parentBlacklist }; >+ if (!google_breakpad::ExceptionHandler::WriteMinidump( >+ gExceptionHandler->dump_path(), >+ PairedDumpCallback, >+ &parentCtx)) >+ return false; >+ >+ // success >+ if (ShouldReport()) { >+ MoveToPending(childMinidump, childExtra); >+ MoveToPending(parentMinidump, parentExtra); >+ } Hm, you're writing the HangID out to the .extra files in the caller of this function? Seems like it would make sense to just write it here, since you already have all the necessary bits. r=me with those comments addressed.
Attachment #434171 - Flags: review?(ted.mielczarek) → review+
(In reply to comment #57) > (From update of attachment 434171 [details] [diff] [review]) > >diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp > >--- a/toolkit/crashreporter/nsExceptionHandler.cpp > >+++ b/toolkit/crashreporter/nsExceptionHandler.cpp > >+struct EnumerateAnnotationsContext { > >+ const Blacklist* blacklist; > > Nit: if you declare this as const Blacklist&, you can avoid the ugly *ctx-> > down below. > Sure. But I'd trade it for ugly |= ctx->|. In an ideal world the ctx would be a C++ functor. > > static void > > OnChildProcessDumpRequested(void* aContext, > ... > >+ > >+ { > >+#ifdef XP_WIN > >+ pid = aClientInfo->pid(); > > The lexical block here is just to scope the MutexAutoLock, right? > Yup. >+//---------------------------------------------------------------------------- > >+static bool > >+PairedDumpCallback(const XP_CHAR* dump_path, > >+ const XP_CHAR* minidump_id, > >+ void* context, > >+#ifdef XP_WIN32 > >+ EXCEPTION_POINTERS* /*unused*/, > >+ MDRawAssertionInfo* /*unused*/, > >+#endif > >+ bool succeeded) > >+{ > >+ PairedDumpContext* ctx = static_cast<PairedDumpContext*>(context); > >+ nsCOMPtr<nsILocalFile>& minidump = *ctx->minidump; > >+ nsCOMPtr<nsILocalFile>& extra = *ctx->extra; > >+ const Blacklist& blacklist = *ctx->blacklist; > >+ > >+ xpstring dumpFilename(dump_path); > >+ dumpFilename += XP_PATH_SEPARATOR; > >+ dumpFilename += minidump_id; > >+ dumpFilename += dumpFileExtension; > > This is sort of awkward, but I guess doing anything else (like using > nsILocalFile) is a pain cross-platform? > That was my reasoning. With this I just need to from xpstring -> nsILocalFile once with a platform-specific helper. > >+ > >+ // dump the child > >+ static const char* sChildBlacklist[] = { > >+ "FramePoisonBase", > >+ "FramePoisonSize", > >+ "StartupTime", > >+ "URL" > >+ }; > > Is there a reason this is duplicated from the other call site? Might as well > just make it file-global. > In case we wanted to have different blacklists in different situations. Sounds like we don't, so --> file-global. > >+ // dump the parent > >+ static const char* sParentBlacklist[] = { > >+ // anything to blacklist here? > >+ "" > >+ }; > > I don't think there's any reason to blacklist anything in the parent, no, since > all that data does apply to the parent process. (Change the comment to indicate > that.) > Yup, wasn't sure. > >+ nsCOMPtr<nsILocalFile> parentMinidump; > >+ nsCOMPtr<nsILocalFile> parentExtra; > >+ Blacklist parentBlacklist(sParentBlacklist, > >+ NS_ARRAY_LENGTH(sParentBlacklist)); > >+ PairedDumpContext parentCtx = > >+ { &parentMinidump, &parentExtra, &parentBlacklist }; > >+ if (!google_breakpad::ExceptionHandler::WriteMinidump( > >+ gExceptionHandler->dump_path(), > >+ PairedDumpCallback, > >+ &parentCtx)) > >+ return false; > >+ > >+ // success > >+ if (ShouldReport()) { > >+ MoveToPending(childMinidump, childExtra); > >+ MoveToPending(parentMinidump, parentExtra); > >+ } > > Hm, you're writing the HangID out to the .extra files in the caller of this > function? Seems like it would make sense to just write it here, since you > already have all the necessary bits. > So, this is an API design decision. The interface is CreatePairedDump(), because nsExceptionHandler doesn't know about plugin hangs or HangID. I can imagine other uses of this function that aren't HangID, but I'd be OK making this CreateHangDumps() until then if you prefer. Please advise.
Comment on attachment 432924 [details] [diff] [review] Part 4: Move some common code for dealing with minidumps and .extra files into nsExceptionHandler, v3 >diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp >--- a/toolkit/crashreporter/nsExceptionHandler.cpp >+++ b/toolkit/crashreporter/nsExceptionHandler.cpp >+static bool >+GetPendingDir(nsILocalFile** dir) >+{ >+ if (ShouldReport()) { >+ // dumps are going to "normal" dir The logic here is...weird, since sometimes it doesn't actually give you the pending dir. It seems wrong to lie about that. I understand what you want here, to find a minidump no matter where it lives, but this just doesn't seem right. I'd rather have this function always return the real pending dir, and GetMinidumpForID do the switching based on ShouldReport. (Or, you could rename this function, which might be simpler.) > static bool >+AppendExtraData(nsILocalFile* extraFile, >+ const AnnotationTable& data, >+ const Blacklist& blacklist, >+ const char* const crashTime=NULL, >+ bool truncate=false) This function signature is confusing. I'd suggest swapping the crashTime param for a "bool addCrashTime", and just doing the time(); XP_TTOA(); in this function (instead of making all the callers that want crashTime do that themselves). The function name also is confusing considering that it has a truncate parameter. It's not really appending if you truncate... Maybe just rename the function to something like WriteExtraData? r- because I think those things really need to be fixed. The rest looks good, though.
Attachment #432924 - Flags: review?(ted.mielczarek) → review-
(In reply to comment #58) > So, this is an API design decision. The interface is CreatePairedDump(), > because nsExceptionHandler doesn't know about plugin hangs or HangID. I can > imagine other uses of this function that aren't HangID, but I'd be OK making > this CreateHangDumps() until then if you prefer. Please advise. Ok, what you're doing makes sense.
Comment on attachment 434320 [details] [diff] [review] Part 4: Move some common code for dealing with minidumps and .extra files into nsExceptionHandler, v4 >diff --git a/toolkit/crashreporter/nsExceptionHandler.cpp b/toolkit/crashreporter/nsExceptionHandler.cpp >--- a/toolkit/crashreporter/nsExceptionHandler.cpp >+++ b/toolkit/crashreporter/nsExceptionHandler.cpp >+// The "limbo" dir is where minidumps go to wait for something else to >+// use them. If we're |ShouldReport()|, then the "something else" is >+// a minidump submitter, and they're coming from the >+// Crash Reports/pending/ dir. Otherwise, we don't know what the >+// "somthing else" is, but the minidumps stay in [profile]/minidumps/ nit: "something" The rest looks good, thanks!
Attachment #434320 - Flags: review?(ted.mielczarek) → review+
There appears to be a problem with the windows, plugin-process dumps: http://crash-stats.mozilla.com/report/index/8c450d3b-00a9-4017-b784-73d6a2100323 /home/processor/stackwalk/bin/stackwalk.sh returned no header lines for reportid: 103702501; No thread was identified as the cause of the crash; No signature could be created because we do not know which thread crashed; /home/processor/stackwalk/bin/stackwalk.sh returned no frame lines for reportid: 103702501; /home/processor/stackwalk/bin/stackwalk.sh failed with return code 1 when processing dump 8c450d3b-00a9-4017-b784-73d6a2100323 I don't know what missing information is triggering this. Anyone happen to know?
Attached file Bad dump from windows
Seeing cjones@beast:~/mozilla/dumps[]$ minidump_stackwalk e6eaef8a-56c1-4fa1-9234-dfca31490a6b.dmp 2010-03-23 20:42:59: minidump_processor.cc:238: INFO: Processing minidump in file e6eaef8a-56c1-4fa1-9234-dfca31490a6b.dmp 2010-03-23 20:42:59: minidump.cc:3494: INFO: Minidump opened minidump e6eaef8a-56c1-4fa1-9234-dfca31490a6b.dmp 2010-03-23 20:42:59: minidump.cc:3539: INFO: Minidump not byte-swapping minidump 2010-03-23 20:42:59: minidump.cc:3896: INFO: GetStream: type 1197932545 not present 2010-03-23 20:42:59: minidump.cc:3896: INFO: GetStream: type 6 not present 2010-03-23 20:42:59: minidump.cc:3896: INFO: GetStream: type 1197932546 not present 2010-03-23 20:42:59: minidump.cc:1942: INFO: MinidumpModule could not determine version for c:\Users\cjones\mozilla\ff-dbg\dist\bin\mozjs.dll 2010-03-23 20:42:59: minidump.cc:1309: ERROR: MinidumpThread has a memory region problem, 0x0+0x0 2010-03-23 20:42:59: minidump.cc:1503: ERROR: MinidumpThreadList cannot read thread 0/4 2010-03-23 20:42:59: minidump.cc:3919: ERROR: GetStream could not read stream type 3 2010-03-23 20:42:59: minidump_processor.cc:102: ERROR: Minidump e6eaef8a-56c1-4fa1-9234-dfca31490a6b.dmp has no thread list 2010-03-23 20:42:59: minidump.cc:3466: INFO: Minidump closing minidump 2010-03-23 20:42:59: minidump_stackwalk.cc:527: ERROR: MinidumpProcessor::Process failed from minidump_stackwalker on this. I thought this might be a security problem and so tried writing parent/child programs with what I understand as being the relevant code copied from breakpad/chromium, but that worked just fine. Dead end for me atm, need to work on other blockers.
minidump_stackwalk gives me what I had expected from firefox-bin/mozilla-runtime. No crash Thread 0 0 ntdll.dll + 0x464f4 eip = 0x77d864f4 esp = 0x0012febc ebp = 0x0012ff24 ebx = 0x7ffd4000 esi = 0x0012ff00 edi = 0x00000000 eax = 0x00000001 ecx = 0x0012fc98 edx = 0x77d864f4 efl = 0x00000206 Found by: given as instruction pointer in context 1 KERNELBASE.dll + 0x1817 eip = 0x76111818 esp = 0x0012ff2c ebp = 0x0012ff34 Found by: previous frame's frame pointer 2 dumpee.exe + 0x101a eip = 0x0040101b esp = 0x0012ff3c ebp = 0x0012ff40 Found by: previous frame's frame pointer 3 dumpee.exe + 0x1207 eip = 0x00401208 esp = 0x0012ff48 ebp = 0x0012ff88 Found by: previous frame's frame pointer 4 kernel32.dll + 0x51193 eip = 0x76401194 esp = 0x0012ff90 ebp = 0x0012ff94 Found by: previous frame's frame pointer 5 ntdll.dll + 0x5b3f4 eip = 0x77d9b3f5 esp = 0x0012ff9c ebp = 0x0012ffd4 Found by: previous frame's frame pointer 6 ntdll.dll + 0x5b3c7 eip = 0x77d9b3c8 esp = 0x0012ffdc ebp = 0x0012ffec Found by: previous frame's frame pointer
If the tree doesn't allow landing this while I'm around today/tomorrow, then what needs to be committed is all the patches in http://hg.mozilla.org/users/cjones_mozilla.com/mcmq up to and including 544936-browser-submit.
Comment on attachment 434706 [details] [diff] [review] Followup: Ensure the child process HANDLE passed to WriteMinidumpForChild has appropriate capabilities add a short comment explaining what happened here
Attachment #434706 - Flags: review?(benjamin) → review+
Comment on attachment 434706 [details] [diff] [review] Followup: Ensure the child process HANDLE passed to WriteMinidumpForChild has appropriate capabilities Ted and I would actually prefer the other solution we found.
Attachment #434706 - Flags: review+
This patch complements the first followup, but is probably the one we want to take for beta. IMHO the first followup is useful, e.g. for the case we saw: MiniDumpWriteDump() claiming to succeed but not doing anything useful.
Attachment #434723 - Flags: review?(benjamin)
Attachment #434723 - Flags: review?(benjamin) → review+
Was it really necessary to reformat all the whitespace in browser.js in this patch? My backport hates you!
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Oh crud. js2-mode did that the first time I edited it, but I switched to fundamental-mode thereafter. Must have forgot once when un-bitrotting :(.
Blanket approval for Lorentz merge to mozilla-1.9.2 a=beltzner for 1.9.2.4 - please make sure to mark status1.9.2:.4-fixed
Depends on: 557808
Patch #1 is breaking Maemo builds: /home/cltbld/build/mobile-trunk-maemo5-gtk/mozilla-central/toolkit/crashreporter/google-breakpad/src/common/stabs_reader.cc: In member function 'bool google_breakpad::StabsReader::Process()': /home/cltbld/build/mobile-trunk-maemo5-gtk/mozilla-central/toolkit/crashreporter/google-breakpad/src/common/stabs_reader.cc:94: error: 'N_UNDF' was not declared in this scope http://tinderbox.mozilla.org/showlog.cgi?tree=Mobile&errorparser=unix&logfile=1278936344.1278936907.2847.gz&buildtime=1278936344&buildname=Maemo%205%20GTK%20mozilla-central%20build&fulltext=1#err0 It seems like a.out.h is not being included via: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/stabs_reader.h#52
Can't be this bug! Sorry, I'll keep looking
probably fallout from bug 567424.
(In reply to comment #80) > probably fallout from bug 567424. Indeed. Fixed by bug 578080
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: