Closed
Bug 771251
Opened 12 years ago
Closed 12 years ago
OOP crash reporting access the directory service off the main thread
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox15 | --- | fixed |
People
(Reporter: benjamin, Assigned: benjamin)
References
Details
Attachments
(3 files, 2 obsolete files)
35.61 KB,
patch
|
ted
:
review+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
3.64 KB,
text/plain
|
Details | |
747 bytes,
patch
|
Details | Diff | Splinter Review |
The OOP crash reporting implementation, and especially the method OnChildProcessDumpRequested happens on a helper thread but ends up using the directory service. There is also a race condition involved with the new Flash crash injector: because the crashed process can disappear before the crash callback, it is possible that the IPC mechanism will notice that the process has died before the callback occurs; this means that the crash mechanism will not have a notification of the crash minidump. This patch fixes that issue by synchronously informing the main thread of the minidump before allowing the crashing process to continue/crash.
Assignee | ||
Comment 1•12 years ago
|
||
Attachment #639418 -
Flags: review?(ehsan)
Comment 2•12 years ago
|
||
Comment on attachment 639418 [details] [diff] [review] Move the crash callback guts to the main thread synchronously, rev. 1 Review of attachment 639418 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/crashreporter/nsExceptionHandler.cpp @@ +1876,5 @@ > +}; > + > +NS_IMETHODIMP > +HandleRemoteCrash::Run() > +{ Please assert that this always runs on the main thread.
Attachment #639418 -
Flags: review?(ehsan) → review+
Assignee | ||
Comment 3•12 years ago
|
||
Comment on attachment 639418 [details] [diff] [review] Move the crash callback guts to the main thread synchronously, rev. 1 This patch causes deadlocks when a crash occurs in plugin-container during an RPC call. I think I have a smaller solution to this, but I also think it can wait until ted gets back, so I'll ping him about it then.
Attachment #639418 -
Attachment is obsolete: true
Assignee | ||
Comment 4•12 years ago
|
||
Attachment #640706 -
Flags: review?(ted.mielczarek)
Comment 5•12 years ago
|
||
Comment on attachment 640706 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2 Review of attachment 640706 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/plugins/ipc/PluginModuleParent.cpp @@ +304,5 @@ > + } > + else { > + RemoveMinidump(childDumpFile); > + } > + } This feels like it ought to be a loop or a function call or something, but it's not really critical. ::: toolkit/crashreporter/nsExceptionHandler.cpp @@ +232,5 @@ > +#endif > + { } > + > + nsCOMPtr<nsIFile> minidump; > + PRUint32 sequence; // sequence of crashing processes Could you make this comment a little more descriptive?
Attachment #640706 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 6•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/95c9bc0e63f7 Callek graciously agreed to watch the tree and back me out if things go awry.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 7•12 years ago
|
||
And things went awry... https://hg.mozilla.org/mozilla-central/rev/d75c06f195d2 Error from linux64 (though B2g, as well as android and osx10.7 builds are all red too): ../../../toolkit/crashreporter/nsExceptionHandler.cpp: In function 'void CrashReporter::OOPInit()': ../../../toolkit/crashreporter/nsExceptionHandler.cpp:2077:39: error: cannot convert 'PRUnichar*' to 'CrashReporter::XP_CHAR*' in assignment ../../../toolkit/crashreporter/nsExceptionHandler.cpp: At global scope: ../../../toolkit/crashreporter/nsExceptionHandler.cpp:1986:13: warning: 'bool CrashReporter::ChildFilter(void*)' defined but not used make[6]: *** [nsExceptionHandler.o] Error 1 make[6]: Leaving directory `/builds/slave/m-cen-lnx64/build/obj-firefox/toolkit/crashreporter' make[5]: *** [crashreporter_libs] Error 2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•12 years ago
|
||
Not just the build bustage, though: on Windows, where you did build, you had both a mochitest-chrome failure and 32 xpcshell failures.
Assignee | ||
Comment 9•12 years ago
|
||
Stupid errors and the xpcshell testing framework relies on crash reporting working even when there is no user directory, so I've changed some asserts into warnings. I'll have an interdiff up shortly for ease of review.
Attachment #640706 -
Attachment is obsolete: true
Attachment #641069 -
Flags: review?(ted.mielczarek)
Assignee | ||
Comment 10•12 years ago
|
||
Updated•12 years ago
|
Attachment #641069 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 11•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/6a640ca09064 for this morning's nightly.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 12•12 years ago
|
||
Comment on attachment 641069 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1 Bug caused by (feature/regressing bug #): insufficient implementation of Flash process crash reporting User impact if declined: Flash process crash reporting will list the wrong crash process in some cases (40-50% of the time) Testing completed (on m-c, etc.): m-c landed, verification on self-builds of the correct behavior Risk to taking this patch (and alternatives if risky): This patch is larger than I wanted but I believe it is fairly safe. In the common case of non-Flash crash reporting the risk is only that there is a code error which prevents reports from being submitted in general. String or UUID changes made by this patch: There is one API change to XRE_TakeMinidumpForChild, but that would only be used by embedders and even that is very unlikely.
Attachment #641069 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 13•12 years ago
|
||
bc, this introduces a behavior change which may affect your automation: previously when a Flash process crashed, you would usually get either two or three .dmp files in the pending directory, one for the Flash process and one for the plugin-container. Now we will only keep the minidump for the process which crashed first, and delete the others. Let me know if you'd like an environment variable/pref setting which will keep all three minidumps, and I can arrange that.
Comment 14•12 years ago
|
||
bsmedberg: I think I can live with the change and in fact I believe I will like the new behavior more than the old. Thanks for the heads up though.
Assignee | ||
Comment 15•12 years ago
|
||
Comment on attachment 641069 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1 Clearing aurora nom until the crash is figured out. Thanks scoobidiver.
Attachment #641069 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 16•12 years ago
|
||
bc, here is a patch to use an envvar to disable dump deletion per our IRC conversation, please check to see if it works.
Assignee | ||
Comment 17•12 years ago
|
||
Comment on attachment 641069 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1 re-requesting aurora approval along with followup: bug 773665 and bug 773830
Attachment #641069 -
Flags: approval-mozilla-aurora?
Comment 18•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #16) > bc, here is a patch to use an envvar to disable dump deletion per our IRC > conversation, please check to see if it works. I spent some time manually trying to get crashes with multiple dumps but none of the urls I tested would create multiple dumps. I ran a set of daily urls with and without the patch hoping to get an overall comparison of the difference but the variation in the crash reproducibility hid whatever differences the crash workers experienced. If you want to wait for me to find a reproducible example where the patch has an affect that is fine with me. Otherwise it seems straightforward to land.
Assignee | ||
Comment 19•12 years ago
|
||
Comment on attachment 641069 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1 Merge day happened, transferring nom.
Attachment #641069 -
Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Comment 20•12 years ago
|
||
Comment on attachment 641069 [details] [diff] [review] Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1 approving, please land before tomorrow morning so we can get this in the first beta which will go to build then.
Attachment #641069 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 21•12 years ago
|
||
(In reply to Lukas Blakk [:lsblakk] from comment #20) > Comment on attachment 641069 [details] [diff] [review] > Give child crashes a sequence number, save the pending directory for > threadsafey, rev. 2.1 > > approving, please land before tomorrow morning so we can get this in the > first beta which will go to build then. I'm sure you already know this, but we just want to make sure that if this lands on beta that bug 773665 lands too.
Assignee | ||
Comment 22•12 years ago
|
||
Per IRC, I had to do a little backporting of this patch because of nsIFile/nsILocalFile changes and I'm not comfortable pushing it without another tryserver run. So we're going to do 15.0b1 without this change and pick it up for 15.0b2.
Assignee | ||
Comment 23•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/0cdc0b40d6d9
status-firefox15:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•