Closed Bug 771251 Opened 8 years ago Closed 8 years ago

OOP crash reporting access the directory service off the main thread

Categories

(Core :: IPC, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox15 --- fixed

People

(Reporter: benjamin, Assigned: benjamin)

References

Details

Attachments

(3 files, 2 obsolete files)

The OOP crash reporting implementation, and especially the method OnChildProcessDumpRequested happens on a helper thread but ends up using the directory service.

There is also a race condition involved with the new Flash crash injector: because the crashed process can disappear before the crash callback, it is possible that the IPC mechanism will notice that the process has died before the callback occurs; this means that the crash mechanism will not have a notification of the crash minidump. This patch fixes that issue by synchronously informing the main thread of the minidump before allowing the crashing process to continue/crash.
Comment on attachment 639418 [details] [diff] [review]
Move the crash callback guts to the main thread synchronously, rev. 1

Review of attachment 639418 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1876,5 @@
> +};
> +
> +NS_IMETHODIMP
> +HandleRemoteCrash::Run()
> +{

Please assert that this always runs on the main thread.
Attachment #639418 - Flags: review?(ehsan) → review+
Comment on attachment 639418 [details] [diff] [review]
Move the crash callback guts to the main thread synchronously, rev. 1

This patch causes deadlocks when a crash occurs in plugin-container during an RPC call. I think I have a smaller solution to this, but I also think it can wait until ted gets back, so I'll ping him about it then.
Attachment #639418 - Attachment is obsolete: true
Comment on attachment 640706 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2

Review of attachment 640706 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/plugins/ipc/PluginModuleParent.cpp
@@ +304,5 @@
> +        }
> +        else {
> +            RemoveMinidump(childDumpFile);
> +        }
> +    }

This feels like it ought to be a loop or a function call or something, but it's not really critical.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +232,5 @@
> +#endif
> +  { }
> +
> +  nsCOMPtr<nsIFile> minidump;
> +  PRUint32 sequence; // sequence of crashing processes

Could you make this comment a little more descriptive?
Attachment #640706 - Flags: review?(ted.mielczarek) → review+
https://hg.mozilla.org/mozilla-central/rev/95c9bc0e63f7

Callek graciously agreed to watch the tree and back me out if things go awry.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
And things went awry... https://hg.mozilla.org/mozilla-central/rev/d75c06f195d2

Error from linux64 (though B2g, as well as android and osx10.7 builds are all red too):

../../../toolkit/crashreporter/nsExceptionHandler.cpp: In function 'void CrashReporter::OOPInit()':
../../../toolkit/crashreporter/nsExceptionHandler.cpp:2077:39: error: cannot convert 'PRUnichar*' to 'CrashReporter::XP_CHAR*' in assignment
../../../toolkit/crashreporter/nsExceptionHandler.cpp: At global scope:
../../../toolkit/crashreporter/nsExceptionHandler.cpp:1986:13: warning: 'bool CrashReporter::ChildFilter(void*)' defined but not used
make[6]: *** [nsExceptionHandler.o] Error 1
make[6]: Leaving directory `/builds/slave/m-cen-lnx64/build/obj-firefox/toolkit/crashreporter'
make[5]: *** [crashreporter_libs] Error 2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Not just the build bustage, though: on Windows, where you did build, you had both a mochitest-chrome failure and 32 xpcshell failures.
Stupid errors and the xpcshell testing framework relies on crash reporting working even when there is no user directory, so I've changed some asserts into warnings. I'll have an interdiff up shortly for ease of review.
Attachment #640706 - Attachment is obsolete: true
Attachment #641069 - Flags: review?(ted.mielczarek)
Attached file v2 -> v2.1 interdiff
Attachment #641069 - Flags: review?(ted.mielczarek) → review+
https://hg.mozilla.org/mozilla-central/rev/6a640ca09064 for this morning's nightly.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Comment on attachment 641069 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1

Bug caused by (feature/regressing bug #): insufficient implementation of Flash process crash reporting
User impact if declined: Flash process crash reporting will list the wrong crash process in some cases (40-50% of the time)
Testing completed (on m-c, etc.): m-c landed, verification on self-builds of the correct behavior
Risk to taking this patch (and alternatives if risky): This patch is larger than I wanted but I believe it is fairly safe. In the common case of non-Flash crash reporting the risk is only that there is a code error which prevents reports from being submitted in general.
String or UUID changes made by this patch: There is one API change to XRE_TakeMinidumpForChild, but that would only be used by embedders and even that is very unlikely.
Attachment #641069 - Flags: approval-mozilla-aurora?
bc, this introduces a behavior change which may affect your automation: previously when a Flash process crashed, you would usually get either two or three .dmp files in the pending directory, one for the Flash process and one for the plugin-container. Now we will only keep the minidump for the process which crashed first, and delete the others. Let me know if you'd like an environment variable/pref setting which will keep all three minidumps, and I can arrange that.
bsmedberg: I think I can live with the change and in fact I believe I will like the new behavior more than the old. Thanks for the heads up though.
Depends on: 773665
Comment on attachment 641069 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1

Clearing aurora nom until the crash is figured out. Thanks scoobidiver.
Attachment #641069 - Flags: approval-mozilla-aurora?
bc, here is a patch to use an envvar to disable dump deletion per our IRC conversation, please check to see if it works.
Comment on attachment 641069 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1

re-requesting aurora approval along with followup: bug 773665 and bug 773830
Attachment #641069 - Flags: approval-mozilla-aurora?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #16)

> bc, here is a patch to use an envvar to disable dump deletion per our IRC
> conversation, please check to see if it works.

I spent some time manually trying to get crashes with multiple dumps but none of the urls I tested would create multiple dumps. I ran a set of daily urls with and without the patch hoping to get an overall comparison of the difference but the variation in the crash reproducibility hid whatever differences the crash workers experienced. If you want to wait for me to find a reproducible example where the patch has an affect that is fine with me. Otherwise it seems straightforward to land.
Comment on attachment 641069 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1

Merge day happened, transferring nom.
Attachment #641069 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Comment on attachment 641069 [details] [diff] [review]
Give child crashes a sequence number, save the pending directory for threadsafey, rev. 2.1

approving, please land before tomorrow morning so we can get this in the first beta which will go to build then.
Attachment #641069 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Lukas Blakk [:lsblakk] from comment #20)
> Comment on attachment 641069 [details] [diff] [review]
> Give child crashes a sequence number, save the pending directory for
> threadsafey, rev. 2.1
> 
> approving, please land before tomorrow morning so we can get this in the
> first beta which will go to build then.

I'm sure you already know this, but we just want to make sure that if this lands on beta that bug 773665 lands too.
Per IRC, I had to do a little backporting of this patch because of nsIFile/nsILocalFile changes and I'm not comfortable pushing it without another tryserver run. So we're going to do 15.0b1 without this change and pick it up for 15.0b2.
Depends on: 779849
You need to log in before you can comment on or make changes to this bug.