Closed Bug 1390143 Opened 3 years ago Closed 3 years ago

Browser reports in paired minidumps no long reflect browser state

Categories

(Core :: DOM: Content Processes, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox55 --- unaffected
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: jimm, Assigned: cyu)

References

Details

Attachments

(2 files, 2 obsolete files)

In bug 1360308  we added support for taking paired browser crash reports async. This broke the symmetry  between browser and content process reports. 

The fix is to make this and potentially the plugin paired dumps sync rather than async here -

http://searchfox.org/mozilla-central/rev/6482c8a5fa5c7446e82ef187d1a1faff49e3379e/dom/ipc/ContentParent.cpp#2985
http://searchfox.org/mozilla-central/rev/6482c8a5fa5c7446e82ef187d1a1faff49e3379e/dom/ipc/ProcessHangMonitor.cpp#715

Cervantes, can you provide some detail as to why we made this change? I don't want to regress what we were trying to fix in bug 1360308 if possible.
Flags: needinfo?(cyu)
Attached patch content patchSplinter Review
Assignee: nobody → jmathies
Attachment #8896994 - Flags: review?(cyu)
Blocks: 1366264
(In reply to Jim Mathies [:jimm] from comment #0)
> Cervantes, can you provide some detail as to why we made this change? I
> don't want to regress what we were trying to fix in bug 1360308 if possible.

We made this change for offloading main thread IO when we generate minidumps. This is done to keep disk IO from janking/hanging the browser as observed by BHR.

The design is to do the heavy lifting of disk IO in CreateMinidumpsAndPairIneternal() (http://searchfox.org/mozilla-central/rev/e5b13e6224dbe3182050cf442608c4cb6a8c5c55/toolkit/crashreporter/nsExceptionHandler.cpp#4195) on a dedicated thread. Then we work up the calling chain to save the states in the callback chain to be invoked when we finished writing the minidumps. The operation is designed support sync and async operating modes, where sync mode runs exactly the same as without the patch.

Your patch changes the async argument to false, which effectively puts disk IO back to the main thread for the affected code path. This should be the last resort when there is no other way to make it async. I think we should be able to forward-fix ContentParent::KillHard() and HangMonitorParent::SendHangNotification(). I can work on it.
Flags: needinfo?(cyu)
Jim, could you provide more detail about the breakage of the symmetry betw browser and content reports? I tested the path of http://searchfox.org/mozilla-central/rev/6482c8a5fa5c7446e82ef187d1a1faff49e3379e/dom/ipc/ProcessHangMonitor.cpp#715 and the dumps are generated as follows:

-rw------- 1 cervantes cervantes 1499376 Aug 15 19:31 0f91f85f-a2ee-8c9a-0818-2f3510fa9e49-browser.dmp
-rw------- 1 cervantes cervantes  706376 Aug 15 19:31 0f91f85f-a2ee-8c9a-0818-2f3510fa9e49-content.dmp
-rw------- 1 cervantes cervantes  328360 Aug 15 19:31 0f91f85f-a2ee-8c9a-0818-2f3510fa9e49.dmp
-rw------- 1 cervantes cervantes   10086 Aug 15 19:31 0f91f85f-a2ee-8c9a-0818-2f3510fa9e49.extra

It looks like the reports are generated correctly. Or am I missing some detail?
Flags: needinfo?(jmathies)
I wonder if we'll have a similar problem with plugin hangs? The parent side stack is usually pretty valuable. If the browser side is generated async we lose the context.
Oh I see. Following this line of reasoning, I don't think it's right to forward fix this bug since we need to have both parent and child stacks. Sorry for the regression. The correct way to get paired minidumps back to work is reverting bug 1360308. But it will bring the problem back: we hang the browser further when we are reporting a child process hang.

Maybe we need to find a lightweight method for getting hang stacks. Generating dumps seems to be an overkill. Maybe what we need is stackwalking. Or we find some way to dump to memory and then flush the content to disk, but this doesn't look like an easy task given that we use dbghelp.dll to write minidumps on Windows.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> Maybe we need to find a lightweight method for getting hang stacks.
> Generating dumps seems to be an overkill. Maybe what we need is
> stackwalking. Or we find some way to dump to memory and then flush the
> content to disk, but this doesn't look like an easy task given that we use
> dbghelp.dll to write minidumps on Windows.

I for one would very much like to continue have the ability to examine a minidump.
(In reply to Jim Mathies [:jimm] from comment #5)
> I wonder if we'll have a similar problem with plugin hangs? The parent side
> stack is usually pretty valuable. If the browser side is generated async we
> lose the context.

Unlike ContentParent::KillHard(), which always delegates parent minidump to CrashReporterHost::GenerateMinidumpAndPair(), when we create paired minidumps for plugins, it depends on whether we already have a browser dump at hand to pair. If so then generation of parent minidump will be synchronous and should be symmetrical with the child one.

I think we can take one step back from bug 1360308 without turning off async dump generation completely: we onload dump generation of the parent process back to the main thread and keep the child minidump async.
Gabriele, we need to move half (and maybe more) IO operations back to the main thread, unfortunately, to keep the stack of the parent process of paired minidumps.
Attachment #8897815 - Flags: review?(gsvelto)
Attachment #8897815 - Flags: feedback?(jmathies)
Comment on attachment 8896994 [details] [diff] [review]
content patch

I am cancelling this review since I'd like to have a forward fix as a counter proposal. If my proposal doesn't work out, then I'd like to revert the original bug and find another way around.
Attachment #8896994 - Flags: review?(cyu)
Comment on attachment 8897815 [details] [diff] [review]
Generate the parent dump synchronously of paired minidumps.

Review of attachment 8897815 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +4104,5 @@
>           , GetMinidumpType()
>  #endif
>        )) {
>      runnable = [&](){
> +      targetMinidump->Remove(false);

Shouldn't we also be removing the .extra file here?

@@ +4114,5 @@
>  
>    nsCOMPtr<nsIFile> targetExtra;
>    GetExtraFileForMinidump(targetMinidump, getter_AddRefs(targetExtra));
> +  if (!targetExtra) {
> +    targetMinidump->Remove(false);

I'm a bit puzzled by this, why are you synchronously removing the target minidump and then doing so again in the callback again? Is it a typo or am I missing something?
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
> Comment on attachment 8897815 [details] [diff] [review]
> Generate the parent dump synchronously of paired minidumps.
> 
> Review of attachment 8897815 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/crashreporter/nsExceptionHandler.cpp
> @@ +4104,5 @@
> >           , GetMinidumpType()
> >  #endif
> >        )) {
> >      runnable = [&](){
> > +      targetMinidump->Remove(false);
> 
> Shouldn't we also be removing the .extra file here?
> 
The .extra hasn't been generated here. Actually we don't need to remove the dump file if the return value here is false because the minidump callback doesn't assign to it on failure. I'll update the patch for that.


> @@ +4114,5 @@
> >  
> >    nsCOMPtr<nsIFile> targetExtra;
> >    GetExtraFileForMinidump(targetMinidump, getter_AddRefs(targetExtra));
> > +  if (!targetExtra) {
> > +    targetMinidump->Remove(false);
> 
> I'm a bit puzzled by this, why are you synchronously removing the target
> minidump and then doing so again in the callback again? Is it a typo or am I
> missing something?
You are not missing something. It's an error. Thanks for catching that.

BTW, it looks to me that it's safe to perform file operations off the main thread. I don't think we need to dispatch the IO operations, like RenameAdditionalHangMinidump() or MoveToPending(), back to the main thread. We can unclutter this function by moving these operations out of the callback lambdas.
I suppose that nsIFile operations are safe off the main thread and move file operations in the callback to the main thread out of the callbacks. This makes the touched functions clearer.
Attachment #8897815 - Attachment is obsolete: true
Attachment #8897815 - Flags: review?(gsvelto)
Attachment #8897815 - Flags: feedback?(jmathies)
Attachment #8898252 - Flags: review?(gsvelto)
Attachment #8898252 - Flags: feedback?(jmathies)
See Also: → 1391224
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #13)
> I suppose that nsIFile operations are safe off the main thread and move file
> operations in the callback to the main thread out of the callbacks. This
> makes the touched functions clearer.

Yeah, they should be asynchronous anyway. I'll have a look at the new patch right away.
Comment on attachment 8898252 [details] [diff] [review]
Generate the parent dump synchronously of paired minidumps.

Review of attachment 8898252 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me if but I'd like to hear Jim feedback before we land it.
Attachment #8898252 - Flags: review?(gsvelto) → review+
Carryover r+ with the fix of a nullptr deref in calling CreatePairedChildMinidumpAsync().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d385c8eb7711a0a39e407de16691f8bffaf30928
Assignee: jmathies → cyu
Attachment #8899426 - Flags: review+
Attachment #8898252 - Attachment is obsolete: true
Attachment #8898252 - Flags: feedback?(jmathies)
Comment on attachment 8899426 [details] [diff] [review]
Generate the parent dump synchronously of paired minidumps (v2)

Jim, we are about the land this patch and would like to seek your feedback. This patch onloads the parent part of paired minidumps back to the main thread to keep the stacks for content and plug hangs. Child dumps remains async. Do you have any other concern about this? Thanks.
Flags: needinfo?(jmathies)
Comment on attachment 8899426 [details] [diff] [review]
Generate the parent dump synchronously of paired minidumps (v2)

lgtm
Flags: needinfo?(jmathies)
Attachment #8899426 - Flags: feedback+
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c705cc6d5708d24e9f6648ca537bf110e62f305
Bug 1390143 - Generate the parent minidump synchronously to keep parent process's stack when creating paired minidumps. r=gsvelto
https://hg.mozilla.org/mozilla-central/rev/5c705cc6d570
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Blocks: 1394924
Too late for 56. Mass won't fix for 56.
You need to log in before you can comment on or make changes to this bug.