Generate crash reports for hangs in replaying processes

RESOLVED FIXED in Firefox 63

Status

()

enhancement
RESOLVED FIXED
9 months ago
9 months ago

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

Trunk
mozilla63
Points:
---

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

9 months ago
Posted patch patchSplinter Review
After bug 1481009 we can generate crash reports for crashed recording/replaying processes, but if a replaying process hangs then we won't generate a report.  The attached patch fixes this so that we generate a report with all stacks for hanged replaying processes.  (We still don't watch for hangs in recording processes, as these act like normal content processes and AFAICT we don't watch for hangs in normal content processes anyways.)
(Assignee)

Comment 1

9 months ago
When a hang has been detected in a replaying process, provoke it into crashing so that it will produce a minidump.
Attachment #8999032 - Flags: review?(nfroyd)
(Assignee)

Comment 2

9 months ago
When we provoke a crash in the replaying process we get back a minidump whose extra data just has the stack of the crashed thread, which isn't useful (this thread will be the one which received the message directing it to crash).  With this patch we recognize these crashes as hangs, which causes downstream stuff to include threads for all stacks when analyzing the minidump.
Attachment #8999037 - Flags: review?(gsvelto)
Comment on attachment 8999037 [details] [diff] [review]
Part 2 - Recognize hanged replaying processes when reporting crashes.

Review of attachment 8999037 [details] [diff] [review]:
-----------------------------------------------------------------

In principle this works but reading the code again made me think we can clean it up. Instead of passing both the annotation tables into NotifyCrashService() let's give it a crash type parameter instead and inspect the annotations right into CrashReporterHost::FinalizeCrashReport(), pseudo-code ahead:

int32_t GetCrashType(processType, annotations, parentAnnotations) {
 // Figure out if this is a hang or not and return nsICrashService::CRASH_TYPE_CRASH/nsICrashService::CRASH_TYPE_HANG
}

CrashReporterHost::FinalizeCrashReport() {
  ...
  int32_t crashType = GetCrashType(mProcessType, notes, mExtraNotes);
  ...
  NotifyCrashService(mProcessType, crashType, mDumpID);
  ...
}

Also yesterday bug 1348273 landed so you'll have to rebase this on top of that. The changes should be entirely mechanical, just a matter of looking up the annotations via their enum value instead of using a string.
Attachment #8999037 - Flags: review?(gsvelto) → review-
(Assignee)

Comment 4

9 months ago
Updated patch per comments.
Attachment #8999037 - Attachment is obsolete: true
Attachment #8999367 - Flags: review?(gsvelto)
Comment on attachment 8999367 [details] [diff] [review]
Part 2 - Recognize hanged replaying processes when reporting crashes.

Review of attachment 8999367 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good, thanks!
Attachment #8999367 - Flags: review?(gsvelto) → review+
Comment on attachment 8999367 [details] [diff] [review]
Part 2 - Recognize hanged replaying processes when reporting crashes.

Review of attachment 8999367 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/CrashReporterHost.cpp
@@ +63,5 @@
> +int32_t
> +CrashReporterHost::GetCrashType(const CrashReporter::AnnotationTable& aAnnotations)
> +{
> +  // RecordReplayHang is set in the middleman content process, so check aAnnotations.
> +  nsCString val = aAnnotations[CrashReporter::Annotation::RecordReplayHang];

Can we take a `const nsCString&` here, so we don't unnecessarily copy the string?

@@ +69,5 @@
> +    return nsICrashService::CRASH_TYPE_HANG;
> +  }
> +
> +  // PluginHang is set in the parent process, so check mExtraAnnotations.
> +  nsCString val = mExtraAnnotations[CrashReporter::Annotation::PluginHang];

Same question here.
Comment on attachment 8999032 [details] [diff] [review]
Part 1 - Trigger crashes in hanged replaying processes.

Review of attachment 8999032 [details] [diff] [review]:
-----------------------------------------------------------------

Some small things below;  r=me with those fixed.

::: toolkit/recordreplay/ipc/ChildProcess.cpp
@@ +621,5 @@
>        } else {
>          TimeStamp deadline = mLastMessageTime + TimeDuration::FromSeconds(HangSeconds);
>          if (TimeStamp::Now() >= deadline) {
>            MonitorAutoUnlock unlock(*gMonitor);
> +          if (!hasTerminated) {

Is this supposed to be `sentTerminateMessage`?  Because...

@@ +628,5 @@
> +            // wait another HangSeconds before hitting the restart case below.
> +            CrashReporter::AnnotateCrashReport(nsAutoCString(kHangAnnotation),
> +                                               NS_LITERAL_CSTRING("1"));
> +            SendMessage(TerminateMessage());
> +            sentTerminateMessage = true;

...otherwise this assignment is dead code.

::: toolkit/recordreplay/ipc/ParentIPC.h
@@ +62,5 @@
>                                   const char* aRecordingFile, bool aRecording,
>                                   std::vector<std::string>& aExtraArgs);
>  
> +// Crash annotation set when a child process appears to have hanged.
> +static const char* kHangAnnotation = "RecordReplayHang";

Two small things:

Please don't define this in a header.  If it needs to be in a single .cpp, move it to that .cpp.  If it needs to be in multiple .cpps, use an `extern const char* kHangAnnotation` here and define it once in some suitable .cpp.

Please declare/define this as `const char kHangAnnotation[]`, which is slightly smaller and more efficient.
Attachment #8999032 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Please declare/define this as `const char kHangAnnotation[]`, which is
> slightly smaller and more efficient.

Actually, now that bug 1348273 has landed, this needs to be declared in CrashAnnotations.yaml [1], then the definition will be generated automatically.

[1] https://searchfox.org/mozilla-central/source/toolkit/crashreporter/CrashAnnotations.yaml
(Assignee)

Comment 9

9 months ago
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Comment on attachment 8999032 [details] [diff] [review]
> Part 1 - Trigger crashes in hanged replaying processes.
> 
> Review of attachment 8999032 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Some small things below;  r=me with those fixed.
> 
> ::: toolkit/recordreplay/ipc/ChildProcess.cpp
> @@ +621,5 @@
> >        } else {
> >          TimeStamp deadline = mLastMessageTime + TimeDuration::FromSeconds(HangSeconds);
> >          if (TimeStamp::Now() >= deadline) {
> >            MonitorAutoUnlock unlock(*gMonitor);
> > +          if (!hasTerminated) {
> 
> Is this supposed to be `sentTerminateMessage`?  Because...

Oops, yes, I changed the name of the variable just before submitting the patch but missed this spot.

> ::: toolkit/recordreplay/ipc/ParentIPC.h
> @@ +62,5 @@
> >                                   const char* aRecordingFile, bool aRecording,
> >                                   std::vector<std::string>& aExtraArgs);
> >  
> > +// Crash annotation set when a child process appears to have hanged.
> > +static const char* kHangAnnotation = "RecordReplayHang";
> 
> Two small things:
> 
> Please don't define this in a header.  If it needs to be in a single .cpp,
> move it to that .cpp.  If it needs to be in multiple .cpps, use an `extern
> const char* kHangAnnotation` here and define it once in some suitable .cpp.
> 
> Please declare/define this as `const char kHangAnnotation[]`, which is
> slightly smaller and more efficient.

The earlier version of part 2 used kHangAnnotation to avoid repeated uses of the same bare string literal.  This stuff is all much better now that bug 1348273 has landed.

Comment 10

9 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/51d56958d3d0
Part 1 - Trigger crashes in hanged replaying processes, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d492e85d1040
Part 2 - Recognize hanged replaying processes when reporting crashes, r=gsvelto.

Comment 11

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/51d56958d3d0
https://hg.mozilla.org/mozilla-central/rev/d492e85d1040
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.