Closed Bug 1295622 Opened 7 years ago Closed 7 years ago

e10s crash in IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was

Categories

(Toolkit :: about:memory, defect)

49 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
e10s + ---
firefox49 --- fixed
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: philipp, Assigned: jld)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-ad715afe-629e-467d-88ed-765e02160816.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp:434
2 	xul.dll 	mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) 	dom/ipc/ContentChild.cpp:2359

this signature seems to regress with firefox 49 & it's currently making up ~2% of all content crashes in 49.0b3.
Crash Signature: [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was ] → [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was%20 ]
Crash Signature: [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was%20 ] → [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was ]
Accurate crash list:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~Msg_PMemoryReportRequestConstructor&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Andrew, is there someone on your team who could take a look at this?
Crash Signature: [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was ] → [@ IPCError-content | (msgtype=0x46000A,name=PContent::Msg_PMemoryReportRequestConstructor) Processing error: message was ]
Component: General → DOM
Flags: needinfo?(overholt)
Are these just OOMs?
Flags: needinfo?(overholt) → needinfo?(continuation)
Component: DOM → about:memory
Product: Core → Toolkit
I feel like I might have run into this once.  I opened my normal set of windows (3 gmails, irc, bugzilla, telegram, twitter, etc) and then ran an about:memory measurement.  The browser crashed.  The key difference from my usual workflow was that I did not run a minimize before the measurement.  Not sure if that helps here.
So this blames PContent::Msg_PMemoryReportRequestConstructor and says the problem is that the handler returned false, which implicates [1]; if this is the non-minimize case, then that might be the “already in progress” check in [2].  I think the problem here is [1] turning an nsresult error into an IPDL false — errors here ought to be reported somehow (add an nsresult to PMemoryReportRequest::__delete__, maybe?), but it shouldn't kill the process.  (I had a lot less experience with the IPC framework when I wrote that code.)

[1] http://searchfox.org/mozilla-central/rev/b35975ec17c8/dom/ipc/ContentChild.cpp#1070
[2] http://searchfox.org/mozilla-central/rev/b35975ec17c8/xpcom/base/nsMemoryReporterManager.cpp#1807
(With thanks to the Metadata tab on crash-stats, where the ipc_channel_error key has the non-truncated version of the error message.)
Flags: needinfo?(continuation)
Thanks for looking at this, Jed!
Tentatively assigning to Jed.
Assignee: nobody → jld
Looks like there are about 170 crashes on beta 5 in the last week. We could still take a patch for beta.
This is the simple “band-aid” fix.  It's not actually tested, because I don't have STR (and it's more complicated than just starting multiple memory reports in succession; the timeout stuff might be involved somehow, and see also bug 1297164 as mentioned by :erahm on IRC the other day), but it ensures that RecvPMemoryReportRequestConstructor can't ever return `false`, which will prevent this crash.

Also, I added an NS_WARN_IF so that we get *some* indication (at least on debug builds) that things went wrong even in the minimizeMemoryUsage case, where we're currently not even printing a warning, it looks like.

Longer-term the error handling (or lack thereof) of all the memory reporting stuff should be looked at, but a fix at that level would need more time than I currently have (and would probably be difficult and high-risk to uplift, whereas this is very simple and low-risk).
Attachment #8785534 - Flags: review?(n.nethercote)
Comment on attachment 8785534 [details] [diff] [review]
bug1295622-remote-memory-report-fail-hg0.diff

Review of attachment 8785534 [details] [diff] [review]:
-----------------------------------------------------------------

In bug 1297658 I just streamlined (i.e. removed) a lot of the error checking within individual memory reporters because it wasn't useful. I think that's orthogonal to this, but Jed you might be interested in at least looking at that bug's patch.

::: dom/ipc/ContentChild.cpp
@@ +1062,5 @@
>    }
>  
> +  // Bug 1295622: don't kill the process just because this failed.
> +  NS_WARN_IF(NS_FAILED(rv));
> +  return true;

The process is killed if this function fails? Really?
Attachment #8785534 - Flags: review?(n.nethercote) → review+
(In reply to Nicholas Nethercote [:njn] from comment #10)
> In bug 1297658 I just streamlined (i.e. removed) a lot of the error checking
> within individual memory reporters because it wasn't useful. I think that's
> orthogonal to this, but Jed you might be interested in at least looking at
> that bug's patch.

Thanks for the pointer.

> ::: dom/ipc/ContentChild.cpp
> @@ +1062,5 @@
> >    }
> >  
> > +  // Bug 1295622: don't kill the process just because this failed.
> > +  NS_WARN_IF(NS_FAILED(rv));
> > +  return true;
> 
> The process is killed if this function fails? Really?

Yes, if it fails by returning false rather than by signaling an error in-band with IPC messages.  https://developer.mozilla.org/en-US/docs/Mozilla/IPDL/Tutorial#Shutdown_and_Error_Handling tries to explain.  It's been like that for longer than I've been a contributor to this area (as confirmed by the MDN history — which makes me wonder how I managed to miss it when I read through the tutorial the first time), so I don't know the original rationale.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9df0314cff2
Don't crash the content process if a memory report fails. r=njn
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c9df0314cff2
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
This seems to be a very low-risk patch: could we uplift it to 50 and 49? I almost just filed this bug because personally experience it almost every night if I leave nightly open. This is the kind of thing that might really annoy our dedicated/active userbase.
Flags: needinfo?(jld)
And FWIW, I didn't ever use about:memory, so this was triggered somehow else.
Comment on attachment 8785534 [details] [diff] [review]
bug1295622-remote-memory-report-fail-hg0.diff

Approval Request Comment
[Feature/regressing bug #]: e10s memory reporting (since bug 946407?)
[User impact if declined]: Crashes — and comment #14 reports this can happen without explicitly trying to take a memory report.
[Describe test coverage new/current, TreeHerder]: There are e10s memory reporting tests in mochitest-other; the patch has been stable on m-c for a little while.
[Risks and why]: Very low; this applies to cases where we would have crashed, and ignores an error which is already being ignored in other cases.
[String/UUID change made/needed]: None
Flags: needinfo?(jld)
Attachment #8785534 - Flags: approval-mozilla-beta?
Attachment #8785534 - Flags: approval-mozilla-aurora?
(In reply to Benjamin Smedberg [:bsmedberg] from comment #15)
> And FWIW, I didn't ever use about:memory, so this was triggered somehow else.

That's interesting — a PMemoryReportRequest has to have been created to get to this crash, so *something* was doing a memory report.  Looking through the source for things that might be using nsIMemoryReporterManager and/or nsIMemoryInfoDumper to get reports (and that aren't tests or about:memory itself) finds the memory.dump_reports_on_oom pref (off by default, used by the JS engine allocator?) and nsThread::SaveMemoryReportNearOOM -> nsXULAppInfo::SaveMemoryReport.
Comment on attachment 8785534 [details] [diff] [review]
bug1295622-remote-memory-report-fail-hg0.diff

Looks low risk, should fix a common crash in beta. 
This should make it into the beta 9 build on Thursday.
Attachment #8785534 - Flags: approval-mozilla-beta?
Attachment #8785534 - Flags: approval-mozilla-beta+
Attachment #8785534 - Flags: approval-mozilla-aurora?
Attachment #8785534 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.