Large OOM crash in MessageChannel::OnMessageReceivedFromLink()

RESOLVED DUPLICATE of bug 1235633

Status

()

Core
IPC
RESOLVED DUPLICATE of bug 1235633
3 years ago
2 years ago

People

(Reporter: mccr8, Unassigned)

Tracking

({crash})

Trunk
crash
Points:
---

Firefox Tracking Flags

(firefox40 affected)

Details

(crash signature)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

3 years ago
From a casual perusing of crash-stats, this is the top OOM | large crash.  The allocations are all over 1MB.  The reports I looked at are happening on this line:
   mPending.push_back(aMsg);

mPending has type |typedef std::deque<Message> MessageQueue;|, and the argument type of push_back in C++11 is |(const value_type& val);|, so I think this means we're copying the message, which may be causing the OOM.  It is also possible 

An example crash:
https://crash-stats.mozilla.com/report/index/6173fe7e-bd6d-46b1-b314-084f32150417

This is similar to bug 1092010.
(Reporter)

Comment 1

3 years ago
"It is also possible "

Err, it is also possible that the queue itself is really huge and we need some segmented array there, though given bug 1092010 I feel like it is more likely that the message is just big.
(Reporter)

Comment 2

3 years ago
MessageChannel::MaybeUndeferIncall() also looks like it does some shuffling around of messages that might be problematic.
(Reporter)

Comment 3

3 years ago
Created attachment 8594973 [details] [diff] [review]
I like to move it move it.

I slapped some Moves() in those two functions and it at least compiles and runs.  Feedback welcome.
Comment on attachment 8594973 [details] [diff] [review]
I like to move it move it.

I think you should make OnMessageReceivedFromLink have a Message&& param, then Move() every time you call it.
(Reporter)

Comment 5

3 years ago
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #4)
> I think you should make OnMessageReceivedFromLink have a Message&& param,
> then Move() every time you call it.

That sounds less scary, I'll try it.

At froydnj's suggestion, I tried deleting the copy constructor for Message to see what happens.  Turns out we use it a lot of places. ;)  I don't know if it is worth auditing those further or if we should just continue to rely on crash stats to point out common badness.
Good find!
Keywords: crash
(Reporter)

Comment 7

3 years ago
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #4)
> I think you should make OnMessageReceivedFromLink have a Message&& param,
> then Move() every time you call it.

I started working on this, but in at least one place it would seem to require changing the type of a method in IPC::Channel::Listener::OnMessageReceived(), which seems like maybe a bad idea because of the widespread changes to Chromium IPC code it would entail.  Here's the place:

ProcessLink::OnMessageReceived(const Message& msg)
{
     ...
     mChan->OnMessageReceivedFromLink(msg);
(Reporter)

Comment 8

3 years ago
Created attachment 8595057 [details] [diff] [review]
Bug 1156484  - start marking MessageChannel::OnMessageReceivedFromLink() take move argument and propagate that outwards

This is what I have so far for Move-ifying OnMessageReceivedFromLink.
What if you make a local copy within ProcessLink::OnMessageReceived, and Move that?
(In reply to Andrew McCreight [:mccr8] from comment #7)
> which seems like maybe a bad idea because of the widespread changes
> to Chromium IPC code it would entail.

Yeah, sounds like it would be worth it though.
Maybe the Chromium folks have made the widespread changes to make Message move'able?  Dunno how much of a pain that'd be to backport, though...
(Reporter)

Comment 12

3 years ago
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #10)
> Yeah, sounds like it would be worth it though.

Just in what I've done so far, the lifetimes do seem a little nicer to reason about, rather than just throwing a pointer around and hoping it works out.

(In reply to David Major [:dmajor] from comment #9)
> What if you make a local copy within ProcessLink::OnMessageReceived, and
> Move that?

ProcessLink::OnMessageReceived() is in the OOM crash stack, so making a copy here to save a copy later won't help.
(Reporter)

Comment 13

3 years ago
Looking at the code the next layer up, in ChannelImpl::ProcessIncomingMessages(), it seems like the top level Message is just a pointer into the middle of some input buffer, so it seems a little dicey to use a move constructor on it anywhere.  I don't know how you'd easily ensure that the thing on mPending outlives the input buffer. So maybe a copy is unavoidable here?
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #11)
> Maybe the Chromium folks have made the widespread changes to make Message
> move'able?  Dunno how much of a pain that'd be to backport, though...

There don't appear to be any Message rvalue references anywhere in Chromium master, currently.
(In reply to Andrew McCreight [:mccr8] from comment #13)
> So maybe a copy is unavoidable here?

Oh, sorry, didn't realize that we were already back at the input buffer. We definitely have to copy once out of the input buffer. Thereafter we should be Move()'ing.
(Reporter)

Comment 16

3 years ago
Ok, so since there don't appear to be any extra copies, we'd either have to make this fallible and hope that the browser is still in some usable state after, or make the underlying storage some kind of segmented buffer, which could require a lot of changes.
I can hit this really easily with the following steps, on Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0 20150901030226:

1. Load https://cloudharmony.com/speedtest
2. Click the "Test Options" dropdown link
3. Select all the test-type options, and also choose "10 mins"
4. Click "Start"

I crash 100% after some time, with this bug's stack.  Here's a couple recent crash reports:

https://crash-stats.mozilla.com/report/index/eb2ce5e1-ba7d-4738-8d19-d70252150901
https://crash-stats.mozilla.com/report/index/5dc3bcb3-d4b7-467c-8a65-4bd822150901
Yikes, OOM with >400MB address space available.

Who can evaluate which approach from comment 16 is the best way to go?

Independently it may be worthwhile to look at that speedtest site and see if we can use less memory in the first place.
(Reporter)

Comment 19

3 years ago
Looking at about:memory for comment 17, we end up with a large number of images around 8 to 10MB. I also occasionally see large spikes of heap-unclassified, up to like 175MB. Presumably that's some IPC stuff. I'll look at that with DMD.
(Reporter)

Comment 20

3 years ago
This crash does not seem very common on crash-stats any more. I only saw 2 reports, which may just be the ones from comment 17. :)
BTW, got an "all tabs" crash on the same site, same scenario, but different stack: https://crash-stats.mozilla.com/report/index/eee038e6-b8e1-4d3e-8fd9-7d4eb2150902
(In reply to Andrew McCreight [:mccr8] from comment #20)
> This crash does not seem very common on crash-stats any more. I only saw 2
> reports, which may just be the ones from comment 17. :)

I see a couple hundred, including strangely some size==0 that get labeled as "OOM | unknown":
https://crash-stats.mozilla.com/search/?signature=~FromLink&signature=%24OOM&_facets=signature&_facets=build_id&_facets=version&_facets=release_channel&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
(Reporter)

Comment 23

3 years ago
dmajor asked me to report what I found with DMD, but I haven't had a chance as I've been looking at a top crash on beta, but hopefully I can get to it later this week.
Flags: needinfo?(continuation)
(Reporter)

Updated

3 years ago
Attachment #8594973 - Attachment is obsolete: true
(Reporter)

Updated

3 years ago
Attachment #8595057 - Attachment is obsolete: true
(Reporter)

Comment 24

3 years ago
I did notice 10MB of some kind of Necko IPC stuff in the parent process that isn't being reported. I'll attach a log for that, one that includes some other Necko IPC stuff that is being reported, for contrast.

I also noticed a few things that are not really relevant to the issue at hand. I filed bug 1200812 for some event handler stuff, bug 1200795 for some CC stuff. I should probably file another one for Safe Browsing, as I see a fair bit of that.
Flags: needinfo?(continuation)
(Reporter)

Comment 25

3 years ago
Created attachment 8661514 [details]
nsMimeInputStream-related heap-unclassified

Updated

2 years ago
Crash Signature: [@ OOM | large | NS_ABORT_OOM(unsigned int) | mozilla::ipc::MessageChannel::OnMessageReceivedFromLink(IPC::Message const&) ] → [@ OOM | large | NS_ABORT_OOM(unsigned int) | mozilla::ipc::MessageChannel::OnMessageReceivedFromLink(IPC::Message const&) ] [@ OOM | large | NS_ABORT_OOM | mozilla::ipc::MessageChannel::OnMessageReceivedFromLink ]

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1235633
You need to log in before you can comment on or make changes to this bug.