Closed Bug 1268616 (Opened 9 years ago, Closed 9 years ago)

Very large allocations in call to input_overflow_buf_.reserve()

Categories: Core :: IPC, defect, P1
Status: RESOLVED FIXED
Target Milestone: mozilla49

Tracking:

Flag | Tracking | Status
---|---|---
e10s | + | ---
firefox46 | --- | unaffected
firefox47 | --- | wontfix
firefox48 | --- | wontfix
firefox49 | --- | fixed
People: Reporter: mccr8, Assigned: erahm
Keywords: regression; Whiteboard: btpp-active
Attachments (3 files, 1 obsolete file)
2.43 KB, patch, billm: review+
1.19 KB, patch, billm: review+
1.68 KB, patch, billm: review+
I've come across 7 OOM crashes on this line:
input_overflow_buf_.reserve(length + kReadBufferSize);
...where try_realloc() is attempting to allocate over 1GB.
Here are two examples:
4GB:
https://crash-stats.mozilla.com/report/index/dffd132e-0d84-4dfa-a878-071802160428
2GB:
https://crash-stats.mozilla.com/report/index/4ff040cf-0d63-4863-9d33-400f92160428
I have no idea what is going wrong to cause this.
Updated 9 years ago (Reporter)
Crash Signature: [@ OOM | large | NS_ABORT_OOM | Buffer::try_realloc ]
The length in the message must be corrupted somehow. We probably shouldn't be so trusting of it. I think it would make sense to limit length to 32MB or something. If the message is actually bigger, it just means we'll have to grow the buffer.
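Roughly the idea, as a minimal sketch: the names are placeholders for the real channel members, and the 32MB cap is just the number suggested above, not a landed value.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

static const size_t kReadBufferSize = 4 * 1024;             // 4KB read chunk, as in the channel code
static const size_t kMaxUpfrontReserve = 32 * 1024 * 1024;  // hypothetical 32MB cap

void ReserveForIncomingMessage(std::string& input_overflow_buf, size_t claimed_length) {
  // Don't trust the length from the wire: cap the up-front reservation so a
  // corrupted header can't force a multi-GB allocation. A genuinely larger
  // message still works; the buffer simply grows as more data arrives.
  size_t wanted = claimed_length + kReadBufferSize;
  input_overflow_buf.reserve(std::min(wanted, kMaxUpfrontReserve));
}
```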
Comment 3 • 9 years ago (Assignee)
Several issues here:
#1) kMaximumMessageSize is too large [1]
#2) We check the size *after* growing the buffer [2], rather than before [3] (see the sketch after the links below)
#3) We don't enforce the max message size when *sending*
#4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly broken in the more than one message in a buffer case [4]. We steal the buffer and assume the message was at the front. But that assumption is incorrect if there happened to have been a smaller message first. This would result in a malformed message as well as the next message in the buffer pointing to junk.
#5) The nesting and density of this code probably led to #4
[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel.h#51
[2] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358
[3] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358
[4] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#388-408
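A minimal sketch of the before-growth check from #2. The helper and buffer names are hypothetical, and the kMaximumMessageSize value shown is only illustrative; the real code lives in ipc_channel_win.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

static const uint32_t kMaximumMessageSize = 256 * 1024 * 1024;  // illustrative value only

// Reject an impossible claimed length *before* any buffer growth happens,
// instead of reserving first and only then noticing the message is too big.
bool AppendIncomingData(std::string& input_overflow_buf,
                        const char* data, size_t data_len,
                        uint32_t claimed_message_len) {
  if (claimed_message_len > kMaximumMessageSize) {
    return false;  // treat as a protocol error; the caller should close the channel
  }
  input_overflow_buf.append(data, data_len);
  return true;
}
```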
Comment 4 • 9 years ago (Assignee)
I guess there's technically another issue, although I think we always kill the connection if |ProcessIncomingMessages| fails.
#6) We clear the buffer if the message was too big, but we leave the pipe with a potentially half-read message [1]
[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358-362
Comment 5 • 9 years ago (Reporter)
FWIW, Chromium currently uses 128MB for its max message size: https://code.google.com/p/chromium/codesearch#chromium/src/ipc/ipc_channel.h&q=kMaximumMessageSize&sq=package:chromium&type=cs&l=98
Comment 6 • 9 years ago (billm)

(In reply to Eric Rahm [:erahm] from comment #3)
> Several issues here:
>
> #1) kMaximumMessageSize is too large [1]
It is sort of big, but I'm not convinced we should make it smaller. Any reason?
> #2) We check the size *after* growing the buffer [2], rather than before [3]
> #3) We don't enforce the max message size when *sending*
> #4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly
> broken in the more than one message in a buffer case [4]. We steal the
> buffer and assume the message was at the front. But that assumption is
> incorrect if there happened to have been a smaller message first. This would
> result in a malformed message as well as the next message in the buffer
> pointing to junk.
I don't think that can happen. See the comment:
https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#393
> #5) The nesting and density of this code probably led to #4
This code is going to change significantly in bug 1262671, so please hold off on any big cleanups for now. In general I agree the code is overly complicated, though. Especially the duplication between the posix and windows implementations.
Comment 7 • 9 years ago (Assignee)
#7) If we had a partial message on the previous iteration for some reason we grow the overflow buffer by kReadBufferSize (4K) [1], but then we just truncate it further down [2].
[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#374
[2] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#441
Comment 8 • 9 years ago (billm)

(In reply to Eric Rahm [:erahm] from comment #7)
> #7) If we had a partial message on the previous iteration for some reason we
> grow the overflow buffer by kReadBufferSize (4K) [1], but then we just
> truncate it further down [2].
In [1], we're sizing the buffer to the full size of the message plus some slop in case there are any other messages read in at the same time. That happens the first time we start reading a big message. We shouldn't resize the buffer down again until that message is processed. That's why the if condition is there in [2].
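To make that guard concrete, a rough illustration; the names are stand-ins for the real channel members, not the actual code.

```cpp
#include <cstddef>
#include <string>

static const size_t kReadBufferSize = 4 * 1024;  // the 4KB slop from [1]

// Only give the extra capacity back once no partially-read big message is
// still pending, mirroring the if condition in [2].
void MaybeTrimOverflowBuf(std::string& input_overflow_buf, bool partial_message_pending) {
  if (!partial_message_pending && input_overflow_buf.capacity() > kReadBufferSize) {
    std::string().swap(input_overflow_buf);  // drop the oversized allocation
  }
}
```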
Comment 9 • 9 years ago (Assignee)
(In reply to Bill McCloskey (:billm) from comment #6)
> (In reply to Eric Rahm [:erahm] from comment #3)
> > Several issues here:
> >
> > #1) kMaximumMessageSize is too large [1]
>
> It is sort of big, but I'm not convinced we should make it smaller. Any
> reason?
OOMs; as mccr8 noted, it's smaller upstream, but if we're going to segment things maybe it doesn't matter as much.
> > #4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly
> > broken in the more than one message in a buffer case [4]. We steal the
> > buffer and assume the message was at the front. But that assumption is
> > incorrect if there happened to have been a smaller message first. This would
> > result in a malformed message as well as the next message in the buffer
> > pointing to junk.
>
> I don't think that can happen. See the comment:
> https://dxr.mozilla.org/mozilla-central/rev/
> 86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/
> ipc_channel_win.cc#393
Ah right, a completed large message couldn't come after another message due to the size being greater than the read buffer. Looks like the posix version has a release assert for that.
> > #5) The nesting and density of this code probably led to #4
>
> This code is going to change significantly in bug 1262671, so please hold
> off on any big cleanups for now. In general I agree the code is overly
> complicated, though. Especially the duplication between the posix and
> windows implementations.
Sure I can revert that (I just added some newlines and // end if/while/for (foo) comments).
(In reply to Bill McCloskey (:billm) from comment #8)
> (In reply to Eric Rahm [:erahm] from comment #7)
> > #7) If we had a partial message on the previous iteration for some reason we
> > grow the overflow buffer by kReadBufferSize (4K) [1], but then we just
> > truncate it further down [2].
>
> In [1], we're sizing the buffer to the full size of the message plus some
> slop in case there are any other messages read in at the same time. That
> happens the first time we start reading a big message. We shouldn't resize
> the buffer down again until that message is processed. That's why the if
> condition is there in [2].
Yeah the flow here is a bit weird:
#1) We append the message data just received
#2) We reserve enough memory for the full message (we might already have a full message) + 4K
#3) If we get a full message, we process it, truncate the buffer
It seems like instead we'd want to reverse 1) and 2) if we have enough data to get the length. And for 2) I guess what we really want is reserve(round_up(len, 4k)).
It also seems silly to shrink overflow_buf to anything less than kReadBufferSize. Well, unless there are a bunch of them, but there's just one, right?
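A tiny sketch of the reserve(round_up(len, 4k)) idea; RoundUp is a hypothetical helper and the buffer name is a stand-in, not the real member.

```cpp
#include <cstddef>
#include <string>

static const size_t kReadBufferSize = 4 * 1024;

// Round `value` up to the next multiple of `granularity`.
size_t RoundUp(size_t value, size_t granularity) {
  return ((value + granularity - 1) / granularity) * granularity;
}

void ReserveRounded(std::string& input_overflow_buf, size_t full_message_len) {
  // Reserve in 4KB granules instead of always tacking a flat 4KB of slop onto
  // the message length, and never bother shrinking below one read buffer.
  input_overflow_buf.reserve(RoundUp(full_message_len, kReadBufferSize));
}
```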
Anyhow, if you're changing all of this around perhaps it's not worth getting into this further.
Updated 9 years ago
Whiteboard: btpp-active
Comment 10 • 9 years ago (Assignee)

Comment 11 • 9 years ago (Assignee)
Attachment #8747130 - Flags: review?(wmccloskey)
Comment 12 • 9 years ago (Assignee)
Attachment #8747131 - Flags: review?(wmccloskey)
Comment 13 • 9 years ago (Assignee)
This reduces the maximum message size to 128MiB, which matches what is done upstream. The goal is to help reduce OOMs due to overly large messages.
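For illustration only, the kind of one-line change involved; the real constant lives in ipc/chromium/src/chrome/common/ipc_channel.h and the exact declaration form there may differ.

```cpp
#include <cstdint>

// Maximum message payload we will accept; 128MiB matches upstream Chromium.
static const uint32_t kMaximumMessageSize = 128 * 1024 * 1024;
```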
Attachment #8747132 - Flags: review?(wmccloskey)
Comment 14 • 9 years ago (Assignee)
Bill, I understand if you'd rather not land these if they overlap too much with what you're doing; I stuck with the simplest changes. If you'd rather not, maybe you can incorporate them into what you have.
Attachment #8747130 - Flags: review?(wmccloskey) → review+
Comment 15 • 9 years ago (billm)

Comment on attachment 8747131 [details] [diff] [review]
Part 2: Stop sending messages that are too large
Review of attachment 8747131 [details] [diff] [review]:
-----------------------------------------------------------------
Could you instead stick a check in MessageLink that will crash if the message is too big?
http://searchfox.org/mozilla-central/source/ipc/glue/MessageLink.cpp#174
That way, we'll get a stack telling us exactly who sent the message.
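A rough sketch of the kind of check being asked for. MOZ_RELEASE_ASSERT is the real Gecko macro, but msg->size(), the include paths, and the way the limit is reached here are assumptions; the landed patch may look different.

```cpp
#include "chrome/common/ipc_channel.h"  // assumed home of IPC::Channel::kMaximumMessageSize
#include "chrome/common/ipc_message.h"  // IPC::Message
#include "mozilla/Assertions.h"         // MOZ_RELEASE_ASSERT

static void AssertMessageIsSendable(const IPC::Message* msg) {
  // Crash with a useful stack at the *sender*, so crash reports show exactly
  // who tried to send the oversized message.
  MOZ_RELEASE_ASSERT(msg->size() < IPC::Channel::kMaximumMessageSize,
                     "IPC message too large to send");
}
```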
Attachment #8747131 - Flags: review?(wmccloskey)
Attachment #8747132 - Flags: review?(wmccloskey) → review+
Comment 16 • 9 years ago (Assignee)
This adds an assert rather than just failing to send. I'm not sure if we would have asserted on a failed send previously; I think we probably just closed the connection.
Attachment #8748945 - Flags: review?(wmccloskey)
Updated 9 years ago (Assignee)
Attachment #8747131 - Attachment is obsolete: true
Comment 17 • 9 years ago (billm)

Comment on attachment 8748945 [details] [diff] [review]
Part 2: Stop sending messages that are too large
Review of attachment 8748945 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks Eric.
Attachment #8748945 - Flags: review?(wmccloskey) → review+
Comment 18 • 9 years ago (Assignee)

Comment 19 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d579653d73332d5eb11d9b3cab86295e50624f1
Bug 1268616 - Part 1: Check max message size before resizing. r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/5cdbf605f9724757819da6515e098095772e6dd4
Bug 1268616 - Part 2: Stop sending messages that are too large. r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd5c9cfed83db345be921735e61a8bdd63070841
Bug 1268616 - Part 3: Reduce the maxmimum IPC message size. r=billm
Comment 20 • 9 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9d579653d733
https://hg.mozilla.org/mozilla-central/rev/5cdbf605f972
https://hg.mozilla.org/mozilla-central/rev/bd5c9cfed83d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Depends on: 1271102
Updated 9 years ago
tracking-e10s: --- → +
Priority: -- → P1
Comment 21 • 9 years ago
It looks like the above commits have caused a crash regression, see https://bugzilla.mozilla.org/show_bug.cgi?id=1273258.
Updated 9 years ago

Comment 22 • 8 years ago
Bill, should we uplift this to 48?
status-firefox46: --- → wontfix
status-firefox47: --- → wontfix
status-firefox48: --- → affected
Flags: needinfo?(wmccloskey)
Comment 23 • 8 years ago (billm)

I don't think so. It seems to have caused problems for some people. I'm not sure if that's because of the 256->128 change or something else.
Flags: needinfo?(wmccloskey)
Comment 24 • 8 years ago
Hi, I'm not sure if this crash is fixed. Crashes with the signature [@ OOM | large | NS_ABORT_OOM | Buffer::try_realloc] were continuing on nightly after the patch in this bug landed.
Also, the signature wasn't showing up in builds before 47. In early crash data this issue is now at #20 with 0.5% of all crashes.
Keywords: regression
Comment 25 • 8 years ago (Reporter)
This was not intended to fix all crashes with this signature, only those where the OOM allocation size is very large (say, over 1GB). It does seem to have done that, though I only found two signatures on 38 (where this is not fixed), so it is hard to say what the rate of this crash really is.
Comment 26 • 8 years ago
(In reply to [:philipp] from comment #24)
> in early crash data this issue is now on #20 with 0.5% of all crashes.
Oops, comment 24 should have said: in early crash data for 47.0 this signature is ~0.5% of all crashes.