Very large allocations in call to input_overflow_buf_.reserve()

RESOLVED FIXED in Firefox 49

Status

RESOLVED FIXED
Priority: P1
Severity: normal
Reported: 3 years ago
Last modified: 2 years ago

People

(Reporter: mccr8, Assigned: erahm)

Tracking

Keywords: regression
Version: Trunk
Target Milestone: mozilla49
Points: ---

Firefox Tracking Flags

(e10s+, firefox46 unaffected, firefox47 wontfix, firefox48 wontfix, firefox49 fixed)

Details

(Whiteboard: btpp-active, crash signature)

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
I've come across 7 OOM crashes on this line:
  input_overflow_buf_.reserve(length + kReadBufferSize);
...where try_realloc() is attempting to allocate over 1GB.

Here are two examples:

4GB:
https://crash-stats.mozilla.com/report/index/dffd132e-0d84-4dfa-a878-071802160428

2GB:
https://crash-stats.mozilla.com/report/index/4ff040cf-0d63-4863-9d33-400f92160428

I have no idea what is going wrong to cause this.
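For reference, a minimal sketch of the allocation pattern at the crash site, with simplified names (in the real ipc_channel_win.cc code, input_overflow_buf_ is a std::string and kReadBufferSize is 4K):

  #include <cstddef>
  #include <cstdint>
  #include <string>

  const size_t kReadBufferSize = 4 * 1024;  // 4 KiB read chunk, as in the real code

  // `length` comes straight from a message header read off the pipe. With no
  // sanity check, a corrupted header claiming 2-4 GB turns this reserve() into
  // a huge try_realloc() and an OOM abort.
  void GrowOverflowBuffer(std::string& input_overflow_buf, uint32_t length) {
    input_overflow_buf.reserve(length + kReadBufferSize);
  }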
(Reporter)

Updated

3 years ago
Crash Signature: [@ OOM | large | NS_ABORT_OOM | Buffer::try_realloc ]
Comment 1

3 years ago
The length in the message must be corrupted somehow. We probably shouldn't be so trusting of it. I think it would make sense to limit length to 32MB or something. If the message is actually bigger, it just means we'll have to grow the buffer.
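A minimal sketch of the clamp being suggested here; the 32 MB cap constant and helper name are illustrative, not the fix that eventually landed:

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <string>

  const size_t kReadBufferSize = 4 * 1024;
  const size_t kMaxReserveHint = 32 * 1024 * 1024;  // hypothetical 32 MB cap

  void GrowOverflowBuffer(std::string& buf, uint32_t claimed_length) {
    // Don't trust the wire length for the up-front reservation; if the message
    // really is bigger, the buffer simply grows as more data arrives.
    size_t hint = std::min<size_t>(claimed_length, kMaxReserveHint);
    buf.reserve(hint + kReadBufferSize);
  }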
(Reporter)

Comment 2

3 years ago
Eric said he'd look at this.
Assignee: nobody → erahm
(Assignee)

Comment 3

3 years ago
Several issues here:

#1) kMaximumMessageSize is too large [1]
#2) We check the size *after* growing the buffer [2], rather than before [3]
#3) We don't enforce the max message size when *sending*
#4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly broken in the more than one message in a buffer case [4]. We steal the buffer and assume the message was at the front. But that assumption is incorrect if there happened to have been a smaller message first. This would result in a malformed message as well as the next message in the buffer pointing to junk.
#5) The nesting and density of this code probably led to #4

[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel.h#51
[2] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358
[3] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358
[4] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#388-408
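For #2 above, the fix is to validate the claimed size before touching the allocator, which is what eventually landed as Part 1. A sketch with simplified names, using 256 MiB as the pre-fix maximum (inferred from the later mention of the "256->128 change"):

  #include <cstddef>
  #include <cstdint>
  #include <string>

  const size_t kReadBufferSize = 4 * 1024;
  const size_t kMaximumMessageSize = 256 * 1024 * 1024;  // pre-fix value

  // Check the claimed message size *before* growing the buffer, so an
  // oversized or corrupted length never reaches the allocator.
  bool GrowOverflowBuffer(std::string& buf, uint32_t message_length) {
    if (message_length > kMaximumMessageSize) {
      return false;  // caller treats this as a channel error and closes it
    }
    buf.reserve(message_length + kReadBufferSize);
    return true;
  }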
(Assignee)

Comment 4

3 years ago
I guess there's technically another, although I think we always kill the connection if |ProcessIncomingMessages| fails.

#6) We clear the buffer if the message was too big, but we leave the pipe with a potentially half-read message [1]

[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#358-362
Comment 6

3 years ago
(In reply to Eric Rahm [:erahm] from comment #3)
> Several issues here:
> 
> #1) kMaximumMessageSize is too large [1]

It is sort of big, but I'm not convinced we should make it smaller. Any reason?

> #2) We check the size *after* growing the buffer [2], rather than before [3]
> #3) We don't enforce the max message size when *sending*
> #4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly
> broken in the more than one message in a buffer case [4]. We steal the
> buffer and assume the message was at the front. But that assumption is
> incorrect if there happened to have been a smaller message first. This would
> result in a malformed message as well as the next message in the buffer
> pointing to junk.

I don't think that can happen. See the comment:
https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#393

> #5) The nesting and density of this code probably led to #4

This code is going to change significantly in bug 1262671, so please hold off on any big cleanups for now. In general I agree the code is overly complicated, though. Especially the duplication between the posix and windows implementations.
(Assignee)

Comment 7

3 years ago
#7) If we had a partial message on the previous iteration for some reason we grow the overflow buffer by kReadBufferSize (4K) [1], but then we just truncate it further down [2].

[1] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#374
[2] https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/ipc_channel_win.cc#441
Comment 8

3 years ago
(In reply to Eric Rahm [:erahm] from comment #7)
> #7) If we had a partial message on the previous iteration for some reason we
> grow the overflow buffer by kReadBufferSize (4K) [1], but then we just
> truncate it further down [2].

In [1], we're sizing the buffer to the full size of the message plus some slop in case there are any other messages read in at the same time. That happens the first time we start reading a big message. We shouldn't resize the buffer down again until that message is processed. That's why the if condition is there in [2].
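A sketch of that invariant using a hypothetical helper and flag; the real code keys the shrink off the if condition at [2]:

  #include <cstddef>
  #include <string>

  const size_t kReadBufferSize = 4 * 1024;

  // Hypothetical helper mirroring the flow described above: the buffer is
  // sized to the full message (plus slop) when a big message starts, and is
  // only shrunk back once no partial message is pending.
  void MaybeShrinkOverflowBuffer(std::string& overflow_buf,
                                 bool partial_message_pending) {
    if (!partial_message_pending && overflow_buf.capacity() > kReadBufferSize) {
      overflow_buf.shrink_to_fit();  // release the oversized backing store
    }
  }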
(Assignee)

Comment 9

3 years ago
(In reply to Bill McCloskey (:billm) from comment #6)
> (In reply to Eric Rahm [:erahm] from comment #3)
> > Several issues here:
> > 
> > #1) kMaximumMessageSize is too large [1]
> 
> It is sort of big, but I'm not convinced we should make it smaller. Any
> reason?

OOMs; as mccr8 noted, it's smaller upstream. But if we're going to segment things, maybe it doesn't matter as much.

> > #4) The steal-the-buffer-if-the-message-is-big insanity is unsurprisingly
> > broken in the more than one message in a buffer case [4]. We steal the
> > buffer and assume the message was at the front. But that assumption is
> > incorrect if there happened to have been a smaller message first. This would
> > result in a malformed message as well as the next message in the buffer
> > pointing to junk.
> 
> I don't think that can happen. See the comment:
> https://dxr.mozilla.org/mozilla-central/rev/
> 86730d0a82093d705e44f33a34973d28b269f1ea/ipc/chromium/src/chrome/common/
> ipc_channel_win.cc#393

Ah right, a completed large message couldn't come after another message due to the size being greater than the read buffer. Looks like the posix version has a release assert for that.

> > #5) The nesting and density of this code probably led to #4
> 
> This code is going to change significantly in bug 1262671, so please hold
> off on any big cleanups for now. In general I agree the code is overly
> complicated, though. Especially the duplication between the posix and
> windows implementations.

Sure I can revert that (I just added some newlines and // end if/while/for (foo) comments).

(In reply to Bill McCloskey (:billm) from comment #8)
> (In reply to Eric Rahm [:erahm] from comment #7)
> > #7) If we had a partial message on the previous iteration for some reason we
> > grow the overflow buffer by kReadBufferSize (4K) [1], but then we just
> > truncate it further down [2].
> 
> In [1], we're sizing the buffer to the full size of the message plus some
> slop in case there are any other messages read in at the same time. That
> happens the first time we start reading a big message. We shouldn't resize
> the buffer down again until that message is processed. That's why the if
> condition is there in [2].

Yeah the flow here is a bit weird:
#1) We append the message data just received
#2) We reserve enough memory for the full message (we might already have a full message) + 4K
#3) If we get a full message, we process it, truncate the buffer

It seems like instead we'd want to reverse 1) and 2) if we have enough data to get the length. And for 2) I guess what we really want is reserve(round_up(len, 4k)); see the sketch below.

It also seems silly to shrink overflow_buf to anything less than kReadBufferSize. Well, unless there are a bunch of them, but there's just one, right?

Anyhow, if you're changing all of this around perhaps it's not worth getting into this further.
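A sketch of the reserve(round_up(len, 4k)) idea from above; RoundUp is a hypothetical helper and the surrounding flow is simplified:

  #include <algorithm>
  #include <cstddef>
  #include <string>

  const size_t kReadBufferSize = 4 * 1024;

  // Hypothetical helper: round n up to the next multiple of align, which
  // must be a power of two (4 KiB is).
  size_t RoundUp(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
  }

  // Reserve before appending (reversing steps 1 and 2 above), and never
  // reserve less than one read chunk.
  void ReserveForMessage(std::string& overflow_buf, size_t message_length) {
    overflow_buf.reserve(
        RoundUp(std::max(message_length, kReadBufferSize), kReadBufferSize));
  }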
Whiteboard: btpp-active
(Assignee)

Comment 11

3 years ago
Created attachment 8747130 [details] [diff] [review]
Part 1: Check max message size before resizing
Attachment #8747130 - Flags: review?(wmccloskey)
(Assignee)

Comment 12

3 years ago
Created attachment 8747131 [details] [diff] [review]
Part 2: Stop sending messages that are too large
Attachment #8747131 - Flags: review?(wmccloskey)
(Assignee)

Comment 13

3 years ago
Created attachment 8747132 [details] [diff] [review]
Part 3: Reduce the maximum IPC message size

This reduces the maximum message size to 128MiB which matches what is done
upstream. The goal is to help reduce OOMs due to overly large messages.
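The change itself amounts to a one-line constant update in ipc_channel.h; the old 256 MiB value is inferred from the later mention of the "256->128 change":

  // ipc/chromium/src/chrome/common/ipc_channel.h
  // Previously:
  //   static const size_t kMaximumMessageSize = 256 * 1024 * 1024;
  static const size_t kMaximumMessageSize = 128 * 1024 * 1024;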
Attachment #8747132 - Flags: review?(wmccloskey)
(Assignee)

Comment 14

3 years ago
Bill, I understand if you'd rather not land these, given how much they overlap with what you're doing; I stuck with the simplest changes. Otherwise, maybe you can incorporate them into what you have.
Attachment #8747130 - Flags: review?(wmccloskey) → review+
Comment 15

3 years ago
Comment on attachment 8747131 [details] [diff] [review]
Part 2: Stop sending messages that are too large

Review of attachment 8747131 [details] [diff] [review]:
-----------------------------------------------------------------

Could you instead stick a check in MessageLink that will crash if the message is too big?
http://searchfox.org/mozilla-central/source/ipc/glue/MessageLink.cpp#174
That way, we'll get a stack telling us exactly who sent the message.
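A sketch of roughly what that check could look like; MOZ_RELEASE_ASSERT is the real Gecko macro, but the placement, includes, and exact condition here are approximations:

  #include "chrome/common/ipc_channel.h"  // IPC::Channel::kMaximumMessageSize
  #include "chrome/common/ipc_message.h"  // IPC::Message
  #include "mozilla/Assertions.h"         // MOZ_RELEASE_ASSERT

  // Hypothetical shape of the check inside ProcessLink::SendMessage
  // (ipc/glue/MessageLink.cpp): crashing on the *sending* side means the
  // crash report's stack names whoever built the oversized message.
  void CheckMessageSize(const IPC::Message* msg) {
    MOZ_RELEASE_ASSERT(msg->size() <= IPC::Channel::kMaximumMessageSize,
                       "IPC message size is too large");
  }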
Attachment #8747131 - Flags: review?(wmccloskey)
Attachment #8747132 - Flags: review?(wmccloskey) → review+
(Assignee)

Comment 16

3 years ago
Created attachment 8748945 [details] [diff] [review]
Part 2: Stop sending messages that are too large

This adds an assert rather than just failing to send. I'm not sure if we would have just asserted on failing to send previously; I think we probably just closed the connection.
Attachment #8748945 - Flags: review?(wmccloskey)
(Assignee)

Updated

3 years ago
Attachment #8747131 - Attachment is obsolete: true
Comment on attachment 8748945 [details] [diff] [review]
Part 2: Stop sending messages that are too large

Review of attachment 8748945 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Eric.
Attachment #8748945 - Flags: review?(wmccloskey) → review+

Comment 20

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9d579653d733
https://hg.mozilla.org/mozilla-central/rev/5cdbf605f972
https://hg.mozilla.org/mozilla-central/rev/bd5c9cfed83d
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox49: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49

Updated

3 years ago
tracking-e10s: --- → +
Priority: -- → P1

Updated

3 years ago
Blocks: 1273258

Comment 21

3 years ago
It looks like the above commits caused a crash regression; see https://bugzilla.mozilla.org/show_bug.cgi?id=1273258.

Updated

3 years ago
No longer blocks: 1273258
Depends on: 1273258

Updated

3 years ago
Depends on: 1274074

Updated

3 years ago
Depends on: 1274075
Depends on: 1277744

Comment 22

2 years ago
Bill, should we uplift this to 48?
status-firefox46: --- → wontfix
status-firefox47: --- → wontfix
status-firefox48: --- → affected
Flags: needinfo?(wmccloskey)
Comment 23

2 years ago
I don't think so. It seems to have caused problems for some people. I'm not sure if that's because of the 256->128 change or something else.
Flags: needinfo?(wmccloskey)

Comment 24

2 years ago
Hi, I'm not sure this crash is fixed. Crashes with the signature [@ OOM | large | NS_ABORT_OOM | Buffer::try_realloc] were continuing on Nightly after the patch in this bug landed.
Also, the signature wasn't showing up in builds before 47. In early crash data this issue is now at #20 with 0.5% of all crashes.
status-firefox46: wontfix → unaffected
Keywords: regression
(Reporter)

Comment 25

2 years ago
This was not intended to fix all crashes with this signature, only those where the OOM allocation size is very large (say, over 1GB). It does seem to have done that, though I only found two signatures on 38 (where this is not fixed), so it is hard to say what the rate of this crash really is.
status-firefox48: affected → wontfix

Comment 26

2 years ago
(In reply to [:philipp] from comment #24)
> In early crash data this issue is now at #20 with 0.5% of all crashes.
Oops, comment 24 should have said: in early crash data for 47.0, this signature is ~0.5% of all crashes.
Depends on: 1288997