Closed
Bug 1259480
Opened 8 years ago
Closed 8 years ago
[e10s] FatalError crash in PContentChild::Read from PContentChild::SendReadPrefsArray
Categories
(Core :: DOM: Content Processes, defect, P1)
Core
DOM: Content Processes
Tracking
()
People
(Reporter: benjamin, Unassigned)
References
Details
(Whiteboard: btpp-active, e10st?)
Crash Data
Filed from crash-stats based on the 46 experiment. https://crash-stats.mozilla.com/search/?signature=~mozalloc_abort+%7C+NS_DebugBreak+%7C+mozilla%3A%3Aipc%3A%3AFatalError+%7C+mozilla%3A%3Adom%3A%3APContentChild%3A%3ARead&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports This does not appear to be exploitable, but it's a weird crash that shouldn't happen.
Updated•8 years ago
|
Priority: -- → P1
Comment 2•8 years ago
|
||
This didn't show up in the top 50 in the apz 46 experiment.
Comment 3•8 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #2) > This didn't show up in the top 50 in the apz 46 experiment. My mistake, this was 21 in the content process list. It looks like it's startup related, but not all of the signatures initiate from SendReadPrefsArray. The startup crashes all do though. This one might be pretty easy to track down, Kanru do you have anyone who's free to take a look at this? https://docs.google.com/spreadsheets/d/17P0SBGcgrB-fFycCh4sA8KeK719-srjWEPiuvfcr6y8/edit#gid=777801084
Flags: needinfo?(kchen)
Comment 4•8 years ago
|
||
I can't view that spreadsheet, FWIW, even with my MoCo account.
Comment 5•8 years ago
|
||
This is an odd one. We're attempting to deserialize what could be a very large array of data that contains the parent's preferences. child send: http://mxr.mozilla.org/mozilla-central/source/modules/libpref/Preferences.cpp#515 parent side: http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2678 http://mxr.mozilla.org/mozilla-central/source/modules/libpref/Preferences.cpp#394 We're dealing with user prefs here, I wonder if we should have better error checking on the parent side.
Comment 6•8 years ago
|
||
In a fairly clean profile, I see this preferences stuff generate a pretty big IPC message, doubling its way up to 262KB. Maybe for somebody with a ton of preferences it could be even larger, and thus be a source of OOMs for people low on memory? We probably need to wait until Kan-ru's patch in bug 1258312 filters out to a beta experiment. This is one of many deserialization crashes.
See Also: → 1258312
Updated•8 years ago
|
Whiteboard: btpp-active
Comment 9•8 years ago
|
||
Signature is likely to be changed to OOM by bug 1258312 but I think we still need to fix the large preferences over IPC issue. We will need to update the signature once we figure how to collect that for bug 1258312.
Updated•8 years ago
|
Updated•8 years ago
|
Comment 10•8 years ago
|
||
From this report https://crash-stats.mozilla.com/report/index/9f96e936-2ecd-4588-a575-e35af2160418 the build already has the fix for bug 1258312, but we still see it crash in deserializing the prefs array. This doesn't look like an OOM. The error message in FatalError() is "unknown union type", that is, MaybePrefValue is neither of TPrefValue or Tnull_t. It's still unclear how such unexpected union value get passed from parent's check.
Comment 11•8 years ago
|
||
I sampled several dumps and saw the union have values like 206424 or 0, and the prefs array has length values > 2000, which is unlikely because the from the crash reports there aren't many addons installed. It looks like the IPC message was corrupted.
Comment 12•8 years ago
|
||
This isn't quite the same issue, but I found this crash on beta 47. It is an OOM while trying to send a prefs array, and the OOM allocation size is around 220MB: https://crash-stats.mozilla.com/report/index/01246d16-3c71-4bb1-a148-35b992160428
Updated•8 years ago
|
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::PContentChild::Read ] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::PContentChild::Read ] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::PContentChild::FatalError | mozilla::dom::PContentChild::Read ]
Comment 13•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #12) > This isn't quite the same issue, but I found this crash on beta 47. It is an > OOM while trying to send a prefs array, and the OOM allocation size is > around 220MB: > https://crash-stats.mozilla.com/report/index/01246d16-3c71-4bb1-a148- > 35b992160428 Maybe the fix in bug 1258312 moved the crash from the child process to the parent process.
Comment 14•8 years ago
|
||
I have a preliminary patch for checksumming IPC messages (header and payload separately) using crc32c and ran several talos e10s suites locally (damp, svgr, others). Checksum over IPC messages has ~1 % in some tests and indifferent in most tests. Having checksums doesn't tell us where the problem is, but allows for early detection of errors (if there are really message corruptions). In addition, we may also need to log message type of the previous message received in the channel. Jim, do you think it help to add checksums to detect message corruption?
Flags: needinfo?(jmathies)
Comment 15•8 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #14) > I have a preliminary patch for checksumming IPC messages (header and payload > separately) using crc32c and ran several talos e10s suites locally (damp, > svgr, others). Checksum over IPC messages has ~1 % in some tests and > indifferent in most tests. > > Having checksums doesn't tell us where the problem is, but allows for early > detection of errors (if there are really message corruptions). In addition, > we may also need to log message type of the previous message received in the > channel. > > Jim, do you think it help to add checksums to detect message corruption? I wouldn't anticipate messages corrupting in transit, I can't really think of anything that could cause that. We're just sending data over pipes. Two more likely scenarios would be memory corruption of the message data on the receiving side, or potentially some sort of a flaw in our serializer on the sending side. I don't think we want to land something like this but you could run it through try to see if it hits on something. ni'ing Gabor who's been doing a lot of work in IPC for an opinion. We might want to ask billm too.
Flags: needinfo?(jmathies) → needinfo?(gkrizsanits)
Comment 16•8 years ago
|
||
Given that these are mostly for this one particular message, it is probably worth looking at the serialization code for prefs and see if there are more cases where it can fail somewhere, but still end up sending the message. One idea I had was that we could add a opt-in mechanism to immediately deserialize a message after we create it. This might give us more information. I filed bug 1268900 for that. It might be worth looking into if staring at the serialization code doesn't turn up anything.
Comment 17•8 years ago
|
||
My first bet would be that the pref file was corrupted somehow. The parsing algorithm code looks pretty scary, so I would not be surprised if it generated something crazy for a corrupted file (OR wrote something crazy while trying to save an already broken state). Can we do some sanity check on the parsed prefs? Comment 11 sounds like something we could filter out with some sanity check on the parsed prefs before sending it over.
Flags: needinfo?(gkrizsanits)
Comment 18•8 years ago
|
||
I filed bug 1269365 about turning OOM failures during deserialization of nsTArray into OOM crashes. This isn't consistent with what Cervantes found in comment 10, but it would be nice to at least eliminate that as a possibility.
Comment 19•8 years ago
|
||
mccr8, do we still need this with bug 1269365?
Flags: needinfo?(continuation)
Updated•8 years ago
|
Assignee: cyu → nobody
Comment 20•8 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #19) > mccr8, do we still need this with bug 1269365? I haven't seen any crashes from bug 1269365, so it still isn't clear what these crashes are caused by.
Flags: needinfo?(continuation)
Comment 21•8 years ago
|
||
I tried to implement the specialized version of bug 1268900, only for this bug. It looks like that we made PContentParent::Write() overloads private, and this makes it difficult to dry run message deserialization in ContentParent::RecvReadPrefsArray(). We still need to make this happen in the code generator.
Updated•8 years ago
|
Updated•8 years ago
|
Comment 22•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #17) > My first bet would be that the pref file was corrupted somehow. The parsing > algorithm code looks pretty scary, so I would not be surprised if it > generated something crazy for a corrupted file (OR wrote something crazy > while trying to save an already broken state). Can we do some sanity check > on the parsed prefs? Comment 11 sounds like something we could filter out > with some sanity check on the parsed prefs before sending it over. I've looked into this more thoroughly and I don't think this is the case. I've found a few parts that I can imagine that can go wrong but nothing that could cause what is described in Comment 10. In the serialization we write length/data pairs. To read an invalid pref value data type, the memory should be either totally corrupted, or we should write a length that is shorter or longer than the data that follows it. But even if the pref array is full of invalid data that can never happen based on how IPCMessageUtils is implemented.
Updated•8 years ago
|
Whiteboard: btpp-active → btpp-active, e10st?
Comment 23•8 years ago
|
||
A number of mitigations for this issue have landed, so I'm going to mark this WFM. (There are some crashes with this signature, but I think the signature would look like something else now anyways.)
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Comment 24•8 years ago
|
||
Crash volume for signature 'mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::PContentChild::FatalError | mozilla::dom::PContentChild::Read': - nightly (version 50): 24 crashes from 2016-06-06. - aurora (version 49): 7 crashes from 2016-06-07. - beta (version 48): 6 crashes from 2016-06-06. - release (version 47): 1 crash from 2016-05-31. - esr (version 45): 0 crash from 2016-04-07. Crash volume on the last weeks: Week N-1 Week N-2 Week N-3 Week N-4 Week N-5 Week N-6 Week N-7 - nightly 0 0 0 0 0 1 0 - aurora 0 0 5 0 2 0 0 - beta 1 0 0 5 0 0 0 - release 1 0 0 0 0 0 0 - esr 0 0 0 0 0 0 0 Affected platform: Windows
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
You need to log in
before you can comment on or make changes to this bug.
Description
•