OOMs during deserialization of nsTArray should result in an OOM crash

RESOLVED FIXED in Firefox 47

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mccr8, Assigned: mccr8)

Tracking

Trunk
mozilla49
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox47 fixed, firefox48 fixed, firefox49 fixed)

Details

(Whiteboard: btpp-active)

Attachments

(3 attachments)

ParamTraits<InfallibleTArray<E>>::Read() deserializes to a fallible array, then converts that to an infallible array. If the deserialization of the fallible array has an allocation failure, then the failure is treated as a deserialization failure, and we crash in IPC code. We should instead treat this as an OOM crash, to make it clear what is going wrong.

This may be causing the mysterious deserialization failures in bug 1259480, given there is other evidence that the preferences can become extremely large (bug 1268662, though I'd guess that is more the result of data than number of prefs, but who knows).
So just switch deserialization to use infallible arrays always?
Oddly, we seem to IPDL codegen the serialization of nsTArray here. See PContentChild::Read(nsTArray<PrefSetting>* v__, [...]). So this won't actually use ParamTraits<InfallibleTArray<E>>::Read(). (Spot checking, this does not seem to apply to things like uint8_t[], which is fortunate.) I guess as a quick fix I'll look into making the codegen use infallible allocation. Longer term, I should figure out why we codegen this, and if it can be eliminated.
We codegen this as a hack, because we don't actually codegen ParamTraits implementations for all the things we need to serialize.  What we do instead is define overloaded Read()/Write() implementations in each of the classes that need to serialize something, and then a catch-all templated Read()/Write() implementation that forwards to our normal ParamTraits specializations.

Unwinding this would require some thought and likely some rethinking of how the IPDL compiler itself works.
This should not change the behavior, because fallible_t is passed to the array methods.

Rename from InfallibleTArray to nsTArray, because that is more usual.

Remove unnecessary spaces from some template definitions.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=697e1d2c59ab
Attachment #8747899 - Flags: review?(nfroyd)
This will turn allocation failures during deserialization from IPC FatalError crashes into OOM crashes.
Attachment #8747900 - Flags: review?(nfroyd)
As with part 2, this will turn some allocation failures during deserialization from IPC FatalError crashes into OOM crashes.
Attachment #8747901 - Flags: review?(nfroyd)
Attachment #8747899 - Flags: review?(nfroyd) → review+
Comment on attachment 8747900 [details] [diff] [review]
part 2 - Make ParamTraits<nsTArray<E>>::Read use infallible allocation.

Review of attachment 8747900 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the comment described below.

::: ipc/glue/IPCMessageUtils.h
@@ +491,5 @@
>        if (!aMsg->ReadBytes(aIter, &outdata, pickledLength)) {
>          return false;
>        }
>  
> +      E* elements = aResult->AppendElements(length);

I think we should have comments somewhere in or above this method that describe why we want to use infallible allocations here, rather than failing to deserialize and try to handle that failure somewhere else.
Attachment #8747900 - Flags: review?(nfroyd) → review+
Attachment #8747901 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #7)
> I think we should have comments somewhere in or above this method that
> describe why we want to use infallible allocations here, rather than failing
> to deserialize and try to handle that failure somewhere else.

Good point. I added a comment to the Read() method.
tracking-e10s: --- → ?
Whiteboard: btpp-active
FWIW the intention here was to make sure that a rogue child process can't cause the parent to crash by sending a huge message. I guess the error handling in the parent process wasn't sufficient on its own, but shouldn't that just be improved rather than turning this into an OOM abort?
Blocks: e10s-oom
Priority: -- → P1
(In reply to Ben Turner (not reading bugmail, use the needinfo flag!) from comment #10)
> FWIW the intention here was to make sure that a rogue child process can't
> cause the parent to crash by sending a huge message. I guess the error
> handling in the parent process wasn't sufficient on its own, but shouldn't
> that just be improved rather than turning this into an OOM abort?

My patch is only changing how we crash, not if we crash, because a FatalError() in the parent process will already kill the parent process. That was changed because the prior behavior wasn't getting proper crash reports, or something like that. But yeah, in the long run we want to be able to back this patch out.
This looks like something we might want to uplift to 47 and 48?
Flags: needinfo?(continuation)
Comment on attachment 8747899 [details] [diff] [review]
part 1 - Swap fallible and infallible TArray ParamTraits.

I was waiting to see if it happens on Nightly, but it doesn't really seem to, so I guess we should uplift it in case it helps.

Approval Request Comment
[Feature/regressing bug #]: e10s
[User impact if declined]: This will make certain kinds of e10s OOM crash reports easier to understand.
[Describe test coverage new/current, TreeHerder]: this code runs frequently
[Risks and why]: low. It should mostly turn one kind of crash into another.
[String/UUID change made/needed]: none
Flags: needinfo?(continuation)
Attachment #8747899 - Flags: approval-mozilla-beta?
Attachment #8747899 - Flags: approval-mozilla-aurora?
Attachment #8747900 - Flags: approval-mozilla-beta?
Attachment #8747900 - Flags: approval-mozilla-aurora?
Attachment #8747901 - Flags: approval-mozilla-beta?
Attachment #8747901 - Flags: approval-mozilla-aurora?
Comment on attachment 8747899 [details] [diff] [review]
part 1 - Swap fallible and infallible TArray ParamTraits.

Improvements to better diagnose e10s OOM crashes, Aurora48+, Beta47+
Attachment #8747899 - Flags: approval-mozilla-beta?
Attachment #8747899 - Flags: approval-mozilla-beta+
Attachment #8747899 - Flags: approval-mozilla-aurora?
Attachment #8747899 - Flags: approval-mozilla-aurora+
Attachment #8747900 - Flags: approval-mozilla-beta?
Attachment #8747900 - Flags: approval-mozilla-beta+
Attachment #8747900 - Flags: approval-mozilla-aurora?
Attachment #8747900 - Flags: approval-mozilla-aurora+
Attachment #8747901 - Flags: approval-mozilla-beta?
Attachment #8747901 - Flags: approval-mozilla-beta+
Attachment #8747901 - Flags: approval-mozilla-aurora?
Attachment #8747901 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.