Open Bug 808080 Opened 7 years ago Updated 6 years ago

IPC parent aborts due to too large allocation for infallible array in PContent::Msg_AsyncMessage

Categories

(Core :: IPC, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

People

(Reporter: posidron, Assigned: cyu)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Attachments

(3 files)

Attached file callstack
This crash occurred with an optimized/non-debug build of B2G. I was intentionally using a non debug build to pass the DCHECK() functions in pickle.

Additionally we see the following warnings:

[Parent 2173] WARNING: waitpid failed pid:2356 errno:10: file /Users/cdiehl/Code/Mozilla/b2g/gecko/ipc/chromium/src/base/process_util_posix.cc, line 260
[Parent 2173] WARNING: Failed to deliver SIGKILL to 2356!(3).: file /Users/cdiehl/Code/Mozilla/b2g/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc, line 118
[Parent 2173] WARNING: pipe error (99): Connection reset by peer: file /Users/cdiehl/Code/Mozilla/b2g/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 425


---

An interception occurred only then if XRE_GetProcessType() equaled GeckoProcessType_Content.

Let me know if you need further information.
Is there a test case for this?
Looks like this crash is the result of running the Faulty IPC fuzzer. We can take a look at the crash stack and see if we can make a test case for it. I'll take a look at this.
Assignee: nobody → cyu
Reproduced in gdb by manipulating the InfallibleTArray length read in Pickle::ReadUInt32(). This is a release build:

207       UpdateIter(iter, sizeof(*result));
(gdb) p *result
$4 = 0
(gdb) p *result = 4000000000
$5 = -294967296
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x41dec13a in mozalloc_abort (msg=<value optimized out>) at /home/cervantes/hg/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30
30          MOZ_CRASH();
(gdb) bt
#0  0x41dec13a in mozalloc_abort (msg=<value optimized out>) at /home/cervantes/hg/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30
#1  0x414c229e in nsTArrayInfallibleAllocator::SizeTooBig (this=0xbe80451c, newLen=3196074784) at ../../dist/include/nsTArray.h:213
#2  nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyElements<mozilla::jsipc::CpowEntry> >::EnsureCapacity (this=0xbe80451c, newLen=3196074784) at ../../dist/include/nsTArray-inl.h:112
#3  nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyElements<mozilla::jsipc::CpowEntry> >::InsertSlotsAt (this=0xbe80451c, newLen=3196074784) at ../../dist/include/nsTArray-inl.h:258
#4  nsTArray_Impl<mozilla::jsipc::CpowEntry, nsTArrayInfallibleAllocator>::InsertElementsAt (this=0xbe80451c, newLen=3196074784) at ../../dist/include/nsTArray.h:1418
#5  nsTArray_Impl<mozilla::jsipc::CpowEntry, nsTArrayInfallibleAllocator>::SetLength (this=0xbe80451c, newLen=3196074784) at ../../dist/include/nsTArray.h:1378
#6  0x414d76ea in mozilla::dom::PContentParent::Read (this=0x46a47c00, __v=0xbe80451c, __msg=0xbe80464c, __iter=0xbe8045c4) at /home/cervantes/git/unagi/B2G/objdir-gecko/ipc/ipdl/PContentParent.cpp:6565
#7  0x414da03c in mozilla::dom::PContentParent::OnMessageReceived (this=0x46a47c00, __msg=...) at /home/cervantes/git/unagi/B2G/objdir-gecko/ipc/ipdl/PContentParent.cpp:3195
#8  0x414a5676 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0x46a47c30, aMsg=...) at /home/cervantes/hg/mozilla-central/ipc/glue/MessageChannel.cpp:970
In the parent we should use fallible allocation for any array that we construct based on an IPC message. If the allocation fails then we should probably kill the child...
Agree. I am trying make the codegen to produce code like this:

auto PContentParent::Read(
        InfallibleTArray<InputStreamParams>* __v,
        const Message* __msg,
        void** __iter) -> bool
{
    FallibleTArray<InputStreamParams> a;
    uint32_t length;
    if ((!(Read((&(length)), __msg, __iter)))) {
        FatalError("Error deserializing 'length' (uint32_t) of 'InputStreamParams[]'");
        return false;
    }

    if (!(__v)->SetLength(length)) {
      FatalError("Error setting length of 'InputStreamParams[]'");
      return false;
    }

    for (uint32_t i = 0; (i) < (length); (++(i))) {
        if ((!(Read((&(a[i])), __msg, __iter)))) {
            FatalError("Error deserializing 'InputStreamParams[i]'");
            return false;
        }
    }

    (__v)->SwapElements(a);
    return true;
}
You should use SetCapacity instead of SetLength!
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #6)
> You should use SetCapacity instead of SetLength!

For not doing unless default initialization and later get assigned? SetCapacity doesn't sound right because it only allocates spaces for holding the elements, but the elements don't exist semantically. Accessing them triggers assertion failure in debug builds.

I have the codegen produce the code. The code should be equivalent except for the extra allocation failure check, but b2g doesn't start correctly. I am trying to figure out what goes wrong.
Yeah, you're right. I thought that our serialization code constructed elements one by one but I'm wrong!
I narrowed down the problem to PLayerTransactionParent.cpp and PLayerTransactionChild.cpp where "(__v)->SwapElements(a);" doesn't work. The main thread of the chrome process is blocked in an infinite loop.

If I change replace it with "(__v)->ReplaceElementsAt(0, (__v)->Length(), (a).Elements(), length)" only in these 2 files the problem disappears.

The pattern of allocating a FallibleTArray, setting its length, and then Read() into each element doesn't look incorrect as in ipc/glue/IPCMessageUtils.h. I am wondering if LayerTransaction{Parent|Child} perform some tricks with the array elements and make this pattern fail.
Comment on attachment 826583 [details] [diff] [review]
Handle allocation error in reading array parameters in protocol actors.

Review of attachment 826583 [details] [diff] [review]:
-----------------------------------------------------------------

Hm, the EditReply stuff seems unrelated to this fix, right? I wonder if we can't move those smarts into nsTArray actually.

Also, can you attach a copy of the generated code?
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #12)
> Comment on attachment 826583 [details] [diff] [review]
> Handle allocation error in reading array parameters in protocol actors.
> 
> Review of attachment 826583 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Hm, the EditReply stuff seems unrelated to this fix, right? I wonder if we
> can't move those smarts into nsTArray actually.
> 
Actually it is related. Without the EditReply trait if we perform SwapElements with an AutoInfallibleTArray then we will get an incorrect result because it's unsafe to memmove or memcpy EditReply. We will enter an infinite loop later when we try to access mozilla::layers::OpContentBufferSwap::frontUpdatedRegion_. nsRegion maintains a circular linked list using the address of nsRegion::mRectListHead, and this makes EditReply unsafe to memcpy.
> Also, can you attach a copy of the generated code?
Generated code: PLayerTransactionChild.cpp
The reason I don't place the trait in nsTArray is that EditReply is IPC-specific code. Placing it in nsTArray makes it visible globally. Placing it in IPCMessageUtils.h make specific to IPC.
Attachment #826622 - Attachment mime type: text/x-c++src → text/plain
Comment on attachment 826583 [details] [diff] [review]
Handle allocation error in reading array parameters in protocol actors.

Review of attachment 826583 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/IPCMessageUtils.h
@@ +90,5 @@
>  
> +// mozilla::layers::EditReply needs to be copied with constructors because
> +// it may contain nsRegion, which is not memmove-safe.
> +template<>
> +struct nsTArray_CopyChooser<mozilla::layers::EditReply> {

Yeah, I really think this is the wrong place for this... It seems like what we want is for IPDL to generate this specialization automatically for all structs/unions *if* their types are non memmove-safe.

I'm pretty sure that this can be done with a combination of codegen and template magic like so:

  template<>
  struct nsTArray_CopyChooser<UnionType> :
    mozilla::Conditional<mozilla::IsSame<nsTArray_CopyChooser<Type1>::Type,
                                         nsTArray_CopyWithMemutils>::value &&
                         mozilla::IsSame<nsTArray_CopyChooser<Type2>::Type,
                                         nsTArray_CopyWithMemutils>::value,
                         nsTArray_CopyWithMemutils,
                         nsTArray_CopyWithConstructors<UnionType>>
  { };


Can you give that a try? I'm reluctant to start special-casing things like this in IPCMessageUtils.

::: ipc/ipdl/ipdl/lower.py
@@ +4357,4 @@
>          read.addstmts([
> +            StmtDecl(Decl(Type('FallibleTArray',
> +                               T=_cxxBareType(eltipdltype, self.side)),
> +                          avar.name)),

Let's move this after we read the length.
Attachment #826583 - Flags: review?(bent.mozilla) → review-
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #16)
> Comment on attachment 826583 [details] [diff] [review]
> Handle allocation error in reading array parameters in protocol actors.
> 
> Review of attachment 826583 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/IPCMessageUtils.h
> @@ +90,5 @@
> >  
> > +// mozilla::layers::EditReply needs to be copied with constructors because
> > +// it may contain nsRegion, which is not memmove-safe.
> > +template<>
> > +struct nsTArray_CopyChooser<mozilla::layers::EditReply> {
> 
> Yeah, I really think this is the wrong place for this... It seems like what
> we want is for IPDL to generate this specialization automatically for all
> structs/unions *if* their types are non memmove-safe.
> 
> I'm pretty sure that this can be done with a combination of codegen and
> template magic like so:
> 
>   template<>
>   struct nsTArray_CopyChooser<UnionType> :
>     mozilla::Conditional<mozilla::IsSame<nsTArray_CopyChooser<Type1>::Type,
>                                          nsTArray_CopyWithMemutils>::value &&
>                          mozilla::IsSame<nsTArray_CopyChooser<Type2>::Type,
>                                          nsTArray_CopyWithMemutils>::value,
>                          nsTArray_CopyWithMemutils,
>                          nsTArray_CopyWithConstructors<UnionType>>
>   { };
> 
> 
So we are going to generate things like this:

template<>
struct nsTArray_CopyChooser<EditReply> :
  mozilla::Conditional<mozilla::IsSame<nsTArray_CopyChooser<OpContentBufferSwap>::Type,
                                       nsTArray_CopyWithMemutils>::value &&
                       mozilla::IsSame<nsTArray_CopyChooser<OpTextureSwap>::Type,
                                       nsTArray_CopyWithMemutils>::value &&
                       mozilla::IsSame<nsTArray_CopyChooser<ReplyTextureRemoved>::Type,
                                       nsTArray_CopyWithMemutils>::value,
                       nsTArray_CopyWithMemutils,
                       nsTArray_CopyWithConstructors<UnionType>>
{ };

To know whether EditReply is memcpy-able the compiler has to instantiate
template<>
struct nsTArray_CopyChooser<OpContentBufferSwap>

Then we need to either descend down to the data members of OpContentBufferSwap, or stop here by providing a specialization outside of the code-gen'd file.
If we choose to descend to the member types, we still need to provide some specialization for the basic types. And this could slow down the compilation of IPC code.
And with the landing of bug 845874, we no longer need to provide a specialization to copy EditReply with constructors. Do we still need to implement the above in the code gen?
Yes, I was thinking that we should make IPDL generate these structs for all union/struct types. Any members that are not union/struct types will need to rely on hand-written specializations that should hopefully exist in the relevant modules (e.g. gfx for nsRegion). Any members that don't have custom specializations will just use the default memcpy automatically.

The compiler will only need to look at these templates if we end up using them in an array, so it shouldn't slow down most of our compile I don't think.

And if we don't do this now we're just going to have bugs later on. Seems better to fix while we understand the problem.
Generating the specialization is not too difficult. There is one problem with
> mozilla::IsSame<nsTArray_CopyChooser<StructMemberType1>::Type, ...
where StructMemberType must be qualified, but in ipdlh it can appear as not qualified.
For example, in PDNSRequestParams.ipdlh:
> using NetAddrArray from "mozilla/net/PDNSParams.h";
and NetAddrArray is:
> namespace mozilla {
> namespace net {
> 
> // Need to define typedef in .h file--can't seem to in ipdl.h file?
> typedef nsTArray<NetAddr> NetAddrArray;
> 
> } // namespace net
> } // namespace mozilla
We should use mozilla::net::NetAddrArray as StructMemberType1, where we can't easily know that NetAddrArray belongs in namespace mozilla::net without parsing the .h file referenced in the ipdlh file.

I guess what we can do at best is try to check if the using statement in ipdlh refers to a qualified name (checking the existence of "::" in the name). If not then typedef it again like:

> //-----------------------------------------------------------------------------
> // Declaration of the IPDL type |struct DNSRecord|
> //
> namespace mozilla {
> namespace net {
> class DNSRecord MOZ_FINAL
> {
> public:
>   typedef NetAddrArray NetAddrArray_t;
So we can safely use mozilla::IsSame<nsTArray_CopyChooser<mozilla::net::DNSRecord::NetAddrArray_t>::Type.
You need to log in before you can comment on or make changes to this bug.