Closed Bug 912725 Opened 7 years ago Closed 7 years ago

Segfault in GrallocBufferActor::ActorDestroy when closing CubeVid from b2g window manager

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla26
blocking-b2g koi+
Tracking Status
firefox25 --- unaffected
firefox26 --- fixed
firefox-esr17 --- unaffected
b2g18 --- unaffected

People

(Reporter: jld, Assigned: bjacob)

References

(Depends on 1 open bug)

Details

(Keywords: csectype-uaf, sec-critical)

Attachments

(1 file)

STR: start CubeVid, switch to the window manager (long press of home button), swipe card up to close the app.  The b2g main process then crashes.

(gdb) bt
#0  0x417021f8 in mozilla::layers::GrallocBufferActor::ActorDestroy (this=<value optimized out>)
    at /home/jld/src/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:273
#1  0x4143cbcc in mozilla::dom::bluetooth::PBluetoothRequestChild::DestroySubtree (
    this=0x45a3198c, 
    why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown)
    at /home/jld/src/B2G/objdir-gecko/ipc/ipdl/PBluetoothRequestChild.cpp:330
#2  0x414ac354 in mozilla::layers::PLayerTransactionParent::DestroySubtree (this=0x45ba7ec0, 
    why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown)
    at /home/jld/src/B2G/objdir-gecko/ipc/ipdl/PLayerTransactionParent.cpp:865
#3  0x414a03ce in mozilla::layers::PCompositorParent::DestroySubtree (this=0x47626800, 
    why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown)
    at /home/jld/src/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:855
#4  0x414a03fa in mozilla::layers::PCompositorParent::OnChannelError (this=0x47626800)
    at /home/jld/src/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:732
#5  0x4142df62 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x4762680c)
    at /home/jld/src/B2G/gecko/ipc/glue/AsyncChannel.cpp:582
#6  0x4142e21a in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x4762680c)
    at /home/jld/src/B2G/gecko/ipc/glue/AsyncChannel.cpp:547
#7  0x4109ddb6 in DispatchToMethod<WebCore::ReverbConvolver, void (WebCore::ReverbConvolver::*)()>
    (this=<value optimized out>) at /home/jld/src/B2G/gecko/ipc/chromium/src/base/tuple.h:383
#8  RunnableMethod<WebCore::ReverbConvolver, void (WebCore::ReverbConvolver::*)(), Tuple0>::Run (
    this=<value optimized out>) at /home/jld/src/B2G/gecko/ipc/chromium/src/base/task.h:307
[...]

(gdb) l
269     // used only for hacky fix in gecko 23 for bug 862324
270     void GrallocBufferActor::ActorDestroy(ActorDestroyReason)
271     {
272       if (mDeprecatedTextureHost) {
273         mDeprecatedTextureHost->ForgetBuffer();
274       }
275     }

(gdb) disas
Dump of assembler code for function _ZN7mozilla6layers18GrallocBufferActor12ActorDestroyENS_3ipc16IProtocolManagerINS2_10RPCChannel11RPCListenerEE18ActorDestroyReasonE:
   0x417021f0 <+0>:     push    {r4, lr}
   0x417021f2 <+2>:     ldr     r0, [r0, #64]   ; 0x40
   0x417021f4 <+4>:     cbz     r0, 0x417021fc <_ZN7mozilla6layers18GrallocBufferActor12ActorDestroyENS_3ipc16IProtocolManagerINS2_10RPCChannel11RPCListenerEE18ActorDestroyReasonE+12>
   0x417021f6 <+6>:     ldr     r3, [r0, #0]
=> 0x417021f8 <+8>:     ldr     r3, [r3, #84]   ; 0x54
   0x417021fa <+10>:    blx     r3
   0x417021fc <+12>:    pop     {r4, pc}
End of assembler dump.

Given that r0 seems to be mDeprecatedTextureHost and r3 its vtable pointer:

(gdb) p/x $r0
$9 = 0x440b7c10
(gdb) p/x $r3
$10 = 0x1
(gdb) p/x sizeof(*mDeprecatedTextureHost)
$11 = 0x18
(gdb) x/6xw $r0
0x440b7c10:     0x00000001      0x00000001      0x00000000      0x3e800000
0x440b7c20:     0x3dcccccd      0x3e800000

That looks bad.
I'd bet it's a double free.
Blocks: GFXB2G1.2
It gets worse.  I was able to use a conditional breakpoint to stop before the first `ldr` and thus recover the value of `this`, but this time the corruption is different:

(gdb) x/w $r0 + 64
0x438f5370:     0x4369df40
(gdb) x/w 0x4369df40
0x4369df40:     0x00680063

ASCII letters are immediately suspect....

(gdb) x/12xw 0x4369df40
0x4369df40:     0x00680063      0x006c0069      0x002d0064      0x00720070
0x4369df50:     0x0063006f      0x00730065      0x002d0073      0x00680073
0x4369df60:     0x00740075      0x006f0064      0x006e0077      0x00000000

(gdb) p *(char[2][24]*) 0x4369df40
$2 = {"c\000h\000i\000l\000d\000-\000p\000r\000o\000c\000e\000s", 
  "s\000-\000s\000h\000u\000t\000d\000o\000w\000n\000\000\000\000"}

So, "child-process-shutdown" as a JavaScript string.  And it overwrote the DeprecatedTextureHost.

I notice that mDeprecatedTextureHost is a plain pointer to DeprecatedTextureHost, not a RefPtr or anything.  Could this be a use-after-free?
Looks like a similar problem to what happened in bug 900012.

I am still uncertain about why we need to call ForgetBuffer() on the TextureHost there. I think the android::GraphicBuffer could outlive the GrallocBufferActor without blowing up.
Assignee: nobody → nical.bugzilla
Assignee: nical.bugzilla → bjacob
...working on it, have big gdb commands files and logs, need more debugging...
Depends on: 879681
So. GDB is nearly unusable with the B2G emulator (seems to work better with devices). Went back to printf debugging.

Here's the gist of it:

# part 1

ENTER DeprecatedImageHostSingle 0x45ecc840 Update(SurfaceDescriptor=0x47049514)
  ENTER GrallocDeprecatedTextureHostOGL 0x45efa340 SwapTexturesImpl (SurfaceDescriptor=0x47049514) with mBuffer=0x0
    ENTER RegisterDeprecatedTextureHostAtGrallocBufferActor(TextureHost=0x45efa340, SurfaceDescriptor=0x47049514)
      Valid descriptor, GrallocBufferActor=0x439ee7e0
      ENTER GrallocBufferActor 0x439ee7e0 SetDeprecatedTextureHost(TextureHost=0x45efa340)
        GrallocBufferActor's mDeprecatedTextureHost pointer at 0x439ee820 changed from 0x45fd20c0 to 0x45efa340
      LEAVE GrallocBufferActor 0x439ee7e0 SetDeprecatedTextureHost(TextureHost=0x45efa340)
    LEAVE RegisterDeprecatedTextureHostAtGrallocBufferActor(TextureHost=0x45efa340, SurfaceDescriptor=0x47049514)
  LEAVE GrallocDeprecatedTextureHostOGL 0x45efa340 SwapTexturesImpl (SurfaceDescriptor=0x47049514) with mBuffer=0x0
LEAVE DeprecatedImageHostSingle 0x45ecc840 Update(SurfaceDescriptor=0x47049514)

(... snip ...)

# part 2

ENTER GrallocDeprecatedTextureHostOGL 0x45efa340 destructor
LEAVE GrallocDeprecatedTextureHostOGL 0x45efa340 destructor

(... snip ...)

# part 3

ENTER GrallocBufferActor 0x439ee7e0 ActorDestroy, mDeprecatedTextureHost=0x45efa340
  *crash*

So there you go:

 - GrallocBufferActor::ActorDestroy crashes because mDeprecatedTextureHost is dangling (see part 3);
 - the GrallocBufferActor was not notified when that TextureHost went away (part 2) because the mBuffer pointer was null;
 - so the question is how is it possible that the GrallocBufferActor knew about that TextureHost without the TextureHost knowing about the GrallocBufferActor, and the answer is above (part 1): this is a DeprecatedImageHostSingle, which calls GrallocDeprecatedTextureHostOGL::SwapTexturesImpl, which indeed has TWO calls to RegisterDeprecatedTextureHostAtGrallocBufferActor, only the second one is hit here (because mBuffer is null) and it's not matched by a corresponding setting of the TextureHost's mBuffer, so we end up in this weird situation where our TextureHost doesn't remember about the GrallocBufferActor that it's registered at.

I don't remember what was the thinking here in SwapTexturesImpl, behind not updating mBuffer.

In fact, shouldn't we replace all the non-null calls to RegisterDeprecatedTextureHostAtGrallocBufferActor by self-calls to SetBuffer, which takes care to call RegisterDeprecatedTextureHostAtGrallocBufferActor as well as updating mBuffer?
Another question, was mBuffer really the right concept for "the grallocbufferactor that we need to notify when a TextureHost goes away" ?

I'm blocked by not really understanding what SwapTexturesImpl is doing.
(In reply to Benoit Jacob [:bjacob] from comment #6)
> Another question, was mBuffer really the right concept for "the
> grallocbufferactor that we need to notify when a TextureHost goes away" ?
> 
> I'm blocked by not really understanding what SwapTexturesImpl is doing.

Did you forget to call SetBuffer with DeprecatedImageHostSingle case?
https://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ImageHost.cpp#228
But we did SetBuffer calls for DeprecatedImageHostBuffer.
https://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ImageHost.cpp#370
(In reply to Benoit Jacob [:bjacob] from comment #6)
> Another question, was mBuffer really the right concept for "the
> grallocbufferactor that we need to notify when a TextureHost goes away" ?
> 
> I'm blocked by not really understanding what SwapTexturesImpl is doing.

mBuffer was originally meant to be the back buffer of double buffered texture hosts. So if something was not to be double buffered it would not hold anything in mBuffer.

Then things went banana with gralloc, quick'n hacky fixes came in and I am not sure that it still means that anymore.
(In reply to peter chang[:pchang] from comment #7)
> (In reply to Benoit Jacob [:bjacob] from comment #6)
> > Another question, was mBuffer really the right concept for "the
> > grallocbufferactor that we need to notify when a TextureHost goes away" ?
> > 
> > I'm blocked by not really understanding what SwapTexturesImpl is doing.
> 
> Did you forget to call SetBuffer with DeprecatedImageHostSingle case?
> https://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> ImageHost.cpp#228
> But we did SetBuffer calls for DeprecatedImageHostBuffer.
> https://mxr.mozilla.org/mozilla-central/source/gfx/layers/composite/
> ImageHost.cpp#370

Oh it doesn't work because it is single buffer use case.
(Hiding this bug, just in case - dveditz can you have a look at this please from a security perpective?)
Group: core-security
Flags: needinfo?(dveditz)
This code has not been put in the hands of users yet and will be fixed before it gets there. I don't think we need to hide the bug.
Duplicate of this bug: 903680
(In reply to Nicolas Silva [:nical] from comment #11)
> This code has not been put in the hands of users yet and will be fixed
> before it gets there. I don't think we need to hide the bug.

Indeed. Un-hiding.
Group: core-security
(In reply to Paul Theriault [:pauljt] from comment #10)
> (Hiding this bug, just in case - dveditz can you have a look at this please
> from a security perpective?)

This is a heap use-after-free, so it would be really bad (possibly sec-critical) from a security perspective if it hit end-users. But since it's bad enough to block releasing, and we are confident that we can fix it in short order, that won't happen, so there's no need to hide it.
Why this is fixes this crash:

The second RegisterDeprecatedTextureHostAtGrallocBufferActor call in SwapTexturesImpl is bogus in the single-buffered case, where this sets a one-way relationship from the GBA to the TextureHost, causing the present crash as explained above. So we want to avoid it in the single-buffered case. This patch just removes it, and re-introduces it (behind the overridden SetBuffer()) in SwapTextures() in the base class, which is only called in the double-buffering case.

Indeed note that SetBuffer is overridden by GrallocDeprecatedTextureHostOGL to call RegisterDeprecatedTextureHostAtGrallocBufferActor.

Why this is actually an improvement:

We really want to have our TextureHost's registered with the right GBA at any time, or else we have this kind of crashes. In SwapTextures, we were overwriting *mBuffer without notifying the underlying GBA about the TextureHost change, relying on SwapTexturesImpl to do so. Since these two things need to be done together, it's better to have them done in the same place. This patch moves the RegisterDeprecatedTextureHostAtGrallocBufferActor to the same place (called by SetBuffer).
Attachment #802558 - Flags: review?(nical.bugzilla)
Note: I checked that this patch does fix this crash for me on B2G emulator.
Comment on attachment 802558 [details] [diff] [review]
Do the registration of the TextureHost with the mBuffer exactly when we overwrite *mBuffer, so that in particular we don't do a bogus registration in the single-buffered case

Review of attachment 802558 [details] [diff] [review]:
-----------------------------------------------------------------

Given how fragile the surrounding code is, make sure you push to try before landing!
Attachment #802558 - Flags: review?(nical.bugzilla) → review+
Blocks: 901559
Tagging koi+ as it blocks 901559 which is koi+
blocking-b2g: --- → koi+
No longer blocks: GFXB2G1.2
https://hg.mozilla.org/mozilla-central/rev/e20f0978e31e
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 916714
This broke <video> on Android.
It turns out the that assertion that this triggered on android was just a faulty assertion, see bug 916714 comment 6.
Flags: needinfo?(dveditz)
Depends on: 923530
You need to log in before you can comment on or make changes to this bug.