crash in mozilla::layers::SharedSurfaceTextureHost::Lock()

RESOLVED FIXED in Firefox 36

Status

()

defect
--
critical
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: jesup, Assigned: nical)

Tracking

({crash, csectype-uaf, sec-critical})

unspecified
mozilla38
All
Android
Points:
---

Firefox Tracking Flags

(firefox36+ fixed, firefox37+ fixed, firefox38+ fixed, firefox-esr31 unaffected, b2g-v1.4 unaffected, b2g-v2.0 unaffected, b2g-v2.0M unaffected, b2g-v2.1 unaffected, b2g-v2.1S unaffected, b2g-v2.2 unaffected, b2g-master unaffected)

Details

(Whiteboard: [adv-main36+] gfx-noted, crash signature, )

Attachments

(4 attachments)

This bug was filed from the Socorro interface and is 
report bp-adf37f72-5201-47ea-ada3-087122141231.
=============================================================
Highly repeatable at http://diskdetective.org
Hit the icon that looks like an open book, scroll around in the help (and other techniques as well).
The addresses from a crashreporter search aren't good....

Milan; please find the right people
This is use after free, right? Kats, can you reproduce this on any of the devices you have?
See Also: → 1111889
I hit it tons (hard to use the help popups on that page or scroll around) on my Nexus 10 (Android 5.0.1)
I ran a local debug build with gdb attached, and using the STR ended up triggering the MOZ_ASSERT at http://mxr.mozilla.org/mozilla-central/source/gfx/gl/SharedSurface.h?rev=ccd27160dd97#106 which caused Fennec to crash. The assert was triggered on the compositor thread, so grabbed the backtraces from both the compositor thread and the Gecko thread, they are attached. I'm not sure if this is enough to debug what's going on but I can get more info if needed since I seem to be able to reliably repro.
Guessing nical is the person to look at this...
Flags: needinfo?(nical.bugzilla)
The SharedSurface is being used after it is freed. Looks like it might be the same issue as bug 1098227 and duplicates btw. Luckily Kats's stack trace has more helpful info than the stacks in the related bugs.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
What seems to be happening is that the SharedSurface object is deleted on the main thread ~ShSurfHandle which happens in CanvasClient::Clear, without any form of synchronization with the TextureClient/Host memory management protocol. Normally TextureClient is supposed to fully own its underlying data (the SharedSurface in our case) but it's not the case for the SharedSurfaceTextureClient/Host.

The right way to fix this would be to have the SharedSurfaceTextureClient own the SharedSurface, but in that case, if the SharedSurface needs to be deleted on the main thread, the only way to ensure that the deallocation happens race-free is to make it synchronous (TextureClient synchronously waits for the host to be deallocated) which is sad and slow), or to resurect the logic we used to have where the data would be deleted along with the TextureChild. If the SharedSurface can be deleted on the compositor thread, things are simpler and we can follow the same deallocation protocol as with most TextureClient types (but I have a feeling it can't).

Otherwise we can hack around it and have a special ForceSynchronousDeallocation() which would only be called when the CanvasClient is cleared if the TextureClient is a SharedSurfaceTextureClient. Paying the cost of Synchronous deallocation is acceptable if it happens only on "rare" occasions like when tearing down the layer (vs every frame).
Sec-crit with understanding of how to resolve it - is there a plan/eta for resolving it?
I think we should patch this by having the ShSurfTexClient hold a ShSurfHandle, and make it destruct synchronously. This would solve the immediate problem, and we can have room to breathe before working on a permanent solution.

We want to get rid of the ShSurfClient/Host pair, but the TexClients we use won't own their own data until we supplant ShSurf with TexClient.
This should prevent the SharedSurface to be deallocated before the TextureHost since the latter is synchronously deallocated before the ShSurfHandle. This patch attempts to only do synchronous deallocation when clearing CanvasClient since the race is unlikely to happen anywhere else (unless the client side can overproduce more than one frame in advance but I think that Matt Woodrow made it impossible, I can't remember in which bug). If this is not enough we can make the destruction of all TextureClients used by CanvasClientSurfaceStream synchronous, but performance will suffer.
Attachment #8553088 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8553088 [details] [diff] [review]
Synchronously deallocate the last TextureClient when destroying CanvasClientSurfaceStream

Review of attachment 8553088 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/TextureHost.h
@@ +551,5 @@
>    PTextureParent* mActor;
>    TextureFlags mFlags;
>    int mCompositableCount;
> +  // whether client is reponsible for deallocating the shared data
> +  bool mClientDeallocation;

Either TextureHost.h shouldn't be part of this patch or TextureHost.cpp should be?
Comment on attachment 8553088 [details] [diff] [review]
Synchronously deallocate the last TextureClient when destroying CanvasClientSurfaceStream

Review of attachment 8553088 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/TextureHost.h
@@ +551,5 @@
>    PTextureParent* mActor;
>    TextureFlags mFlags;
>    int mCompositableCount;
> +  // whether client is reponsible for deallocating the shared data
> +  bool mClientDeallocation;

Oops, this is a leftover from a different attempt at fixing the issue. Please ignore this chunk.
Comment on attachment 8553088 [details] [diff] [review]
Synchronously deallocate the last TextureClient when destroying CanvasClientSurfaceStream

Review of attachment 8553088 [details] [diff] [review]:
-----------------------------------------------------------------

As a short term fix. It seems good.

::: gfx/layers/client/CanvasClient.cpp
@@ +405,5 @@
>  
> +void
> +CanvasClientSharedSurface::ClearSurfaces()
> +{
> +  if (mFrontTex && (mFront || mPrevFront)) {

Is there a case that mPrevFront is related to mFrontTex?
Even when there is not relation ship between mFrontTex and mPrevFront,
synchronous ipc seems to indirectly fix the problem.
Attachment #8553088 - Flags: review?(sotaro.ikeda.g) → review+
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> Is there a case that mPrevFront is related to mFrontTex?
> Even when there is not relation ship between mFrontTex and mPrevFront,
> synchronous ipc seems to indirectly fix the problem.

I was being a bit conservative about triggering the synchronous safe-guard, with the reasoning that ::Clear could happen just after ::update in which case the SharedSurface recently stored in mPrevFront might still be in use on the compositor side. The timing is a bit unlikely, it feels safer to me to just rule out this possibility.
AFAICT, this should have had sec-approval first. Please fill it out retroactively.
https://wiki.mozilla.org/Security/Bug_Approval_Process
Flags: needinfo?(nical.bugzilla)
https://hg.mozilla.org/mozilla-central/rev/57c76122bb27
Status: NEW → RESOLVED
Closed: 5 years ago
status-b2g-v1.4: --- → ?
status-b2g-v2.0: --- → ?
status-b2g-v2.1: --- → ?
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment on attachment 8553088 [details] [diff] [review]
Synchronously deallocate the last TextureClient when destroying CanvasClientSurfaceStream

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

I suppose it would be quite hard. This bug can cause the compositor thread to try to read form an already-destroyed texture and crash (no writes).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No

Which older supported branches are affected by this flaw?

Gecko 34 and later, but something recent must have made the race condition more likely to happen. b2g is not impacted.

If not all supported branches, which bug introduced the flaw?

My guess would be Bug 1066280 but it's hard to be sure.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

We should uplift the fix just to be on the safe side when we have confirmed that it does fix the issue.

How likely is this patch to cause regressions; how much testing does it need?

I think that the patch isn't risky. Let's give it a few days to bake in central to be sure.
Flags: needinfo?(nical.bugzilla)
Attachment #8553088 - Flags: sec-approval?
Comment on attachment 8553088 [details] [diff] [review]
Synchronously deallocate the last TextureClient when destroying CanvasClientSurfaceStream

sec-approval+ given (which is good since it went in anyway). I'll mark the Aurora and Beta patches for uplift. We should do those soon.
Attachment #8553088 - Flags: sec-approval? → sec-approval+
Comment on attachment 8554581 [details] [diff] [review]
Patch upliftable to aurora

[Triage Comment]
Attachment #8554581 - Flags: approval-mozilla-aurora+
Comment on attachment 8554582 [details] [diff] [review]
Patch upliftable to beta

Approving patches for branches for sec-critical issue.
Attachment #8554582 - Flags: approval-mozilla-beta+
Whiteboard: gfx-noted → [adv-main36+] gfx-noted
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.