Closed Bug 1395138 Opened 2 years ago Closed 2 years ago

Crash in mozilla::layers::RenderLayers<T>

Categories

(Core :: Graphics: Layers, defect, P1, critical)

57 Branch
Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 57+ fixed
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 + fixed
firefox58 + fixed

People

(Reporter: jcristau, Assigned: jnicol)

References

Details

(5 keywords, Whiteboard: [gfx-noted][adv-main57+][adv-esr52.5+][post-critsmash-triage])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-68349ab9-b464-49b2-88fc-be1af0170830.
=============================================================

This looks like UAF affecting firefox for android, spiking on August 29.

0 	libxul.so 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite> 	gfx/layers/composite/ContainerLayerComposite.cpp:414
1 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 	gfx/layers/composite/ContainerLayerComposite.cpp:620
2 	libxul.so 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite> 	gfx/layers/composite/ContainerLayerComposite.cpp:456
3 	libxul.so 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite> 	gfx/layers/composite/ContainerLayerComposite.cpp:620
4 	libxul.so 	mozilla::layers::LayerManagerComposite::Render 	gfx/layers/composite/LayerManagerComposite.cpp:955
5 	libxul.so 	mozilla::layers::LayerManagerComposite::UpdateAndRender 	gfx/layers/composite/LayerManagerComposite.cpp:532
6 	libxul.so 	mozilla::layers::LayerManagerComposite::EndTransaction 	gfx/layers/composite/LayerManagerComposite.cpp:462
7 	libxul.so 	mozilla::layers::CompositorBridgeParent::CompositeToTarget 	gfx/layers/ipc/CompositorBridgeParent.cpp:1040
8 	libxul.so 	mozilla::layers::CompositorVsyncScheduler::Composite 	gfx/layers/ipc/CompositorVsyncScheduler.cpp:262
9 	libxul.so 	mozilla::detail::RunnableMethodImpl<SoftwareDisplay*, void (SoftwareDisplay::*)(mozilla::TimeStamp), true, (mozilla::RunnableKind)1u, mozilla::TimeStamp>::Run 	xpcom/threads/nsThreadUtils.h:1142
10 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:452
11 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:460
12 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:535

https://crash-stats.mozilla.com/signature/?address=~e5e5&signature=mozilla%3A%3Alayers%3A%3ARenderLayers<T>
Less frequently there are windows crashes with this signature. Every one is a UAF address.

This is happening in release versions (not beta) so a spike must represent some web content change, not necessarily a "regression".
Keywords: sec-high
Track 56+/57+ as sec-high.
I can't reproduce this, and I'm not sure what's going on.

I wonder if in ContainerPrepare() it matters that the IsBackfaceHidden() check is outside of the AsContainerLayer() check.

Potentially bug 1381753 could be related to the recent spike. Any ideas Matt?
Flags: needinfo?(matt.woodrow)
It's not obvious why that change would affect this.

I'm not actually sure what's been free'd here, but I think it's the Layer itself, which wouldn't have happened anywhere near this code.

I don't see how we could have got through ContainerPrepare without crashing though.
Flags: needinfo?(matt.woodrow)
I'm not entirely sure either, but think it is mPrepared that's freed.

If it was the layer, presumably through a different pointer, this pointer would be dangling and the addresses would be more random. But the addresses are mainly 0x10 or 0xe5e5e5f5 (free'd, + 0x10).

Not that I can see how we get in that state either, because if a layer isn't prepared then it won't be rendered as far as I can tell.
Actually I think it is the layer itself which has been freed. The vtable pointer is poisoned to 0xe5e5e5e5, and the GetLayer function pointer is at an offset of 0x10 in the vtable.

Still not sure how the layer is being freed between the calls to Prepare() and Render().
See Also: → 1397144
Duplicate of this bug: 1398638
See Also: 1397144
Duplicate of this bug: 1397144
Two comments mentioned the window gone small then crash

Some mentioned fullscreen and playing video
Milan, this is apparently a top crash on the Android Nightly build 20170910100151, in addition to being some kind of use-after-free. Do you know who might be able to look into this? Thanks.
Flags: needinfo?(milan)
Do we poison things immediately after we've freed the pointer, or can it happen later on?

If so, then maybe the Layer was already dead earlier, and that's how we survived ContainerPrepare?

Alternatively, maybe we're reusing old ContainerPrepare results somehow.
(In reply to Matt Woodrow (:mattwoodrow) from comment #11)
> Do we poison things immediately after we've freed the pointer, or can it
> happen later on?

It happens inside the free call.
Jamie's been looking at it, let's continue that.
In the meantime, Matt/Markus, we seem to have a five (or so :) long string of regressions caused by fixes to regressions caused by fixes to regressions.  Is there something we can back out here?  Because we can't ship with this kind of a high volume crash.
Assignee: nobody → jnicol
Flags: needinfo?(mstange)
Flags: needinfo?(milan)
Flags: needinfo?(matt.woodrow)
Priority: -- → P1
Whiteboard: [gfx-noted]
The chain seems to be: bug 1349418 -> bug 1361970 -> bug 1373479 -> bug 1381753 -> this bug.
I don't think we have any desperate need for bug 1349418 at the moment, so in theory I think it could be backed out, but I don't know if backing out this many patches doesn't have too much risk associated with it.
Flags: needinfo?(mstange)
I don't see a way bug 1381753 could have caused this.

It looks like this particular crash first showed up in 56.0b7.

The changesets between 56.0b1 and 56.0b7 are:

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=6c489d5df6d4d85ddb297666e8c1cbbda96a852c&tochange=8081a895e435

Unfortunately, there's still nothing that stands out in that.

We can try backing out bug 1381753 to see if it affects the crash rate, but it seems unlikely to me.

If we can't figure out the regression, then changing Layers to use strong references to their children might be the best approach for a fix.
Flags: needinfo?(matt.woodrow)
If we're desperate enough, bug 1397918 is a basic compositor "freeing invalid pointer" bug, and while the stacks are not really the same, perhaps it helps.
(In reply to Matt Woodrow (:mattwoodrow) from comment #15)
> If we can't figure out the regression, then changing Layers to use strong
> references to their children might be the best approach for a fix.

Isn't this already the case? Albeit by manually calling NS_ADDREF/NS_RELEASE on insert/remove. Are you suggesting to make ContainerLayer hold an array of RefPtr<Layer> for its children?
Flags: needinfo?(matt.woodrow)
I actually forgot about that. Having RefPtr<> seems safer, but it does seem like the existing code should work.
Flags: needinfo?(matt.woodrow)
Duplicate of this bug: 1399020
Maybe this is worth a try. I really don't think it should be necessary but I'm out of ideas otherwise.
Attachment #8909269 - Flags: review?(matt.woodrow)
Comment on attachment 8909269 [details] [diff] [review]
0001-Bug-1395138-Hold-reference-to-layers-in-ContainerLay.patch

Review of attachment 8909269 [details] [diff] [review]:
-----------------------------------------------------------------

Worth a try for sure!
Attachment #8909269 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8909269 [details] [diff] [review]
0001-Bug-1395138-Hold-reference-to-layers-in-ContainerLay.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No

Which older supported branches are affected by this flaw?

It looks like all of them, though it seems to have gotten especially bad on 57 on Android

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Patch should apply directly

How likely is this patch to cause regressions; how much testing does it need?

Not likely. Changes a raw pointer to be a RefPtr. Have tested locally and it doesn't cause any problems.
Attachment #8909269 - Flags: sec-approval?
[@ nexttowardf ] seems to be a variant of this crash. It first showed up in the 9-17 build. For instance: bp-e85a3a23-8a61-4bb2-a11c-90cf80170917
Crash Signature: [@ mozilla::layers::RenderLayers<T>] → [@ mozilla::layers::RenderLayers<T>] [@ nexttowardf ]
sec-approval for trunk checkin on October 9, two weeks into the cycle. Do not check this that date, please.

We'll want a beta patch made and nominated for 57 and ESR52 at that time.
Whiteboard: [gfx-noted] → [gfx-noted][checkin on 10/9]
Attachment #8909269 - Flags: sec-approval? → sec-approval+
"Do not check this in before that date" rather (that is, October 9).
Duplicate of this bug: 1402730
Patch backported to 52
Attached patch Patch for 57Splinter Review
Same as patch for central, but that was made using git and this one is hg
Jamie pushed this to inbound.
https://hg.mozilla.org/integration/mozilla-inbound/rev/359f2f4a8a6c
Whiteboard: [gfx-noted][checkin on 10/9] → [gfx-noted]
Hi Jamie, could you please nominate a patch for beta57 and esr52? I am hoping we can land it everywhere (central/beta/esr52) today, i.e. 10/09 date as tagged by sec team.
Flags: needinfo?(jnicol)
Comment on attachment 8916679 [details] [diff] [review]
Patch for 57

Approval Request Comment
[Feature/Bug causing the regression]:

Not sure

[User impact if declined]:

Exploitable crash

[Is this code covered by automated tests?]:

Yes

[Has the fix been verified in Nightly?]:

Not yet, landed to inbound earlier today. I have built and run beta branch with this patch however. 

[Needs manual test from QE? If yes, steps to reproduce]:

N/A
 
[List of other uplifts needed for the feature/fix]:

N/A

[Is the change risky?]:

No

[Why is the change risky/not risky?]:

Simple change, make pointer refcounted

[String changes made/needed]:

N/A
Attachment #8916679 - Flags: approval-mozilla-beta?
Comment on attachment 8916670 [details] [diff] [review]
Patch backported to 52

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec:high
User impact if declined: exploitable crash
Fix Landed on Version: 58
Risk to taking this patch (and alternatives if risky): Low, simple change
String or UUID changes made by this patch: N/A
Attachment #8916670 - Flags: approval-mozilla-esr52?
This is also affecting 56, should we uplift there too?
Flags: needinfo?(jnicol)
https://hg.mozilla.org/mozilla-central/rev/359f2f4a8a6c
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment on attachment 8916679 [details] [diff] [review]
Patch for 57

Fix a crash and a sec-high, taking it.
Should be in 57b8
Attachment #8916679 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8916670 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Group: gfx-core-security → core-security-release
Whiteboard: [gfx-noted] → [gfx-noted][adv-main57+][adv-esr52.5+]
Flags: qe-verify-
Whiteboard: [gfx-noted][adv-main57+][adv-esr52.5+] → [gfx-noted][adv-main57+][adv-esr52.5+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.