Closed Bug 1107009 Opened 10 years ago Closed 9 years ago

Intermittent test_bug346659.html | application crashed [@ std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::Lay

Categories

(Core :: Graphics: Layers, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox35 --- wontfix
firefox36 --- fixed
firefox37 --- fixed
firefox38 --- fixed
firefox-esr31 36+ fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: cbook, Assigned: kats)

References

()

Details

(4 keywords, Whiteboard: [e10s only?][adv-main36+][adv-esr31.5+][b2g-adv-main2.2+])

Attachments

(4 files, 1 obsolete file)

Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-e10s-3

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=4348319&repo=mozilla-inbound

9:53:35 WARNING - PROCESS-CRASH | /tests/dom/tests/mochitest/bugs/test_bug346659.html | application crashed [@ std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::find(unsigned long const&)]
19:53:35 INFO - Crash dump filename: /tmp/tmpfntc2z.mozrunner/minidumps/2cb3b840-7f80-6a60-2d6f69c0-40390d3d.dmp
19:53:35 INFO - Operating system: Linux
19:53:35 INFO - 0.0.0 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
19:53:35 INFO - CPU: amd64
19:53:35 INFO - family 6 model 45 stepping 7
19:53:35 INFO - 1 CPU
19:53:35 INFO - Crash reason: SIGSEGV
19:53:35 INFO - Crash address: 0x0
19:53:35 INFO - Thread 0 (crashed)
19:53:35 INFO - 0 libxul.so!std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::find(unsigned long const&) [stl_tree.h:5b8a4fcf2894 : 1098 + 0x3]
19:53:35 INFO - rbx = 0x00007febd7144840 r12 = 0x00007fff7dd46008
19:53:35 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
19:53:35 INFO - r15 = 0x00007febc929eb00 rip = 0x00007febe908993e
19:53:35 INFO - rsp = 0x00007fff7dd458e0 rbp = 0x00007fff7dd458e0
19:53:35 INFO - Found by: given as instruction pointer in context
19:53:35 INFO - 1 libxul.so!mozilla::layers::CompositorParent::GetIndirectShadowTree(unsigned long) [stl_map.h:5b8a4fcf2894 : 749 + 0xb]
19:53:35 INFO - rbx = 0x00007febd7144840 r12 = 0x00007fff7dd46008
19:53:35 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
19:53:35 INFO - r15 = 0x00007febc929eb00 rip = 0x00007febe908998c
19:53:35 INFO - rsp = 0x00007fff7dd458f0 rbp = 0x00007fff7dd45900
19:53:35 INFO - Found by: call frame info
19:53:35 INFO - 2 libxul.so!mozilla::layers::CompositorParent::GetAPZCTreeManager(unsigned long) [CompositorParent.cpp:5b8a4fcf2894 : 1302 + 0x4]
19:53:35 INFO - rbx = 0x00007febd7144840 r12 = 0x00007fff7dd46008
19:53:35 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
19:53:35 INFO - r15 = 0x00007febc929eb00 rip = 0x00007febe9089a85
19:53:35 INFO - rsp = 0x00007fff7dd45910 rbp = 0x00007fff7dd45910
19:53:35 INFO - Found by: call frame info
19:53:35 INFO - 3 libxul.so!mozilla::layout::RenderFrameParent::GetApzcTreeManager() [RenderFrameParent.cpp:5b8a4fcf2894 : 332 + 0x8]
19:53:35 INFO - rbx = 0x00007febd7144840 r12 = 0x00007fff7dd46008
19:53:35 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
19:53:35 INFO - r15 = 0x00007febc929eb00 rip = 0x00007febea08e8ee
19:53:35 INFO - rsp = 0x00007fff7dd45920 rbp = 0x00007fff7dd45940
The crash in comment 2 has:
00:38:59 INFO - Crash reason: SIGSEGV
00:38:59 INFO - Crash address: 0x5a5a5a6e
Group: gfx-core-security
Severity: normal → critical
Component: Layout → Graphics: Layers
All three crashes so far is for e10s tests.
Whiteboard: [e10s only?]
I understand 0x5a5a5a6e is free memory read...
Group: core-security
Mason, is this something you could look at?  It looks like you've written some patches recently in CompositorParent.
Flags: needinfo?(mchang)
Summary: Intermittenttest_bug346659.html | application crashed [@ std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::Laye → Intermittent test_bug346659.html | application crashed [@ std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::Lay
Flags: needinfo?(mchang)
Attached file Full Stack Trace
Hi Kats, does this look familiar to you? Do you know if any of the event handling code landed ~December 3? Thanks!
Flags: needinfo?(bugmail.mozilla)
After chatting with :mccr8, it'd be really helpful to see if this issue is still occurring. A couple of options would be to clone this to a public bug to see if it happens or (b) unhide this bug. Andrew said we can decide after a reply to comment 7.
Yeah, it happened 3 times on 12-3, then didn't happen for 2 days before the bug was hidden, which kind of sounds to me like some patch landed that caused this, then it got backed out.
From the stack it's not obvious to me why this would happen. sIndirectLayerTrees is a static std::map so calling find() on it should pretty much always work. The only case I can think of this might fail is if code accesses sIndirectLayerTrees from multiple threads concurrently. It's meant to be used only on the main thread so it's possible something landed to use it on the compositor thread and then got backed out.

If this crash is still happening I can investigate more.
Flags: needinfo?(bugmail.mozilla)
Blocks: 1110275
Carsten have you seen this bug present itself lately (re: comment 11).
Flags: needinfo?(cbook)
Bug 1110275 hasn't picked up any stars since it was filed and a quick search for "std::_Rb_tree" only turns up this bug and that one, so I'm going to say no.

BTW, bug 1069937 appears to be a dupe as well?
Flags: needinfo?(cbook)
Since this doesn't seem to be an issue anymore, can we resolve this?
Flags: needinfo?(continuation)
Sounds like a plan.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(continuation)
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
sIndirectLayerTree accesses don't appear to be thread-safe: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8cb0663d54d
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> sIndirectLayerTree accesses don't appear to be thread-safe:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8cb0663d54d

Is that something you could look into fixing?  Thanks.
Flags: needinfo?(bugmail.mozilla)
Yeah i can take this.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
I just also want to note that I'm assuming the unsynchronized access to this variable is the problem. It might be something else in which case this patch may not fix it. Until we have reliable STR it'll be hard to tell one way or the other.
Update patch that actually works [1]. It took me a few attempts to get here and I'm not totally happy with the patch. I want to think about how to do this better, but requesting feedback for suggestions.

[1] https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=310c8ea54bca
Attachment #8543371 - Attachment is obsolete: true
Attachment #8544535 - Flags: feedback?(bgirard)
Comment on attachment 8544535 [details] [diff] [review]
Lock sIndirectLayerTrees when touching it

Review of attachment 8544535 [details] [diff] [review]:
-----------------------------------------------------------------

It's better than what we have now. I'm a bit worried that if something requires locking we might need the whole operation to be locked and not individual access to the tree. For instance a piece of code check that a node is in the tree, takes a branch based on that and now expects the following call to be able to operate on the node.

I'll leave it up to you to weight this potential problem since it's purely theoretical at this point.

::: gfx/layers/ipc/CompositorParent.cpp
@@ +1701,5 @@
>  CrossProcessCompositorParent::ForceComposite(LayerTransactionParent* aLayerTree)
>  {
>    uint64_t id = aLayerTree->GetId();
>    MOZ_ASSERT(id != 0);
> +  CompositorParent* parent;

The life times are hard to understand here:
You're not holding a reference to CompositorParent, just a raw pointer, but you're locking the tree which admits that another thread can modify thus potentially remove stuff from the tree. It's not trivial to prove that something is holding the compositor parent.
Attachment #8544535 - Flags: feedback?(bgirard) → feedback+
In an IRC converstion with BenWa a few days ago he said I could turn his f+ into an r+ and land the patch. I'm not really happy with the patch but it's better than nothing and it might fix stuff so I'm going to land it.
FYI, patches for sec-critical bugs need sec-approval before landing.

Probably too late to back out now though, so maybe it's best to just leave it in.
I suggest you ask for sec-approval anyway for awareness and to get the details
on record.
Flags: needinfo?(bugmail.mozilla)
Comment on attachment 8544535 [details] [diff] [review]
Lock sIndirectLayerTrees when touching it

My mistake, sorry.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
- It's unclear from the stack how this crash is actually happening. The patch is primarily a speculative fix. I would say it would be difficult to make an exploit just based on the patch.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- The patch makes it clear that sIndirectLayerTrees was not accessed in a thread-safe manner. It would be fairly easy to find the codepaths that are accessing the variable across threads but quite hard to figure out how reliably trigger this problem because it is a race condition.

Which older supported branches are affected by this flaw?
- Any branch with OMTC enabled, I think. Fennec has had it enabled for a long time now so we can probably say that all release branches are affected.

If not all supported branches, which bug introduced the flaw?
- N/A

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- Assuming this patch fixes the problem (not really sure if it will) we can easily backport it to other branches. The general pattern for the patch is pretty straightforward.

How likely is this patch to cause regressions; how much testing does it need?
- It would be good to let it bake on central for a few days or so just in case there are any deadlocks or re-entrant pieces of code that my loacl testing didn't account for. The code is fairly well exercised during normal usage on an OMTC platform.
Flags: needinfo?(bugmail.mozilla)
Attachment #8544535 - Flags: sec-approval?
And B2G.
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=1148605&repo=b2g-inbound

Also, I assume this affects B2G too? How far back would that go if so?
B2G has had OMTC since the beginning I think so probably "all the way back". Maybe now that it's backed out I (or preferably somebody else) should spend some more time investigating exactly what's going on here. I have a feeling this is related to bug 1120252 in some way and it's possible it only manifests in more recent versions of code.
Also blew up e10s tests on desktop.
Comment on attachment 8544535 [details] [diff] [review]
Lock sIndirectLayerTrees when touching it

Crap. We need patches nominate for Aurora and Beta for this since it went in.
Attachment #8544535 - Flags: sec-approval? → sec-approval+
Fixed the oranges by moving around the EnsureLayerTreeMapReady calls so that they are all actually run on the main thread. Try push at https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cde6d7b00e48 looks green enough, re-landing:

https://hg.mozilla.org/integration/b2g-inbound/rev/1ba6070d3f51

I'll make aurora (37) and beta (36) versions too.
Comment on attachment 8544535 [details] [diff] [review]
Lock sIndirectLayerTrees when touching it

The patch I landed on b2g-inbound applies cleanly to aurora, and there's one hunk that needs trivial rebasing for beta. Let me know if I should rebase the patch to older b2g releases as well.

Approval Request Comment
[Feature/regressing bug #]: unsure, happened a long time ago
[User impact if declined]: unsynchronized access to a data structure from multiple threads, leading to potential crashes.
[Describe test coverage new/current, TBPL]: mostly untested/speculative, just landed on b2g-inbound
[Risks and why]: I would say it's a pretty low-risk patch in that if there are errors they should be obvious and get flushed out pretty quickly. Would recommend letting it bake on nightly for a few days before uplifting.
[String/UUID change made/needed]: none
Attachment #8544535 - Flags: approval-mozilla-beta?
Attachment #8544535 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/1ba6070d3f51
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Attachment #8544535 - Flags: approval-mozilla-esr31?
Kershaw reported a possible deadlock from this patch at https://bugzilla.mozilla.org/show_bug.cgi?id=1038620#c30 - we will need to resolve that before uplifting.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #38)
> hrm:
> https://treeherder.mozilla.org/logviewer.html#?job_id=1707271&repo=fx-team

Assuming you're referring to the "PROCESS-CRASH | dom/media/webaudio/test/test_delayNodeWithGain.html" stack, it looks unrelated. It's crapping out while trying to destroy the field at [1] which is independent of the sIndirectLayerTrees stuff this bug was fixing. They both use std::map, though, and so the stacks look similar because it's crashing in there somewhere. It might be another case of unsynchronized access around this field but it shouldn't be a regression from this bug at least.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientLayerManager.h?rev=da6a98c3a8d1#331
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #37)
> Kershaw reported a possible deadlock from this patch at
> https://bugzilla.mozilla.org/show_bug.cgi?id=1038620#c30 - we will need to
> resolve that before uplifting.
So, should we accept this patch in aurora or beta now?
Flags: needinfo?(bugmail.mozilla)
Yes please. Bug 1122408 will also need uplifting.
Flags: needinfo?(bugmail.mozilla)
Attachment #8544535 - Flags: approval-mozilla-esr31?
Attachment #8544535 - Flags: approval-mozilla-esr31+
Attachment #8544535 - Flags: approval-mozilla-beta?
Attachment #8544535 - Flags: approval-mozilla-beta+
Attachment #8544535 - Flags: approval-mozilla-aurora?
Attachment #8544535 - Flags: approval-mozilla-aurora+
I looked over the code in esr31 after the uplift and it looks like there are a couple more sites [1][2] that need protection with the lock. These sites don't exist in aurora/beta because bug 997367 (which landed in 32) changed these sites to use GetIndirectShadowTree instead of accessing sIndirectLayerTrees directly.

I can try to make a patch for esr31 that uplifts a part of bug 997367 and addresses the issue.

[1] https://hg.mozilla.org/releases/mozilla-esr31/file/c4c37781a802/gfx/layers/ipc/CompositorParent.cpp#l1295
[2] https://hg.mozilla.org/releases/mozilla-esr31/file/c4c37781a802/gfx/layers/ipc/CompositorParent.cpp#l1332
Group: gfx-core-security
Whiteboard: [e10s only?] → [e10s only?][adv-main36+][adv-esr31.5+]
Whiteboard: [e10s only?][adv-main36+][adv-esr31.5+] → [e10s only?][adv-main36+][adv-esr31.5+][b2g-adv-main2.2+]
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: