Crash in [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId]

RESOLVED FIXED in Firefox 68

Status

()

defect
P2
critical
RESOLVED FIXED
2 months ago
2 months ago

People

(Reporter: calixte, Assigned: dthayer)

Tracking

(Blocks 1 bug, Regression, {crash, regression})

Trunk
mozilla68
Unspecified
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox66 unaffected, firefox67 unaffected, firefox68 fixed)

Details

(crash signature)

Attachments

(1 attachment)

Reporter

Description

2 months ago

This bug is for crash report bp-95969093-75f1-42c7-9026-02f140190324.

Top 10 frames of crashing thread:

0 xul.dll mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId gfx/layers/ipc/CompositorBridgeParent.cpp:1194
1 xul.dll wr_finished_scene_build gfx/webrender_bindings/RenderThread.cpp:910
2 xul.dll static void webrender_bindings::bindings::{{impl}}::post_scene_swap gfx/webrender_bindings/src/bindings.rs:874
3 xul.dll static void webrender::scene_builder::SceneBuilder::run gfx/wr/webrender/src/scene_builder.rs:326
4 xul.dll static void std::sys_common::backtrace::__rust_begin_short_backtrace<closure,  src/libstd/sys_common/backtrace.rs:136
5 xul.dll static void alloc::boxed::{{impl}}::call_box< src/liballoc/boxed.rs:673
6 xul.dll static void std::sys::windows::thread::{{impl}}::new::thread_start src/libstd/sys/windows/thread.rs:56
7 kernel32.dll BaseThreadInitThunk 
8 mozglue.dll static void patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:712
9 ntdll.dll RtlUserThreadStart 

There is 1 crash in nightly 68 with buildid 20190324094708. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1441308.

[1] https://hg.mozilla.org/mozilla-central/rev?node=96da9d241051

Flags: needinfo?(dothayer)
Priority: -- → P2
Assignee

Comment 1

2 months ago

Leaving the needinfo for now since I want to look closer at this, but at the outset, despite having its fingerprints on the backtrace, I don't see how my patch would lead to this crash.

Assignee

Comment 2

2 months ago

Kats, is there anything preventing ClearResources executing concurrently with GetCompositorBridgeParentFromWindowId? It doesn't look like it to me. It looks like both CompositorBridgeParent::StopAndClearResources and CompositorBridgeParent::RecvAdoptChild are not locking on sIndirectLayerTreesLock when they call into ClearResources, and I don't know of any reason why the scene builder thread would not be running. Thoughts?

Flags: needinfo?(dothayer) → needinfo?(kats)

I believe the way this code was supposed to work was that this Destroy call which invokes ClearResources would block until all the WR-side structures for that window were shut down. So there would be no calls to GetCompositorBridgeParentFromWindowId for that window id happening after that function returned.

But even so, it's not required that sIndirectLayerTreesLock be held when ClearResources is called (in fact we explicitly make sure not to hold that, or it can result in deadlock). The only requirement is that sIndirectLayerTreesLock be held while modifying the sIndirectLayerTrees strucutre, which is the case in StopAndClearResources and RectAdoptChild

Is there a specific interleaving of these threads that you can think of that would cause this crash?

Flags: needinfo?(kats)

I looked at the other threads in the crash reports; the compositor thread is handling a RecvShutdown message on PWebRenderBridge, so presumably the Destroy call is the problem. The only thing I can think of is that mApis is being touched on two different threads without any synchronization. mApis gets emptied here on the compositor thread, and it's concurrently read from here at the crash point on the scene builder thread.

Previously mApi was a single RefPtr so it might not have had this problem, or at least it might have manifested differently.

Assignee

Comment 5

2 months ago

Yeah, since WebRenderAPI implements thread-safe ref counting I think this was correct before, but in our case when we check mApis.Length(), there's no guarantee that it won't be cleared (and subsequently have its memory given to something else which, say, zeroes it), by the time we return mApis[(int)aRenderRoot].

I'd like to solve this without locking on a mutex in GetWebRenderAPI. We could do so by changing mApis to a RenderRootArray and just nulling out all the entries in ClearResources. Thoughts?

That sounds reasonable to me.

Assignee

Updated

2 months ago
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Assignee

Comment 7

2 months ago

This will allow us to simply do null checks when trying to get
an API, which will prevent problems with concurrently accessing
an nsTArray.

Comment 8

2 months ago
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d3fca2f0721
Replace mApis nsTArray with RenderRootArray r=sotaro

Comment 9

2 months ago
bugherder
Status: ASSIGNED → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Duplicate of this bug: 1538787
Crash Signature: [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId] → [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId] [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId(mozilla::wr::WrWindowId const &)]
You need to log in before you can comment on or make changes to this bug.