Closed Bug 1538572 Opened 5 years ago Closed 5 years ago

Crash in [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId]

Categories

(Core :: Graphics: WebRender, defect, P2)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- unaffected
firefox67 --- unaffected
firefox68 --- fixed

People

(Reporter: calixte, Assigned: alexical)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-95969093-75f1-42c7-9026-02f140190324.

Top 10 frames of crashing thread:

0 xul.dll mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId gfx/layers/ipc/CompositorBridgeParent.cpp:1194
1 xul.dll wr_finished_scene_build gfx/webrender_bindings/RenderThread.cpp:910
2 xul.dll static void webrender_bindings::bindings::{{impl}}::post_scene_swap gfx/webrender_bindings/src/bindings.rs:874
3 xul.dll static void webrender::scene_builder::SceneBuilder::run gfx/wr/webrender/src/scene_builder.rs:326
4 xul.dll static void std::sys_common::backtrace::__rust_begin_short_backtrace<closure,  src/libstd/sys_common/backtrace.rs:136
5 xul.dll static void alloc::boxed::{{impl}}::call_box< src/liballoc/boxed.rs:673
6 xul.dll static void std::sys::windows::thread::{{impl}}::new::thread_start src/libstd/sys/windows/thread.rs:56
7 kernel32.dll BaseThreadInitThunk 
8 mozglue.dll static void patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:712
9 ntdll.dll RtlUserThreadStart 

There is 1 crash in nightly 68 with buildid 20190324094708. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1441308.

[1] https://hg.mozilla.org/mozilla-central/rev?node=96da9d241051

Flags: needinfo?(dothayer)
Priority: -- → P2

Leaving the needinfo for now since I want to look closer at this, but at the outset, despite having its fingerprints on the backtrace, I don't see how my patch would lead to this crash.

Kats, is there anything preventing ClearResources executing concurrently with GetCompositorBridgeParentFromWindowId? It doesn't look like it to me. It looks like both CompositorBridgeParent::StopAndClearResources and CompositorBridgeParent::RecvAdoptChild are not locking on sIndirectLayerTreesLock when they call into ClearResources, and I don't know of any reason why the scene builder thread would not be running. Thoughts?

Flags: needinfo?(dothayer) → needinfo?(kats)

I believe the way this code was supposed to work was that this Destroy call which invokes ClearResources would block until all the WR-side structures for that window were shut down. So there would be no calls to GetCompositorBridgeParentFromWindowId for that window id happening after that function returned.

But even so, it's not required that sIndirectLayerTreesLock be held when ClearResources is called (in fact we explicitly make sure not to hold that, or it can result in deadlock). The only requirement is that sIndirectLayerTreesLock be held while modifying the sIndirectLayerTrees strucutre, which is the case in StopAndClearResources and RectAdoptChild

Is there a specific interleaving of these threads that you can think of that would cause this crash?

Flags: needinfo?(kats)

I looked at the other threads in the crash reports; the compositor thread is handling a RecvShutdown message on PWebRenderBridge, so presumably the Destroy call is the problem. The only thing I can think of is that mApis is being touched on two different threads without any synchronization. mApis gets emptied here on the compositor thread, and it's concurrently read from here at the crash point on the scene builder thread.

Previously mApi was a single RefPtr so it might not have had this problem, or at least it might have manifested differently.

Yeah, since WebRenderAPI implements thread-safe ref counting I think this was correct before, but in our case when we check mApis.Length(), there's no guarantee that it won't be cleared (and subsequently have its memory given to something else which, say, zeroes it), by the time we return mApis[(int)aRenderRoot].

I'd like to solve this without locking on a mutex in GetWebRenderAPI. We could do so by changing mApis to a RenderRootArray and just nulling out all the entries in ClearResources. Thoughts?

That sounds reasonable to me.

Assignee: nobody → dothayer
Status: NEW → ASSIGNED

This will allow us to simply do null checks when trying to get
an API, which will prevent problems with concurrently accessing
an nsTArray.

Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d3fca2f0721
Replace mApis nsTArray with RenderRootArray r=sotaro
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Crash Signature: [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId] → [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId] [@ mozilla::layers::CompositorBridgeParent::GetCompositorBridgeParentFromWindowId(mozilla::wr::WrWindowId const &)]
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: