Closed Bug 1515689 Opened Last year Closed 11 months ago

Crash in mozilla::layers::UiCompositorControllerParent::Initialize

Categories

(Core :: Graphics: Layers, defect, P3, critical)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
geckoview64 + fixed
geckoview65 + fixed
firefox-esr60 --- unaffected
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: snorp, Assigned: snorp)

References

Details

(Keywords: crash, Whiteboard: [gfx-noted][geckoview:p1])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is
report bp-a192985e-1e5b-47a2-aeb5-d31a00181217.
=============================================================

Top 10 frames of crashing thread:

0 libxul.so mozilla::layers::UiCompositorControllerParent::Initialize mfbt/RefPtr.h:332
1 libxul.so mozilla::detail::RunnableMethodImpl<FdWatcher*, void  xpcom/threads/nsThreadUtils.h:1197
2 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:451
3 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:459
4 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:534
5 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:38
6 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:325
7 libxul.so base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:198
8 libxul.so ThreadFunc ipc/chromium/src/base/platform_thread_posix.cc:40
9 libc.so libc.so@0x46dc3 

=============================================================

This is currently the top GeckoView crash in Focus
Priority: -- → P3
Whiteboard: [gfx-noted]
Could this crash be related to the runnable crash in RunAndroidUiTasks bug 1515930? That RunAndroidUiTasks crash signature is now the GV top crash.

It might also be related to the mozilla::layers::UiCompositorControllerChild::OpenForSameProcess crashes like bp-28ef68b5-94a8-4db5-b2b3-e359f0181230.
See Also: → 1515930
Whiteboard: [gfx-noted] → [gfx-noted][geckoview:p1]
Why are we even initializing UiCompositorControllerParent in Focus? That's part of the Fennec dynamic toolbar which is not used in Focus AFAIK.
Flags: needinfo?(snorp)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> Why are we even initializing UiCompositorControllerParent in Focus? That's
> part of the Fennec dynamic toolbar which is not used in Focus AFAIK.

The Android widget uses it for seemingly all compositor interaction (pausing, resuming) in addition to the toolbar stuff.
Flags: needinfo?(snorp)
We have several crashes with different signatures[0][1][2][3] indicating that there is a lifetime management problem with UiCompositorControllerChild. It seems like maybe the rug is getting pulled out while our runnable is queued (or the actual runnable itself in some cases? how?).

[0] https://crash-stats.mozilla.com/report/index/6731a1fe-dbd3-44b7-87bb-f3cb70190102
[1] https://crash-stats.mozilla.com/report/index/fb4cd92d-7d30-4746-b0bc-732de0190102
[2] https://crash-stats.mozilla.com/report/index/f5952098-fb9f-4377-9e76-171980190102
[3] https://crash-stats.mozilla.com/report/index/382f24e4-e701-4555-9b93-a93210190102
Kats/Botond, do you have any idea what could be happening here? AFAICT, we should always be destroying things on the UI thread, so I don't know see how there can be a race.
Flags: needinfo?(kats)
Flags: needinfo?(botond)
Assignee: nobody → snorp
We have the UiCompositorController child side open and set up the IPC channel from the UI thread at [1], and then it calls InitializeForSameProcess [2] on the parent which defers that message to the compositor thread. But before it runs, the child side could get torn down at [3] which (on the UI thread) will tear down the channel directly [4]. There's a comment above [3] saying it's sync but it's not clear to me what enforces that. What if this channel teardown happens before or concurrently with Initialize() running on the compositor thread? That might result in all kinds of weirdness.

[1] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/gfx/layers/ipc/UiCompositorControllerChild.cpp#272
[2] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/gfx/layers/ipc/UiCompositorControllerChild.cpp#281
[3] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/gfx/ipc/InProcessCompositorSession.cpp#79
[4] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/gfx/layers/ipc/UiCompositorControllerChild.cpp#181
In fact the comment at [1] predates the UiCompositorController stuff, so it probably is referring to the mCompositorBridgeChild->Destroy(); line below. So I suspect this is probably the root cause.

[1] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/gfx/ipc/InProcessCompositorSession.cpp#73

The crash looks like a null RefPtr dereference.

Looking at Initialize(), the only RefPtr I can see being dereferenced, is state->mParent here [1]. We do null-check it just above, but if things are shutting down, perhaps another thread is clearing it in between?

Holding the layer tree state lock long enough to get a strong ref to the CBP might help in that case, something like:

RefPtr<CompositorBridgeParent> cbp;
CompositorBridgeParent::CallWithIndirectShadowTree(
    mRootLayerTreeId,
    [&](LayerTreeState& aState) {
      MOZ_ASSERT(aState.mParent);
      if (!aState.mParent) {
        return;
      }
      cbp = aState.mParent;
      aState.mUiControllerParent = this;
    });
if (cbp) {
  if (AndroidDynamicToolbarAnimator* animator = 
          cbp->GetAndroidDynamicToolbarAnimator()) {
    animator->Initialize(mRootLayerTreeId);
  }
}

[1] https://searchfox.org/mozilla-central/rev/5053031ba7621fa8f63f42de4c204ab3561e4e59/gfx/layers/ipc/UiCompositorControllerParent.cpp#290

Flags: needinfo?(botond)

(In reply to Botond Ballo [:botond] from comment #8)

Looking at Initialize(), the only RefPtr I can see being dereferenced, is state->mParent here [1]. We do null-check it just above, but if things are shutting down, perhaps another thread is clearing it in between?

It looks like the only place where we clear state->mParent is CompositorBridgeParent::StopAndClearResources() [1] [2]. Not sure if that can run on the UI thread?

[1] https://searchfox.org/mozilla-central/rev/5053031ba7621fa8f63f42de4c204ab3561e4e59/gfx/layers/ipc/CompositorBridgeParent.cpp#449
[2] https://searchfox.org/mozilla-central/rev/5053031ba7621fa8f63f42de4c204ab3561e4e59/gfx/layers/ipc/CompositorBridgeParent.cpp#466

This will wait on the current thread until the passed runnable has
been executed.
It looks like we can do initialization and destruction from the UI
thread before the bits that run on the Compositor thread have run. Avoid
this by synchronously waiting on the Compositor.

Depends on D16595
Attachment #9036705 - Attachment is obsolete: true
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/06c3ceef7945
Fix initialization/destruction race in UiCompositorControllerParent r=botond
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

We should uplift this fix to the GV 64 relbranch (for Focus 8.0) and either the 65 Beta channel or just the (yet-to-be-created) GV 65 relbranch (for Focus 9.0), if the change to UiCompositorControllerParent.cpp might risk destabilizing Fennec 65 or Firefox desktop.

Attachment #9036706 - Flags: approval-mozilla-geckoview65?
Attachment #9036706 - Flags: approval-mozilla-geckoview64?

Comment on attachment 9036706 [details]
Bug 1515689 - Fix initialization/destruction race in UiCompositorControllerParent r=botond

focus topcrash fix, approved for gv64 and gv65.

Attachment #9036706 - Flags: approval-mozilla-geckoview65?
Attachment #9036706 - Flags: approval-mozilla-geckoview65+
Attachment #9036706 - Flags: approval-mozilla-geckoview64?
Attachment #9036706 - Flags: approval-mozilla-geckoview64+
See Also: → 1547760
You need to log in before you can comment on or make changes to this bug.