Closed Bug 1427089 Opened 2 years ago Closed 2 years ago

Crash in mozilla::layers::CompositorBridgeChild::NotifyFinishedAsyncEndLayerTransaction

Categories

(Core :: Graphics, defect, P3, critical)

58 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- fixed
firefox59 --- fixed

People

(Reporter: philipp, Assigned: rhunt)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-f4e571ad-4843-4199-9c0a-d77cc0171226.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::layers::CompositorBridgeChild::NotifyFinishedAsyncEndLayerTransaction gfx/layers/ipc/CompositorBridgeChild.cpp:1214
1 xul.dll mozilla::detail::RunnableFunction<<lambda_6eeeb7b83d1e657662c13eb47b907d64> >::Run xpcom/threads/nsThreadUtils.h:529
2 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1039
3 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:510
4 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:334
5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319
6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:299
7 xul.dll nsThread::ThreadFunc xpcom/threads/nsThread.cpp:423
8 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397
9 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:137

=============================================================

this crash signature in the content process is newly showing up in firefox 58 with MOZ_RELEASE_ASSERT(mOutstandingAsyncPaints == 0) that got added in bug 1388921.
The crash report seems to be not available currently, I will be back when it works.
unfortunately old crash reports got collectively purged - bp-2a120b45-d3f1-4b87-ac8d-f4b8f0171229 would be a new one relating to this signature.
It seems to be relative with Bug 1388921. :mattwoodrow, can you take a look at this?
Flags: needinfo?(matt.woodrow)
David or Ryan would be the best people to take a look at this.
Flags: needinfo?(matt.woodrow)
As Comment 3, Can you take a look at this? Ryan.
Flags: needinfo?(rhunt)
So from that assertion it looks like we have async operations queued when the AsyncEndLayerTransaction message is processed.

I think it's possible for ClientLayerManager::EndEmptyTransaction to be called twice in a row before EndTransaction. [1]

EndEmptyTransaction will call Render on the layer tree and then queue a AsyncEndLayerTransaction message if an operation was queued. If in both EndEmptyTransaction's an operation is queued, we could have a race with the first AsyncEndLayerTransaction message running after the second set of operations are queued, causing this assertion to be triggered.

I'm not sure if this is definitively the cause of the crashes, but it seems worth fixing. We had a similar race with one EndEmptyTransaction followed by EndTransaction, and fixed that by calling FlushAsyncPaints. I think we can do the same here.

In addition, it seems like a ClientPaintedLayer will almost always queue a buffer operation [2], as ContentClient unconditionally creates mBufferState [3]. That should be fixed. 

[1] https://searchfox.org/mozilla-central/rev/b24e6342d744c5a83fab5c15972e11eeb69d68e6/layout/base/PresShell.cpp#6442
[2] https://searchfox.org/mozilla-central/rev/b24e6342d744c5a83fab5c15972e11eeb69d68e6/gfx/layers/client/ClientPaintedLayer.cpp#231
[3] https://searchfox.org/mozilla-central/rev/b24e6342d744c5a83fab5c15972e11eeb69d68e6/gfx/layers/client/ContentClient.cpp#164
Flags: needinfo?(rhunt)
Assignee: nobody → rhunt
Attachment #8939682 - Flags: review?(dvander)
Attachment #8939683 - Flags: review?(dvander)
Attachment #8939682 - Flags: review?(dvander) → review+
Comment on attachment 8939683 [details] [diff] [review]
omtp-race-2.patch

Review of attachment 8939683 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch.
Attachment #8939683 - Flags: review?(dvander) → review+
Priority: -- → P3
Whiteboard: [gfx-noted]
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71e1bf5e542d
Only queue a ContentClient buffer state if it has operations (bug 1427089, r=dvander)
https://hg.mozilla.org/integration/mozilla-inbound/rev/8526a7b083c3
Wait for a previous empty transaction to complete before doing another empty transaction (bug 1427089, r=dvander)
https://hg.mozilla.org/mozilla-central/rev/71e1bf5e542d
https://hg.mozilla.org/mozilla-central/rev/8526a7b083c3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Looks like some pretty small patches. Worth a Beta uplift request?
Flags: needinfo?(rhunt)
Looking at the crash reports I found there was one crash report that has happened since these patches landed [1]. I still think these are worth uplifting as I believe they can minimize the chance of this race condition happening.

[1] https://crash-stats.mozilla.com/report/index/2a29dc80-6be4-446b-b447-61bb80180106
Flags: needinfo?(rhunt)
Comment on attachment 8939682 [details] [diff] [review]
omtp-race-1.patch

Approval Request Comment
[Feature/Bug causing the regression]: Race condition with OMTP
[User impact if declined]: Chance of slightly higher crash rate
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: The other patch in this bug
[Is the change risky?]: No
[Why is the change risky/not risky?]: This patch just prevents queuing no-op operations
[String changes made/needed]: None
Attachment #8939682 - Flags: approval-mozilla-beta?
Comment on attachment 8939683 [details] [diff] [review]
omtp-race-2.patch

Approval Request Comment
[Feature/Bug causing the regression]: OMTP race condition
[User impact if declined]: Slightly higher crash rate
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: The other patch in this bug
[Is the change risky?]: No
[Why is the change risky/not risky?]: This patch adds another sync point to prevent race conditions
[String changes made/needed]: None
Attachment #8939683 - Flags: approval-mozilla-beta?
Comment on attachment 8939682 [details] [diff] [review]
omtp-race-1.patch

Fix an OMTP crash. Beta58+.
Attachment #8939682 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8939683 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.