Closed Bug 1709603 Opened 5 months ago Closed 18 days ago

Crash in [@ mozilla::layers::LockD3DTexture<T>]

Categories

(Core :: Graphics: Layers, defect, P1)

All
Windows
defect

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- fixed

People

(Reporter: bobowen, Assigned: bobowen)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

I've spotted another potential issue in TextureClient::OnForwardedToHost.
If mUpdated hasn't been set then we don't take a read lock, but the texture is still being forwarded and so we might incorrectly try and use it as a back buffer, while still being used in the compositor.

I can trigger this condition when tab switching.

+++ This bug was initially created as a clone of Bug #1704445 +++

The patch on bug 1658991, didn't reduce the crashes, certainly not by much if at all.
So it looks like they can't be down to device resets.

Filing a new bug to track further investigations / resolutions.

+++ This bug was initially created as a clone of Bug #1658991 +++

This bug is for crash report bp-bafc3b28-7bbb-42e1-b1b7-954c80200813.

Top 10 frames of crashing thread:

0 xul.dll CrashStatsLogForwarder::CrashAction gfx/thebes/gfxPlatform.cpp:409
1 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush gfx/2d/Logging.h:277
2 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log gfx/2d/Logging.h:270
3 xul.dll mozilla::layers::LockD3DTexture<ID3D11Texture2D> gfx/layers/d3d11/TextureD3D11.cpp
4 xul.dll mozilla::layers::D3D11TextureData::Lock gfx/layers/d3d11/TextureD3D11.cpp:309
5 xul.dll mozilla::layers::CanvasTranslator::HandleExtensionEvent gfx/layers/ipc/CanvasTranslator.cpp:317
6 xul.dll mozilla::layers::CanvasTranslator::TranslateRecording gfx/layers/ipc/CanvasTranslator.cpp:266
7 xul.dll mozilla::layers::CanvasTranslator::StartTranslation gfx/layers/ipc/CanvasTranslator.cpp:162
8 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1240
9 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:158

This seems to be something that we've fixed before, but it's back, and the number of crashes isn't going to the right direction!

nical: I tried removing the mUpdated check, because if we're forwarding the texture it seems like it should always be read locked.
However this causes crashes on Android, because it seems like it doesn't release the read lock in the GPU process.
Any idea how to proceed?

Flags: needinfo?(nical.bugzilla)

My recollection is very fuzzy but my understanding is that you read-lock once after writing into the texture and the compositor read-unlocks once after it is not using the texture anymore. By the time the compositor read-unlocks, we may have forwarded the texture in multiple transactions (or to multiple layers in the same transaction), what is important is that it we only wrote into it once before the first time we forward it, and that we can't write into again until it is read-unlocked.

So when we forward a texture, If mUpdated isn't set it implies that we have either:

  • already written, read-locked and forwarded the texture in the past, and we are just telling the compositor to keep using the texture unmodified.
  • never written into the texture which is bug, we shouldn't forward it since it doesn't have valid content.
Flags: needinfo?(nical.bugzilla)

(In reply to Nicolas Silva [:nical] from comment #2)
...

So when we forward a texture, If mUpdated isn't set it implies that we have either:

  • already written, read-locked and forwarded the texture in the past, and we are just telling the compositor to keep using the texture unmodified.

So this is where I think we might have a problem.
If I add in logging, I see forwarding of the texture to use again after the read unlock, which leads to it relocking (via the keyed mutex) the texture for the compositor.
In theory this could happen at the same time as the texture has been chosen as a back buffer, because there is no read lock.

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(nical.bugzilla)

My understanding of how this system works:

  • Each time a texture is written to, it is read-locked (but only then).
  • When the compositor is done using a texture, it read-unlocks it.
    • If the texture is on the GPU (the case here), this happens a frame after the compositor is not showing the texture anymore (it has been replaced by another texture or the page/canvas isn't shown anymore).
  • When we continue showing a canvas without updating its content, the texture is forwarded without taking the read-lock. The compositor continues showing the texture and therefore the texture stays read-locked.
  • So as long as the canvas's front buffer isn't replaced with something else it should stay read-locked (in the case of GPU textures), except for that cas where the canvas isn't shown for a frame (switched to another tab or the canvas is invisible for a time). In that case the compositor will read-unlock the texture even though it is still the current front buffer of the canvas. This case is handled by the code interacting with mTextureLockIsUnreliable in PersistentBufferProvider.cpp. This member points to the front buffer texture in events that would lead to the compositor read-unlocking the texture even though we my still show it again unmodified.
See Also: → 1717209

Looks like the issue in the description is real and we have STR in bug 1717209.
I've got what I hope is a fix, I'll land that in bug 1717209 and if it resolves all the crashes then I'll duplicate this over.

The remaining crashes are nearly all (or perhaps all) down to keyed mutex contention and then deadlock when locking in the renderer/compositor thread for compositing and in the canvas worker threads for copying the previous texture to the new back buffer.

To avoid this contention I've created a patch that uses a permanent back buffer when the texture has internal synchronisation.
Much of the rest of the flow stays the same, as in it still locks the shared back buffer at the start of each transaction when the DrawTarget is borrowed. This is done because we still rely on waiting for the lock to have happened in the canvas thread before forwarding the textures.
Locking at the start of the transaction means that we hopefully wait for the least amount of time.

Using a separate back buffer also means we can simplify the snapshot caching, because we can always use the separate back buffer.
We also don't need to detach all snapshots, because the permanent back buffer remains locked.

Try push:
https://treeherder.mozilla.org/jobs?repo=try&collapsedPushes=954510&revision=208eda325091a2fc7524275838aceabb77863285

Looks like we actually get better performance, particularly for images (probably down to getImageData):
https://treeherder.mozilla.org/jobs?repo=try&collapsedPushes=954510&revision=01b4a8162afc543d270cacadf20fe43e4376f3bf
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=79c9826353c7af12caee209756022cbd8657997d&newProject=try&newRevision=01b4a8162afc543d270cacadf20fe43e4376f3bf&framework=13&page=1&showOnlyComparable=1
(the macos improvement is spurious)

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d3df32375772
Use a separate permanent canvas back buffer when texture has synchronization. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 18 days ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Regressions: 1730564
Regressions: 1730598
You need to log in before you can comment on or make changes to this bug.