Closed Bug 1658991 Opened 4 years ago Closed 3 years ago

Crash in [@ mozilla::layers::LockD3DTexture<T>]

Categories

(Core :: Graphics: Layers, defect, P1)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox87 --- wontfix
firefox88 --- wontfix
firefox89 --- fixed

People

(Reporter: sefeng, Assigned: bobowen)

References

(Depends on 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-bafc3b28-7bbb-42e1-b1b7-954c80200813.

Top 10 frames of crashing thread:

0 xul.dll CrashStatsLogForwarder::CrashAction gfx/thebes/gfxPlatform.cpp:409
1 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush gfx/2d/Logging.h:277
2 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log gfx/2d/Logging.h:270
3 xul.dll mozilla::layers::LockD3DTexture<ID3D11Texture2D> gfx/layers/d3d11/TextureD3D11.cpp
4 xul.dll mozilla::layers::D3D11TextureData::Lock gfx/layers/d3d11/TextureD3D11.cpp:309
5 xul.dll mozilla::layers::CanvasTranslator::HandleExtensionEvent gfx/layers/ipc/CanvasTranslator.cpp:317
6 xul.dll mozilla::layers::CanvasTranslator::TranslateRecording gfx/layers/ipc/CanvasTranslator.cpp:266
7 xul.dll mozilla::layers::CanvasTranslator::StartTranslation gfx/layers/ipc/CanvasTranslator.cpp:162
8 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1240
9 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:158

This seems to be something that we've fixed before, but it's back, and the number of crashes isn't going to the right direction!

Monitoring this, but seeing if we can delay as we are trying to deprecate this code path in the future.

Severity: -- → S3
Priority: -- → P3

We call IDXGIKeyedMutex::AcquireSync in a lot of places, but almost none of them seem to handle WAIT_TIMEOUT and WAIT_ABANDONED correctly.

https://docs.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgikeyedmutex-acquiresync

If we get WAIT_ABANDONED, it probably means the process which holds it crashed (or the thread was shutdown I guess), and we now hold ownership of the lock. It seems clear we are responsible to release it in this case. Doing the wrong thing here might cause lock timeouts elsewhere.

In this particular block of code, we also handle WAIT_TIMEOUT as if we hold the lock, which is definitely not true:

https://searchfox.org/mozilla-central/rev/8a0745cd346f0cfb89ae71690babbf7bff706113/gfx/layers/d3d11/TextureD3D11.cpp#231

This is because the FAILED/SUCCEEDED macros treat WAIT_TIMEOUT and WAIT_ABANDONED as success since they aren't negative.

The crash rate (on Nightly) has doubled since last week. Andrew, are you aware of changes which explain this?

Flags: needinfo?(aosmond)

:bobowen, can you comment to the bug?

Flags: needinfo?(bobowencode)

(In reply to Sotaro Ikeda [:sotaro] from comment #4)

:bobowen, can you comment to the bug?

I imagine that this will have gone up because of bug 1696967.
Before we took a snapshot before returning the DrawTarget, which was used to populate the next frames canvas texture, but this causes an extra copy and therefore performance issues on lower end GPUs. It was probably wasteful anyway.

We now have to lock the texture that may be being used in the compositor/renderer to do that copy.
(Unfortunately the keyed mutex doesn't have read/write locks.)

The timeout is 10 secs, so something has probably gone really wrong already (device reset?), but for Lock3DTexture, we gfxDevCrash when in other cases we generally don't.

Maybe we should handle this more like we do here for example.

We could also look to moving towards not using textures with keyed mutexes in the remote canvas code.
I've been thinking about this a fair bit, it will be quite a bit more work, but probably not too bad and may well have other performance benefits as well.

Flags: needinfo?(bobowencode)

This is currently the top GPU process crash in Nightly: http://mathies.com/mozilla/crashes/nightly.html

I'll add in what I suggested in comment 5, so we can find out if these are mainly down to device resets.

Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
Priority: P3 → P1
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3708fca6d466
Don't dev crash in LockD3DTexture when device has been removed. r=jrmuizel
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
See Also: → 1704445

Filed new bug 1704445, because it looks like we can rule out the theory in comment 5, that these might be down to device resets.

No longer blocks: gfx-triage
Blocks: 1709600
Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: