Crash in [@ mozilla::layers::LockD3DTexture<T>]
Categories
(Core :: Graphics: Layers, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox87 | --- | wontfix |
firefox88 | --- | wontfix |
firefox89 | --- | fixed |
People
(Reporter: sefeng, Assigned: bobowen)
References
(Depends on 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
This bug is for crash report bp-bafc3b28-7bbb-42e1-b1b7-954c80200813.
Top 10 frames of crashing thread:
0 xul.dll CrashStatsLogForwarder::CrashAction gfx/thebes/gfxPlatform.cpp:409
1 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush gfx/2d/Logging.h:277
2 xul.dll mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log gfx/2d/Logging.h:270
3 xul.dll mozilla::layers::LockD3DTexture<ID3D11Texture2D> gfx/layers/d3d11/TextureD3D11.cpp
4 xul.dll mozilla::layers::D3D11TextureData::Lock gfx/layers/d3d11/TextureD3D11.cpp:309
5 xul.dll mozilla::layers::CanvasTranslator::HandleExtensionEvent gfx/layers/ipc/CanvasTranslator.cpp:317
6 xul.dll mozilla::layers::CanvasTranslator::TranslateRecording gfx/layers/ipc/CanvasTranslator.cpp:266
7 xul.dll mozilla::layers::CanvasTranslator::StartTranslation gfx/layers/ipc/CanvasTranslator.cpp:162
8 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1240
9 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:158
This seems to be something that we've fixed before, but it's back, and the number of crashes isn't going to the right direction!
Comment 1•4 years ago
|
||
Monitoring this, but seeing if we can delay as we are trying to deprecate this code path in the future.
Comment 2•4 years ago
|
||
We call IDXGIKeyedMutex::AcquireSync in a lot of places, but almost none of them seem to handle WAIT_TIMEOUT and WAIT_ABANDONED correctly.
https://docs.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgikeyedmutex-acquiresync
If we get WAIT_ABANDONED, it probably means the process which holds it crashed (or the thread was shutdown I guess), and we now hold ownership of the lock. It seems clear we are responsible to release it in this case. Doing the wrong thing here might cause lock timeouts elsewhere.
In this particular block of code, we also handle WAIT_TIMEOUT as if we hold the lock, which is definitely not true:
This is because the FAILED/SUCCEEDED macros treat WAIT_TIMEOUT and WAIT_ABANDONED as success since they aren't negative.
Comment 3•3 years ago
|
||
The crash rate (on Nightly) has doubled since last week. Andrew, are you aware of changes which explain this?
Assignee | ||
Comment 5•3 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
:bobowen, can you comment to the bug?
I imagine that this will have gone up because of bug 1696967.
Before we took a snapshot before returning the DrawTarget, which was used to populate the next frames canvas texture, but this causes an extra copy and therefore performance issues on lower end GPUs. It was probably wasteful anyway.
We now have to lock the texture that may be being used in the compositor/renderer to do that copy.
(Unfortunately the keyed mutex doesn't have read/write locks.)
The timeout is 10 secs, so something has probably gone really wrong already (device reset?), but for Lock3DTexture, we gfxDevCrash when in other cases we generally don't.
Maybe we should handle this more like we do here for example.
We could also look to moving towards not using textures with keyed mutexes in the remote canvas code.
I've been thinking about this a fair bit, it will be quite a bit more work, but probably not too bad and may well have other performance benefits as well.
Comment 6•3 years ago
|
||
This is currently the top GPU process crash in Nightly: http://mathies.com/mozilla/crashes/nightly.html
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
I'll add in what I suggested in comment 5, so we can find out if these are mainly down to device resets.
Assignee | ||
Comment 8•3 years ago
|
||
Assignee | ||
Comment 9•3 years ago
|
||
Comment 10•3 years ago
|
||
Pushed by bobowencode@gmail.com: https://hg.mozilla.org/integration/autoland/rev/3708fca6d466 Don't dev crash in LockD3DTexture when device has been removed. r=jrmuizel
Comment 11•3 years ago
|
||
bugherder |
Assignee | ||
Comment 12•3 years ago
|
||
Filed new bug 1704445, because it looks like we can rule out the theory in comment 5, that these might be down to device resets.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 months ago
|
Description
•