Closed Bug 1409176 Opened 2 years ago Closed 2 years ago

Crash in mozilla::layers::SyncObjectD3D11Client::Init

Categories

(Core :: Graphics, defect, P2, critical)

57 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: philipp, Assigned: jerry)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-2f803393-bf2b-449a-af89-bbd1b0171016.
=============================================================
Crashing Thread (95), Name: MediaPDecoder #6
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::layers::SyncObjectD3D11Client::Init() 	gfx/layers/d3d11/TextureD3D11.cpp:1739
1 	xul.dll 	mozilla::layers::SyncObjectD3D11Client::Synchronize() 	gfx/layers/d3d11/TextureD3D11.cpp:1777
2 	xul.dll 	mozilla::D3D11DXVA2Manager::CopyToImage(IMFSample*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::Image**) 	dom/media/platforms/wmf/DXVA2Manager.cpp:990
3 	xul.dll 	mozilla::WMFVideoMFTManager::CreateD3DVideoFrame(IMFSample*, __int64, mozilla::VideoData**) 	dom/media/platforms/wmf/WMFVideoMFTManager.cpp:1074
4 	xul.dll 	mozilla::WMFVideoMFTManager::Output(__int64, RefPtr<mozilla::MediaData>&) 	dom/media/platforms/wmf/WMFVideoMFTManager.cpp:1204
5 	xul.dll 	mozilla::WMFMediaDataDecoder::ProcessOutput(nsTArray<RefPtr<mozilla::MediaData> >&) 	dom/media/platforms/wmf/WMFMediaDataDecoder.cpp:154
6 	xul.dll 	mozilla::WMFMediaDataDecoder::ProcessDecode(mozilla::MediaRawData*) 	dom/media/platforms/wmf/WMFMediaDataDecoder.cpp:142
7 	xul.dll 	mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, 1>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, 1> > ( mozilla::TheoraDecoder::*)(mozilla::MediaRawData*), mozilla::TheoraDecoder, mozilla::MediaRawData*>::Run() 	xpcom/threads/MozPromise.h:1395
8 	xul.dll 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:246
9 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:226
10 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1039
11 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:521
12 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
13 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
14 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
15 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:427
16 	nss3.dll 	PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
17 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
18 	ucrtbase.dll 	thread_start<unsigned int (__cdecl*)(void* __ptr64)> 	
19 	kernel32.dll 	BaseThreadInitThunk 	
20 	ntdll.dll 	RtlUserThreadStart

this low-level volume crash signature is regressing in firefox 57 on windows - most of them with MOZ_CRASH(GFX: Cannot get D3D11 KeyedMutex). perhaps related to bug 1357299?
Looks media related, Anthony, can you help triage?
Flags: needinfo?(ajones)
Blake - can you find someone to take a look at this? We have 5 reports in the last week on Windows 10.
Flags: needinfo?(ajones) → needinfo?(bwu)
Peter,
Thanks for your asking on slack. We do need your help on this bug. :-)
Please feel free to ping me back if you find the root cause is in decoding pipeline.
Flags: needinfo?(bwu) → needinfo?(howareyou322)
Priority: -- → P2
Whiteboard: [gfx-noted]
Found the following error on the crash report and the corresponding error mapping from MSDN. 
It looks like the device reset happened on user's environment and it was unable to query the KeyedMutex.

Jerry, it looks like the video didn't notice the device reset and caused this crash. I think it is worth to expose gfx's D3D11 wrapper API for video which involves the device reset checking logic. How do you think? 

https://crash-stats.mozilla.com/report/index/5ef9ecc3-a188-4157-bcef-71b820171024#tab-metadata
[402][GFX1-]: Failed to OpenSharedResource for SyncObjectD3D11: 0x80070057 (t=9131.96) |
[403][GFX1-]: Failed to OpenSharedResource for SyncObjectD3D11: 0x80070057 (t=9131.96) |
[404][GFX1]: Failed to get KeyedMutex (2): 0x80004002 (t=9132.25) |
[405][GFX1-]: (gfxWindowsPlatform) Detected device reset: 7 (t=9132.33) 

E_INVALIDARG	One or more arguments are not valid	0x80070057
E_NOINTERFACE	No such interface supported	0x80004002
Flags: needinfo?(howareyou322) → needinfo?(hshih)
I would like to forward the error to decoder instead of using the gfx_critical_error(). The decoder has the mechanism to recreate itself for some error. We could set the corresponding error code in decoder to create a new decoder. Then, the decoder could get the latest d3d device which is set by our device-reset flow.
Assignee: nobody → hshih
Status: NEW → ASSIGNED
Flags: needinfo?(hshih)
Currently, the device-reset flow doesn't notify the decoder for device change
immediately. The decoder might use an invalid sync-object for synchronization.
Then, we will hit some assertion.
This patch try to make the synchronization flow become fallible, then pass
the error code to the media framework.

MozReview-Commit-ID: ELDFV75Z3Ff
Currently, the device-reset flow doesn't notify the decoder for device change
immediately. The decoder might use an invalid sync-object for synchronization.
Then, we will hit some assertions.
This patch try to make the synchronization flow become fallible, then we could
pass the error to the media framework for decoder recreation.

MozReview-Commit-ID: BFY32MmOdt0
Attachment #8924498 - Flags: review?(dvander)
Attachment #8924139 - Attachment is obsolete: true
The DXVA decoder might hit some error because of the hardware device status.
This patch try to pass the error code to the decoder promise object, then
try to recreate a new decoder for DXGI_ERROR_DEVICE_RESET error.

MozReview-Commit-ID: IAj8gzKGF3j
Attachment #8924499 - Flags: review?(jyavenard)
Comment on attachment 8924498 [details] [diff] [review]
P1: make SyncObjectD3D11Client become fallible.

Review of attachment 8924498 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/SyncObject.h
@@ +60,5 @@
>    };
>  
>    virtual SyncType GetSyncType() = 0;
>  
> +  // Return false for synchronization failed.

for failed synchronization
Comment on attachment 8924499 [details] [diff] [review]
P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode().

Review of attachment 8924499 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

However I'm concerned with the potential unlimited re-creation of decoder should the problem always occur.

If you think it doesn't matter than sure...

Asking Matt for his opinion on the matter.

::: dom/media/platforms/wmf/DXVA2Manager.cpp
@@ +928,5 @@
>    RefPtr<TextureClient> client = image->GetTextureClient(ImageBridgeChild::GetSingleton().get());
>    NS_ENSURE_TRUE(client, E_FAIL);
>  
>    RefPtr<IDXGIKeyedMutex> mutex;
> +  HRESULT hr = S_OK;

not technically required as it will be set in AutoTextureLock, but I guess it's clearer that way

@@ +987,5 @@
>      // It appears some race-condition may allow us to arrive here even when mSyncObject
>      // is null. It's better to avoid that crash.
>      client->SyncWithObject(mSyncObject);
> +    if (!mSyncObject->Synchronize(true)) {
> +      return DXGI_ERROR_DEVICE_RESET;

is a 10s timeout value always related to a device reset?

10s sounds like a damn long time, could it be possible to lower the value?

::: dom/media/platforms/wmf/WMFMediaDataDecoder.cpp
@@ +119,2 @@
>    return DecodePromise::CreateAndReject(
> +    MediaResult((aError == DXGI_ERROR_DEVICE_RESET) ? NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER

Does the decoder need to be re-created?

when a decode error is returned, it will automatically retry up to 3 times again, before it aborts and no longer attempt to decode.

When returning NEED_NEW_DECODER however, there's no limit to how many times it will happen.

So this could potentially cause a regression in that the video won't play, yet no error is ever returned to the user, it will continue to try again and again, re-creating a new decoder each time.
Attachment #8924499 - Flags: review?(matt.woodrow)
Attachment #8924499 - Flags: review?(jyavenard)
Attachment #8924499 - Flags: review+
Comment on attachment 8924498 [details] [diff] [review]
P1: make SyncObjectD3D11Client become fallible.

Review of attachment 8924498 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +1802,5 @@
>    }
>  
>    RefPtr<ID3D11DeviceContext> ctx;
>    dev->GetImmediateContext(getter_AddRefs(ctx));
> +  if (ctx) {

I don't think GetImmediateContext can give you a nullptr.
Attachment #8924498 - Flags: review?(dvander) → review+
Comment on attachment 8924499 [details] [diff] [review]
P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode().

Review of attachment 8924499 [details] [diff] [review]:
-----------------------------------------------------------------

I agree with jya, we already have retrying for normal errors, we shouldn't use NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER unless we have something else that will prevent it from retrying forever.

The GPU process decoder uses this because we have code in other places to disable the GPU process if it fails too much, so we don't need the media code to add limits for us.

r=me with that changed.
Attachment #8924499 - Flags: review?(matt.woodrow) → review+
Blocks: 1388995
remove the GetImmediateContext ctx checking.
Attachment #8924815 - Flags: review+
Attachment #8924498 - Attachment is obsolete: true
Just return NS_ERROR_DOM_MEDIA_DECODE_ERR for dxva decoder error.
The decoder recreation is the future work.
Attachment #8924816 - Flags: review+
Attachment #8924499 - Attachment is obsolete: true
Comment on attachment 8924499 [details] [diff] [review]
P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode().

Review of attachment 8924499 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/platforms/wmf/DXVA2Manager.cpp
@@ +987,5 @@
>      // It appears some race-condition may allow us to arrive here even when mSyncObject
>      // is null. It's better to avoid that crash.
>      client->SyncWithObject(mSyncObject);
> +    if (!mSyncObject->Synchronize(true)) {
> +      return DXGI_ERROR_DEVICE_RESET;

It's usually failed before the waiting call. And this code is shared between here and the regular web content rendering. So, I would prefer to use the original timeout value.
Duplicate of this bug: 1346267
Pushed by hshih@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/00b5d1cc54d3
make SyncObjectD3D11Client become fallible. r=dvander
https://hg.mozilla.org/integration/mozilla-inbound/rev/382939cc1ff5
handle decoder error in WMFMediaDataDecoder::ProcessDecode(). r=jya,mattwoodrow
https://hg.mozilla.org/mozilla-central/rev/00b5d1cc54d3
https://hg.mozilla.org/mozilla-central/rev/382939cc1ff5
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Depends on: 1416473
See Also: → 1417287
See Also: 1417287
You need to log in before you can comment on or make changes to this bug.