Closed Bug 1409176 Opened 8 years ago Closed 8 years ago

Crash in mozilla::layers::SyncObjectD3D11Client::Init

Categories

(Core :: Graphics, defect, P2)

57 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: philipp, Assigned: jerry)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files, 3 obsolete files)

This bug was filed from the Socorro interface and is report bp-2f803393-bf2b-449a-af89-bbd1b0171016. ============================================================= Crashing Thread (95), Name: MediaPDecoder #6 Frame Module Signature Source 0 xul.dll mozilla::layers::SyncObjectD3D11Client::Init() gfx/layers/d3d11/TextureD3D11.cpp:1739 1 xul.dll mozilla::layers::SyncObjectD3D11Client::Synchronize() gfx/layers/d3d11/TextureD3D11.cpp:1777 2 xul.dll mozilla::D3D11DXVA2Manager::CopyToImage(IMFSample*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::Image**) dom/media/platforms/wmf/DXVA2Manager.cpp:990 3 xul.dll mozilla::WMFVideoMFTManager::CreateD3DVideoFrame(IMFSample*, __int64, mozilla::VideoData**) dom/media/platforms/wmf/WMFVideoMFTManager.cpp:1074 4 xul.dll mozilla::WMFVideoMFTManager::Output(__int64, RefPtr<mozilla::MediaData>&) dom/media/platforms/wmf/WMFVideoMFTManager.cpp:1204 5 xul.dll mozilla::WMFMediaDataDecoder::ProcessOutput(nsTArray<RefPtr<mozilla::MediaData> >&) dom/media/platforms/wmf/WMFMediaDataDecoder.cpp:154 6 xul.dll mozilla::WMFMediaDataDecoder::ProcessDecode(mozilla::MediaRawData*) dom/media/platforms/wmf/WMFMediaDataDecoder.cpp:142 7 xul.dll mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, 1>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, 1> > ( mozilla::TheoraDecoder::*)(mozilla::MediaRawData*), mozilla::TheoraDecoder, mozilla::MediaRawData*>::Run() xpcom/threads/MozPromise.h:1395 8 xul.dll mozilla::TaskQueue::Runner::Run() xpcom/threads/TaskQueue.cpp:246 9 xul.dll nsThreadPool::Run() xpcom/threads/nsThreadPool.cpp:226 10 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1039 11 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:521 12 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 13 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 14 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 15 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:427 16 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 17 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 18 ucrtbase.dll thread_start<unsigned int (__cdecl*)(void* __ptr64)> 19 kernel32.dll BaseThreadInitThunk 20 ntdll.dll RtlUserThreadStart this low-level volume crash signature is regressing in firefox 57 on windows - most of them with MOZ_CRASH(GFX: Cannot get D3D11 KeyedMutex). perhaps related to bug 1357299?
Looks media related, Anthony, can you help triage?
Flags: needinfo?(ajones)
Blake - can you find someone to take a look at this? We have 5 reports in the last week on Windows 10.
Flags: needinfo?(ajones) → needinfo?(bwu)
Peter, Thanks for your asking on slack. We do need your help on this bug. :-) Please feel free to ping me back if you find the root cause is in decoding pipeline.
Flags: needinfo?(bwu) → needinfo?(howareyou322)
Priority: -- → P2
Whiteboard: [gfx-noted]
Found the following error on the crash report and the corresponding error mapping from MSDN. It looks like the device reset happened on user's environment and it was unable to query the KeyedMutex. Jerry, it looks like the video didn't notice the device reset and caused this crash. I think it is worth to expose gfx's D3D11 wrapper API for video which involves the device reset checking logic. How do you think? https://crash-stats.mozilla.com/report/index/5ef9ecc3-a188-4157-bcef-71b820171024#tab-metadata [402][GFX1-]: Failed to OpenSharedResource for SyncObjectD3D11: 0x80070057 (t=9131.96) | [403][GFX1-]: Failed to OpenSharedResource for SyncObjectD3D11: 0x80070057 (t=9131.96) | [404][GFX1]: Failed to get KeyedMutex (2): 0x80004002 (t=9132.25) | [405][GFX1-]: (gfxWindowsPlatform) Detected device reset: 7 (t=9132.33) E_INVALIDARG One or more arguments are not valid 0x80070057 E_NOINTERFACE No such interface supported 0x80004002
Flags: needinfo?(howareyou322) → needinfo?(hshih)
I would like to forward the error to decoder instead of using the gfx_critical_error(). The decoder has the mechanism to recreate itself for some error. We could set the corresponding error code in decoder to create a new decoder. Then, the decoder could get the latest d3d device which is set by our device-reset flow.
Assignee: nobody → hshih
Status: NEW → ASSIGNED
Flags: needinfo?(hshih)
Currently, the device-reset flow doesn't notify the decoder for device change immediately. The decoder might use an invalid sync-object for synchronization. Then, we will hit some assertion. This patch try to make the synchronization flow become fallible, then pass the error code to the media framework. MozReview-Commit-ID: ELDFV75Z3Ff
Currently, the device-reset flow doesn't notify the decoder for device change immediately. The decoder might use an invalid sync-object for synchronization. Then, we will hit some assertions. This patch try to make the synchronization flow become fallible, then we could pass the error to the media framework for decoder recreation. MozReview-Commit-ID: BFY32MmOdt0
Attachment #8924498 - Flags: review?(dvander)
Attachment #8924139 - Attachment is obsolete: true
The DXVA decoder might hit some error because of the hardware device status. This patch try to pass the error code to the decoder promise object, then try to recreate a new decoder for DXGI_ERROR_DEVICE_RESET error. MozReview-Commit-ID: IAj8gzKGF3j
Attachment #8924499 - Flags: review?(jyavenard)
Comment on attachment 8924498 [details] [diff] [review] P1: make SyncObjectD3D11Client become fallible. Review of attachment 8924498 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/SyncObject.h @@ +60,5 @@ > }; > > virtual SyncType GetSyncType() = 0; > > + // Return false for synchronization failed. for failed synchronization
Comment on attachment 8924499 [details] [diff] [review] P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode(). Review of attachment 8924499 [details] [diff] [review]: ----------------------------------------------------------------- LGTM. However I'm concerned with the potential unlimited re-creation of decoder should the problem always occur. If you think it doesn't matter than sure... Asking Matt for his opinion on the matter. ::: dom/media/platforms/wmf/DXVA2Manager.cpp @@ +928,5 @@ > RefPtr<TextureClient> client = image->GetTextureClient(ImageBridgeChild::GetSingleton().get()); > NS_ENSURE_TRUE(client, E_FAIL); > > RefPtr<IDXGIKeyedMutex> mutex; > + HRESULT hr = S_OK; not technically required as it will be set in AutoTextureLock, but I guess it's clearer that way @@ +987,5 @@ > // It appears some race-condition may allow us to arrive here even when mSyncObject > // is null. It's better to avoid that crash. > client->SyncWithObject(mSyncObject); > + if (!mSyncObject->Synchronize(true)) { > + return DXGI_ERROR_DEVICE_RESET; is a 10s timeout value always related to a device reset? 10s sounds like a damn long time, could it be possible to lower the value? ::: dom/media/platforms/wmf/WMFMediaDataDecoder.cpp @@ +119,2 @@ > return DecodePromise::CreateAndReject( > + MediaResult((aError == DXGI_ERROR_DEVICE_RESET) ? NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER Does the decoder need to be re-created? when a decode error is returned, it will automatically retry up to 3 times again, before it aborts and no longer attempt to decode. When returning NEED_NEW_DECODER however, there's no limit to how many times it will happen. So this could potentially cause a regression in that the video won't play, yet no error is ever returned to the user, it will continue to try again and again, re-creating a new decoder each time.
Attachment #8924499 - Flags: review?(matt.woodrow)
Attachment #8924499 - Flags: review?(jyavenard)
Attachment #8924499 - Flags: review+
Comment on attachment 8924498 [details] [diff] [review] P1: make SyncObjectD3D11Client become fallible. Review of attachment 8924498 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d11/TextureD3D11.cpp @@ +1802,5 @@ > } > > RefPtr<ID3D11DeviceContext> ctx; > dev->GetImmediateContext(getter_AddRefs(ctx)); > + if (ctx) { I don't think GetImmediateContext can give you a nullptr.
Attachment #8924498 - Flags: review?(dvander) → review+
Comment on attachment 8924499 [details] [diff] [review] P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode(). Review of attachment 8924499 [details] [diff] [review]: ----------------------------------------------------------------- I agree with jya, we already have retrying for normal errors, we shouldn't use NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER unless we have something else that will prevent it from retrying forever. The GPU process decoder uses this because we have code in other places to disable the GPU process if it fails too much, so we don't need the media code to add limits for us. r=me with that changed.
Attachment #8924499 - Flags: review?(matt.woodrow) → review+
Blocks: 1388995
remove the GetImmediateContext ctx checking.
Attachment #8924815 - Flags: review+
Attachment #8924498 - Attachment is obsolete: true
Just return NS_ERROR_DOM_MEDIA_DECODE_ERR for dxva decoder error. The decoder recreation is the future work.
Attachment #8924816 - Flags: review+
Attachment #8924499 - Attachment is obsolete: true
Comment on attachment 8924499 [details] [diff] [review] P2: handle decoder error in WMFMediaDataDecoder::ProcessDecode(). Review of attachment 8924499 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/platforms/wmf/DXVA2Manager.cpp @@ +987,5 @@ > // It appears some race-condition may allow us to arrive here even when mSyncObject > // is null. It's better to avoid that crash. > client->SyncWithObject(mSyncObject); > + if (!mSyncObject->Synchronize(true)) { > + return DXGI_ERROR_DEVICE_RESET; It's usually failed before the waiting call. And this code is shared between here and the regular web content rendering. So, I would prefer to use the original timeout value.
Pushed by hshih@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/00b5d1cc54d3 make SyncObjectD3D11Client become fallible. r=dvander https://hg.mozilla.org/integration/mozilla-inbound/rev/382939cc1ff5 handle decoder error in WMFMediaDataDecoder::ProcessDecode(). r=jya,mattwoodrow
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Depends on: 1416473
See Also: → 1417287
See Also: 1417287
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: