Closed
Bug 1374254
Opened 7 years ago
Closed 7 years ago
Crash in RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | CUseCountedObject<T>::UCReleaseUse
Categories
(Core :: Graphics: Layers, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
firefox56 | --- | fixed |
People
(Reporter: philipp, Assigned: vliu)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(1 file, 1 obsolete file)
928 bytes,
patch
|
jcristau
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-e15ee2ed-a9b1-4d08-aa97-83bba0170619. ============================================================= Crashing Thread (22), Name: Compositor Frame Module Signature Source 0 ntdll.dll RtlpWaitOnCriticalSection 1 ntdll.dll RtlpEnterCriticalSectionContended 2 ntdll.dll RtlEnterCriticalSection 3 d3d11.dll CUseCountedObject<NOutermost::CDeviceChild>::UCReleaseUse() 4 d3d11.dll NDXGI::CDeviceChild<IDXGISurface, IUnknown>::FinalRelease() 5 d3d11.dll CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() 6 d3d11.dll TOptImmediateContext::AcquireDevCtxIfaceNoSync(bool) 7 d3d11.dll CDevCtxInterface::CDevCtxInterface<CContext>(CContext*, bool, bool, bool) 8 d3d11.dll NDXGI::CDevice::RotateResourceIdentities(IDXGIResource*, IDXGIResource* const*, unsigned int) 9 dxgi.dll CDXGISwapChain::PresentImplCore(SPresentArgs const*, unsigned int, tagRECT const*, unsigned int, DXGI_SCROLL_RECT const*, IDXGIResource*, bool&, bool&, bool&) 10 dxgi.dll CDXGISwapChain::Present(unsigned int, unsigned int) 11 xul.dll mozilla::layers::CompositorD3D11::Present() gfx/layers/d3d11/CompositorD3D11.cpp:1187 12 xul.dll mozilla::layers::CompositorD3D11::EndFrame() gfx/layers/d3d11/CompositorD3D11.cpp:1123 13 xul.dll mozilla::layers::LayerManagerComposite::Render(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&) gfx/layers/composite/LayerManagerComposite.cpp:971 14 xul.dll mozilla::layers::LayerManagerComposite::UpdateAndRender() gfx/layers/composite/LayerManagerComposite.cpp:511 15 xul.dll mozilla::layers::LayerManagerComposite::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/composite/LayerManagerComposite.cpp:435 16 xul.dll mozilla::layers::CompositorBridgeParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) gfx/layers/ipc/CompositorBridgeParent.cpp:1013 17 xul.dll mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp) gfx/layers/ipc/CompositorVsyncScheduler.cpp:258 18 xul.dll mozilla::detail::RunnableMethodImpl<SoftwareDisplay* const, void ( SoftwareDisplay::*)(mozilla::TimeStamp), 1, 1, mozilla::TimeStamp>::Run() obj-firefox/dist/include/nsThreadUtils.h:1133 19 xul.dll MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) ipc/chromium/src/base/message_loop.cc:361 20 xul.dll MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) ipc/chromium/src/base/message_loop.cc:369 21 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc:444 22 xul.dll base::MessagePumpForUI::DoRunLoop() ipc/chromium/src/base/message_pump_win.cc:212 23 xul.dll base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) ipc/chromium/src/base/message_pump_win.cc:56 24 xul.dll base::MessagePumpWin::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_win.h:80 25 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 26 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 27 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc:180 28 xul.dll `anonymous namespace'::ThreadFunc ipc/chromium/src/base/platform_thread_win.cc:28 29 kernel32.dll BaseThreadInitThunk 30 mozglue.dll patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:809 31 ntdll.dll __RtlUserThreadStart 32 ntdll.dll _RtlUserThreadStart this signature is regressing in volume since 55.0a1 build 20170604030205 and subsequent versions. crash reports are coming from users on windows 10 & 8.1 and with all different gpu vendors. the signature is currently accounting for > 1% of browser crashes in early data from the 55.0b staged rollout.
Comment 1•7 years ago
|
||
Looks like a driver/dxgi bug occurring on `swapchain->Present()` call.
Severity: critical → major
Priority: -- → P2
Whiteboard: [gfx-noted]
Comment 2•7 years ago
|
||
Milan, is there anything we can do here? /me apologizes for not knowing what dxgi is :)
Flags: needinfo?(milan)
Given the timing of when the crashes spiked, I wouldn't be surprised if patches on bug 1377324 end up fixing it. Just got r+, let's see if it helps.
Comment 4•7 years ago
|
||
There were still a couple of crashes on nightly after bug 1377324 landed, but the volume on nightly is pretty low overall. Do we need to wait until that's on beta?
(In reply to Julien Cristau [:jcristau] from comment #4) > There were still a couple of crashes on nightly after bug 1377324 landed, > but the volume on nightly is pretty low overall. Do we need to wait until > that's on beta? Yeah, it doesn't look like it was related; this is probably another of the "driver reset" ones, actively worked on in bug 1351349.
Flags: needinfo?(milan)
See Also: → 1351349
Updated•7 years ago
|
Comment 6•7 years ago
|
||
Add kevin/vincent to monitor this issue
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Peter Chang[:pchang] from comment #6) > Add kevin/vincent to monitor this issue From many of the Crash Signatures indicate that they crashed in swapchain->present in ::EndFrame(). The remind of me that we just revert device reset check in EndFrame in bug 1356537. From this result, it seems that device-reset checked only in BeginFrame() is not enough. David, do you think if we can try to push back this check to ::EndFrame() to see if the crash rate can drop or not?
Assignee: nobody → vliu
Flags: needinfo?(dvander)
These are Windows 10 crashes in the UI process. That should never happen. It sounds like we're using D3D11 in the UI after a device reset.
Flags: needinfo?(dvander)
Oh, I see - these sessions are all not getting E10S, so the GPU process is disabled. That sucks. @Vincent, if the timing of the EndFrame removal check coincides with this crash spiking - sure, I'd be in favor of re-adding the aggressive TDR checks. Could we gate them on XRE_IsParentProcess() this time?
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to David Anderson [:dvander] from comment #9) > Oh, I see - these sessions are all not getting E10S, so the GPU process is > disabled. That sucks. > > @Vincent, if the timing of the EndFrame removal check coincides with this > crash spiking - sure, I'd be in favor of re-adding the aggressive TDR > checks. Could we gate them on XRE_IsParentProcess() this time? Yes, from filtering data Telemetry, it is confirmed that all crash reports were run without E10S enabled and GPU process were unavailable. Based on this. it is worth re-adding TDR checks in ::EndFrame().
Attachment #8889340 -
Flags: review?(dvander)
Comment on attachment 8889340 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch Review of attachment 8889340 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d11/CompositorD3D11.cpp @@ +1105,5 @@ > return; > } > > + if (XRE_IsParentProcess() && > + mDevice->GetDeviceRemovedReason() != S_OK) { nit: this condition fits on one line, and the body of this |if| should have two-space indents.
Attachment #8889340 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to David Anderson [:dvander] from comment #11) > Comment on attachment 8889340 [details] [diff] [review] > 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch > > Review of attachment 8889340 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/d3d11/CompositorD3D11.cpp > @@ +1105,5 @@ > > return; > > } > > > > + if (XRE_IsParentProcess() && > > + mDevice->GetDeviceRemovedReason() != S_OK) { > > nit: this condition fits on one line, and the body of this |if| should have > two-space indents. Thanks for the suggestion. Carried r+ed with nit.
Attachment #8889340 -
Attachment is obsolete: true
Assignee | ||
Comment 13•7 years ago
|
||
try result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6af5ea31571e8c475e322ea20c70e0a078db5ed0&selectedJob=117321019
Comment 14•7 years ago
|
||
Pushed by vliu@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/01828d347c22 Re-adding the aggressive TDR checks in D3D11 EndFrame(). r=dvander
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/01828d347c22
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Reporter | ||
Comment 16•7 years ago
|
||
thanks for the patch, please request uplift to beta if you deem fit to do so! (the crash is less frequent on nightly so it won't be possible to verify if this worked by looking at 56.0a1 crash stats)
Flags: needinfo?(vliu)
Assignee | ||
Comment 17•7 years ago
|
||
Comment on attachment 8890167 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch Approval Request Comment [Feature/Bug causing the regression]: No. [User impact if declined]: Crash [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: No because lower crash rate in Nightly [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: [Is the change risky?]: No [Why is the change risky/not risky?]: Only adding abnormal checking mechanism [String changes made/needed]:
Flags: needinfo?(vliu)
Attachment #8890167 -
Flags: approval-mozilla-beta?
Comment 18•7 years ago
|
||
Comment on attachment 8890167 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch graphics crash fix with device reset, beta55+
Attachment #8890167 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 19•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/9b0a6513d07e
Comment 20•7 years ago
|
||
(In reply to Vincent Liu[:vliu] from comment #17) > [Is this code covered by automated tests?]: No > [Has the fix been verified in Nightly?]: No because lower crash rate in > Nightly > [Needs manual test from QE? If yes, steps to reproduce]: No Setting qe-verify- based on Vincent's assessment on manual testing needs.
Flags: qe-verify-
Reporter | ||
Comment 21•7 years ago
|
||
crash reports with this signature are still around in 55.0b13 which got the patch: https://crash-stats.mozilla.com/signature/?product=Firefox&build_id=%3E%3D20170727114534&signature=RtlpWaitOnCriticalSection%20%7C%20RtlpEnterCriticalSectionContended%20%7C%20RtlEnterCriticalSection%20%7C%20CUseCountedObject%3CT%3E%3A%3AUCReleaseUse#reports
Flags: needinfo?(vliu)
Assignee | ||
Comment 22•7 years ago
|
||
(In reply to [:philipp] from comment #21) > crash reports with this signature are still around in 55.0b13 which got the > patch: > https://crash-stats.mozilla.com/signature/ > ?product=Firefox&build_id=%3E%3D20170727114534&signature=RtlpWaitOnCriticalSe > ction%20%7C%20RtlpEnterCriticalSectionContended%20%7C%20RtlEnterCriticalSecti > on%20%7C%20CUseCountedObject%3CT%3E%3A%3AUCReleaseUse#reports yes, it still crashed in Present() even adding TDR check in the beginning of ::EndFrame(). Most of signatures were showed that they got device reset many times before crashing, showing that device-reset happens after TDR checks and before Gecko called Present(). I'd ever tried to filter all signatures about this crash from: 1. The first signature it counted was on build id 20170604030205. At this time 55 still in Nightly build. 2. The first signature counted to 18 was on build id 20170612224034 Looked into Pushlog to see all comments around the above of 2 check point but it is hard to see any root cause. All were e10s disabled and almost were on Windows10. Until now there is no this crash happens on FF56, probably it still in Nightly build. Anyway, I will keep monitoring to see if there is any way to do more.
Flags: needinfo?(vliu)
You need to log in
before you can comment on or make changes to this bug.
Description
•