Closed Bug 1374254 Opened 8 years ago Closed 8 years ago

Crash in RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | CUseCountedObject<T>::UCReleaseUse

Categories

(Core :: Graphics: Layers, defect, P2)

55 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: philipp, Assigned: vliu)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-e15ee2ed-a9b1-4d08-aa97-83bba0170619. ============================================================= Crashing Thread (22), Name: Compositor Frame Module Signature Source 0 ntdll.dll RtlpWaitOnCriticalSection 1 ntdll.dll RtlpEnterCriticalSectionContended 2 ntdll.dll RtlEnterCriticalSection 3 d3d11.dll CUseCountedObject<NOutermost::CDeviceChild>::UCReleaseUse() 4 d3d11.dll NDXGI::CDeviceChild<IDXGISurface, IUnknown>::FinalRelease() 5 d3d11.dll CUseCountedObject<NOutermost::CDeviceChild>::UCDestroy() 6 d3d11.dll TOptImmediateContext::AcquireDevCtxIfaceNoSync(bool) 7 d3d11.dll CDevCtxInterface::CDevCtxInterface<CContext>(CContext*, bool, bool, bool) 8 d3d11.dll NDXGI::CDevice::RotateResourceIdentities(IDXGIResource*, IDXGIResource* const*, unsigned int) 9 dxgi.dll CDXGISwapChain::PresentImplCore(SPresentArgs const*, unsigned int, tagRECT const*, unsigned int, DXGI_SCROLL_RECT const*, IDXGIResource*, bool&, bool&, bool&) 10 dxgi.dll CDXGISwapChain::Present(unsigned int, unsigned int) 11 xul.dll mozilla::layers::CompositorD3D11::Present() gfx/layers/d3d11/CompositorD3D11.cpp:1187 12 xul.dll mozilla::layers::CompositorD3D11::EndFrame() gfx/layers/d3d11/CompositorD3D11.cpp:1123 13 xul.dll mozilla::layers::LayerManagerComposite::Render(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&) gfx/layers/composite/LayerManagerComposite.cpp:971 14 xul.dll mozilla::layers::LayerManagerComposite::UpdateAndRender() gfx/layers/composite/LayerManagerComposite.cpp:511 15 xul.dll mozilla::layers::LayerManagerComposite::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/composite/LayerManagerComposite.cpp:435 16 xul.dll mozilla::layers::CompositorBridgeParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) gfx/layers/ipc/CompositorBridgeParent.cpp:1013 17 xul.dll mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp) gfx/layers/ipc/CompositorVsyncScheduler.cpp:258 18 xul.dll mozilla::detail::RunnableMethodImpl<SoftwareDisplay* const, void ( SoftwareDisplay::*)(mozilla::TimeStamp), 1, 1, mozilla::TimeStamp>::Run() obj-firefox/dist/include/nsThreadUtils.h:1133 19 xul.dll MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) ipc/chromium/src/base/message_loop.cc:361 20 xul.dll MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) ipc/chromium/src/base/message_loop.cc:369 21 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc:444 22 xul.dll base::MessagePumpForUI::DoRunLoop() ipc/chromium/src/base/message_pump_win.cc:212 23 xul.dll base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) ipc/chromium/src/base/message_pump_win.cc:56 24 xul.dll base::MessagePumpWin::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_win.h:80 25 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 26 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 27 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc:180 28 xul.dll `anonymous namespace'::ThreadFunc ipc/chromium/src/base/platform_thread_win.cc:28 29 kernel32.dll BaseThreadInitThunk 30 mozglue.dll patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:809 31 ntdll.dll __RtlUserThreadStart 32 ntdll.dll _RtlUserThreadStart this signature is regressing in volume since 55.0a1 build 20170604030205 and subsequent versions. crash reports are coming from users on windows 10 & 8.1 and with all different gpu vendors. the signature is currently accounting for > 1% of browser crashes in early data from the 55.0b staged rollout.
Looks like a driver/dxgi bug occurring on `swapchain->Present()` call.
Severity: critical → major
Priority: -- → P2
Whiteboard: [gfx-noted]
Milan, is there anything we can do here? /me apologizes for not knowing what dxgi is :)
Flags: needinfo?(milan)
Given the timing of when the crashes spiked, I wouldn't be surprised if patches on bug 1377324 end up fixing it. Just got r+, let's see if it helps.
There were still a couple of crashes on nightly after bug 1377324 landed, but the volume on nightly is pretty low overall. Do we need to wait until that's on beta?
(In reply to Julien Cristau [:jcristau] from comment #4) > There were still a couple of crashes on nightly after bug 1377324 landed, > but the volume on nightly is pretty low overall. Do we need to wait until > that's on beta? Yeah, it doesn't look like it was related; this is probably another of the "driver reset" ones, actively worked on in bug 1351349.
Flags: needinfo?(milan)
See Also: → 1351349
Add kevin/vincent to monitor this issue
(In reply to Peter Chang[:pchang] from comment #6) > Add kevin/vincent to monitor this issue From many of the Crash Signatures indicate that they crashed in swapchain->present in ::EndFrame(). The remind of me that we just revert device reset check in EndFrame in bug 1356537. From this result, it seems that device-reset checked only in BeginFrame() is not enough. David, do you think if we can try to push back this check to ::EndFrame() to see if the crash rate can drop or not?
Assignee: nobody → vliu
Flags: needinfo?(dvander)
These are Windows 10 crashes in the UI process. That should never happen. It sounds like we're using D3D11 in the UI after a device reset.
Flags: needinfo?(dvander)
Oh, I see - these sessions are all not getting E10S, so the GPU process is disabled. That sucks. @Vincent, if the timing of the EndFrame removal check coincides with this crash spiking - sure, I'd be in favor of re-adding the aggressive TDR checks. Could we gate them on XRE_IsParentProcess() this time?
(In reply to David Anderson [:dvander] from comment #9) > Oh, I see - these sessions are all not getting E10S, so the GPU process is > disabled. That sucks. > > @Vincent, if the timing of the EndFrame removal check coincides with this > crash spiking - sure, I'd be in favor of re-adding the aggressive TDR > checks. Could we gate them on XRE_IsParentProcess() this time? Yes, from filtering data Telemetry, it is confirmed that all crash reports were run without E10S enabled and GPU process were unavailable. Based on this. it is worth re-adding TDR checks in ::EndFrame().
Attachment #8889340 - Flags: review?(dvander)
Comment on attachment 8889340 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch Review of attachment 8889340 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d11/CompositorD3D11.cpp @@ +1105,5 @@ > return; > } > > + if (XRE_IsParentProcess() && > + mDevice->GetDeviceRemovedReason() != S_OK) { nit: this condition fits on one line, and the body of this |if| should have two-space indents.
Attachment #8889340 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #11) > Comment on attachment 8889340 [details] [diff] [review] > 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch > > Review of attachment 8889340 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/d3d11/CompositorD3D11.cpp > @@ +1105,5 @@ > > return; > > } > > > > + if (XRE_IsParentProcess() && > > + mDevice->GetDeviceRemovedReason() != S_OK) { > > nit: this condition fits on one line, and the body of this |if| should have > two-space indents. Thanks for the suggestion. Carried r+ed with nit.
Attachment #8889340 - Attachment is obsolete: true
Pushed by vliu@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/01828d347c22 Re-adding the aggressive TDR checks in D3D11 EndFrame(). r=dvander
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
thanks for the patch, please request uplift to beta if you deem fit to do so! (the crash is less frequent on nightly so it won't be possible to verify if this worked by looking at 56.0a1 crash stats)
Flags: needinfo?(vliu)
Comment on attachment 8890167 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch Approval Request Comment [Feature/Bug causing the regression]: No. [User impact if declined]: Crash [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: No because lower crash rate in Nightly [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: [Is the change risky?]: No [Why is the change risky/not risky?]: Only adding abnormal checking mechanism [String changes made/needed]:
Flags: needinfo?(vliu)
Attachment #8890167 - Flags: approval-mozilla-beta?
Comment on attachment 8890167 [details] [diff] [review] 0001-Bug-1374254-Re-adding-the-aggressive-TDR-checks-in-D.patch graphics crash fix with device reset, beta55+
Attachment #8890167 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Vincent Liu[:vliu] from comment #17) > [Is this code covered by automated tests?]: No > [Has the fix been verified in Nightly?]: No because lower crash rate in > Nightly > [Needs manual test from QE? If yes, steps to reproduce]: No Setting qe-verify- based on Vincent's assessment on manual testing needs.
Flags: qe-verify-
(In reply to [:philipp] from comment #21) > crash reports with this signature are still around in 55.0b13 which got the > patch: > https://crash-stats.mozilla.com/signature/ > ?product=Firefox&build_id=%3E%3D20170727114534&signature=RtlpWaitOnCriticalSe > ction%20%7C%20RtlpEnterCriticalSectionContended%20%7C%20RtlEnterCriticalSecti > on%20%7C%20CUseCountedObject%3CT%3E%3A%3AUCReleaseUse#reports yes, it still crashed in Present() even adding TDR check in the beginning of ::EndFrame(). Most of signatures were showed that they got device reset many times before crashing, showing that device-reset happens after TDR checks and before Gecko called Present(). I'd ever tried to filter all signatures about this crash from: 1. The first signature it counted was on build id 20170604030205. At this time 55 still in Nightly build. 2. The first signature counted to 18 was on build id 20170612224034 Looked into Pushlog to see all comments around the above of 2 check point but it is hard to see any root cause. All were e10s disabled and almost were on Windows10. Until now there is no this crash happens on FF56, probably it still in Nightly build. Anyway, I will keep monitoring to see if there is any way to do more.
Flags: needinfo?(vliu)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: