Closed
Bug 1325933
Opened 7 years ago
Closed 2 years ago
Crash in nvwgf2um.dll | TCLSWrappers<T>::CLSDestroy
Categories
(Core :: Graphics, defect, P3)
Tracking
()
People
(Reporter: philipp, Unassigned)
References
Details
(Keywords: crash, regression, Whiteboard: qa-not-actionable)
Crash Data
This bug was filed from the Socorro interface and is report bp-e08213eb-c688-4b27-bb44-59c772161227.
=============================================================
Crashing Thread (17)
Frame Module Signature Source
Ø 0 nvwgf2um.dll nvwgf2um.dll@0x963305
Ø 1 nvwgf2um.dll nvwgf2um.dll@0x962ca8
Ø 2 nvwgf2um.dll nvwgf2um.dll@0x962469
Ø 3 nvwgf2um.dll nvwgf2um.dll@0x95ff1c
Ø 4 nvwgf2um.dll nvwgf2um.dll@0x9562a4
Ø 5 nvwgf2um.dll nvwgf2um.dll@0x147191
Ø 6 nvwgf2um.dll nvwgf2um.dll@0x623874
Ø 7 nvwgf2um.dll nvwgf2um.dll@0x7cc991
Ø 8 nvwgf2um.dll nvwgf2um.dll@0xd79ce
9 d3d11.dll TCLSWrappers<CTexture2D>::CLSDestroy(CTexture2D::CLS*, CContext*)
10 d3d11.dll NDXGI::CDeviceChild<IDXGIResource1, IDXGISwapChainInternal>::FinalRelease()
11 d3d11.dll CLayeredObjectWithCLS<CTexture2D>::CContainedObject::Release()
12 xul.dll mozilla::layers::D3D11TextureData::`scalar deleting destructor'(unsigned int)
13 xul.dll mozilla::layers::DestroyTextureData gfx/layers/client/TextureClient.cpp:247
14 xul.dll mozilla::layers::TextureChild::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) gfx/layers/client/TextureClient.cpp:256
15 xul.dll mozilla::layers::PLayerParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) obj-firefox/ipc/ipdl/PLayerParent.cpp:267
16 xul.dll mozilla::layers::PTextureChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PTextureChild.cpp:226
17 xul.dll mozilla::layers::PImageBridgeChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PImageBridgeChild.cpp:665
18 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp:1662
19 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) ipc/glue/MessageChannel.cpp:1600
20 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp:1567
21 mozglue.dll arena_dalloc_small memory/mozjemalloc/jemalloc.c:4667
22 xul.dll mozilla::runnable_args_memfn<RefPtr<mozilla::layers::ImageBridgeChild>, void ( mozilla::layers::ImageBridgeChild::*)(RefPtr<mozilla::layers::ImageClient>, RefPtr<mozilla::layers::ImageContainer>), RefPtr<mozilla::layers::ImageClient>, RefPtr<mozilla::layers::ImageContainer> >::`scalar deleting destructor'(unsigned int)
23 xul.dll MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) ipc/chromium/src/base/message_loop.cc:346
24 xul.dll MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) ipc/chromium/src/base/message_loop.cc:354
25 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc:429

Crashes with this signature are regressing in volume since Firefox 51 and later builds - they happen too infrequently on nightly to get to a regression range though. Most of the time they occur in the content process and on systems with build 10.0.14393 of Windows 10 and seem related to DXVA2D3D11 media playback.

Correlations for Firefox Beta
(98.55% in signature vs 00.72% overall) address = 0x14
(100.0% in signature vs 05.80% overall) Module "nvwgf2um.dll" = true
(98.55% in signature vs 03.98% overall) Module "msvproc.dll" = true
(98.55% in signature vs 04.63% overall) "DXVA2D3D11+" in app_notes = true
(98.55% in signature vs 05.82% overall) "DXVA2D3D11?" in app_notes = true
(98.55% in signature vs 07.31% overall) reason = EXCEPTION_ACCESS_VIOLATION_WRITE
(100.0% in signature vs 12.90% overall) adapter_vendor_id = NVIDIA Corporation
(98.55% in signature vs 10.54% overall) Module "mfperfhelper.dll" = true
(98.55% in signature vs 12.00% overall) platform_version = 10.0.14393
(98.55% in signature vs 15.96% overall) platform_pretty_version = Windows 10
(98.55% in signature vs 16.12% overall) Module "RTWorkQ.dll" = true
(98.55% in signature vs 17.54% overall) Module "MSAudDecMFT.dll" = true
(100.0% in signature vs 28.22% overall) Module "xmllite.dll" = true
(81.16% in signature vs 04.62% overall) Module "WMVCORE.DLL" = true
(81.16% in signature vs 04.62% overall) Module "WMASF.DLL" = true
(100.0% in signature vs 32.96% overall) Module "d2d1.dll" = true
(100.0% in signature vs 34.57% overall) "D2D1.1+" in app_notes = true
(100.0% in signature vs 34.57% overall) "DWrite+" in app_notes = true
(100.0% in signature vs 34.58% overall) "DWrite?" in app_notes = true
(98.55% in signature vs 31.89% overall) os_arch = amd64
(79.71% in signature vs 07.74% overall) Module "cabinet.dll" = true
(100.0% in signature vs 42.97% overall) Module "d3d11.dll" = true
(100.0% in signature vs 45.81% overall) Module "dxgi.dll" = true
(79.71% in signature vs 17.28% overall) Module "d3dcompiler_47.dll" = true
(65.22% in signature vs 01.53% overall) Module "nvspcap.dll" = true
(81.16% in signature vs 22.77% overall) Module "qasf.dll" = true
(84.06% in signature vs 27.43% overall) Module "MP3DMOD.DLL" = true
(84.06% in signature vs 28.60% overall) Module "msdmo.dll" = true
(76.81% in signature vs 21.21% overall) Module "winhttp.dll" = true
(82.61% in signature vs 31.75% overall) Module "quartz.dll" = true
(60.87% in signature vs 10.93% overall) Addon "Adblock Plus" = true
(31.88% in signature vs 78.01% overall) Module "winnsi.dll" = true
(36.23% in signature vs 00.84% overall) adapter_driver_version = 21.21.13.7633
(36.23% in signature vs 00.84% overall) adapter_driver_version_clean = 376.33
(33.33% in signature vs 01.46% overall) cpu_microcode_version = 0x1e
(27.54% in signature vs 00.48% overall) adapter_driver_version_clean = 369.09
(27.54% in signature vs 00.48% overall) adapter_driver_version = 21.21.13.6909
(18.84% in signature vs 00.17% overall) adapter_device_id = 0x1401
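For readers unfamiliar with the correlation lines, here is a minimal sketch of how an "in signature vs overall" pair can be computed. It assumes each percentage is simply the share of reports with a given property, once among reports with this signature and once among all reports; that reading is an assumption, not a description of Socorro's actual implementation. The `correlation` helper, the `SAMPLE` data, and the field names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Report = Dict[str, str]


def correlation(
    reports: Iterable[Report],
    in_signature: Callable[[Report], bool],
    has_property: Callable[[Report], bool],
) -> Tuple[float, float]:
    """Return (% of signature reports with the property, % of all reports with it)."""
    all_reports: List[Report] = list(reports)
    signature_reports = [r for r in all_reports if in_signature(r)]

    def share(group: List[Report]) -> float:
        # Guard against empty groups so we never divide by zero.
        if not group:
            return 0.0
        return 100.0 * sum(1 for r in group if has_property(r)) / len(group)

    return share(signature_reports), share(all_reports)


# Made-up sample data, for illustration only (not taken from crash-stats).
SIGNATURE = "nvwgf2um.dll | TCLSWrappers<T>::CLSDestroy"
SAMPLE = [
    {"signature": SIGNATURE, "adapter_vendor_id": "NVIDIA Corporation"},
    {"signature": SIGNATURE, "adapter_vendor_id": "NVIDIA Corporation"},
    {"signature": "other", "adapter_vendor_id": "Intel Corporation"},
    {"signature": "other", "adapter_vendor_id": "Advanced Micro Devices, Inc."},
]

in_sig, overall = correlation(
    SAMPLE,
    lambda r: r["signature"] == SIGNATURE,
    lambda r: r["adapter_vendor_id"] == "NVIDIA Corporation",
)
print(f"({in_sig:.2f}% in signature vs {overall:.2f}% overall) "
      "adapter_vendor_id = NVIDIA Corporation")
```

A large gap between the two numbers (as with address = 0x14 in the description) is what makes a property interesting; a property that is common everywhere says little about this particular crash.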
Comment 1•7 years ago
It looks like a similar signature also occurred on 50.1.0, at lower volume. Peter, could you find someone to look into it?
Flags: needinfo?(howareyou322)
Priority: -- → P3
Comment 2•7 years ago
Kevin, I guess this might be related to bug 1292273. Any thoughts?
Flags: needinfo?(howareyou322) → needinfo?(kechen)
Comment 3•7 years ago
Yes, the behavior in the crash report is similar to bug 1292273. I will look into it and see if I can get more information.
Flags: needinfo?(kechen)
Updated•7 years ago
Assignee: nobody → kechen
Comment 4•7 years ago
According to the graph [1], this crash first appeared around 9/26 on the aurora channel, with a peak. Then the volume increased again from around 12/6 on both the beta and aurora channels, continuing until now.

These are some correlations for this crash, which are similar to bug 1292273:
(100.0% in signature vs 05.09% overall) "DXVA2D3D11+" in app_notes = true
(98.28% in signature vs 00.57% overall) address = 0x14
(98.28% in signature vs 08.25% overall) reason = EXCEPTION_ACCESS_VIOLATION_WRITE
(100.0% in signature vs 13.50% overall) adapter_vendor_id = NVIDIA Corporation
(100.0% in signature vs 16.54% overall) platform_pretty_version = Windows 10
(98.28% in signature vs 34.77% overall) os_arch = amd64

All of the call stacks show the program destroying D3D11TextureData on the content side and then destroying CTexture2D inside the DLL. I will check the life cycle of the texture in D3D11TextureData, but it might not be related, since the crash actually occurs deeper in the DLL.

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&signature=nvwgf2um.dll%20%7C%20TCLSWrappers%3CT%3E%3A%3ACLSDestroy&date=%3E%3D2016-10-04T16%3A02%3A20.000Z&date=%3C2017-01-04T16%3A02%3A20.000Z#graphs
Comment 5•7 years ago
The crash volume increased after 51.0b5; we may have uplifted or backed out something between 51.0b5 and 51.0b6.
Changeset for 51.0b5: 9afe68360fa82c16b760b448b2156230a90caf11
Changeset for 51.0b6: 2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f
Comment 6•7 years ago
Hello Matt, do you think the frequent recreation of D3D11Device could cause this crash [1]? This is the only changeset I can find between 51.0b5 and 51.0b6 (see comment 5) that might be related to this crash. Or do you have any other ideas about this crash?

[1] https://hg.mozilla.org/releases/mozilla-beta/rev/88ae43bdada9e2076136cb02f4d4083ba0f50773
Flags: needinfo?(matt.woodrow)
Comment 7•7 years ago
Regression range:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=9afe68360fa82c16b760b448b2156230a90caf11&tochange=2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f

Yes, bug 1313883 seems like the most likely culprit. I'm really not sure what to do here; that change was made to fix a different crash on NVIDIA drivers (and did so successfully). If we revert it, then this crash will likely drop off and the crash in bug 1313883 will come back. We might need input from an NVIDIA driver dev about how we can avoid both.
Flags: needinfo?(matt.woodrow)
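As an aside, the regression-range link in comment 7 follows a simple pattern: the two changesets from comment 5 passed as `fromchange` and `tochange` query parameters. A minimal sketch of building such a link follows; the `pushlog_url` helper is hypothetical, and the base URL is simply copied from the link above.

```python
from urllib.parse import urlencode

# Base URL copied from the regression-range link in comment 7.
PUSHLOG_BASE = "https://hg.mozilla.org/releases/mozilla-beta/pushloghtml"


def pushlog_url(from_change: str, to_change: str, base: str = PUSHLOG_BASE) -> str:
    """Build a pushlog link covering the pushes between two changesets."""
    query = urlencode({"fromchange": from_change, "tochange": to_change})
    return f"{base}?{query}"


# Changesets for 51.0b5 and 51.0b6, as listed in comment 5.
url = pushlog_url(
    "9afe68360fa82c16b760b448b2156230a90caf11",
    "2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f",
)
print(url)
```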
Comment 8•7 years ago
ni on myself to follow up on the last part of Comment 7.
Flags: needinfo?(mozillamarcia.knous)
Comment 9•7 years ago
I think the root cause of this crash might be the same as [1]. Frequently recreating the decoder device might raise the chance of hitting the race condition mentioned in [1]. We can monitor this crash for a while to see whether it still happens with the new driver version.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1292273#c86
Comment 10•7 years ago
[Tracking Requested - why for this release]: this crash is worth tracking to see whether the new driver improves it or not.
tracking-firefox51: --- → ?
Comment 11•7 years ago
We're waiting for the new driver release.
Comment 13•7 years ago
Since the driver with the fix, Nvidia 21.21.13.7662, has been released and we decided not to blacklist anything for now (see bug 1292273 comment 93), I will keep monitoring this crash and check whether the new driver solves it.
Updated•7 years ago
Flags: needinfo?(mozillamarcia.knous)
Comment 14•7 years ago
(In reply to Kevin Chen[:kechen] (UTC + 8) from comment #13)
> Since the driver with the fix, Nvidia 21.21.13.7662, has been released and
> we decided not to blacklist anything currently according to bug 1292273
> comment 93, I will keep monitoring this crash and check if the new driver
> solve this crash.

Kevin, thanks for following up on this bug.
Comment 15•7 years ago
Marking 51 as won't fix, since the new NVIDIA driver has been released; we will keep monitoring the crash.
Comment 16•7 years ago
Since the release of Firefox 51, the crash volume has increased to about 200 crashes a day. The latest driver failed to fix this crash; we are studying the issue, and Nvidia is also working on it. In the meantime, would it be worthwhile to temporarily revert the fix in bug 1313883? How badly off would we be without that fix?
Flags: needinfo?(matt.woodrow)
Comment 17•7 years ago
Given that bug 1313883 was causing frequent crashes in automation, I'd suspect it was more than 200 crashes per day. If someone can try to do a real comparison of the crash rates, then we could use that data to make a decision. Trading one crash for another sucks though; can we ask Nvidia about bug 1313883 too and see if there's any behaviour that will work around both?
Flags: needinfo?(matt.woodrow)
Comment 18•7 years ago
I've tried to reproduce this bug by frequently recreating the decode device (by switching between tabs with video resources) while running programs that use video resources intensively (e.g., 3DMARK) on my Windows 10 (32-bit) platform; however, I have not been able to reproduce the crash so far. I will ask Nvidia if they have any idea about this bug.
Comment 19•7 years ago
Mass wontfix for bugs affecting firefox 52.
Comment 20•7 years ago
Currently there are around 1000 crashes a week on release 52 versions, and only a few on pre-release channels. Wontfix for 53. Kevin, any word from Nvidia?
Comment 21•7 years ago
I expected this crash to go away, given the decrease in mid-March. I will take this up with Nvidia and see if we can get any feedback.
Flags: needinfo?(kechen)
Comment 22•7 years ago
Lots of the crashes happened when destroying texture data in the decoder. I will investigate this part of the code before sending the mail to Nvidia.
Comment 23•7 years ago
Anthony, do you have any clues about the recent crash volume?
Flags: needinfo?(anthony.s.hughes)
Comment 24•7 years ago
(In reply to Peter Chang[:pchang] from comment #23)
> Anthony, do you have any clues about recently crashes volume?

Spike started on March 29, 2017 which correlates to the release of 52.0.2. There is a 71% correlation to NVIDIA driver 376.53. However I cannot find reference to this driver anywhere on NVIDIA's website so maybe this comes from another source?
Flags: needinfo?(anthony.s.hughes)
Comment 25•7 years ago
I will send an email to Nvidia to check this driver version.
Comment 26•7 years ago
Only a few crashes remaining. I think we can assume most people have updated their drivers.
Comment 27•7 years ago
Some notes:
In the current release version (Firefox 54), over the past 7 days, all of the crashes happened in the UI process with "gpuProcess":{"status":"unavailable"} in telemetry, and 75% of the crashes have "compositor":"d3d11" in telemetry.
Since all of these reports are from Windows 10, they should be able to use the GPU process; however, they may have hit crashes in the GPU process or device resets and fallen back to the UI / content model. The odd part is that Gecko keeps using D3D11 as the compositor backend, which might be caused by bug 1364563.
I will monitor whether the volume declines in the next beta after bug 1364563 lands.
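The kind of tally described in comment 27 can be sketched as follows. This is a minimal illustration only: it assumes each crash report's telemetry has already been loaded as a dictionary containing the two fragments quoted in the comment, and the `summarize` helper and `SAMPLE_REPORTS` data are hypothetical, not real crash-stats output.

```python
from typing import Any, Dict, Iterable, List


def summarize(reports: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Return the percentage of reports matching each telemetry condition."""
    items: List[Dict[str, Any]] = list(reports)
    total = len(items)

    def percent(count: int) -> float:
        return 100.0 * count / total if total else 0.0

    gpu_unavailable = sum(
        1 for r in items
        if r.get("gpuProcess", {}).get("status") == "unavailable"
    )
    d3d11_compositor = sum(1 for r in items if r.get("compositor") == "d3d11")

    return {
        "gpu_process_unavailable": percent(gpu_unavailable),
        "d3d11_compositor": percent(d3d11_compositor),
    }


# Made-up sample shaped after the fragments quoted in comment 27.
SAMPLE_REPORTS = [
    {"gpuProcess": {"status": "unavailable"}, "compositor": "d3d11"},
    {"gpuProcess": {"status": "unavailable"}, "compositor": "d3d11"},
    {"gpuProcess": {"status": "unavailable"}, "compositor": "d3d11"},
    {"gpuProcess": {"status": "unavailable"}, "compositor": "basic"},
]

summary = summarize(SAMPLE_REPORTS)
print(summary)
```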
Comment 28•7 years ago
A correction to comment 27:
The crash reports that contain "gpuProcess":{"status":"unavailable"} also have "e10sEnabled":false; therefore, using D3D11 as the compositor backend is correct behavior.
The other thing is that most of the crash reports log more than one "Detected device reset" in their GraphicsCriticalError section. Maybe we can consider falling back to the software backend after several attempts.
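The fallback idea at the end of comment 28 could look roughly like the sketch below. It is not Gecko code: the helper names, the backend labels, and the threshold of 3 resets are all assumptions chosen for illustration ("several" is not quantified in the comment), and the sample log text is made up.

```python
RESET_MARKER = "Detected device reset"


def count_device_resets(graphics_critical_error: str) -> int:
    """Count how many device resets were logged in a GraphicsCriticalError text."""
    return graphics_critical_error.count(RESET_MARKER)


def choose_backend(reset_count: int, max_resets: int = 3,
                   preferred: str = "d3d11") -> str:
    """Keep the preferred backend until max_resets is reached, then fall back.

    max_resets=3 is an arbitrary placeholder for "several" attempts.
    """
    return "software" if reset_count >= max_resets else preferred


# Made-up log text; real GraphicsCriticalError entries may look different.
sample_log = "\n".join(f"(#{i}) {RESET_MARKER} {i}" for i in range(3))

resets = count_device_resets(sample_log)
print(resets, choose_backend(resets))
```

The design choice here is to treat repeated resets as a signal that hardware compositing is unreliable on that machine, rather than retrying D3D11 indefinitely.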
Comment 30•2 years ago
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME