Crash in nvwgf2um.dll | TCLSWrappers<T>::CLSDestroy

NEW
Assigned to

Status


Core
Graphics
P3
critical
11 months ago
4 months ago

People

(Reporter: philipp, Assigned: kechen)

Tracking

({crash, regression})

51 Branch
All
Windows 10
crash, regression
Points:
---

Firefox Tracking Flags

(firefox51+ wontfix, firefox52 wontfix, firefox53 wontfix, firefox54 wontfix, firefox55 wontfix)

Details

(crash signature)

(Reporter)

Description

11 months ago
This bug was filed from the Socorro interface and is 
report bp-e08213eb-c688-4b27-bb44-59c772161227.
=============================================================
Crashing Thread (17)
Frame 	Module 	Signature 	Source
Ø 0 	nvwgf2um.dll 	nvwgf2um.dll@0x963305 	
Ø 1 	nvwgf2um.dll 	nvwgf2um.dll@0x962ca8 	
Ø 2 	nvwgf2um.dll 	nvwgf2um.dll@0x962469 	
Ø 3 	nvwgf2um.dll 	nvwgf2um.dll@0x95ff1c 	
Ø 4 	nvwgf2um.dll 	nvwgf2um.dll@0x9562a4 	
Ø 5 	nvwgf2um.dll 	nvwgf2um.dll@0x147191 	
Ø 6 	nvwgf2um.dll 	nvwgf2um.dll@0x623874 	
Ø 7 	nvwgf2um.dll 	nvwgf2um.dll@0x7cc991 	
Ø 8 	nvwgf2um.dll 	nvwgf2um.dll@0xd79ce 	
9 	d3d11.dll 	TCLSWrappers<CTexture2D>::CLSDestroy(CTexture2D::CLS*, CContext*) 	
10 	d3d11.dll 	NDXGI::CDeviceChild<IDXGIResource1, IDXGISwapChainInternal>::FinalRelease() 	
11 	d3d11.dll 	CLayeredObjectWithCLS<CTexture2D>::CContainedObject::Release() 	
12 	xul.dll 	mozilla::layers::D3D11TextureData::`scalar deleting destructor'(unsigned int) 	
13 	xul.dll 	mozilla::layers::DestroyTextureData 	gfx/layers/client/TextureClient.cpp:247
14 	xul.dll 	mozilla::layers::TextureChild::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) 	gfx/layers/client/TextureClient.cpp:256
15 	xul.dll 	mozilla::layers::PLayerParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) 	obj-firefox/ipc/ipdl/PLayerParent.cpp:267
16 	xul.dll 	mozilla::layers::PTextureChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PTextureChild.cpp:226
17 	xul.dll 	mozilla::layers::PImageBridgeChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PImageBridgeChild.cpp:665
18 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:1662
19 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:1600
20 	xul.dll 	mozilla::ipc::MessageChannel::OnMaybeDequeueOne() 	ipc/glue/MessageChannel.cpp:1567
21 	mozglue.dll 	arena_dalloc_small 	memory/mozjemalloc/jemalloc.c:4667
22 	xul.dll 	mozilla::runnable_args_memfn<RefPtr<mozilla::layers::ImageBridgeChild>, void ( mozilla::layers::ImageBridgeChild::*)(RefPtr<mozilla::layers::ImageClient>, RefPtr<mozilla::layers::ImageContainer>), RefPtr<mozilla::layers::ImageClient>, RefPtr<mozilla::layers::ImageContainer> >::`scalar deleting destructor'(unsigned int) 	
23 	xul.dll 	MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) 	ipc/chromium/src/base/message_loop.cc:346
24 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:354
25 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:429

Crashes with this signature have been increasing in volume since Firefox 51, but they happen too infrequently on Nightly to get a regression range.
Most of them occur in the content process on systems running Windows 10 build 10.0.14393, and they seem related to DXVA2D3D11 media playback.

Correlations for Firefox Beta

(98.55% in signature vs 00.72% overall) address = 0x14
(100.0% in signature vs 05.80% overall) Module "nvwgf2um.dll" = true
(98.55% in signature vs 03.98% overall) Module "msvproc.dll" = true
(98.55% in signature vs 04.63% overall) "DXVA2D3D11+" in app_notes = true
(98.55% in signature vs 05.82% overall) "DXVA2D3D11?" in app_notes = true
(98.55% in signature vs 07.31% overall) reason = EXCEPTION_ACCESS_VIOLATION_WRITE
(100.0% in signature vs 12.90% overall) adapter_vendor_id = NVIDIA Corporation
(98.55% in signature vs 10.54% overall) Module "mfperfhelper.dll" = true
(98.55% in signature vs 12.00% overall) platform_version = 10.0.14393
(98.55% in signature vs 15.96% overall) platform_pretty_version = Windows 10
(98.55% in signature vs 16.12% overall) Module "RTWorkQ.dll" = true
(98.55% in signature vs 17.54% overall) Module "MSAudDecMFT.dll" = true
(100.0% in signature vs 28.22% overall) Module "xmllite.dll" = true
(81.16% in signature vs 04.62% overall) Module "WMVCORE.DLL" = true
(81.16% in signature vs 04.62% overall) Module "WMASF.DLL" = true
(100.0% in signature vs 32.96% overall) Module "d2d1.dll" = true
(100.0% in signature vs 34.57% overall) "D2D1.1+" in app_notes = true
(100.0% in signature vs 34.57% overall) "DWrite+" in app_notes = true
(100.0% in signature vs 34.58% overall) "DWrite?" in app_notes = true
(98.55% in signature vs 31.89% overall) os_arch = amd64
(79.71% in signature vs 07.74% overall) Module "cabinet.dll" = true
(100.0% in signature vs 42.97% overall) Module "d3d11.dll" = true
(100.0% in signature vs 45.81% overall) Module "dxgi.dll" = true
(79.71% in signature vs 17.28% overall) Module "d3dcompiler_47.dll" = true
(65.22% in signature vs 01.53% overall) Module "nvspcap.dll" = true
(81.16% in signature vs 22.77% overall) Module "qasf.dll" = true
(84.06% in signature vs 27.43% overall) Module "MP3DMOD.DLL" = true
(84.06% in signature vs 28.60% overall) Module "msdmo.dll" = true
(76.81% in signature vs 21.21% overall) Module "winhttp.dll" = true
(82.61% in signature vs 31.75% overall) Module "quartz.dll" = true
(60.87% in signature vs 10.93% overall) Addon "Adblock Plus" = true
(31.88% in signature vs 78.01% overall) Module "winnsi.dll" = true
(36.23% in signature vs 00.84% overall) adapter_driver_version = 21.21.13.7633
(36.23% in signature vs 00.84% overall) adapter_driver_version_clean = 376.33
(33.33% in signature vs 01.46% overall) cpu_microcode_version = 0x1e
(27.54% in signature vs 00.48% overall) adapter_driver_version_clean = 369.09
(27.54% in signature vs 00.48% overall) adapter_driver_version = 21.21.13.6909
(18.84% in signature vs 00.17% overall) adapter_device_id = 0x1401
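
One rough way to rank the correlations above is by lift, the ratio of the in-signature percentage to the overall percentage. The following is a minimal sketch and not part of Socorro: the `parse_correlation` and `lift` helpers and the regular expression are assumptions made for illustration, and the sample lines are copied from the list above.

```python
import re

# Hypothetical pattern for lines shaped like the correlation output above.
LINE_RE = re.compile(
    r"\((\d+(?:\.\d+)?)% in signature vs (\d+(?:\.\d+)?)% overall\) (.+)"
)

def parse_correlation(line):
    """Return (in_signature_pct, overall_pct, description), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), m.group(3)

def lift(in_sig, overall):
    """Ratio of in-signature share to overall share; infinite if overall is 0."""
    return float("inf") if overall == 0 else in_sig / overall

lines = [
    '(98.55% in signature vs 00.72% overall) address = 0x14',
    '(100.0% in signature vs 12.90% overall) adapter_vendor_id = NVIDIA Corporation',
    '(36.23% in signature vs 00.84% overall) adapter_driver_version_clean = 376.33',
    '(31.88% in signature vs 78.01% overall) Module "winnsi.dll" = true',
]

parsed = [p for p in (parse_correlation(l) for l in lines) if p is not None]
# Highest lift first: these are the attributes most over-represented in the signature.
ranked = sorted(parsed, key=lambda p: lift(p[0], p[1]), reverse=True)
for in_sig, overall, desc in ranked:
    print(f"{lift(in_sig, overall):8.1f}x  {desc}")
```

A lift below 1 (as with `winnsi.dll`) means the attribute is under-represented in this signature rather than associated with it.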
It looks like a similar signature also occurred on 50.1.0, with lower volume.
Peter, could you find someone to look into it?
Flags: needinfo?(howareyou322)
Priority: -- → P3

Comment 2

11 months ago
Kevin, I guess this might be related to bug 1292273. Any thoughts?
Flags: needinfo?(howareyou322) → needinfo?(kechen)
(Assignee)

Comment 3

11 months ago
Yes, the behavior in the crash reports is similar to bug 1292273. I will look into it and see if I can get more information.
Flags: needinfo?(kechen)

Updated

11 months ago
Assignee: nobody → kechen
(Assignee)

Comment 4

11 months ago
According to the graph [1], this crash started around 9/26 in the aurora channel, with a peak.
The volume increased again around 12/6 in both the beta and aurora channels, and that has continued until now.

These are some correlations for this crash, which are similar to those in bug 1292273:
(100.0% in signature vs 05.09% overall) "DXVA2D3D11+" in app_notes = true
(98.28% in signature vs 00.57% overall) address = 0x14
(98.28% in signature vs 08.25% overall) reason = EXCEPTION_ACCESS_VIOLATION_WRITE
(100.0% in signature vs 13.50% overall) adapter_vendor_id = NVIDIA Corporation
(100.0% in signature vs 16.54% overall) platform_pretty_version = Windows 10
(98.28% in signature vs 34.77% overall) os_arch = amd64

All of the call stacks show the program destructing D3D11TextureData on the content side, which in turn destructs a CTexture2D inside the DLL.

I will check the life cycle of the texture in D3D11TextureData, but it might not be related, since the crash actually happens deeper inside the DLL.

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&signature=nvwgf2um.dll%20%7C%20TCLSWrappers%3CT%3E%3A%3ACLSDestroy&date=%3E%3D2016-10-04T16%3A02%3A20.000Z&date=%3C2017-01-04T16%3A02%3A20.000Z#graphs
(Assignee)

Comment 5

11 months ago
The crash volume increased after 51.0b5, so we may have uplifted or backed out something between 51.0b5 and 51.0b6.

Changeset for 51.0b5: 9afe68360fa82c16b760b448b2156230a90caf11
Changeset for 51.0b6: 2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f
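
For reference, the two changesets can be turned into a pushlog link for the range. This is a small sketch: the `pushlog_url` helper is hypothetical, while the `pushloghtml` path with `fromchange` and `tochange` parameters follows the regression-range link used elsewhere in this bug.

```python
from urllib.parse import urlencode

# Base path taken from the regression-range link in this bug.
PUSHLOG_BASE = "https://hg.mozilla.org/releases/mozilla-beta/pushloghtml"

def pushlog_url(from_change, to_change):
    """Build a pushlog URL covering the pushes between two changesets."""
    query = urlencode({"fromchange": from_change, "tochange": to_change})
    return PUSHLOG_BASE + "?" + query

url = pushlog_url(
    "9afe68360fa82c16b760b448b2156230a90caf11",  # 51.0b5
    "2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f",  # 51.0b6
)
print(url)
```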
(Assignee)

Comment 6

11 months ago
Hello Matt, do you think the frequent recreation of D3D11Device can cause this crash [1]?

This is the one changeset I could find between 51.0b5 and 51.0b6 that might be related to this crash (see comment 5).

Or do you have any other ideas about this crash?

[1] https://hg.mozilla.org/releases/mozilla-beta/rev/88ae43bdada9e2076136cb02f4d4083ba0f50773
Flags: needinfo?(matt.woodrow)
Regression range: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=9afe68360fa82c16b760b448b2156230a90caf11&tochange=2dec3c6c7c90e2e27093b8a3512c1b32a8263a8f

Yes, bug 1313883 seems like the most likely culprit.

I'm really not sure what to do here, that change was made to fix a different crash on NVIDIA drivers (and did so successfully).

If we revert it then this will likely drop off, and the crash in bug 1313883 will come back.

We might need input from an NVIDIA driver dev about how we can avoid both.
Flags: needinfo?(matt.woodrow)
ni on myself to follow up on the last part of Comment 7.
Flags: needinfo?(mozillamarcia.knous)
(Assignee)

Comment 9

10 months ago
I think the root cause of this crash might be the same as in [1].

Frequently recreating the decoder device might raise the chance of hitting the race condition mentioned in [1].

We can monitor this crash for a while to see whether it still happens with the new driver version.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1292273#c86
(Assignee)

Updated

10 months ago
See Also: → bug 1328082
[Tracking Requested - why for this release]: This crash is worth tracking to see whether the new driver improves it.
tracking-firefox51: --- → ?

Comment 11

10 months ago
We're waiting for a new driver release.

Comment 12

10 months ago
Track 51+ as new regression related to NV driver.
tracking-firefox51: ? → +
(Assignee)

Comment 13

10 months ago
Since the driver with the fix, Nvidia 21.21.13.7662, has been released and we decided not to blacklist anything for now (see bug 1292273 comment 93), I will keep monitoring this crash and check whether the new driver solves it.
Flags: needinfo?(mozillamarcia.knous)

Comment 14

10 months ago
(In reply to Kevin Chen[:kechen] (UTC + 8) from comment #13)
> Since the driver with the fix, Nvidia 21.21.13.7662, has been released and
> we decided not to blacklist anything currently according to bug 1292273
> comment 93, I will keep monitoring this crash and check if the new driver
> solve this crash.

Kevin, thanks for following up on this bug.

Comment 15

10 months ago
Marking 51 as wontfix since the new NV driver has been released; we will keep monitoring the crash.
status-firefox51: affected → wontfix
(Assignee)

Comment 16

9 months ago
Since the release of Firefox 51, the crash count has increased to about 200 crashes a day.
The latest driver failed to solve this crash. We are studying the issue, and Nvidia is also working on it.

In the meantime, would it be worthwhile to temporarily revert the fix in bug 1313883? How badly off would we be without that fix?
Flags: needinfo?(matt.woodrow)
Given that bug 1313883 was causing frequent crashes in automation, I'd suspect it was more than 200 crashes per day.

If someone can try to do a real comparison of the crash rates, then we could use that data to make a decision.

Trading one crash for another sucks though, can we ask Nvidia about bug 1313883 too and see if there's any behaviour that will work around both?
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 18

9 months ago
I've tried to reproduce this bug by frequently recreating the decode device (by switching between tabs with video resources) while running video-resource-intensive programs (e.g., 3DMARK) at the same time on my Windows 10 (32-bit) platform; however, I have not been able to reproduce the crash so far.

I will ask Nvidia if they have any idea about this bug.
Mass wontfix for bugs affecting firefox 52.
status-firefox52: affected → wontfix
Currently there are around 1000 crashes a week on release 52 versions, and only a few on pre-release channels. Wontfix for 53. 

Kevin, any word from nvidia?
status-firefox53: affected → wontfix
status-firefox54: --- → affected
status-firefox55: --- → affected
Flags: needinfo?(kechen)
(Assignee)

Comment 21

7 months ago
I was expecting this crash to be gone, given the decrease in mid-March.
I will take this up with Nvidia and see if we get any feedback.
Flags: needinfo?(kechen)
(Assignee)

Comment 22

7 months ago
Lots of the crashes happened when destroying texture data in the decoder. I will investigate this part of the code before sending the mail to Nvidia.

Comment 23

7 months ago
Anthony, do you have any clues about the recent crash volume?
Flags: needinfo?(anthony.s.hughes)
(In reply to Peter Chang[:pchang] from comment #23)
> Anthony, do you have any clues about recently crashes volume?

Spike started on March 29, 2017 which correlates to the release of 52.0.2. There is a 71% correlation to NVIDIA driver 376.53. However I cannot find reference to this driver anywhere on NVIDIA's website so maybe this comes from another source?
Flags: needinfo?(anthony.s.hughes)
(Assignee)

Comment 25

7 months ago
I will send a mail to Nvidia to check this driver version.
Only a few crashes remaining. I think we can assume most people have updated their drivers.
status-firefox54: affected → wontfix
status-firefox55: affected → wontfix
(Assignee)

Comment 27

4 months ago
Some notes:
  In the current release version (Firefox 54), over the last 7 days, all the crashes happened in the UI process with "gpuProcess":{"status":"unavailable"} in telemetry, and 75% of the crashes have "compositor":"d3d11" in telemetry.

  Since all of these reports are on Windows 10, they should support the GPU process; however, they may have hit crashes in the GPU process or device resets and fallen back to the UI / content model. The weird part is that Gecko keeps using D3D11 as the compositor backend; this might be caused by bug 1364563.

  I will monitor whether the number declines in the next beta after bug 1364563 lands.
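
A check like the one described above could be scripted roughly as follows. This is only a sketch: the key names `gpuProcess`, `compositor`, and `e10sEnabled` come from the crash reports, but the nesting of the sample JSON and the `find_key` and `summarize` helpers are assumptions for illustration. The helper searches by key so it does not depend on the exact layout.

```python
import json

def find_key(obj, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

def summarize(telemetry_json):
    """Pull out the three fields discussed in this comment."""
    env = json.loads(telemetry_json)
    gpu = find_key(env, "gpuProcess") or {}
    return {
        "gpu_process_status": gpu.get("status"),
        "compositor": find_key(env, "compositor"),
        "e10s_enabled": find_key(env, "e10sEnabled"),
    }

# Hypothetical sample; the real report layout may differ.
sample = (
    '{"settings": {"e10sEnabled": false},'
    ' "system": {"gfx": {"features": {"compositor": "d3d11",'
    ' "gpuProcess": {"status": "unavailable"}}}}}'
)
summary = summarize(sample)
print(summary)
```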
(Assignee)

Comment 28

4 months ago
A correction to comment 27:
The crash reports that contain "gpuProcess":{"status":"unavailable"} also have "e10sEnabled":false; therefore, using d3d11 as the compositor backend is correct behavior.

The other thing is that most of the crash reports log more than one "Detected device reset" in their GraphicsCriticalError section.
Maybe we can consider falling back to the software backend after several attempts.
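
The fallback idea could look roughly like the sketch below. It is not Gecko code: the `CompositorBackendPolicy` class, the backend names as plain strings, and the threshold of 3 resets are all assumptions chosen only to illustrate counting device resets before giving up on the hardware backend.

```python
class CompositorBackendPolicy:
    """Hypothetical policy: stay on D3D11 until too many device resets occur."""

    def __init__(self, max_resets=3):
        self.max_resets = max_resets  # assumed threshold, not a Gecko setting
        self.reset_count = 0
        self.backend = "d3d11"

    def on_device_reset(self):
        """Record one device reset and return the backend to use next."""
        self.reset_count += 1
        if self.reset_count >= self.max_resets:
            # Too many resets: stop retrying hardware and use software instead.
            self.backend = "software"
        return self.backend

policy = CompositorBackendPolicy(max_resets=3)
backends = [policy.on_device_reset() for _ in range(4)]
print(backends)
```

Once the policy has switched to the software backend it stays there, so a flaky driver cannot keep triggering device recreation.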