Open Bug 1447146 Opened 2 years ago Updated 6 months ago

Crash in nvwgf2umx.dll | RtlAllocateMemoryBlockLookaside | ... (advapi32.dll)

Categories

(Core :: Graphics, defect, P2, critical)

60 Branch
x86_64
Windows 7
defect

Tracking


Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox59 --- unaffected
firefox60 --- wontfix
firefox61 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

This bug was filed from the Socorro interface and is
report bp-e2b60a52-6910-4ce5-bd93-b12bd0180319.
=============================================================

Content crashes with this signature are regressing in volume in the 60.0b cycle from
users on Windows 7.

This seems to be constrained to a number of device/driver configurations. Would
there be a way to blocklist them, or should we reach out to NVIDIA?

Adapter driver version facet
Rank 	Driver version 	Count 	Share
1 	10.18.13.6510 	61 	30.50 %
2 	10.18.13.6472 	37 	18.50 %
3 	10.18.13.6175 	31 	15.50 %
4 	10.18.13.5900 	21 	10.50 %
5 	10.18.13.5921 	18 	9.00 %
6 	10.18.13.6519 	13 	6.50 %
7 	10.18.13.6451 	8 	4.00 %
8 	10.18.13.5850 	6 	3.00 %
9 	10.18.13.5582 	4 	2.00 %
10 	10.18.13.5906 	1 	0.50 %
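An aside that may help cross-reference these facets with NVIDIA's release notes: by the commonly described convention, the NVIDIA marketing version is the last five digits of the Windows driver version string (so 10.18.13.6510 would be 365.10, and 10.18.13.6472 would be 364.72). A small illustrative helper, not part of any Firefox code:

```cpp
#include <cassert>
#include <string>

// Convert a Windows (DirectX-style) driver version string such as
// "10.18.13.6510" into the NVIDIA marketing version ("365.10"):
// strip the dots and read the last five digits as XXX.YY.
// Returns an empty string if fewer than five digits are present.
std::string NvMarketingVersion(const std::string& dxVersion) {
    std::string digits;
    for (char c : dxVersion) {
        if (c >= '0' && c <= '9') digits += c;
    }
    if (digits.size() < 5) return "";
    std::string last5 = digits.substr(digits.size() - 5);
    return last5.substr(0, 3) + "." + last5.substr(3);
}
```

If that convention holds here, all of the drivers listed above date from roughly 2015-2016.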

Adapter device id facet
Rank 	Device ID 	Count 	Share
1 	0x128b 	77 	38.50 %
2 	0x1380 	41 	20.50 %
3 	0x11c2 	21 	10.50 %
4 	0x1402 	19 	9.50 %
5 	0x11c0 	13 	6.50 %
6 	0x1187 	12 	6.00 %
7 	0x0f02 	10 	5.00 %
8 	0x1287 	4 	2.00 %
9 	0x104a 	3 	1.50 %
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
This looks like 60 only, but low volume. Let's wait and see.
Still small number of installs, low numbers in B13, maybe gone?
Milan: Verify that it's no longer showing up?
(In reply to Marion Daly [:mdaly] from comment #3)
> Milan: Verify that it's no longer showing up?

still present, very low user count, high volume for those users.
Crash Signature: ApiSetResolveToHost] [@ nvwgf2umx.dll | RtlAllocateMemoryBlockLookaside | nvwgf2umx.dll | RtlAllocateMemoryBlockLookaside | nvwgf2umx.dll | RtlpFindNextActivationContextSection | RtlpDosPathNameToRelativeNtPathName_Ustr | ApiSetResolveToHost] → ApiSetResolveToHost] [@ nvwgf2umx.dll | RtlAllocateMemoryBlockLookaside | nvwgf2umx.dll | RtlAllocateMemoryBlockLookaside | nvwgf2umx.dll | RtlpFindNextActivationContextSection | RtlpDosPathNameToRelativeNtPathName_Ustr | ApiSetResolveToHost] [@ nvwgf2…
Adding 63 and 64 as affected.
The number of crashes and users affected has been going up since we shipped 63; maybe this should be reprioritized as a P2 instead of a P3.
This has been scaling up on Windows on release even prior to 63.
Flags: needinfo?(dbolter)
Priority: P3 → P2
Jeff, any thoughts on who could look at this? (Normally I'd ask Bas)
Flags: needinfo?(dbolter) → needinfo?(jmuizelaar)
Not really. The stacks are pretty unhelpful. It might be best to ask nvidia on mozilla-nvidia discuss.
Flags: needinfo?(jmuizelaar)
Any update here?
Flags: needinfo?(dbolter)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> Not really. The stacks are pretty unhelpful. It might be best to ask nvidia
> on mozilla-nvidia discuss.

Agreed. Jeff could you kick that off?
Flags: needinfo?(dbolter) → needinfo?(jmuizelaar)
Done.
Flags: needinfo?(jmuizelaar)

Hey Jeff - any update on this bug?

Flags: needinfo?(jmuizelaar)

No. Nvidia did not respond.

Flags: needinfo?(jmuizelaar)

Any suggestions on anything else that could be done for investigation?

Flags: needinfo?(jmuizelaar)

The most productive thing is probably to continue to try to get in touch with Nvidia. I'll try harder.

Flags: needinfo?(jmuizelaar)

Thanks for the report. This is NV bug number 2446669. I'll look into this.

I am emailing six users who submitted crash reports with one of these signatures and included their email addresses. If anyone replies with permission to share the crash dumps, then Jeff or I can provide them, in accordance with Mozilla's data protection policies.

Though looking at it I'm not sure how useful that's going to be!

I've passed a minidump on to Kimmo at Nvidia.

Flags: needinfo?(jmuizelaar)

This particular minidump used driver version 341.44 (driver date 2-3-2015) on a GeForce GTX 260.
The latest "supported" driver for that card is version 342.01 WHQL, released 2016-12-14.
The other crash reports show drivers from newer branches, but from around the same era. Example: one crashed with 364.72, which is from 2016/H1.

I tried Win7 + 342.01 + GT210 to repro this. The callstack looks like initialization code and IIRC the reports indicate the process lives only a few seconds. For the repro I tried to just open and close the app and open multiple windows. I did not succeed. In theory 342.01 could be fixed wrt this, but I doubt it's the case. A more probable explanation is that this is more transient and/or needs a more specific repro scenario. I'll try the exact 341.44 later.

The main thread crashes at
xul.dll!mozilla::gfx::DoesTextureSharingWorkInternal(ID3D11Device * device, DXGI_FORMAT format, unsigned int bindflags) Line 253 C++
xul.dll!mozilla::gfx::DeviceManagerDx::CreateCompositorDevice(mozilla::gfx::FeatureState & d3d11) Line 495 C++
xul.dll!mozilla::gfx::DeviceManagerDx::CreateCompositorDevices() Line 180 C++
xul.dll!gfxWindowsPlatform::InitializeD3D11() Line 1522 C++
xul.dll!gfxWindowsPlatform::InitializeDevices() Line 1497 C++
xul.dll!gfxWindowsPlatform::HandleDeviceReset() Line 431 C++
xul.dll!gfxWindowsPlatform::UpdateRenderMode() Line 486 C++
xul.dll!nsWindow::OnPaint(HDC__ * aDC, unsigned int aNestingLevel) Line 181 C++
xul.dll!nsWindow::ProcessMessage(unsigned int msg, unsigned __int64 & wParam, __int64 & lParam, __int64 * aRetValue) Line 5576 C++
xul.dll!nsWindow::WindowProcInternal(HWND * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam) Line 5034 C++
xul.dll!nsWindow::WindowProc(HWND * hWnd, unsigned int msg, unsigned __int64 wParam, __int64 lParam) Line 4986 C++
[External Code]
xul.dll!nsAppShell::ProcessNextNativeEvent(bool mayWait) Line 569 C++
xul.dll!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal * thr, bool mayWait) Line 273 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1160 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 530 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 97 C++
xul.dll!MessageLoop::RunHandler() Line 319 C++
xul.dll!MessageLoop::Run() Line 299 C++
xul.dll!nsBaseAppShell::Run() Line 160 C++
xul.dll!nsAppShell::Run() Line 420 C++
xul.dll!nsAppStartup::Run() Line 291 C++
xul.dll!XREMain::XRE_mainRun() Line 4777 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4922 C++
xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 5014 C++

The last Mozilla frame in the call stack triggers some sort of D3D11 driver flush. The flush drains the work queued on the driver worker thread, and some bug in the driver code then crashes the process. I got the driver callstacks and did some internal bug queries. Unfortunately the queries did not help: I cannot pinpoint which operation triggers the bug, nor what the bug is.

If other minidumps indicate the same crash point, then maybe one workaround could be blacklisting D3D11 sharing on these old drivers. My knowledge doesn't extend to the driver internals, so I'm not sure how realistic this is.
If checking the other minidumps is a fast operation, that could be one avenue of investigation.

The driver call stack is related to texture uploads, if that makes sense. I'm not sure whether creating a D3D11 keyed-mutex shared texture internally needs to upload something. If this is part of gfx system initialization, I'm not sure how many other textures have been created. Of course, if this is done per window, then most likely there could already be multiple textures in flight from other windows.

Note: I could not find the main thread from the crash report web ui reports.
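If blocklisting on driver version turns out to be the workaround, the gating check reduces to a dotted-version comparison against a cutoff. A minimal sketch of that comparison, assuming a hypothetical cutoff; the function names here are illustrative and are not Firefox's actual gfx blocklist API:

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse a dotted driver version ("10.18.13.6510") into numeric components.
std::vector<int> ParseDriverVersion(const std::string& version) {
    std::vector<int> parts;
    std::stringstream ss(version);
    std::string item;
    while (std::getline(ss, item, '.')) {
        parts.push_back(std::stoi(item));
    }
    return parts;
}

// Return true if `version` is strictly older than `cutoff`, comparing
// component by component (missing components compare as 0).
bool DriverOlderThan(const std::string& version, const std::string& cutoff) {
    std::vector<int> a = ParseDriverVersion(version);
    std::vector<int> b = ParseDriverVersion(cutoff);
    size_t n = std::max(a.size(), b.size());
    for (size_t i = 0; i < n; ++i) {
        int av = i < a.size() ? a[i] : 0;
        int bv = i < b.size() ? b[i] : 0;
        if (av != bv) return av < bv;
    }
    return false;
}
```

For example, with a hypothetical cutoff of "10.18.13.6520", DriverOlderThan would match every driver version listed in the facet in comment 0.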

Jessie, are we still looking into this?

Flags: needinfo?(jbonisteel)

Hey Jeff - is this actionable + important?

Flags: needinfo?(jbonisteel) → needinfo?(bugzmuiz)
Flags: needinfo?(bugzmuiz) → needinfo?(jmuizelaar)
Depends on: 1548018

It seems like our symbol server is not serving up binaries any more. That makes analyzing this a bit of a pain.

Flags: needinfo?(jmuizelaar)