Closed Bug 1241921 Opened 4 years ago Closed 3 years ago

crash in nvd3d9wrapx.dll@*

Categories

(Core :: General, defect, critical)

Unspecified
Windows NT
defect
Not set
critical

Tracking

RESOLVED FIXED
mozilla52
Tracking Status
firefox46 + fixed
firefox47 + fixed
firefox52 --- fixed

People

(Reporter: tracy, Assigned: aklotz)

Details

(Keywords: crash, regression, topcrash-win, Whiteboard: [gfx-noted])

Attachments

(2 files)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-91115ee6-3e4f-48cc-848d-715702160122.
=============================================================

In aggregate, these startup crash signatures accounted for over 60% of yesterday's (20160121) crash volume on Nightly.  All reports are from either Win 10 or Win 8.0/8.1.
100% of these crashes are on systems with Intel/Nvidia dual GPUs with the most common configuration being Intel Ivybridge HD Graphics 4000 and NVIDIA GeForce GT 635M card combinations (~42%).

Top NVIDIA GPUs:
0x0de3 	42.55 % [Fermi GF108M]
0x0ffc 	29.79 % [Kepler GK107GLM]
0x0def 	12.77 % [Fermi GF108M]
0x11b7 	10.64 % [Kepler GK104GLM]

Top Intel GPUs:
0x0166 	67.62 % [Ivybridge]
0x0416 	20.49 % [Haswell]
0x0a16 	 5.31 % [Haswell]
0x0116 	 3.29 % [Sandybridge]
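
For anyone who wants to recompute shares like the ones listed here from a raw export, a minimal Python sketch follows. It assumes each crash report has already been reduced to one adapter device ID string. The `device_share` helper and the sample list are hypothetical and are not the real Socorro data or API.

```python
from collections import Counter


def device_share(device_ids):
    """Return (device_id, percent) pairs, largest share first."""
    counts = Counter(device_ids)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(dev, round(100.0 * n / total, 2)) for dev, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical sample, not the crash data from this bug.
    sample = ["0x0166"] * 3 + ["0x0416"]
    for dev, pct in device_share(sample):
        print(f"{dev}\t{pct:5.2f} %")
```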

Based on data it looks like this started with Firefox Nightly 46.0a1 20160121030208:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2e50b83954e62d52d2ef294e850c4380d457d96a&tochange=977d78a8dd78afbc0153d37fd9887c3a200dce6a

Milan, I'm not sure who should look at this, can you please take a look?
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
Keywords: regression
Bas, any chance this is somehow related to bug 1241012?  These seem to be Windows 10, D3D9 level hardware, and weird stack traces.
Flags: needinfo?(milan) → needinfo?(bas)
I'd be surprised if it was bug 1240730, but...
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bgirard)
As we're starting up the compositor parent?
No obvious correlation with any driver version. We're seeing both old and the latest versions. The most commonly crashing driver version is nvd3d9wrapx.dll version 10.18.10.4276, which is the latest Intel driver:
https://crash-stats.mozilla.com/search/?product=Firefox&signature=~nvd3d9wrapx.dll&uptime=%3C%3D100&_facets=signature&_facets=adapter_driver_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-adapter_driver_version
Ryan, were you trying to ping me about this issue?  We could (downloadable) block D3D9 on nightly while we're sorting this out, but I'm not clear that the update would happen.
Flags: needinfo?(ryanvm)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #1)
> 100% of these crashes are on systems with Intel/Nvidia dual GPUs with the
> most common configuration being Intel Ivybridge HD Graphics 4000 and NVIDIA
> GeForce GT 635M card combinations (~42%).
> 
> Top NVIDIA GPUs:
> 0x0de3 	42.55 % [Fermi GF108M]
> 0x0ffc 	29.79 % [Kepler GK107GLM]
> 0x0def 	12.77 % [Fermi GF108M]
> 0x11b7 	10.64 % [Kepler GK104GLM]
> 
> Top Intel GPUs:
> 0x0166 	67.62 % [Ivybridge]
> 0x0416 	20.49 % [Haswell]
> 0x0a16 	 5.31 % [Haswell]
> 0x0116 	 3.29 % [Sandybridge]
> 

Can you post the full data somewhere or the query? I've got a 0x0046 (intel) + 0x0xdf1 (nvidia). This is Arrandale, pre-Sandybridge, first generation so this is probably too early to reproduce the bug. Checking the other machine we have in stock now.
These do look like starting up the compositor thread, and maybe with D3D9 since we're creating a TYPE_UI message loop instead of TYPE_DEFAULT. But as far as I can tell, creating the thread/message loop interacts with gfx code in no way. And if D3D9 is involved, that's mysterious since all of these crashes are Windows 10.

Do we have any threads that touch gfx on thread startup?
Bill, just in case, since you had an IPC change (though it's not supposed to be causing something like this, and :dvander didn't think it's related) in that range...
Flags: needinfo?(wmccloskey)
No, was pinging for different reasons. That said, https://hg.mozilla.org/mozilla-central/rev/a786af9186eb seems like a good culprit if they're all x64 crashes. That patch definitely was causing some fun when I was trying to apply it to Aurora (see the bug for more details).
Flags: needinfo?(ryanvm)
What do you think, Aaron?
Flags: needinfo?(aklotz)
Alright I've got a 0x0416 which is a match above with better volume. The NVIDIA is a 0x11fc (Quadro K2100M) which isn't exactly listed above.

Anthony, if, and only if, you can quickly run this query, it would be nice to see if it's a match for NVIDIA and if you can see a crash with this exact combo. I'll probably focus on reproducing in this setup.

Regardless, just matching the Intel GPU might be sufficient since we're crashing in the Intel d3d9 wrapper on init.
Flags: needinfo?(anthony.s.hughes)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #11)
> No, was pinging for different reasons. That said,
> https://hg.mozilla.org/mozilla-central/rev/a786af9186eb seems like a good
> culprit if they're all x64 crashes. That patch definitely was causing some
> fun when I was trying to apply it to Aurora (see the bug for more details).

That could be it - see:
https://forums.geforce.com/default/topic/492687/geforce-mobile-gpus/optimus-1000m-fault-in-nvd3d9wrapx-64-bit-ie9-64-bit-aborts-in-nvd3d9wrapx/
http://answers.microsoft.com/en-us/ie/forum/ie9-windows_7/why-does-64-bit-internet-explorer-9-crashes/414abf65-3e4f-e011-8dfc-68b599b31bf5?auth=1
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bgirard)
D3D9 in the name of the DLL above may be a red herring :)
If all the crashes are Win64 Optimus, I'd wager a lot of money on bug 1240848 :)
Yeah, sorry, red herring. Compositor is always TYPE_UI so it's definitely the compositor thread. But it shouldn't be touching GFX on startup.
I'd be pretty surprised if my patches caused this. Comment 11 seems like a really good guess though. The last thing from our code on the stack is a call to CreateWindow and the hook is for CreateWindowExW.
Flags: needinfo?(wmccloskey)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
> What do you think, Aaron?

Just want to add that my first thought, even before seeing this, was that it reminded me of a detour conflict bug that Aaron found and that I had discussed with him. The crash signature is in a 'wrap' DLL, which sounds like a place to detour in.

If we can afford to, we should consider backing out the patch for a few Nightlies to see if the volume goes away. That's my vote personally.
Given that the uplift is on Monday, I strongly agree. Backed out.

https://hg.mozilla.org/mozilla-central/rev/5f7c184ccd80
The back out has fixed the crash.  The last build ID with these signatures is 20160122030244.
Blocks: 1240977, 1240848
Status: NEW → RESOLVED
Closed: 4 years ago
Component: Graphics → General
Flags: needinfo?(anthony.s.hughes)
Resolution: --- → FIXED
Tracking for 46 since this was a topcrash -- in case this reopens.
There was a problem with the patch in Bug 1240848. I'll be fixing that. This doesn't look like the same problem though. We might need to test that patch on optimus before relanding.
Flags: needinfo?(aklotz)
These crashes appear to have returned, in quantity, in today's (2016-02-10) nightly.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
... which appears to be the first nightly in which bug 1240848 was relanded (although I didn't check changesets, but the times seem to match).
Assignee: nobody → aklotz
(In reply to David Baron [:dbaron] ⌚️UTC-8 from comment #24)
> These crashes appear to have returned, in quantity, in today's (2016-02-10)
> nightly.

Yes, all of the signatures associated with this report have returned in Nightly 20160210071115.
This is now about 8% of crashes on Dev Edition 46 as well.
That said, all the reports on Dev Edition 46 are from the build IDs 20160210004006 and 20160211004010, none from later builds, so the backout of bug 1240848 on the 11th helped.
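
Build IDs are timestamps in YYYYMMDDHHMMSS form, so the "none from later builds" check can be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, and the input is just the two build IDs quoted in this comment rather than a real crash-stats query.

```python
from datetime import datetime


def parse_build_id(build_id):
    """Interpret a build ID as a YYYYMMDDHHMMSS timestamp."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def last_affected_build(build_ids):
    """Return the newest build ID that still shows the signature, or None."""
    if not build_ids:
        return None
    return max(build_ids, key=parse_build_id)


if __name__ == "__main__":
    # Build IDs with reports, as quoted in the comment.
    affected = ["20160210004006", "20160211004010"]
    newest = last_affected_build(affected)
    print(newest, parse_build_id(newest).date())
```

If the newest affected build predates the backout and later builds show no reports, that is consistent with the backout having helped.
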
Tracking for 47 as well since this appears to be spiking recently on nightly.
This was caused by the landing of bug 1240848, which was backed out and reopened, so I'm just going to dupe this.
Status: REOPENED → RESOLVED
Closed: 4 years ago → 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1240848
Marking as fixed since the cause was backed out.
Bug 1240848 relanded three days ago and these old crash signatures (along with some new ones containing different addresses) have come flooding back :(  Nightly 20161014030204 was the first one to see them. And for Nightly 20161014060324 it's the #1 topcrash by a long way, with 100s of occurrences.

aklotz, over to you...
Status: RESOLVED → REOPENED
Crash Signature: nvd3d9wrapx.dll@0x2ec7] [@ nvd3d9wrapx.dll@0x2c07] [@ nvd3d9wrapx.dll@0x2ff7] → nvd3d9wrapx.dll@0x2ec7] [@ nvd3d9wrapx.dll@0x2c07] [@ nvd3d9wrapx.dll@0x2ff7] [@ nvd3d9wrapx.dll@0x3f47] [@ nvd3d9wrapx.dll@0x3fc7] [@ nvd3d9wrapx.dll@0x3e57] [@ nvd3d9wrapx.dll@0x3e27] [@ nvd3d9wrapx.dll@0x5cfb] [@ nvd3d9wrapx.dll@0x4085] [@ nv…
Flags: needinfo?(aklotz)
Resolution: DUPLICATE → ---
Crash Signature: nvd3d9wrapx.dll@0x2f77] [@ nvd3d9wrapx.dll@0x33cb] → nvd3d9wrapx.dll@0x2f77] [@ nvd3d9wrapx.dll@0x33cb] [@ nvd3d9wrapx.dll@0x3e1b] [@ nvd3d9wrapx.dll@0x3d97] [@ nvd3d9wrapx.dll@0x3097] [@ nvd3d9wrapx.dll@0x3da7] [@ nvd3d9wrapx.dll@0x3dd7]
I hate this DLL so much...
Flags: needinfo?(aklotz)
Note that the crashes in this recent spike are all on x64. But based on what I know about this DLL from debugging the 32-bit version last year [1], the most likely problem is that both we and the NVIDIA DLL are trying to hook CreateWindowExW at the same time.

That stuff was added for the sake of Async Plugin Init, which we ceased development on. My educated guess is that if we remove those now unnecessary hooks, these crashes will clear up.
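
To illustrate why two parties hooking the same function is fragile in general, here is a toy Python model of one well-known failure mode: each hooker blindly saves and restores whatever it found at the entry point. This is only a sketch under that assumption and is not claimed to be the exact mechanism inside nvd3d9wrapx.dll or our interceptor. The `Process` and `Hooker` classes and the names `hook_a` and `hook_b` are hypothetical.

```python
class Process:
    """Toy process: records what currently sits at each API entry point."""

    def __init__(self):
        self.entry = {"CreateWindowExW": "original"}
        self.loaded = {"original"}  # code that is still mapped


class Hooker:
    """Naive detour: saves whatever is at the entry, then overwrites it."""

    def __init__(self, name):
        self.name = name
        self.saved = None

    def install(self, proc, api):
        self.saved = proc.entry[api]  # may already be someone else's hook
        proc.entry[api] = self.name
        proc.loaded.add(self.name)

    def uninstall(self, proc, api):
        proc.entry[api] = self.saved  # blind restore, ignores later hookers
        proc.loaded.discard(self.name)


def call(proc, api):
    """Follow the entry point; fail if it leads into unloaded code."""
    target = proc.entry[api]
    if target not in proc.loaded:
        raise RuntimeError("jump into unloaded code: " + target)
    return target


def simulate():
    proc, api = Process(), "CreateWindowExW"
    a, b = Hooker("hook_a"), Hooker("hook_b")
    a.install(proc, api)
    b.install(proc, api)    # b saves "hook_a" as its "original"
    a.uninstall(proc, api)  # entry is "original" again: b's hook is silently gone
    after_a = call(proc, api)
    b.uninstall(proc, api)  # restores the stale "hook_a"
    try:
        call(proc, api)
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return after_a, error


if __name__ == "__main__":
    print(simulate())
```

In this model, having only one party hook the function removes the ordering problem entirely, which is the same reasoning behind dropping hooks we no longer need.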

Since I am aware of people enabling async init in their prefs out in the wild, I think that this would also be a good opportunity to change the plugin code to disregard that pref and leave async init disabled at all times.

[1] http://dblohm7.ca/blog/2016/01/11/bugs-from-hell-injected-third-party-code-plus-detours-equals-a-bad-time/
Thanks for letting me know about this. My review isn't really worth anything here since I don't understand the code. I'll cancel and let Jim take it.
Comment on attachment 8801831 [details]
Bug 1241921: Remove CreateWindow* hooks from IPC glue;

https://reviewboard.mozilla.org/r/86456/#review85264
Attachment #8801831 - Flags: review?(jmathies) → review+
Comment on attachment 8801832 [details]
Bug 1241921: Disable async plugin init regardless of pref;

https://reviewboard.mozilla.org/r/86458/#review85266

Please file a bug on removing the async init code as well that notes the feature has been disabled in code in this bug.
Attachment #8801832 - Flags: review?(jmathies) → review+
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68956648f506
Remove CreateWindow* hooks from IPC glue; r=jimm
https://hg.mozilla.org/integration/autoland/rev/40cd599a9d4e
Disable async plugin init regardless of pref; r=jimm
https://hg.mozilla.org/mozilla-central/rev/68956648f506
https://hg.mozilla.org/mozilla-central/rev/40cd599a9d4e
Status: REOPENED → RESOLVED
Closed: 4 years ago → 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
See Also: → 1376177