Open Bug 1284322 Opened 4 years ago Updated 2 years ago

Consider reducing NVIDIA blacklisting

Categories

(Core :: Canvas: WebGL, defect, P3)

Tracking

REOPENED
mozilla50
Tracking Status
platform-rel --- -
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: jrmuizel, Assigned: BenWa)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [gfx-noted][platform-rel-nVidia])

Attachments

(2 files)

We still have a somewhat arbitrary choice of NVIDIA driver version below which we blacklist. This accounts for 17% of our WebGL failures. We recently lowered our AMD version with no significant consequences so far. We can probably do the same here with some testing.
Blocks: 1254008, 1257692
Whiteboard: [gfx-noted]
Here are the driver versions:
            {u'5.6.7.3': 1,
             u'6.1.7600.16385': 1,
             u'6.14.10.9382': 1,
             u'6.14.11.6218': 1,
             u'6.14.11.6757': 1,
             u'6.14.11.6921': 1,
             u'6.14.11.7474': 1,
             u'6.14.11.7924': 1,
             u'6.14.11.8224': 3,
             u'8.15.11.8593': 1,
             u'8.15.11.8624': 1,
             u'8.15.11.8644': 1,
             u'8.16.11.8886': 1,
             u'8.16.11.8950': 1,
             u'8.16.11.9181': 1})
Assignee: nobody → bgirard
Relevant entries:
    APPEND_TO_DRIVER_BLOCKLIST(OperatingSystem::WindowsXP,
      (nsAString&) GfxDriverInfo::GetDeviceVendor(VendorNVIDIA), GfxDriverInfo::allDevices,
      GfxDriverInfo::allFeatures, nsIGfxInfo::FEATURE_BLOCKED_DRIVER_VERSION,
      DRIVER_LESS_THAN, V(6,14,11,8265), "FEATURE_FAILURE_NV_XP", "182.65" );
    APPEND_TO_DRIVER_BLOCKLIST(OperatingSystem::WindowsVista,
      (nsAString&) GfxDriverInfo::GetDeviceVendor(VendorNVIDIA), GfxDriverInfo::allDevices,
      GfxDriverInfo::allFeatures, nsIGfxInfo::FEATURE_BLOCKED_DRIVER_VERSION,
      DRIVER_LESS_THAN, V(8,17,11,8265), "FEATURE_FAILURE_NV_VISTA", "182.65" );
    APPEND_TO_DRIVER_BLOCKLIST(OperatingSystem::Windows7,
      (nsAString&) GfxDriverInfo::GetDeviceVendor(VendorNVIDIA), GfxDriverInfo::allDevices,
      GfxDriverInfo::allFeatures, nsIGfxInfo::FEATURE_BLOCKED_DRIVER_VERSION,
      DRIVER_LESS_THAN, V(8,17,11,8265), "FEATURE_FAILURE_NV_W7", "182.65" );
Looking at the versions:
Major v1: 2001
Major v2: 2001-2002
Major v3: 2002
Major v4: 2002-2004
Major v5: 2002-2004
Major v6: 2004-2015
Major v7: 2004-2009
Major v8: 2005-2011
Major v10: A few 2006 drivers for Microsoft VGA adapters; otherwise it's 2015+ drivers
Major v2009: Looks like NVIDIA has a few driver versions that use the date as the version. Only a few drivers, from Feb-March of 2009. Example: 2009.3.17.1

XP: v6-v10 (all v2009)
Vista: v6-v10
Win7: v2-v10
Win8: v6-v10
Win8.1: v6-v10
Win10: v7-v10
Jeff pointed out that there are two versioning schemes that differ slightly within the major versions:
8.6.3.8	9-27-2006
vs
8.15.11.8593	5-14-2009

There's a jump in minor version and dates. This data would look very different if you considered these version discontinuities.
So the concerning thing is that on XP we allow > 6,14,11,8265, but we have some 7.x, 8.x (plus a few 10.x 2006 VGA). We might want to consider blacklisting those. Perhaps we should query crash stats to see what happens for those users.
I meant to say that on XP we allow some 7.x and 8.x drivers from 2005 to run because of a versioning change that happened at NVIDIA.
* Discard anything from 2006 and older. This will throw out the old versioning scheme. Not strictly required, but useful to keep things sane.
* Parse the last 5 padded digits.
* I don't see any correlation with the major version and the OS version (see below).


* The major version ranges we should support are:
** 6.14.1*: 95, XP (Mostly XP)
*** 071.89 (6-17-2005) -> 153.62 (7-22-2015)
** 7.14.1*: Just a bit of 2006, ignore this one
** 7.15.1*: XP -> Win 10 (Mostly XP)
*** 097.19 (11-1-2006) -> 185.86 (5-1-2009)
** 8.15.1*: XP -> Win 10 (Mostly Win7)
*** 181.71 (2-26-2009) -> 190.80 (8-7-2009)
** 8.16.1*: XP -> Win 10 (Mostly Win7)
*** 186.81 (8-19-2009) -> 192.36 (11-12-2010)
** 8.17.1*: XP -> Win 10 (Mostly Win7)
*** 195.39 (10-27-2009) -> 305.53 (8-2-2012)
** 9.18.1*: XP -> Win 10 (Mostly Win7)
*** 296.17 (3-5-2012 *varies) -> 353.06 (5-27-2015)
** 10.18.1*: XP -> Win 10 (Mostly Win10)
*** 353.24 (6-10-2015) -> 372.21 (6-19-2016)
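The "last 5 padded digits" rule can be sketched as follows. This is a minimal illustration, not code from the patch: the helper name `nvidia_release` is hypothetical, and it assumes a four-component Windows driver version whose release number is the final digit of the third component followed by the zero-padded fourth component. That assumption matches the pairs quoted in this bug (8.15.11.8593 is 185.93, V(6,14,11,8265) is "182.65").

```python
def nvidia_release(driver_version):
    """Map a Windows driver version such as '8.15.11.8593' to '185.93'.

    Hypothetical helper: assumes the newer four-component scheme discussed
    in this bug. Old-scheme versions such as '8.6.3.8' are out of scope and
    would produce a meaningless result.
    """
    parts = driver_version.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four numeric components: %r" % driver_version)
    # Zero-pad the last component to four digits, prepend the third
    # component, and keep only the last five digits of the result.
    digits = (parts[2] + parts[3].zfill(4))[-5:]
    return digits[:3] + "." + digits[3:]


if __name__ == "__main__":
    for version in ("8.15.11.8593", "8.16.11.9062", "6.14.11.8265"):
        print(version, "->", nvidia_release(version))
```

Comparing on this release number rather than on the full version string would sidestep the 8.15/8.16/8.17 prefix differences, at the cost of needing the old versioning scheme filtered out first.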
SELECT system_gfx[1].adapters[1].driver_date AS DriverDate,
       system_gfx[1].adapters[1].description AS Device,
       system_gfx[1].adapters[1].driver_version AS DriverVersion,
       split(system_gfx[1].adapters[1].driver_version, '.')[2] AS RANGE,
       system_os[1].version AS OS
FROM longitudinal
WHERE system_gfx[1].adapters[1].vendor_id = '0x10de'
  AND system_gfx[1].adapters[1].driver_version IS NOT NULL
ORDER BY system_gfx[1].adapters[1].driver_version
LIMIT 1000000
8.15.11.8593 on Windows 7 accounts for about 3.7% of our NVIDIA users. They are blocked, but it's a 185.93 driver, which is newer than our cutoff of 182.65, V(8,17,11,8265). We should consider relaxing the blacklist for them.
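To see why this driver is caught, here is a minimal sketch of the comparison. It assumes `DRIVER_LESS_THAN` compares the four version components in order; the tuple comparison and the helper names are illustrative stand-ins, not the Gecko implementation.

```python
def parse_version(text):
    """Split 'a.b.c.d' into a tuple of ints so tuples compare component-wise."""
    return tuple(int(part) for part in text.split("."))


# Cutoff taken from the Vista/Win7 blocklist entries quoted earlier in this bug.
CUTOFF = (8, 17, 11, 8265)


def blocked_by_old_rule(driver_version):
    """Illustrative stand-in for DRIVER_LESS_THAN V(8,17,11,8265)."""
    return parse_version(driver_version) < CUTOFF


if __name__ == "__main__":
    # 8.15 sorts before 8.17, so this driver is blocked even though its
    # trailing digits (8593) are above the cutoff's trailing digits (8265).
    print(blocked_by_old_rule("8.15.11.8593"))
    print(blocked_by_old_rule("8.17.11.8265"))
```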
Alright we want to whitelist:
Win Vista+ users on > 8.15.11.8265
Win Vista+ users on > 8.16.11.8265

To reiterate, these users have drivers that are newer than mid-2009. They are likely users who had XP drivers and then updated their OS to Windows 7.

This means whitelisting 1.2% of our Firefox population.
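One way to read this proposal is as a per-branch cutoff: within the 8.15 and 8.16 branches, anything above x.11.8265 is allowed, and the existing 8.17.11.8265 cutoff still applies everywhere else. The sketch below encodes that reading. It is an assumption about the intent of this comment, not the logic of the landed patch, and all names are hypothetical.

```python
# Per-branch cutoffs as read from the comment above (an assumption).
BRANCH_CUTOFFS = {
    (8, 15): (8, 15, 11, 8265),
    (8, 16): (8, 16, 11, 8265),
}
# Existing Vista+ cutoff: versions below this are blocked (DRIVER_LESS_THAN).
DEFAULT_CUTOFF = (8, 17, 11, 8265)


def allowed_on_vista_plus(driver_version):
    """Return True if the proposed rule would let this driver through."""
    version = tuple(int(part) for part in driver_version.split("."))
    cutoff = BRANCH_CUTOFFS.get(version[:2])
    if cutoff is not None:
        # Whitelisted branches: strictly newer than the branch cutoff.
        return version > cutoff
    # Everything else keeps the existing behaviour.
    return version >= DEFAULT_CUTOFF


if __name__ == "__main__":
    for version in ("8.15.11.8593", "8.16.11.9062", "8.15.11.8265"):
        print(version, allowed_on_vista_plus(version))
```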
Comment on attachment 8769357 [details]
Bug 1284322 - Unblacklist NVIDIA >8.15.11.8265, >8.16.11.8265.

https://reviewboard.mozilla.org/r/63300/#review60142
Attachment #8769357 - Flags: review?(jmuizelaar) → review+
Here's a version with the compile errors fixed:
https://hg.mozilla.org/try/pushloghtml?changeset=32706d840dd1
Thanks!
Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1421386af3cc
Unblacklist NVIDIA >8.15.11.8265, >8.16.11.8265. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/1421386af3cc
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
I tested this manually in the nightly using 8.16.11.9062 (release 190.62). I tested WebGL/Video/Scrolling with D3D9 on a GeForce 7600 GT (Red NVIDIA Desktop in the gfx lab). Everything is working fine. I tested with an old Nightly and this configuration was previously blocked.

To download the driver I had to look at the range using the SQL queries above, then manually google that driver to get a link. The normal NVIDIA driver search page doesn't seem to go this far back (returns up to 20 results).
Comment on attachment 8769357 [details]
Bug 1284322 - Unblacklist NVIDIA >8.15.11.8265, >8.16.11.8265.

Approval Request Comment
[Feature/regressing bug #]: None
I'd like to uplift this to Aurora (but have it not ride the trains) so that we have a larger testing audience for longer. Any problems that this exposes will be limited to users who don't have up-to-date drivers, and I expect they may be less likely to report issues, so I want to test this as much as possible.
Attachment #8769357 - Flags: approval-mozilla-aurora?
I've run an analysis on crashes (Nightly - Windows 7 - NVIDIA - Driver version starting with "8.") before/after this change: https://github.com/marco-c/crashcorrelations/blob/master/NVIDIA%20Blocklist%20Change.ipynb. The bars represent the percentage of crashes with a given property (e.g. the first red bar represents the percentage of crashes with adapter_device_id="0x0162" in the group of crashes with the new blocklist, the first blue bar the same but in the group of crashes with the old blocklist)

Looks like crashes for the 0x0612 graphics card and the "8.15.11.8593" driver version were reduced drastically. For other graphics cards and other driver versions the share moved the opposite way (in a very limited way), but simply because of the drastic change in the percentage of crashes with "8.15.11.8593".

The number of crashes is pretty small, so some things could be due to noise (e.g. 1% of the crashes with the new blocklist means two crashes...).

To summarize, it's still early to tell because there are too few crashes for now, but it looks like this change improved stability with the 0x0612 graphics card and the "8.15.11.8593" driver version.

P.S.: Obviously, correlation is not causation, so there could be something else that is causing these changes in the crash population.
Thanks for posting this. A change in crash rates is probably due to the relative stabilities of the code path we turned off and the one we turned on. Judging by this data and what Anthony also reported, it appears that the unblacklisting hasn't exposed any serious driver crashes.

We're not out of the woods yet, however, since we might have bad rendering bugs like corruption or black squares that don't cause any crashes.
Comment on attachment 8769357 [details]
Bug 1284322 - Unblacklist NVIDIA >8.15.11.8265, >8.16.11.8265.

Let's try it, thanks
Attachment #8769357 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Needs rebasing for Aurora.
Flags: needinfo?(bgirard)
Friendly reminder that Pulsebot doesn't comment on uplifts ;)
Flags: needinfo?(bgirard)
Depends on: 1292311
This effectively got backed out by patches in bug 1292311.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
platform-rel: --- → ?
Whiteboard: [gfx-noted] → [gfx-noted][platform-rel-nVidia]
platform-rel: ? → -