Closed Bug 984417 Opened 10 years ago Closed 10 years ago

Try not unnecessarily blacklisting Direct2D and Direct3D 10

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox31 --- disabled
firefox32 --- verified

People

(Reporter: bjacob, Assigned: bjacob)

References

(Depends on 1 open bug)

Details

Attachments

(2 files)

See e.g.

http://people.mozilla.org/~bjacob/gfx_features_stats/#win7

Our Direct3D 10 layers blacklisting rate stays much higher than our Direct3D 9 blacklisting rate, causing many Win7 users to stay on Direct3D 9, with no Direct2D and no fast WebGL compositing (ANGLE shared handles require Direct3D 10). I'm particularly worried about the WebGL performance issue as WebGL usage is picking up and this is silently running slow.

Looking at widget/windows/GfxInfo.cpp this could be caused by two different kinds of issues:

 1) As we relaxed blacklist rules, for some reason, we didn't relax Direct2D rules. Maybe we don't need to special-case Direct2D and we can use the same general driver version cutoff for all features, including Direct2D. The attached patch tries that.

 2) We detect mismatches between driver versions as reported by the Windows registry, and DLL versions. We do so because the Registry is where we get most of our information from, but it can give wrong information, and if it does, as long as it remains our primary source of information, we don't have data to base blacklisting decisions on. I would like to know how many of our users have Direct3D 10 layers blacklisted because of such mismatches. The attached patch annotates crash reports to tell us that.

https://tbpl.mozilla.org/?tree=Try&rev=607f128578e5
Attachment #8392241 - Flags: review?(jmuizelaar)
Attachment #8392241 - Flags: review?(bas)
Summary: Try not unnecessary blacklisting Direct2D and Direct3D 10 → Try not unnecessarily blacklisting Direct2D and Direct3D 10
Hrm, so I think this is interesting, however iirc there were concrete reasons for our Intel D3D10 blacklisting (i.e. older intel drivers -actually- caused a lot of crashes), so I'm a little bit hesitant about that part.
Yes, but:

* The bulk of our original Intel blacklisting was due to a strange crashes while applying windows or driver updates, IIRC, updating the intel driver between two versions that were commonplace at the time. Do we have any idea whether the problem still exists today?

* It could very well be that we'll have to re-blacklist some, but we won't know what we need to until we try, and with the new gecko cycle being 1 day old, now is as good a time as any! We have 4 months to find out. Same story as with Jeff's previous unblacklisting of intel d3d9 drivers.
(In reply to Benoit Jacob [:bjacob] from comment #2)
> Yes, but:
> 
> * The bulk of our original Intel blacklisting was due to a strange crashes
> while applying windows or driver updates, IIRC, updating the intel driver
> between two versions that were commonplace at the time. Do we have any idea
> whether the problem still exists today?

I actually feel like Intel drivers until some version (4405? 2215?) were just crashy in general with D2D.

I do sympathize with the idea of trying something early in the cycle.
Comment on attachment 8392241 [details] [diff] [review]
Remove possibly-unnecessary Direct2D blocklist rules, and annotate crash reports on DriverVersionMismatch

Review of attachment 8392241 [details] [diff] [review]:
-----------------------------------------------------------------

Let's just try.
Attachment #8392241 - Flags: review?(bas) → review+
Comment on attachment 8392241 [details] [diff] [review]
Remove possibly-unnecessary Direct2D blocklist rules, and annotate crash reports on DriverVersionMismatch

Review of attachment 8392241 [details] [diff] [review]:
-----------------------------------------------------------------

Jeff is at GDC.
Attachment #8392241 - Flags: review?(jmuizelaar)
https://hg.mozilla.org/mozilla-central/rev/7e86ed0d8756
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
This already yields one very interesting piece of information: among our Nightly users, the DriverVersionMismatch case is not a significant cause of failing to get D3D10 Layers. Indeed:

bjacob:~/crash-stats$ zcat 20140320-pub-crashdata.csv.gz | grep 'D3D10 Layers-' | wc -l
131648
bjacob:~/crash-stats$ zcat 20140320-pub-crashdata.csv.gz | grep 'D3D10 Layers-' | grep DriverVersionMismatch | wc -l
38

So, on March 20, out of 131648 crash sessions that had D3D10 fail to initialize, only 38 had the DriverVersionMismatch.
Ouch, ignore comment 8, I forgot to filter for Nightly...
Among the Nightly crashes with build ID March 2[01] received on March 21, and using Intel Graphics (vendor=0x8086):

bjacob:~/crash-stats$ zcat 20140321-pub-crashdata.csv.gz | grep 'AdapterVendorID: 0x8086' | grep Firefox.31.0a1.2014032[01] | grep 'D3D10 Layers-' | wc -l
74
bjacob:~/crash-stats$ zcat 20140321-pub-crashdata.csv.gz | grep 'AdapterVendorID: 0x8086' | grep Firefox.31.0a1.2014032[01] | grep 'D3D10 Layers-' | grep DriverVersionMismatch | wc -l
54

So now the proportion of D3D10 blacklisting on Intel caused by DriverVersionMismatch seems to be 54/74 == more than 70% !
The proporting is lower now, though still 37%:

bjacob:~/crash-stats$ zcat 20140429-pub-crashdata.csv.gz | grep 'Firefox.\(31\|32\)' | grep 'D3D10 Layers-' | wc -l
288
bjacob:~/crash-stats$ zcat 20140429-pub-crashdata.csv.gz | grep 'Firefox.\(31\|32\)' | grep 'D3D10 Layers-' | grep DriverVersionMismatch | wc -l
106

106/288 == 37%
Ah no, I had forgotten to filter for Intel Graphics only. The actual figure is 80%:

bjacob:~/crash-stats$ zcat 20140429-pub-crashdata.csv.gz | grep 'AdapterVendorID: 0x8086' | grep Firefox.3[1-9] | grep 'D3D10 Layers-' | wc -l
132
bjacob:~/crash-stats$ zcat 20140429-pub-crashdata.csv.gz | grep 'AdapterVendorID: 0x8086' | grep Firefox.3[1-9] | grep 'D3D10 Layers-' | grep DriverVersionMismatch | wc -l
106

106/132 == 80%.

So 80% of Nightly users on Intel Graphics who have D3D10 blacklisted, have it blacklisted because of DriverVersionMismatch.
Comment on attachment 8415330 [details] [diff] [review]
DriverVersionMismath: don't blacklist, just keep annotating crash reports.

The above already-landed-in-31 patch made us annotate crash reports on DriverVersionMismatch, but we were still blacklisting when that happened.

Now that we're at the start of a new cycle, let's try not blacklisting anymore. We can back out if we get a spike of crashes. The crash report annotation is left in, so it will allow us to find out if crashes are on systems where DriverVersionMismatch happens.
Attachment #8415330 - Attachment description: DriverVersionMismath: don → DriverVersionMismath: don't blacklist, just keep annotating crash reports.
Attachment #8415330 - Attachment is patch: true
Attachment #8415330 - Flags: review?(bas)
Attachment #8415330 - Flags: review?(bas) → review+
Depends on: 988549
Given the backout in bug 988549, we shouldn't forget to mark this disabled for 31 once that backout lands.
This was reverted from Fx31 in bug 988549.
Bas/Benoit, shouldn't this be backed out from all branches?
Flags: needinfo?(bjacob)
Flags: needinfo?(bas)
I guess that's one option, yes. But it's not the only one.

The reason why we did this in the first place is that this has the potential to move about 10% of Win7 users from D3D9 to D3D10, which, in the context of WebGL games, means moving them from "slow WebGL" to "fast WebGL" as we don't have a way at the moment to composite WebGL frames efficiently on D3D9.

Now we ran into a driver bug preventing us from unblacklisting all of what we did try to unblacklist, so we do need to revert that part. But surely that must be specific to one GPU vendor, and probably even to certain particular driver versions and/or device IDs? Have we checked that from crash-stats?

It seems to me that the right thing to do there is to check that and to selectively undo this unblacklisting only on affected configurations.
Flags: needinfo?(bjacob)
And really, that is the only way that we've ever made progress on the gfx unblacklisting front - land a big unblacklisting and look into the resulting crash-stats.
Flags: needinfo?(bas)
Three-part question?
* What are we going to do in the next 5 days before 32beta?
* What are we going to do for 33/soon-to-be-aurora
* What are we going to do long-term?

My general preference would be, unless we know exactly what we want to do, to back this out completely, analyze the data from bug 988549, and then re-land a corrected version of this. That's not the way it has to be, but I really don't want this to end up on beta next week the way it is.
Flags: needinfo?(bjacob)
Sure, let me back this out. The problem here was that this was a long-term project that ended up overlapping my moving away from gfx. This needs to be picked up by someone who cares about gfx acceleration rates and will be on the hook to fix gfx crashes.
Flags: needinfo?(bjacob)
Note that backing out the first patch here, as was done on bug 988549, is not exactly the right thing to do, since it also removes the DriverVersionMismatch crashreport annotation. Instead, we need to keep that hunk, or else we'll never get correct crash data on the impact of the second patch here should it be a cause of problems too.
I've just done that analysis over on bug 988549 comment 59. The issue is entirely Intel-specific. There is thus no need to back out anything but the Intel-specific portion of this patch.
There's now a patch on bug 988549 doing the Intel-only partial backout. We can hope to have it landed on all trees tomorrow.
(In reply to Benoit Jacob [:bjacob] from comment #21)
> And really, that is the only way that we've ever made progress on the gfx
> unblacklisting front - land a big unblacklisting and look into the resulting
> crash-stats.

This time we did it in a way that nobody knew there was a change in gfx blacklisting up until the last week of beta. That is not how to do changes like that.

(In reply to Benoit Jacob [:bjacob] from comment #25)
> I've just done that analysis over on bug 988549 comment 59. The issue is
> entirely Intel-specific. There is thus no need to back out anything but the
> Intel-specific portion of this patch.

Sure, this one signature is Intel-specific, we know that. Have you checked for all potentially additional signatures affected by this? Which graphics cards/chips would be affected?
Also, we are only starting to get driver version as a separate crash report field that we can do searches on, so we are only starting to get into a situation to usefully analyze this.

If we go for unblacklisting, I'd prefer to do it in a controlled way for one class of gfx crads/chips at a time, on nightly and aurora, with all the tooling in place to analyze if new crashes come up on those chips and drivers, and verify the data on beta. This may be slow and need quite a bit of work but we owe it to our users (including those on beta) to be careful and not to make their experience significantly worse.
In the interest of keeping the conversation on only 1 bug, see bug 988549 comment 67.
QA Contact: florin.mezei
I've verified today the behavior for this, considering that the Intel changes were backed out (basically the Intel entries removed in https://hg.mozilla.org/mozilla-central/rev/7e86ed0d8756 were added back). I've verified this with Firefox 32 Beta 8 (BuildID: 20140818191513) on Windows 7 x64 by spoofing the video data:
1. Create "spoofed-firefox.bat" file with following content:
SET MOZ_GFX_SPOOF_WINDOWS_VERSION=
SET MOZ_GFX_SPOOF_VENDOR_ID=
SET MOZ_GFX_SPOOF_DEVICE_ID=
SET MOZ_GFX_SPOOF_DRIVER_VERSION=
"C:\Mozilla\Firefox\firefox.exe" -p -no-remote
2. Set the IDs and versions values, saved the file, opened it to launch Firefox, created a new profile, started Firefox, and verified "about:support" -> Graphics.

Data used and results can be found at https://docs.google.com/spreadsheets/d/12yEBqNZqaR6_2XbqFrVwRVxt_URjNN3iMmu-GX60Z0Q/edit#gid=0, lines 35 to 66.

All results were according to the changes in this fix, with the exception of:
- additional versions blocked in the "browser\blocklist.xml" file, which seem to add to the versions blocked in "widget/windows/GfxInfo.cpp"
- Intel version initially removed, and added back due to issues in bug 988549.

Everything seems to work as intended.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.