Closed Bug 1499554 Opened 10 months ago Closed 10 months ago

Windows AWSY tests don't have a GPU process

Categories

(Testing :: AWSY, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox65 fixed)

RESOLVED FIXED
mozilla65

People

(Reporter: bholley, Assigned: rhunt)

Attachments

(3 files)

See bug 1499509 comment 3.

We need to fix this if we want our memory comparisons to be meaningful. It may be related to the fact that we run QR tests on real hardware but virtualize the non-QR tests.

I'm not sure whether it's specific to AWSY or a more general issue in our Windows 10 automation. I hope it's not a more general issue, because that would mean we're not testing the bits we ship to most of our users.

Jeff, any theories?
This seems odd to me as well.

We used to only use the GPU process when hardware accelerated, with a pref to allow it for software [1]. This pref rode the trains for windows in bug 1385048. But it looks like that pref is actually unused in Gecko, and I can't find the commits that removed that functionality.

I don't see any code in gfxPlatform to not launch the GPU process for software/windows. [3] It doesn't look like AWSY runs under headless or safe mode.

Bug 1462450 has a patch which could be useful for debugging this: it dumps about:support, which should give us the reason the GPU process isn't being used.

[1] https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/gfx/thebes/gfxPrefs.h#616
[2] https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/modules/libpref/init/all.js#4879
[3] https://searchfox.org/mozilla-central/rev/3d989d097fa35afe19e814c6d5bc2f2bf10867e1/gfx/thebes/gfxPlatform.cpp#2558
Ryan, do you have cycles to look into this? It's pretty important for WR to be able to make apples-to-apples comparisons on AWSY.
Flags: needinfo?(rhunt)
Yeah, I'll try and take a closer look today.
Flags: needinfo?(rhunt)
Thanks!
Assignee: nobody → rhunt
Here's the relevant snippet from about:support on AWSY without WebRender.

{
"name":"GPU_PROCESS",
"description":"GPU Process",
"status":"blacklisted",
"log":[
  {"type":"default","status":"available"},
  {"type":"env","status":"blacklisted","message":"#BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR"}
]
}

Looks like we're blocklisted based on being in a VM?
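
For anyone reading a dump like this by hand, here is a minimal sketch of pulling the blocking reason out of a feature entry. It assumes only the JSON shape shown in the snippet above; the helper name `blocking_entry` is made up for illustration and is not part of Gecko or AWSY.

```python
import json

# The GPU_PROCESS feature entry from about:support, as quoted above.
SNIPPET = """
{
  "name": "GPU_PROCESS",
  "description": "GPU Process",
  "status": "blacklisted",
  "log": [
    {"type": "default", "status": "available"},
    {"type": "env", "status": "blacklisted",
     "message": "#BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR"}
  ]
}
"""


def blocking_entry(feature):
    """Return the first log entry whose status is not 'available', or None.

    Hypothetical helper: it only assumes the 'log' list layout seen above.
    """
    for entry in feature.get("log", []):
        if entry.get("status") != "available":
            return entry
    return None


feature = json.loads(SNIPPET)
entry = blocking_entry(feature)
if entry is not None:
    print(feature["name"], "blocked at stage", entry["type"], "with", entry.get("message"))
```

The "env" stage is the one that flips the feature off here, and its message names the unknown device vendor as the reason.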
Here are the bits about the adapter. [1]

"adapterDescription":"Microsoft Basic Display Adapter",
"adapterVendorID":"0x1013",
"adapterDeviceID":"0x00b8",
"adapterSubsysID":"00015853",
"adapterRAM":"0",
"adapterDrivers":"Unknown",
"driverVersion":"10.0.15063.0",
"driverDate":"6-21-2006"

I wonder if there's a reason we don't allow this vendor. Might be worth working around in testing, but leaving as is for release users.

We can force enable the GPU process and that might work around the block list. [2]

[1] https://searchfox.org/mozilla-central/rev/eef79962ba73f7759fd74da658f6e5ceae0fc730/widget/windows/GfxInfo.cpp#1501
[2] https://searchfox.org/mozilla-central/rev/eef79962ba73f7759fd74da658f6e5ceae0fc730/gfx/thebes/gfxPrefs.h#618
Does this blacklist also apply to hardware acceleration, i.e. are we running Basic Compositor for all our Windows tests that run in a VM? That would be worrisome!
Attached file about-support.json
Should've just attached the full about:support from the beginning. Yes we're getting basic layers here.
Attached file qr-about-support.json
about:support with webrender, for reference.
Jeff, is blacklisting hardware acceleration and the GPU process expected here?

I think we can definitely force enable the GPU process in this situation, but I'm unsure about hardware acceleration.
Flags: needinfo?(jmuizelaar)
Talked with Jeff in the daily call today about this.

I believe the summary is that if we're allowing the GPU process with software rendering in release, we should be consistent here. So I think this fallback unknown blacklisting shouldn't apply to the GPU process.
Flags: needinfo?(jmuizelaar)
That works, though we should _also_ fix the fact that IIUC we're not testing D2D on automation.
(In reply to Bobby Holley (:bholley) from comment #12)
> That works, though we should _also_ fix the fact that IIUC we're not testing
> D2D on automation.

So we do try to test both D2D and Skia on automation, but it's not always watched closely. For example, after a hardware upgrade we accidentally stopped testing Skia in Talos for a while and didn't notice (bug 1458638). But yes, we do test D2D today in automation for mochitests and reftests [1] [2].

The biggest thing to determine whether we use D2D so far has been the hardware. I understand that it's useful to have D2D test coverage for AWSY, but I'm unsure if it's meaningful within a VM without graphics acceleration. I'm also unsure if we have enough resources to test AWSY with and without D2D.

So right now I'll try and get the GPU process going in AWSY, and someone more involved should answer the D2D question.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=62e384532eeb60f5c61383ca8b16d614c49e1838&selectedJob=206846332
[2] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=62e384532eeb60f5c61383ca8b16d614c49e1838&selectedJob=206846272
So I've kicked off a try run adding the GPU process force enable pref to the AWSY pref set just to make sure we don't have any runtime issues. [1] Thinking about this further, that will enable it for Linux which is undesirable. We should fix this at the blocklist level somehow. 

It looks like the device vendor ID is 0x1013, which isn't in the whitelist. The whitelist includes 'Microsoft' though, and the device description is 'Microsoft Basic Display Adapter', so maybe we should just add this vendor ID to the whitelist?

A quick google search didn't pull up anything about that vendor ID, so I'm unsure what it refers to.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e3edbc70417ff3c08c8103b40fb3449ae5bcd55
(In reply to Ryan Hunt [:rhunt] from comment #14)
> So I've kicked off a try run adding the GPU process force enable pref to the
> AWSY pref set just to make sure we don't have any runtime issues. [1]
> Thinking about this further, that will enable it for Linux which is
> undesirable. We should fix this at the blocklist level somehow. 
> 
> It looks like the device vendor ID is 0x1013, which isn't in the whitelist.
> The whitelist includes 'Microsoft' though, and the device description is
> 'Microsoft Basic Display Adapter', so maybe we should just add this vendor
> ID to the whitelist?
> 
> A quick google search didn't pull up anything about that vendor ID, so I'm
> unsure what it refers to.
> 
> [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e3edbc70417ff3c08c8103b40fb3449ae5bcd55

0x1013 is Cirrus Logic which means the graphics card is probably just a dumb framebuffer attached to an output of some sort. These kinds of devices show up in servers.
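
A rough sketch of the lookup being discussed, for reference. The vendor IDs are standard PCI IDs, but the `HYPOTHETICAL_WHITELIST` set and the `describe_vendor` helper are assumptions for illustration only; they are not the actual list or code in GfxInfo.

```python
# Well-known PCI vendor IDs. This table is illustrative, not Gecko's data.
KNOWN_VENDORS = {
    0x8086: "Intel",
    0x10DE: "NVIDIA",
    0x1002: "AMD/ATI",
    0x1414: "Microsoft",
    0x1013: "Cirrus Logic",
}

# Assumption: a stand-in for the gfxInfo vendor whitelist. The real list
# lives in Gecko and may differ from this set.
HYPOTHETICAL_WHITELIST = {0x8086, 0x10DE, 0x1002, 0x1414}


def describe_vendor(vendor_id_str):
    """Map an about:support adapterVendorID string to (name, whitelisted)."""
    vendor_id = int(vendor_id_str, 16)  # accepts the "0x" prefix
    name = KNOWN_VENDORS.get(vendor_id, "unknown")
    return name, vendor_id in HYPOTHETICAL_WHITELIST


print(describe_vendor("0x1013"))
print(describe_vendor("0x1414"))
```

Under these assumptions the adapter reports a Microsoft description but a Cirrus Logic vendor ID, so a check keyed on the vendor ID alone would not match it.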
(In reply to Ryan Hunt [:rhunt] from comment #13)
> So right now I'll try and get the GPU process going in AWSY, and someone
> more involved should answer the D2D question.

I think it's mostly important just for making an apples-to-apples comparison between D2D and WR. So if I have a reliable way to trigger comparable configurations on try, we should be good to go.

IIUC, WR runs on real hardware rather than in VMs. Presumably I'd need to trigger that for AWSY somehow, which should give me results for D2D?
AWSY runs on VMs for all Linux (including linux64-qr). AWSY runs on hardware for Windows.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)
> awsy runs on vms for all linux (included linux64-qr).  awsy runs on hardware
> for windows

This bug seems to indicate that AWSY win10 (non-QR) is running in a VM (which is why it's encountering a phony driver vendor and disabling hardware acceleration and out-of-process composition). Are you saying that's not intended?
Flags: needinfo?(jmaher)
Crap, I was wrong, sorry about that.

So win7/win10 opt/pgo are virtual; win10-qr is hardware:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/awsy.yml

I think we want to make all of it virtual, except there are stability issues getting win10-qr/awsy running in AWS (we have the same issue with win10* reftests).
Flags: needinfo?(jmaher)
We currently allow the GPU process if we are not hardware accelerated. One of the
reasons we might not use hardware acceleration is because the device vendor is
not in the gfxInfo whitelist. In this case, we should be consistent and still
use the GPU process.
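
As a simplified model of the behavior change described above (not the actual Gecko code), the decision could be sketched like this. The names `FeatureDecision` and `decide_features` are hypothetical, and the model ignores every other reason a feature might be disabled, such as safe mode or headless mode.

```python
from dataclasses import dataclass


@dataclass
class FeatureDecision:
    hardware_acceleration: bool
    gpu_process: bool


def decide_features(vendor_whitelisted: bool, fix_applied: bool) -> FeatureDecision:
    """Toy model of the blocklist outcome for one adapter.

    Hardware acceleration still depends on the vendor whitelist.
    Before the fix, an unknown vendor also blocked the GPU process;
    after the fix, the GPU process no longer depends on the whitelist.
    """
    hardware_acceleration = vendor_whitelisted
    gpu_process = True if fix_applied else vendor_whitelisted
    return FeatureDecision(hardware_acceleration, gpu_process)


# The AWSY VM case: the vendor ID is not on the whitelist.
print(decide_features(vendor_whitelisted=False, fix_applied=False))
print(decide_features(vendor_whitelisted=False, fix_applied=True))
```

In this model the VM keeps software rendering either way; only the GPU process outcome changes.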
Attachment #9019176 - Attachment description: Bug 1499554 - Don't blacklist the GPU process for vendors not on the whitelist. r?jrmuizel → Bug 1499554 - Don't blacklist the GPU process for vendors not on the whitelist. r=jrmuizel
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/autoland/rev/b7b09fca2cc5
Don't blacklist the GPU process for vendors not on the whitelist. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/b7b09fca2cc5
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
AWSY perf wins! \0/

== Change summary for alert #17072 (as of Wed, 24 Oct 2018 05:19:07 GMT) ==

Improvements:

  3%  Explicit Memory windows10-64 pgo stylo     324,192,702.96 -> 314,945,241.56
  2%  Explicit Memory windows10-64 opt stylo     321,061,582.98 -> 314,327,783.43

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=17072
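
The rounded percentages in the alert can be checked against the raw numbers with a quick sketch. The function name `improvement_percent` is made up here; the values are copied from the alert summary above.

```python
def improvement_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Explicit Memory values from alert #17072, before -> after.
RESULTS = {
    "windows10-64 pgo stylo": (324_192_702.96, 314_945_241.56),
    "windows10-64 opt stylo": (321_061_582.98, 314_327_783.43),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {improvement_percent(before, after):.2f}% lower")
```

These come out at roughly 2.9% and 2.1%, which matches the 3% and 2% reported in the alert after rounding.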
Well, I guess this is just a baseline update :)