Closed Bug 1571969 Opened Last month Closed 28 days ago

Stop running all the QR tests as virtual-with-gpu

Categories

(Testing :: General, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70

People

(Reporter: bholley, Assigned: jrmuizel)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

I was digging through our cost numbers for automation and realized that our Windows 10 QuantumRender tests all run on VMs with virtualized GPUs. These VMs cost about $1.10 per hour, as opposed to ~$0.32 per hour for regular VMs. So regular VMs cost roughly 70% less.

For non-QR Windows automation, we still run some tests with virtualized GPUs - the mochitest-gpu suite, the webgl tests, reftests, and a handful of other things. This accounts for about 10% of total CPU time running Windows tests, and seems like a good cost trade-off for rendering-heavy suites. But I don't think it's really justifiable to run everything with GPUs given what it costs. We should still be able to test the WebRender code paths by forcing WebRender on and running against WARP.

If we align the QR tests with the non-QR ones, we can gain a 70% cost reduction on 90% of our QR tests. That's huge.
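As a rough check of the arithmetic above, here is a minimal sketch using only the two hourly prices quoted in this bug. The 10% share of work that stays on GPU VMs is an assumption carried over from the non-QR figure, and the function names are made up for illustration.

```python
# Hourly prices quoted in this bug (approximate).
GPU_VM_PER_HOUR = 1.10
REGULAR_VM_PER_HOUR = 0.32

# Assumption: after the change, about 10% of QR CPU time stays on GPU VMs,
# mirroring the ~10% figure quoted for non-QR Windows tests.
GPU_SHARE = 0.10


def per_hour_saving(gpu_price, regular_price):
    """Fractional saving from moving one hour of work off a GPU VM."""
    return 1.0 - regular_price / gpu_price


def blended_saving(gpu_price, regular_price, gpu_share):
    """Fractional saving across all QR work when only part of it moves."""
    before = gpu_price
    after = gpu_share * gpu_price + (1.0 - gpu_share) * regular_price
    return 1.0 - after / before


if __name__ == "__main__":
    print(f"per-hour saving: {per_hour_saving(GPU_VM_PER_HOUR, REGULAR_VM_PER_HOUR):.1%}")
    print(f"blended saving:  {blended_saving(GPU_VM_PER_HOUR, REGULAR_VM_PER_HOUR, GPU_SHARE):.1%}")
```

Under these assumptions the per-hour saving comes out near 71%, and the saving across all QR test time is about 64%, since only the 90% that moves gets cheaper.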

I don't think WebRender is getting enabled on WARP. I'll do some investigation into why.

I had done a try push to fix many of the test differences:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8e60caca25de21e532e55fbce2267a2275635d3

Given :jrmuizel's comment, I will hold off on finishing that.

It looks like all the mochitests pass. There are some WARP-related things that need fixing for browserchrome.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

> It looks like all the mochitests pass. There are some WARP-related things that need fixing for browserchrome.

Handing this bug off to Jeff, since he's probably the right person to get it over the line.

Assignee: bobbyholley → jmuizelaar

Bug 1573616 should fix the bc5 failures.

Depends on: 1573616
Depends on: 1573645
Depends on: 1573681
Depends on: 1573682

This has an xpcshell test failure in the debug build and an "unexpected PWebRenderBridge::Msg_GetSnapshot sync IPC before first paint" failure in bc1.

And a version that keeps web-platform-reftests running on GPUs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4ce54bfb618a55ae21aec48a06eded541a06e98

Depends on: 1574281
Depends on: 1574327

This uses the layers.d3d11.enable-blacklist pref to allow running WebRender on WARP.

Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10a056f52e49
Stop running all the QR tests as virtual-with-gpu. r=jmaher
Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f8a94072fa6
Stop running all the QR tests as virtual-with-gpu. r=jmaher

I didn't see this in time and requeued a landing. It will need to be backed out again.

Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/67be750311a1
Stop running all the QR tests as virtual-with-gpu. r=jmaher
Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a5710687f9b4
Stop running all the QR tests as virtual-with-gpu. r=jmaher
Status: NEW → RESOLVED
Closed: 28 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

Awesome, thanks Jeff!

For posterity, can you explain why we needed to do anything with Talos and Raptor here? I'd think we would be running those on hardware.

:bholley, I know :jrmuizel is on PTO this week. In this line:
https://hg.mozilla.org/mozilla-central/rev/a5710687f9b4#l4.17

we skip harnesses that do not support --setpref; those harnesses will not end up running with WARP and will keep running as they did previously. For Talos and Raptor, we shouldn't do anything, so in this case Raptor is running as before, but Talos has the new pref set. I assume that should be fixed.
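To illustrate the behavior described above, here is a minimal sketch of the skip logic. It is not the code from the landed patch: the function name, the `SETPREF_UNSUPPORTED` set, and the suite names passed in are hypothetical, and only the pref string comes from this bug.

```python
# Pref from this bug that allows WebRender to run on WARP.
WARP_PREF = "layers.d3d11.enable-blacklist=false"

# Hypothetical: harnesses assumed not to accept --setpref.
# Raptor is listed because it is reported in this bug as running as before.
SETPREF_UNSUPPORTED = {"raptor"}


def extra_test_options(suite, is_windows_qr):
    """Return extra harness options for a test task (illustrative only)."""
    if not is_windows_qr:
        # Non-QR tasks are left untouched.
        return []
    if suite in SETPREF_UNSUPPORTED:
        # Harness cannot take --setpref, so it runs exactly as it did before.
        return []
    return [f"--setpref={WARP_PREF}"]


if __name__ == "__main__":
    for suite in ("mochitest", "talos", "raptor"):
        print(suite, extra_test_options(suite, is_windows_qr=True))
```

With a list like this, Talos picks up the pref while Raptor does not, which matches the situation described in this comment.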

To be honest, I am not sure what, if any, regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some), we should see them posted in this bug. Dave, can you see what changes came up in the win10-qr talos results after this landed?

:ahal, can you add in Talos to the list referenced earlier in my comment?

Flags: needinfo?(dave.hunt)
Flags: needinfo?(ahal)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #31)

> To be honest, I am not sure what if any regressions we see with win-qr talos/raptor. If there were changes (and I would expect there to be some) we should see them posted in this bug. Dave, can you see what changes came up with this change to win10-qr talos results?

I'm still confused. Per comment 30, I was under the impression that Talos and Raptor always ran on physical hardware, so I would expect the VM configuration changes in this bug to have no effect on those tests. Am I mistaken somehow?

Looks like talos does support --setpref (and raptor doesn't). So is there still something to fix? I have zero understanding of talos configurations so would like to clarify before submitting patches blindly.

Flags: needinfo?(ahal) → needinfo?(jmaher)

I verified that both raptor and talos for win10 and win10-qr are running on hardware. There is one exception: talos-xperf runs on VMs, as it measures file I/O operations and not timing. For those runs we had to add exceptions to the whitelist of known DLLs accessed.

I guess the question is: does --setpref=layers.d3d11.enable-blacklist=false affect what we run on hardware (i.e. does it use the WARP backend)?

Flags: needinfo?(jmaher)

layers.d3d11.enable-blacklist=false only allows running with WARP. If we're running on hardware with an actual GPU, that GPU should be used.

Flags: needinfo?(jmuizelaar)

oh, then probably nothing to do, thanks for the confirmation Jeff (and stay on PTO!)

Regressions: 1575534
Flags: needinfo?(dave.hunt)
Attachment #9083536 - Attachment is obsolete: true