Closed Bug 1503776 Opened 6 years ago Closed 5 years ago

WebRender is incompatible with Normandy Pref Rollout

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: cpearce, Assigned: cpearce)

References

Details

Attachments

(4 files)

Normandy's Preference Rollout code sets default values on prefs, not user values (see uses of PrefUtils.setPref() in PreferenceRolloutAction.jsm). Default prefs are not persistent; unlike user prefs, changes to default pref values are not stored on disk. Changes to default values are only made on the in-memory copy of the pref's value, and thus don't survive a browser restart. Normandy changes the rolled out pref's early on in the startup of the browser, but not before gfxPlatform::Init() runs. So that means we can't use Normandy pref rollout to gradually rollout WebRender to release, as gfxPlatform::InitWebRenderConfig() won't see the rolled out version of the pref.

The plan is to work around this issue by saving the default value of the gfx.webrender.all.qualified pref on shutdown, and reading the saved value on startup.
Normandy's Preference Rollout code sets default values on prefs, not user
values (see uses of PrefUtils.setPref() in PreferenceRolloutAction.jsm).
Default prefs are not persistent; unlike user prefs, changes to default pref
values are not stored on disk. Changes to default values are only made on the
in-memory copy of the pref's value, and thus don't survive a browser restart.
Normandy changes the rolled out prefs early on in the startup of the browser,
but not before gfxPlatform::Init() runs. So that means gfx can't use Normandy
pref rollout to gradually rollout WebRender to release, as
gfxPlatform::InitWebRenderConfig() won't see the rolled out version of the
pref in time to turn on WebRender.

So to work around this, add a profile-before-change shutdown observer that
saves the default value of the gfx.webrender.all.qualified pref to a new user
pref, gfx.webrender.all.qualified.default. We check that on startup and
emulate the behavior that the pref system would have if that pref default
value had already been set by Normandy.
Add test that when we restart the browser with a default value set on
gfx.webrender.all.qualified, Firefox saves that value and checks respects
the saved value when initializing WebRender.

Depends on D10527
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/838d9917f028
Save default value of WebRender rollout pref to user pref, check on startup. r=mattwoodrow
https://hg.mozilla.org/integration/mozilla-inbound/rev/56f6234b17fc
Add test to verify WR qualified pref default value saved and respected on restart. r=mattwoodrow,r=bryce
Backed out for marionette failures on gfx\tests\marionette\test_pref_rollout_workaround.py

backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/0b96f524756ecc11750bd423ad122c54f72b19cc

push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=56f6234b17fc10e8312c9349fa8260b01885c7cf&searchStr=marionette&group_state=expanded&selectedJob=211135163

failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=211135163&repo=mozilla-inbound&lineNumber=52623

05:28:19     INFO -  1542000499461	Marionette	DEBUG	Closed connection 4
05:28:19     INFO -  [Parent 5324, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:28:19     INFO -  [Child 2804, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:28:19     INFO -  [Child 2804, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:28:19     INFO -  [Parent 5324, Gecko_IOThread] WARNING: file z:/build/build/src/ipc/chromium/src/base/process_util_win.cc, line 188
05:28:19     INFO -  1542000499699	Marionette	DEBUG	Received observer notification xpcom-will-shutdown
05:28:19     INFO -  1542000499700	Marionette	DEBUG	Remote service is inactive
05:28:19     INFO -  [GPU 3396, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:28:19     INFO -  ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
05:28:21     INFO -  self.wr_status()=unavailable,basic
05:28:21    ERROR -  TEST-UNEXPECTED-FAIL | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_non_qualifying_hw | AssertionError: Should start out as WR opt-in
05:28:21     INFO -  Traceback (most recent call last):
05:28:21     INFO -    File "Z:\task_1541999812\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 159, in run
05:28:21     INFO -      testMethod()
05:28:21     INFO -    File "Z:\task_1541999812\build\tests\marionette\tests\gfx\tests\marionette\test_pref_rollout_workaround.py", line 40, in test_wr_rollout_workaround_on_non_qualifying_hw
05:28:21     INFO -      self.assertEqual(status, 'opt-in', 'Should start out as WR opt-in')
05:28:21     INFO -  TEST-INFO took 2630ms
05:28:21     INFO -  TEST-START | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_qualifying_hw
05:28:23     INFO -  self.wr_status()=unavailable,basic
05:28:23    ERROR -  TEST-UNEXPECTED-FAIL | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_qualifying_hw | AssertionError: Should start out as WR opt-in
05:28:23     INFO -  Traceback (most recent call last):
05:28:23     INFO -    File "Z:\task_1541999812\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 159, in run
05:28:23     INFO -      testMethod()
05:28:24     INFO -    File "Z:\task_1541999812\build\tests\marionette\tests\gfx\tests\marionette\test_pref_rollout_workaround.py", line 79, in test_wr_rollout_workaround_on_qualifying_hw
05:28:24     INFO -      self.assertEqual(status, 'opt-in', 'Should start out as WR opt-in')
05:28:24     INFO -  TEST-INFO took 2058ms
05:28:24     INFO -  SUMMARY
05:28:24     INFO -  -------
05:28:24     INFO -  passed: 880
05:28:24     INFO -  failed: 2
05:28:24     INFO -  todo: 36 (skipped: 30)
05:28:24     INFO -  FAILED TESTS
05:28:24     INFO -  -------
05:28:24     INFO -  test_pref_rollout_workaround.py test_pref_rollout_workaround.WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_non_qualifying_hw
05:28:24     INFO -  test_pref_rollout_workaround.py test_pref_rollout_workaround.WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_qualifying_hw
05:28:24     INFO -  SUITE-END | took 387s
05:28:24    ERROR - Return code: 10
Flags: needinfo?(cpearce)
Hi Joel, I think my Marionette test here is failing because it's running on a machine in CI which doesn't have a GPU. Is it possible to configure this test to run on a GPU machine?
Flags: needinfo?(cpearce) → needinfo?(jmaher)
Note, that we don't support running Marionette tests for different components separated. As such we would have two options:

1) Run all Mn/MnH jobs on Windows with machines having a real GPU. I don't think this is wise, given the extra costs.

2) Creating a separate harness on top of Marionette (similar to Firefox ui tests, and Telemetry tests) for GFX tests, and configure them to run on GPU Windows machines.

For more details about the GPU machines I will let Joel speak up.
I have confirmed that running Mn/MnH jobs with virtual-with-gpu virtualization makes the test pass. I don't understand the costs involved with changing all Mn/MnH jobs to virtual-with-gpu.

I suspect that we won't have many gfx Marionette tests; my patch adds the first.
Do you know which part of the pipeline is failing due to lack of GPU? I wonder if we can just use WR_ROLLOUT_HW_QUALIFIED_OVERRIDE pref to override the behaviour (possibly by making it a tri-state pref).

Aside: on relanding, could you please fix line-wrapping at https://hg.mozilla.org/integration/mozilla-inbound/rev/838d9917f028#l1.258? The "WR! WR+" onwards should be on a new line
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Do you know which part of the pipeline is failing due to lack of GPU? I
> wonder if we can just use WR_ROLLOUT_HW_QUALIFIED_OVERRIDE pref to override
> the behaviour (possibly by making it a tri-state pref).

I added logging of nsIGfxInfo features/info/log:
https://treeherder.mozilla.org/logviewer.html#?job_id=211290188&repo=try&lineNumber=52649

This says D3D is blacklisted, and returns a message: BLOCKLIST_FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR, which I think corresponds to https://searchfox.org/mozilla-central/rev/7f7c353e969e61a6a85201cc8ad3c3de12ac30d8/widget/windows/GfxInfo.cpp#1511

It also says WebRender is unavailable because "ANGLE is disabled".


> Aside: on relanding, could you please fix line-wrapping at
> https://hg.mozilla.org/integration/mozilla-inbound/rev/838d9917f028#l1.258?
> The "WR! WR+" onwards should be on a new line

Sure, no problem.
(In reply to Chris Pearce (:cpearce) from comment #10)
> It also says WebRender is unavailable because "ANGLE is disabled".
> 

Could try gfx.webrender.force-angle = false in the test to work around this.
it sounds like you have determined the virtual-with-gpu solves the problem.  Those machines are 10-25 times the price of other machines, so there is a cost associated when you consider all the runs over time.  Having a Mn-gfx suite that has tests which need gpu makes more sense, or finding a way to integrate this into an existing suite that is already run on gpu enabled instances.
Flags: needinfo?(jmaher)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> (In reply to Chris Pearce (:cpearce) from comment #10)
> > It also says WebRender is unavailable because "ANGLE is disabled".
> > 
> 
> Could try gfx.webrender.force-angle = false in the test to work around this.

No luck there:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=81d1c05ca00b3bc873ff95fae2f8d83bafd263e9&selectedJob=211572567
We'd like to create a new Mn-gpu suite that runs on a GPU machine. Joel: what needs to happen to make that happen? Is that just a matter of patching in-tree .yml files or is there more to it?
Flags: needinfo?(jmaher)
(In reply to Chris Pearce (:cpearce) from comment #14)
> We'd like to create a new Mn-gpu suite that runs on a GPU machine. Joel:
> what needs to happen to make that happen? Is that just a matter of patching
> in-tree .yml files or is there more to it?

Note that we never had a really strong need for separate Mn jobs yet, and as such no work happened yet for bug 1313598. There might be substantial build system, mozharness, and taskcluster work necessary.

It might be easier to create a new derived harness (like firefox-ui, telemetry).
Flags: needinfo?(jmaher)
I would add it to:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/marionette.yml

then make sure the new marionette-gpu suite is added test-sets.yml:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml

note: in test-sets.yml there are many instances of marionette- typically one per platform, so you can control where to run this.
In order to reduce the cost of running marionette tests on a virtual machine
with a GPU, add a marionette-gpu job, and run the WebRender rollout test added
in the previous patch in this new job.

Depends on D10528
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fb616c66a1fd
Save default value of WebRender rollout pref to user pref, check on startup. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/bcb35a8cb22b
Add test to verify WR qualified pref default value saved and respected on restart. r=bryce
https://hg.mozilla.org/integration/autoland/rev/5b6f0d586bf7
Add marionette-gpu job. r=jmaher
Backed out for failing new MnG suite in test_pref_rollout_workaround.py:

https://hg.mozilla.org/integration/autoland/rev/8ada4020596a3a39fb9b08b316cfe849efc394eb

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&group_state=expanded&selectedJob=212696805&revision=5b6f0d586bf79f1427d822cd562ad6a8c248b4f2
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=212700978&repo=autoland

20:40:13     INFO -  1542660013260	Marionette	DEBUG	Received observer notification xpcom-will-shutdown
20:40:13     INFO -  1542660013260	Marionette	DEBUG	Remote service is inactive
20:40:14     INFO -  self.wr_status()=unavailable,basic
20:40:14    ERROR -  TEST-UNEXPECTED-FAIL | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_non_qualifying_hw | AssertionError: Should start out as WR opt-in
20:40:14     INFO -  Traceback (most recent call last):
20:40:14     INFO -    File "Z:\task_1542659068\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 159, in run
20:40:14     INFO -      testMethod()
20:40:14     INFO -    File "Z:\task_1542659068\build\tests\marionette\tests\gfx\tests\marionette\test_pref_rollout_workaround.py", line 40, in test_wr_rollout_workaround_on_non_qualifying_hw
20:40:14     INFO -      self.assertEqual(status, 'opt-in', 'Should start out as WR opt-in')
20:40:14     INFO -  TEST-INFO took 2086ms
20:40:14     INFO -  TEST-START | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_qualifying_hw
20:40:16     INFO -  self.wr_status()=unavailable,basic
20:40:16    ERROR -  TEST-UNEXPECTED-FAIL | gfx\tests\marionette\test_pref_rollout_workaround.py WrPrefRolloutWorkAroundTestCase.test_wr_rollout_workaround_on_qualifying_hw | AssertionError: Should start out as WR opt-in
20:40:16     INFO -  Traceback (most recent call last):
20:40:16     INFO -    File "Z:\task_1542659068\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 159, in run
20:40:16     INFO -      testMethod()
20:40:16     INFO -    File "Z:\task_1542659068\build\tests\marionette\tests\gfx\tests\marionette\test_pref_rollout_workaround.py", line 79, in test_wr_rollout_workaround_on_qualifying_hw
20:40:16     INFO -      self.assertEqual(status, 'opt-in', 'Should start out as WR opt-in')
20:40:16     INFO -  TEST-INFO took 1900ms
Flags: needinfo?(cpearce)
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3eb49d9e73bb
Save default value of WebRender rollout pref to user pref, check on startup. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/2d2d1dfd04d4
Add test to verify WR qualified pref default value saved and respected on restart. r=bryce
https://hg.mozilla.org/integration/autoland/rev/596f70b5d5b0
Add marionette-gpu job. r=jmaher
Flags: needinfo?(cpearce)
Some change in this series caused setupapi.dll to load as a dependency earlier
on in startup while loading xul.dll. I don't understand why, but setupapi.dll
is already a dependency of xul.dll so we should be loading it anyway.

Will file a follow up to investigate.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cf79174757b9aa88ec10ac9745a97869fa9914d

Depends on D12241
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d716f894292
Save default value of WebRender rollout pref to user pref, check on startup. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/e9ba7f278364
Add test to verify WR qualified pref default value saved and respected on restart. r=bryce
https://hg.mozilla.org/integration/autoland/rev/a25095165e12
Add marionette-gpu job. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/a5d7f9ff483b
Whitelist setupapi.dll for read during talos tests. r=aklotz
See Also: → 1509421
I filed Bug 1509421 to track down the xperf talos test regression/
Flags: needinfo?(cpearce)
See Also: → 1660057
You need to log in before you can comment on or make changes to this bug.