Closed Bug 1645528 Opened 4 years ago Closed 3 years ago

[wayland] Connect nsRefreshDrivers in content processes with a widget-local vsync source

Categories

(Core :: Widget: Gtk, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox85 --- wontfix
firefox86 --- fixed

People

(Reporter: kennylevinsen, Assigned: rmader)

References

(Blocks 2 open bugs)

Details

(Keywords: perf-alert)

Attachments

(7 files, 2 obsolete files)

Currently, WaylandVsyncSource is only used in the main process, for refresh drivers and composition scheduling. Refresh drivers in content processes fall back to a synthetic 60Hz source.

An IPC mechanism already exists to guide vsync to content processes, but it assumes a global vsync source is available.

Instead, a separate IPC mechanism should be used to connect the nsRefreshDriver with its associated root widget. If we move nsRefreshDrivers between native widgets, we also need to make sure to relocate them.

Blocks: vsync, wayland
Type: defect → enhancement
Blocks: 1640779
Blocks: 1629140
See Also: 1629140

Small update: my local patch does appear to work quite well with browser.tabs.remote.autostart:false :/ But getting IPDL to work right is obviously the hardest part here.

Note to myself: make sure to set privacy.reduceTimerPrecision:false while testing.

Progress! \o/
Apparently there was not so much missing from Kennys previous work - and what's even more striking, the missing parts look like they were even already planned more or less by other devs.

Still only works with browser.tabs.remote.autostart:false, but here are some initial screenshots what to expect (with privacy.reduceTimerPrecision:false set and with a slightly enhanced version of bug 1670444):

Attached image on a 60Hz display
Attached image WebGL behaves as well
See Also: → 1601966
See Also: → 1601962

The latest version finally allows to run with e10s enabled - phew! Still just a prototype - it's not clear if this is really the approach we want to go. It does, however, look reasonable to me, especially as it would allow platforms to migrate at their own speed, in contrast to changing the existing logic.

Depends on: 1674758

Here's a try push with an optimized build for myself and anybody interested to play with. I do not own any >60Hz displays, so feedback on that is especially welcome.

https://treeherder.mozilla.org/jobs?repo=try&revision=5243f4d1114d98c4673fd9e38136e8facfe0eecd

To run:
MOZ_ENABLE_WAYLAND=1 mozregression --repo try --launch 5243f4d1114d98c4673fd9e38136e8facfe0eecd --pref gfx.webrender.all:true widget.wayland_vsync.enabled:true privacy.reduceTimerPrecision:false -a https://vsynctester.com

Update Nov 10.:
https://treeherder.mozilla.org/jobs?repo=try&revision=bcd7237c2ab2581ae50a883adc280c14f2f73cc6

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED
Attached image Testufo 60 and 30 Hz

Sorry for the noise, I'm just enjoying the optimized build right now. So here's two windows of the same process running at different refresh rates. Getting Mutter to support different refresh rates on Wayland was quite a journey last cycle (3.38) - being able to make use of it in FF would make things perfect :)

Depends on: 1675680

Some notes concerning threads:

  1. Most vsyncsources use a dedicated vsync thread. The Wayland one runs on the main thread though as it turned out to be faster/better (bug 1675680 comment 1)
  2. VsyncParent runs on the background thread while VsyncChild runs in the main thread
  3. The IPC done by VsyncParent is heavily dominated by NotifyVsync(), which is called on the thread the vsync source runs on
  4. Thus in VsyncParent we currently go vsync-thread->background-thread->IPC
  5. In the proposed patch it's vsync-thread->main-thread->IPC - in the Wayland case this means however: main-thread->IPC, which is alredy optimal
  6. Arguably vsync-thread->main-thread->IPC is worse than vsync-thread->background-thread->IPC for other vsync sources

So when thinking about potentially moving TabVsyncParent off the main thread, it appears to me that the only sensible solution would be to move it into the vsync thread (which, again, in case of Wayland is still the main thread). The background thread, however, does not appear to have any advantages to me.

P.S.: AFAIK there are listeners of NotifyVsync() on the main thread anyway, thus dispatching to the background thread probably does not really help in either case. If that is true, the proposed patch would already not regress.

Markus, do have any insights or opinions about the comment above? Should I try to move TabVsyncParent to the vsync thread?

Flags: needinfo?(mstange.moz)

Very interesting! It does indeed sound like the current approach is optimal for Linux, then. However, it still doesn't seem great to enforce the same limitation on other platforms.

(In reply to Robert Mader [:rmader] from comment #11)

it appears to me that the only sensible solution would be to move it into the vsync thread

This would require a new top-level protocol, which comes with extra code complexity. It may or may not be worth it. Furthermore, some OS APIs may not give you enough control over the vsync thread to properly start up and tear down the protocol endpoint. For example, on macOS, the vsync callback itself really is the only code we can run on the vsync thread. So PBackground has a simplicity advantage.

P.S.: AFAIK there are listeners of NotifyVsync() on the main thread anyway

Yes, the refresh driver of the window document (the browser UI) runs on the main thread of the parent process. So any paints in the UI, and things like scrolling sideways in the tab bar, happen from the vsync callbacks that occur on the parent process main thread.

thus dispatching to the background thread probably does not really help in either case.

It still helps for animations that run independently of the parent process main thread: Compositor CSS animations (for example tab throbbers), APZ scroll animations, and animations that run in the content process.

I'm trying to think of alternatives...

Flags: needinfo?(mstange.moz)

New information: It looks like on Windows, when the GPU process is used (so most of the time), vsync notifications to the compositor also have to roundtrip through the parent-process main thread. Nevermind, I was wrong about this. PVsyncBridge is a top-level protocol whose parent-side endpoint lives on the vsync thread.

One idea would be to make PVSync a sub-protocol of PCompositorBridge - the protocol that connects the compositor thread to the various main threads (both parent and content). At the moment, vsync notifications are sent to the compositor thread already. So, for the vsync notifications that are sent to the various main threads, we could send them from the compositor thread rather than via the background thread. In terms of thread hops, this would be:

  • Equivalent to what we do today on Android and macOS. Instead of vsync thread -> background thread -> content, we'd have vsync thread -> compositor thread -> content.
  • Similar on Windows with a GPU process, but with two instead of one process hops, since the Windows vsync source lives in the parent process. So rather than vsync thread (parent) -> background thread (parent) -> content, we would now go vsync thread (parent) -> compositor thread (gpu) -> content. We might be able to move the vsync source into the GPU process, which would eliminate the extra process hop.
  • Strictly worse on Linux compared to the proposed patch in this bug (which makes PVsync a sub-protocol of PBrowser).
  • Equivalent in a potential future world where macOS has a GPU process, because on macOS we can definitely put the vsync source into the GPU process.

But, other than the extra thread hops, it might be more efficient: PCompositorBridge already has infrastructure for connecting windows to browsers, or rather "compositors" to "LayersId", without creating one protocol instantiation per browser:

[...] The
 *   CompositorBridgeChild is a singleton in the content process. Note that
 *   a given content process may have multiple browser instances (represented
 *   by BrowserChild instances), that are attached to different windows, and therefore
 *   rendered by different compositors. This means that when a browser instance
 *   sends messages via its CompositorBridgeChild, the corresponding
 *   ContentCompositorBridgeParent has to use the layers id in the message
 *   to find the correct compositor or CompositorBridgeParent to pass the message
 *   on to.

There's another advantage of sending vsync via the compositor thread, but it's rather orthogonal to this bug. I'll attempt to describe it anyway:
If we send vsync to content via the compositor, we could have some nice ordering guarantees. We could give the compositor a shot at handling vsync before the we forward vsync to the content process. And we could use this to sample compositor-side APZ scroll animations and forward the updated scroll position value to content processes, and make sure that this value arrives before the vsync notification arrives. As a result, the upcoming content paint would always take the new position into account.
But, as I said, that's a rather orthogonal issue. We could start out with having PVSync a sub-protocol of PBrowser and I could still try to move it to PCompositorBridge later.

And there's another, Android-specific, reason to make vsync go via (a sub-protocol of) PCompositorBridge, which I just filed as bug 1675899.

In my original comment 14 I had some incorrect information about vsync on Windows with GPU process. I have corrected the comment.

(In reply to Robert Mader [:rmader] from comment #10)

Created attachment 9185335 [details]
Testufo 60 and 30 Hz

Sorry for the noise, I'm just enjoying the optimized build right now. So here's two windows of the same process running at different refresh rates. Getting Mutter to support different refresh rates on Wayland was quite a journey last cycle (3.38) - being able to make use of it in FF would make things perfect :)

SHIP IT! :D

(In reply to Markus Stange [:mstange] from comment #14)

But, other than the extra thread hops, it might be more efficient: PCompositorBridge already has infrastructure for connecting windows to browsers, or rather "compositors" to "LayersId", without creating one protocol instantiation per browser:

Ah, in my initial attempt at this (before giving up and later handing over to rmader), I was considering doing something along these lines, making the Vsync IPC message be NotifyVsync(int widgetID) with some central dispatch.

I ditched the idea as getting something simple up and running was more important at the beginning. I also started wondering if it would really do anything: The additional ID duplicates protocol-level instance IDs, the IPC layer is presumably efficient, and unless we had a way to reduce the message count, I don't think there's much to win...

But, plenty of room for experimentation with optimization later on.

(In reply to Kenny Levinsen :kennylevinsen from comment #17)

I also started wondering if it would really do anything: The additional ID duplicates protocol-level instance IDs, the IPC layer is presumably efficient, and unless we had a way to reduce the message count, I don't think there's much to win...

Yeah, I agree with that.

Attachment #9180927 - Attachment description: WIP: Bug 1645528 - Connect nsRefreshDrivers in content processes with a widget-local vsync source → Bug 1645528 - Connect nsRefreshDrivers in content processes with a widget-local vsync source

I'm definitely concerned that moving the vsync notifications to run via the parent-process main thread will be a regression for Windows. It feels like we have quite different requirements per-platform, and trying to have a single solution might not be the best answer.

I really like the idea of having the vsync thread run in the compositor process, and having notifications run across PCompositorBridge, when on Mac/Windows.

Could you make PVsync have two manager declarations, PBrowser and PBackground?

I think we could run PVsync over PBrowser when running Wayland, but keep the current setup as-is for other platforms (and then tackle moving it to PCompositorBridge as a separate project, I don't want to block you on that).

(In reply to Matt Woodrow (:mattwoodrow) from comment #19)

I'm definitely concerned that moving the vsync notifications to run via the parent-process main thread will be a regression for Windows. It feels like we have quite different requirements per-platform, and trying to have a single solution might not be the best answer.

Fully agree on this, I definitely don't want to regress things here.

I really like the idea of having the vsync thread run in the compositor process, and having notifications run across PCompositorBridge, when on Mac/Windows.

Could you make PVsync have two manager declarations, PBrowser and PBackground?

I think we could run PVsync over PBrowser when running Wayland, but keep the current setup as-is for other platforms (and then tackle moving it to PCompositorBridge as a separate project, I don't want to block you on that).

Sounds good to me (if that's possible) - Markus, would that be fine for you?

Flags: needinfo?(mstange.moz)

Yes, that works for me!

Flags: needinfo?(mstange.moz)

Turns out conditionally launching PVsync from from PBackground is quite easy. The last version does so controlled by an env var (MOZ_BACKGROUND_VSYNC), but I'll try to come up with something better. I suppose we won't want to land this before 85 anyway.

Martin, do we already have a way to check if we run on Wayland in the content process? I suppose we don't want to call into GTK/GDK as we do on the parent process, right?

Flags: needinfo?(stransky)

Thanks Martin!

Markus, Matt: I've updated the patch to only use PBrowser on Wayland for now, staying on PBackground on all other backends, including the GLX vsyncsource. I think this is now ready to review - targeting early 85 if possible :)

What's really encouraging is that I get by far the best results on vsynctester.com when using this with the Wayland vsyncsource, compared to software 60Hz timer or GLX. Not sure if that's really a good benchmark, but it looks really cool!

https://treeherder.mozilla.org/jobs?repo=try&revision=f1944b770f51f35d29a699566b99e608e57637a3

Remove dead code that was left in D93173 so file histories would
get preserved.

Depends on D93173

New try build with hopefully some relevant tests (the auto feature doesn't appear to work for me atm): https://treeherder.mozilla.org/jobs?repo=try&revision=bc8194cbf27ee61a8bec416ecb38a7d48e8d2c14

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1f42c376724d
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio
Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fbc13c3ea551
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio
https://hg.mozilla.org/integration/autoland/rev/df3577321bfe
Code cleanups r=mattwoodrow

The second push was not intended, it was triggered by D96934 which was left out previously. Anyhow, the leak should be plugged in D93173

Flags: needinfo?(robert.mader)

Please also check failures on on browser_webconsole_scroll.js -> https://treeherder.mozilla.org/logviewer?job_id=322725131&repo=autoland&lineNumber=4385

(In reply to Narcis Beleuzu [:NarcisB] from comment #34)

Please also check failures on on browser_webconsole_scroll.js -> https://treeherder.mozilla.org/logviewer?job_id=322725131&repo=autoland&lineNumber=4385

After another rebase the test doesn't seem to fail any more, so it was probably unrelated: https://treeherder.mozilla.org/jobs?repo=try&revision=28d8c4ae9fe1eeaf1ac5a98ea11dfc7cb44459f7&selectedTaskRun=DsbzISWkSXGO1DGbVMi0AQ.0

Also pushed a small update for one case where things could regress potentially - but it should barely have any effect (affected calls are https://searchfox.org/mozilla-central/source/xpcom/threads/InputEventStatistics.cpp#54-59 and https://searchfox.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#1824-1829 which both only seem to be small timing optimizations, not something affecting behaviour.

Flags: needinfo?(robert.mader)
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a9dce735340
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio

Oh, ASAP mode :) fixed, thanks!

Flags: needinfo?(robert.mader)

(In reply to Noemi Erli[:noemi_erli] from comment #39)

Robert, seems that your change caused mass failures:
...
FYI for the above fix:)

Hehe, yes, I suppose breaking the ASAP mode (layout.frame_rate=0) is...bad :)

https://treeherder.mozilla.org/jobs?repo=try&revision=a8aa412717f596170fc252f75c7edecfbaa2cd9f

Flags: needinfo?(robert.mader)

Hm, apparently there are still occasional hangs during shutdown somewhere - will investigate :/

I think I can can now reproduce the problem locally. BlockingResourceBase::CheckAcquire() occasionally throws the following error:

[Child 25248, Main Thread] WARNING: Potential deadlock detected:
Cyclical dependency starts at
Mutex : mozilla.ipc.MessageChannel.mMonitor
Next dependency:
Mutex : ThreadEventQueue (currently acquired)

Next dependency:
Mutex : TaskController::mGraphMutex (currently acquired)
Cycle completed at
Mutex : mozilla.ipc.MessageChannel.mMonitor
Deadlock may happen for some other execution

: file /home/robert/firefox/gecko-dev/xpcom/threads/BlockingResourceBase.cpp:250
###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : mozilla.ipc.MessageChannel.mMonitor calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : ThreadEventQueue (currently acquired)
 calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : TaskController::mGraphMutex (currently acquired)
 calling context
  [stack trace unavailable]

=== Cycle completed at
--- Mutex : mozilla.ipc.MessageChannel.mMonitor calling context
  [stack trace unavailable]

Deadlock may happen for some other execution

Lets see....

Ok, looks like I found the issue, it has to do with the fact that we now regularly update the refresh rate. Disabling that seems to fix the issues: https://treeherder.mozilla.org/jobs?repo=try&revision=f502a0cf1ee8bffc924fe387b1733b7087c6dbf7

Now I'd really like to preserve that feature - learning more about IPC is fun :|

Removed the offending call RequestVsyncRate() completely, making VsyncParent responsible for sending updates whenever seen fit.

Here's a try push with a subset of the failing test, with all platforms in the mix: https://treeherder.mozilla.org/jobs?repo=try&revision=46377ef2742c649d6c8b1417545a958a59c622f7

Looks all green now. Did another smaller try with the very latest changes, but those shouldn't have any effect any more: https://treeherder.mozilla.org/jobs?repo=try&revision=5995fcaec76af313a60fd973d4baf7a3fe32b37f

Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bc8953d9bed
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio
https://hg.mozilla.org/integration/autoland/rev/08cd8d747c33
Code cleanups r=mattwoodrow

These lines erronerously slipped in in the last version of the patch,
breaking tab reparenting to another window. Remove them again.

This only affects Wayland, so no test will trigger here.

Keywords: leave-open
Attachment #9190069 - Attachment description: Bug 1645528 - Fix tab reparenting on Wayland, r=mattwoodrow → Bug 1645528 - Do not instantly unref widget vsyncsource, r=mattwoodrow

Backed out 2 changesets (bug 1645528) for scroll related mochitest failures.

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&fromchange=05ee4f04dd02c3f5adcb4de614a12ffe139a8e71&searchStr=mochitest&selectedTaskRun=c47evl_7T-OAxCQPjDfsug.0&tochange=f7c2ac8767d534c32b487fa73ff2922f1525a3ac

Backout link: https://hg.mozilla.org/integration/autoland/rev/f7c2ac8767d534c32b487fa73ff2922f1525a3ac

Failures logs: https://treeherder.mozilla.org/logviewer?job_id=322913474&repo=autoland&lineNumber=4409

...
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - Matched a message with text: "Uncaught Error: myThrownErrorObject", all messages received.
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom - 
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - Wait until the stacktrace is rendered and check the console is scrolled
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom - 
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - Add a console.trace message to check that the console stays scrolled to bottom
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - Matched a message with text: "trace in C", all messages received.
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | There is a vertical overflow - 
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom - 
[task 2020-11-28T00:33:41.526Z] 00:33:41     INFO - Wait until the stacktrace is rendered
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom - 
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Check that repeated messages don't prevent scroll to bottom
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Matched a message with text: "repeat", all messages received.
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Buffered messages logged at 00:33:34
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is still scrolled to the bottom when the repeat badge is added - 
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Check that adding a message after a repeated message scrolls to bottom
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Matched a message with text: "after repeat", all messages received.
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom after a repeated message - 
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Check that switching between editor and inline mode keep the output scrolled to bottom
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The console is scrolled to the bottom after switching to editor mode - 
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Buffered messages finished
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - TEST-UNEXPECTED-FAIL | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | Uncaught exception - at chrome://mochitests/content/browser/devtools/client/shared/test/shared-head.js:700 - Error: Failed waitFor(): Output does not scroll to the bottom after switching back to inline mode
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Failed condition: () => isScrolledToBottom(outputContainer)
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - 
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Stack trace:
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - waitFor@chrome://mochitests/content/browser/devtools/client/shared/test/shared-head.js:700:11
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - async*@chrome://mochitests/content/browser/devtools/client/webconsole/test/browser/browser_webconsole_scroll.js:255:9
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Async*Tester_execTest/<@chrome://mochikit/content/browser-test.js:1069:34
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Tester_execTest@chrome://mochikit/content/browser-test.js:1109:11
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - nextTest/<@chrome://mochikit/content/browser-test.js:932:14
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:1037:23
[task 2020-11-28T00:33:41.527Z] 00:33:41     INFO - Leaving test bound 
[task 2020-11-28T00:33:41.609Z] 00:33:41     INFO - Removing tab.
[task 2020-11-28T00:33:41.611Z] 00:33:41     INFO - Waiting for event: 'TabClose' on [object XULElement].
[task 2020-11-28T00:33:41.631Z] 00:33:41     INFO - Got event: 'TabClose' on [object XULElement].
[task 2020-11-28T00:33:41.632Z] 00:33:41     INFO - GECKO(2353) | console.warn: "IGNORED REDUX ACTION:" "AUTOCOMPLETE_CLEAR"
[task 2020-11-28T00:33:41.653Z] 00:33:41     INFO - Tab removed and finished closing
[task 2020-11-28T00:33:41.697Z] 00:33:41     INFO - TEST-PASS | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | The main process DevToolsServer has no pending connection when the test ends - 
[task 2020-11-28T00:33:41.700Z] 00:33:41     INFO - GECKO(2353) | MEMORY STAT | vsize 20976808MB | residentFast 2815MB
[task 2020-11-28T00:33:41.700Z] 00:33:41     INFO - TEST-OK | devtools/client/webconsole/test/browser/browser_webconsole_scroll.js | took 13004ms

https://treeherder.mozilla.org/logviewer?job_id=322915439&repo=autoland&lineNumber=114855

[task 2020-11-28T01:20:51.184Z] 01:20:51     INFO - TEST-START | dom/animation/test/chrome/test_animation_performance_warning.html
[task 2020-11-28T01:20:51.546Z] 01:20:51     INFO - TEST-INFO | started process screentopng
[task 2020-11-28T01:20:51.763Z] 01:20:51     INFO - TEST-INFO | screentopng: exit 0
[task 2020-11-28T01:20:51.763Z] 01:20:51     INFO - Buffered messages logged at 01:20:51
[task 2020-11-28T01:20:51.764Z] 01:20:51     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 7 passes or known failures.
[task 2020-11-28T01:20:51.765Z] 01:20:51     INFO - Buffered messages finished
[task 2020-11-28T01:20:51.765Z] 01:20:51     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | An animation has: translate - An animation has: translate: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.244Z] 01:20:52     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 17 passes or known failures.
[task 2020-11-28T01:20:52.244Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.245Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | transform with backface-visibility:hidden - transform with backface-visibility:hidden: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.293Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.293Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | translate with backface-visibility:hidden - translate with backface-visibility:hidden: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.409Z] 01:20:52     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 1 passes or known failures.
[task 2020-11-28T01:20:52.410Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.410Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | opacity and transform-like properties with backface-visibility:hidden - opacity and transform-like properties with backface-visibility:hidden: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.457Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.458Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | moz-element referencing a transform - moz-element referencing a transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.506Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.506Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | moz-element referencing a translate - moz-element referencing a translate: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.553Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.554Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | moz-element referencing a translate and transform - moz-element referencing a translate and transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.669Z] 01:20:52     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 1 passes or known failures.
[task 2020-11-28T01:20:52.670Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.670Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple animations: opacity and transform-like properties with backface-visibility:hidden - Multiple animations: opacity and transform-like properties with backface-visibility:hidden: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.712Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.712Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple animations with geometric property: transform and opacity with geometric keyframes - Multiple animations with geometric property: transform and opacity with geometric keyframes: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.760Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.760Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple animations with geometric property: opacity and transform with geometric keyframes - Multiple animations with geometric property: opacity and transform with geometric keyframes: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.816Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.816Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple animations with geometric property: opacity and translate with geometric keyframes - Multiple animations with geometric property: opacity and translate with geometric keyframes: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.856Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.856Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple async animations and geometric animation: transform - Multiple async animations and geometric animation: transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:52.892Z] 01:20:52     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:52.892Z] 01:20:52     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple async animations and geometric animation: translate - Multiple async animations and geometric animation: translate: assert_true: runningOnCompositor property should be true on translate expected true got false
[task 2020-11-28T01:20:53.009Z] 01:20:53     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 1 passes or known failures.
[task 2020-11-28T01:20:53.009Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.010Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | Multiple async animations and geometric animation: opacity, transform, and translate - Multiple async animations and geometric animation: opacity, transform, and translate: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.670Z] 01:20:53     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 16 passes or known failures.
[task 2020-11-28T01:20:53.671Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.671Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | transform on too big element - dimensions - transform on too big element - dimensions: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.712Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.712Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | translate on too big element - area - translate on too big element - area: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.760Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.761Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | translate on too big element - dimensions - translate on too big element - dimensions: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.816Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.816Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | transform of nsIFrame with SVG transform - transform of nsIFrame with SVG transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.857Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.858Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | translate of nsIFrame with SVG transform - translate of nsIFrame with SVG transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.913Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.913Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | rotate of nsIFrame with SVG transform - rotate of nsIFrame with SVG transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:53.954Z] 01:20:53     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:53.954Z] 01:20:53     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | scale of nsIFrame with SVG transform - scale of nsIFrame with SVG transform: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:54.022Z] 01:20:54     INFO - Not taking screenshot here: see the one that was previously logged
[task 2020-11-28T01:20:54.022Z] 01:20:54     INFO - TEST-UNEXPECTED-FAIL | dom/animation/test/chrome/test_animation_performance_warning.html | The animations of transform-like properties are not running on the compositor because any of the properties has important rules - The animations of transform-like properties are not running on the compositor because any of the properties has important rules: assert_equals: runningOnCompositor property should match expected false but got true
[task 2020-11-28T01:20:54.084Z] 01:20:54     INFO - TEST-PASS | dom/animation/test/chrome/test_animation_performance_warning.html | Bug 1196114 - Test metadata related to which animation properties are running on the compositor - Bug 1196114 - Test metadata related to which animation properties are running on the compositor: Elided 3 passes or known failures.
[task 2020-11-28T01:20:54.085Z] 01:20:54     INFO - GECKO(4481) | MEMORY STAT | vsize 2791MB | residentFast 307MB | heapAllocated 112MB
[task 2020-11-28T01:20:54.093Z] 01:20:54     INFO - TEST-OK | dom/animation/test/chrome/test_animation_performance_warning.html | took 2914ms
Flags: needinfo?(robert.mader)
Attachment #9190069 - Attachment is obsolete: true
Attachment #9187533 - Attachment description: Bug 1645528 - Code cleanups → Bug 1645528 - Code cleanups r=mattwoodrow

Thanks, will look into it. That's the corresponding stack:

 0:15.47 GECKO(196265) Assertion failure: HasEffectiveAnimationOfProperty(property.mProperty, *effectSet) (There should be an effective animation of the property while it is marked as being run on the compositor), at /home/robert/firefox/gecko-dev/dom/animation/KeyframeEffect.cpp:1495
Initializing stack-fixing for the first stack frame, this may take a while...
 0:25.40 GECKO(196265) #01: mozilla::dom::KeyframeEffect::NotifyAnimationTimingUpdated(mozilla::PostRestyleMode) [/home/robert/firefox/gecko-dev/dom/animation/KeyframeEffect.cpp:193]
 0:25.40 GECKO(196265) #02: mozilla::dom::Animation::UpdateTiming(mozilla::dom::Animation::SeekFlag, mozilla::dom::Animation::SyncNotifyFlag) [/home/robert/firefox/gecko-dev/dom/animation/Animation.cpp:1561]
 0:25.40 GECKO(196265) #03: mozilla::dom::Animation::Tick() [/home/robert/firefox/gecko-dev/dom/animation/Animation.cpp:859]
 0:25.40 GECKO(196265) #04: mozilla::dom::DocumentTimeline::MostRecentRefreshTimeUpdated() [/home/robert/firefox/gecko-dev/dom/animation/DocumentTimeline.cpp:193]
 0:25.40 GECKO(196265) #05: nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) [/home/robert/firefox/gecko-dev/layout/base/nsRefreshDriver.cpp:2088]
 0:25.40 GECKO(196265) #06: mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) [/home/robert/firefox/gecko-dev/layout/base/nsRefreshDriver.cpp:329]
 0:25.40 GECKO(196265) #07: mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) [/home/robert/firefox/gecko-dev/layout/base/nsRefreshDriver.cpp:350]
 0:25.40 GECKO(196265) #08: mozilla::RefreshDriverTimer::TimerTick(nsITimer*, void*) [/home/robert/firefox/gecko-dev/layout/base/nsRefreshDriver.cpp:371]
 0:25.40 GECKO(196265) #09: nsTimerImpl::Fire(int) [/home/robert/firefox/gecko-dev/xpcom/threads/nsTimerImpl.cpp:562]
 0:25.41 GECKO(196265) #10: nsTimerEvent::Run() [/home/robert/firefox/gecko-dev/xpcom/threads/TimerThread.cpp:251]
 0:25.41 GECKO(196265) #11: mozilla::RunnableTask::Run() [/home/robert/firefox/gecko-dev/xpcom/threads/TaskController.cpp:453]
 0:25.41 GECKO(196265) #12: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [/home/robert/firefox/gecko-dev/xpcom/threads/TaskController.cpp:722]
 0:25.41 GECKO(196265) #13: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [/home/robert/firefox/gecko-dev/xpcom/threads/TaskController.cpp:581]
 0:25.41 GECKO(196265) #14: mozilla::TaskController::ProcessPendingMTTask(bool) [/home/robert/firefox/gecko-dev/xpcom/threads/TaskController.cpp:375]
 0:25.41 GECKO(196265) #15: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_4>::Run() [/home/robert/firefox/gecko-dev/xpcom/threads/nsThreadUtils.h:486]
 0:25.41 GECKO(196265) #16: nsThread::ProcessNextEvent(bool, bool*) [/home/robert/firefox/gecko-dev/xpcom/threads/nsThread.cpp:1199]
 0:25.41 GECKO(196265) #17: NS_ProcessNextEvent(nsIThread*, bool) [/home/robert/firefox/gecko-dev/xpcom/threads/nsThreadUtils.cpp:513]
 0:25.41 GECKO(196265) #18: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [/home/robert/firefox/gecko-dev/ipc/glue/MessagePump.cpp:109]
 0:25.41 GECKO(196265) #19: MessageLoop::RunInternal() [/home/robert/firefox/gecko-dev/ipc/chromium/src/base/message_loop.cc:334]
 0:25.41 GECKO(196265) #20: MessageLoop::Run() [/home/robert/firefox/gecko-dev/ipc/chromium/src/base/message_loop.cc:310]
 0:25.41 GECKO(196265) #21: nsBaseAppShell::Run() [/home/robert/firefox/gecko-dev/widget/nsBaseAppShell.cpp:139]
 0:25.41 GECKO(196265) #22: nsAppStartup::Run() [/home/robert/firefox/gecko-dev/toolkit/components/startup/nsAppStartup.cpp:272]
 0:25.41 GECKO(196265) #23: XREMain::XRE_mainRun() [/home/robert/firefox/gecko-dev/toolkit/xre/nsAppRunner.cpp:5105]
 0:25.41 GECKO(196265) #24: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [/home/robert/firefox/gecko-dev/toolkit/xre/nsAppRunner.cpp:5297]
 0:25.41 GECKO(196265) #25: XRE_main(int, char**, mozilla::BootstrapConfig const&) [/home/robert/firefox/gecko-dev/toolkit/xre/nsAppRunner.cpp:5353]
 0:25.45 GECKO(196265) #26: main [/home/robert/firefox/gecko-dev/browser/app/nsBrowserApp.cpp:337]
fix-stacks: error: failed to read symbols file `/home/robert/firefox/gecko-dev/obj-x86_64-pc-linux-gnu/dist/crashreporter-symbols/libc.so.6/734106315D6CBE5258F3E9EFA35903480/libc.so.6.sym` for `/lib64/libc.so.6`
fix-stacks: note: this is expected and harmless for system libraries on debug automation runs
fix-stacks: No such file or directory (os error 2)
 0:25.45 GECKO(196265) #27: __libc_start_main [/lib64/libc.so.6 + 0x281e2]
 0:25.45 GECKO(196265) #28: _start [/home/robert/firefox/gecko-dev/obj-x86_64-pc-linux-gnu/dist/bin/firefox + 0x3e67e]

Edit: never mind, the test runs in optimized mode, so the assert can be ignored. Would probably still be good to make sure the tests run properly in a debug build.

Flags: needinfo?(robert.mader)

Ok, I think I got it. The patch introduced a race, resulting in these test becoming flaky - calls like await waitForFrame(); would randomly not wait long enough.

Fixed that by using sRegularRateTimer again where appropriate, most importantly the non-e10s one.

https://treeherder.mozilla.org/jobs?repo=try&revision=345f8d44db140fcc9830ee88073fcfbc328cff4e

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2fbe8c11cecb
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio
Keywords: leave-open

Thanks, will look into it.

A quick test suggests that the test works fine locally (./mach wpt test_list webdriver/tests/get_window_rect/user_prompts.py), so I assume, given that this is e10s, what we see here is more racy/timing related stuff :(

Flags: needinfo?(robert.mader)

Note that setting the window rect and especially the x/y position happens here:
https://searchfox.org/mozilla-central/rev/66547980e8e8ca583473c74f207cae5bac1ed541/testing/marionette/driver.js#1578-1581

So I assume the failures here are closely related to bug 1600389.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #57)

Actually I meant bug 1602701.

Oh hehe, just wrote you in parallel (bug 1679705 comment 6). So as the test seems to pass in my try pushes (and locally), would it be reasonable to consider the test flaky? I don't see anything actionable about the patch atm :/

Henrik, I assume you triggered the extra runs in bug 1679705 comment 6 ? In that case we now have 9 consecutive runs without fail - enough to land this again? :)

Sorry for being so pushy, I just really want to see this landed.

Flags: needinfo?(hskupin)

ASAN might not be that flaky and seems to pass most of the time. But as I just checked the Wd2, and Wd3 chunks are actually the failing ones:

https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&searchStr=wdspec&tochange=241fe575507a1cf8fd73da87ee34096f929158dd&fromchange=2fbe8c11cecbb48cd62e4617d68c4850bfcf051d

I'm ok with having the Linux failures marked as expected PASS, FAIL in the manifest files with a reference to bug 1602701.

Flags: needinfo?(hskupin)

These depend on bug 1602701 and become way more common with D93173.
The list may not be comprehensive though.

Attachment #9180927 - Attachment description: Bug 1645528 - Connect nsRefreshDrivers in content processes with a widget-local vsync source → Bug 1645528 - Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio

Comment on attachment 9190419 [details]
Bug 1645528 - Mark some test as flaky on Linux, r?whimboo

Revision D98254 was moved to bug 1602701. Setting attachment 9190419 [details] to obsolete.

Attachment #9190419 - Attachment is obsolete: true
Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e4dc75be2d56
Connect nsRefreshDrivers in content processes with a widget-local vsync source r=mattwoodrow,emilio
https://hg.mozilla.org/integration/autoland/rev/e4c8bc7feaa6
Code cleanups r=mattwoodrow

Tested in Sway and Gnome Wayland, vsync is working nicely now on 75hz monitor without having to set layout_framerate.

❯ MOZ_ENABLE_WAYLAND=1 mozregression --repo try --launch e4c8bc7feaa60470c026236a7d89881a74b29d34 --pref gfx.webrender.all:true widget.wayland_vsync.enabled:true privacy.reduceTimerPrecision:false -a https://vsynctester.com

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

(In reply to Pulsebot from comment #52)

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2fbe8c11cecb
Connect nsRefreshDrivers in content processes with a widget-local vsync
source r=mattwoodrow,emilio

== Change summary for alert #27930 (as of Thu, 03 Dec 2020 06:56:56 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
32% about_preferences_basic windows10-64-shippable-qr e10s stylo webrender-sw 121.98 -> 161.58

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27930

(In reply to Narcis Beleuzu [:NarcisB] from comment #54)

Backed out for wdspec failures on user_prompts.py

Backout link: https://hg.mozilla.org/integration/autoland/rev/241fe575507a1cf8fd73da87ee34096f929158dd
Log link: https://treeherder.mozilla.org/logviewer?job_id=323003983&repo=autoland&lineNumber=40312

== Change summary for alert #27932 (as of Thu, 03 Dec 2020 07:05:14 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
28% about_preferences_basic windows10-64-shippable-qr e10s stylo webrender-sw 152.96 -> 110.16
23% about_preferences_basic windows10-64-shippable-qr e10s stylo webrender-sw 142.03 -> 109.07

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27932

Regressions: 1610710
Regressions: 1673139
Regressions: 1681030

Please back out these changes - there are some pageload regressions that I still haven't understood and which may not be trivial to fix.

(See comment above)

Flags: needinfo?(sheriffs)
Flags: needinfo?(jcristau)

Emilio: Any ETA for the backout (heard there were conflicts)?

Flags: needinfo?(sheriffs)

Sorry for the lag.

Flags: needinfo?(sheriffs)

Some more notes about what needs fixing here:
I previously misunderstood how the startup timer in the content process work(s|ed). IIUC now, it's something like this:

  • When the first timers start, the vsync child is not yet available. Thus a process-wide software startup timer is created (via a static variable).
  • At some point, some driver will start up with the vsync child available. It will create vsync timer from that, move over all refresh drivers from the startup timer and replace the static variable.

With the patch from this bug, it's something like this:

  • The vsync child is available earlier / the vsync drivers get initialized later
  • Thus the most "visible" refresh drivers like those used for RAF will start up with the vsync child around, making sites like vsynctester.com work on Wayland
  • however, there are still drivers started up too early, which will not update on their own to use the vsync child. They may update after e.g. changing tabs back and forth.

This still doesn't explain why page load fires that late in some scenarios - but could have to do with the fact that previously more timers ran on the startup timer for longer.

So what I'll now try is:

  • strip down vsync child to accept a single observer again
  • make all refresh drivers with the same browser child use a single timer again, but now provided by the browser child instead of a process wide one. This possibly fixes certain flakiness regressions as the behaviour should closer match the old behaviour.
  • investigate if the startup timer should run longer before getting exchanged for the vsync timer - also in order to match previous behaviour more closely

The patch has been backed out of beta already, thus clearing the need-info.

Flags: needinfo?(sheriffs)
Depends on: 1683298

\o/
Finally found the cause of the regressions: the way we now compute nsRefreshDriver::GetIdleDeadlineHint is incompatible with how it used to work. Will have a patch in a couple of days :)

See Also: 1601966

Hey! How does it look with the patch/fix? Does it look like the cause of the regressions was figured out?

(In reply to pasik from comment #78)

Hey! How does it look with the patch/fix? Does it look like the cause of the regressions was figured out?

The patch landed in bug 1681030 - so I think all regressions are fixed now :)
Bug 1673139 still needs confirmation though.

(In reply to Robert Mader [:rmader] from comment #79)

The patch landed in bug 1681030 - so I think all regressions are fixed now :)
Bug 1673139 still needs confirmation though.

Oh, nice, I totally missed the patch already landed in bug 1681030. Thanks a lot!

No longer depends on: 1675680
See Also: → 1744896
Regressions: 1764578
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: