Make window manipulation more reliable

RESOLVED FIXED in Firefox 65

Status

enhancement
P1
normal
RESOLVED FIXED
11 months ago
5 months ago

People

(Reporter: ato, Assigned: ato)

Tracking

(Blocks 2 bugs, Regressed 1 bug)

Trunk
mozilla65
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox65 fixed)

Details

Attachments

(20 attachments, 16 obsolete attachments)

46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
Marionette has long had a problem with unreliable window manipulation.
Subsequent invocations of WebDriver:SetWindowRect,
WebDriver:FullscreenWindow followed by a call to SetWindowRect, &c.
are examples of when the window rect might be returned with the
correct values but the subsequent call to GetWindowRect will return
the wrong values, or of when our throtteling code waiting for the
WM to process and for the outerWidth/outerHeight DOM properties to
be propagated doesn’t work.

The aim is to provide synchronous APIs for window manipulation that
does not return control to the user before the window has been
successfully moved/resized/minimized/maximized/fullscreen’ed and
the relevant internal properties, such as
ChromeWindow.{outerWidth,outerHeight} have received the new values.
whimboo helpfully suggested that we need to give the main thread
time enough to process at least one event before we start querying
the updated properties.  I then discovered via an experiment that
if we delegate the entire window resizing code (the implementation
of WebDriver:SetWindowRect) to the main thread, we could see that
the callback resolved much later than when we currently return from
WebDriver:SetWindowRect.

It is not exactly news that we return too early, but the good news
from this finding was that the outerWidth/outerHeight properties
were correctly reported inside the callback that ran on the main
thread.

The working theory we have at the moment is that we need to expand
the use of IdlePromise to encompass all the synchronisation points
checking outerWidth/outerHeight.
Blocks: webdriver
Priority: -- → P2
Blocks: 1471288
Assignee: nobody → ato
Status: NEW → ASSIGNED
Priority: P2 → P1
Depends on: 1493660
Blocks: 1445212
I think we concluded last week that delegation to the main thread
helped for _some_ of the window manipulation commands, but that it
in itself did not solve all the edge cases.

I have since rewritten the way we put a window into fullscreen mode,
how we iconify it, and how we synchronously wait for it to be
restored.  This appears to have solved a few problems with headless
mode and generally seems to make a subset of the cases more reliable.

It is hard to tell conclusively if these changes will indeed be
more reliable in the long run, since changes to this code is
notoriously hard to test due to different configurations of systems,
window managers, and setups.
Summary: Delegate window manipulation calls to main thread → Make window manipulation more reliable
My latest try runs show that the changes I have made are pretty stable:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdacb0bcc955db4cdd9a1a4c9bc0a45ce61a5a64&group_state=expanded

Except very occasionally the ChromeWindow is reported as minimized
on the main thread, but that document.hidden in web content will
report false:

> 1538052289305	webdriver::server	DEBUG	-> POST /session/9aea82ec-d5fa-4b3f-9358-bcb00d52716a/window/minimize {}
> 1538052289307	Marionette	TRACE	0 -> [0,49,"WebDriver:MinimizeWindow",{}]
> 1538052289324	Marionette	TRACE	0 <- [1,49,null,{"x":0,"y":0,"width":800,"height":600,"state":"minimized"}]
> 1538052289328	webdriver::server	DEBUG	<- 200 OK {"value":{"x":0,"y":0,"width":800,"height":600}}
> 1538052289331	webdriver::server	DEBUG	-> POST /session/9aea82ec-d5fa-4b3f-9358-bcb00d52716a/execute/sync {"args": [], "script": "return document.hidden"}
> 1538052289333	Marionette	TRACE	0 -> [0,50,"WebDriver:ExecuteScript",{"args":[],"newSandbox":false,"script":"return document.hidden","scriptTimeout":null,"specialPowers":false}]
> 1538052289419	Marionette	TRACE	0 <- [1,50,null,{"value":false}]
> 1538052289420	webdriver::server	DEBUG	<- 200 OK {"value":false}

My educated guess is that DOM properties take some time to propagate
from the main process to the content child process, and that the
best way to fix this is to add an explicit wait in the test.
It would be good to know how long this delay is until the content process is aware of the hidden state. Maybe we should better have an internal wait for that, so that tests can always trust that the window is minimized when the command returns. Similar to any other command.
We can’t currently wait on document.hidden because of a bug in
headless mode.  Because it appears to happen so infrequently, I
would like to apply the fixes we know work first, file the bug about
document.hidden and get this fixed, and then consider using an
internal wait for document.hidden in web content.
That sounds good.
I had a look at the try build and lots of Wd tests still show a hang in SetWindowRect. Didn't we also wanted to use the timed promise for the changes on that bug? Or did I mix-up bugs?
(In reply to Henrik Skupin (:whimboo) from comment #8)

> I had a look at the try build and lots of Wd tests still show a
> hang in SetWindowRect.

I didn’t request review on these changes yet.  I must have made a
mistake when splitting the changes into multiple patches.

> Didn't we also wanted to use the timed promise for the changes on
> that bug? Or did I mix-up bugs?

The problematic promises that would previously hanged have been
replaced with PollPromise, which eventually will reject after a a
timeout.
Depends on: 1496432
Blocks: 1496489
This adds two new synchronisation primitives to the Marionette
sync module.  MainThreadPromise mimics a Promise and dispatches a
callback function to the main thread of the current process which
awaits until it is resolved.  MainThreadTask is similar, except
it is resolved automatically when the callback function returns,
along with passing on its return value.

Both are built around Services.tm.dispatchToMainThread.
Depends on D7761
Instead of waiting for the visibilitychange event to fire, which does
not work in headless mode, we can poll ChromeWindow.windowState which
appears to be a more reliable way of telling the window's current state.

In addition to this we delegate the polling to the main thread,
and don't return from there until the state has changed.

This will resolve the problem where Marionette never returns from
minimization or restoration due to visibilitychange never firing,
and will additionally make it more reliable.

Depends on D7762
This delegates moving the window to the main thread, and additionally
synchronises the state using an IdlePromise after it has returned
to ensure DOM properties are propagated.

Depends on D7763
Should the window already be at the target X/Y position, we can
skip having to repositon it.

Depends on D7764
This adds a new synchronisation primitive to Marionette which will
allow us to wait for the last in a sequence of events to fire.
This is especially useful for high-frequency events such as the DOM
resize event, where multiple resize events may fire as the window
dimensions are being changed.

Depends on D7765
Depends on D7766
My delegating to the main thread, waiting for the last DOM resize event
to fire, and requesting an animation frame from the ChromeWindow, we
can ensure the window is (more) synchronously resized.  This ensures
better reliability when setting a window's dimensions.

All this means we can get rid of the heuristics we use for "waiting
for a window resize" to occur by checking if the outerWidth/outerHeight
has changed.  These were obviously bug ridden.

Depends on D7767
For the time being we need to poll for document.hidden to become
true because certain driver implementations, such as geckodriver,
occasionally does not wait until the DOM property is propagated to
the child process.

Depends on D7768
The "is True" and "is False" style is unnecessary as the return
values are boolean, which means "assert is_fullscreen(session)"
and "assert not is_fullscreen(session)" is sufficient.

Depends on D8396
The stack is joined with "\n" causing an extra carriage return line
feed to appear at the end of the string.

Depends on D8397
We often use TimedPromise to ensure Marionette does not unexpectedly
block on a promise that, for whatever reason, does not resolve.
It can however be useful to be alerted when they don't, as it quite
often means there is an underlying problem.

Depends on D8398
To be able to reuse is_fullscreen() in other tests, we need to
uplift it to the support.helpers package.

Depends on D8399
On some systems and window managers, such as macOS and when X11
forwarding an application across systems, there exists a 22px
window border that we cannot detect or do anything about.  As this
test is to verify that the width/height changed beyond 800x600,
this assertion change should make the tests pass on more configurations.

Depends on D7769
Attachment #9014511 - Attachment is obsolete: true
Attachment #9014503 - Attachment is obsolete: true
Attachment #9014504 - Attachment is obsolete: true
Attachment #9014505 - Attachment is obsolete: true
Attachment #9014506 - Attachment is obsolete: true
Attachment #9014507 - Attachment is obsolete: true
Attachment #9014508 - Attachment is obsolete: true
Attachment #9014509 - Attachment is obsolete: true
Attachment #9014510 - Attachment is obsolete: true
Attachment #9016315 - Attachment is obsolete: true
Attachment #9016316 - Attachment is obsolete: true
Attachment #9016317 - Attachment is obsolete: true
Attachment #9016318 - Attachment is obsolete: true
Attachment #9016319 - Attachment is obsolete: true
Attachment #9016320 - Attachment is obsolete: true
Attachment #9016321 - Attachment is obsolete: true
The "is True" and "is False" style is unnecessary as the return
values are boolean, which means "assert is_fullscreen(session)"
and "assert not is_fullscreen(session)" is sufficient.

Depends on D8404
The stack is joined with "\n" causing an extra carriage return line
feed to appear at the end of the string.

Depends on D8405
We often use TimedPromise to ensure Marionette does not unexpectedly
block on a promise that, for whatever reason, does not resolve.
It can however be useful to be alerted when they don't, as it quite
often means there is an underlying problem.

Depends on D8406
To be able to reuse is_fullscreen() in other tests, we need to
uplift it to the support.helpers package.

Depends on D8407
For the time being we need to poll for document.hidden to become
true because certain driver implementations, such as geckodriver,
occasionally does not wait until the DOM property is propagated to
the child process.

Depends on D8408
On some systems and window managers, such as macOS and when X11
forwarding an application across systems, there exists a 22px
window border that we cannot detect or do anything about.  As this
test is to verify that the width/height changed beyond 800x600,
this assertion change should make the tests pass on more configurations.

Depends on D8409
This adds a new synchronisation primitive to Marionette which will
allow us to wait for the last in a sequence of events to fire.
This is especially useful for high-frequency events such as the DOM
resize event, where multiple resize events may fire as the window
dimensions are being changed.

Depends on D8411
Instead of waiting for the visibilitychange event to fire, which does
not work in headless mode, we can poll ChromeWindow.windowState which
appears to be a more reliable way of telling the window's current state.

This will resolve the problem where Marionette never returns from
restoration due to visibilitychange never firing, and will additionally
make it more reliable.

Depends on D8412
The visibilitychange DOM event does not fire reliably in all
configurations when retrieved from web content.  It appears to fire more
reliably in chrome, but to ensure a call to WebDriver:MinimizeWindow
never hangs, we additionally change it to be a TimedPromise.

There is an assumption here that if the iconification process does
not complete within this duration, there is nothing we can do.

Depends on D8413
Unlike the visibilitychange event, sizemodechange appears to fire in a
mostly reliable way.  However, in the off-chance that sizemodechange
should not fire, we need an escape path so that Marionette returns
within a timely fashion.

Depends on D8414
win.Maximized does not exist.  This should be WindowState.Maximized.

Depends on D8415
When the ChromeWindow is already in the desired x/y position,
WebDriver:SetWindowRect should act as a no-op.  This avoids a
superfluous call to ChromeWindow.moveTo.

Depends on D8416
My delegating to the main thread, waiting for the last DOM resize event
to fire, and requesting an animation frame from the ChromeWindow, we
can ensure the window is (more) synchronously resized.  This ensures
better reliability when setting a window's dimensions.

All this means we can get rid of the heuristics we use for "waiting
for a window resize" to occur by checking if the outerWidth/outerHeight
has changed.  These were obviously bug ridden.

Depends on D8417
This requests an animation frame off ChromeWindow and waits for
the main thread's event queue to become idle before returning from
window manipulation commands.

This ensures all potential synchronisation code between e.g. the
parent process and the child processes have had time to run before
we return from WebDriver:{MinimizeWindow,MaximizeWindow,FullscreenWindow}.

Depends on D8418
Attachment #9016321 - Attachment is obsolete: false
Attachment #9016320 - Attachment is obsolete: false
Keywords: leave-open
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b95d40e31251
marionette: fix GeckoDriver#setWindowRect docs; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/246b45eae45c
webdriver: fix is_fullscreen assertions; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/0e4677eff22b
marionette: trim crlf off produced stack; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/9ca2c179e63e
marionette: warn on TimedPromise bailing; r=automatedtester
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/13804 for changes under testing/web-platform/tests
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cbca19a1e3f4
webdriver: uplift is_fullscreen as helper; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/bfe0c0f39dd3
webdriver: poll for document.hidden; r=automatedtester
Lots of changes here. It's great to see those landed! Andreas, can this bug be closed now, or are there more fixes to be landed?
(In reply to Henrik Skupin (:whimboo) from comment #49)
> Lots of changes here. It's great to see those landed! Andreas, can this bug
> be closed now, or are there more fixes to be landed?

There’s still a lot more changes to land.  You can see the full
list of open revisions here:
https://phabricator.services.mozilla.com/D8419
Attachment #9016341 - Attachment description: bug 1492499: marionette: empty event queue before returning from window manipulation; → bug 1492499: marionette: empty event queue on window manipulation;
On certain window manager configurations a window may be resized
to fit the full available dimensions of the screen without going
into the maximised state.  Similarily, a maximised window may be
the exact dimensions of the screen.

If the window outerWidth/outerHeight is 800x600 and in maximised
state, resizing it to 800x600 through WebDriver:SetWindowRect will
not work because the window has already reached its requested dimensions.

This patch ensures to wait for the resizeEnd event when the window
state is not normal.

Depends on D8419
The Marionette test_window_maximize.py test is now duplicated in WPT tests.

Depends on D10568
Upstream PR merged
Attachment #9021836 - Attachment description: bug 1492499: marionette: deduplicate maximize window tests; → bug 1492499: marionette: deduplicate window manipulation tests;
Attachment #9016319 - Attachment is obsolete: false
Attachment #9016318 - Attachment is obsolete: false
Attachment #9021836 - Attachment description: bug 1492499: marionette: deduplicate window manipulation tests; → bug 1492499: marionette: drop broken window maximization test;
Attachment #9016316 - Attachment is obsolete: false
Attachment #9016317 - Attachment is obsolete: false
Attachment #9016315 - Attachment is obsolete: false
The things that rely on TimedPromise, such as awaiting a sizemodechange
event on exiting fullscreen, takes a lot longer to perform in debug
builds than in optimised builds.

This patch increases the TimedPromise timeout duration by three
times in debug builds, which is the same amount WPT uses.

Depends on D10569
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/83ee1c23d31a
webdriver: take 22px window border into account on maximizing; r=automatedtester,whimboo
https://hg.mozilla.org/integration/autoland/rev/8490a7301d92
webdriver: add stress tests for window manipulation; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/1cceee950d62
marionette: add debounce sync primitive; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/33fd5380d4bc
marionette: poll on restoring window; r=whimboo
https://hg.mozilla.org/integration/autoland/rev/f1b2efff0c44
marionette: bail if visibilitychange does not fire on minimization; r=whimboo
https://hg.mozilla.org/integration/autoland/rev/1d9d159a04b7
marionette: accept sizemodechange may not always fire; r=automatedtester,whimboo
https://hg.mozilla.org/integration/autoland/rev/15b378cd5a24
marionette: fix buggy window state comparison; r=whimboo
https://hg.mozilla.org/integration/autoland/rev/95700d2eb9c5
marionette: make window positioning idempotent; r=whimboo
https://hg.mozilla.org/integration/autoland/rev/2aaa43050c7a
marionette: wait for last event on resizing window; r=automatedtester,whimboo
https://hg.mozilla.org/integration/autoland/rev/05b349eaba54
marionette: empty event queue on window manipulation; r=automatedtester,whimboo
https://hg.mozilla.org/integration/autoland/rev/81ac8f7d11b1
marionette: restore from maximized before setting rect; r=whimboo
https://hg.mozilla.org/integration/autoland/rev/a4fcddffde41
marionette: drop broken window maximization test; r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/2c1b00a6a97f
marionette: increase TimedPromise timeout on debug; r=whimboo
The Maximize Window command's stress test is broken on Windows due
to a socket problem on Windows specifically.  I think the problem
is that the socket buffer is too small because of a very intensive
check for the window size not changing.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1505806 for more details.
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c812e3810b94
webdriver: disable maximize_window/stress.py on windows;  a=stefan_hindli r=jgraham
Blocks: 1505806
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/13983 for changes under testing/web-platform/tests
Keywords: leave-open
Andreas, a big thank you for working on that bug! It will make use way more stable now for all the window manipulation commands, which cause lesser hangs and timeouts for people using Selenium via geckodriver/Marionette.

Also 5 more commands are no longer listed as timeout (complete red) on wpt.fyi!
Upstream PR merged
Attachment #9016315 - Attachment is obsolete: true
Attachment #9016316 - Attachment is obsolete: true
Attachment #9016317 - Attachment is obsolete: true
Attachment #9016318 - Attachment is obsolete: true
Attachment #9016319 - Attachment is obsolete: true
Attachment #9016320 - Attachment is obsolete: true
Attachment #9016321 - Attachment is obsolete: true
Depends on: 1509557
Blocks: 1509557
No longer depends on: 1509557
Blocks: 1512593
No longer blocks: 1512593
No longer blocks: 1478358
Depends on: 1478358
No longer depends on: 1520284
Regressions: 1520284
You need to log in before you can comment on or make changes to this bug.