Closed Bug 1397007 Opened 2 years ago Closed 2 years ago

Intermittent /webdriver/tests/minimize_window.py | minimize_window.py::test_payload

Categories

(Testing :: geckodriver, defect, P3)

Version 3
defect

Tracking

(firefox56 unaffected, firefox57 fixed)

RESOLVED FIXED
mozilla57
Tracking Status
firefox56 --- unaffected
firefox57 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ato)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Attachments

(1 file)

[task 2017-09-05T20:51:44.208676Z] 20:51:44     INFO - TEST-UNEXPECTED-FAIL | /webdriver/tests/minimize_window.py | minimize_window.py::test_payload 
[task 2017-09-05T20:51:44.209323Z] 20:51:44     INFO - session = <webdriver.client.Session object at 0x7f1cc3983050>
[task 2017-09-05T20:51:44.209862Z] 20:51:44     INFO - 
[task 2017-09-05T20:51:44.210397Z] 20:51:44     INFO -     def test_payload(session):
[task 2017-09-05T20:51:44.210755Z] 20:51:44     INFO -         assert not session.execute_script("return document.hidden")
[task 2017-09-05T20:51:44.211092Z] 20:51:44     INFO -     
[task 2017-09-05T20:51:44.211447Z] 20:51:44     INFO -         response = minimize(session)
[task 2017-09-05T20:51:44.211785Z] 20:51:44     INFO -     
[task 2017-09-05T20:51:44.212117Z] 20:51:44     INFO -         # step 5
[task 2017-09-05T20:51:44.212460Z] 20:51:44     INFO -         assert response.status == 200
[task 2017-09-05T20:51:44.213191Z] 20:51:44     INFO -         assert isinstance(response.body["value"], dict)
[task 2017-09-05T20:51:44.213479Z] 20:51:44     INFO -     
[task 2017-09-05T20:51:44.213819Z] 20:51:44     INFO -         value = response.body["value"]
[task 2017-09-05T20:51:44.214031Z] 20:51:44     INFO -         assert "width" in value
[task 2017-09-05T20:51:44.214345Z] 20:51:44     INFO -         assert "height" in value
[task 2017-09-05T20:51:44.214605Z] 20:51:44     INFO -         assert "x" in value
[task 2017-09-05T20:51:44.214844Z] 20:51:44     INFO -         assert "y" in value
[task 2017-09-05T20:51:44.215117Z] 20:51:44     INFO -         assert isinstance(value["width"], (int, float))
[task 2017-09-05T20:51:44.215403Z] 20:51:44     INFO -         assert isinstance(value["height"], (int, float))
[task 2017-09-05T20:51:44.215674Z] 20:51:44     INFO -         assert isinstance(value["x"], (int, float))
[task 2017-09-05T20:51:44.215948Z] 20:51:44     INFO -         assert isinstance(value["y"], (int, float))
[task 2017-09-05T20:51:44.216113Z] 20:51:44     INFO -     
[task 2017-09-05T20:51:44.216381Z] 20:51:44     INFO - >       assert session.execute_script("return document.hidden")
[task 2017-09-05T20:51:44.216912Z] 20:51:44     INFO - E       assert <bound method Session.execute_script of <webdriver.client.Session object at 0x7f1cc3983050>>('return document.hidden')
[task 2017-09-05T20:51:44.217060Z] 20:51:44     INFO - E        +  where <bound method Session.execute_script of <webdriver.client.Session object at 0x7f1cc3983050>> = <webdriver.client.Session object at 0x7f1cc3983050>.execute_script
[task 2017-09-05T20:51:44.217374Z] 20:51:44     INFO - 
[task 2017-09-05T20:51:44.217505Z] 20:51:44     INFO - tests/web-platform/tests/webdriver/tests/minimize_window.py:59: AssertionError

So `document.hidden` is not True right after calling `minimize` for the current window.
Flags: needinfo?(ato)
I don’t think this warrants my immediate attention, but let’s
see how https://bugzil.la/1396618 affects this.  We know that DOM
attributes propagate slowly, and on Linux especially, we will never
be able to guarantee this command to be race-free.

Before taking any further action, I would like to see how severe the
intermittency rate of this is.  If it turns out to be unacceptable,
we should consider hard coding a timeout on Linux.  That’s the
only option I can think of.
Flags: needinfo?(ato)
Depends on: 1396618
See Also: → 1397069
(In reply to Andreas Tolfsen ‹:ato› from comment #2)
> Before taking any further action, I would like to see how severe the
> intermittency rate of this is.  If it turns out to be unacceptable,
> we should consider hard coding a timeout on Linux.  That’s the
> only option I can think of.

It looks pretty bad! Combined with bug 1397069, I'd anticipate about 400 failures / week. Hope you can look at this soon.
Flags: needinfo?(ato)
Whiteboard: [stockwell needswork]
Looks like throttling has helped here but didn't fix it. When checking the OF stats it looks like that it first started on Sep 5th:

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1397007&startday=2017-09-01&endday=2017-09-11&tree=all

Andreas, did you land something between Sep 4th and 5th, which could have caused this?
Like I said in another bug, I believe it could be affected by
https://searchfox.org/mozilla-central/commit/ba22a006380310c5391f68157e24f8b651a19ba6,
which changes the assertion:

> -     assert session.window.state == "minimized"
> +     assert session.execute_script("return document.hidden")

My guess is that the DOM property is not propagated quickly enough
after the window has been iconified.

I’m not really sure what to do about this to make it more
reliable.  This is inherently racy on Linux due to the way the X11
protocol works.  Ideas?
Flags: needinfo?(ato)
The failure rate is pretty high. So I would propose that you use a one-click loaner to see what's going on. The chance to see it will be pretty good for Linux64.
(In reply to Henrik Skupin (:whimboo) from comment #9)

> The failure rate is pretty high. So I would propose that you use a
> one-click loaner to see what's going on. The chance to see it will
> be pretty good for Linux64.

I don’t know how to interact with mozharness to use loaners and I
found this too hard to spend time learning in the past.

But seeing what is going on doesn’t really help.  The only option
I can think of at the moment is to disable our Minimize Window
command implementation.
I had another dive into this today, and will try to summarise the
problem and a few different potential solutions I came up with.

When the document.hidden assertion fails the window has in
fact already been minimised.  What we are seeing here is a DOM
propagation race condition, where document.hidden isn’t updated
in time before the call to WebDriver:ExecuteScript subsequent to
WebDriver:MinimizeWindow.

We can be sure of this because the sizemodechange event fires
reliably when the widget layer fires an iconification event.  Now,
it is true as I said before that this event is not synchronous
on Linux when using the X11 protocol, but from my own testing
with a non-optimised debug build (which are terribly slow, but
rather good at reproducing this problem) I can see that the
ChromeWindow.windowState and the sizemodechange events are
intrinsicly linked.

This means we never actually return from WebDriver:MinimizeWindow
before at least the internal window state in Firefox is reporting
that it has been iconified.  Whether the OS window widget indeed
has been minimised on the screen is actually of lesser importance
because that is not what the WPT test is asserting.  The window
will certainly within close proximity of sizemodechange firing be
minimised.

The main problem we are seeing here is that whilst we can trust
sizemodechange, the browser’s document.hidden DOM property is not
synchronously updated.  This means that the assertion that uses an
injected script to return its value is getting back the out-of-date,
old document.hidden state because it has not yet been updated.

This was working before because we were returning
ChromeWindow.windowState (linked to sizemodechange) in a "state"
field on the window rect, and using that as in the assertion.
Because window state was dropped from level 1 in the specification,
we now have to rely on web platform APIs such as document.hidden.

I was able to come up with three solutions that so far look
promising.  They all achieve the same result, but use different
mechanisms for getting there:

  (1) OOP polling from the test.  We employ a synchronous explicit
  wait utility, like wait.py, in the test to inject scripts that
  return document.hidden, but don’t continue on before it returns
  true or a timeout is reached.

  (2) New frame script command that polls for document.hidden to
  be true.  Similar to option 1, but the wait-poll is done in
  testing/marionette/listener.js using wait.until.  Better for both
  IPC- and transport reasons.

  (3) A web content DOM event observer.  We write a new observer
  in testing/marionette/listener.js that allows Marionette chrome
  scope to add listeners for any web content DOM event.  We use
  that to listen for visibilitychange to fire, then remove the
  listener.  We put this in a promise that listens for the event,
  calls window.minimize(), then resolves the promise.

As much as I dislike the idea of having to do IPC to the frame
script in order to determine that the window has been minimised,
it is the only reliable solution I could find that removes the
surprise of document.hidden not returning true after calling the
WebDriver:MinimizeWindow command.

Between option 2 and 3, the event-based approach in 3 is obviously
preferable to polling when we the option not to.  It is also more
generic than having specific one-off hacks written specifically for
the fact that the DOM is asynchronous.  It also has the upside that
this is how most other functionality in chrome that depends on some
web content state is written.

Option 3 incurs more complexity, but is a potential feature it
would—in the future—be possible to expose as a Marionette
service.  Having the option to subscribe to events from OOP would be
very exciting.  It obviously would not be compatible with WebDriver
(not being full duplex), but makes it possible to easily evolve
WebDriver in that direction should the working group choose to add
a full duplex parallel pipelining interface (WebSockets?) at some
point in the future.
I’d like buy-in from someone else on proposal 3 before I start
implementation.
Flags: needinfo?(hskupin)
Depends on: 1397780
(In reply to Andreas Tolfsen ‹:ato› from comment #11)
> The main problem we are seeing here is that whilst we can trust
> sizemodechange, the browser’s document.hidden DOM property is not
> synchronously updated.  This means that the assertion that uses an
> injected script to return its value is getting back the out-of-date,
> old document.hidden state because it has not yet been updated.

Olli, do you know if this is a known issue or if that is expected due to the asynchronicity when minimizing a window? We just wonder why `document.hidden` still returns `false` when we already got the window state change with the window not visible anymore. We would appreciate some feedback. Thanks.

I will keep my ni? for now.
Flags: needinfo?(bugs)
sizemodechange and .hidden aren't too strongly connected in the implementation.
And since sizemodechange is a Gecko internal thing, there isn't a spec for this.

Does anyone have a stack when document.hidden becomes true in this case?
Flags: needinfo?(bugs)
Ah, hmm, I guess http://searchfox.org/mozilla-central/rev/e76c0fee79a34a3612740b33508276f4b37c3ee8/docshell/base/nsDocShell.cpp#6343 gets called, and that does the visibility change asynchronously.
(In reply to Olli Pettay [:smaug] (Currently reviews only for
patches which should land to FF57 [2017-09-21]) from comment #15)

> sizemodechange and .hidden aren't too strongly connected in the
> implementation.  And since sizemodechange is a Gecko internal
> thing, there isn't a spec for this.
> 
> Does anyone have a stack when document.hidden becomes true in this
> case?

Henrik:

As I understand it, isn’t there bound to be the potential for
a race condition between when sizemodechange in the main thread
and visibilitychange in the child process fires unless you employ
synchronisation primitives?

sizemodechange fires (in our case) when the GTK "window_state_event"
signal is received.  This eventually makes it to the docshell page
show/hide functions that updates the document’s visibility state.
At some point, and I’m fuzzy on the details here, this is sent via
IPC to the child process which fires the visibilitychange event in
DOM.  But importantly, I fail to see how it is an issue that these
two events are not strongly connected?

What matters here is that the test is asserting document.hidden
because the window rect doesn’t have a "state" field
representing ChromeWindow.windowState anymore, so we need to
delay the WebDriver:MinimizeWindow command from returning until
visibilitychange/document.hidden in the child process has had time
to fire.
Attaching a WIP patch that lets content DOM events be observed.
(In reply to Andreas Tolfsen ‹:ato› from comment #11)
>   (3) A web content DOM event observer.  We write a new observer
>   in testing/marionette/listener.js that allows Marionette chrome
>   scope to add listeners for any web content DOM event.  We use
>   that to listen for visibilitychange to fire, then remove the
>   listener.  We put this in a promise that listens for the event,
>   calls window.minimize(), then resolves the promise.

I don't know the details about the `visibilitychange` event on the document, but wonder if that is an event which might be propagated to the parent's process contentBrowser. Have you checked if that this available there too? Otherwise Olli might know immediately I believe.

Regarding the custom observer notifications for Marionette. Are you planning that all of those (including those we might need in the future) are getting registered at once and are staying active for the lifetime of the framescript? Registering/Unregistering them all the time when specific commands are getting run, could be come a lot more complex.

> Option 3 incurs more complexity, but is a potential feature it
> would—in the future—be possible to expose as a Marionette
> service.  Having the option to subscribe to events from OOP would be
> very exciting.  It obviously would not be compatible with WebDriver
> (not being full duplex), but makes it possible to easily evolve
> WebDriver in that direction should the working group choose to add
> a full duplex parallel pipelining interface (WebSockets?) at some
> point in the future.

I don't know about the second part of that paragraph but for the first part I can totally agree. We could abstract events and in case underlying changes happen, only a single modification would have to be done. I also think about the page load events here...

(In reply to Andreas Tolfsen ‹:ato› from comment #17)
> sizemodechange fires (in our case) when the GTK "window_state_event"
> signal is received.  This eventually makes it to the docshell page
> show/hide functions that updates the document’s visibility state.
> At some point, and I’m fuzzy on the details here, this is sent via
> IPC to the child process which fires the visibilitychange event in
> DOM.  But importantly, I fail to see how it is an issue that these
> two events are not strongly connected?

Sorry, but I have no idea/knowledge about what's going on here.

> What matters here is that the test is asserting document.hidden
> because the window rect doesn’t have a "state" field
> representing ChromeWindow.windowState anymore, so we need to
> delay the WebDriver:MinimizeWindow command from returning until
> visibilitychange/document.hidden in the child process has had time
> to fire.

So `WindowState` will be added again for level 2 (see https://github.com/w3c/webdriver/issues/1023). Why can't we rely on that property internally, if it makes handling of this situation way easier? To the outside it's important that `document.hidden` needs the correct value, which will be the case when using it, right?
Flags: needinfo?(hskupin)
(In reply to Henrik Skupin (:whimboo) from comment #19)

> I don't know the details about the `visibilitychange` event on the
> document, but wonder if that is an event which might be propagated
> to the parent's process contentBrowser. Have you checked if that
> this available there too?

It’s not available on <xul:browser> because it is fired on Document:

	https://www.w3.org/TR/page-visibility/#sec-visibilitychange-event

> Regarding the custom observer notifications for Marionette. Are
> you planning that all of those (including those we might need in
> the future) are getting registered at once and are staying active
> for the lifetime of the framescript? Registering/Unregistering
> them all the time when specific commands are getting run, could be
> come a lot more complex.

You’d register an event listener when you need it for a specific
purpose.  In the WIP patch I’ve attached you can see I’ve
introduced a {once: true} option analogous to that available on
EventTarget.addEventListener:

	https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener

As far as I know there’s no inherent value in listening for all
DOM events all the time.

> (In reply to Andreas Tolfsen ‹:ato› from comment #17)
> 
> > What matters here is that the test is asserting document.hidden
> > because the window rect doesn’t have a "state" field
> > representing ChromeWindow.windowState anymore, so we need to
> > delay the WebDriver:MinimizeWindow command from returning until
> > visibilitychange/document.hidden in the child process has had
> > time to fire.
> 
> So `WindowState` will be added again for level 2 (see
> https://github.com/w3c/webdriver/issues/1023). Why can't we
> rely on that property internally, if it makes handling of
> this situation way easier? To the outside it's important that
> `document.hidden` needs the correct value, which will be the case
> when using it, right?

We can’t rely on sizemodechange internally because the test is
asserting document.hidden, which as Olli pointed out earlier, is
not strongly connected.  There is a race condition between when the
first fires and when the visibilitychange DOM event has had time to
fire in content.

We also can’t rely on sizemodechange externally, in the test
assertion, because the test lives in WPT and we can’t rely on
proprietary Firefox features there.

                                ~ * ~

I’ve filed https://bugzil.la/1400226 to make Marionette wait on
visibilitychange before returning from WebDriver:MinimizeWindow.
Depends on: 1400226
Priority: P5 → P3
Should be fixed now by the patch on bug 1400226.
Assignee: nobody → ato
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
You need to log in before you can comment on or make changes to this bug.