Closed
Bug 1397007
Opened 7 years ago
Closed 7 years ago
Intermittent /webdriver/tests/minimize_window.py | minimize_window.py::test_payload
Categories
(Remote Protocol :: Marionette, defect, P3)
Tracking
(firefox56 unaffected, firefox57 fixed)
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox56 | --- | unaffected |
firefox57 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: ato)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])
Attachments
(1 file)
9.59 KB,
patch
|
Details | Diff | Splinter Review |
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=128666371&repo=autoland https://queue.taskcluster.net/v1/task/bxLHoa-hSFK14guYszqWtA/runs/0/artifacts/public/logs/live_backing.log
Comment 1•7 years ago
|
||
[task 2017-09-05T20:51:44.208676Z] 20:51:44 INFO - TEST-UNEXPECTED-FAIL | /webdriver/tests/minimize_window.py | minimize_window.py::test_payload [task 2017-09-05T20:51:44.209323Z] 20:51:44 INFO - session = <webdriver.client.Session object at 0x7f1cc3983050> [task 2017-09-05T20:51:44.209862Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.210397Z] 20:51:44 INFO - def test_payload(session): [task 2017-09-05T20:51:44.210755Z] 20:51:44 INFO - assert not session.execute_script("return document.hidden") [task 2017-09-05T20:51:44.211092Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.211447Z] 20:51:44 INFO - response = minimize(session) [task 2017-09-05T20:51:44.211785Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.212117Z] 20:51:44 INFO - # step 5 [task 2017-09-05T20:51:44.212460Z] 20:51:44 INFO - assert response.status == 200 [task 2017-09-05T20:51:44.213191Z] 20:51:44 INFO - assert isinstance(response.body["value"], dict) [task 2017-09-05T20:51:44.213479Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.213819Z] 20:51:44 INFO - value = response.body["value"] [task 2017-09-05T20:51:44.214031Z] 20:51:44 INFO - assert "width" in value [task 2017-09-05T20:51:44.214345Z] 20:51:44 INFO - assert "height" in value [task 2017-09-05T20:51:44.214605Z] 20:51:44 INFO - assert "x" in value [task 2017-09-05T20:51:44.214844Z] 20:51:44 INFO - assert "y" in value [task 2017-09-05T20:51:44.215117Z] 20:51:44 INFO - assert isinstance(value["width"], (int, float)) [task 2017-09-05T20:51:44.215403Z] 20:51:44 INFO - assert isinstance(value["height"], (int, float)) [task 2017-09-05T20:51:44.215674Z] 20:51:44 INFO - assert isinstance(value["x"], (int, float)) [task 2017-09-05T20:51:44.215948Z] 20:51:44 INFO - assert isinstance(value["y"], (int, float)) [task 2017-09-05T20:51:44.216113Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.216381Z] 20:51:44 INFO - > assert session.execute_script("return document.hidden") [task 2017-09-05T20:51:44.216912Z] 20:51:44 INFO - E assert <bound method Session.execute_script of <webdriver.client.Session object at 0x7f1cc3983050>>('return document.hidden') [task 2017-09-05T20:51:44.217060Z] 20:51:44 INFO - E + where <bound method Session.execute_script of <webdriver.client.Session object at 0x7f1cc3983050>> = <webdriver.client.Session object at 0x7f1cc3983050>.execute_script [task 2017-09-05T20:51:44.217374Z] 20:51:44 INFO - [task 2017-09-05T20:51:44.217505Z] 20:51:44 INFO - tests/web-platform/tests/webdriver/tests/minimize_window.py:59: AssertionError So `document.hidden` is not True right after calling `minimize` for the current window.
Flags: needinfo?(ato)
Assignee | ||
Comment 2•7 years ago
|
||
I don’t think this warrants my immediate attention, but let’s see how https://bugzil.la/1396618 affects this. We know that DOM attributes propagate slowly, and on Linux especially, we will never be able to guarantee this command to be race-free. Before taking any further action, I would like to see how severe the intermittency rate of this is. If it turns out to be unacceptable, we should consider hard coding a timeout on Linux. That’s the only option I can think of.
Flags: needinfo?(ato)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #2) > Before taking any further action, I would like to see how severe the > intermittency rate of this is. If it turns out to be unacceptable, > we should consider hard coding a timeout on Linux. That’s the > only option I can think of. It looks pretty bad! Combined with bug 1397069, I'd anticipate about 400 failures / week. Hope you can look at this soon.
Flags: needinfo?(ato)
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot) |
Comment 7•7 years ago
|
||
Looks like throttling has helped here but didn't fix it. When checking the OF stats it looks like that it first started on Sep 5th: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1397007&startday=2017-09-01&endday=2017-09-11&tree=all Andreas, did you land something between Sep 4th and 5th, which could have caused this?
Assignee | ||
Comment 8•7 years ago
|
||
Like I said in another bug, I believe it could be affected by https://searchfox.org/mozilla-central/commit/ba22a006380310c5391f68157e24f8b651a19ba6, which changes the assertion: > - assert session.window.state == "minimized" > + assert session.execute_script("return document.hidden") My guess is that the DOM property is not propagated quickly enough after the window has been iconified. I’m not really sure what to do about this to make it more reliable. This is inherently racy on Linux due to the way the X11 protocol works. Ideas?
Flags: needinfo?(ato)
Comment 9•7 years ago
|
||
The failure rate is pretty high. So I would propose that you use a one-click loaner to see what's going on. The chance to see it will be pretty good for Linux64.
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #9) > The failure rate is pretty high. So I would propose that you use a > one-click loaner to see what's going on. The chance to see it will > be pretty good for Linux64. I don’t know how to interact with mozharness to use loaners and I found this too hard to spend time learning in the past. But seeing what is going on doesn’t really help. The only option I can think of at the moment is to disable our Minimize Window command implementation.
Assignee | ||
Comment 11•7 years ago
|
||
I had another dive into this today, and will try to summarise the problem and a few different potential solutions I came up with. When the document.hidden assertion fails the window has in fact already been minimised. What we are seeing here is a DOM propagation race condition, where document.hidden isn’t updated in time before the call to WebDriver:ExecuteScript subsequent to WebDriver:MinimizeWindow. We can be sure of this because the sizemodechange event fires reliably when the widget layer fires an iconification event. Now, it is true as I said before that this event is not synchronous on Linux when using the X11 protocol, but from my own testing with a non-optimised debug build (which are terribly slow, but rather good at reproducing this problem) I can see that the ChromeWindow.windowState and the sizemodechange events are intrinsicly linked. This means we never actually return from WebDriver:MinimizeWindow before at least the internal window state in Firefox is reporting that it has been iconified. Whether the OS window widget indeed has been minimised on the screen is actually of lesser importance because that is not what the WPT test is asserting. The window will certainly within close proximity of sizemodechange firing be minimised. The main problem we are seeing here is that whilst we can trust sizemodechange, the browser’s document.hidden DOM property is not synchronously updated. This means that the assertion that uses an injected script to return its value is getting back the out-of-date, old document.hidden state because it has not yet been updated. This was working before because we were returning ChromeWindow.windowState (linked to sizemodechange) in a "state" field on the window rect, and using that as in the assertion. Because window state was dropped from level 1 in the specification, we now have to rely on web platform APIs such as document.hidden. I was able to come up with three solutions that so far look promising. They all achieve the same result, but use different mechanisms for getting there: (1) OOP polling from the test. We employ a synchronous explicit wait utility, like wait.py, in the test to inject scripts that return document.hidden, but don’t continue on before it returns true or a timeout is reached. (2) New frame script command that polls for document.hidden to be true. Similar to option 1, but the wait-poll is done in testing/marionette/listener.js using wait.until. Better for both IPC- and transport reasons. (3) A web content DOM event observer. We write a new observer in testing/marionette/listener.js that allows Marionette chrome scope to add listeners for any web content DOM event. We use that to listen for visibilitychange to fire, then remove the listener. We put this in a promise that listens for the event, calls window.minimize(), then resolves the promise. As much as I dislike the idea of having to do IPC to the frame script in order to determine that the window has been minimised, it is the only reliable solution I could find that removes the surprise of document.hidden not returning true after calling the WebDriver:MinimizeWindow command. Between option 2 and 3, the event-based approach in 3 is obviously preferable to polling when we the option not to. It is also more generic than having specific one-off hacks written specifically for the fact that the DOM is asynchronous. It also has the upside that this is how most other functionality in chrome that depends on some web content state is written. Option 3 incurs more complexity, but is a potential feature it would—in the future—be possible to expose as a Marionette service. Having the option to subscribe to events from OOP would be very exciting. It obviously would not be compatible with WebDriver (not being full duplex), but makes it possible to easily evolve WebDriver in that direction should the working group choose to add a full duplex parallel pipelining interface (WebSockets?) at some point in the future.
Assignee | ||
Comment 12•7 years ago
|
||
I’d like buy-in from someone else on proposal 3 before I start implementation.
Flags: needinfo?(hskupin)
Comment hidden (Intermittent Failures Robot) |
Comment 14•7 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #11) > The main problem we are seeing here is that whilst we can trust > sizemodechange, the browser’s document.hidden DOM property is not > synchronously updated. This means that the assertion that uses an > injected script to return its value is getting back the out-of-date, > old document.hidden state because it has not yet been updated. Olli, do you know if this is a known issue or if that is expected due to the asynchronicity when minimizing a window? We just wonder why `document.hidden` still returns `false` when we already got the window state change with the window not visible anymore. We would appreciate some feedback. Thanks. I will keep my ni? for now.
Flags: needinfo?(bugs)
Comment 15•7 years ago
|
||
sizemodechange and .hidden aren't too strongly connected in the implementation. And since sizemodechange is a Gecko internal thing, there isn't a spec for this. Does anyone have a stack when document.hidden becomes true in this case?
Flags: needinfo?(bugs)
Comment 16•7 years ago
|
||
Ah, hmm, I guess http://searchfox.org/mozilla-central/rev/e76c0fee79a34a3612740b33508276f4b37c3ee8/docshell/base/nsDocShell.cpp#6343 gets called, and that does the visibility change asynchronously.
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to Olli Pettay [:smaug] (Currently reviews only for patches which should land to FF57 [2017-09-21]) from comment #15) > sizemodechange and .hidden aren't too strongly connected in the > implementation. And since sizemodechange is a Gecko internal > thing, there isn't a spec for this. > > Does anyone have a stack when document.hidden becomes true in this > case? Henrik: As I understand it, isn’t there bound to be the potential for a race condition between when sizemodechange in the main thread and visibilitychange in the child process fires unless you employ synchronisation primitives? sizemodechange fires (in our case) when the GTK "window_state_event" signal is received. This eventually makes it to the docshell page show/hide functions that updates the document’s visibility state. At some point, and I’m fuzzy on the details here, this is sent via IPC to the child process which fires the visibilitychange event in DOM. But importantly, I fail to see how it is an issue that these two events are not strongly connected? What matters here is that the test is asserting document.hidden because the window rect doesn’t have a "state" field representing ChromeWindow.windowState anymore, so we need to delay the WebDriver:MinimizeWindow command from returning until visibilitychange/document.hidden in the child process has had time to fire.
Assignee | ||
Comment 18•7 years ago
|
||
Attaching a WIP patch that lets content DOM events be observed.
Comment 19•7 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #11) > (3) A web content DOM event observer. We write a new observer > in testing/marionette/listener.js that allows Marionette chrome > scope to add listeners for any web content DOM event. We use > that to listen for visibilitychange to fire, then remove the > listener. We put this in a promise that listens for the event, > calls window.minimize(), then resolves the promise. I don't know the details about the `visibilitychange` event on the document, but wonder if that is an event which might be propagated to the parent's process contentBrowser. Have you checked if that this available there too? Otherwise Olli might know immediately I believe. Regarding the custom observer notifications for Marionette. Are you planning that all of those (including those we might need in the future) are getting registered at once and are staying active for the lifetime of the framescript? Registering/Unregistering them all the time when specific commands are getting run, could be come a lot more complex. > Option 3 incurs more complexity, but is a potential feature it > would—in the future—be possible to expose as a Marionette > service. Having the option to subscribe to events from OOP would be > very exciting. It obviously would not be compatible with WebDriver > (not being full duplex), but makes it possible to easily evolve > WebDriver in that direction should the working group choose to add > a full duplex parallel pipelining interface (WebSockets?) at some > point in the future. I don't know about the second part of that paragraph but for the first part I can totally agree. We could abstract events and in case underlying changes happen, only a single modification would have to be done. I also think about the page load events here... (In reply to Andreas Tolfsen ‹:ato› from comment #17) > sizemodechange fires (in our case) when the GTK "window_state_event" > signal is received. This eventually makes it to the docshell page > show/hide functions that updates the document’s visibility state. > At some point, and I’m fuzzy on the details here, this is sent via > IPC to the child process which fires the visibilitychange event in > DOM. But importantly, I fail to see how it is an issue that these > two events are not strongly connected? Sorry, but I have no idea/knowledge about what's going on here. > What matters here is that the test is asserting document.hidden > because the window rect doesn’t have a "state" field > representing ChromeWindow.windowState anymore, so we need to > delay the WebDriver:MinimizeWindow command from returning until > visibilitychange/document.hidden in the child process has had time > to fire. So `WindowState` will be added again for level 2 (see https://github.com/w3c/webdriver/issues/1023). Why can't we rely on that property internally, if it makes handling of this situation way easier? To the outside it's important that `document.hidden` needs the correct value, which will be the case when using it, right?
Flags: needinfo?(hskupin)
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #19) > I don't know the details about the `visibilitychange` event on the > document, but wonder if that is an event which might be propagated > to the parent's process contentBrowser. Have you checked if that > this available there too? It’s not available on <xul:browser> because it is fired on Document: https://www.w3.org/TR/page-visibility/#sec-visibilitychange-event > Regarding the custom observer notifications for Marionette. Are > you planning that all of those (including those we might need in > the future) are getting registered at once and are staying active > for the lifetime of the framescript? Registering/Unregistering > them all the time when specific commands are getting run, could be > come a lot more complex. You’d register an event listener when you need it for a specific purpose. In the WIP patch I’ve attached you can see I’ve introduced a {once: true} option analogous to that available on EventTarget.addEventListener: https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener As far as I know there’s no inherent value in listening for all DOM events all the time. > (In reply to Andreas Tolfsen ‹:ato› from comment #17) > > > What matters here is that the test is asserting document.hidden > > because the window rect doesn’t have a "state" field > > representing ChromeWindow.windowState anymore, so we need to > > delay the WebDriver:MinimizeWindow command from returning until > > visibilitychange/document.hidden in the child process has had > > time to fire. > > So `WindowState` will be added again for level 2 (see > https://github.com/w3c/webdriver/issues/1023). Why can't we > rely on that property internally, if it makes handling of > this situation way easier? To the outside it's important that > `document.hidden` needs the correct value, which will be the case > when using it, right? We can’t rely on sizemodechange internally because the test is asserting document.hidden, which as Olli pointed out earlier, is not strongly connected. There is a race condition between when the first fires and when the visibilitychange DOM event has had time to fire in content. We also can’t rely on sizemodechange externally, in the test assertion, because the test lives in WPT and we can’t rely on proprietary Firefox features there. ~ * ~ I’ve filed https://bugzil.la/1400226 to make Marionette wait on visibilitychange before returning from WebDriver:MinimizeWindow.
Depends on: 1400226
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Priority: P5 → P3
Comment 22•7 years ago
|
||
Should be fixed now by the patch on bug 1400226.
Assignee: nobody → ato
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → unaffected
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
Comment 23•1 year ago
|
||
Moving bug to Remote Protocol::Marionette component per bug 1815831.
Component: geckodriver → Marionette
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•