1489700 - Linux only: Race condition in window positioning breaks wpt open-features tests that check window positioning (top, left, screenx, screeny)

Assignee

Description

•

6 years ago

Background: trying to get startup/windowopen perf wins by no longer needlessly creating about:blank content docs in the initial browser, bug 1362774. STR: 1. check out m-c 2. in _setupInitialBrowserAndTab in tabbrowser.js, change the passed value for `uriIsAboutBlank` from `true` to `false`, ie this call: https://searchfox.org/mozilla-central/rev/37663bb87004167184de6f2afa6b05875eb0528e/browser/base/content/tabbrowser.js#291 3. ./mach build faster 4. run (e.g.) ./mach wpt testing/web-platform/meta/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-non-integer-left.html ER: pass AR: fail, the value of all the positioning properties is 0 instead of the expected value. Kicker: in testing/web-platform/meta/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/resources/message-opener.html, add a 5s timeout before passing back the window values to the opener via postMessage. The test will now pass. Clearly, something in the window positioning here is async to a degree that it breaks this test. The actual logic on the DOM side of things shouldn't really change materially here. I... don't really know where to start with this type of problem. The test is fine everywhere but Linux, so I'm led to believe it has something to do with gtk - if there was a slowness factor I'd expect debug to go orange across the board. I also notice the gfx team has disabled it on opt/pgo webrender, perhaps because it became racy with the speedups webrender provides? (bug 1425589 - unfortunately kats is out on leave so I can't ask them) Emilio, I see you've changed things in this part of the world before now (e.g. bug 1449166). Any idea what could be wrong here, where to start looking, or who might know? Karl, do you have any ideas?

Flags: needinfo?(karlt)

Flags: needinfo?(emilio)

:Gijs (out for now; he/him)

Assignee

Comment 1

•

6 years ago

sample trypush: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a12fa8e5a5e9de4edc48a44070f1bf652f103fd8&selectedJob=197717113

Emilio Cobos Álvarez (:emilio)

Comment 2

•

6 years ago

Does reversing https://hg.mozilla.org/mozilla-central/rev/c2f10e92f864 help? If so that's pretty much all I can help with without building gtk and stepping through the code back and forth with rr, which is what I had to do to write that :) I'm not sure I'd have the cycles to debug this, but if for some reason this is super-high priority I could give a shot to debug that... Probably Karl would be faster though.

Flags: needinfo?(emilio)

Karl Tomlinson (:karlt)

Comment 3

•

6 years ago

(In reply to :Gijs (he/him) from comment #0) > Clearly, something in the window positioning here is async Window positioning is async. The app requests that the window manager position the window. The window manager may adjust the position if it chooses. Our test infrastructure runs tests under a window manager that usually positions as expected, but I don't recall the precise ordering of position and shown messages. > to a degree that > it breaks this test. The actual logic on the DOM side of things shouldn't > really change materially here. I assume you are saying that changing `uriIsAboutBlank` should make no material difference to the behavior (which sounds very plausible). > The test is fine everywhere but Linux, so I'm led to believe it has > something to do with gtk. Although GTK may be having an influence on the ordering of messages, this is an expected behavior on the X11 platform, regardless of the toolkit involved. AIUI other platforms have a more synchronous model. I don't know whether this is achieved by blocking or deceit. The async behavior is known and expected, and so it would be entirely reasonable to skip this test on Linux due to non-deterministic results. Perhaps there is a case for delaying the load event until the window manager has responded to the position request. I don't know whether that would be desirable in general or not, but determining which position message is the corresponding window manager response would at least be non-trivial. I don't know whether there is reliable mechanism available or not. I wouldn't block the about:blank work on this.

Flags: needinfo?(karlt)

:Gijs (out for now; he/him)

Assignee

Comment 4

•

6 years ago

(In reply to Karl Tomlinson (:karlt) from comment #3) > (In reply to :Gijs (he/him) from comment #0) > > Clearly, something in the window positioning here is async > > Window positioning is async. The app requests that the window manager > position the window. The window manager may adjust the position if it > chooses. Our test infrastructure runs tests under a window manager that > usually positions as expected, but I don't recall the precise ordering of > position and shown messages. Aha. This makes sense. It being window-manager dependent (so different WMs would still yield different messages in some cases) is... disheartening. > > to a degree that > > it breaks this test. The actual logic on the DOM side of things shouldn't > > really change materially here. > > I assume you are saying that changing `uriIsAboutBlank` should make no > material difference to the behavior (which sounds very plausible). Yeah. I added some debug logging to the XUL window positioning code, and that seems to be the same before/after the patch, so I'm really not sure what's triggering the different behavior. > > The test is fine everywhere but Linux, so I'm led to believe it has > > something to do with gtk. > > Although GTK may be having an influence on the ordering of messages, this is > an expected behavior on the X11 platform, regardless of the toolkit involved. Right, sorry, I meant gtk as opposed to windows/mac, x11 would have been more precise... > AIUI other platforms have a more synchronous model. > I don't know whether this is achieved by blocking or deceit. > > The async behavior is known and expected, and so it would be entirely > reasonable to skip this test on Linux due to non-deterministic results. > > Perhaps there is a case for delaying the load event until the window manager > has responded to the position request. I don't know whether that would be > desirable in general or not, but determining which position message is > the corresponding window manager response would at least be non-trivial. I > don't know whether there is reliable mechanism available or not. I wouldn't > block the about:blank work on this. :jgraham, given Karl's comments, would it be acceptable to "just" disable the following tests on Linux, in testing/web-platform/tests/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/ : open-features-non-integer-left.html open-features-non-integer-screenx.html open-features-non-integer-screeny.html open-features-non-integer-top.html open-features-tokenization-top-left.html open-features-tokenization-screenx-screeny.html ? Who'd review such a change?

Flags: needinfo?(james)

Karl Tomlinson (:karlt)

Comment 5

•

6 years ago

I'd be happy to review such a change.

James Graham [:jgraham]

Comment 6

•

6 years ago

I could review such a change as well, but just disabling the tests seems unfortunate since we are presumbaly losing some coverage. Can't we make the tests accept that the window resize might take some time, and let them poll for a few seconds before failing?

Flags: needinfo?(james)

:Gijs (out for now; he/him)

Assignee

Comment 7

•

6 years ago

(In reply to James Graham [:jgraham] from comment #6) > I could review such a change as well, but just disabling the tests seems > unfortunate since we are presumbaly losing some coverage. Can't we make the > tests accept that the window resize might take some time, and let them poll > for a few seconds before failing? My understanding of web platform tests is that they're supposed to work as-is... are you saying I can just put a few setTimeouts in there?

Flags: needinfo?(james)

James Graham [:jgraham]

Comment 8

•

6 years ago

I mean they're just software so can be buggy like any other software :) Yes, it's fine to modify them when they're broken. The only thing that's not allowed is adjusting the behaviour to match gecko in the case that gecko is doing something nonstandard. This seems like a case where the test is making incorrect assumptions about os behaviour, so it's perfectly alright (and indeed encouraged) to fix the test.

Flags: needinfo?(james)

:Gijs (out for now; he/him)

Assignee

Comment 9

•

6 years ago

OK, I can take a stab at updating the tests. Is there a suitable example for how to "wait a bit but not time out forever" in wpt tests? In mochitests I'd use TestUtils.waitForCondition, but presumably gecko test utils aren't available in wpt...

Assignee: nobody → gijskruitbosch+bugs

Status: NEW → ASSIGNED

Component: Widget: Gtk → web-platform-tests

Flags: needinfo?(james)

Product: Core → Testing

James Graham [:jgraham]

Comment 10

•

6 years ago

I don't know of a good example I think that tests that need it probably just invent something using setTimeout; afaik there isn't a utility function for this although you could probably port waitForCondition if you like. The integration with the test API probably looks something like (assuming test is declared as an async_test): function wait(test, condition, maxTime=5000, interval=100, msg="") { let maxTries = maxTime / interval; let try = 0; funtion wait_() { if (condition()) { test.done() } else if(try > maxTries) { t.step_func(() => assert_unreached("Timed out waiting for condition " + msg)); } else { try++; setTimeout(wait_, interval); } } }

Flags: needinfo?(james)

:Gijs (out for now; he/him)

Assignee

Comment 11

•

6 years ago

Attached file Bug 1489700 - allow window positioning to be asynchronous, r?jgraham — Details

James Graham [:jgraham]

Comment 12

•

6 years ago

Comment on attachment 9008383 [details] Bug 1489700 - allow window positioning to be asynchronous, r?jgraham James Graham [:jgraham] has approved the revision.

Attachment #9008383 - Flags: review+

Pulsebot

Comment 13

•

6 years ago

Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/60b86111e7f7 allow window positioning to be asynchronous, r=jgraham

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 14

•

6 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/12966 for changes under testing/web-platform/tests

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 15

•

6 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Pulsebot

Comment 16

•

6 years ago

Pushed by dluca@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8675ff71202c allow window positioning to be asynchronous, r=jgraham

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 17

•

6 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Daniel Varga [:dvarga]

Comment 18

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/60b86111e7f7

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox64: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

Daniel Varga [:dvarga]

Comment 19

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8675ff71202c

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 20

•

6 years ago

Upstream PR merged

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 21

•

6 years ago

Can't merge web-platform-tests PR due to failing upstream checks: Github PR https://github.com/web-platform-tests/wpt/pull/12966 * Taskcluster (pull_request) (https://tools.taskcluster.net/task-group-inspector/#/NuT1_jZnQKSKNuPK5o7uEA)

:Gijs (out for now; he/him)

Assignee

Updated

•

6 years ago

Depends on: 1503624