Linux only: Race condition in window positioning breaks wpt open-features tests that check window positioning (top, left, screenx, screeny)

RESOLVED FIXED in Firefox 64

Status

RESOLVED FIXED
3 months ago
a month ago

People

(Reporter: Gijs, Assigned: Gijs)

Tracking

Trunk
mozilla64
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox64 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

3 months ago
Background: trying to get startup/windowopen perf wins by no longer needlessly creating about:blank content docs in the initial browser, bug 1362774.

STR:

1. check out m-c
2. in _setupInitialBrowserAndTab in tabbrowser.js, change the passed value for `uriIsAboutBlank` from `true` to `false`, ie this call:

https://searchfox.org/mozilla-central/rev/37663bb87004167184de6f2afa6b05875eb0528e/browser/base/content/tabbrowser.js#291

3. ./mach build faster
4. run (e.g.) ./mach wpt testing/web-platform/meta/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-non-integer-left.html

ER:

pass

AR:

fail, the value of all the positioning properties is 0 instead of the expected value.

Kicker: in testing/web-platform/meta/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/resources/message-opener.html, add a 5s timeout before passing back the window values to the opener via postMessage. The test will now pass.


Clearly, something in the window positioning here is async to a degree that it breaks this test. The actual logic on the DOM side of things shouldn't really change materially here.

I... don't really know where to start with this type of problem. The test is fine everywhere but Linux, so I'm led to believe it has something to do with gtk - if there was a slowness factor I'd expect debug to go orange across the board. I also notice the gfx team has disabled it on opt/pgo webrender, perhaps because it became racy with the speedups webrender provides? (bug 1425589 - unfortunately kats is out on leave so I can't ask them)

Emilio, I see you've changed things in this part of the world before now (e.g. bug 1449166). Any idea what could be wrong here, where to start looking, or who might know? Karl, do you have any ideas?
Flags: needinfo?(karlt)
Flags: needinfo?(emilio)
Does reversing https://hg.mozilla.org/mozilla-central/rev/c2f10e92f864 help?

If so that's pretty much all I can help with without building gtk and stepping through the code back and forth with rr, which is what I had to do to write that :)

I'm not sure I'd have the cycles to debug this, but if for some reason this is super-high priority I could give a shot to debug that... Probably Karl would be faster though.
Flags: needinfo?(emilio)
(In reply to :Gijs (he/him) from comment #0)
> Clearly, something in the window positioning here is async

Window positioning is async.  The app requests that the window manager
position the window.  The window manager may adjust the position if it
chooses.  Our test infrastructure runs tests under a window manager that
usually positions as expected, but I don't recall the precise ordering of
position and shown messages.

> to a degree that
> it breaks this test. The actual logic on the DOM side of things shouldn't
> really change materially here.

I assume you are saying that changing `uriIsAboutBlank` should make no
material difference to the behavior (which sounds very plausible).

> The test is fine everywhere but Linux, so I'm led to believe it has
> something to do with gtk.

Although GTK may be having an influence on the ordering of messages, this is
an expected behavior on the X11 platform, regardless of the toolkit involved.

AIUI other platforms have a more synchronous model.
I don't know whether this is achieved by blocking or deceit.

The async behavior is known and expected, and so it would be entirely
reasonable to skip this test on Linux due to non-deterministic results.

Perhaps there is a case for delaying the load event until the window manager
has responded to the position request.  I don't know whether that would be
desirable in general or not, but determining which position message is
the corresponding window manager response would at least be non-trivial.  I
don't know whether there is reliable mechanism available or not.  I wouldn't
block the about:blank work on this.
Flags: needinfo?(karlt)
(Assignee)

Comment 4

3 months ago
(In reply to Karl Tomlinson (:karlt) from comment #3)
> (In reply to :Gijs (he/him) from comment #0)
> > Clearly, something in the window positioning here is async
> 
> Window positioning is async.  The app requests that the window manager
> position the window.  The window manager may adjust the position if it
> chooses.  Our test infrastructure runs tests under a window manager that
> usually positions as expected, but I don't recall the precise ordering of
> position and shown messages.

Aha. This makes sense. It being window-manager dependent (so different WMs would still yield different messages in some cases) is... disheartening.

> > to a degree that
> > it breaks this test. The actual logic on the DOM side of things shouldn't
> > really change materially here.
> 
> I assume you are saying that changing `uriIsAboutBlank` should make no
> material difference to the behavior (which sounds very plausible).

Yeah. I added some debug logging to the XUL window positioning code, and that seems to be the same before/after the patch, so I'm really not sure what's triggering the different behavior.

> > The test is fine everywhere but Linux, so I'm led to believe it has
> > something to do with gtk.
> 
> Although GTK may be having an influence on the ordering of messages, this is
> an expected behavior on the X11 platform, regardless of the toolkit involved.

Right, sorry, I meant gtk as opposed to windows/mac, x11 would have been more precise...

> AIUI other platforms have a more synchronous model.
> I don't know whether this is achieved by blocking or deceit.
> 
> The async behavior is known and expected, and so it would be entirely
> reasonable to skip this test on Linux due to non-deterministic results.
> 
> Perhaps there is a case for delaying the load event until the window manager
> has responded to the position request.  I don't know whether that would be
> desirable in general or not, but determining which position message is
> the corresponding window manager response would at least be non-trivial.  I
> don't know whether there is reliable mechanism available or not.  I wouldn't
> block the about:blank work on this.

:jgraham, given Karl's comments, would it be acceptable to "just" disable the following tests on Linux, in testing/web-platform/tests/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/ :

open-features-non-integer-left.html
open-features-non-integer-screenx.html
open-features-non-integer-screeny.html
open-features-non-integer-top.html
open-features-tokenization-top-left.html
open-features-tokenization-screenx-screeny.html

? Who'd review such a change?
Flags: needinfo?(james)
I'd be happy to review such a change.
I could review such a change as well, but just disabling the tests seems unfortunate since we are presumbaly losing some coverage. Can't we make the tests accept that the window resize might take some time, and let them poll for a few seconds before failing?
Flags: needinfo?(james)
(Assignee)

Comment 7

3 months ago
(In reply to James Graham [:jgraham] from comment #6)
> I could review such a change as well, but just disabling the tests seems
> unfortunate since we are presumbaly losing some coverage. Can't we make the
> tests accept that the window resize might take some time, and let them poll
> for a few seconds before failing?

My understanding of web platform tests is that they're supposed to work as-is... are you saying I can just put a few setTimeouts in there?
Flags: needinfo?(james)
I mean they're just software so can be buggy like any other software :) Yes, it's fine to modify them when they're broken. The only thing that's not allowed is adjusting the behaviour to match gecko in the case that gecko is doing something nonstandard. This seems like a case where the test is making incorrect assumptions about os behaviour, so it's perfectly alright (and indeed encouraged) to fix the test.
Flags: needinfo?(james)
(Assignee)

Comment 9

3 months ago
OK, I can take a stab at updating the tests. Is there a suitable example for how to "wait a bit but not time out forever" in wpt tests? In mochitests I'd use TestUtils.waitForCondition, but presumably gecko test utils aren't available in wpt...
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Component: Widget: Gtk → web-platform-tests
Flags: needinfo?(james)
Product: Core → Testing
I don't know of a good example I think that tests that need it probably just invent something using setTimeout; afaik there isn't a utility function for this although you could probably port waitForCondition if you like. The integration with the test API probably looks something like (assuming test is declared as an async_test):

function wait(test, condition, maxTime=5000, interval=100, msg="") {
  let maxTries = maxTime / interval;
  let try = 0;
  funtion wait_() {
    if (condition()) {
      test.done()
    } else if(try > maxTries) {
      t.step_func(() => assert_unreached("Timed out waiting for condition " + msg));
    } else {
        try++;
        setTimeout(wait_, interval);
    }
  }
}
Flags: needinfo?(james)
(Assignee)

Comment 11

3 months ago
Created attachment 9008383 [details]
Bug 1489700 - allow window positioning to be asynchronous, r?jgraham
Comment on attachment 9008383 [details]
Bug 1489700 - allow window positioning to be asynchronous, r?jgraham

James Graham [:jgraham] has approved the revision.
Attachment #9008383 - Flags: review+

Comment 13

3 months ago
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/60b86111e7f7
allow window positioning to be asynchronous, r=jgraham
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/12966 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Comment 16

3 months ago
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8675ff71202c
allow window positioning to be asynchronous, r=jgraham
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Comment 18

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/60b86111e7f7
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
status-firefox64: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Upstream PR merged
Can't merge web-platform-tests PR due to failing upstream checks:
Github PR https://github.com/web-platform-tests/wpt/pull/12966
* Taskcluster (pull_request) (https://tools.taskcluster.net/task-group-inspector/#/NuT1_jZnQKSKNuPK5o7uEA)
(Assignee)

Updated

a month ago
Depends on: 1503624
You need to log in before you can comment on or make changes to this bug.