Closed Bug 1522790 Opened 6 years ago Closed 4 months ago

Marionette executor should use "New Window" command to open the test tab

Categories

(Testing :: web-platform-tests, enhancement, P2)

enhancement
Points:
8

Tracking

(firefox132 fixed)

RESOLVED FIXED
132 Branch
Tracking Status
firefox132 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Regressed 2 open bugs)

Details

(Whiteboard: [fxdroid][geckoview:p3][webdriver:m12], [wptsync upstream])

Attachments

(5 files, 7 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Now that we have the support for the "New Window" command in Marionette, we might want to make use of it.

Depends on: 1523234
Blocks: 1501562

Wes, can you add this to your todo list and coordinate with whimboo if you have any questions.

Flags: needinfo?(wkocher)

Note that bug 1523234 isn't needed here because wptrunner expects to have the new tab focused.

No longer depends on: 1523234

All the window.open() calls, which are not only in the py files, but also in various JS modules like:

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/reftest.js

Flags: needinfo?(hskupin)

It's not all, since it can only be used with resources that depend on executormarionette.

I put together a quick patch at https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f2ee65df66a329d7d5acda114a30ddf96a623d7

There are a couple of problems:

  1. The new tab opened in this way appears to show the default nee tab page and not about:blank as required by the spec. This is probably slowing tests down a little.
  2. This depends on an unreleased marionette_driver. If the plan is to not make more releases then we'll need to either avoid using the client methods here or put a copy of the client in [testing/web-platform/tests]tools/third_party for wptrunner to use.

Re 1, I'd suggest setting the new tab and homepage prefs to be about:blank in one of the unittest profiles that wpt depends on.

Depends on: 1530794

Re 2, I will get a new marionette driver package released via bug 1530794.

That last try push looks like a complete fail someone needs to be investigate. I'm happy to fix a possible underlying issue with Marionette.

I just released marionette_driver 2.8.0 to PyPI, which includes support for the new window command. So this unblocks the implementation here.

Depends on: 1533058

Here's jgraham's patch and a patch to set the homepage to about:blank in the wpt testing profile, rebased to current central.

And a try push to see what that looks like: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=3940a503f1c911c6fb5c8c212e18d8d5fada5a7d

Do I need to do anything to pick up the new marionette_driver now that bug 1530794 is completed? Are there any parts of jgraham's patch that can be dropped now?

You have to make sure to at least bump the dependency for marionette_driver.

Here's a try push with the dependency bumped: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=234671713&revision=11d24ebc5935bc60a540d1dfee13189ed6ec896f

Looks like everything's still broken, much in the same way it was for jgraham's original try push from comment 5. It's not every single test failing, though. There's one chunk that passed completely, and there are several TEST-OKs in each of the failing chunks. I'm not sure why things are failing. Are they occasionally getting the wrong window or something?

Flags: needinfo?(james)
Flags: needinfo?(hskupin)

Eh I think it was just a silly error; for some reason I thought Create Window switched to the new window, but it turns out it doesn't. So https://treeherder.mozilla.org/#/jobs?repo=try&revision=608fb12a70a9b1ce8451fdacd51fabd114d15818 is a try push with that fixed.

Flags: needinfo?(james)

Looks better. Still seems like there are several failures, and I'm seeing some of the JavascriptException: TypeError: window.__wptrunner_process_next_event is not a functions that we've been seeing in bug 1522790.

Some seem to be known intermittents, some UNEXPECTED-PASSes. Maybe this fixed some of our tests? Maybe we're just on a bad base revision?

(In reply to James Graham [:jgraham] from comment #16)

Eh I think it was just a silly error; for some reason I thought Create Window switched to the new window, but it turns out it doesn't.

Not by default. If you want that behavior you have to specify focus=True. I can see that this was already used in Wes' try push, so what exactly did you change?

Flags: needinfo?(hskupin) → needinfo?(james)

Not by default. If you want that behavior you have to specify focus=True. I can see that this was already used in Wes' try push, so what exactly did you change?

I mean "switched the webdriver active browsing context to"

Still seems like there are several failures, and I'm seeing some of the JavascriptException: TypeError: window.__wptrunner_process_next_event is not a functions that we've been seeing in bug 1522790.

I'm pretty sure there's a race condition that's causing the __process_next_event thing; I'm investigating.

Flags: needinfo?(james)

https://treeherder.mozilla.org/#/jobs?repo=try&author=james%40hoppipolla.co.uk&selectedJob=235205397 is a try push that ports the manual logic to make the runner wait for pending loads over to marionette (we were already using this for Chrome which doesn't support the right pageLoadStragtegy). This looks like it improved things, although there are still quite a few changed results which could be due to focus changes or something; this needs some investigation. That same patch could be a workaround in bug 1501562 so I suggest trying it there and see if it helps. But in any case it's clearly working around a bug in marionette in each case, so it's not the correct long-term solution.

Flags: needinfo?(wkocher)

So on Geckoview, if I don't also have a time.sleep(2) prior to the navigate(), wait_for_load() ends up in an infinite loop. location.href is 'about:blank', and document.readyState is 'complete'.

If I do have the time.sleep(2) up there, wait_for_load() is called once. location.href is the wpt test url, and document.readyState is 'complete'.

Flags: needinfo?(wkocher)
Whiteboard: [geckoview]
Whiteboard: [geckoview] → [geckoview:p3]
Depends on: 1559120
Assignee: nobody → james
Status: NEW → ASSIGNED
Attachment #9051877 - Attachment is obsolete: true
Severity: normal → S3

James, we should be good to go now with the changes for this bug on both desktop and Android. Are you still interested in updating the existing patch?

Flags: needinfo?(james)
Attachment #9051876 - Attachment description: Bug 1522790 - Use marionette new window command to open wptrunner windows r=whimboo → Bug 1522790 - Use marionette new window command to open wptrunner windows

I rebased the patch, and updated it.

The problem now seems to be that all the focus-related tests are failing; when we open the window with window.open we end up with the document in the new window getting focus, whereas with the new window approach, it seems like the focus ends up in the chrome area, and so tests that expect to be able to call element.focus() and have it change the focus don't work.

I guess the spec doesn't make it clear what should happen here, but it seems reasonable for marionette at least that if we pass focus=True we end up with focus in the content area.

Flags: needinfo?(james)
Depends on: 1798655

James, this should be unblocked now. Please try again after rebasing to latest central.

Flags: needinfo?(james)
Attachment #9383961 - Attachment is obsolete: true

Here the latest try push:
https://treeherder.mozilla.org/jobs?repo=try&revision=eefbf4c472d9f816b70d3398693df76ce05b6f24

James, if you need anything to test from me feel free to ask and as best push the updated patch to Phabricator.

I've pushed my updated patch to phab. There are a still some CI failures. Also I think the reftest change is broken, although we don't use that codepath in practice.

Flags: needinfo?(james)

I've made some updates to the meta files and additional fixes. All in a separate commit. I've pushed only those test suites to try which were failing before:

https://treeherder.mozilla.org/jobs?repo=try&revision=b10273fe599a01d5c93cb642057ebe23a2238b51

Depends on: 1887870
Points: --- → 3
Priority: -- → P2
Whiteboard: [geckoview:p3] → [geckoview:p3][webdriver:m11]

As agreed after the investigation of the sanitizer deadlock failures (see bug 1887870) we will temporarily mark those failing tests as intermittent OK, TIMEOUT. Here a new complete try build for web-platform-tests:

https://treeherder.mozilla.org/jobs?repo=try&revision=bd04435c99c3a95c27b074380ae7413bd2481657

Attachment #9390526 - Attachment description: Bug 1522790 - Fix intermittent test with short timer → Bug 1522790 - [wpt] Fix intermittent test failures for short timer.
Attachment #9394990 - Attachment description: Bug 1522790 - x[wpt] Fix intermittent failures in font-face-[003, 004].html. → Bug 1522790 - [wpt] Fix intermittent failures in font-face-[003, 004].html.

As discussed with James I'm taking this bug over because I investigated all remaining issues, updated the meta data and also fixed some tests. Based on the amount of time it took I would suggest that we set 5 points.

Assignee: james → hskupin
Points: 3 → 5
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9e98e314a251 Use marionette new window command to open wptrunner windows r=jgraham https://hg.mozilla.org/integration/autoland/rev/3acba738b962 [wpt] Fix intermittent test failures for short timer. r=whimboo https://hg.mozilla.org/integration/autoland/rev/251e04906ebf [wpt] Fix intermittent failures in font-face-[003, 004].html. r=jgraham https://hg.mozilla.org/integration/autoland/rev/f8d941b403d0 [wptrunner] Fix MarionetteWindowProtocolPart.set_rect() for optional arguments. r=jgraham https://hg.mozilla.org/integration/autoland/rev/e25a9624d6d9 [wpt] Update meta files for multiple statuses mainly for tsan builds. r=jgraham
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/45576 for changes under testing/web-platform/tests
Whiteboard: [geckoview:p3][webdriver:m11] → [geckoview:p3][webdriver:m11], [wptsync upstream]
Pushed by imoraru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/433392f1c300 [wpt] Fix-up broken wpt meta data. CLOSED TREE
Regressions: 1890038
Regressions: 1890062
Regressions: 1890080
See Also: → 1890248
Depends on: 1838834
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 126 Branch → ---
Upstream PR was closed without merging
Upstream PR was closed without merging
Upstream PR merged by moz-wptsync-bot
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/45606 for changes under testing/web-platform/tests
Backout by smolnar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8de7291b0bab Backed out 4 changesets - backout the remaining changesets. CLOSED TREE

Quick update what's left to do after the requested backout:

  • We should wait until the detected TSAN issues, which cause a lot of intermittent failures for canvas jobs, have been all fixed.
  • Bug 1890038 - "Perma wpt canvas timeout": Might be related to the above canvas issues and we should re-test once these bugs got fixed
  • Bug 1890062 - "TestRunner hit external timeout": Looks like most or all of them are related to bug 1887870 (deadlock detector)
Upstream PR merged by moz-wptsync-bot

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #47)

  • We should wait until the detected TSAN issues, which cause a lot of intermittent failures for canvas jobs, have been all fixed.
  • Bug 1890038 - "Perma wpt canvas timeout": Might be related to the above canvas issues and we should re-test once these bugs got fixed

Both known TSAN issues for graphics (text) have been fixed. That means that canvas tests should work now. I've pushed a try build with all tests for TSAN: https://treeherder.mozilla.org/jobs?repo=try&revision=4d39ecc4ad0fc982708e36e13580c392f25ea2b2

Lets see / hope that all the other canvas timeouts (bug 1890038) are fixed as well.

Attachment #9394992 - Attachment is obsolete: true
Attachment #9395283 - Attachment is obsolete: true
Attachment #9394992 - Attachment is obsolete: false

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #49)

Both known TSAN issues for graphics (text) have been fixed. That means that canvas tests should work now. I've pushed a try build with all tests for TSAN: https://treeherder.mozilla.org/jobs?repo=try&revision=4d39ecc4ad0fc982708e36e13580c392f25ea2b2

Lets see / hope that all the other canvas timeouts (bug 1890038) are fixed as well.

I can confirm that all these issues are fixed now. The try jobs are all green!

We also got the fix for the mutex deadlock landed, and I'll push another try with updated meta data (all previously changed metadata reverted) once it got merged to mozilla-central somtime today.

Depends on: 1890204

Now Android wpt jobs are failing with TestRunner timeouts. Sadly I missed to enable remote trace logs for the last try build, so that I had to push again to try for just Android:

https://treeherder.mozilla.org/jobs?repo=try&revision=88cad767424e03d0200a40d188993870d7c23bda

Status: REOPENED → ASSIGNED

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #52)

Now Android wpt jobs are failing with TestRunner timeouts. Sadly I missed to enable remote trace logs for the last try build, so that I had to push again to try for just Android:

https://treeherder.mozilla.org/jobs?repo=try&revision=88cad767424e03d0200a40d188993870d7c23bda

The problem here seems actually be a delay in finish loading the new window on Android. This can be seen via logging as done to adb logcat and manifests in a browser-delayed-startup-finished notification that gets send out very late while other commands are already running. Current wptrunner is using window.open() and then awaits the load event for the new window's document if not yet finished loading. Right after a navigation is started for the test page. And here the WebProgress listener sees a hick-up and suddenly stops to work, resulting in the page never reaching the readyState=complete and the testharness to timeout.

With my patch to use Marionette's NewWindow command the problem is there as well but could be easily fixed by delaying the return of GeckoViewTabUtil.createNewTab() until the window has been finished initializing / loading. As of now this method only waits for the geckoview-window-created notification which actually is fired too early, and the navigation as triggered right after causes a collision with the navigation to the test page as triggered by wptrunner. Changing the notification to use browser-delayed-startup-finished as well will workaround the problem to let tests work correctly. Tests that could be used to demonstrate the behavior are under /referrer-policy/generic/.

My plan is to patch the GeckoViewTabUtil.createNewTab() function as mentioned above which would allow us to remove around 14.000 intermittent timeout entries across all wpt tests! Then I will file a bug to get the underlying issue investigated.

Depends on: 1891008

With the above method applied the tests for Fission on Android look great. There is not a single timeout of the test harness anymore! Here a try build: https://treeherder.mozilla.org/jobs?repo=try&revision=0ed041267314f5db7a84f528d4ff1f9140543e0d

All the test failures that were still happening are due to the fact that the test finally runs now. I've updated the related meta data and pushed a new try build to verify it works all fine:

https://treeherder.mozilla.org/jobs?repo=try&revision=630b44e6d511312e68e50e7f108426aca0b10aaf

Attachment #9394992 - Attachment description: Bug 1522790 - [wpt] Update meta files for multiple statuses mainly for tsan builds. → Bug 1522790 - [wpt] Update meta files for multiple statuses.
Regressions: 1890113
Regressions: 1891476
Blocks: 1891476
No longer regressions: 1891476

As discussed with the team we are going to bump the number of points based on the amount of extra work that was necessary so far to get working patches done. Hopefully no more work is needed for this bug.

Points: 5 → 8
Status: ASSIGNED → RESOLVED
Closed: 10 months ago10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 127 Branch

(In reply to Iulian Moraru from comment #58)

https://hg.mozilla.org/mozilla-central/rev/d180feadd8b4
https://hg.mozilla.org/mozilla-central/rev/b5b4a44dd06b

So what are these landings and why is the bug marked as fixed? This was - as it looks like - a no-op push?

The latest try build for the remaining Android fixes is green so I'm going to land all the revisions of the stack.

Status: RESOLVED → REOPENED
Flags: needinfo?(imoraru)
Resolution: FIXED → ---

This cannot be landed yet due to the latest wpt-sync causes merge conflicts in quite a lot of files now. I'll have to rebase the last two revisions.

Flags: needinfo?(imoraru)

So I've rebased to latest mozilla-central, which includes the recent metadata updates from the downstream wpt-sync. Hopefully it's still all working. To verify before landing I pushed another try build:

https://treeherder.mozilla.org/jobs?repo=try&revision=f5d54cd034abe7508e03e825bfb42f472bb3c093

Status: REOPENED → ASSIGNED
Target Milestone: 127 Branch → ---
See Also: → 1785603
Regressions: 1891706
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ed9818780499 Use marionette new window command to open wptrunner windows r=jgraham https://hg.mozilla.org/integration/autoland/rev/d099e86e9a3f [wpt] Fix intermittent test failures for short timer. r=whimboo https://hg.mozilla.org/integration/autoland/rev/8ea7b13449af [wpt] Fix intermittent failures in font-face-[003, 004].html. r=jgraham https://hg.mozilla.org/integration/autoland/rev/91d70d27459b [wptrunner] Fix MarionetteWindowProtocolPart.set_rect() for optional arguments. r=jgraham https://hg.mozilla.org/integration/autoland/rev/7c2605ef5f00 [wpt] Update meta files for multiple statuses. r=jgraham https://hg.mozilla.org/integration/autoland/rev/4e0801e07b86 [wpt] Remove metadata for all the intermittently timing out tests for Android Fission. r=jgraham https://hg.mozilla.org/integration/autoland/rev/0cc4328d899f [wpt] Update meta data for tests previously timing out for Android Fission. r=jgraham
Failed to create upstream wpt PR due to merge conflicts. This requires fixup from a wpt sync admin.
Flags: needinfo?(james)
Flags: needinfo?(aborovova)
Whiteboard: [geckoview:p3][webdriver:m11], [wptsync upstream] → [geckoview:p3][webdriver:m11], [wptsync upstream error]
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/45735 for changes under testing/web-platform/tests
Whiteboard: [geckoview:p3][webdriver:m11], [wptsync upstream error] → [geckoview:p3][webdriver:m11], [wptsync upstream]
Flags: needinfo?(james)
Flags: needinfo?(aborovova)
Pushed by acseh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/525e8e012c7b [wpt] Update metadata for /fetch/metadata/trailing-dot.https.sub.any*.html tests. CLOSED TREE
Regressions: 1891766
Upstream PR was closed without merging
Flags: needinfo?(hskupin)
Backout by abutkovits@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/23ea0b523dc6 Backed out 8 changesets as requested by Whimboo CLOSED TREE
Depends on: 1891766
No longer regressions: 1891766
No longer blocks: 1891476
Depends on: 1891476

Comment on attachment 9394990 [details]
Bug 1522790 - [wpt] Fix intermittent failures in font-face-[003, 004].html.

Revision D206622 was moved to bug 1891476. Setting attachment 9394990 [details] to obsolete.

Attachment #9394990 - Attachment is obsolete: true
Attachment #9396911 - Attachment is obsolete: true
Depends on: 1890113
No longer regressions: 1890113

For being able to better analyze the slowness in navigation with the changes applied I would like to first change the navigation handling in Marionette from page load events to the webprogress listener. This means we are blocked on bug 1664165.

Depends on: 1664165
Flags: needinfo?(hskupin)
Depends on: 1891706
No longer regressions: 1891706

How come comment #68 added massive if (os == "android") and fission: [OK, TIMEOUT] everywhere? Not sure that's about the backed out patches?

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #73)

Oh maybe https://phabricator.services.mozilla.com/D207318 is the culprit 🙂

Yes, sadly we had to backout the patch again. With the next landing it should hopefully be sticky and we will have way lesser timing out tests on Android.

I'm not sure those are actual timeouts or just bug 1798871, I've been removing them from my modules and so far there was no other Android timeouts happened.

Note that there could be a race on GeckoView when opening a new window and when it is actually ready to work with. I've fixed a specific case (bug 1891008) for Marionette but the API will be in use with this patch series. I'll observe the other bug to stay up-to-date. Thanks!

See Also: → 1904157

I'm not able to finish this work in M11. As such moving to M12.

Whiteboard: [geckoview:p3][webdriver:m11], [wptsync upstream] → [geckoview:p3][webdriver:m12], [wptsync upstream]
Blocks: 1677190
Blocks: 1903929
See Also: → 1914191
Depends on: 1914209

Comment on attachment 9420158 [details]
Bug 1522790 - Update test grouping logic for web-platform-tests-canvas

Revision D219761 was moved to bug 1914209. Setting attachment 9420158 [details] to obsolete.

Attachment #9420158 - Attachment is obsolete: true
Blocks: 1914191

After the fix on bug 1761634 we should now have more stable results. As such I pushed another complete try build for web-platform tests. Lets cross fingers that we are mostly green now.

https://treeherder.mozilla.org/jobs?repo=try&revision=b315c19d563e9502bf95beb06051aa601ebb88e0

Depends on: 1761634
No longer blocks: 1914191
Depends on: 1914191
See Also: 1914191

We can remove bug 1664165 from the dependency list given that the WebProgress listener is not needed to get these patches landed.

No longer depends on: 1664165
No longer depends on: 1761634
See Also: → 1884684
Whiteboard: [geckoview:p3][webdriver:m12], [wptsync upstream] → [fxdroid][geckoview:p3][webdriver:m12], [wptsync upstream]

Thanks to Olivia there is now a Google document that is used to investigate all the unexpected results of tests for Android. Hopefully it won't take such a long time to finish.

Actually I wrongly triggered the web-platform tests on Android with Fission enabled instead of with SHIP. So the results shouldn't be taken into account. To ensure that SHIP is no longer affected by timeouts of the testharness I've triggered those jobs now:

https://treeherder.mozilla.org/jobs?repo=try&revision=3935508799064fd44cdd7d5caa3927040cd0faa1&searchStr=android%2Cship

So far it looks good and if there are unexpected passes / fails those should be handled on a different bug because SHIP as well is not a default job.

Attachment #9051876 - Attachment description: Bug 1522790 - Use marionette new window command to open wptrunner windows → Bug 1522790 - [wptrunner] Use marionette new window command to open wptrunner windows
Attachment #9394992 - Attachment description: Bug 1522790 - [wpt] Update meta files for multiple statuses. → Bug 1522790 - [wpt] Update wpt meta files for multiple statuses.
Blocks: 1919775
Attachment #9394992 - Attachment is obsolete: true

Bug 1891706 didn't occur anymore in the last try build so lets remove it from the dependency list. But I just stumbled over bug 1890113 which is as well on the dependency list and wasn't investigated yet. I triggered wpt jobs for that build type on the same try build from comment 83.

No longer depends on: 1891706
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8046864b1279 [wptrunner] Use marionette new window command to open wptrunner windows r=jgraham,jdescottes https://hg.mozilla.org/integration/autoland/rev/377a55ac4ec5 [wpt] Fix intermittent test failures for short timer. r=whimboo https://hg.mozilla.org/integration/autoland/rev/a79c4f87abb4 [wptrunner] Fix MarionetteWindowProtocolPart.set_rect() for optional arguments. r=jgraham

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #84)

Bug 1891706 didn't occur anymore in the last try build so lets remove it from the dependency list.

This failure is still present but intermittently and most likely needs a given set of tests run. On autoland we randomly choose tests so this might not have been seen on try. I'll take a look at it.

Blocks: 1891706
No longer blocks: 1891706
Regressions: 1891706
Status: ASSIGNED → RESOLVED
Closed: 10 months ago4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch
Regressions: 1920056
Regressions: 1920301
Upstream PR merged by moz-wptsync-bot
Pushed by wptsync@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/409cd8a11bb8 [wpt PR 45735] - [Gecko Bug 1522790] Use marionette new window command to open wptrunner windows, a=testonly
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: