Closed Bug 1501562 Opened Last year Closed 7 months ago

Run web-platform-tests for GeckoView on Packet x86-64

Categories

(Firefox for Android :: Testing, enhancement, P3)

x86_64
Android
enhancement

Tracking

()

RESOLVED FIXED
Firefox 68
Tracking Status
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: cpeterson, Assigned: KWierso)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [geckoview:p2][wptsync upstream])

Attachments

(12 files, 1 obsolete file)

954 bytes, patch
snorp
: review+
nalexander
: review+
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
572.80 KB, patch
Details | Diff | Splinter Review
50.02 KB, text/plain
Details
50.96 KB, text/plain
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
We don't run web-platform-tests on Fennec, but we'd like to run them on GeckoView, where possible:

Bitbar ARM32 (Moto G5)
Bitbar ARM64 (Pixel 2)
Packet x86-64

We don't need to test Android x86 if we are testing Android x86-64.
Whiteboard: [geckoview]
wpt takes 4 to 6 hours per push to run on desktop. It is unlikely we will find capacity to run these on bitbar.
Hardware: Unspecified → x86_64
Summary: Run web-platform-tests for GeckoView on Bitbar and Packet → Run web-platform-tests for GeckoView on Packet x86-64
Priority: -- → P3
Whiteboard: [geckoview] → [geckoview:p2]

wpt currently runs on x86 Fennec. When we move wpt to x86_64, we will want to use the GV TestRunnerActivity instead of Fennec.

Bug 1496773 was recently resolved and seems to do very good things, but I find wpt still doesn't run in TestRunnerActivity: 'mach -v wpt testing/web-platform/tests/infrastructure/ --package=org.mozilla.geckoview.test' opened TestRunnerActivity but failed to run any tests or open any test pages. I think that's expected - https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c45 - and we need another patch or two to land.

Depends on: 1522790
See Also: → 1524673

(In reply to Geoff Brown [:gbrown] from comment #3)

Bug 1496773 was recently resolved and seems to do very good things, but I find wpt still doesn't run in TestRunnerActivity: 'mach -v wpt testing/web-platform/tests/infrastructure/ --package=org.mozilla.geckoview.test' opened TestRunnerActivity but failed to run any tests or open any test pages. I think that's expected - https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c45 - and we need another patch or two to land.

I think the real issue with wpt specifically is that wpt expects to be able to drive two independent tabs, and TRA just doesn't support tabs like that. (AFAICT.) That is, there's a window.open call that, AFAICT, TRA doesn't support. In fact, none of our GV-based browsers support window.open AFAICT -- see https://github.com/mozilla-mobile/android-components/issues/1503 -- so there's non-trivial product work to be done to support wpt more broadly.

I'm surprised there are problems as fundamental as that. Recall my hacked runs in https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c20 and https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c21 -- tests ran, and many passed.

(In reply to Geoff Brown [:gbrown] from comment #5)

I'm surprised there are problems as fundamental as that. Recall my hacked runs in https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c20 and https://bugzilla.mozilla.org/show_bug.cgi?id=1496773#c21 -- tests ran, and many passed.

I can't see logs for these jobs -- they've timed out. Can you repush, or rebase-and-repush? wpt tests many things; some are tests are just ensuring the harness gets set up. Maybe I'm wrong about tabs, but what I was running locally was very clearly trying to cross-tab communicate, and I don't see how that will work in TRA. I'd be thrilled to be wrong here!

(In reply to Geoff Brown [:gbrown] from comment #7)

I rebased to that very old 5c892a6147ae revision and:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=99aec186dcaf682fb18e2086bba3edf2b73c9fbb

From a quick glance at that, it looks like most of those are are legitimate differences in expectation data. The only ones that jump out as possible issues with the lack of tab support is chunk 16 https://treeherder.mozilla.org/logviewer.html#?job_id=226937767&repo=try

Did a couple of try pushes to see where things stand:

The bottom push is m-c tip with just the switch from using Fennec to using GeckoView, plus some chunking changes and the switch from e10s: False to e10s: True. Everything there fails. WPT-Reftests fail like this, while non-reftest WPT fails like this.

The middle push is that, plus some pieces of gbrown's hacks commit. The "selectTab" part of the hacks commit is already in m-c, so have been omitted from this. With this piece, WPT-Reftests are running successfully (most chunks have failures, though some completely pass), while regular WPT runs and mostly fails with error messages like "TypeError: window.__wptrunner_process_next_event is not a function"

The top push is that, plus the rest of the pieces of gbrown's hacks commit, setting permanentKey in a couple of places. The part where it switches navigator:browser for navigator:geckoview already seems to be handled in m-c, so was omitted here. With this, WPT Reftests are still running like in the middle push, and it looks like regular WPT is doing likewise.

So as it stands, gbrown's try push is getting further than any of my try pushes. The parts I omitted from my try pushes are slightly different than what their equivalent parts already in m-c look like. Maybe some of those differences are significant? See m-c's driver.js vs gbrown's driver.js, and also see m-c's browser.js vs gbrown's browser.js.

I'm at the end of my day, I can try doing more try pushes next week seeing if I can switch each of those pieces over to gbrown's versions and see if that makes things run better.

Both of those differences are from bug 1496773. Hrm.

So none of the hacks as injected by Geoff for his try pushes would be needed anymore, right? Here an overview:

  • Geckoview uses an observer notification to inform Marionette that it has been fully started up (bug 1524673)
  • newSession's waitForWindow() method has a fallback for GeckoView (bug 1496773)
  • permanentkey is getting set by GeckoView (bug 1520226)
  • switching tabs works since bug 1496773

Based on that the only patch to at least run the tests would be the taskcluster one from:
https://hg.mozilla.org/try/rev/69791e89297953a549780e789cc9370fba76cc81

I pushed a try build with those changes to get a view of the current state:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1aa3dc19ac5efc2e124eff963eeb8202759ded30

Depends on: 1520226

The try build shows a lot of the following failures which is bug 1511764:

AttributeError: 'MarionetteTestharnessProtocolPart' object has no attribute 'marionette'

I will try to build GeckoView locally and see if I can reproduce it.

Depends on: 1511764
No longer depends on: 1511764

That bug is a harness bug that happens when the browser has already failed to start or marionette has failed to connect. The underlying issue is whatever's causing the browser startup or (more likely) marionette connection to fail which in this case is presumably missing some part of gbrown's changes that are still required in spite of the geckoview changes.

Cannot we raise or bubble up the exception earlier with more details why Marionette failed to connect? I really miss it to see those details here. The not attribute error is pretty much not helpful. Or is that not something you want for wptrunner?

Flags: needinfo?(james)

I built Fennec / GeckoView locally but how do I run the wpt tests with GV inside an emulator? Doesn't mach wpt automatically launch the emulator?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #14)

Cannot we raise or bubble up the exception earlier with more details why Marionette failed to connect? I really miss it to see those details here. The not attribute error is pretty much not helpful. Or is that not something you want for wptrunner?

Lets actually move this over to bug 1511764.

Flags: needinfo?(james)

The point is that all the logging we have is there, it's just that it's obscured by a later exception (which I now put a patch for in the other bug). So the logs will get less confusing but they won't get more information.

The patch for marionette.js for final startup (https://hg.mozilla.org/try/rev/a47cc9120926e66bdd020eb4de39bf2452f415e8) should not be necessary given that Marionette handles marionette-startup-requested now:

https://searchfox.org/mozilla-central/source/testing/marionette/components/marionette.js#418

And bug 1524673 should have enabled sending this notification when GeckoView starts up. But maybe we have a problem and remote debugging is not enabled by default, and that's why Marionette doesn't initialize? Also note Andreas' comment on bug 1524673 comment 11.

Also as already noted above the permanentKey workaround is a no-op now since bug 1520226 is fixed and GeckoView sets it on browsers.

Nick, maybe you can check the startup logic? Something doesn't seem to be right yet.

I would still be interested to know how to run geckoview locally. So if someone could provide the steps I would kinda appreciate.

Flags: needinfo?(nalexander)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #15)

I built Fennec / GeckoView locally but how do I run the wpt tests with GV inside an emulator? Doesn't mach wpt automatically launch the emulator?

Can someone please help me with that? Thanks.

Flags: needinfo?(wkocher)
Flags: needinfo?(gbrown)

The way I run locally is: 'mach android-emulator && <install test app> && mach wpt <some-test-directory>'

For Fennec, <install test app> is 'mach install'.

For geckoview TestRunnerActivity, <install test app> is 'mach gradle geckoview:installWithGeckoBinariesDebugAndroidTest'; see https://developer.mozilla.org/en-US/docs/Mozilla/Geckoview-Junit_Tests#Running_tests_locally for more info.

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] (away Feb 18, 19) from comment #21)

The way I run locally is: 'mach android-emulator && <install test app> && mach wpt <some-test-directory>'

For Fennec, <install test app> is 'mach install'.

For geckoview TestRunnerActivity, <install test app> is 'mach gradle geckoview:installWithGeckoBinariesDebugAndroidTest'; see https://developer.mozilla.org/en-US/docs/Mozilla/Geckoview-Junit_Tests#Running_tests_locally for more info.

I got it to install but whenever I run mach wpt it picks up the fennec package but not geckoview. So how can I say that I want to run the wpt tests with geckoview?

Oh, sorry, I forgot to say!

mach wpt <some-test-directory> --package=org.mozilla.geckoview.test

Hrm, I used this mozconfig: https://gist.github.com/KWierso/21718414b4b5ee52785330a49469f68b

And ran the following commands:
./mach build && ./mach package && ./mach android-emulator && ./mach gradle geckoview:installWithGeckoBinariesDebugAndroidTest --stacktrace

And that failed like this: https://gist.github.com/KWierso/8d1616f1927a534fb4f4c93197551d0a

I'm wondering if this is because it's the x86-7.0 emulator? The ARM emulator is so slow, I'd rather avoid using it if I can...

Flags: needinfo?(wkocher) → needinfo?(gbrown)

Scratch that, re-ran some commands and it installed Geckoview successfully.

I then run https://gist.github.com/KWierso/f514a1a9b85908c57bbdc77778577426

and it starts getting into a loop of https://gist.github.com/KWierso/2d9059041794bd54c2a10df03e32992e

Didn't see how long it would take to fully quit or if it would loop forever, I killed it after the third iteration.

I believe that will happen if marionette is not enabled. Try adding 'ac_add_options --enable-marionette' to your mozconfig, re-build, re-install....If that still fails, have a look at the logcat - 'adb logcat'.

Flags: needinfo?(gbrown)

Re-added enable-marionette to my mozconfig (it got omitted after I re-ran mach bootstrap to switch to doing geckoview work, I guess), rebuilt, repackaged, reinstalled.

mach wpt still times out. Here's my logcat from the output of mach wpt up through at least the first timeout.

I don't see any mention of "marionette" in there, so something's still not right...

I see the same thing.

At issue is that after bug 1524673, marionette is not initialized unless remote debugging is enabled, and remote debugging is not enabled in TestRunnerActivity.

With this patch, marionette initializes in TRA and I can run wpt reasonably well. Is this what we want to do?

Attachment #9045425 - Flags: review?(snorp)
Attachment #9045425 - Flags: review?(nalexander)
Comment on attachment 9045425 [details] [diff] [review]
bug1501562remotedebug

Review of attachment 9045425 [details] [diff] [review]:
-----------------------------------------------------------------

Yep.
Attachment #9045425 - Flags: review?(nalexander) → review+
Flags: needinfo?(nalexander)

Inbound's closed for backlog, pushed gbrown's patch to phabricator so it can go in the autoland queue.

Attachment #9045425 - Flags: review?(snorp) → review+

Rebased the various patches onto a recent m-c tip that includes the new patch from here and did some try pushes.

The ones that include the hacks failed in weird timeout loops ( timeout: Timed out waiting for connection on 127.0.0.1:2828! )

The one that just included the switch to use GeckoView gets further:

WPT reftests seem to be working (with a need for updated expectation metadata; I can do that in the next round).

Regular WPT seems to be failing everywhere with errors like TypeError: window.__wptrunner_process_next_event is not a function from "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 789, in _handle_error. If I look in the log files, though, I see a bunch of TEST-OKs. Not sure why some tests are working. I haven't looked closely, maybe those ones are tests that are already expected to be failing, so this failure doesn't count as unexpected?

Keywords: leave-open

That is a failure when trying to execute the script_resume script, which is loaded here from testharness_webdriver_resume.js:

https://searchfox.org/mozilla-central/rev/b36e97fc776635655e84f2048ff59f38fa8a4626/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py#652-653

https://searchfox.org/mozilla-central/rev/b36e97fc776635655e84f2048ff59f38fa8a4626/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/testharness_webdriver_resume.js

Here __wptrunner_process_next_event() is defined:

https://searchfox.org/mozilla-central/rev/b36e97fc776635655e84f2048ff59f38fa8a4626/testing/web-platform/tests/tools/wptrunner/wptrunner/testharnessreport.js#20

So something causes this method to disappear. Maybe it is relate to the landing of bug 1514001? That touched a lot of code in that area, and I also see those failures maybe not that much on trunk.

Once we get the wpt issues sorted out, we'll want to update all android annotation data for wpt tests, since Geckoview seems to have different results compared to what we've been expecting on Fennec. This patch should remove all lines that contain os == "android" from wpt's expectation metadata files, so we can start from scratch.

Assignee: nobody → wkocher

Does Fennec not use these same code paths? Or maybe there's some timing issue where the window just isn't ready yet? Also curious why the wpt-reftests aren't affected.

Maybe you could do a try build from a baseline which is before bug 1514001 landed?

Otherwise it would be great to have a try build with Marionette trace logs enabled. That would help me to identify where problems in Marionette could be.

Here's a try push with the baseline being the push to m-c prior to the one that included bug 1514001, with as many relevant commits that happened after that point that I could remember grafted on top to get geckoview stuff working: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=43eb79c27bad9278cb5b4c2f1f62a892eab089fe

I'm not sure how to enable trace logging on a try push, I can only find passing "-vv" to ./mach test documented.

Looks like there's lots of actual test failures, but that window.__wptrunner_process_next_event is not a function error is gone.

(In reply to Wes Kocher (:KWierso) from comment #41)

Looks like there's lots of actual test failures, but that window.__wptrunner_process_next_event is not a function error is gone.

That is great to hear, and the only thing I have to know for now. Is there an actual bug to fix that specific regression on mozilla-central? It would be great to really wait with another attempt to run wpt for geckoview given that it produced so many errors.

Here's switching to geckoview on top of setting the marionette trace logging on top of a recent rebase to the tip of m-c: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=509c7f8d630da5dd3d1954e751db4b7b65348691

I'm not seeing any obvious extra logging going on, though, so I'm not sure I set up the trace logging correctly...

(In reply to Wes Kocher (:KWierso) from comment #43)

I'm not seeing any obvious extra logging going on, though, so I'm not sure I set up the trace logging correctly...

You did. Because GeckoView runs in the emulator all the logging ends-up in the adb logcat. The only problem is that it makes is hard to correlate with the normal log due to time differences of a couple of seconds.

I can see lots of those failures:

03-05 22:37:56.450 14933 14948 E GeckoConsole: [JavaScript Error: "TypeError: this.webNavigation is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 326}]
03-05 22:37:56.450 14933 14948 E GeckoConsole: [JavaScript Error: "RemoteWebProgress failed to call onLocationChange: [Exception... "[JavaScript Error: "this.webNavigation is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 326}]'[JavaScript Error: "this.webNavigation is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 326}]' when calling method: [nsIWebProgressListener::onLocationChange]" nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)" location: "JS frame :: resource://gre/modules/RemoteWebProgress.jsm :: _callProgressListeners :: line 156" data: yes]
03-05 22:37:56.450 14933 14948 E GeckoConsole: " {file: "resource://gre/modules/RemoteWebProgress.jsm" line: 158}]
03-05 22:37:56.450 14933 14948 E GeckoConsole: _callProgressListeners@resource://gre/modules/RemoteWebProgress.jsm:158:14

Maybe this could be related and Marionette is not able to navigate to the requested test page? Nick, could you please help?

Flags: needinfo?(nalexander)

So the try push with the single-window wptrunner changes backed out looks really promising. The one with those changes obviously looks more problematic.

I wonder if the set of tests that's failing with that error is approximately constant or if it's highly variable. In the latter case I would expect that the problem is a race condition, which could maybe happen if navigate doesn't correctly block on the page being loaded. The reason I suspect that is __process_next_event is a function defined in a content script [1]. The test page should load that script and when it's completely loaded navigate() should return and the webdriver_resume.js script should call the function. If it's not defined an obvious reason would be navigate returning too early and so the function not existing. Other obvious possibilities include something to do with the script sandbox, although you would then expect it to affect every test, or something to do the test navigating away from the initial page, but that would also apply on desktop.

[1] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/testharnessreport.js#20

So to debug this further I would look at the state the page is in when we're at [1]. The easiest way to do that is to add code to introspect the DOM at [2] (i.e. in the script we inject). Obvious things to inspect include document.readyState and innerHTML. There might be some difficulty figuring out how to exfiltrate that data; hopefully it's possible to set things up so that either dump or console.log appears in the logcat.

[1] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py#718
[2] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py#718

I added some logging to [2].

Here are all of the non-function properties and their values of document:

[JavaScript Warning: "onmozfullscreenchange is deprecated." {file: "testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py" line: 84}]
[JavaScript Warning: "onmozfullscreenerror is deprecated." {file: "testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py" line: 84}]
activeElement - [object HTMLBodyElement]
alinkColor - 
all - [object HTMLAllCollection]
anchors - [object HTMLCollection]
applets - [object HTMLCollection]
ATTRIBUTE_NODE - 2
baseURI - about:blank
bgColor - 
body - [object HTMLBodyElement]
CDATA_SECTION_NODE - 4
characterSet - UTF-8
charset - UTF-8
childElementCount - 1
childNodes - [object NodeList]
children - [object HTMLCollection]
COMMENT_NODE - 8
compatMode - BackCompat
contentType - text/html
cookie - 
currentScript - null
defaultView - [object Window]
designMode - off
dir - 
doctype - null
DOCUMENT_FRAGMENT_NODE - 11
DOCUMENT_NODE - 9
DOCUMENT_POSITION_CONTAINED_BY - 16
DOCUMENT_POSITION_CONTAINS - 8
DOCUMENT_POSITION_DISCONNECTED - 1
DOCUMENT_POSITION_FOLLOWING - 4
DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC - 32
DOCUMENT_POSITION_PRECEDING - 2
DOCUMENT_TYPE_NODE - 10
documentElement - [object HTMLHtmlElement]
documentURI - about:blank
domain - 
ELEMENT_NODE - 1
embeds - [object HTMLCollection]
ENTITY_NODE - 6
ENTITY_REFERENCE_NODE - 5
fgColor - 
firstChild - [object HTMLHtmlElement]
firstElementChild - [object HTMLHtmlElement]
fonts - [object FontFaceSet]
forms - [object HTMLCollection]
fullscreen - false
fullscreenElement - null
fullscreenEnabled - true
head - [object HTMLHeadElement]
hidden - false
images - [object HTMLCollection]
implementation - [object DOMImplementation]
inputEncoding - UTF-8
isConnected - true
lastChild - [object HTMLHtmlElement]
lastElementChild - [object HTMLHtmlElement]
lastModified - 03/14/2019 22:29:12
lastStyleSheetSet - null
linkColor - 
links - [object HTMLCollection]
location - about:blank
mozFullScreen - false
mozFullScreenElement - null
mozFullScreenEnabled - true
nextSibling - null
nodeName - #document
nodeType - 9
nodeValue - null
NOTATION_NODE - 12
onabort - null
onafterscriptexecute - null
onanimationcancel - null
onanimationend - null
onanimationiteration - null
onanimationstart - null
onauxclick - null
onbeforescriptexecute - null
onblur - null
oncanplay - null
oncanplaythrough - null
onchange - null
onclick - null
onclose - null
oncontextmenu - null
oncopy - null
oncut - null
ondblclick - null
ondrag - null
ondragend - null
ondragenter - null
ondragexit - null
ondragleave - null
ondragover - null
ondragstart - null
ondrop - null
ondurationchange - null
onemptied - null
onended - null
onerror - null
onfocus - null
onfullscreenchange - null
onfullscreenerror - null
oninput - null
oninvalid - null
onkeydown - null
onkeypress - null
onkeyup - null
onload - null
onloadeddata - null
onloadedmetadata - null
onloadend - null
onloadstart - null
onmousedown - null
onmouseenter - null
onmouseleave - null
onmousemove - null
onmouseout - null
onmouseover - null
onmouseup - null
onmozfullscreenchange - null
onmozfullscreenerror - null
onpaste - null
onpause - null
onplay - null
onplaying - null
onpointerlockchange - null
onpointerlockerror - null
onprogress - null
onratechange - null
onreadystatechange - null
onreset - null
onresize - null
onscroll - null
onseeked - null
onseeking - null
onselect - null
onselectionchange - null
onselectstart - null
onshow - null
onstalled - null
onsubmit - null
onsuspend - null
ontimeupdate - null
ontoggle - null
ontouchcancel - null
ontouchend - null
ontouchmove - null
ontouchstart - null
ontransitioncancel - null
ontransitionend - null
ontransitionrun - null
ontransitionstart - null
onvisibilitychange - null
onvolumechange - null
onwaiting - null
onwebkitanimationend - null
onwebkitanimationiteration - null
onwebkitanimationstart - null
onwebkittransitionend - null
onwheel - null
ownerDocument - null
parentElement - null
parentNode - null
plugins - [object HTMLCollection]
pointerLockElement - null
policy - [object Policy]
preferredStyleSheetSet - 
previousSibling - null
PROCESSING_INSTRUCTION_NODE - 7
readyState - complete
referrer - 
rootElement - null
scripts - [object HTMLCollection]
scrollingElement - [object HTMLBodyElement]
selectedStyleSheetSet - 
styleSheets - [object StyleSheetList]
styleSheetSets - [object DOMStringList]
TEXT_NODE - 3
textContent - null
timeline - [object DocumentTimeline]
title - 
URL - about:blank
visibilityState - visible
vlinkColor - 

Of note, URL is about:blank, readyState is complete.
document.body and document.head are empty, which isn't surprising since it's on about:blank.

And here's window:

closed - false
crypto - [object Crypto]
customElements - [object CustomElementRegistry]
devicePixelRatio - 1
event - undefined
external - [object External]
frameElement - null
frames - [object Window]
fullScreen - false
history - [object History]
innerHeight - 1392
innerWidth - 980
InstallTrigger - [object InstallTriggerImpl]
isSecureContext - false
length - 0
localStorage - null
locationbar - [object BarProp]
menubar - [object BarProp]
mozInnerScreenX - 0
mozInnerScreenY - 0
mozPaintCount - 1
name - 
navigator - [object Navigator]
onabort - null
onabsolutedeviceorientation - null
onafterprint - null
onanimationcancel - null
onanimationend - null
onanimationiteration - null
onanimationstart - null
onauxclick - null
onbeforeprint - null
onbeforeunload - null
onblur - null
oncanplay - null
oncanplaythrough - null
onchange - null
onclick - null
onclose - null
oncontextmenu - null
ondblclick - null
ondevicelight - null
ondevicemotion - null
ondeviceorientation - null
ondeviceproximity - null
ondrag - null
ondragend - null
ondragenter - null
ondragexit - null
ondragleave - null
ondragover - null
ondragstart - null
ondrop - null
ondurationchange - null
onemptied - null
onended - null
onerror - null
onfocus - null
onhashchange - null
oninput - null
oninvalid - null
onkeydown - null
onkeypress - null
onkeyup - null
onlanguagechange - null
onload - null
onloadeddata - null
onloadedmetadata - null
onloadend - null
onloadstart - null
onmessage - null
onmessageerror - null
onmousedown - null
onmouseenter - null
onmouseleave - null
onmousemove - null
onmouseout - null
onmouseover - null
onmouseup - null
onmozfullscreenchange - null
onmozfullscreenerror - null
onoffline - null
ononline - null
onorientationchange - null
onpagehide - null
onpageshow - null
onpause - null
onplay - null
onplaying - null
onpopstate - null
onprogress - null
onratechange - null
onreset - null
onresize - null
onscroll - null
onseeked - null
onseeking - null
onselect - null
onselectstart - null
onshow - null
onstalled - null
onstorage - null
onsubmit - null
onsuspend - null
ontimeupdate - null
ontoggle - null
ontouchcancel - null
ontouchend - null
ontouchmove - null
ontouchstart - null
ontransitioncancel - null
ontransitionend - null
ontransitionrun - null
ontransitionstart - null
onuserproximity - null
onvolumechange - null
onvrdisplayactivate - null
onvrdisplayconnect - null
onvrdisplaydeactivate - null
onvrdisplaydisconnect - null
onvrdisplaypresentchange - null
onwaiting - null
onwebkitanimationend - null
onwebkitanimationiteration - null
onwebkitanimationstart - null
onwebkittransitionend - null
onwheel - null
opener - null
orientation - 0
origin - null
outerHeight - 1136
outerWidth - 800
pageXOffset - 0
pageYOffset - 0
parent - [object Window]
performance - [object Performance]
personalbar - [object BarProp]
screen - [object Screen]
screenLeft - 0
screenTop - 0
screenX - 0
screenY - 0
scrollbars - [object BarProp]
scrollMaxX - 0
scrollMaxY - 0
scrollX - 0
scrollY - 0
self - [object Window]
sidebar - [object External]
speechSynthesis - [object SpeechSynthesis]
status - 
statusbar - [object BarProp]
toolbar - [object BarProp]
visualViewport - [object VisualViewport]

I got those from running a single test via ./mach wpt --package=org.mozilla.geckoview.test 2dcontext/compositing/2d.composite.image.source-out.html, which seems to consistently fail with either JavascriptException: SecurityError: The operation is insecure. or JavascriptException: TypeError: window.__wptrunner_process_next_event is not a function.

The tracebacks for both of those look very similar.

SecurityError:

Traceback (most recent call last):
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 605, in _run
    self.result = True, self.func(self.protocol, self.url, self.timeout)
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 719, in do_testharness
    self.script_resume % format_map, async=True)
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 61, in execute_script
    return method(script, new_sandbox=False, sandbox=None)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 1656, in execute_async_script
    rv = self._send_message("WebDriver:ExecuteAsyncScript", body, key="value")
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/decorators.py", line 26, in _
    return func(*args, **kwargs)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 598, in _send_message
    self._handle_error(err)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 618, in _handle_error
    raise errors.lookup(error)(message, stacktrace=stacktrace)
JavascriptException: SecurityError: The operation is insecure.
stacktrace:
	@testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:84:6
	@testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:105:8```

TypeError:
```ERROR /2dcontext/compositing/2d.composite.image.source-out.html - TypeError: window.__wptrunner_process_next_event is not a function
Traceback (most recent call last):
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 605, in _run
    self.result = True, self.func(self.protocol, self.url, self.timeout)
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 719, in do_testharness
    self.script_resume % format_map, async=True)
  File "/Users/wkocher/mozilla/mozilla-central/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 61, in execute_script
    return method(script, new_sandbox=False, sandbox=None)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 1656, in execute_async_script
    rv = self._send_message("WebDriver:ExecuteAsyncScript", body, key="value")
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/decorators.py", line 26, in _
    return func(*args, **kwargs)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 598, in _send_message
    self._handle_error(err)
  File "/Users/wkocher/mozilla/mozilla-central/testing/marionette/client/marionette_driver/marionette.py", line 618, in _handle_error
    raise errors.lookup(error)(message, stacktrace=stacktrace)
JavascriptException: TypeError: window.__wptrunner_process_next_event is not a function
stacktrace:
	@testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:104:8
	@testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:105:8```

And for completion's sake,

window.__wptrunner_url is /2dcontext/compositing/2d.composite.image.source-out.html
window.__wptrunner_testdriver_callback is rv => __webDriverCallback(rv)

Adding a 0.1 sleep() call after protocol.marionette.navigate() didn't seem to help. Adding a 0.1 sleep() call before protocol.marionette.navigate helped a bit, but was still pretty failure-prone. Adding a 0.1 sleep() call before AND after protocol.marionette.navigate helped a bit more. Adding a 0.5 sleep() call on its own before protocol.marionette.navigate seems to fully eliminate the failures, at least locally.

That seems to work. I was able to rerun that test I mentioned in comment 51 20 times with this sleep(0.5) before navigate() is called. As mentioned in the commit message, smaller sleeps made the situation better, but it still was failing about 50% of the time.

There's one other call to time.sleep() in this file. I'm wondering if these should all be sleep(0.1) and maybe add a multiplier to the value if we're testing Android? Or is it okay to add this 0.5s delay to every test run, even if desktop seems mostly unaffected?

Try run in progress at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=a904152891b6a80517281257946a42170a0ef4ed to see if 0.5s is sufficient for android on our CI. Maybe that needs even longer.

I wonder if Marionette is getting initialized too early for GeckoView. It might correlate with what I discussed quickly with snorp on bug 1534635 comment 5. If that is the case GeckoView needs a change to send this observer notification later after all init scripts have been run. Nick, could you have a look at this please?

Early results for that try push look improved. I still see a few JavascriptException: TypeError: window.__wptrunner_process_next_event is not a functions, but not nearly as many. So either the sleep needs to be longer or we need to fix the underlying raciness.

Yeah, we definitely don't want to be adding sleep() as a permanent solution. I think whimboo is correct and this is an issue with the marionette startup in geckoview.

Flags: needinfo?(james)

I am a bit curious why the reftests appear unaffected. Do they not go through the same startup procedure? Or are they just faster or slower enough to avoid all of this, or something?

Attachment #9051217 - Attachment is obsolete: true

Another option for the time being could be the usage of Wait().until() in combination with an execute_script() call which checks for the existence of this property. Even with a timeout of 5s it would return immediately once it is available.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #55)

I wonder if Marionette is getting initialized too early for GeckoView. It might correlate with what I discussed quickly with snorp on bug 1534635 comment 5. If that is the case GeckoView needs a change to send this observer notification later after all init scripts have been run. Nick, could you have a look at this please?

I don't understand what the thinking is here. The only way that the wpt harness can execute JS is via Marionette, so in order for the underlying window.__wptrunner... error to occur, wpt must have already successfully connected to Marionette. Are you saying that there's some further dependency that Marionette must wait for in order for it to actually function? (For what it's worth, the browsertime suite seems to work just fine, although maybe it's just waiting or just slow enough to avoid this issue.)

If that's the case, teach Marionette to wait for whatever event it needs to wait for. It makes no sense to push all of this logic out into every application that consumes Marionette; that's how we got to this mess: by having a bunch of special cases for Fennec that are inside Fennec rather than being encoded in Marionette.

Flags: needinfo?(nalexander)

When do we hit https://searchfox.org/mozilla-central/rev/8ff2cd0a27e3764d9540abdc5a66b2fb1e4e9644/mobile/android/modules/geckoview/GeckoViewRemoteDebugger.jsm#67?

Can we be sure that at this time all startup scripts (if used for GeckoView) have been run? If there are crucial modules which haven't been loaded at this time we clearly shouldn't initialize Marionette that early. Note that we all don't know the code of GeckoView and as such just want to know more in how startup/initialization works.

If that's the case, teach Marionette to wait for whatever event it needs to wait for. It makes no sense to push all of this logic out into every application that consumes Marionette; that's how we got to this mess: by having a bunch of special cases for Fennec that are inside Fennec rather than being encoded in Marionette.

You are disagreeing with yourself here. It was you who wanted to have a notification (marionette-startup-requested) sent by the application to inform that Marionette can be initialized. And we had a lot of special casing in Marionette which has blown up the complexity of the code, right. So having a clear point now when the application tells that it is ok to initialize Marionette is way more robust. Sorry, but I really don't understand what's up now.

Flags: needinfo?(nalexander)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #61)

If that's the case, teach Marionette to wait for whatever event it needs to wait for. It makes no sense to push all of this logic out into every application that consumes Marionette; that's how we got to this mess: by having a bunch of special cases for Fennec that are inside Fennec rather than being encoded in Marionette.

You are disagreeing with yourself here. It was you who wanted to have a notification (marionette-startup-requested) sent by the application to inform that Marionette can be initialized. And we had a lot of special casing in Marionette which has blown up the complexity of the code, right. So having a clear point now when the application tells that it is ok to initialize Marionette is way more robust. Sorry, but I really don't understand what's up now.

Ah, I might understand why you think I'm disagreeing with myself. I want the App-level to be able to say to Marionette: "Go, get started", because not all Apps have the same set of notifications/events or the same interpretations of those events. But if Marionette internally needs something to happen at the Gecko-level -- say, startup recorder stuff, or some profile-related event -- that should be encoded in Marionette itself. The split is between the App and the engine (Gecko). Marionette is part of the engine, and as such it should know how to wait for (or ready) the parts of the engine it needs. For things that need App collaboration, the App should tell Marionette whatever it needs to know: perhaps it supplies some interface for tab manipulation, for example.

At least, that's how I see this working -- but I'm late to this party and don't have great context on what's been tried and failed and/or what's planned.

When do we hit https://searchfox.org/mozilla-central/rev/8ff2cd0a27e3764d9540abdc5a66b2fb1e4e9644/mobile/android/modules/geckoview/GeckoViewRemoteDebugger.jsm#67?

Can we be sure that at this time all startup scripts (if used for GeckoView) have been run? If there are crucial modules which haven't been loaded at this time we clearly shouldn't initialize Marionette that early. Note that we all don't know the code of GeckoView and as such just want to know more in how startup/initialization works.

No, we can't be sure that all startup scripts for GeckoView have been run. But why should we? Only one feature or service can be last; it's not reasonable to have every feature say "wait for everything else, then it's my turn". If Marionette needs something, let's figure out what it is and figure out how to schedule Marionette after its dependencies. If those dependencies are in the engine layer, the scheduling should be inside Marionette. If those dependencies are at the App layer, the scheduling should be in the App layer.

Based on the reports -- i.e., the Marionette connection is made and some JS has been executed -- it sure sounds like there are additional dependencies within the engine layer that aren't being captured. (I'm considering GeckoView to be the App-layer here, because it's generally playing the role of Fennec or another consuming App here; but the lines are blurry, 'cuz some engine internals are also at the GeckoView-layer.)

Does that make sense? Is my mental model missing something or inconsistent?

Flags: needinfo?(nalexander) → needinfo?(hskupin)

One thing I see in the logcats:

In passing runs with time.sleep(1), it looks like this:

GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewContent"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewNavigation"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewSettings"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewNavigation: onLocationChange
GeckoViewSelectionAction[C]: handleEvent: visibilitychange
GeckoViewNavigation[C]: loadURI: uri=about:blank where=1 flags=0 tp=moz-nullprincipal:{9b50bd13-7c3c-46c3-8393-c01b986829be}
GeckoViewContent[C]: handleEvent: DOMTitleChanged
GeckoViewContent[C]: handleEvent: pageshow
GeckoViewSettings[C]: onSettingsUpdate {"useMultiprocess":true,"chromeUri":null,"screenId":0,"userAgentOverride":null,"allowJavascript":true,"userAgentMode":0,"viewportMode":0,"useTrackingProtection":false,"suspendMediaWhenInactive":false,"usePrivateMode":false,"displayMode":0,"fullAccessibilityTree":false}
GeckoViewNavigation[C]: shouldLoadURI about:blank
GeckoViewContent[C]: handleEvent: pagehide
GeckoViewAutoFill: Clearing auto-fill
GeckoViewNavigation: onLocationChange
GeckoViewContent[C]: handleEvent: DOMTitleChanged
GeckoViewContent[C]: handleEvent: pageshow

In failing runs without the sleep:

GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewContent"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewNavigation"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {"module":"GeckoViewSettings"}
GeckoViewModule: enableQueuing false
GeckoViewModule: dispatchQueued
GeckoViewSettings[C]: onSettingsUpdate {"useMultiprocess":true,"chromeUri":null,"screenId":0,"userAgentOverride":null,"allowJavascript":true,"userAgentMode":0,"viewportMode":0,"useTrackingProtection":false,"suspendMediaWhenInactive":false,"usePrivateMode":false,"displayMode":0,"fullAccessibilityTree":false}
GeckoViewNavigation: onLocationChange
GeckoViewNavigation[C]: shouldLoadURI about:blank
GeckoViewContent[C]: handleEvent: pagehide
GeckoViewAutoFill: Clearing auto-fill
GeckoViewContent[C]: handleEvent: DOMTitleChanged
GeckoViewContent[C]: handleEvent: pageshow
GeckoViewNavigation: onLocationChange

Noticeable differences:
In the passing run, there's these additional lines prior to onSettingsUpdate:

GeckoViewNavigation: onLocationChange
GeckoViewSelectionAction[C]: handleEvent: visibilitychange
GeckoViewNavigation[C]: loadURI: uri=about:blank where=1 flags=0 tp=moz-nullprincipal:{9b50bd13-7c3c-46c3-8393-c01b986829be}
GeckoViewContent[C]: handleEvent: DOMTitleChanged
GeckoViewContent[C]: handleEvent: pageshow

In the failing run, those are missing, but there's an onLocationChange after onSettingsUpdate.

Looks like the call to navigate() overlaps a currently active load of about:blank, which may cause an early return. Can you please also enable the Marionette trace logs, so that I can see details from Marionette too?

Flags: needinfo?(hskupin)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #64)

Looks like the call to navigate() overlaps a currently active load of about:blank, which may cause an early return. Can you please also enable the Marionette trace logs, so that I can see details from Marionette too?

Excellent sleuthing, kwierso! I see exactly this situation driving Firefox TV (based on GeckoView) with Marionette via my local geckodriver with a hacky implementation of Bug 1525126.

I concur that there's an active load of about:blank (although the URI doesn't matter, I had data:text/html, and a few other variants, all of which had the same behaviour. I think this is, in fact, the containing application -- e.g., for GVE it's

https://searchfox.org/mozilla-central/source/mobile/android/geckoview_example/src/main/java/org/mozilla/geckoview_example/GeckoViewActivity.java#210

and for firefox-tv I need to pass it in with -d URL to work around where it was getting set by

https://searchfox.org/mozilla-mobile/source/firefox-tv/app/src/main/java/org/mozilla/tv/firefox/MainActivity.kt#129

I worked around it just as you have: I try to load a pre URL, wait a second or two, and don't really care if the pre URL loads.

Now, what is to be done? I'm not sure. We'd like to have a way for the consuming application to know if it's being remote controlled, and to react as appropriate. For Fenix, this might color the URL bar and, potentially, even hide the throbber to make screen capturing more reliable. For other vehicles, including GVE, it might be used to avoid these initial loads so that Marionette and the vehicle aren't racing.

snorp: how do you feel about adding an "isRemoteControlled" API to the GeckoRuntime? It would correspond, I think, to navigator.webdriver.

Flags: needinfo?(snorp)

(In reply to Nick Alexander :nalexander [he/him] from comment #65)

the containing application -- e.g., for GVE it's

Note that the goal here is to run wpt in TestRunnerActivity, not geckoview_example (probably not very relevant, but want to avoid minor misunderstandings).

(In reply to Geoff Brown [:gbrown] from comment #66)

(In reply to Nick Alexander :nalexander [he/him] from comment #65)

the containing application -- e.g., for GVE it's

Note that the goal here is to run wpt in TestRunnerActivity, not geckoview_example (probably not very relevant, but want to avoid minor misunderstandings).

Excellent point. My guess is that if we don't include a URL here

https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/TestRunnerActivity.java#194-201

then GV itself loads about:blank. (I think that's a Web Platform standard.) If it is, we'd see a similar race.

Loading about:blank should be synchronous, so I'm not sure how we're observing a partially loaded state (although there can be later events, so maybe we're respoding to such an event).

But in any case I feel like marionette should be handling this and waiting for the window to complete the current load before returning from the new_session command. If that load is starting after marionette is initalised, we should move the start later until after the load of the inital document is at least started.

(In reply to James Graham [:jgraham] from comment #68)

Loading about:blank should be synchronous, so I'm not sure how we're observing a partially loaded state (although there can be later events, so maybe we're respoding to such an event).

When I mentioned the same a couple of weeks ago in #fxteam, Gijs mentioned that it might not be 100% the case. I cannot remember all the details, but we could expect that it is async.

But in any case I feel like marionette should be handling this and waiting for the window to complete the current load before returning from the new_session command. If that load is starting after marionette is initalised, we should move the start later until after the load of the inital document is at least started.

Hm, we have the following:
https://searchfox.org/mozilla-central/rev/7abb9117c8500ed20833746c9f8e800fce3a4688/testing/marionette/driver.js#754-765

Maybe this only waits for the XUL parts to be completed loading?

Otherwise the WebDriver spec doesn't say that a call to navigate should abort, or wait for an already running page load. At least I cannot find such a part.

I would really appreciate a Marionette trace log, so that we can see what's going on.

(In reply to Nick Alexander :nalexander [he/him] from comment #65)

Now, what is to be done? I'm not sure. We'd like to have a way for the consuming application to know if it's being remote controlled, and to react as appropriate. For Fenix, this might color the URL bar and, potentially, even hide the throbber to make screen capturing more reliable. For other vehicles, including GVE, it might be used to avoid these initial loads so that Marionette and the vehicle aren't racing.

The initial load is not something you can ever really avoid. As Nick says in comment #67 you can pass an initial url in the Intent, but we're going to load either that or about:blank.

I'm not totally convinced about the other points either. For screen capturing, doesn't marionette use internal gecko stuff for that? Since the throbber/urlbar is app UI it wouldn't be included anyway.

snorp: how do you feel about adding an "isRemoteControlled" API to the GeckoRuntime? It would correspond, I think, to navigator.webdriver.

I'm trying to be pretty protective of adding stuff here, as things can get out of hand quickly. What does isRemoteControlled mean if Marionette goes away or is replaced with something else? Right now I'm not convinced this new API adds enough value to combat the churn and maintenance burden.

Flags: needinfo?(snorp)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #59)

Another option for the time being could be the usage of Wait().until() in combination with an execute_script() call which checks for the existence of this property. Even with a timeout of 5s it would return immediately once it is available.

Hrm, even after time.sleep(1) which is sufficient time to ensure the test passes, this still prints out undefined in the logcat: protocol.base.execute_script("console.log(window.__wptrunner_process_next_event)")

Is this not the same window that misses the property, maybe?

This is the trace-logging logcat from me running ./mach wpt --package=org.mozilla.geckoview.test 2dcontext/compositing/2d.composite.image.source-out.html --no-pause-after-test and the test failing with JavascriptException: TypeError: window.__wptrunner_process_next_event is not a function.

And for comparison, this is with the sleep(2), which prevents the failure.

Without the sleep(2), it goes from

03-20 15:46:41.971  9308  9323 I Gecko   : 1553096801971	Marionette	TRACE	[2147483653] Frame script loaded
03-20 15:46:41.982  9308  9323 I Gecko   : 1553096801982	Marionette	TRACE	[2147483653] Frame script registered
03-20 15:46:41.992  9248  9263 I Gecko   : 1553096801992	Marionette	DEBUG	2 <- [1,11,null,{"value":null}]

Immediately to

03-20 15:46:42.027  9248  9263 I Gecko   : 1553096802027	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html"}]
03-20 15:46:42.048  9308  9323 D GeckoViewNavigation[C]: loadURI: uri=http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html where=1 flags=0x1 tp=null
03-20 15:46:42.126  9308  9323 I Gecko   : 1553096802126	Marionette	TRACE	[2147483653] Received DOM event beforeunload for about:blank
03-20 15:46:42.126  9308  9323 D GeckoViewNavigation[C]: shouldLoadURI http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html
03-20 15:46:42.137  9308  9323 I Gecko   : 1553096802137	Marionette	TRACE	[2147483653] Received DOM event beforeunload for about:blank
03-20 15:46:42.140  9308  9323 D GeckoViewNavigation[C]: shouldLoadURI about:blank
03-20 15:46:42.177  9308  9323 I Gecko   : 1553096802177	Marionette	TRACE	[2147483653] Received DOM event pagehide for about:blank
03-20 15:46:42.178  9308  9323 D GeckoViewContent[C]: handleEvent: pagehide
03-20 15:46:42.178  9308  9323 D GeckoViewAutoFill: Clearing auto-fill
03-20 15:46:42.181  9308  9323 I Gecko   : 1553096802180	Marionette	TRACE	[2147483653] Received DOM event unload for about:blank
03-20 15:46:42.198  9308  9323 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
03-20 15:46:42.199  9308  9323 I Gecko   : 1553096802199	Marionette	TRACE	[2147483653] Received DOM event DOMContentLoaded for about:blank
03-20 15:46:42.214  9248  9263 D GeckoViewNavigation: onLocationChange
03-20 15:46:42.248  9308  9323 I Gecko   : 1553096802248	Marionette	TRACE	[2147483653] Received DOM event pageshow for about:blank
03-20 15:46:42.249  9308  9323 D GeckoViewContent[C]: handleEvent: pageshow
03-20 15:46:42.364  9248  9263 I Gecko   : 1553096802364	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]
03-20 15:46:42.380  9248  9263 I Gecko   : 1553096802380	Marionette	DEBUG	2 -> [0,13,"WebDriver:ExecuteAsyncScript",{"scriptTimeout":null,"newSandbox":false,"args":[],"filename":"testing/web-platform/test ... _testdriver_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();","sandbox":null,"line":63}]
03-20 15:46:42.405  9248  9263 I Gecko   : 1553096802405	Marionette	DEBUG	2 <- [1,13,{"error":"javascript error","message":"TypeError: window.__wptrunner_process_next_event is not a function","stacktrace" ... cutormarionette.py:68:8\n@testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:69:8\n"},null]

...

With the sleep(2), it goes from

03-20 15:49:13.581  9752  9767 I Gecko   : 1553096953581	Marionette	TRACE	[2147483653] Frame script loaded
03-20 15:49:13.600  9752  9767 I Gecko   : 1553096953600	Marionette	TRACE	[2147483653] Frame script registered
03-20 15:49:13.607  9692  9707 I Gecko   : 1553096953607	Marionette	DEBUG	2 <- [1,11,null,{"value":null}]
03-20 15:49:13.698  9752  9767 D GeckoViewNavigation[C]: shouldLoadURI about:blank
03-20 15:49:13.756  9752  9767 D GeckoViewContent[C]: handleEvent: pagehide
03-20 15:49:13.757  9752  9767 D GeckoViewAutoFill: Clearing auto-fill
03-20 15:49:13.770  9752  9767 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
03-20 15:49:13.779  9692  9707 D GeckoViewNavigation: onLocationChange
03-20 15:49:13.788  9752  9767 D GeckoViewContent[C]: handleEvent: pageshow

Then sleeps for 2 seconds before going to

03-20 15:49:15.616  9692  9707 I Gecko   : 1553096955616	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html"}]
03-20 15:49:15.626  9752  9767 D GeckoViewNavigation[C]: loadURI: uri=http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html where=1 flags=0x1 tp=null
03-20 15:49:15.637  9752  9767 I Gecko   : 1553096955637	Marionette	TRACE	[2147483653] Received DOM event beforeunload for about:blank
03-20 15:49:15.639  9752  9767 D GeckoViewNavigation[C]: shouldLoadURI http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html
03-20 15:49:15.715  9752  9767 I Gecko   : 1553096955714	Marionette	TRACE	[2147483653] Received DOM event pagehide for about:blank
03-20 15:49:15.715  9752  9767 D GeckoViewContent[C]: handleEvent: pagehide
03-20 15:49:15.716  9752  9767 D GeckoViewAutoFill: Clearing auto-fill
03-20 15:49:15.748  9692  9707 D GeckoViewNavigation: onLocationChange
03-20 15:49:15.800  9752  9767 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
03-20 15:49:16.241  9752  9767 E Web Content: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html" line: 0}]
03-20 15:49:16.255  9752  9767 I Gecko   : 1553096956255	Marionette	TRACE	[2147483653] Received DOM event DOMContentLoaded for http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html
03-20 15:49:16.274  9692  9707 I Gecko   : 1553096956274	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]
03-20 15:49:16.283  9692  9707 I Gecko   : 1553096956283	Marionette	DEBUG	2 -> [0,13,"WebDriver:ExecuteAsyncScript",{"scriptTimeout":null,"newSandbox":false,"args":[],"filename":"testing/web-platform/test ... _testdriver_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();","sandbox":null,"line":63}]
03-20 15:49:16.291  9752  9767 D GeckoViewContent[C]: handleEvent: pageshow

If it helps, Henrik, jgraham's latest try push in bug 1522790 seems to be hitting the window.__wptrunner_process_next_event error pretty consistently on one particular OSX wpt chunk: https://treeherder.mozilla.org/#/jobs?repo=try&revision=608fb12a70a9b1ce8451fdacd51fabd114d15818&searchStr=6a2f8bc7c17e0f64faac4c5660c516e1b1fcde33&group_state=expanded&selectedJob=235021044

I'll try pulling that code and adding the trace logging and repushing that to try, if that's easier for you to debug than via Geckoview and adb. Try's closed for infra issues at the moment, so that'll have to wait.

Flags: needinfo?(hskupin)

(In reply to Wes Kocher (:KWierso) from comment #75)

If it helps, Henrik, jgraham's latest try push in bug 1522790 seems to be hitting the window.__wptrunner_process_next_event error pretty consistently on one particular OSX wpt chunk: https://treeherder.mozilla.org/#/jobs?repo=try&revision=608fb12a70a9b1ce8451fdacd51fabd114d15818&searchStr=6a2f8bc7c17e0f64faac4c5660c516e1b1fcde33&group_state=expanded&selectedJob=235021044

I'll try pulling that code and adding the trace logging and repushing that to try, if that's easier for you to debug than via Geckoview and adb. Try's closed for infra issues at the moment, so that'll have to wait.

Here's the try push for that: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=334de156ff533e640e6cc410543234675b9b5856

The MacOS case looks different, so lets not focus on that here. I had a look at the failing case with sleep(2) and it is fairly clear that we have an overlapping request for page load:

03-20 15:46:42.027 9248 9263 I Gecko : 1553096802027 Marionette DEBUG 2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html"}]
03-20 15:46:42.048 9308 9323 D GeckoViewNavigation[C]: loadURI: uri=http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html where=1 flags=0x1 tp=null
03-20 15:46:42.126 9308 9323 I Gecko : 1553096802126 Marionette TRACE [2147483653] Received DOM event beforeunload for about:blank
03-20 15:46:42.126 9308 9323 D GeckoViewNavigation[C]: shouldLoadURI http://web-platform.test:8000/2dcontext/compositing/2d.composite.image.source-out.html
03-20 15:46:42.137 9308 9323 I Gecko : 1553096802137 Marionette TRACE [2147483653] Received DOM event beforeunload for about:blank
03-20 15:46:42.140 9308 9323 D GeckoViewNavigation[C]: shouldLoadURI about:blank
03-20 15:46:42.177 9308 9323 I Gecko : 1553096802177 Marionette TRACE [2147483653] Received DOM event pagehide for about:blank
03-20 15:46:42.178 9308 9323 D GeckoViewContent[C]: handleEvent: pagehide
03-20 15:46:42.178 9308 9323 D GeckoViewAutoFill: Clearing auto-fill
03-20 15:46:42.181 9308 9323 I Gecko : 1553096802180 Marionette TRACE [2147483653] Received DOM event unload for about:blank
03-20 15:46:42.198 9308 9323 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
03-20 15:46:42.199 9308 9323 I Gecko : 1553096802199 Marionette TRACE [2147483653] Received DOM event DOMContentLoaded for about:blank
03-20 15:46:42.214 9248 9263 D GeckoViewNavigation: onLocationChange
03-20 15:46:42.248 9308 9323 I Gecko : 1553096802248 Marionette TRACE [2147483653] Received DOM event pageshow for about:blank
03-20 15:46:42.249 9308 9323 D GeckoViewContent[C]: handleEvent: pageshow
03-20 15:46:42.364 9248 9263 I Gecko : 1553096802364 Marionette DEBUG 2 <- [1,12,null,{"value":null}]

I assume this could happen at any time when code in Gecko is triggered to load eg about:blank. For GeckoView specific we might indeed have initialized Marionette too early to accidentally hit this case way more often.

Maybe Marionette should cancel another ongoing navigation request before it starts its own one? The HTML spec for navigate says:

If there is a preexisting attempt to navigate browsingContext, and the source browsing context is the same as browsingContext, and that attempt is currently running the unload a document algorithm, then return without affecting the preexisting attempt to navigate browsingContext.

That means that the attempt from Marionette to navigate will not be taken into account, and the call returns immediately.

James, maybe we need some spec changes?

Note that I currently don't have the time to help/work with/on this issue given that I'm on a different project the next weeks.

Flags: needinfo?(hskupin) → needinfo?(james)

(In reply to Nick Alexander :nalexander [he/him] from comment #65)

I concur that there's an active load of about:blank (although the URI doesn't matter, I had data:text/html, and a few other variants, all of which had the same behaviour. I think this is, in fact, the containing application -- e.g., for GVE it's

https://searchfox.org/mozilla-central/source/mobile/android/geckoview_example/src/main/java/org/mozilla/geckoview_example/GeckoViewActivity.java#210

So this is the right answer; locally I have a patch that works by changing a window.open("about:blank", […]) to window.open(undefined, […]). The problem (I think) is that passing about:blank ends up loading that URL twice; once in gecko as the initial content of the document, and again in GeckoViewActivity as the URL passed in. So I think there should be a special case somewhere here for when we are opening a new window and the requested URL is about:blank because gecko implictly loads that and we end up with two loads which is a perf problem and hard to track in cases like this where we are trying to determine when the window is fully open.

Flags: needinfo?(james)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=28b079200773fc9c068ff8a864d63a681b6ee1da is the try push and the issue with the function being undefined does appear to be solved. There are still many incorrect expectations and possibly some other issues, but we can take that patch to unblock this work if needed.

ni? nalexander for whether we can just fix this in the TestRunnerView rather than wallpapering over the issue in wptrunner.

Flags: needinfo?(nalexander)

(In reply to James Graham [:jgraham] from comment #79)

(In reply to Nick Alexander :nalexander [he/him] from comment #65)

I concur that there's an active load of about:blank (although the URI doesn't matter, I had data:text/html, and a few other variants, all of which had the same behaviour. I think this is, in fact, the containing application -- e.g., for GVE it's

https://searchfox.org/mozilla-central/source/mobile/android/geckoview_example/src/main/java/org/mozilla/geckoview_example/GeckoViewActivity.java#210

So this is the right answer; locally I have a patch that works by changing a window.open("about:blank", […]) to window.open(undefined, […]). The problem (I think) is that passing about:blank ends up loading that URL twice; once in gecko as the initial content of the document, and again in GeckoViewActivity as the URL passed in. So I think there should be a special case somewhere here for when we are opening a new window and the requested URL is about:blank because gecko implictly loads that and we end up with two loads which is a perf problem and hard to track in cases like this where we are trying to determine when the window is fully open.

Thanks, this is valuable detective work. I was pondering this over the weekend and I think the ideal end state is that GV knows what its initial URL is (from the containing App) and only loads that exactly once. This may already be the situation, and it's just that we're not doing that in all the containing Apps that are under test!

(In reply to James Graham [:jgraham] from comment #81)

ni? nalexander for whether we can just fix this in the TestRunnerView rather than wallpapering over the issue in wptrunner.

Yes, I strongly support making the GV-consuming vehicles do sensible things for automation rather than continue to work around these issues in harness.

I've looked into this a little, and we have options:

  1. We can make test vehicles handle about:blank specially, i.e., not load it if it's given as an Intent argument.

    This is easy but doesn't really address our issue with Marionette and multiple loads. That is, if we -a android.action.intent.VIEW -d URI that is not about:blank, we get multiple loads (about:blank and URI).

  2. We can teach GeckoView what its initial URL should be.

    Setting the src on the <browser> element around https://searchfox.org/mozilla-central/source/mobile/android/chrome/geckoview/geckoview.js#380 does in fact avoid the about:blank load. I think this would look like a parameter to the GeckoSession constructor (or some builder configuration). The problem I foresee is that we really only want this load to be done once, and really only if this is the initial load. If we send a -a android.action.intent.VIEW -d URI, that really should correspond to a loadURI. Tricky.

  3. We can make test vehicles handle -a android.intent.action.MAIN -d URI specifically for this case. (Or the same for -a android.intent.action.VIEW -d URI.)

    That is, we can anticipate the "first URI load" and try to avoid the initial about:blank load (like 2.), but not setting the default URI for subsequent loads.

  4. We can try to make GeckoView delay loading Marionette until some number of initial loads are complete.

    It's pretty difficult to coordinate across GV and all possible vehicles.

I'm leaning to something like 2. and 3. That is, let's try:

  • to figure out how to make GV support setting the initial URI, so that we don't see a spurious about:blank load
  • to wait for that initial load before starting Marionette
  • to teach the test vehicles to not do spurious loadURI invocations
Flags: needinfo?(nalexander)

James, I'm looking into the test failures a bit more closely now that we're getting better results. In the android wpt11 chunk, there's a test that's crashing, and it just prints the whole traceback into Treeherder's failure view: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=236753031&repo=try&lineNumber=5206

I can reproduce this crash locally by running ./mach wpt --package=org.mozilla.geckoview.test payment-method-basic-card which runs all of the tests in that directory.

If I disable all of the tests in this dir, the crash (unsurprisingly) doesn't happen.

If I just run the test that's crashing (./mach wpt --package=org.mozilla.geckoview.test payment-method-basic-card/idlharness.window.html) the crash doesn't happen.

If I disable the test that runs prior to idlharness.window.html (historical.https.html) and run all of the tests in that directory, the crash doesn't happen.

And the more shocking (to me, at least, unless I added the metadata file incorrectly and the disable isn't applying), if I disable idlharness.window.html and run all of the tests in that directory, the crash still happens in idlharness.window.html.

Should I disable historical.https.html, or the whole dir, or look more into the crash?

Flags: needinfo?(james)

(In reply to Wes Kocher (:KWierso) from comment #84)

James, I'm looking into the test failures a bit more closely now that we're getting better results. In the android wpt11 chunk, there's a test that's crashing, and it just prints the whole traceback into Treeherder's failure view: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=236753031&repo=try&lineNumber=5206

I can reproduce this crash locally by running ./mach wpt --package=org.mozilla.geckoview.test payment-method-basic-card which runs all of the tests in that directory.

If I disable all of the tests in this dir, the crash (unsurprisingly) doesn't happen.

If I just run the test that's crashing (./mach wpt --package=org.mozilla.geckoview.test payment-method-basic-card/idlharness.window.html) the crash doesn't happen.

If I disable the test that runs prior to idlharness.window.html (historical.https.html) and run all of the tests in that directory, the crash doesn't happen.

And the more shocking (to me, at least, unless I added the metadata file incorrectly and the disable isn't applying), if I disable idlharness.window.html and run all of the tests in that directory, the crash still happens in idlharness.window.html.

Should I disable historical.https.html, or the whole dir, or look more into the crash?

FWIW, the bottom of the logcat for the crashing run (https://taskcluster-artifacts.net/AS_6qdgMTQ6oUlyAMeI-dg/0/public/test_info//logcat-emulator-5554.log) shows historical.https.html running but I don't see idlharness.window.html mentioned at all.

:KWierso - I noticed in your recent try builds that you are running against x86 builds; you might want to switch to x86_64 (64 bit) builds. All of the other geckoview tests are running against 64 bit builds now and we want wpt to as well. Of course you may want to switch to geckoview, then switch to 64 bit as a separate step, but I'd hate to see all your greening efforts duplicated if there are slight shifts with 64 bit builds.

Just update:

https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/taskcluster/ci/test/test-sets.yml#396-397

to move your tests into android-x86_64-tests (and remove the empty android-x86-tests).

Thank you (and everyone working away here) for all your efforts. Great to see this moving forward!

Flags: needinfo?(wkocher)

(In reply to Geoff Brown [:gbrown] from comment #86)

:KWierso - I noticed in your recent try builds that you are running against x86 builds; you might want to switch to x86_64 (64 bit) builds. All of the other geckoview tests are running against 64 bit builds now and we want wpt to as well. Of course you may want to switch to geckoview, then switch to 64 bit as a separate step, but I'd hate to see all your greening efforts duplicated if there are slight shifts with 64 bit builds.

Just update:

https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/taskcluster/ci/test/test-sets.yml#396-397

to move your tests into android-x86_64-tests (and remove the empty android-x86-tests).

Thank you (and everyone working away here) for all your efforts. Great to see this moving forward!

Noted! Folded that into the patch switching to geckoview: https://hg.mozilla.org/try/rev/8b23431272dbff1aad2e3c5bcf0f9413e38d8807

This only runs on android opt builds. If I move these two lines https://hg.mozilla.org/try/rev/8b23431272dbff1aad2e3c5bcf0f9413e38d8807#l2.12 down to the android-x86_64-tests section, it would run on opt and debug builds, but I don't think that's necessary yet until we get GV running nicely on opt builds...

Flags: needinfo?(wkocher)

(In reply to Wes Kocher (:KWierso) from comment #87)

This only runs on android opt builds. If I move these two lines https://hg.mozilla.org/try/rev/8b23431272dbff1aad2e3c5bcf0f9413e38d8807#l2.12 down to the android-x86_64-tests section, it would run on opt and debug builds, but I don't think that's necessary yet until we get GV running nicely on opt builds...

There is an opt-only option also: https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/taskcluster/ci/test/test-sets.yml#380

(In reply to Wes Kocher (:KWierso) from comment #85)

Should I disable historical.https.html, or the whole dir, or look more into the crash?

Without trying myself, it seems like disabling historical.https.html might be a good first step, but I won't be surprised if there are further intermittents. Note that we don't have to disable for crashes at all and can mark them as expected crash (although it looks like in this case we also hit some harness bug and so that may not work).

Flags: needinfo?(james)

KWierso: could you do point 1/3 from #c83: filter out the TRA/GVA duplicate "about:blank" loads by checking the URL around https://searchfox.org/mozilla-central/source/mobile/android/geckoview_example/src/main/java/org/mozilla/geckoview_example/GeckoViewActivity.java#162 and not invoking loadFromIntent/loadURLif it's going to be another "about:blank"? I don't know if the GV itself will tell you its URI (in a reliable fashion) at this point. The reason for doing it here is that this will only be the very first-a ...VIEW -d URL, and not block subsequent-a ...VIEW -d "about:blank"` invocations from navigating.

For TRA, it's around https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/TestRunnerActivity.java#201.

It's really not clear to me how to delay Marionette startup until after the "initial" about:blank load, 'cuz that's just not very well defined.

NI to Wes to make sure that doesn't get missed.

Flags: needinfo?(wkocher)

Messing with TRA and testing with ./mach wpt --package=org.mozilla.geckoview.test path/to/single/test and removing the fix from comment 80 to bring back the original window.__wptrunner_process_next_event is not a function error:

I added a Log.i(LOGTAG, "Visited URL: " + uri); on line 202.

The uri is only ever null.

One wpt run, I saw the "visited URL" message once. Another time, it logged twice. Another time, it logged three times.

All of the references to about:blank are much later in the logcat.

Flags: needinfo?(wkocher) → needinfo?(nalexander)

(In reply to Wes Kocher (:KWierso) from comment #92)

Messing with TRA and testing with ./mach wpt --package=org.mozilla.geckoview.test path/to/single/test and removing the fix from comment 80 to bring back the original window.__wptrunner_process_next_event is not a function error:

I added a Log.i(LOGTAG, "Visited URL: " + uri); on line 202.

The uri is only ever null.

One wpt run, I saw the "visited URL" message once. Another time, it logged twice. Another time, it logged three times.

All of the references to about:blank are much later in the logcat.

I'd like to help but I can't. With ./mach android-emulator --version x86-7.0 &, I get obvious root-related failures:

 0:15.18 ERROR Traceback (most recent call last):
  File "/Users/nalexander/Mozilla/gecko/testing/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 207, in init
    self.browser.start(group_metadata=group_metadata, **self.browser_settings)
  File "/Users/nalexander/Mozilla/gecko/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py", line 199, in start
    write_hosts_file(self.config, self.runner.device.device)
  File "/Users/nalexander/Mozilla/gecko/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/fennec.py", line 95, in write_hosts_file
    device.remount()
  File "/Users/nalexander/Mozilla/gecko/testing/mozbase/mozdevice/mozdevice/adb.py", line 1751, in remount
    raise ADBError("Unable to remount device")
ADBError: Unable to remount device
Flags: needinfo?(nalexander)

This patch leaves wpt running against fennec on androidx86 as tier2, adds wpt to run against geckoview testactivity on android x86_64 as tier3, and adds enough metadata to run_info_extras to help differentiate the two in expectation files. Fennec is "os == android and not e10s", while geckoview is "os == android and e10s".

Several fennec wpt annotations are currently either very specific to fennec and the android emulator that runs in CI, or are just 'if os == android'. Once geckoview tests are running, the fennec-emulator-specific annotations won't apply, but the generic 'android' ones will. This commit changes all of the in-tree annotations to be 'if os == android and not e10s' which will apply only to fennec, since e10s is not used for fennec. Future geckoview annotations will be 'if os == android and e10s', since geckoview uses e10s by default.

Depends on D27182

Attachment #9057707 - Attachment description: Bug 1501562 - Run wpt against geckoview while still running it against fennec → Bug 1501562 - Run wpt against geckoview

Okay, these four patches should be sufficient to get the geckoview wpt tests running as tier-3 x86_64 jobs on try and mozilla-central, while leaving the Fennec wpt tests running as tier-2 x86 jobs on try and mozilla-central.

Patch 1 sets up the new jobs, patch 2 migrates all of the android wpt annotations in-tree to apply only to Fennec, patch 3 works around the race condition/extra about:blank load, and patch 4 should parse crash dumps for fennec and geckoview wpt tests, at least in CI (locally in the emulator would require invoking something like mach artifact toolchain [platform]-minidump-toolchain and then passing in the path to that artifact to mach wpt as the stackwalk_binary argument).

Patches 1 and 2 are pretty much done.

I'd like to do better than patch 3 but I don't see a quick way forward there.

Patch 4 gets crash dumps parsed, but the crash stacks are all in libc or libxul, which seems maybe wrong (but maybe that's just the ioerrors that get coerced into the "CRASH" state, rather than actual crashes?).

None of this fixes up expectations for the new tests, so they'd be enabled as failing, but I plan on getting them fixed up soon after enabling them. I'd just really like to stop having to rebase these first four patches every week or so while I work on the rest of it.

Flags: needinfo?(james)
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/a033f955b188
Run wpt against geckoview r=jgraham
https://hg.mozilla.org/integration/autoland/rev/fddb75448c79
Change android wpt expectations so they only apply to fennec r=jgraham
https://hg.mozilla.org/integration/autoland/rev/00f60c8cf8fb
Don't set window.opened window to explicitly load about:blank r=jgraham
https://hg.mozilla.org/integration/autoland/rev/64c05e3826cd
Add crash checking for fennec wpt r=jgraham

I did some try pushes%2C1proc&fromchange=a15c404abb33ab015d0162a1cdce336ae367f7bc&tochange=8a333aaabfa8eeadea17cd35f912bea57b3e1bb7) to figure this out.

The bottom push is just my patches (including annotation updates that weren't part of the initial landing). The test failed on linux debug non-e10s.

The middle push is the bottom push, but the annotation updates have been removed. The test still failed on linux debug non-e10s.

The top push is the middle push, but with https://hg.mozilla.org/try/rev/20812e240c1bb40153c4b2e50b68d7228d105390 backed out. The test passes on linux debug non-e10s.

I'm not sure why that commit causes test failures on linux (and possibly only in the non-e10s configuration).

I'll defer to James on what we should do from here.

Flags: needinfo?(wkocher)

Replacing undefined with null in the window.open seems to be enough to make this work on non-e10s. It seems like a gecko bug if that's making a difference, but assuming it works OK for geckoview too, we could take that change. Otherwise we can just mark that test as expected:FAIL for non-e10s on the basis that no one really cares about non-e10s desktop (and file a bug on the issue).

Flags: needinfo?(james)

(In reply to James Graham [:jgraham] from comment #102)

Replacing undefined with null in the window.open seems to be enough to make this work on non-e10s. It seems like a gecko bug if that's making a difference, but assuming it works OK for geckoview too, we could take that change. Otherwise we can just mark that test as expected:FAIL for non-e10s on the basis that no one really cares about non-e10s desktop (and file a bug on the issue).

null doesn't seem to stop the errors on geckoview: https://treeherder.mozilla.org/logviewer.html#?job_id=242104869&repo=try

OK, then just mark that test as

expected:
  if not e10s: FAIL

for now

Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/f5d44a3e3a7a
Run wpt against geckoview r=jgraham
https://hg.mozilla.org/integration/autoland/rev/8045a87e5ce6
Change android wpt expectations so they only apply to fennec r=jgraham
https://hg.mozilla.org/integration/autoland/rev/7010f2e26969
Don't set window.opened window to explicitly load about:blank r=jgraham
https://hg.mozilla.org/integration/autoland/rev/c666c0a0d042
Add crash checking for fennec wpt r=jgraham
Pushed by wkocher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/36aad514bc3b
Run wpt against geckoview r=jgraham
https://hg.mozilla.org/integration/autoland/rev/b7eb6d443e0a
Change android wpt expectations so they only apply to fennec r=jgraham
https://hg.mozilla.org/integration/autoland/rev/ed687941a894
Don't set window.opened window to explicitly load about:blank r=jgraham
https://hg.mozilla.org/integration/autoland/rev/03e4ec9e7b85
Add crash checking for fennec wpt r=jgraham
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/16517 for changes under testing/web-platform/tests
Whiteboard: [geckoview:p2] → [geckoview:p2][wptsync upstream]
Depends on: 1547009
Upstream PR merged
Pushed by wkocher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d1f7a0e1e82
Update wpt expectation metadata for geckoview testrunneractivity
https://hg.mozilla.org/integration/autoland/rev/dae0588357c3
Disable some frequent intermittent failing tests on geckoview testrunneractivity
Pushed by wkocher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59314da6bb6b
Fixup some wpt metadata syntax to unbreak wpt tests

Keywords: leave-open

Wes, can we remove the leave-open keyword and close this bug? Are there bugs or meta bugs filed for GeckoView's WPT failures?

Flags: needinfo?(wkocher)

(In reply to Chris Peterson [:cpeterson] from comment #119)

Wes, can we remove the leave-open keyword and close this bug? Are there bugs or meta bugs filed for GeckoView's WPT failures?

Sure, I'll do the rest in followups.

Status: NEW → RESOLVED
Closed: 7 months ago
Flags: needinfo?(wkocher)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → Firefox 68
You need to log in before you can comment on or make changes to this bug.