WebDriver:ExecuteScript returns null where no null is possible
Categories
(Remote Protocol :: Marionette, defect, P2)
Tracking
(Not tracked)
People
(Reporter: 4mr.minj, Unassigned)
References
()
Details
(Keywords: regression)
Attachments
(4 files, 1 obsolete file)
Using Firefox: 91.6.0 ESR
Error is intermittent only: unexpected null is returned from WebDriver:ExecuteScript
where no null is theoretically possible.
See line 60 (Marionette request on 1646144672743
):
return globalThis?.window?.jQuery?.readyWait ?? -1;
returns null
.
It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.
Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id
Is this just Bug 1673478?
NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.
This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.
Comment 1•3 years ago
|
||
(In reply to Mindaugas <LA-MJ> from comment #0)
See line 60 (Marionette request on
1646144672743
):
return globalThis?.window?.jQuery?.readyWait ?? -1;
returnsnull
.It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.
I'm not sure which JS spec you are referring to here but Execute Script
can as well return null
. Also as I already requested on the github issue the additional log with details about the values for globalThis
, globalThis.window
etc was not provided. Instead it was claimed Please don't interrogate my code, I know how it works.
If you want to see progress on issues that you are filing please provide the information that you have been requested for or at least hand-over a minimized testcase.
Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id
Is this just Bug 1673478?
Probably not by what I see from the log but I wouldn't exclude it for now.
NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.
So at least it might be a regression from the Fission compatibility work.
By that time we had the marionette.actors.enabled
preference which could be used to turn on/off the new code. If you could set it to true
within the WebDriver capabilities, and then help us testing older Firefox Nightly releases that would be great. I assume the Firefox 83 release works with this setting? for the checks you can use the tool mozregression. It has a -c
argument to run any command that you specify that determines if a build is fine or not. Depending on how often it fails/passes you might want to run your code snippet several times.
This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.
Please note that this is the same team here. And we are not unphased but are still waiting for the requested information (see above). Also note that our work is based on priorities and your report is the first one in this vein and there is no chance for us to get it reproduced. So instead of blaming lets work together in getting the issue investigated. This indeed would be more fruitful. Thanks.
Updated•3 years ago
|
Reporter | ||
Comment 2•3 years ago
|
||
So finally we come to the root of the communication problem. I guess some of my assumptions are wrong.
- I assumed that we all know what globalThis and window mean in a browser context.
- I assumed that
ExecuteScrip
t works like a code block wrapped in a IIFE. Why else would one be allowed to usereturn
?- Which would all make the meaning of jQuery?.readyWait totally irrelevant to the problem at hand. As according to EcmaScript specification
??
coalesces to the RHS when LHS is null. - But in case there is still interest, one can see this
readyWait
field in jQuery's codebase.
- Which would all make the meaning of jQuery?.readyWait totally irrelevant to the problem at hand. As according to EcmaScript specification
As I've mentioned this is an intermittent and unpredictable problem. It occurs as part of a large testing harness of a proprietary website. Therefore there is no test case to provide.
All of this has lead me to believe all of the relevant information is already provided. Regression testing on nightly builds takes an overwhelming amount of time and effort, especially when you need hours to reproduce the problem, and will have to wait.
What this code is used for is in a routine detecting browser activity (animations, transitions, onready handlers, navigation).
Let me know, what other information is still otherwise missing
Comment 3•3 years ago
|
||
Given the script executed, I agree that null
can only come from our side here (and so probably from https://searchfox.org/mozilla-central/rev/570f6e5c06b6c8140f53bf104d785a18165212ab/remote/marionette/actors/MarionetteCommandsParent.jsm#346)
This would typically happen if a navigation occurs while the script is executed. Two options:
- either the script is executed while an explicit navigation happens
- or we don't wait correctly for the load of the initial page. In this case, the fix from https://bugzilla.mozilla.org/show_bug.cgi?id=1739008 might help (will be in esr 91.8)
Can you help us confirm that? Is it possible that a navigation happens early in the test?
Updated•2 years ago
|
Updated•2 years ago
|
Comment 4•2 years ago
|
||
We might want to think about catching an AbortError
already in the executeScript
method of the MarionetteCommandsParent
actor. As such the client will at least get a valid error response instead of a null
value.
We will observe intermittent failures during the next days to see what the logger entry from bug 1773939 will tell us.
Comment 5•2 years ago
|
||
The test failure over on bug 1761634 shows that we return null
instead of raising an exception.
[task 2022-06-17T22:50:51.153Z] 22:50:51 INFO - PID 18459 | 1655506251152 Marionette DEBUG 0 -> [0,80,"WebDriver:ExecuteScript",{"args":[],"script":"\n return window.name;\n "}]
[task 2022-06-17T22:50:51.161Z] 22:50:51 INFO - PID 18459 | 1655506251159 Marionette TRACE [43] Querying "executeScript" failed with AbortError, returning "null" as fallback
[task 2022-06-17T22:50:51.162Z] 22:50:51 INFO - PID 18459 | 1655506251161 Marionette DEBUG 0 <- [1,80,null,{"value":null}]
As such we should maybe start with Execute Script
and special-case the handling so that a JavaScriptError gets returned.
Then it would also be good to know why or at least when the actor has been destroyed. Right now we don't seem to log that as well.
Updated•2 years ago
|
Comment 6•29 days ago
|
||
While investigating navigation issues for bug 1903929 I noticed that this is actually a causing factor. It's also required for bug 1664165 to move the navigation wait logic to the WebProgress Listener.
I'll check how it works when we actually return a JavaScriptError
in case of an unexpected navigation and related process change for the content process stops the script execution. This means that if we change the logic we need to update the specification as well, see https://github.com/w3c/webdriver/issues/1708.
Comment 7•28 days ago
|
||
Comment 8•28 days ago
|
||
Updated•28 days ago
|
Comment 9•28 days ago
|
||
The listed WebDriver classic spec issue will be discussed and implemented via bug 1673478, which no longer blocks.
Updated•28 days ago
|
Comment 10•27 days ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d87f2bf304d1 [taskcluster] Enhance task timeout for wpt canvas jobs for asan builds and split into tree chunks. r=jmaher https://hg.mozilla.org/integration/autoland/rev/2d8046b0452a [marionette] Raise JavaScriptError if MarionetteCommands actor gets destroyed. r=webdriver-reviewers,Sasha,jdescottes
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47705 for changes under testing/web-platform/tests
Comment 12•27 days ago
|
||
Depends on D219657
Comment 13•27 days ago
|
||
Comment on attachment 9420151 [details]
Bug 1761634 - Update test grouping logic for web-platform-tests-canvas
Revision D219761 was moved to bug 1522790. Setting attachment 9420151 [details] to obsolete.
Comment 14•27 days ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d87f2bf304d1
https://hg.mozilla.org/mozilla-central/rev/2d8046b0452a
Upstream PR merged by moz-wptsync-bot
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47722 for changes under testing/web-platform/tests
Comment 17•27 days ago
|
||
Backed out for causing Android 5.0 build bustages
Comment 18•27 days ago
•
|
||
Oh wow, I haven't thought that we still run some Fennec specific scripts for Marionette in mozharness. So in the android_emulator_pgo.py
file the Marionette.quit()
isn't used but some custom script evaluation which indeed fails. It should really use Marionette's in_app
shutdown logic here.
Here a try build for the patch fixing this issue:
https://treeherder.mozilla.org/jobs?repo=try&revision=26b8e42c9135eec39e01a0c0d7cb50ff4c8fb29d
Comment 19•27 days ago
|
||
Backout merged in Central: https://hg.mozilla.org/mozilla-central/rev/cb20a055e1a1bd086c54dff2072b169b1182352b
Upstream PR merged by moz-wptsync-bot
Comment 21•27 days ago
|
||
Comment 22•26 days ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/51d7284bb846 [taskcluster] Enhance task timeout for wpt canvas jobs for asan builds and split into tree chunks. r=jmaher https://hg.mozilla.org/integration/autoland/rev/e721121e5f57 [marionette] Raise JavaScriptError if MarionetteCommands actor gets destroyed. r=webdriver-reviewers,Sasha,jdescottes https://hg.mozilla.org/integration/autoland/rev/18b37181f365 [mozharness] Use driver.quit() instead of custom shutdown script evaluation. r=jdescottes
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47738 for changes under testing/web-platform/tests
Comment 24•26 days ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/51d7284bb846
https://hg.mozilla.org/mozilla-central/rev/e721121e5f57
https://hg.mozilla.org/mozilla-central/rev/18b37181f365
Upstream PR merged by moz-wptsync-bot
Comment 26•15 days ago
|
||
The landing of this patch caused several issues with WebDriver clients. I would suggest that we backout this patch only and leave the other two in tree. Thanks.
Comment 27•15 days ago
|
||
Backed out for causing several issues with WebDriver clients.
Backout link: https://hg.mozilla.org/integration/autoland/rev/18b37181f365
Comment 28•15 days ago
|
||
Backout by abutkovits@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/7d7e1f8f7f85 Backed out changeset 18b37181f365 for causing several issues with WebDriver clients. a=backout
Comment 29•15 days ago
|
||
Thanks for the backout. As discussed we have to first find a proper error type to raise if the navigable goes away, and clients can handle such a situation propertly.
Also as bug 1914525 and bug 1914582 have shown WebDriver clients have issues that we need to solve before re-landing a patch for this bug. Maybe we should consider to have a dedicated preference at the beginning to ensure that clients can adapt to this change.
Updated•15 days ago
|
Comment 30•15 days ago
|
||
FYI I referenced the wrong commit for backout. I discussed with sheriffs and they will correct it. Which means re-land the wrongly backed-out commit and finally backout https://hg.mozilla.org/integration/autoland/rev/e721121e5f57.
Comment 31•15 days ago
|
||
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/8cab9e5ad8eb [mozharness] Use driver.quit() instead of custom shutdown script evaluation. r=jdescottes https://hg.mozilla.org/mozilla-central/rev/c9a192c047cd Backed out changeset e721121e5f57 for causing several issues with WebDriver clients. a=backout
Comment 32•15 days ago
|
||
bugherder |
Updated•15 days ago
|
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47922 for changes under testing/web-platform/tests
Upstream PR was closed without merging
Upstream PR merged by moz-wptsync-bot
Comment 36•11 days ago
|
||
We should discuss this bug in our next triage meeting.
Comment 37•8 days ago
|
||
I'm currently not working on this bug. We will have to figure out some possibilities for bug 1673478 first to actually make progress on this bug.
Description
•