Open Bug 1761634 Opened 3 years ago Updated 8 days ago

WebDriver:ExecuteScript returns null where no null is possible

Categories

(Remote Protocol :: Marionette, defect, P2)

Firefox 91
defect

Tracking

(Not tracked)

People

(Reporter: 4mr.minj, Unassigned)

References

()

Details

(Keywords: regression)

Attachments

(4 files, 1 obsolete file)

Attached file trace.log

Using Firefox: 91.6.0 ESR

Error is intermittent only: unexpected null is returned from WebDriver:ExecuteScript where no null is theoretically possible.

See line 60 (Marionette request on 1646144672743):

return globalThis?.window?.jQuery?.readyWait ?? -1; returns null.

It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.

Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id

Is this just Bug 1673478?

NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.

This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.

(In reply to Mindaugas <LA-MJ> from comment #0)

See line 60 (Marionette request on 1646144672743):

return globalThis?.window?.jQuery?.readyWait ?? -1; returns null.

It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.

I'm not sure which JS spec you are referring to here but Execute Script can as well return null. Also as I already requested on the github issue the additional log with details about the values for globalThis, globalThis.window etc was not provided. Instead it was claimed Please don't interrogate my code, I know how it works. If you want to see progress on issues that you are filing please provide the information that you have been requested for or at least hand-over a minimized testcase.

Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id

Is this just Bug 1673478?

Probably not by what I see from the log but I wouldn't exclude it for now.

NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.

So at least it might be a regression from the Fission compatibility work.

By that time we had the marionette.actors.enabled preference which could be used to turn on/off the new code. If you could set it to true within the WebDriver capabilities, and then help us testing older Firefox Nightly releases that would be great. I assume the Firefox 83 release works with this setting? for the checks you can use the tool mozregression. It has a -c argument to run any command that you specify that determines if a build is fine or not. Depending on how often it fails/passes you might want to run your code snippet several times.

This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.

Please note that this is the same team here. And we are not unphased but are still waiting for the requested information (see above). Also note that our work is based on priorities and your report is the first one in this vein and there is no chance for us to get it reproduced. So instead of blaming lets work together in getting the issue investigated. This indeed would be more fruitful. Thanks.

Severity: -- → S3
Priority: -- → P3

So finally we come to the root of the communication problem. I guess some of my assumptions are wrong.

  • I assumed that we all know what globalThis and window mean in a browser context.
  • I assumed that ExecuteScript works like a code block wrapped in a IIFE. Why else would one be allowed to use return?
    • Which would all make the meaning of jQuery?.readyWait totally irrelevant to the problem at hand. As according to EcmaScript specification ?? coalesces to the RHS when LHS is null.
    • But in case there is still interest, one can see this readyWait field in jQuery's codebase.

As I've mentioned this is an intermittent and unpredictable problem. It occurs as part of a large testing harness of a proprietary website. Therefore there is no test case to provide.

All of this has lead me to believe all of the relevant information is already provided. Regression testing on nightly builds takes an overwhelming amount of time and effort, especially when you need hours to reproduce the problem, and will have to wait.

What this code is used for is in a routine detecting browser activity (animations, transitions, onready handlers, navigation).

Let me know, what other information is still otherwise missing

Given the script executed, I agree that null can only come from our side here (and so probably from https://searchfox.org/mozilla-central/rev/570f6e5c06b6c8140f53bf104d785a18165212ab/remote/marionette/actors/MarionetteCommandsParent.jsm#346)

This would typically happen if a navigation occurs while the script is executed. Two options:

Can you help us confirm that? Is it possible that a navigation happens early in the test?

Blocks: 1768549
Severity: S3 → --
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 10 → All
Priority: P3 → --
Hardware: x86_64 → All
See Also: → 1773939
Severity: -- → S3
Priority: -- → P3
Whiteboard: [webdriver:backlog]

We might want to think about catching an AbortError already in the executeScript method of the MarionetteCommandsParent actor. As such the client will at least get a valid error response instead of a null value.

We will observe intermittent failures during the next days to see what the logger entry from bug 1773939 will tell us.

The test failure over on bug 1761634 shows that we return null instead of raising an exception.

[task 2022-06-17T22:50:51.153Z] 22:50:51     INFO - PID 18459 | 1655506251152	Marionette	DEBUG	0 -> [0,80,"WebDriver:ExecuteScript",{"args":[],"script":"\n        return window.name;\n        "}]
[task 2022-06-17T22:50:51.161Z] 22:50:51     INFO - PID 18459 | 1655506251159	Marionette	TRACE	[43] Querying "executeScript" failed with AbortError, returning "null" as fallback
[task 2022-06-17T22:50:51.162Z] 22:50:51     INFO - PID 18459 | 1655506251161	Marionette	DEBUG	0 <- [1,80,null,{"value":null}]

As such we should maybe start with Execute Script and special-case the handling so that a JavaScriptError gets returned.

Then it would also be good to know why or at least when the actor has been destroyed. Right now we don't seem to log that as well.

No longer blocks: 1768549
Depends on: 1673478
Product: Testing → Remote Protocol

While investigating navigation issues for bug 1903929 I noticed that this is actually a causing factor. It's also required for bug 1664165 to move the navigation wait logic to the WebProgress Listener.

I'll check how it works when we actually return a JavaScriptError in case of an unexpected navigation and related process change for the content process stops the script execution. This means that if we change the logic we need to update the specification as well, see https://github.com/w3c/webdriver/issues/1708.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

The listed WebDriver classic spec issue will be discussed and implemented via bug 1673478, which no longer blocks.

Points: --- → 2
No longer depends on: 1673478
Priority: P3 → P2
Whiteboard: [webdriver:backlog] → [webdriver:m12]
Points: 2 → 3
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d87f2bf304d1
[taskcluster] Enhance task timeout for wpt canvas jobs for asan builds and split into tree chunks. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/2d8046b0452a
[marionette] Raise JavaScriptError if MarionetteCommands actor gets destroyed. r=webdriver-reviewers,Sasha,jdescottes
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47705 for changes under testing/web-platform/tests
Whiteboard: [webdriver:m12] → [webdriver:m12], [wptsync upstream]

Comment on attachment 9420151 [details]
Bug 1761634 - Update test grouping logic for web-platform-tests-canvas

Revision D219761 was moved to bug 1522790. Setting attachment 9420151 [details] to obsolete.

Attachment #9420151 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 27 days ago
Resolution: --- → FIXED
Target Milestone: --- → 131 Branch
Upstream PR merged by moz-wptsync-bot
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47722 for changes under testing/web-platform/tests

Backed out for causing Android 5.0 build bustages

Status: RESOLVED → REOPENED
Flags: needinfo?(hskupin)
Resolution: FIXED → ---
Target Milestone: 131 Branch → ---

Oh wow, I haven't thought that we still run some Fennec specific scripts for Marionette in mozharness. So in the android_emulator_pgo.py file the Marionette.quit() isn't used but some custom script evaluation which indeed fails. It should really use Marionette's in_app shutdown logic here.

https://searchfox.org/mozilla-central/rev/64f196824745c5db5ef7bb23725cfb9a41586149/testing/mozharness/scripts/android_emulator_pgo.py#248-261

Here a try build for the patch fixing this issue:
https://treeherder.mozilla.org/jobs?repo=try&revision=26b8e42c9135eec39e01a0c0d7cb50ff4c8fb29d

Flags: needinfo?(hskupin)
Upstream PR merged by moz-wptsync-bot
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51d7284bb846
[taskcluster] Enhance task timeout for wpt canvas jobs for asan builds and split into tree chunks. r=jmaher
https://hg.mozilla.org/integration/autoland/rev/e721121e5f57
[marionette] Raise JavaScriptError if MarionetteCommands actor gets destroyed. r=webdriver-reviewers,Sasha,jdescottes
https://hg.mozilla.org/integration/autoland/rev/18b37181f365
[mozharness] Use driver.quit() instead of custom shutdown script evaluation. r=jdescottes
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47738 for changes under testing/web-platform/tests
Status: REOPENED → RESOLVED
Closed: 27 days ago26 days ago
Resolution: --- → FIXED
Target Milestone: --- → 131 Branch
Upstream PR merged by moz-wptsync-bot

The landing of this patch caused several issues with WebDriver clients. I would suggest that we backout this patch only and leave the other two in tree. Thanks.

Flags: needinfo?(sheriffs)
Keywords: regression

Backed out for causing several issues with WebDriver clients.

Backout link: https://hg.mozilla.org/integration/autoland/rev/18b37181f365

Flags: needinfo?(sheriffs) → needinfo?(hskupin)
Backout by abutkovits@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/7d7e1f8f7f85
Backed out changeset 18b37181f365 for causing several issues with WebDriver clients. a=backout

Thanks for the backout. As discussed we have to first find a proper error type to raise if the navigable goes away, and clients can handle such a situation propertly.

Also as bug 1914525 and bug 1914582 have shown WebDriver clients have issues that we need to solve before re-landing a patch for this bug. Maybe we should consider to have a dedicated preference at the beginning to ensure that clients can adapt to this change.

Status: RESOLVED → REOPENED
Flags: needinfo?(hskupin)
Resolution: FIXED → ---
No longer regressed by: 1914582
Regressions: 1914582

FYI I referenced the wrong commit for backout. I discussed with sheriffs and they will correct it. Which means re-land the wrongly backed-out commit and finally backout https://hg.mozilla.org/integration/autoland/rev/e721121e5f57.

Pushed by abutkovits@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/8cab9e5ad8eb
[mozharness] Use driver.quit() instead of custom shutdown script evaluation. r=jdescottes
https://hg.mozilla.org/mozilla-central/rev/c9a192c047cd
Backed out changeset e721121e5f57 for causing several issues with WebDriver clients. a=backout
Status: REOPENED → RESOLVED
Closed: 26 days ago15 days ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 131 Branch → ---
No longer blocks: 1914354
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/47922 for changes under testing/web-platform/tests
Upstream PR was closed without merging
Upstream PR merged by moz-wptsync-bot

We should discuss this bug in our next triage meeting.

Whiteboard: [webdriver:m12], [wptsync upstream] → [webdriver:triage][wptsync upstream]

I'm currently not working on this bug. We will have to figure out some possibilities for bug 1673478 first to actually make progress on this bug.

Assignee: hskupin → nobody
No longer blocks: 1522790, 1664165, 1903929
Status: REOPENED → NEW
Points: 3 → ---
See Also: → 1673478
Whiteboard: [webdriver:triage][wptsync upstream]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: