Element reference is unknown and not stale after a remoteness change since Firefox 110.0
Categories
(Remote Protocol :: Marionette, defect, P1)
Tracking
(firefox-esr102 unaffected, firefox111 wontfix, firefox112 wontfix, firefox113 wontfix, firefox114 wontfix, firefox115 wontfix, firefox116 fixed)
People
(Reporter: xlebnica.booroop, Assigned: whimboo)
References
(Blocks 1 open bug, Regressed 1 open bug, Regression)
Details
(Keywords: regression, Whiteboard: [webdriver:m7], [wptsync upstream][webdriver:relnote])
Attachments
(7 files, 2 obsolete files)
157.73 KB,
image/jpeg
|
Details | |
179.91 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
227.93 KB,
image/svg+xml
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0
Steps to reproduce:
We have the following env:
selenium java 4.7.2
geckodriver 0.32.1
firefox 110.0
After updating firefox from 109.0.1 to 110.0 we have a problem with ExpectedConditions.stalenessOf(element).
Example of code:
WebElement html = seleniumSettings.getWebDriver().findElement(By.tagName("html"));
seleniumSettings.getWebDriver().get("about:blank");
seleniumSettings.getWebDriver().get(page);
new WebDriverWait(seleniumSettings.getWebDriver(), seleniumSettings.getDefaultTimeout())
.withMessage("Element exist.")
.until(ExpectedConditions.stalenessOf(html));
Actual results:
Test break on WebDriverWait with the following error - Caused by: org.openqa.selenium.NoSuchElementException: Unknown element with id...
Expected results:
WebDriverWait should execute correctly
Reporter | ||
Comment 1•9 months ago
|
||
This problem actually only on Linux.
Reporter | ||
Comment 2•9 months ago
|
||
The problem is exactly the same as in Bug 1685951
Reporter | ||
Updated•9 months ago
|
Assignee | ||
Comment 3•9 months ago
|
||
Could you please provide a trace log as generated by geckodriver?
I assume that the originally loaded URL is the same as page
? With that in mind I would say that the navigation to about:blank
is causing the browsing context to be replaced. Navigating to the page
again will again replace the browsing context.
As per latest WebDriver classic changes the remote end needs to raise a no such element
error given that the element isn't known in the new browsing context.
Assignee | ||
Comment 4•9 months ago
|
||
The landing of bug 1692468 has changed this behavior.
Updated•9 months ago
|
Assignee | ||
Comment 5•9 months ago
|
||
It would still be good to get the requested details. But nevertheless we will discuss next week in our triage meeting.
Reporter | ||
Comment 6•8 months ago
|
||
Reporter | ||
Comment 7•8 months ago
|
||
Attached the log file.
Assignee | ||
Comment 8•8 months ago
|
||
So the log shows two interesting scenarios:
- Here an element with the tag name
html
is retrieved from the initially loadedabout:blank
page. Then a navigation toabout:blank
is happening, following by a navigation tohttp://host.docker.internal:8080
. Note that here no remoteness change will happen and as such the element is marked as stale.
1679574453112 webdriver::server DEBUG -> POST /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/element {
"using": "tag name",
"value": "html"
}
[..]
1679574453136 Marionette DEBUG 0 -> [0,7,"WebDriver:Navigate",{"url":"about:blank"}]
1679574453138 RemoteAgent TRACE [8] ProgressListener Start: expectNavigation=false resolveWhenStarted=true unloadTimeout=200 waitForExplicitStart=true
1679574453138 RemoteAgent TRACE [8] ProgressListener Setting unload timer (200ms)
1679574453142 RemoteAgent TRACE [8] ProgressListener Check loading state: isStart=1 isStop=0
1679574453142 RemoteAgent TRACE [8] ProgressListener state=start: about:blank
1679574453142 RemoteAgent TRACE [8] ProgressListener Cleared the unload timer
1679574453142 RemoteAgent TRACE [8] ProgressListener Request to stop listening when navigation started
1679574453142 RemoteAgent TRACE [8] ProgressListener Stop: has error=false
1679574453144 Marionette DEBUG 0 <- [1,7,null,{"value":null}]
1679574453145 webdriver::server DEBUG <- 200 OK {"value":null}
1679574453153 Marionette TRACE [8] MarionetteCommands actor destroyed for window id 4294967297
1679574453154 webdriver::server DEBUG -> POST /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/url {
"url": "http:\u002f\u002fhost.docker.internal:8080"
}
1679574453166 Marionette DEBUG 0 -> [0,8,"WebDriver:Navigate",{"url":"http://host.docker.internal:8080"}]
1679574453167 RemoteAgent TRACE [8] ProgressListener Start: expectNavigation=false resolveWhenStarted=true unloadTimeout=200 waitForExplicitStart=true
1679574453167 RemoteAgent TRACE [8] ProgressListener Setting unload timer (200ms)
1679574453178 RemoteAgent TRACE [8] ProgressListener Check loading state: isStart=1 isStop=0
1679574453178 RemoteAgent TRACE [8] ProgressListener state=start: http://host.docker.internal:8080/
1679574453178 RemoteAgent TRACE [8] ProgressListener Cleared the unload timer
1679574453178 RemoteAgent TRACE [8] ProgressListener Request to stop listening when navigation started
1679574453178 RemoteAgent TRACE [8] ProgressListener Stop: has error=false
1679574453181 Marionette DEBUG 0 <- [1,8,null,{"value":null}]
1679574453183 webdriver::server DEBUG <- 200 OK {"value":null}
1679574453198 webdriver::server DEBUG -> GET /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/element/68833415-3c3a-4346-86c7-882a77a9ced8/enabled
1679574453200 Marionette DEBUG 0 -> [0,9,"WebDriver:IsElementEnabled",{"id":"68833415-3c3a-4346-86c7-882a77a9ced8"}]
1679574453203 Marionette TRACE [8] MarionetteCommands actor created for window id 4294967298
1679574453211 Marionette DEBUG 0 <- [1,9,{"error":"stale element reference","message":"The element reference of <html> is stale; either its node document is not the active document, or it is no longer connected to the DOM","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Er ... arionette/json.sys.mjs:217:10\nreceiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30\n"},null]
1679574453213 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"stale element reference","message":"The element reference of <html> is stale; either its ... ette/json.sys.mjs:217:10\nreceiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30\n"}}
- Here the element will be retrieved from the loaded page
http://host.docker.internal:8080
, Then a navigation toabout:blank
happens, which again doesn't cause a remoteness change. But then the next navigation loadshttp:\u002f\u002fhost.docker.internal:8080\u002fgrid\u002fSimpleGrid.do?ModuleName=ADMIN_PROGRAM
(note the encoded slashes) and that one triggers a remoteness change:
1679574461935 webdriver::server DEBUG -> POST /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/element {
"using": "tag name",
"value": "html"
}
1679574461936 Marionette DEBUG 0 -> [0,64,"WebDriver:FindElement",{"using":"tag name","value":"html"}]
1679574461938 Marionette DEBUG 0 <- [1,64,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"108b4138-1087-4dea-8399-89bc6214e3ec"}}]
1679574461939 webdriver::server DEBUG <- 200 OK {"value":{"element-6066-11e4-a52e-4f735466cecf":"108b4138-1087-4dea-8399-89bc6214e3ec"}}
1679574461948 webdriver::server DEBUG -> POST /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/url {
"url": "about:blank"
}
1679574461949 Marionette DEBUG 0 -> [0,65,"WebDriver:Navigate",{"url":"about:blank"}]
1679574461949 RemoteAgent TRACE [11] ProgressListener Start: expectNavigation=false resolveWhenStarted=true unloadTimeout=200 waitForExplicitStart=true
1679574461949 RemoteAgent TRACE [11] ProgressListener Setting unload timer (200ms)
1679574461953 RemoteAgent TRACE [11] ProgressListener Check loading state: isStart=1 isStop=0
1679574461953 RemoteAgent TRACE [11] ProgressListener state=start: about:blank
1679574461953 RemoteAgent TRACE [11] ProgressListener Cleared the unload timer
1679574461953 RemoteAgent TRACE [11] ProgressListener Request to stop listening when navigation started
1679574461953 RemoteAgent TRACE [11] ProgressListener Stop: has error=false
1679574461954 Marionette DEBUG 0 <- [1,65,null,{"value":null}]
1679574461955 webdriver::server DEBUG <- 200 OK {"value":null}
1679574461959 Marionette TRACE [8] MarionetteCommands actor destroyed for window id 4294967298
1679574461965 webdriver::server DEBUG -> POST /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/url {
"url": "http:\u002f\u002fhost.docker.internal:8080\u002fgrid\u002fSimpleGrid.do?ModuleName=ADMIN_PROGRAM"
}
1679574461966 Marionette TRACE Remoteness change detected. Set new top-level browsing context to 11
1679574461971 Marionette TRACE [11] MarionetteCommands actor destroyed for window id 6442450947
1679574461976 Marionette DEBUG 0 -> [0,66,"WebDriver:Navigate",{"url":"http://host.docker.internal:8080/grid/SimpleGrid.do?ModuleName=ADMIN_PROGRAM"}]
1679574461977 RemoteAgent TRACE [11] ProgressListener Start: expectNavigation=false resolveWhenStarted=true unloadTimeout=200 waitForExplicitStart=true
1679574461977 RemoteAgent TRACE [11] ProgressListener Document already loading about:blank
1679574461978 RemoteAgent TRACE [11] ProgressListener Setting unload timer (200ms)
1679574461988 Marionette WARN Ignoring event 'DOMContentLoaded' because document has an invalid readyState of 'complete'.
1679574462003 RemoteAgent TRACE [11] ProgressListener Check loading state: isStart=0 isStop=16
1679574462011 RemoteAgent TRACE [11] ProgressListener Check loading state: isStart=1 isStop=0
1679574462011 RemoteAgent TRACE [11] ProgressListener state=start: http://host.docker.internal:8080/grid/SimpleGrid.do?ModuleName=ADMIN_PROGRAM
1679574462011 RemoteAgent TRACE [11] ProgressListener Cleared the unload timer
1679574462011 RemoteAgent TRACE [11] ProgressListener Request to stop listening when navigation started
1679574462011 RemoteAgent TRACE [11] ProgressListener Stop: has error=false
1679574462014 Marionette DEBUG 0 <- [1,66,null,{"value":null}]
1679574462021 webdriver::server DEBUG <- 200 OK {"value":null}
1679574462029 webdriver::server DEBUG -> GET /session/3c576275-fd0b-4b2d-9bde-f11c8e7a51f8/element/108b4138-1087-4dea-8399-89bc6214e3ec/enabled
1679574462030 Marionette DEBUG 0 -> [0,67,"WebDriver:IsElementEnabled",{"id":"108b4138-1087-4dea-8399-89bc6214e3ec"}]
1679574462035 RemoteAgent TRACE WebDriverProcessData actor created for PID 1515
1679574462036 Marionette TRACE [11] MarionetteCommands actor created for window id 8589934593
1679574462044 Marionette DEBUG 0 <- [1,67,{"error":"no such element","message":"Unknown element with id 108b4138-1087-4dea-8399-89bc6214e3ec","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Er ... arionette/json.sys.mjs:217:10\nreceiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30\n"},null]
1679574462044 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"no such element","message":"Unknown element with id 108b4138-1087-4dea-8399-89bc6214e3ec" ... ette/json.sys.mjs:217:10\nreceiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30\n"}}
That means that the top-level browsing context got replaced as what I have assumed in my last comment, and that indeed causes the node cache to be empty at this point and not knowing about the already retrieved element. So this can basically happen for each and every navigation.
Assignee | ||
Updated•8 months ago
|
Assignee | ||
Comment 9•8 months ago
|
||
As discussed today getting different states for element references after a navigation might be confusing. James wanted to have a look to understand what's going on and what we can do.
Comment 10•8 months ago
|
||
So I think the key thing here is that in https://w3c.github.io/webdriver/#dfn-get-a-known-element we first call "node reference is known" which checks if the element id has been used on the current navigable. That means if we ever saw the element in the navigable, irrespective of remoteness change, then we should send Stale Element Reference. We only send No Such Element
if the id wasn't ever used in the navigable, or the id doesn't point at an element.
Assignee | ||
Comment 11•8 months ago
|
||
Discussed with James on Matrix. So we basically missed the navigable
part when we did the refactoring on bug 1692468. Given that we do not support Navigable
yet there is no way to store the node cache information in the content process alone. Instead we also need a map in the parent process which at least holds the element or shadow root references in relation to their containing browsing context. In terms of top-level browsing contexts this would be the custom tab id.
To restore the checks we would have to partially revert some code that existed before my refactoring. We should get this fixed sooner than later even we didn't see much reports for it yet.
Assignee | ||
Updated•8 months ago
|
Comment 12•8 months ago
|
||
The severity field is not set for this bug.
:whimboo, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•8 months ago
|
Updated•8 months ago
|
Assignee | ||
Updated•8 months ago
|
Updated•8 months ago
|
Assignee | ||
Comment 13•8 months ago
|
||
I created the upstream PR first to correct the broken tests: https://github.com/web-platform-tests/wpt/pull/39646.
In parallel I'm already going to start working on a fix for Firefox.
Assignee | ||
Comment 14•7 months ago
|
||
Depends on D177488
Assignee | ||
Comment 15•7 months ago
|
||
Depends on D177489
Assignee | ||
Comment 16•7 months ago
|
||
Depends on D177490
Assignee | ||
Comment 17•7 months ago
|
||
Depends on D177491
Updated•7 months ago
|
Comment 18•7 months ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b89608b3c46a [wdspec] Correct stale element errors for cross-origin navigations #39646 r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/2c6d325cb248 [marionette] Add global WebDriverSession map as singleton and allow access for Marionette command parent actor. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/d94b6d6cd713 [marionette] Move WebElement and ShadowRoot serialization/deserialization helpers to Marionette command client actor. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/256239106623 [marionette] Check Navigable's seen nodes map for known nodes. r=webdriver-reviewers,jdescottes
Updated•7 months ago
|
Comment 19•7 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b89608b3c46a
https://hg.mozilla.org/mozilla-central/rev/2c6d325cb248
https://hg.mozilla.org/mozilla-central/rev/d94b6d6cd713
https://hg.mozilla.org/mozilla-central/rev/256239106623
Assignee | ||
Comment 20•7 months ago
|
||
Note that Bug 1822466 - [wdspec] Correct stale element errors for cross-origin navigations
has already been landed upstream so no upstream sync is needed.
Comment 21•7 months ago
|
||
The patch landed in nightly and beta is affected.
:whimboo, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox114
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 22•7 months ago
|
||
The patches are too risky to uplift to beta and need some follow-up work as well to eg. improve the performance. As such we don't want to fix it in Firefox 114.
Comment 23•7 months ago
|
||
Backed out for causing regressions in the upstream wpt tests.
Assignee | ||
Comment 24•7 months ago
|
||
After the backout of the above patches we have to find another solution to get the following scenarios working:
-
An element is retrieved in the child actor and needs to be send back to the client. Therefore a node reference will be created first in the node cache of the child process which identifies the node and is alive as long as the content process exists. The parent actor which receives the serialized data needs to know the WebReference id and store it in the SeenNodes cache for the given navigable (browsing context or browser for top-level).
-
A formerly retrieved element is used by a command. Hereby the given WebReference needs to be checked if it is known within the SeenNodes cache for the given navigable. Without extra processing the command's payload this check would have to be done when deserializing the data in the child actor.
-
(Bug 1274251) A window or frame needs to be serialized in the child actor. While for frames we make use of the browsing context id, for top-level browsing contexts we need a unique id given that the BC can be replaced by eg. cross-origin navigations. As such the parent process holds a singleton called
browserUniqueIds
. We cannot send it along with the command to the child actor given that it could change at any time and Ids might not be up-to-date anymore. Also if no ID exists yet one needs to be created.
- (Bug 1274251) A window or frame reference as formerly serialized is used by a command. For top-level browsing contexts the ID needs to be converted back to the associated browsing context's id. Without extra processing of the command's payload this check might have to be done in the child actor.
Generally we would like to avoid any extra IPC calls if possible which would make the methods async and also have a performance penalty.
Assignee | ||
Comment 25•6 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #24)
- An element is retrieved in the child actor and needs to be send back to the client. Therefore a node reference will be created first in the node cache of the child process which identifies the node and is alive as long as the content process exists. The parent actor which receives the serialized data needs to know the WebReference id and store it in the SeenNodes cache for the given navigable (browsing context or browser for top-level).
We agreed on that additionally returning a list of seen nodeIds to the parent actor is a good way to get the seenNodes map for the current WebDriver session updated.
- A formerly retrieved element is used by a command. Hereby the given WebReference needs to be checked if it is known within the SeenNodes cache for the given navigable. Without extra processing the command's payload this check would have to be done when deserializing the data in the child actor.
This one is more complex and there are two approaches.
a) We pass the seenNodes
list of the current navigable down to the child actor and through the full stack of methods. This has the negative effect that a lot of data might have to be transferred in case of SPA pages.
This approach can be seen with: https://treeherder.mozilla.org/jobs?repo=try&revision=d15508bcd2f458b6336da1aff695712f22e37ff6
b) We let the child actor try finding the node with the given reference within the actual browsing context, and if it is not available only raise a no such node
error. Then in the parent actor we need to intercept this error type and check if the nodeId might have been known but the top-level browsing context has been replaced. If the latter is the case a stale element error
can be raised.
To get this implemented some more work is needed. I'll try to get this implemented and come up with a proposed solution soon.
- (Bug 1274251) A window or frame needs to be serialized in the child actor. While for frames we make use of the browsing context id, for top-level browsing contexts we need a unique id given that the BC can be replaced by eg. cross-origin navigations. As such the parent process holds a singleton called
browserUniqueIds
. We cannot send it along with the command to the child actor given that it could change at any time and Ids might not be up-to-date anymore. Also if no ID exists yet one needs to be created.
- (Bug 1274251) A window or frame reference as formerly serialized is used by a command. For top-level browsing contexts the ID needs to be converted back to the associated browsing context's id. Without extra processing of the command's payload this check might have to be done in the child actor.
These two scenarios are more complex and we would like to defer to a later time over on bug 1274251.
Assignee | ||
Comment 26•6 months ago
|
||
This is basically the other option for Marionette to intercept errors and rewrite them:
https://hg.mozilla.org/try/rev/b36594fd3f9647bc8cac833bf73f45cf698f6b22#l2.22
Updated•6 months ago
|
Updated•6 months ago
|
Assignee | ||
Comment 27•6 months ago
|
||
Depends on D177492
Updated•6 months ago
|
Updated•6 months ago
|
Assignee | ||
Comment 28•6 months ago
|
||
Comment 29•6 months ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0d9194ca2da0 [wdspec] Enhanced serialization and deserialization tests. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/d6bcbd39ecf2 [marionette] Add global WebDriverSession map as singleton and allow access for Marionette command parent actor. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/796d1aa44633 [marionette] Check Navigable's seen nodes map for known nodes. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/06767279ae13 [wdspec] Improve stale element and detached shadow root tests. r=webdriver-reviewers,jdescottes
Failed to create upstream wpt PR due to merge conflicts. This requires fixup from a wpt sync admin.
Updated•6 months ago
|
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/40495 for changes under testing/web-platform/tests
Updated•6 months ago
|
Comment 32•6 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0d9194ca2da0
https://hg.mozilla.org/mozilla-central/rev/d6bcbd39ecf2
https://hg.mozilla.org/mozilla-central/rev/796d1aa44633
https://hg.mozilla.org/mozilla-central/rev/06767279ae13
Comment 33•6 months ago
|
||
Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 34•6 months ago
|
||
We are not going to fix this regression for 115 due to the risks being involved. It's safer to let it ride the trains.
Upstream PR merged by jgraham
Updated•4 months ago
|
Description
•