Closed Bug 1684827 Opened 3 years ago Closed 3 years ago

Sometimes a NoSuchElement error is returned instead of a StaleElement error

Categories

(Remote Protocol :: Marionette, defect, P2)

Firefox 84
defect

Tracking

(Fission Milestone:Future, firefox-esr78 unaffected, firefox84 wontfix, firefox85 fixed, firefox86 fixed)

RESOLVED FIXED
86 Branch
Fission Milestone Future
Tracking Status
firefox-esr78 --- unaffected
firefox84 --- wontfix
firefox85 --- fixed
firefox86 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Regression, )

Details

(Keywords: regression, Whiteboard: [marionette-fission-reserve], [wptsync upstream])

Attachments

(2 files, 1 obsolete file)

There are cases with the ReferenceStore for elements when accessing a formerly existent element throws a NoSuchElement error instead of StaleElement.

A simple Marioniette test like the following shows the problem after a couple of iterations:

        self.marionette.navigate(inline(iframe("<p>foo")))
        frame = self.marionette.find_element(By.TAG_NAME, "iframe")

        while True:
            self.marionette.refresh()

            try:
                self.marionette.switch_to_frame(frame)
            except errors.StaleElementException:
                pass
            except Exception as e:
                raise e

Blocks: 1677605
Blocks: 1677005

Tracking marionette-fission-reserve bugs for Fission Future (post-MVP).

Fission Milestone: --- → Future

I'll have a look into it.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

This certainly smells like a bug in the ContentDOMReference implementation. The element identifier and the browsing context that we pass in is always the same. But after around 15s the weak entry of the element is just gone from the gRegistry mappings. Hereby it doesn't matter how often I reload the page, a single one is enough. Here some example output with a sleep of 5s between each reload and element retrieval:

1609878652325	Marionette	DEBUG	2 -> [0,9,"WebDriver:Navigate",{"url":"data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E"}]
1609878652328	Marionette	TRACE	[10] MarionetteCommands actor created for window id 2147483649
1609878652331	Marionette	TRACE	[10] MarionetteEvents actor created for window id 2147483649
1609878652331	Marionette	TRACE	Received event beforeunload for about:blank
1609878652339	Marionette	TRACE	Received event pagehide for about:blank
1609878652342	Marionette	TRACE	[10] MarionetteEvents actor created for window id 2147483650
1609878652373	Marionette	TRACE	Received event DOMContentLoaded for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652379	Marionette	TRACE	[2147483649] MarionetteEvents actor created for window id 2147483652
1609878652407	Marionette	TRACE	Received event pageshow for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652407	Marionette	DEBUG	2 <- [1,9,null,{"value":null}]
console.warn: SearchSettings: "get: No settings file exists, new profile?" (new Error("", "(unknown module)"))
1609878652412	Marionette	DEBUG	2 -> [0,10,"WebDriver:FindElement",{"using":"tag name","value":"iframe"}]
1609878652413	Marionette	TRACE	[10] MarionetteCommands actor created for window id 2147483650

Iteration: 2021-01-05 21:30:52.423686

1609878652422	Marionette	DEBUG	2 <- [1,10,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"335db04a-3d98-f444-ba5e-af13e4ad4d22"}}]
1609878652426	Marionette	DEBUG	2 -> [0,11,"WebDriver:Refresh",{}]
1609878652426	Marionette	TRACE	Received event beforeunload for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652428	Marionette	TRACE	Received event pagehide for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652432	Marionette	TRACE	[10] MarionetteEvents actor created for window id 2147483653
1609878652453	Marionette	TRACE	Received event DOMContentLoaded for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652466	Marionette	TRACE	[2147483650] MarionetteEvents actor created for window id 2147483655
1609878652475	Marionette	TRACE	Received event pageshow for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878652475	Marionette	DEBUG	2 <- [1,11,null,{"value":null}]
1609878652480	Marionette	DEBUG	2 -> [0,12,"WebDriver:SwitchToFrame",{"focus":true,"element":"335db04a-3d98-f444-ba5e-af13e4ad4d22"}]
1609878652482	Marionette	TRACE	[10] MarionetteCommands actor created for window id 2147483653
*** obj: 335db04a-3d98-f444-ba5e-af13e4ad4d22 - window: 10
 ** context: 10 - id: 0.7007353109397114
*** mappings: [object Object]
*** weakReference: [xpconnect wrapped xpcIJSWeakReference]
1609878652486	Marionette	DEBUG	2 <- [1,12,{"error":"stale element reference","message":"The element reference of <iframe src=\"data:text/html;charset=utf-8,%3Cp% ... tte/content/evaluate.js:254:29\nreceiveMessage@chrome://marionette/content/actors/MarionetteCommandsChild.jsm:70:29\n"},null]

Iteration: 2021-01-05 21:30:57.494487

1609878657497	Marionette	DEBUG	2 -> [0,13,"WebDriver:Refresh",{}]
1609878657498	Marionette	TRACE	Received event beforeunload for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878657502	Marionette	TRACE	Received event pagehide for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878657505	Marionette	TRACE	[10] MarionetteEvents actor created for window id 2147483656
1609878657510	Marionette	TRACE	Received event DOMContentLoaded for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878657512	Marionette	TRACE	[2147483651] MarionetteEvents actor created for window id 2147483658
1609878657516	Marionette	TRACE	Received event pageshow for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878657516	Marionette	DEBUG	2 <- [1,13,null,{"value":null}]
1609878657539	Marionette	DEBUG	2 -> [0,14,"WebDriver:SwitchToFrame",{"focus":true,"element":"335db04a-3d98-f444-ba5e-af13e4ad4d22"}]
1609878657540	Marionette	TRACE	[10] MarionetteCommands actor created for window id 2147483656
*** obj: 335db04a-3d98-f444-ba5e-af13e4ad4d22 - window: 10
 ** context: 10 - id: 0.7007353109397114
*** mappings: [object Object]
*** weakReference: [xpconnect wrapped xpcIJSWeakReference]
1609878657542	Marionette	DEBUG	2 <- [1,14,{"error":"stale element reference","message":"The element reference of <iframe src=\"data:text/html;charset=utf-8,%3Cp% ... tte/content/evaluate.js:254:29\nreceiveMessage@chrome://marionette/content/actors/MarionetteCommandsChild.jsm:70:29\n"},null]

Iteration: 2021-01-05 21:31:02.549158

1609878662551	Marionette	DEBUG	2 -> [0,15,"WebDriver:Refresh",{}]
1609878662552	Marionette	TRACE	Received event beforeunload for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878662555	Marionette	TRACE	Received event pagehide for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878662558	Marionette	TRACE	[10] MarionetteEvents actor created for window id 2147483659
1609878662562	Marionette	TRACE	Received event DOMContentLoaded for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878662565	Marionette	TRACE	[2147483652] MarionetteEvents actor created for window id 2147483661
1609878662571	Marionette	TRACE	Received event pageshow for data:text/html;charset=utf-8,%3Ciframe%20src%3D%27data%3Atext/html%3Bcharset%3Dutf-8%2C%253Cp%253Efoo%27%3E%3C/iframe%3E
1609878662571	Marionette	DEBUG	2 <- [1,15,null,{"value":null}]
1609878662577	Marionette	DEBUG	2 -> [0,16,"WebDriver:SwitchToFrame",{"focus":true,"element":"335db04a-3d98-f444-ba5e-af13e4ad4d22"}]
1609878662578	Marionette	TRACE	[10] MarionetteCommands actor created for window id 2147483659
*** obj: 335db04a-3d98-f444-ba5e-af13e4ad4d22 - window: 10
 ** context: 10 - id: 0.7007353109397114
*** mappings: [object Object]
*** weakReference: undefined
1609878662579	Marionette	DEBUG	2 <- [1,16,{"error":"no such element","message":"Web element reference not seen before: {\"element-6066-11e4-a52e-4f735466cecf\":\ ... tte/content/evaluate.js:254:29\nreceiveMessage@chrome://marionette/content/actors/MarionetteCommandsChild.jsm:70:29\n"},null]

Maybe due to the reload the weak reference get eventually garbage collected? Maybe we should not assume that any element will be kept in the registry after a reload even if the browsing context doesn't change?

Kris or Nika can you help?

Flags: needinfo?(nika)
Flags: needinfo?(kmaglione+bmo)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

Maybe due to the reload the weak reference get eventually garbage collected? Maybe we should not assume that any element will be kept in the registry after a reload even if the browsing context doesn't change?

I haven't looked at the code yet, but if you're using a weak reference you definitely shouldn't assume that it will stay valid indefinitely. If a GC happens and you have a weak reference laying around it could be collected at any time, even if no navigation occurred.

You might be observing this because we often trigger GCs when a navigation occurs, which could be causing your weak references to disappear.

Flags: needinfo?(nika) → needinfo?(hskupin)

Without delaying the reload I can run about 300 iterations (reloads) without seeing the reference garbage collected. It always happens after around 15s. Here how the mapping is built-up:

https://searchfox.org/mozilla-central/rev/a0ccd492719b1ad2106f6456549be62a76f45acb/toolkit/modules/ContentDOMReference.jsm#90-94

So I wonder what's the benefit of ContentDOMRefernce when element references cannot be used over longer period of times.

Note that without navigation and forcing a GC/CC or minimizing the memory via about:memory doesn't cause the weak reference to go away. But having a single navigation / reload of the page is enough to loose the reference.

So maybe due to the reload all element references of the old document are getting garbage collected.

Flags: needinfo?(hskupin)

I think that we should not count on a possible null result of ContentDOMReference.resolve(). It could mean various things. Maybe we just handle it on our own, and not trigger a no such element error here:

https://searchfox.org/mozilla-central/rev/a0ccd492719b1ad2106f6456549be62a76f45acb/testing/marionette/element.js#817-818

Means as long as we have a valid reference id in our own Reference store (which lives in the parent process) for that specific element, we should always assume that it's stale in case of a null response.

I will have a closer look at that code tomorrow.

Flags: needinfo?(kmaglione+bmo)

Looks like we would have to partially backout the following changeset:
https://hg.mozilla.org/mozilla-central/rev/adf48383eccb

Lets see which implications that has.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #8)

Looks like we would have to partially backout the following changeset:
https://hg.mozilla.org/mozilla-central/rev/adf48383eccb

Running the tests locally with that changeset backed out works fine. As such this is a regression from bug 1665718 in Firefox 84.

Lets see how it looks on try:
https://treeherder.mozilla.org/jobs?repo=try&revision=bbb6a41f62a3ca31448d43709aa8bdd1af7c7756

I will update the developer release notes on MDN to add this known issue later today:
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/84#WebDriver_conformance_Marionette

There is actually a single failing test:
https://searchfox.org/mozilla-central/rev/a0ccd492719b1ad2106f6456549be62a76f45acb/testing/web-platform/tests/webdriver/tests/switch_to_window/switch.py#71-78

The the assumption that a NoSuchElement exception has to be raised is actually wrong. Note that Chrome and Safari are also failing here.

The WebDriver specification says:

An element is stale if its node document is not the active document or if its context object is not connected.

That means in case of switching tabs, the node document is not the active document anymore, and as such the first part of the sentence applies.

The only question that comes up now is how long do we have to keep the element references in our internal store? Does closing a tab or a window qualify that those can be removed / garbage collected? In this term the WebDriver spec isn't clear. Note that if we cannot clean up the element store long running tests would cause a significant memory increase.

James, what's your take here?

Flags: needinfo?(james)

We can definitely clean up elements from the internal store when the tab or windos is closed. From the spec:

Each browsing context has an associated list of known elements. When the browsing context is discarded, the list of known elements is discarded along with it.

So the lifetime of the connected elements is supposed to be tied to the lifetime of the browsing context. I think in detail the spec is incomplete here because it doesn't consider navigation. We should probably say that each Document has a list of known elements, which is cleaned up when the document is discarded.

Flags: needinfo?(james)

(In reply to James Graham [:jgraham] from comment #13)

So the lifetime of the connected elements is supposed to be tied to the lifetime of the browsing context. I think in detail the spec is incomplete here because it doesn't consider navigation. We should probably say that each Document has a list of known elements, which is cleaned up when the document is discarded.

Note that for remoteness changes like navigations between website -> about:robots -> website the top-level browsing context will be replaced. As such we would inappropriately remove elements. As discussed on Matrix the spec isn't perfect and needs a change that would distinct a tab from a browsing context.

As such we should keep our current behavior in cleaning up the element store when a tab/window gets closed.

Attachment #9195575 - Attachment is obsolete: true

Element references are per browsing context. As such
elements as found within a frame are not existent in
any parent browsing context, and when retrieving these
a "no such element" error has to be returned.

Depends on D100878

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/03ea34337a2d
[marionette] Treat a returned null value from ContentDOMReference.resolve() as stale element. r=marionette-reviewers,jdescottes,jgraham
https://hg.mozilla.org/integration/autoland/rev/697e1305e91b
[wdspec] Fix switch to frame tests for "no such element" error instead of "stale element reference". r=webdriver-reviewers,jgraham
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/27115 for changes under testing/web-platform/tests
Whiteboard: [marionette-fission-reserve] → [marionette-fission-reserve], [wptsync upstream]
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
Upstream PR merged by moz-wptsync-bot

Comment on attachment 9195574 [details]
Bug 1684827 - [marionette] Treat a returned null value from ContentDOMReference.resolve() as stale element.

Beta/Release Uplift Approval Request

  • User impact if declined: WebDriver tests have a relatively high likelihood to fail when trying checking for stale elements after a page navigation.

Even with a workaround by setting marionette.actors.enabled to false we would really like to keep users with our Fission implementation running.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Patch includes a check for the browsing context the expected element has to exist in. If it's different it throws the NoSuchElement exception now.
  • String changes made/needed:
Attachment #9195574 - Flags: approval-mozilla-beta?
Attachment #9195938 - Flags: approval-mozilla-beta?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #21)

Beta/Release Uplift Approval Request

  • User impact if declined: WebDriver tests have a relatively high likelihood to fail when trying checking for stale elements after a page navigation.

Even with a workaround by setting marionette.actors.enabled to false we would really like to keep users with our Fission implementation running.

I'm afraid I don't understand what the above means in practical terms...

Flags: needinfo?(hskupin)

Element references as retrieved via WebDriver have to be reported as stale when elements aren't part of the DOM tree anymore eg. after a navigation. Here the bug is that the error as returned is No Such Element, which should only be thrown when retrieving non-existent elements, or trying to interact with elements from a different browsing context.

That means that automated tests that navigate pages, and check formerly retrieved elements are likely to fail at any time, just when the garbage collector runs. This is problematic because people use it eg. to check when a navigation caused a page to unload.

Flags: needinfo?(hskupin)

OK, and what is the "fission implementation" reference about?

It's a new implementation of Marionette by using JSWindowActors. And with Firefox 84 we enabled it by default. Note that this implementation is used all the time, and not only when Fission is enabled.

Comment on attachment 9195574 [details]
Bug 1684827 - [marionette] Treat a returned null value from ContentDOMReference.resolve() as stale element.

approved for 85.0b8

Attachment #9195574 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9195938 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

The MDN release notes update will happen on: https://github.com/mdn/content/pull/1194

The issue is still actual in version 86.0a1 (2021-01-21)

On an attempt to get the text of an element in a closed iframe Firefox returns a response with status "no such element" (status code 7) and the message Web element reference not seen before: {"element-6066-11e4-a52e-4f735466cecf":"3fe368f4-70b6-034d-b90f-d40ae8ba6578"}

James, can you please have a look? Should we not return a no such element error but stale reference in case the element is from a sub browsing context? As you said in comment 13 do we need an update of the WebDriver spec? Any chance that you can get this clarified this week? I'm happy to fix when I'm back, and if no-one else can do it in the meantime.

Flags: needinfo?(james)

I'd like to see a test case for the problematic scenario, because I"m not sure I understand. If you are trying to get an element that was defined in a browsing context other than the active one you always get No Such Element. That would include an iframe child of the active browsing context. If the iframe is the active context but has been removed then in that case you should get Stale Element. I think the spec makes sense here.

Flags: needinfo?(james)

Given by David a couple days ago the failing test should be that one:

https://github.com/SeleniumHQ/selenium/blob/7286a92440182064ef3496538b4a5c63bfed5ef0/java/client/test/org/openqa/selenium/ElementFindingTest.java#L780-L787

Here I don't see that the frame is removed, but the frame's element is checked from the parent browsing context.

Alexei, is that correct?

Flags: needinfo?(barancev)

Sorry, my bad, I forgot to attach a code sample, and my wording was wrong too :(

Yes, the failing tests is testAnElementFoundInADifferentFrameIsStale. The frame that contains the target element is not closed, we've just switched to the parent context.

According to the specification:
"An element is stale if its node document is not the active document or if its context object is not connected."

In this case we hit the first part of this rule: the element's node document is not the active document.

Flags: needinfo?(barancev)

Reading the specification today I've found this passage:

To get a known element with argument reference, run the following steps:

  • Let element be the item in the current browsing context’s list of known elements for which the web element reference is equal to reference, if such an element exists. Otherwise return error with error code no such element.

This implies that if we pass an ID of an element that does not belong to the current context the driver should return "no such element" error and the current behavior is correct. The referenced element is not stale. It just belongs to another context and it can be used if we switch to its context.

So it must be a broken test, not a regression. And also there must be a bug in other drivers.

See Also: → 1689949

(In reply to Alexei Barantsev from comment #35)

To get a known element with argument reference, run the following steps:

  • Let element be the item in the current browsing context’s list of known elements for which the web element reference is equal to reference, if such an element exists. Otherwise return error with error code no such element.

Ok, this sounds good, and that we didn't cause another regression with the patch.

So it must be a broken test, not a regression. And also there must be a bug in other drivers.

As it looks like all the drivers are /were affected. In our case we also didn't obey the above definition from the spec, and always returned a stale reference error until Firefox 84. Starting with Firefox 85 we differentiate correctly now.

I filed bug 1689949 so that we can get proper wdspec tests added when switching frames.

we are still seeing an issue in Firefox 85 that appears to be exactly this (despite the fact it is marked as resolved for that version).

Perhaps there is something else that is as yet unreported?
repro steps in https://github.com/SeleniumHQ/selenium/issues/8929

removed this comment as I think it was referring to something else.

There is nothing we can do as long as no trace log is provided. I would suggest that you please file a new bug so we can investigate / discuss over there.

filed https://github.com/mozilla/geckodriver/issues/1839 (am not not 100% clear on the relationship between this tracker and the github one, if I need to file something here please let me know).

Blocks: 1690308
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: