Closed Bug 1690308 Opened 3 years ago Closed 3 years ago

NoSuchElement error instead of StaleElement error returned during / after a cross-group navigation

Categories

(Remote Protocol :: Marionette, defect, P1)

Firefox 84
defect

Tracking

(firefox85 wontfix, firefox86 wontfix, firefox87 fixed)

RESOLVED FIXED
87 Branch
Tracking Status
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

()

Details

(Whiteboard: [not-a-fission-bug])

Attachments

(1 file)

Follow-up from bug 1684827.

When checking for element staleness during / after a cross-group navigation a NoSuchElement error is returned instead of a StaleElementReference error. The reason for that is the new browsing context id the top-level browsing context gets. For details see bug 1674329.

These changes of the browsing context including its id can only happen for the top-level browsing context, and not for child browsing contexts. As such I didn't notice this problem on bug 1684827.

When implementing the fix we have to make sure to add some wdspec tests that cover this scenario.

Thank you James Nord for reporting the issue!

The trace log as provided on the github issue is interesting:

Marionette  TRACE   [8589934593] MarionetteEvents actor created for window id 8589934595
Marionette  TRACE   Received event beforeunload for http://127.0.0.1:60596/job/tidy_broccoli/configure
Marionette  TRACE   Received DOM event click for [object HTMLButtonElement]
Marionette  TRACE   Remoteness change detected. Set new top-level browsing context to 37
Marionette  TRACE   [37] Frame script loaded
Marionette  ERROR   [37] No reply from Marionette:Register
Marionette  TRACE   [37] MarionetteEvents actor created for window id 26
Marionette  TRACE   Received event beforeunload for about:blank
Marionette  TRACE   Received event pagehide for about:blank
Marionette  TRACE   [37] MarionetteEvents actor created for window id 10737418241
Marionette  TRACE   Received event DOMContentLoaded for http://127.0.0.1:60596/job/tidy_broccoli/
Marionette  TRACE   Received event pageshow for http://127.0.0.1:60596/job/tidy_broccoli/

The page with the button that get clicked is http://127.0.0.1:60596/job/tidy_broccoli/configure. The click causes a navigation, but due to some reason there is some interaction with about:blank, which gets unloaded (pagehide).

James, which code is the event handler of the button running to trigger the navigation?

Flags: needinfo?(teilo+bugzilla)

I can replicate the failure now with the following Marionette unit test:

https://searchfox.org/mozilla-central/rev/b2433a832c250c55255e0ee37d05192d04f20427/testing/marionette/harness/marionette_harness/tests/unit/test_navigation.py#285-294

Using the file-url element after navigating to about:robots triggers the NoSuchElement error. But that's a navigation that switches the process between the content and the parent process. The interesting part is why does it happen for a content process only cross-group navigation.

The documentation for cross-group navigation gives some examples of when it is used.

James, could it be that the page is using some Cross-Origin-Opener-Policy headers?

The page with the button that get clicked is http://127.0.0.1:60596/job/tidy_broccoli/configure. The click causes a navigation, but due to some reason there is some interaction with about:blank, which gets unloaded (pagehide).

funky.... the submit does not go via about:blank however when clicking the button (to submit the form)
Firefox POSTs to http://127.0.0.1:60596/job/tidy_broccoli/configSubmit which returns a 302 redirect to http://127.0.0.1:60596/job/tidy_broccoli/
the 302 Redirect has zero content. (is that being inferred as about:blank?)

James, which code is the event handler of the button running to trigger the navigation?

it is Jenkins, and it's funky form handling :-p

there is https://github.com/jenkinsci/jenkins/blob/77824bd6965145afaa68c075bb851e4030035595/war/src/main/webapp/scripts/hudson-behavior.js#L1100-L1117 (that is attached to the form)

there is event-min attached to the actual button https://github.com/jenkinsci/jenkins/tree/77824bd6965145afaa68c075bb851e4030035595/war/src/main/webapp/scripts/yui/event (the corresponding non minimized is in the same tree).

James, could it be that the page is using some Cross-Origin-Opener-Policy headers?

Cross-Origin-Opener-Policy header -> same-origin

Flags: needinfo?(teilo+bugzilla)

if you want to see what is going on in the SUT you can see with the following steps.

  1. download https://get.jenkins.io/war-stable/2.263.3/jenkins.war
  2. download install a JRE (8 is better)
  3. run JENKINS_HOME=/tmp/jenkins_home; java -jar jenkins.war --httpListenAddress=127.0.0.1 --httpPort=8080
  4. go to http://127.0.0.1/ follow the install wizard and accept the recommended plugins
  5. click "New Item" in the left hand menu
  6. enter a name (tidy_broccoli)
  7. click "folder" to create a folder type

this leads you to http://127.0.0.1:60596/job/tidy_broccoli/configure

then you just need to click save

Thanks for the steps. I hope that I will find time later today to continue on this bug. Would you mind checking in the meantime if you see the same behavior with the Firefox preference marionette.actors.enabled is set to false?

I just run with actors disabled, and the same behavior can be seen. As such this bug doesn't have to block bug 1669172.

No longer blocks: 1669172

To solve the missing reference to the element we might want to add the browserId beside the ContentDOMReference object to the ReferenceStore. This id will be unique and remain constant for the whole lifetime of the tab:

https://searchfox.org/mozilla-central/rev/f982032c7c7618c626165bb557968f478a1952dc/testing/marionette/element.js#258-260

By having that extra ID we can always check if the currently selected top-level browsing context has been used in the same tab before.

And this will also help to fix a leak when cleaning-up the ReferenceStore for a closed tab. Right now we pass-in a browsing context, but after such a cross-group navigation elements from former top-level browsing contexts aren't removed and remain forever in the cache.

https://searchfox.org/mozilla-central/rev/f982032c7c7618c626165bb557968f478a1952dc/testing/marionette/element.js#276

This Marionette bug blocks the marionette-fission meta bug but doesn't have a [marionette-fission-mvp] or [marionette-fission-reserve] whiteboard tag. I'm going to assume it's not a blocker for Fission MVP for now.

Whiteboard: [not-a-fission-bug]

Cross-group navigations cause a swap of the current top-level
browsing context. The only unique identifier is the browser id
the browsing context actually lives in. If it's equal also
treat the browsing context as the same.

Actually the proposed fix doesn't work. The browsing context of the element might not be around anymore given that a garbage collection could have destroyed it already. This can happen at any time when there is a bit of delay between the navigation and the element usage. Looks like we will indeed have to store the browserId alongside the ContentDOMReference in our ReferenceStore.

John, would you mind testing the test build if it fixes your problem? Not sure on which platform you are on, so here the steps how to get to them:

  1. https://treeherder.mozilla.org/jobs?repo=try&revision=b28fe21d4cb7bbd8f57600d34b8ed07c34e2a393&searchStr=build
  2. Select the (Ba) opt job from your platform
  3. Click on Artifacts at the lower pane
  4. Find the Firefox build which is named like target.dmg, target.tar.bz2, or target.zip.
Flags: needinfo?(teilo+bugzilla)
Blocks: 1691918

Henrik - I see 2 options "Windows 2012 x64 opt" and "Windows 2012 x64 shippable opt" does it matter which I use?
(running windows 10 x64)

Flags: needinfo?(teilo+bugzilla)

It doesn't matter, but shippable is what we use for Nightlies.

I used the "shippable opt" version and can confirm that the issue is fixed.

Wonderful. Thanks a lot for the verification.

Blocks: 1669172
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d44e680f845
[marionette] Report StaleElementReference error after cross-group navigations. r=marionette-reviewers,jgraham
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Comment on attachment 9202028 [details]
Bug 1690308 - [marionette] Report StaleElementReference error after cross-group navigations.

Beta/Release Uplift Approval Request

  • User impact if declined: For users of WebDriver a wrong error type will be unexpectedly thrown when trying to interact with elements after a cross-group navigation causing a new top-level browsing context.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): An additional check for the browsing context's browerId has been added to check for equality of top-level browsing contexts.
  • String changes made/needed:
Attachment #9202028 - Flags: approval-mozilla-beta?
Attachment #9202028 - Flags: approval-mozilla-beta? → approval-mozilla-release?

Comment on attachment 9202028 [details]
Bug 1690308 - [marionette] Report StaleElementReference error after cross-group navigations.

We already merged mozilla-beta to mozilla-release and it is not end-user facing nor a P1, it's a bit late for an uplift, sorry.

Attachment #9202028 - Flags: approval-mozilla-release? → approval-mozilla-release-

The patch landed in nightly and beta is affected.
:whimboo, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(hskupin)

Thanks all for the rapid fix.

As given by the denied approval request this is a wontfix for 86.

Flags: needinfo?(hskupin)
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: