Closed Bug 1666204 Opened 4 years ago Closed 4 years ago

Race condition between removing frame and next context browser check

Categories

(Remote Protocol :: Marionette, defect, P1)

Default
defect

Tracking

(Fission Milestone:M7, firefox83 fixed)

RESOLVED FIXED
83 Branch
Fission Milestone M7
Tracking Status
firefox83 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Whiteboard: [marionette-fission-mvp][simple])

Attachments

(1 file)

As noticed on bugs like bug 1665785 there is a race when a command like "Element Click" returns, because the current frame has been removed, and the next command fails to correctly check for the browsing context.

There might be a delay in syncing this information between the child and parent process, that we do not take into account right now. Listening for another event or observer notification in the parent process might help.

Blocks: 1665686
Blocks: 1666166
Blocks: 1665250
Blocks: 1665549
Blocks: 1666406
No longer blocks: 1666406

This should basically be fixed by bug 1394354. In failing cases synthesizing the click was delayed and as such the WebDriver:ElementClick command returned after the 200ms timeout for the beforeunload check. Final work to make it completely free of race conditions will happen on bug 1666755, but that's not needed right now.

Status: NEW → RESOLVED
Closed: 4 years ago
Depends on: 1394354
Resolution: --- → DUPLICATE
Whiteboard: [marionette-fission-mvp]

There was one more failure over on bug 1665549:

[task 2020-09-28T07:46:40.731Z] 07:46:40     INFO - PID 4391 | 1601279200727	Marionette	DEBUG	0 -> [0,52,"WebDriver:ElementClick",{"id":"870ae9af-1117-4fd3-a605-95cc77013ad3"}]
[task 2020-09-28T07:46:40.747Z] 07:46:40     INFO - PID 4391 | 1601279200741	Marionette	TRACE	[38] Frame with id 8589934594 got removed
[task 2020-09-28T07:46:40.747Z] 07:46:40     INFO - PID 4391 | 1601279200741	Marionette	TRACE	Canceled page load listener because current frame has been removed
[task 2020-09-28T07:46:40.748Z] 07:46:40     INFO - PID 4391 | 1601279200741	Marionette	DEBUG	0 <- [1,52,null,{"value":null}]
[task 2020-09-28T07:46:40.748Z] 07:46:40     INFO - PID 4391 | 1601279200742	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2020-09-28T07:46:40.749Z] 07:46:40     INFO - PID 4391 | 1601279200743	webdriver::server	DEBUG	-> POST /session/5d3d095b-f97b-4311-bb66-bf38dbb55c3e/elements {"using": "css selector", "value": "foo"}
[task 2020-09-28T07:46:40.750Z] 07:46:40     INFO - PID 4391 | 1601279200744	Marionette	DEBUG	0 -> [0,53,"WebDriver:FindElements",{"using":"css selector","value":"foo"}]
[task 2020-09-28T07:46:40.750Z] 07:46:40     INFO - PID 4391 | 1601279200744	Marionette	TRACE	Received DOM event click for [object HTMLInputElement]

While the frame gets closed by the click, the command still seems to return too early so that the next command succeeds instead of failing.

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Whiteboard: [marionette-fission-mvp]

Tracking marionette-fission-mvp bugs for Fission M7 Beta milestone. We would like Marionette tests to work with Fission before we ride the trains to Beta.

Fission Milestone: --- → M7
Whiteboard: [marionette-fission-mvp] → [marionette-fission-mvp][complex]
Blocks: 1668282

This seems to depend on bug 1666755.

Depends on: 1666755
Blocks: 1665297

Note that bug 1666755 will reduce the number of failures in CI, but will not fix it completely:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=768bd2feacd721ca2f0a9e1ccdd004ac34cbf604&searchStr=wd%2Copt%2Clinux

[task 2020-10-12T10:49:04.151Z] 10:49:04     INFO - PID 13689 | 1602499744148	Marionette	DEBUG	0 -> [0,52,"WebDriver:ElementClick",{"id":"6e6decef-542f-4306-91f1-73dc2a1d92f3"}]
[task 2020-10-12T10:49:04.168Z] 10:49:04     INFO - PID 13689 | 1602499744160	Marionette	TRACE	[38] Frame with id 10737418242 got removed
[task 2020-10-12T10:49:04.169Z] 10:49:04     INFO - PID 13689 | 1602499744160	Marionette	TRACE	Canceled page load listener because current frame has been removed
[task 2020-10-12T10:49:04.169Z] 10:49:04     INFO - PID 13689 | 1602499744160	Marionette	DEBUG	0 <- [1,52,null,{"value":null}]
[task 2020-10-12T10:49:04.170Z] 10:49:04     INFO - PID 13689 | 1602499744160	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2020-10-12T10:49:04.171Z] 10:49:04     INFO - PID 13689 | 1602499744161	webdriver::server	DEBUG	-> POST /session/e2dc90d6-b27e-4245-a962-f3e68d7548a7/element {"using": "css selector", "value": "foo"}
[task 2020-10-12T10:49:04.172Z] 10:49:04     INFO - PID 13689 | 1602499744161	Marionette	DEBUG	0 -> [0,53,"WebDriver:FindElement",{"using":"css selector","value":"foo"}]
[task 2020-10-12T10:49:04.172Z] 10:49:04     INFO - PID 13689 | 1602499744162	Marionette	TRACE	Received DOM event click for [object HTMLInputElement]

Note that the click event happens way later. So I wonder if it's the wrong frame that we actually observing here.

Note that this isn't fixed yet. The command returns too early due to the closing frame, which is received even before the click happens. Even we setup listeners before the navigation we should wait to handle them until the click has happened. I will take a look at this.

Assignee: nobody → hskupin
Status: REOPENED → ASSIGNED
Priority: P2 → P1
Blocks: 1671372

Rejecting the waitForNavigationCompleted Promise due to the frame being removed happens via the onFrameRemoved message handler:
https://searchfox.org/mozilla-central/rev/e0eb861a187f0bb6d994228f2e0e49b2c9ee455e/testing/marionette/navigate.js#309-328

That callback is invoked when the parent process receives the Marionette:FrameRemoved message via the message manager, which is actually sent by the framescript (listener.js). But that all the time, also when the actors are enabled.

To get the framescript finally removed we need to get away from using any of its methods and events / messages. So directly checking for a browsing context going away in the parent process should probably be the right way to do.

waitForNavigationCompleted knows to which browsing context it belongs. So handling the browsing-context-discarded observer notification should make it work. The only part that is not clear to me is how to differentiate between the frame being closed, and the browsing context exchanged due to a remoteness change?

Nika, if you have any advice I would be happy to hear. Otherwise I will play around tomorrow and check how this can be done. Thanks.

Flags: needinfo?(nika)

The try build looks great. So far not a single failing test:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b547cb034ba3a8644ee4b278ca4efa5bc82349ea

I will have a look now in how to figure out that the browsing context has just been swapped due to a remoteness change.

Turned out this is simple, and checking the currentWindowGlobal is enough. If the frame has been closed its value will be null. Here a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aab88e9b507542f412ba33ff5330ddd9971d14db

Something we would still have to check is how to get the new browsing context id, but that's for a follow-up bug.

Flags: needinfo?(nika)

When waiting for a navigation to complete the
"browsing-context-discarded" observer notification
is used to determine if the currently selected
browsing context has been closed or just replaced
due to a remoteness change.

Whiteboard: [marionette-fission-mvp][complex] → [marionette-fission-mvp][simple]
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25970d0b7e5
[marionette] Use "browsing-context-discarded" notification to detect a closing frame. r=marionette-reviewers,maja_zf
Status: ASSIGNED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch
No longer blocks: 1671372
Blocks: 1668786
Blocks: 1672758
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: