Closed Bug 1854942 Opened 1 year ago Closed 2 months ago

Several BiDi commands can fail with: "AbortError: Actor 'MessageHandlerFrame' destroyed before query 'MessageHandlerFrameParent:sendCommand' was resolved"

Categories

(Remote Protocol :: WebDriver BiDi, defect, P2)

defect
Points:
3

Tracking

(firefox132 fixed)

RESOLVED FIXED
132 Branch
Tracking Status
firefox132 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [webdriver:m12][webdriver:relnote])

Attachments

(2 files)

Similar to bug 1832778 which got fixed in Firefox 116 but still fails with the following Puppeteer test:

https://searchfox.org/mozilla-central/rev/699a7704521354cac1e6a0679029c89ca899e25c/remote/test/puppeteer/test/src/waittask.spec.ts#318-330

Error message:

 0:05.80 pid:46641 1695643915284	RemoteAgent	DEBUG	WebDriverBiDiConnection 2dca02c7-89b2-4d25-b179-1e54b24ab1e0 <- {"type":"error","id":14,"error":"unknown error","message":"AbortError: Actor 'MessageHandlerFrame' destroyed before query 'MessageHandlerFrameParent:sendCommand' was resolved","stacktrace":""}
 0:05.80 pid:46641 1695643915284	RemoteAgent	DEBUG	WebDriverBiDiConnection 2dca02c7-89b2-4d25-b179-1e54b24ab1e0 <- {"type":"event","method":"script.realmDestroyed","params":{"realm":"64b8a5e7-90c1-4a5f-ad25-a46464b8a827"}}
 0:05.80 pid:46641 1695643915284	RemoteAgent	DEBUG	WebDriverBiDiConnection 2dca02c7-89b2-4d25-b179-1e54b24ab1e0 <- {"type":"event","method":"script.realmCreated","params":{"realm":"fb33ee0a-a403-4991-8ae9-38fb8fba1fb6","origin":"http://localhost:59666","context":"de2e2d75-4459-4b9f-b92a-4aefee981c96","type":"window"}}
 0:05.80 pid:46641 console.error:
 0:05.80 pid:46641   InvalidStateError: JSWindowActorChild.sendAsyncMessage: JSWindowActorChild cannot send at the moment: _onRegistryEvent@chrome://remote/content/shared/messagehandler/transports/js-window-actors/MessageHandlerFrameChild.sys.mjs:101:10
 0:05.80 pid:46641 emit@resource://gre/modules/EventEmitter.sys.mjs:154:20
 0:05.80 pid:46641 _onMessageHandlerEvent@chrome://remote/content/shared/messagehandler/MessageHandlerRegistry.sys.mjs:234:10
 0:05.80 pid:46641 emit@resource://gre/modules/EventEmitter.sys.mjs:154:20
 0:05.80 pid:46641 emitEvent@chrome://remote/content/shared/messagehandler/MessageHandler.sys.mjs:178:10
 0:05.80 pid:46641 emitEvent@chrome://remote/content/shared/messagehandler/Module.sys.mjs:54:25
 0:05.80 pid:46641 #onDOMContentLoaded@chrome://remote/content/webdriver-bidi/modules/windowglobal/browsingContext.sys.mjs:105:12
 0:05.80 pid:46641 emit@resource://gre/modules/EventEmitter.sys.mjs:154:20
 0:05.80 pid:46641 #onDOMContentLoaded@chrome://remote/content/shared/listeners/LoadListener.sys.mjs:92:12
 0:05.80 pid:46641

I think this should be blocked on https://github.com/w3c/webdriver-bidi/issues/540

In general, we cannot prevent users from sending commands right before the browsing context disappears, so there is always a possibility that we can't fully run the command. However, we should send an explicit error message, hopefully defined in the spec.

We can also not retry such commands, because for instance for script.callFunction there is a high chance that the arguments are references to objects which used to live in the now-destroyed realm. So the command needs to fail, and the consumer should decide what to do about it.

Thanks Julian for taking another look. This sounds fine to me.

OS: macOS → Unspecified

A temporary fix for the Puppeteer case is provided at https://github.com/puppeteer/puppeteer/pull/11022 until we have a properly defined error type in the BiDi spec for this scenario.

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #3)

A temporary fix for the Puppeteer case is provided at https://github.com/puppeteer/puppeteer/pull/11022 until we have a properly defined error type in the BiDi spec for this scenario.

This workaround got wontfix'ed a while ago. Lets discuss the BiDi issue in tomorrow's WebDriver meeting.

Priority: -- → P2
Whiteboard: [webdriver:backlog]

The severity field is not set for this bug.
:whimboo, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(hskupin)
Severity: -- → S3
Flags: needinfo?(hskupin)

This is actually a failure that can be seen a lot when running the pdf.js Puppeteer integration tests with WebDriver BiDi. I would suggest that we add this to M9.

Blocks: 1860992
No longer blocks: 1860992

The problem is still there even with the fix for bug 1874793 landed. So I'm going to remove the see also link.

See Also: 1874793
Blocks: 1894569

There are other commands as well that can fail due to this error. We should finally get https://github.com/w3c/webdriver-bidi/issues/540 discussed to get a solution for this issue. I think that we should add it to the TPAC agenda.

In general this can happen at any time given that Bidi allows to run multiple commands in parallel. That means when one browsingContext.navigate command was sent and then immediately script.callFunction or something like browsingContext.setViewport is called the risk to see this error is high. For the latter command we probably can just make sure to set the correct browser size and then retry to wait for the wanted size for the to be loaded page. Maybe we need a solution for each and every command which indeed can be different.

Summary: `script.callFunction` errors with: "AbortError: Actor 'MessageHandlerFrame' destroyed before query 'MessageHandlerFrameParent:sendCommand' was resolved" → Several BiDi commands can fail with: "AbortError: Actor 'MessageHandlerFrame' destroyed before query 'MessageHandlerFrameParent:sendCommand' was resolved"
Whiteboard: [webdriver:backlog] → [webdriver:backlog][webdriver:triage]
Blocks: 1918639

Note that this error can easily be seen when a client starts sending the next command while the initial page load of a newly opened tab hasn't finished yet. See bug 1918672.

See Also: → 1918672
Blocks: 1855602
Blocks: 1918636

Recently a couple of Puppeteer tests started to fail and this might have been related to my landing attempts for bug 1904665, which is about the refactoring of the Actions code path in both Marionette and WebDriver BiDi. Given that the patch got backed out the failures stopped happening. I'm going to mark this bug as blocker for now.

Blocks: 1904665
See Also: → 1918287

Over on bug 1918287 we are going to use the retryOnAborted flag set to true when sending commands from the root to the child actor. We can use the same approach for other commands that are just retrieving some data from the content process. I'll have a look which of those calls we can actually update to make them more robust. We probably should fix them in a separate bug and if possible have tests as well.

But when such a command actually modifies the DOM or runs other code that affects the global environment it could have side-effects. This needs the discussion at TPAC.

Depends on: 1918672

For reference, in Bug 1918672 we extended the use of the retryOnAborted flag to be enabled whenever the target browsing context is initial or loading. This should hopefully reduce the amount of failures with AbortErrors seen by clients.

(In reply to Julian Descottes [:jdescottes] from comment #14)

For reference, in Bug 1918672 we extended the use of the retryOnAborted flag to be enabled whenever the target browsing context is initial or loading. This should hopefully reduce the amount of failures with AbortErrors seen by clients.

Just to note (even through it will be clear) this will not apply to command invocations for normal navigation commands. As such we will still have to check how several commands behave when any kind of navigation happens in parallel.

Lets discuss the priority again in today's triage meeting.

Whiteboard: [webdriver:backlog][webdriver:triage] → [webdriver:triage]

As Julian mentioned normal navigations are included as well. So we have to check here for which commands we can force the retry logic in all the cases. It would be good to see some wdspec tests for those commands that fail. For now this can be Mozilla-only tests until we have a navigation committed event.

Points: --- → 3
Whiteboard: [webdriver:triage] → [webdriver:m12]

I've just discussed with Julian again when we were talking about retries when actors are destroyed or deactivated.

As outcome of this discussion we actually think that it will be more helpful to users when we automatically retry to call the related method in the content process. That is because in most cases the client sends a command too early (missing committed event) and would accidentally trap into that behavior.

Here is what we would suggest:

  1. Automatically retry for all invocations of this.messageHandler.forwardCommand() including script evaluation.
  2. Add a hidden preference like remote.retry-on-abort that can be flipped to false to disable this behavior. We can then easily remove it after a couple of releases when it turns out it works all fine and no users are affected. We could consider the same for Marionette as well.
  3. When commands should not retry they have to opt-out by specifying retryOnAbort: false when calling the above command.

This blocks my patch for the refactoring of processing actions over on bug 1904665.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

The message handler framework now automatically retries executing
module methods in the window global if the corresponding
JSWindowActor pair is destroyed or becomes inactive.

This behavior can be disabled by setting the preference
"remote.retry-on-abort" to false, in which case retries
will only occur during the initial navigation.

Additionally, the behavior can be explicitly controlled
by individual callers by passing the "retryOnAbort" flag.

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b254963175f3 [remote] Retry invocation of module methods automatically. r=webdriver-reviewers,jdescottes
Blocks: 1673345
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4b9bb20bc785 [remote] Add documentation for the new "remote.retry-on-abort" preference. r=webdriver-reviewers,jdescottes
See Also: → 1921314

A patch has been attached on this bug, which was already closed. Filing a separate bug will ensure better tracking. If this was not by mistake and further action is needed, please alert the appropriate party. (Or: if the patch doesn't change behavior -- e.g. landing a test case, or fixing a typo -- then feel free to disregard this message)

Regressions: 1921756

There is one regression so far which should be easy to fix. As such we will wait and check beta feedback frequently before deciding if that feature should be turned off in the current beta.

Whiteboard: [webdriver:m12] → [webdriver:m12][webdriver:relnote]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: