Open Bug 1947402 Opened 12 days ago Updated 3 days ago

BiDi click (input.performActions pointer{Move,Down,Up}) sometimes does not respond

Categories

(Remote Protocol :: WebDriver BiDi, defect)

Firefox 135
x86_64
Linux
defect

Tracking

(firefox-esr115 unaffected, firefox-esr128 unaffected, firefox135 wontfix, firefox136 fix-optional, firefox137 affected)

ASSIGNED
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox135 --- wontfix
firefox136 --- fix-optional
firefox137 --- affected

People

(Reporter: martin, Assigned: jdescottes)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:135.0) Gecko/20100101 Firefox/135.0

Steps to reproduce:

We recently moved our https://cockpit-project.org tests from Firefox 134 to 135 (as part of regular Fedora refreshes). After this, one of our tests started to fail very often [1]. This clicks a button in one iframe, which causes the page to change URL and go go another -- I cannot be sure if that's really related to the root cause, but at least it's one of the few clicks which changes URL. Almost all other clicks work fine (it could just be due to timing of course).

Specifically, the test sends this move /pointer down / pointer up sequence (i.e. standard mouse click):

input.performActions({'context': '12884901891', 'actions': [{'id': 'pointer-36', 'type': 'pointer', 'parameters': {'pointerType': 'mouse'}, 'actions': [{'type': 'pointerMove', 'x': 0, 'y': 0, 'origin': {'type': 'element', 'element': {'type': 'node', 'sharedId': '170bd294-9c78-4b0b-ac77-9c10ee028a79', 'value': {'nodeType': 1, 'localName': 'button', 'namespaceURI': 'http://www.w3.org/1999/xhtml', 'childNodeCount': 1, 'attributes': {'aria-disabled': 'false', 'class': 'pf-v5-c-button pf-m-primary', 'type': 'button', 'data-ouia-component-type': 'PF5/Button', 'data-ouia-safe': 'true', 'data-ouia-component-id': 'OUIA-Generated-Button-primary-1'}, 'shadowRoot': None}}}}, {'type': 'pointerDown', 'button': 0}, {'type': 'pointerUp', 'button': 0}]}]}) [id 36]

Normally there should be a response to that, but in this case there isn't one -- the test times out waiting for the response. This isn't 100% reliable -- in our CI it happens in about 3 out of 4 runs, locally I catch it in about 2 out of 5 runs.

Note that the BiDi protocol is not dead at this point. After that timeout, it still reports a log event:

{'type': 'event', 'method': 'log.entryAdded, ... }

and the test failure handler also calls browsingContext.captureScreenshot, both of which are successful (and the screenshot gets a reponse). It's just the button click which is timing out.

firefox-135.0-1.fc41.x86_64 . The previous version 134 was fine.

The test can be reproduced locally with some effort, but if you never worked with cockpit it's some effort. If you want to try, I'm happy to talk you through it. But is there anything that I can do to enable more debug logs etc. to help you investigate this?

Thanks!

[1] https://github.com/cockpit-project/cockpit-podman/issues/1999
[2] https://github.com/cockpit-project/cockpit/blob/main/test/README.md

Actual results:

No BiDi response to the performActions() command.

Expected results:

It should have responded with the usual

{'type': 'success', 'id': 36, 'result': {}}

or at least an error message.

Component: Untriaged → WebDriver BiDi
OS: Unspecified → Linux
Product: Firefox → Remote Protocol
Hardware: Unspecified → x86_64

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Core & HTML' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: WebDriver BiDi → DOM: Core & HTML
Product: Remote Protocol → Core
Component: DOM: Core & HTML → WebDriver BiDi
Product: Core → Remote Protocol

Thank you for the bug report Martin! It looks like that this is a regression from bug 1922077, which is about enabling the async events by default in Firefox and that shipped with 135. Can you please try to set the remote.events.async.enabled preference of Firefox to false? Does that fix the issue? If yes, you should run your tests that way for now until we got this bug investigated and fixed.

Regarding more logs please enable the trace logs, which can be done by setting the preference remote.log.level to true in the new session capabilities. Hopefully this already gives some information where it is actually hanging. In our tests we didn't notice such a hang yet, so your input would be appreciated!

Thanks Henrik! Amazing, thanks for finding the trigger so quickly. I ran the test umpteen times in a loop with that option and indeed it seems stable now. I sent https://github.com/cockpit-project/cockpit/pull/21609 as a quick fix.

I'll follow up with the remote.log.level (although I figure you probably did not mean to set it to true -- but I found https://firefox-source-docs.mozilla.org/remote/Debugging.html so Trace seems right.

Henrik, I feel really silly for asking, but where do the trace logs go to? https://xkcd.com/838/ I enabled both remote.log.level="Trace" and remote.log.truncate=false, but these don't go to stdout/err: The only output from the "firefox --marionette" process is

*** You are running in headless mode.
WebDriver BiDi listening on ws://127.0.0.1:40421
[GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt

They also don't trigger a log.entryAdded event, nor are they sent as message on the bidi websocket. I ran firefox under strace -fvvs1024, but produces a 160 MB output file with no obvious matches of "TRACE", "Trace", or "trace". But then again it doesn't even contain "performAction", so now I'm seriously doubting myself. I keep the strace around just in case.

(In reply to Martin Pitt from comment #3)

I'll follow up with the remote.log.level (although I figure you probably did not mean to set it to true -- but I found https://firefox-source-docs.mozilla.org/remote/Debugging.html so Trace seems right.

Oh yes, that was wrong. Good that you noticed yourself!

(In reply to Martin Pitt from comment #4)

Henrik, I feel really silly for asking, but where do the trace logs go to? https://xkcd.com/838/ I enabled both remote.log.level="Trace" and remote.log.truncate=false, but these don't go to stdout/err: The only output from the "firefox --marionette" process is

How did you set the preference? Do you have some code that you could share? Do you see which commands the client sends to Firefox via Marionette or the WebDriver BiDi WebSocket? It looks like that the preference is not correctly applied.

The preferences are set here: https://github.com/cockpit-project/cockpit/blob/64e7b9c4cdf7aa6b76a2bf24f574758af2cac586/test/common/webdriver_bidi.py#L459 That works in general, nothing much would work without remote.enabled, remote.frames.enabled, or marionette.port. Locally I added these two:

            user_pref('remote.log.level', 'Trace');
            user_pref('remote.log.truncate', false);

Do you see which commands the client sends to Firefox via Marionette or the WebDriver BiDi WebSocket?

Yes, it logs everything that goes to/comes from the websocket.

(In reply to Martin Pitt from comment #6)

The preferences are set here: https://github.com/cockpit-project/cockpit/blob/64e7b9c4cdf7aa6b76a2bf24f574758af2cac586/test/common/webdriver_bidi.py#L459 That works in general, nothing much would work without remote.enabled, remote.frames.enabled, or marionette.port.

Please note that remote.enabled and remote.frames.enabled do no longer exist. So they can be removed.

Do you see which commands the client sends to Firefox via Marionette or the WebDriver BiDi WebSocket?

Yes, it logs everything that goes to/comes from the websocket.

Based on the above link I can see that you are using asyncio.create_subprocess_exec() to start Firefox, but you are not pipe'ing stdout. You will have to continuously read the data (process output) from the PIPE.

And as well to note is that when the client doesn't read the PIPE at all it can cause Firefox to not respond anymore after a given amount of time. I fixed a similar thing for Puppeteer over on bug 1734208.

Yes, by default it just inherits the parent's stdout/err, I do see the browser output on the terminal. I also tried something like

out = open("/tmp/ffox", "w")
create_subprocess_exec(..., stdout=out, stderr=subprocess.STDERR)

which gave me the three-liner I quoted in comment #4. I don't think that's it. So are you saying the tracing messages should appear on stdout?

It could be that Fedora's firefox package is built with something that breaks the tracing messages. I can try with an upstream build (but not today any more). Thanks!

(of course I meant stderr=subprocess.STDOUT -- sorry, was typing this from memory, I didn't keep that change as it was not helpful)

I grabbed the latest nightly 137.0a1 from https://download.mozilla.org/?product=firefox-nightly-latest-l10n-ssl&os=linux64&lang=en-US . Still no sign of any tracing output, and I can still reproduce the bug with that. So much for hoping that it's already fixed 😢

I also ripped out all the create_subprocess_exec() stuff, hardcoded the ports, created /tmp/h/profile/user.js with

user_pref('remote.log.level', 'Trace');
user_pref('remote.log.truncate', false);
user_pref("app.update.auto", false);
user_pref("datareporting.policy.dataSubmissionEnabled", false);
user_pref("toolkit.telemetry.reportingpolicy.firstRun", false);
user_pref("dom.disable_beforeunload", true);
user_pref("browser.download.dir", "/tmp/cockpit-test-browser-home-_chb5jlm/Downloads");
user_pref("browser.download.folderList", 2);
user_pref("signon.rememberSignons", false);
user_pref("dom.navigation.locationChangeRateLimit.count", 9999);
user_pref('marionette.port', 30000);

and ran the browser manually:

rm -rf /tmp/hh; cp -r /tmp/h /tmp/hh; firefox -profile /tmp/hh/profile --marionette --no-remote --remote-debugging-port=1234 -headless about:blank

so I see all stdout/err immediately and there is no blocking. No tracing in sight. I know that the profile works, as when I change marionette.port to e.g. 30001, the test cannot connect. I also tried to set MOZ_LOG_FILE=/tmp/moz, and it does create /tmp/moz.moz_log and /tmp/moz.child-1.moz_log but they are both empty. I didn't set $MOZ_LOG as I don't know what to set it to.

I also tried to set toolkit.dump.emit=true, no observable difference.

I'm running out of ideas, sorry..

Flags: needinfo?(martin)

Can you please show me the output that you get when running Firefox manually with the pre-configured profile? You could also add --remote-allow-origins=https://juliandescottes.github.io as argument for Firefox and then try out our BiDi demo client at https://juliandescottes.github.io/bidi-web-client/web/.

Note that our logging is not related to MOZ_LOG (which is platform code only) and also any other pref is correctly set when Firefox gets started. So you definitely should see tracing logs. Maybe explicitly check about:config for the value of remote.log.level.

Can you please show me the output that you get when running Firefox manually with the pre-configured profile?

Already did in comment #4 -- I only ever get these three lines, no other output.

You could also add --remote-allow-origins=https://juliandescottes.github.io as argument for Firefox and then try out our BiDi demo client at https://juliandescottes.github.io/bidi-web-client/web/.

Nice hint, that's a cool page!

I cleaned up the profile a bit:

❱❱❱ cat /tmp/h/profile/user.js
user_pref('remote.log.level', 'Trace');
user_pref('remote.log.truncate', false);
user_pref("app.update.auto", false);
user_pref('marionette.port', 30000);

and run Firefox in headful mode on that demo webpage:

rm -rf /tmp/hh; cp -r /tmp/h /tmp/hh; firefox -profile /tmp/hh/profile --marionette --no-remote --remote-debugging-port --remote-allow-origins=https://juliandescottes.github.io https://juliandescottes.github.io/bidi-web-client/web/
WebDriver BiDi listening on ws://127.0.0.1:9222

So you definitely should see tracing logs. Maybe explicitly check about:config for the value of remote.log.level.

I see the two remote.log values as bold (i.e. "changed") in about:config with the correct values.

On the demo page I can connect and run a few commands, and I get the expected requests/responses in the protocol traffic. I created a new browser context and .navigate({ context: "3e20355a-e766-4604-82fb-7f2ecd657394", url: "https://gnome.org" }) which works fine.

However, during all that, firefox writes a single line to the terminal:

WebDriver BiDi listening on ws://127.0.0.1:9222

I also tried to drop --marionette --no-remote from the CLI, no difference. I can't for the life of me see any tracing anywhere.

Could you try to set browser.dom.window.dump.enabled to true?

Flags: needinfo?(martin)

Just to add some context to Julian's comment...

Interesting! I can actually see it as well. I was talking with the team and we figured out that this is actually related to the browser.dom.window.dump.enabled preference set to false in release builds. Given that we test locally always with our own builds we haven't seen this yet for BiDi. Also for users of geckodriver and Puppeteer it's not a problem given that we set this preference explicitly.

So please add it as well to your client and you should be able to see the logs immediately. We are going to add it as well to our recommended preferences so that clients do not have to set it on their own.

A-haaaa! Thanks Julian and Henrik, that did the trick! I captured this in https://github.com/cockpit-project/cockpit/pull/21612 so that in the future it'll be much easier for us to turn debug logging to 11.

Gather 'round kids, Uncle Firefox has a story to tell! I captured a full "Trace" enabled log for both a passing and failing. The interesting bit where it clicks the button (that then changes the URL) is here: The first log line is our own "client-side" logger (INFO:bidi.command:← means "that's the JSON that I sent to Firefox"), the rest is Firefox' own Trace log:

INFO:bidi.command:← input.performActions({'context': '12884901891', 'actions': [{'id': 'pointer-36', 'type': 'pointer', 'parameters': {'pointerType': 'mouse'}, 'actions': [{'type': 'pointerMove', 'x': 0, 'y': 0, 'origin': {'type': 'element', 'element': {'type': 'node', 'sharedId': 'e667f324-8012-4a06-b6c0-00fe7758d923', 'value': {'nodeType': 1, 'localName': 'button', 'namespaceURI': 'http://www.w3.org/1999/xhtml', 'childNodeCount': 1, 'attributes': {'aria-disabled': 'false', 'class': 'pf-v5-c-button pf-m-primary', 'type': 'button', 'data-ouia-component-type': 'PF5/Button', 'data-ouia-safe': 'true', 'data-ouia-component-id': 'OUIA-Generated-Button-primary-1'}, 'shadowRoot': None}}}}, {'type': 'pointerDown', 'button': 0}, {'type': 'pointerUp', 'button': 0}]}]}) [id 36]
1739370222471	RemoteAgent	DEBUG	WebDriverBiDiConnection af999bec-94f2-47f3-89d5-e54d435f84f8 -> {"id":36,"method":"input.performActions","params":{"context":"12884901891","actions":[{"id":"pointer-36","type":"pointer","parameters":{"pointerType":"mouse"},"actions":[{"type":"pointerMove","x":0,"y":0,"origin":{"type":"element","element":{"type":"node","sharedId":"e667f324-8012-4a06-b6c0-00fe7758d923","value":{"nodeType":1,"localName":"button","namespaceURI":"http://www.w3.org/1999/xhtml","childNodeCount":1,"attributes":{"aria-disabled":"false","class":"pf-v5-c-button pf-m-primary","type":"button","data-ouia-component-type":"PF5/Button","data-ouia-safe":"true","data-ouia-component-id":"OUIA-Generated-Button-primary-1"},"shadowRoot":null}}}},{"type":"pointerDown","button":0},{"type":"pointerUp","button":0}]}]}}
1739370222471	RemoteAgent	TRACE	Received command input.performActions for destination ROOT
1739370222472	RemoteAgent	TRACE	Received command input._getElementOrigin for destination WINDOW_GLOBAL
1739370222473	Marionette	TRACE	Dispatching tick 1/3
1739370222473	Marionette	TRACE	Dispatch PointerMoveAction mouse with id: pointer-36 x: 0 y: 0
1739370222473	RemoteAgent	TRACE	Received command input._getClientRects for destination WINDOW_GLOBAL
1739370222473	RemoteAgent	TRACE	Received command input._getInViewCentrePoint for destination WINDOW_GLOBAL
1739370222474	RemoteAgent	TRACE	Received command input._assertInViewPort for destination WINDOW_GLOBAL
1739370222474	Marionette	TRACE	moveOverTime start: 0,0 target: 839,215 duration: 0
1739370222474	Marionette	TRACE	PointerMoveAction.performPointerMoveStep [839,215]
1739370222474	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370222481	Marionette	TRACE	Dispatching tick 2/3
1739370222481	Marionette	TRACE	Dispatch PointerDownAction mouse with id: pointer-36 button: 0
1739370222482	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370222485	Marionette	TRACE	Dispatching tick 3/3
1739370222485	Marionette	TRACE	Dispatch PointerUpAction mouse with id: pointer-36 button: 0
1739370222486	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370222488	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739370222489	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739370222489	RemoteAgent	TRACE	Received command input._finalizeAction for destination WINDOW_GLOBAL
1739370222493	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370222493	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370222494	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370222494	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370222519	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739370222519	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket

After that there's just an (unrelated) log.entryAdded, and the timeout for waiting for the input.performActions reply.

The same in a successful run:

INFO:bidi.command:← input.performActions({'context': '12884901891', 'actions': [{'id': 'pointer-36', 'type': 'pointer', 'parameters': {'pointerType': 'mouse'}, 'actions': [{'type': 'pointerMove', 'x': 0, 'y': 0, 'origin': {'type': 'element', 'element': {'type': 'node', 'sharedId': 'd05ba08a-4a33-4d72-a7c2-f4f929ab82df', 'value': {'nodeType': 1, 'localName': 'button', 'namespaceURI': 'http://www.w3.org/1999/xhtml', 'childNodeCount': 1, 'attributes': {'aria-disabled': 'false', 'class': 'pf-v5-c-button pf-m-primary', 'type': 'button', 'data-ouia-component-type': 'PF5/Button', 'data-ouia-safe': 'true', 'data-ouia-component-id': 'OUIA-Generated-Button-primary-1'}, 'shadowRoot': None}}}}, {'type': 'pointerDown', 'button': 0}, {'type': 'pointerUp', 'button': 0}]}]}) [id 36]
1739370271710	RemoteAgent	DEBUG	WebDriverBiDiConnection 3c8496af-21a7-446e-bd52-9d0e30c68756 -> {"id":36,"method":"input.performActions","params":{"context":"12884901891","actions":[{"id":"pointer-36","type":"pointer","parameters":{"pointerType":"mouse"},"actions":[{"type":"pointerMove","x":0,"y":0,"origin":{"type":"element","element":{"type":"node","sharedId":"d05ba08a-4a33-4d72-a7c2-f4f929ab82df","value":{"nodeType":1,"localName":"button","namespaceURI":"http://www.w3.org/1999/xhtml","childNodeCount":1,"attributes":{"aria-disabled":"false","class":"pf-v5-c-button pf-m-primary","type":"button","data-ouia-component-type":"PF5/Button","data-ouia-safe":"true","data-ouia-component-id":"OUIA-Generated-Button-primary-1"},"shadowRoot":null}}}},{"type":"pointerDown","button":0},{"type":"pointerUp","button":0}]}]}}
1739370271710	RemoteAgent	TRACE	Received command input.performActions for destination ROOT
1739370271711	RemoteAgent	TRACE	Received command input._getElementOrigin for destination WINDOW_GLOBAL
1739370271711	Marionette	TRACE	Dispatching tick 1/3
1739370271711	Marionette	TRACE	Dispatch PointerMoveAction mouse with id: pointer-36 x: 0 y: 0
1739370271712	RemoteAgent	TRACE	Received command input._getClientRects for destination WINDOW_GLOBAL
1739370271712	RemoteAgent	TRACE	Received command input._getInViewCentrePoint for destination WINDOW_GLOBAL
1739370271712	RemoteAgent	TRACE	Received command input._assertInViewPort for destination WINDOW_GLOBAL
1739370271713	Marionette	TRACE	moveOverTime start: 0,0 target: 839,215 duration: 0
1739370271713	Marionette	TRACE	PointerMoveAction.performPointerMoveStep [839,215]
1739370271713	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370271722	Marionette	TRACE	Dispatching tick 2/3
1739370271722	Marionette	TRACE	Dispatch PointerDownAction mouse with id: pointer-36 button: 0
1739370271723	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370271726	Marionette	TRACE	Dispatching tick 3/3
1739370271726	Marionette	TRACE	Dispatch PointerUpAction mouse with id: pointer-36 button: 0
1739370271728	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739370271731	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739370271731	RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739370271731	RemoteAgent	TRACE	Received command input._finalizeAction for destination WINDOW_GLOBAL
1739370271737	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370271737	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370271738	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370271738	RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739370271741	RemoteAgent	DEBUG	WebDriverBiDiConnection 3c8496af-21a7-446e-bd52-9d0e30c68756 <- {"type":"success","id":36,"result":{}}

Diff after removing the timestamps:

 RemoteAgent	TRACE	[12884901890] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
-RemoteAgent	DEBUG	WebDriverBiDiConnection 3c8496af-21a7-446e-bd52-9d0e30c68756 <- {"type":"success","id":36,"result":{}}
+RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
+RemoteAgent	TRACE	[10] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket

To my untrained eye it pretty much just confirms the bug, though: in the successful case (-) it sends the response, in the failing case the click is "fast enough" to change the URL. That seems to swallow the performActions() response somehow?

Thanks!

Flags: needinfo?(martin)

So it looks like that the command is hanging somewhere when trying to dispatch the action item (through IPC with the web content process). The actual event is synthesized because the page navigates but we may hang here:

https://searchfox.org/mozilla-central/rev/5e7382bf8bbb88d8260c72990cfea9b626d9b307/remote/shared/webdriver/Actions.sys.mjs#2508-2512

Martin, do you maybe have a minimized HTML test case that reproduces the issue? Otherwise I may have to ask you to run with some example build, or if possible build an artifact Firefox build locally yourself (which takes about 3min) after cloning our repository.

Henrik: I'll try to put together a small standalone reproducer (tomorrow morning, sorry -- still quite sick, I'm useless in evenings). If that works, I suppose that's much more useful and efficient.

If I can't make one after, say, 3 hours, I am also happy to build something (full firefox takes several hours, but happy to hear that there's some subset that suffices :) ).

I've tried to simplify the test so that the <button> does a simple window.parent.history.pushState() and/or window.parent.location.assign(), but that already makes the bug go away. So this is either very sensitive to timing, or it depends on the rather elaborate frame/state/location handling of cockpit, which isn't easy to rip out. So, I'm sorry -- I'm happy to try it with some local build/patches.

Yeah, it's not always that easy. But thanks for your engagement and the willingness to actually build Firefox yourself. I assume that you are a git user and would like to use it when cloning our code. So you can check the following documentation in how to use it:

https://github.com/glandium/git-cinnabar/wiki/Mozilla:-A-git-workflow-for-Gecko-development

Also it would be great if you could join our WebDriver Matrix channel at https://chat.mozilla.org/#/room/#webdriver:mozilla.org (or via another Matrix server). It would make it easier to teach you in how to build, modify the source, and run the tests. Thanks a lot!

  • git/hg clone of https://hg.mozilla.org/mozilla-unified takes more than a day (even with git --depth=1); I cancelled
  • cloning the mirror at https://github.com/mozilla/gecko-dev/ with --depth=1 takes only a few minutes; mach bootstrap works, but mach build fails with mach.util.UserError: Could not list any recent revisions in your clone. Does your clone have git-cinnabar metadata? The same happens with --depth=10. The same happens with saying no to "Would you like to run a few configuration steps to ensure Git is optimally configured?".
  • There aren't any google-able answers to that, other than https://stackoverflow.com/questions/78604870/firefox-wont-build-unless-it-is-a-mercurial-repository (unanswered).
  • I tried git cinnabar fetch hg::https://hg.mozilla.org/mozilla-unified 0cf75e28377db0f01a0b40ca3083154505857764 (the current topmost changeset on the default branch. That takes only about 20 minutes. But it still failed with the same error.

So I'll re-attempt the hg clone -b default https://hg.mozilla.org/mozilla-unified gecko, and report back in about 24 h.

hg clone -b default https://hg.mozilla.org/mozilla-unified gecko was a magnitude faster (~ 1 h) than any non-shallow git clone (~ 24 h). With that, mach bootstrap/build works, and I can run ./obj-x86_64-pc-linux-gnu/dist/bin/firefox --no-remote. I changed a string in browser/locales/en-US/browser/appmenu.ftl, rebuilt, and can see the change in the menu.

So now I'm ready to test changes. Thanks for hand-holding!

Henrik asked me to put some logging to the dispatchEvents in remote/shared/webdriver/Actions.sys.mjs and suspected that one of the awaits wouldn't return. So I did this:

diff --git a/remote/shared/webdriver/Actions.sys.mjs b/remote/shared/webdriver/Actions.sys.mjs
--- a/remote/shared/webdriver/Actions.sys.mjs
+++ b/remote/shared/webdriver/Actions.sys.mjs
@@ -2459,11 +2459,13 @@ class MousePointer extends Pointer {
       mouseEvent.clickCount = state.clickTracker.count + 1;
     }
 
+    lazy.logger.warn("***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 1");
     await dispatchEvent("synthesizeMouseAtPoint", context, {
       x: inputSource.x,
       y: inputSource.y,
       eventData: mouseEvent,
     });
+    lazy.logger.warn("***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 1 done");
 
     if (
       lazy.event.MouseButton.isSecondary(mouseEvent.button) ||
@@ -2471,11 +2473,13 @@ class MousePointer extends Pointer {
     ) {
       const contextMenuEvent = { ...mouseEvent, type: "contextmenu" };
 
+      lazy.logger.warn("***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 2");
       await dispatchEvent("synthesizeMouseAtPoint", context, {
         x: inputSource.x,
         y: inputSource.y,
         eventData: contextMenuEvent,
       });
+      lazy.logger.warn("***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 2 done");
     }
   }
 
@@ -2505,11 +2509,13 @@ class MousePointer extends Pointer {
     state.clickTracker.setClick(action.button);
     mouseEvent.clickCount = state.clickTracker.count;
 
+    lazy.logger.warn("***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint");
     await dispatchEvent("synthesizeMouseAtPoint", context, {
       x: inputSource.x,
       y: inputSource.y,
       eventData: mouseEvent,
     });
+    lazy.logger.warn("***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint done");
   }
 
   /**

but (perhaps) surprisingly they all finish:

1739482125492	RemoteAgent	DEBUG	WebDriverBiDiConnection 361ef25e-5a9a-4bd0-8a5f-06a2c6155571 -> {"id":36,"method":"input.performActions","params":{"context":"15032385539","actions":[{"id":"pointer-36","type":"pointer","parameters":{"pointerType":"mouse"},"actions":[{"type":"pointerMove","x":0,"y":0,"origin":{"type":"element","element":{"type":"node","sharedId":"423830f8-4858-4446-b58b-3b162aecbeef","value":{"nodeType":1,"localName":"button","namespaceURI":"http://www.w3.org/1999/xhtml","childNodeCount":1,"attributes":{"aria-disabled":"false","class":"pf-v5-c-button pf-m-primary","type":"button","data-ouia-component-type":"PF5/Button","data-ouia-safe":"true","data-ouia-component-id":"OUIA-Generated-Button-primary-1"},"shadowRoot":null}}}},{"type":"pointerDown","button":0},{"type":"pointerUp","button":0}]}]}}
1739482125492	RemoteAgent	TRACE	Received command input.performActions for destination ROOT
1739482125493	RemoteAgent	TRACE	Received command input._getElementOrigin for destination WINDOW_GLOBAL
1739482125493	Marionette	TRACE	Dispatching tick 1/3
1739482125493	Marionette	TRACE	Dispatch PointerMoveAction mouse with id: pointer-36 x: 0 y: 0
1739482125493	RemoteAgent	TRACE	Received command input._getClientRects for destination WINDOW_GLOBAL
1739482125494	RemoteAgent	TRACE	Received command input._getInViewCentrePoint for destination WINDOW_GLOBAL
1739482125494	RemoteAgent	TRACE	Received command input._assertInViewPort for destination WINDOW_GLOBAL
1739482125495	Marionette	TRACE	moveOverTime start: 0,0 target: 839,215 duration: 0
1739482125495	Marionette	TRACE	PointerMoveAction.performPointerMoveStep [839,215]
1739482125495	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739482125504	Marionette	TRACE	Dispatching tick 2/3
1739482125504	Marionette	TRACE	Dispatch PointerDownAction mouse with id: pointer-36 button: 0
1739482125505	Marionette	WARN	***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 1
1739482125505	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739482125509	Marionette	WARN	***** MousePointer.pointerDown dispatchEvent synthesizeMouseAtPoint 1 done
1739482125509	Marionette	TRACE	Dispatching tick 3/3
1739482125510	Marionette	TRACE	Dispatch PointerUpAction mouse with id: pointer-36 button: 0
1739482125510	Marionette	WARN	***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint
1739482125511	RemoteAgent	TRACE	Received command input._dispatchEvent for destination WINDOW_GLOBAL
1739482125512	Marionette	WARN	***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint done
1739482125515	RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739482125515	RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739482125516	RemoteAgent	TRACE	Received command input._finalizeAction for destination WINDOW_GLOBAL
1739482125522	RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739482125522	RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739482125523	RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739482125523	RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
1739482125552	RemoteAgent	DEBUG	WebDriverBiDiConnection 361ef25e-5a9a-4bd0-8a5f-06a2c6155571 <- {"type":"event","method":"log.entryAdded","params":{"type":"console","method":"log","source":{"realm":"3ba3feec-b4b2-4078-b041-75fd5e70a074","context":"15032385537"},"args":[{"type":"string","value":"failed to fetch lastlog2:"},{"type":"object","value":[["problem",{"type":"null"}],["exit_status",{"type":"number","value":1}],["exit_signal",{"type":"null"}],["message",{"type":"string","value":"/var/lib/lastlog/lastlog2.db does not exist"}],["toString",{"type":"function"}]]}],"level":"info","text":"failed to fetch lastlog2: [object Object]","timestamp":1739482125551}}
> info: failed to fetch lastlog2: [object Object]
1739482125556	RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
1739482125556	RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket

... and then the timeout for never getting the response back. The pointerUp is the last of the events. I need to stop today, but tomorrow I'll add some tracing around performActions -- there are several definitions of that function in various files, but shouldn't be hard to figure out the correct one.

Tracing this further:

  • remote/webdriver-bidi/modules/root/input.sys.mjs in async performActions() the last line calls await this.#finalizeAction(context); . This gets called, but never returns. I also checked that it does not throw.
  • remote/webdriver-bidi/modules/windowglobal/input.sys.mjs the call to async _finalizeAction() arrives (it's an indirection through this._forwardToWindowGlobal, so this isn't obvious). await lazy.AnimationFramePromise(this.messageHandler.window); is called, but never returns. My hunch is that the button click action is sometimes fast enough to change the URL to the new window. The button click causes two things: (1) load a new iframe (cockpit/@localhost/system/services.html) and then the top-level document "switches" to it and changes the visible URL (to /system/services#/podman.socket). In both the good and failing cases the iframe load happens quickly, before finalizeAction finishes the animation wait.

But the working case, the top-level switch happens after the animation wait (the two last lines in the log):

Marionette	WARN	***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint done
XXX performActions: calling finalizeAction
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	Received command input._finalizeAction for destination WINDOW_GLOBAL
XXX _finalizeAction start
XXX _finalizeAction endWheelTransaction finished, calling AnimationFramePromise
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
XXX _finalizeAction AnimationFramePromise done
XXX performActions: done
RemoteAgent	DEBUG	WebDriverBiDiConnection 35091a73-8e78-4c61-be2c-25407ecb27a8 <- {"type":"success","id":36,"result":{}}
[... some other unrelated bits ...]
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket

while in the failing case the top-level fragmentNavigated happens before the AnimationFramePromise is done:

Marionette	WARN	***** MousePointer.pointerUp dispatchEvent synthesizeMouseAtPoint done
XXX performActions: calling finalizeAction
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	Received command input._finalizeAction for destination WINDOW_GLOBAL
XXX _finalizeAction start
XXX _finalizeAction endWheelTransaction finished, calling AnimationFramePromise
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[15032385538] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/cockpit/@localhost/system/services.html#/podman.socket
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket
RemoteAgent	TRACE	[11] NavigationListenerChild Location=fragmentNavigated: http://127.0.0.2:9091/system/services#/podman.socket

I can make the test pass reliably by adding a timeout. This is of course not an actual fix, just a demonstration what the problem is:

+++ b/remote/webdriver-bidi/modules/windowglobal/input.sys.mjs
@@ -193,13 +193,27 @@ class InputModule extends WindowGlobalBi
   }
 
   async _finalizeAction() {
+    dump("XXX _finalizeAction start\n");
     // Terminate the current wheel transaction if there is one. Wheel
     // transactions should not live longer than a single action chain.
     ChromeUtils.endWheelTransaction();
 
     // Wait for the next animation frame to make sure the page's content
     // was updated.
-    await lazy.AnimationFramePromise(this.messageHandler.window);
+    dump("XXX _finalizeAction endWheelTransaction finished, calling AnimationFramePromise\n");
+    await new Promise(resolve => {
+        const timeout = this.messageHandler.window.setTimeout(() => {
+            dump("XXX _finalizeAction AnimationFramePromise timed out\n");
+            resolve();
+        }, 1000);
+
+        lazy.AnimationFramePromise(this.messageHandler.window)
+                .then(() => {
+                    this.messageHandler.window.clearTimeout(timeout);
+                    dump("XXX _finalizeAction AnimationFramePromise done\n")
+                    resolve();
+                });
+    });
   }
 
   async _getClientRects(options) {

It seems to me the correct fix is to resolve the AnimationFramePromise once the URL changes, i.e. on a fragmentNavigated event which changes the current context's window location.

remote/shared/Sync.sys.mjs AnimationFramePromise() already tries to do the right thing by also waiting for unloadPromise = new EventPromise(win, "pagehide"), but that doesn't happen here -- the original iframe (where the button click occurs) does not go away, it just isn't visible any more as the top-level window hides it (which is apparently not covered by pagehide) and shows the new one instead.

Any idea how to teach it that? I tried a total shot in the dark, but that doesn't work:

--- a/remote/shared/Sync.sys.mjs
+++ b/remote/shared/Sync.sys.mjs
@@ -33,10 +33,11 @@ export function AnimationFramePromise(wi
     });
   });
 
-  // Abort if the underlying window is no longer active (closed, BFCache)
+  // Abort if the underlying window is no longer active (closed, BFCache, or navigated to new URL)
   const unloadPromise = new EventPromise(win, "pagehide");
+  const loadPromise = new EventPromise(win, "pageload");
 
-  return Promise.race([animationFramePromise, unloadPromise]);
+  return Promise.race([animationFramePromise, unloadPromise, loadPromise]);
 }
 
 /**

Do you know how to actually implement the intention here? Thanks!

Martin, would it be possible for you to quickly lay out the structure of the HTML? Maybe you can create a minimal example that we could then use to extend more in the hope to reproduce the problem?

  1. From where is the click issued (top-level or frame)?
  2. What exactly happens? If it's an iframe it's not getting removed, right?
  3. What happens with the top-level document?
Flags: needinfo?(martin)

Good news, I managed to create a standalone reproducer! Please see https://github.com/martinpitt/bidi-test/tree/firefox-async-events -- this originated from a reproducer that I wrote for a Chromium bug, so it still has the Chromium driver and probably some other bits that are unnecessary. But the three HTML files are specific to this bug.

The structure: there is a top-level index.html which represents the "Application". It can switch between "pages" (first.html, second.html), normally with a menu, but I left all of that out as it's irrelevant. The intention is that at any given time, only one page is visible. The Application can load new frames dynamically (i.e. pages are loaded on demand, not all at once initially, as that is expensive).

A page can also send a request to the top-level app to switch the currently visible page to a different one. That's what happens here. first.html sends a request via postMessage(), index.html receives it and loads second.html. The second iframe becomes visible, and first iframe gets hidden with style.display = "none" (but stays loaded). The top-level document doesn't change structure beyond that.

Reproducer:

git clone -b firefox-async-events https://github.com/martinpitt/bidi-test
cd bidi-test
python3 -m http.server &
./bidimouse.py -f

This fails in the majority of runs on my laptop -- as this is quite a bit faster than actual cockpit, it "wins" the race more often (i.e. the "Navigation finished" happens before the "wait for animations" promise finishes).

Notes:

  • You can look at it interactively at http://localhost:8000/
  • Without options, the reproducer runs chromium; that needs chromiumdriver. You probably don't need that here. The reproducer works reliably with Chromium, FTR.
  • You can add -s to show Firefox, by default it runs in headless. It's not super interesting, but perhaps you want the inspector or so. For that, add a sufficiently large await asyncio.sleep(0.5) to the bottom of bidimouse.py.
  • The reproducer has a commented out remote.events.async.enabled setting. Enabling that makes the reproducer work reliably (as established above).
Flags: needinfo?(martin)

Thanks for the test case!

From a quick investigation,

  • the click is sent to a frame
  • the click triggers a navigation in the frame (via a postMessage sent to the parent window)
  • the input module calls finalizeAction at the end of performActions (searchfox)

The issue is that we call finalizeAction on the previous browsing context for the iframe which navigated. Not sure exactly yet in which state the browsing context is, but it's no longer live, so await lazy.AnimationFramePromise(this.messageHandler.window);

Will take another look tomorrow.

Flags: needinfo?(jdescottes)

Oh I slightly misread the test case. The click doesn't navigate the first iframe, it hides it with

document.getElementById("page-first").style.display = "none";

Which explains why await lazy.AnimationFramePromise times out.

Flags: needinfo?(jdescottes)

That's actually more or less what you pointed out in Comment 24 - sorry I didn't read the full history of the bug in details before checking the test case.

I'm trying to gather more information to know if we could detect that the frame is no longer displayed from where we call requestAnimationFrame, otherwise it sounds like we'll need to race against a timeout as you suggested.

Assignee: nobody → jdescottes
Status: NEW → ASSIGNED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: