Closed Bug 1967705 Opened 10 months ago Closed 9 months ago

sync-about-blank: Avoid parser blocking about:blank in _ConfigurationModule

Categories

(Remote Protocol :: WebDriver BiDi, defect, P2)

defect

Tracking

(firefox142 fixed)

RESOLVED FIXED
142 Branch
Tracking Status
firefox142 --- fixed

People

(Reporter: vhilla, Assigned: vhilla)

References

Details

(Whiteboard: [webdriver:m16][webdriver:external])

Attachments

(3 files)

Attached file preload_order.py

WebDriver relies on parser blocking in the _ConfigurationModule to ensure preload scripts and overrides are applied before a document starts loading.

Bug 543435 changes the initial about:blank to load synchronous and parser blocking won't work for this load anymore. This causes race conditions and webdriver/tests/bidi/browsing_context/set_viewport/user_contexts.py to fail flakily.

Context

Spec situation

The spec says to apply the viewport override (link)

After creating a document in a new navigable navigable and before the run WebDriver BiDi preload scripts

. The geolocation override is applied as part of the same algorithm. A preload script (link)

runs on creation of a new Window, before any author-defined script have run.

I think this leaves it open whether preload scripts execute before or after the initial about:blank load event. Using the attached preload_order.py test, I believe the load event is observable from a preload script in Firefox but not in Chromium.

Maybe parser blocking isn't necessary, we only need to ensure the correct order of script execution, maybe block execution till the overrides are applied.

Chromium code

I believe Chromium attaches a debugger to block script execution till it's done all WebDriver configuration tasks. It also blocks the browsing_context.create command till the configuring is done.

The relevant code is CdpTarget.unblock (link) that runs on creation of CdpTarget, attaches a debugger, applies overrides, loads preload scripts, lets the debugger resume execution, and then resolves its unblocked promise. BrowsingContextImpl.create blocks on targetUnblockedOrThrow (link), which blocks on cdpTarget.unblocked.

Ideas & Questions

I wonder whether _ConfigurationModule not blocking for the initial about:blank is an acceptable regression from Bug 543435. Or is this severe enough to block landing our changes?

Can we mimic Chromium and have WebDriver use the debugger to block script execution? Are there other fixes? How much effort is this?

We could have BrowsingContextModule.create block till a message from _ConfigurationModule is received. This would fix the set_viewport/user_contexts.py test. But :Sasha pointed out that the problem remains with script-created windows, as seen in the second (flaky) subtest in the attached preload_order.py.

:Sasha, kindly asking for your thoughts.I don't know who knows well about the Debugger, maybe you can forward the ni.

Flags: needinfo?(aborovova)

Julian, you probably know best in our team about the debugger API, do you think that it might be possible to block script execution with it?

Flags: needinfo?(aborovova) → needinfo?(jdescottes)

Yes we should be able to use Debugger's onEnterFrame to pause javascript. I will play with it to see if I can get anywhere. Also cc'ing :ochameau who knows this API way better than I do.

I'm not sure I understand all the expectations, but I think you may rather want to eagerly prevent JS code from running via suppressEventHandling and suspendTimesout, similarly to this devtools code.
onEnterFrame is an option, but you would block execution slightly later and may cause event loop to be ordered differently. Spidermonkey doesn't allow to block/pause JS execution, you would have to use Services.tm.spinEventLoopUntil (or some equivalent of it), which may cause some disordering issues.

Per a hint from :smaug, preload_order.py is already failing on central.
Parser-blocking is not preventing the opener or embedder from accessing a script-created window object. I think it is also not guaranteeing that the window knows about the viewport having changed before other scripts are scheduled.
All the subtests in preload_order_extended.py pass with Chromium, but not mozilla-central.

I'm not sure what implications bug 543435 has on these pre-existing issues and why set_viewport/user_context.py only starts failing with these changes.

(In reply to Alexandre Poirot [:ochameau] from comment #4)

I'm not sure I understand all the expectations, but I think you may rather want to eagerly prevent JS code from running via suppressEventHandling and suspendTimesout, similarly to this devtools code.
onEnterFrame is an option, but you would block execution slightly later and may cause event loop to be ordered differently. Spidermonkey doesn't allow to block/pause JS execution, you would have to use Services.tm.spinEventLoopUntil (or some equivalent of it), which may cause some disordering issues.

The goal is to pause running any content JS (eg script tags in the content page) until we have executed the preload script and applied the configuration. Will suppressEventHandling/suspendTimeout handle that?

Flags: needinfo?(jdescottes) → needinfo?(poirot.alex)

(In reply to Julian Descottes [:jdescottes] from comment #6)

The goal is to pause running any content JS (eg script tags in the content page) until we have executed the preload script and applied the configuration. Will suppressEventHandling/suspendTimeout handle that?

TBH, I don't really know. But it is possible that SuppressEventHandling blocks script execution when I see such intruction. It isn't clear if it does exactly what you need.

Also note that suspendTimeout is not solely about setTimeout and setInterval, it suspends many APIs:
https://searchfox.org/mozilla-central/rev/02d33f4bf984f65bd394bfd2d19d66569ae2cfe1/dom/base/nsGlobalWindowInner.cpp#5523-5578
It pauses workers, audio context, sensors, vrs,...

What concerns me the most is that suppressEventHandling, at least for DOM Events, prevents some actions from happening.
Some user events will be completely discarded. I'm bit concerned that your <script> element may be completely ignored, even on resume.
suspendTimeout is actually different and can delay execution. i.e. if an interval or timeout was planned, it should be fired after resume.

Flags: needinfo?(poirot.alex)
Whiteboard: [webdriver:triage]

Discussed a bit with Vincent about the current cross browser differences.

It seems that Chrome also blocks running JS in the opener window while a new window / context is being created. I feel like the main goal of blocking JS here is to ensure that preload scripts run before content scripts loaded by the page. Blocking opener scripts seems more like a side effect from the implementation (which we should discuss with the WG), but I don't think it's a goal for now.

If we discard this, whether or not we are blocking content scripts for the initial about:blank is not a huge issue, because there shouldn't be any content script.

However we still have a race condition with setting the viewport from the configuration module. According to the investigations from Vincent, it's probable this comes from the fact that waitForInitialNavigationCompleted in browsingContext.create resolves earlier now.

One option could be to notify the root/browsingContext from windowglobal/_configuration via an event. Let's discuss in triage.

Why we can't have the preload scripts in the content process early enough?
And why can't https://searchfox.org/mozilla-central/rev/5813af297bb0e6581ab49029ba897c3e12a9134c/remote/webdriver-bidi/modules/windowglobal/_configuration.sys.mjs#79-119 be handled synchronously, why does there need to be async communication with the parent process there? (assuming I'm interpreting that code correctly)

I may ask silly questions since I don't know what preload scripts are, and where are they coming from :)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #9)

Why we can't have the preload scripts in the content process early enough?
And why can't https://searchfox.org/mozilla-central/rev/5813af297bb0e6581ab49029ba897c3e12a9134c/remote/webdriver-bidi/modules/windowglobal/_configuration.sys.mjs#79-119 be handled synchronously, why does there need to be async communication with the parent process there? (assuming I'm interpreting that code correctly)

I may ask silly questions since I don't know what preload scripts are, and where are they coming from :)

I think we are early enough and we probably only block parsing because our handleCommand helper is async. However there's no communication with the parent process here. All the information from the parent should already be available in the process at that point, via shared data.

It's usually async because we don't assume the target of a command is in the same process as the caller. But here again, it should be the case so we could probably switch the setGeolocationOverride and evaluatePreloadScript to be fully sync.

But updateNavigableViewport is by nature async because it waits for a resize/rAF before resolving. We want to be sure that the viewport has the right size before clients attempt to interact with it or before any script runs (either preload or regular content script). If any of those scripts queries the viewport size, it should have the value configured by the client.

And I guess that's the main thing which blocks us from making this fully sync? I can write a handleCommandSync variant, but as long as we need to wait for updateNavigableViewport to be applied before running preload & content scripts, we'll probably have the same issue. Maybe there is a way to synchronously update the viewport?

Severity: -- → S3
Points: --- → 2
Priority: -- → P2
Whiteboard: [webdriver:triage] → [webdriver:m16]

We discussed this in triage, for now we will update the tests so that they don't block your change and we will file follow ups to improve our configuration logic if possible.

For the option to send an event from the _configuration module to let the parent process know that the configuration is done, this would work for explicit commands such as browsingContext.create but we cannot rely on this if users create contexts via other means and observe eg load events themselves.

I think we are early enough and we probably only block parsing because our handleCommand helper is async. However there's no communication with the parent process here. All the information from the parent should already be available in the process at that point, via shared data.

I was wrong here, indeed we go from content to parent for setting the viewport configuration. We send a command to the parent from the windowglobal _configuration module here. This is handled in the root browsingContext module here which will resize the browser element and then send another command to the content process to wait for the resize to be applied here. So there's a lot of jumping between parent and content process here. And I am not sure we can heavily simplify it because: 1/ this needs to be handled in the same way for browsing contexts created via a BiDi API or created via JS/user action, 2/ order matters, we need to set the viewport and have it applied before preload scripts are executed

for now we will update the tests so that they don't block your change and we will file follow ups to improve our configuration logic if possible.

We thought we could use a different URL instead of about:blank, but browsingContext.create doesn't allow to set a URL. So this is not really an option.

So let's go back to the option of sending a notification from the configuration module for now.

Vincent, do you want to close this bug? Otherwise let's attach the patch from https://phabricator.services.mozilla.com/D253541 to it ?

Flags: needinfo?(vhilla)

The _ConfigurationModule blocks the parser to perform some configuration tasks
on new documents. But parser blocking doesn't work anymore for the initial
about:blank document. This patch ensures that browsing_context.create doesn't
complete before configuration tasks are done.

Assignee: nobody → vhilla
Status: NEW → ASSIGNED

You suggestion is good, we can land the patch before bug 543435. I attached the patch.

Flags: needinfo?(vhilla)
Pushed by vhilla@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/06542d0be7f1 https://hg.mozilla.org/integration/autoland/rev/b286fa057a16 Don't rely on parser blocking in webdriver for initial page load. r=jdescottes,webdriver-reviewers
Pushed by abutkovits@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/908e7d5cb171 https://hg.mozilla.org/integration/autoland/rev/ac2b7416d17a Revert "Bug 1967705 - Don't rely on parser blocking in webdriver for initial page load. r=jdescottes,webdriver-reviewers" for causing failures at browsercontext-cookies.spec.js.

Commented and updated the patch

Flags: needinfo?(vhilla)
Pushed by vhilla@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/403b0145f46e https://hg.mozilla.org/integration/autoland/rev/af095fbba257 Don't rely on parser blocking in webdriver for initial page load. r=jdescottes,webdriver-reviewers
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 142 Branch
Points: 2 → ---
Whiteboard: [webdriver:m16] → [webdriver:m16][webdriver:external]
See Also: → 1978075
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: