Wait for nsIBrowserChild.contentTransformReceived() before dispatching events
Categories
(Remote Protocol :: Agent, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [webdriver:m20])
(In reply to Hiroyuki Ikezoe (:hiro) from bug 1965483 comment #9)
It looks to me there's still a race on the condition whether the out-of-process iframe is ready to be used. The test [uses
loadevent](https://searchfox.org/mozilla-central/rev/bd1603a168d35648f2f62f1c89162a15a8180952/testing/web-platform/tests/dom/events/scrolling/ scroll-cross-origin-iframes.html#37-53) basically, but it doesn't ensure that the out-of-process iframe document has received the information to process events properly.I think we also need to await nsIBrowserChild.contentTransformsReceived.
Especially on Android I can see with my work on bug 1971979 that wheel scroll events that we are dispatching through APZ are not actually getting processed because of this issue. This is blocking as well the async wheel scroll events to be enabled for wpt platform tests.
Hiro or Masayuki, would we have to do this call for each and every time we perform actions, or might other browser behavior be affected as well so that we may have to wait in generell whenever a (cross-origin) iframe gets created / navigated? If that's the case we may have to mark the current browsing context tree as dirty in Marionette / BiDi and wait for the content transformed to be received.
| Reporter | ||
Comment 1•9 months ago
|
||
I assume that we should call it as well when we dispatch events through the content process - even through we haven't seen issues for it yet.
Comment 2•9 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #0)
Hiro or Masayuki, would we have to do this call for each and every time we perform actions, or might other browser behavior be affected as well so that
The answer is here is the latter.
we may have to wait in generell whenever a (cross-origin) iframe gets created / navigated?
This sounds an interesting approach to me. It should work. The information that contentTransformsReceived ensures is also used for IntersectionObservers or CSS animation/transition optimizations, so it's necessary for tests that no event is synthesized.
Comment 3•9 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #0)
Especially on Android I can see with my work on bug 1971979 that wheel scroll events that we are dispatching through APZ are not actually getting processed because of this issue. This is blocking as well the async wheel scroll events to be enabled for wpt platform tests.
I don't think this bug needs to block bug 1965483. Enabling async wheel scroll events just revealed a pre-existing issue. Theoretically this bug could be observable on Firefox if the user tried to interact on an out-of-process iframe soon after the iframe got loaded.
| Reporter | ||
Comment 5•9 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #2)
This sounds an interesting approach to me. It should work. The information that
contentTransformsReceivedensures is also used for IntersectionObservers or CSS animation/transition optimizations, so it's necessary for tests that no event is synthesized.
Maybe we can do it in two steps? First we only make sure that we call this method whenever we dispatch actions via Perform Actions or Element Click, and later on we can find a way to wait for the transforms received in general when executing any other WebDriver command. The latter is more complicated and doesn't actually block the interop2025 work. The time it needs the run that code only takes around 2ms so I think its fine to have this check as first step in each call to these event methods. What do you think?
| Reporter | ||
Comment 6•9 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #3)
I don't think this bug needs to block bug 1965483. Enabling async wheel scroll events just revealed a pre-existing issue.
Well, it at least seem to stop all the Windows related test failures that we are currently observing with async events enabled. Once fixed it should lower the amount of classifications for that bug - that's why I added it as blocking.
Comment 7•9 months ago
|
||
I don't think doing something on WebDriver's actions is a good idea.
For example, given that there's an out-of-process (100x100) sized iframe positioned at (100, 100) in the parent document, and an action (110, 110) offset from the parent document origin is triggered in the parent document. The iframe document needs to await contentTransformsReceived, the parent doesn't need to.
Finding the proper target document depends on the info which contentTransformsReceived ensures the info has arrived to the document, thus it will lead to a chicken or the egg problem.
| Reporter | ||
Comment 8•9 months ago
|
||
Hm, so given that we do not have the information on our side where the dispatch will actually end up - like an element in the parent document could also overlay the iframe - could this be some logic within the APZ chain to ensure that content transforms are received before actually dispatching the widget event to that frame?
If not I wonder if maybe a contentTransformReceived field could be added to the browsing context class, which when called on the top-level browsing context would automatically check all the child browsing contexts as well and return when the information was received.
Comment 9•9 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #8)
Hm, so given that we do not have the information on our side where the dispatch will actually end up - like an element in the parent document could also overlay the iframe
The event is dispatched to the place where the action is triggered.
could this be some logic within the APZ chain to ensure that content transforms are received before actually dispatching the widget event to that frame?
Unfortunately there's still a race APZ hasn't received the info, so APZ can not do anything.
If not I wonder if maybe a
contentTransformReceivedfield could be added to the browsing context class, which when called on the top-level browsing context would automatically check all the child browsing contexts as well and return when the information was received.
contentTransformReceived kinda has the browsing context. Each BrowserChild (nearly == BrowsingContext) has the method.
| Reporter | ||
Comment 10•9 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #9)
If not I wonder if maybe a
contentTransformReceivedfield could be added to the browsing context class, which when called on the top-level browsing context would automatically check all the child browsing contexts as well and return when the information was received.
contentTransformReceivedkinda has the browsing context. Each BrowserChild (nearly == BrowsingContext) has the method.
Sure, but that requires a lot of xpconnect calls when we recursively walk the whole browsing context tree to ensure we have a stable state. Having it all on the platform side might be better.
I also found waitUntilApzStable() which awaits for promiseAllPaintsDone() and promiseOnlyApzControllerFlushed(). Is that something we need to take care of as well?
Comment 11•9 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #10)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #9)
If not I wonder if maybe a
contentTransformReceivedfield could be added to the browsing context class, which when called on the top-level browsing context would automatically check all the child browsing contexts as well and return when the information was received.
contentTransformReceivedkinda has the browsing context. Each BrowserChild (nearly == BrowsingContext) has the method.Sure, but that requires a lot of xpconnect calls when we recursively walk the whole browsing context tree to ensure we have a stable state. Having it all on the platform side might be better.
The approach we've used for mochitests is that the test author explicitly calls contentTransformReceived for the iframe document in question because the test author definitely knows which document needs to await. Thus we don't need to walk the all browsing context tree.
I understand this approach can not be used for web-platform tests.
Thus I don't have any clear idea how we can avoid the race.
I'd say asking DOM people would be more reasonable than me, specifically persons who are familiar with process isolation.
I also found
waitUntilApzStable()which awaits forpromiseAllPaintsDone()andpromiseOnlyApzControllerFlushed(). Is that something we need to take care of as well?
Though there may be a race that those function has been papering over, I don't think those ones are particularly necessary here. Bug 1794097 might have been caused by the race.
Comment 12•9 months ago
|
||
I am clarifying what happens when the test fails, I hope this clarification will convince Henrink why APZ can not do anything for fixing the race;
- When the test fails, the wheel action is consumed by the parent document, not by the iframe document
- The reason above ^ is that at that moment, APZ hasn't received any information of the cross-origin iframe
So from the perspective of APZ, when APZ received the action, APZ know nothing about the cross-origin iframe, thus there's nothing APZ can do. We should do something before the action reached to APZ.
Comment 13•9 months ago
|
||
Also note that the coordinate conversion function, LayoutUtils.rectToTopLevelWidgetRect (introduced in bug 1885106) is also used for the info APZ sent to content processes, i.e. the info that contentTransformReceived ensures that the info has arrived in each content process. So, we need to do something before the conversion happens in the action processing (if the action is triggered in an out-of-process iframe).
Thus,
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #0)
we may have to wait in generell whenever a (cross-origin) iframe gets created / navigated?
this approach sounds reasonable to me.
Comment 14•9 months ago
|
||
Based on the fact that Chrome has a similar race (bug 1965483 comment 34), I'd suggest adding a new function somewhere (WebDriver or testharnnes?) to make sure the given cross-origin iframe is ready to be tested, and using the function for such kind of tests such as bug 1965483.
| Reporter | ||
Comment 15•9 months ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #13)
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #0)
we may have to wait in generell whenever a (cross-origin) iframe gets created / navigated?
this approach sounds reasonable to me.
Hi Nika, could you help us determine whether there’s already existing code on the platform side (or whether it could be added) that would allow us to operate with (cross-process) iframes only once their content transforms have been synced?
If this behavior isn’t needed by default in Firefox because it only impacts automated tests, perhaps we could add a helper to the BrowsingContext class. This helper, exposed to privileged JavaScript, would let us check - at the top-level browsing context level - whether all child (iframe) browsing contexts have received the relevant information.
This approach would likely be faster than using the SpecialPowers way of checking each nsIBrowser, since it would all remain within C++ and don't cross XPConnect.
If such an implementation takes longer, I would suggest the following as a workaround for now by using the referenced method of SpecialPowers:
Marionette
- When switching to an existing tab we first check that the information is synced before returning from the command
- If an iframe gets created within the currently selected tab, mark the tab as not yet initialized and wait for the information synced up before allowing to run any further command
WebDriver BiDI
- Observe for new frames and within the browsing context created event wait for the sync to be done before sending out the event to the client, but also mark the corresponding top-level as not ready yet to prevent commands to be run.
Both protocols
- When creating a Webdriver session wait until all frames have their information synced
| Reporter | ||
Updated•9 months ago
|
Updated•9 months ago
|
Comment 16•9 months ago
|
||
I'm unfortunately not super familiar with how the transform information for oop frames are synchronized. I'm a touch surprised that they'd be synchronized so late so as to not be available when the load event is firing TBH, but perhaps this has to do with synchronizing with the GPU process in some cases?
It looks from a scan over the method you mention above that the matrices are provided by APZ from NotifyLayerTransforms (https://searchfox.org/mozilla-central/rev/584b344830aa2558985675b99aaadbb915402caa/gfx/layers/apz/util/APZCCallbackHelper.cpp#292-306), which I presume is triggered async by the GPU process.
If you just want a way to check if every content process under a remote tab has been sent transforms, you could perhaps write a function (e.g. on nsIRemoteTab), which does a check like this:
NS_IMETHODIMP
BrowserHost::GetAllFramesHaveTransforms(bool* aResult) {
*aResult = true;
mRoot->VisitAllDescendants([&] (BrowserParent* bp) {
*aResult = *aResult && bp->mRemoteDocumentRect.isSome();
});
return NS_OK;
}
I think that would check that SendChildToParentMatrix has been called for every OOP iframe in the tree, as it appears that member will be Nothing() until SetChildToParentConversionMatrix is called: https://searchfox.org/mozilla-central/rev/584b344830aa2558985675b99aaadbb915402caa/dom/ipc/BrowserParent.cpp#2683
| Reporter | ||
Updated•8 months ago
|
Updated•6 months ago
|
| Reporter | ||
Updated•3 months ago
|
| Reporter | ||
Updated•2 months ago
|
| Reporter | ||
Comment 17•2 months ago
|
||
Does anything has changed here over the last half year? I'm no longer able to reproduce the the problem on Android or desktop. Maybe the information is now synced right away for oop iframes?
Botond, or Hiro do you know?
Comment 18•2 months ago
|
||
I am not aware of any relevant changes. Actually the notification API is not sync at all.
Given that you are no longer able to see the failure and if the failure doesn't happen on our CIs, we can lower this bug priority now. It might be possible that the failure happens on wpt.fyi.
Comment 19•2 months ago
|
||
I also can't think of a relevant change. The course of action suggested in comment 18 sounds reasonable to me.
Description
•