Make use of a webprogress listener instead of page load events for tracking active navigations in Marionette
Categories
(Remote Protocol :: Marionette, task, P2)
Tracking
(Not tracked)
People
(Reporter: whimboo, Assigned: whimboo)
References
(Blocks 6 open bugs)
Details
(Whiteboard: [webdriver:backlog])
Attachments
(1 file, 2 obsolete files)
With bug 1612831 all the navigation commands will only operate on the parent process. That means no IPC communication will happen between the driver and the listener framescript. By doing that it retains the logic for waiting of the navigation to complete by using the page load events as sent by the framescript.
For a better handling and to finally get rid of the framescript at all, those checks need to be done in the parent process as best via the webprogress listener interface as available on each browsing context.
But before doing that we should get all the navigation tests updates first (see bug 1650132).
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
Given the complexity of this change I would suggest to try the minimal approach by using the events via a JSWindowActor implementation. I filed bug 1673823 for that.
Assignee | ||
Comment 2•4 years ago
|
||
With the framescript code gone we would only have to implement that for actors. So lets block on bug 1669172.
Assignee | ||
Comment 3•4 years ago
|
||
Also it would be good to have full support for bfcache (bug 1673059) before doing that change.
Assignee | ||
Comment 4•4 years ago
|
||
With all the recent work on the different Marionette actors the current way how navigations are handled works pretty fine, and doesn't cause any trouble with Fission. Moving over to a webprogress listener is complex, and can cause lots of side-effects / regressions, which do not outweigh any benefits from doing that now.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
Quick update. Using the webProgress
via browsingContext.webProgress
would be the way to go here. It automatically tracks browsing context replacements for ongoing loads, and won't need such a complicated logic as what we have right now. Here a simple example which returns when the page has been finished loading:
await new Promise(resolve => {
let wpl = {
QueryInterface: ChromeUtils.generateQI([
"nsIWebProgressListener",
"nsISupportsWeakReference",
]),
onStateChange(webProgress, request, stateFlags) {
const { STATE_STOP } = Ci.nsIWebProgressListener;
if ((stateFlags & STATE_STOP) == STATE_STOP) {
browsingContext.webProgress.removeProgressListener(
wpl,
Ci.nsIWebProgress.NOTIFY_ALL
);
resolve();
}
},
};
browsingContext.webProgress.addProgressListener(
wpl,
Ci.nsIWebProgress.NOTIFY_ALL
);
});
Assignee | ||
Comment 6•3 years ago
|
||
When implementing this feature we should definitely use the MOZ_LOG=BCWebProgress:5
log module. It gives great information about any notification of web progress instances.
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 7•3 years ago
•
|
||
After discussing with Henrik, we will keep the current navigation tests as marionette unit tests, because they are much faster to run than wdspec tests. Removing the dependency on Bug 1650132.
The current marionette tests for navigation can be found at:
Comment 8•3 years ago
•
|
||
Summary of the progress so far.
I have added a preference to turn off the webprogress implementation for navigation: marionette.webprogress.enabled.
Even when this preference is enabled, we still use the legacy codepath for click attempts.
The implementation using webprogress listener is using a combination of onStateChange and onLocationChange.
Challenges/issues:
- PageLoadStrategy.Eager: onLocationChange is mandatory if we want to replicate the "eager" page load strategy. Otherwise for onStateChange, STATE_START is too early, STATE_STOP is too late. It doesn't seem like webprogresslistener is a good fit for this
- detecting about:*error redirections is not straightforward. If you attempt to navigate to
"thisprotocoldoesnotexist://"
, you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checkingstatus != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED
and then waits for the next DOMContentLoaded since in DevTools case this runs in the content process (searchfox). Here instead I defer to the nextonLocationChange
, which should be about:neterror. However, there are also seemingly valid requests which matchstatus != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED
. I had the case for a navigation between an inline image and a regular image (searchfox). So for now I simply wait for some time but if no onLocationChange resolves the navigation, I consider it as completed. - I have a lot of intermittent about:blank navigations, although that might be because I was missing the fix from Bug 1747359. Currently rebasing, we'll see if the situation improves.
- navigations in driver.js always operate on the top Browsing Context, but navigate.js will watch navigation on the current browsing context. When the current browsing context is an iframe, the behavior is really surprising IMO. The navigation completes because the Frame's browsing context is unloaded and we do listen for that, but I am not sure we are really monitoring the actual full navigation of the top BC. In any case here since we use the webprogresslistener, I think we should always use the top BC
- (edit) overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg
inline("<script>window.{}('Hello');</script>".format(dialog_type))
. Usually, this navigation should end when thecommon-dialog-loaded
event is received, meaning the consumer can expect to interact with the dialog. However with the webprogresslistener, we sometimes get the STATE_STOP right before before the dialog was opened, and the consumer fails to find the dialog
I have a few try pushes in progress which are not too depressingly orange (eg https://treeherder.mozilla.org/jobs?repo=try&revision=ea910686f89a1a17ac891eca713b3d268ac41523), but there's still a lot of work here.
Assignee | ||
Comment 9•3 years ago
|
||
(In reply to Julian Descottes [:jdescottes] from comment #8)
- PageLoadStrategy.Eager: onLocationChange is mandatory if we want to replicate the "eager" page load strategy. Otherwise for onStateChange, STATE_START is too early, STATE_STOP is too late. It doesn't seem like webprogresslistener is a good fit for this
That is interesting. Is there no equivalent notification available as when DOMContentLoaded
is fired? I assume the location change notification might not enough or is the DOM loaded when it fires?
- detecting about:*error redirections is not straightforward. If you attempt to navigate to
"thisprotocoldoesnotexist://"
, you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checkingstatus != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED
and then waits for the next DOMContentLoaded since in DevTools case this runs in the content process (searchfox). Here instead I defer to the nextonLocationChange
, which should be about:neterror. However, there are also seemingly valid requests which matchstatus != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED
. I had the case for a navigation between an inline image and a regular image (searchfox). So for now I simply wait for some time but if no onLocationChange resolves the navigation, I consider it as completed.
Mind sharing a log with BCWebProgress:5
for such an example? Similar to my comment above I don't feel that this is actually stable enough. Maybe we should talk to some DOM folks?
- navigations in driver.js always operate on the top Browsing Context, but navigate.js will watch navigation on the current browsing context. When the current browsing context is an iframe, the behavior is really surprising IMO. The navigation completes because the Frame's browsing context is unloaded and we do listen for that, but I am not sure we are really monitoring the actual full navigation of the top BC. In any case here since we use the webprogresslistener, I think we should always use the top BC
Yes, per WebDriver spec a real navigation can only be triggered on the top-browsing context. This will be different for BiDi. But there are cases like click (navigation or even unload), forward and backward that trigger a navigation only in a sub frame. When using the WebProgress listener we have to keep all these tests working. See also test_click.py
.
- (edit) overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg
inline("<script>window.{}('Hello');</script>".format(dialog_type))
. Usually, this navigation should end when thecommon-dialog-loaded
event is received, meaning the consumer can expect to interact with the dialog. However with the webprogresslistener, we sometimes get the STATE_STOP right before before the dialog was opened, and the consumer fails to find the dialog
That's annoying and should not happen. That's a case we should clarify with the DOM folks.
I have a few try pushes in progress which are not too depressingly orange (eg https://treeherder.mozilla.org/jobs?repo=try&revision=ea910686f89a1a17ac891eca713b3d268ac41523), but there's still a lot of work here.
That's great to see! And thanks for the update.
Comment 10•3 years ago
•
|
||
Thanks for the reminder about BCWebProgress:5
.
Adding more logs. All Trace logs prefixed with "@@@@@" are my debug traces. I log: all events captured by our MarionetteEventsChild
actor and calls to my progress listener onLocationChange/onStateChange. For onStateChange, I filter out anything but STATE_STOP.
Below is eager evaluation while fetching a slow resource. Note that in an earlier attempt I tried to resolve on STATE_START from onStateChange, but it was too early as we will see here. In order to create this log I also let the webprogress listener run until STATE_STOP even though for this test case I would normally have resolved in onLocationChange. I did that to get a complete overview of the WebProgress updates.
### This is the navigate command
1642670020044 Marionette DEBUG 3 -> [0,2,"WebDriver:Navigate",{"url":"http://127.0.0.1:50671/slow_resource.html"}]
### Now we are unloading
1642670020046 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:beforeunload, uri:about:blank
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:0}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_START|STATE_IS_REQUEST|STATE_IS_DOCUMENT|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:about:blank}
1642670020049 RemoteAgent TRACE [42] Web progress state=start: about:blank
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, NS_ERROR_MALFORMED_URI, "127.0.0.1") on {top:1, id:2a, url:about:blank}
1642670020119 Marionette TRACE Remoteness change detected. Set new top-level browsing context to 42
1642670020119 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pagehide, uri:about:blank
1642670020119 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:unload, uri:about:blank
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
1642670020192 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:beforeunload, uri:about:blank
### Moving on to STATE_START
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_START|STATE_IS_REQUEST|STATE_IS_DOCUMENT|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:about:blank}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, NS_NET_STATUS_RECEIVING_FROM, "127.0.0.1") on {top:1, id:2a, url:about:blank}
### Note that we are now unloading about:blank again, probably the initial one
1642670020194 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pagehide, uri:about:blank
1642670020194 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:unload, uri:about:blank
### The following OnProgressChange is mostl likely turning into onLocationChange
[Parent 12948: Main Thread]: I/BCWebProgress OnProgressChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, http://127.0.0.1:50671/slow_resource.html, 0) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670020196 RemoteAgent TRACE @@@@@ onLocationChange documentURI http://127.0.0.1:50671/slow_resource.html
### Now, 6ms after the onLocationChange triggered in the parent process, we get DOMContentLoaded
### in the child actor, which is normally what we wait for in the current navigation logic.
1642670020202 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:DOMContentLoaded, uri:http://127.0.0.1:50671/slow_resource.html
### I didn't add timestamps here, but the following On*Change are all triggered close to DOMContentLoaded
[Parent 12948: Main Thread]: I/BCWebProgress OnSecurityChange(<null>, <null>, 4) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1β¦") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnProgressChange(<null>, <null>, 0, 0, 511, 511) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Waiting for 127.0.0.1β¦") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
### And the resource finally is loaded (and document state is complete)
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1β¦") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670024226 Marionette TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pageshow, uri:http://127.0.0.1:50671/slow_resource.html
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_STOP|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670024227 RemoteAgent TRACE [42] Web progress state=stop: http://127.0.0.1:50671/slow_resource.html
1642670024227 RemoteAgent TRACE @@@@@ onStateChange documentURI http://127.0.0.1:50671/slow_resource.html
From this, onLocationChange, seems slightly too early. But I'm not sure I see anything specific enough in any of the onProgressChange/onStatusChange to use them as clear markers of "DOMContentLoaded" has happened.
Comment 11•3 years ago
|
||
Quick overview of the other issues
detecting about:*error redirections is not straightforward. If you attempt to navigate to "thisprotocoldoesnotexist://", you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checking status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED ...
For this topic we probably have to know which error status imply that a redirection to an error page is about to happen. For the "thisprotocoldoesnotexist://" use case, we get NS_ERROR_UNKNOWN_PROTOCOL
, while for the image navigation we get NS_ERROR_PARSED_DATA_CACHED
. So we probably need to also check against NS_ERROR_PARSED_DATA_CACHED
before we consider the request as incomplete.
I have a lot of intermittent about:blank navigations, although that might be because I was missing the fix from Bug 1747359. Currently rebasing, we'll see if the situation improves.
This improved significantly after rebasing, so hopefully this was just linked to the missing patch from Bug 1747359
overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg inline("<script>window.{}('Hello');</script>".format(dialog_type))
Allowing for a small delay after STATE_STOP seems to fix the intermittent issues. I am pretty sure that STATE_STOP is only received after the script has fully executed. But on some platforms it seems that we receive STATE_STOP after the script was executed, but before the alert is open and ready to be used.
Comment 12•3 years ago
|
||
Hi Olli!
We are trying to use WebProgressListener to monitor navigations in Marionette. Marionette supports several page load strategies and one of them is "Eager" where we are supposed to resolve as soon as the document is "interactive". This comes after WebProgressListener onStateChange reaches STATE_START, but before STATE_STOP. So far, I ended up using onLocationChange, which seems to be triggered around the same time as the DOMContentLoaded
, but this might just be by accident. See comment #10 for more details.
Is WebProgressListener a good fit to monitor a navigation and detect when the document becomes "interactive"? Should we try to use the other callbacks (onStatusChange, onProgressChange)?
Comment 13•3 years ago
|
||
For the issue with dialogs opened from inline script (inline("<script>window.{}('Hello');</script>".format(dialog_type))
), my analysis was wrong. The test where this seems to frequently fail is
session.url = inline("<script>window.{}('Hello');</script>".format(dialog_type))
session.alert.accept()
session.url = inline("<script>window.{}('Hello');</script>".format(dialog_type))
response = send_alert_text(session, "Federer")
assert_error(response, "element not interactable")
And the issue is with the second inline
call. We seem to get the STATE_STOP from the previous navigation from time to time. I think I can workaround this by only accepting a STATE_STOP after a STATE_START.
Assignee | ||
Comment 14•3 years ago
|
||
(In reply to Julian Descottes [:jdescottes] from comment #13)
And the issue is with the second
inline
call. We seem to get the STATE_STOP from the previous navigation from time to time. I think I can workaround this by only accepting a STATE_STOP after a STATE_START.
Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):
https://searchfox.org/mozilla-central/rev/c1598f6d3edad19ccc53f53ab045d1d29835e1dd/remote/shared/Navigate.jsm#105-111
Comment 15•3 years ago
|
||
I have a relatively green try for the remote-protocol
preset: https://treeherder.mozilla.org/jobs?repo=try&revision=8551d16d9859e4a5575169afc86f315b323e587d
The main failures remaining are from wpt tests, and can be reproduced locally. The issue seems related to the PageLoadStrategy eager
. As hinted earlier, the onLocationChange
callback is triggered slightly before the DOMContentLoaded
event we use in the current implementation, and this leads to errors such as window.__wptrunner_process_next_event
. Waiting slightly before resolving seems to fix the issue, but at this point we should wait for Olli's feedback.
Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):
Thanks, and yes I plan to reuse it in the final version! I am using it a bit at the moment, but since my patch is not very stable I prefer to keep most of the code independent.
Comment 16•3 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [βοΈUTC+1] from comment #14)
Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):
https://searchfox.org/mozilla-central/rev/c1598f6d3edad19ccc53f53ab045d1d29835e1dd/remote/shared/Navigate.jsm#105-111
Note that for my patch, it's fine to expect a STATE_START first, because in theory the navigation has not started until I trigger the callback passed to waitForNavigationCompleted
. For the initial load, we can't be sure that we are starting to listen before some navigation started. So we should be careful if trying to apply the same pattern elsewhere.
Comment 17•3 years ago
|
||
Technically DOMContentLoaded doesn't map exactly to anything in WebProgressListener, nor readyState changes.
This is basically because of https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/dom/base/Document.cpp#8074
At that point readyState is expected to be already interactive and see the callers why DOMContentLoaded may be delayed (ScriptLoader dealing with deferred scripts).
ReadyState gets value interactive when the page has been created, basically all of its DOM
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/dom/base/nsContentSink.cpp#790
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/parser/html/nsHtml5TreeOpExecutor.cpp#165,167,178
WebProgressListener doesn't really know about interactive state, since the state depends on HTML parser passing treeOps from the parser thread to the main thread and then creating the DOM.
Comment 18•3 years ago
|
||
So if you must do something exactly when readyState changes to "interactive", a capturing event listener for readystatechange is possibly the only option. We could add something new, if event listeners are difficult to use in this case.
Comment 19•3 years ago
|
||
Thanks for the feedback! This means that if we want to use webprogresslistener, we still needto listen to readystatechange in the content process. I will try to adapt my current patch to handle this properly.
Comment 20•3 years ago
|
||
Comment 22•3 years ago
|
||
Comment on attachment 9264170 [details]
Bug 1664165 - [remote] Emit pageload internal events for existing WindowGlobalMessageHandlers
Revision D138900 was moved to bug 1756809. Setting attachment 9264170 [details] to obsolete.
Comment 23•3 years ago
|
||
Comment on attachment 9264279 [details]
Bug 1664165 - [remote] Add basic navigation helper to RootMessageHandler
Revision D138958 was moved to bug 1756809. Setting attachment 9264279 [details] to obsolete.
Comment 24•3 years ago
|
||
Unassigning for now this no longer blocks m3, we should now focus on bug 1756809 to get an initial navigation helper implemented.
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 25•2 years ago
|
||
We have made good experience with the WebProgress listener for tracking a navigation. As such we probably also want to let Marionette use the same logic so that we can get rid of listening / checking load events. As seen in some cases we might miss those especially when a remoteness change takes place.
I wonder if we should prioritize this work a bit higher to make certain navigation related APIs more stable in Marionette.
Assignee | ||
Comment 26•2 years ago
|
||
Maybe we should not share with the BiDi implementation immediately because we need additional checks like modal dialogs for Marionette. As such lets use the WebProgressListener as next step to make navigation behavior more stable.
Assignee | ||
Comment 27•9 months ago
|
||
Given that amount of different failures during navigation with Marionette I put a stab on that and converted the code to make use of the webprogress listener from the shared navigation module. It works pretty fine locally. Here a full try build for verification:
https://treeherder.mozilla.org/jobs?repo=try&revision=6ad8cb259f8061599a74218fd3825d35bdea0b78
Assignee | ||
Comment 28•7 months ago
•
|
||
I forgot to actually post an update on December 29th. I was able to fix the listed Mn failures from the last try push, but saw that wpt tests are basically perma-failing now with this change:
https://treeherder.mozilla.org/jobs?repo=try&revision=29047464b77539131dfd95c6e6a2e3c2e915b94a
I've to take a look at this problem next. Here a new try build with Marionette trace logging enabled:
https://treeherder.mozilla.org/jobs?repo=try&revision=7166516525b3c8d3da40a943a626681363290adc
Assignee | ||
Comment 30•4 months ago
|
||
While working on bug 1522790 I noticed that especially for ASAN builds the current navigation logic in Marionette is more than 3x slower when Fission is enabled as when it it disabled. Here an example:
- Fission disabled (try build - 210ms)
[task 2024-05-02T15:08:20.826Z] 15:08:20 INFO - PID 2325 | 1714662500825 Marionette DEBUG 2 -> [0,7155,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/html/canvas/element/path-objects/2d.path.roundrect.radius.intersecting.1.html"}]
[task 2024-05-02T15:08:20.834Z] 15:08:20 INFO - PID 2325 | 1714662500833 Marionette TRACE [1202] Received event beforeunload for about:blank
[task 2024-05-02T15:08:20.887Z] 15:08:20 INFO - PID 2325 | 1714662500886 Marionette TRACE Remoteness change detected. Set new top-level browsing context to 1203
[task 2024-05-02T15:08:20.915Z] 15:08:20 INFO - PID 2325 | 1714662500910 Marionette TRACE [1203] Received event beforeunload for about:blank
[task 2024-05-02T15:08:20.919Z] 15:08:20 INFO - PID 2325 | 1714662500916 Marionette TRACE [1203] Received event pagehide for about:blank
[task 2024-05-02T15:08:21.037Z] 15:08:21 INFO - PID 2325 | 1714662501035 Marionette TRACE [1203] Received event DOMContentLoaded for http://web-platform.test:8000/html/canvas/element/path-objects/2d.path.roundrect.radius.intersecting.1.html
[task 2024-05-02T15:08:21.037Z] 15:08:21 INFO - PID 2325 | 1714662501036 Marionette DEBUG 2 <- [1,7155,null,{"value":null}]
- Fission enabled (try build - 860ms)
[task 2024-05-02T15:31:55.189Z] 15:31:55 INFO - PID 996 | 1714663915187 Marionette DEBUG 2 -> [0,8376,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/html/canvas/element/path-objects/2d.path.roundrect.radius.intersecting.1.html"}]
[task 2024-05-02T15:31:55.195Z] 15:31:55 INFO - PID 996 | 1714663915194 Marionette TRACE [1462] Received event beforeunload for about:blank
[task 2024-05-02T15:31:55.269Z] 15:31:55 INFO - PID 996 | 1714663915267 Marionette TRACE Remoteness change detected. Set new top-level browsing context to 1463
[task 2024-05-02T15:31:55.347Z] 15:31:55 INFO - PID 996 | 1714663915346 Marionette TRACE [1463] Received event beforeunload for about:blank
[task 2024-05-02T15:31:55.373Z] 15:31:55 INFO - PID 996 | 1714663915372 Marionette TRACE [1463] Received event pagehide for about:blank
[task 2024-05-02T15:31:56.041Z] 15:31:56 INFO - PID 996 | 1714663916040 Marionette TRACE [1463] Received event DOMContentLoaded for http://web-platform.test:8000/html/canvas/element/path-objects/2d.path.roundrect.radius.intersecting.1.html
[task 2024-05-02T15:31:56.046Z] 15:31:56 INFO - PID 996 | 1714663916041 Marionette DEBUG 2 <- [1,8376,null,{"value":null}]
While the pagehide
event comes right after the DOMContentLoaded
event has this extreme delay. It seems to me very unlikely that this is related to the IPC that our current implementation uses to wait for this event from the content process. I might probably do another try build with the BCWebProgress:5
logging enabled.
Also I may try to actually get this patch landed so that we can fully stay in the parent process and not have to rely on IPC. Lets see if I can get the wpt failures fixed (if they are still present).
Assignee | ||
Comment 31•4 months ago
|
||
I pushed a try build to check the current state of wpt tests:
https://treeherder.mozilla.org/jobs?repo=try&revision=5d88d9f99cb9adb45342ff7c53fca426d5bf2f0c
Assignee | ||
Comment 32•4 months ago
|
||
Tests are still failing. After some investigation I noticed that we basically miss a OnStateChange
notification for the real document load when there is already a document with about:blank
loading. As such we do not detect the start of the navigation and as result do not stop when state is stop
.
I've found the following condition in the docShell code which seems to be the culprit:
https://searchfox.org/mozilla-central/rev/386c7f17b2421374930a447c9f424910551b659d/docshell/base/BrowsingContextWebProgress.cpp#221-227
Here we return early without sending the notification when the document is already loading. But it does not check for the actual URL that gets loaded. That means if we have a situation when the initial about:blank
is loaded and there is some kind of delay which bleeds into a navigation as triggered by Marionette, we are not getting a OnStateChange
notification for start
state.
I'll run a full build locally to check that and will also make the following log call non-debug:
https://searchfox.org/mozilla-central/rev/386c7f17b2421374930a447c9f424910551b659d/uriloader/base/nsDocLoader.cpp#1344-1351
Assignee | ||
Comment 33•4 months ago
|
||
Bug 1673059 shouldn't block us in getting these changes landed.
Assignee | ||
Comment 34•4 months ago
|
||
Indeed, we return early if there is already an active document load even through the URL is different. But also commenting out the mIsLoadingDocument
condition and not returning early doesn't cause the notification to be sent. Maybe the DocLoader:5
logs might be helpful here. But I have to do a full build on my old Linux machine to actually test this.
Beside waiting for the build I also tried to reproduce this locally on MacOS and I stumbled over bug 1895283 which might be related or actually has the same underlying reason. Instead of the expected target page the former navigation is continued, which means that we never actually reach the target page.
Assignee | ||
Comment 35•3 months ago
|
||
There is actually an interesting situation here: https://share.firefox.dev/3Vs1JjB
While the start
state is forwarded from the web content process to the parent process and the BCWebProgress also logs it at 78,065s the onStateChange
callback of our registered WebProgressListener isn't called. As such we do not recognize that the actual navigation started and miss to set the internal flag to appropriately handle the state change stop at 78,221s, which results waiting forever.
Given that the issue happens right after a remoteness change I wonder if the browsing context check in that following code causes an early return:
https://searchfox.org/mozilla-central/rev/dc08a7321a22918d5a26b7641f9b10cd2a09d98e/dom/ipc/BrowserParent.cpp#2913-2952
I'll have to rebuild locally on my older Linux machine which will take a bit.
Comment 36•2 months ago
|
||
I'm trying to understand the situation here and how it is different than the situation on desktop re: onStateChange. On matrix :whimboo mentioned that desktop also doesn't send the STATE_START
for the main load, only sending it for the initial about:blank
load, but it sounded like the listener is being installed earlier compared to on Android.
I was also curious whether we'd see the STATE_START
for the second load if the WPT code called stop()
before starting the test page's load to halt any existing about:blank
navigation.
Assignee | ||
Comment 37•2 months ago
|
||
Given that I'm off the next couple of days I cannot check if that call to window.stop()
would help before wptrunner loads the actual test page. For now I've rebased my patch (to fix merge conflicts) and also pushed to try:
https://treeherder.mozilla.org/jobs?repo=try&revision=76fe21c16d8633d13c986cb2c48b95dd10436f4a
I only triggered a couple of web-platform-tests chunks so I hope that the failure will be visible without the above proposal. Then these patches can be used to test the above scenario. Julian, I would appreciate if you could help out here. Thanks!
Comment 38•2 months ago
|
||
I am unsure if it'd help, but I'd be curious to know if the changes from https://hg.mozilla.org/try/rev/8492a2e199e4540a74256c6606c742990533981e will help this particular situation at all. My theory with this is that we're waiting too long to stop the previous navigation when a load is triggered from within the parent process, so we don't get the STATE_STOP
from the previous load leading to this issue.
Assignee | ||
Comment 39•2 months ago
|
||
Sure. I quickly applied your patch and send it to try as full build:
https://treeherder.mozilla.org/jobs?repo=try&revision=be048a2ac26fdace1a00f7701c795669c376a1ab
Assignee | ||
Comment 40•2 months ago
|
||
Oh, I accidentally triggered a try build via artifact again because I was using mach try again
. Here a new one that should really build Firefox with Nika's patch included:
https://treeherder.mozilla.org/jobs?repo=try&revision=1850d6f13a30a8e1b69c77c5b934566a464b5070
Comment 41•2 months ago
|
||
I don't see a clear change or improvement between the run with and without Nika's patch.
I pushed to try with the suggestion to call window.stop() before loading the test URL at https://treeherder.mozilla.org/jobs?repo=try&revision=e48a1c046c68cb2dc494af0194a86d36dc3d4b76
This is still in progress and we have a few interesting failures eg all early hints tests failing at https://treeherder.mozilla.org/jobs?repo=try&revision=e48a1c046c68cb2dc494af0194a86d36dc3d4b76&selectedTaskRun=axHKpc00SpWiVJw__mhxrw.0 , but otherwise I think this is much more green than what we had before.
Comment 42•2 months ago
|
||
I had a look at the failures for the desktop tests:
- /loading/early-hints/ tests fail because the navigation errors on NS_BINDING_ABORTED. This is very similar to Bug 1908515, we have the test script which redirects to another page before the load is completed. In terms of BiDi and according to the spec, it's fine to throw here actually. But maybe the navigation helper itself should not throw and should let the consumer decide what to do?
- /html/semantics/forms/the-input-element/show-picker-disabled-readonly.html -> timeouts because each click waits for 1600 ms for a potential navigation. I imagine the timeout is. Our webprogress-based navigation helper uses a timeout multiplier based on the platform, so while the default is 200ms, for a DEBUG platform we will wait 1600ms per click, which is probably too much.
- /html/semantics/interactive-elements/the-summary-element/interactive-content.html -> same timeout issue as above
- /html/cross-origin-embedder-policy/require-corp.https.html -> similar to the NS_BINDING_ABORTED issue, we have a navigation that fails here so the test stops.
Description
•