Open Bug 1664165 Opened 4 years ago Updated 21 days ago

Make use of a webprogress listener instead of page load events for tracking active navigations in Marionette

Categories

(Remote Protocol :: Marionette, task, P2)

Default
task
Points:
5

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 2 open bugs, Blocks 6 open bugs)

Details

(Whiteboard: [webdriver:backlog])

Attachments

(1 file, 2 obsolete files)

With bug 1612831 all the navigation commands will only operate on the parent process. That means no IPC communication will happen between the driver and the listener framescript. By doing that it retains the logic for waiting of the navigation to complete by using the page load events as sent by the framescript.

For a better handling and to finally get rid of the framescript at all, those checks need to be done in the parent process as best via the webprogress listener interface as available on each browsing context.

But before doing that we should get all the navigation tests updates first (see bug 1650132).

Fission Milestone: --- → M7
Whiteboard: [marionette-fission-mvp] → [marionette-fission-mvp][complex]
Blocks: 1665210
No longer blocks: 1665210
Blocks: 1669174
Blocks: 1669787
Blocks: 1671783
See Also: → 1673823

Given the complexity of this change I would suggest to try the minimal approach by using the events via a JSWindowActor implementation. I filed bug 1673823 for that.

No longer blocks: 1669174

With the framescript code gone we would only have to implement that for actors. So lets block on bug 1669172.

Depends on: 1669172

Also it would be good to have full support for bfcache (bug 1673059) before doing that change.

Depends on: 1673059

With all the recent work on the different Marionette actors the current way how navigations are handled works pretty fine, and doesn't cause any trouble with Fission. Moving over to a webprogress listener is complex, and can cause lots of side-effects / regressions, which do not outweigh any benefits from doing that now.

Whiteboard: [marionette-fission-mvp][complex]
No longer blocks: marionette-fission
Fission Milestone: M7 → ---
Summary: Move checks for active navigation from page load events to webprogress listener → Make use of a webprogress listener instead of page load events for tracking active navigations
Blocks: 1450876

Quick update. Using the webProgress via browsingContext.webProgress would be the way to go here. It automatically tracks browsing context replacements for ongoing loads, and won't need such a complicated logic as what we have right now. Here a simple example which returns when the page has been finished loading:

        await new Promise(resolve => {
          let wpl = {
            QueryInterface: ChromeUtils.generateQI([
              "nsIWebProgressListener",
              "nsISupportsWeakReference",
            ]),
            onStateChange(webProgress, request, stateFlags) {
              const { STATE_STOP } = Ci.nsIWebProgressListener;
              if ((stateFlags & STATE_STOP) == STATE_STOP) {
                browsingContext.webProgress.removeProgressListener(
                  wpl,
                  Ci.nsIWebProgress.NOTIFY_ALL
                );
                resolve();
              }
            },
          };
          browsingContext.webProgress.addProgressListener(
            wpl,
            Ci.nsIWebProgress.NOTIFY_ALL
          );
        });

When implementing this feature we should definitely use the MOZ_LOG=BCWebProgress:5 log module. It gives great information about any notification of web progress instances.

Blocks: 1730642
Points: --- → 13
Whiteboard: [bidi-m3-mvp]
Priority: P3 → P2
Assignee: nobody → jdescottes
Status: NEW → ASSIGNED

After discussing with Henrik, we will keep the current navigation tests as marionette unit tests, because they are much faster to run than wdspec tests. Removing the dependency on Bug 1650132.

The current marionette tests for navigation can be found at:

No longer depends on: 1650132
Blocks: 1650132

Summary of the progress so far.

I have added a preference to turn off the webprogress implementation for navigation: marionette.webprogress.enabled.
Even when this preference is enabled, we still use the legacy codepath for click attempts.
The implementation using webprogress listener is using a combination of onStateChange and onLocationChange.

Challenges/issues:

  1. PageLoadStrategy.Eager: onLocationChange is mandatory if we want to replicate the "eager" page load strategy. Otherwise for onStateChange, STATE_START is too early, STATE_STOP is too late. It doesn't seem like webprogresslistener is a good fit for this
  2. detecting about:*error redirections is not straightforward. If you attempt to navigate to "thisprotocoldoesnotexist://", you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checking status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED and then waits for the next DOMContentLoaded since in DevTools case this runs in the content process (searchfox). Here instead I defer to the next onLocationChange, which should be about:neterror. However, there are also seemingly valid requests which match status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED. I had the case for a navigation between an inline image and a regular image (searchfox). So for now I simply wait for some time but if no onLocationChange resolves the navigation, I consider it as completed.
  3. I have a lot of intermittent about:blank navigations, although that might be because I was missing the fix from Bug 1747359. Currently rebasing, we'll see if the situation improves.
  4. navigations in driver.js always operate on the top Browsing Context, but navigate.js will watch navigation on the current browsing context. When the current browsing context is an iframe, the behavior is really surprising IMO. The navigation completes because the Frame's browsing context is unloaded and we do listen for that, but I am not sure we are really monitoring the actual full navigation of the top BC. In any case here since we use the webprogresslistener, I think we should always use the top BC
  5. (edit) overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg inline("<script>window.{}('Hello');</script>".format(dialog_type)). Usually, this navigation should end when the common-dialog-loaded event is received, meaning the consumer can expect to interact with the dialog. However with the webprogresslistener, we sometimes get the STATE_STOP right before before the dialog was opened, and the consumer fails to find the dialog

I have a few try pushes in progress which are not too depressingly orange (eg https://treeherder.mozilla.org/jobs?repo=try&revision=ea910686f89a1a17ac891eca713b3d268ac41523), but there's still a lot of work here.

(In reply to Julian Descottes [:jdescottes] from comment #8)

  1. PageLoadStrategy.Eager: onLocationChange is mandatory if we want to replicate the "eager" page load strategy. Otherwise for onStateChange, STATE_START is too early, STATE_STOP is too late. It doesn't seem like webprogresslistener is a good fit for this

That is interesting. Is there no equivalent notification available as when DOMContentLoaded is fired? I assume the location change notification might not enough or is the DOM loaded when it fires?

  1. detecting about:*error redirections is not straightforward. If you attempt to navigate to "thisprotocoldoesnotexist://", you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checking status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED and then waits for the next DOMContentLoaded since in DevTools case this runs in the content process (searchfox). Here instead I defer to the next onLocationChange, which should be about:neterror. However, there are also seemingly valid requests which match status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED. I had the case for a navigation between an inline image and a regular image (searchfox). So for now I simply wait for some time but if no onLocationChange resolves the navigation, I consider it as completed.

Mind sharing a log with BCWebProgress:5 for such an example? Similar to my comment above I don't feel that this is actually stable enough. Maybe we should talk to some DOM folks?

  1. navigations in driver.js always operate on the top Browsing Context, but navigate.js will watch navigation on the current browsing context. When the current browsing context is an iframe, the behavior is really surprising IMO. The navigation completes because the Frame's browsing context is unloaded and we do listen for that, but I am not sure we are really monitoring the actual full navigation of the top BC. In any case here since we use the webprogresslistener, I think we should always use the top BC

Yes, per WebDriver spec a real navigation can only be triggered on the top-browsing context. This will be different for BiDi. But there are cases like click (navigation or even unload), forward and backward that trigger a navigation only in a sub frame. When using the WebProgress listener we have to keep all these tests working. See also test_click.py.

  1. (edit) overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg inline("<script>window.{}('Hello');</script>".format(dialog_type)). Usually, this navigation should end when the common-dialog-loaded event is received, meaning the consumer can expect to interact with the dialog. However with the webprogresslistener, we sometimes get the STATE_STOP right before before the dialog was opened, and the consumer fails to find the dialog

That's annoying and should not happen. That's a case we should clarify with the DOM folks.

I have a few try pushes in progress which are not too depressingly orange (eg https://treeherder.mozilla.org/jobs?repo=try&revision=ea910686f89a1a17ac891eca713b3d268ac41523), but there's still a lot of work here.

That's great to see! And thanks for the update.

Thanks for the reminder about BCWebProgress:5.

Adding more logs. All Trace logs prefixed with "@@@@@" are my debug traces. I log: all events captured by our MarionetteEventsChild actor and calls to my progress listener onLocationChange/onStateChange. For onStateChange, I filter out anything but STATE_STOP.

Below is eager evaluation while fetching a slow resource. Note that in an earlier attempt I tried to resolve on STATE_START from onStateChange, but it was too early as we will see here. In order to create this log I also let the webprogress listener run until STATE_STOP even though for this test case I would normally have resolved in onLocationChange. I did that to get a complete overview of the WebProgress updates.

### This is the navigate command
1642670020044 Marionette  DEBUG 3 -> [0,2,"WebDriver:Navigate",{"url":"http://127.0.0.1:50671/slow_resource.html"}]

### Now we are unloading 
1642670020046 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:beforeunload, uri:about:blank
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:0}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_START|STATE_IS_REQUEST|STATE_IS_DOCUMENT|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:about:blank}
1642670020049 RemoteAgent TRACE [42] Web progress state=start: about:blank
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, NS_ERROR_MALFORMED_URI, "127.0.0.1") on {top:1, id:2a, url:about:blank}
1642670020119 Marionette  TRACE Remoteness change detected. Set new top-level browsing context to 42
1642670020119 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pagehide, uri:about:blank
1642670020119 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:unload, uri:about:blank
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
1642670020192 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:beforeunload, uri:about:blank

### Moving on to STATE_START
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_START|STATE_IS_REQUEST|STATE_IS_DOCUMENT|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:about:blank}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, NS_NET_STATUS_RECEIVING_FROM, "127.0.0.1") on {top:1, id:2a, url:about:blank}

### Note that we are now unloading about:blank again, probably the initial one
1642670020194 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pagehide, uri:about:blank
1642670020194 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:unload, uri:about:blank

### The following OnProgressChange is mostl likely turning into onLocationChange
[Parent 12948: Main Thread]: I/BCWebProgress OnProgressChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, http://127.0.0.1:50671/slow_resource.html, 0) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670020196 RemoteAgent TRACE @@@@@ onLocationChange documentURI http://127.0.0.1:50671/slow_resource.html

### Now, 6ms after the onLocationChange triggered in the parent process, we get DOMContentLoaded 
### in the child actor, which is normally what we wait for in the current navigation logic. 
1642670020202 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:DOMContentLoaded, uri:http://127.0.0.1:50671/slow_resource.html

### I didn't add timestamps here, but the following On*Change are all triggered close to DOMContentLoaded
[Parent 12948: Main Thread]: I/BCWebProgress OnSecurityChange(<null>, <null>, 4) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1…") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnProgressChange(<null>, <null>, 0, 0, 511, 511) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Waiting for 127.0.0.1…") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.

### And the resource finally is loaded (and document state is complete)
[Parent 12948: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1…") on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670024226 Marionette  TRACE @@@@@ MarionetteEventsChild:PageLoadEvent type:pageshow, uri:http://127.0.0.1:50671/slow_resource.html
[Parent 12948: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:50671/slow_resource.html, originalURI:http://127.0.0.1:50671/slow_resource.html}, STATE_STOP|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2a, url:http://127.0.0.1:50671/slow_resource.html}
1642670024227 RemoteAgent TRACE [42] Web progress state=stop: http://127.0.0.1:50671/slow_resource.html
1642670024227 RemoteAgent TRACE @@@@@ onStateChange documentURI http://127.0.0.1:50671/slow_resource.html

From this, onLocationChange, seems slightly too early. But I'm not sure I see anything specific enough in any of the onProgressChange/onStatusChange to use them as clear markers of "DOMContentLoaded" has happened.

Quick overview of the other issues

detecting about:*error redirections is not straightforward. If you attempt to navigate to "thisprotocoldoesnotexist://", you will reach a STATE_STOP for this URL before you are redirected to about:neterror. DevTools detects such bad requests by checking status != Cr.NS_OK && status != Cr.NS_BINDING_ABORTED ...

For this topic we probably have to know which error status imply that a redirection to an error page is about to happen. For the "thisprotocoldoesnotexist://" use case, we get NS_ERROR_UNKNOWN_PROTOCOL, while for the image navigation we get NS_ERROR_PARSED_DATA_CACHED. So we probably need to also check against NS_ERROR_PARSED_DATA_CACHED before we consider the request as incomplete.

I have a lot of intermittent about:blank navigations, although that might be because I was missing the fix from Bug 1747359. Currently rebasing, we'll see if the situation improves.

This improved significantly after rebasing, so hopefully this was just linked to the missing patch from Bug 1747359

overall there seems to be an issue with navigations to pages with inline script tags which open a dialog, eg inline("<script>window.{}('Hello');</script>".format(dialog_type))

Allowing for a small delay after STATE_STOP seems to fix the intermittent issues. I am pretty sure that STATE_STOP is only received after the script has fully executed. But on some platforms it seems that we receive STATE_STOP after the script was executed, but before the alert is open and ready to be used.

Hi Olli!

We are trying to use WebProgressListener to monitor navigations in Marionette. Marionette supports several page load strategies and one of them is "Eager" where we are supposed to resolve as soon as the document is "interactive". This comes after WebProgressListener onStateChange reaches STATE_START, but before STATE_STOP. So far, I ended up using onLocationChange, which seems to be triggered around the same time as the DOMContentLoaded, but this might just be by accident. See comment #10 for more details.

Is WebProgressListener a good fit to monitor a navigation and detect when the document becomes "interactive"? Should we try to use the other callbacks (onStatusChange, onProgressChange)?

Flags: needinfo?(bugs)

For the issue with dialogs opened from inline script (inline("<script>window.{}('Hello');</script>".format(dialog_type))), my analysis was wrong. The test where this seems to frequently fail is

    session.url = inline("<script>window.{}('Hello');</script>".format(dialog_type))
    session.alert.accept()

    session.url = inline("<script>window.{}('Hello');</script>".format(dialog_type))
    response = send_alert_text(session, "Federer")
    assert_error(response, "element not interactable")

And the issue is with the second inline call. We seem to get the STATE_STOP from the previous navigation from time to time. I think I can workaround this by only accepting a STATE_STOP after a STATE_START.

(In reply to Julian Descottes [:jdescottes] from comment #13)

And the issue is with the second inline call. We seem to get the STATE_STOP from the previous navigation from time to time. I think I can workaround this by only accepting a STATE_STOP after a STATE_START.

Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):
https://searchfox.org/mozilla-central/rev/c1598f6d3edad19ccc53f53ab045d1d29835e1dd/remote/shared/Navigate.jsm#105-111

I have a relatively green try for the remote-protocol preset: https://treeherder.mozilla.org/jobs?repo=try&revision=8551d16d9859e4a5575169afc86f315b323e587d

The main failures remaining are from wpt tests, and can be reproduced locally. The issue seems related to the PageLoadStrategy eager. As hinted earlier, the onLocationChange callback is triggered slightly before the DOMContentLoaded event we use in the current implementation, and this leads to errors such as window.__wptrunner_process_next_event. Waiting slightly before resolving seems to fix the issue, but at this point we should wait for Olli's feedback.

Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):

Thanks, and yes I plan to reuse it in the final version! I am using it a bit at the moment, but since my patch is not very stable I prefer to keep most of the code independent.

See Also: → 1751856

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #14)

Interesting! This would also be a good addition for the existing ProgressListener class here (which you may already have updated/used):
https://searchfox.org/mozilla-central/rev/c1598f6d3edad19ccc53f53ab045d1d29835e1dd/remote/shared/Navigate.jsm#105-111

Note that for my patch, it's fine to expect a STATE_START first, because in theory the navigation has not started until I trigger the callback passed to waitForNavigationCompleted. For the initial load, we can't be sure that we are starting to listen before some navigation started. So we should be careful if trying to apply the same pattern elsewhere.

Technically DOMContentLoaded doesn't map exactly to anything in WebProgressListener, nor readyState changes.
This is basically because of https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/dom/base/Document.cpp#8074
At that point readyState is expected to be already interactive and see the callers why DOMContentLoaded may be delayed (ScriptLoader dealing with deferred scripts).

ReadyState gets value interactive when the page has been created, basically all of its DOM
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/dom/base/nsContentSink.cpp#790
https://searchfox.org/mozilla-central/rev/02bd3e02b72d431f2c600a7328697a521f87d9b6/parser/html/nsHtml5TreeOpExecutor.cpp#165,167,178

WebProgressListener doesn't really know about interactive state, since the state depends on HTML parser passing treeOps from the parser thread to the main thread and then creating the DOM.

Flags: needinfo?(bugs)

So if you must do something exactly when readyState changes to "interactive", a capturing event listener for readystatechange is possibly the only option. We could add something new, if event listeners are difficult to use in this case.

Thanks for the feedback! This means that if we want to use webprogresslistener, we still needto listen to readystatechange in the content process. I will try to adapt my current patch to handle this properly.

Comment on attachment 9264170 [details]
Bug 1664165 - [remote] Emit pageload internal events for existing WindowGlobalMessageHandlers

Revision D138900 was moved to bug 1756809. Setting attachment 9264170 [details] to obsolete.

Attachment #9264170 - Attachment is obsolete: true

Comment on attachment 9264279 [details]
Bug 1664165 - [remote] Add basic navigation helper to RootMessageHandler

Revision D138958 was moved to bug 1756809. Setting attachment 9264279 [details] to obsolete.

Attachment #9264279 - Attachment is obsolete: true

Unassigning for now this no longer blocks m3, we should now focus on bug 1756809 to get an initial navigation helper implemented.

Assignee: jdescottes → nobody
Status: ASSIGNED → NEW
Priority: P2 → P3
No longer blocks: 1730642
Summary: Make use of a webprogress listener instead of page load events for tracking active navigations → Make use of a webprogress listener instead of page load events for tracking active navigations in Marionette
Whiteboard: [bidi-m3-mvp] → [webdriver:backlog]
Product: Testing → Remote Protocol

We have made good experience with the WebProgress listener for tracking a navigation. As such we probably also want to let Marionette use the same logic so that we can get rid of listening / checking load events. As seen in some cases we might miss those especially when a remoteness change takes place.

I wonder if we should prioritize this work a bit higher to make certain navigation related APIs more stable in Marionette.

Whiteboard: [webdriver:backlog] → [webdriver:backlog][webdriver:triage]

Maybe we should not share with the BiDi implementation immediately because we need additional checks like modal dialogs for Marionette. As such lets use the WebProgressListener as next step to make navigation behavior more stable.

Points: 13 → 5
Priority: P3 → P2
Whiteboard: [webdriver:backlog][webdriver:triage] → [webdriver:backlog]
Blocks: 1833426
No longer blocks: 1760291

Given that amount of different failures during navigation with Marionette I put a stab on that and converted the code to make use of the webprogress listener from the shared navigation module. It works pretty fine locally. Here a full try build for verification:

https://treeherder.mozilla.org/jobs?repo=try&revision=6ad8cb259f8061599a74218fd3825d35bdea0b78

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

I forgot to actually post an update on December 29th. I was able to fix the listed Mn failures from the last try push, but saw that wpt tests are basically perma-failing now with this change:

https://treeherder.mozilla.org/jobs?repo=try&revision=29047464b77539131dfd95c6e6a2e3c2e915b94a

I've to take a look at this problem next. Here a new try build with Marionette trace logging enabled:
https://treeherder.mozilla.org/jobs?repo=try&revision=7166516525b3c8d3da40a943a626681363290adc

Depends on: 1878690
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: