Closed Bug 1599257 Opened 5 years ago Closed 4 years ago

"Page.navigate" has to block until a response received

Categories

(Remote Protocol :: CDP, defect, P1)

defect

Tracking

(firefox77 fixed)

RESOLVED FIXED
Firefox 77
Tracking Status
firefox77 --- fixed

People

(Reporter: whimboo, Assigned: impossibus)

References

Details

(Whiteboard: [puppeteer-beta-mvp])

Attachments

(5 files)

When checking the logs from Chrome I can see that the Page.navigate command blocks until the Network.responseReceived event has been emitted:

  puppeteer:protocol SEND ► {"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719","method":"Page.navigate","params":{"url":"http://localhost/~henrik/page.html","frameId":"EF117F4222FDCCDD62044A9F9A001C59"},"id":17} +3ms

  puppeteer:protocol ◀ RECV {"method":"Network.requestWillBeSent","params":{"requestId":"D667AFB7CA9718CABDC8B52822C26DE4","loaderId":"D667AFB7CA9718CABDC8B52822C26DE4","documentURL":"http://localhost/~henrik/page.html","request":{"url":"http://localhost/~henrik/page.html","method":"GET","headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36","Sec-Fetch-User":"?1"},"mixedContentType":"none","initialPriority":"VeryHigh","referrerPolicy":"no-referrer-when-downgrade"},"timestamp":45457.827741,"wallTime":1574717750.218332,"initiator":{"type":"other"},"type":"Document","frameId":"EF117F4222FDCCDD62044A9F9A001C59","hasUserGesture":false},"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719"} +6ms

  puppeteer:protocol ◀ RECV {"method":"Network.requestWillBeSentExtraInfo","params":{"requestId":"D667AFB7CA9718CABDC8B52822C26DE4","blockedCookies":[],"headers":{"Host":"localhost","Connection":"keep-alive","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36","Sec-Fetch-User":"?1","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3","Sec-Fetch-Site":"none","Sec-Fetch-Mode":"navigate","Accept-Encoding":"gzip, deflate, br","Accept-Language":"en-GB,en-US;q=0.9,en;q=0.8"}},"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719"} +3ms

  puppeteer:protocol ◀ RECV {"method":"Network.responseReceivedExtraInfo","params":{"requestId":"D667AFB7CA9718CABDC8B52822C26DE4","blockedCookies":[],"headers":{"Date":"Mon, 25 Nov 2019 21:35:50 GMT","Server":"Apache/2.4.34 (Unix) PHP/7.1.23","Last-Modified":"Thu, 21 Nov 2019 21:48:38 GMT","ETag":"\"53-597e242857980\"","Accept-Ranges":"bytes","Content-Length":"83","Keep-Alive":"timeout=5, max=100","Connection":"Keep-Alive","Content-Type":"text/html"},"headersText":"HTTP/1.1 200 OK\r\nDate: Mon, 25 Nov 2019 21:35:50 GMT\r\nServer: Apache/2.4.34 (Unix) PHP/7.1.23\r\nLast-Modified: Thu, 21 Nov 2019 21:48:38 GMT\r\nETag: \"53-597e242857980\"\r\nAccept-Ranges: bytes\r\nContent-Length: 83\r\nKeep-Alive: timeout=5, max=100\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\n\r\n"},"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719"} +2ms

  puppeteer:protocol ◀ RECV {"method":"Network.responseReceived","params":{"requestId":"D667AFB7CA9718CABDC8B52822C26DE4","loaderId":"D667AFB7CA9718CABDC8B52822C26DE4","timestamp":45457.834269,"type":"Document","response":{"url":"http://localhost/~henrik/page.html","status":200,"statusText":"OK","headers":{"Date":"Mon, 25 Nov 2019 21:35:50 GMT","Server":"Apache/2.4.34 (Unix) PHP/7.1.23","Last-Modified":"Thu, 21 Nov 2019 21:48:38 GMT","ETag":"\"53-597e242857980\"","Accept-Ranges":"bytes","Content-Length":"83","Keep-Alive":"timeout=5, max=100","Connection":"Keep-Alive","Content-Type":"text/html"},"headersText":"HTTP/1.1 200 OK\r\nDate: Mon, 25 Nov 2019 21:35:50 GMT\r\nServer: Apache/2.4.34 (Unix) PHP/7.1.23\r\nLast-Modified: Thu, 21 Nov 2019 21:48:38 GMT\r\nETag: \"53-597e242857980\"\r\nAccept-Ranges: bytes\r\nContent-Length: 83\r\nKeep-Alive: timeout=5, max=100\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\n\r\n","mimeType":"text/html","requestHeaders":{"Host":"localhost","Connection":"keep-alive","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36","Sec-Fetch-User":"?1","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3","Sec-Fetch-Site":"none","Sec-Fetch-Mode":"navigate","Accept-Encoding":"gzip, deflate, br","Accept-Language":"en-GB,en-US;q=0.9,en;q=0.8"},"requestHeadersText":"GET /~henrik/page.html HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36\r\nSec-Fetch-User: ?1\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-GB,en-US;q=0.9,en;q=0.8\r\n","connectionReused":false,"connectionId":41,"remoteIPAddress":"[::1]","remotePort":80,"fromDiskCache":false,"fromServiceWorker":false,"fromPrefetchCache":false,"encodedDataLength":292,"timing":{"requestTime":45457.829565,"proxyStart":-1,"proxyEnd":-1,"dnsStart":0.814,"dnsEnd":0.822,"connectStart":0.822,"connectEnd":0.98,"sslStart":-1,"sslEnd":-1,"workerStart":-1,"workerReady":-1,"sendStart":1.016,"sendEnd":1.077,"pushStart":0,"pushEnd":0,"receiveHeadersEnd":3.332},"protocol":"http/1.1","securityState":"secure"},"frameId":"EF117F4222FDCCDD62044A9F9A001C59"},"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719"} +2ms

  puppeteer:protocol ◀ RECV {"id":17,"result":{"frameId":"EF117F4222FDCCDD62044A9F9A001C59","loaderId":"D667AFB7CA9718CABDC8B52822C26DE4"},"sessionId":"1717CA6E17AF1412C5304DFEB1CDF719"} +1ms

Right now our implementation returns immediately:

  puppeteer:protocol SEND ► {"sessionId":1,"method":"Page.navigate","params":{"url":"http://localhost/~henrik/page.html","frameId":23},"id":17} +1s

  puppeteer:protocol ◀ RECV {"sessionId":1,"id":17,"result":{"frameId":23}} +21ms
  
  puppeteer:protocol ◀ RECV {"sessionId":1,"method":"Network.requestWillBeSent","params":{"requestId":"100837242175489","loaderId":"100837242175489","documentURL":"about:blank","request":{"url":"http://localhost/~henrik/page.html","method":"GET","headers":[],"hasPostData":false,"isLinkPreload":false},"type":"Document","frameId":23}} +7ms

By doing that we would be able to also forward the loaderId, and have a proper errorText if a request fails. Both is currently missing.

Blocks: 1599260
Blocks: 1603776
Priority: P3 → P2
Whiteboard: [puppeteer-beta-mvp]

Add priority tag because early return from Page.navigate introduces races in the life cycle of execution contexts, and therefore impacts many other commands.

Whiteboard: [puppeteer-beta-mvp] → [puppeteer-beta-mvp][puppeteer-high-priority]
Blocks: 1618863
Assignee: nobody → mjzffr
Blocks: 1612174
Status: NEW → ASSIGNED
Priority: P2 → P1

Seems like webNavigation could be useful here and for missing page life cycle events in general.

But that's only for browser extensions, right?

I think we really have to get the Network.responseReceived event working (see comment 0) first, and let Page.navigate wait for it.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #3)

But that's only for browser extensions, right?

Yeah, nevermind. Plus, I mixed it up with nsIWebNavigation (which is very different, and not useful in this case).

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #4)

I think we really have to get the Network.responseReceived event working (see comment 0) first, and let Page.navigate wait for it.

Perhaps. I'm not sure it's a good idea to mix domains that way, similar concern as in Bug 1603223, but it can be a short-term solution.

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #5)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #4)

I think we really have to get the Network.responseReceived event working (see comment 0) first, and let Page.navigate wait for it.

Perhaps. I'm not sure it's a good idea to mix domains that way, similar concern as in Bug 1603223, but it can be a short-term solution.

Network.responseReceived should emit an internal event, which the navigation code could subscribe for, and unsubscribe right after. That way I don't see that we would have to mix domain code.

Judging by how Page.navigate is actually used in clients like Puppeteer and Playwright, I think all this method is expected to do is start the navigation and return an error, if any. In practice, clients call Page.navigate and then wait for other page life cycle events to draw conclusions about the result of the navigation. Taking Puppeteer as an example (but the
same approach is taken in Playwright), their Page.goto ultimately calls into a method that sends Page.navigate and starts a life cycle watcher.

So, I will write a solution that largely leaves our Page.navigate as is and instead make sure we emit other events as expected by clients.

(As an aside, I'd say this method is a good example of where we might diverge in the future in Remote Protocol versus CDP. The CDP semantics are perhaps too low-level here and clients have to write boiler-plate to handle basic navigation correctly. Instead we could handle those details in the Remote Protocol implementation.)

Henrik, before I change blockers and dependencies here, I'd like your input since you had concluded a few months ago that this is causing problems with screenshots and other features. I think our implementations of screenshots etc will have to account for life cycle events rather than relying solely on Page.navigate to wait long enough.

Flags: needinfo?(hskupin)

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #7)

Judging by how Page.navigate is actually used in clients like Puppeteer and Playwright, I think all this method is expected to do is start the navigation and return an error, if any.

And that includes handling the response HTTP status code of the initial request. Otherwise the method won't be able to return error details like this one:

>>> var a = Page.navigate({ url: "https://untrusted-root.badssl.com/" });
>>> a
Promise {
  {
    frameId: 'D501758CF3A9566E820C8BA6B0113AE3',
    loaderId: '034C2344514DB59666E1E1FAAF8ED8F7',
    errorText: 'net::ERR_CERT_INVALID'
  },
  inspect: [AsyncFunction]
}

That given we have to await the Network.responseReceived event and handle its data. See other examples in Puppeteer's navigation.spec.js unit tests. We won't get those details when firing off the page load, and returning immediately as what we do at the moment.

In practice, clients call Page.navigate and then wait for other page life cycle events to draw conclusions about the result of the navigation.

Correct. That is for all the events as fired during a page load like DOMContentLoaded, and load. So that part don't need to be covered here.

Taking Puppeteer as an example (but the
same approach is taken in Playwright), their Page.goto ultimately calls into a method that sends Page.navigate and starts a life cycle watcher.

So, I will write a solution that largely leaves our Page.navigate as is and instead make sure we emit other events as expected by clients.

That is something different and not covered by this bug.

Henrik, before I change blockers and dependencies here, I'd like your input since you had concluded a few months ago that this is causing problems with screenshots and other features. I think our implementations of screenshots etc will have to account for life cycle events rather than relying solely on Page.navigate to wait long enough.

It didn't cause problems, but the examples weren't only waiting for load but networkidle0, which we haven't implemented.

Flags: needinfo?(hskupin)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #8)

And that includes handling the response HTTP status code of the initial request. Otherwise the method won't be able to return error details like this one:

>>> var a = Page.navigate({ url: "https://untrusted-root.badssl.com/" });
>>> a
Promise {
  {
    frameId: 'D501758CF3A9566E820C8BA6B0113AE3',
    loaderId: '034C2344514DB59666E1E1FAAF8ED8F7',
    errorText: 'net::ERR_CERT_INVALID'
  },
  inspect: [AsyncFunction]
}

That given we have to await the Network.responseReceived event and handle its data. See other examples in Puppeteer's navigation.spec.js unit tests. We won't get those details when firing off the page load, and returning immediately as what we do at the moment.

Fair point, thanks for the example. I'll try to use NetworkObserver or something along those lines to get the missing data. The Page domain can't depend on notifications from the Network domain (even an internal event, as you suggest) since those notifications aren't necessarily enabled.

Internal events are always send compared to the public ones. See the difference between request, and Network.requestWillBeSent.

This facilitates monitoring the navigation response.

The remaining work for moving navigation-related code into
the parent process should be done in Bug 1612538.

NetworkObserver will be used by both Network and Page domains.

Depends on D71654

Summary: "Page.navigate" has to block until "Network.responseReceived" is received → "Page.navigate" has to block until a response received
Pushed by mjzffr@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e5e712f9a861
[remote] Move "Page.navigate" to parent process r=remote-protocol-reviewers,whimboo
https://hg.mozilla.org/integration/autoland/rev/82c49d62a2dc
[remote] Create one NetworkObserver per Session r=remote-protocol-reviewers,whimboo
https://hg.mozilla.org/integration/autoland/rev/1fdf3bef9319
[remote] Remove dead code r=remote-protocol-reviewers,whimboo
https://hg.mozilla.org/integration/autoland/rev/9ecdb6c3e171
[remote] Block Page.navigate until navigation request is done r=remote-protocol-reviewers,whimboo
https://hg.mozilla.org/integration/autoland/rev/a1152a9c5dc5
[puppeteer] Update tests r=remote-protocol-reviewers,whimboo
Regressions: 1633042
Whiteboard: [puppeteer-beta-mvp][puppeteer-high-priority] → [puppeteer-beta-mvp]
Blocks: 1603984

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #12)

Created attachment 9141861 [details]
Bug 1599257 - [remote] Create one NetworkObserver per Session

NetworkObserver will be used by both Network and Page domains.

Please note that this was not enough yet given that TabSession sub-classes Session, and as such we register a new NetworkObserver for each single tab, and destroy it when the tab gets closed:

 0:08.15 GECKO(5071) 1588666265692	RemoteAgent	TRACE	<-(connection {a83ed2df-3857-964b-a46f-ee8a5065a4a2}) {"method":"Target.targetCreated","params":{"targetInfo":{"browserContextId":null,"targetId":33,"type":"page","url":"about:blank"}}}
 0:08.21 GECKO(5071) 1588666265756	RemoteAgent	TRACE	Registering observers for NetworkObserver
 0:08.22 GECKO(5071) 1588666265763	RemoteAgent	TRACE	<-(connection {a83ed2df-3857-964b-a46f-ee8a5065a4a2}) {"id":2}
 0:08.23 GECKO(5071) 1588666265770	RemoteAgent	TRACE	<-(connection {a83ed2df-3857-964b-a46f-ee8a5065a4a2}) {"method":"Target.targetDestroyed","params":{"targetId":33}}
 0:08.25 GECKO(5071) 1588666265794	RemoteAgent	TRACE	Unregistered observers for NetworkObserver

With my refactoring on bug 1593226 I will work towards making it a real singleton.

Component: CDP: Page → CDP
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: