First visual change and reported time to first byte 1 second slower on Fenix compared to Chrome, en.m.wikipedia.org, high latency network
Categories: Core :: Performance, defect
People: Reporter: acreskey, Assigned: acreskey
References: Blocks 2 open bugs
Attachments: 5 files
Fenix is 1 second slower than Chrome in first visual change and browsertime-reported time to first byte when loading https://en.m.wikipedia.org/wiki/Portal:Current_events on a high latency network (1000ms round trip time). This difference is reproducible.
Browsertime pageload comparison
I'm not convinced that there is actually a timing difference in the network stack, because the overall pageload times are equal, but I will continue to investigate.
However, while the SpeedIndex for Fenix actually comes out ahead, I believe the overall user experience is much better in Chrome because the bulk of the content is presented 1 second sooner (see video).
Comment 1 • 7 months ago (Assignee)
Fenix on the left, Chrome on the right.
Comment 2 • 7 months ago (Assignee)
Fenix profile, captured from a Pixel 3
https://share.firefox.dev/4bNDJOr
The 2 seconds in "http request and waiting for response" for https://en.m.wikipedia.org/wiki/Portal:Current_events seems to involve more round trips than I would have guessed.
Comment 3 • 7 months ago (Assignee)
Chrome trace, loading the same site with 1000 ms RTT.
I'm a novice at reading Chrome traces, but I do see the following under NetLog:
HTTP_STREAM_JOB_INIT_CONNECTION 2,189.964 ms
TCP_CONNECT 1,044.227 ms
SSL_CONNECT 1,052.544 ms (TLS 1.3)
And, elsewhere in the NetLog:
https://en.m.wikipedia.org/wiki/Portal:Current_events 5,469.241 ms
I'm seeing about 6.5 s for us to make the GET request.
I wonder if we're using TLS 1.2 here?
Comment 4 • 7 months ago (Assignee)
I'm seeing TLS 1.3 when connecting to the session via remote debugging, which makes sense because the Firefox profile showed one round trip for TLS.
Comment 5 • 7 months ago (Assignee)
Attached a Chrome HAR file captured via chrome://inspect/#devices.
I am again seeing ~5.5 seconds for the root resource GET request in this one.
Comment 6 • 7 months ago (Assignee)
Attaching a Chrome Wireshark capture, taken via PCAPDroid. It is encrypted, but you can see key events.
Comment 7 • 7 months ago (Assignee)
And the Fenix Wireshark capture, taken via PCAPDroid, also encrypted.
I haven't yet aligned and compared them.
Comment 8 • 7 months ago (Assignee)
If I run the same comparison on desktop (Firefox vs Chrome), Firefox is faster on both networking and visual metrics:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1390937340
Profiles: sometimes more than 2 seconds in 'http request and waiting for response', sometimes less.
https://share.firefox.dev/4bf047D
https://share.firefox.dev/3QFmGG8
Comment 9 • 7 months ago (Assignee)
It's the HTTPS RR query that's causing the delay in the first byte.
Comparing GeckoView Nightly with network.dns.native_https_query:true (the default) against network.dns.native_https_query:false:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1739726771
We block on the HTTPS RR request, so in a high-latency environment this adds to time to first byte, particularly if the OS had already cached the DNS record.
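To make the arithmetic concrete, here is a minimal sketch of when the connection attempt can start under the two behaviours. It is a toy model, not Gecko code: the function name, the assumption that both queries are issued in parallel, and the assumption that an uncached HTTPS RR query costs about one round trip are all mine, and the numbers are illustrative.

```python
def connection_start_ms(addr_lookup_ms, https_rr_lookup_ms, wait_for_https_rr):
    """Return when the connection attempt can begin, in ms after the load starts.

    Hypothetical model: both DNS queries are issued at time 0 in parallel.
    """
    if wait_for_https_rr:
        # Blocking behaviour: wait for the slower of the two answers.
        return max(addr_lookup_ms, https_rr_lookup_ms)
    # Non-blocking behaviour: connect as soon as an address is known.
    return addr_lookup_ms


# Illustrative values only: address already cached by the OS (~0 ms) and an
# HTTPS RR query assumed to cost one 1000 ms round trip.
blocking = connection_start_ms(0, 1000, wait_for_https_rr=True)
non_blocking = connection_start_ms(0, 1000, wait_for_https_rr=False)
print(blocking - non_blocking)  # extra delay before the connection starts
```

Under these assumptions the blocking case starts one full round trip later, which is consistent with the roughly 1 second gap reported at 1000 ms RTT.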
Comment 10 • 7 months ago (Assignee)
Related, I wonder how Chrome on Android can connect to sites via HTTP/3 on a new profile, cold load, without incurring this latency hit.
Comment 11 • 7 months ago (Assignee)
With Chrome compared as well: https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1627531276
Comment 12 • 7 months ago
Just to note something that Valentin and I discussed at the all hands:
When we're using DoH, we wait for the HTTPS RR because we need it for ECH, and if we race a non-ECH connection then we lose privacy. My understanding from Valentin is that with the new HTTPS RR support in the OS resolver we're still doing the same wait, although here the privacy benefit of the delay is much smaller because our DNS request likely went out in plaintext anyway. So waiting only a short time for the HTTPS RR, in line with the new Happy Eyeballs proposal, might be reasonable.
It might also be worth a look at [1] and [2] if you haven't already.
[1] https://datatracker.ietf.org/doc/draft-pauly-v6ops-happy-eyeballs-v3/
[2] https://github.com/tfpauly/draft-happy-eyeballs-v3/issues/6
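The "wait only a short time" idea can be sketched as follows. This is a minimal asyncio illustration rather than anything from Gecko: `resolve_with_bounded_wait`, `fake_query`, and the timeout values are hypothetical, and the DNS queries are simulated with sleeps.

```python
import asyncio


async def fake_query(delay_s, answer):
    """Stand-in for a DNS query that takes delay_s seconds (hypothetical)."""
    await asyncio.sleep(delay_s)
    return answer


async def resolve_with_bounded_wait(addr_query, https_rr_query, max_extra_wait_s):
    """Wait for addresses, then give the HTTPS RR only a short extra window."""
    addr_task = asyncio.ensure_future(addr_query)
    rr_task = asyncio.ensure_future(https_rr_query)

    addresses = await addr_task
    try:
        # shield() keeps the HTTPS RR query running if the timeout fires,
        # so a late answer could still be used for later connections.
        https_rr = await asyncio.wait_for(asyncio.shield(rr_task), max_extra_wait_s)
    except asyncio.TimeoutError:
        https_rr = None  # proceed without the record
    return addresses, https_rr


async def demo():
    # Slow HTTPS RR: the caller proceeds once the short window expires.
    slow = await resolve_with_bounded_wait(
        fake_query(0.0, ["192.0.2.1"]), fake_query(0.5, "https-rr"), 0.05)
    # Fast HTTPS RR: the record arrives within the window and is returned.
    fast = await resolve_with_bounded_wait(
        fake_query(0.0, ["192.0.2.1"]), fake_query(0.01, "https-rr"), 0.2)
    return slow, fast
```

Running `asyncio.run(demo())` exercises both paths. The trade-off is the one described in the comment above: when the window expires, that connection proceeds without the record and so cannot use ECH.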
Comment 13 • 7 months ago (Assignee)
Thank you Dennis. Let me catch up on the readings. A short wait (or race) seems good.
As it currently stands, this is a problem because with just 100ms of additional latency this gap makes us measurably slower than Chrome.
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=144887131
Comment 14 • 7 months ago (Assignee)
The Happy Eyeballs text on SVCB / HTTPS RR looks good to me from a first read:
Additionally, if the client also wants to receive SVCB / HTTPS
resource records (RRs) [SVCB], it SHOULD issue the SVCB query
immediately before the AAAA and A queries (prioritizing the SVCB
query since it can also include address hints). If the client has
only one of IPv4 or IPv6 connectivity, it still issues the SVCB query
prior to whichever AAAA or A query is appropriate. Note that upon
receiving a SVCB answer, the client might need to issue futher AAAA
and/or A queries to resolve the service name included in the RR.
Implementations SHOULD NOT wait for all answers to return before
attempting connection establishment. If one query fails to return or
takes significantly longer to return, waiting for the other answers
can significantly delay the connection establishment of the first
one. Therefore, the client SHOULD treat DNS resolution as
asynchronous. Note that if the platform does not offer an
asynchronous DNS API, this behavior can be simulated by making
separate synchronous queries, each on a different thread.
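The last sentence of the quoted text can be sketched with the Python standard library. This is an illustration under assumptions, not how Gecko resolves names: `blocking_lookup` and `answers_as_they_arrive` are hypothetical names, and only A and AAAA lookups are shown because the standard library has no HTTPS RR query. That query would simply be one more blocking lookup on its own thread.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def blocking_lookup(host, family):
    """Synchronous lookup for one address family; returns address strings."""
    infos = socket.getaddrinfo(host, 443, family=family, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def answers_as_they_arrive(host, lookup=blocking_lookup,
                           families=(socket.AF_INET6, socket.AF_INET)):
    """Run one blocking query per thread and yield each answer when ready.

    The caller can start connecting on the first answer instead of
    waiting for every query to return.
    """
    pool = ThreadPoolExecutor(max_workers=len(families))
    futures = {pool.submit(lookup, host, family): family for family in families}
    try:
        for future in as_completed(futures):
            family = futures[future]
            try:
                yield family, future.result()
            except OSError:
                # A failed query must not hold up the others.
                yield family, []
    finally:
        # Do not block the caller on queries that are still outstanding.
        pool.shutdown(wait=False)
```

A caller would iterate, for example `for family, addrs in answers_as_they_arrive("en.m.wikipedia.org"): ...`, and begin connection establishment as soon as the first non-empty answer is yielded.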
Comment 15 • 6 months ago (Assignee)
I believe we can see the impact of waiting for HTTPS RR in Fenix via nightly telemetry.
It was enabled on March 8, 2024, and I see a 10-20% regression in networking.http_channel_page_open_to_first_sent on Android.
Comment 16 • 6 months ago (Assignee)
We're going to hold HTTPS RR until we develop a non-blocking method of retrieving the records, at least with native DNS. See Bug 1897462.
Comment 17 • 6 months ago (Assignee)
Verified fixed with bug 1898191.
The fix can be seen in local tests with extremely high latency (2000 ms RTT).