Closed Bug 1895908 Opened 7 months ago Closed 6 months ago

First visual change and reported time to first byte 1 second slower on Fenix compared to Chrome, en.m.wikipedia.org, high latency network

Categories

(Core :: Performance, defect)

Tracking

RESOLVED FIXED
Performance Impact high
Tracking Status
firefox128 --- fixed

People

(Reporter: acreskey, Assigned: acreskey)

References

(Blocks 2 open bugs)

Details

Attachments

(5 files)

Fenix is 1 second slower than Chrome in first visual change and browsertime-reported time to first byte when loading https://en.m.wikipedia.org/wiki/Portal:Current_events on a high latency network (1000ms round trip time). This difference is reproducible.

Browsertime pageload comparison

I'm not convinced that there is actually a timing difference in the network stack, because the overall pageload times are equal. But I will continue to investigate.

However, while the SpeedIndex for Fenix actually comes out ahead, I believe that the overall user experience is much better in Chrome, as the bulk of the content is presented 1 second sooner (see video).

Attached video fenix_chrome.mp4

Fenix on the left, Chrome on the right.

Fenix profile, captured from a Pixel 3
https://share.firefox.dev/4bNDJOr
The 2 seconds in "http request and waiting for response" for https://en.m.wikipedia.org/wiki/Portal:Current_events seems to involve more round trips than I would have guessed.

Assignee: nobody → acreskey
Attached file trace-1.json.gz

Chrome trace, loading the same site with 1000 ms RTT.

I'm a novice at reading Chrome traces, but I do see the following under NetLog:

HTTP_STREAM_JOB_INIT_CONNECTION     2,189.964 ms
TCP_CONNECT                         1,044.227 ms
SSL_CONNECT                         1,052.544 ms (TLS 1.3)

And, elsewhere in the NetLog:

https://en.m.wikipedia.org/wiki/Portal:Current_events      5,469.241 ms

I'm seeing about 6.5s for us to make the GET request.
I wonder if we're using TLS 1.2 here?

Seeing TLS v1.3 when connecting to the session via remote debugging, which makes sense as the Firefox profile showed one round trip for TLS.

Attached a Chrome HAR file captured via chrome://inspect/#devices

I am seeing the ~5.5 seconds for the root resource GET request again in this one.

Attached file chrome_wikipedia.pcap

Attaching the Chrome Wireshark capture, via PCAPDroid. It's encrypted, but you can see key events.

Attached file fenix_wikipedia.pcap

And the Fenix Wireshark capture, via PCAPDroid, also encrypted.

I haven't yet aligned and compared them.

If I run the same comparison on desktop, Firefox vs. Chrome, I'm seeing Firefox come out faster on both networking and visual metrics:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1390937340

Profiles, sometimes showing >2 seconds in 'http request and waiting for response', sometimes less:
https://share.firefox.dev/4bf047D
https://share.firefox.dev/3QFmGG8

It's the HTTPS RR query that's causing the delay in time to first byte.

Comparing GeckoView Nightly with the default (network.dns.native_https_query:true) against network.dns.native_https_query:false:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1739726771

We block on the HTTPS RR request, so in a high-latency environment this adds to time to first byte, particularly if the OS had already cached the address records.
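
A minimal Kotlin sketch of that blocking shape (the resolver calls and types are hypothetical stand-ins, not Gecko/Fenix APIs; the delays simulate an OS-cached address answer plus one 1000 ms round trip for the HTTPS RR):

    import kotlinx.coroutines.*

    // Hypothetical stand-ins for the resolver calls; not real Gecko/Fenix APIs.
    data class HttpsRecord(val priority: Int, val target: String)
    data class ConnectInfo(val addresses: List<String>, val httpsRr: HttpsRecord?)

    suspend fun queryAddresses(host: String): List<String> {
        delay(5)                    // simulate an instant, OS-cached A/AAAA answer
        return listOf("203.0.113.7")
    }

    suspend fun queryHttpsRr(host: String): HttpsRecord? {
        delay(1000)                 // simulate one 1000 ms round trip on this network
        return HttpsRecord(1, host)
    }

    // Blocking shape: connecting is gated on *both* answers, so the cached
    // address answer cannot hide the HTTPS RR round trip.
    suspend fun resolveBlocking(host: String): ConnectInfo = coroutineScope {
        val addrs = async { queryAddresses(host) }
        val rr = async { queryHttpsRr(host) }
        ConnectInfo(addrs.await(), rr.await()) // first byte waits on the slower of the two
    }

    fun main() = runBlocking {
        val start = System.currentTimeMillis()
        resolveBlocking("en.m.wikipedia.org")
        println("resolved in ${System.currentTimeMillis() - start} ms") // ~1000 ms
    }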

See Also: → 1852752

Relatedly, I wonder how Chrome on Android can connect to sites via HTTP/3 on a new profile, cold load, without incurring this latency hit.

Just to note something that Valentin and I discussed at the All Hands:

When we're using DoH, we wait for the HTTPS RR because we need it for ECH, and if we race a non-ECH connection then we lose privacy. My understanding from Valentin is that with the new HTTPS RR support in the OS resolver we're still doing the same wait, although here the privacy benefit of the delay is much smaller, because our DNS request likely went out in plaintext anyway. So waiting only a short time for the HTTPS RR, in line with the new Happy Eyeballs proposal, might be reasonable.
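
A minimal Kotlin sketch of that short wait, reusing the hypothetical resolver helpers from the earlier sketch (the 50 ms grace period is illustrative, not a proposed value):

    // Bounded wait: give the HTTPS RR a small grace period, then connect with
    // whatever has arrived. An RR answer that lands later could still be used
    // for subsequent connections (not shown here).
    suspend fun resolveWithGracePeriod(host: String, graceMs: Long = 50): ConnectInfo =
        coroutineScope {
            val addrs = async { queryAddresses(host) }
            val rr = async { queryHttpsRr(host) }
            val rrOrNull = withTimeoutOrNull(graceMs) { rr.await() }
            if (rrOrNull == null) rr.cancel() // stop gating this connect on the RR
            ConnectInfo(addrs.await(), rrOrNull)
        }

This trades the ECH wait for latency, which matches the observation above that the privacy benefit of blocking is small once the DNS query has gone out in plaintext.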

It might also be worth a look at [1] and [2] if you haven't already.

[1] https://datatracker.ietf.org/doc/draft-pauly-v6ops-happy-eyeballs-v3/

[2] https://github.com/tfpauly/draft-happy-eyeballs-v3/issues/6

Thank you, Dennis. Let me catch up on the readings. A short wait (or race) seems good.

As it currently stands, this is a problem: with just 100 ms of additional latency, this gap makes us measurably slower than Chrome.
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=144887131

Happy Eyeballs for SVCB / HTTPS RR looks good to me from a first read (a rough sketch of the query pattern follows the quoted text):

   Additionally, if the client also wants to receive SVCB / HTTPS
   resource records (RRs) [SVCB], it SHOULD issue the SVCB query
   immediately before the AAAA and A queries (prioritizing the SVCB
   query since it can also include address hints).  If the client has
   only one of IPv4 or IPv6 connectivity, it still issues the SVCB query
   prior to whichever AAAA or A query is appropriate.  Note that upon
   receiving a SVCB answer, the client might need to issue further AAAA
   and/or A queries to resolve the service name included in the RR.

   Implementations SHOULD NOT wait for all answers to return before
   attempting connection establishment.  If one query fails to return or
   takes significantly longer to return, waiting for the other answers
   can significantly delay the connection establishment of the first
   one.  Therefore, the client SHOULD treat DNS resolution as
   asynchronous.  Note that if the platform does not offer an
   asynchronous DNS API, this behavior can be simulated by making
   separate synchronous queries, each on a different thread.
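
For concreteness, a rough Kotlin sketch of that ordering, building on the hypothetical helpers above: issue the SVCB/HTTPS query first, then AAAA and A, and connect on the first address answer rather than waiting for all three (queryAaaa and queryA are made-up stand-ins, not real APIs):

    import kotlinx.coroutines.selects.select

    // More hypothetical per-record-type lookups; delays mimic a 1000 ms RTT.
    suspend fun queryAaaa(host: String): List<String> { delay(1000); return listOf("2001:db8::7") }
    suspend fun queryA(host: String): List<String> { delay(1000); return listOf("203.0.113.7") }

    suspend fun resolveDraftOrder(host: String): List<String> = coroutineScope {
        val rr = async { queryHttpsRr(host) }   // issued first, per the draft
        val aaaa = async { queryAaaa(host) }
        val a = async { queryA(host) }
        // Don't wait for all answers: connect with whichever address family lands first.
        val first = select<List<String>> {
            aaaa.onAwait { it }
            a.onAwait { it }
        }
        // A real client would keep the remaining queries running (the RR answer can
        // still feed ECH/ALPN); they are cancelled here only so the sketch returns.
        rr.cancel(); aaaa.cancel(); a.cancel()
        first
    }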

I believe we can see the impact of waiting for the HTTPS RR in Fenix via Nightly telemetry (the feature was enabled March 8, 2024).

And I see a 10-20% regression in networking.http_channel_page_open_to_first_sent on Android.

Performance Impact: --- → high
Blocks: 1894804

We're going to hold HTTPS RR until we develop a non-blocking method of retrieving the records, at least with native DNS. See Bug 1897462.

Depends on: 1898191

Verified fixed with bug 1898191.
The improvement can be seen in local tests with extremely high latency (2000 ms RTT).

Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED