Closed Bug 1895908 Opened 7 months ago Closed 6 months ago

First visual change and reported time to first byte 1 second slower on Fenix compared to Chrome, en.m.wikipedia.org, high latency network

Categories

(Core :: Performance, defect)

Tracking

RESOLVED FIXED
Performance Impact high
Tracking Status
firefox128 --- fixed

People

(Reporter: acreskey, Assigned: acreskey)

References

(Blocks 2 open bugs)

Details

Attachments

(5 files)

Fenix is 1 second slower than Chrome in first visual change and browsertime-reported time to first byte when loading https://en.m.wikipedia.org/wiki/Portal:Current_events on a high latency network (1000ms round trip time). This difference is reproducible.

Browsertime pageload comparison

I'm not convinced that there is actually a timing difference in the network stack, because the overall pageload times are equal. But I will continue to investigate.

However, while the SpeedIndex for Fenix actually comes out ahead, I believe that the overall user experience is much better in Chrome, as the bulk of the content is presented 1 second sooner (see video).

Attached video fenix_chrome.mp4

Fenix on the left, Chrome on the right.

Fenix profile, captured from a Pixel 3
https://share.firefox.dev/4bNDJOr
The 2 seconds in "http request and waiting for response" for https://en.m.wikipedia.org/wiki/Portal:Current_events seems to involve more round trips than I would have guessed.

Assignee: nobody → acreskey
Attached file trace-1.json.gz

Chrome trace, loading the same site with 1000 ms RTT.

I'm a novice at reading Chrome traces, but I do see the following under NetLog:

HTTP_STREAM_JOB_INIT_CONNECTION     2,189.964 ms
TCP_CONNECT                         1,044.227 ms
SSL_CONNECT                         1,052.544 ms (TLS 1.3)

And, elsewhere in the NetLog:

https://en.m.wikipedia.org/wiki/Portal:Current_events      5,469.241 ms

I'm seeing about 6.5s for us to make the GET request.
I wonder if we're using TLS 1.2 here?

Seeing TLS v1.3 when connecting to the session via remote debugging, which makes sense as the Firefox profile showed one round trip for TLS.

Attached a Chrome HAR file captured via chrome://inspect/#devices

I am seeing the ~5.5 seconds for the root resource GET request again in this one.

Attached file chrome_wikipedia.pcap

Attaching the Chrome Wireshark capture, via PCAPDroid. It's encrypted, but you can see key events.

Attached file fenix_wikipedia.pcap

And the Fenix Wireshark capture, via PCAPDroid, also encrypted.

I haven't yet aligned and compared them.

If I run the same comparison on desktop, Firefox vs. Chrome, I'm seeing Firefox come out faster on both networking and visual metrics:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1390937340

Profiles, sometimes showing >2 seconds in 'http request and waiting for response', sometimes less:
https://share.firefox.dev/4bf047D
https://share.firefox.dev/3QFmGG8

It's the HTTPS RR query that's causing the delay in time to first byte.

Comparing GeckoView Nightly with the default (network.dns.native_https_query:true) against network.dns.native_https_query:false:
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=1739726771

We block on the HTTPS RR request, so in a high-latency environment this adds to time to first byte, particularly if the OS had already cached the address records.
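
A minimal Kotlin sketch of that blocking shape (the resolver calls and types are hypothetical stand-ins, not Gecko/Fenix APIs; the delays simulate an OS-cached address answer plus one 1000 ms round trip for the HTTPS RR):

    import kotlinx.coroutines.*

    // Hypothetical stand-ins for the resolver calls; not real Gecko/Fenix APIs.
    data class HttpsRecord(val priority: Int, val target: String)
    data class ConnectInfo(val addresses: List<String>, val httpsRr: HttpsRecord?)

    suspend fun queryAddresses(host: String): List<String> {
        delay(5)                    // simulate an instant, OS-cached A/AAAA answer
        return listOf("203.0.113.7")
    }

    suspend fun queryHttpsRr(host: String): HttpsRecord? {
        delay(1000)                 // simulate one 1000 ms round trip on this network
        return HttpsRecord(1, host)
    }

    // Blocking shape: connecting is gated on *both* answers, so the cached
    // address answer cannot hide the HTTPS RR round trip.
    suspend fun resolveBlocking(host: String): ConnectInfo = coroutineScope {
        val addrs = async { queryAddresses(host) }
        val rr = async { queryHttpsRr(host) }
        ConnectInfo(addrs.await(), rr.await()) // first byte waits on the slower of the two
    }

    fun main() = runBlocking {
        val start = System.currentTimeMillis()
        resolveBlocking("en.m.wikipedia.org")
        println("resolved in ${System.currentTimeMillis() - start} ms") // ~1000 ms
    }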

See Also: → 1852752

Relatedly, I wonder how Chrome on Android can connect to sites via HTTP/3 on a new profile, cold load, without incurring this latency hit.

Just to note something that Valentin and I discussed at the All Hands:

When we're using DoH, we wait for the HTTPS RR because we need it for ECH, and if we race a non-ECH connection then we lose privacy. My understanding from Valentin is that with the new HTTPS RR support in the OS resolver we're still doing the same wait, although here the privacy benefit of the delay is much smaller, because our DNS request likely went out in plaintext anyway. So waiting only a short time for the HTTPS RR, in line with the new Happy Eyeballs proposal, might be reasonable.
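
A minimal Kotlin sketch of that short wait, reusing the hypothetical resolver helpers from the earlier sketch (the 50 ms grace period is illustrative, not a proposed value):

    // Bounded wait: give the HTTPS RR a small grace period, then connect with
    // whatever has arrived. An RR answer that lands later could still be used
    // for subsequent connections (not shown here).
    suspend fun resolveWithGracePeriod(host: String, graceMs: Long = 50): ConnectInfo =
        coroutineScope {
            val addrs = async { queryAddresses(host) }
            val rr = async { queryHttpsRr(host) }
            val rrOrNull = withTimeoutOrNull(graceMs) { rr.await() }
            if (rrOrNull == null) rr.cancel() // stop gating this connect on the RR
            ConnectInfo(addrs.await(), rrOrNull)
        }

This trades the ECH wait for latency, which matches the observation above that the privacy benefit of blocking is small once the DNS query has gone out in plaintext.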

It might also be worth a look at [1] and [2] if you haven't already.

[1] https://datatracker.ietf.org/doc/draft-pauly-v6ops-happy-eyeballs-v3/

[2] https://github.com/tfpauly/draft-happy-eyeballs-v3/issues/6

Thank you, Dennis. Let me catch up on the readings. A short wait (or race) seems good.

As it currently stands, this is a problem: with just 100 ms of additional latency, this gap makes us measurably slower than Chrome.
https://docs.google.com/spreadsheets/d/1HRGD1tz6vWmTtrttcPp8QcmP2Ha-GkTJ4clUH_EeBec/edit#gid=144887131

Happy Eyeballs for SVCB / HTTPS RR looks good to me from a first read (a rough sketch of the query pattern follows the quoted text):

   Additionally, if the client also wants to receive SVCB / HTTPS
   resource records (RRs) [SVCB], it SHOULD issue the SVCB query
   immediately before the AAAA and A queries (prioritizing the SVCB
   query since it can also include address hints).  If the client has
   only one of IPv4 or IPv6 connectivity, it still issues the SVCB query
   prior to whichever AAAA or A query is appropriate.  Note that upon
   receiving a SVCB answer, the client might need to issue further AAAA
   and/or A queries to resolve the service name included in the RR.

   Implementations SHOULD NOT wait for all answers to return before
   attempting connection establishment.  If one query fails to return or
   takes significantly longer to return, waiting for the other answers
   can significantly delay the connection establishment of the first
   one.  Therefore, the client SHOULD treat DNS resolution as
   asynchronous.  Note that if the platform does not offer an
   asynchronous DNS API, this behavior can be simulated by making
   separate synchronous queries, each on a different thread.
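
For concreteness, a rough Kotlin sketch of that ordering, building on the hypothetical helpers above: issue the SVCB/HTTPS query first, then AAAA and A, and connect on the first address answer rather than waiting for all three (queryAaaa and queryA are made-up stand-ins, not real APIs):

    import kotlinx.coroutines.selects.select

    // More hypothetical per-record-type lookups; delays mimic a 1000 ms RTT.
    suspend fun queryAaaa(host: String): List<String> { delay(1000); return listOf("2001:db8::7") }
    suspend fun queryA(host: String): List<String> { delay(1000); return listOf("203.0.113.7") }

    suspend fun resolveDraftOrder(host: String): List<String> = coroutineScope {
        val rr = async { queryHttpsRr(host) }   // issued first, per the draft
        val aaaa = async { queryAaaa(host) }
        val a = async { queryA(host) }
        // Don't wait for all answers: connect with whichever address family lands first.
        val first = select<List<String>> {
            aaaa.onAwait { it }
            a.onAwait { it }
        }
        // A real client would keep the remaining queries running (the RR answer can
        // still feed ECH/ALPN); they are cancelled here only so the sketch returns.
        rr.cancel(); aaaa.cancel(); a.cancel()
        first
    }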

I believe we can see the impact of waiting for the HTTPS RR in Fenix via Nightly telemetry (the feature was enabled March 8, 2024).

And I see a 10-20% regression in networking.http_channel_page_open_to_first_sent on Android.

Performance Impact: --- → high
Blocks: 1894804

We're going to hold HTTPS RR until we develop a non-blocking method of retrieving the records, at least with native DNS. See Bug 1897462.

Depends on: 1898191

Verified fixed with bug 1898191.
The improvement can be seen in local tests with extremely high latency (2000 ms RTT).

Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED