Closed Bug 1578227 Opened 5 years ago Closed 2 years ago

Slow loading with a HTTP/2 proxy

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Performance Impact medium
Tracking Status
firefox70 --- fix-optional
firefox71 --- affected

People

(Reporter: armenzg, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf:pageload, Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(4 files)

Loading it on Chrome is almost instant. Sometime Nightly will load it fast, however, when it doesn't it can take forever.

The URL is this:
https://arewefastyet.com/win10ref2017/overview?numDays=365

My latest Nightly load took almost 2 minutes.

Here's a profiling:
https://profiler.firefox.com/public/b8cebdde563639c4e424d272f622bea89a429b2d/calltree/?globalTrackOrder=11-12-0-1-2-3-4-5-6-7-8-9-10&hiddenGlobalTracks=1-2-3-4-5-6-7-8-9&localTrackOrderByPid=45327-1-2-0~45328-0~45334-0~45335-0~45333-0~45330-0~45689-0~45688-0~45691-0~45692-0~45331-0-1~&thread=11&v=4

It seems that nothing gets painted until all 400+ requests complete.

The issue does not happen on Fx dev edition.

I don't know why but I can't reproduce it anymore.
Feel free to close this if you like.

Whiteboard: [qf]
Attached image Network pane screenshot

This is happening again today.

This is more or less what the code does.

Promise.all(urls.map(url => (
   // 1st block
   const response = await fetch(url);
   // 2nd block
  // Process response and plot graphs
)))

The default landing page makes 264 requests. I see each request loading on the network pane slowly. Once all of the requests (1st block) are loaded the processing of the received data and plotting of the graphs happen (2nd block).

On Dev Edition, you will notice that graphs start plotting as the data becomes available. On Nightly, you will get all graphs plotted all at once.

I did set once throttling when I was using DevTools and it now shows as 'no throttling', however, could it be that the setting is still lingering around?

Side bug, the Network pane does not show the graphical timeline for each request (see attached screenshot).
You can see that this works in the Developer Edition:
https://cl.ly/d9ecf894be9d

Summary: Loading upcoming version of arewefastyet.com can sometimes be very slow → Loading arewefastyet.com can sometimes be very slow

Hi Armen, have you tried with a fresh profile on Nightly? I tried both Dev Edition and Nightly, it behaves same to me.

I used 'Good 3G' option. On both Nightly and Dev addition, the titles of these graphs will be showed first, then after a while, all graphs will get plotted all at once .

I looked at the original profile, and it seems like it was all about the network requests took a long time to finish. I don't know why though.

Mike, I wonder if you can tell something from the profile?

Flags: needinfo?(mconley)
Flags: needinfo?(armenzg)
Whiteboard: [qf] → [qf:p1:pageload]

Could it be with Firefox private network? It doesn't happen when I disable it.

Flags: needinfo?(armenzg)

Redirecting to jesup, due to comment 5.

Group: mozilla-employee-confidential
Flags: needinfo?(mconley) → needinfo?(rjesup)

It's slightly slower when I load it with FPN but it's still slower without on Firefox compared to Chrome.

I believe I saw no impact from FPN on first measurement (Linux, nightly/inbound locally-built, FPN 0.9, FiOS); not sure why. Loads take 11-14 seconds in firefox.
Chrome on the same fast machine takes 8-9 seconds
However, I got signed out due to inactivity, and after signing in I did a reload, and it took 27s. The extra time all appears to be DNS/waiting for responses. The 27s is repeatable.

With FPN, signed in: (~27s): https://perfht.ml/2lNmiKn, https://perfht.ml/2lJbGMv
With FPN, not signed in: https://perfht.ml/2nf1RGt, https://perfht.ml/2lNmiKn
Without: https://perfht.ml/2mKUare

Flags: needinfo?(rjesup) → needinfo?(amarchesini)

FYI, for comparison Chrome takes 7-9s to load the page.

From the network panel, it appears that we're either single-threading the requests, or limiting them due the the max-per-shard(?) of 6 connections; I believe all the other requests marked as "waiting for Socket Thread" are actually just queued.

I see (perhaps not surprisingly) tons of ssl3 traffic on the SocketThread during the delay (1.1s of total 1.4s of CPU in a 7-second window), and lots of DNS resolver traffic - but that's odd, since these are all the same site

Per baku, the proxy connection is http2. The question is why this takes almost 3 times as long to load over the proxy -- and why the network traffic is takes about 7 times longer to receive all the data (3.4s vs 23s). Visually, you can see there's far less overlap of transfer from what I can see. Perhaps we're running http2 to the proxy, and then the proxy is running 1 http connection to the site instead of 6 that the browser uses??

Flags: needinfo?(honzab.moz)

Is there a reliable test case for this? If we eventually finish (=we paint? or what is the indication exactly?) it could be a good case for BT.

Priority: -- → P1

Honza: quoted from slack:
"Is it related to N requests to a single server causing some pipelining/queuing that doesn't happen in the no-proxy case?"
baku: "It looks so. The proxy is http2 and we use 1 single active connection. so yes, we have a queue/pipelining of the requests.
I can ask mayhemer to work on it or to see if there is something to do to improve this issue"

Component: Performance → Networking

Note: can't work on this for 71.

I looked into the log and, roughly, what the test does is loading a ~400 resources from an h1 server. It means we should behave the same way as if there were no proxy regarding parallelism and queuing. There will be 6 h2+TLS connections to the end server, each going through its respective stream to the h2 proxy. Each connection will server only one request at the time, so there is the standard parallelism limit of 6 per host. This doesn't change with the proxy at all.

It would be good to expose the log somewhere to let someone else than me look at it.

BT may help too, I just don't have time to do run it and analyze, sorry.

Flags: needinfo?(honzab.moz)

Can you send me the log, I may have time to look at it next week?

Flags: needinfo?(rjesup)

shared with you, Honza and baku

Flags: needinfo?(rjesup)

I'll note, however, that I'm not seeing the same amount of slowdown currently. (Some slowdown yes; it took ~15s with FPN vs 10 without, in a separate run from the logging - but it was 28 vs 10-11 before, repeatably, for several people and across several OSes and machines.)

honza - do you have the original log of the slowdown, or a link to it? I don't see it in my drive.

Flags: needinfo?(honzab.moz)

I think I found the original log and shared it

Thanks, Randell. I randomly picked one request to https://treeherder.mozilla.org that took about 6 seconds to finish and I can see the following:

  • the session with the proxy allows only 6 tunnels with the end server as the server is talking h1 only; this is the same as if there were no proxy, so I'm kinda puzzled
  • the limit is enforced here Http2Session.cpp - mozsearch

It would be great to have logs from the same machine/internet connection with and without the proxy to compare. Because the behavior seems to be correct. There is one origin we keep 6 conns tops parallelism. But here instead we use proxy tunnels as (transparent) connections, so there be no major difference.

Actually, how big is the difference? There is definitely an overhead (double TLS, at least).

Flags: needinfo?(honzab.moz) → needinfo?(rjesup)

FYI we recently landed some 5 minute caching code which can make this harder to test on the main deployment:
https://github.com/mozilla-frontend-infra/firefox-performance-dashboard/commit/745a91a528fecf8325681cc1d15b7f9f536d571c

You can use this deployment w/o the caching code:
https://5db05256d995b900073dda65--firefox-performance-dashboard.netlify.com/win10/overview?numDays=60

My load numbers on my machine (w/o caching deployment) is about 30 seconds to complete with FPN.
w/o FPN loads are about 5-6 seconds.

How can I provide you logs?

(In reply to Armen [:armenzg] from comment #20)

How can I provide you logs?

There are few ways, either by env vars or command line args:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Logging_HTTP_activity_by_manually_setting_environment_variables

Thanks a lot!

I hope I'm doing this right.

I created a new profile, I installed FPN and signed in with Firefox accounts.
I loaded:
https://5db05256d995b900073dda65--firefox-performance-dashboard.netlify.com/win10/overview?numDays=60
I zipped up the files and here's a link to them:
https://drive.google.com/file/d/1zgMIr1DIjPylW2xGuFPk4v9a2_Pg6zCI/view?usp=sharing

I will be back on Thursday if you need more info.

^ I attached the logs you requested

Flags: needinfo?(honzab.moz)

I was asking for logs w/ and w/o the proxy in comment 19, but let's look at these too.

I can see even non proxied requests taking a long time. e.g. https://tiles.services.mozilla.com/v4/links/activity-stream takes ~1500 ms to finish, where ~500ms takes between opening and getting to the point we got a SERVER HELLO, which then does OCSP checking. Then we wait for 1000ms, the default soft limit [1]. There are few OCSP requests initiated between those two points. All take nearly 3300-3600ms to even start, because we are waiting for the proxy resolution callback that long.

But I don't see any requests delayed by proxy resolution after the https://5db05256d995b900073dda65--firefox-performance-dashboard.netlify.com/win10/overview?numDays=60 request was made, so it's not the reason.

Back to square 1.

Looking at the page request it self, it's delayed by 1681ms. It's using the proxy. We connect the proxy at:

2019-10-29 00:07:56.537072 UTC - [Parent 41401: Socket Thread]: D/nsSocketTransport   trying address: 162.159.192.20
2019-10-29 00:07:56.556184 UTC - [Parent 41401: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0x126f1f400 outFlags=2]

rtt 20ms, quite good. Then the proxy then answers to our TLS CLIENT HELLO in ~500ms (not expected):
2019-10-29 00:07:57.096350 UTC - [Parent 41401: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0x126f1f400 outFlags=2]
we create a tunnel stream in the session with the proxy just after that:
2019-10-29 00:07:57.097129 UTC - [Parent 41401: Socket Thread]: I/nsHttp Http2Stream::RegisterTunnel 0x12621c000 stream=0x1269f65f0 tunnels=1 [TS......[tlsflags0x00000000]5db05256d995b900073dda65--firefox-performance-dashboard.netlify.com:443 (https:firefox.factor11.cloudflareclient.com:2486)[:]{CIK 86ba23e87ecd35b80}]
then we wait for ~900+ms for an OCSP check for ...netlify.com
2019-10-29 00:07:58.113278 UTC - [Parent 41401: Socket Thread]: I/nsHttp Http2Stream::ChangeState() 0x1269f65f0 from 1 to 2 2=SENDING_BODY
and create a session/stream (inside the tunnel stream) for the request:
2019-10-29 00:07:58.168 ⁃ Http2Stream ⁃ 1269f61a0 ⁃ released ⁃ status=0 ⁃ url=https://5db05256d995b900073dda65--firefox-performance-dashboard.netlify.com/win10/overview?numDays=60
and soon get a response:

2019-10-29 00:07:58.205578 UTC - [Parent 41401: Socket Thread]: I/nsHttp http response [
  HTTP/2.0 304 Not Modified

Looking at how the individual requests go:

for the very first subresource request,

2019-10-29 00:07:58.447113 UTC - [Child 41403: Main Thread]: D/nsHttp Creating HttpChannelChild @0x11f4bc000

we create a new tunnel (new host) on an existing session with the proxy:

2019-10-29 00:07:58.656902 UTC - [Parent 41401: Socket Thread]: E/nsHttp Http2Stream::Http2Stream 0x126a27760 trans=0x0 atrans=0x127256190
2019-10-29 00:07:58.656907 UTC - [Parent 41401: Socket Thread]: I/nsHttp Http2Session::AddStream session=0x12621f000 stream=0x126a27760 serial=15 NextID=0x1B (tentative)

it answer in nearly 1.5 seconds that it managed to connect the end server:

2019-10-29 00:08:00.313536 UTC - [Parent 41401: Socket Thread]: I/nsHttp Http2Session::RecvHeaders 0x12621f000 stream 0x1B priorityLen=0 stream=0x126a27760 end_stream=0 end_headers=4 priority_group=0 paddingLength=0 padded=0
2019-10-29 00:08:00.313576 UTC - [Parent 41401: Socket Thread]: I/nsHttp Http2Stream::ConvertResponseHeaders 0x126a27760 response code 200

and in ~500ms get a response:

2019-10-29 00:08:00.759979 UTC - [Parent 41401: Socket Thread]: I/nsHttp http response [
  HTTP/1.1 200 OK
...
2019-10-29 00:08:00.858934 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::OnStopRequest [this=0x126ce9000 request=0x126879b20 status=0]

further delays (those missing 400ms) is accumulated incrementally, during the opening phase and later processing of OnStart/Data/Stop after the response. there are hundreds of requests to process on the main thread we need to pass through.

for the very last, I can see:

  • proxy resolution takes 396ms, main thread blocking apparently
2019-10-29 00:07:58.647 ⁃ start-time=6575 ⁃ nsHttpChannel ⁃ 1270ad000 ⁃ released ⁃ status=0 ⁃ http-status=200 ⁃ url=https://treeherder.mozilla.org/api/project/mozilla-central/performance/signatures/?framework=10&platform=windows10-64-shippable&subtests=0
...
2019-10-29 00:07:59.391025 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::OnProxyAvailable [this=0x1270ad000 pi=0x12db57980 status=0 mStatus=0]

then classification another 100ms (same reason):

2019-10-29 00:07:59.392420 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::Starting nsChannelClassifier 0x127046850 [this=0x1270ad000]
nsHttpChannel @1270ad000 --> nsChannelClassifier @127046850
2019-10-29 00:07:59.392506 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=0x1270ad000]
2019-10-29 00:07:59.494516 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=0x1270ad000]

then the transaction wait for dispatch because all the requests before simply take time (would need backtrack to scan them all easily...) and there is just 6 parallel cons to this host.

2019-10-29 00:07:59.497030 UTC - [Parent 41401: Main Thread]: D/nsHttp nsHttpChannel::SetupTransaction [this=0x1270ad000, cos=0, prio=0]
...
2019-10-29 00:08:05.269187 UTC - [Parent 41401: Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchAbstractTransaction [ci=TSA.....[tlsflags0x00000000]treeherder.mozilla.org:443 (https:firefox.factor11.cloudflareclient.com:2486)[:]{CIK 86ba23e87ecd35b80} trans=0x1405a2400 caps=31 conn=0x116fbbc00]
2019-10-29 00:08:05.269195 UTC - [Parent 41401: Socket Thread]: E/nsHttp nsHttpConnection::Activate [this=0x116fbbc00 trans=0x1405a2400 caps=31]

So, it looks like either the infra slows proxy and ocsp down a lot or the proxy is simply adding slowness that affects h1 to a single host.

Hard to say more with current tools available. I really want BT integrated to the profiler and simply collect profiles for these use cases.

[1] https://searchfox.org/mozilla-central/rev/b2b0077c2e6a516a76bf8077d6f0237c58f5959a/modules/libpref/init/all.js#87

Flags: needinfo?(honzab.moz)
Priority: P1 → P2

We did several improvements in secure-proxy. I would like to know if the performances are now acceptable.

Flags: needinfo?(amarchesini)

No need to be a confidential bug.

Group: mozilla-employee-confidential

(In reply to Andrea Marchesini [:baku] from comment #25)

We did several improvements in secure-proxy. I would like to know if the performances are now acceptable.

Armen, could you test this again? Thanks.

Severity: normal → S3
Flags: needinfo?(armenzg)
Whiteboard: [qf:p1:pageload] → [qf:p1:pageload][necko-triaged]

Unfortunately the code base has changed quite a bit and I don't know if FPN is anymore available for me in Canada.
I'm OK with closing this OR let me know how I can set up FPN in Canada and I will talk with the new owners of the repo.

Flags: needinfo?(armenzg)

Ignore my previous comment about the codebase changes. I figured it out.

Please let me know how I can get FPN again.

Flags: needinfo?(rjesup)
Whiteboard: [qf:p1:pageload][necko-triaged] → [qf:p2:pageload][necko-triaged]
Priority: P2 → P3
Summary: Loading arewefastyet.com can sometimes be very slow → Slow loaing with a HTTP/2 proxy
Performance Impact: --- → P2
Keywords: perf:pageload
Whiteboard: [qf:p2:pageload][necko-triaged] → [necko-triaged]
Summary: Slow loaing with a HTTP/2 proxy → Slow loading with a HTTP/2 proxy
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

Since we've done several works to improve the performance when using Http/2 proxy, I did a simple test to compare the difference between Fx91 and Fx110 and Chrome to see if there is any improvement. The result is listed below.
Note that the test was done with a local Http/2 proxy and with using https://speed.cloudflare.com/ to collect the download/upload speed.

Fx 91 (Fx91_proxy_speed-results-1677751148.csv)

Direction File size (MB) Median (Mbps) 75th percentile (Mbps) 90th percentile (Mbps)
download 10 80.41956555588371 91.51102928148168 95.33424002899267
download 25 125.74706219933546 130.4678058812531 138.11140408841985
upload 1 34.3350308228357 41.04840136484266 43.93246381990352
upload 10 48.937965260545894 50.14459047637017 50.62817839587125
Fx 110 (Fx110_proxy_speed-results-1677751546.csv)

Direction File size (MB) Median (Mbps) 75th percentile (Mbps) 90th percentile (Mbps)
download 10 101.15701392256501 133.49369304893642 145.41101248939708
download 25 114.06181774016794 116.1205336674619 118.89020754214516
upload 1 24.248546511627904 37.363724226804116 44.093742507791895
upload 10 108.87536856929101 115.26959186553673 125.31965940631319
Chrome (chrome_speed-results-1677760471.csv)

Direction File size (MB) Median (Mbps) 75th percentile (Mbps) 90th percentile (Mbps)
download 10 92.4353657340461 113.97772004181307 131.1672441984319
download 25 102.3852482891703 117.00400597116986 124.96315759255933
upload 1 95.66692173485337 113.37615959257973 120.25702464087883
upload 10 28.344992367679133 42.8498031647118 48.413827262643224

Based on the above result, it seems that Firefox is not worse than chrome. Thus, I think we can close this bug.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: