Closed Bug 1596576 Opened 6 years ago Closed 2 years ago

H2 upload speed is so slow in Firefox, all other browsers are fine (please read detailed explanation)

Categories

(Core :: Networking: HTTP, defect, P2)

RESOLVED FIXED
115 Branch
Tracking Status
relnote-firefox --- 116+
firefox115 --- fixed

People

(Reporter: ahmadmob, Assigned: acreskey, NeedInfo)

References

(Blocks 1 open bug, Regressed 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(5 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

I tried uploading a file and noticed that Firefox is slow with uploading.

Uploading any file in Firefox is so slow (max is 20mbps or 2.4MB/s)

Actual results:

I tried uploading to Google Drive and to speed tests such as https://speedof.me; the maximum upload speed I get is always ~18mbps.

Downloading with Firefox is fine, but not uploading.

The problem exists for me (and my friend) on any browser based on Gecko. I tried Pale Moon and the same problem.

Please read the explanation on "What should have happened? (expected results)" below.

Expected results:

I should be getting my max speed which is 350mbps for upload.

I tried uploading on all other browsers (Edge, Chrome and even Internet Explorer) and got 45MB/s upload in each of them. In Firefox I get 2.5MB/s.
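For readers comparing the mixed units in this report (mbps for the line rate, MB/s for what the browser shows), here is a minimal conversion sketch. The helper name is made up for illustration; the only assumption is 8 bits per byte with decimal prefixes.

```python
BITS_PER_BYTE = 8

def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert a link speed in megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE

# Figures quoted in this report.
for label, mbps in [("line rate", 350), ("Firefox cap", 20)]:
    print(f"{label}: {mbps} Mbps = {mbps_to_mb_per_s(mbps):.2f} MB/s")
```

So a 350mbps line tops out a little under 44MB/s, and a 20mbps cap corresponds to 2.5MB/s.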

Now before you tell me to clear cache or reset Firefox or reset or change router or anything, I saved you time by doing a lot of things including:

  1. A fresh Windows install (both Windows 7 and 10)
  2. Older versions of Firefox
  3. Tried on ubuntu, I got the same slow speed (even though it was 35mbps on ubuntu but still far from my max which is 350mbps or 44MB/s)
  4. Tried more than one PC
  5. And lastly I tried on my friend's home with a different ISP. (this shocked me because it's the same problem)

None of the above helped. What I think is that my country (Jordan) has a problem with Firefox ports or something, it's really weird.

One of my friends also has the same problem and he's in another country.
(and to be honest I asked around 10 friends and only one of them has this problem, so it's not really easy to reproduce)
A few people on the internet have had this problem, but there are no helpful answers to it.
e.g. https://www.reddit.com/r/firefox/comments/8fjrcj/youtube_uploads_capped_at_25mbs_in_firefox_not/

I successfully got my upload speed to 8MB/s (still not what I should be getting) on Firefox by forcing an old TLS version in about:config and by setting the following to 1:

security.tls.version.max
and
security.tls.version.fallback-limit

My friend is experienced and he told me the following after I did that with TLS and told him:

"The only idea I have is that Firefox's encryption implementation is absolutely bad or doesn't use hardware acceleration.

Would explain why downgrading TLS increased your speed, and why this problem only occurs with very high upload speeds".

Hopefully you can find the cause of this problem :)
I really don't want to be forced to move to another browser.

I attached a file uploading on Chrome vs Firefox, you can notice on the internet speed meter at the bottom of both pictures how Firefox is extremely slow compared to Chrome.

Thanks a lot in advance for your time reading my wall of text. ;D

I can confirm it, I have the same problem with my upload on Firefox or any other browser that uses the same engine. Other browsers work just fine. It would be really nice to get some help finding out what actually causes it.

Thanks David. That's interesting.

Now I wonder how many people have this problem but didn't notice it. It only happens with people with high upload speeds.

Component: Untriaged → Networking
Product: Firefox → Core

Hello again.

I just messaged a friend of mine and he's located in Portugal with ~100mbps connection, he also has the same problem. He never noticed it until I asked him now. His upload speed is 7MB/s on Firefox and 15MB/s on chrome or any other browser.

This is not as bad as in my case but there's definitely something wrong with Firefox.

The problem seems to affect about 1 in 10 people. Apparently it depends on your country's internet (not the ISP, I believe) and how Firefox processes and deals with the packets.

Thanks

Can you send me a wireshark pcap? You can send it to me via e-mail as well.

Flags: needinfo?(ahmadmob)

Thanks Dragana Damjanovic.

Sent to your email at dd.mozilla@gmail.com.

Flags: needinfo?(ahmadmob)

need-info myself, not to forget.

Flags: needinfo?(dd.mozilla)

Hello Dragana.

I don't know what you mean. I already sent you the pcap to your e-mail.

Ah you made this as a reminder for yourself. Nevermind, sorry. I misunderstood. :)

See Also: → 1605313

Honza, is this a regression from your work on tailing?

Nhi, we should investigate this.

Flags: needinfo?(nhnguyen)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dd.mozilla)

(In reply to Dragana Damjanovic [:dragana] from comment #11)

Honza, is this a regression from your work on tailing?

Nhi, we should investigate this.

If you tell me what leads you to that conclusion for this particular bug, then I can take a look.

But reading the title (upload speed slow) - I don't think so. Tailing only delays opening a network request for tracking requests and resources. When such a request starts, tailing has zero influence on that.

If you mean throttling, that only affected downloads and is now disabled because it caused more trouble than good.

Flags: needinfo?(honzab.moz) → needinfo?(dd.mozilla)

I think I need-info'd you on the wrong bug... there was something else with tailing...

Flags: needinfo?(dd.mozilla)

Now I remember:
This reference bug 1605313. Honza, can you take a look at 1605313?

Flags: needinfo?(honzab.moz)

ahmad, can you make a wireshark recording again?
In your old recording I see that Chrome uses UDP a lot (it may be QUIC, but Wireshark cannot decode it, so I do not know).

Flags: needinfo?(ahmadmob)

(In reply to Dragana Damjanovic [:dragana] from comment #14)

Now I remember:
This reference bug 1605313. Honza, can you take a look at 1605313?

done

Flags: needinfo?(honzab.moz)

@Dragana Damjanovic
I will do that in a few hours or tomorrow. Thanks for the help.

Flags: needinfo?(ahmadmob)

Hello. Sorry for taking long. I got real busy and forgot.

Here are the two wireshark recordings with both Firefox and Chrome.

Firefox: https://drive.google.com/file/d/1SQSDIHLL_Yk96AJL0euRl4i5ctPTMTeM/view?usp=sharing

Chrome: https://drive.google.com/file/d/1oYhADc1Uv0DhHUp7-u6NEWpDRL2c-jkb/view?usp=sharing

For these recordings I was getting 2.24 MB/s upload speed on Firefox and it was almost fixed at this speed.

On chrome, I was getting 25-40MB/s.
On every other browser (except Firefox) I am getting ~25-45MB/s.

As I said in my OP, I tried everything from fresh Windows install to trying from countless other PCs (friends' PCs etc..) Every ISP in Jordan gives me the same problem with upload speed on Firefox. It's so weird to say the least.

A friend of mine in Portugal also has the same problem (but not as bad, he's getting ~8MB/s on Firefox but 15MB/s on any other browser).

I think I've the same issue.
Mainly on Google Drive.
I see significant difference in upload times comparing FF and Chrome.
Will follow this ticket.

Hello adam, thanks for confirming the issue on your end. The issue with Firefox upload can be easily reproduced and it's so clear with Google Drive for those affected, but if the issue was only with Google Drive I don't think I would've opened this ticket, I noticed it also on ZippyShare and many other upload sites and even some speedtests like https://speedof.me for example.

Where are you from and what's your ISP if I may ask?

(In reply to ahmad from comment #21)

Hello adam, thanks for confirming the issue on your end. The issue with Firefox upload can be easily reproduced and it's so clear with Google Drive for those affected, but if the issue was only with Google Drive I don't think I would've opened this ticket, I noticed it also on ZippyShare and many other upload sites and even some speedtests like https://speedof.me for example.

Where are you from and what's your ISP if I may ask?

I tested Mega.nz today and I got the opposite result, FF was 0.5MB/s faster than Chrome.
But I'll have to wait a little bit because my tests are not good, I still use 100mb equipment and my fiber line is 200.
I hope to get the 1gb stuff in the coming days and perform some proper tests.

I can confirm this too, but only when I tried to upload to Google. Other cloud uploads were just fine.
I really hope google gets hit, again, by a huge fine someday, for anti-competitive behaviour, at least in EU.

Junior, could you investigate this? Thanks!

Flags: needinfo?(nhnguyen) → needinfo?(juhsu)

Hello Reporter,
For upload speed, I can't reproduce locally.
All browsers are able to hit 4-5MB/s in US. Maybe I need to upgrade my ISP :)

I see the wireshark in comment 18, Chrome uses http3 but Firefox uses http2.
We want to compare apples to apples.

Could you try to gather more statistics about upload speed for moving this forward?
(a) Chrome with quic turned off (in chrome://flags)
(b) Firefox with http3 turned on (toggling network.http.http3.enabled value to true in about:config)

I assume at least we can have 20MB/s when we turn on quic based on bug 1627522

Thanks!

Flags: needinfo?(juhsu) → needinfo?(ahmadmob)

Hello Junior. Thanks for your input.

In Chrome, when I set "Experimental QUIC protocol" from "default" to "disabled", my upload speed gets significantly lower, max ~11MB/s, but with it being set on default I get ~48MB/s which is what I should be getting.

Now moving to Firefox, when I set network.http.http3.enabled to "true", I get an extremely low upload speed, around 20KB/s. Yes, you read that right, it's 20 kilobytes per second! It even gets stuck sometimes and the file doesn't get uploaded.

Flags: needinfo?(ahmadmob)

Sorry, I should have mentioned this before.

HTTP/3 is an ongoing project and is improving day by day.
Could you try firefox nightly again with http3 on?

Flags: needinfo?(ahmadmob)

Sure. Please give me until tomorrow to try it. Thanks.

Hey everyone, I've been following this thread and made an account just to help out. I'm on v77.0.1 running on MacOS 14.6. My upload speeds are limited to around 5MB/s, while Chrome and Safari top out around 26MB/s. I've tested the TLS downgrade workaround mentioned earlier and can confirm that it brings my upload speeds in Firefox in line with other browsers. I've downloaded the nightly to test the http3 setting and it behaves just as Ahmad noted above.

I'm willing to do some testing, though I'm not a programmer by any stretch of the imagination.

Thanks to whoever is looking into this. I'm in the creative industry and moving large files over the internet is a normal work activity for me.

Sorry for taking long. I tested my upload speed with the latest Firefox nightly and network.http.http3.enabled set to true.

It gave me the same results as stable Firefox which is 0-20KB/s

Flags: needinfo?(ahmadmob)

(In reply to ahmad from comment #30)

Sorry for taking long. I tested my upload speed with the latest Firefox nightly and network.http.http3.enabled set to true.

It gave me the same results as stable Firefox which is 0-20KB/s

It's interesting that the max upload speed is only 20KB/s with http3 enabled.
Andy, do you have any idea about this?

Flags: needinfo?(agrover)

This bug didn't initially mention http3 at all. I've opened bug 1648266 for http3 upload speed issues. I suggest we keep enabling http3 out of the picture for this issue, which seems to not strictly depend on http3 use.

Blocks: QUIC
Flags: needinfo?(agrover)
See Also: 1605313
See Also: → 1605313
No longer blocks: QUIC

Thanks ahmad and adam.

We definitely need a log to see what causes the slowness, network code or NSS-related code, but I'll wait for bug 1638925 to have a more informative log. We might focus on why upload is ~80% slower with TLS 1.2.

Hello ahmad and Adam,
Bug 1638925 is going to latest nightly.
Could you please update the nightly and collect the HTTP log to move this forward?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

The logging string should have "events:2" appended (note that it's "events", plural),
like timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,events:2
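A small sketch of how that logging string can be assembled and handed to Firefox through the MOZ_LOG and MOZ_LOG_FILE environment variables. The module list is the one from this comment; the log file name and the "firefox" binary name are assumptions, adjust them for your system.

```python
import os

# Options and module:level pairs from the logging string requested above.
options = ["timestamp", "rotate:200"]
modules = {
    "nsHttp": 5,
    "cache2": 5,
    "nsSocketTransport": 5,
    "nsHostResolver": 5,
    "events": 2,
}

def build_moz_log(options, modules):
    """Join options and module:level pairs into one MOZ_LOG value."""
    return ",".join(options + [f"{name}:{level}" for name, level in modules.items()])

moz_log = build_moz_log(options, modules)

# Environment for a Firefox process with logging enabled.
# "firefox-http.log" is a placeholder file name.
env = dict(os.environ, MOZ_LOG=moz_log, MOZ_LOG_FILE="firefox-http.log")
print(moz_log)

# To launch (binary name is an assumption):
# import subprocess; subprocess.Popen(["firefox"], env=env)
```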

I'd like to see two logs, one with TLS 1.1 and one with TLS 1.2,
and the upload speed you get with logging enabled.

Thanks a lot.

Flags: needinfo?(ahmadmob)
Flags: needinfo?(adam.dincorn)

Here are two logs one with default TLS and one with old TLS version, events:2 was enabled on both of them:

TLS security on default: https://drive.google.com/file/d/1aXr3KjJxF0w2rWQmUSVjafyeANzuOg0S/view?usp=sharing

TLS security set to old version in about:config: https://drive.google.com/file/d/1nGBT2hyXxJ1wzrrj9Oe8qlh4oC_SfWNF/view?usp=sharing

on default TLS I was getting ~1.7MB/s (13.6mbps)

on old TLS, I was getting ~10MB/s (80mbps)

Flags: needinfo?(ahmadmob)

BTW what I meant by old TLS is that I set the following in about:config:

security.tls.version.fallback-limit - from 4 to 1
security.tls.version.max - from 4 to 1
security.tls.version.min - from 3 to 1

If you want me to test with any other TLS settings, please let me know and tell me the numbers I should set on all three of them.
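For anyone following along, the numbers in these prefs map to protocol versions as sketched below. The helper names are made up for illustration. The h2 check only encodes the rule from the HTTP/2 spec that HTTP/2 over TLS needs TLS 1.2 or newer; it says nothing about what Firefox negotiated in these tests.

```python
# Meaning of the numeric values of the security.tls.version.* prefs.
TLS_PREF_VALUES = {1: "TLS 1.0", 2: "TLS 1.1", 3: "TLS 1.2", 4: "TLS 1.3"}

def describe(prefs: dict) -> dict:
    """Translate pref values into protocol version names."""
    return {name: TLS_PREF_VALUES[value] for name, value in prefs.items()}

def h2_possible(max_value: int) -> bool:
    """HTTP/2 over TLS requires TLS 1.2 or higher (RFC 7540, section 9.2)."""
    return max_value >= 3

# The "old TLS" settings described in this comment.
old_tls = {
    "security.tls.version.fallback-limit": 1,
    "security.tls.version.max": 1,
    "security.tls.version.min": 1,
}

for name, version in describe(old_tls).items():
    print(f"{name} -> {version}")
print("HTTP/2 possible:", h2_possible(old_tls["security.tls.version.max"]))
```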

Hi, I'm not a Firefox developer, but I'd like to help reduce the number of variables for maintainers. Newer TLS versions connect to Google Drive with (ECDHE_ECDSA_WITH_)GCM_SHA256, while older ones use ECDHE_ECDSA_WITH_AES_128_CBC_SHA, and the problem would be easier to analyze if the only difference were the TLS version. After applying the settings below, is uploading fast or slow? If it's fast then the problem is in GCM_SHA256. If it's slow then no news.

security.tls.version.fallback-limit - 3
security.tls.version.max - 3
security.tls.version.min - 3

and in about:config search for "ssl3" and set all of them to "false" except "security.ssl3.ecdhe_ecdsa_aes_128_sha"

Thanks

With the exact settings above by mcccs I got 10MB/s (80mbps) upload speed, same as with old TLS version I tried in my previous comment.

Then I suspect SHA256 instead of the older, faster SHA is the bottleneck. Chrome uses OpenSSL's hardware-optimized SHA256; Firefox's NSS doesn't use hardware acceleration for amd64. If Mozilla decides to add optimized SHA256 and MIT-licensed code is appropriate then I suggest the files named sha256_* in https://github.com/bitcoin/bitcoin/tree/master/src/crypto

The less likely explanation is that GCM is slower than CBC (though I can't see why it would be; and if it's SHA256, why doesn't this happen while downloading?). (I'm not qualified to investigate it further.)

Thanks. ni? myself as a reminder to get back to this.

Flags: needinfo?(juhsu)

The problem is in HTTP/2. If you disable HTTP/2 (set network.http.spdy.enabled.http2 to false and restart the browser), you'll achieve maximum speed. This doesn't reduce security, so you can use it as a permanent workaround. Since this only happens while uploading, I'm guessing it's related to how HTTP/2 interacts with TCP congestion control.

I've set up a local HTTPS Golang server with TLS 1.2 + CBC_SHA, TLS 1.2 + GCM_SHA256 and TLS 1.3 + GCM_SHA256. Tested with a 100MB file, their speeds were approximately equal (160-200MB/s). Enabling HTTP/2 on the Golang server then caused the upload speed to decrease to 110MB/s. HTTP/2 doesn't support AES_128_CBC_SHA but only AES_128_GCM_SHA256, so forcing the use of AES_128_CBC_SHA disabled HTTP/2 and the reporter was able to achieve full speeds. I was able to achieve high speeds on my machine but still significantly lower than with HTTP/1.1, which suggests, as I wrote, that it is "related to how HTTP/2 interacts with congestion control".

There is one more possible cause: flow control.

mcccs, can you give me a summary:
Is the upload speed with HTTP/2 lower on Firefox than on Chrome? (Chrome should use HTTP/2 as well, not HTTP/3. HTTP/3 in Firefox is still in development and is not ready yet.)

Flags: needinfo?(mcccs)

HTTP/2 on Chrome and Firefox achieve the exact same upload speeds on my machine. I'll move the server to a different device in the network with 100Mbit link speed and post a comparison soon.

(In reply to mcccs from comment #43)

HTTP/2 on Chrome and Firefox achieve the exact same upload speeds on my machine. I'll move the server to a different device in the network with 100Mbit link speed and post a comparison soon.

Thank you!

Could reproduce !!! :))

Firefox + HTTPS/1.1 = 66 MB/s
Firefox + HTTPS/2.0 = 14 MB/s
Chrome + HTTPS/2.0 = 50 MB/s

PS: After I gathered these results I remembered that my "network.http.spdy.push-allowance" was 1310720 (1.25 MiB) instead of the default 131072 (128 KiB). However I don't think this has impacted the result in any way.
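To put the three measurements side by side, a quick ratio calculation (just arithmetic on the numbers above, nothing measured anew):

```python
# Upload speeds in MB/s as reported above.
results = {
    "Firefox + HTTPS/1.1": 66,
    "Firefox + HTTPS/2.0": 14,
    "Chrome + HTTPS/2.0": 50,
}

def speedup_over(baseline: str, other: str) -> float:
    """How many times faster `other` is than `baseline`."""
    return results[other] / results[baseline]

slow = "Firefox + HTTPS/2.0"
for name in results:
    if name != slow:
        print(f"{name} is {speedup_over(slow, name):.1f}x faster than {slow}")
```

That is roughly a 4.7x gap against Firefox's own HTTP/1.1 path and about 3.6x against Chrome on the same protocol.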

Would it be useful if I shared the Go server source code for you to debug HTTP/2? I used it since I don't have access to internet with high upload speeds.

Flags: needinfo?(mcccs)

Thanks mcccs and all the others for investigating this. I can confirm that setting network.http.spdy.enabled.http2 to false increased my upload speed on FF to 10MB/s instead of 2MB/s, still not what I should be getting but it's probably a Google Drive compatibility thingy with FF. I upgraded my fiber connection to 600Mbps so I should be getting 60MB/s on Google Drive upload (60MB/s is the max Google Drive allows per file upload. I tested on chrome/edge and I got 59MB/s)

@mcccs I have many spare PCs, so if you want to test with high upload speed maybe I can set one PC up for you and you can investigate this in Team Viewer?

Junior, do you have access to a high upload speed link to test this?

Unfortunately, I don't have one as comment 25 said :(

If any of you guys want remote access to one of my PCs please let me know :) I won't be using the PC at all and it will have fresh Windows installation.

I can spin up a small server on GCP with a high speed connection.
Let me know what is needed in order to do the test.

Junior, I sent you an e-mail.

Thanks adam and ahmad, the issue is reproducible without a high speed uplink in a local network.

Instructions to reproduce:

Open a folder and copy the attached go file named "main.go".
Set "IP" in the go file to device's local IP.
Install mkcert
$ mkcert -install -ecdsa
$ mkcert example.com "*.example.com" example.test localhost 127.0.0.1 ::1 -ecdsa
$ go run .
From another device, connect to https://YOUR_IP:1441/ with Firefox.
Ignore the certificate warning
Play with `network.http.spdy.enabled.http2` to compare

Instructions to reproduce:

Open a folder and copy the attached go file named "main.go".
Set "IP" in the go file to device's local IP.
Install mkcert
$ mkcert -install -ecdsa
$ mkcert example.com "*.example.com" example.test localhost 127.0.0.1 ::1 -ecdsa
$ go run .
From another device, connect to https://YOUR_IP:1441/ with Firefox.
Ignore the certificate warning
Play with network.http.spdy.enabled.http2 to compare

Thanks for the setup.
Just to let you know, I tried to reproduce this and got a 400 for https://YOUR_IP:1441/ with "Client sent an HTTP request to an HTTPS server".

I installed mkcert, nss, and go, then did the mkcert setup. I changed the example.com+5[-key].pem to 5 in main.go based on the mkcert result.

Flags: needinfo?(juhsu) → needinfo?(mcccs)

We want to send every chunk as large as possible, so we introduced nsBufferedInputStream
https://searchfox.org/mozilla-central/rev/91d82d7cbf05a71954dfa49d0e43824c7c973e62/netwerk/protocol/http/nsHttpTransaction.cpp#396-402

For each PR_Write, we should send 8 MB per chunk for a large upload, but H2 adjusts the size to something smaller.
https://searchfox.org/mozilla-central/rev/91d82d7cbf05a71954dfa49d0e43824c7c973e62/netwerk/protocol/http/Http2Session.cpp#657

Here's the H1 pattern:

2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=-1]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=32768]
2020-06-30 09:58:25.972000 UTC - [Parent 976: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=32768]

Here's the H2 pattern:

2020-06-30 09:56:11.576000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.576000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=11721]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=11721]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=5065]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=5065]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.591000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.607000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=10969]
2020-06-30 09:56:11.607000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=10969]
2020-06-30 09:56:11.607000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=5817]
2020-06-30 09:56:11.607000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=5817]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=10217]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=10217]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=6569]
2020-06-30 09:56:11.623000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=6569]
2020-06-30 09:56:11.638000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.638000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.638000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=9465]
2020-06-30 09:56:11.638000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=9465]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=7321]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=7321]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=8713]
2020-06-30 09:56:11.654000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=8713]
2020-06-30 09:56:11.669000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=8073]
2020-06-30 09:56:11.669000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=8073]
2020-06-30 09:56:11.676000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.676000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.676000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=7961]
2020-06-30 09:56:11.676000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=7961]
2020-06-30 09:56:11.692000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=8825]
2020-06-30 09:56:11.692000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=8825]
2020-06-30 09:56:11.692000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.692000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=7209]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=7209]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=537]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=537]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=9049]
2020-06-30 09:56:11.707000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=9049]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=16009]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=16009]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=6985]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=6985]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=9801]
2020-06-30 09:56:11.723000 UTC - [Parent 1260: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=9801]

We can see that H2 makes around 4x more PR_Write calls than H1 (even with H1's 30% failure to write), which should be fixed.
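A minimal sketch of how such a comparison can be pulled out of the logs: count the PR_Write calls, the failed ones (n=-1), and the average size of a successful write. The two samples are the first three call/return pairs of each excerpt above with the timestamp prefix dropped; the function name is made up, and this is not the script used for the numbers in this comment.

```python
import re

CALL = re.compile(r"calling PR_Write \[count=(\d+)\]")
RETURNED = re.compile(r"PR_Write returned \[n=(-?\d+)\]")

def summarize(log_text: str) -> dict:
    """Count PR_Write calls, failed writes and bytes actually written."""
    calls = failed = written = 0
    for line in log_text.splitlines():
        if CALL.search(line):
            calls += 1
        match = RETURNED.search(line)
        if match:
            n = int(match.group(1))
            if n < 0:
                failed += 1      # n=-1: nothing was written by this call
            else:
                written += n
    ok = calls - failed
    return {"calls": calls, "failed": failed, "bytes": written,
            "avg_write": written / ok if ok else 0.0}

H1_SAMPLE = """\
D/nsSocketTransport   calling PR_Write [count=32768]
D/nsSocketTransport   PR_Write returned [n=32768]
D/nsSocketTransport   calling PR_Write [count=32768]
D/nsSocketTransport   PR_Write returned [n=-1]
D/nsSocketTransport   calling PR_Write [count=32768]
D/nsSocketTransport   PR_Write returned [n=-1]
"""

H2_SAMPLE = """\
D/nsSocketTransport   calling PR_Write [count=16009]
D/nsSocketTransport   PR_Write returned [n=16009]
D/nsSocketTransport   calling PR_Write [count=11721]
D/nsSocketTransport   PR_Write returned [n=11721]
D/nsSocketTransport   calling PR_Write [count=5065]
D/nsSocketTransport   PR_Write returned [n=5065]
"""

for name, sample in [("H1", H1_SAMPLE), ("H2", H2_SAMPLE)]:
    s = summarize(sample)
    print(f"{name}: {s['calls']} calls, {s['failed']} failed, "
          f"{s['avg_write']:.0f} bytes per successful write")
```

Running the same function over the full log files would give the per-protocol call counts to compare.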

Assignee: nobody → juhsu
Severity: normal → S3
Flags: needinfo?(adam.dincorn)
Priority: -- → P2
Whiteboard: [necko-triaged]

(In reply to Junior [:junior] from comment #53)

Instructions to reproduce:

Open a folder and copy the attached go file named "main.go".
Set "IP" in the go file to device's local IP.
Install mkcert
$ mkcert -install -ecdsa
$ mkcert example.com "*.example.com" example.test localhost 127.0.0.1 ::1 -ecdsa
$ go run .
From another device, connect to https://YOUR_IP:1441/ with Firefox.
Ignore the certificate warning
Play with network.http.spdy.enabled.http2 to compare

Thanks for the setup.
Just to let you know, I tried to reproduce this and got a 400 for https://YOUR_IP:1441/ with "Client sent an HTTP request to an HTTPS server".

I installed mkcert, nss, and go, then did the mkcert setup. I changed the example.com+5[-key].pem to 5 in main.go based on the mkcert result.

Please double-check that the URL starts with "https://" and not "http://". Then you need to "Advanced..." > "Accept the Risk and Continue"

Flags: needinfo?(mcccs)

After changing to https, the go server shows "no cipher suite supported by both client and server" and the browsers fail to connect,
even if I uncomment the SHA line.

Hello Dragana,
Please see Comment 54.
Do you think we can improve upload speed if we max out the size of PR_Write for H2?
Thanks.

Flags: needinfo?(dd.mozilla)

(In reply to Junior [:junior] from comment #56)

After changing to https, the go server shows "no cipher suite supported by both client and server" and the browsers fail to connect,
even if I uncomment the SHA line.

Oh no I made a mistake in instructions while posting them.

Please change:

CipherSuites: []uint16{
    tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
    //tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
},

to

CipherSuites: []uint16{
    tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
},

and run "go run ."

(In reply to Junior [:junior] from comment #54)

We want to send every chunk as large as possible, so we introduced nsBufferedInputStream
https://searchfox.org/mozilla-central/rev/91d82d7cbf05a71954dfa49d0e43824c7c973e62/netwerk/protocol/http/nsHttpTransaction.cpp#396-402

For each PR_Write, we should send 8 MB per chunk for a large upload, but H2 adjusts the size to something smaller.
https://searchfox.org/mozilla-central/rev/91d82d7cbf05a71954dfa49d0e43824c7c973e62/netwerk/protocol/http/Http2Session.cpp#657

That line makes sense. The first part of the buffer has already been sent, so we do not want to send it again. We should probably do better realignment. To check whether this is the problem, you can probably increase the buffer size or do realignment more often.
I am wondering if it is a flow control problem. Are you testing on localhost? You probably did not change the RTT? If the RTT is very low, flow control is probably not the reason for the low rate.

Flags: needinfo?(dd.mozilla)

(In reply to mcccs from comment #58)

I hit this when I try to upload a file.

http: TLS handshake error from ${IP}: tls: no cipher suite supported by both client and server
http2: server: error reading preface from client ${IP}: remote error: tls: unknown certificate authority

Thanks, Dragana.

I tried some experiments with my local NAS.
Both browsers have similar speeds, with H1 and H2.

Chunk size is not the bottleneck. I decreased network.buffer.cache.size to 1024, but it looks like HTTP is still fast.
I wanted to try the opposite for H2, but this limitation caught my eye.
That is, my hypothesis in comment 54 about a possible improvement does not hold.

Comment 45 was also run locally, so it looks like flow control is not the problem.

Based on Comment 29, the TLS version can hold the upload speed down to 5MB/s on a non-local network.
That should be the first step for the real-world network.

Component: Networking → Libraries
Priority: P2 → --
Product: Core → NSS
QA Contact: jjones
Whiteboard: [necko-triaged]
Version: 70 Branch → other

Comment 45 has the server on a different computer connected to the same Wi-Fi network. Therefore the cause is flow control? On local HTTPS/2.0 both Chrome and Firefox can do about 120 megabytes per second.

I hit this when I try to upload a file.

http: TLS handshake error from ${IP}: tls: no cipher suite supported by both client and server
http2: server: error reading preface from client ${IP}: remote error: tls: unknown certificate authority

In Comment 58 did you miss "RSA"? If not, I have no idea why it's not working.

(In reply to mcccs from comment #62)

Comment 45 has the server on a different computer connected to the same Wi-Fi network. Therefore the cause is flow control? On local HTTPS/2.0 both Chrome and Firefox can do about 120 megabytes per second.

It could be, but that's a possible bottleneck for >10MB. Thanks for the input.

In Comment 58 did you miss "RSA"? If not, I have no idea why it's not working.

I do have "RSA" :(

Assignee: juhsu → nobody

Junior, can you point us to profile(s) with the NSS AEAD methods showing up as significant overhead?

Flags: needinfo?(juhsu)

(In reply to J.C. Jones [:jcj] (he/him) [increased latency due to COVID-19] from comment #64)

Junior, can you point us to profile(s) with the NSS AEAD methods showing up as significant overhead?

I don't know if it showed in profiles. The overhead is indicated by Comment 29

Flags: needinfo?(juhsu)
Attached file main.go
I resist believing that NSS is the culprit; flow control is. Localhost -> Localhost has the same speed as Chrome over HTTP/2, but for 192.168.1.70 -> 192.168.1.90 throughput drops 4-fold (Comment 45). The reason downgrading TLS worked is that HTTP/2 falls back to HTTP/1.1 with TLS 1.1, and HTTP/1.1 doesn't have the flow control problem on Firefox. In case it's needed in the future, I'm attaching an updated version of the Golang file upload speed test tool, to be used on two devices on the same network (but not Raspberry Pi). The instructions are inside. It compares TLS 1.2 with HTTP/2 and TLS 1.2 with HTTP/1.1 using the same cipher suite.

Based on mcccs' Comment 66, this is about HTTP/2 vs HTTP/1.1.

Logically, the crypto part of this is very symmetrical (at least for TCP-based sessions): There's nowhere for ciphers to be slower on upload vs. download, and we gather the same locks the same way for each side of the connection. The slowdown has to be upstream of NSS. Moving back to Networking (sorry!)

Assignee: nobody → nobody
Component: Libraries → Networking: HTTP
Product: NSS → Core
QA Contact: jjones
Version: other → unspecified

I'll set this to P2 for now.

Priority: -- → P2
Whiteboard: [necko-triaged]
Priority: P2 → P3

Hello guys.

Today Firefox updated to version 89.0 and I noticed my browser upload speed got significantly better but still not what I should be getting. I now get ~15MB/s instead of max 2MB/s on previous versions of Firefox. Did they update or do something regarding this?

However as I mentioned many times in this report, what I get on chrome or other non-Gecko based browsers is around ~50MB/s

Hi, has there been any progress on this issue at all? This is a 2 years old bug at this point and I can still replicate it in FF 93 and on popular websites.

I have a 100mbps up/down connection and upload speed does not exceed 30-35mbps on sites like gdrive and youtube.

Yah, it was kinda better on FF 89 as mentioned on my previous comment but now on new versions it's back to the same issue. I just learnt to give up on them regarding this.

2.4MBps to Amazon S3 right now... 400mbps internet connection..

(In reply to Junior [inactive] from comment #61)

chunk size is not the bottleneck. I decrease network.buffer.cache.size to 1024 but it looks like http is still fast.
I want to do the opposite way for H2, but this limitation catches my eyes.

This is an interesting observation that we previously missed.

As it is apparent from Comment 54 this synthetic limitation is blocking us from sending more than 16K at a time.
I am not aware of anything H2 RFC preventing us from sending more than 16K. I'll land a patch to remove the clamping then we can test if increasing the pref fixes the upload speed.

I'm having the same problem too. Firefox 94 only gave me upload speeds of 5mbps while Chrome was around 200mbps.

I tried increasing network.http.spdy.send-buffer-size and it seemed to help a bit. I get around 100mbps now, though it's still half of Chrome.

Hopefully the network.http.spdy.chunk-size patch removes the other bottleneck.

Feel free to test with the latest Nightly build.
I suggest comparing network.http.spdy.chunk-size = 16000 vs 32000.
Thanks!

So, in my environment (Debian 10, FF97beta8, Google Drive, symmetrical cable connection 100/100Mb, default cache and chunk sizes) HTTP1.1 (both HTTP2 and HTTP3 disabled) give best performance (about 8 megabytes per second). Enabling HTTP2 (spdy) or HTTP3 drops it to 2-3 megabytes per second. So I believe we should dig in those protocols defaults.

The server and the browser are on the same computer:

Safari H/3: 130 MB/s

FF H/2: 150 MB/s

FF H/3: 30 MB/s

so yes there are upload speed problems

(In reply to mcccs from comment #77)

The server and the browser are on the same computer:
Safari H/3: 130 MB/s
FF H/2: 150 MB/s
FF H/3: 30 MB/s

This bug was for H2 upload speed. Given those numbers, it seems H2 is fine for you, but H3 is not.
Consider filing a new bug that blocks Bug 1158011. A profiler trace might help.

Same problem here, i was trying some solutions, but still upload for youtube is very slow, about half upload than other browsers.

I can confirm this issue. Uploading to YouTube is very slow. 10 GB file 1 hour in firefox. 9 Minutes in Chrome.
Switching to chrome until this gets a fix.

Disabling HTTP3 increases speed significantly, but not yet on par with Chrome.

10 GB File size for upload to YouTube:

Firefox: 57 minutes
Firefox with disabled HTTP3: 16 minutes
Chrome: 9 Minutes

So, there is a problem with HTTP3 in Firefox, but HTTP2 seems to be slower by a bit too. Definitely disabling HTTP3 for now, as I am uploading to YouTube on a daily basis.
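To compare these timings with the Mbps figures quoted elsewhere in this bug, they can be converted to average throughput. This is only a rough sketch: it assumes "10 GB" means 10 * 10^9 bytes and that the whole upload ran at a steady rate, and the function name is made up for illustration.

```python
def throughput_mbps(size_bytes: float, minutes: float) -> float:
    """Average throughput in megabits per second for a completed transfer."""
    return size_bytes * 8 / (minutes * 60) / 1e6


SIZE_BYTES = 10e9  # assumption: "10 GB" means 10 * 10^9 bytes

for label, minutes in [
    ("Firefox", 57),
    ("Firefox with HTTP/3 disabled", 16),
    ("Chrome", 9),
]:
    print(f"{label}: about {throughput_mbps(SIZE_BYTES, minutes):.0f} Mbps")
```

Under those assumptions the three runs come out at roughly 23, 83 and 148 Mbps.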

What worries me is that this issue is several years old, and it is not even CONFIRMED yet. How comes this? Why is this being ignored?

Your problem is bug 1753486, I guess.
Changing the title to clarify.

Summary: Upload speed is so slow in Firefox, all other browsers are fine (please read detailed explanation) → H2 upload speed is so slow in Firefox, all other browsers are fine (please read detailed explanation)

Don't know if this issue needs any more comment with a confirmation, but here's my experience:

  • Uploaded a 13 GB video to YouTube on Firefox
  • Half an hour or so later I realized it's taking too long, upload was at 70-ish percent
  • Uploaded the same exact file via Vivaldi
  • By the time Vivaldi upload was done, the Firefox upload reached 95%
  • All of that on 600 Mbps down / 60 Mbps up ethernet optic fiber connection

Some napkin math leads me to believe, Firefox uploads are about 7 times slower.

I was about to create a new report, but decided to add to this.

Uploads on ANY host using Firefox are disastrously slow. I have been experiencing this for years and was too lazy to say anything.

I tried using bunkr or gofile for example. The upload is capped at 4 mbps. When I switch to chrome it uses all my bandwidth to upload which is 30 mbps.

Not exaggerating, this is literally the only reason I can't use Firefox. I work in media and have to frequently upload large files. Why can't this be fixed I don't understand.

Why it is unconfirmed? Experiencing that low upload speed on regular basis.

Firefox upload speed is unstable, keeps dropping by more than 200 Mb/s, confirmed.
Firefox: https://www69.zippyshare.com/v/gcreoZPc/file.html
Chrome: https://www69.zippyshare.com/v/FuEEfZYp/file.html

(In reply to ahmad from comment #86)

(In reply to i.terhin from comment #85)

Why it is unconfirmed? Experiencing that low upload speed on regular basis.

Because they don't want to admit that their browser is crippled when it comes to upload speed. As I said years ago, I give up. I can't believe I made this report 3 years ago and it's still not fixed.

As an option, you may consider setting a bounty if this issue is important for you.

I am seeing this as well. Disabling TLS versions > 1 has improved my loading of Grafana web pages slightly, but disabling http2 and http3 have made Grafana load faster on my internal work network instance. Edge and Chrome load faster by default, so this seems to be related to overall speed problems with HTTP2 AND 3. It could be the SSL CPU encoding performance is not optimal. Should I file a new bug or do you need any logs? I have a slightly similar symptom but I believe this is the same issue.

Here is a profiler output from my internal Grafana server that I use just switching outputs. I also am including a network tab profiler from developer tools. Hopefully this data will help https://profiler.firefox.com/from-browser/calltree/?globalTrackOrder=q0wp&hiddenGlobalTracks=1wn&hiddenLocalTracksByPid=23944-0w2~21652-0~19124-013w5&thread=x3&v=7

(In reply to paul.stejskal from comment #91)

Here is a profiler output from my internal Grafana server that I use just switching outputs. I also am including a network tab profiler from developer tools. Hopefully this data will help https://profiler.firefox.com/from-browser/calltree/?globalTrackOrder=q0wp&hiddenGlobalTracks=1wn&hiddenLocalTracksByPid=23944-0w2~21652-0~19124-013w5&thread=x3&v=7

You have to click upload on the profile (Top right button) for it to be visible publicly.

Flags: needinfo?(paul.stejskal)

My apologies. Here are a couple of the profiles: https://share.firefox.dev/3DcpVid and https://share.firefox.dev/3BsYVtq

Flags: needinfo?(paul.stejskal)

(In reply to paul.stejskal from comment #90)

I am seeing this as well. Disabling TLS versions > 1 has improved my loading of Grafana web pages slightly, but disabling http2 and http3 have made Grafana load faster on my internal work network instance. Edge and Chrome load faster by default, so this seems to be related to overall speed problems with HTTP2 AND 3. It could be the SSL CPU encoding performance is not optimal. Should I file a new bug or do you need any logs? I have a slightly similar symptom but I believe this is the same issue.

It's not about SSL as we discussed earlier since if it were, it would also be reflected with the download speed. As I said I suspect that it's related to HTTP/2 and HTTP/3 upload performance of Firefox. By disabling TLS versions > 1 you're forcing HTTP/1.1, that's why in that case it's fast, otherwise newer TLS versions are not bad.

(In reply to mcccs from comment #94)

It's not about SSL as we discussed earlier since if it were, it would also be reflected with the download speed. As I said I suspect that it's related to HTTP/2 and HTTP/3 upload performance of Firefox. By disabling TLS versions > 1 you're forcing HTTP/1.1, that's why in that case it's fast, otherwise newer TLS versions are not bad.

Thank you. Is more data to troubleshoot this issue needed?

I can confirm as well that forcing min & max TLS versions to 1 in about:config restores the upload speed. Otherwise, all uploads are disastrously slow. Is this ever going to be fixed? And what are the implications of using old TLS version permanently? Can anyone answer?

(In reply to Usman Khwaja from comment #96)

I can confirm as well that forcing min & max TLS versions to 1 in about:config restores the upload speed. Otherwise, all uploads are disastrously slow. Is this ever going to be fixed? And what are the implications of using old TLS version permanently? Can anyone answer?

TLS 1 is essentially broken from a security perspective. So it would be quite a bad idea to set it to use this version.

Thank you for all these leads. While TLS 1 is indeed not usable on the internet, it does give us a way to measure performance.

Would it help to disable TLS2 and get any data like a network trace? Obviously it would only be for testing purposes? I don't know if Google Drive is set to support TLS1.2 minimum (for obvious security reasons) but a quick test would be beneficial if more data would help.

(In reply to Valentin Gosu [:valentin] (he/him) from comment #98)

Thank you for all these leads. While TLS 1 is indeed not usable on the internet, it does give us a way to measure performance.

As we have discussed extensively earlier in this thread, TLS 1.0 makes it faster because it disables HTTP/2,3. The same performance level can be achieved by disabling HTTP/2 and HTTP/3.

Therefore disable HTTP/2 and HTTP/3 temporarily as a fix.

Valentin, the tool I linked to in bug 1753486 can be used to find the bottleneck. For H/2 upload on the same computer (zero ping), Firefox is 25% slower than Safari and Chrome is 97% slower than Safari. I have no clue why that is; perhaps I should report this to Chrome? In short, Firefox is very performant in H/2 uploads (and downloads) in a zero-ping setup. Therefore the H/2 upload bottleneck appears only when there's some latency.

For H/2 downloads the numbers are so astronomical that I don't know how to compare them: Safari 6.2Gbps, Chrome 4.4Gbps, Firefox 2.1Gbps. Nevertheless it's always faster than what's reported on this thread, so the cipher suite, as I have always claimed, is irrelevant, which can be verified by trying downloads from the same website.

What's needed to debug this is now running my tool on a different server on the same local network and see if it can achieve the max performance of 1Gbps like Chrome and Safari, I believe, can. Would you like a profiler output for this - which I can post in a month? Or the network log? Since I think it makes no difference, would enabling the separate socket process help debug this?

I performed a quick test after a couple of months and it seems that this issue has been fixed already, at least for speeds up to 1 Gb/s, as you can see in the video https://www.youtube.com/watch?v=hkMYO9yXtnY 🤔
I used new clean profile with no tweaks, just with AdGuard extension to block mess on the internet.

I can confirm that the bottleneck limiting H/2 upload to 40% in my case is still present, in comparison to both the upload of other browsers and the download of all three browsers including FF. I observed this with computers connected to the same network, so there's some latency. Increasing CPU frequency made no difference to any browser's upload or download speed, so the problem is not the CPU. It's just not using the full network capacity.

Then I doubled available speed. Other browsers uploaded 100% faster but FF only 50% so please investigate this

I am on Firefox 108, on Linux and I still had this issue. I have other colleagues that have the same issue (on a Mac), and some who don't (chromebook), not sure of the version.

For context, I'm part of the Proton Drive team, and we've recently disabled http2 generally on the file upload routes, due to this bug. We've seen a ~50% increase in performance (graph) for Firefox users globally.

I was able to change upload speed to Proton Drive from 7-8MB/s to ~18-19MB/s by increasing the values of network.http.http2.send-buffer-size and network.http.http2.push-allowance to 33554432 in about:config. Our servers' window size is set to 4MB (this made a pretty big difference on its own for users with high latency). This is a pretty big issue for us and disabling http2 is a rather bad workaround.

We're happy to help on this issue where we can at Proton.

I have now experimented with "network.http.http2.push-allowance" on my test setup. Doubling the default value gave me 50% faster upload. Then I used "33554432" and it reached full network speed very fast. I conclude that Firefox needs a different fix than changing this value. Whatever this value is, and whatever maximum speed it allows, that maximum is reached steeply. What Chrome and Safari do is a concave asymptotic increase, whereas what FF does is steep. I'm afraid that a bigger change is needed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Thank you everyone for your comments. We plan to work on this issue in the next few months.

Priority: P3 → P2

Thanks a lot Valentin Gosu.

Yes, I hope you can work on this soon. There are other bugs with a similar complaint.

Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-review]
Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-priority-queue]
Assignee: nobody → acreskey

I've just started looking into this.

On speedof.me, I can certainly see a significant discrepancy in upload speeds between Firefox and Chrome.

Upload speed, speedof.me

Device                     Firefox Release 110    Chrome Release 110
Asus Core-i3 Windows 10    47 Mbps                58 Mbps
MacBookPro M1 Max          62 Mbps                131 Mbps

On the "real-world" scenarios that I've tried, the performance is similar, Firefox a little bit faster.

Uploading a 1GB binary to Proton Drive:

Device                     Firefox Release 110    Chrome Release 110
Asus Core-i3 Windows 10    5.8 Mbps               4.6 Mbps
MacBookPro M1 Max          11.9 Mbps              11.1 Mbps

Uploading a 1GB binary to Google Drive:

Device                     Firefox Release 110    Chrome Release 110
Asus Core-i3 Windows 10    4.0 Mbps               3.7 Mbps
MacBookPro M1 Max          13.3 Mbps              13.3 Mbps

This is on a 100 Mbps Wifi (measured as 101.5 Mbps download, 137.9 Mbps upload via Google's Internet Speed test).
Both Firefox and Chrome report the same values for this test.

I'll have a look at the protocols used and start experimenting with our http2 parameters next.

Hi, on Proton Drive, http2 is currently disabled for content download, since it was causing performance issues on Firefox.

Thanks for the info. Would it be possible to enable it for a subset of users/pages? Or maybe provide a staging site.

Flags: needinfo?(jonathan.villemaire-krajden)

Hello @Andrew Creskey, thanks a lot for taking a look into this.

As I mentioned in this bug before, the upload speed bug in Firefox is mostly only apparent in very high upload speeds (e.g. 300mbps) and even then doesn't happen with everyone. So I am willing to let you or anyone connect to my TeamViewer/Anydesk and test all they want and install any tools if it helps to know the exact cause of the issue. You can E-mail me or contact me on discord (enio#0292) to arrange that if you want.

(In reply to Jonathan Villemaire-Krajden from comment #109)

Hi, on Proton Drive, http2 is currently disabled for content download, since it was causing performance issues on Firefox.

Thank you Jonathan, I see the HTTP/1.1 in our logs when connecting to Proton Drive.

(In reply to ahmad from comment #111)

Hello @Andrew Creskey, thanks a lot for taking a look into this.

As I mentioned in this bug before, the upload speed bug in Firefox is mostly only apparent in very high upload speeds (e.g. 300mbps) and even then doesn't happen with everyone. So I am willing to let you or anyone connect to my TeamViewer/Anydesk and test all they want and install any tools if it helps to know the exact cause of the issue. You can E-mail me or contact me on discord (enio#0292) to arrange that if you want.

I appreciate the generous offer, ahmad. I may take you up on that.
First I'm going to see if I can find a fast wired connection here, since I have a range of hardware that I'd like to test on.

(In reply to Andrew Creskey [:acreskey] from comment #113)

I appreciate the generous offer, ahmad. I may take you up on that.
First I'm going to see if I can find a fast wired connection here, since I have a range of hardware that I'd like to test on.

NP take your time :)
Just hit me up on discord or by E-mail if you want to test, I will dedicate a PC/laptop for this purpose and you can install anything and test as long as you want on it.

Cheers!

(In reply to Andrew Creskey [:acreskey] from comment #113)

(In reply to ahmad from comment #111)

First I'm going to see if I can find a fast wired connection here, since I have a range of hardware that I'd like to test on.

On a Linux machine, you can just test against localhost and

sudo tc qdisc add dev lo root netem delay 10ms rate 500Mbit
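If the throttle is going to be applied and removed repeatedly between test runs, the command above can be generated from a small helper. This is only a sketch: the function names are made up for illustration, and it relies on nothing beyond the `delay` and `rate` netem options already used in the command above plus the usual `tc qdisc del` to undo it.

```python
import shlex


def netem_add(dev: str, delay_ms: int, rate_mbit: int) -> list[str]:
    """Build the tc command that adds delay and a rate limit to a device."""
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        "delay", f"{delay_ms}ms", "rate", f"{rate_mbit}Mbit",
    ]


def netem_del(dev: str) -> list[str]:
    """Build the tc command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


# Print the commands instead of running them, since they need root.
print(shlex.join(["sudo"] + netem_add("lo", 10, 500)))
print(shlex.join(["sudo"] + netem_del("lo")))
```

Note that on the loopback device both directions of a connection pass through the same qdisc, so a 10ms delay should add roughly 20ms to the round trip.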

See Also: → 1822149

Quick update: In bug 1822149 we are in the process of adding performance tests to measure upload speed.
It shouldn't take much longer to land these in CI.
We can then use them to tune the parameters and algorithm.

We've landed the first upload bandwidth tests.
You can have a look at some early Windows results here
Chrome generally clocking in at a bit under 600 Mbps, Firefox at 350 Mbps.

I'm kicking off some tests where network.http.http2.chunk-size and network.http.http2.send-buffer-size are doubled.

Looking at the upload bandwidth for Firefox ( upload upload-bandwidth opt fission webrender), I'm not seeing any improvements for either of these experiments:

Doubled network.http.http2.chunk-size to 32000

Doubled network.http.http2.send-buffer-size

The results are a bit noisy, but if you hover over the results you'll see the distribution.

I'll try increasing both the network.http.http2.chunk-size and network.http.http2.send-buffer-size at the same time.

Results of two other tests.
Neither moved the needle on Windows or Linux, possible improvements on Mac. (Noting that we have some noise and outliers in the results)

Doubled both network.http.http2.chunk-size and network.http.http2.send-buffer-size

Doubled network.http.http2.push-allowance

I'm going to collect logs and profiles and see what I can find.

Ah, I inadvertently created an http/3 upload test because I didn't realize that we renamed the http/3 pref :)
I'll fix this test to stick to http/2 and make a separate http/3 test.

I've made sure that our upload test is now using http/2 for Firefox and Chrome (bug 1826624).

Verifying with logs on in our performance infrastructure I can confirm this (Windows: https://pastebin.mozilla.org/vQ4HJLJ4)
And locally just using the dev tools and protocol field.

And this is the interesting part.
I re-ran these experiments, with no improvement in bandwidth (but some regression). MacOS results still coming in.

Doubled network.http.http2.chunk-size
Doubled network.http.http2.send-buffer-size
Doubled network.http.http2.push-allowance

So instead I cut all of the http2 chunk and buffering prefs to a quarter or less (network.http.http2.chunk-size, network.http.http2.send-buffer-size, network.http.http2.push-allowance).

Heavily reduce all of the http2 buffer/chunk sizes
And that yielded a ~44.43% improvement on Windows with statistical significance (over 19 runs, which is 190 uploads).
And a nearly opposite regression on Linux of -38.32%
Not what I expected.

No issues in reproducing a huge upload improvement on Windows with smaller chunk/buffer/push sizes.
Fresh baseline and test here

I'll break this down to see if one pref in particular is the limiting factor.

So unfortunately the network conditions within our performance lab are not stable enough to make meaningful conclusions on a live-site upload test such as what I've constructed.
They provide a reasonable rolling data point on upload speed, but I don't think they can be used for tuning since the same mercurial revision compared with itself can vary by 40% in bandwidth even with 100 iterations of the test.

So I've setup this environment to measure maximum bandwidth:

I'm running an https server locally, accepting http/2 connections.

If I connect my Windows PC via Cat6 cable I measure upload speeds of a little over 900mbps from Firefox. Profile
Same performance on Chrome.

(Leaving Dev Tools open in Firefox will greatly reduce the bandwidth, to about 720 mbps).

If I upload to my server from the same machine, I see speeds of a bit over 2000mbps with both Firefox and Chrome.

They provide a reasonable rolling data point on upload speed, but I don't think they can be used for tuning since the same mercurial revision compared with itself can vary by 40% in bandwidth even with 100 iterations of the test.

By the way, the drop in bandwidth around April 11ths was the changing of the protocol from HTTP/3 to HTTP/2.

(In reply to Andrew Creskey [:acreskey] from comment #124)

So I've setup this environment to measure maximum bandwidth:

I'm running an https server locally, accepting http/2 connections.

If I connect my Windows PC via Cat6 cable I measure upload speeds of a little over 900mbps from Firefox. Profile
Same performance on Chrome.

(Leaving Dev Tools open in Firefox will greatly reduce the bandwidth, to about 720 mbps).

If I upload to my server from the same machine, I see speeds of a bit over 2000mbps with both Firefox and Chrome.

Remember typical internet connections will have 100x as much latency as ethernet LAN, which may change the behavior. You can use the netem qdisc to simulate going across the internet, as mentioned in comment #115. The man page is "man netem".

I'm running an https server locally, accepting http/2 connections.

If I connect my Windows PC via Cat6 cable I measure upload speeds of a little over 900mbps from Firefox. Profile
Same performance on Chrome.

Not able to see an improvement in our bandwidth when increasing the http/2 prefs in this setup (nearly at the limit of my gigabit adapter).

However, if I use a packet-level throttler on the host (300mbps up/down), then I can see a reliable difference between Chrome and Firefox.
On Firefox I'm seeing about 165mbps, and about 260mbps on Chrome.

(In reply to Russell Haley from comment #126)

(In reply to Andrew Creskey [:acreskey] from comment #124)

So I've setup this environment to measure maximum bandwidth:

I'm running an https server locally, accepting http/2 connections.

If I connect my Windows PC via Cat6 cable I measure upload speeds of a little over 900mbps from Firefox. Profile
Same performance on Chrome.

(Leaving Dev Tools open in Firefox will greatly reduce the bandwidth, to about 720 mbps).

If I upload to my server from the same machine, I see speeds of a bit over 2000mbps with both Firefox and Chrome.

Remember typical internet connections will have 100x as much latency as ethernet LAN, which may change the behavior. You can use the netem qdisc to simulate going across the internet, as mentioned in comment #115. The man page is "man netem".

Yes, agreed. I'll simulate something reasonable but fast and see what I can improve.

Yes, with 300 mbps up/down limits and 10ms latency in each direction, Firefox will upload at about 43mbps, and Chrome at 154mbps.
So this is a good test environment.

Yes, certainly the send-buffer-size is the limiting factor in this environment as doubling it doubles the upload speed and quadrupling it nearly quadruples the speed. (matching Chrome's)

Pref                                        Change        Mbps

network.http.http2.chunk-size               doubled       43.57
network.http.http2.chunk-size               doubled       43.64
network.http.http2.chunk-size               halved        43.47
network.http.http2.chunk-size               halved        43.63

network.http.http2.send-buffer-size         doubled       85.86
network.http.http2.send-buffer-size         doubled       86.20

network.http.http2.send-buffer-size         quadrupled    159.0
network.http.http2.send-buffer-size         quadrupled    159.9

network.http.http2.push-allowance           doubled       43.50
network.http.http2.push-allowance           doubled       43.40

So the 128KB send-buffer-size is used here -- we detect a large file (>128KB), and cap the send buffers based on a Bandwidth Delay Product for a 10Mbps connection with 100ms RTT.

// If this stream represents a large upload, disable autotuning for
// the session and cap the send buffers by default at 128KB.
// (10Mbit/sec @ 100ms)

From first glance the numerical assumptions seem dated (also, the constants are almost 10 years old :).
Perhaps autotuning or an updated BDP would be better?
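The numbers above line up with a simple bandwidth-delay calculation. The sketch below is only an upper-bound model (at most one send buffer in flight per round trip), and it assumes the test RTT is 20ms (10ms in each direction) and that "128KB" means 128 * 1024 bytes; the function names are made up for illustration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_mbps(send_buffer_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when one send buffer drains per round trip."""
    return send_buffer_bytes * 8 / rtt_s / 1e6


# The constants in the code comment: 10 Mbit/s at 100 ms RTT.
print("BDP for 10 Mbit/s @ 100 ms:", bdp_bytes(10e6, 0.100), "bytes")

RTT_S = 0.020  # assumption: 10 ms of added latency in each direction
for label, size in [
    ("128 KiB (default)", 128 * 1024),
    ("256 KiB (doubled)", 256 * 1024),
    ("512 KiB (quadrupled)", 512 * 1024),
]:
    print(f"{label}: at most {max_throughput_mbps(size, RTT_S):.1f} Mbps")
```

The first line gives about 125,000 bytes, which is close to the 128KB cap. The ceilings for the three buffer sizes are roughly 52, 105 and 210 Mbps, so the measured 43, 86 and 159 Mbps sit below the model's limit and scale with the buffer size in the same way.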

Keeping in mind that I knew nothing about http/2 this morning, and know next-to-nothing now, and am unfamiliar with the Firefox codebase and am completely baffled by how "network.http.http2.send-buffer-size" becomes kTCPSendBufferSize...

A few things turn up:

  1. On Linux, setting SO_SNDBUF apparently stops the kernel from autoscaling, and apparently the autoscaler can scale up to larger BDP than the default value of net.core.wmem_max allows to be set with setsockopt() -- 208 KiB on my machine. This has caused throughput problems in other software. Unfortunately, (what appears to be) the in-depth explanation is in Chinese, I can't read Chinese, and Google Translate doesn't like that site.

  2. This was introduced by :mcmanus, to fix a problem with Gmail uploads through a squid proxy. On his machine the autoscaler was going up to 32 seconds of sending. But it's not supposed to scale larger than the congestion window size, and a 32 s congestion window is ludicrous, so maybe the connection he was testing on had a huge amount of bufferbloat? If so I might chalk it down to, "doc, it hurts when my ISP goes like this." Luckily, Dave Taht's crusade has probably made such severe bufferbloat unlikely these days.

  3. Cloudflare encountered and fixed a problem with HTTP/2 speed proxying through nginx. Maybe Squid had/has a similar problem, and that has something to do with bug 790184?

  4. In 790184#c14 :mcmanus says "this problem will be mitigated by v3 in ff16". Was it? Is this an obsolete workaround?

Maybe e-mail :mcmanus and ask.
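For anyone who wants to look at point 1 on their own machine, here is a minimal sketch that reads a TCP socket's send buffer size before and after setting SO_SNDBUF explicitly. It only shows the values the kernel reports; it does not demonstrate the autoscaling itself, and the function name is made up for illustration.

```python
import socket


def probe_sndbuf(requested: int) -> tuple[int, int]:
    """Return (default, after-setsockopt) SO_SNDBUF sizes for a fresh TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        # Setting the option explicitly is the step that, on Linux,
        # takes the socket out of send-buffer autotuning.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        pinned_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return default_size, pinned_size


default_size, pinned_size = probe_sndbuf(128 * 1024)
print("default SO_SNDBUF:", default_size)
print("after requesting 128 KiB:", pinned_size)
```

On Linux the value read back is normally double the requested size, because the kernel reserves space for bookkeeping, and the request itself is capped by net.core.wmem_max. Other platforms report the value differently, so the two numbers are not directly comparable across operating systems.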

Thank you for the forensics, Russell.

I'm new to the Necko http/2 layer so I'll be bringing in our experts as needed as I go through those issues.

Certainly something is terribly wrong in the general case with the send buffer patch.
If one stream in an HTTP/2 connection sends more than 128KB then we fix the send buffer size for the entire connection to 128KB?

More investigation is needed, and that's a good idea to ask mcmanus.
If possible, would be great to remove this.

If I remove the setting of the TCP sendbuffer size (i.e. that code), the improvement on a network with higher bandwidth delay product is huge

600 Mbps, 40ms latency

Product                                                Upload bandwidth, Mbps
Chrome Release                                         90.9
Firefox Release                                        23.3
Firefox Nightly with 128KB sendbuffer size removed     90.6

(These measurements were made on Windows.)

So on Windows what I'm seeing is that the kernel is defaulting the TCP send buffer size to 512KB.
As soon as the fix for the squid proxy issue kicks in, the send buffer is cut down to 128KB and the performance suffers.
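A back-of-the-envelope check of why the cut from 512KB to 128KB costs about a factor of four on that 600 Mbps test network. This sketch assumes that "40ms latency" is the round-trip time and that the buffer sizes are powers of two; the function names are made up for illustration.

```python
def buffer_needed_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Send buffer needed to keep a link of this bandwidth and RTT full."""
    return bandwidth_mbps * 1e6 / 8 * (rtt_ms / 1e3)


def fraction_of_link(send_buffer_bytes: float, bandwidth_mbps: float,
                     rtt_ms: float) -> float:
    """Share of the link a given send buffer can keep busy, capped at 1.0."""
    needed = buffer_needed_bytes(bandwidth_mbps, rtt_ms)
    return min(1.0, send_buffer_bytes / needed)


LINK_MBPS, RTT_MS = 600, 40  # assumption: 40 ms is the round-trip time
print("buffer needed:", buffer_needed_bytes(LINK_MBPS, RTT_MS), "bytes")
for label, size in [
    ("128 KiB", 128 * 1024),
    ("512 KiB", 512 * 1024),
    ("4 MiB", 4 * 1024 * 1024),
]:
    share = fraction_of_link(size, LINK_MBPS, RTT_MS)
    print(f"{label}: up to {share * LINK_MBPS:.1f} Mbps")
```

Under those assumptions the link needs about 3 MB in flight, a 128 KiB buffer tops out near 26 Mbps and a 512 KiB buffer near 105 Mbps, which is in the same range as the 23.3 and 90.6 Mbps measured above.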

On my MacBook I'm seeing the kernel autotuning of send buffer size bring it up to about 144KB before being disabled by that fix.

I'm going to try to see if I can reproduce the squid-proxy issue. Patrick mentioned that he saw it on his 1MBps network.

But it's looking like if we could remove this legacy fix that would be ideal.

On my Mac I can observe the kernel autotune the TCP sendbuffer size up to 4MB.

And if I restrict my bandwidth (e.g. 1Mbps), it will autotune it back down to 128KB.

The bug that led to our hardcoding of the send buffer size was seen with the Squid proxy. Squid is no longer readily available for macOS, but I've tested with another local recording proxy, mitmproxy, and even at heavily reduced bandwidth I'm not seeing buffer bloat lead to an http/2 ping failure. The kernel does not autotune the TCP send buffer beyond about 140KB for me on a throttled network.

Summarizing where we are and suggesting next steps:

From my tests, the primary cause of our poor http/2 upload performance is the forcing of the TCP send buffer size to 128KB once 128 KB has been uploaded.
As Russell discovered, that was introduced 11 years ago as a fix to an issue seen in the partially implemented http/2 layer that could manifest on slower networks.


In general, disabling the kernel autotuning of the TCP send buffer is not recommended.
From my testing and research I believe that the behaviour introduced in that fix is not needed.

The kernel autotuning has not led to the buffer bloat from 4MB send buffers as described, even on heavily throttled networks.

I’ve gone down to 1Mbps and lower without issues.


From what I have seen, Windows does not autotune the send buffer, but uses a fixed size of 512KB.


On Windows, removing this code can lead to performance improvements of up to 4x on faster networks.
On Linux and macOS I believe that we will be able to measure larger improvements because the kernel can autotune up to 4MB send buffers. (I will measure and graph once we are sure of the approach.)

If our networking experts are in agreement, I propose:


  1. We first add telemetry to record both HTTP/2 upload bandwidth and error rates
  2. Once that’s in place, we can remove or disable the logic that hardcodes the send-buffer size (Perhaps only remove for Nightly and Beta at first)
  3. We could consider asking Patrick if he has any thoughts on this, as Russel suggested


Request:

Valentin or Kershaw, can I ask you to read this thread and share your thoughts?

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(kershaw)

If anyone would like to build or try this change, binaries and the patch are here

These binaries also log the tcp send buffer size (to standard out and MOZ_LOG (nsHttp:5)).

Thanks for the in-depth investigation Andrew.

Looking at the comments in bug 790184, and the current implementation of RecvWindowUpdate I would say that the send buffer size limit was introduced to fix an issue that is no longer present. More specifically, HTTP/2 has both per stream and per session flow control, so presumably we should no longer have the issue in bug 790184. spec link

I propose we first disable the send buffer limit by setting the pref value to 0, and change the code that sets it in the following way:

https://searchfox.org/mozilla-central/rev/31f5847a4494b3646edabbdd7ea39cb88509afe2/netwerk/protocol/http/Http2StreamBase.cpp#554-573

void Http2StreamBase::UpdateTransportSendEvents(uint32_t count) {
  uint32_t bufferSize = gHttpHandler->SpdySendBufferSize();
  if (StaticPrefs::network_http_http2_send_buffer_size() >= 0 && (mTotalSent > bufferSize) && !mSetTCPSocketBuffer) {
    mSetTCPSocketBuffer = 1;
    mSocketTransport->SetSendBufferSize(bufferSize);
  }

We can then verify that the squid proxy issue is no longer present on linux.
We should then monitor telemetry to make sure that we don't encounter more flow_control errors (0x03):
https://glam.telemetry.mozilla.org/firefox/probe/spdy_goaway_local/
https://glam.telemetry.mozilla.org/firefox/probe/spdy_goaway_peer/

Given the info you dug up, I don't think we need to be too cautious and limit this to nightly and beta - if there are any negative consequences we change the pref back to its initial value and reevaluate.

Flags: needinfo?(valentin.gosu)

I totally agree with Valentin's proposal.
We should modify Http2StreamBase::UpdateTransportSendEvents as Valentin suggested and try to set the pref send_buffer_size to 0.
Note that the if statement should be like:

if (StaticPrefs::network_http_http2_send_buffer_size() > 0 && (mTotalSent > bufferSize) && !mSetTCPSocketBuffer) {

In this way, when the pref is 0, auto-tuning is in effect.
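The difference between `>= 0` and `> 0` can be sketched as a stand-alone predicate. This is a simplified model rather than the actual Firefox code: `ShouldForceSendBuffer` is a hypothetical name, and it assumes the buffer size compared against `mTotalSent` is the same value as the pref. With `>= 0`, a pref of 0 would still pass the check and the socket would be given a zero-sized buffer request as soon as any data had been sent. With `> 0`, a pref of 0 leaves the socket untouched.

```cpp
#include <cstdint>

// Hypothetical model of the corrected check. prefBytes stands in for the
// value of network.http.http2.send-buffer-size; 0 (or a negative value)
// means "never override the kernel's send buffer".
bool ShouldForceSendBuffer(int32_t prefBytes, uint64_t totalSent,
                           bool alreadySet) {
  if (prefBytes <= 0) {
    return false;  // Pref disabled: leave kernel autotuning in effect.
  }
  if (alreadySet) {
    return false;  // The buffer size is only forced once per stream.
  }
  // Only force the size after more than prefBytes have been uploaded.
  return totalSent > static_cast<uint64_t>(prefBytes);
}
```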

About your proposal:

  1. We first add telemetry to record both HTTP/2 upload bandwidth and error rates

The upload bandwidth might be difficult to measure and record, so we might not need it.
However, we could try to add another error type PING_TIMEOUT here and set mGoAwayReason to it when we see a PING timeout here.

  2. Once that’s in place, we can remove or disable the logic that hardcodes the send-buffer size (Perhaps only remove for Nightly and Beta at first)

As Valentin suggested, we can set the pref to 0 and let the patch ride the train.

  3. We could consider asking Patrick if he has any thoughts on this, as Russel suggested


I don't have a strong opinion on this. It's harmless to ask. :)

Flags: needinfo?(kershaw)

Appreciate the quick replies.
I'm adding telemetry for upload throughput over all http protocols in bug 1830087.
And I will land the proposed fix to this issue after the probe has had a few days to collect data.

This behaviour was introduced midway through HTTP/2 development in Bug 790184. With connection-level and stream-level flow control, it is not needed.

Note that on fast networks removing this forcing of a small tcp send buffer has been shown to improve H2 upload performance by up to 4x on Windows.

Telemetry patch and discussed fix is up for review.
I'll chat with the team on the best way to track error rates. Adding a PING_TIMEOUT error type seems like it would handle the situation best, if we think it's needed.

I believe I'm having the same issue, but in this case it's not the upload but the download speed that's particularly slow. I'm on FF 112.0.2 (64-bit), on Windows 10, and I've tried a couple different speed test websites and none of them showed more than 150Mbps, when my actual internet speed is 300Mbps. The weird thing is that I get the expected download speed if I run the test using Chrome or Edge.

I tried setting browser.tabs.remote.autostart = false (as I read in some forum) but it didn't help. Also tried with a private window but results were the same. And I've observed the same behavior both over wifi and ethernet.

Finally, I downloaded this build and performed a speed test using it, but the result was still less than half my actual internet speed.

Let me know if there's any information I can provide that might be of help.

Thank you Anarelle. Can I ask you to add your observations to Bug 1738939?
We will have the upload speed resolved shortly in Nightly.

The telemetry Bug 1830087 is pushed and will allow us to observe the impact of this fix in the field.

Using a local server and a Windows machine attached via Cat 6, I was able to measure HTTP/2 upload throughput under various network conditions [attached.]

Wowza! That is looking great! Thank you so much for persevering with this. I look forward to when I can start using Fx again for my large uploads.

Thanks a lot Valentin, Andrew and everyone that was involved in fixing this.

Depends on: 1830918

In Bug 1830918 we landed the first part of the fix.
With this patch, if you set network.http.http2.send-buffer-size to 0, the HTTP/2 upload throughput limiting code will be bypassed.
This will be in Firefox 114.

The full fix should land for Firefox 115.

Thank you ahmad for filing this one and to everyone who contributed with such helpful observations, data, and ideas.

Duplicate of this bug: 1621277
Pushed by acreskey@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dab89a9f13e0 No longer force tcp send buffer size on HTTP/2 uploads > 128KB r=necko-reviewers,valentin
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

Fix landed May 24th.
Although the results are noisy, we are clearly seeing improvements in H2 upload bandwidth via the tests in CI.
Fast results roughly doubled on all three desktop platforms.
Note: Chrome is included for some platforms, so watch for those.

We are likely seeing improvements via telemetry in Nightly (query), although I'll wait and in particular watch this roll into Beta.

But we are setting new upload bandwidth records at the 80th, 90th, and 95th percentiles.

HTTP/2 error rates look good to me:
Received peer go away, 31 is none. Stable.
Send client go away, 31 is none sent. Stable.

Duplicate of this bug: 1725758
QA Whiteboard: [qa-115b-p2]

The HTTP/2 upload fix has made it through Beta so we can take a look at the results there.
(Comparing Fx 114 (Before), to Fx 115 (after))

Note that the performance timings are stored in bucketed histograms and so the results are approximate, particularly at very low bandwidths.
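To show why bucketing makes the low end of the tables coarse, here is a minimal sketch of reading a percentile out of a bucketed histogram. The `Bucket` layout and the name `ApproxPercentile` are assumptions for illustration and do not reflect the actual telemetry schema or query. Every sample in a bucket is reported as that bucket's lower bound, so any throughput below the first non-zero bound shows up as 0.

```cpp
#include <cstdint>
#include <vector>

struct Bucket {
  uint32_t lowerBoundMbps;  // Smallest value that falls into this bucket.
  uint64_t count;           // Number of samples recorded in this bucket.
};

// Returns the lower bound of the bucket that contains percentile p
// (0 < p <= 1). Buckets must be sorted by lowerBoundMbps. Returns 0 for an
// empty histogram.
uint32_t ApproxPercentile(const std::vector<Bucket>& buckets, double p) {
  uint64_t total = 0;
  for (const Bucket& b : buckets) {
    total += b.count;
  }
  if (total == 0) {
    return 0;
  }
  const double target = p * static_cast<double>(total);
  uint64_t seen = 0;
  for (const Bucket& b : buckets) {
    seen += b.count;
    if (static_cast<double>(seen) >= target) {
      return b.lowerBoundMbps;
    }
  }
  return buckets.back().lowerBoundMbps;
}
```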

Windows

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 0 0 0 0.00%
0.1 1 1 0 0.00%
0.2 1 2 1 100.00%
0.3 3 4 1 33.33%
0.4 4 6 2 50.00%
0.5 7 8 1 14.29%
0.6 9 12 3 33.33%
0.7 13 17 4 30.77%
0.8 19 25 6 31.58%
0.9 34 43 9 26.47%
0.95 54 63 9 16.67%

macOS

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 0 1 1 100.00%
0.1 2 2 0 0.00%
0.2 5 5 0 0.00%
0.3 7 8 1 14.29%
0.4 9 11 2 22.22%
0.5 13 15 2 15.38%
0.6 17 21 4 23.53%
0.7 23 31 8 34.78%
0.8 34 46 12 35.29%
0.9 54 86 32 59.26%
0.95 80 149 69 86.25%

Linux

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 2 1 -1 -50.00%
0.1 3 3 0 0.00%
0.2 6 6 0 0.00%
0.3 10 9 -1 -10.00%
0.4 13 13 0 0.00%
0.5 18 19 1 5.56%
0.6 25 27 2 8.00%
0.7 34 43 9 26.47%
0.8 58 68 10 17.24%
0.9 138 174 36 26.09%
0.95 188 408 220 117.02%

Collected from this query.

Overall, these results look great.
We will monitor them in the release population.

One reason that we may be seeing larger improvements on macOS and Linux is that those kernels autotune the send buffer up to larger sizes (4MB on macOS), while on Windows I saw a fixed send buffer size of 512KB.

We should have requested this for 115 release notes.
Is it possible to make a belated 116 release note?

Release Note Request (optional, but appreciated)

[Why is this notable]:
The upload performance impact could be significant - reducing the time of large uploads by minutes (depending on the network configuration).

[Suggested wording]:
The upload performance of HTTP/2 has been significantly improved, particularly on networks with a higher bandwidth-delay product (i.e., networks characterized by both high bandwidth and high latency).

relnote-firefox: --- → ?

The HTTP/2 upload fix has spent over a month in Release so we can take a look at the results there.
(Comparing Fx 114 (Before), to Fx 115 (after))

Note that the performance timings are stored in bucketed histograms and so the results are approximate, particularly at very low bandwidths.

Windows

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 0 0 0 0.00%
0.1 1 1 0 0.00%
0.2 3 4 1 33.33%
0.3 6 6 0 0.00%
0.4 8 9 1 12.50%
0.5 11 13 2 18.18%
0.6 16 19 3 18.75%
0.7 21 27 6 28.57%
0.8 31 40 9 29.03%
0.9 50 68 18 36.00%
0.95 80 109 29 36.25%

macOS

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 0 1 0 0.00%
0.1 2 2 0 0.00%
0.2 4 5 1 25.00%
0.3 7 8 1 14.29%
0.4 9 10 1 11.11%
0.5 13 15 2 15.38%
0.6 17 21 4 23.53%
0.7 23 29 6 26.09%
0.8 31 46 15 48.39%
0.9 54 80 26 48.15%
0.95 80 138 58 72.50%

Linux

Percentile Before (Mbps) After (Mbps) Delta (Mbps) % difference
0.05 1 1 0 0.00%
0.1 2 2 0 0.00%
0.2 5 6 1 20.00%
0.3 9 10 1 11.11%
0.4 12 15 3 25.00%
0.5 18 21 3 16.67%
0.6 25 29 4 16.00%
0.7 34 43 9 26.47%
0.8 54 68 14 25.93%
0.9 86 118 32 37.21%
0.95 161 219 58 36.02%

Collected from this query.

On Windows, improvements from ~18% to ~36% starting at the 50th percentile and upwards.
On macOS, improvements from ~15% to ~72% starting at the 50th percentile and upwards.
On Linux, improvements from ~16% to ~36% starting at the 50th percentile and upwards.
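The "% difference" column in the tables above appears to be the change relative to the "Before" value. A minimal sketch of that calculation follows; `PercentChange` is a hypothetical name, and returning NaN for a zero baseline is a choice made here, since the ratio is undefined in that case.

```cpp
#include <cmath>

// Relative change from `before` to `after`, in percent. Returns NaN when
// `before` is 0 because the ratio is undefined.
double PercentChange(double before, double after) {
  if (before == 0.0) {
    return std::nan("");
  }
  return (after - before) / before * 100.0;
}
```

For example, the Windows 95th percentile in Release goes from 80 to 109 Mbps, which is (109 - 80) / 80 = 36.25%.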

We landed the telemetry as a keyed histogram, which is not yet supported on Glean, so we don't have Android results.
From local testing of Fenix Nightly with 50ms latency up and 50ms of latency down (introduced via packet-level throttler), same wifi network, I measured large improvements.

Before the HTTP/2 fix
11.6 Mbps
11.0 Mbps
10.3 Mbps
After the fix
18.3 Mbps
22.2 Mbps
21.2 Mbps

For users in the wild the improvements would be dependent on their network environment, with larger gains on networks with higher bandwidth delay products.
