Connections hang when using HTTP/2 proxy
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
People
(Reporter: bugzilla-cloudflare, Assigned: mayhemer)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged][secure-proxy-mvp])
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
- Start an HTTP/2 proxy server like nghttpx + Squid or Caddy's forwardproxy
- Configure Firefox to use the proxy with a PAC file (as described in https://nghttp2.org/documentation/nghttpx-howto.html#http-2-proxy-mode)
- Go to a site like fast.com that opens a lot of connections
Actual results:
Some internal stream limit is reached and newer connections cannot be made, breaking the speed test and preventing navigation to other sites.
I also got it to crash once; there should be a crash report with my email attached.
Expected results:
The browser opens more HTTP/2 sessions to accommodate the large number of connections.
Reporter | ||
Updated•5 years ago
|
![]() |
Assignee | |
Comment 1•5 years ago
|
||
Thanks for this report! Checking this use case with mayhemer/node_http2_proxy and latest Firefox Nightly I can confirm there is such a problem.
![]() |
Assignee | |
Updated•5 years ago
|
![]() |
Assignee | |
Comment 2•5 years ago
|
||
From a preliminary look at backtrack and logan this seems to be a duplicate or a variant of bug 1463729 for what I can reproduce.
Andrew, few questions:
- have you been using MOZ_LOG during your test?
- what is your download speed? (as the bug I mention above depends on hi-speed rx)
thanks
Reporter | ||
Comment 3•5 years ago
|
||
I did not use MOZ_LOG. But it shouldn't be too big of a problem since you can reproduce it yourself.
My download/upload speed ranges from 150-200 Mbps on the Wi-Fi here. I also noticed this bug occasionally during normal Web browsing, so it might be both throughput and the number of streams.
![]() |
Assignee | |
Comment 4•5 years ago
|
||
(In reply to Andrew Sun from comment #3)
Thanks for a quick answer.
I did not use MOZ_LOG. But it shouldn't be too big of a problem since you can reproduce it yourself.
To explain my question, the I/O that MOZ_LOG/_FILE do obviously slows the socket thread processing and makes that bug way more reproducible; explanation below.
My download/upload speed ranges from 150-200 Mbps on the Wi-Fi here. I also noticed this bug occasionally during normal Web browsing, so it might be both throughput and the number of streams.
That high speed is probably the trigger. The thing is that we have an "optimization" in our h2 implementation which simply loops as long as there are new data on the socket available (w/o querying poll()). If the downstream speed is high, it may loop that way for significant amount of time, blocking our single socket thread, which we poll and process socket events on, from processing new requests.
Updated•5 years ago
|
![]() |
Assignee | |
Comment 5•5 years ago
|
||
Hi Andrew, can you please check the test build at target.dmg which contains a draft fix for bug 1463729 in your setup again?
Thanks.
(Build from try: Bug 1463729 - Break the loop in nsHttpConnection::OnSocketReadable ...)
Reporter | ||
Comment 6•5 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #5)
Unfortunately, this only partially fixed the issue. fast.com still has lots of hanging/failed connections, and navigation to other sites during the speed test is sometimes very slow. Also, Firefox eats about 120% CPU during the speed test while transferring very little data.
I also got it to crash again when visiting another site during a speed test. There should be another crash report with my email attached.
![]() |
Assignee | |
Comment 7•5 years ago
|
||
(In reply to Andrew Sun from comment #6)
(In reply to Honza Bambas (:mayhemer) from comment #5)
Unfortunately, this only partially fixed the issue. fast.com still has lots of hanging/failed connections, and navigation to other sites during the speed test is sometimes very slow. Also, Firefox eats about 120% CPU during the speed test while transferring very little data.
Thanks for testing this. I have to confess I have similar results (modulo the CPU load, which I didn't look at, actually) with a nodejs based h2 proxy. Hence, this indicates there is another problem in Firefox and not a proxy.
I also got it to crash again when visiting another site during a speed test. There should be another crash report with my email attached.
You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.
Reporter | ||
Comment 8•5 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #7)
You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.
Here's a link to the crash: https://crash-stats.mozilla.org/report/index/039501f9-e9c1-4794-969f-0054c0190612
Looks like it's in the same place as the other crash report I submitted.
![]() |
Assignee | |
Comment 9•5 years ago
|
||
(In reply to Andrew Sun from comment #8)
(In reply to Honza Bambas (:mayhemer) from comment #7)
You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.
Here's a link to the crash: https://crash-stats.mozilla.org/report/index/039501f9-e9c1-4794-969f-0054c0190612Looks like it's in the same place as the other crash report I submitted.
Looks like. The missing symbols are from the try run (we apparently don't expose symbols for them) Thanks.
![]() |
Assignee | |
Comment 10•5 years ago
|
||
Hi Andrew, sorry for not much update lately on this bug, we had an all company meeting last week.
I don't have a final fix at the moment, but actively working on it now, with some progress.
I have a new test build at [1] for OSX that has another version of the patch from bug 1463729. If you can, please test it again under the same conditions. Note that the crash (bug 1556491) is not fixed yet in that build.
It might be helpful for me if you could (as another separate test) run the test build with logging on [2] and try to catch the crash. MOZ_LOG
would have to be added sync,pipnss:5
modules. Note that this is changing timing which may affect reproducibility (both directions.)
Thank you.
[1] https://queue.taskcluster.net/v1/task/a7ecOpWIStCnRjpnEo1HTw/runs/0/artifacts/public/build/target.tar.bz2 (apparently we no longer provide dmgs...)
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Updated•5 years ago
|
Reporter | ||
Comment 11•5 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #10)
The binaries in the linked build are for Linux. Could you make sure the CI server is building for Mac?
![]() |
Assignee | |
Comment 12•5 years ago
|
||
(In reply to Andrew Sun from comment #11)
(In reply to Honza Bambas (:mayhemer) from comment #10)
The binaries in the linked build are for Linux. Could you make sure the CI server is building for Mac?
Ah! Clicked on a wrong build, here it goes (.dmg):
https://queue.taskcluster.net/v1/task/LVgC4-QQTAO8DZKuqSoqfw/runs/0/artifacts/public/build/target.dmg
Thanks!
Reporter | ||
Comment 13•5 years ago
|
||
The connections still hang on fast.com and speedtest.net (shows speed test errors). Navigation to other sites is possible during the speed test... until it isn't, and their page loads hang too.
With this build, I cannot reproduce the crash anymore.
![]() |
Assignee | |
Comment 14•5 years ago
|
||
(In reply to Andrew Sun from comment #13)
The connections still hang on fast.com and speedtest.net (shows speed test errors). Navigation to other sites is possible during the speed test... until it isn't, and their page loads hang too.
It's not clear what the "their page loads hang too" mean - can you clarify please?
How long do you have to wait to let pages actually load? Or does it end with an error of some kind?
With this build, I cannot reproduce the crash anymore.
Thanks.
Reporter | ||
Comment 15•5 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #14)
It's not clear what the "their page loads hang too" mean - can you clarify please?
This means either getting stuck at a white screen or loading halfway with missing fonts/images/content. Both cases have an indefinite loading spinner.
How long do you have to wait to let pages actually load? Or does it end with an error of some kind?
I waited at least a minute or two for pages to load (unsuccessfully). I don't think I waited much longer.
![]() |
Assignee | |
Comment 16•5 years ago
|
||
I think we can close this now, we have identified bugs causing various problems: 1563538, 1563695 - fixed. There are few more that may take affect in extreme cases (+250 sharding), will be fixed later.
Description
•