1556005 - Connections hang when using HTTP/2 proxy

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Description

•

5 years ago

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

Start an HTTP/2 proxy server like nghttpx + Squid or Caddy's forwardproxy
Configure Firefox to use the proxy with a PAC file (as described in https://nghttp2.org/documentation/nghttpx-howto.html#http-2-proxy-mode)
Go to a site like fast.com that opens a lot of connections

Actual results:

Some internal stream limit is reached and newer connections cannot be made, breaking the speed test and preventing navigation to other sites.

I also got it to crash once; there should be a crash report with my email attached.

Expected results:

The browser opens more HTTP/2 sessions to accommodate the large number of connections.

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Updated

•

5 years ago

OS: Unspecified → macOS

Hardware: Unspecified → x86_64

Honza Bambas (:mayhemer)

Assignee

Comment 1

•

5 years ago

Thanks for this report! Checking this use case with mayhemer/node_http2_proxy and latest Firefox Nightly I can confirm there is such a problem.

Blocks: secure-proxy

Status: UNCONFIRMED → NEW

Ever confirmed: true

Priority: -- → P1

Honza Bambas (:mayhemer)

Assignee

Updated

•

5 years ago

Assignee: nobody → honzab.moz

Honza Bambas (:mayhemer)

Assignee

Comment 2

•

5 years ago

From a preliminary look at backtrack and logan this seems to be a duplicate or a variant of bug 1463729 for what I can reproduce.

Andrew, few questions:

have you been using MOZ_LOG during your test?
what is your download speed? (as the bug I mention above depends on hi-speed rx)

thanks

Flags: needinfo?(asun)

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 3

•

5 years ago

I did not use MOZ_LOG. But it shouldn't be too big of a problem since you can reproduce it yourself.

My download/upload speed ranges from 150-200 Mbps on the Wi-Fi here. I also noticed this bug occasionally during normal Web browsing, so it might be both throughput and the number of streams.

Flags: needinfo?(asun)

Honza Bambas (:mayhemer)

Assignee

Comment 4

•

5 years ago

(In reply to Andrew Sun from comment #3)

Thanks for a quick answer.

I did not use MOZ_LOG. But it shouldn't be too big of a problem since you can reproduce it yourself.

To explain my question, the I/O that MOZ_LOG/_FILE do obviously slows the socket thread processing and makes that bug way more reproducible; explanation below.

My download/upload speed ranges from 150-200 Mbps on the Wi-Fi here. I also noticed this bug occasionally during normal Web browsing, so it might be both throughput and the number of streams.

That high speed is probably the trigger. The thing is that we have an "optimization" in our h2 implementation which simply loops as long as there are new data on the socket available (w/o querying poll()). If the downstream speed is high, it may loop that way for significant amount of time, blocking our single socket thread, which we poll and process socket events on, from processing new requests.

Kershaw Chang [:kershaw]

Updated

•

5 years ago

Whiteboard: [necko-triaged]

Honza Bambas (:mayhemer)

Assignee

Comment 5

•

5 years ago

Hi Andrew, can you please check the test build at target.dmg which contains a draft fix for bug 1463729 in your setup again?

Thanks.

(Build from try: Bug 1463729 - Break the loop in nsHttpConnection::OnSocketReadable ...)

Flags: needinfo?(asun)

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 6

•

5 years ago

(In reply to Honza Bambas (:mayhemer) from comment #5)

Unfortunately, this only partially fixed the issue. fast.com still has lots of hanging/failed connections, and navigation to other sites during the speed test is sometimes very slow. Also, Firefox eats about 120% CPU during the speed test while transferring very little data.

I also got it to crash again when visiting another site during a speed test. There should be another crash report with my email attached.

Flags: needinfo?(asun)

Honza Bambas (:mayhemer)

Assignee

Comment 7

•

5 years ago

(In reply to Andrew Sun from comment #6)

(In reply to Honza Bambas (:mayhemer) from comment #5)

Unfortunately, this only partially fixed the issue. fast.com still has lots of hanging/failed connections, and navigation to other sites during the speed test is sometimes very slow. Also, Firefox eats about 120% CPU during the speed test while transferring very little data.

Thanks for testing this. I have to confess I have similar results (modulo the CPU load, which I didn't look at, actually) with a nodejs based h2 proxy. Hence, this indicates there is another problem in Firefox and not a proxy.

I also got it to crash again when visiting another site during a speed test. There should be another crash report with my email attached.

You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 8

•

5 years ago

(In reply to Honza Bambas (:mayhemer) from comment #7)

You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.
Here's a link to the crash: https://crash-stats.mozilla.org/report/index/039501f9-e9c1-4794-969f-0054c0190612

Looks like it's in the same place as the other crash report I submitted.

Honza Bambas (:mayhemer)

Assignee

Comment 9

•

5 years ago

(In reply to Andrew Sun from comment #8)

(In reply to Honza Bambas (:mayhemer) from comment #7)

You can go to about:crashes, there you will find all reports and you can share links (or IDs) here. Thanks.
Here's a link to the crash: https://crash-stats.mozilla.org/report/index/039501f9-e9c1-4794-969f-0054c0190612

Looks like it's in the same place as the other crash report I submitted.

Looks like. The missing symbols are from the try run (we apparently don't expose symbols for them) Thanks.

Honza Bambas (:mayhemer)

Assignee

Comment 10

•

5 years ago

Hi Andrew, sorry for not much update lately on this bug, we had an all company meeting last week.

I don't have a final fix at the moment, but actively working on it now, with some progress.

I have a new test build at [1] for OSX that has another version of the patch from bug 1463729. If you can, please test it again under the same conditions. Note that the crash (bug 1556491) is not fixed yet in that build.

It might be helpful for me if you could (as another separate test) run the test build with logging on [2] and try to catch the crash. MOZ_LOG would have to be added sync,pipnss:5 modules. Note that this is changing timing which may affect reproducibility (both directions.)

Thank you.

[1] https://queue.taskcluster.net/v1/task/a7ecOpWIStCnRjpnEo1HTw/runs/0/artifacts/public/build/target.tar.bz2 (apparently we no longer provide dmgs...)
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Flags: needinfo?(asun)

Nhi Nguyen (:nhi)

Updated

•

5 years ago

Whiteboard: [necko-triaged] → [necko-triaged][secure-proxy-mvp]

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 11

•

5 years ago

(In reply to Honza Bambas (:mayhemer) from comment #10)

The binaries in the linked build are for Linux. Could you make sure the CI server is building for Mac?

Flags: needinfo?(asun) → needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 12

•

5 years ago

(In reply to Andrew Sun from comment #11)

(In reply to Honza Bambas (:mayhemer) from comment #10)

The binaries in the linked build are for Linux. Could you make sure the CI server is building for Mac?

Ah! Clicked on a wrong build, here it goes (.dmg):
https://queue.taskcluster.net/v1/task/LVgC4-QQTAO8DZKuqSoqfw/runs/0/artifacts/public/build/target.dmg

Thanks!

Flags: needinfo?(honzab.moz) → needinfo?(asun)

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 13

•

5 years ago

The connections still hang on fast.com and speedtest.net (shows speed test errors). Navigation to other sites is possible during the speed test... until it isn't, and their page loads hang too.

With this build, I cannot reproduce the crash anymore.

Flags: needinfo?(asun)

Honza Bambas (:mayhemer)

Assignee

Comment 14

•

5 years ago

(In reply to Andrew Sun from comment #13)

The connections still hang on fast.com and speedtest.net (shows speed test errors). Navigation to other sites is possible during the speed test... until it isn't, and their page loads hang too.

It's not clear what the "their page loads hang too" mean - can you clarify please?

How long do you have to wait to let pages actually load? Or does it end with an error of some kind?

With this build, I cannot reproduce the crash anymore.

Thanks.

Andrew Sun (prev. asun@cloudflare.com)

Reporter

Comment 15

•

5 years ago

(In reply to Honza Bambas (:mayhemer) from comment #14)

It's not clear what the "their page loads hang too" mean - can you clarify please?

This means either getting stuck at a white screen or loading halfway with missing fonts/images/content. Both cases have an indefinite loading spinner.

How long do you have to wait to let pages actually load? Or does it end with an error of some kind?

I waited at least a minute or two for pages to load (unsuccessfully). I don't think I waited much longer.

Honza Bambas (:mayhemer)

Assignee

Comment 16

•

5 years ago

I think we can close this now, we have identified bugs causing various problems: 1563538, 1563695 - fixed. There are few more that may take affect in extreme cases (+250 sharding), will be fixed later.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WORKSFORME

Bugzilla

Quick Search

Connections hang when using HTTP/2 proxy

Categories

(Core :: Networking: HTTP, defect, P1)

Tracking

()

People

(Reporter: bugzilla-cloudflare, Assigned: mayhemer)

References

Details

(Whiteboard: [necko-triaged][secure-proxy-mvp])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16