Firefox 65: Web socket over h2 causes us to loop CONNECT request to a proxy
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
People
(Reporter: rainer.meier, Assigned: dragana)
References
Details
(Keywords: 64bit, regression, Whiteboard: [necko-triaged])
Attachments
(2 files)
1.47 MB,
application/octet-stream
|
Details | |
47 bytes,
text/x-phabricator-request
|
lizzard
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
|
Details | Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Steps to reproduce:
- Installed Firefox 65 release https://ftp.mozilla.org/pub/firefox/releases/65.0/ (exe package, 64-bit, EN)
- Launched Firefox
- Browsing web pages (specifically Facebook Messenger and Facebook causing the issue)
Actual results:
Firefox starts flooding my network with continued requests to edge-chat.facebook.com. My squid proxy log is flooded and looks like this:
...
1548713564.108 35 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3309 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
1548713564.145 33 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3309 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
1548713564.181 33 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3308 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
1548713564.217 33 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3309 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
1548713564.301 82 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3310 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
1548713564.333 30 fdea:5b48:d4c1:1:e82d:509e:c563:fb6e TCP_TUNNEL/200 3309 CONNECT edge-chat.facebook.com:443 - HIER_DIRECT/2a03:2880:f008:1:face:b00c:0:1 -
...
Firefox seems to request the page in a loop. Effectively flooding my upload and download (approx. 1Mbps upload/download on a single machine in my case on). Rolling out this Firefox version in a larger network might cause a huge network load.
Also other services (mainly chat-like services) seem to experience those problem.
Even closing Facebook or messenger.com tabs does not stop the flood unless Firefox is closed down completely.
Expected results:
Firefox might fail to establish some socket or busy waiting and immediately re-tries. Effectively causing a connection flood.
Downgrading to Firefox 64 fixes the problem.
Re-upgrading to Firefox 65 immediately restarts the problem.
I tried to disable all add-ons at no avail. I can't make Firefox stop flooding the network.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 2•6 years ago
|
||
I tried some debugging:
I did create an entirely new Firefox 65 profile and the flooding still occurs immediately after logging on to Facebook. So for sure it's not related to the profile or any add-ons.
Reporter | ||
Comment 3•6 years ago
|
||
(In reply to Matthias Versen [:Matti] from comment #1)
Is this only an issue if you use a proxy ?
Actually you are right. It's harder to trace then but I just tried. It stops if proxy is disabled.
However this is actually a critical issue for company networks and everybody running a proxy then.
Comment 4•6 years ago
|
||
I can't test this because I do not have a Facebook Account.
ni? Honza because this sounds critical and this could be a regression from bug 1520125
Reporter | ||
Comment 5•6 years ago
|
||
(In reply to Rainer Meier from comment #3)
(In reply to Matthias Versen [:Matti] from comment #1)
Is this only an issue if you use a proxy ?
Actually you are right. It's harder to trace then but I just tried. It stops if proxy is disabled.
However this is actually a critical issue for company networks and everybody running a proxy then.
Confirmed. This seems to happen only on proxy connections.
I need to block the rollout to company network now as this bug might cause havok in proxied networks.
Updated•6 years ago
|
Reporter | ||
Comment 6•6 years ago
|
||
Haha, found another page where it happens, although unlikely to happen in corporate environments with proxies usually (just trying to find some chat-enabled pages)... lovoo.com shows the same behavior.
Comment hidden (offtopic) |
Comment hidden (offtopic) |
Comment hidden (offtopic) |
Updated•6 years ago
|
![]() |
||
Comment 10•6 years ago
|
||
I'll ask for a full log as described here as a start:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments
Thanks!
Reporter | ||
Comment 11•6 years ago
|
||
This is a log of the proxy flood generated by Firefox 65. Steps to reproduce:
- Install Firefox 65
- Configure proxy (squid in my case)
- Log on to https://messenger.com/
Flooding to edge-chat.messenger.com starts immediately. Look for it in the logs.
The log has been zipped-up by 7-zip as even a quick logon to messenger.com immediately shows up with > 30MB of logs.
![]() |
||
Comment 12•6 years ago
|
||
Thanks! The problem is captured there, hopefully we will be able to find the cause.
Preliminarily:
- we try to open websocket at https://edge-chat.messenger.com/chat?region=lla
- the transaction (nsHttpTransaction @15d49cb2000 created 2019-01-29 18:30:06.295) handling the upgrade channel request is looping between active/don't reuse/restart and sends CONNECT on every reactivation
seems related to bug 1434137 (websockets over h2).
Assignee | ||
Updated•6 years ago
|
Comment 13•6 years ago
|
||
[Tracking Requested - why for this release]: using a proxy causes Firefox to make network attempts infinitely when visiting a site using web sockets.
Assignee | ||
Comment 14•6 years ago
|
||
Hi Rainer,
I have a fix for the problem and would like to verify it. Can you download this version:
https://queue.taskcluster.net/v1/task/Yys1zhYeQjWe_tnSUvEAWA/runs/0/artifacts/public/build/target.zip
You do not need to install it. You can just run firefox.exe
Thank you
Updated•6 years ago
|
Updated•6 years ago
|
Comment 15•6 years ago
|
||
Dragana, can we work around this issue by setting network.http.spdy.websockets to false? Reporter, can you please try that as well?
Comment 16•6 years ago
|
||
This may block wider rollout of Fx65, so need to fully understand the impact of this issue ASAP.
Assignee | ||
Comment 17•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #15)
Dragana, can we work around this issue by setting network.http.spdy.websockets to false? Reporter, can you please try that as well?
We can. It should work.
This will affect people using http proxy that are trying to use websockets. The server hosting websockets speaks h2 (the serve may not serve websockets over h2, but it speaks h2 generaly)
Updated•6 years ago
|
Comment 18•6 years ago
|
||
Dragana has a patch, but it sounds like the next step is to disable the feature and let her patch ride the trains.
Assignee | ||
Comment 19•6 years ago
|
||
(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #18)
Dragana has a patch, but it sounds like the next step is to disable the feature and let her patch ride the trains.
We can uplift it to beta and maybe to release as well.
I am surprised this was not caught in our automated tests. Also, adding Tom, Tania to suggest including this as an additional scenario to test in our manual test process.
Tracked as a blocker so we don't forget including it as a patch in the next 65 dot release and in beta66.
Reporter | ||
Comment 22•6 years ago
|
||
Alright... I am sorry for late reply but I was busy at work and we might also not be in the same timezone ;-)
Here are my results:
Test 1: Stock Firefox 65
- Install Firefox 65
- create a fresh profile
Result: Successfully reproduced issue, loop occurs.
Test 2: Config change
- Set network.http.spdy.websockets=false
Result: Issue does NOT occur, no problem.
Test 3: Use test-build
- Install Firefox 67a build attached here
- Reset network.http.spdy.websockets to true
Result: Issue does NOT occur, seems to be fixed.
Note: I used the same profile in all tests.
So the issue seems identified and solved in the code as well as there is a work-around. However most companies/users will likely have a hard time to identify why the network load increases and even worse some of them might not be able to roll out the work-around easily.
Many thanks for taking this serious and fixing it that quickly.
Comment 23•6 years ago
|
||
Thanks, Rainer! We're working to flip the pref remotely as a short-term workaround until we can ship the proper fix in later release. Thanks again for your help narrowing this down so quickly.
Comment 24•6 years ago
|
||
I was able to reproduce the original problem in 65, but after trying the build in Comment 14, I tested it with the pref network.http.spdy.websockets set to true and false. Attached are the logs. Please review and see if this is in line with expectations of the bug fix.
https://drive.google.com/file/d/1SY-eFuyZPeFYN7fAFMgm-uJASsZrDIgE/view?usp=sharing
Comment 26•6 years ago
|
||
Attached are my proxy logs for 65 when I did a pref flip without restarting.
https://drive.google.com/file/d/1FQZvZJnGTAY_5zGiazWJAPXDvIHL3AiA/view?usp=sharing
Comment 27•6 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #17)
This will affect people using http proxy that are trying to use websockets. The server hosting websockets speaks h2 (the serve may not serve websockets over h2, but it speaks h2 generaly)
Dragana, when the flooding occurs, can users still use the website/service normally? I'm trying to understand whether this affects the user experience, or is it just generating lots of additional useless traffic? Thanks.
Comment 28•6 years ago
|
||
Rainer, we've pushed out the remote pref flip change.
You should see within the next 6 hours a new "hotfix-disable-websockets-over-h2-65-1523978" study in the "Active studies" section of about:studies. Once you show as enrolled in that, you should be seeing normal behavior with your proxy server again.
Reporter | ||
Comment 29•6 years ago
|
||
Hello Ryan, I just checked about:studies and up to now I just see pref-flip-screenshots-release-1369150 which seems to change extensions.screenshots.system-disabled to true.
So I guess I have to wait a bit or is there a way to update the studies?
I will report as soon as I received the websocket hotfix.
Comment 30•6 years ago
|
||
Yeah, it can take up to 6 hours to enroll. Thanks for checking!
Reporter | ||
Comment 31•6 years ago
|
||
Ryan, I guess I might not get this study. Simply due to the fact that participations to studies are disabled within my organization (via GPO). So this bug is bad in multiple ways. It seems to affect mainly corporate users. Corporate users however tend to disable participations to studies and/or feedback systems, automatic reports, data collection and telemetry.
Maybe you can confirm that I will not receive the study if the following option is disabled in "Firefox Data Collection and Use" settings:
[ ] Allow Firefox to send technical and interation data to Mozilla
[ ] Allow Firefox to install and run studies
[ ] Allow Firefox to make personalized extension recommendations
All those options are off in all corporate environments where I rolled out Firefox as those settings are incompatible with corporate policies.
Comment 32•6 years ago
|
||
"Allow Firefox to install and run studies" being disabled will indeed not enroll you :-\
Reporter | ||
Comment 33•6 years ago
|
||
Ryan, I think it's clear to me that it will. However if it would, then my next question as an operator of Firefox in corporate (small and medium business) environments would have been: How can you manipulate settings in my company-wide enrolled Firefox and how does this comply to corporate policies?
Indeed I used the official GPO to disable exactly this - unwanted communication and remote-management and control of software running in corporate environment.
So what I am saying is that I think my setup is not uncommon. Especially within the group of corporate users using a proxy it is also likely that the IT guidelines do not allow for telemetry and similar things. So the hotfix will exactly fail in those cases where the hotfix is needed. It's not needed for private users without proxies but it's needed in corporate environments with proxies and in such environments the hotfix is very likely not to be effective.
Reporter | ||
Comment 34•6 years ago
|
||
Alright, I have tested enabling studies and yes I got two hotfixes pushed now:
- hotfix-disable-websockets-over-h2-65-1523978, network.http.spdy.websockets=false
- prefflip-push-performance-1491171, dom.push.alwaysConnect=true
But I repeat again: I cannot rollout Firefox 65 in any of my managed environments as all those environments have "studies" disabled by policy for good reason. If I explain those companies that I have to enable participation in some "studies" in production environment just to get this fixed I better look for another job then as I am not doing mine right...
Comment 35•6 years ago
|
||
I think that's taking this specific bug off-topic. There's plenty of documentation about what data collection we do and how our studies work linked directly from the preferences section controlling them. As an org that cares about user agency and privacy, we certainly understand the trade-offs of having systems like these in place and do our best to explain the decisions made.
Anyway, yes, it's clear we're going to need to ship a 65.0.l release for this issue. Realistically, that won't be able to happen until next week at the earliest. This bug doesn't even have a patch up for review yet. I'm sorry for the inconvenience - this is obviously the kind of issue that doesn't easily turn up during pre-release testing, which is certainly painful for us all.
Reporter | ||
Comment 36•6 years ago
|
||
Hi Ryan, don't get me wrong. I think you should deploy the study anyway and flip the option. Just to fix at least the ones you can reach.
My intention was just to point out that this "hotfix" is probably not as effective as you might think as it does not reach the users actually affected.
So yes, I confirm the study works and puts the hotfix in place. I will however have to wait for Firefox 65.0.1 release for a rollout and I recommend any admin in a proxied environment to do the same. Not more, not less.
Many thanks again for your quick responses and investigations to everyone participating here.
I think we have nailed down the issue and hopefully we will see Firefox 65.0.1 soon as well as some updated test cases to prevent it re-appearing.
Thanks again!
Comment 37•6 years ago
|
||
We'll certainly look into what we can do to catch this kind of problem in pre-release, but honestly, feedback from admins such as yourself during the Beta cycle is likely to find more issues that we ever can on our own given the number of edge cases which can exist within those managed environments. That's an area where we'd love to see more admins trying things out when there's still time to get bugs fixed before release.
Reporter | ||
Comment 38•6 years ago
|
||
I fully agree. Granted I could have tested Firefox 65 pre-release or at least the release candidates before the release. Also granted the issue happens on pages (facebook?) not mainly used within companies, but might potentially also affect corporate communication tools. I also admit that in many corporate environments Firefox ESR might be the better option to go but in my (medium sized) companies I did not see a reason to go for ESR. I am also not sure if I would have seen this specific bug during testing of beta releases upfront as I just noticed "by accident" when checking the gateway traffic and proxy logs for a completely unrelated issue.
So yes, it is indeed an edge-case. But we got it nailed down pretty quickly and Firefox 65.0.1 is in the pipe. So I just stick with version 64.0.2 meanwhile. Not a huge deal.
There is a good reason I still stick to Firefox also in corporate application. With recent changes (GPO support, MSI installer) it became much more manageable. Just waiting for other fixes like correct placement of data in %LocalAppData% for further corporate compatibility improvements. But this is certainly going beyond this report.
I will for sure keep opening issue if I detect some. But the fact I didn't had to open critical bug reports in the past is sufficient for me to keep supporting Firefox.
Thanks again to everybody involved solving the issue and testing.
Assignee | ||
Comment 39•6 years ago
|
||
Assignee | ||
Comment 40•6 years ago
|
||
(In reply to Grover Wimberly IV [:Grover-QA] Use NeedInfo from comment #24)
I was able to reproduce the original problem in 65, but after trying the build in Comment 14, I tested it with the pref network.http.spdy.websockets set to true and false. Attached are the logs. Please review and see if this is in line with expectations of the bug fix.
https://drive.google.com/file/d/1SY-eFuyZPeFYN7fAFMgm-uJASsZrDIgE/view?usp=sharing
The log looks as expected. The patch solves the problem. Thank you.
Assignee | ||
Comment 41•6 years ago
|
||
(In reply to Tom Grabowski [:TomGrab] from comment #27)
(In reply to Dragana Damjanovic [:dragana] from comment #17)
This will affect people using http proxy that are trying to use websockets. The server hosting websockets speaks h2 (the serve may not serve websockets over h2, but it speaks h2 generaly)
Dragana, when the flooding occurs, can users still use the website/service normally? I'm trying to understand whether this affects the user experience, or is it just generating lots of additional useless traffic? Thanks.
No they cannot. The http request is stack in a loop and it will only go out the loop if the request is canceled :(
Assignee | ||
Updated•6 years ago
|
Comment 42•6 years ago
|
||
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4fa023708467
Check whether we really have a response for a upgrade request before making a connection DontReuse. r=michal
Assignee | ||
Comment 43•6 years ago
|
||
Comment on attachment 9040377 [details]
Check whether we really have a response for a upgrade request before making a connection DontReuse. r=michal
Beta/Release Uplift Approval Request
Feature/Bug causing the regression
User impact if declined
Users using websockets hosted on a server that speaks http/2.0 and also use a http proxy will get in a loop and will not be able to connect to the websocket. They can get out of the loop only by canceling the websocket request.
Is this code covered by automated tests?
No
Has the fix been verified in Nightly?
No
Needs manual test from QE?
No
If yes, steps to reproduce
comment #24 confirms that QA can reproduce the issue and has tested the fix on a try build.
List of other uplifts needed
None
Risk to taking this patch
Low
Why is the change risky/not risky? (and alternatives if risky)
It is very small change. I have also looked at the log with the fix and all possible code paths around that code were used in the log and the behavior were as expected, without any new regressions.
String changes made/needed
Comment 44•6 years ago
|
||
Comment on attachment 9040377 [details]
Check whether we really have a response for a upgrade request before making a connection DontReuse. r=michal
Hoping to get this into the 66 beta 4 build today so that we can see results for including it in a possible 65 dot release.
Comment 45•6 years ago
|
||
bugherder uplift |
Comment 46•6 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #20)
I am surprised this was not caught in our automated tests. Also, adding Tom, Tania to suggest including this as an additional scenario to test in our manual test process.
This will be added to our analysis document for 65 regarding including this test in our manual test process
Comment 47•6 years ago
|
||
bugherder |
Comment 48•6 years ago
|
||
Rainer, we shipped 66.0b4 on Friday with this patch included. Would you be willing/able to test it in your environment to confirm that everything behaves as expected? Would be great to get that confirmation prior to accepting it for the 65.0.1 release. Thanks!
Reporter | ||
Comment 49•6 years ago
|
||
I just tested this:
- Using Firefox 65.0 (release)
- Set network.http.spdy.websockets=true (default)
-> Result: I could reproduce the loop
Then I closed Firefox and Installed Firefox 66.0b4.
-> Result: No loop.
So I confirm the bug to be fixed in Firefox 66.0b4.
Can we expect this to be in the next point release 65.0.1 as well or do we need to skip Firefox 65 entirely and wait for Firefox 66.0 release?
Comment 50•6 years ago
|
||
I intend to take it in 65.0.1, yes. Aiming to ship that early next week.
Comment 51•6 years ago
|
||
Comment on attachment 9040377 [details]
Check whether we really have a response for a upgrade request before making a connection DontReuse. r=michal
[Triage Comment]
Simple fix which has been verified on Nightly and Beta. Approved for 65.0.1 (shipping early next week).
Comment 52•6 years ago
|
||
bugherder uplift |
Updated•6 years ago
|
Comment 53•6 years ago
|
||
Rainer, the fix for this bug is included in the 65.0.1 release being built today and due to ship tomorrow morning Pacific time. If you could verify that things work as expected for you, I would be most grateful :). Thanks again for all your help getting this diagnosed and verified quickly!
Reporter | ||
Comment 54•6 years ago
|
||
Hi Ryan, I would be glad to do this. But where can I get the 65.0.1 build?
I kept monitoring https://ftp.mozilla.org/pub/firefox/candidates/ for 65.0.1 builds but couln't spot it yet. But maybe I am just monitoring the wrong location. Can you provide some hint? Thank you!
Comment 55•6 years ago
|
||
That's where it'll be once it's built. Should be there in a few hours.
Reporter | ||
Comment 56•6 years ago
|
||
Alright. Here it's currently 9pm local time. I hope I can do the test this night. I will keep monitoring the candidates and provide feedback as soon as I can.
Reporter | ||
Comment 57•6 years ago
|
||
I just did the tests:
- Installed Firefox 65.0 to verify the issue.
- Upgraded to Firefox 65.0.1 (using 64-bit en_US MSI package build 2 from https://ftp.mozilla.org/pub/firefox/candidates/65.0.1-candidates/build2/win64/en-US/)
The problem seems to be resolved.
Many thanks again.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 58•5 years ago
|
||
Please specify a root cause for this bug. See :tmaity for more information.
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Description
•