Closed Bug 1253582 Opened 5 years ago Closed 5 years ago

Secure websocket using h2 coalescing map?

Categories

(Core :: Networking: WebSockets, defect)

44 Branch
x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: andrew, Assigned: mcmanus)

Details

(Whiteboard: [necko-active])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36

Steps to reproduce:

Open https://www.idokep.hu/idokep or https://www.idokep.hu/hoterkep; both pages use socket.io 1.4.5 websockets.


Actual results:

Most of the time the websocket connection fails and, after a timeout, the page falls back to a failsafe page (the idokep_old or hoterkep_old page).


Expected results:

The browser should stay on the current page and display a zoomable map.
On the client side, the connection fails with error code 1006.
Inspecting the connection with Wireshark when the problem occurs shows that no connection attempt is made to the server, so I think there is no problem on the server side.
All other browsers work perfectly.
Component: Untriaged → Webapp Runtime
OS: Unspecified → All
Hardware: Unspecified → x86_64
Component: Webapp Runtime → Networking: WebSockets
Product: Firefox → Core
public testcase - worth checking
Flags: needinfo?(michal.novotny)
Whiteboard: [necko-active]
Attached file nspr.log.gz
I can reproduce it, but AFAICS the problem is on the server side. On the first load everything works correctly; when I reload the page, the websocket connection cannot be established because the server responds with error 404 to the upgrade request. I've attached a gzipped NSPR log.

First load:
request (time 00:06:41.824742)
http request [
  GET /socket.io/?EIO=3&transport=websocket HTTP/1.1
  Host: hoterkep.idokep.hu
  User-Agent: Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate, br
  Sec-WebSocket-Version: 13
  Origin: https://www.idokep.hu
  Sec-WebSocket-Key: bvRpUdlviJzpTt4mqEp2ZA==
  Cookie: _ga=GA1.2.1591016506.1457297472; __gfp_64b=k6dqHPpVSQzkaJriw47pN32wC_aUYpu1dffUbHMAsA7.27; _gat=1
  Connection: keep-alive, Upgrade
  Pragma: no-cache
  Cache-Control: no-cache
  Upgrade: websocket
]
response (time 00:06:41.977152)
http response [
  HTTP/1.1 101 Switching Protocols
  Server: nginx
  Date: Mon, 07 Mar 2016 00:06:41 GMT
  Connection: upgrade
  Upgrade: websocket
  Sec-WebSocket-Accept: dh9So/z/bev0BMf4W2+tJYA9ccU=
]


Second load:
request (time 00:09:36.482819)
http request [
  GET /socket.io/?EIO=3&transport=websocket HTTP/1.1
  Host: hoterkep.idokep.hu
  User-Agent: Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate, br
  Sec-WebSocket-Version: 13
  Origin: https://www.idokep.hu
  Sec-WebSocket-Key: XKjo7UpLuYul0M9oQC/Uwg==
  Cookie: _ga=GA1.2.1591016506.1457297472; __gfp_64b=k6dqHPpVSQzkaJriw47pN32wC_aUYpu1dffUbHMAsA7.27; _gat=1
  Connection: keep-alive, Upgrade
  Pragma: no-cache
  Cache-Control: no-cache
  Upgrade: websocket
]
response (time 00:09:36.648987)
http response [
  HTTP/1.1 404 Not Found
  Server: nginx
  Date: Mon, 07 Mar 2016 00:09:36 GMT
  Content-Type: text/html; charset=UTF-8
  Content-Length: 162
  Connection: keep-alive
]

There are 2 more attempts to establish a websocket connection, and both fail with a 404 from the server.
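
For anyone comparing the two exchanges in the log: a successful upgrade is a 101 response whose Sec-WebSocket-Accept value is derived from the request's Sec-WebSocket-Key as defined in RFC 6455. The Python sketch below shows that derivation and a simple success check. The function names are illustrative only and are not taken from the NSPR log or from Firefox.

```python
import base64
import hashlib
from typing import Optional

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server should send for a key."""
    material = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")


def is_successful_upgrade(status: int, key: str, accept: Optional[str]) -> bool:
    """True only for a 101 response carrying the matching accept value."""
    return status == 101 and accept == expected_accept(key)
```

With this check, the second load above is rejected on the status code alone: the response is a 404 and carries no Sec-WebSocket-Accept header.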
Flags: needinfo?(michal.novotny)
I found out that Firefox sometimes connects to the wrong server, and this is not related to websockets.
These 404 errors come from a different server, which doesn't have the websocket service.
The problem occurs even with simple static files.
www.idokep.hu has multiple A records, while hoterkep.idokep.hu has only one.
How can these IP addresses get mixed up?
It's hard to reproduce the problem reliably.
I've created a test page: http://hoterkep.idokep.hu/index.html
After clicking through the two SSL pages and the dummy page several times, once the websocket fails, the dummy static page also returns a 404 error, because only the hoterkep.idokep.hu domain and its server at 79.172.211.37 have that page.
The DNS records seem OK to me.
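
A quick way to see whether two hostnames share any address is to resolve both and intersect the results. The sketch below is a minimal Python version of that check, assuming the system resolver returns the same records the browser sees. The helper names are made up for illustration.

```python
import socket
from typing import Set


def resolve_addresses(hostname: str, port: int = 443) -> Set[str]:
    """Return every IPv4 and IPv6 address the system resolver reports."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def overlapping_addresses(first: Set[str], second: Set[str]) -> Set[str]:
    """Return the addresses that appear in both sets."""
    return first & second


# Example use (performs live DNS lookups, so results depend on current records):
#   shared = overlapping_addresses(resolve_addresses("www.idokep.hu"),
#                                  resolve_addresses("hoterkep.idokep.hu"))
#   print(sorted(shared))
```

An empty result for both address families would mean the two names no longer overlap in DNS.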
Grepping through the site log, the errors come almost exclusively from Firefox user agents:
     38 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
     38 Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:43.0) Gecko/20100101 Firefox/43.0
     40 Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:44.0) Gecko/20100101 Firefox/44.0
     42 Mozilla/5.0 (Windows NT 5.1; rv:39.0) Gecko/20100101 Firefox/39.0
     47 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0
     51 Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
     52 Mozilla/5.0 (Windows NT 6.2; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
     91 Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0
     95 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
    100 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
    115 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:44.0) Gecko/20100101 Firefox/44.0
    123 Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:44.0) Gecko/20100101 Firefox/44.0
    124 Mozilla/5.0 (Windows NT 6.0; rv:44.0) Gecko/20100101 Firefox/44.0
    124 Mozilla/5.0 (Windows NT 6.3; rv:44.0) Gecko/20100101 Firefox/44.0
    132 Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:44.0) Gecko/20100101 Firefox/44.0
    173 GemiusSDK/1.1 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B410 Safari/600.1.4
    177 Mozilla/5.0 (Windows NT 6.1; rv:43.0) Gecko/20100101 Firefox/43.0
    404 Mozilla/5.0 (Windows NT 10.0; rv:44.0) Gecko/20100101 Firefox/44.0
    449 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0
   1347 Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
   1568 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
   1635 -
   3241 Mozilla/5.0 (Windows NT 5.1; rv:44.0) Gecko/20100101 Firefox/44.0
   3600 Mozilla/5.0 (Windows NT 6.1; rv:44.0) Gecko/20100101 Firefox/44.0
   5637 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
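
A tally like the one above can be reproduced with a short script. This is only a sketch: it assumes the access log uses the common "combined" layout (request, status, size, referer, user agent in that order), which the report does not confirm, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" layout: "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_agents_for_status(lines: Iterable[str], status: str = "404") -> Counter:
    """Count user agents among log lines that have the given status code."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") == status:
            counts[match.group("agent")] += 1
    return counts
```

Sorting the result by count, for example with `sorted(counts.items(), key=lambda item: item[1])`, gives the same ascending order as the listing.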
After disabling the browser's DNS cache, the connection problem never occurs:
http://ccm.net/faq/555-disabling-the-dns-cache-in-mozilla-firefox
(In reply to andrew from comment #7)
> After disabling the browser's DNS cache, the connection problem never occurs:
> http://ccm.net/faq/555-disabling-the-dns-cache-in-mozilla-firefox

The connection problem appeared again; this setting changed nothing.
thanks for all the information. it's probably a bug, but I need to do a little more legwork to confirm. you can probably work around it until a fix goes out

the issue is that www.idokep.hu has a cert valid for hoterkep.idokep.hu (probably a wildcard), www.idokep.hu has used http/2, and there is a dns overlap between them. that ties those hosts together for the purpose of http/2 host coalescing (i.e. we intentionally route requests for hoterkep.idokep.hu to open connections made to the former hostname, though they are clearly labeled as hoterkep.idokep.hu).

it's a bug because we aren't using an h2 connection to do this websockets bootstrap, so we shouldn't coalesce.

easy workarounds:
 * don't have the hosts overlap in DNS
 * don't have www use a wildcard cert. you can get free non-wildcard certs on demand from Let's Encrypt

I'll update the bug when I confirm the issue
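
The coalescing conditions described in this comment can be sketched as a small Python model. This is a simplified illustration, not Firefox's implementation: the class, the function names, and the `needs_http1` parameter are all hypothetical, and the wildcard matching covers only the single left-most label case.

```python
from dataclasses import dataclass
from typing import FrozenSet


def cert_name_matches(pattern: str, hostname: str) -> bool:
    """Match a certificate name; a leading '*.' covers exactly one label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # Drop the left-most label of the hostname and compare the remainder.
        _, _, rest = hostname.partition(".")
        return rest != "" and rest == pattern[2:]
    return pattern == hostname


@dataclass(frozen=True)
class OpenConnection:
    origin_host: str
    is_http2: bool
    cert_names: FrozenSet[str]
    origin_addresses: FrozenSet[str]  # addresses resolved for origin_host


def may_coalesce(conn: OpenConnection, new_host: str,
                 new_host_addresses: FrozenSet[str],
                 needs_http1: bool = False) -> bool:
    """Decide whether a request for new_host may reuse conn."""
    if needs_http1 or not conn.is_http2:
        # A websocket bootstrap uses an HTTP/1.1 Upgrade, so it must not coalesce.
        return False
    cert_ok = any(cert_name_matches(name, new_host) for name in conn.cert_names)
    dns_ok = bool(conn.origin_addresses & new_host_addresses)
    return cert_ok and dns_ok
```

In this model both workarounds listed above make `may_coalesce` return False: removing the DNS overlap clears `dns_ok`, and replacing the wildcard cert on www clears `cert_ok`.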
Assignee: nobody → mcmanus
Summary: Secure websocket unreliable → Secure websocket using h2 coalescing map?
Removed the overlapping servers from the DNS records and changed the certificate of hoterkep.idokep.hu, but the problem still exists.
It seems that the only workaround is to totally disable http2 support on the site.
(In reply to andrew from comment #10)
> Removed the overlapping servers from the DNS records, and changed the
> certificate of hoterkep.idokep.hu, but the problem still exists.
> It seems that the only workaround is to totally disable http2 support on the
> site.

Actually disabling http2 and removing the overlapping IPs do the trick. I will do a more detailed test to find out the proper workaround.
changing the cert in that way probably didn't help because www still had a cert valid for hoterkep (i.e. a wildcard). I'm guessing without seeing a log

removing the overlapping ip should also work. make sure that neither v4 nor v6 have any overlap.

another option is to have the server that is getting the undesired upgrade request respond to it with 421 instead of 404. I think that would work too, but there might be something special about websockets that would get in the way. it might be the easiest thing to try, though.
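
The 421 idea could look roughly like the following on the server side. This is a sketch of the decision only, written in Python rather than as nginx configuration. The function name and the `served_hosts` set are assumptions, and, as noted above, it is unverified whether a websocket upgrade would actually be retried after a 421.

```python
from http import HTTPStatus
from typing import Set


def status_for_request(host_header: str, served_hosts: Set[str],
                       path_exists: bool) -> HTTPStatus:
    """Pick a status: 421 for hosts this server does not serve, else 200/404."""
    # Strip an optional port; IPv6 literals are not handled in this sketch.
    host = host_header.split(":", 1)[0].strip().lower()
    if host not in served_hosts:
        # Tell the client the request reached the wrong server for this host.
        return HTTPStatus.MISDIRECTED_REQUEST
    return HTTPStatus.OK if path_exists else HTTPStatus.NOT_FOUND
```

The point of the distinction is that 404 says the resource is missing, while 421 says the request was sent to a server that is not responsible for that hostname.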
the DISALLOW_SPDY caps flag in the http channel (used to bootstrap websockets) is set implicitly based on the websockets upgrade callback being present. that's fine, and it works for the nsHttpTransaction, but it happens after the connectionInfo is established. the CI hash is what prevents the coalescing from happening, so the flag has to be set before the CI is built.

The other changes in this patch are just drive by improvements.
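
To illustrate why the ordering matters, here is a toy Python model of a connection key that includes a "no h2" marker. It is not the Firefox code: `ToyConnectionInfo`, `build_key`, and the key format are invented for this sketch, and only the ordering idea comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyConnectionInfo:
    host: str
    port: int
    no_spdy: bool = False

    def hash_key(self) -> str:
        # The marker makes "h2 disallowed" entries distinct from ordinary ones,
        # so they can never be looked up in the same (coalescable) bucket.
        marker = "X" if self.no_spdy else "."
        return f"{marker}{self.host}:{self.port}"


def build_key(host: str, port: int, is_websocket_upgrade: bool,
              set_flag_early: bool) -> str:
    """Return the key a request would use, with the flag set early or late."""
    disallow_spdy = False
    if set_flag_early and is_websocket_upgrade:
        disallow_spdy = True  # flag is known before the key is computed
    key = ToyConnectionInfo(host, port, no_spdy=disallow_spdy).hash_key()
    if not set_flag_early and is_websocket_upgrade:
        disallow_spdy = True  # too late: the key above no longer changes
    return key
```

In the late ordering, a websocket bootstrap ends up with the same key as an ordinary request and can therefore be matched to a coalesced h2 connection. In the early ordering it gets its own key.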
Attachment #8729067 - Flags: review?(michal.novotny)
could you download this trial build and see if it resolves your problem?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd14dd06c604
Flags: needinfo?(andrew)
The connection problem remains with the nightly 48.0a1 (2016-03-11) build.
Test page: https://www.idokep.hu/hoterkep2
Flags: needinfo?(andrew)
Attachment #8729067 - Flags: review?(michal.novotny) → review+
(In reply to andrew from comment #15)
> The connection problem remains with the nightly 48.0a1 (2016-03-11) build.
> Test page: https://www.idokep.hu/hoterkep2

thanks for testing that. I was on the road when I did it and somehow screwed up the try build when using unfamiliar machines. you'll see that the patch you tested was empty instead of what michal reviewed:
https://hg.mozilla.org/try/rev/e03d4348a766

sorry! I will make a new one.
(In reply to Patrick McManus [:mcmanus] from comment #17)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b28dec1c9a1e

can you verify this build? (I've checked it has the right code :)) Thanks!
Flags: needinfo?(andrew)
It seems to be OK; I can't reproduce the problem with this nightly build (48.0a1 (2016-03-16)).
Thank you very much for the fix!
Flags: needinfo?(andrew)
https://hg.mozilla.org/mozilla-central/rev/1789a471b2d5
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8729067 [details] [diff] [review]
0001-Bug-1253582-h2-coalescing-impacts-wss-r-michal.patch

this small patch fixes a websocket interop problem

Approval Request Comment
[Feature/regressing bug #]: a long time ago, at least 2 ESRs
[User impact if declined]: some websockets configurations will have intermittent failures
[Describe test coverage new/current, TreeHerder]: manual verification of fix
[Risks and why]: small change that just moves an initialization earlier
[String/UUID change made/needed]: none
Attachment #8729067 - Flags: approval-mozilla-aurora?
Comment on attachment 8729067 [details] [diff] [review]
0001-Bug-1253582-h2-coalescing-impacts-wss-r-michal.patch

This has been in Nightly for a few days, Aurora47+
Attachment #8729067 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+