Bug 1363451
Opened 8 years ago
Closed 8 years ago
Firefox erroneously re-uses HTTP/2 connections for similar servers
Categories
(Core :: Networking: HTTP, defect)
RESOLVED
INVALID
People
(Reporter: norby, Unassigned)
Attachments
(1 file)
35.13 KB, text/csv
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Steps to reproduce:
Users reported in the Flickr help forum that when using the Firefox browser they would have multiple images fail to load. They would get a 302 redirect to another URL that would 404.
https://www.flickr.com/help/forum/en-us/72157679881082324/
Actual results:
Turns out, after looking at logs, and with information provided by our users, we found that traffic destined for c1.staticflickr.com (pointing at ct2.gycs.b.yahoodns.net) was being sent over a connection that was opened for www.flickr.com (handled by ds-ycpi-sombrero-grande.gycpi.b.yahoodns.net). The latter has a generic *.flickr.com rule that sends traffic to our UI front-ends, so when an image request was sent there, it tried to rewrite somewhere sensible and failed.
Expected results:
We believe that the requests for c1.staticflickr.com should have continued to be sent to ct2.gycs.b.yahoodns.net as DNS indicated. The current working theory is that we are running into a condition similar to what is described here:
https://tools.ietf.org/html/rfc7540#section-9.1.1
"In some deployments, reusing a connection for multiple origins can
result in requests being directed to the wrong origin server. For
example, TLS termination might be performed by a middlebox that uses
the TLS Server Name Indication (SNI) [TLS-EXT] extension to select an
origin server. This means that it is possible for clients to send
confidential information to servers that might not be the intended
target for the request, even though the server is otherwise
authoritative."
Because the servers for the two rotations involved share identical certs and SAN information, we theorize that Firefox opted to re-use an existing connection to one to send requests intended for the other. We have not looked for (or seen reports of) this happening in the other direction.
This problem isn't completely isolated to Firefox, but in a day's worth of traffic, we saw Firefox in 98% of the User Agent strings, which suggests that this condition triggers much more frequently in Firefox than in other browsers. I've attached a CSV file that shows the breakdown of various UA strings that were impacted on 5/5/2017.
Comment 1•8 years ago
So there are a lot of CNAMEs and geographic load balancing going on here, but I suspect if I went through the client side log of an occurrence of this I would find that:
1] c1.staticflickr.com (or its terminal CNAME) resolved to a set of addresses that included an address that was also in the set of addresses that www.flickr.com resolved to. Note the language here is "resolved to a set including" not "connected to".
2] the existing connection (www.flickr.com) was made with a cert that covers c1.staticflickr.com too (I just looked at my connection to www.flickr.com and it included a SAN for *.staticflickr.com, so this seems likely)
RFC 7540 section 9.1.1 encourages reuse of the existing connection under these circumstances if two conditions are met: the host having resolved to the same IP address (note the word is resolved) and the cert being sufficient.
Again, we don't have enough logs to show that's the case here, but I bet it's the case here :)
Typically you might get
req A - lookup: IP1, IP2, IP3 (connect to IP1)
req B - lookup: IP3, IP4 (reuse conn A)
and be surprised if the host at IP1 was never provisioned for origin B.
But this is rather by design in h2 - the security model is about the cert, and the dns overlap is considered the opt in.
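A minimal sketch of the reuse rule described above, assuming a client that tracks the full set of addresses each host resolved to. This is not Firefox's actual code: the function names, the placeholder addresses `IP1`..`IP4`, and the `*.flickr.com` SAN entry are illustrative assumptions (only the `*.staticflickr.com` SAN is mentioned in this thread).

```python
def san_covers(san: str, host: str) -> bool:
    """True if one certificate SAN entry covers host.

    A wildcard is only honoured in the left-most label, so
    "*.example.com" covers "a.example.com" but not "a.b.example.com".
    """
    san, host = san.lower(), host.lower()
    if san.startswith("*."):
        _, _, rest = host.partition(".")
        return rest != "" and rest == san[2:]
    return san == host


def can_coalesce(conn_sans, conn_resolved_ips, new_host, new_resolved_ips) -> bool:
    """Both conditions must hold: the cert is sufficient AND the DNS sets overlap.

    Note that the overlap test uses the addresses the original host
    *resolved to*, not only the single address the connection is open to.
    """
    cert_ok = any(san_covers(san, new_host) for san in conn_sans)
    dns_overlap = bool(set(conn_resolved_ips) & set(new_resolved_ips))
    return cert_ok and dns_overlap


# req A resolved to IP1, IP2, IP3 and connected to IP1;
# req B resolves to IP3, IP4. The shared IP3 acts as the opt-in.
print(can_coalesce(
    conn_sans=["*.flickr.com", "*.staticflickr.com"],  # illustrative SAN list
    conn_resolved_ips=["IP1", "IP2", "IP3"],
    new_host="c1.staticflickr.com",
    new_resolved_ips=["IP3", "IP4"],
))  # True
```

Under this rule the host at IP1 receives requests for origin B even though IP1 never appeared in B's DNS answer, which is the surprise described above.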
ok - assuming that's what's going on, what can you do?
1] return 421 when you get a request for an origin that you have a cert for but aren't actually authoritative for. the client will refetch it without coalescing. This is the easiest and most pragmatic thing and something you can roll out quickly. (i.e. if the request you receive is for an origin your host is not configured for, return 421 rather than 404.)
2] straighten out your certs to only list SANs that you actually are authoritative for on that host. From a security pov it's easy for an attacker to do this same kind of rerouting and your response is treated as authoritative if you have the cert for it. (i.e. MITM the unauthenticated dns)
3] your DNS is being taken as the opt-in, so you could rearrange that though it might have other suboptimalities I understand. Just putting it out there.
4] coming in the next few months to the h2 ecosystem will be the ORIGIN frame - which will allow the server to indicate at a per connection granularity what origins it would like to coalesce for (including the option of no origins). This will take over the routing function from the DNS, but the security rule would stay in place.
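Option 1 can be sketched as a status decision on the server side. This is only an illustration under stated assumptions: `status_for`, `served_origins`, and `known_paths` are hypothetical names, the authority is assumed to be a plain hostname with an optional port (no IPv6 literal), and this is not how ATS or any particular server is configured.

```python
from http import HTTPStatus


def status_for(authority: str, served_origins: set, path: str, known_paths: set) -> HTTPStatus:
    """Pick a response status for a request that arrived on this host.

    A request for an origin this host is not provisioned for gets
    421 (Misdirected Request) instead of 404, so that an HTTP/2 client
    can retry it on a different, non-coalesced connection.
    """
    host = authority.split(":")[0].lower()  # drop an optional ":port"
    if host not in served_origins:
        return HTTPStatus.MISDIRECTED_REQUEST  # 421
    if path not in known_paths:
        return HTTPStatus.NOT_FOUND  # 404: right host, unknown resource
    return HTTPStatus.OK


# A front-end that only serves www.flickr.com receives an image request
# that was coalesced onto its connection (the path is a made-up example).
print(int(status_for("c1.staticflickr.com", {"www.flickr.com"}, "/photo.jpg", {"/"})))  # 421
```

The point of the ordering is that the origin check comes before any rewrite or lookup logic, so a misrouted request never reaches the code path that produced the 302 and 404.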
does this help?
Comment 2•8 years ago
Yes, after some internal discussion, we theorized that this was possibly the "correct" (if undesirable) behavior. After looking more closely at our rotations I found the following:
1] c1.staticflickr.com pointed to a newer set of hosts [eXX.ycpi].
2] www.flickr.com pointed to a mix of older + newer hosts [rXX.ycpi + eXX.ycpi]. <--- The "IP3" overlap you mentioned
3] eXX.ycpi can handle both types of traffic. rXX.ycpi cannot.
4] So if eXX.ycpi showed up in both places, the entire set (including rXX.ycpi hosts) seems to have been marked as valid targets for c1.staticflickr.com traffic based on their being lumped behind one DNS Name with the eXX.ycpi hosts.
So it looks like while the DNS is shared for some hosts between the two pools, the traffic is being sent to hosts that are in the same pool but have different IP addresses (and not just to the servers whose IPs appear in both pools). As you point out, part of the problem is that we're being sloppy in re-using the same "mega" cert for different flavors of edge hosts, so we have rXX.ycpi hosts that claim they can handle *.staticflickr.com when in reality they cannot.
The ORIGIN frame (option 4) looks to be clearly more precise, but we would probably still suffer (in this case) from being sloppy about spreading which servers an edge node might be able to talk to, so that would likely be part of the final resolution.
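For reference, the ORIGIN frame was later specified in RFC 8336. A rough sketch of its wire format, assuming the layout from that RFC (frame type 0xc, sent on stream 0, payload made of entries with a 16-bit length followed by an ASCII origin); the helper names are hypothetical and this is not taken from any server implementation.

```python
import struct

ORIGIN_FRAME_TYPE = 0x0C  # frame type assigned to ORIGIN in RFC 8336


def encode_origin_frame(origins) -> bytes:
    """Build a complete HTTP/2 ORIGIN frame for the given origin strings."""
    payload = b""
    for origin in origins:
        raw = origin.encode("ascii")
        payload += struct.pack("!H", len(raw)) + raw  # Origin-Len, ASCII-Origin
    # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags (none),
    # then 1 reserved bit + 31-bit stream identifier (0 for ORIGIN).
    header = len(payload).to_bytes(3, "big") + bytes([ORIGIN_FRAME_TYPE, 0]) + (0).to_bytes(4, "big")
    return header + payload


def decode_origin_payload(payload: bytes) -> list:
    """Parse an ORIGIN frame payload back into a list of origin strings."""
    origins, i = [], 0
    while i < len(payload):
        (length,) = struct.unpack_from("!H", payload, i)
        i += 2
        origins.append(payload[i:i + length].decode("ascii"))
        i += length
    return origins


frame = encode_origin_frame(["https://www.flickr.com"])
print(decode_origin_payload(frame[9:]))  # ['https://www.flickr.com']
```

An empty list produces a frame with a zero-length payload, which corresponds to the "no origins" option mentioned in option 4.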
In our case, we're probably going to instead move the one type of traffic off of the rotation with the older rXX.ycpi hosts before moving the image serving traffic onto the newer rotation, thereby avoiding any way to accidentally send traffic to the older hosts (option 3). The 421 response (option 1) may or may not actually be faster to deploy in our case, but seems to be the more logically correct way to handle this situation, and is probably something that we'll ensure that ATS is returning when appropriate, whether or not it is used to fix this.
I'm going to go out on a limb and guess that other browsers are probably stricter about whether they feel comfortable re-using connections if there is a shared IP address unless they actually have it open to that specific shared IP address. Alternatively (and probably more likely), a co-worker suggests that the other browsers re-try the DNS lookup upon getting the 404. Either would explain why we saw *some* erroneous traffic from non-Firefox browsers while the problem remains overwhelmingly Firefox-specific. But, the RFC is not strict about this particular case, so caveat emptor (or some other more apropos aphorism).
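The difference between the two address checks can be illustrated with a small sketch. Everything here is hypothetical: the pool contents use `r01` and `e01` as stand-ins for the addresses of one older and one newer host, the certificate check is assumed to pass, and whether any browser actually applies the stricter rule is only the guess made above.

```python
def reuse_if_sets_overlap(conn_resolved_ips, new_resolved_ips) -> bool:
    """Looser rule: any address shared between the two DNS answers is enough."""
    return bool(set(conn_resolved_ips) & set(new_resolved_ips))


def reuse_if_peer_in_set(conn_peer_ip, new_resolved_ips) -> bool:
    """Stricter rule: the address the connection is actually open to
    must itself appear in the new host's DNS answer."""
    return conn_peer_ip in set(new_resolved_ips)


www_pool = ["r01", "e01"]  # www.flickr.com: mix of older and newer hosts
c1_pool = ["e01"]          # c1.staticflickr.com: newer hosts only
peer = "r01"               # the www.flickr.com connection landed on an older host

print(reuse_if_sets_overlap(www_pool, c1_pool))  # True: image request goes to r01
print(reuse_if_peer_in_set(peer, c1_pool))       # False: a new connection is opened
```

With the looser rule the older host receives image traffic it cannot serve; with the stricter rule reuse only happens when the open connection is to a host that is in both pools.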
Thanks for confirming FF's handling of HTTP/2 and prodding me to find the DNS overlap that I wasn't previously aware of. This helps alot (a lot). :)
[https://4.bp.blogspot.com/_D_Z-D2tzi14/S8TRIo4br3I/AAAAAAAACv4/Zh7_GcMlRKo/s400/ALOT.png]
Comment 3•8 years ago
Thanks. I'll try and remember to ping you when origin comes to fruition
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID