Closed Bug 1420777 (http2-connection-coalescing) Opened 7 years ago Closed 6 months ago

Http/2 connection reuse to non-origin server for new hostname on DNS overlap

Categories

(Core :: Networking: HTTP, task, P2)

57 Branch
task

Tracking

()

RESOLVED FIXED
126 Branch
Tracking Status
firefox126 --- fixed

People

(Reporter: patrick, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-next])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Steps to reproduce:

Set up two subdomains, A.example.com and B.example.com, with overlapping but non-identical origin servers. In my setup, A.example.com has DNS records for X and Y, while B.example.com has a DNS record for Y only. Servers X and Y both present the same certificate, which covers A.example.com and B.example.com. Server X will return an error when B.example.com is requested. I then embed an image from B.example.com on A.example.com.

Actual results:

When Firefox requests A.example.com and hits server X, it will reuse the connection for B.example.com, even though X is NOT an origin server for B.example.com.

Expected results:

I would have expected that Firefox would reuse the connection from A.example.com for B.example.com if it hit server Y (the shared server), while it wouldn't reuse it if it hit server X (which only has A.example.com).
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
Both servers share at least some DNS, and both servers assert (via tls) that they are authoritative for both A.e.c and B.e.c. This is 100% allowed per 7540. The 421 http error code is the proper way to handle this situation if you have overlapping dns and tls.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
fwiw we also support the HTTP/2 origin extension which would allow the server to disable this behavior.

See here for an explanation of how connection coalescing works in Firefox:
https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/

Alias: http2-connection-coalescing
See Also: → 1190136

(In reply to u408661 from comment #1)

Both servers share at least some DNS, and both servers assert (via tls) that
they are authoritative for both A.e.c and B.e.c. This is 100% allowed per
7540. The 421 http error code is the proper way to handle this situation if
you have overlapping dns and tls.

I would like to argue that this goes against RFC 7540. Section 9.1.1, "Connection Reuse", states:

A connection can be reused as long as the origin server is authoritative (Section 10.1). For TCP connections without TLS, this depends on the host having resolved to the same IP address.
For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.

So for https the host needs to present a valid certificate and resolve to the same IP address.

Firefox should not skip the DNS lookup. If the server operator wants to speed things up and have the browser skip the DNS lookup, they can make it explicit and use the ORIGIN frame to advertise that to the browser. It's confusing when the browser does not respect the DNS without being explicitly told to do so.

In short, I essentially agree with +bart.geesink@surfnet.nl and I think the current behaviour can be improved while still adhering to the standards.

Let me provide some details, some of which I already mentioned in bug 1634888.

1. section "9.1.1. Connection Reuse" reads:

For TCP connections without TLS, this depends on the host having resolved to the same IP address.
For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.

So let's say I have a connection 1 whose host "A" has been resolved to IP address 1.1.1.1. At this point connection 1 has a property "Host" set to "A" and a property "IP address" set to 1.1.1.1. In other words, the browser ran some sort of "resolution process" (whether it involves DNS requests, using DNS caches, or picking a random entry in a list), and the result of that process for host "A" was 1.1.1.1.

Now I'm performing another request to host "B" that is resolved to IP address 2.2.2.2, but Firefox decides to reuse connection 1. However, that connection's host has not "resolved to the same IP address" through the aforementioned "resolution process".

To me, the line "host having resolved to the same IP address" sounds like it tries to prevent exactly this kind of behaviour. That is, it says (in my words): "do not reuse a connection for a host that resolved to a different IP address; only reuse connections if their target IP addresses match". Otherwise it's hard to interpret the spirit of this sentence as saying that a connection meant for one IP address may be reused to serve a request meant for another IP address just because multiple addresses appeared somewhere in the resolution process.

For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.

I interpret the word "additionally" to mean that it has to first of all satisfy the non-TLS authority establishment and then in addition to that further validate that using TLS-related rules (i.e. "having a certificate that is valid for the host in the URI").

2. Other popular browsers like Chrome and Safari do not seem to have this issue. To quote a Chrome developer: "if Chrome has an open connection to a.website.nl which exhibited a certificate valid for b.website.nl, and a.website.nl and b.website.nl resolve to the same IP address, then Chrome will pool the request for b.website.nl to the existing connection."

If this is a possible interpretation maybe it could benefit Firefox users without sacrificing much performance (and maybe even improving performance, because we wouldn't need to send a request that's bound to fail).

3. Nginx only sends the 421 response when client certificate validation is enabled. See: http://hg.nginx.org/nginx/rev/654d2dae97d3. So one of the most popular webservers will not send a 421 response in the generic case we're discussing here.

So to work around this Firefox behaviour people would need to patch nginx. Does this mean that nginx has a bug (not following RFC 7540) or is their interpretation still legal?

4. In my original ticket 1634888 I've shared a screenshot of the Firefox network tab, which shows a request with an illegal or at least very unintuitive property combination: a request whose host cannot possibly resolve to its IP address.

To summarize, my motivation is twofold:

  • provide a performant and intuitive experience to users and developers that ideally does not require modifications to their web servers;
  • confirm if the proposed small modification is still compliant with the standards.
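
The two readings of section 9.1.1 argued over in this thread can be contrasted with a small sketch. This is only an illustrative model, not Firefox's or Chrome's actual code: the `Connection` class, both `may_coalesce_*` functions, the simplified wildcard matching, and the documentation-range IP addresses are all assumptions made for the example. It uses the setup from the original report, where A resolves to X and Y, B resolves to Y only, and the connection for A landed on X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    host: str                # host the connection was opened for
    connected_ip: str        # address the socket is actually connected to
    resolved_ips: frozenset  # every address DNS returned for `host`
    cert_names: frozenset    # names listed on the server certificate


def cert_covers(cert_names, host):
    """Exact match, or a single-label wildcard match (simplified)."""
    for name in cert_names:
        if name == host:
            return True
        if name.startswith("*.") and "." in host:
            if host.split(".", 1)[1] == name[2:]:
                return True
    return False


def may_coalesce_overlap(conn, new_host, new_host_ips):
    """Reuse when the two DNS answer sets share any address."""
    overlap = conn.resolved_ips & set(new_host_ips)
    return cert_covers(conn.cert_names, new_host) and bool(overlap)


def may_coalesce_strict(conn, new_host, new_host_ips):
    """Reuse only when the connected address is one of the new host's."""
    same_ip = conn.connected_ip in set(new_host_ips)
    return cert_covers(conn.cert_names, new_host) and same_ip


# Server X = 192.0.2.1, server Y = 192.0.2.2 (hypothetical addresses).
conn_to_x = Connection(
    host="a.example.com",
    connected_ip="192.0.2.1",
    resolved_ips=frozenset({"192.0.2.1", "192.0.2.2"}),
    cert_names=frozenset({"a.example.com", "b.example.com"}),
)
b_ips = ["192.0.2.2"]

print(may_coalesce_overlap(conn_to_x, "b.example.com", b_ips))  # True
print(may_coalesce_strict(conn_to_x, "b.example.com", b_ips))   # False
```

Under the overlap rule the connection to X is reused for B even though X is not among B's addresses; under the strict rule it is reused only when the connection happened to land on Y.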

(In reply to naktinis from comment #10)

3. Nginx only sends the 421 response when client certificate validation is enabled. See: http://hg.nginx.org/nginx/rev/654d2dae97d3. So one of the most popular webservers will not send a 421 response in the generic case we're discussing here.

It seems it's not easy for web developers to implement the 421 error code.
Dragana, do you think we should revise our connection coalescing mechanism?

Flags: needinfo?(dd.mozilla)

Just a note: I will look into this in the next couple of days.

Any updates on this? I understand if you're busy with other tasks, just wanted to check, in case this was missed by accident.

We've seen this same type of behavior for quite some time, and we just ran into a new, extremely visible recurrence of it while doing some failover testing. The interesting part is that the edge network is authoritative, as it's a CDN.

(using fake IPs in examples)
Normal production operation:
www.example.com -> resolves to the edge network POP for the location, with IP 10.1.2.3
another.example.com -> resolves to the same edge network POP IP 10.1.2.3 for the same Firefox session
These are routed differently internally based on the Host header.

Failover enacted, DNS records change (they are CNAMEs):
www.example.com now resolves to a different edge network, 192.168.1.2
another.example.com resolves to a completely different network and IP, 172.16.1.2

During failover, as stated, we change DNS for both www.example.com and another.example.com so that they resolve to two independent IPs on completely different edge network infrastructures. This process is in code and deployed manually with a trigger, so no one committed a typo changing any records. Some Firefox users accessing www.example.com end up with a connection reused to the IP hosting another.example.com; we can see that the response is the canned response that application returns when it doesn't recognize the host provided in the request.

There are other examples I can find just by sifting through our logs. There is a hostname that resolves to a number of AWS ELBs; we'll call that one host2.example.com. I see two requests from Mozilla products (one Firefox, one Thunderbird) with that host in the Host header in www.example.com's logs. These hosts resolve to IPs on completely different providers. I assume this is happening due to either some DNS caching problem in the agent or a strange connection reuse issue with that IP. I haven't been able to see it myself to do any low-level debugging.

As far as we can tell, this is only happening with Firefox (or Thunderbird, which I assume shares the same engine or codebase); we don't witness this behavior with any other agents.

Sorry, I meant to say the edge network is authoritative for both domains, the normal production network. The failover scenario where the two IPs are wholly unique is using wildcard certificates as well if that means anything.

@Jon : about this wrong behaviour after a DNS change, see https://bugzilla.mozilla.org/show_bug.cgi?id=1604286 . I've not checked recently if the issue is still there but I doubt it was fixed.

To work around this issue, I've ensured that requests with unknown vhosts return HTTP 421. Firefox will detect the issue and retry using a new connection and a valid IP. Firefox will still mix up the two connections, but will retry each time and get the valid response...
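
The core of that workaround is a per-request check of the requested authority against the hosts the server really serves. A minimal sketch of that check, assuming a hypothetical helper `status_for_authority` and a hypothetical set of served hostnames (real servers express this in their vhost configuration, and the port stripping here is simplified and does not handle IPv6 literals):

```python
from http import HTTPStatus


def status_for_authority(authority, served_hosts):
    """Return 421 for a host this server does not serve, 200 otherwise."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = authority.split(":", 1)[0].lower()
    if host in served_hosts:
        return HTTPStatus.OK
    # 421 Misdirected Request tells the client to retry on another connection.
    return HTTPStatus.MISDIRECTED_REQUEST


# Hypothetical server X from the original report: it only serves A.
served_by_x = {"a.example.com"}
print(int(status_for_authority("a.example.com", served_by_x)))      # 200
print(int(status_for_authority("B.example.com:443", served_by_x)))  # 421
```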

(In reply to pascal from comment #17)

@Jon : about this wrong behaviour after a DNS change, see https://bugzilla.mozilla.org/show_bug.cgi?id=1604286 . I've not checked recently if the issue is still there but I doubt it was fixed.

To work around this issue, I've ensured that requests with unknown vhosts return HTTP 421. Firefox will detect the issue and retry using a new connection and a valid IP. Firefox will still mix up the two connections, but will retry each time and get the valid response...

Thanks for the response, I was looking into that as a possible solution, shouldn't be too difficult.

See Also: → 1670212

Hi Kershaw,
I think marking this bug as "RESOLVED" is reasonable.
The bug report https://bugzilla.mozilla.org/show_bug.cgi?id=1420777 you quoted shows that this behaviour may violate the RFC, as pointed out by naktinis and Bart Geesink 7 months ago.
"Additionally" was specifically used to emphasize that this "trick" is not recommended as a way to "improve performance"; it also runs counter to the behavior of the current mainstream browsers, not to mention the question of whether this kind of behavior can really improve performance.

For me, at least two major websites in my country don't work properly with this "trick", while other browsers work without any problem.
So I think it may not be worth it for Firefox to do this "improve performance" stuff, which creates an extra obstacle for users.

Hi,
Sorry for typo.

I think marking this bug as "RESOLVED" is reasonable.
I do not think marking this bug as "RESOLVED" is reasonable.

In addition, it may not be a good approach to tell users that "you may contact the websites' administrators to adjust their configurations for Firefox on your own".

Hi,
After reviewing bug 1634888, submitted by "naktinis", I think this feature may be enabled mistakenly in some situations.

Actually, we only reuse connection for host B when there is an overlap of DNS addresses between A and B.

The scenario I faced is that both of these websites have the same unreachable IPv6 address "::".
Firefox then confuses the IPv6 records with the IPv4 records and enables this feature mistakenly.
Specifically, Firefox chooses and reuses an IP from a different part of the record, which also violates RFC 7540 and is more likely a bug.
In other words, the IP that Firefox chooses to reuse should lie within the part of the record that the two domains share.
Besides, IPv6 records and IPv4 records should not be treated as the same zone, because there is no guarantee that those servers have the same configuration.

Given that this is causing a lot of pain for developers we've decided to reconsider this issue.

Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Blocks: 1641696

See also bug 1710199

Flags: needinfo?(dd.mozilla)
Assignee: nobody → dd.mozilla
Severity: normal → N/A
Type: defect → task
Priority: -- → P2
Whiteboard: [necko-triaged]
See Also: → 1751400

Hey,

I can confirm that the problem persists in Firefox 91 and affects production setups described in similar bugs — if the root domain example.com has two DNS entries (for redundancy) and both have wildcard certs for *.example.com, then Firefox will incorrectly send requests meant for sub.example.com to the IP address it used to load the root domain page from, even if that IP address isn't returned for sub.example.com. Seems like quite a critical issue.

Well, I think I've already given up. HTTP2 is dead and cannot be used as long as major browsers cannot correctly deal with it. I've just kept it turned off on my web servers. Problem solved. It's not very useful or even needed anyway. HTTP3 isn't even there yet, but will probably suffer the same bugs.

(In reply to Yves Goergen from comment #35)

Well, I think I've already given up. HTTP2 is dead and cannot be used as long as major browsers cannot correctly deal with it. I've just kept it turned off on my web servers. Problem solved. It's not very useful or even needed anyway. HTTP3 isn't even there yet, but will probably suffer the same bugs.

Well, I've given up on Firefox's network stack. Can't take a browser seriously with an RFC violation this blatant.

I just ran into this again where a redirect from the root domain to a subdomain on a different server causes a 404 (because it's using the wrong address), but a forced reload fixes it (because it redoes the HTTP/2 handshake). How is this still not fixed?

Just encountered this in a very similar situation. Is there a suggested workaround short of disabling HTTP2? I'm working on a video streaming platform; losing HTTP2 entirely would be a big disadvantage. My case is...

> GET playback.livepeer.studio/STREAM_NAME.m3u8 (Resolves to 0.0.0.1, 0.0.0.2, 0.0.0.3, Firefox picks 0.0.0.2)
< 302 redirect playback-server-1.livepeer.studio (0.0.0.2)

> GET single-playback-server-1.livepeer.studio (resolves to 0.0.0.1, Firefox incorrectly uses 0.0.0.2)
< 404 Not Found (0.0.0.2)

If I run playback.livepeer.studio using a TLS cert that's only valid for that domain, and then run single-playback-server-1.livepeer.studio using a *.livepeer.studio domain, will that force a renegotiation? That could work...

I'm not completely aware of all the details in this bug report. But I experienced this as a combination of:

  • HTTP/2
  • same IPv4 address (IPv6 is ignored)
  • shared wildcard TLS certificates

Remove any of those and it might work. I've still disabled HTTP/2 on my servers and have no plans to activate it as long as a major browser can't properly handle it (or as long as the HTTP standard is broken, whichever applies). I'm using the Apache web server, if that's relevant.

Confirmed that moving the two different domains to different TLS certs works around the issue.

We're seeing issues with this as well from Firefox installations. We do not have overlapping IPv4 DNS records, and we do not have any IPv6 records as of yet. But we see numerous recent-version Firefox installations attempting to reuse HTTP/2 connections between different subdomains and the parent domain, which happen to use the same wildcard TLS certificate. We do return 421 for this case, but we'd prefer that it didn't occur at all, given that we do not have overlapping DNS records.

I will not have time to work on this bug, so I am unassigning myself.

Assignee: dd.mozilla → nobody
Assignee: nobody → valentin.gosu
Duplicate of this bug: 1796278

(In reply to Andri Möll from comment #34)

Hey,

I can confirm that the problem persists in Firefox 91 and affects production setups described in similar bugs — if the root domain example.com has two DNS entries (for redundancy) and both have wildcard certs for *.example.com, then Firefox will incorrectly send requests meant for sub.example.com to the IP address it used to load the root domain page from, even if that IP address isn't returned for sub.example.com. Seems like quite a critical issue.

I ran into this exact issue while attempting to use a particular dev site at wethecustomer.staging.example.com, which due to this bug pointed to wethecustomer.example.com instead (both IPv4). As I had previously used the production site, it pointed to the wrong IP from the very start. I got a certificate warning, as the certificate belonged to *.example.com, but I – foolishly! – just thought that the admins were lazy and hadn't put up a proper cert for the staging site. Luckily, I realised my error before causing any irreversible damage in production, and I will do more thorough background checks in the future before even considering ignoring any certificate warning.

But still, it feels critical: the only way I can use the staging site now is to fire up another profile or use a different browser entirely. It seems to be very difficult to regain access to the staging site at all after hitting this bug, and any accidental request to the production site blocks the staging site again. Also, it kind of violates my basic trust in the browser when all of a sudden the domain in the address bar doesn't point to the IP that the browser's very own DNS cache says it points to.

I think you can disable coalescing by setting network.http.http2.coalesce-hostnames to false
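
For anyone who wants to try that pref without clicking through about:config, one option is a `user_pref` line in the profile's `user.js` file, which Firefox reads at startup. The sketch below only generates and appends that line. The pref name is the one suggested in the comment above; the helper names and the profile path in the usage comment are assumptions for illustration.

```python
import json
from pathlib import Path

# Pref name as suggested in this thread.
COALESCE_PREF = "network.http.http2.coalesce-hostnames"


def user_pref_line(name, value):
    """Format one user.js entry, e.g. user_pref("some.pref", false);"""
    # json.dumps gives a double-quoted string and a lowercase true/false.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def append_user_pref(profile_dir, name, value):
    """Append the entry to <profile_dir>/user.js, creating it if needed."""
    user_js = Path(profile_dir) / "user.js"
    with user_js.open("a", encoding="utf-8") as fh:
        fh.write(user_pref_line(name, value) + "\n")
    return user_js


print(user_pref_line(COALESCE_PREF, False))
# Hypothetical usage, with the path of your own profile directory:
# append_user_pref("/path/to/firefox/profile", COALESCE_PREF, False)
```

This only changes the local browser, so it helps with testing and debugging rather than with visitors to an affected site.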

(In reply to Valentin Gosu [:valentin] (he/him) from comment #46)

I think you can disable coalescing by setting network.http.http2.coalesce-hostnames to false

Confirming that this solves the issue for me. Thank you very much!

And when you reopened this issue you also wrote that ‘this is causing a lot of pain for developers’, so I hope that this will help others as well, until this gets a proper fix.

Maybe this could be helpful for some, but in general, that setting is close to a non-fix. I'm not running the web server for myself, it's for everybody. I can't phone the world and instruct them to configure their Firefox so that they're able to access my web server. If that setting helps, it must be a default for any new and existing Firefox installation to be helpful.

The main intention behind the introduction of that setting was obviously proxy configurations:
// if we are doing spdy coalescing and haven't recorded the ip address
// for this entry before then make the hash key if our dns lookup
// just completed. We can't do coalescing if using a proxy because the
// ip addresses are not available to the client.

It is required though that the Firefox developers realize that this is a generic problem. The optional setting is not a solution for the totally broken default behaviour.

I just encountered this while migrating a large site between two CDNs (in a gradual, subdomain-by-subdomain process that matches the failover scenario described in https://bugzilla.mozilla.org/show_bug.cgi?id=1420777#c15). The result was surprising both to my team and our contacts at the CDN -- I don't think any of us were aware of features that could short-circuit DNS resolution like this (even once connection coalescing did come to mind as a possibility and we checked the spec).

(Thanks, and sorry to bump, I just wanted to highlight another real-world case where the current implementation results in user-facing breakage and surprising behavior.)

Status: REOPENED → ASSIGNED
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-new]
Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/83ed667f93bb Coalesce connections less aggresively r=necko-reviewers,kershaw

Backed out for causing xpcshell failures in test_connection_coalescing.js.

Flags: needinfo?(valentin.gosu)
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/85b0eb5a028f Coalesce connections less aggresively r=necko-reviewers,kershaw
Flags: needinfo?(valentin.gosu)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 126 Branch
Duplicate of this bug: 1890640