Closed Bug 1634888 Opened 5 years ago Closed 5 years ago

Wrong DNS resolution / mismatch between about:networking and network tab

Categories

(Core :: Networking, defect)

75 Branch
defect
Not set
normal


RESOLVED DUPLICATE of bug 1420777

People

(Reporter: naktinis, Unassigned)

Details

Attachments

(1 file)

Attached image Firefox-dns-problem.png

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

I noticed that for a while half of the images on our web app get resolved to the wrong IP address. Our content is distributed among two data centers served from two different subdomains. At first I thought maybe it's a misconfiguration of our DNS entries, but it appears that's not the case.

Actual results:

When I go to about:networking#dnslookuptool and resolve the domain vv3-0.muse.ai it gets resolved (and cached for 180s) correctly.

When I make a request to it, let's say by navigating to it via the address bar, or clicking "Edit and Resend" in the network tools it goes to the wrong address. DNS cache is ignored.

In more detail:

  1. Our main domain muse.ai resolves to two IP addresses: A and B
  2. Data server vv3-0.muse.ai resolves to one of these addresses: A
  3. Data server vv3-1.muse.ai resolves to the other one: B
  4. When I resolve vv3-0.muse.ai with about:networking#dnslookuptool it resolves correctly to A
  5. I've inspected DNS responses in wireshark and mac console (mdnsresponder) and it always resolves correctly to A (and only A); nslookup and dscacheutil always resolve correctly to A (and only A)

However, here's the problem:
  6. When any vv3-0.muse.ai URL is fetched by the browser (navigating, "Edit and Resend", fetching images referenced in HTML, etc.), the request goes to server B
  7. Server B responds with a 301, but this is not a response-caching artifact (the 301 may matter for other reasons that I don't know about): when I use "Edit and Resend", for example, I actually see the request hitting server B
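The resolver-side checks in steps 4 and 5 can be repeated with a short script. This is a minimal sketch: the function names `resolved_addresses` and `check_expected` are made up for illustration, and it queries the operating system resolver through `socket.getaddrinfo`, not Firefox's internal DNS cache, so it can only confirm what the system returns.

```python
import socket


def resolved_addresses(host, port=443, getaddrinfo=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses the resolver reports for host."""
    infos = getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_expected(host, expected, **kwargs):
    """Compare the resolver's answer for host against the addresses we expect."""
    actual = resolved_addresses(host, **kwargs)
    return {
        "host": host,
        "resolved": actual,
        # Addresses the resolver returned that are not in the expected set.
        "unexpected": sorted(set(actual) - set(expected)),
    }
```

Calling `resolved_addresses("vv3-0.muse.ai")` should list only address A if the system resolver agrees with the DNS lookup tool; the `getaddrinfo` parameter exists only so the function can be exercised without network access.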

Expected results:

  1. Resolving via "DNS Lookup" in about:networking#dnslookuptool and resolution by the browser should result in the same IP.
  2. Firefox should not resolve to an IP address that is not returned by the DNS for that domain.
  3. If the DNS resolution is cached, as indicated in about:networking#dns, that cache should be respected.

Note:
I tried deleting Firefox profiles.
I tried using "Forget this site" functionality in History.
I tried using "Disable Cache" in the network tab.
None of these helped.
I've attached a screenshot indicating the correct resolutions in the about:networking and incorrect resolution reported in the network tab (and the request is actually performed to the wrong server as indicated in server logs).

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Networking
Product: Firefox → Core

This looks like the connection coalescing mechanism in Firefox.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE

You resolved it as a duplicate of a closed bug report, but this bug is still unresolved. What is the proper course of action here? Reopen the old bug report?

Please see https://bugzilla.mozilla.org/show_bug.cgi?id=1420777#c1.
I'm afraid we might not want to change this behavior.

BTW, I can't reproduce this. Maybe because I don't have an account for muse.ai? Could you provide the steps to reproduce?
Thanks.

Flags: needinfo?(naktinis)

I created an isolated example where I can reproduce this. You may need to reload a couple of times with Cmd+Shift+R and maybe disable the network cache, but a colleague of mine can also reproduce it so it's not just me (one of the images won't show up depending on which resolution was picked):
https://muse.ai/firefox-dns

I read the report 1420777, the comment you're referring to, and RFC 7540, but I still don't think this is resolved (or at the very least, it is fixable while still adhering to the standards and without noticeably sacrificing performance).

1. section "9.1.1. Connection Reuse" reads:

For TCP connections without TLS, this depends on the host having resolved to the same IP address.
For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.

So let's say I have a connection 1 whose host "A" has been resolved to IP address 1.1.1.1. At this point connection 1 has a property "Host" set to "A" and a property "IP address" and it is set to 1.1.1.1. In other words the browser ran some sort of "resolution process" whether it involves DNS requests, using DNS caches, picking a random entry in a list and the result of that process for host "A" was 1.1.1.1.

Now I'm performing another request to host "B" that is resolved to IP address 2.2.2.2, but Firefox decides to reuse connection 1. However, that connection's host has not "resolved to the same IP address" through the aforementioned "resolution process".

To me, the line "host having resolved to the same IP address" sounds like it is meant to prevent exactly these kinds of bugs. That is, it says (in my words): "do not reuse connections that, despite having the same host, resolved to a different IP address; only reuse connections if their target IP addresses match". Otherwise it's hard to interpret the spirit of this sentence as saying that a connection meant for one IP address may be reused to serve a request meant for another IP address just because multiple addresses appear somewhere in the resolution process.

For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.

I interpret the word "additionally" to mean that it has to first of all satisfy the non-TLS authority establishment and then in addition to that further validate that using TLS-related rules (i.e. "having a certificate that is valid for the host in the URI").

2. For me it just sounds rational to only reuse connections whose target IP addresses match.

3. In my screenshot the network tab is just incorrect. You see the URL (and the Host header not shown in the screenshot) and the Host header does not match the IP address shown. There is no interpretation where this Host is resolved to this IP.

4. I've never seen this in other browsers that I use a lot for testing, like Chrome and Safari, and I have been seeing it in Firefox every day for months.

5. Even assuming that your interpretation of the standard is correct, isn't there another interpretation (say the one that Chrome uses) that is still correct, but also avoids sending requests to IP addresses that the hostname does not actually resolve to?
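The stricter reading argued for in points 1 and 2 can be written down as a small predicate. This is only a sketch of that interpretation, not any browser's implementation: the `Connection` type, the function names, the certificate names, and the addresses (192.0.2.1 standing in for A, 192.0.2.2 for B) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    host: str              # host the connection was originally opened for
    remote_ip: str         # IP address the socket is actually connected to
    cert_names: frozenset  # names on the server certificate (assumed for this example)


def cert_covers(cert_names, host):
    """True if host matches a certificate name exactly or via a single-label wildcard."""
    for name in cert_names:
        if name == host:
            return True
        if name.startswith("*.") and host.split(".", 1)[1:] == [name[2:]]:
            return True
    return False


def can_reuse_strict(conn, host, resolved_ips):
    """Reuse only if host resolved to the connection's IP AND the certificate covers host."""
    return conn.remote_ip in resolved_ips and cert_covers(conn.cert_names, host)


# Placeholder documentation addresses: A = 192.0.2.1, B = 192.0.2.2.
conn_to_b = Connection("muse.ai", "192.0.2.2", frozenset({"muse.ai", "*.muse.ai"}))

print(can_reuse_strict(conn_to_b, "vv3-1.muse.ai", {"192.0.2.2"}))  # True
print(can_reuse_strict(conn_to_b, "vv3-0.muse.ai", {"192.0.2.1"}))  # False
```

Under this rule a connection that ended up on B would never carry a request for vv3-0.muse.ai, because B is not among the addresses that host resolves to.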

Flags: needinfo?(naktinis)

(In reply to naktinis from comment #5)

> I created an isolated example where I can reproduce this. You may need to reload a couple of times with Cmd+Shift+R and maybe disable the network cache, but a colleague of mine can also reproduce it so it's not just me (one of the images won't show up depending on which resolution was picked):
> https://muse.ai/firefox-dns
>
> I read the report 1420777, the comment you're referring to, and RFC 7540 but I still don't think this is resolved (or at the very least is fixable while still adhering to the standards and not sacrificing performance noticeably).
>
> 1. section "9.1.1. Connection Reuse" reads:
>
> For TCP connections without TLS, this depends on the host having resolved to the same IP address.
> For "https" resources, connection reuse additionally depends on having a certificate that is valid for the host in the URI.
>
> So let's say I have a connection 1 whose host "A" has been resolved to IP address 1.1.1.1. At this point connection 1 has a property "Host" set to "A" and a property "IP address" and it is set to 1.1.1.1. In other words the browser ran some sort of "resolution process" whether it involves DNS requests, using DNS caches, picking a random entry in a list and the result of that process for host "A" was 1.1.1.1.
>
> Now I'm performing another request to host "B" that is resolved to IP address 2.2.2.2, but Firefox decides to reuse connection 1. However, that connection's host has not "resolved to the same IP address" through the aforementioned "resolution process".

Actually, we only reuse connection for host B when there is an overlap of DNS addresses between A and B.
In the case of this bug, the connection to vv3-0.muse.ai is coalesced to the connection to muse.ai, not the connection to vv3-1.muse.ai.
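The overlap rule described in this comment can be modelled in a few lines. This is a simplified sketch of the behaviour as stated here, not Firefox's actual code: the `DNS` table, the function names, and the addresses (192.0.2.1 for A, 192.0.2.2 for B) are illustrative assumptions, and the certificate check is left out.

```python
# Assumed DNS answers from the report: A = 192.0.2.1, B = 192.0.2.2 (placeholders).
DNS = {
    "muse.ai": {"192.0.2.1", "192.0.2.2"},
    "vv3-0.muse.ai": {"192.0.2.1"},
    "vv3-1.muse.ai": {"192.0.2.2"},
}


def dns_overlap(host_a, host_b, dns=DNS):
    """Addresses that appear in the DNS answers of both hosts."""
    return dns[host_a] & dns[host_b]


def coalescing_target(new_host, open_connections, dns=DNS):
    """Return the first open (host, remote_ip) whose host shares a DNS address
    with new_host, or None if no open connection qualifies."""
    for conn_host, remote_ip in open_connections:
        if dns_overlap(conn_host, new_host, dns):
            return conn_host, remote_ip
    return None


# A connection to muse.ai that happened to land on B.
open_connections = [("muse.ai", "192.0.2.2")]

# vv3-0.muse.ai shares address A with muse.ai, so it is coalesced onto that
# connection even though the socket is connected to B.
print(coalescing_target("vv3-0.muse.ai", open_connections))  # ('muse.ai', '192.0.2.2')

# With only a connection to vv3-1.muse.ai open there is no overlap, so no reuse.
print(coalescing_target("vv3-0.muse.ai", [("vv3-1.muse.ai", "192.0.2.2")]))  # None
```

The first call shows how the reported symptom can arise: the overlap is computed on the hosts' DNS answers, not on the address the existing connection is actually using.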

Anyway, I'll ask my colleague in bug 1420777 to see if we need to revise our connection coalescing algorithm.
