Closed Bug 861273 Opened 11 years ago Closed 10 years ago

DNS cache stays around for way too long (network.dnsCacheExpirationGracePeriod) - tough on webmasters

Categories

(Core :: Networking: DNS, defect)

20 Branch
x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 981513

People

(Reporter: mattreportsbugs, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130311191226

Steps to reproduce:

Background: I decided to move my website http://redrockswingdance.com for various reasons. Everything went well, and I removed the old account. I noticed my two firefox browsers (on different machines) were still directing themselves to the old site, but that was expected, as I know firefox caches dns ip lookups. I used my devel firefox browser (different user on same machine) to load the website and work on it. I changed the dns records, and checked them through dnsstuff.org. All was fine, and I thought no more about it.

The site is http://redrockswingdance.com (this is for reproducing it, down below)
Old ip: 31.170.166.97
New ip: 31.170.163.237

Problem: Three days passed. My two main browsers were still up. I had not closed them. I had not used them to access the website during that time. When I tried to, I was redirected to the old host, and received an "Account Unavailable" page (with a normal 200 response) instead of the actual website. The "Account Unavailable" page was expected from the host as I had closed the account. It was *not* expected that I was still being directed to that host. I was expecting the new host with the actual website.

I checked everything I could think of:
- nslookup returned the correct address 31.170.163.237
- I restarted firefox. Same problem.
- I restarted the computer, just in case. Same problem.
- I traced the network with wireshark. Firefox was looking up the records for http://redrockswingdance.com, and getting the right response of 31.170.163.237, but it was still going to the old one.
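
The nslookup check above can be scripted. The following is a minimal sketch, not part of the original report: it asks the operating system's resolver for a hostname and compares the answer with the address you expect. The function names and the injectable `resolver` parameter are assumptions made for illustration. It only shows what the system resolver returns; it cannot inspect firefox's internal DNS cache.

```python
import socket


def system_resolve(hostname):
    """Return the set of IPv4 addresses the OS resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def check_dns(hostname, expected_ip, resolver=system_resolve):
    """Compare the resolver's answer for hostname with expected_ip."""
    addresses = resolver(hostname)
    return {
        "hostname": hostname,
        "addresses": sorted(addresses),
        "matches": expected_ip in addresses,
    }
```

Calling `check_dns("redrockswingdance.com", "31.170.163.237")` performs a live lookup. If it reports a match while the browser still reaches the old address, the stale answer is coming from the browser rather than from the system resolver.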

Both firefoxes were set on directing me to the old site.

What helped? *Clearing the cache.* Then I was able to get to the right one.

Steps to reproduce (you may use my site and ip addresses above, or your own):
1) Get firefox to cache the ip of a site that you can change later. (Browse to an internal site you own, map my site to the old ip address using your hosts file, insert a dns record directly into Firefox's cache, etc.)
2) Change the record externally. (change hosts file, or if you inserted it into the cache you can ignore this step)
3) Change Firefox's time to three days in the future. (In real life, when a dns change happens, after three days almost all web dns servers will have the updated record.)
4) Browse to the website again, using firefox.
5) Firefox will still be going to the old site.
5b) Restart firefox, restart computer, wonder what is going on :-)
5c) Try to access the site again.
6) Clear firefox's cache.
7) Firefox will now be going to the new site (you may have to restart it once more).

(Note, I asked my old host to have the web server respond with a 404 response for http://redrockswingdance.com, so firefox will be forced to do a lookup. They may do it, they may not. Just FYI.)

My two cents: I can see this being a big problem for webmasters, and their most loyal visitors. People who browse their site frequently will be unable to reach the site if it gets suddenly moved. And not just for a few days, like what happens when the dns records are changed. But for up to 30 days!

That's harsh. Very harsh. By then people would assume the site had died. And relying on hosting providers to "do the right thing" is putting too much power in their hands. What happens if they sell out and the new owner puts up ad pages for each website? Or one could even wonder if this will allow for hosting provider "lock-ins", i.e. "Move your site and risk losing visitors, as many may not be able to reach you for a month."

Yes, I know those are extreme cases, but in my case I had to post to our facebook page informing users what to do if they could not reach the site. I shouldn't have to do that. This is terrible to say, but in this case I'm glad most of them use IE and Chrome. I can feel my gut wrench as I type that.

network.dnsCacheExpirationGracePeriod should be lowered to 3 days (259200), if not lower. Perhaps that's what the dev originally wanted, and just accidentally added a 0.
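
For reference, the arithmetic behind those numbers, as a small sketch (the helper name `days_to_seconds` is made up for illustration): three days is 259200 seconds, and thirty days is the same figure with one more zero.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days):
    """Convert a number of days to seconds, the unit the value above is quoted in."""
    return days * SECONDS_PER_DAY


three_days = days_to_seconds(3)    # 259200
thirty_days = days_to_seconds(30)  # 2592000
print(three_days, thirty_days)
```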

http://dxr.mozilla.org/mozilla-central/netwerk/dns/nsHostResolver.cpp#l539

Ranting: And let's not even talk about dns poisoning. A single poisoning lasting 30 days? It would be limited, but still. Poison google.com lookups at the right time and they'd be redirectable for all their lookups for an entire month (since lots of people type website addresses into google search and go wherever it takes them).

Another issue is those who travel will keep on being directed to the same server, even when one may be closer to them, now that they are in Asia or Europe rather than the US, or vice versa.

And I'm still confused why it was looking up dns names and then discarding them... but that will have to be a different bug.



Actual results:

I tried to reach my website on the new host, but was taken to the old host. As it did not return an error, firefox did not recheck the dns.


Expected results:

As it had been a few days, firefox should have rechecked the dns record to verify the site was really still there.
Component: Untriaged → Networking: DNS
Product: Firefox → Core
+1 on this. 

https://forums.aws.amazon.com/thread.jspa?messageID=472374&#472374

We have found this issue to be affecting us (and our Firefox users), as we often (sometimes daily) update one of our CNAMEs used. 

Experienced on Firefox 22 on Linux (Fedora 18) x86_64
Firefox > 18 on Mac OSX (awaiting confirmation of firefox & osx version)
Firefox > 20 on Windows 7 (awaiting version confirmation)

Here is a summary of what happened when the issue was reproduced on my workstation.

1. DNS cname for www.oursite.tld set on Friday 19 July 2013 to lb1.someservice.tld (a load balancer)
2. Test www.oursite.tld - working as expected.
3. Monday 22 July: Update cname to a new hostname - lb2.someservice.tld. Update lb1.someservice.tld to serve new content (i.e. we don't have record of whether firefox was serving content from lb1 or lb2's IP address)
4. Tuesday 23: update cname to lb3, and update lb1 and lb2 to use same backend servers as lb3.
5. Wednesday 24th: update cname to lb4, and lb2 and lb3 update backend server.
6. Wednesday 24th, 2 hours after cname update from step 5: shut down the load balancer and delete the DNS record for lb1.someservice.tld. firefox is unable to load page. Firebug shows "Connection aborted", with our hostname, and the IP address of lb1.someservice.tld, from 5 days earlier.

During this window, Firefox was not restarted. The TTL for www.oursite.tld is 300 seconds (5 minutes). The TTL on our service provider's load balancers is 60 seconds (1 minute).

7. Close firefox and reopen. Visit www.oursite.tld. Site loads as normal, using the correct IP address.

Firefox has Firebug plugin installed.
I don't see any progress on this bug, so I'm adding my 2 cents. I work on a major website and this bug has a big impact on our FF customer base. We release code to a new set of servers (with a new IP address) and route users to the new IP via DNS updates. A week after the cutover we still see numerous (1000s of) FF users hitting the old site. Something is not right here.

The majority of the FF users are on version 26/27, though that could be because those are the latest versions at this time.
also see bug 709976
I just hit this problem today too. I am webmaster of a 300+ page website which I just upgraded, which involved migrating it to a new server and a new IP address. I found that while addressing [mywebsite].co.uk worked just fine shortly after the migration, addressing www.[mywebsite].co.uk resulted in the error message "Bad Request (Invalid Hostname)". At first I thought this was a temporary problem with public DNS caches, but then I discovered that there was no problem if I visited www.[mywebsite].co.uk using Chrome or IE, or when pinging it. I eventually traced the problem to my own Firefox internal DNS cache. I just had to close and restart Firefox to be able to reach the URL correctly in Firefox too, but I shouldn't have to do that. 

I am running FF v27.0.1. I have now changed network.dnsCacheExpirationGracePeriod to 3 hours instead of 30 days in my own FF settings, but I reckon this internal FF DNS cache by default should not be kept for any longer than external DNS caches, i.e. around 4 hours, and it may well cause problems for users who are in the habit of keeping their copy of FF running for long periods.
firefox 29 has this code reverted. ff29 will go to the beta channel next week.

The code expects that a migrated website will signal that through a TCP failure to the old IP address (if you can do that, you can work around the issue). It didn't foresee the virtual hosting case you describe here with the http 4xx or the TLS bad cert name equivalent.
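
To illustrate the behaviour described above, here is a toy model; it is not the actual nsHostResolver code and it ignores any background re-resolution. It assumes a cache that keeps handing out an expired entry during a grace period and drops it early only when the caller reports a connection failure. The class and method names (`GraceCache`, `report_connect_failure`) and the `"old-ip"` placeholder are hypothetical.

```python
class GraceCache:
    """Toy DNS cache: entries outlive their TTL by a grace period."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self._entries = {}  # hostname -> (ip, expiry_time)

    def store(self, hostname, ip, ttl, now):
        self._entries[hostname] = (ip, now + ttl)

    def lookup(self, hostname, now):
        """Return the cached IP, or None if missing or past TTL plus grace."""
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expiry = entry
        if now > expiry + self.grace_seconds:
            del self._entries[hostname]
            return None
        return ip  # may be stale: TTL expired but still within grace

    def report_connect_failure(self, hostname):
        """In this model, a TCP-level failure is the only early eviction signal."""
        self._entries.pop(hostname, None)


DAY = 86400
cache = GraceCache(grace_seconds=30 * DAY)
cache.store("www.oursite.tld", "old-ip", ttl=300, now=0)

# Five days later the 300 s TTL is long gone, but the entry is still served.
stale = cache.lookup("www.oursite.tld", now=5 * DAY)

# An HTTP 4xx from a shared host is not a TCP failure, so nothing is evicted.
# Only an explicit connection failure clears the entry.
cache.report_connect_failure("www.oursite.tld")
after_failure = cache.lookup("www.oursite.tld", now=5 * DAY)
```

Under these assumptions, a shorter grace period (say 3 hours) would make the same five-day-old lookup miss the cache and force a fresh resolution, with no connection failure needed.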
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE