Closed Bug 151929 Opened 23 years ago Closed 9 years ago

DNS: TTL (time-to-live) support in DNS cache

Categories

(Core :: Networking, defect)

defect
Not set
major

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: david, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: helpwanted, Whiteboard: [dns])

Would it be possible to allow the DNS cache in Mozilla to honor DNS-provided TTL values? The fixed (5-minute?) cache timeout value can conflict with DNS-based load-balancers or fail-over configurations, which prefer a much shorter TTL value that they express through DNS. By not letting this TTL value be expressed through the browser's DNS cache, a fail-over scenario that changes a site's IP address could leave users unable to access the site again until they restart their browser or allow the browser's cached IP address to be expired. It may affect performance for those sites, but that's probably a decision they're comfortable with. Sites that choose to use a longer TTL will enjoy less DNS traffic as they do today with the fixed cache. Indeed, if the TTL value is sufficiently large, would we necessarily want to limit it to a smaller value?
WONTFIX: apparently the OS API's don't expose TTL...
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Summary: Let DNS cache honor DNS TTL values → DNS: honor TTL values
The OS API may not expose the TTL, but it expresses it directly by caching the answers for the TTL. So the answer is to ask the OS every time (or at least every so often, e.g. every 10 seconds for performance reasons) when a name -> IP address mapping is required. This bug recently bit us when we moved our mail server: users got confusing "connection refused by imap server" messages, because Mozilla was caching the old IP address. Even Outlook didn't exhibit this behaviour. We had to tell all Moz users to shut down all instances of Mozilla (not just Mozilla mail) and restart. Poor. With the exception of some shoddy commercial web caches, pretty much everything seems to respect TTLs.
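The "ask the OS every so often" idea from the comment above can be sketched as a cache whose entries live for a fixed, short time. This is only an illustration, not Mozilla's resolver code: the class name, the 10-second default and the injectable `resolve`/`clock` parameters are assumptions made for the example.

```python
import socket
import time


class ShortLivedDnsCache:
    """Keeps each answer for at most max_age seconds, then asks the OS again."""

    def __init__(self, max_age=10.0, resolve=None, clock=time.monotonic):
        self.max_age = max_age
        self._resolve = resolve or self._system_resolve
        self._clock = clock
        self._entries = {}  # hostname -> (expiry time, list of addresses)

    @staticmethod
    def _system_resolve(hostname):
        # getaddrinfo returns addresses only; it does not report the record TTL.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return [info[4][0] for info in infos]

    def lookup(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh, no OS call needed
        addresses = self._resolve(hostname)
        self._entries[hostname] = (now + self.max_age, addresses)
        return addresses
```

With a small `max_age`, a renumbered host is picked up after at most that many seconds, while bursts of lookups for the same name still hit the in-process cache.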
REOPEN: There is supposed to be behavior where a cached entry would re-request if it failed to connect. I'm changing your summary to address the real-world problem you described (since I think we already had a TTL bug filed). Just to clarify: most OSes do NOT cache DNS (Mac OS X does, Solaris has nscd (a system service)). Mozilla does do some DNS caching, so that is where we should check for the problem you had.
Status: RESOLVED → UNCONFIRMED
Keywords: testcase
Resolution: WONTFIX → ---
Summary: DNS: honor TTL values → DNS: caching should re-request if hostname is renumbered
This is what happened: Thursday afternoon, mail server functions are moved to a different machine (DNS TTL is set to 300 seconds); the change is still in progress when I leave the office. Tuesday morning, I get back into the office and a dialogue box is up: "can't connect to server imap.mail.digitalbrain.com - connection refused". I dismiss the dialogue and hit "get mail"; the same dialogue reappears. I use tcpdump to see what is happening, and a connection is being attempted to the old IP address (no NS lookups, or traffic to the new host). Typing "host imap.mail.digitalbrain.com" on the console of the same machine gave the correct response. This Linux box is running nscd, as it happens, but I think this is immaterial (I will test to confirm this tomorrow - I don't think host uses glibc gethostbyname, BICBW). Is the strategy of only looking up on failure to connect correct? I would argue it isn't. As a systems admin, when I set a TTL, I expect it to be honoured (all the nameservers on the internet seem to, as well as pretty much all apps); at the very least, it breaks the "principle of least astonishment". What happens if I repurpose the original box? E.g. I move handling of a domain (mail or http) from one machine to another? Then you've got to pass down application-level errors (assuming you can even detect incorrect behaviour at the application level) to the underlying lib, and get it to reconnect. Yuck. This sort of looks like moz trying to fill in deficiencies of the underlying OS. Is this really necessary or even desirable? In the current implementation, moz is trading correctness for speed on poorly performing OSes, and breaking ones which perform well. Also, is it really gaining much? On this box NS lookups are cached at the NAT server, and also at the DSL router, and by nscd (I'm not some sort of speed tweak fetishist - this is just the way things happen to be set up here, for other reasons).
This behaviour was observed on moz from a nightly build (approx a week old), on Debian/Woody, with 2.4.18 (and nscd running). Connection was IMAP over SSL.
Another option is to use an external resolver library, such as this: http://www.chiark.greenend.org.uk/~ian/adns/ instead of the C library. This is obviously a big change tho'.
BTW, having checked, nscd is set to not cache host lookups by default on Debian (and I would assume other Linux distros):

    # .... The mechanism in nscd to
    # cache hosts will cause your local system to not be able to trust
    # forward/reverse lookup checks. DO NOT USE THIS if your system relies on
    # this sort of security mechanism. Use a caching DNS server instead.
    enable-cache hosts no
-> NEW Finally got around to looking at some DNS cache problems, and there is no TTL bug. I've changed the summary back; yes, you are the first filer to discuss this. re #2, not all OSes we support cache DNS, so we'd need to implement different cache logic for the OSes that do TTL-aware DNS caching. If people think this is viable, we should discuss this HERE. If not, then as far as I can tell, TTL support should be WONTFIX, as gordon marked it before. The RFCs on DNS probably say something about this, so I (or someone else) should probably research that. re #5, we are having some other DNS problems that might require an external resolver library to solve, so that might be another interesting bug to file. I try to keep the conversations in bugs focused, but it seems worth mentioning here that originally we implemented DNS caching because of performance concerns, especially for users who have a long round trip to distant DNS servers or slow, poorly connected DNS servers. Since then we had to change caching again, for security reasons. The new version basically has an infinite cache. Bug 16287 is where we discuss finding a solution that is more flexible and less monolithic.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: testcase
Summary: DNS: caching should re-request if hostname is renumbered → DNS: TTL (time-to-live) support in cache
Ben: bug 16287 does not appear to be the bug of which you speak, did you maybe miss a number?
bug 162871. Sorry!
Keywords: helpwanted
Target Milestone: --- → Future
Blocks: 61683
May I suggest that DNS caching in Mozilla be optional under the caching options? A simple if wrapper around each DNS cache lookup to ignore the cache if the user doesn't want DNS caching should be sufficient. Shipping with caching enabled, and then suggesting that it be disabled in situations like those mentioned that need it off, would seem like the best solution to me.
michael: that's not a bad idea. i think it might be very handy in some cases to be able to tell mozilla to just not bother caching any DNS results. the default should be to cache DNS lookups.
I've submitted bug 188505 w.r.t. the preferences option I suggested in comment #10.
fwiw (probably nothing), you can force mozilla to flush its cache by going to file>work offline twice.
Hmm, this seems to me to be making the behaviour suspect by default, with a tick box to behave well? :-( I would speculate that 99.9% of users will just see broken pages, and not have the necessary knowledge to even guess that it might be a DNS caching problem. I think just imposing an arbitrary low TTL (say 2 mins), instead of caching forever, would be a better fix than going to the hassle of adding a GUI mechanism to control caching. My justification for some low number of minutes of TTL as a default is that one DNS lookup every x minutes is really not going to be noticeable when compared with HTTP (or any other TCP protocol) traffic, and at many sites the upstream DNS mechanisms (local OS, local network DNS servers, broadband modem, or ISP) will cache the result if appropriate, so the round trip for the DNS request will probably be so close to nothing as to not matter. Long-term DNS caching gives quite a high level of breakage (quite a few support calls in my experience of running our own web site, with users using moz within the company) for very little benefit!
Comment #14 above looks more like a comment against my bug 188505 than this one. That said, no matter what we do to make DNS caching more accurate in the browser, I believe the user should be able to turn it off, just like disk or memory caching. Please comment on that bug in its bug page. I don't think that lowering the TTL for the DNS cache is the right answer either, btw; lowering it too much removes any benefit of the code even being there, and if it's too large (as it may be now), it causes more errors. The right answer seems to be to get the TTL for the domain at least from the SOA, or to not cache the DNS entries automatically on operating systems where DNS requests are cached already (Linux at least).
I would suggest some benchmarks for latency (preferably with various different connectivity types) to see if the current caching behaviour is a useful optimisation at all. It might also be worth looking at what some other browsers do. E.g. where on this line is it appropriate to be?

    correct, but slower <------> broken, but faster

At the moment moz is at the far right. Here are some questions:
. Do HTTP keep-alives make DNS result caching of little benefit?
. Does moz currently load all content for a given page down a single TCP connection?
. For a typical browsing session, what is the effect of setting different TTLs for moz's internal cache? (My guess is that setting an arbitrary TTL of 1 min will give >50% of the performance benefit of infinite caching)
I am unlikely to get a chance to do these tests any time soon, as I don't have a Windows machine here, and I'm off on holiday at the end of the month.
What would be more useful is a statistical analysis of people's browsing habits (including server info for visited sites). This isn't likely available in any morally upright way, but it would tell us how long people visit a site for, how long between their clicks, whether the server keeps the connection alive for them, how many connections are made to machines within the same SOA even if they have different records, and how many of those use round-robin DNS. PS, for more DNS caching info, please consider seeing Dan Bernstein's info collected re: writing the 'dnscache' program:
http://cr.yp.to/djbdns/intro-dns.html (How DNS really works)
http://cr.yp.to/djbdns/resolve.html (How resolution takes place)
http://cr.yp.to/djbdns/notes.html (Important notes re: DNS and caching)
http://cr.yp.to/djbdns/forgery.html (How DNS forgery is avoided in caches)
*** Bug 197674 has been marked as a duplicate of this bug. ***
gordon: since the application cannot access TTL, shouldn't we WONTFIX this bug? People keep asking for TTL support, while more feasible solutions are discussed in other bugs.
Summary: DNS: TTL (time-to-live) support in cache → DNS: TTL (time-to-live) support in DNS cache
> since the application cannot access TTL, shouldn't we WONTFIX this bug
It could access TTL, if it used a resolver library, such as adns. http://www.chiark.greenend.org.uk/~ian/adns/
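To illustrate why a resolver library can see the TTL when getaddrinfo cannot: the TTL is a field of every resource record in the DNS wire format (RFC 1035), so code that parses the response itself can read it. The following is a minimal standard-library sketch, not adns and not anything Mozilla ships; the function names are made up for the example, and it assumes a well-formed response.

```python
import struct


def _skip_name(packet, offset):
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1  # root label terminates the name
        if length & 0xC0 == 0xC0:
            return offset + 2  # a compression pointer is two bytes and ends the name
        offset += 1 + length


def answer_ttls(packet):
    """Return the TTL in seconds of every record in the answer section."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", packet, 0)
    offset = 12  # fixed-size DNS header
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    ttls = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        _rtype, _rclass, ttl, rdlength = struct.unpack_from("!HHIH", packet, offset)
        ttls.append(ttl)
        offset += 10 + rdlength  # fixed record fields plus RDATA
    return ttls
```

A cache built on such a parser could expire each entry after the smallest TTL among the answers, instead of after a fixed interval.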
I think this issue should be delegated to the operating system. Mozilla shouldn't cache DNS entries, since this work belongs in the OS/DNS resolver. We should rely on it.
jesus: mozilla's dns cache is an optimization (in many cases).
Darin, which OS doesn't cache DNS results? In a correctly behaving OS, keeping an internal Mozilla DNS cache shouldn't buy anything. :-??? Am I missing something?
Perhaps some alternative solutions:
1) Instead of the TTL being "infinite", it will expire after, say, 30 minutes.
2) Flush the DNS cache on Ctrl-Shift-Reload (instead of requiring mozilla to be restarted).
3) Every time the DNS result is used, reset the TTL to 5 minutes or so; thus ignoring a site will cause its DNS entry to quickly expire.
There are separate bugs for each suggestion. That is why I think this bug is ready for closure.
Hi, I support the idea of providing a way to flush the DNS cache. I think that attributing a default TTL to every entry is bad, but it is an optimization.
1) From a user experience perspective, I think we should look at what a user is doing when he cannot load his page. I would say that typically, he is going to reload the page. So flushing the DNS cache on Ctrl-Shift-Reload (instead of requiring mozilla to be restarted) is a good idea in my opinion.
2) We should also provide a UI to enable or disable DNS caching, in case the user is using a caching DNS resolver more efficient than Mozilla's (dnscache from djb, bind, ...). The default would be to do caching, but the user would be able to easily reconfigure that.
*** Bug 220152 has been marked as a duplicate of this bug. ***
*** Bug 223866 has been marked as a duplicate of this bug. ***
nicolas: you can clear mozilla's DNS cache by toggling the offline/online button in the browser's status bar. (note: firebird does not have this control.)
OK, we can clear the cache by disconnecting/reconnecting. But I think it would be much easier for Mr John Doe to find out how to do it if the "Clear cache" command erased all the caches, including the DNS cache.
This seems like a hack to get a greater problem fixed. In Windows, the fix-all was a reboot. In Mozilla, we're moving towards a "clear cache" universal fix, and that just seems ugly. But it's arguably better than what we have today. I just foresee a user visiting a web site, and suddenly they can't hit it anymore (behind the scenes, DNS has changed due to an outage on the original IP address). They call the helpdesk who tells them to clear their cache. It works. Everyone shrugs and moves on. Nothing obviously points to a "cache" problem as the culprit here, so only techies are going to understand what the issue is and how to fix it. For everyone else, clearing the cache just needs to find its way into that bag of universal fixes for problems. I think I'm going to have to side with the previous few posters. This bug is about TTL values. Other issues are discussed in other bugs, or need to be. If this isn't possible today, and we're not willing to investigate alternate resolvers, a WONTFIX seems appropriate.
I really would like to see this fixed; it's a very annoying behavior of Mozilla (and others) to ignore the TTL of a DNS entry.

Example: one uses GSLB with 2 load balancers:

                         foo.test.com
        ____________________|________________________
        |                                           |
    [VirtualIP 1]           |                 [VirtualIP 2]
      /   |   \             |                   /   |   \
    Server1 | Server3       |             Server1 | Server3
         Server2            |                  Server2

Resolving foo.test.com returns 2 IPs with a TTL of 10. If VirtualIP 1 fails, the DNS server would only return the IP of VirtualIP 2. Because we have a TTL of 10 seconds, the service would be interrupted for at most 10 seconds (for 50% of the users), IF the browser cares about the TTL. With Mozilla, the service is unavailable for about $FIXED_TTL, making GSLB almost unusable for port 80, just because common browsers don't care about the TTL (Internet Explorer also ignores this; I don't know about Opera). Last night I had to change the IP of a VirtualIP; I had to do this at about 01:00 (= not many users) because of this bug. IE seems to cache the DNS entry until the host gets rebooted: there are still people who connect to the old IP! Ouch.

About comment #26:
> 1)For a user experience perspective, I think we should look at what a user is
> doing when he cannot load his page.
> I would say that typically, he is going to reload the page.
> So flushing the DNS Cache on Ctrl-Shift-Reload (instead of requring mozilla to
> be restarted) is a good idea in my opinion.
I think this would be a good (easy and somewhat clean) solution if it is so hard to handle the TTL of a DNS entry:
* Mozilla has a DEAD IP in cache
* User gets 'connection refused'
* User hits reload and flushes the cache
* Mozilla re-resolves and gets a good IP
* Everyone happy
> With Mozilla, the service is unavailable for about $FIXED_TTL, making
(FIXED_TTL is one minute)
mozilla does not know what the TTL is... getaddrinfo doesn't tell.
Adrian: If you want to see this fixed, file bugs against GNU LIBC, Apple, and Microsoft to provide an API that applications can use to discover TTL ;-) BTW, as for IP address caching, Mozilla's cache has a fixed TTL of 5 minutes. This can be configured via preferences. You can set the network.dnsCacheExpiration pref to whatever value in seconds that you like. With Mozilla, you can also toggle the "File->Work Offline" to clear the DNS cache.
> Adrian: If you want to see this fixed, file bugs against GNU LIBC, Apple, and
> Microsoft to provide an API that applications can use to discover TTL ;-)
Ok, I'll do it ;-) I understand the problem, believe me. But it's very annoying that even mozilla breaks GSLB for WWW :-/ But maybe we could make an acceptable workaround without needing a new API: if we get a Connection Refused, don't cache the entry / remove the entry from mozilla's DNS cache. Example:
1. foo.bar.com has 2 VIPs
2. User loads http://foo.bar.com -> Mozilla uses VIP #1 -> Connecting to VIP #1 works
3. VIP #1 of foo.bar.com dies
4. User clicks on a link at foo.bar.com -> Connection Refused -> [NEW] mozilla removes foo.bar.com from its cache
5. User clicks again -> Mozilla resolves again and gets a working IP (only VIP #2)
6. Everything fine :)
Adrian, So, if Mozilla receives more than one IP address as a result of a DNS query, it will try to use the first IP address. If that fails, then it will try the other IP addresses. I suppose we could extend that algorithm to repeat the DNS request, bypassing the local cache, to see if there are any other IP addresses to try. Hmm... thanks for the suggestion.
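The extended algorithm described in this comment (try every cached address, then repeat the DNS request bypassing the cache and try any new addresses) could look roughly like the sketch below. It is not Mozilla's implementation; `cache`, `resolve` and `connect` are hypothetical stand-ins for the browser's DNS cache, a cache-bypassing lookup and the socket connect step.

```python
def connect_with_refresh(hostname, cache, resolve, connect):
    """Try cached addresses first; if all fail, re-resolve once and retry.

    cache   -- dict mapping hostname -> list of addresses (hypothetical store)
    resolve -- callable(hostname) -> fresh address list, bypassing the cache
    connect -- callable(address) -> connection, raising OSError on failure
    """
    tried = set()
    addresses = cache.get(hostname)
    fresh = addresses is None
    if fresh:
        addresses = cache[hostname] = resolve(hostname)
    while True:
        for address in addresses:
            if address in tried:
                continue  # already failed; do not retry a dead address
            tried.add(address)
            try:
                return connect(address)
            except OSError:
                pass  # fall through to the next address
        if fresh:
            # Even a fresh lookup gave nothing usable: drop the entry and give up.
            cache.pop(hostname, None)
            raise OSError("could not connect to " + hostname)
        # Every cached address failed: ask DNS again and try only the new ones.
        addresses = cache[hostname] = resolve(hostname)
        fresh = True
```

In the two-VIP scenario from the earlier comment, a cache holding only the dead VIP leads to one failed attempt, one fresh lookup, and a successful connection to the surviving VIP without the user having to reload.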
I've encountered the problem using the coral cache. In case the DNS query returns no results the first time, firefox (3.0pre) keeps believing the host doesn't exist because it has inappropriately "cached" the 0-TTL result. Meanwhile, running “host somesite.org.nyud.net” gave a non-empty result set most of the time. Besides the fix outlined in comment 35, the mozilla-specific DNS cache should discard negative results or keep them with a much shorter TTL; no need for the TTL info the system APIs don't give.
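The suggestion to keep negative results for a much shorter time than positive ones can be sketched as follows. The class name and the 60-second and 5-second values are assumptions for illustration only; they are not Firefox's actual settings or code.

```python
import time


class NegativeAwareDnsCache:
    """Caches successful lookups longer than failed (empty) ones."""

    def __init__(self, resolve, positive_ttl=60.0, negative_ttl=5.0,
                 clock=time.monotonic):
        self._resolve = resolve          # callable(hostname) -> list of addresses
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}               # hostname -> (expiry time, addresses)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[0]:
            return entry[1]
        try:
            addresses = self._resolve(hostname)
        except OSError:
            addresses = []  # treat resolver errors as "no result"
        # An empty answer is remembered only briefly, so a host that starts
        # resolving a moment later is not reported as missing for long.
        ttl = self.positive_ttl if addresses else self.negative_ttl
        self._entries[hostname] = (now + ttl, addresses)
        return addresses
```

Setting `negative_ttl` to 0 discards negative results entirely, which is the other option mentioned in the comment.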
Assignee: general → nobody
QA Contact: benc → networking
Target Milestone: Future → ---
I'd like to know about the status of this bug. It is still marked as new, 5 years after the last comment and almost 11 years after it was reported. I found this bug while searching for another possible bug, but this one does raise some concerns: that an optimization would provoke incorrect behavior. Needless to say, support for buggy OSes was dropped long ago, so this shouldn't be a problem anymore. About the bug itself, I would like to offer my opinions:
- an optimization designed to resolve a performance issue in buggy OSes should ONLY be enabled by default on THOSE buggy OSes; on the other OSes, it should be disabled;
- EVERY feature that alters the expected behavior of a standard feature (e.g. DNS caching by the application) MUST have a way to be disabled (e.g. via about:config); this is already documented here: https://developer.mozilla.org/en-US/docs/Mozilla/Preferences/Mozilla_networking_preferences?redirectlocale=en-US&redirectslug=Mozilla_Networking_Preferences#DNS
- the management of TTL and name expiration should not be imported into the application, but be left to the DNS system (library/service/caching); it imports a complexity that is already handled somewhere else.
Now, if caching is necessary, how should it be used? The answer perhaps is "where the DNS query is too slow"; in fact, since a DNS query usually IS a blocking query, at this point a proper cache would benefit all OSes.
This is especially nasty, since in some situations, it manifests as an invalid SSL certificate (that is, I get a warning because the certificate is for a different domain). I've had this happen twice in the last week. This point: "- an optimization designed to resolve a performance issue in buggy OSes should ONLY be enable by default in THAT buggy OSes; on the other OSes, it should be disabled;" seems like a reasonable approach, no?
See Also: → 964391
I'm raising the priority level on this bug. The Internet has changed tremendously since 2002. We can no longer rely on IP addresses as a static pointer to a resource. As a matter of fact, the caching we are currently doing creates a security risk to our users. Caching DNS records beyond their TTL value effectively means that we are sending users to sites that aren't located at these addresses anymore. Most sites and services located in AWS change IPs often, and by caching these IPs, we are sending potentially sensitive traffic to IPs that have been reassigned to someone else. Which is what is happening currently with Firefox Account. We are also breaking one of the most useful traffic management tools used by site operators. One only needs to look at Alexa's top 30 sites to see that TTL are not meant to be cached.

domain          TTL
---------------+---
google.com.     300
facebook.com.   900
youtube.com.    300
baidu.com.      600
qq.com.         600
taobao.com.     600
amazon.com.      60
sina.com.cn.     60
twitter.com.     30
blogspot.com.   300
google.co.in.   300
linkedin.com.   300
weibo.com.       60
tmall.com.      600
wordpress.com.  300
360.cn.         300
yandex.ru.      300
yahoo.co.jp.    300
vk.com.         900
google.de.      300
sohu.com.       600
soso.com.       600
pinterest.com.   60

We need to move away from this practice as soon as possible. If the Services team agrees (:mmayo?), I would like to make this a blocker for FxA/FF29.
Severity: enhancement → major
Flags: needinfo?(mmayo)
(In reply to Julien Vehent [:ulfr] from comment #42)
> We need to move away from this practice as soon as possible. If the Services
> team agree (:mmayo?), I would like to make this a blocker for FxA/FF29.
That's not really how the train release model works; you should talk to the Networking team about prioritizing this bug appropriately.
Flags: needinfo?(jduell.mcbugs)
Julien, can you clarify how we're sending "sensitive data" here? If DNS is stale (or compromised), SSL should fail and no data should be sent.
I'm going to move the resolution of 42-44 over to bug 981447 - it is subtly different (though clearly related) to this issue.
Flags: needinfo?(mmayo)
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [dns]
(In reply to Mike Connor [:mconnor] from comment #44)
> Julien, can you clarify how we're sending "sensitive data" here? If DNS is
> stale (or compromised), SSL should fail and no data should be sent.
For our own FxA use and other such services that's true, but this bug could lead to that result for other services that don't use encryption.
Depends on: 820391
IP addresses aren't the security layer, so anything "at risk" for this is already severely insecure. But it's definitely wrong. Amazon ELB uses TTLs of around 45-60 seconds, so if you're ignoring that and caching for longer, services can randomly stop working when Amazon shifts load balancers around under the hood. I assume ELB makes allowances for this (by assuming cache will live longer than specified), and Firefox isn't the biggest offender here (http://www.openaccess.org/index.php?section=163 "[we] ignore TTL records less than one hour"), but I was very surprised to learn that it's an offender at all. It's hard to get people to fix broken DNS servers when even Firefox gets it wrong.
I am a support engineer at Heroku. We do see a couple of these tickets a month related to Firefox TTLs. Usually, it's an ELB shifting IPs and then suddenly people start getting SSL cert warnings for random websites when visiting their own sites. It would be greatly appreciated if this issue gets its priority raised and gets fixed, as per Julien's comments above.
Depends on: 1040280
No longer depends on: 1084645
I'm going to close unused meta bugs, but individual work items can be left open
Status: NEW → RESOLVED
Closed: 23 years ago → 9 years ago
Resolution: --- → INCOMPLETE