DNS: TTL (time-to-live) support in DNS cache

RESOLVED INCOMPLETE

Status

defect
--
major
RESOLVED INCOMPLETE
17 years ago
3 years ago

People

(Reporter: david, Unassigned)

Tracking

(Depends on 3 bugs, {helpwanted})

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [dns])

(Reporter)

Description

17 years ago
Would it be possible to allow the DNS cache in Mozilla to honor DNS-provided TTL
values?  The fixed (5-minute?) cache timeout value can conflict with DNS-based
load-balancers or fail-over configurations, which prefer a much shorter TTL
value that they express through DNS.  By not letting this TTL value be expressed
through the browser's DNS cache, a fail-over scenario that changes a site's IP
address could leave users unable to access the site again until they restart
their browser or allow the browser's cached IP address to be expired.

It may affect performance for those sites, but that's probably a decision
they're comfortable with.  Sites that choose to use a longer TTL will enjoy less
DNS traffic as they do today with the fixed cache.  Indeed, if the TTL value is
sufficiently large, would we necessarily want to limit it to a smaller value?
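
A minimal sketch of the behaviour requested here, assuming the resolver can hand the cache a TTL along with the addresses (later comments question whether the OS APIs can). `TtlDnsCache`, `max_ttl` and the injected `clock` are hypothetical names for illustration, not Mozilla code. Each entry expires after its own TTL, and `max_ttl` optionally caps very large values:

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlDnsCache:
    """Toy DNS cache where each entry expires after the TTL supplied with it."""

    def __init__(self, max_ttl: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_ttl = max_ttl  # optional upper bound on any TTL (assumed knob)
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def put(self, host: str, addresses: List[str], ttl: float) -> None:
        """Store addresses together with the TTL that came with the answer."""
        if self._max_ttl is not None:
            ttl = min(ttl, self._max_ttl)
        self._entries[host] = (list(addresses), self._clock() + ttl)

    def get(self, host: str) -> Optional[List[str]]:
        """Return cached addresses, or None if absent or expired."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[host]  # expired: the caller must resolve again
            return None
        return addresses
```

With this shape, a site that publishes a 10-second TTL is looked up again after 10 seconds, while a site with a long TTL keeps the reduced DNS traffic it has today.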

Comment 1

17 years ago
WONTFIX: apparently the OS API's don't expose TTL...
Status: UNCONFIRMED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → WONTFIX
Summary: Let DNS cache honor DNS TTL values → DNS: honor TTL values

Comment 2

17 years ago
The OS API may not expose the TTL, but it expresses it directly by caching the
answers for the TTL.  So the answer is to ask the OS every time (or at least
every so often e.g. every 10 seconds for performance reasons) when a name -> IP
address mapping is required.  This bug recently bit us when we moved our mail
server - Users got a confusing "connection refused by imap server" messages, as
it was caching the old IP address.

Even Outlook didn't exhibit this behaviour - we had to tell all Moz users to
shut all instances of mozilla (not just mozilla mail), and restart.  Poor.  With
the exception of some shoddy commercial web caches, pretty much everything seems
to respect TTLs.

Comment 3

17 years ago
REOPEN: 
There is supposed to be behavior where a cached entry would re-request if it
failed to connect.

I'm changing your summary to address the real-world problem you described (since
I think we already had a TTL bug filed). 

Just to clarify: most OS's do NOT DNS cache (Mac OS X does, Solaris has nscd (a
system service)). Mozilla does do some DNS caching, so that is where we should
check for this problem you had.
Status: RESOLVED → UNCONFIRMED
Keywords: testcase
Resolution: WONTFIX → ---
Summary: DNS: honor TTL values → DNS: caching should re-request if hostname is renumbered

Comment 4

17 years ago
This is what happened:

Thursday afternoon, mail server functions are moved to a different machine (DNS
TTL is set to 300 seconds) change still in progress when I leave the office.

Tues morning, I get back into the office, dialogue box up "can't connect to
server imap.mail.digitalbrain.com - connection refused", dismiss dialogue and
hit "get mail" - same dialogue reappears.

I use tcpdump to see what is happening, and a connection is being attempted
to the old IP address (no NS lookups, or traffic to new host).  Typing "host
imap.mail.digitalbrain.com" on the console of the same machine gave the correct
response.

This Linux box is running nscd, as it happens, but I think this is immaterial (I
will test to confirm this tomorrow - I don't think host uses glibc gethostbyname
, BICBW).

Is the strategy of only looking up on failure to connect correct?  I would argue
it isn't - as a systems admin, when I set a TTL, I expect it to be honoured (all
the nameservers on the internet seem to, as well as pretty much all apps), at
the very least, it breaks the "principle of least astonishment".

What happens if I repurpose the original box?  E.g. I move handling of a domain
(mail or http) from one machine to another?  Then you've got to pass down
application level errors (assuming you can even detect incorrect behaviour at
the application level) to the underlying lib, and get it to reconnect.  Yuck.

This sort of looks like moz trying to fill in deficiencies of the underlying OS
- is this really necessary or even desirable?  In the current implementation,
moz is trading speed for correctness on poorly performing OSs, and breaking ones
which perform well.  Also, is it really gaining much?

On this box ns lookups are cached at the NAT server, and also at the DSL router,
and using nscd (I'm not some sort of speed tweak fetishist - this is just the
way things happen to be setup here, for other reasons).

This behaviour was observed on moz from a nightly build (approx a week old), on
Debian/Woody, with 2.4.18 (and nscd running).  Connection was IMAP over SSL.

Comment 5

17 years ago
Another option is to use an external resolver library, such as this:
http://www.chiark.greenend.org.uk/~ian/adns/ instead of the C library.  This is
obviously a big change tho'.

Comment 6

17 years ago
BTW, having checked, nscd is set to not cache host lookups by default on Debian
(and I would assume other Linux distros):

# .... The mechanism in nscd to
# cache hosts will cause your local system to not be able to trust
# forward/reverse lookup checks. DO NOT USE THIS if your system relies on
# this sort of security mechanism. Use a caching DNS server instead.
        enable-cache            hosts           no

Comment 7

17 years ago
-> NEW

Finally got around to looking at some DNS cache problems, and there is no TTL
bug.  I've changed the summary back, yes you are the first filer to discuss this.

re #2, not all OSes we support cache DNS, so we'd need to implement different
cache logic for OSes that do TTL-aware DNS caching. If people think this is viable, we should
discuss this HERE. If not, then as far as I can tell, TTL support should be
WONTFIX, as gordon marked it before.

The RFC's on DNS probably say something about this, so I (or someone else)
should probably research that.

re #5, we are having some other DNS problems that might require an external
resolver library to solve, so that might be another interesting bug to file.

I try to keep the conversations in bugs focused, but it seems worth mentioning
here that originally we implemented DNS caching because of performance concerns,
especially for users who have a long round trip to distant DNS servers or slow,
poorly connected DNS servers. 

Since then we had to change caching again, for security reasons. The new version
basically has an infinite cache. Bug 16287 is where we discuss finding a
solution that is more flexible and less monolithic.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: testcase
Summary: DNS: caching should re-request if hostname is renumbered → DNS: TTL (time-to-live) support in cache

Comment 8

17 years ago
Ben: bug 16287 does not appear to be the bug of which you speak, did you maybe
miss a number?

Comment 9

17 years ago
bug 162871. Sorry!

Updated

17 years ago
Keywords: helpwanted
Target Milestone: --- → Future

Updated

17 years ago
Blocks: 61683

Comment 10

17 years ago
May I suggest that DNS caching in Mozilla be optional under the caching options?
A simple if wrapper around each DNS cache lookup to ignore the cache if the
user doesn't want DNS caching should be sufficient.  Shipping with caching
enabled and then suggesting that it be disabled for the sake of areas like those
mentioned that need it off would seem like the best solution to me.

Comment 11

17 years ago
michael: that's not a bad idea.  i think it might be very handy in some cases to
be able to tell mozilla to just not bother caching any DNS results.  the default
should be to cache DNS lookups.

Comment 12

17 years ago
I've submitted bug 188505 w.r.t. the preferences option I suggested in comment #10.

Comment 13

17 years ago
fwiw (probably nothing), you can force mozilla to flush its cache by going to
file>work offline twice.

Comment 14

17 years ago
Hmm, this seems to me to be making the behaviour suspect by default, with a tick
box to behave well?  :-(  I would speculate that 99.9% of users will just see
broken pages, and not have the necessary knowledge to even guess that it might
be a DNS caching problem.

I think just imposing an arbitrary low TTL (say 2 mins), instead of caching
forever would be a better fix than going to the hassle of adding a gui mechanism
to control caching?

My justification for some low number of minutes of TTL as a default is that one
DNS lookup every x minutes is really not going to be noticeable when compared
with HTTP (or any other TCP protocol) traffic, and at many sites the upstream
DNS mechanisms (local OS, local network DNS servers, broadband modem, or ISP)
will cache the result if appropriate, so the roundtrip for the DNS request will
probably be so close to nothing as to not matter.

Long term DNS caching gives quite a high level of breakage (quite a few support
calls in my experience of running our own web site, with users using moz within
the company) for very little benefit!

Comment 15

17 years ago
Comment #14 above looks more like a comment against my bug 188505 than this one.
 That said, no matter what we do to make DNS caching more accurate in the
browser, I believe the user should be able to turn it off, just like disk or
memory caching.  Please comment on that bug in its bug page.

I don't think that lowering the TTL for the DNS cache is the right answer
either, btw; lowering it too much removes any benefit of the code even being
there, and if it's too large (as it may be now), it causes more errors.

The right answer seems to be to get the TTL for the domain at least from the
SOA, or to not cache the DNS entries automatically on operating systems where
DNS requests are cached already (Linux at least).

Comment 16

17 years ago
I would suggest some benchmarks for latency (preferably with various different
connectivity types) to see if the current caching behaviour is a useful
optimisation at all.  It might also be worth looking at what some other browsers do?

e.g Where on this line is it appropriate to be?

correct, but slower  <------>   broken, but faster


At the moment moz is at the far right.  Here are some questions:

. Do HTTP keep-alives make DNS result caching of little benefit?
. Does moz currently load all content for a given page down a single TCP connection?
. For a typical browsing session, what is the effect of setting different TTLs
for moz's internal cache? (My guess is that setting an arbitrary TTL of 1min will
give >50% of the performance benefit of infinite caching)

I am unlikely to get a chance to do these tests any time soon, as I don't have a
Windows machine here, and I'm off on holiday at the end of the month..
What would be more useful is a statistical analysis of peoples' browsing habits
(including server info for visited sites).  This isn't likely available in any
morally-upright way, but it would tell us how long people visit a site for, how
long between their clicks, whether the server keeps the connection alive for
them, how many connections are made to machines within the same SOA even if they
have different records, and how many of those use round-robin DNS.

PS, for more DNS caching info, please consider seeing Dan Bernstein's info
collected re: writing the 'dnscache' program:

http://cr.yp.to/djbdns/intro-dns.html (How DNS really works)
http://cr.yp.to/djbdns/resolve.html   (How resolution takes place)
http://cr.yp.to/djbdns/notes.html     (Important notes re: DNS and caching)
http://cr.yp.to/djbdns/forgery.html   (How DNS forgery is avoided in caches)

Comment 18

16 years ago
*** Bug 197674 has been marked as a duplicate of this bug. ***

Comment 19

16 years ago
gordon: since the application cannot access TTL, shouldn't we WONTFIX this bug? 

People keep asking for TTL support, while more feasible solutions are discussed
in other bugs.


Summary: DNS: TTL (time-to-live) support in cache → DNS: TTL (time-to-live) support in DNS cache

Comment 20

16 years ago
> since the application cannot access TTL, shouldn't we WONTFIX this bug

It could access TTL, if it used a resolver library, such as adns.

http://www.chiark.greenend.org.uk/~ian/adns/
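
To illustrate why a resolver that speaks the DNS protocol itself can see TTLs: every resource record in a response carries a 32-bit TTL field (RFC 1035). The sketch below is hand-rolled Python, not the adns API. `a_records_with_ttl` is a hypothetical helper that only handles A records and does no validation of malformed packets:

```python
import socket
import struct
from typing import List, Tuple


def _skip_name(packet: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:              # root label terminates the name
            return offset + 1
        if length & 0xC0 == 0xC0:    # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1 + length         # ordinary label: length byte plus its text


def a_records_with_ttl(packet: bytes) -> List[Tuple[str, int]]:
    """Extract (IPv4 address, TTL in seconds) pairs from a raw DNS response."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", packet, 0)
    offset = 12                      # fixed-size DNS header
    for _ in range(qdcount):         # skip each question: name, type, class
        offset = _skip_name(packet, offset) + 4
    results = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", packet, offset)
        offset += 10                 # type(2) + class(2) + ttl(4) + rdlength(2)
        rdata = packet[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            results.append((socket.inet_ntoa(rdata), ttl))
    return results
```

The point is only that the TTL is present on the wire; whether shipping a separate resolver is worth it is the open question in this bug.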

Comment 21

16 years ago
I think this issue should be delegated to the operating system.  Mozilla
shouldn't cache DNS entries, since this work resides in the OS/DNS resolver. We
should rely on it.

Comment 22

16 years ago
jesus: mozilla's dns cache is an optimization (in many cases).

Comment 23

16 years ago
Darin, which OS doesn't cache DNS results? In a correctly behaving OS, keeping a
Mozilla-internal DNS cache shouldn't buy anything. :-???

Am I missing something?

Comment 24

16 years ago
Perhaps some alternative solutions:
1) Instead of the TTL being "infinite" it will expire after say 30 minutes

2) Flush the DNS cache on Ctrl-Shift-Reload (instead of requiring mozilla to be
restarted)

3) Every time the DNS result is used, reset the TTL to 5 minutes or so, thus
ignoring a site will cause its DNS entry to quickly expire.
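
Options 2 and 3 can be sketched together, under the assumption that the cache has no TTL information at all. `SlidingDnsCache`, `idle_ttl` and the injected `clock` are made-up names for illustration. Every successful `get` pushes the deadline out again, and `flush` is what a Ctrl-Shift-Reload handler would call:

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class SlidingDnsCache:
    """Toy cache: entries expire after a period of disuse; flush() empties it."""

    def __init__(self, idle_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_ttl = idle_ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def put(self, host: str, addresses: List[str]) -> None:
        self._entries[host] = (list(addresses), self._clock() + self._idle_ttl)

    def get(self, host: str) -> Optional[List[str]]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        addresses, deadline = entry
        now = self._clock()
        if now >= deadline:
            del self._entries[host]  # unused for too long: forget it
            return None
        # Each use resets the timer (option 3).
        self._entries[host] = (addresses, now + self._idle_ttl)
        return addresses

    def flush(self) -> None:
        """Drop everything, e.g. on a forced reload (option 2)."""
        self._entries.clear()
```

One consequence of option 3 worth noting: a host that is used more often than `idle_ttl` never expires on its own, so a renumbered but busy site would still need the flush or a failure-triggered refresh.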

Comment 25

16 years ago
There are separate bugs for each suggestion. That is why I think this bug is
ready for closure.

Comment 26

16 years ago
Hi, I support the idea of providing a way to flush the DNS cache.
I think that attributing a default TTL to every entry is bad but it is an
optimization.

1) From a user experience perspective, I think we should look at what a user is
doing when he cannot load his page.
I would say that typically, he is going to reload the page.
So flushing the DNS cache on Ctrl-Shift-Reload (instead of requiring mozilla to
be restarted) is a good idea in my opinion.

2) We should also provide a UI to enable or disable DNS caching in case the user is
using some caching DNS resolver more efficient than Mozilla (dnscache from djb,
bind,...).
The default would be to do caching, but the user would be able to easily
reconfigure that.

Comment 27

16 years ago
*** Bug 220152 has been marked as a duplicate of this bug. ***

Comment 28

16 years ago
*** Bug 223866 has been marked as a duplicate of this bug. ***

Comment 29

16 years ago
nicolas: you can clear mozilla's DNS cache by toggling the offline/online button
in the browser's status bar.  (note: firebird does not have this control.)

Comment 30

15 years ago
OK we can clear the cache by disconnecting/reconnecting.

But I think it would be really easier for Mr John Doe to find how to do it if
the "Clear cache" command erased all the caches, including the DNS cache.
(Reporter)

Comment 31

15 years ago
This seems like a hack to get a greater problem fixed.  In Windows, the fix-all
was a reboot.  In Mozilla, we're moving towards a "clear cache" universal fix,
and that just seems ugly.

But it's arguably better than what we have today.

I just foresee a user visiting a web site, and suddenly they can't hit it
anymore (behind the scenes, DNS has changed due to an outage on the original IP
address).  They call the helpdesk who tells them to clear their cache.  It
works.  Everyone shrugs and moves on.

Nothing obviously points to a "cache" problem as the culprit here, so only
techies are going to understand what the issue is and how to fix it.  For
everyone else, clearing the cache just needs to find its way into that bag of
universal fixes for problems.

I think I'm going to have to side with the previous few posters.  This bug is
about TTL values.  Other issues are discussed in other bugs, or need to be.  If
this isn't possible today, and we're not willing to investigate alternate
resolvers, a WONTFIX seems appropriate.

Comment 32

15 years ago
I really would like to see this fixed, it's a very annoying behavior of Mozilla
(and others) to ignore the TTL of a DNS Entry:

Example: One uses GSLB with 2 Loadbalancers

                foo.test.com
____________________|________________________
|                                            |
   [VirtualIP 1]       |     [VirtualIP 2]
    /    |   \         |      /    |  \
  Server1|  Server3    |  Server1  | Server3
       Server2         |         Server2

Resolving foo.test.com returns 2 IPs with a TTL of 10..
If VirtualIP 1 fails, the DNS Server would only return
the IP of VirtualIP2.. because we got a TTL of 10 seconds, the
service would be interrupted for max. 10 seconds (for 50% of the users).. IF
the browser cares about the TTL.

With Mozilla, the service is unavailable for about $FIXED_TTL, making
GSLB almost unusable for Port 80, just because common browsers
don't care about the TTL (InternetExplorer also ignores this, i don't
know about opera).


This night, i had to change the IP of a VirtualIP .. i had to do this at about
01:00 (=not many users) because of this bug. IE seems to cache the DNS Entry
until the host gets rebooted: there are still people who connect to the old
ip!.. ouch


About comment #26:
> 1) From a user experience perspective, I think we should look at what a user is
> doing when he cannot load his page.
> I would say that typically, he is going to reload the page.
> So flushing the DNS cache on Ctrl-Shift-Reload (instead of requiring mozilla to
> be restarted) is a good idea in my opinion.

I think, this would be a good (easy and somewhat clean) solution if it is so
hard to handle the TTL of a DNS Entry..

* Mozilla has a DEAD ip in cache
* User gets 'connection refused'
* User hits reload and flushes the cache
* Mozilla ReResolves and gets a good IP
* Everyone happy

Comment 33

15 years ago
> With Mozilla, the service is unavailable for about $FIXED_TTL, making

(FIXED_TTL is one minute)

mozilla does not know what the TTL is... getaddrinfo doesn't tell.
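
For illustration, here is what the same call returns through Python's `socket.getaddrinfo`, which wraps the C function. A numeric address is used (an assumption made so the snippet needs no network access); a real hostname gives results of the same shape:

```python
import socket

# AI_NUMERICHOST keeps this self-contained: no DNS query is sent.
results = socket.getaddrinfo(
    "127.0.0.1", 80,
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    flags=socket.AI_NUMERICHOST,
)

for family, socktype, proto, canonname, sockaddr in results:
    # Each result has exactly these five fields; none of them is a TTL.
    print(family, socktype, proto, repr(canonname), sockaddr)
```

Whatever lifetime the DNS answer carried is gone by the time the application sees the addresses.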

Comment 34

15 years ago
Adrian: If you want to see this fixed, file bugs against GNU LIBC, Apple, and
Microsoft to provide an API that applications can use to discover TTL ;-)

BTW, as for IP address caching, Mozilla's cache has a fixed TTL of 5 minutes. 
This can be configured via preferences.  You can set the
network.dnsCacheExpiration pref to whatever value in seconds that you like.

With Mozilla, you can also toggle the "File->Work Offline" to clear the DNS cache.

Comment 35

15 years ago
> Adrian: If you want to see this fixed, file bugs against GNU LIBC, Apple, and
> Microsoft to provide an API that applications can use to discover TTL ;-)

Ok, i'll do it ;-)

I understand the problem, believe me.. But it's very annoying that even mozilla
breaks GSLB for WWW :-/

But maybe we could make an acceptable workaround without needing a new API:
 If we get a Connection Refused: Don't cache the entry / remove the entry from
mozilla's DNS-Cache

Example:
1. foo.bar.com has 2 VIPs 
2. User loads http://foo.bar.com
 -> Mozilla uses VIP #1
 -> Connecting to VIP #1 works
3. VIP #1 of foo.bar.com dies
4. User clicks on a link at foo.bar.com
 -> Connection Refused
 -> [NEW] mozilla removes foo.bar.com from its cache
5. user clicks again
 -> Mozilla resolves again and got a working IP (only VIP #2)
6. Everything fine :)
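
The workaround above could be sketched roughly as follows. This is not Mozilla's code: `connect_cached`, the plain-dict cache, and the injected `resolve` and `connect` callables are all hypothetical stand-ins, and the resolver is assumed to return a non-empty list:

```python
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]     # hostname -> list of IP addresses
Connector = Callable[[str, int], object]  # (ip, port) -> connection object


def connect_cached(host: str, port: int, cache: Dict[str, List[str]],
                   resolve: Resolver, connect: Connector) -> object:
    """Connect to the first cached address; drop the entry if it refuses."""
    if host not in cache:
        cache[host] = resolve(host)  # assumed to return at least one address
    address = cache[host][0]
    try:
        return connect(address, port)
    except ConnectionRefusedError:
        # The [NEW] step: forget the entry so the next attempt resolves again.
        cache.pop(host, None)
        raise
```

The error still reaches the user once (step 4), but the second click (step 5) triggers a fresh lookup instead of reusing the dead address.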




Comment 36

15 years ago
Adrian,

So, if Mozilla receives more than one IP address as a result of a DNS query, it
will try to use the first IP address.  If that fails, then it will try the other
IP addresses.  I suppose we could extend that algorithm to repeat the DNS
request, bypassing the local cache, to see if there are any other IP addresses
to try.  Hmm... thanks for the suggestion.
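
A rough sketch of that extended algorithm, with the caveat that the names (`connect_with_refresh`, the dict cache, the injected `resolve` and `connect`) are invented for illustration and this is not how Mozilla's networking code is structured:

```python
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]     # hostname -> list of IP addresses
Connector = Callable[[str, int], object]  # (ip, port) -> connection object


def connect_with_refresh(host: str, port: int, cache: Dict[str, List[str]],
                         resolve: Resolver, connect: Connector) -> object:
    """Try every known address; if all fail, re-resolve once and try new ones."""
    from_cache = host in cache
    if not from_cache:
        cache[host] = resolve(host)
    tried: List[str] = []
    for address in cache[host]:
        try:
            return connect(address, port)
        except OSError:  # includes ConnectionRefusedError and timeouts
            tried.append(address)
    if from_cache:
        # The answer came from the local cache, so repeat the DNS request
        # and try any addresses that were not in the stale entry.
        cache[host] = resolve(host)
        for address in cache[host]:
            if address in tried:
                continue
            try:
                return connect(address, port)
            except OSError:
                tried.append(address)
    raise OSError(f"could not connect to {host}; tried {tried}")
```

Unlike evicting on failure and waiting for the user to retry, this recovers within the same request, at the cost of one extra lookup when every cached address is dead.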

Comment 37

11 years ago
I've encountered the problem using the coral cache. In case the DNS query returns no results the first time, firefox (3.0pre) keeps believing the host doesn't exist because it has inappropriately "cached" the 0-TTL result. Meanwhile, running “host somesite.org.nyud.net” gave a non-empty result set most of the time.

Besides the fix outlined in comment 35, the mozilla-specific DNS cache should discard negative results or keep them with a much shorter TTL; no need for the TTL info the system APIs don't give.
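
One way to read this suggestion, sketched with assumed numbers: the 300-second positive lifetime matches the 5-minute default mentioned in comment 34, while the 5-second negative lifetime and the names `NegativeAwareDnsCache`, `store` and `lookup` are purely illustrative:

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class NegativeAwareDnsCache:
    """Toy cache that keeps empty (negative) results much more briefly."""

    def __init__(self, positive_ttl: float = 300.0, negative_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl  # assumed value; 0 would disable negative caching
        self._clock = clock
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def store(self, host: str, addresses: List[str]) -> None:
        """Record a lookup result; an empty list means the lookup found nothing."""
        ttl = self._positive_ttl if addresses else self._negative_ttl
        self._entries[host] = (list(addresses), self._clock() + ttl)

    def lookup(self, host: str) -> Optional[List[str]]:
        """None: not cached, ask the resolver. []: a recent lookup found nothing."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[host]
            return None
        return addresses
```

A host that briefly failed to resolve is then retried within seconds, without needing any TTL from the system APIs.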
Duplicate of this bug: 524084
Assignee: general → nobody
QA Contact: benc → networking
Target Milestone: Future → ---

Comment 39

6 years ago
I'd like to know about the status of this bug. It is still marked as new, after 5 years from last comment - and almost 11 years from reporting.

I found this bug while searching for another possible bug, but this one does raise some concerns: that an optimization would provoke incorrect behavior. Needless to say, support for buggy OSes was dropped long ago, so this shouldn't be a problem anymore.

About the bug itself, I would like to drop my opinions:

- an optimization designed to resolve a performance issue in buggy OSes should ONLY be enabled by default on THOSE buggy OSes; on the other OSes, it should be disabled;

- EVERY feature that alters the expected behavior of a standard feature (e.g. DNS caching by the application) MUST have a way to be disabled (e.g. via about:config); this is already documented here: https://developer.mozilla.org/en-US/docs/Mozilla/Preferences/Mozilla_networking_preferences?redirectlocale=en-US&redirectslug=Mozilla_Networking_Preferences#DNS

- the management of TTL and name expiration should not be imported into the application, but be left to the DNS system (library/service/caching); it imports a complexity that has already been dealt with somewhere else;

Now, if caching is necessary, how should it be used? The answer perhaps is "where the DNS query is too slow"; in fact, since DNS query usually IS a locking query, at this point a proper cache would benefit all OSes.
Duplicate of this bug: 906625

Comment 41

5 years ago
This is especially nasty, since in some situations, it manifests as an invalid SSL certificate (that is, I get a warning because the certificate is for a different domain). I've had this happen twice in the last week. This point:

    "- an optimization designed to resolve a performance issue in buggy OSes should ONLY be enabled by default on THOSE buggy OSes; on the other OSes, it should be disabled;"

seems like a reasonable approach, no?

Updated

5 years ago
See Also: → 964391

Comment 42

5 years ago
I'm raising the priority level on this bug. The Internet has changed tremendously since 2002. We can no longer rely on IP addresses as a static pointer to a resource. As a matter of fact, the caching we are currently doing creates a security risk to our users.

Caching DNS records beyond their TTL value effectively means that we are sending users to sites that aren't located at these addresses anymore. Most sites and services located in AWS change IPs often, and by caching these IPs, we are sending potentially sensitive traffic to IPs that have been reassigned to someone else. This is what is currently happening with Firefox Accounts.

We are also breaking one of the most useful traffic management tools used by site operators. One only needs to look at Alexa's top 30 sites to see that DNS records are not meant to be cached for long.

domain          TTL
---------------+---
google.com.     300
facebook.com.   900
youtube.com.    300
baidu.com.      600
qq.com.         600
taobao.com.     600
amazon.com.     60
sina.com.cn.    60
twitter.com.    30
blogspot.com.   300
google.co.in.   300
linkedin.com.   300
weibo.com.      60
tmall.com.      600
wordpress.com.  300
360.cn.         300
yandex.ru.      300
yahoo.co.jp.    300
vk.com.         900
google.de.      300
sohu.com.       600
soso.com.       600
pinterest.com.  60

We need to move away from this practice as soon as possible. If the Services team agree (:mmayo?), I would like to make this a blocker for FxA/FF29.
Severity: enhancement → major
Flags: needinfo?(mmayo)

Comment 43

5 years ago
(In reply to Julien Vehent [:ulfr] from comment #42)
> We need to move away from this practice as soon as possible. If the Services
> team agree (:mmayo?), I would like to make this a blocker for FxA/FF29.

That's not really how the train release model works; you should talk to the Networking team about prioritizing this bug appropriately.
Flags: needinfo?(jduell.mcbugs)

Comment 44

5 years ago
Julien, can you clarify how we're sending "sensitive data" here?  If DNS is stale (or compromised), SSL should fail and no data should be sent.

Comment 45

5 years ago
I'm going to move the resolution of 42-44 over to bug 981447 - it is subtly different (though clearly related) to this issue.
Flags: needinfo?(mmayo)
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [dns]

Comment 46

5 years ago
(In reply to Mike Connor [:mconnor] from comment #44)
> Julien, can you clarify how we're sending "sensitive data" here?  If DNS is
> stale (or compromised), SSL should fail and no data should be sent.

For our own FxA use and other such services that's true, but this bug could lead to that result for other services that don't use encryption.
Depends on: 820391

Comment 47

5 years ago
IP addresses aren't the security layer, so anything "at risk" for this is already severely insecure.

But it's definitely wrong.  Amazon ELB uses TTLs of around 45-60 seconds, so if you're ignoring that and caching for longer, services can randomly stop working when Amazon shifts load balancers around under the hood.  I assume ELB makes allowances for this (by assuming cache will live longer than specified), and Firefox isn't the biggest offender here (http://www.openaccess.org/index.php?section=163 "[we] ignore TTL records less than one hour"), but I was very surprised to learn that it's an offender at all.  It's hard to get people to fix broken DNS servers when even Firefox gets it wrong.

Comment 48

5 years ago
I am a support engineer at Heroku.  We do see a couple of these tickets a month related to Firefox TTLs.  Usually, it's an ELB shifting IPs and then suddenly people start getting SSL cert warnings for random websites when visiting their own sites.  It would be greatly appreciated if this issue gets its priority raised and gets fixed, as per Julien's comments above.

Updated

5 years ago
Depends on: 1040280
No longer depends on: 1084645
I'm going to close unused meta bugs, but individual work items can be left open
Status: NEW → RESOLVED
Last Resolved: 17 years ago → 3 years ago
Resolution: --- → INCOMPLETE