Last Comment Bug 709976 - DNS: asynchronously refresh cache entries near expiration upon use
: DNS: asynchronously refresh cache entries near expiration upon use
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: 11 Branch
: x86_64 Linux
: -- normal (vote)
: mozilla11
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-12 14:42 PST by Patrick McManus [:mcmanus]
Modified: 2013-03-23 08:38 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 1 (10.92 KB, patch)
2011-12-12 14:47 PST, Patrick McManus [:mcmanus]
bzbarsky: review+
Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] 2011-12-12 14:42:25 PST
failure to get a DNS hit results in wait times while we issue a getaddrinfo() lookup. This is true for both cache misses and cache hits that find expired entries.

The result is periodic dns lookup stalls as the cached entry expires. This might be mitigated by an OS cache, or it might not :) We cache things for 3 minutes.

I'm going to add a patch that adds a grace period to the dns cache. If an entry has expired but is within the grace period then we will return the entry and asynchronously start a lookup to refresh the entry. This way before the grace period has expired the entry has been renewed for another 3 minutes and the user never blocks on a lookup of a name that is being frequently used.

To start things off I've changed the 3 minute expiry to be 2 minutes of expiry and 1 minute of grace, which means we won't use stale data in any case when we hadn't before.
Comment 1 Patrick McManus [:mcmanus] 2011-12-12 14:47:15 PST
Created attachment 581064 [details] [diff] [review]
patch 1
Comment 2 Steve Workman [:sworkman] (please use needinfo) 2011-12-13 15:48:54 PST
Patrick, looked over the code - I like the concept.  So, the point seems to be to refresh the cache before an entry expires completely, specifically for frequently used entries as you say in the description.  Is there any reason for the extra 1 min grace period, rather than issuing a getaddrinfo lookup in the last 30 sec period before expiration?  Just curious, because it seems that in the current patch the expiration period is effectively extended by 1 min. So, even though a lookup is issued, the address is used as is for the connection.

It's all numbers at the end of the day, I realise, and I get that the point is to have three phases of an entry's life.
  1. A period where the entry is accepted as is.
  2. A second period where it's accepted but a lookup is issued to refresh, and;
  3. For entries older than that the socket must wait for getaddrinfo to complete.

So, just wondering if there's a rationale behind the figures you chose?
Comment 3 Patrick McManus [:mcmanus] 2011-12-13 17:17:13 PST
(In reply to Steve Workman [:sworkman] from comment #2)
> Patrick, looked over the code - I like the concept.  So, the point seems to
> be to refresh the cache before an entry expires completely, specifically for
> frequently used entries as you say in the description.  Is there any reason
> for the extra 1 min grace period, rather than issuing a getaddrinfo lookup

>, because it seems
> that in the current patch the expiration period is effectively extended by 1
> min.

I don't think so. Without the patch the lifetime of a cache entry was 3 minutes. With the patch it is 2 minutes plus 1 minute of grace. During the grace period the entry can still be used and a lookup is started in the bg. So that's still 3 minutes during which it can be used - effectively the same.

did I miss something? My gut is those numbers should be longer, but for this patch I was just trying to keep it the same.


>   1. A period where the entry is accepted as is.
>   2. A second period where it's accepted but a lookup is issued to refresh,
> and;
>   3. For entries older than that the socket must wait for getaddrinfo to
> complete.

yes: a cache hit, a hit-renewal, and a expired entry.
 
> So, just wondering if there's a rationale behind the figures you chose?

keeping the various hits and misses divided by the legacy amount (3min) was the goal. So the only real choice was the length of the grace period. nsHostResolver tracks time in minute granularity (no, really, it does) so given a lifetime of 3 that pretty much means the grace was 2 or 1 :).. setting it at 2 just seemed too far away from expiration. not sure it really matters if you aren't using the site fairly frequently it won't help you either way. it would be good to convert that stuff to seconds - but we can do that when we have real ttls or just in a separate cleanup patch.
Comment 4 Steve Workman [:sworkman] (please use needinfo) 2011-12-13 17:26:10 PST
Oops: I didn't see the change from 3 to 2 for 'PRUint32 maxCacheLifetime = 2', I just saw the condition adding ' + mGracePeriod'.  Sorry about that.  Thanks for answering the question, though.  And yeah, it seems like it will make a difference for frequently accessed sites for sure, with no change in behavior for other sites. Cool, makes sense.
Comment 5 Boris Zbarsky [:bz] 2011-12-13 22:32:34 PST
Comment on attachment 581064 [details] [diff] [review]
patch 1

The "using cache entry" log message should probably differentiate between the "grace period" case and the "negative" case.

Should the "dns lookup blocking" log message be conditioned on NS_SUCCEEDED(rv) ?

r=me with those changes.
Comment 6 Patrick McManus [:mcmanus] 2011-12-16 18:37:24 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/a61e49ccdf6b
Comment 7 Matt Brubeck (:mbrubeck) 2011-12-17 09:32:18 PST
https://hg.mozilla.org/mozilla-central/rev/a61e49ccdf6b
Comment 8 Richard Seegmiller 2013-03-23 06:03:41 PDT
I realize this bug is closed, however this is the only place I can find as a possible source to understand why the current default value for network.dnsCacheGracePeriod is 2592000.

It seems clear from this bug thread the value should be 1.

Thoughts?
Comment 9 Patrick McManus [:mcmanus] 2013-03-23 07:03:14 PDT
(In reply to Richard Seegmiller from comment #8)
> I realize this bug is closed, however this is the only place I can find as a
> possible source to understand why the current default value for
> network.dnsCacheGracePeriod is 2592000.
> 
> It seems clear from this bug thread the value should be 1.
> 
> Thoughts?

see 792442 where we generally allowed using out of date info as long as a refresh was started - which bounds the amount of time that we will actually use that info to something quite small.
Comment 10 Richard Seegmiller 2013-03-23 08:38:30 PDT
(In reply to Patrick McManus [:mcmanus] from comment #9)
> 
> see 792442 where we generally allowed using out of date info as long as a
> refresh was started - which bounds the amount of time that we will actually
> use that info to something quite small.

Thanks for your reply and for your contributions to Mozilla.

I see that 792442 is where the 2592000 value is first introduced, however I don't see the explanation which you gave above.  I guess a careful reading of the code would show your explanation to be the case...

Note You need to log in before you can comment on or make changes to this bug.