Last Comment Bug 710205 - DNS Cache Telemetry
: DNS Cache Telemetry
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: 11 Branch
: x86_64 Linux
: -- normal (vote)
: mozilla11
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-12-13 06:59 PST by Patrick McManus [:mcmanus]
Modified: 2011-12-17 10:19 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 1 (11.08 KB, patch)
2011-12-13 07:02 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch 2 (10.94 KB, patch)
2011-12-14 07:41 PST, Patrick McManus [:mcmanus]
bzbarsky: review+
Details | Diff | Review

Description Patrick McManus [:mcmanus] 2011-12-13 06:59:54 PST
I've been thinking about this patch for a while.

Track:

* method used to implement resolvehost() call. (cache hit, hit with renewal started, network lookup, piggyback on existing network lookup, negative hit, or ip literal)

* The age of cache entries when they are evicted from the cache. This is a strong indicator on whether or not the cache is fitting the stuff we try and stick into it.

* the time a network lookup takes, crosstab'd for renewals.
Comment 1 Patrick McManus [:mcmanus] 2011-12-13 07:02:28 PST
Created attachment 581258 [details] [diff] [review]
patch 1
Comment 2 Boris Zbarsky [:bz] 2011-12-13 22:23:11 PST
Comment on attachment 581258 [details] [diff] [review]
patch 1

Hmm.  We can end up with both METHOD_RENEWAL and METHOD_NEGATIVE_HIT accumulated for the same ResolveHost call.  That seems suboptimal, especially because there's no way to disentangle them later.  Or do we not care?

In the "else if (he->rec->onQueue) {" block the telemetry code should come before the comment, since that comment only applies to the other code in that block.

Was there a reason to not just |using namespace mozilla;| at the top of the file and skip all the "mozilla::" stuff?

The DNS_CLEANUP_AGE state can be off id the max cache lifetime changes while hostrecords are live. But this can be assumed rare, I guess.

> +HISTOGRAM(DNS_CLEANUP_AGE, 1, 86400, 50, EXPONENTIAL, "DNS Cache Entry Age
> at Removal Time (sec)")

This is actually recorded in minutes, not seconds.  You may want to change the docstring and the bounds based on that....

Is there a good reason to assume this cannot be longer than a day?

Also, is there a good reason to assume that dns resolves will not take more than 60 seconds?
Comment 3 Patrick McManus [:mcmanus] 2011-12-14 06:36:12 PST
(In reply to Boris Zbarsky (:bz) from comment #2)
> Comment on attachment 581258 [details] [diff] [review]
> patch 1
> 
> Hmm.  We can end up with both METHOD_RENEWAL and METHOD_NEGATIVE_HIT
> accumulated for the same ResolveHost call.  That seems suboptimal,
> especially because there's no way to disentangle them later.  Or do we not
> care?

that's a good point. I've changed it to only track renewals for positive entries.. negative entries are constantly being refreshed so the renewal information isn't really interesting there.


> The DNS_CLEANUP_AGE state can be off id the max cache lifetime changes while
> hostrecords are live. But this can be assumed rare, I guess.

yeah, I don't think it is important for the extremely minor impact it could have on the data... but I think its ok anyhow, mMaxCacheLifetime doesn't change - if the pref is updated the nsHostResolver is shutdown and a new one is created with the new value. I intentionally don't track age during shutdown - just when you're forced out of the cache to make room.

> 
> > +HISTOGRAM(DNS_CLEANUP_AGE, 1, 86400, 50, EXPONENTIAL, "DNS Cache Entry Age
> > at Removal Time (sec)")
> 
> This is actually recorded in minutes, not seconds.  You may want to change
> the docstring and the bounds based on that....

thanks!

> 
> Is there a good reason to assume this cannot be longer than a day?

it can be, but as a data point they are the same to me (~infinite) and telem will clamp it for me.. this means my exponential buckets can be spread across the more interesting range.
> 
> Also, is there a good reason to assume that dns resolves will not take more
> than 60 seconds?

same reason - that's pretty much inifinity from the pov of the data.
Comment 4 Patrick McManus [:mcmanus] 2011-12-14 07:41:50 PST
Created attachment 581636 [details] [diff] [review]
patch 2
Comment 5 Boris Zbarsky [:bz] 2011-12-14 10:09:06 PST
> and telem will clamp it for me

Ah, good.
Comment 6 Boris Zbarsky [:bz] 2011-12-14 10:11:20 PST
Comment on attachment 581636 [details] [diff] [review]
patch 2

Nix the stray printf that crept in, and r=me.
Comment 7 Patrick McManus [:mcmanus] 2011-12-16 18:38:10 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/9786d30f60b3
Comment 8 Matt Brubeck (:mbrubeck) 2011-12-17 09:32:42 PST
https://hg.mozilla.org/mozilla-central/rev/9786d30f60b3

Note You need to log in before you can comment on or make changes to this bug.