Closed Bug 1131600 Opened 5 years ago Closed 5 years ago

Add telemetry probes to get detailed disk cache hit rate

Categories

(Core :: Networking: Cache, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: michal, Assigned: michal)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached patch patch v1 (obsolete) — Splinter Review
No description provided.
Attachment #8562100 - Flags: review?(honzab.moz)
Comment on attachment 8562100 [details] [diff] [review]
patch v1

Patrick, have a look at NETWORK_CACHE_HIT_RATIO_PER_CACHE_SIZE probe. Is this something you were talking about last meeting?
Attachment #8562100 - Flags: feedback?(mcmanus)
Comment on attachment 8562100 [details] [diff] [review]
patch v1

Review of attachment 8562100 [details] [diff] [review]:
-----------------------------------------------------------------

This patch looks good, as far as I can understand it (not sure I can evaluate the probe data correct, tho).

Michal, as discussed on IRC, please give your constant limits, local variables and methods better names.  Please, split the Telemetry::NETWORK_CACHE_HIT_RATIO_PER_CACHE_SIZE second argument calculation so that it's clear what it means and what data and how we exactly collect.  Also, most of the code needs comments, definitely both probe calculations and definitely the |GetHitRate(uint32_t aBuckets) const;| mysterious method.

Then it would be nice to somehow avoid using a mutex.  We don't need to be super accurate here and if you use uintptr_t for counters we get atomicity access for free (but of course not arithmetic side effects atomicity, which is tho not critical for telemetry).  Maybe lock only when you are reporting, best would be swap the array locally under the lock and then report outside the lock.

Thanks.

::: netwerk/cache2/CacheFileUtils.h
@@ +89,5 @@
>  
> +
> +class DetailedCacheHitTelemetry {
> +public:
> +  static void AddRecord(bool aIsHit);

please have an enum class, plain bools are obsolete

@@ +97,5 @@
> +  public:
> +    HitStats();
> +
> +    void     AddRecord(bool aIsHit);
> +    uint32_t GetHitRate(uint32_t aBuckets) const;

some explanation of the argument would be good

::: netwerk/cache2/CacheIndex.cpp
@@ +1304,5 @@
> +
> +  nsRefPtr<CacheIndex> index = gInstance;
> +
> +  if (!index)
> +    return NS_ERROR_NOT_INITIALIZED;

{ }
Attachment #8562100 - Flags: review?(honzab.moz) → review-
Comment on attachment 8562100 [details] [diff] [review]
patch v1

Review of attachment 8562100 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

::: netwerk/cache2/CacheFileUtils.cpp
@@ +505,5 @@
> +
> +  for (uint32_t i = 0; i < kNumOfRanges; ++i) {
> +    if (sHitStats[i].Count() >= kHitRateMinRecords) {
> +      Telemetry::Accumulate(Telemetry::NETWORK_CACHE_HIT_RATIO_PER_CACHE_SIZE,
> +                            i * kHitRateBuckets +

maybe i * kNumRanges instead of i * kHitRateBuckets?

::: netwerk/cache2/CacheFileUtils.h
@@ +106,5 @@
> +    uint32_t mHitCnt;
> +    uint32_t mMissCnt;
> +  };
> +
> +  static StaticMutex sLock;

please document what data the lock protects

@@ +124,5 @@
> +  static const uint32_t kHitRateMinRecords = 500;
> +  static const uint32_t kHitRateBuckets = 20;
> +
> +  static uint32_t sRecordCnt;
> +  static HitStats sHitStats[kNumOfRanges];

sHitStats has unfortunate case-normalized imlpications. I'd rather have something else :)

::: toolkit/components/telemetry/Histograms.json
@@ +6520,5 @@
> +  "NETWORK_CACHE_HIT_RATIO_PER_CACHE_SIZE": {
> +    "expires_in_version": "never",
> +    "kind": "enumerated",
> +    "n_values": 400,
> +    "description": "Hit ratio (in 5% steps) split by cache size in file count (0=hit ratio 0-5% for cache size 0-5000, 1=hit ratio 5-10% for cache size 0-5000, ..., 20=hit ratio 0-5% for cache size 5001-10000, ...)"

It took me a while to work out this description.. maybe a longer comment - even if not part of the json - would help. e.g. cache size of sample in bucket I = (i / 20) * 5000, hit ratio of sample in bucket I = (i % 20) * 5
Attachment #8562100 - Flags: feedback?(mcmanus) → feedback+
Attached patch patch v2Splinter Review
Honza, Patrick, please have a look at the new comments and feel free to come with a better description if it's still hard to understand.


(In reply to Honza Bambas (:mayhemer) from comment #2)
> Then it would be nice to somehow avoid using a mutex.  We don't need to be
> super accurate here and if you use uintptr_t for counters we get atomicity
> access for free (but of course not arithmetic side effects atomicity, which
> is tho not critical for telemetry).  Maybe lock only when you are reporting,
> best would be swap the array locally under the lock and then report outside
> the lock.

Correct me if I'm wrong, AFAIK Telemetry::Accumulate() is not thread-safe so we need the lock here anyway.


> ::: netwerk/cache2/CacheFileUtils.cpp
> @@ +505,5 @@
> > +
> > +  for (uint32_t i = 0; i < kNumOfRanges; ++i) {
> > +    if (sHitStats[i].Count() >= kHitRateMinRecords) {
> > +      Telemetry::Accumulate(Telemetry::NETWORK_CACHE_HIT_RATIO_PER_CACHE_SIZE,
> > +                            i * kHitRateBuckets +
> 
> maybe i * kNumRanges instead of i * kHitRateBuckets?

That was correct but I decided to change the order of the values in this new patch since it's IMO more readable to have the same hit rate for a different cache sizes grouped together.


> @@ +124,5 @@
> > +  static const uint32_t kHitRateMinRecords = 500;
> > +  static const uint32_t kHitRateBuckets = 20;
> > +
> > +  static uint32_t sRecordCnt;
> > +  static HitStats sHitStats[kNumOfRanges];
> 
> sHitStats has unfortunate case-normalized imlpications. I'd rather have
> something else :)

Renamed :) I didn't see it and it took me a while to get what you mean.
Attachment #8562100 - Attachment is obsolete: true
Attachment #8563632 - Flags: review?(honzab.moz)
Attachment #8563632 - Flags: feedback?(mcmanus)
Comment on attachment 8563632 [details] [diff] [review]
patch v2

Review of attachment 8563632 [details] [diff] [review]:
-----------------------------------------------------------------

wrt the lock, you only seem to use it in one function.. can that method be called on multiple threads? (I wouldn't think so, but I don't know.) If not, nothing else is going to respect the lock anyhow :) but if there is concurrent access to your new telemetry probe then you do need the lock.
Attachment #8563632 - Flags: feedback?(mcmanus) → feedback+
Attached patch patch v3Splinter Review
Attachment #8565367 - Flags: review?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #5)
> wrt the lock, you only seem to use it in one function.. can that method be
> called on multiple threads? (I wouldn't think so, but I don't know.) If not,
> nothing else is going to respect the lock anyhow :) but if there is
> concurrent access to your new telemetry probe then you do need the lock.

The method is usually called on Cache IO thread but it can be also called on the main thread from NotifyCacheFileListenerEvent::Run(), so the lock is needed. BTW telemetry probes in CacheEntry::OnFileReady() are not protected by any lock, so I moved them into DetailedCacheHitTelemetry.

Patrick, please do the review since Honza won't be available next 2 weeks.
Comment on attachment 8565367 [details] [diff] [review]
patch v3

Review of attachment 8565367 [details] [diff] [review]:
-----------------------------------------------------------------

ship it

::: netwerk/cache2/CacheFileUtils.cpp
@@ +494,5 @@
> +    ++hitMissValue;
> +  }
> +
> +  StaticMutexAutoLock lock(sLock);
> +  

whitespace nit
Attachment #8565367 - Flags: review?(mcmanus) → review+
Still needed my r on v2?
Flags: needinfo?(michal.novotny)
(In reply to Honza Bambas (:mayhemer) from comment #9)
> Still needed my r on v2?

Patrick's r+ is enough. I want to land it after bug #1128339 since that's the order in my patch queue. So if you have some free time you could help by doing the review of the new patch in bug #1128339.
Flags: needinfo?(michal.novotny)
Attachment #8563632 - Flags: review?(honzab.moz)
https://hg.mozilla.org/mozilla-central/rev/4c63c386c39e
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Depends on: 1147973
You need to log in before you can comment on or make changes to this bug.