Closed Bug 1151812 Opened 4 years ago Closed 4 years ago

Add telemetry to find optimal cache entry hash size

Categories

(Core :: Networking: Cache, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox40 --- affected
firefox41 --- fixed

People

(Reporter: michal, Assigned: michal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

No description provided.
Attached patch patch (obsolete) — Splinter Review
The initial plan was to add a telemetry probe that would gather the ideal hash size continuously, but this is IMO not possible because to get the smallest possible hash size for a newly added entry it must be compared with all existing entries in the index which is very CPU intensive task.

Another option was to get the hash statistics just once (when the cache is big enough to give some useful data) during reading of the index from disk. I.e. compare every newly parsed entry with all already added entries before it is added to the index in CacheIndex::ParseRecords(). But this was affecting startup because we parse the index during startup and the CPU intensive hash comparison was not easy to interrupt in case some more priority task needed Cache IO thread.

So I ended up with the attached patch that does:

 - It grabs the hash statistics just once and only if the index contains at least 15000 entries.
 - Gathering of the data is delayed by 40 seconds after the index is read from the disk to not affect startup time.
 - It iterates over all entries in the hash table and compares the current entry with all following entries.
 - It runs with the lowest priority and regularly checks whether some higher priority event is pending.
 - It stores the hash statistics in the array instead of adding the telemetry continuously, so we shouldn't have partial reports in telemetry. The telemetry is reported at once at the end when all entries were iterated.
 - This code is not intended to stay in the tree. We should land it on nightly for about 2-3 weeks and then back it out once we collect enough reports.
Attachment #8595354 - Flags: review?(honzab.moz)
How complicated would be to just have an array referencing all hashes (nsTArray<SHA1Sum::Hash*>), then have a custom comparator to sort them and then just walk one by one and 'subtract' each two neighbors?  You would go from O(N^2) to O(N) easily.
Attachment #8595354 - Flags: review?(honzab.moz)
Comment 2
Flags: needinfo?(michal.novotny)
Attached patch patchSplinter Review
I was sceptical about the speed of sorting the array, but it is surprisingly fast. Sorting of 30000 records takes only 15ms on my computer.
Attachment #8595354 - Attachment is obsolete: true
Flags: needinfo?(michal.novotny)
Attachment #8613734 - Flags: review?(honzab.moz)
Comment on attachment 8613734 [details] [diff] [review]
patch

Review of attachment 8613734 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab, please address the comments

::: netwerk/cache2/CacheIndex.cpp
@@ +3651,5 @@
> +{
> +  const uint32_t *h1 = reinterpret_cast<const uint32_t *>(aHash1);
> +  const uint32_t *h2 = reinterpret_cast<const uint32_t *>(aHash2);
> +
> +  for (uint32_t i=0 ; i<5; i++) {

please have spaces around operators
a bit more preferred is ++i

::: netwerk/cache2/CacheObserver.h
@@ +65,5 @@
>      { return sCacheFSReported; }
>    static void SetCacheFSReported();
> +  static bool const HashStatsReported()
> +    { return sHashStatsReported; }
> +  static void SetHashStatsReported();

where is this used?

should probably be called at the bottom of CacheIndex::ReportHashStats()
Attachment #8613734 - Flags: review?(honzab.moz) → review+
https://hg.mozilla.org/mozilla-central/rev/5bab2734cb2e
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.