Closed Bug 763351 Opened 13 years ago Closed 13 years ago

kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
firefox13 --- wontfix
firefox14 + wontfix
firefox15 + fixed
firefox16 + fixed

People

(Reporter: briansmith, Assigned: briansmith)

References

Details

Attachments

(4 files, 1 obsolete file)

Look at the telemetry histograms for HTTP_CACHE_DISPOSITION for all dates prior to today (i.e. over the entire time we have been collecting telemetry). The values (at the time I looked) are: kCacheHit (1): 32.48% kCacheHitViaReval (2): 16.20% kCacheMissed (4): 51.32% Note that these add up to exactly 100%. Note that kCacheMissedViaReval (3) is COMPLETELY missing--not just low, but NEVER reported, AFAICT. What this telemetry is saying that is, if we have a cache entry, 100% of the time it will revalidate successfully--i.e. no server has EVER returned anything other than a 304 response to any of our conditional requests. That seems impossible to me. (I could believe the number would be CLOSE to zero; but, it cannot be zero.)
This patch contains an xpcshell test + some logging in nsHttpChannel that shows that it is at least POSSIBLE for us to get kCacheMissedViaReval. If you run the test with NSPR_LOG_MODULES=nsHTTP:5, then the log output should include: 0[31f940]: CACHE DISPOSITION: 4 0[31f940]: CACHE DISPOSITION: 3 0[31f940]: CACHE DISPOSITION: 4 0[31f940]: CACHE DISPOSITION: 3 0[31f940]: CACHE DISPOSITION: 4 0[31f940]: CACHE DISPOSITION: 3 So, I think the problem is not in how we calculate the cache disposition telemetry value. The problem is either in telemetry itself or somewhere else in nsHttpChannel. I had thoughe the most likely situation is that, when we need to do validation, we're switching to heuristic validation, or we're throwing the entry away. But, in that case, I would expect my simple test in this attachment to never output CACHE DISPOSITION: 3. It is possible that my test here is not "real world" enough and that my suspicion is still correct.
Note that the attachment contains a fix for bug 763342, in addition to the work done to investigate this bug.
In the debugger, I found that telemetry is putting values 2 and 3 into bucket 2. I wrote a simple test server in python that always returns a 200 response with an ETag and Last-Modified. Thus, reloading this page should cause kCacheMissedViaReval all the time. My logging from the first patch shows that indeed we do calculate the disposition to be kCacheMissedViaReval (3). I made the following change: -HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 5, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss") HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 4, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss") and with that change, kCacheMissedViaReval (3) is reported. Now I am starting to suspect that there might be a bug in Telemetry itself. In the debugger, I noticed some oddity in histogram.cc. the ranges_[] array for this histogram is { 0, 1, 2, 4, 5, 2147483647 }; This does not make sense; there should be a "3" in there between 2 and 4 because the range of the 4th bucket is exactly (3, 3). Consequently, I believe this is a bug in telemetry and I will file it, blocking this bug.
Attached file Python test server
Here is my python test server that I used to create a reproducable test for this. Run it with "python testserver.py" and then point firefox to http://localhost:8000. Reload the page a bunch of times. The above patch that adds the logging of the cache disposition may also be helpful.
In reply to Nathan Froyd (:froydnj) from bug 732053 comment #2 on 2012-03-08 (three months ago): > Histograms that could benefit from this macro [HISTOGRAM_ENUMERATED_VALUES]: > > GC_REASON > HTTP_*_DISPOSITION > CACHE_*_SEARCH (odd to have a linear histogram for ms histogram, especially > one so finely bucketed) > CACHE_SERVICE_LOCK_WAIT* (same) > PLACES_EXPIRATION_STEPS_TO_CLEAN > DNS_LOOKUP_METHOD > UPDATE_STATUS > > but most of those would involve a small tweak to the histogram (either in # > of buckets or upper bound); doesn't seem worth the effort So, we need to switch to HISTOGRAM_ENUMERATED_VALUES for this histogram and others, and discard most/all of the data from these probes.
Summary: kCacheMissedViaReval (3) is neer reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION) → kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)
Ugh--what a mess. Thanks for noticing this and looking into it.
Note we should rename the histograms when we discard data.
(In reply to Brian Smith (:bsmith) from comment #2) Nick, I didn't write an automated test for this, but you can verify that it is working by browsing around a little bit and then observing that you have positive counts for values 1, 2, 3, and 4 for HTTP_CACHE_DISPOSITION_2 and the other DISPOSITION_2 probes. But, see also bug 763342, where the device-specific telemetry will only show cache hits. I left the max at 5 even though there are only 4 distinct valid values now. I am not sure how much of a difference this makes, but AFAICT it shouldn't hurt.
Assignee: nobody → bsmith
Status: NEW → ASSIGNED
Attachment #632537 - Flags: review?(hurley)
Marking as tracking all unreleased versions because we need this telemetry to be corrected ASAP.
Target Milestone: --- → mozilla16
Comment on attachment 632537 [details] [diff] [review] Fix cache hit rate telemetry Review of attachment 632537 [details] [diff] [review]: ----------------------------------------------------------------- Ship it!
Attachment #632537 - Flags: review?(hurley) → review+
Blocks: 763359
No longer depends on: 763359
This appears to have an r+. Anything holding us back from landing on m-c, and then nominating for Aurora/Beta yet?
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 632537 [details] [diff] [review] Fix cache hit rate telemetry [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 763359 User impact if declined: Wrong telemetry in Aurora and Beta branches Testing completed (on m-c, etc.): Landed on m-c last week. Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes. String or UUID changes made by this patch: NONE.
Attachment #632537 - Flags: approval-mozilla-beta?
Attachment #632537 - Flags: approval-mozilla-aurora?
Comment on attachment 632537 [details] [diff] [review] Fix cache hit rate telemetry [Triage Comment] This missed our last beta. Let's get this into Aurora 15 first.
Attachment #632537 - Flags: approval-mozilla-beta?
Attachment #632537 - Flags: approval-mozilla-beta-
Attachment #632537 - Flags: approval-mozilla-aurora?
Attachment #632537 - Flags: approval-mozilla-aurora+
Attached patch Fix for mozilla-aurora (obsolete) — Splinter Review
A different version of the patch is needed for mozilla-aurora because mozilla-central and mozilla-aurora diverged.
Attachment #640704 - Flags: review?(hurley)
Attachment #640704 - Attachment is obsolete: true
Attachment #640704 - Flags: review?(hurley)
Attachment #640705 - Flags: review?(hurley)
Attachment #640705 - Flags: review?(hurley) → review+
(In reply to Brian Smith (:bsmith) from comment #17) > Created attachment 640705 [details] [diff] [review] > Fix for aurora [v2] Please nominate this patch for mozilla-beta approval since 15 is now on Beta.
Comment on attachment 640705 [details] [diff] [review] Fix for aurora [v2] [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 763359 User impact if declined: Wrong telemetry in Aurora and Beta branches Testing completed (on m-c, etc.): Already in m-c and m-a for a while. Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes. String or UUID changes made by this patch: NONE.
Attachment #640705 - Flags: approval-mozilla-beta?
Comment on attachment 640705 [details] [diff] [review] Fix for aurora [v2] [Triage Comment] More/better data is good - approving for Beta 15.
Attachment #640705 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
It hasn't landed in Beta 15.
Let's get this in this week please, we'll want to know this has landed in Beta before the next one gets built.
Nick - can you land this on beta?
Oddly, it looks as if this has already landed on beta: http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5 Not sure when it happened, exactly
The beta landing is here, this bug apparently wasn't update properly. http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: