Closed
Bug 763351
Opened 13 years ago
Closed 13 years ago
kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)
Categories
(Core :: Networking: HTTP, defect)
Core
Networking: HTTP
Tracking
()
RESOLVED
FIXED
mozilla16
People
(Reporter: briansmith, Assigned: briansmith)
References
Details
Attachments
(4 files, 1 obsolete file)
6.13 KB,
patch
|
Details | Diff | Splinter Review | |
1.39 KB,
text/plain
|
Details | |
4.74 KB,
patch
|
u408661
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
6.48 KB,
patch
|
u408661
:
review+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Look at the telemetry histograms for HTTP_CACHE_DISPOSITION for all dates prior to today (i.e. over the entire time we have been collecting telemetry).
The values (at the time I looked) are:
kCacheHit (1): 32.48%
kCacheHitViaReval (2): 16.20%
kCacheMissed (4): 51.32%
Note that these add up to exactly 100%.
Note that kCacheMissedViaReval (3) is COMPLETELY missing--not just low, but NEVER reported, AFAICT.
What this telemetry is saying that is, if we have a cache entry, 100% of the time it will revalidate successfully--i.e. no server has EVER returned anything other than a 304 response to any of our conditional requests. That seems impossible to me. (I could believe the number would be CLOSE to zero; but, it cannot be zero.)
Assignee | ||
Comment 1•13 years ago
|
||
This patch contains an xpcshell test + some logging in nsHttpChannel that shows that it is at least POSSIBLE for us to get kCacheMissedViaReval. If you run the test with NSPR_LOG_MODULES=nsHTTP:5, then the log output should include:
0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3
0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3
0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3
So, I think the problem is not in how we calculate the cache disposition telemetry value. The problem is either in telemetry itself or somewhere else in nsHttpChannel. I had thoughe the most likely situation is that, when we need to do validation, we're switching to heuristic validation, or we're throwing the entry away. But, in that case, I would expect my simple test in this attachment to never output CACHE DISPOSITION: 3. It is possible that my test here is not "real world" enough and that my suspicion is still correct.
Assignee | ||
Comment 2•13 years ago
|
||
Note that the attachment contains a fix for bug 763342, in addition to the work done to investigate this bug.
Assignee | ||
Comment 3•13 years ago
|
||
In the debugger, I found that telemetry is putting values 2 and 3 into bucket 2.
I wrote a simple test server in python that always returns a 200 response with an ETag and Last-Modified. Thus, reloading this page should cause kCacheMissedViaReval all the time. My logging from the first patch shows that indeed we do calculate the disposition to be kCacheMissedViaReval (3).
I made the following change:
-HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 5, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss")
HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 4, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss")
and with that change, kCacheMissedViaReval (3) is reported.
Now I am starting to suspect that there might be a bug in Telemetry itself.
In the debugger, I noticed some oddity in histogram.cc. the ranges_[] array for this histogram is { 0, 1, 2, 4, 5, 2147483647 }; This does not make sense; there should be a "3" in there between 2 and 4 because the range of the 4th bucket is exactly (3, 3).
Consequently, I believe this is a bug in telemetry and I will file it, blocking this bug.
Assignee | ||
Comment 4•13 years ago
|
||
Here is my python test server that I used to create a reproducable test for this. Run it with "python testserver.py" and then point firefox to http://localhost:8000. Reload the page a bunch of times. The above patch that adds the logging of the cache disposition may also be helpful.
Assignee | ||
Comment 5•13 years ago
|
||
In reply to Nathan Froyd (:froydnj) from bug 732053 comment #2 on 2012-03-08 (three months ago):
> Histograms that could benefit from this macro [HISTOGRAM_ENUMERATED_VALUES]:
>
> GC_REASON
> HTTP_*_DISPOSITION
> CACHE_*_SEARCH (odd to have a linear histogram for ms histogram, especially
> one so finely bucketed)
> CACHE_SERVICE_LOCK_WAIT* (same)
> PLACES_EXPIRATION_STEPS_TO_CLEAN
> DNS_LOOKUP_METHOD
> UPDATE_STATUS
>
> but most of those would involve a small tweak to the histogram (either in #
> of buckets or upper bound); doesn't seem worth the effort
So, we need to switch to HISTOGRAM_ENUMERATED_VALUES for this histogram and others, and discard most/all of the data from these probes.
Summary: kCacheMissedViaReval (3) is neer reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION) → kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)
Comment 6•13 years ago
|
||
Ugh--what a mess. Thanks for noticing this and looking into it.
Comment 7•13 years ago
|
||
Note we should rename the histograms when we discard data.
Assignee | ||
Comment 8•13 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #2)
Nick, I didn't write an automated test for this, but you can verify that it is working by browsing around a little bit and then observing that you have positive counts for values 1, 2, 3, and 4 for HTTP_CACHE_DISPOSITION_2 and the other DISPOSITION_2 probes. But, see also bug 763342, where the device-specific telemetry will only show cache hits.
I left the max at 5 even though there are only 4 distinct valid values now. I am not sure how much of a difference this makes, but AFAICT it shouldn't hurt.
Assignee | ||
Comment 9•13 years ago
|
||
Marking as tracking all unreleased versions because we need this telemetry to be corrected ASAP.
status-firefox13:
--- → affected
status-firefox14:
--- → affected
status-firefox15:
--- → affected
status-firefox16:
--- → affected
tracking-firefox14:
--- → ?
tracking-firefox15:
--- → ?
tracking-firefox16:
--- → ?
Target Milestone: --- → mozilla16
Comment 10•13 years ago
|
||
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry
Review of attachment 632537 [details] [diff] [review]:
-----------------------------------------------------------------
Ship it!
Attachment #632537 -
Flags: review?(hurley) → review+
Updated•13 years ago
|
![]() |
||
Updated•13 years ago
|
Comment 11•13 years ago
|
||
This appears to have an r+. Anything holding us back from landing on m-c, and then nominating for Aurora/Beta yet?
Assignee | ||
Comment 12•13 years ago
|
||
Comment 13•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Assignee | ||
Comment 14•13 years ago
|
||
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 763359
User impact if declined: Wrong telemetry in Aurora and Beta branches
Testing completed (on m-c, etc.): Landed on m-c last week.
Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes.
String or UUID changes made by this patch: NONE.
Attachment #632537 -
Flags: approval-mozilla-beta?
Attachment #632537 -
Flags: approval-mozilla-aurora?
Comment 15•13 years ago
|
||
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry
[Triage Comment]
This missed our last beta. Let's get this into Aurora 15 first.
Attachment #632537 -
Flags: approval-mozilla-beta?
Attachment #632537 -
Flags: approval-mozilla-beta-
Attachment #632537 -
Flags: approval-mozilla-aurora?
Attachment #632537 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 16•13 years ago
|
||
A different version of the patch is needed for mozilla-aurora because mozilla-central and mozilla-aurora diverged.
Attachment #640704 -
Flags: review?(hurley)
Assignee | ||
Comment 17•13 years ago
|
||
Attachment #640704 -
Attachment is obsolete: true
Attachment #640704 -
Flags: review?(hurley)
Attachment #640705 -
Flags: review?(hurley)
Updated•13 years ago
|
Attachment #640705 -
Flags: review?(hurley) → review+
Comment 18•13 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #17)
> Created attachment 640705 [details] [diff] [review]
> Fix for aurora [v2]
Please nominate this patch for mozilla-beta approval since 15 is now on Beta.
Assignee | ||
Comment 19•13 years ago
|
||
Comment on attachment 640705 [details] [diff] [review]
Fix for aurora [v2]
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 763359
User impact if declined: Wrong telemetry in Aurora and Beta branches
Testing completed (on m-c, etc.): Already in m-c and m-a for a while.
Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes.
String or UUID changes made by this patch: NONE.
Attachment #640705 -
Flags: approval-mozilla-beta?
Comment 20•13 years ago
|
||
Comment on attachment 640705 [details] [diff] [review]
Fix for aurora [v2]
[Triage Comment]
More/better data is good - approving for Beta 15.
Attachment #640705 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 21•13 years ago
|
||
It hasn't landed in Beta 15.
Comment 22•13 years ago
|
||
Let's get this in this week please, we'll want to know this has landed in Beta before the next one gets built.
Comment 23•13 years ago
|
||
Nick - can you land this on beta?
Comment 24•13 years ago
|
||
Oddly, it looks as if this has already landed on beta: http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5 Not sure when it happened, exactly
Comment 25•13 years ago
|
||
The beta landing is here, this bug apparently wasn't update properly.
http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5
You need to log in
before you can comment on or make changes to this bug.
Description
•