Note: There are a few cases of duplicates in user autocompletion which are being worked on.

kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)

RESOLVED FIXED in Firefox 15

Status

()

Core
Networking: HTTP
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: briansmith, Assigned: briansmith)

Tracking

unspecified
mozilla16
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox13 wontfix, firefox14+ wontfix, firefox15+ fixed, firefox16+ fixed)

Details

Attachments

(4 attachments, 1 obsolete attachment)

Look at the telemetry histograms for HTTP_CACHE_DISPOSITION for all dates prior to today (i.e. over the entire time we have been collecting telemetry).

The values (at the time I looked) are:

kCacheHit (1): 32.48%
kCacheHitViaReval (2): 16.20%
kCacheMissed (4): 51.32%

Note that these add up to exactly 100%.

Note that kCacheMissedViaReval (3) is COMPLETELY missing--not just low, but NEVER reported, AFAICT.

What this telemetry is saying that is, if we have a cache entry, 100% of the time it will revalidate successfully--i.e. no server has EVER returned anything other than a 304 response to any of our conditional requests. That seems impossible to me. (I could believe the number would be CLOSE to zero; but, it cannot be zero.)
Created attachment 631777 [details] [diff] [review]
Test validation with non-304 responses with logging

This patch contains an xpcshell test + some logging in nsHttpChannel that shows that it is at least POSSIBLE for us to get kCacheMissedViaReval. If you run the test with NSPR_LOG_MODULES=nsHTTP:5, then the log output should include:

0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3
0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3
0[31f940]: CACHE DISPOSITION: 4
0[31f940]: CACHE DISPOSITION: 3

So, I think the problem is not in how we calculate the cache disposition telemetry value. The problem is either in telemetry itself or somewhere else in nsHttpChannel. I had thoughe the most likely situation is that, when we need to do validation, we're switching to heuristic validation, or we're throwing the entry away. But, in that case, I would expect my simple test in this attachment to never output CACHE DISPOSITION: 3. It is possible that my test here is not "real world" enough and that my suspicion is still correct.
Note that the attachment contains a fix for bug 763342, in addition to the work done to investigate this bug.
In the debugger, I found that telemetry is putting values 2 and 3 into bucket 2.

I wrote a simple test server in python that always returns a 200 response with an ETag and Last-Modified. Thus, reloading this page should cause kCacheMissedViaReval all the time. My logging from the first patch shows that indeed we do calculate the disposition to be kCacheMissedViaReval (3).

I made the following change:

-HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 5, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss")
HISTOGRAM(HTTP_CACHE_DISPOSITION, 1, 4, 5, LINEAR, "HTTP Cache Hit, Reval, Failed-Reval, Miss")

and with that change, kCacheMissedViaReval (3) is reported.

Now I am starting to suspect that there might be a bug in Telemetry itself.

In the debugger, I noticed some oddity in histogram.cc. the ranges_[] array for this histogram is { 0, 1, 2, 4, 5, 2147483647 }; This does not make sense; there should be a "3" in there between 2 and 4 because the range of the 4th bucket is exactly (3, 3).

Consequently, I believe this is a bug in telemetry and I will file it, blocking this bug.
Created attachment 631784 [details]
Python test server

Here is my python test server that I used to create a reproducable test for this. Run it with "python testserver.py" and then point firefox to http://localhost:8000. Reload the page a bunch of times. The above patch that adds the logging of the cache disposition may also be helpful.
Depends on: 763359
In reply to Nathan Froyd (:froydnj) from bug 732053 comment #2 on 2012-03-08 (three months ago):
> Histograms that could benefit from this macro [HISTOGRAM_ENUMERATED_VALUES]:
> 
> GC_REASON
> HTTP_*_DISPOSITION
> CACHE_*_SEARCH (odd to have a linear histogram for ms histogram, especially
> one so finely bucketed)
> CACHE_SERVICE_LOCK_WAIT* (same)
> PLACES_EXPIRATION_STEPS_TO_CLEAN
> DNS_LOOKUP_METHOD
> UPDATE_STATUS
> 
> but most of those would involve a small tweak to the histogram (either in #
> of buckets or upper bound); doesn't seem worth the effort

So, we need to switch to HISTOGRAM_ENUMERATED_VALUES for this histogram and others, and discard most/all of the data from these probes.
Summary: kCacheMissedViaReval (3) is neer reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION) → kCacheMissedViaReval (3) is never reported in cache disposition telemetry (e.g. HTTP_CACHE_DISPOSITION)
Ugh--what a mess.  Thanks for noticing this and looking into it.

Comment 7

5 years ago
Note we should rename the histograms when we discard data.
Depends on: 764190
Blocks: 764260
Created attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry

(In reply to Brian Smith (:bsmith) from comment #2)
Nick, I didn't write an automated test for this, but you can verify that it is working by browsing around a little bit and then observing that you have positive counts for values 1, 2, 3, and 4 for HTTP_CACHE_DISPOSITION_2 and the other DISPOSITION_2 probes. But, see also bug 763342, where the device-specific telemetry will only show cache hits.

I left the max at 5 even though there are only 4 distinct valid values now. I am not sure how much of a difference this makes, but AFAICT it shouldn't hurt.
Assignee: nobody → bsmith
Status: NEW → ASSIGNED
Attachment #632537 - Flags: review?(hurley)
Marking as tracking all unreleased versions because we need this telemetry to be corrected ASAP.
status-firefox13: --- → affected
status-firefox14: --- → affected
status-firefox15: --- → affected
status-firefox16: --- → affected
tracking-firefox14: --- → ?
tracking-firefox15: --- → ?
tracking-firefox16: --- → ?
Target Milestone: --- → mozilla16
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry

Review of attachment 632537 [details] [diff] [review]:
-----------------------------------------------------------------

Ship it!
Attachment #632537 - Flags: review?(hurley) → review+

Updated

5 years ago
tracking-firefox14: ? → +
tracking-firefox15: ? → +
tracking-firefox16: ? → +
Blocks: 763359
No longer depends on: 763359
This appears to have an r+. Anything holding us back from landing on m-c, and then nominating for Aurora/Beta yet?
https://hg.mozilla.org/integration/mozilla-inbound/rev/5a2f50db17b3
https://hg.mozilla.org/mozilla-central/rev/5a2f50db17b3
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

5 years ago
status-firefox16: affected → fixed
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 763359

User impact if declined: Wrong telemetry in Aurora and Beta branches

Testing completed (on m-c, etc.): Landed on m-c last week.

Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes.

String or UUID changes made by this patch: NONE.
Attachment #632537 - Flags: approval-mozilla-beta?
Attachment #632537 - Flags: approval-mozilla-aurora?
Comment on attachment 632537 [details] [diff] [review]
Fix cache hit rate telemetry

[Triage Comment]
This missed our last beta. Let's get this into Aurora 15 first.
Attachment #632537 - Flags: approval-mozilla-beta?
Attachment #632537 - Flags: approval-mozilla-beta-
Attachment #632537 - Flags: approval-mozilla-aurora?
Attachment #632537 - Flags: approval-mozilla-aurora+
Created attachment 640704 [details] [diff] [review]
Fix for mozilla-aurora

A different version of the patch is needed for mozilla-aurora because mozilla-central and mozilla-aurora diverged.
Attachment #640704 - Flags: review?(hurley)
Created attachment 640705 [details] [diff] [review]
Fix for aurora [v2]
Attachment #640704 - Attachment is obsolete: true
Attachment #640704 - Flags: review?(hurley)
Attachment #640705 - Flags: review?(hurley)

Updated

5 years ago
status-firefox14: affected → wontfix
Attachment #640705 - Flags: review?(hurley) → review+
(In reply to Brian Smith (:bsmith) from comment #17)
> Created attachment 640705 [details] [diff] [review]
> Fix for aurora [v2]

Please nominate this patch for mozilla-beta approval since 15 is now on Beta.
Comment on attachment 640705 [details] [diff] [review]
Fix for aurora [v2]

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 763359

User impact if declined: Wrong telemetry in Aurora and Beta branches

Testing completed (on m-c, etc.): Already in m-c and m-a for a while.

Risk to taking this patch (and alternatives if risky): Low risk. This only modifies telemetry probes.

String or UUID changes made by this patch: NONE.
Attachment #640705 - Flags: approval-mozilla-beta?
Comment on attachment 640705 [details] [diff] [review]
Fix for aurora [v2]

[Triage Comment]
More/better data is good - approving for Beta 15.
Attachment #640705 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 21

5 years ago
It hasn't landed in Beta 15.
Let's get this in this week please, we'll want to know this has landed in Beta before the next one gets built.

Updated

5 years ago
status-firefox13: affected → wontfix

Comment 23

5 years ago
Nick - can you land this on beta?
Oddly, it looks as if this has already landed on beta: http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5 Not sure when it happened, exactly
status-firefox15: affected → fixed

Comment 25

5 years ago
The beta landing is here, this bug apparently wasn't update properly.

http://hg.mozilla.org/releases/mozilla-beta/rev/99faf1860ca5
You need to log in before you can comment on or make changes to this bug.