Closed Bug 1473550 Opened 7 years ago Closed 7 years ago

NETWORK_CACHE_METADATA_SIZE changed on June 15

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: chutten, Unassigned)

Details

We have detected a change in the Telemetry probe NETWORK_CACHE_METADATA_SIZE in Firefox Nightly builds from 2018-06-15.
Alert details: http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/1166/alerts/?from=2018-06-15&to=2018-06-15
Changes new to Nightly builds on 2018-06-15: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=91db0c695f0272f00bf92c81c471a85101056d96&tochange=5b2a9b683f20297ecede5c083c60627d9ea24765
The value of NETWORK_CACHE_METADATA_SIZE over time: https://telemetry.mozilla.org/new-pipeline/evo.html#!measure=NETWORK_CACHE_METADATA_SIZE
If you have any problems, please ask for help on the #telemetry IRC channel or on Slack in #fx-metrics. We'll give you a hand.
What this is: We have a system called cerberus[1] that compares Telemetry collected on different Nightly builds and looks for sudden changes in value distributions using the Bhattacharyya Distance[2]. It found such a change in NETWORK_CACHE_METADATA_SIZE on 2018-06-15, so it asked its buddy medusa[3] to send an email to the dev-telemetry-alerts mailing list and to all email addresses listed in the alert_emails field of NETWORK_CACHE_METADATA_SIZE's definition.
[1]: https://github.com/mozilla/cerberus
[2]: https://en.wikipedia.org/wiki/Bhattacharyya_distance
[3]: https://github.com/mozilla/medusa
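For readers unfamiliar with the metric named in the alert text: the Bhattacharyya distance between two normalized histograms p and q over the same buckets is -ln(sum_i sqrt(p_i * q_i)). The Python sketch below illustrates that formula only. It is not cerberus's actual code, and the function name and the sample bucket counts are made up for illustration.

```python
import math


def bhattacharyya_distance(p_counts, q_counts):
    """Bhattacharyya distance between two histograms that share the same buckets."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of buckets")
    p_total = sum(p_counts)
    q_total = sum(q_counts)
    if p_total == 0 or q_total == 0:
        raise ValueError("histograms must not be empty")

    # Bhattacharyya coefficient: overlap of the two normalized distributions.
    coefficient = sum(
        math.sqrt((p / p_total) * (q / q_total))
        for p, q in zip(p_counts, q_counts)
    )
    if coefficient == 0:
        return math.inf  # no overlap at all
    # Clamp to 1.0 so rounding error cannot produce a negative distance.
    return -math.log(min(coefficient, 1.0))


# Illustrative bucket counts only (not real Telemetry data).
before = [10, 30, 40, 20]
after_similar = [11, 29, 41, 19]
after_shifted = [10, 5, 15, 70]

print(bhattacharyya_distance(before, after_similar))
print(bhattacharyya_distance(before, after_shifted))
```

A detector along these lines would flag a build when the distance to the preceding builds exceeds some threshold. How cerberus picks that threshold is not described in this bug.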
The change is particularly noticeable with this view: https://mzl.la/2KJnPs4
Specific questions:
1) Is this change intentional?
2) Is it a good change or a bad change?
3) Is this probe still measuring something important?
(In reply to Chris H-C :chutten from comment #1)
> The change is particularly noticeable with this view: https://mzl.la/2KJnPs4
Interesting :) This looks like either a bug in how we collect the telemetry [1] or a bug in the telemetry back-end. The noise is gone! That indicates some artificial kind of error.
> Specific questions:
> 1) Is this change intentional?
No.
> 2) Is it a good change or a bad change?
Probably bad.
> 3) Is this probe still measuring something important?
I think yes.
Michal, can you think of any change to cache2 that may have caused this? It seems the data is build-id bound and not submission-date bound (Chris?). The only change for the data is [2], but that is totally unrelated.
[1] https://searchfox.org/mozilla-central/rev/1193ef6a61cb6e350460eb2e8468184d3cb0321d/netwerk/cache2/CacheFileMetadata.cpp#808
[2] https://hg.mozilla.org/mozilla-central/rev/287bdf729c79
Flags: needinfo?(michal.novotny)
Flags: needinfo?(chutten)
On 28-10-2017 there was a similar spike for the 75th percentile. This may really be only a change in the SSL cert chain of Google, Facebook or some other high-profile site that makes the metadata larger. It could also be bigger cookies. I am changing my answer to "is this bad" to "no".
What I'm seeing is consistent with a code-specific effect and moderately inconsistent with an environmental effect.
If we presume a Large Internet Company made a change which inflicted this upon us, we'd expect to see a gradual change in the build_id aggregates across all channels. Beta users, for instance, don't update particularly quickly, so we'd expect to see several build ids showing the effect (as several build ids would have populations experiencing the effect). The plot shows Beta experiencing the effect only starting from the June 18th beta (which suggests an uplift): https://mzl.la/2KUjjHp
So I'm afraid it's a bit unlikely that it's environmental.
Also of note is that the noise isn't 100% gone, it's just diminished. There's still _just_ a hint of the ~3KB mode present on the graph (peaking at 0.16% per bucket; historically it was as high as 0.33%).
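The build-id argument above can be made concrete with a rough heuristic: when a metric is plotted per build id, a code change tends to show up as one sharp step between two consecutive builds, while an environmental change is smeared across many builds. The Python sketch below assumes that framing. The function names, the 0.5 threshold, and both series are hypothetical; only the 0.33 and 0.16 endpoints echo the per-bucket percentages mentioned in the comment.

```python
def largest_step(values):
    """Return (index, size) of the largest absolute change between neighbours."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    i = max(range(len(steps)), key=steps.__getitem__)
    return i + 1, steps[i]


def looks_like_code_change(by_build_id, threshold=0.5):
    """True when a single build-to-build step covers most of the overall change."""
    total = abs(by_build_id[-1] - by_build_id[0])
    if total == 0:
        return False
    _, step = largest_step(by_build_id)
    return step / total >= threshold


# Hypothetical share (%) of submissions in the ~3KB mode, ordered by build id.
step_series = [0.33, 0.32, 0.33, 0.16, 0.16, 0.17]     # abrupt drop at one build
gradual_series = [0.33, 0.30, 0.27, 0.23, 0.20, 0.16]  # slow drift across builds

print(looks_like_code_change(step_series))
print(looks_like_code_change(gradual_series))
```

This is only a way to phrase the reasoning, not a substitute for looking at the per-channel plots linked in the comment.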
Flags: needinfo?(chutten)
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Michal, can you think of any change to cache2 that may have caused this? It seems the data is build-id bound and not submission-date bound (Chris?)
No, I don't think we made any change that could cause this. I checked a few entries in my current cache and the biggest part of the metadata is still security info. It seems the security info increases over time. I remember that when we created the NETWORK_CACHE_METADATA_SIZE probe, the metadata size of HTTPS entries was about 3.5kB, but now it's 8kB, which is above the limit of NETWORK_CACHE_METADATA_SIZE.
I'll file a bug to increase the max value of NETWORK_CACHE_METADATA_SIZE and NETWORK_CACHE_METADATA_FIRST_READ_SIZE, as well as the kMinMetadataRead constant in CacheFileMetadata.
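To illustrate why a probe limit matters here: if every value at or above a histogram's maximum is counted in the top bucket, then entries that grow past the limit become indistinguishable from each other and leave the mid-range buckets. The Python sketch below assumes a simplified linear bucket layout. The `bucket_index` function and the `high` and `n_buckets` values are hypothetical and are not the probe's real definition.

```python
def bucket_index(value, high, n_buckets):
    """Map a value to a bucket in a simplified linear histogram.

    Buckets 0..n_buckets-2 split [0, high) evenly; the last bucket collects
    everything at or above `high`. This layout is an assumption for
    illustration, not the actual Telemetry bucketing code.
    """
    if value < 0:
        return 0
    width = high / (n_buckets - 1)
    return min(int(value // width), n_buckets - 1)


# Hypothetical probe parameters: 7000-byte limit, 8 buckets.
HIGH = 7000
N_BUCKETS = 8

print(bucket_index(3500, HIGH, N_BUCKETS))   # an entry of about 3.5kB sits mid-range
print(bucket_index(8000, HIGH, N_BUCKETS))   # an 8kB entry lands in the top bucket
print(bucket_index(12000, HIGH, N_BUCKETS))  # so does anything larger
```

Under this assumption, raising the maximum (as proposed in the comment) is what would restore resolution for the larger entries.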
Flags: needinfo?(michal.novotny)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
(In reply to Michal Novotny (:michal) from comment #5)
> (In reply to Honza Bambas (:mayhemer) from comment #2)
> > Michal, can you think of any change to cache2 that may have caused this? It seems the data is build-id bound and not submission-date bound (Chris?)
> No, I don't think we made any change that could cause this. I checked a few entries in my current cache and the biggest part of the metadata is still security info. It seems the security info increases over time. I remember that when we created the NETWORK_CACHE_METADATA_SIZE probe, the metadata size of HTTPS entries was about 3.5kB, but now it's 8kB, which is above the limit of NETWORK_CACHE_METADATA_SIZE. I'll file a bug to increase the max value of NETWORK_CACHE_METADATA_SIZE and NETWORK_CACHE_METADATA_FIRST_READ_SIZE, as well as the kMinMetadataRead constant in CacheFileMetadata.
Makes sense, thanks. This could actually be an NSS/PSM bug or simply a change in how big servers (Facebook, Google, etc.) serve intermediate certs and chains.
(In reply to Franziskus Kiefer [:franziskus] from comment #7)
> Could this be the reason [1]?
> [1] https://hg.mozilla.org/mozilla-central/rev/5d47226b6b8c
I did a quick test and it seems this patch doesn't affect security info size.
Even if the connection uses TLS session resumption?
(In reply to Dana Keeler [:keeler] (she/her) (use needinfo) from comment #9)
> Even if the connection uses TLS session resumption?
I cannot easily find out whether the connection used TLS session resumption. I did another test in which I logged each time security info was stored to a cache entry. I loaded (and reloaded) the same sites with and without the change in bug 1465562. After parsing the logs I can see that, without the patch from bug 1465562, the security info has the same size during the first load for a given host, but it is sometimes smaller during subsequent loads. So yes, that patch had some effect on metadata size. But in my case only 13 entries out of 240 were affected.
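A comparison like the one described above could be scripted roughly as follows. This is a Python sketch under stated assumptions: the log lines are assumed to have already been parsed into (host, security-info size) pairs in load order. The actual log format is not given in this bug, and the function name, hostnames and sizes are hypothetical.

```python
from collections import defaultdict


def hosts_with_smaller_reloads(records):
    """Return hosts whose security info was smaller on a later load than on the first.

    `records` is an iterable of (host, size_in_bytes) pairs in load order.
    """
    sizes = defaultdict(list)
    for host, size in records:
        sizes[host].append(size)
    return sorted(
        host
        for host, seen in sizes.items()
        if any(later < seen[0] for later in seen[1:])
    )


# Hypothetical parsed log entries (not real measurements).
records = [
    ("a.example", 4000),
    ("b.example", 3500),
    ("a.example", 2500),  # smaller on reload
    ("b.example", 3500),  # unchanged on reload
]

print(hosts_with_smaller_reloads(records))
```

Running the same check over the logs from both builds would give the kind of affected-entry count quoted in the comment.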