High limit of NETWORK_CACHE_METADATA_SIZE and NETWORK_CACHE_METADATA_FIRST_READ_SIZE probes is too low

RESOLVED FIXED in Firefox 66

Status

()

defect
P3
normal
RESOLVED FIXED
8 months ago
4 months ago

People

(Reporter: michal, Assigned: michal)

Tracking

Trunk
mozilla66
Points:
---

Firefox Tracking Flags

(firefox64 wontfix, firefox65 wontfix, firefox66 fixed)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 attachment, 3 obsolete attachments)

Assignee

Description

8 months ago
Metadata size has increased over time and most of the reports are now outside the range.
Assignee

Comment 1

8 months ago
Posted patch patch (obsolete) — Splinter Review
Attachment #9014437 - Flags: review?(honzab.moz)
Attachment #9014437 - Flags: review?(francois)
Comment on attachment 9014437 [details] [diff] [review]
patch

Review of attachment 9014437 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFileMetadata.cpp
@@ -719,5 @@
>    if (mFirstRead) {
>      Telemetry::AccumulateTimeDelta(
>        Telemetry::NETWORK_CACHE_METADATA_FIRST_READ_TIME_MS, mReadStart);
> -    Telemetry::Accumulate(
> -      Telemetry::NETWORK_CACHE_METADATA_FIRST_READ_SIZE, mBufSize);

can you explain (maybe in the bug) why this is being removed?

::: toolkit/components/telemetry/Histograms.json
@@ +10648,5 @@
>      "n_buckets": 50,
>      "description": "Time spent to read the missing part of the metadata from the cache entry file."
>    },
> +  "NETWORK_CACHE_METADATA_SIZE_2": {
> +    "record_in_processes": ["main", "content"],

why "content"?
Attachment #9014437 - Flags: review?(honzab.moz) → review+
Assignee

Comment 3

8 months ago
Posted patch patch v2 (obsolete) — Splinter Review
(In reply to Honza Bambas (:mayhemer) from comment #2)
> > -    Telemetry::Accumulate(
> > -      Telemetry::NETWORK_CACHE_METADATA_FIRST_READ_SIZE, mBufSize);
> 
> can you explain (maybe in the bug) why this is being removed?

The probe doesn't provide any useful data. We try to read at least 1kB and we align the read to 4kB boundary. So if the file is large enough the value is between 1-5kB. The telemetry data shows that 3.3% of entries are smaller than 1kB and the values between 1-5kB are evenly distributed.

> > +  "NETWORK_CACHE_METADATA_SIZE_2": {
> > +    "record_in_processes": ["main", "content"],
> 
> why "content"?

I just copied it from the previous probe. I removed it in the new patch. I also added alert_emails.
Attachment #9014437 - Attachment is obsolete: true
Attachment #9014437 - Flags: review?(francois)
Attachment #9014606 - Flags: review?(francois)
Comment on attachment 9014606 [details] [diff] [review]
patch v2

Review of attachment 9014606 [details] [diff] [review]:
-----------------------------------------------------------------

Please also request a data review for this since I don't think the original probe ever had one done: https://wiki.mozilla.org/Firefox/Data_Collection#Step_1:_Submit_Request

::: toolkit/components/telemetry/Histograms.json
@@ +10657,2 @@
>      "n_buckets": 256,
>      "description": "Actual size of the metadata parsed from the disk."

What's the unit? Bytes?

If so, can you please add that to the description?

::: toolkit/components/telemetry/histogram-whitelists.json
@@ +868,3 @@
>      "NETWORK_CACHE_METADATA_FIRST_READ_TIME_MS",
>      "NETWORK_CACHE_METADATA_SECOND_READ_TIME_MS",
> +    "NETWORK_CACHE_METADATA_SIZE_2",

Why not just add a bug_numbers field in Histograms.json? That way, it doesn't have to continue to be whitelisted.

It looks like the original bug number is https://bugzilla.mozilla.org/show_bug.cgi?id=1133739 so you could have:

    "bug_numbers": [ 1133739, 1495336 ]
Attachment #9014606 - Flags: review?(francois) → review-
Assignee

Comment 5

8 months ago
Posted patch patch v3 (obsolete) — Splinter Review
Attachment #9014606 - Attachment is obsolete: true
Attachment #9014689 - Flags: review?(francois)
Chris, I noticed that this probe is whitelisted for "n_buckets". I couldn't find information about this in the telemetry docs (https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#n-buckets). Do you know the maximum value that should be used for that field? 100?

Is this something you'd recommend changing in this probe too?
Flags: needinfo?(chutten)
QA Contact: jduell.mcbugs

Comment 7

8 months ago
The maximum value for n_buckets/n_values is 100. Lemme file a bug for specifying that in the docs. (bug 1496789)

As for this probes in particular...

NETWORK_CACHE_METADATA_SIZE[1] is a linear probe with a high of 5*1024 and 256 buckets, resulting in uniform bucket width of 20 (what units are they? When are they measured? How often?). From beta 63 data [2] the first bucket with any data in it at all is the 82-102 bucket, and a whopping 84.77% of all samples are in the overflow bucket ( > 5K ).

My recommendation would be to have devs survey their own metadata sizes to find out what "typical" values are and then adjust the low, high, and n_buckets values according to the data needs as documented in the data review (which is being handled elsewhere?)

Ultimately I expect this probe will benefit from being exponential instead of linear, and having a "low" value of, let's say, 100. (values < low are still recorded, but they're all in the same underflow bucket). I recommend the use of the Histogram Simulator to help tune the probe for bucket widths and limits... for instance, n_buckets of 100, low of 100, high of 15000 looks like this with data uniformly distributed between 0 and 15000: https://telemetry.mozilla.org/histogram-simulator/#low=100&high=20000&n_buckets=100&kind=exponential&generate=uniform

The number of buckets influences the size of a recorded histogram in memory, on disk, and in transit (though we do have a sparse representation for disk and transmission). If we can keep to at most 100 buckets, that'd be nice, and we can remove it from the whitelist. Having more buckets doesn't mean we can make more meaningful decisions, after all.

tl;dr - What questions are we hoping to answer with this measure? What are typical values? Will the questions require high-resolution measures? Maybe it could be better replaced with a full-resolution scalar tracking the maximum size of metadata read that subsession?

[1]: https://telemetry.mozilla.org/probe-dictionary/?search=NETWORK_CACHE&detailView=histogram%2FNETWORK_CACHE_METADATA_SIZE
[2]: https://mzl.la/2pAC7SV
Flags: needinfo?(chutten)
QA Contact: jduell.mcbugs
Comment on attachment 9014689 [details] [diff] [review]
patch v3

Review of attachment 9014689 [details] [diff] [review]:
-----------------------------------------------------------------

I imagine you may not have time to do implement all of the things that Chris proposed in comment 7, but let's at least reduce the bucket size to 100 and remove this probe from the whitelist entirely.

Also, don't forget to:

- fill in a data review request (see link in comment 4)
- get someone to review the code changes in this patch (I'm only looking at Histograms.json)
Attachment #9014689 - Flags: review?(francois) → review-
Assignee

Comment 9

5 months ago
Posted patch patch v4Splinter Review

(In reply to François Marier [:francois] from comment #8)

I imagine you may not have time to do implement all of the things that Chris
proposed in comment 7, but let's at least reduce the bucket size to 100 and
remove this probe from the whitelist entirely.

Linear probe is what we want. I decided to increase the high value, so now it's very unlikely that reported size will overflow. The resolution will be 1kB with 66 buckets, which is fine.

Also, don't forget to:

  • get someone to review the code changes in this patch (I'm only looking at
    Histograms.json)

The code didn't change since patch v1, so no new review is needed for those changes.

  • fill in a data review request (see link in comment 4)
  1. What questions will you answer with this data?
  • We can optimize first read size when reading metadata.
  • It will detect when kMaxElementsSize in CacheFileMetadata.cpp isn't big enough.
  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?
  • Faster cache entry opening.
  • If metadata starts to grow for whatever reason, we need to know about it, otherwise we might stop caching entries with too large metadata.
  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?
  • We need data from real users.
  1. Can current instrumentation answer these questions?
  • No
  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.
  1. How long will this data be collected? Choose one of the following:
  • I want to permanently monitor this data. (Michal Novotny)
  1. What populations will you measure?
  • All Firefox users.
  1. If this data collection is default on, what is the opt-out mechanism for users?
  • There is no specific opt-out possibility just for this probe, so general out-out mechanism applies.
  1. Please provide a general description of how you will analyze this data.
  • Basic analysis on TMO measurement dashboard.
  1. Where do you intend to share the results of your analysis?
  • The data would be used to file appropriate bug.
Attachment #9014689 - Attachment is obsolete: true
Attachment #9035388 - Flags: review?(chutten)

Preliminary notes:

Category 3 is for Web Activity data that is considered sensitive. The size of the cache metadata is a technical detail, not a personal or sensitive one, so is Category 1.

Also, your data request is for all firefox users but your definition of NETWORK_CACHE_METADATA_SIZE_2 is for prerelease channels only. If you'd like it to be collected in release you'll need to add "releaseChannelCollection": "opt-out", to the definition. This Data Review covers whether you want it for all channels or just prerelease, so no need to refile.

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is Telemetry so is documented in its definitions file (Histograms.json), the Probe Dictionary, and on telemetry.mozilla.org's Measurement Dashboards.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes. :michal is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.


Result: datareview+

Comment on attachment 9035388 [details] [diff] [review]
patch v4

Review of attachment 9035388 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #9035388 - Flags: review?(chutten) → review+
Assignee

Comment 12

4 months ago

Also, your data request is for all firefox users but your definition of NETWORK_CACHE_METADATA_SIZE_2 is for prerelease channels only. If you'd like it to be collected in release you'll need to add "releaseChannelCollection": "opt-out", to the definition. This Data Review covers whether you want it for all channels or just prerelease, so no need to refile.

Thanks for pointing this out, I think prerelease should be sufficient.

Keywords: checkin-needed

Comment 13

4 months ago

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd696bc79dff
High limit of NETWORK_CACHE_METADATA_SIZE and NETWORK_CACHE_METADATA_FIRST_READ_SIZE probes is too low. r=chutten

Keywords: checkin-needed

Comment 14

4 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

Did you want to uplift this to Beta so 65 has the new probe? If so, please nominate ASAP.

Flags: needinfo?(michal.novotny)
Assignee

Comment 16

4 months ago

No need to uplift this patch.

Flags: needinfo?(michal.novotny)
You need to log in before you can comment on or make changes to this bug.