Closed Bug 1312806 Opened 4 years ago Closed 4 years ago

Categorical histograms should default to 50 buckets

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: sfink, Assigned: gfritzsche)

References

Details

(Whiteboard: [measurement:client])

Attachments

(1 file, 6 obsolete files)

In reading through https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe I see that most of the time I want an enumerated histogram. But they're a pain, because they show numbers instead of string names.

From the description, I would much rather be using categorical histograms, except I never will because they disallow new values, and I never have anything where I'm confident enough that the set of possible values will stay completely fixed.

What if categorical histograms were allowed to have some number of placeholder labels? Then you'd allow placeholder labels to change their names to non-placeholder labels, but you'd still disallow non-placeholder names from changing. (I was originally thinking you could just allow some number of empty string categories, and allow empty string -> nonempty string transitions only, but I suppose you probably want them to be unique.)

Maybe make them numeric, and translate "1" -> Telemetry::LABEL_UNUSED_1? To be clear, I'm not suggesting that you have an infinite number of these allowed. I was thinking it'd be the same as setting n_buckets higher for enumerated histograms. It's fine if the user has to list them all out.
Roberto or Frank, is adding more buckets actually a problem right now for categorical histograms specifically?

If yes, do you see a problem pipeline-side if we just set the bucket count for all categorical histograms to a high value (strawman: 100), to allow future extension?
Client-side & storage-wise that would be fine as we store and serialize histograms sparsely.
Flags: needinfo?(rvitillo)
Flags: needinfo?(fbertsch)
Priority: -- → P3
Whiteboard: [measurement:client]
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> Roberto or Frank, is adding more buckets actually a problem right now for
> categorical histograms specifically?
Yes it is, we consider histogram definitions immutable.

> If yes, do you see a problem pipeline-side if we just set the bucket count
> for all categorical histograms to a high value (strawman: 100), to allow
> future extension?

I am not sure that doing this for all categorical histograms is the right thing to do but we certainly can provide the option to do so when a new categorical histogram is defined.
Flags: needinfo?(rvitillo)
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> I am not sure that doing this for all categorical histograms is the right
> thing to do but we certainly can provide the option to do so when a new
> categorical histogram is defined.

I would rather not require specifying buckets manually, so always reserving the maximum number of buckets would be a simple approach.
Unless this has pipeline impact.

The other easy option i can think of is to just require renaming the probe each time.
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> The other easy option i can think of is to just require renaming the probe
> each time.

I have enumerated histograms that I track across versions, even when I add new buckets to it (we're careful to keep existing values the same). Example: GC_MINOR_REASON.

Hmm... though looking at the dashboard, maybe the consistency isn't helping that much anyway. The "Histogram Dashboard" view is very handy, but only shows one Firefox version at a time. The "Evolution Dashboard" shows multiple versions, but I don't know what it's showing. Perhaps it's the mean numeric enumeration value divided by the number of submissions? At any rate, it appears that the evolution view is meaningless for enumerations. Is that accurate?

But it would be *nice* if I could track these across versions. For example, it could show a stacked bar chart of all the different enum values (or category names, if this bug is resolved somehow) in the evolution dashboard, so that you could see change in relative percentages over time. And it would be nice if, for example, we "split" one category into two subcategories (perhaps abandoning the original category), and we could see both before and after versions in one view.

Though I guess for some measures I wouldn't care about relative percentages, but instead would want a set of bars for each enum value, where the bars in the set correspond to different versions.

But those are for different bugs.
telemetry.mozilla.org is designed to be a generic view for generic histograms. If you have specialized needs, then specialized dashboards would be more appropriate.

The thing is, without good data regimens like yours, :sfink, we just wouldn't have the data to power any of these different, useful views. So keep up the good work, even if t.m.o isn't designed in a way that shows it off :S
Looks like Roberto answered this.
Flags: needinfo?(fbertsch)
With (In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> > I am not sure that doing this for all categorical histograms is the right
> > thing to do but we certainly can provide the option to do so when a new
> > categorical histogram is defined.
> 
> I would rather not require specifying buckets manually, so always reserving
> the maximum number of buckets would be a simple approach.
> Unless this has pipeline impact.

Roberto or Frank, can you answer on that part?
If there is no pipeline impact, just always reserving e.g. for a maximum of 50 buckets would be cleanest.
Flags: needinfo?(rvitillo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> With (In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> > (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #2)
> > > I am not sure that doing this for all categorical histograms is the right
> > > thing to do but we certainly can provide the option to do so when a new
> > > categorical histogram is defined.
> > 
> > I would rather not require specifying buckets manually, so always reserving
> > the maximum number of buckets would be a simple approach.
> > Unless this has pipeline impact.
> 
> Roberto or Frank, can you answer on that part?
> If there is no pipeline impact, just always reserving e.g. for a maximum of
> 50 buckets would be cleanest.

OK
Flags: needinfo?(rvitillo)
Summary: Categorical histograms are not expandable → Categorical histograms should always have 50 buckets
Assignee: nobody → gfritzsche
Priority: P3 → P1
We don't have any categorical histograms submitting yet, so lets quickly fix this now.
Attachment #8819885 - Flags: review?(alessio.placitelli)
Comment on attachment 8819885 [details] [diff] [review]
Categorical histograms should always have 50 buckets

Review of attachment 8819885 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good. However, should we note somewhere the reason why we're always having 50 buckets for the Categoricals? So we don't forget about it when backtracking stuff 3 months from now.

I'm not sure though where exactly this can be written. Maybe MDN or near CATEGORICAL_BUCKET_COUNT?
Attachment #8819885 - Flags: review?(alessio.placitelli) → review+
(In reply to Alessio Placitelli [:Dexter] from comment #11)
> This looks good. However, should we note somewhere the reason why we're
> always having 50 buckets for the Categoricals? So we don't forget about it
> when backtracking stuff 3 months from now.
> 
> I'm not sure though where exactly this can be written. Maybe MDN or near
> CATEGORICAL_BUCKET_COUNT?

Good point; we should do both.
Attachment #8819885 - Attachment is obsolete: true
Attachment #8819890 - Flags: review+
As i just looked over a bug that could use 48 buckets already, i'm changing the logic to:
* default categorical histograms to 50 buckets (+1 overflow)
* allow overriding this with n_values
Depends on: 1324477
Depends on: 1324478
Updated to the allow overriding the minimum of 50 buckets.
Attachment #8821506 - Flags: review?(alessio.placitelli)
Attachment #8819890 - Attachment is obsolete: true
Comment on attachment 8821506 [details] [diff] [review]
Categorical histograms should default to 50 buckets

Review of attachment 8821506 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good with the comment below addressed. Don't forget to update the MDN page with the new behaviour!

::: toolkit/components/telemetry/tests/unit/test_nsITelemetry.js
@@ +236,5 @@
> +
> +  // This histogram overrides the default of 50 values to 70.
> +  let h3 = Telemetry.getHistogramById("TELEMETRY_TEST_CATEGORICAL_NVALUES");
> +  for (let v of ["CommonLabel", "Label7", "Label8"]) {
> +    dump("* adding: " + v + "\n");

Is this a leftover from debugging?
Attachment #8821506 - Flags: review?(alessio.placitelli) → review+
Attachment #8821506 - Attachment is obsolete: true
Attachment #8821513 - Flags: review+
Summary: Categorical histograms should always have 50 buckets → Categorical histograms should default to 50 buckets
Keywords: checkin-needed
Needs rebasing.
Keywords: checkin-needed
Drive-by rebasing so that this can land on Aurora while :gfritzsche is on PTO.
Attachment #8822023 - Attachment is obsolete: true
Attached file categorical-buckets (obsolete) —
Rebased.
Attachment #8821513 - Attachment is obsolete: true
Attachment #8822523 - Flags: review+
Attachment #8822523 - Attachment is obsolete: true
Attachment #8822524 - Flags: review+
Attachment #8822524 - Attachment is patch: true
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd3ac71945c8
Categorical histograms should default to 50 buckets. r=dexter
https://hg.mozilla.org/mozilla-central/rev/cd3ac71945c8
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Blocks: 1324477, 1324478
No longer depends on: 1324477, 1324478
You need to log in before you can comment on or make changes to this bug.