Last Comment Bug 763359 - Telemetry does not always put samples into the correct bucket
: Telemetry does not always put samples into the correct bucket
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: unspecified
: All All
: -- major (vote)
: ---
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
Depends on: 763351 764190 764260 764585 781200
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-10 17:14 PDT by Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
Modified: 2012-09-12 03:12 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
-
-
-
-


Attachments

Description Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-10 17:14:00 PDT
Please see my investigation in bug 763351.

I have found that if you have a linear histogram with range (1, 5) and five buckets, the value "3" will always get put into bucket 2, instead of bucket "3" like you would expect. That is, the values "2" and "3" will get put into the same bucket.

In the case of bug 763351, we were using the buckets to act like enums; value "3" means something completely different than value "2" (in fact, in that probe, the values have almost completely opposite meanings).

In general, if there is a telemetry histogram with range (1, X) with X buckets, we expect each distinct value to be reported in its own bucket.

I have seen other telemetry probes using the same kind of pattern so there are likely other probes being affected by this.

Based on the investigation of bug 763351, it seems like this has been an issue since at least October, if not since the beginning of Telemetry data collection.

I am nominating this to be tracked on all channels because it needs to be fixed ASAP in order for us to meaningfully compare some significant telemetry data across releases.
Comment 1 Nathan Froyd [:froydnj] 2012-06-10 18:48:36 PDT
Please see HISTOGRAM_ENUMERATED_VALUES in TelemetryHistograms.h, introduced in bug 732053.  It would have been a good idea to file bugs on all the enum-values LINEAR_HISTOGRAMS that couldn't be converted when HISTOGRAM_ENUMERATED_VALUES was introduced, but I didn't do that. :(
Comment 2 Alex Keybl [:akeybl] 2012-06-13 13:24:24 PDT
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Please see HISTOGRAM_ENUMERATED_VALUES in TelemetryHistograms.h, introduced
> in bug 732053.  It would have been a good idea to file bugs on all the
> enum-values LINEAR_HISTOGRAMS that couldn't be converted when
> HISTOGRAM_ENUMERATED_VALUES was introduced, but I didn't do that. :(

Are you planning on filing these bugs now? Wouldn't this be a good meta bug for tracking?

Is there any reason to believe this bug isn't as concerning as bsmith suggests?
Comment 3 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-13 13:36:10 PDT
(In reply to Alex Keybl [:akeybl] from comment #2)
> Are you planning on filing these bugs now? Wouldn't this be a good meta bug
> for tracking?

I filed bugs for the Javascript case (
> 
> Is there any reason to believe this bug isn't as concerning as bsmith
> suggests?

I filed bug 764184 and bug 764190 in addition to bug 763351. For the other probes, it is OK for there to be bucket collisions.

Because multiple teams got tripped up by this, I highly recommend that we change the API so that one **cannot** create a linear histogram from (1-n) with n buckets, by adding a MOZ_STATIC_ASSERT for that condition. For the case where this is a false positive, the person who defines the telemetry can just change the bucket count to (n+1). For the case where this check would prevent a problem, it will avoid collecting bad data.
Comment 4 Nathan Froyd [:froydnj] 2012-06-13 14:25:23 PDT
(In reply to Brian Smith (:bsmith) from comment #3)
> Because multiple teams got tripped up by this, I highly recommend that we
> change the API so that one **cannot** create a linear histogram from (1-n)
> with n buckets, by adding a MOZ_STATIC_ASSERT for that condition. For the
> case where this is a false positive, the person who defines the telemetry
> can just change the bucket count to (n+1). For the case where this check
> would prevent a problem, it will avoid collecting bad data.

This seems reasonable; I filed bug 764585 for this.  I suppose fixing that bug requires modifying all the other histograms anyway...
Comment 5 Marco Bonardo [::mak] 2012-06-14 08:16:22 PDT
likely we may just fix the number of buckets here, the thing we care most is that this measure stays low (under 5), so the actual value has less importance. We will just get more precice infos once the fix gets out in the wild.
Comment 6 Marco Bonardo [::mak] 2012-06-14 08:16:47 PDT
oops, comment 5 was intended for bug 764190, sorry.
Comment 7 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-19 13:31:05 PDT
Seems like this is part of work Nathan will be doing so assigning to him. Feel free to move to another assignee (that isn't nobody) if it makes more sense. Just trying to make sure our tracking 14+ bugs have owners at this point in the cycle.
Comment 8 Nathan Froyd [:froydnj] 2012-06-19 13:41:09 PDT
It seems to me that a metabug shouldn't necessarily be tracked for 14; the individual blocking bugs should be, and then those bugs can be tracked with a release based on the importance of information gained from them.

Going to shuffle the dependencies to make them clearer.
Comment 9 Nathan Froyd [:froydnj] 2012-09-12 03:12:03 PDT
Fixed all the wrongly-specified histograms and put stricter checks in place.  Closing.

Note You need to log in before you can comment on or make changes to this bug.