Last Comment Bug 731901 - update.status telemetry ping is not reporting correctly
: update.status telemetry ping is not reporting correctly
Product: Toolkit
Classification: Components
Component: Application Update (show other bugs)
: 12 Branch
: x86_64 Windows 7
: -- normal (vote)
: mozilla13
Assigned To: Brian R. Bondy [:bbondy]
: Robert Strong [:rstrong] (use needinfo to contact me)
Depends on: 708123 711054
Blocks: 715942
  Show dependency treegraph
Reported: 2012-02-29 19:12 PST by Brian R. Bondy [:bbondy]
Modified: 2012-03-29 17:09 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Patch v1. (1.48 KB, patch)
2012-02-29 20:10 PST, Brian R. Bondy [:bbondy]
no flags Details | Diff | Splinter Review
Patch v2. (1.48 KB, patch)
2012-03-01 04:47 PST, Brian R. Bondy [:bbondy]
no flags Details | Diff | Splinter Review
Patch v3. (1.48 KB, patch)
2012-03-01 08:53 PST, Brian R. Bondy [:bbondy]
no flags Details | Diff | Splinter Review
Patch v4. (1.48 KB, patch)
2012-03-01 09:11 PST, Brian R. Bondy [:bbondy]
ehsan: review+
Details | Diff | Splinter Review
Patch v1. Update JS telemetry ping name. (1.00 KB, patch)
2012-03-09 07:21 PST, Brian R. Bondy [:bbondy]
robert.strong.bugs: review+
Details | Diff | Splinter Review
Aurora patch (1.48 KB, patch)
2012-03-09 11:11 PST, Brian R. Bondy [:bbondy]
netzen: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Brian R. Bondy [:bbondy] 2012-02-29 19:12:03 PST
Update.status telemetry pings are not being reported correctly currently.

It looks as though there are a lot of memory errors (old error code 1).
But really it is just because all of the errors happening are in bucket #1. 

The current code is: 

> "Updater: the status of the latest update performed")

This means that there are 27 buckets from 1 to 16006 and that there are 592 status codes per bucket. I.e. the bucketizing of the accumulated telemetry values should be distributed linearly across the buckets.

Instead we should have 1 bucket per error code so they show up correctly.  Since there are a lot of unused space in error codes though I'll fix that in a different bug and land that first.
Comment 1 Brian R. Bondy [:bbondy] 2012-02-29 20:10:11 PST
Created attachment 601863 [details] [diff] [review]
Patch v1.
Comment 2 Brian R. Bondy [:bbondy] 2012-02-29 20:30:04 PST
By the way I know the error codes only go up to 30, but I put 31 because any larger values automatically go into the largest bucket.  So that way we won't accidentally attribute fake manually set error codes as error 30.  Ditto regarding anyone using an older build with the older status codes.
Comment 3 Nathan Froyd [:froydnj] 2012-03-01 04:23:11 PST
Comment on attachment 601863 [details] [diff] [review]
Patch v1.

Review of attachment 601863 [details] [diff] [review]:

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +318,5 @@
>  /**
>   * Updater telemetry.
>   */
> +HISTOGRAM(UPDATE_STATUS, 1, 31, 31, LINEAR, "Updater: the status of the latest update performed")

Please rename the histogram so that we don't wind up confusing old bucketed data with new.  (The server side will probably be much happier, too.)
Comment 4 Brian R. Bondy [:bbondy] 2012-03-01 04:47:52 PST
Created attachment 601926 [details] [diff] [review]
Patch v2.

My saving grace was that error code 1 and 0 were the only ones being reported before and error code 1 should no longer be reported. But I agree that a new status code is best.
Comment 5 Brian R. Bondy [:bbondy] 2012-03-01 06:04:39 PST
> But I agree that a new status code is best.

err I meant new name.
Comment 6 Nathan Froyd [:froydnj] 2012-03-01 07:04:52 PST
Thanks.  Do you think it's worthwhile to make the # of buckets bigger (~40-50) just to ensure we don't have to change it straightaway after we add a new error code or two?
Comment 7 Brian R. Bondy [:bbondy] 2012-03-01 08:53:40 PST
Created attachment 601993 [details] [diff] [review]
Patch v3.

> Do you think it's worthwhile to make the # of buckets bigger (~40-50) 

Ya might as well
Comment 8 Nathan Froyd [:froydnj] 2012-03-01 09:03:38 PST
Comment on attachment 601993 [details] [diff] [review]
Patch v3.

Review of attachment 601993 [details] [diff] [review]:

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +318,5 @@
>  /**
>   * Updater telemetry.
>   */
> +HISTOGRAM(UPDATER_STATUS_CODES, 1, 50, 50, LINEAR, "Updater: the status of the latest update performed")

Last comment, promise. :)

If you want evenly spaced buckets-of-1, you need to request 51 buckets, as there's an implicit 0 bucket tacked on to the bottom.  See, for instance, the A11Y_CONSUMERS histograms.

Otherwise you'll end up with a bucket somewhere (depending on the math, usually at the high end) that holds two values.  Doesn't make a difference now, but it would at some unspecified future point.
Comment 9 Brian R. Bondy [:bbondy] 2012-03-01 09:11:40 PST
Created attachment 601996 [details] [diff] [review]
Patch v4.

I thought the implicitly added 0 bucket would be implicitly counted as well.  But thanks for catching this.  Adjusted.
Comment 11 Matt Brubeck (:mbrubeck) 2012-03-06 11:10:04 PST
Comment 12 Brian R. Bondy [:bbondy] 2012-03-09 07:20:04 PST
Re-opening since this is not working correctly.
Comment 13 Brian R. Bondy [:bbondy] 2012-03-09 07:21:33 PST
Created attachment 604408 [details] [diff] [review]
Patch v1. Update JS telemetry ping name.

ugh... forgot to update the telemetry name in js after the name change in Comment 3 :(.
Comment 15 Brian R. Bondy [:bbondy] 2012-03-09 11:11:40 PST
Created attachment 604468 [details] [diff] [review]
Aurora patch

[Approval Request Comment]
Regression caused by (bug #): 708123
User impact if declined: We will not have visibility into errors people are getting with updater.
Testing completed (on m-c, etc.): The first patch already landed on m-c for a couple days but silently failed due to using the wrong name when reporting the telemetry which is fixed with the second patch.  Both patches are aggregated in this Aurora patch.
Risk to taking this patch (and alternatives if risky): Extremely low
String changes made by this patch: None
Comment 16 Alex Keybl [:akeybl] 2012-03-09 12:08:46 PST
Comment on attachment 604468 [details] [diff] [review]
Aurora patch

[Triage Comment]
Improving telemetry data with low risk fixes = a+.
Comment 17 Marco Bonardo [::mak] 2012-03-10 02:51:43 PST
Comment 18 Brian R. Bondy [:bbondy] 2012-03-10 06:05:26 PST
Uplifted to mozilla12 aurora
Comment 19 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-29 15:01:10 PDT
Is this something QA can help verify?
Comment 20 Brian R. Bondy [:bbondy] 2012-03-29 16:56:38 PDT
I verified that this is working on the telemetry dashboard.
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-29 17:09:25 PDT
Thanks Brian

Note You need to log in before you can comment on or make changes to this bug.