Last Comment Bug 735768 - make persistent telemetry more robust
: make persistent telemetry more robust
Status: RESOLVED FIXED
[qa-]
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla14
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
: 735927 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-14 11:05 PDT by Nathan Froyd [:froydnj]
Modified: 2012-05-30 10:42 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
part 1 - robustify C++ side (1.64 KB, patch)
2012-03-14 11:45 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
vladan.bugzilla: checkin+
Details | Diff | Review
part 2 - robustify JS side (2.52 KB, patch)
2012-03-14 11:46 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
vladan.bugzilla: checkin+
Details | Diff | Review
part 3 - add tests (10.64 KB, patch)
2012-03-14 11:48 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
vladan.bugzilla: checkin+
Details | Diff | Review
telemetry submission rates post-patch (78.77 KB, image/png)
2012-03-21 07:49 PDT, Nathan Froyd [:froydnj]
no flags Details
part 1 - robustify C++ side, aurora version (1.57 KB, patch)
2012-03-21 08:02 PDT, Nathan Froyd [:froydnj]
nfroyd: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Review
part 2 - robustify JS side, aurora version (2.43 KB, patch)
2012-03-21 08:02 PDT, Nathan Froyd [:froydnj]
nfroyd: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Review
part 3 - add tests, aurora version (10.45 KB, patch)
2012-03-21 08:05 PDT, Nathan Froyd [:froydnj]
nfroyd: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Review

Description Nathan Froyd [:froydnj] 2012-03-14 11:05:38 PDT
There are some potential problems lurking in persistent telemetry that need to be addressed.  Patches incoming.
Comment 1 Nathan Froyd [:froydnj] 2012-03-14 11:45:12 PDT
Created attachment 605877 [details] [diff] [review]
part 1 - robustify C++ side

We should have been checking for what happened when we reflected persisted histograms into JS.  Just skip corrupt histograms as we do for normal telemetry.
Comment 2 Nathan Froyd [:froydnj] 2012-03-14 11:46:25 PDT
Created attachment 605878 [details] [diff] [review]
part 2 - robustify JS side

We should make sure on the JS side that if something goes wrong, we basically abort sending the persisted data.
Comment 3 Nathan Froyd [:froydnj] 2012-03-14 11:48:06 PDT
Created attachment 605879 [details] [diff] [review]
part 3 - add tests

Should have done this initially.  Needs some tweaks to the IDL interface, but otherwise straightforward.

Testing for corruption here is hard.  It can happen during testing, but it doesn't happen very often.  In any event, having tests makes me feel better about everything not falling apart.
Comment 4 Mozilla RelEng Bot 2012-03-14 12:19:32 PDT
Autoland Patchset:
	Patches: 605877, 605878, 605879
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=0505978c2624
Try run started, revision 0505978c2624. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624
Comment 5 (dormant account) 2012-03-14 17:08:37 PDT
Comment on attachment 605878 [details] [diff] [review]
part 2 - robustify JS side

should record this corruption via telemetry
Comment 6 Nathan Froyd [:froydnj] 2012-03-14 17:53:56 PDT
Try run looks pretty green, marking checkin-needed.

(In reply to Taras Glek (:taras) from comment #5)
> should record this corruption via telemetry

If I do this, I'll do it in a followup bug, but we already have telemetry for corrupt histograms in a session, so doing corruption counting for persisted histograms just sounds like double-counting for little benefit.
Comment 7 Matthew N. [:MattN] (behind on reviews) 2012-03-14 18:19:11 PDT
*** Bug 735927 has been marked as a duplicate of this bug. ***
Comment 8 Mozilla RelEng Bot 2012-03-14 22:31:36 PDT
Try run for 0505978c2624 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624
Results (out of 216 total builds):
    exception: 1
    success: 171
    warnings: 44
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-0505978c2624
Comment 11 Nathan Froyd [:froydnj] 2012-03-21 07:49:26 PDT
Created attachment 607951 [details]
telemetry submission rates post-patch

Here's a graph of what happened to our telemetry submission rates after we fixed this.  Putting this here for evidence of Aurora uplift goodness.
Comment 12 Nathan Froyd [:froydnj] 2012-03-21 08:02:07 PDT
Created attachment 607954 [details] [diff] [review]
part 1 - robustify C++ side, aurora version

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Comment 13 Nathan Froyd [:froydnj] 2012-03-21 08:02:47 PDT
Created attachment 607955 [details] [diff] [review]
part 2 - robustify JS side, aurora version

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Comment 14 Nathan Froyd [:froydnj] 2012-03-21 08:05:08 PDT
Created attachment 607956 [details] [diff] [review]
part 3 - add tests, aurora version

This patch is optional; the tests are merely there for peace of mind.  The previous two patches are the important ones.

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: None.
Testing completed (on m-c, etc.): On m-c.
Risk to taking this patch (and alternatives if risky): Epsilon.
Comment 15 Lukas Blakk [:lsblakk] use ?needinfo 2012-03-21 15:05:52 PDT
Comment on attachment 607954 [details] [diff] [review]
part 1 - robustify C++ side, aurora version

[Triage Comment]
Very nice. Happy to see this and thank you for the screenshot to prove the data is being positively affected.
Comment 16 Lukas Blakk [:lsblakk] use ?needinfo 2012-03-21 15:10:10 PDT
Please continue to monitor the data after this lands, let's check in a week or so from now and ensure there are no regressions.
Comment 18 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-29 10:49:05 PDT
Is there something QA can do to verify this fix?
Comment 19 Nathan Froyd [:froydnj] 2012-05-30 07:03:57 PDT
I don't think there's a good STR for this.
Comment 20 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-30 10:42:06 PDT
Okay, thanks Nathan. If something comes to light which we can use to test please mark this [qa+].

Note You need to log in before you can comment on or make changes to this bug.