make persistent telemetry more robust

RESOLVED FIXED in Firefox 13

Status

()

Toolkit
Telemetry
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: froydnj, Assigned: froydnj)

Tracking

unspecified
mozilla14
Points:
---

Firefox Tracking Flags

(firefox13 fixed)

Details

(Whiteboard: [qa-])

Attachments

(7 attachments)

(Assignee)

Description

5 years ago
There are some potential problems lurking in persistent telemetry that need to be addressed.  Patches incoming.
(Assignee)

Comment 1

5 years ago
Created attachment 605877 [details] [diff] [review]
part 1 - robustify C++ side

We should have been checking for what happened when we reflected persisted histograms into JS.  Just skip corrupt histograms as we do for normal telemetry.
Attachment #605877 - Flags: review?(taras.mozilla)
(Assignee)

Comment 2

5 years ago
Created attachment 605878 [details] [diff] [review]
part 2 - robustify JS side

We should make sure on the JS side that if something goes wrong, we basically abort sending the persisted data.
Attachment #605878 - Flags: review?(taras.mozilla)
(Assignee)

Updated

5 years ago
Attachment #605878 - Attachment is patch: true
(Assignee)

Comment 3

5 years ago
Created attachment 605879 [details] [diff] [review]
part 3 - add tests

Should have done this initially.  Needs some tweaks to the IDL interface, but otherwise straightforward.

Testing for corruption here is hard.  It can happen during testing, but it doesn't happen very often.  In any event, having tests makes me feel better about everything not falling apart.
Attachment #605879 - Flags: review?(taras.mozilla)
(Assignee)

Updated

5 years ago
Whiteboard: [autoland-try:-b do -p all -u all -t none]

Updated

5 years ago
Whiteboard: [autoland-try:-b do -p all -u all -t none] → [autoland-in-queue]

Comment 4

5 years ago
Autoland Patchset:
	Patches: 605877, 605878, 605879
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=0505978c2624
Try run started, revision 0505978c2624. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624

Updated

5 years ago
Attachment #605877 - Flags: review?(taras.mozilla) → review+

Comment 5

5 years ago
Comment on attachment 605878 [details] [diff] [review]
part 2 - robustify JS side

should record this corruption via telemetry
Attachment #605878 - Flags: review?(taras.mozilla) → review+

Updated

5 years ago
Attachment #605879 - Flags: review?(taras.mozilla) → review+
(Assignee)

Comment 6

5 years ago
Try run looks pretty green, marking checkin-needed.

(In reply to Taras Glek (:taras) from comment #5)
> should record this corruption via telemetry

If I do this, I'll do it in a followup bug, but we already have telemetry for corrupt histograms in a session, so doing corruption counting for persisted histograms just sounds like double-counting for little benefit.
Keywords: checkin-needed
Duplicate of this bug: 735927

Comment 8

5 years ago
Try run for 0505978c2624 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624
Results (out of 216 total builds):
    exception: 1
    success: 171
    warnings: 44
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-0505978c2624

Updated

5 years ago
Whiteboard: [autoland-in-queue]

Updated

5 years ago
Assignee: nobody → nfroyd
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8f18e891c9a
https://hg.mozilla.org/integration/mozilla-inbound/rev/49a7eafa791f
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ee7f4594ba9
Keywords: checkin-needed
Target Milestone: --- → mozilla14
Attachment #605877 - Flags: checkin+
Attachment #605878 - Flags: checkin+
Attachment #605879 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/d8f18e891c9a
https://hg.mozilla.org/mozilla-central/rev/49a7eafa791f
https://hg.mozilla.org/mozilla-central/rev/4ee7f4594ba9
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 11

5 years ago
Created attachment 607951 [details]
telemetry submission rates post-patch

Here's a graph of what happened to our telemetry submission rates after we fixed this.  Putting this here for evidence of Aurora uplift goodness.
(Assignee)

Comment 12

5 years ago
Created attachment 607954 [details] [diff] [review]
part 1 - robustify C++ side, aurora version

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Attachment #607954 - Flags: review+
Attachment #607954 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 13

5 years ago
Created attachment 607955 [details] [diff] [review]
part 2 - robustify JS side, aurora version

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Attachment #607955 - Flags: review+
Attachment #607955 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 14

5 years ago
Created attachment 607956 [details] [diff] [review]
part 3 - add tests, aurora version

This patch is optional; the tests are merely there for peace of mind.  The previous two patches are the important ones.

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: None.
Testing completed (on m-c, etc.): On m-c.
Risk to taking this patch (and alternatives if risky): Epsilon.
Attachment #607956 - Flags: review+
Attachment #607956 - Flags: approval-mozilla-aurora?
Comment on attachment 607954 [details] [diff] [review]
part 1 - robustify C++ side, aurora version

[Triage Comment]
Very nice. Happy to see this and thank you for the screenshot to prove the data is being positively affected.
Attachment #607954 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #607955 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #607956 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Please continue to monitor the data after this lands, let's check in a week or so from now and ensure there are no regressions.
(Assignee)

Comment 17

5 years ago
Pushed to aurora:

https://hg.mozilla.org/releases/mozilla-aurora/rev/2fdf7be9bd2c
https://hg.mozilla.org/releases/mozilla-aurora/rev/f30faa038ccb
https://hg.mozilla.org/releases/mozilla-aurora/rev/fe19b0150074
status-firefox13: --- → fixed
Is there something QA can do to verify this fix?
Whiteboard: [qa?]
(Assignee)

Comment 19

5 years ago
I don't think there's a good STR for this.
Okay, thanks Nathan. If something comes to light which we can use to test please mark this [qa+].
Whiteboard: [qa?] → [qa-]
You need to log in before you can comment on or make changes to this bug.