Closed Bug 735768 Opened 8 years ago Closed 8 years ago

make persistent telemetry more robust

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox13 --- fixed

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

(Whiteboard: [qa-])

Attachments

(7 files)

There are some potential problems lurking in persistent telemetry that need to be addressed.  Patches incoming.
We should have been checking for what happened when we reflected persisted histograms into JS.  Just skip corrupt histograms as we do for normal telemetry.
Attachment #605877 - Flags: review?(taras.mozilla)
We should make sure on the JS side that if something goes wrong, we basically abort sending the persisted data.
Attachment #605878 - Flags: review?(taras.mozilla)
Attachment #605878 - Attachment is patch: true
Should have done this initially.  Needs some tweaks to the IDL interface, but otherwise straightforward.

Testing for corruption here is hard.  It can happen during testing, but it doesn't happen very often.  In any event, having tests makes me feel better about everything not falling apart.
Attachment #605879 - Flags: review?(taras.mozilla)
Whiteboard: [autoland-try:-b do -p all -u all -t none]
Whiteboard: [autoland-try:-b do -p all -u all -t none] → [autoland-in-queue]
Autoland Patchset:
	Patches: 605877, 605878, 605879
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=0505978c2624
Try run started, revision 0505978c2624. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624
Attachment #605877 - Flags: review?(taras.mozilla) → review+
Comment on attachment 605878 [details] [diff] [review]
part 2 - robustify JS side

should record this corruption via telemetry
Attachment #605878 - Flags: review?(taras.mozilla) → review+
Attachment #605879 - Flags: review?(taras.mozilla) → review+
Try run looks pretty green, marking checkin-needed.

(In reply to Taras Glek (:taras) from comment #5)
> should record this corruption via telemetry

If I do this, I'll do it in a followup bug, but we already have telemetry for corrupt histograms in a session, so doing corruption counting for persisted histograms just sounds like double-counting for little benefit.
Keywords: checkin-needed
Try run for 0505978c2624 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0505978c2624
Results (out of 216 total builds):
    exception: 1
    success: 171
    warnings: 44
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-0505978c2624
Whiteboard: [autoland-in-queue]
Assignee: nobody → nfroyd
Attachment #605877 - Flags: checkin+
Attachment #605878 - Flags: checkin+
Attachment #605879 - Flags: checkin+
Here's a graph of what happened to our telemetry submission rates after we fixed this.  Putting this here for evidence of Aurora uplift goodness.
[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Attachment #607954 - Flags: review+
Attachment #607954 - Flags: approval-mozilla-aurora?
[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: Spurious error messages in the console.
Testing completed (on m-c, etc.): On m-c; see attached graph of telemetry submission rates for evidence this pushes us in the right direction.
Risk to taking this patch (and alternatives if risky): Epsilon.  We will not be collecting telemetry data if this patch is declined.
Attachment #607955 - Flags: review+
Attachment #607955 - Flags: approval-mozilla-aurora?
This patch is optional; the tests are merely there for peace of mind.  The previous two patches are the important ones.

[Approval Request Comment]
Regression caused by (bug #): Bug 707320 and follow on patches.
User impact if declined: None.
Testing completed (on m-c, etc.): On m-c.
Risk to taking this patch (and alternatives if risky): Epsilon.
Attachment #607956 - Flags: review+
Attachment #607956 - Flags: approval-mozilla-aurora?
Comment on attachment 607954 [details] [diff] [review]
part 1 - robustify C++ side, aurora version

[Triage Comment]
Very nice. Happy to see this and thank you for the screenshot to prove the data is being positively affected.
Attachment #607954 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #607955 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #607956 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Please continue to monitor the data after this lands, let's check in a week or so from now and ensure there are no regressions.
Is there something QA can do to verify this fix?
Whiteboard: [qa?]
I don't think there's a good STR for this.
Okay, thanks Nathan. If something comes to light which we can use to test please mark this [qa+].
Whiteboard: [qa?] → [qa-]
You need to log in before you can comment on or make changes to this bug.