Closed Bug 1492656 Opened Last year Closed Last year

land Coverage ping in-tree

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: rhelmer, Assigned: rhelmer)

References

Details

Attachments

(1 file, 2 obsolete files)

In bug 1487578 and bug 1492356 we've had to ship the Telemetry Coverage measurement tool as extensions.

As we'd like to continue getting these measurements regularly for the foreseeable future, I have been working on landing this in-tree. My thought is that we can have it behind a pref, and flip it remotely for the subset of users we care about.

This will allow us to remove the custom sampling code, have automated tests, etc.

While we're at it, I'd like to add the automatic-retry feature we've discussed.

My thoughts on this are that we could run it like so:

1) shortly after startup (not blocking UI), TelmetryCoverage is invoked
2) if either OPT_OUT_PREF or ALREADY_RAN_PREF are true, return early
3) send telemetry-coverage ping
4) if ping fails, retry

My thought is that ALREADY_RAN pref would be versioned and in sync with the version in the endpoint URL, so to trigger a re-run we would just bump this in the source.
(In reply to Robert Helmer [:rhelmer] from comment #0)
> In bug 1487578 and bug 1492356 we've had to ship the Telemetry Coverage
> measurement tool as extensions.
> 
> As we'd like to continue getting these measurements regularly for the
> foreseeable future, I have been working on landing this in-tree. My thought
> is that we can have it behind a pref, and flip it remotely for the subset of
> users we care about.
> 
> This will allow us to remove the custom sampling code, have automated tests,
> etc.
> 
> While we're at it, I'd like to add the automatic-retry feature we've
> discussed.
> 
> My thoughts on this are that we could run it like so:
> 
> 1) shortly after startup (not blocking UI), TelmetryCoverage is invoked
> 2) if either OPT_OUT_PREF or ALREADY_RAN_PREF are true, return early
> 3) send telemetry-coverage ping
> 4) if ping fails, retry
> 
> My thought is that ALREADY_RAN pref would be versioned and in sync with the
> version in the endpoint URL, so to trigger a re-run we would just bump this
> in the source.

Georg, how does this plan sound to you?

I am planning to use ServiceRequest, which is what Telemetry uses as well. ServiceRequest uses more conservative TLS settings, so it's able to get through more TLS man-in-the-middle and other broken TLS conditions.

I have a few more specific questions too:

- is there common code somewhere, that Telemetry uses to do things like retry?
- do you have suggestions on generating a UUID for detecting duplicates, that does not involve the client ID?

I'm pretty sure Normandy figured out a solution for the latter which I will look into, but figured you might have some ideas as well :)
(In reply to Robert Helmer [:rhelmer] from comment #1)
> - is there common code somewhere, that Telemetry uses to do things like
> retry?

Hm, had a thought on this - how about we set the ALREADY_RAN pref only if the XHR POST succeeds, and otherwise we'll just try the next time Firefox starts?
Sorry forgot to set needinfo? - please see comment 1 and comment 2.
Flags: needinfo?(gfritzsche)
Attached patch telemetry-coverage-in-tree.patch (obsolete) — Splinter Review
Actually instead of needinfo'ing you, I put together a quick patch - this is inspired by the code in TelemetrySend.jsm obviously but I've modernized a bit. 

We could potentially refactor and make more of it reusable, probably the best thing would be to move more common request settings into ServiceRequest.jsm

This needs a test and I haven't done manual testing yet, but how do you feel about the overall approach?

Thanks!
Flags: needinfo?(gfritzsche)
Attachment #9010498 - Flags: feedback?(gfritzsche)
Working example.
Attachment #9010498 - Attachment is obsolete: true
Attachment #9010498 - Flags: feedback?(gfritzsche)
Attachment #9010501 - Flags: feedback?(gfritzsche)
Attachment #9010501 - Flags: feedback?(gfritzsche)
Attachment #9010501 - Attachment is obsolete: true
Component: General → Telemetry
Product: Data Platform and Tools → Toolkit
Priority: -- → P1
Comment on attachment 9010530 [details]
Bug 1492656 - move Telemetry Coverage ping in-tree r?gfritzsche

Jan-Erik Rediger [:janerik] has approved the revision.
Attachment #9010530 - Flags: review+
Depends on: 1494292
Summary: land Telemetry Coverage tool in-tree → land Coverage ping in-tree
BTW - in the interest of clarifying what this is for users we've decided to call this "coverage" and not have "telemetry" in the name of prefs or the endpoint (bug 1495037), since it's not associated with any other telemetry.

I've amended the bug, just waiting on data review and that endpoint change.

Please reach out to me if you want to chat about this, thanks!
You need to request review (r?) from one of the data stewards, :chutten would be a good person to ask for review.

(Would have been fine to have the review in here, now that it is in a separate bug that's ok as well)

If the endpoint's hostname is changed as mentioned above that's okay with me (no additional r+ needed unless something else changes in the code)
Flags: needinfo?(rhelmer)
(In reply to Jan-Erik Rediger [:janerik] from comment #9)
> You need to request review (r?) from one of the data stewards, :chutten
> would be a good person to ask for review.
> 
> (Would have been fine to have the review in here, now that it is in a
> separate bug that's ok as well)
> 
> If the endpoint's hostname is changed as mentioned above that's okay with me
> (no additional r+ needed unless something else changes in the code)

Thanks!
Flags: needinfo?(rhelmer)
Hm so since the beta merge is in ~3 days I'm going to go ahead and land it with the existing hostname - easy enough to change later.
Pushed by rhelmer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b969cbe46b14
move Telemetry Coverage ping in-tree r=janerik
Backed out changeset b969cbe46b14 (bug 1492656)for Android xpcshell failures on unit/test_CoveragePing.js CLOSED TREE

Backout revision https://hg.mozilla.org/integration/autoland/rev/c096376198dfd61024c75345b94c9674b403988b

Failed push https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&selectedJob=206480575&revision=b969cbe46b14814c397199ef84db2ed18d252107

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=206480575&repo=autoland

First the failures started on TV tier2 tests and the moved to X12 

:rhelmer could you please take a look?
Flags: needinfo?(rhelmer)
(In reply to Arthur Iakab [arthur_iakab] from comment #13)
> Backed out changeset b969cbe46b14 (bug 1492656)for Android xpcshell failures
> on unit/test_CoveragePing.js CLOSED TREE
> 
> Backout revision
> https://hg.mozilla.org/integration/autoland/rev/
> c096376198dfd61024c75345b94c9674b403988b
> 
> Failed push
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedSta
> te=unclassified&selectedJob=206480575&revision=b969cbe46b14814c397199ef84db2e
> d18d252107
> 
> Failure log:
> https://treeherder.mozilla.org/logviewer.html#?job_id=206480575&repo=autoland
> 
> First the failures started on TV tier2 tests and the moved to X12 
> 
> :rhelmer could you please take a look?

It's because CoveragePing.jsm is expecting the toolkit.coverage.enabled to be present, but it's only set for Desktop:
https://treeherder.mozilla.org/logviewer.html#?job_id=206480575&repo=autoland&lineNumber=1866

I'll add a default of `false` for `getBoolPref` here, I don't think we have current plans for this ping on Android and I'd like to do more testing there before we enable it, anyway.
Flags: needinfo?(rhelmer)
I added a one-line change to default the coverage ping enabled pref to `false`, and did a try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=92516448c34735821f001d8a6aed4d0c2051f9ca&selectedJob=206521758

Just wanted to check in with you before I re-land :) I don't think we need to enable this for Android, they can opt-in later easily. WDYT?
Flags: needinfo?(jrediger)
Ah, damn. I should have caught that in review. Having a default in the code is good, having the default be false is good.
The ping is not enabled by default anywhere yet. Go ahead.
Flags: needinfo?(jrediger)
Pushed by rhelmer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68445fa63653
move Telemetry Coverage ping in-tree r=janerik
https://hg.mozilla.org/mozilla-central/rev/68445fa63653
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.