Closed Bug 1586271 Opened 5 years ago Closed 5 years ago

Make WEBRTC_CALL_DURATION telemetry opt-out on release channel

Categories

(Core :: WebRTC, enhancement, P2)

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: dminor, Assigned: dminor)

Details

Attachments

(2 files)

Bug 1571015 made some improvements to the existing WEBRTC_CALL_DURATION telemetry, so that we can track the total duration of a call rather than the duration of each individual peer connection. With these changes in place, we'd like to start collecting data on release channels. This would give us an idea of how much WebRTC is actually used on the web and help prioritize future work.

Attached file bug-1586271-request.md
Attachment #9098846 - Flags: data-review?(chutten)

WEBRTC_CALL_DURATION currently has an exception allowing for 1000 buckets rather than the usual 100. Please let me know if we need to reduce this if it is going to be enabled for release.

We don't have to reduce the number of buckets, but that many buckets will probably make analysis a pain in the posterior. How do you intend to analyze this data? *checks the data review* The measurement dashboards' use of metricsgraphics makes displaying this many buckets very awkward. If you have an idea of what the likely values are, we can tailor the range and bring the bucket resolution down to at most 100.

(( Even if we just straight-up drop the bucket count to 100 we still have acceptable resolution. It's not until 1h lengths that we're talking 7min bucket widths (which you can see using the histogram simulator) ))

If we drop our sights to 2h of buckets, that gives us at worst a 10min bucket width, with everything under 1h having better than 4min resolution. (Resolution is better than 1min all the way up to about 1000s, or roughly 16min.)

A note that if you change the buckets, you need to give the probe a new name. Some downstream data tools don't handle bucket reassignments, so we need to treat them as immutable.

Comment on attachment 9098846 [details]
bug-1586271-request.md

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is Telemetry so is documented in its definitions file [Histograms.json](https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Histograms.json) and the [Probe Dictionary](https://telemetry.mozilla.org/probe-dictionary/).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :drno is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+ pending :drno confirming they're okay with being responsible (I don't see any discussion on this bug about this)
Flags: needinfo?(drno)
Attachment #9098846 - Flags: data-review?(chutten) → data-review+

Yes, I'm okay owning this going forward. This is a high-level metric that product and engineering management want to track.

When it comes to buckets, the current 1000 are not very useful. In the default dashboard you can only look at the data in table mode, and then you have to scroll a lot. So in general fewer buckets would be better. But the current average value is in the range of a few seconds, so I'm a little concerned that our resolution is going to lump everything into the first 4 min. Is it possible to have finer resolution in the low range and coarser resolution for higher values?

Flags: needinfo?(drno)

I invite you to play with the low, high, and n_buckets/n_values settings in the simulator to see what the bucket widths are for given settings. The "finer resolution in the low range" is how histograms with a kind of "exponential" work.

So for example, here's a default low, high = 10000, and n_buckets = 100 (the same shape as today, but with 10x fewer buckets): (simulator link). The buckets are 1s wide all the way up to about 20s. The widest bucket at the end is 751s wide.

If we drop the upper bound from nearly 3 hours to just 1 hour (from 10000 to 3600) we get this simulation. The buckets are 1s wide up to about 20s as before, but the widest bucket is now only 230s wide.
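For anyone who wants to explore these settings without the simulator, here is a minimal Python sketch of exponential bucketing. It assumes the usual log-spaced scheme (bucket 0 starts at 0, bucket 1 starts at `low`, the last bucket starts at `high`, and every bound advances by at least 1). The function names `exponential_buckets` and `widest_bucket` are made up for this example, and the numbers it prints may not match the simulator exactly.

```python
import math


def exponential_buckets(low, high, n_buckets):
    """Return the lower bound of each bucket (assumed log-spaced scheme)."""
    bounds = [0] * n_buckets
    current = low
    bounds[1] = current
    log_max = math.log(high)
    for index in range(2, n_buckets):
        log_current = math.log(current)
        # Spread the remaining log-distance evenly over the remaining buckets.
        log_ratio = (log_max - log_current) / (n_buckets - index)
        candidate = int(math.floor(math.exp(log_current + log_ratio) + 0.5))
        # Bounds are integers, so always advance by at least 1.
        current = candidate if candidate > current else current + 1
        bounds[index] = current
    return bounds


def widest_bucket(bounds):
    """Largest gap between consecutive lower bounds (last bucket is open-ended)."""
    return max(b - a for a, b in zip(bounds, bounds[1:]))


if __name__ == "__main__":
    for high in (10000, 3600):
        bounds = exponential_buckets(1, high, 100)
        print(high, bounds[:5], widest_bucket(bounds))
```

Lowering `high` while keeping `n_buckets` fixed narrows the widest bucket, which is the trade-off described above.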

Ultimately, the bucketing is sparsely represented on disk and in transit, so having extra buckets only costs us in data section binary size, in-memory storage, and complications in analysis (like not being able to see the values on the Measurement Dashboard).
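To illustrate what "sparsely represented" means here, the following Python sketch stores only the buckets that actually received samples. It is an illustration under assumptions, not Telemetry's actual storage code: `BUCKET_LOWER_BOUNDS` is a tiny hypothetical bucket layout and `accumulate` is a made-up helper.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds for a tiny histogram; a real probe has many more.
BUCKET_LOWER_BOUNDS = [0, 1, 2, 4, 8, 16, 32, 64]


def accumulate(samples, lower_bounds=BUCKET_LOWER_BOUNDS):
    """Count samples per bucket, keyed by lower bound; empty buckets are omitted."""
    counts = Counter()
    for sample in samples:
        # Find the last bucket whose lower bound is <= sample.
        index = bisect_right(lower_bounds, sample) - 1
        counts[lower_bounds[max(index, 0)]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(accumulate([3, 3, 5, 100]))
```

Only the non-empty buckets appear in the result, so the serialized size depends on how many distinct buckets were hit rather than on `n_buckets`.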

(In reply to Nils Ohlmeier [:drno] from comment #6)

Yes, I'm okay owning this going forward. This is a high-level metric that product and engineering management want to track.

When it comes to buckets, the current 1000 are not very useful. In the default dashboard you can only look at the data in table mode, and then you have to scroll a lot. So in general fewer buckets would be better. But the current average value is in the range of a few seconds, so I'm a little concerned that our resolution is going to lump everything into the first 4 min. Is it possible to have finer resolution in the low range and coarser resolution for higher values?

With the changes in Bug 1571015, I would expect the average to go up as we should get the duration from the first peerconnection to the last peerconnection rather than the duration of each peerconnection. It might make sense to wait a week or two and see what happens to the data before we change the buckets here. That said, if I look at data since those changes landed, the median call is still just a few seconds long, so perhaps it won't make much difference.

Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6eec85211514
Make WEBRTC_CALL_DURATION telemetry opt-out on release channel; r=chutten
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71