Closed Bug 1380786 Opened 7 years ago Closed 5 years ago

Can we now use standard telemetry for webrtc stats?

Categories

(Core :: WebRTC: Signaling, enhancement, P3)

49 Branch
enhancement

Tracking


RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: chutten, Assigned: dminor)

Details

(Whiteboard: [measurement:client:tracking])

Attachments

(2 files)

webrtc stats are one of the few remaining pieces of telemetry captured on childPayloads instead of being aggregated to the parent.

Unfortunately, webrtc stats are by their nature quite complicated.

We first introduced a pair of probes in bug 970690.
Then we introduced a custom struct in bug 1198883.

Luckily, no one's using that custom struct yet so we can still change it further if we're clever. If we can use standard telemetry primitives (Histograms, Scalars, whatever) we can remove the custom webrtc handling and get client child aggregation for free.

The trick is that we need to record two 2^11 (2048) bitstrings' worth of information[1].

Anyone have a brilliant idea?

[1]: http://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/peerconnection/WebrtcGlobalInformation.cpp#1053-1080
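For illustration, the kind of bitstring encoding involved might look like the following sketch. The capability names and bit positions here are invented for the example; the real layout lives in WebrtcGlobalInformation.cpp linked above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch only: each bit flags one ICE candidate capability
// observed for one side of the call. Names are illustrative, not the
// actual in-tree bit assignments.
enum CandidateBit : uint32_t {
  kHostUdp  = 1u << 0,
  kHostTcp  = 1u << 1,
  kSrflxUdp = 1u << 2,
  kRelayUdp = 1u << 3,
  // ... 11 bits per side in total, hence 2^11 = 2048 possible bitstrings
};

// Combine the observed capabilities into one bitstring for a side.
uint32_t EncodeCandidates(bool hostUdp, bool hostTcp, bool srflxUdp) {
  uint32_t bits = 0;
  if (hostUdp)  bits |= kHostUdp;
  if (hostTcp)  bits |= kHostTcp;
  if (srflxUdp) bits |= kSrflxUdp;
  return bits;
}
```

One such bitstring is recorded per side (local/remote) and per outcome (success/failure), which is where the "two 2^11 bitstrings' worth" of state comes from.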
So we could have a 2048-bucket categorical histogram. 

Pros:
* Excellent tooling support. 
* telemetry.mozilla.org might display this exactly the way we want
* String buckets instead of bitstrings sounds easier to read to me

Cons:
* 2048 buckets... this would be the first histogram quite this wide. Might stress something.
* Have to call Accumulate multiple times per collection (once for each flipped bit). (might not be a big deal as it could replace the bitstring creation code)


If telemetry.mozilla.org won't satisfy the analysis needs, we're looking at custom analysis. At that point it doesn't really matter what format we use so long as we can munge it in python (or SQL) later. That opens things up like keyed boolean histograms (one histogram each for success/failure, keys are the bitstrings), keyed uint scalars (ditto), and probably other ideas.
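As a rough sketch of the keyed-counter flavour mentioned above (a plain map stands in for Telemetry's keyed-scalar storage here, and the key format is invented for the example):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a keyed uint scalar: one counter per bitstring key.
std::map<std::string, uint32_t> gIceOutcome;

// Record one connection attempt under a "success/N" or "failure/N" key,
// where N is the candidate bitstring. The key format is illustrative;
// it just needs to be something Python/SQL can split apart later.
void RecordOutcome(uint32_t bitstring, bool success) {
  std::string key = (success ? std::string("success/")
                             : std::string("failure/")) +
                    std::to_string(bitstring);
  ++gIceOutcome[key];
}
```

The appeal of this shape is that only the bitstrings actually observed in a session take up space, rather than all 2048 buckets.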


Questions
1) :drno - what analysis would you like to perform on this data when you get it?
2) :gfritzsche, :Dexter - have any brilliant ideas for storing this data? Any knowledge about internal bucket limits?
Flags: needinfo?(gfritzsche)
Flags: needinfo?(drno)
Flags: needinfo?(alessio.placitelli)
(In reply to Chris H-C :chutten from comment #1)
> So we could have a 2048-bucket categorical histogram. 

That sounds like a good idea to me, given that it's an exceptional measurement and that we won't be adding 2048-bucket histograms every other day.
I'm a bit concerned about the impact on the ping size: our serialization format would basically force all 2048 keys to be dumped in the "values" section of this histogram. Is that correct?

> 2) :gfritzsche, :Dexter - have any brilliant ideas for storing this data?
> Any knowledge about internal bucket limits?

As far as I can tell/remember, we only require the histogram to be in a whitelist if more than 100 buckets are needed.
We don't seem to enforce any other limits (other than the minimum/default of 50 buckets for categoricals).
Flags: needinfo?(alessio.placitelli)
To give a bit of history: the reason this is in custom code is that the default Histograms could not carry the 27 bits we are using. Or at least there were concerns about the size of the data to be transferred, as each of the 2^27 representations would get transferred (?).

I think the default analyses I would want to perform on this data would be something like:
- show me success vs failure percentages in case where both sides of the call had IPv6 UDP
- show me success vs failure percentages where only one side had TCP available
- show me how many Windows clients had TCP locally available
...

And obviously ;-) all of that per Firefox version, channel and OS :-)

Ideally we would have some kind of interface similar to the standard Telemetry interface where people can change OS, version, etc. in drop-downs. But obviously an initial version with hard-coded queries would be a good start as well.
Flags: needinfo?(drno)
(In reply to Nils Ohlmeier [:drno] from comment #3)
> To give a bit of history: the reason this is in custom code is that the
> default Histograms could not carry the 27 bits we are using. Or at least
> there were concerns about the size of the data to be transferred, as each
> of the 2^27 representations would get transferred (?).

The serialization/transfer of histogram data is sparse, but if you record many of those representations in a single session, that is a concern.
AFAIU, though, performance becomes a concern for the aggregator at high bucket counts.

> I think the default analyses I would want to perform on this data would be
> something like:
> - show me success vs failure percentages in case where both sides of the
> call had IPv6 UDP
> - show me success vs failure percentages where only one side had TCP
> available
> - show me how many Windows clients had TCP locally available
> ...

Can you enumerate the standard questions and add standard scalars or histograms for them?
(e.g. boolean scalar for "tcp available", boolean histogram for "success/failure with both sides having udp")
Then you would have them show up automatically in e.g. the TMO dashboard without further work.
Flags: needinfo?(gfritzsche) → needinfo?(drno)
Hey :frank, know of any perf/stability concerns for aggregating particularly wide (~2K buckets) histograms?


...but you know what, maybe we can be more clever than this. We could represent this as four (success/failure and local/remote) 11-bucket categorical histograms. Then if a bit would be flipped in the bitstring, we accumulate to that bit's bucket in the categorical histogram.

For example, a bitstring of 4 and a bitstring of 5 (both local, success) would result in values of [1, 0, 2]. So we'd know what proportion of all connections use which features over time, but not pairs of features (that information goes missing at the client).
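A minimal sketch of that per-bit accumulation (a plain array stands in for the Telemetry categorical histogram API here; the real call would be something along the lines of Telemetry::Accumulate with the bit index as the bucket):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kBits = 11;  // one histogram bucket per capability bit

// For each set bit in the connection's bitstring, bump that bit's bucket.
// In-tree there would be four such histograms: {success, failure} x
// {local, remote}.
void AccumulateBits(std::array<uint32_t, kBits>& histogram,
                    uint32_t bitstring) {
  for (int bit = 0; bit < kBits; ++bit) {
    if (bitstring & (1u << bit)) {
      ++histogram[bit];
    }
  }
}
```

Feeding it the bitstrings 4 (binary 100) and 5 (binary 101) from the example above yields counts of 1 for bit 0, 0 for bit 1, and 2 for bit 2, matching the [1, 0, 2] described.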
Flags: needinfo?(fbertsch)
(In reply to Chris H-C :chutten from comment #5)
> Hey :frank, know of any perf/stability concerns for aggregating particularly
> wide (~2K buckets) histograms?

Nope, we have a bunch that are 1K wide, and two that are 10K wide. See one here: https://mzl.la/2tc1MWf.
Flags: needinfo?(fbertsch)
Rank: 25
Component: WebRTC → WebRTC: Signaling
Priority: -- → P2
Whiteboard: [measurement:client] → [measurement:client:tracking]
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3

(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Can you enumerate the standard questions and add standard scalars or
> histograms for them?
> (e.g. boolean scalar for "tcp available", boolean histogram for
> "success/failure with both sides having udp")
> Then you would have them show up automatically in e.g. the TMO dashboard
> without further work.

Looks like this fell through the cracks. The original intent for this telemetry was to answer the question "Why did our ICE success rate go down?". As such, we did not have a small number of things we wanted to monitor. We really did want every combination of capabilities (local and remote).

That said, I don't think anyone has looked at this telemetry in a really long time. I can't even figure out how to find this data anymore. We don't have telemetry for the overall ICE success rate, either.

Has anybody actually used this ICE candidate telemetry in the last year? We may just need to remove this.

Flags: needinfo?(na-g)
Flags: needinfo?(mfroman)
Flags: needinfo?(drno)
Flags: needinfo?(dminor)

I have not.

Flags: needinfo?(mfroman)

I have not.

Flags: needinfo?(na-g)

I'm not using it. I can take care of removing it.

Assignee: nobody → dminor
Flags: needinfo?(dminor)

Please let me know if I can be of any assistance in its removal.

This ICE candidate telemetry has not been used in a long time and in
addition requires special handling by the telemetry code. It is best
removed.

The ICE candidate telemetry recorded using this is no longer useful,
and so this code can be safely removed.

Depends on D50656

Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2536fbffa15
Remove ICE candidate telemetry; r=bwc
https://hg.mozilla.org/integration/autoland/rev/bbd49f460213
Remove WebrtcTelemetry and associated code; r=chutten
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72