Closed Bug 1404535 Opened 2 years ago Closed 2 years ago

Add telemetry to time nr_ice_gather

Categories

(Core :: WebRTC: Networking, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

Attachments

(1 file)

While poking around I noticed that creating sockets on try was taking a very long time, and blocking the content STS thread (bug 1404409). It is possible that this is making nr_ice_gather (just the call, not the actual gathering process itself) take a very long time in the wild, so we should add some telemetry to see.
See Also: → 1404409
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review190248

To me the naming of the probes is confusing. Let's try to improve that.

::: media/mtransport/nricectx.cpp:978
(Diff revision 1)
> +
>  
>    if (!r) {
>      SetGatheringState(ICE_CTX_GATHER_COMPLETE);
> +    Telemetry::Accumulate(
> +        Telemetry::WEBRTC_ICE_START_GATHER_TIME,

I think the name is not ideal. This is the gathering time of the sockets on the host (excluding STUN and TURN server), right?
So maybe something like WEBRTC_ICE_SOCKET_GATHER_TIME...

Secondly this looks to me like the immediate success case. Here we switch GATHER_COMPLETE. The other one further below is the one which still waits for STUN/TURN responses.

::: toolkit/components/telemetry/Histograms.json:8781
(Diff revision 1)
>      "n_buckets": 20,
>      "description": "The length of time (in milliseconds) it took for the answerer to complete ICE, given that it failed. Does not count cases where StartChecks() was never called."
>    },
> +  "WEBRTC_ICE_START_GATHER_TIME": {
> +    "record_in_processes": ["main", "content"],
> +    "alert_emails": ["webrtc-ice-telemetry-alerts@mozilla.com"],

In my recent Telemetry thing I learned that apparently now a second, personal email address is needed :-/

::: toolkit/components/telemetry/Histograms.json:8787
(Diff revision 1)
> +    "bug_numbers": [1319268],
> +    "expires_in_version": "60",
> +    "kind": "exponential",
> +    "high": 10000,
> +    "n_buckets": 20,
> +    "description": "The length of time (in milliseconds) it took to call nr_ice_gather, given that gathering did not succeed/fail immediately during the call."

Given my concerns with the code above I think this description is also off.

::: toolkit/components/telemetry/Histograms.json:8789
(Diff revision 1)
> +    "kind": "exponential",
> +    "high": 10000,
> +    "n_buckets": 20,
> +    "description": "The length of time (in milliseconds) it took to call nr_ice_gather, given that gathering did not succeed/fail immediately during the call."
> +  },
> +  "WEBRTC_ICE_START_GATHER_TIME_IMMEDIATE_FAILURE": {

I'm wondering if a better solution then to define three different Historgrams here is to use a scalar. Maybe check with mjf what he thinks.
Attachment #8913887 - Flags: review?(drno) → review-
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review190248

> I think the name is not ideal. This is the gathering time of the sockets on the host (excluding STUN and TURN server), right?
> So maybe something like WEBRTC_ICE_SOCKET_GATHER_TIME...
> 
> Secondly this looks to me like the immediate success case. Here we switch GATHER_COMPLETE. The other one further below is the one which still waits for STUN/TURN responses.

I will rename to WEBRTC_ICE_NR_ICE_GATHER_TIME etc

I don't want to put "SOCKET" in here, because there's a lot more going on in that call.

I'm fixing the mixup too.

> I'm wondering if a better solution then to define three different Historgrams here is to use a scalar. Maybe check with mjf what he thinks.

I actually want to see the distribution here for all three of these, separately.
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review190794

LGTM
I think you still need a review from a data shepard though.
Attachment #8913887 - Flags: review?(drno) → review+
Rank: 15
Priority: -- → P2
Attachment #8913887 - Flags: review?(chutten)
I swear I received an email from Bugzilla with a review request for this :-\ But now I see :chutten's name :D
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191066

A couple of notes: Per Data Collection Policy[1] you need a data review from a Data Peer (or the Data Steward)

Also: We have Telemetry::AccumulateTimeDuration which you might find beneficial instead of doing the duration work yourself inline.

[1]: https://wiki.mozilla.org/Firefox/Data_Collection

::: toolkit/components/telemetry/Histograms.json:8780
(Diff revision 2)
>      "high": 10000,
>      "n_buckets": 20,
>      "description": "The length of time (in milliseconds) it took for the answerer to complete ICE, given that it failed. Does not count cases where StartChecks() was never called."
>    },
> +  "WEBRTC_ICE_NR_ICE_GATHER_TIME": {
> +    "record_in_processes": ["main", "content"],

WebRTC can happen in the parent ("main") process? If not (or if you only want these stats in the content process) you can skip it.

::: toolkit/components/telemetry/Histograms.json:8785
(Diff revision 2)
> +    "record_in_processes": ["main", "content"],
> +    "alert_emails": ["webrtc-ice-telemetry-alerts@mozilla.com", "bcampen@mozilla.com"],
> +    "bug_numbers": [1319268],
> +    "expires_in_version": "60",
> +    "kind": "exponential",
> +    "high": 10000,

10s seems really high. What sort of values do you see in the wild?
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191066

> WebRTC can happen in the parent ("main") process? If not (or if you only want these stats in the content process) you can skip it.

We still support running with e10s turned off. But now that you mention it, I don't think I would want non-e10s samples in this measurement.

> 10s seems really high. What sort of values do you see in the wild?

I have no idea what I will see in the wild, but I have seen several seconds on try.
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191066

AccumulateTimeDelta does make this cleaner. Thanks!
Attachment #8913887 - Flags: feedback?(rweiss)
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191066

> We still support running with e10s turned off. But now that you mention it, I don't think I would want non-e10s samples in this measurement.

The Firefox version you're shipping this in doesn't support !e10s :)
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191118

Telemetry usage looks good to me. Pending data-r?
Attachment #8913887 - Flags: review?(chutten) → review+
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review191066

> The Firefox version you're shipping this in doesn't support !e10s :)

Not officially, but e10s can still be disabled with about:config.
Attachment #8913887 - Flags: review?(liuche)
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review194202

data-review only.

r+ because this collects Type 1 data, and has detailed description of what and when this probe is collected.
Attachment #8913887 - Flags: review?(liuche) → review+
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5ede138b6e27
Add telemetry that times nr_ice_gather. r+drno r=chutten,drno,liuche
https://hg.mozilla.org/mozilla-central/rev/5ede138b6e27
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review196628

Seems like liuche got to this, will defer to that review.
Attachment #8913887 - Flags: review+
Comment on attachment 8913887 [details]
Bug 1404535: Add telemetry that times nr_ice_gather. r+drno

https://reviewboard.mozilla.org/r/185296/#review196630
Attachment #8913887 - Flags: feedback?(rweiss) → feedback+
You need to log in before you can comment on or make changes to this bug.