Closed Bug 1342523 Opened 3 years ago Closed 3 years ago

We need better ICE duration telemetry

Categories

(Core :: WebRTC: Networking, defect, P2)

54 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

Attachments

(3 files)

First, we need to separate out the telemetry for ICE duration into offerer and answerer cases (since they're going to be very different). Second, we need to be more careful to record the right statistics when ICE restarts happen. Third, we need to record stats when the PeerConnection is torn down before ICE completes (since this can happen if a user closes/refreshes a tab because it is taking too long).

We can take advantage of the fact that Telemetry::Accumulate is threadsafe now.
Comment on attachment 8841068 [details]
Bug 1342523 - Part 2: Add some better ICE duration telemetry.

https://reviewboard.mozilla.org/r/115418/#review117244

::: media/mtransport/nricectx.cpp:1070
(Diff revision 3)
> +              Telemetry::WEBRTC_ICE_ANSWERER_SUCCESS_TIME,
> +              time_delta.ToMilliseconds());
> +        }
> +        break;
> +      case ICE_CTX_FAILED:
> +      case ICE_CTX_DISCONNECTED:

I'm not convinced that disconnected should be included in the failure category. Because this state should only be reached after we have reached connected before. And Disconnected is only a intermediary state.
Assuming this should only hit when going checking -> disconnected I would be more interested in having another category for it.
Attachment #8841068 - Flags: review?(drno) → review+
Comment on attachment 8841067 [details]
Bug 1342523 - Part 1: Clean up how controlling/offerer is specified.

https://reviewboard.mozilla.org/r/115416/#review117216

::: media/mtransport/nricectxhandler.cpp
(Diff revision 3)
>  RefPtr<NrIceCtx>
>  NrIceCtxHandler::CreateCtx(const std::string& ufrag,
>                             const std::string& pwd) const
>  {
>    RefPtr<NrIceCtx> new_ctx = new NrIceCtx(this->current_ctx->name(),
> -                                          true, // offerer (hardcoded per bwc)

How do we preserve the offerer state through ICE restarts now?
Attachment #8841067 - Flags: review?(drno) → review+
Comment on attachment 8841069 [details]
Bug 1342523 - Part 3: Remove old Telemetry probes.

https://reviewboard.mozilla.org/r/115420/#review117252

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
(Diff revision 3)
> -    // mIceStartTime can be null if going directly from New to Closed, in which
> -    // case we don't count it as a success or a failure.
> -    if (!mIceStartTime.IsNull()){
> -      TimeDuration timeDelta = TimeStamp::Now() - mIceStartTime;
> -      if (isSucceeded(domState)) {
> -        Telemetry::Accumulate(Telemetry::WEBRTC_ICE_SUCCESS_TIME,

Should these also be removed from the Histogram.js file?
Maybe a question for bsmedberg.
Attachment #8841069 - Flags: review?(drno) → review+
Comment on attachment 8841068 [details]
Bug 1342523 - Part 2: Add some better ICE duration telemetry.

https://reviewboard.mozilla.org/r/115418/#review117428

data-r=me (I did not review code)
Attachment #8841068 - Flags: review?(benjamin) → review+
Rank: 25
Priority: -- → P2
Comment on attachment 8841067 [details]
Bug 1342523 - Part 1: Clean up how controlling/offerer is specified.

https://reviewboard.mozilla.org/r/115416/#review117216

> How do we preserve the offerer state through ICE restarts now?

NrIceCtx only uses |offerer| for telemetry now. For ICE restarts, it makes sense to have it change with each renegotiation. Also, I don't think we ever set this to anything other than true, except in the unit-tests, so even if we should be remembering something like this for restarts, we aren't doing it right now.
Comment on attachment 8841068 [details]
Bug 1342523 - Part 2: Add some better ICE duration telemetry.

https://reviewboard.mozilla.org/r/115418/#review117244

> I'm not convinced that disconnected should be included in the failure category. Because this state should only be reached after we have reached connected before. And Disconnected is only a intermediary state.
> Assuming this should only hit when going checking -> disconnected I would be more interested in having another category for it.

I _think_ this is a "should not happen". I would have to look to see what happens if the connection drops after some (but not all) of the streams have succeeded. I can have it assert and break here.
(In reply to Nils Ohlmeier [:drno] from comment #12)
> Comment on attachment 8841069 [details]
> Bug 1342523 - Part 3: Remove old Telemetry probes.
> 
> https://reviewboard.mozilla.org/r/115420/#review117252
> 
> ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
> (Diff revision 3)
> > -    // mIceStartTime can be null if going directly from New to Closed, in which
> > -    // case we don't count it as a success or a failure.
> > -    if (!mIceStartTime.IsNull()){
> > -      TimeDuration timeDelta = TimeStamp::Now() - mIceStartTime;
> > -      if (isSucceeded(domState)) {
> > -        Telemetry::Accumulate(Telemetry::WEBRTC_ICE_SUCCESS_TIME,
> 
> Should these also be removed from the Histogram.js file?
> Maybe a question for bsmedberg.

Do we need to wait for these histograms to expire before removing them from Histogram.js?
Flags: needinfo?(benjamin)
Once things are removed from Histograms.json they will no longer show up on telemetry.mozilla.org. If that's ok with you, you can remove them whenever you want.
Flags: needinfo?(benjamin)
Ok, let's hold onto it for now. That data is still somewhat useful.
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5074cc34731d
Part 1: Clean up how controlling/offerer is specified. r=drno
https://hg.mozilla.org/integration/autoland/rev/d99df97e4115
Part 2: Add some better ICE duration telemetry. r=bsmedberg,drno
https://hg.mozilla.org/integration/autoland/rev/82450ee6343f
Part 3: Remove old Telemetry probes. r=drno
Depends on: 1345791
You need to log in before you can comment on or make changes to this bug.