Add basic telemetry for ICE

RESOLVED FIXED in mozilla31

Status

()

Core
WebRTC: Networking
P2
normal
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: abr, Assigned: bwc)

Tracking

(Blocks: 1 bug)

unspecified
mozilla31
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [ft:webrtc, p=5][s=fx32])

Attachments

(2 attachments, 15 obsolete attachments)

5.32 KB, patch
jib
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
18.70 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

4 years ago
We need to add telemetry for the type and number of candidates gathered, the candidate type actualy used for a for call, and the RTT measured for connectivity checks.
(Assignee)

Comment 1

4 years ago
RTT should probably be broken down by candidate type. Also, we should definitely be measuring how long ICE takes overall.
(Assignee)

Updated

4 years ago
Blocks: 970426
Summary: Add telemetry for ICE → Add basic telemetry for ICE
(Reporter)

Comment 2

4 years ago
This also need to indicate connection success; e.g., whether any packets actually ended up flowing.
(Assignee)

Comment 3

4 years ago
I think what I will do first here is a couple of tables (one for all calls, one for only ICE failures) that break down every combination of the following:

Is there a local TURN UDP candidate?
Local TURN TCP candidate?
Remote TURN UDP candidate?
Remote TURN TCP candidate?

This should give us a handle of whether our biggest problem is lack of TURN servers, and an idea of how much TURN TCP is helping us.
(Assignee)

Updated

4 years ago
QA Contact: docfaraday
(Assignee)

Updated

4 years ago
Assignee: nobody → docfaraday
QA Contact: docfaraday
(Assignee)

Updated

4 years ago
Depends on: 958221
(Assignee)

Comment 4

4 years ago
Created attachment 8381790 [details] [diff] [review]
Part 1: Add transport field to ICE candidate stats.

We need to be able to distinguish between TURN UDP and TURN TCP to record the stats we want.
(Assignee)

Comment 5

4 years ago
Created attachment 8381815 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Initial cut. When PCs close, we record what types of relayed candidates we gathered as a bitpattern, and whether the remote end used any relayed candidates. Additionally, we maintain a separate histogram for cases where ICE failed, with the same data.
(Assignee)

Updated

4 years ago
Attachment #8381790 - Flags: review?(jib)
(Assignee)

Updated

4 years ago
Attachment #8381815 - Flags: review?(ekr)
Comment on attachment 8381790 [details] [diff] [review]
Part 1: Add transport field to ICE candidate stats.

Review of attachment 8381790 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/webidl/RTCStatsReport.webidl
@@ +108,5 @@
>    DOMString componentId;
>    DOMString candidateId;
>    DOMString ipAddress;
> +  DOMString transport;
> +  DOMString localTransport; // needs standardization

Help me out, when is transport and localTransport not the same? Relay? Why is localTransport useful info?

mozLocalTransport I suppose until we standardize or get a good word.

::: toolkit/content/aboutWebrtc.xhtml
@@ +30,5 @@
>  
> +function candidateTypeString(cand) {
> +  if (cand.type == "localcandidate") {
> +    if (cand.candidateType == "relayed") {
> +      return cand.candidateType += '-' + cand.localTransport;

+= modifies the cand input argument which is surprising. I think you mean +
(Assignee)

Comment 8

4 years ago
(In reply to Jan-Ivar Bruaroey [:jib] from comment #7)
> Comment on attachment 8381790 [details] [diff] [review]
> Part 1: Add transport field to ICE candidate stats.
> 
> Review of attachment 8381790 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/webidl/RTCStatsReport.webidl
> @@ +108,5 @@
> >    DOMString componentId;
> >    DOMString candidateId;
> >    DOMString ipAddress;
> > +  DOMString transport;
> > +  DOMString localTransport; // needs standardization
> 
> Help me out, when is transport and localTransport not the same? Relay? Why
> is localTransport useful info?
>

    Yeah, relay (specifically, TURN TCP). |transport| is what we tell the remote end, and will pretty much always be udp (until ICE TCP lands, that is), whereas |localTransport| could be either udp or tcp (or tls once we have TURN TLS).
 
> mozLocalTransport I suppose until we standardize or get a good word.
>

   I've sent a message to public-webrtc and Harald, and discussed a little bit, we'll see what he does.

> ::: toolkit/content/aboutWebrtc.xhtml
> @@ +30,5 @@
> >  
> > +function candidateTypeString(cand) {
> > +  if (cand.type == "localcandidate") {
> > +    if (cand.candidateType == "relayed") {
> > +      return cand.candidateType += '-' + cand.localTransport;
> 
> += modifies the cand input argument which is surprising. I think you mean +

    Whoops! Copy/paste bug...
(Assignee)

Comment 9

4 years ago
Created attachment 8382441 [details] [diff] [review]
Part 1: Add transport field to ICE candidate stats.

Incorporate feedback.
(Assignee)

Updated

4 years ago
Attachment #8381790 - Attachment is obsolete: true
Attachment #8381790 - Flags: review?(jib)
(Assignee)

Comment 10

4 years ago
Created attachment 8382442 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Unrot.
(Assignee)

Updated

4 years ago
Attachment #8381815 - Attachment is obsolete: true
Attachment #8381815 - Flags: review?(ekr)
(Assignee)

Updated

4 years ago
Attachment #8382441 - Flags: review?(jib)
(Assignee)

Updated

4 years ago
Attachment #8382442 - Flags: review?(ekr)
(Assignee)

Comment 11

4 years ago
Created attachment 8382540 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Add histograms the record the final ICE connection and gathering state, and fix a bug where we'd try to update telemetry twice in some cases and hit an assert.
(Assignee)

Updated

4 years ago
Attachment #8382442 - Attachment is obsolete: true
Attachment #8382442 - Flags: review?(ekr)
(Assignee)

Comment 13

4 years ago
Created attachment 8382655 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Add histograms for trickle arrival time.
(Assignee)

Updated

4 years ago
Attachment #8382540 - Attachment is obsolete: true
Attachment #8382441 - Flags: review?(jib) → review+
(Assignee)

Comment 14

4 years ago
Created attachment 8383341 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Record each stream separately in the histograms dealing with TURN usage, since it seems common enough for different streams to get different kinds of TURN candidates (particularly when TURN servers are slow).
(Assignee)

Updated

4 years ago
Attachment #8382655 - Attachment is obsolete: true
(Assignee)

Comment 15

4 years ago
Created attachment 8383420 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Add histograms for ICE completion time, and use a separate timestamp for anything being measured from the start of ICE.
(Assignee)

Updated

4 years ago
Attachment #8383341 - Attachment is obsolete: true
(Assignee)

Comment 17

4 years ago
Created attachment 8384956 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Fix a build error on B2G ICS Emulator.
(Assignee)

Updated

4 years ago
Attachment #8383420 - Attachment is obsolete: true
(Assignee)

Comment 19

4 years ago
Comment on attachment 8384956 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8384956 [details] [diff] [review]:
-----------------------------------------------------------------

Requesting feedback on whether this is a good start on telemetry. This covers:

* What kinds of relayed candidates were used on each ICE (UDP/TCP/TLS/HTTPS for local, and yes/no on remote, since the remote doesn't tell us what type it is using)
* Same as above, given that ICE failed
* How long ICE took in ms
* How long ICE took in ms, given that ICE failed
* Arrival times for trickle candidates in ms (each trickled candidate recorded separately)
* Same as above, given that ICE failed
* Final ICE connection state (should let us detect cases where a call was torn down while ICE was still in checking, for example)
* Final ICE gathering state (I've never seen this in anything other than gathering; I suspect this is not useful. Am I missing something?)

Together this should give us enough information to see what proportion of failed ICE had properties such as:
- lack of relayed candidates (either remote/local, and by type for local)
- very late remote trickled candidates

These should help us make an informed decision on how much priority to put on things like TURN TLS and TURN HTTPS, and whether we need to do more to cope with very late trickle.
Attachment #8384956 - Flags: feedback?(ekr)
Attachment #8384956 - Flags: feedback?(adam)
(Assignee)

Updated

4 years ago
Status: NEW → ASSIGNED
(Assignee)

Updated

4 years ago
Keywords: leave-open
(Assignee)

Updated

4 years ago
Attachment #8382441 - Flags: checkin?
Comment on attachment 8382441 [details] [diff] [review]
Part 1: Add transport field to ICE candidate stats.

https://hg.mozilla.org/integration/mozilla-inbound/rev/1e3670440894
Attachment #8382441 - Flags: checkin? → checkin+
Martin -- Can I move the request for feedback on Byron's second patch (adding basic telemetry for ICE) from Adam to you?  Adam is swamped, and I need to unblock Byron.  Adam is about half-way through providing feedback;  he's happy to provide his notes.
Flags: needinfo?(martin.thomson)
Comment on attachment 8384956 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8384956 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +1873,5 @@
> +    (mIceConnectionState != PCImplIceConnectionState::Checking &&
> +     mIceConnectionState != PCImplIceConnectionState::New);
> +
> +  bool failed = false;
> +  bool succeeded = false;

I'm looking at the logic here and this seems a little convoluted.

How about ditching the tri-state here and change the check below to:

if (mIceConnectionState != New||Checking||Closed) {
  if (mIceConnectionState != Failed||Disconnected) {
   success peg
  } else {
   failure peg
  }
}

@@ +1912,5 @@
>        break;
>    }
>  
> +  if (!wasComplete && (failed || succeeded)) {
> +#ifdef MOZILLA_INTERNAL_API

Might be best to move this outside the if statement.

@@ +2449,5 @@
> +      Telemetry::Accumulate(Telemetry::WEBRTC_TURN_USAGE_GIVEN_FAILURE,
> +                            i->second.turnUsageBitpattern);
> +    }
> +  }
> +}

Ever heard of cyclomatic complexity?

There's enough code here to warrant a new class.  PeerConnectionImpl is already big enough, and this seems separate enough.

::: toolkit/components/telemetry/Histograms.json
@@ +4951,5 @@
> +  "WEBRTC_ICE_FINAL_CONNECTION_STATE": {
> +    "expires_in_version": "never",
> +    "kind": "enumerated",
> +    "n_values": 7,
> +    "description": "The ICE connection state when the PC was closed"

Any sense in filtering out New from this?
(Assignee)

Comment 24

4 years ago
Comment on attachment 8384956 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8384956 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +1873,5 @@
> +    (mIceConnectionState != PCImplIceConnectionState::Checking &&
> +     mIceConnectionState != PCImplIceConnectionState::New);
> +
> +  bool failed = false;
> +  bool succeeded = false;

That'll work.

@@ +1912,5 @@
>        break;
>    }
>  
> +  if (!wasComplete && (failed || succeeded)) {
> +#ifdef MOZILLA_INTERNAL_API

Note sure why I didn't do that in the first place.

@@ +2449,5 @@
> +      Telemetry::Accumulate(Telemetry::WEBRTC_TURN_USAGE_GIVEN_FAILURE,
> +                            i->second.turnUsageBitpattern);
> +    }
> +  }
> +}

Sure, I can break this out to a separate file, but that's not gonna do anything to the overall complexity. I'm ok with breaking the function into more easily digestible pieces too.

::: toolkit/components/telemetry/Histograms.json
@@ +4951,5 @@
> +  "WEBRTC_ICE_FINAL_CONNECTION_STATE": {
> +    "expires_in_version": "never",
> +    "kind": "enumerated",
> +    "n_values": 7,
> +    "description": "The ICE connection state when the PC was closed"

Not sure whether we are ever in New long enough to transition from there to Closed. If we wanted to filter it out, it is a teensy bit more code, and would require sanity-checking for this case and aborting, and I would be curious as to when it happened in the field. So just recording in telemetry seems the best approach to me.
The following tri-state will be useful:  turn server supplied, only a stun server supplied, neither stun nor turn supplied.  This might be more useful than most of the above.

Maybe collecting in your bitmap whether a reflexive address was gathered (and provided) would help.

I don't think that we can do much more than this.  In fact, this might be overkill (e.g., final gathering state; isn't this supposed to transition to completed at some point?  Is that a bug?)
(In reply to Byron Campen [:bwc] from comment #24)
> Not sure whether we are ever in New long enough to transition from there to
> Closed. If we wanted to filter it out, it is a teensy bit more code, and
> would require sanity-checking for this case and aborting, and I would be
> curious as to when it happened in the field. So just recording in telemetry
> seems the best approach to me.

The reason I suggest this is for cases where RTCPeerConnection is created, then closed without being used.  Those cases might skew the stats for failure.
Flags: needinfo?(martin.thomson)
(Assignee)

Comment 27

4 years ago
(In reply to Martin Thomson [:mt] from comment #25)
> The following tri-state will be useful:  turn server supplied, only a stun
> server supplied, neither stun nor turn supplied.  This might be more useful
> than most of the above.
> 

   Are we anticipating seeing any services without a stun server? I should note that merely having a TURN server configured means very little; if the thing is unresponsive the stats will be misleading. A better indicator is if we gathered any related candidates. Now, having both would allow us to detect cases where a TURN server was supplied, but no candidates were gathered for some reason.

> Maybe collecting in your bitmap whether a reflexive address was gathered
> (and provided) would help.

   I suppose. If we do that for both ends, the bitpattern gets quite large.

> 
> I don't think that we can do much more than this.  In fact, this might be
> overkill (e.g., final gathering state; isn't this supposed to transition to
> completed at some point?  Is that a bug?)

   Yeah, I'm not 100% convinced we need the final gathering state. All it would let us do is detect cases where the PC was closed before gathering completed, and even then it might be better represented as a "Did gathering complete?" boolean.
(In reply to Byron Campen [:bwc] from comment #27)
> (In reply to Martin Thomson [:mt] from comment #25)
> > The following tri-state will be useful:  turn server supplied, only a stun
> > server supplied, neither stun nor turn supplied.  This might be more useful
> > than most of the above.
> > 
> 
>    Are we anticipating seeing any services without a stun server? I should
> note that merely having a TURN server configured means very little; if the
> thing is unresponsive the stats will be misleading. A better indicator is if
> we gathered any related candidates. Now, having both would allow us to
> detect cases where a TURN server was supplied, but no candidates were
> gathered for some reason.

I want to be able to (statistically at least) separate the cases where we get failures because we get failures, and where the failure is due to the application not doing the right thing.
(Reporter)

Comment 29

4 years ago
Comment on attachment 8384956 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8384956 [details] [diff] [review]:
-----------------------------------------------------------------

Publishing my initial notes. I looked through about half the patch a while back, but haven't had time to circle back around to it to finish my review. Overally, this seemed to be going in roughly the right direction based on what I looked at.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +1920,5 @@
> +    }
> +
> +    if (failed) {
> +      Telemetry::Accumulate(Telemetry::WEBRTC_ICE_FAILURE_TIME,
> +                            timeDelta.ToMilliseconds());

Do we really want to peg this if mIceStartTime.IsNull()? That would seem to imply that we somehow made it into a failed state before we actually started checking. The spec FSM doesn't allow this to happen, so it must be a logic error on our part.

@@ +1923,5 @@
> +      Telemetry::Accumulate(Telemetry::WEBRTC_ICE_FAILURE_TIME,
> +                            timeDelta.ToMilliseconds());
> +    } else {
> +      Telemetry::Accumulate(Telemetry::WEBRTC_ICE_COMPLETION_TIME,
> +                            timeDelta.ToMilliseconds());

This is even more questionable, as any transition to a success state prior to passing through checking is impossible to explain.

@@ +1926,5 @@
> +      Telemetry::Accumulate(Telemetry::WEBRTC_ICE_COMPLETION_TIME,
> +                            timeDelta.ToMilliseconds());
> +    }
> +#endif
> +  }

So, basically, I would structure this more as:

> #ifdef MOZILLA_INTERNAL_API
> if (!wasComplete && (failed || succeeded) && !mIceStartTime.IsNull()) {
>   TimeDuration timeDelta(TimeStamp::Now() - mIceStartTime);
>   if (failed) {
> ...

@@ +2367,5 @@
> +    LOCAL_GATHERED_TURN_TCP = 1 << 1,
> +    LOCAL_GATHERED_TURN_TLS = 1 << 2,
> +    LOCAL_GATHERED_TURN_HTTPS = 1 << 3,
> +    // TODO(bcampen@mozilla.com): Reserve more space for even more exotic
> +    // variants of TURN?

Why don't you just put remote first so you don't need to worry about that?
Attachment #8384956 - Flags: feedback?(adam)
(Assignee)

Comment 30

4 years ago
Created attachment 8411119 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Unrot
(Assignee)

Updated

4 years ago
Attachment #8384956 - Attachment is obsolete: true
Attachment #8384956 - Flags: feedback?(ekr)
(Assignee)

Comment 31

4 years ago
Created attachment 8411316 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Incorporate some feedback by moving the Telemetry stats dip into WebrtcGlobalInformation. While I'm at it, use this stats dip to also record stats reports for closed PeerConnections so they can be retrieved later for debugging purposes.
(Assignee)

Updated

4 years ago
Attachment #8411119 - Attachment is obsolete: true
(Assignee)

Comment 32

4 years ago
Created attachment 8411356 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Add server reflexive to the candidate type bitpattern.
(Assignee)

Updated

4 years ago
Attachment #8411316 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8411356 - Flags: review?(martin.thomson)
Comment on attachment 8411356 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8411356 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Not sure about the asymmetry regarding the recording of success/failure vs. unconditional/failure here, but I think that you're basically OK.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +1358,5 @@
> +#ifdef MOZILLA_INTERNAL_API
> +  if(!mIceStartTime.IsNull()) {
> +    TimeDuration timeDelta = TimeStamp::Now() - mIceStartTime;
> +    Telemetry::Accumulate(Telemetry::WEBRTC_ICE_TRICKLE_ARRIVAL_TIME,
> +                          timeDelta.ToMilliseconds());

This isn't congruent with the later code which success/failure, whereas this is unconditional/failure.

::: media/webrtc/signaling/src/peerconnection/WebrtcGlobalInformation.cpp
@@ +220,5 @@
>  }
>  
> +struct StreamResult {
> +  StreamResult() : canididateTypeBitpattern(0), streamSucceeded(false) {}
> +  uint8_t canididateTypeBitpattern;

misspelling here

@@ +275,5 @@
> +      streamResults[streamId].streamSucceeded = true;
> +    } else {
> +      // Make sure we have an entry, but don't modify it.
> +      streamResults[streamId];
> +    }

That's a lot of code, why not?
bool succeeded = pair.mState.Value() == RTC...::Succeeded;
streamResults[streamId].streamSucceeded = succeeded;

@@ +294,5 @@
> +    }
> +
> +    // Note: this is not a "component" in the ICE definition, this is really a
> +    // stream ID. This is just the way the stats API is standardized right now
> +    // Very confusing.

Do we need to change this?  I know that it's going to be hard to get anyone to give a shit about non-rtcp-mux cases, but if the confusion is a problem it's better to get it fixed.

@@ +325,5 @@
> +  }
> +
> +  for (auto i = streamResults.begin(); i != streamResults.end(); ++i) {
> +    Telemetry::Accumulate(Telemetry::WEBRTC_CANDIDATE_TYPES,
> +                          i->second.canididateTypeBitpattern);

Same comment as above about congruence.
Attachment #8411356 - Flags: review?(martin.thomson) → review+
(Assignee)

Comment 34

4 years ago
(In reply to Martin Thomson [:mt] from comment #33)
> Comment on attachment 8411356 [details] [diff] [review]
> Part 2: Add basic telemetry for ICE.
> 
> Review of attachment 8411356 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good.  Not sure about the asymmetry regarding the recording of
> success/failure vs. unconditional/failure here, but I think that you're
> basically OK.
> 
> ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
> @@ +1358,5 @@
> > +#ifdef MOZILLA_INTERNAL_API
> > +  if(!mIceStartTime.IsNull()) {
> > +    TimeDuration timeDelta = TimeStamp::Now() - mIceStartTime;
> > +    Telemetry::Accumulate(Telemetry::WEBRTC_ICE_TRICKLE_ARRIVAL_TIME,
> > +                          timeDelta.ToMilliseconds());
> 
> This isn't congruent with the later code which success/failure, whereas this
> is unconditional/failure.

   That's because ICE isn't complete yet at this stage, most of the time.

> @@ +275,5 @@
> > +      streamResults[streamId].streamSucceeded = true;
> > +    } else {
> > +      // Make sure we have an entry, but don't modify it.
> > +      streamResults[streamId];
> > +    }
> 
> That's a lot of code, why not?
> bool succeeded = pair.mState.Value() == RTC...::Succeeded;
> streamResults[streamId].streamSucceeded = succeeded;
> 

Because a single failed pair would cause the stream to be marked failed, which is not what we want. If any pair succeeds, we want this to be true.

> @@ +294,5 @@
> > +    }
> > +
> > +    // Note: this is not a "component" in the ICE definition, this is really a
> > +    // stream ID. This is just the way the stats API is standardized right now
> > +    // Very confusing.
> 
> Do we need to change this?  I know that it's going to be hard to get anyone
> to give a shit about non-rtcp-mux cases, but if the confusion is a problem
> it's better to get it fixed.
> 

   I think we need to scrub the spec for misuse of this ICE-specific terminology, yes.

> @@ +325,5 @@
> > +  }
> > +
> > +  for (auto i = streamResults.begin(); i != streamResults.end(); ++i) {
> > +    Telemetry::Accumulate(Telemetry::WEBRTC_CANDIDATE_TYPES,
> > +                          i->second.canididateTypeBitpattern);
> 
> Same comment as above about congruence.

   I'll change this one.
(Assignee)

Comment 35

4 years ago
Created attachment 8411407 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Fix nits.
(Assignee)

Updated

4 years ago
Attachment #8411356 - Attachment is obsolete: true
(In reply to Byron Campen [:bwc] from comment #34)
> (In reply to Martin Thomson [:mt] from comment #33)
> > This isn't congruent with the later code which success/failure, whereas this
> > is unconditional/failure.
> 
>    That's because ICE isn't complete yet at this stage, most of the time.

The only reason I suggest this is that it's easier to consistently add "succeeded" (or at least non-failed) telemetry to "failed" telemetry.  That's all.

> > That's a lot of code, why not?
> > bool succeeded = pair.mState.Value() == RTC...::Succeeded;
> > streamResults[streamId].streamSucceeded = succeeded;
> 
> Because a single failed pair would cause the stream to be marked failed,
> which is not what we want. If any pair succeeds, we want this to be true.

streamSucceeded isn't a tri-state; it's initialized to false.
(Assignee)

Comment 37

4 years ago
(In reply to Martin Thomson [:mt] from comment #36)
> (In reply to Byron Campen [:bwc] from comment #34)
> > (In reply to Martin Thomson [:mt] from comment #33)
> > > This isn't congruent with the later code which success/failure, whereas this
> > > is unconditional/failure.
> > 
> >    That's because ICE isn't complete yet at this stage, most of the time.
> 
> The only reason I suggest this is that it's easier to consistently add
> "succeeded" (or at least non-failed) telemetry to "failed" telemetry. 
> That's all.
> 

   We could have "we haven't failed, yet" telemetry in this case I guess (call it ON_TIME or something, since that constitutes a candidate that we might be able to do something with, even if we've already succeeded).


> > > That's a lot of code, why not?
> > > bool succeeded = pair.mState.Value() == RTC...::Succeeded;
> > > streamResults[streamId].streamSucceeded = succeeded;
> > 
> > Because a single failed pair would cause the stream to be marked failed,
> > which is not what we want. If any pair succeeds, we want this to be true.
> 
> streamSucceeded isn't a tri-state; it's initialized to false.

   True, but a transition from true -> false is not allowed; it can only go from false -> true. It could be written as streamSucceeded =| succeeded;, but I feel like the current code reads easier.
(In reply to Byron Campen [:bwc] from comment #37)
>    We could have "we haven't failed, yet" telemetry in this case I guess
> (call it ON_TIME or something, since that constitutes a candidate that we
> might be able to do something with, even if we've already succeeded).

That would be good.  Then we have two buckets: (potentially) useful candidates, and (definitely) useless candidates.
 
>    True, but a transition from true -> false is not allowed; it can only go
> from false -> true. It could be written as streamSucceeded =| succeeded;,
> but I feel like the current code reads easier.

Oh, I hadn't properly made that connection.  Why do we consider this to have succeeded if *either* component succeeds?  I can see arguments for either one, depending on who is consuming it.  For telemetry purposes, it's pretty interesting if anything can succeed - because that implies that it's possible.

For the stats API, I can't see why that way around would be interesting at all.  After all, RTP without RTCP == bad and RTCP without RTP == completely busted.
BTW, I think that I might have made the connection if you had used |=.
(Assignee)

Comment 40

4 years ago
(In reply to Martin Thomson [:mt] from comment #38)
> (In reply to Byron Campen [:bwc] from comment #37)
> >    True, but a transition from true -> false is not allowed; it can only go
> > from false -> true. It could be written as streamSucceeded =| succeeded;,
> > but I feel like the current code reads easier.
> 
> Oh, I hadn't properly made that connection.  Why do we consider this to have
> succeeded if *either* component succeeds?  I can see arguments for either
> one, depending on who is consuming it.  For telemetry purposes, it's pretty
> interesting if anything can succeed - because that implies that it's
> possible.

   We consider the entire stream to have succeeded when any pair succeeds because the component id isn't recorded in the stats anywhere (there's another comment about this problem further up). We'd need to add this info to the stats dip. I can tackle that in a subsequent patch. I suspect that once we break things up by component, we'll have additional telemetry we'd want to record (cases where one component succeeds while another fails are particularly interesting).

> 
> For the stats API, I can't see why that way around would be interesting at
> all.  After all, RTP without RTCP == bad and RTCP without RTP == completely
> busted.

   Yeah, ICE fails if any component fails, regardless of the duties of said component.
(Assignee)

Comment 41

4 years ago
Created attachment 8411438 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.Bug 970690 - Part 2: Add basic telemetry for ICE.

More nit fixes
(Assignee)

Updated

4 years ago
Attachment #8411407 - Attachment is obsolete: true
(Assignee)

Comment 42

4 years ago
Created attachment 8411453 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.Bug 970690 - Part 2: Add basic telemetry for ICE.

Remove the stash we were keeping of final ICE stats reports, since the leak checker in mochitest was unhappy with it. Need to find a good workaround.
(Assignee)

Updated

4 years ago
Attachment #8411438 - Attachment is obsolete: true
(Assignee)

Comment 43

4 years ago
Created attachment 8412267 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.Bug 970690 - Part 2: Add basic telemetry for ICE.

Add a comment
(Assignee)

Updated

4 years ago
Attachment #8411453 - Attachment is obsolete: true

Updated

4 years ago
Priority: -- → P2
Whiteboard: [ft:webrtc, p=5]
Target Milestone: --- → mozilla32
(Assignee)

Comment 44

4 years ago
Created attachment 8412735 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Fix commit message
(Assignee)

Updated

4 years ago
Attachment #8412267 - Attachment is obsolete: true
(Assignee)

Comment 45

4 years ago
Comment on attachment 8412735 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8412735 [details] [diff] [review]:
-----------------------------------------------------------------

There's enough interdiff here since the last review that I want to re-ask.

https://bugzilla.mozilla.org/attachment.cgi?oldid=8411356&action=interdiff&newid=8412735&headers=1
Attachment #8412735 - Flags: review?(martin.thomson)
Comment on attachment 8412735 [details] [diff] [review]
Part 2: Add basic telemetry for ICE.

Review of attachment 8412735 [details] [diff] [review]:
-----------------------------------------------------------------

It's good.  Thanks for the interdiff.

I only just noticed, but yay for new circular dependencies!
Attachment #8412735 - Flags: review?(martin.thomson) → review+
(Assignee)

Comment 47

4 years ago
Try run looks ok apart from unrelated intermittents, but re-running the failed cases anyway. Needinfo myself to check back.

https://tbpl.mozilla.org/?tree=Try&rev=0c7b3af0bb09
Flags: needinfo?(docfaraday)
Keywords: leave-open
(Assignee)

Updated

4 years ago
Flags: needinfo?(docfaraday)
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ee5752cdf5b
Keywords: checkin-needed
Target Milestone: mozilla32 → mozilla31
https://hg.mozilla.org/mozilla-central/rev/6ee5752cdf5b
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Bug 1001959 appears to be a regression from this landing - Byron, please look ASAP.
Blocks: 1001959
Flags: needinfo?(docfaraday)
(Assignee)

Comment 51

4 years ago
Left a comment there; the upshot is that this flaw was already present with the stats dispatch, we're just much more exposed to races given the timing in this case. I'll give it a look.
Flags: needinfo?(docfaraday)
Blocks: 1002236

Updated

4 years ago
Whiteboard: [ft:webrtc, p=5] → [ft:webrtc, p=5][s=fx32]
You need to log in before you can comment on or make changes to this bug.