Open Bug 1151652 Opened 9 years ago Updated 2 years ago

We need more/better telemetry for ICE failure diagnosis in the field

Categories

(Core :: WebRTC: Signaling, defect, P3)

Tracking

Tracking Status
firefox40 --- affected

People

(Reporter: bwc, Unassigned)

References

(Blocks 2 open bugs)

Details

I've been tossing around the idea of creating a "trouble code" enumeration in telemetry. Examples: missing necessary candidates (e.g., we have a nominated srflx/relay candidate pair on one stream, but another stream has no remote relay candidates and has failed), inconsistent ICE stream successes (one stream succeeded, others failed), stalled ICE (one stream succeeded, others still frozen), etc. These should be split into problems that are probably our fault and problems that are probably not.
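As a rough illustration of the cross-stream checks described above, here is a minimal Python sketch. The names `StreamState` and `cross_stream_symptoms` and the symptom strings are hypothetical, made up for this example; they are not taken from the Firefox ICE stack, and a real implementation would work from the actual stream objects.

```python
from enum import Enum


class StreamState(Enum):
    # Hypothetical per-stream outcomes; not the real ICE stack's states.
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    FROZEN = "frozen"


def cross_stream_symptoms(states):
    """Return the set of symptom names implied by a list of stream states.

    A symptom is only reported when at least one stream succeeded,
    because the interesting cases are the mixed ones.
    """
    symptoms = set()
    if StreamState.SUCCEEDED in states:
        if StreamState.FAILED in states:
            # One stream succeeded, others failed.
            symptoms.add("inconsistent_stream_success")
        if StreamState.FROZEN in states:
            # One stream succeeded, others still frozen.
            symptoms.add("stalled_ice")
    return symptoms
```

For example, `cross_stream_symptoms([StreamState.SUCCEEDED, StreamState.FROZEN])` yields only the stall symptom, while a list where every stream failed yields nothing, since that case looks like an ordinary connectivity failure.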
Blocks: 1151587
After some more thought, it seems that bug 1151647 captures pretty much everything in the "not our fault" bucket. Trickle failure (missing remote candidates) and unresponsive or unconfigured STUN/TURN servers (missing local candidates) cover nearly all the clear-cut cases. It is certainly possible that the other side's ICE stack is busted, but that is basically impossible to distinguish from a NAT eating packets. So trouble codes should probably be reserved for local errors. Here's a strawman list:

ICE gathering failure
No local host candidates
ICE start failure
ICE stall: frozen streams
ICE stall: frozen components
ICE stall: frozen candidates
Excessive thread queuing delay
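One way the strawman list could map onto an enumerated telemetry value is sketched below in Python. The `IceTroubleCode` name, the numeric values, and the `tally` helper are all assumptions for illustration; the bug does not specify bucket numbers or a histogram name.

```python
from enum import IntEnum, unique


@unique
class IceTroubleCode(IntEnum):
    # Numeric values are hypothetical bucket indices, in list order.
    GATHERING_FAILURE = 0
    NO_LOCAL_HOST_CANDIDATES = 1
    START_FAILURE = 2
    STALL_FROZEN_STREAMS = 3
    STALL_FROZEN_COMPONENTS = 4
    STALL_FROZEN_CANDIDATES = 5
    EXCESSIVE_THREAD_QUEUING_DELAY = 6


def tally(codes):
    """Count reports per trouble code, one bucket per enum member."""
    buckets = [0] * len(IceTroubleCode)
    for code in codes:
        # IceTroubleCode(code) raises ValueError for unknown values.
        buckets[IceTroubleCode(code)] += 1
    return buckets
```

Keeping the values dense and stable matters for this kind of counter: reordering or reusing a number would silently change the meaning of data already collected.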
backlog: --- → webRTC+
Rank: 25
Priority: -- → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Depends on: webrtc-telemetry
No longer depends on: webrtc-telemetry
Severity: normal → S3