Closed Bug 1112789 (opened 10 years ago, closed 8 years ago)

Add Hello direct call setup metrics

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: abr, Assigned: kparlante)

Attachments

(1 file)

Graphs for Call Failure States (10 years ago, Adam Roach [:abr], 66.05 KB, image/png)

Adam Roach [:abr]

Reporter

Description

10 years ago

Attached image: Graphs for Call Failure States

As a follow-on to Bug 1085252, we'd like to be able to extract a variety of information from the call progress logs, once this server deploys. An initial list of questions we'd like to be able to answer are:

1) When a call transitions to "terminated," what state was it in immediately before doing so? Ideally this would be a pie chart over some relatively small time-local window (e.g., one week), along with a time-series line graph that shows which state the failure occurred in, as a percentage of all failures. Roughly, something like the attached image. I'd like us to include "connected" and "terminated" in the list; even though these should be "0%" by definition, I'd like to include them as a double-check that the server isn't doing something it shouldn't.

2) For calls that transition to "terminated" from "alerting", what are the reason codes? Ideally, the same two kinds of graphs as for (1).

3) For calls that transition to "terminated" from "connecting" or "half-connected", what are the reason codes? Ideally, the same two kinds of graphs as for (1).

4) For calls that include an "accept" event, what percentage eventually enter the "connected" state? This would be a simple time series line graph.

5) For calls that include an "accept" event and fail, how long is it between the "accept" event and the "terminate" event? Ideally, this is a bucketed distribution graph, similar to what we currently have on the telemetry website; i.e., a bar graph of the number of calls that failed <1 second after "accept"; number of calls that failed 1-2 seconds after, etc. Due to the supervisory timers we run, I would expect this to top out at 10 seconds: having a single bucket to catch "> 10 seconds" is probably sufficient.

6) Same as (5), except for calls that include an "accept" event and eventually *succeed*.

Note that all of the preceding analyses should disregard "terminated" messages with a reason of "answered-elsewhere".
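For illustration, question (1) could be computed roughly as in the sketch below. The record shape (call ID, timestamp, state, reason), the sample data, and the reason strings other than "answered-elsewhere" are assumptions made up for this example; the actual call progress log format is not described in this bug. The state list contains only the states named above and may be incomplete.

```python
from collections import Counter, defaultdict

# Hypothetical record shape: (call_id, timestamp_seconds, state, reason).
# The real call progress log format is not given in this bug.
EVENTS = [
    ("a", 0.0, "alerting", None),
    ("a", 4.0, "terminated", "timeout"),
    ("b", 0.0, "alerting", None),
    ("b", 2.0, "connecting", None),
    ("b", 3.5, "terminated", "media-fail"),
    ("c", 0.0, "alerting", None),
    ("c", 1.0, "terminated", "answered-elsewhere"),
    ("d", 0.0, "alerting", None),
    ("d", 1.0, "connecting", None),
    ("d", 2.0, "connected", None),
]

# States named in the description; "connected" and "terminated" are kept
# as a sanity check and are expected to stay at 0%.
STATES = ["alerting", "connecting", "half-connected", "connected", "terminated"]


def group_by_call(events):
    """Group records by call ID and order each call's history by time."""
    calls = defaultdict(list)
    for call_id, ts, state, reason in events:
        calls[call_id].append((ts, state, reason))
    for history in calls.values():
        history.sort(key=lambda record: record[0])
    return calls


def state_before_termination(events):
    """Count, per call, the state seen immediately before "terminated"."""
    counts = Counter()
    for history in group_by_call(events).values():
        for prev, cur in zip(history, history[1:]):
            # Disregard "answered-elsewhere", as the description requests.
            if cur[1] == "terminated" and cur[2] != "answered-elsewhere":
                counts[prev[1]] += 1
                break
    return counts


def as_percentages(counts, states):
    """Express counts as a percentage of all counted failures."""
    total = sum(counts.values())
    return {s: (100.0 * counts[s] / total if total else 0.0) for s in states}


if __name__ == "__main__":
    print(as_percentages(state_before_termination(EVENTS), STATES))
```

Running the same function over a sliding one-week window of records would give the points for the time-series graph; the pie chart is a single window.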

Eric Rescorla (:ekr)

Comment 1

10 years ago

(In reply to Adam Roach [:abr] from comment #0)
> 5) For calls that include an "accept" event and fail, how long is it between
> the "accept" event and the "terminate" event? Ideally, this is a bucketed
> distribution graph, similar to what we currently have on the telemetry
> website; i.e., a bar graph of the number of calls that failed <1 second
> after "accept"; number of calls that failed 1-2 seconds after, etc. Due to
> the supervisory timers we run, I would expect this to top out at 10 seconds:
> having a single bucket to catch "> 10 seconds" is probably sufficient.

What about the reason code for this? And some way to filter this to the ICE logs.
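As a rough sketch of the bucketed distribution in question (5), broken down by reason code as asked here: the per-call summary tuples, the reason strings other than "answered-elsewhere", and the bucket labels are assumptions for illustration only, not the real log schema. Delays of exactly 10 seconds are placed in the overflow bucket, which is a boundary choice made for this example.

```python
from collections import Counter

# Hypothetical per-call summaries for calls that had an "accept" event and
# then failed: (accept_ts_seconds, terminate_ts_seconds, reason).
FAILED_CALLS = [
    (10.0, 10.4, "media-fail"),
    (20.0, 21.5, "media-fail"),
    (30.0, 42.0, "timeout"),
    (40.0, 41.0, "answered-elsewhere"),
]

MAX_SECONDS = 10  # Everything at or beyond this goes into one overflow bucket.


def bucket_label(delay):
    """Map a delay in seconds to a one-second-wide bucket label."""
    if delay >= MAX_SECONDS:
        return "10s+"
    low = int(delay)
    return f"{low}-{low + 1}s"


def delay_histogram(calls):
    """Count failed calls per (bucket, reason) pair."""
    histogram = Counter()
    for accept_ts, terminate_ts, reason in calls:
        if reason == "answered-elsewhere":
            continue  # Excluded from all analyses per the description.
        delay = terminate_ts - accept_ts
        if delay < 0:
            continue  # Skip records with inconsistent timestamps.
        histogram[(bucket_label(delay), reason)] += 1
    return histogram


if __name__ == "__main__":
    for (bucket, reason), count in sorted(delay_histogram(FAILED_CALLS).items()):
        print(bucket, reason, count)
```

Summing over the reason component gives the plain bar graph from question (5); keeping it allows a stacked bar per bucket.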

Wesley Dawson [:whd]

Comment 2

9 years ago

Moving to a more actively triaged component.

Component: Operations: Metrics/Monitoring → Metrics: Pipeline

Priority: -- → P5

Wesley Dawson [:whd]

Updated

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

BMO Automation

Updated

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P5)
