Closed Bug 1127646 Opened 5 years ago Closed 5 years ago

Report join latency and mean time between rebuffers in telemetry

Categories

(Core :: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 + wontfix
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: cpearce, Assigned: cajbir)

References

Details

Attachments

(1 file, 3 obsolete files)

In addition to the MSE telementry in bug 1119947, we should also report the rebuffer rate, and the mean time between rebuffers.

"rebuffer" here refers to the time spent buffering, i.e. time where the throbber shows up.
The primary metrics YouTube cares about are:

* Above the Fold Time (AFT): time between _page_ onload and video playback.
* Join Latency: time between _video_ load and video playback.
* Mean Time Between Rebuffering (MTBR): play time between rebuffering hiccups.

Can we add telemetry measuring AFT and join latency for HTML5 video? These measurements would probably be impractical for Flash video.
Blocks: 1127968
Hardware: x86_64 → All
Priority: -- → P1
Attached patch WIP Patch (obsolete) — Splinter Review
Some fixes on top of my patch in bug 1119947.

Also changes when we report the telemetry to be (hopefully) more reliable, and fixes an issue with the buffered-ranges based test wasn't working properly.
cajbir: Are you able to take this off me?
Assignee: cpearce → cajbir.bugzilla
Flags: needinfo?(cajbir.bugzilla)
Yep.
Flags: needinfo?(cajbir.bugzilla)
Tracking all MSE P1 bugs for Firefox 37.
Blocks: 1133988
Summary: Report rebuffer rate and mean time between rebuffers in telemetry → Report join latency and mean time between rebuffers in telemetry
Reports mean time between rebuffering and join latency. :cpearce for media changes, :bsmedberg for telemetry changes.
Attachment #8561735 - Attachment is obsolete: true
Attachment #8570167 - Flags: review?(cpearce)
Attachment #8570167 - Flags: review?(benjamin)
New try build with unused function removed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=51be4f4ed4f2
Remove unused function from last patch which broke try. Review requested from :cpearce for media code and :bsmedberg for telemetry code.
Attachment #8570167 - Attachment is obsolete: true
Attachment #8570167 - Flags: review?(cpearce)
Attachment #8570167 - Flags: review?(benjamin)
Attachment #8570189 - Flags: review?(cpearce)
Attachment #8570189 - Flags: review?(benjamin)
I don't quite understand the "mean" measurement here. Is there a reason we need to accumulate this "mtbf" separately instead of just having a histogram for total play time and a separate histogram for rebuffering length? That would let us calculate MTBR plus look at the distribution of rebuffering delays (many short ones or a few long ones).
Flags: needinfo?(cajbir.bugzilla)
We want to track mtbr and see whether it is increasing or decreasing over time. If we accumulate them value separately as you suggest is this still possible? How would one calculate the mtbr from the telemetry dashboard? I have no problem changing it if it's possible.
Flags: needinfo?(cajbir.bugzilla)
needinfo :bsmedberg for comment 11.
Flags: needinfo?(benjamin)
The telemetry.mozilla.org infra can't show you that because it's a relation between two separate datapoints. But with a histogram of pauses and a counter/histogram of total time, the realtime dashboard can graph that over time. File a dashboard bug and it'll be quick!
Flags: needinfo?(benjamin)
Comment on attachment 8570189 [details] [diff] [review]
Report join latency and MTBR stats

Remove review request until I address :bsmedberg's comments.
Attachment #8570189 - Flags: review?(cpearce)
Attachment #8570189 - Flags: review?(benjamin)
Priority: P1 → P2
Splits MTBR up into total play time and count of buffering as suggested by :bsmedberg. I'll raise the telemetry dashboard bug to enable reporting on these as a combined MBTR value shortly.

Review requested from :bsmedberg for telemetry code, :cpearce for the rest.
Attachment #8570189 - Attachment is obsolete: true
Attachment #8575766 - Flags: review?(cpearce)
Attachment #8575766 - Flags: review?(benjamin)
Raised bug 1141915 for the telemetry dashboard change mentioned in comment 13.
Comment on attachment 8575766 [details] [diff] [review]
Report join latency and MTBR stats

Review of attachment 8575766 [details] [diff] [review]:
-----------------------------------------------------------------

Do we want to report the mean rebuffering time too? If that number goes down over time, our users' experience gets better right?

::: dom/html/HTMLMediaElement.cpp
@@ +2541,5 @@
>    return rv;
>  }
>  
> +static double
> +Rate(const HTMLMediaElement::TimeDurationAccumulator& a,

Rate() is unused, can be removed.
Attachment #8575766 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #17)
> Do we want to report the mean rebuffering time too? If that number goes down
> over time, our users' experience gets better right?

YouTube measures MTBR, Mean Time *Between* Rebuffering, where bigger is better. This patch's mRebufferTime looks like it is measuring something different: the cumulative time the user spends watching a rebuffering spinner.

I don't know which metric would be a better indicator for user experience.
Comment on attachment 8575766 [details] [diff] [review]
Report join latency and MTBR stats

Is this VIDEO_MSE_BUFFERING_COUNT instead of a histogram VIDEO_MSE_BUFFERING_LENGTHS because a histogram of lengths is too complex to code? I'm ok with this version, but having an actual histogram of lengths would let us calculate the amount of time spent buffering as well.

Also ISTM you wouldn't need the TimeDurationAccumulator if you wrote it like that: you could save the last played time and accumulate the histogram frequently, rather than once at the end.

I'm going to mark r+ but I feel like this could be simpler...
Attachment #8575766 - Flags: review?(benjamin) → review+
(In reply to Chris Peterson [:cpeterson] from comment #18)
> YouTube measures MTBR, Mean Time *Between* Rebuffering, where bigger is
> better. This patch's mRebufferTime looks like it is measuring something
> different: the cumulative time the user spends watching a rebuffering
> spinner.

This patch reports the total play time (ie. time spent not watching the spinner) and the total number of times buffering occurred (ie. count of times the spinner happened). From this we can work out the average time spent consecutively playing before a rebuffer happens. Bigger is better. Is that not what MTBR is?
Flags: needinfo?(cpeterson)
Yes, that sounds correct. Reading the patch, I was confused that we were starting and stopping a timer called mRebufferTime. I didn't see that we are only reporting telemetry for mRebufferTime.Count().
Flags: needinfo?(cpeterson)
https://hg.mozilla.org/mozilla-central/rev/afb6a4080533
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
This Telemetry addition is a little more involved than most. I had tracked this for 37 as it was marked as an MSE P1. Do you think that this is required for 37 or can we take this in 38 or 39 instead?
Flags: needinfo?(giles)
Ralph is OOO. ni on cpearce instead.
Flags: needinfo?(giles) → needinfo?(cpearce)
I am a little preoccupied, cajbir can you please handle the uplift of this, if we need it?
Flags: needinfo?(cpearce) → needinfo?(cajbir.bugzilla)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #25)
> This Telemetry addition is a little more involved than most. I had tracked
> this for 37 as it was marked as an MSE P1. Do you think that this is
> required for 37 or can we take this in 38 or 39 instead?

It's not required for 37. We're waiting on a telemetry dashboard fix to view the data so it's not urgent.
Flags: needinfo?(cajbir.bugzilla)
Comment on attachment 8575766 [details] [diff] [review]
Report join latency and MTBR stats

I think we should get this into 38. It's now had a week to bake on nightly.

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Test data for feature evaluation becomes available more slowly.
[Describe test coverage new/current, TreeHerder]: Landed on m-c.
[Risks and why]: Change to add new telemetry is isolated and easy to disable. Risk looks low to me.
[String/UUID change made/needed]: None.
Attachment #8575766 - Flags: approval-mozilla-aurora?
Attachment #8575766 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.