Closed Bug 831998 Opened 7 years ago Closed 7 years ago

<video> canplaythrough (autoplay) will not fire in less than three seconds due to reliable rate logic

Categories: Core :: Audio/Video, defect

Tracking: RESOLVED FIXED, Target Milestone mozilla23

People: (Reporter: johns, Assigned: johns)

Attachments: (1 file)

I've noticed that video elements on various sites will not begin autoplaying for several seconds, whereas they begin nearly immediately in other browsers.

Some digging reveals that content/media/MediaDecoder.cpp uses the 'aReliable' out param of GetRate() to estimate its reliable decode rate. However, GetRate's reliable parameter is hardcoded to |seconds >= 3.0| here:

http://dxr.mozilla.org/mozilla-central/content/media/MediaResource.h.html#l91

With similar logic for GetRateAtLastStop()

Without a reliable decode rate estimate, we wait until we have ten seconds of buffered video before CanPlayThrough() is true, meaning videos on a page will essentially wait three seconds or for a significant amount of data to be downloaded before playing.
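
Roughly, as a standalone sketch (not the actual MediaDecoder code -- the names and the ten-second constant here are just taken from the description above), the current behavior amounts to:

  // Hypothetical simplification of the canplaythrough decision described
  // above; not the real MediaDecoder::CanPlayThrough() implementation.
  struct DownloadEstimate {
    double bytesPerSecond;  // estimated download rate from GetRate()
    bool   rateIsReliable;  // GetRate()'s aReliable out-param
  };

  static bool CanPlayThrough(const DownloadEstimate& aEstimate,
                             double aPlaybackBytesPerSecond,
                             double aBufferedSeconds)
  {
    // If the rate estimate is trusted, compare it against the bitrate.
    if (aEstimate.rateIsReliable) {
      return aEstimate.bytesPerSecond >= aPlaybackBytesPerSecond;
    }
    // Otherwise fall back to requiring ~10 seconds of buffered media,
    // which is what produces the multi-second wait described above.
    return aBufferedSeconds >= 10.0;
  }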

A hack to remove the reliable check does indeed cause autoplay videos to begin playing instantly on a good connection.

Example:
http://people.mozilla.com/~jschoenick/autoplay.htm

FF: canplaythrough occurs at ~1s on a *LAN* connection to the server, ~3s from a home PC
With the reliable check removed entirely: 70-115ms
Chrome: 150-300ms
Note that a hard refresh does not seem to clear the video from cache, so you need to close the tab and manually flush cache to see the issue.
The 3 seconds probably comes from tuning the playback experience on the media team's wonderful NZ internet connections at the time :)
Looks like it was bumped from 1s to 3s in bug 492420 to handle issues with live streams.  Since we've got more buffering information available now via GetBuffered, it's likely we can improve that, but I don't think we want to simply remove the delay in GetRate becoming reliable.

There's some history of other implementations firing canplaythrough far too optimistically.  For example, see bug 627153 comment 6 and http://crbug.com/73609.
(In reply to Matthew Gregan [:kinetik] from comment #3)
> There's some history of other implementations firing canplaythrough far too
> optimistically.  For example, see bug 627153 comment 6 and
> http://crbug.com/73609.

In this case though, isn't being too conservative worse? A 3000ms delay for all videos for a user that could stream them all immediately, vs. a potential miscalculation and buffering 5-10s into a video for users on slower connections. The latter assumes the first ~500ms of data arrives at a higher rate than they can sustain -- and connections that allow a "burst" for the first few seconds would also fool our current heuristic with the 3s delay.
we can do better, right?  please?
I think slow start is the primary challenge here. Every TCP connection has to ramp up its sending rate, so early bandwidth samples are unreliable. During slow start, bandwidth is really driven by how fast CWND can grow, and that is limited by RTT. The good news is that after enough data you can be reasonably confident your connection is only going to get faster, and that seems like the relevant property for "reliable" here, right?

We know that the median CWND for long connections, thanks to spdy data, is about 30 packets. If we take the conservative assumption of CWND starting at 3 (aka IW3) and draw out the exponential growth we get a window of 30 after a little more than 4 round trips (i.e. 3 + 6 + 12 + 24 + the-first-15pct-of-48)..

That's a total of 57 packets. Handwave a little more and say each packet is 1460 bytes, and you're looking at about 84KB of data. After that point the stream can be assumed to be moving pretty fast and generally staying the same or getting faster; the problem of over-estimating the speed of early data is largely eliminated, because more data has been transferred than is left sitting unused in a round trip for the kinds of connections we see.
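
A back-of-the-envelope check of that arithmetic (assumptions: IW=3, the window roughly doubles every round trip, 1460-byte segments):

  // Cumulative data moved by slow start, round trip by round trip.
  #include <cstdio>

  int main() {
    const int kSegmentBytes = 1460;
    int cwnd = 3;           // conservative initial window (IW3)
    int totalSegments = 0;

    for (int rtt = 1; rtt <= 5; ++rtt) {
      totalSegments += cwnd;
      std::printf("after round trip %d: cwnd=%d, %d segments, ~%d KB\n",
                  rtt, cwnd, totalSegments,
                  totalSegments * kSegmentBytes / 1024);
      cwnd *= 2;   // slow start roughly doubles the window each round trip
    }
    // Prints ~45 segments (~64 KB) after 4 full round trips and 93 after 5;
    // the 57-segment (~84 KB) threshold lands a little past round trip 4.
    return 0;
  }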

It's really pretty conservative. For streams that will run CWND > 30 this just means we offer a bandwidth sample that will grow over time (delaying the canplaythrough event unnecessarily), and for streams that run CWND < 30 it takes extra round trips to reach this threshold, again delaying the event unnecessarily.

I could imagine using a number as low as 21 (3 round trips) on the more aggressive side. But I think it's important to have at least 4 round trips involved: your first two don't carry much data and are dominated by wait time, so it's important to water down that sample.

I don't want to oversell it - it's a heuristic. But then again, so is "3 seconds". :)

As an anecdote: if I change the *aReliable algorithm from (seconds >= 3) to (seconds >= 3 || data >= 84KB), then on my bog-standard cable service I get a canplaythrough event at ~1500ms instead of ~3000ms, and it plays just fine.

This doesn't deal with something like "timewarner turbo boost", though in general both the allowed burst and the sustained rate of things like that seem to be above what is necessary to do the streaming, so that's really the opposite of the problem I would be worried about here.

The other way you could look at it is to say that you need to wait 4+ round trips before the rate is reliable, and do the time calculation based on the RTT of the handshake (i.e. min(3 seconds, 4*RTT)). That has a theoretical niceness to it, but a] it's hard to trust one RTT sample as the number you should be scaling on, and b] it's even less robust to IW10 instead of IW3 (IW is a server setting - we can't really ever know it for certain).
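
Purely to illustrate the shape of that alternative (hypothetical, nothing like this exists in the tree):

  #include <algorithm>

  // Trust the rate estimate after min(3 seconds, 4 round trips), using the
  // handshake RTT as the (single, shaky) RTT sample discussed above.
  static bool IsRateReliable(double aElapsedSeconds, double aHandshakeRttSeconds)
  {
    const double threshold = std::min(3.0, 4.0 * aHandshakeRttSeconds);
    return aElapsedSeconds >= threshold;
  }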

fwiw

diff --git a/content/media/MediaResource.h b/content/media/MediaResource.h
--- a/content/media/MediaResource.h
+++ b/content/media/MediaResource.h
@@ -21,16 +21,19 @@
 
 // For HTTP seeking, if number of bytes needing to be
 // seeked forward is less than this value then a read is
 // done rather than a byte range request.
 static const int64_t SEEK_VS_READ_THRESHOLD = 32*1024;
 
 static const uint32_t HTTP_REQUESTED_RANGE_NOT_SATISFIABLE_CODE = 416;
 
+// 57 Segments at IW=3 allows slow start to reach a CWND of 30.
+static const int64_t RELIABLE_DATA_THRESHOLD = (57 * 1460);
+
 namespace mozilla {
 
 class MediaDecoder;
 
 /**
  * This class is useful for estimating rates of data passing through
  * some channel. The idea is that activity on the channel "starts"
  * and "stops" over time. At certain times data passes through the
@@ -78,28 +81,31 @@ public:
       // ignore this data, it may be related to seeking or some other
       // operation we don't care about
       return;
     }
     mAccumulatedBytes += aBytes;
   }
   double GetRateAtLastStop(bool* aReliable) {
     double seconds = mAccumulatedTime.ToSeconds();
-    *aReliable = seconds >= 1.0;
+    *aReliable = (seconds >= 1.0) ||
+        (mAccumulatedBytes >= RELIABLE_DATA_THRESHOLD);
+
     if (seconds <= 0.0)
       return 0.0;
     return static_cast<double>(mAccumulatedBytes)/seconds;
   }
   double GetRate(bool* aReliable) {
     TimeDuration time = mAccumulatedTime;
     if (mIsStarted) {
       time += TimeStamp::Now() - mLastStartTime;
     }
     double seconds = time.ToSeconds();
-    *aReliable = seconds >= 3.0;
+    *aReliable = (seconds >= 3.0) ||
+        (mAccumulatedBytes >= RELIABLE_DATA_THRESHOLD);
     if (seconds <= 0.0)
       return 0.0;
     return static_cast<double>(mAccumulatedBytes)/seconds;
   }
 private:
   int64_t      mAccumulatedBytes;
   TimeDuration mAccumulatedTime;
   TimeStamp    mLastStartTime;
Maybe I'm misunderstanding this, but is slow start a factor in this event? We fire canplaythrough when we estimate that our download rate is "reliably" faster than the bitrate of the video (or, failing that, when we have some number of seconds buffered). It follows that waiting for the rate to become "reliable" is only useful in cases of overestimation. However, most network bursting happens over the course of seconds, not a few RTTs, so short of waiting a very long time, we're not going to be able to detect things like ISP bursting without profiling previous connections.

What exactly is this check accomplishing for the media code?
(In reply to John Schoenick [:johns] from comment #7)
> What exactly is this check accomplishing for the media code?

We're waiting until we believe the download rate is reliable/consistent enough before we use that data to decide whether or not to fire canplaythrough with a reasonable chance of not being wrong. When we fire the canplaythrough event we start playback if a media element has the "autoplay" attribute, and we don't want to start playback if the cached data + download rate is insufficient to play through the resource without a stall.

Chrome just fires "canplaythrough" as soon as it's determined that the video is playable... So people keep complaining that Firefox is slow compared to Chrome. I'm happy to take reasoned advice (like Patrick has given, thanks!) on how we can reduce the 3 second reliability check.
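
Roughly, the question being answered is whether the rest of the resource will arrive before playback needs it. As a simplified standalone sketch (made-up field names, not the real MediaDecoder members):

  #include <cstdint>

  struct ResourceState {
    int64_t totalBytes;       // total size of the media resource
    int64_t downloadedBytes;  // bytes already cached locally
    double  downloadRate;     // estimated bytes/second (only useful if reliable)
    double  durationSeconds;  // media duration
    double  currentTime;      // current playback position in seconds
  };

  static bool WillPlayThroughWithoutStall(const ResourceState& s)
  {
    const double bytesLeft   = double(s.totalBytes - s.downloadedBytes);
    const double secondsLeft = s.durationSeconds - s.currentTime;
    if (s.downloadRate <= 0.0) {
      return bytesLeft <= 0.0;
    }
    // No stall expected if the remaining bytes can be fetched in less time
    // than it takes to play the remaining media.
    return bytesLeft / s.downloadRate <= secondsLeft;
  }
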
(In reply to Chris Pearce (:cpearce) from comment #8)
> We're waiting until we believe the download rate is reliable/consistent
> enough before we use that data to decide whether or not to fire
> canplaythrough with a reasonable chance of not being wrong.

Yes, but "being wrong" in this case means we think our data right is *high* enough, when it is not, correct? In which case the check is supposedly guarding against spuriously high estimates of throughput, not spuriously low ones that would come from bad interaction with TCP ramp-up. So what situation would we have a value for data rate that was higher than reality, that waiting for 3000ms solves?
(In reply to John Schoenick [:johns] from comment #9)
> Yes, but "being wrong" in this case means we think our data right

s/right/rate/
(In reply to John Schoenick [:johns] from comment #9)
> So what
> situation would we have a value for data rate that was higher than reality,
> that waiting for 3000ms solves?

IIRC we observed some connections sending a big burst of data at the start and then slowing down. I know some web servers do this when rate limiting (lighttpd for one).

Basically we know very little about networking, but we're under the impression that the download rate is variable at the start of a download, and 3 seconds was a number that Roc pulled out of the air. ;)

Are you arguing that we should remove the 3 second stand-down altogether and just use the 84KB guard that Patrick proposed? We're happy to try things here.
(In reply to Chris Pearce (:cpearce) from comment #11)
> IIRC we observed some connections sending a big burst of data at the start
> and then slow down. I know some web servers do this when rate limiting
> (lighttpd for one).

This is a legitimate problem -- many ISPs, for instance, provide higher burst rates at the beginning of a connection, like the "timewarner turbo boost" Patrick mentions. This usually lasts for a period of multiple seconds, though, and delaying all media streams for a long time seems like a poor solution. We would need more support in the networking code to profile connections -- e.g. track the highest sustained data rate we see in connections that are past the 5s mark, and once we've profiled N such channels, assume that data rates higher than that before the 5s mark are bursty and unsustainable.
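
Entirely hypothetical, but something along these lines is what that profiling idea might look like:

  #include <algorithm>
  #include <cstddef>

  // Remember the best sustained rate seen after the 5s mark on past
  // connections; once enough channels have been profiled, treat earlier,
  // higher rates as likely burst artifacts.
  class BurstProfiler {
  public:
    void RecordSustainedRate(double aBytesPerSecond) {  // call for connections past 5s
      mBestSustainedRate = std::max(mBestSustainedRate, aBytesPerSecond);
      ++mSamples;
    }
    bool LooksLikeBurst(double aEarlyRate) const {
      const size_t kMinSamples = 5;  // the "N channels" above, value arbitrary
      return mSamples >= kMinSamples && aEarlyRate > mBestSustainedRate;
    }
  private:
    double mBestSustainedRate = 0.0;
    size_t mSamples = 0;
  };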

> Are you arguing that we should remove the 3 second stand-down altogether and
> just use the 84KB guard that Patrick proposed? We're happy to try things
> here.

Yes - in fact, I'm not sure if the 84KB guard is useful either. Patrick's comment is a good basis for determining if a TCP session has passed through its ramp-up window, and may be useful to other reliable-rate consumers, but for the purposes of this event, it seems fairly unlikely that we'll get a high rate inside the ramp-up window and then a lower rate afterwards.

Perhaps a better check would be to replace the reliable-rate check with 500ms of buffered video? In that case, at least, we're not firing the event almost immediately upon getting a few packets, which would not be a good time to estimate the rate for low-bitrate videos. For higher-bitrate videos, it's unlikely that we'll have 500ms buffered but still have too few packets for a rate estimate to be sensible.
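
As a sketch, that gate would be something as trivial as (the 500ms figure is just the number above -- hypothetical, not an existing check):

  // Only consult the rate estimate once some minimum amount of playable
  // media is buffered.
  static bool RateEstimateUsable(double aBufferedSeconds)
  {
    const double kMinBufferedSeconds = 0.5;
    return aBufferedSeconds >= kMinBufferedSeconds;
  }
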
(In reply to John Schoenick [:johns] from comment #9)
> (In reply to Chris Pearce (:cpearce) from comment #8)
> > We're waiting until we believe the download rate is reliable/consistent
> > enough before we use that data to decide whether or not to fire
> > canplaythrough with a reasonable chance of not being wrong.
> 
> Yes, but "being wrong" in this case means we think our data right is *high*
> enough, when it is not, correct? In which case the check is supposedly
> guarding against spuriously high estimates of throughput, not spuriously low
> ones that would come from bad interaction with TCP ramp-up. 

low estimates often yes, but you can get spuriously high ones too (though it is less likely).

beware of crude ascii art!

x=data .=no-data

   xxx............xxxxxx
t= 012345678901234567890

If you take a bandwidth sample at t=2 you get a rate much too high compared to what TCP will really deliver (soon); if you take it at t=14 you get a much lower rate. The number of "." samples depends on the latency of the connection - it's really hard to know reliably.
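
Putting made-up numbers on the picture (say each 'x' is one 1460-byte packet and each column is 100ms):

  #include <cstdio>

  int main() {
    const double kPacketBytes = 1460.0;
    const double kUnitSeconds = 0.1;
    // Sample at t=2: three packets in 0.3s -- a very high estimate.
    std::printf("rate at t=2:  %.0f bytes/s\n",
                3 * kPacketBytes / (3 * kUnitSeconds));   // 14600
    // Sample at t=14: still three packets, but 1.5s elapsed -- a very low one.
    std::printf("rate at t=14: %.0f bytes/s\n",
                3 * kPacketBytes / (15 * kUnitSeconds));  //  2920
    return 0;
  }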

The idea of basing things on enough data to establish a 30-packet window (57 packets total to get there) is that, from our telemetry, that will fill up many people's pipes enough to make those "." a distinct minority, and thus the estimate more reliable.

I don't have an answer for the "turbo boost" scenario other than to suspect it isn't relevant.. but I'm pretty sure that a policy of "wait 3 seconds" isn't any more robust in that regard.

But I agree it's definitely something that has to be experimented with.
Comment on attachment 745419 [details] [diff] [review]
Make the media canplaythrough estimation less conservative

Okay, looking at this more, we already have a spot where we check for buffered data in addition to the reliable rate:
http://dxr.mozilla.org/mozilla-central/content/media/MediaDecoder.cpp#l1615

This seems way too conservative to me on top of the rate estimation - our download speed is faster than the bitrate *and* we have 10s of data buffered -- being wrong is bad, but waiting around much longer than necessary is also bad.

I think then we should try Patrick's approach, combined with lowering this limit. This patch takes his suggestion of (84KB data || time > 3s), and then lowers the buffered-data requirement to 1 second. For bitrates above ~660kbps we would necessarily have more than 84KB buffered if 1s of video were available, and we'll be a little more conservative for lower-bitrate videos where our calculation might be less robust per comment 13.
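
For the record, the crossover arithmetic (using the 57 * 1460 constant from the patch):

  #include <cstdio>

  int main() {
    // Bitrate at which 1 second of video is exactly the byte threshold.
    const double kThresholdBytes = 57 * 1460;            // ~84 KB
    std::printf("crossover: ~%.0f kbps\n", kThresholdBytes * 8 / 1000.0);  // ~666
    return 0;
  }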

I pushed this to try to play with here:
https://tbpl.mozilla.org/?tree=Try&rev=f2423c11e6c5
Attachment #745419 - Flags: review?(cpearce)
Attachment #745419 - Flags: feedback?(mcmanus)
Attachment #745419 - Flags: feedback?(mcmanus) → feedback+
(In reply to Patrick McManus [:mcmanus] from comment #13)
> but I agree its definitely something that has to be experimented with

AFAIK, Youtube does burst-on-connect followed by tightly rate-limited transfers. So this should be easy to experiment with.
Comment on attachment 745419 [details] [diff] [review]
Make the media canplaythrough estimation less conservative

Review of attachment 745419 [details] [diff] [review]:
-----------------------------------------------------------------

Great! I built and tested your patch and it performs well. Thanks!

::: content/media/MediaResource.h
@@ +27,5 @@
>  static const uint32_t HTTP_REQUESTED_RANGE_NOT_SATISFIABLE_CODE = 416;
>  
> +// Number of bytes we have accumulated before we assume the connection download
> +// rate can be reliably calculated.
> +// 57 Segments at IW=3 allows slow start to reach a CWND of 30.

Can you add a "See bug 831998" to this comment here? The last line is just gobbledygook to me, so I'd like to make it easier to find the more verbose analysis in this bug.
Attachment #745419 - Flags: review?(cpearce) → review+
Comment on attachment 745419 [details] [diff] [review]
Make the media canplaythrough estimation less conservative

https://hg.mozilla.org/integration/mozilla-inbound/rev/6380415c496d
Attachment #745419 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/6380415c496d
Assignee: nobody → jschoenick
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23