Closed Bug 479863 Opened 15 years ago Closed 15 years ago

Implement 'autobuffer' and stop buffering everything by default

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: roc, Assigned: roc)

References

Details

(Keywords: dev-doc-complete, verified1.9.1)

Attachments

(4 files, 3 obsolete files)

http://www.whatwg.org/specs/web-apps/current-work/#attr-media-autobuffer

We should implement 'autobuffer', which means by default we should stop reading ahead once we reach HAVE_CURRENT_DATA.
Flags: blocking1.9.1+
Assignee: nobody → kinetik
Spec currently says to suspend the network fetch once HAVE_METADATA is reached unless autobuffer or autoplay is set.  Robert has requested clarification: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-February/018653.html

As it makes more sense to suspend after HAVE_CURRENT_DATA (i.e. once we have a frame of video data to display), that's what I'll implement.  In the event that this turns out to be incorrect, it'll be trivial to update the patch.
Attached patch patch v0 (not for review) (obsolete) — Splinter Review
First pass--pretty simple.  Also fixes a bug where progress events where fired when no data was coming in (spec says least frequent of ~350ms or every byte received).

Not for review because there's a bug where streams are not handing us data after being suspended when doing a new load of the same URI.  For example, load a video at tinyvid.tv and wait until the first frame is displayed and "suspend" is fired.  Then hit reload.  The page reloads, but the video never does because we don't receive data and thus never make it beyond "loadstart".  Setting a new URI using v1.src = "some-video-never-loaded.ogg"; v1.load(); works once, then breaks the same way.

Also, there's probably no point suspending the stream if it's a local file.  Or if the data is coming from Necko's cache.  On the other hand, it doesn't hurt, because the data is available quickly when the stream is resumed in these cases.
Calling mDecoder->Resume() in a bunch of places before we call mDecoder->Shutdown() fixes (or, at least, works around) that bug.

I seem to remember a related issue where loading the same file twice seemed to be sharing some Necko resource, e.g. loading a page at tinyvid and loading the video file directly resulted in the second request waiting on the first to complete.  If this still happens, it's obviously going to be a problem if the same file is loaded twice, once with autobuffer and once without.
> Also fixes a bug where progress events where fired
> when no data was coming in (spec says least frequent
> of ~350ms or every byte received).

Is that bug 476813?
> If this still happens, it's obviously going to be a problem if the
> same file is loaded twice, once with autobuffer and once without.

Just tried this now. It does still happen. Even opening the same video on two tabs triggers it.
(In reply to comment #4)
> Is that bug 476813?

Yep, thanks.  I'll make sure the logic in my fix is correct and post it as a separate patch to that bug after I've had some sleep.
One thing Robert mentioned in another bug (can't recall which one) was what happens when the video is suspended for a long time? Does the HTTP connection timeout and get restarted gracefully when Resume is called?
That'd be bug 479711 comment 14.  It'll be important to have a good answer for that question before landing this patch.
I don't think we can rely on a graceful restart on Resume.
I couldn't see any code to deal with graceful restarts in nsHttpChannel, but I didn't look that hard.  I left a media stream suspended overnight and it didn't restart afterwards.  Unfortunately, when the stream times out, we think the resource is completely loaded because of bug 478109.

Opening streams with LOAD_BYPASS_CACHE avoids the shared-stream-blocking problem mentioned in comment 3 and comment 5.  I assume that Necko is servicing the second request from cache, but has to wait for the first request to complete before it services the second (not sure why it can't begin as soon as data is available).  This won't work well for suspended (or infinite) streams, because the first request will never complete.  Thinking about the possible cases:

- load, uncached: load as normal (so the data is cached).

- load, in cache but not complete (only possible if another load in-flight): either use LOAD_BYPASS_CACHE for the second load, or resume the first load if it's suspended.  Resuming doesn't solve the infinite stream case, though.

- load, in cache and complete: service from cache, don't bother with suspend/resume.
Turns out to be easy, just pass nsICachingChannel::LOAD_BYPASS_LOCAL_CACHE_IF_BUSY.  Awesome.

Need to move the resume-on-shutdown down into the stream strategies, otherwise we can try to resume a stream that was never suspended (due to the channel being reopened).  If we don't resume when shutting down, a whole lot of stuff is leaked.

Also need to deal with the resume case where the server has timed us out.  It turns out that this is required for the mochitests to pass--resuming a connection (even almost immediately) with the test server doesn't seem to work, we never receive more data.

I wonder if it'd be easier to avoid suspend/resume completely, and cancel the channel where the current patch suspends it.  It seems like it'd be a lot easier to deal with resuming if we just open a new channel when the user wants to begin playback.
Spec has been changed to stop reading at HAVE_CURRENT DATA: http://html5.org/tools/web-apps-tracker?from=2873&to=2874
Blocks: 464272
I think this is one of the few video/audio bugs that we should actually block the release on. It's an author-visible behaviour change that we really really need to make it palatable to have many media elements on a page.
Attached patch fix (obsolete) — Splinter Review
Here's an implementation of autobuffer along with the change in behaviour to not download (much) past the first frame by default. This is fairly straightforward, the key thing is consider suspending the download in nsHTMLMediaElement::FirstFrameLoaded and remember that we did, then in various places we need to call StopSuspendingAfterFirstFrame to resume downloading if necessary.

This patch also fixes dynamic setting of 'autoplay' so that setting autoplay to true will actually trigger playing if we're already in state HAVE_ENOUGH_DATA.

The patch does not try to stop downloading if we've passed FirstFrameLoaded and autobuffer and autoplay are dynamically removed. That would require tracking a bit more state and doesn't seem worth it, at least not right now.

This patch doesn't address the problem of servers dropping the connection when we've suspended too long. I will work on that in a followup patch.
Assignee: kinetik → roc
Attachment #364074 - Attachment is obsolete: true
Attachment #377055 - Flags: review?(chris.double)
(In reply to comment #15)
> This patch also fixes dynamic setting of 'autoplay' so that setting autoplay to
> true will actually trigger playing if we're already in state HAVE_ENOUGH_DATA.

This can potentially regress bug 478982. (haven't tried the patch, just throwing this up)...
Summary: Implement 'autobuffer' → Implement 'autobuffer' and stop buffering everything by default
(In reply to comment #16)
> (In reply to comment #15)
> > This patch also fixes dynamic setting of 'autoplay' so that setting autoplay
> > to true will actually trigger playing if we're already in state
> > HAVE_ENOUGH_DATA.
> 
> This can potentially regress bug 478982. (haven't tried the patch, just
> throwing this up)...

No, this only makes dynamic 'autoplay' setting match what we would have done if 'autoplay' had been there from the start.
Attached patch fix v2Splinter Review
The previous patch had a bug where if you initiate a seek before FirstFrameLoaded fires, we still suspend the download. This version fixes that by adding mAllowSuspendAfterFirstFrame to track whether something that should trigger buffering has happened.
Attachment #377055 - Attachment is obsolete: true
Attachment #377084 - Flags: review?(chris.double)
Attachment #377055 - Flags: review?(chris.double)
We *probably* get little benefit from HTTP persistent connections for media streams because a) they're usually large, so reusing a connection for multiple complete media loads is of little benefit and b) if we're seeking then we always cancel an existing channel which requires closing the existing connection. So this patch sets the Connection: close header which means we can avoid limits on the number of persistent connections (although we still hit limits on the total number of connections).

The one case this might hurt is finding the duration of an Ogg file, where we seek to the end of the stream. Right now we might be reading to the end of the stream and be able to reuse the TCP connection for the seek back to the beginning. This patch might cost us a TCP setup round-trip in that case. I'll have to look into it.
The autobuffer patch makes us suspend downloads a lot, which creates two problems:
1) Connections which are suspended for a long time might be closed by the server, and currently we have no way to detect that and reopen the connection.
2) If there are many media elements on the page then we can hit the connection limit and nothing more from that domain will load!

This patch addresses those problems in two ways. First we add a parameter to nsMediaStream::Suspend: aCloseImmediately. When this is false, we suspend the channel as now. When it's true, we actually close the channel and then on Resume we'll reopen the channel. We have to use a flag mIgnoreClose to avoid notifying the media cache and decoder that the stream has ended. When we suspend the stream because we've loaded the first frame and we don't want to buffer any more data, we pass true for aCloseImmediately. This addresses problem #2.

The other thing this patch does is set a flag mReopenOnError when we Resume a suspended channel. Then, if the stream closes with an error, we'll ignore the error and try to reopen it. This should handle problem #1.

It's still possible to use up all the connections to a server by playing many videos at once and filling up the media cache so the cache suspends many connections --- it still passes false for aCloseImmediately. Probably we should add a timeout so that after a channel has been suspended for some time interval, we go ahead and close it. We're less likely to hit this case so that's future work.

To make this work I have to track precisely the offset of the nsMediaChannelStream, so I've renamed mLastSeekOffset to mOffset and updated it in OnDataAvailable.

The media cache often resumes a stream and then does a seek on it. To avoid creating a new stream in the resume and then instantly killing it in the seek, I've extended CacheClientSeek to support seeking and resuming in one action, so we can avoid unnecessary Resume work.

Note that this patch makes us use aCloseImmediately when the media element is frozen for bfcache. I think that's a good thing!
Attachment #377090 - Flags: review?(chris.double)
It occurs to me that it might be a good idea for FirstFrameLoaded to have a small delay, say one second, before it suspends download. That would allow some extra data to arrive, making it more likely that if the user presses play we'll be able to reopen the connection and start receiving data before the buffered data runs out.
Correct me if I'm wrong, but keeping connections alive (but "suspended") will causes scaling issues on the server side.

I'm looking to deploy <video> on the DownThemAll! landing page. The initial page will likely contain three small tutorial videos.
Having 2M+ users and 300k+ downloads a week this this will generate a lot of hits especially whenever a major version is released.
Keeping 3 connections for three videos alive for each user, although the user is highly unlikely to watch all videos simultaneously if at all will consume that many open connections that the server will soon refuse to accept new ones.

Now think about sites like wikipedia that have lots of users, make use of <video> and try to offer more and more videos... When a lot of Firefox users keeping articles containing videos open for long periods of time they might inadvertently DDoS wikipedia when either the tcp stack or the the httpd has it's queue filled up because of all those active connections.
(In reply to comment #21)
> It occurs to me that it might be a good idea for FirstFrameLoaded to have a
> small delay, say one second, before it suspends download. That would allow some
> extra data to arrive, making it more likely that if the user presses play we'll
> be able to reopen the connection and start receiving data before the buffered
> data runs out.

The the webmaster decide here. If he disables autobuffering he likely has a reason.
Take my use case in comment #22 as example. It would be best to avoid extra traffic there as I can be sure that a lot of videos won't even be watched by many visitors.
Currently we fully download each video we see in the page until the local media cache is full, and then we hold the connection open. The patches in this bug make us stop downloading after we can display the first frame of each video (unless the video is marked with 'autobuffer'), and close the connections used. In other words, the purpose of this bug is to fix the problems you are concerned about.
Comment on attachment 377084 [details] [diff] [review]
fix v2

This needs a little more work to be spec-compliant. In particular we need to fire the "suspend" event.
Attachment #377084 - Flags: review?(chris.double)
I decided it would be easiest to do this as a separate patch on top of the others. This adds support for the "suspend" event and changing networkState to NETWORK_IDLE while we're suspended, as per spec. (This also applies to suspends requested by the media cache.)

The only thing I haven't done here is stop blocking the document load event while the connection is suspended. That wouldn't be too hard to do but it's not important right now --- if we suspend due to going into bfcache, it's irrelevant; if we suspend after displaying the first frame, we've already stopped blocking the load event; the only situation where it matters is if the cache decides to block loading before we've been able to display the first frame. But that's unlikely (or at worst, temporary) since the decoder will read data quickly to get the first frame, and the cache will keep giving it data even though the cache is full, and eventually we'll reach the first frame and stop blocking the load event. So I don't think we'll ever suspend a download indefinitely while the resource is blocking the document load event.

This patch avoids firing the "stalled" event while the download is suspended. I wasn't completely sure if that's what the spec wants but it makes sense to me.
Attachment #377342 - Flags: review?(chris.double)
Comment on attachment 377084 [details] [diff] [review]
fix v2

I ended up not changing this.
Attachment #377084 - Flags: review?(chris.double)
Whiteboard: [needs review]
Attached patch part 3 v2Splinter Review
Need to fix test_load.html to ignore suspend events, it wasn't ready for them. When we're not using use_large_cache.js there's always the possibility of getting a suspend event because the cache is full.
Attachment #377342 - Attachment is obsolete: true
Attachment #377348 - Flags: review?(chris.double)
Attachment #377342 - Flags: review?(chris.double)
Attachment #377084 - Flags: review?(chris.double) → review+
Comment on attachment 377090 [details] [diff] [review]
Smarter suspend and resume

  // Opens the channel, using an HTTP byte range request to start at aOffset
   // if possible. Main thread only.
-  nsresult OpenChannel(nsIStreamListener** aStreamListener, PRInt64 aOffset);
+  nsresult OpenChannel(nsIStreamListener** aStreamListener);
Attachment #377090 - Flags: review?(chris.double) → review+
Comment 29 should say, I think the comment should be updated for the removal of aOffset. r+ with that.
Attachment #377348 - Flags: review?(chris.double) → review+
Whiteboard: [needs review] → [needs landing]
http://hg.mozilla.org/mozilla-central/rev/0da7ac7acba6
http://hg.mozilla.org/mozilla-central/rev/2ab43146677c
http://hg.mozilla.org/mozilla-central/rev/0b624f6bb320
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 landing]
Perhaps there could be a pref controlling this? Some users will probably like to be able to have videos buffer whether or not the webpage has this attribute...
As soon as you visit a page with dozens of media elements on it, you'll change your mind.
Verified fixed on trunk with builds on OS X and Windows: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090519 Minefield/3.6a1pre ID:20090519032945

http://www.rumblingedge.com/2008/11/09/mark-surman-interview-in-sept-08/
http://air.mozilla.com/
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla1.9.2a1
This was landed:
c4ba19fecb45
7e271ab93764
b90e72e273de
491b83f59b17
9d709c3b7ec2
068cb5867924
11694b807f68

but backed out due to bustage:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a49416ba591b
test_closing_connections.html has caused a random orange

Bug 494274 - test_closing_connections.html | Error thrown during test: [object ProgressEvent] - got 0, expected 1
Depends on: 494274
Verified fixed on 1.9.1 too with the testcase from comment 36 on all platforms and builds like Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090524 Shiretoko/3.5pre ID:20090524031149
Keywords: dev-doc-needed
Blocks: 458305
You need to log in before you can comment on or make changes to this bug.