Closed Bug 475441 Opened 16 years ago Closed 15 years ago

Cache media data after playing it and across seeks

Categories

(Core :: Audio/Video, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: dale, Assigned: roc)

References

()

Details

(Keywords: fixed1.9.1)

Attachments

(15 files, 6 obsolete files)

8.71 KB, text/plain
Details
24.63 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
15.73 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
10.60 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
5.62 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
2.68 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
2.60 KB, patch
kinetik
: review+
Details | Diff | Splinter Review
4.14 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
6.67 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
248.03 KB, patch
Details | Diff | Splinter Review
248.00 KB, patch
Details | Diff | Splinter Review
2.29 KB, text/html
Details
786 bytes, patch
conrad
: review+
Details | Diff | Splinter Review
209.91 KB, patch
cajbir
: review+
kinetik
: review+
Details | Diff | Splinter Review
28.94 KB, patch
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2a1pre) Gecko/20090126 Minefield/3.2a1pre
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2a1pre) Gecko/20090126 Minefield/3.2a1pre

Presently video support discard all buffered data before the current play position. This is problematic causing many net requests for simple seeks (even within areas of the video that appear to already be buffered.) 

Reproducible: Always

Steps to Reproduce:
1.Play the video
2. Try to seek within buffered areas or before the current play head position
Actual Results:  
Notice the delay as the client goes to the web to seek since it threw away all your buffed data. 

Expected Results:  
We should not throw away buffered data seeking should be done from disk for parts of the clip that has been downloaded. 

This is especially bad in low bandwidth cases where you play back the video and then it pauses because it hits the end of the buffer. You then rewind the playhead to get smooth playback (like you would with any other video player on the web) Unfortunately the client throws away all your buffer info and the low quality playback experience is repeated.
Severity: major → blocker
Bug 469408 also discusses some of these issues.
Component: General → Video/Audio
Product: Firefox → Core
QA Contact: general → video.audio
Version: unspecified → Trunk
OS: Linux → All
Hardware: x86 → All
Severity: blocker → normal
I highly recommend this bug be made blocking for 1.9.1.

Without this bug fixed variable and low bandwidth connections will NOT be able to playback the video at all.

Additionally this bug will be highly detrimental for a server under high load. Users will have low quality experience and be re-requesting the video every time they try to play it back smoothly... snowballing into more people having a low quality experience. 

The video playback system will be labeled broken as it fails to work comparably to other video playback systems in this extremely common use case.
Severity: normal → critical
Flags: wanted1.9.1?
Flags: blocking1.9.1?
before changing bug severity please see 
https://bugzilla.mozilla.org/page.cgi?id=fields.html#bug_severity
If video file is big we may not want to keep that in memory.
So we may want a about:config item to say how much should be buffered.
Blocks: 476397
Is this CONFIRMED? While I think it's definitely wanted, I don't know if it blocks. It is a major user inconvenience, though, I agree.
confirming following steps...

Case 1
1. On Firefox nightly goto http://www.double.co.nz/video_test/test5.html
2. Wait till video fully loaded 
3. Goto windows network connection
4. Disable your internet connection
5. Come back to Firefox
6. Click play button on the VIDEO tag UI controls
7. Wait till the movie ends
8. Click play button again to replay movie

Result:-
Movie dont play, screen blanks out

Expected:-
Movie plays from buffer


Case 2
(enable internet connection)
1. On Firefox nightly goto http://www.double.co.nz/video_test/test5.html
2. Wait till video fully loaded 
3. Goto windows network connection
4. Disable your internet connection
5. Come back to Firefox
6. Click play button on the VIDEO tag UI controls
7. Wait till 20sec of the movie play
8. Move thumb to already played position (say to the 10th sec)
9. Wait for movie to continue from 10sec 

Result:-
Movie never plays, it is stuck at the 20th sec frame

Expected:-
Movie plays from the buffer 10th sec



Case 3
(enable internet connection)
1. On Firefox nightly goto http://www.double.co.nz/video_test/test5.html
2. Wait till video fully loaded 
3. Goto windows network connection
4. Disable your internet connection
5. Come back to Firefox
6. Click play button on the VIDEO tag UI controls
7. Wait till 20sec of the movie play
8. Move thumb to a buffered position after current (say to the 30th sec)
9. Wait for movie to continue from 30sec 

Result:-
Movie never plays, it is stuck at the 20th sec frame

Expected:-
Movie plays from the buffer 30th sec



Case 4
(enable internet connection)
1. install "Live HTTP Headers" addon
   https://addons.mozilla.org/en-US/firefox/addon/3829
2. Enable http headers at Firefox side bar
   (View > Sidebar > LiveHTTPHeaders)
3. Goto http://www.double.co.nz/video_test/test5.html
4. Wait till video fully loaded 
5. "Clear" the HTTP headers at sidebar
6. Click play button on the VIDEO tag UI controls
7. Wait till 20sec of the movie play
8. Move thumb to a buffered position after current (say to the 30th sec)

Result:-
A lot of HTTP chatter in the LiveHTTPHeaders window.
Then the movie continue to play.

Expected:-
Movie plays from the buffer 30th sec immediately with out HTTP chatter 


I also feel this should be blocking1.9.1
I also think this is the root cause of bug 475286
ie, when network is disabled the thumb moves immediately 
to where ever you click.
Severity: critical → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
tracking-fennec: --- → ?
Or we are discarding buffer content when we start seeking?
(In reply to comment #8)
> Or we are discarding buffer content when we start seeking?

So I think this is because of bug 475685
Depends on: 475685
It is not because of bug 475685. We store the buffer of loaded data in the listener for the channel. When we seek, or replay, we get a new channel and the buffered data is gone. The buffered data is stored in a pipe. As we read from the pipe the buffered data is gone, that is why we only have data from the current play position.
No longer depends on: 475685
I don't think we can block on this, but it would be a huge improvement in user experience and we should try to get it in if we can.
Assignee: nobody → roc
Flags: wanted1.9.1?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Priority: -- → P3
This is my first attempt at the interface and basic design.
This looks really good.

> This class caches blocks of compressed media data in files, but not

They're not always compressed (Wave PCM).

>  the block we predict will be used furthest in the future.

farthest

>  from the block or when the client seeks forward over data in the block.

Why renew blocks just because they've been seeked past?

What's the use case for pinning an entire stream?

What happens with partial blocks?  They're never entered into the cache and thus we don't have to deal with them?

I think this API is sufficient to cover the QuickTime decoder backend's needs too.  QuickTime requests data using an async pread()-like cal.   It requests an offset and a size, provides a completion callback (if the request is async, otherwise it must be serviced before returning), and (for async requests) provides a schedule (time the request must be serviced before, priority, and flags to indicate whether it's a regular read, or a preroll (movie opening time) read (preroll requests can also been required or optional)).  If you're unable to service a request within the scheduled time, you're expected to the reject all other outstanding requests at lower priorities; in effect, if you can't supply the data for a keyframe, don't bother supplying the delta frame data either.
(In reply to comment #13)
> >  from the block or when the client seeks forward over data in the block.
> 
> Why renew blocks just because they've been seeked past?

Because generally I think we should assume that data closer to the current playback point is higher priority than other data.

> What's the use case for pinning an entire stream?

My idea is that when script calls bufferedRanges, we call this Pin method. If it returns true then we can return the entire interval [0, duration] as the buffered range and script can then seek --- even if the underlying stream is not seekable --- without possibility of failure due to a cache block being evicted that turns out to be needed.

> What happens with partial blocks?  They're never entered into the cache and
> thus we don't have to deal with them?

Partial blocks should be handled, but I haven't written down enough about them. Basically I think we will always read from the underlying stream in full block increments. OnDataAvailable notifications for partial blocks will just be ignored until the block is full (unless we reach end of stream, of course).
Attached patch checkpoint #13 (obsolete) — Splinter Review
A lot more fleshed out. nsMediaCache is now completely coded, and builds, although there's much work to do to integrate it into nsMediaStream. Some decoder modifications will also be required since 'mDownloadPosition' doesn't make much sense anymore; decoders should be probing the cache to see what's there.

I was originally going to have nsMediaCache manage netwerk stream listeners itself, but at this point it seems better to move all the stream listening into nsMediaStream, which is where we set up the requests anyway.

I changed around the block management a bit as I discovered problems writing the code. Managing the read-ahead data for non-seekable streams is one problem; we can't actually discard that data. But what I have here still seems fairly simple to me. Putting all the control logic in Update() makes it simple, although fairly inefficient, but I suspect for reasonable cache sizes it's not going to be a problem. If necessary we can add optimizations to make Update() far more efficient.
I'm not sure why that came out as checkpoint #13.
Things are slightly simpler if we expose stream strategy subclasses directly to the decoders instead of proxying everything through nsMediaStream. So rename nsStreamStrategy to nsMediaStream. The only other significant change is that responsibility for calling NotifyBytesConsumed has moved up from nsMediaStream::Read to its callers.
Attachment #364238 - Flags: review?(chris.double)
Attachment #364238 - Flags: review?(chris.double) → review+
nsDefaultStreamStrategy and nsHttpStreamStrategy duplicate a bunch of code. We should move most of the functionality up to nsDefaultStreamStrategy, rename it to nsChannelStreamStrategy and just leave the seeking logic in nsHttpStreamStrategy which inherits from nsChannelStreamStrategy.
Attachment #364247 - Flags: review?(chris.double)
I'm asking for 1.9.1 blocking on this for two reasons:

1. The user experience for this is incredibly bad.  If you're watching a video and you seek backwards because you want to watch something again, you can way many many seconds on even a pretty fast connection for it to re-buffer.  (The UI experience where it doesn't say anything about re-buffering should be fixed soon, but it's the amount of time that's bad.)

2. This will have a pretty serious, if somewhat unquantified, impact on servers.  Having to re-download files over and over again is going to waste server bandwidth and make it very difficult to get people to adopt this technology once they realize what happens when the user does something relatively simple.
Flags: blocking1.9.1?
This won't completely fix #1 if the user seeks backward before they reach the end of the video; the current seek algorithm will need to seek *forward* as it does its binary search. We can pretty easily fix that, since we can identify a backward seek and know there's no point searching in front of the current data. We can also tweak the seek algorithm to look at cached data first to narrow down the search. But those are separate bugs.
Flags: blocking1.9.1? → blocking1.9.1+
Attachment #364247 - Attachment is patch: true
Attachment #364247 - Attachment mime type: application/text → text/plain
Attachment #364247 - Flags: review?(chris.double) → review+
Rename nsChannelStreamStrategy to nsMediaChannelStream, nsHttpStreamStrategy to nsMediaHttpStream, and nsFileStreamStrategy to nsMediaFileStream.
Attachment #365087 - Flags: review?
Attachment #365087 - Flags: review? → review?(chris.double)
Attached patch checkpoint #2 (obsolete) — Splinter Review
This is the remainder of the cache patch. It builds and runs, but hits 480063 or something very much like it almost every time you load a video via HTTP.
Attached patch checkpoint #3 (obsolete) — Splinter Review
OK, checkpoint #2 was returning incorrect data in some cases, triggering the liboggplay bug.

This fixes that problem, and adds some null checkery to avoid some crashes, and now we pass the media mochitests!

Currently when nsOggDecoder finishes playing the resource, we destroy the nsChannelReader and nsMediaStream and then recreate them if someone seeks backward. I need to fix that.
Attachment #365091 - Attachment is obsolete: true
Attached patch checkpoint #4 (obsolete) — Splinter Review
This is almost ready for review. Tested it quite a bit with small cache sizes and refined the caching policy. It now deals reasonably with "overflow blocks". I added lots of logging.
Attachment #365087 - Flags: review?(chris.double) → review+
Attached patch part 4: the media cache (obsolete) — Splinter Review
I couldn't figure out how to break it up well, sorry. Here's the synopsis:

-- Refactor UpdateReadyStateForData to add a NotifyAutoplayDataReady helper. This gets called not only when we reach HAVE_ENOUGH_DATA but also when the cache suspends the download because the cache is full ... in that case we need to start playing to free up cache space or we might not ever play.

-- Change the signature of GetCurrentPrincipal so that it returns an already_Addrefed<nsIPrincipal>. This means we don't have to cache a principal in nsMediaStream.

-- The NS_ASSERTION(aNextFrame != NEXT_FRAME_AVAILABLE) is invalid. An UpdateReadyState posted before MetadataLoaded may run and discover that the decoder has already advanced to decode one or more frames.

-- No need for MOZ_MEDIA ifdef in content/media/video/public/Makefile.in, this whole directory is skipped if !MOZ_MEDIA

-- Instead of having the Ogg decoder go through nsChannelReader to cancel/suspend/resume/GetCurrentPrincipal, just expose the underlying nsMediaStream so nsOggDecoder can use it directly. I'm doing this because I need to add more features to nsMediaStream (like asking for the cache state) and I don't want to have to add more glue code to nsChannelReader.

-- nsChannelToPipeListener goes away. The transfer of data from the main thread to the decoder thread is done by nsMediaCache now (the main thread writes to the cache, the decoder threads read from the cache). All the Necko/HTTP-specific hacks in nsChannelToPipeListener go into nsMediaChannelStream.

-- I hope that nsMediaCache.h and nsMediaCache.cpp speak for themselves. I added a bunch of comments. Please let me know of areas that need more comments.

-- In the decoders, instead of manually tracking mDownloadPosition we now just look at the decoder position and ask the cache (via nsMediaStream) how much cached data there is after the decoder position.

-- We no longer track the total bytes in the decoders. We get it from the cache (via nsMediaStream) instead since the cache has more knowledge.

-- NotifyDownloadSeeked is no longer needed. The decoders' decoder position and
(for Ogg) playback position are just updated explicitly after the SEEKING phase has ended, and don't track the stream-level seeks.

-- NotifyBytesConsumed is no longer needed in nsMediaDecoder. Previously it was needed as a general nsMediaDecoder method because nsMediaHttpStream needed to be able to consume bytes from the pipe and notify the decoder to advance mDecodePosition (because there wasn't going to be a stream-level seek to update it), but now that's no longer necessary.

-- NotifyBytesDownloaded no longer has to send the number of bytes. The decoders no longer care. All they need to know is that *some* data arrived so they should re-check to see how much data is available in the cache.

-- NotifySuspendedStatusChanged has been added to the decoders so that the cache can warn them to check if their streams have been suspended by the cache. We may need to start autoplay streams playing, see the first point in this list.

-- Download rate tracking moves from the decoders to nsMediaStream, because nsMediaChannelStream and nsMediaFileStream are closest to the actual channels and can thus best estimate the download rate (being careful to ignore the times we're suspended etc). Thus we move ChannelStatistics to nsMediaStream because we need to start tracking download rates there.

-- nsMediaStream adds a bunch of API to set the read mode, set the playback rate estimate, Pin and Unpin the data, get the download rate, get the data length, inspect the cache state, and check if the stream is suspended by the cache. For file streams these are all trivial, we treat the file as if it's completely cached always (none of its data goes into nsMediaCache of course).

-- nsMediaStream::Cancel has gone, just use nsMediaStream::Close instead.

-- nsMediaChannelStream is moved up to the header file so that nsMediaCache can call CacheClientSeek/CacheClientSuspend/CacheClientResume on it.

-- Because nsChannelToPipeListener is gone, nsMediaChannelStream now manages the stream listener itself. The stream listener just forwards to OnStartRequest/OnStopRequest/OnDataAvailable on the nsMediaChannelStream.

-- In nsOggDecoder, as well as the general changes noted above we can get rid of mProgressPosition, because mDecoderPosition is now updated only after the Ogg seek is ended. While the Ogg seek is in progress nothing in nsOggDecoder is changing. This seems much nicer. We still have to suppress progress updates though.

-- nsMediaHttpStream has gone away. The seek optimizations it did have moved to nsMediaCache, and it's not needed for anything else.

-- nsMediaChannelStream::OnStartRequest and OnDataAvailable mostly do what nsChannelToPipeListener used to do, except they feed the data to nsMediaCache instead of a pipe. It's nice to have all the channel logic in one place.

-- When we need to cancel the channel, make sure that we resume it first to prevent strange leaks. Also, pass NS_ERROR_PARSED_DATA_CACHED as the status code, so that nsDocumentViewer::LoadComplete doesn't think the load aborted and decline to fire a load event for nsVideoDocuments.

-- We don't need much locking in nsMediaChannelStream now since most of what it does off-main-thread just delegates to thread-safe nsMediaCacheStream methods.

-- We have to add some code to the decoders to compute playback rate estimates and send them to the cache via nsMediaStream.

-- There's some annoyance in nsOggDecoder::GetStatistics where we can call GetStatistics after mReader has gone away. That's a problem now that nsChannelReader owns the nsMediaStream. In that case it seems fine to just return zeroed-out statistics.

-- The decoders have to not buffer while the cache has suspended their stream, for the same reason we have to autoplay; we have to try to read data to free up cache space. This shouldn't be a problem for reasonable cache sizes.

-- mTimeOffset in nsWaveDecoder has gone away, it was redundant with mPlaybackPosition.
Attachment #364018 - Attachment is obsolete: true
Attachment #365534 - Attachment is obsolete: true
Attachment #365819 - Attachment is obsolete: true
Attachment #367018 - Flags: review?(chris.double)
-- In automation.py.in, used during mochitests and other automated tests, set the cache size to 100K. This is completely unreasonable and thrashes the cache code hard.

-- During progress tests, we can sometimes go backwards with a small cache because we might evict data from the cache at the end of the readahead buffer. So for those tests, use a reasonable cache size.
Attachment #367019 - Flags: review?(chris.double)
Comment on attachment 367018 [details] [diff] [review]
part 4: the media cache

Matthew should look at this too.
Attachment #367018 - Flags: review?(kinetik)
When we seek or play after playback has ended, the spec says we should just seek (to the start, in the case of play) and not do a full reload. We really want to avoid the reload since that throws out our media cache state, so let's fix that now.
Attachment #367021 - Flags: review?(chris.double)
test_bug465498.html assumes that when we seek after ending, we'll be paused. That's incorrect per spec (as far as I can tell), and right now we're only paused because we reloaded. Removing the reload means we have to fix this test.

Other tests are assuming that when we call play() after playback ended, another 'playing' event will fire. Again, I think this is incorrect per spec, and avoiding reloads means the 'playing' event does not fire, so we need to fix these tests.
While tracking down the problem with nsDocumentViewer::LoadComplete thinking the load had aborted because we cancelled the document's channel, I refactored test_videoDocumentTitle and test_audioDocumentTitle to use iframes instead of opening new windows. This is faster and cleaner and simpler to debug internally so we should just do it.

test_delay_load also uses window.open but there's no immediately obvious and simple way to avoid that. We can make things a bit simpler and more robust though by avoiding depending on that window's load event to close it. Instead we can just close the window when the test is finished. I *think*, although I'm not sure, that I've seen that load event fail to fire or fire during the window.open call --- see bug 482647, maybe.
Attachment #367029 - Flags: review?(chris.double)
This is all nine patches updated to trunk and rolled together, plus the fix for bug 482942 (which the media cache depends on to build) --- everything you need to build this.
Whiteboard: [needs review]
In nsMediaCache::Update()

+    nsresult rv;
+    if (stream->mChannelOffset != desiredOffset && enableReading) {
+      // We need to seek now.
+      NS_ASSERTION(stream->mIsSeekable || desiredOffset == 0,
+                   "Trying to seek in a non-seekable stream!");
...
+    if (NS_FAILED(rv)) {
+      streamsToClose.AppendElement(stream);
+    }
+  }

rv may be used uninitialized.  This was causing test failures for me.
Ideally the cache implementation could work more like images where the data is cached across page views and identical resources share the same cache. 

In other words "video should NOT discard all buffer data once you leave the page". (aside from back/forward buttons that don't really "load" the page) 

Bug 476371 also highlights this issue.
In nsMediaCacheStream::NotifyDataEnded:

+  PRInt32 blockOffset = PRInt32(mChannelOffset%BLOCK_SIZE);
+  if (blockOffset > 0) {
+    // Write back the partial block
+    memset(reinterpret_cast<char*>(mPartialBlockBuffer) + blockOffset,
+           BLOCK_SIZE - blockOffset, 0);

Wrong argument order for memset.  Found this with Valgrind while running mochitests.  Other than this (and the one I submitted a patch for yesterday), we're clean when running tests.
No longer blocks: 483324
Blocks: 483324
diff --git a/xpcom/io/nsLocalFileWin.cpp b/xpcom/io/nsLocalFileWin.cpp
--- a/xpcom/io/nsLocalFileWin.cpp
+++ b/xpcom/io/nsLocalFileWin.cpp
@@ -396,16 +396,20 @@ OpenFile(const nsAFlatString &name, PRIn
+    if (osflags & NS_LOCALFILE_DELETE_ON_CLOSE) {
+      flags6 |= FILE_FLAG_DELETE_ON_CLOSE;
+    }
+

I think was meant to be "flag6 |= FILE_FLAG_DELETE_ON_CLOSE;"? It won't compile on Windows as is.
Yes, that's fixed in bug 482942. Sorry I forgot to mention it here.
In nsMediaCache::FindReusableBlock:

+    do {
+      if (blockIndex < PRInt32(aMaxSearchBlockIndex))
+        return blockIndex;

This won't match free block indices When aMaxSearchBlockIndex is PR_UINT32_MAX, because the result of conversion to PRInt32 is -1.  Needs to be something like:

    if (blockIndex >= 0 && PRUint32(blockIndex) < aMaxSearchBlockIndex)
Attached patch part 4: the media cache (v2) (obsolete) — Splinter Review
Updated to fix comment #36 and comment #39. The DELETE_ON_CLOSE flag usage changed to match the patch that landed on trunk.
Attachment #367018 - Attachment is obsolete: true
Attachment #369012 - Flags: review?
Attachment #367018 - Flags: review?(kinetik)
Attachment #367018 - Flags: review?(chris.double)
Attachment #369012 - Flags: review? → review?(chris.double)
Initial review of part 4:

+ * (In the future it might be advantageous to relax this, e.g. so that a
+ * seek to near the end of the file is can happen without disturbing

Spurious 'is' in that sentence.

+  // These are called on the main thread.
+  // Tell us whether the stream is seekable or not. Seekable streams will
+  // always pass 0 for aOffset. This should only be called while the
+  // stream is at channel offset 0. Seekability can change during
+  // the lifetime of the nsMediaCacheStream --- every time we do an HTTP
+  // load the seekability may be different (and sometimes is, in practice,
+  // due to the effects of caching proxies).
+  void SetSeekable(PRBool aIsSeekable);

What 'aOffset' is this comment talking about? Should that be non-seekable
streams will always pass 0?

+  // This method assumes that the cache monitor is held and can be called on
+  // any thread.
+  PRInt64 GetCachedDataEndInternal(PRInt64 aOffset);

Can we assert that the cache monitor is held, or do a commented out assert
(like nsOggDecoder does in places) pending the NSPR changes needed to be
able to write the assert. Ditto with other functions that mention this
assumption.

+  // All other fields are all protected by the cache's monitor and
+  // can be accessedy by any thread.

'accessedy'

+void
+nsMediaCache::Update()

This method can only be called on the main thread - can we put an asssert
for this in there. Ditto with other methods that can only be called on
the main thread.
+
+

REPLAY_DELAY is defined as being a unit in seconds. But in the following code (line 2632 in the patch):

+    return aNow - block->mLastUseTime + REPLAY_DELAY;

This number of seconds is being added to a PRIntervalTime. According to the NSPR docs a PRIntervalTime's units are platform dependant. So it looks like REPLAY_DELAY should be converted using PR_SecondsToInterval. Ditto with the other places where REPLAY_DELAY is used.

NSPR docs state that the maximum useful interval is approximately 6 hours. Is this likely to be an issue? For example with code that does a PR_MAX on intervals and the interval has overflowed.

+    } else if (desiredOffset < stream->mStreamOffset + BLOCK_SIZE) {

What happens if mStreamOffset + BLOCK_SIZE overflows?
Attachment #367019 - Flags: review?(chris.double) → review+
Attachment #367021 - Flags: review?(chris.double) → review+
Attachment #367026 - Flags: review?(chris.double) → review+
Attachment #367029 - Flags: review?(chris.double) → review+
(In reply to comment #41)
> Initial review of part 4:
> +  // These are called on the main thread.
> +  // Tell us whether the stream is seekable or not. Seekable streams will
> +  // always pass 0 for aOffset. This should only be called while the
> +  // stream is at channel offset 0. Seekability can change during
> +  // the lifetime of the nsMediaCacheStream --- every time we do an HTTP
> +  // load the seekability may be different (and sometimes is, in practice,
> +  // due to the effects of caching proxies).
> +  void SetSeekable(PRBool aIsSeekable);
> 
> What 'aOffset' is this comment talking about? Should that be non-seekable
> streams will always pass 0?

Yes, sorry. Non-seekable streams will always get 0 passed for aOffset in nsMediaChannelStream::CacheClientSeek.

> +  // This method assumes that the cache monitor is held and can be called on
> +  // any thread.
> +  PRInt64 GetCachedDataEndInternal(PRInt64 aOffset);
> 
> Can we assert that the cache monitor is held, or do a commented out assert
> (like nsOggDecoder does in places) pending the NSPR changes needed to be
> able to write the assert. Ditto with other functions that mention this
> assumption.

We can actually assert this now! See
http://mxr.mozilla.org/mozilla-central/search?string=PR_ASSERT_CURRENT_THREAD_OWNS_LOCK
I'll update the patch to add such assertions for nsMediaCache and the nsMediaStream classes. We should probably file another bug to add assertions for the decoder locks.

> +void
> +nsMediaCache::Update()
> 
> This method can only be called on the main thread - can we put an asssert
> for this in there. Ditto with other methods that can only be called on
> the main thread.

Sure.

> REPLAY_DELAY is defined as being a unit in seconds. But in the following code
> (line 2632 in the patch):
> 
> +    return aNow - block->mLastUseTime + REPLAY_DELAY;
> 
> This number of seconds is being added to a PRIntervalTime. According to the
> NSPR docs a PRIntervalTime's units are platform dependant. So it looks like
> REPLAY_DELAY should be converted using PR_SecondsToInterval. Ditto with the
> other places where REPLAY_DELAY is used.

Yep. Good catch.

> NSPR docs state that the maximum useful interval is approximately 6 hours. Is
> this likely to be an issue? For example with code that does a PR_MAX on
> intervals and the interval has overflowed.

That's a good point. Looking at the code, on Windows, Linux and Mac platforms it's in milliseconds, so rolls over every 50 days or so. On Solaris, it's in units of 10 microseconds, so rolls over twice a day. On IRIX it's something dynamic depending on the machine...

PRTime is 64-bit but depends on the current time of day, which can change in arbitrary ways, so that's no good.

I'm not 100% sure what to do here. I suppose we could define some kind of non-rolling-over interval timer that uses 64-bit values. Computing "now" would call PR_IntervalNow(), and if we observe it going backwards from the last value, we increment the most-significant 32 bits. If you somehow avoid calling "now" for over 50 days of uptime, then you forget one or more rollovers, but I think we can live with that. I've filed bug 484921 on that issue. Once that's fixed I'll update all the media code related to timers to use the new API.

> +    } else if (desiredOffset < stream->mStreamOffset + BLOCK_SIZE) {
> 
> What happens if mStreamOffset + BLOCK_SIZE overflows?

mStreamOffset is 64-bit so we'd have to be in a pretty bad situation for that to happen. But even if it did happen, nothing much bad happens here, we just enable reading for a stream that we'd prefer not to.
While running test_bug465498.html, nsMediaChannelStream::CacheClientSeek() can be called while the nsMediaChannelStream::mDecoder is in SHUTDOWN state, and mDecoder->GetMediaElement() returns nsnulll. CacheClientSeek() goes on to create a channel, etc, but the channel has nothing to supply data to. I reckon you should add a guard to nsMediaChannelStream::CacheClientSeek() to bail out in this case.
I can cause a crash with the media cache patch applied with this testcase with the following STR:

1. Play the video.
2. Seek the video somewhere.
3. Before the seek has completed, quit FireFox.

The crash is in the media decode thread, in liboggplay, but it does not happen without the media cache patch.

Call stack:

gklayout.dll!ogg_page_serialno(ogg_page * og=0x078afd78)  Line 62 + 0x5 bytes	C
gklayout.dll!oggz_seek_set(_OGGZ * oggz=0x078afd48, __int64 unit_target=11461)  Line 726 + 0x9 bytes	C
gklayout.dll!oggz_seek_units(_OGGZ * oggz=0x078afd48, __int64 units=11461, int whence=0)  Line 893 + 0x11 bytes	C
gklayout.dll!oggplay_seek(_OggPlay * me=0x06165508, __int64 milliseconds=11461)  Line 70 + 0x16 bytes	C
gklayout.dll!nsOggDecodeStateMachine::Run()  Line 1045 + 0x1c bytes	C++
xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x0584f784)  Line 511	C++
Summary: video support should NOT discard all buffered data before the current play position → Cache media data after playing it and across seeks
Turns out we can't yet assert that a *monitor* is held, so I can't enable the assertions we want. I've added them as comments in my patch though.
I lied. We can add those assertions just fine.
I'm not sure exactly what the media cache changed, but it seems to be tickling a liboggz bug that's causing the crash in comment #44.

In oggz_seek_set, we have
    offset_next = oggz_get_next_start_page (oggz, og);
which fills in 'og' (which is an ogg_page*). Except that if an I/O error occurs (as it does when we're shutting down the stream), offset_next will return a negative number and 'og' may not be filled in. All calls to oggz_get_next_start_page bail out immediately on a negative result, *except* for this call in oggz_seek_set. It carries on:
    if (/*unit_end == -1 &&*/ offset_next == -2) { /* reached eof, backtrack */
In our case offset_next==-1, so we proceed:
    } else {
      serialno = ogg_page_serialno (og);
which crashes because ogg_page_serialno dereferences og->header[14], but og->header is null since we haven't filled og in.

If oggz_seek_set hadn't crashed there it would have continued to
    if (offset_next < 0) {
      goto notfound;
and then
 notfound:
#ifdef DEBUG
  printf ("oggz_seek_set: NOT FOUND\n");
#endif
  oggz_reset (oggz, offset_orig, -1, SEEK_SET);
  return -1;

So this patch seems perfectly reasonable; don't try to get serialno and granule_at if offset_next < 0 and we're just going to bail out anyway.

This patch does indeed fix the crash.

Conrad, can you confirm this is right?
Attachment #369641 - Flags: review?
Comment on attachment 369641 [details] [diff] [review]
crash fix in liboggz

Conrad, please see comment #47...
Attachment #369641 - Flags: review? → review?(conrad)
Updated media cache patch for all comments. I also found a bug in the replacement algorithm: we were sometimes evicting a block that contained the current read position for a stream (i.e. containing both "played" and "readahead" data) because we classify it as a PLAYED_BLOCK and we penalize PLAYED_BLOCKS because they might not actually be replayed. Really it's both a PLAYED_BLOCK and a READAHEAD_BLOCK, but actually handling it that way is a pain, so I just special-cased to avoid evicting blocks that contain the current read position.
Attachment #369651 - Flags: review?(chris.double)
Attachment #369012 - Attachment is obsolete: true
Attachment #369012 - Flags: review?(kinetik)
Attachment #369012 - Flags: review?(chris.double)
Comment on attachment 369641 [details] [diff] [review]
crash fix in liboggz

Yes, this patch looks reasonable.

Applied in upstream svn.annodex.net r3902
Attachment #369641 - Flags: review?(conrad) → review+
Fix from comment 51 has been applied to mozilla-central in bug 480521
Attachment #369651 - Flags: review?(chris.double) → review+
Attachment #367023 - Flags: review?(kinetik) → review+
P1 since this has to have beta exposure.
Priority: P3 → P1
Comment on attachment 369651 [details] [diff] [review]
part 4: the media cache (v3)

> + * If the cache fails to initialize then no instances of this class
> + * can be created, so nonstatic methods of this class can assume
> + * gMediaCache is non-null.

It doesn't seem like this is enforced quite as described.  What really
happens is that nsMediaCacheStream::Init returns an error, so you can have
an instance of nsMediaCacheStream, but it's invalid.

The way the current code deals with this can result in a crash if the cache
fails to initialize: decoder Load() returns an error as a result of
nsMediaCacheStream::Init returning an error, the media element fires an
'error' event, and we crash trying to dereference gMediaCache via the
element's DispatchProgressEvent calling the decoder's GetStatistics calling
the mediachannel's GetCachedDataEnd, which calls the mediastream's
implementation and crashes trying to lock the cache's monitor.

> +  friend class Listener;

Do we still support compilers where this is necessary? :-(

> +NS_IMPL_ISUPPORTS2(nsMediaChannelStream::Listener, nsIRequestObserver, nsIStreamListener)

Not a big deal, but it might be useful to add a comment to explain the
purpose of this class (or maybe just rename it), since the implementation
looks skeletal and you have to look at where it's used to infer the purpose.

In nsMediaCache::FindReusableBlock:
> +      if (blockIndex < PRInt32(aMaxSearchBlockIndex))
> +        return blockIndex;

aMaxSearchBlockIndex is already PRInt32.
Attachment #369651 - Flags: review?(kinetik) → review+
(In reply to comment #55)
> (From update of attachment 369651 [details] [diff] [review])
> > + * If the cache fails to initialize then no instances of this class
> > + * can be created, so nonstatic methods of this class can assume
> > + * gMediaCache is non-null.
> 
> It doesn't seem like this is enforced quite as described.  What really
> happens is that nsMediaCacheStream::Init returns an error, so you can have
> an instance of nsMediaCacheStream, but it's invalid.

Good point, I've fixed the comment.

> The way the current code deals with this can result in a crash if the cache
> fails to initialize: decoder Load() returns an error as a result of
> nsMediaCacheStream::Init returning an error, the media element fires an
> 'error' event, and we crash trying to dereference gMediaCache via the
> element's DispatchProgressEvent calling the decoder's GetStatistics calling
> the mediachannel's GetCachedDataEnd, which calls the mediastream's
> implementation and crashes trying to lock the cache's monitor.

I'll fix this by wiping out nsOggDecoder's mReader if it fails to initialize:
    nsresult rv = mReader->Init(this, mURI, aChannel, aStreamListener);
    if (NS_FAILED(rv)) {
      // Free the failed-to-initialize reader so we don't try to use it.
      mReader = nsnull;
      return rv;
    }
nsOggReader::GetStatistics already knows to bail out if mReader is null.

nsWaveDecoder::Load does this:
  nsresult rv = nsMediaStream::Open(this, mURI, aChannel, getter_Transfers(mStream),
                                    aStreamListener);
so we can just modify nsMediaStream::Open to not return a stream if the Open fails.

> > +  friend class Listener;
> 
> Do we still support compilers where this is necessary? :-(

I think so.

> > +NS_IMPL_ISUPPORTS2(nsMediaChannelStream::Listener, nsIRequestObserver, nsIStreamListener)
> 
> Not a big deal, but it might be useful to add a comment to explain the
> purpose of this class (or maybe just rename it), since the implementation
> looks skeletal and you have to look at where it's used to infer the purpose.

OK

> In nsMediaCache::FindReusableBlock:
> > +      if (blockIndex < PRInt32(aMaxSearchBlockIndex))
> > +        return blockIndex;
> 
> aMaxSearchBlockIndex is already PRInt32.

THanks.
Checked in the liboggz patch: http://hg.mozilla.org/mozilla-central/rev/95a2e49051d0

I also checked in the rest, but had to back it out due to a hang in test_videocontrols. I'll look into that.
Grr, it's a trivial issue that videocontrols.xml needs to use a large cache size; it wants to load the video completely before playing, and with a tiny cache size the video will never completely load. I just need to override the cache size in that test.
This is great stuff, just built with these patches and <video> just became really awesome(r)! When changing positions there's a little smudging (occasionally) but I'm not sure if it's this patch or some other problem. I was testing http://www.double.co.nz/video_test/test1.html the first video.
The smudging is most likely bug 463358.
(In reply to comment #58)
> Grr, it's a trivial issue that videocontrols.xml needs to use a large cache
> size; it wants to load the video completely before playing, and with a tiny
> cache size the video will never completely load. I just need to override the
> cache size in that test.

That's still a bug, isn't it?  An unimportant one, to be sure, but still definitely a bug that if someone wants a small cache they can't have it even at the expense of slower seeks and more range re-requests.
Actually, come to think of it, it may not be so clearly unimportant -- consider a case like <http://www.made-in-england.org/videos/cdt/> where the video to be watched is a couple hours long and potentially in super-high quality.  Still not sure where that ranks it, but at least being able to play such a video (if not necessarily well) is, I think, fairly important.
By 'never completely load' Robert means that it won't load until it starts playing and consuming cached data. This will free up room in the cache for the load to resume.

The issues with the test is because it didn't load the entire file the event indicating it was loaded doesn't fire. And the test was waiting for that.
Yeah. Tests, or Web pages, that start loading a really huge video and wait for it to completely load before they start playing anything can get stuck, because we deliberately pause the video download when we've read ahead far enough to fill up the cache. There isn't much we can do about this problem. The spec says our behaviour is allowed. We need to bound the cache size. It makes no sense to evict some read-ahead data to make room for later read-ahead data.

If the video is "autoplay" and its download is paused by the cache before we have enough data to play through, we automatically start playing it anyway to avoid being stuck waiting for more data that can never arrive. Applications that are trying to fake that behaviour should probably start playing when a "waiting" event is received.
I've actually done experiments with ridiculously small cache sizes and been able to play a 640x480 video over the net with just a 200K cache and just a few buffering pauses. By comparison a single decoded frame of that video is 1200K...
So what's the change that needs to be made to the spec?  "Buffered everything we can" vs. "Buffered everything"?  Just sounds like something that needs to go back to the spec group.
Yeah, maybe we should be allowed to fire the load event early if the cache is full. I'll post to the list.
Whiteboard: [needs 191 landing]
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: