Closed Bug 531340 Opened 15 years ago Closed 14 years ago

Remove liboggplay usage from Ogg backend of <video>

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a4
Tracking Status
blocking2.0 --- beta1+

People

(Reporter: cajbir, Assigned: cpearce)

References

Details

(Keywords: perf)

Attachments

(4 files, 10 obsolete files)

732.01 KB, patch
cpearce
: review+
cpearce
: superreview+
Details | Diff | Splinter Review
1.63 KB, patch
cajbir
: review+
Details | Diff | Splinter Review
1.23 KB, patch
Details | Diff | Splinter Review
638 bytes, patch
roc
: review+
Details | Diff | Splinter Review
The architecture of liboggplay appears to not be suitable for playback of large videos. We want to be able to do things like:

a) Decode audio data as much a possible and keep writing this to the sound hardware to avoid sound skipping. liboggplay always packages a certain amount of audio data with the video frame so performance of decoding the audio is bounded by the decoding of the video frame.

b) Skip decoding interframes and read through to the next keyframe if the decoded interframes would be discarded during frame skipping due to the time taken to decode them extending past the time required to display them.

Removing liboggplay will also reduce the third party library dependancy and reduce the amount of code that needs to be audited for security issues and maintained.
Assignee: nobody → chris.double
This not for review, it's the current work in progress and passes very few tests. It does however playback video from local files and the network. It uses libogg, libtheora, libvorbis and libsydneyaudio. It still uses liboggplay for the yuv2rgb decoding for now, but not for the playback.

On my machine it plays back large videos without audio skipping, playing only video keyframes if that's what it takes to achieve that.

Seeking doesn't work. Due to that, duration retrieval doesn't work (TinyVid videos show duration since that uses X-Content-Duration).
Great!

A few minor issues:
-- s/bool/PRBool/
-- s/true/PR_TRUE/
-- s/false/PR_FALSE/
-- replace tabs with spaces
-- need to use nsTHashtable instead of std::map
-- need to use ... something ... instead of std::deque
Yeah I got standard C++ carried away :) I'll get onto these, cheers.
Attached patch Address comment 2 (obsolete) — Splinter Review
Replaces std::map with nsClassHashtable, std::deque with nsDeque, and addresses the other points in the comment.
Attachment #414800 - Attachment is obsolete: true
I suggest a few refactorings:

* Make nsOggStream an pure virtual class for an OggStream, and have subclasses nsTheoraStream and nsVorbisStream implement them. We don't need have to have both the mTheora and mVorbis members in both streams.
* We should move the contents of ReadVorbisHeader() and ReadTheoraHeader() into their respective nsOggStream classes, into virtual nsOggStream::ReadHeader() functions. The different header packets of theora/vorbis/skeleton streams all have distinctive magic bytes, so we can detect when we're reading the last header packet (or worst case just count the incoming header packets), and we can then do the InitForData() before returning after reading the last header packet. This moves stream specific stuff into the stream class.
* We should use PRInt64 milliseconds timestamps for all timestamps, rather than doubles, else you can get floating point rounding errors breaking frame accurate seeking when seeking exactly on a keyframe boundary.
Yeah, I planned to do something like the first two once the initial patch was up and working. I don't want to manually detect for magic bytes - I'd prefer to use only the library API if possible. I also don't want to count headers as this has been a problem with liboggplay causing some files not to play.

I'll leave the timestamp changes for you to do when you do the seeking support since you know what needs to be done there.
Comment on attachment 415065 [details] [diff] [review]
Address comment 2

>+void nsOggDecodeStateMachine::DecodeTheoraFrame(VideoData* aData)
>+{

This is reading out of bounds, on videos with a pixel y-offset (e.g. http://tinyvid.tv/show/3hh220yw2esri ):

[...]
>+  unsigned char* p = yuv.ptry;
>+  unsigned char* q = aData->mBuffer[0].data + ((mTheoraStream->mTheora.mInfo.pic_x&~1) + aData->mBuffer[0].stride*(mTheoraStream->mTheora.mInfo.pic_y&~1));
>+  for(int i=0; i < aData->mBuffer[0].height; ++i) {
>+    memcpy(p, q, aData->mBuffer[0].width);
>+    p += aData->mBuffer[0].width;
>+    q += aData->mBuffer[0].stride;
>+  }
>+

In the above loop, if you assert that (q < aData->mBuffer[0].data + aData->mBuffer[0].stride * aData->mBuffer[0].height), the assertion fires. Looks like the loop condition doesn't take into account the fact that q starts a few lines offset into aData->mBuffer[0], I think the loop condition needs to be (i < aData->mBuffer[0].height - (mTheoraStream->mTheora.mInfo.pic_y&~1)).

[...]
>+  int uv_offset = xo+yo;
>+  p = yuv.ptru;
>+  q = aData->mBuffer[1].data + uv_offset;
>+  unsigned char* p2 = yuv.ptrv;
>+  unsigned char* q2 = aData->mBuffer[2].data + uv_offset;
>+  for(int i=0;i < aData->mBuffer[1].height; ++i) {
>+    memcpy(p, q, aData->mBuffer[1].width);
>+    memcpy(p2, q2, aData->mBuffer[1].width);
>+    p += aData->mBuffer[1].width;
>+    p2 += aData->mBuffer[1].width;
>+    q += aData->mBuffer[1].stride;
>+        q2 += aData->mBuffer[1].stride;
>+  }

Same issue here too, if you assert (q < aData->mBuffer[1].data + aData->mBuffer[1].stride * aData->mBuffer[1].height) or (q2 < aData->mBuffer[2].data + aData->mBuffer[2].stride * aData->mBuffer[2].height), those assertions fire. I think the loop condition needs to be (i < aData->mBuffer[1].height - (mTheoraStream->mTheora.mInfo.pic_y&~1)).

I'm seeing a hang in the audio thread as well on shutdown, I'll try to debug that as well.
(In reply to comment #7)

As I said, patch is not ready for review. I've made it available so people can see progress or use it for basing existing work on. Don't bother tracking down the audio hang, I'm still working on these things.

The YUV data conversion is 'stop gap' code. It will change because we won't be using the YUV conversion code from liboggplay.
Taking bug. This patch builds on Chris Double's previous patch, and adds (unoptimized) seeking support. Mochitests pass on Windows, but test_playback times out on Linux. Probably a sound underrun issue. I haven't tried it on Mac yet...

This patch is a work in progress, it's not ready for review. I'm just posting here as a progress update.
Assignee: chris.double → chris
Attachment #415065 - Attachment is obsolete: true
Keywords: perf
Re comment #2: in another patch I was asked to do: s/PRBool/bool/
So, what is the current standard for booleans?
(In reply to comment #11)
> So, what is the current standard for booleans?

https://developer.mozilla.org/en/C___Portability_Guide

<quote>
Type scalar constants to avoid unexpected ambiguities

Some platforms (e.g. Linux) have native definitions of types like Bool which sometimes conflict with definitions in XP code. Always use PRBool (PR_TRUE, PR_FALSE).
</quote>
(In reply to comment #13)
> <http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/27549f4e2fb4f51a/f058d235a0f68c60>,
> OTOH, claims "use bool".

These notes are about migration to the 'Mozilla 2' platform.
fyi: https://wiki.mozilla.org/Mozilla_2
Attached patch Patch v1 (obsolete) — Splinter Review
Working backend patch!
* Feature equivalent with existing liboggplay backend.
* Tests pass consistently on Tryserver and locally, though I've seen test_bug468190.html fail once on Mac tryserver, I'll investigate tomorrow.
* Added two new files to test suite (that's half the size of the patch) one with a video and one with audio overhang.
* I have identified a problem though; the theora decoder will decode one theora packet at a time, but the vorbis decoder will decode a whole page at a time. This can causes us to spend too much time decoding vorbis, and we stutter when trying to decode large frame videos. I'll look into that in the next revision, but I post this patch regardless so we can get the review rolling...
* I sometimes see leaks of nsVorbisStream on tinderbox...
Attachment #418979 - Attachment is obsolete: true
Attachment #429495 - Flags: superreview?(roc)
Attachment #429495 - Flags: review?(chris.double)
> nsAudioStream.cpp:
>+    static TimeStamp last = TimeStamp::Now();
>+    float seconds = (TimeStamp::Now() - last).ToSeconds();
>+    last = TimeStamp::Now();

This is old debug code from my original patch and can be removed.
(In reply to comment #15)
> * I have identified a problem though; the theora decoder will decode one theora
> packet at a time, but the vorbis decoder will decode a whole page at a time.

I can fix this by doing a "capture vorbis granulepos" similar to how I capture the theora granulepos. I then get smoothish playback on large videos.
(In reply to comment #17)
> I can fix this by doing a "capture vorbis granulepos" similar to how I capture
> the theora granulepos. I then get smoothish playback on large videos.

(Because then I don't need to read an entire page of vorbis data in order to calculate the presentation times of vorbis packets, I can just increment the previous packet's granulepos by the number of samples in the decoded output).
I'm still working through the patch, just posting some of what I've looked at so far.

> +  // Moves the decode head to aTime milliseconds. aStartTime and aEndTime
> +  // denote is the start and end times of the media.

Remove 'is'?
    
+  // PR_FALSE while decode threads should be running. Accessed on audio, 
+  // state machine and decode threads. Syncrhonized by decoder monitor.

Spelling of synchronized.

Line 1506 of patch:
+  mMonitor = nsAutoMonitor::NewMonitor("media.oggreader");

Check for monitor failure.

Line 1515 to 1544 in the patch seem to have DOS line endings. Same in other places in the patch.

I think nsOggPlayStateMachine has gotten big and complex enough to be split out into it's own implementation file nsOggStateMachine.cpp. The implementation of that class is probably bigger than nsOggDecoder now.

In nsOggStream.h/.cpp:

+  PRBool ReadAllHeaders() { return mReadHeaders; }

ReadAllHeaders Needs a better name. When I first saw this I thought it was doing the actual reading. Needs comments too.

In nsOggStream.cpp, nsOggStream:Create (and other places):

+ strncmp((const char*)aPage->body+1, "theora", 6) == 0) {

Use C++ casts. I don't like the approach of digging into the packet structure instead of using the library API routines to identify headers packet types. If you are going to take this approach though please reference in a comment where these values comes from (Vorbis, Theora specs, RFC's or wherever).

In nsOggStream.cpp, nsTheoraStream::DecodeHeader:

+  PRBool isSetupHeader = aPacket->packet[0] == 0x82;

Assert that the packet bytes > 0 before this line. Also in nsVorbisStream::DecodeHeader.

In nsOggStream.cpp, nsTheoraStream:Time:

  return (th_granule_frame(mCtx, granulepos) + 1) * 1000 *
           mInfo.fps_denominator / mInfo.fps_numerator;

Do you need to deal with integer overflow here? What if 1000*granule_frame*denominator overflows? Same in StartTime and MaxKeyframeOffset.

In nsOggStream.cpp, nsVorbisStream destructor, need to call vorbis_info_clear and vorbis_comment_clear to free memory. Also equivalent functions to clear memory for the block and dsp.

In nsOggStream.cpp, nsVorbisStream::DecodeHeader, reference location of spec where you get the magic 0x1, 0x3, 0x5 header values from. 

nsVorbisStream::Time, check overflow on 1000*granulepos calculation.
The two cases of NS_MAX(0LL, ...) need to be changed to build on x86_64 Linux (where PRInt64 is a long).

+  NS_ASSERTION(NS_IsMainThread(), "Should be on main thread.");
+
   NS_ASSERTION(NS_IsMainThread(), 
                "nsOggDecoder::Shutdown called on non-main thread");

Duplicate assertion in Shutdown().

All of the mochitests pass on Linux x86_64 except test_ended2 (missing playing event) and test_playback (7 errors).  The number of test failures drops to one (test_playback, 1 error) when run under Valgrind, so the other failures are intermittent.  The playback failures are all cases of time running backwards:

1017 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | bug482461.ogv time should run forwards: p=2.6679999828338623 c=2.628999948501587

Tests are memcheck clean and mostly helgrind clean.  It found Vorbis leaks, but I see doublec's already pointed out the problem there.  There's some suspicious looking thread safety stuff showing up in libsydneyaudio and in the ALSA code.  We haven't been sharing a single sa_stream across threads before now, so this'll need investigation.

I ran into a bunch of intermittent shutdown hangs, but haven't bothered to debug them yet.  I'll wait for the latest version of the patch and see if they still occur.

http://v2v.cc/~j/theora_testsuite/chained_streams.ogg plays through way too fast the second time.
Using 'unsigned' for indexes in some places and 'int' in others. See nsOggDecoder.cpp, ReadAllHeaders:

  for (unsigned i=0; i<aStreams.Length(); i++) {
    if (!aStreams[i]->ReadAllHeaders()) {
      return PR_FALSE;
    }

and ReadOggHeaders:

  for (int i = 0; i<streams.Length(); i++) {
    nsOggStream* s = streams[i];
    if (s != mVorbisStream && s != mTheoraStream) {
      s->Deactivate();
    }
  }

ReadOggHeaders:

  mCallbackPeriod = (1000 * d) / n;

Overflow possibility here (1000*d)?

    int w = mTheoraStream->mInfo.pic_width;
    int h = mTheoraStream->mInfo.pic_height;
    NS_ASSERTION(w < MAX_VIDEO_WIDTH, "Invalid Video Width");
    NS_ASSERTION(h < MAX_VIDEO_HEIGHT, "Invalid Video Height");
    if (w * h > MAX_VIDEO_WIDTH * MAX_VIDEO_HEIGHT ||
        w * h < 0)
    {

Do we need to check the frame width and height here too? This is used in an allocation in RenderTheoraFrame.
kinetik,doublec: thanks for your comments, keep them coming!

Re: the missing "playing" event errors in test_ended and time running backwards in test_playback.html: these have been dogging me for quite a while, I through they were fixed, but I will need to apply some record-replay magic to them.

The nsVorbisStream leaks are because I'm calling MOZ_COUNT_DTOR() in nsVorbisStream::nsVorbisStream(), rather than MOZ_COUNT_CTOR(). :P Or are you seeing something else?

I've noticed that the fast playback (which kintik noted on chained_streams.ogg) happens on content/media/test/seek.ogv on second and subsequent plays through as well.
The leaks I'm talking about should be fixed by addressing doublec's comment:

> In nsOggStream.cpp, nsVorbisStream destructor, need to call vorbis_info_clear
and vorbis_comment_clear to free memory. Also equivalent functions to clear
memory for the block and dsp.

Also, you can free mSetup (th_setup_info) once you've called th_decode_alloc.  Up to you if it's worth the effort of freeing it immediately vs freeing it in the destructor--it might save a few kB (at most).
(In reply to comment #20)
> All of the mochitests pass on Linux x86_64 except [...]
> test_playback (7 errors).  [...]
> 1017 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html
> | bug482461.ogv time should run forwards: p=2.6679999828338623
> c=2.628999948501587

This occurs in Fedora 12 x686 running in VirtualBox, using alsa. In this environment nsAudioStream::GetPosition() sometimes returns a value less than it's previously returned, i.e. it reports time going backwards. I assume this happens because the VirtualBox sound driver estimates the sound position, and then updates it periodically with the actual value from hardware. I note that we still stay in A/V sync when we use the decreased audio position, so the decreased audio position is the correct one. I guess we shouldn't let the reported current playback position (mCurrentFrameTime/HTMLMediaElement.currentTime) decrease, but internally still use the audio clock as the actual playback position even when it decreases.

If I don't let the current playback position decrease as above, I also note that sometimes when we write a block of PCM data in nsAudioStream::Write() which is larger than what nsAudioStream::Available() reports as being free, the call to sa_stream_write() (in sydney_audio_alsa.c) never returns, causing the audio clock to not advance and media playback to stall. This looks like a bug in the libsydneyaudio alsa backend which we already have, so I'll ignore it for now...

> http://v2v.cc/~j/theora_testsuite/chained_streams.ogg plays through way too
> fast the second time.

Pretty simple fix; I wasn't resetting the playback start time properly when playing through the second time.
(In reply to comment #24)
> I guess we shouldn't let the
> reported current playback position
> (mCurrentFrameTime/HTMLMediaElement.currentTime) decrease, but internally still
> use the audio clock as the actual playback position even when it decreases.

Sounds good to me.
(In reply to comment #19)
> I don't like the approach of digging into the packet structure
> instead of using the library API routines to identify headers packet types.

I do this because the library API routines are insufficient.

When we seek to time 0 we have three options:
1. Seek to byte offset 0, blow away or re-init our old nsOggStreams, re-read the header packets and re-initialize our decoders.
2. Seek to byte offset 0, dig into the structure of the freshly re-read packets in order to determine if the packets we're reading are header or data packets. If they're header packets, skip them. We can't pass header packets into the lower level libraries' (non-header) decode functions, IIRC we had some bugs in with libvorbis when libogg{z,play} did this.
3. Seek to the start byte offset of the first page with non-header packets, and just begin decoding from there.

I chose option 3. I figure it's does less work, though admittedly option 1 and 2 aren't significantly more work.

To do option 3, we need to determine the byte offset of the start of the first page with non-header packets. That has two possible implementations, bearing in mind that libvorbis doesn't provide a "is this valid a header packet?" function, and that packets can span pages and streams' pages are interleaved:

1. When reading packets, determine/remember the byte offset of the start of the page on which it begins. This would require reading the ogg page lacing values to determine the number of packets which start on a page vs the number which we decode after reading the page, but this isn't hard (I did this in my Ogg indexer). When the lower level libraries read a packet which isn't header packet, they'll tell us (as per doublec's original patch) and we can use the first non-header packet's page-start-offset as the offset of the first non-header page. This would require us to munge around in the ogg_page structure.
2. Or (as I'm doing) we can have the nsOggStreams tell us when they've read the all their header packets by inspecting the packets as they come in.

I don't see a strong advantage of either option. They both would break if the underlying formats changed, but if that happened we'd need to update the libraries anyway.
(In reply to comment #26)
> 
> I do this because the library API routines are insufficient.

That's fine, thanks for the explanation. It would be great if you could raise these issues, or suggest API improvements, on one of the Xiph mailing lists.
(In reply to comment #21)
>     int w = mTheoraStream->mInfo.pic_width;
>     int h = mTheoraStream->mInfo.pic_height;
>     NS_ASSERTION(w < MAX_VIDEO_WIDTH, "Invalid Video Width");
>     NS_ASSERTION(h < MAX_VIDEO_HEIGHT, "Invalid Video Height");
>     if (w * h > MAX_VIDEO_WIDTH * MAX_VIDEO_HEIGHT ||
>         w * h < 0)
>     {
> 
> Do we need to check the frame width and height here too? This is used in an
> allocation in RenderTheoraFrame.

If we check the frame width/height here in nsOggReader::ReadOggHeaders(), we don't need to check it in RenderTheoraFrame, as a Theora stream with too-large frame sizes will be deactivated, and thus too-large frames will never be presented to RenderTheoraFrame (or needlessly decoded).
(In reply to comment #28)
> >     int w = mTheoraStream->mInfo.pic_width;
> >     int h = mTheoraStream->mInfo.pic_height;
> >     NS_ASSERTION(w < MAX_VIDEO_WIDTH, "Invalid Video Width");
> >     NS_ASSERTION(h < MAX_VIDEO_HEIGHT, "Invalid Video Height");
> >     if (w * h > MAX_VIDEO_WIDTH * MAX_VIDEO_HEIGHT ||
> >         w * h < 0)
> >     {

I just noticed that pic_width and pic_height are uint32 but you're assigning it to a signed int. This can cause issues with the bounds checks due to overflow.
Attached patch Patch v2 (obsolete) — Splinter Review
Address review comments.
* This is not based on the not-yet-checked-in patches from bug 525401 or the new YCbCr -> RGB code or new layers code. If they land first, I'll rebase.
* Moved nsOggPlayStateMachine and nsOggReader methods into their own files, with appropriate header files.
* Handle integer overflow.
* Buffer pages in nsOggStreams as we're reading ahead for other streams, as per comment 17.
* Ensure time never runs backwards when the audio clock corrects itself, as per comment 24.
* Tests are now more reliable. I recorded the content/media mochitests running in a loop for about 2 hours before test_ended2 failed, but that will be fixed when Matthew's patches from bug 525401 are forward-ported to the this new backend.
* Changed DecodeTheoraPage() so that it will not decode frames which are behind the current playback position when skipping forwards to the next keyframe. This means if we're really struggling with keyframe skip, we won't decode keyframes which we're not going to be able to display anyway.
* All line endings are \n.
* Use NS_MAX/MIN rather than PR_MAX/MIN.
* Fix leaks.
Attachment #429495 - Attachment is obsolete: true
Attachment #432479 - Flags: review?(chris.double)
Attachment #432479 - Flags: feedback?(kinetik)
Attachment #429495 - Flags: superreview?(roc)
Attachment #429495 - Flags: review?(chris.double)
Attachment #432479 - Attachment is obsolete: true
Attachment #432479 - Flags: review?(chris.double)
Attachment #432479 - Flags: feedback?(kinetik)
Comment on attachment 432479 [details] [diff] [review]
Patch v2

Damnit, doesn't build on Linux.
Attached patch Patch v2.1 (obsolete) — Splinter Review
* As previous patch, but builds on Linux.
Attachment #432481 - Flags: review?(chris.double)
Attachment #432481 - Flags: feedback?(kinetik)
Attached patch Patch v2.2 (obsolete) — Splinter Review
Patch v2.1 rebased ontop of Matthew's changes from bug 525401.
Attachment #432481 - Attachment is obsolete: true
Attachment #432716 - Flags: review?(chris.double)
Attachment #432716 - Flags: feedback?(kinetik)
Attachment #432481 - Flags: review?(chris.double)
Attachment #432481 - Flags: feedback?(kinetik)
Blocks: 496684
Attachment #432716 - Flags: superreview?(roc)
This looks really good.  I haven't gone through the entire patch yet, but here are my comments so far:

>    PRInt32 available = Available();
>    if (available < count) {
>      mBufferOverflow.AppendElements(s_data.get() + available, (count - available));

We can get rid of mBufferOverflow completely now, right?  And the
Available() implementation, which is only called from here.

It's possible that removing the above code will cause problems with the Wave backend on Linux (and any other platform with small backend/sydneyaudio buffers) because it assumes it can write ~200ms of audio without blocking.

>   if (aPage->body_len > 8 &&
>       strncmp(reinterpret_cast<const char*>(aPage->body), "fishead", 8) == 0) {
>     return new nsSkeletonStream(aPage);
>   }

Minor, but I think for clarity I'd prefer memcmp, with an explicit NUL for
the fishead which really is 8 bytes including a NUL, whereas the
theora/vorbis magic values are not really C strings and thus don't include a NUL.

>   int ppmax = 0;
>   int ret = th_decode_ctl(mCtx,
>                           TH_DECCTL_GET_PPLEVEL_MAX,
>                           &ppmax,
>                           sizeof(ppmax));

Might as well delete this code (and below), since it's a big no-op.  We're querying the max and then setting the default.

Error check calls to ogg_stream_pagein and ogg_stream_reset everywhere?

>  p->header = new unsigned char[p->header_len + p->body_len];

Can this ever overflow, or is it checked elsewhere?

>  nsTheoraStream::Time(PRInt64 granulepos) {
>  PRInt64 nsTheoraStream::StartTime(PRInt64 granulepos) {

Need to stick to aStupidNamingScheme.  I hate myself for pointing this one
out.  Also, braces are riding high here but not elsewhere in the file.

>    vorbis_synthesis_restart(&mDsp);

Error check?

>  PRInt64 frameno = th_granule_frame(mCtx, granulepos);

Error check this, or assert that we're not passing some other negative gpos in?  We only test for -1.

>  #define PR_INT64_MAX (~((PRInt64)(1) << 63))
>  #define PR_INT64_MIN (-1 - PR_INT64_MAX)

This should be moved to prtypes.h.

>  // Returns the end time that a granulepos represents.
>  virtual PRInt64 StartTime(PRInt64 granulepos) { return -1; }

Comment should say "start time".

>   // Holds a decoded sound sample.
>   class SoundData {

"decoded sound buffer", "chunk", "segment" or something?  It's a collection of samples rather than a single sample.

>   // Holds a decoded Theora frame, in YUV format. These are queued in the reader.

YCbCr!  Also a couple of other places, e.g. mYUVSize/mYUVBuffer on the state
machine.  mYUVSize is signed int.

>      int size = aBuffer[i].height * aBuffer[i].stride;

Can this overflow, or is it checked elsewhere?  Also, should be unsigned.

>  VideoData(PRBool aEOS) :

Assert that aEOS is true.  It might be clearer to have a set of named constructors for instantiating VideoData.

>      mBuffer[i].data = (unsigned char*)malloc(size);
>      memcpy(mBuffer[i].data, aBuffer[i].data, size);

Our malloc can still return null, so either check for error and handle it or
use new [] or moz_xmalloc.

>     if (!mDuplicate && !mEOS) {
>       for (int i=0; i < 3; ++i) {

It might be nicer to initialize mBuffer to nulls and then always call free in the dtor rather than testing mDuplicate/mEOS.  Makes it slightly clearer that the buffers are always freed.

>   mPlayStartTime(),

This is dead code, the default ctor will be called anyway.  Also applies to
mBufferingStart(), nsMediaDecoder(), etc.

As a general comment, there are a lot of places where we use PRInt32/PRInt64
where the value can't ever be negative (e.g. mPacketCount).  This was a
problem with the old code too, so we don't have to fix it right now, but I
wanted to point it out.

>   int y_offset = mInfo.mPicY & ~1;

It's valid for the offsets and picture dimensions to be odd.  liboggplay's
code for this was wrong.  We should be able to delete this massive chunk of
buffer fiddling mayhem using doublec's new YCbCr code, so maybe this doesn't
matter.  We're not regressing current behaviour, either.

>        NS_ASSERTION(ret == 0, "vorbis_synthesis_read failed");

Every case where we do this (assert success on a retval) should either be
NS_ABORT_IF_FALSE or assert *and* return error.  Otherwise, since
NS_ASSERTION is non-fatal, bad things could happen after a failed API
call.

>  // what page to seek to when seeking to the media start. The nsOggStreams
>  // track when they've 

Truncated comment?

>    MulOverflow32(1000, d, c);

This (and MulOverflow) used without checking return value in a couple of places.
+#include "mozilla/TimeStamp.h"
+using mozilla::TimeStamp;

newline before 'using'. We generally seem to have newlines between #includes and other code.

Why are we removing the mBufferOverflow code?

Why are we changing nsAudioStream::GetPosition to return ms? consistency with other code?

 double nsOggDecoder::ComputePlaybackRate(PRPackedBool* aReliable)
 {
-  PRInt64 length = mReader ? mReader->Stream()->GetLength() : -1;
+  NS_ASSERTION(NS_IsMainThread() || IsThread(mStateMachineThread),
+               "Should be on main or state machine thread.");

I presume we should be holding mMonitor here? If so, you should assert so.

 void nsOggDecoder::UpdatePlaybackRate()
 {
-  if (!mReader)
+  NS_ASSERTION(NS_IsMainThread() || IsThread(mStateMachineThread),
+               "Should be on main or state machine thread.");

Ditto

+    mElement->DispatchSimpleEvent(NS_LITERAL_STRING("durationchange"));

I'd like to move all DOM event dispatching into the element (and ultimately factor out these notifications into some kind of nsCodecHost object so elements can have multiple decoders and we can drive a decoder without an nsHTMLMediaElement). But I guess that can wait.

+    if (videoQueueSize > 10) {

Ewww, magic number! Use a named constant please.

+  PRBool audioPump = PR_TRUE;
+  PRBool videoPump = PR_TRUE;

Comment what these mean.

+  PRBool keyframeSkip = PR_FALSE;

This means "skip to keyframe", not "skip keyframe", right? Needs a comment.

+    const int audioThreshold = 20;
+    int audioQueueSize = mReader->mAudioQueue.GetSize();
+    if (audioQueueSize > 250 ||
+        (keyframeSkip && audioQueueSize > 50)) {

Ewww, more magic numbers!

+      if (theoraPlaying &&
+          vorbisPlaying &&
+          mDecoder->mDecoderPosition >= initialDownloadPosition)
+      {
+        mBufferExhausted = PR_TRUE;
+      }

I don't understand why this is conditional on theoraPlaying && vorbisPlaying.

General comment: in principle, I think hardcoding "theora" and "vorbis" names here is a mistake. In principle we might want to support different audio and video codecs in this code, so the names should be "audio" and "video". Even more theoretically we might want to support additional tracks --- multiple audio tracks, or subtitle tracks, or whatnot. We'd probably need to have generic per-track state and identify one track that we're currently synchronizing to (normally the audio track). In practice, though, we should definitely land this before we worry about any of that.

Up to nsOggPlayStateMachine::AudioLoop.
> Why are we removing the mBufferOverflow code?

The new Ogg backend uses a separate thread to write audio, so it can write as much as possible and let sydneyaudio writes block.  This is better (than the current code, at least) because we keep the audio buffers as full as possible and are less likely to underrun.
With the latest patch the tests pass in the Fedora 12 VM where I reported problems, and still run clean under Valgrind.  It did detect a few leaks, though:

- Need to call th_comment_clear and th_info_clear in nsTheoraStream destructor.
- Because some Vorbis and Theora structures are only freed if mActive is true, it looks like these are leaking in some circumstances.
- ogg_sync_buffer, called from FindEndTime, ReadOggPage, and SeekBisection, is leaking.  Possibly because we bail earlier due to errors (e.g. Read() failing) before calling ogg_sync_wrote?
Why does nsVorbisStream::Init try to call vorbis_block_init if vorbis_synthesis_init fails?  Missing early returns in the error paths?

> - ogg_sync_buffer, called from FindEndTime, ReadOggPage, and SeekBisection, is
> leaking.  Possibly because we bail earlier due to errors (e.g. Read() failing)
> before calling ogg_sync_wrote?

This appears to be caused by missing calls to ogg_sync_clear to clean up mOggState.
Another one of the leaks is a Vorbis bug.  Filed https://trac.xiph.org/ticket/1663
(In reply to comment #36)
> The new Ogg backend uses a separate thread to write audio, so it can write as
> much as possible and let sydneyaudio writes block.  This is better (than the
> current code, at least) because we keep the audio buffers as full as possible
> and are less likely to underrun.

OK. Should we worry about its effect on the Wave backend?
Would it be too much trouble to rename EOS to EndOfStream everywhere?

+    delete sound;
+    sound = nsnull;

Make 'sound' an nsAutoPtr to avoid explicit delete.

Seems to me that mPlaying is equivalent to !mPlayStartTime.IsNull(). Why not just use the latter? That means we don't have to worry about keeping them consistent.

Is it always true that mStartTime + mDuration = mEndTime? If so, let's not store one of them, say mEndTime. Make a method GetEndTime() to use wherever we need that.

+    switch(mState) {

space before (

+            // This will wake up the step decode thread and force it to
+            // call oggplay_step_decoding at least once. This guarantees
+            // we make progress.

This comment needs to be fixed...

I think it would help to have an AutoTemporaryExit autoExit(monitor) helper object that exits the monitor in its constructor and re-enters in its destructor. We don't need to have it in this patch, though.

+              delete video;

Make video an nsAutoPtr to avoid this.

+          TimeDuration duration = TimeStamp::Now() - mBufferingStart;

Move this into the PR_LOG, it's logging-only

+           // Resync against the audio clock, while we're trusting the
+           // audio clock.

fix indent

+            // Sound is disabled on this system. Sync to the system clock.

Codecs having to care about this is really ugly. Can't we have the audio system be responsible for giving us system clock times when there's no audio clock?

It seems like much of DECODER_STATE_COMPLETED repeats what's in AdvanceFrame, with special handling for when we've run out of video to play. Can't we make AdvanceFrame work when we've run out of video?

+              // some situaions, and we need to gracefully handle that.

typo

+  if (aData->mBuffer[0].stride < 0) {
+    yuvsize -= aData->mBuffer[0].stride * aData->mBuffer[0].height;
+    NS_ASSERTION(aData->mBuffer[1].stride < 0, "buffer[1] stride should be less than 0");
+    yuvsize -= aData->mBuffer[1].stride * aData->mBuffer[1].height * 2;
+  }
+  else {
+    yuvsize += aData->mBuffer[0].stride * aData->mBuffer[0].height;
+    yuvsize += aData->mBuffer[1].stride * aData->mBuffer[1].height * 2;
+  }

Why not just use PR_ABS?

Seems that nsOggPlayStateMachine::RenderTheoraFrame always makes a copy of the YUV data. Can we skip that when it's not necessary?

+        if (videoData) {
+          delete videoData;
+        }
+        videoData = data;

delete is null-safe so the if isn't needed. But we can simplify this by making videoData an nsAutoPtr.

+  if (mState == DECODER_STATE_SHUTDOWN)
+    return;
+}

This 'if' seems pointless.

+  // Should be called by main thread.
+  PRBool HaveNextFrameData() const {
+    return mReader->mVideoQueue.GetSize() > 0 ||
+           mReader->mAudioQueue.GetSize() > 0;
+  }

Hmm. If we're playing video synced to audio, and the audio queue gets down to 0 because we run out of data, shouldn't we return false here?

Looking back at DECODER_STATE_SEEKING, how does it handle the case where a new seek is requested during a seek operation? Shouldn't SEEKING be able to stay in the SEEKING state?

I wonder if it's confusing to have nsOggStream and nsMediaStream which are quite different abstractions? Maybe nsOggStream should be called nsOggContainer?

Up to nsOggReader.cpp.
> OK. Should we worry about its effect on the Wave backend?

I brought it up in an earlier comment.  It only tries to write a maximum of 200ms of audio, which probably fits in the sydneyaudio buffers without blocking on the major platforms.  It's probably worth testing on a non-PulseAudio ALSA setup as the buffers tend to be smaller there.

> Maybe nsOggStream should be called nsOggContainer?

Yeah, Stream seems off.  It's not really acting as a container, though.  I was doing to suggest demuxer, but it's not really doing that either.  It's really just holding codec state and dealing with initializing the codecs.
(In reply to comment #42)
> Yeah, Stream seems off.  It's not really acting as a container, though.  I was
> doing to suggest demuxer, but it's not really doing that either.  It's really
> just holding codec state and dealing with initializing the codecs.

It wraps a logical bitstream according to the RFC IIRC. Something named around that maybe?
(In reply to comment #42)
> I brought it up in an earlier comment.  It only tries to write a maximum of
> 200ms of audio, which probably fits in the sydneyaudio buffers without blocking
> on the major platforms.  It's probably worth testing on a non-PulseAudio ALSA
> setup as the buffers tend to be smaller there.

Yes, the question is whether this problem is one we should worry about or not :-).
> -  { name:"beta-phrasebook.ogg", type:"audio/ogg", duration:4 },
> +  //{ name:"beta-phrasebook.ogg", type:"audio/ogg", duration:4.01 },

Why's this disabled?
...and the final set of Vorbis leaks are caused by calling vorbis_info_clear before vorbis_dsp_clear.  Fix is to clear block and dsp before info.
Blocks: 455165
+  SoundData(PRInt64 duration, int samples, float* aData, int aLength) :

aDuration, aSamples

+  float* mAudioData;

nsAutoArrayPtr?

+  int mSamples;
+  float* mAudioData;

swap the order of these fields for better packing

+  int mSamples;
+  float* mAudioData;
+  int mLength;

What are mSamples and mLength here? Is mSamples equal to mLength times the number of channels? If so, might be better to just store the number of channels and provide a Samples() method.

+  VideoData(PRInt64 aTime,
+            th_ycbcr_buffer aBuffer,
+            PRBool aKeyframe,
+            PRInt64 aGranulepos) :
+    mTime(aTime),
+    mGranulepos(aGranulepos),
+    mDuplicate(PR_FALSE),
+    mKeyframe(aKeyframe),
+    mEOS(PR_FALSE)
+  {
+    MOZ_COUNT_CTOR(VideoData);
+    for (int i=0; i < 3; ++i) {
+      int size = aBuffer[i].height * aBuffer[i].stride;
+      mBuffer[i].width = aBuffer[i].width;
+      mBuffer[i].height = aBuffer[i].height;
+      mBuffer[i].stride = aBuffer[i].stride;
+      mBuffer[i].data = (unsigned char*)malloc(size);

We should handle OOM here. This could be large.

+  PRPackedBool mDuplicate;

Document what mDuplicate means?

Use nsIntRect and nsIntSize in nsOggInfo?

+  // Finds the end time of the last page which occurs before aEndOffset.
+  // This will not read past aEndOffset, and stores result in aOutTime.
+  PRInt64 FindEndTime(PRInt64 aEndOffset);

What aOutTime? I guess you just need to fix the comment.

+  // Decodes one vorbis page, enqueuing the audio data in mAudioQueue.
+  // If aHandleEOS is true, we'll push an EOS marker onto the video queue
+  // if the end of stream is reached.
+  PRBool DecodeVorbisPage(PRBool aHandleEOS);

Why push an EOS marker onto the video queue? the video might not have ended. Should the comment refer to the audio queue?

+  PRBool DecodeTheoraPage(PRBool &aKeyframeSkip,

Why a reference here?

+  // Fills aRanges with ByteRanges denoting the sections of the media which
+  // have been downloaded and are stored in the media cache. aMonitor must
+  // be a reference to this reader's monitor, and must be held with exactly
+  // one lock count.
+  nsresult GetBufferedBytes(nsAutoMonitor& aMonitor,
+                            nsTArray<ByteRange>& aRanges);

Should the nsMediaStream be pinned in the media cache while this is called? If so, say so.

Is it worth changing the new code to use xpcom/glue/Monitor.h instead of PRMonitors?

+  // The Vorbis stream we're decoding, if we have video.

If we have audio

+          for (int i=0;i < samples; ++i)
+            for(int j=0; j < channels; ++j)

try to be consistent with spacing. Preferred would be "for (int i = 0; i < samples; ++i)" but "for (int i=0; i<samples; ++i)" would be OK and consistent with what you did later.

+              *p++ = pcm[j][i];

Needs {} around it

+    for (int i=soundBuffer.Length()-1; i>=0; --i) {
+    for (unsigned i=0; i<soundBuffer.Length(); ++i) {

These should be PRUint32 I think

There's a lot of code in DecodeVorbisPage that's duplicated between the mVorbisGranulepos == -1 case and the else case. This needs to be fixed. Perhaps by breaking the "decode a packet" code out into a helper function?

+  for (unsigned i=0; i<aFrames.Length(); i++) {

PRUint32

+  NS_ASSERTION(mPlayer->OnStateMachineThread() || mPlayer->OnDecodeThread(),
+                        "Should be on state machine or AV thread.");

Fix indent

+    for (int i=frames.Length()-2; i>=0; --i) {
+    for (unsigned i=0; i<frames.Length(); i++) {

PRUint32

As for Vorbis, we have a bunch of code in DecodeTheoraPage that's duplicated between the "got previous granulepos" case and the other case. We need to fix that.

+    aMonitor.Exit();
+    {
+      nsAutoMonitor stateMon(mPlayer->mDecoder->GetMonitor());
+      exit = (mPlayer->mState == nsOggPlayStateMachine::DECODER_STATE_SHUTDOWN);
+    }
+    aMonitor.Enter();
+    if (exit) {
+      return NS_ERROR_FAILURE;
+    }

Why do we need to check the shutdown state on every iteration of this loop?

+  for (unsigned i = 0; i < ranges.Length(); i++) {

PRUint32

+    if (r.mTimeStart < aTarget) {
+    if (r.mTimeStart < aTarget && aTarget <= r.mTimeEnd) {

<= ?

+    // instaneous.

instantaneous

+  VideoData* video = nsnull;
+  SoundData* audio = nsnull;

nsAutoPtr?

+static PRBool DoneReadingHeaders(nsTArray<nsOggStream*>& aStreams) {
+  if (aStreams.Length() == 0) {
+    return PR_FALSE;
+  }

Why return false if there are no streams? That doesn't seem to make sense.

+  for (unsigned i=0; i<aStreams.Length(); i++) {

PRUint32

+  aResult = 0xFFFFFFFF & r64;

You can just static_cast to PRUint32

+  for (unsigned i = 0; i<streams.Length(); i++) {

PRUint32

nsOggReader::FindStartTime (and other places) does work for video and audio that's basically the same. Could we give VideoData and AudioData a common superclass and make it possible to use the same code for both video and audio?

+  if (HasVideo() && theoraStartTime != -1 &&
+      HasAudio() && vorbisStartTime != -1) {
+    aOutStartTime = NS_MIN(theoraStartTime, vorbisStartTime);
+  } else if (HasVideo() && theoraStartTime != -1) {
+    aOutStartTime = theoraStartTime;
+  } else if (HasAudio() && vorbisStartTime != -1) {
+    aOutStartTime = vorbisStartTime;
+  }

Can simplify this by setting aOutStartTime to PR_INT64_MAX and then applying the video start time and then the audio start time.

+// 64 bit integer multiplication with overflow checking. Retuns PR_TRUE

"Returns"

+    return false;

PR_FALSE

Instead of this complicated MulOverflow function, why not just work in floats and have a float-to-PRInt64-with-check function to use at the end?
>            StartPlayback(mon);
>            mon.NotifyAll();

It looks like every StartPlayback is paired with a NotifyAll.  Does it make
sense to move the notify into StartPlayback?

>    const int audioThreshold = 20;

I feel like this should be based on the duration of the decoded audio,
rather than just the number of buffer items we have queued (which could hold
any amount of audio data).

>  // Deactivate any non-primary streams.

For secondary Theora and Vorbis streams (i.e. all the things we have
decoders for), this leaves all the decoder state active, which uses memory.
We don't need this state for anything as far as I can tell (StartTime
etc. bail with an error if !mActive).

>  if (HasVideo()) {
> [...]
>    aInfo.mFrameWidth = HasVideo() ? mTheoraStream->mInfo.frame_width : 0;

No need to test HasVideo() here again (nor for the next line).

> static const int PAGE_STEP = 5000;

Does it make sense to just make this 8192?  Being over 2kB, it's going to
hit the "large allocation" path in most memory allocators anyway, so you
might as well make it 2x the common page size.

>         // Upon integer overflow, just reset to PAGE_STEP...

The behaviour of signed integer overflow is undefined, so I think it should
be handled directly rather than assuming wrapping to negative.

>   mPageOffset += aPage->header_len + aPage->body_len;

About this (and anywhere else we look at header_len and body_len).  Can they
be negative (maybe with an invalid file) or does libogg guarantee they're
not?  They're defined as longs in libogg.

Also, in cases like this, do we need to handle overflow?  header_len +
body_len aren't promoted to PRInt64 for the addition.
(In reply to comment #34)
> We can get rid of mBufferOverflow completely now, right?  And the
> Available() implementation, which is only called from here.
> 
> It's possible that removing the above code will cause problems with the Wave
> backend on Linux (and any other platform with small backend/sydneyaudio
> buffers) because it assumes it can write ~200ms of audio without blocking.
> 

We'll just have to test this...

> >  p->header = new unsigned char[p->header_len + p->body_len];
> 
> Can this ever overflow, or is it checked elsewhere?

Ogg pages can only be a max of 64KB in size, including headers, so this could only overflow if libogg was broken...

Checking for overflow here isn't a big cost, so we may as well.


> >  PRInt64 frameno = th_granule_frame(mCtx, granulepos);
> 
> Error check this, or assert that we're not passing some other negative gpos in?
>  We only test for -1.

Yeah, we should bail if granulepos < 0, rather than if granulepos == -1.
(In reply to comment #35)
> Why are we changing nsAudioStream::GetPosition to return ms? consistency with
> other code?

I use whole ms in order to prevent FP rounding errors from causing us to miss keyframes. If we work in floating point when calculating timestamps, we can sometimes calculate a keyframe's timestamp as x.000000000001 due to a FP error. When the keyframe's timestamp is exactly x we *really* need x else we'll terminate our bisection after the keyframe and get visual artefacts while seeking.
> Ogg pages can only be a max of 64KB in size, including headers, so this could
> only overflow if libogg was broken...

Right, but I was wondering what happened with, say, an invalid file.  Based on a quick look at the code (ogg_framing.c around line 563) it looks like it's impossible for these to be crazy values, so we're fine (and should consider any case where it fails to be a libogg bug).
(In reply to (Roc's) comment #41)
> Is it always true that mStartTime + mDuration = mEndTime? If so, let's not
> store one of them, say mEndTime. Make a method GetEndTime() to use wherever we
> need that.

mEndTime - mStartTime != mDuration when we can't get the end time in a non-seekable stream. The duration will increase as we play in this case anyway, so we may as well only store mStartTime and mDuration.

> 
> Seems that nsOggPlayStateMachine::RenderTheoraFrame always makes a copy of the
> YUV data. Can we skip that when it's not necessary?

I assume Chris Double's YCbCr -> RGB changes or the more layers patches will clean up nsOggPlayStateMachine::RenderTheoraFrame()?

> +  // Should be called by main thread.
> +  PRBool HaveNextFrameData() const {
> +    return mReader->mVideoQueue.GetSize() > 0 ||
> +           mReader->mAudioQueue.GetSize() > 0;
> +  }
> 
> Hmm. If we're playing video synced to audio, and the audio queue gets down to 0
> because we run out of data, shouldn't we return false here?

Hmmm probably, I will take a look.

> 
> Looking back at DECODER_STATE_SEEKING, how does it handle the case where a new
> seek is requested during a seek operation? Shouldn't SEEKING be able to stay in
> the SEEKING state?

The behaviour here is the same as the old backed; when we finish a seek, we fire a callback to nsOggDecoder::SeekingStopped{AtEnd}() which detects if a seek came in while the just-finished seek was running, and if so returns us to PLAY_STATE_SEEKING, triggering a new seek. We probably should call back out to the nsOggDecoder so that the appropriate events are fired in the correct order.
(In reply to comment #52)
> I assume Chris Double's YCbCr -> RGB changes or the more layers patches will
> clean up nsOggPlayStateMachine::RenderTheoraFrame()?

Yes, that's fine for now.
(In reply to comment #47)
> Why push an EOS marker onto the video queue? the video might not have ended.

Logically, the when am EOS packet occurs in a stream, it should have ended. I should probably ignore frames after the EOS packet or EOF. It may be cleaner to set a flag saying we've read the last packet in each stream, I'll investigate.


> 
> +  PRBool DecodeTheoraPage(PRBool &aKeyframeSkip,
> 
> Why a reference here?

Because if aKeyframeSkip is true, we'll not decode or enqueue any frames which aren't keyframes. When we reach the next keyfram, DecodeTheoraPage() sets aKeyframeSkip to false, and we know not to skip any more.


> +static PRBool DoneReadingHeaders(nsTArray<nsOggStream*>& aStreams) {
> +  if (aStreams.Length() == 0) {
> +    return PR_FALSE;
> +  }
> 
> Why return false if there are no streams? That doesn't seem to make sense.

Because there are no streams in aStreams on the first iteration of the loop, and on the first iteration we don't want to assume the header-reading is done. Hmm, looks this is no longer needed since I added a readAllBOS flag to that loop...


> Instead of this complicated MulOverflow function, why not just work in floats
> and have a float-to-PRInt64-with-check function to use at the end?

I don't want to introduce rounding errors as per my earlier comment...
(In reply to comment #47)
> +    aMonitor.Exit();
> +    {
> +      nsAutoMonitor stateMon(mPlayer->mDecoder->GetMonitor());
> +      exit = (mPlayer->mState ==
> nsOggPlayStateMachine::DECODER_STATE_SHUTDOWN);
> +    }
> +    aMonitor.Enter();
> +    if (exit) {
> +      return NS_ERROR_FAILURE;
> +    }
> 
> Why do we need to check the shutdown state on every iteration of this loop?
>

I thought that we could get stuck if we shutdown while in this loop, but I can't reproduce it now. We can probably safely remove this.
(In reply to comment #35)
> +      if (theoraPlaying &&
> +          vorbisPlaying &&
> +          mDecoder->mDecoderPosition >= initialDownloadPosition)
> +      {
> +        mBufferExhausted = PR_TRUE;
> +      }
> 
> I don't understand why this is conditional on theoraPlaying && vorbisPlaying.

Oops, I not sure why either. This came from Chris Double's original patch. Chris Double: can you remember the intended effect? Can we just remove the (theoraPlaying && vorbisPlaying) condition?
(In reply to comment #56)
> 
> Oops, I not sure why either. This came from Chris Double's original patch.
> Chris Double: can you remember the intended effect? Can we just remove the
> (theoraPlaying && vorbisPlaying) condition?

Why would you want to go into the buffer state if the video is not playing?
(In reply to comment #57)
> >  Can we just remove the
> > (theoraPlaying && vorbisPlaying) condition?
> 
> Why would you want to go into the buffer state if the video is not playing?

If we're playing only audio, or if we're playing both audio and video, and the video finishes before the audio (or vice versa).
(In reply to comment #44)
> (In reply to comment #42)
> > I brought it up in an earlier comment.  It only tries to write a maximum of
> > 200ms of audio, which probably fits in the sydneyaudio buffers without blocking
> > on the major platforms.  It's probably worth testing on a non-PulseAudio ALSA
> > setup as the buffers tend to be smaller there.
> 
> Yes, the question is whether this problem is one we should worry about or not
> :-).

Removing the audio-buffer-overflow handling code caused the WAV backend to hang during blocking audio-writes on Fedora 12 64bit in VirtualBox, using the default PulseAudio server. So I'll re-enable this code for nsAudioStreams which are playing for the WAV backend only. It's not needed for the new Ogg decoder backend.
Blocks: 551277
(In reply to [roc's] comment #41)
> +            // Sound is disabled on this system. Sync to the system clock.
> 
> Codecs having to care about this is really ugly. Can't we have the audio system
> be responsible for giving us system clock times when there's no audio clock?
 
It's unfortunate, but time sync code is ugly. The play state machine needs to distinguish between the audio clock, the system clock, and the timestamp of the last played video frame in different situations.

> 
> Is it worth changing the new code to use xpcom/glue/Monitor.h instead of
> PRMonitors?

Done. I added a MonitorAutoExit class, and I noticed that MonitorAutoEnter.NotifyAll() was actually only doing a Notify(), so I've fixed that in my patch.


> +    if (r.mTimeStart < aTarget) {
> +    if (r.mTimeStart < aTarget && aTarget <= r.mTimeEnd) {
> 
> <= ?

Yup, I mean <= . The end time of a range is the end time of the last sample to finish in that range; i.e. the sample starts and finishes inside that range.
Attached patch Patch v3 (obsolete) — Splinter Review
Patch v3. Hooray!
* Enables test_seek on windows, now that async httpd.js has landed.
* Renames nsOggStream to nsOggCodecState.
* HTMLMediaElement.duration returns NaN when the duration is unknown (this is specified behaviour, we were previously returning 0).
* nsAudioStream maintains whether it's paused. This prevents deadlocks when writing to a full audio stream just after it's been paused.
* Still do non-blocking audio stream write for Wav backend, blocking in this new Ogg backend.
* Use mozilla::Monitor, mozilla::MonitorAutoEnter, and (added in this patch) mozilla::MonitorAutoExit instead of nsAutoMonitor/PRMonitor*.
* Skip-to-keyframe based on duration of queued decoded audio, rather than number of decoded audio packets as Matthew suggested.
* Refactored playback position/sync code into AdvanceFrame(), no longer duplicates logic in COMPLETED state of state machine Run().
* Non-infinite monitor waits now wait in a loop to ensure that we actually wait for the time duration we say we will. This prevents starvation of the decode or audio threads (particularly in single-core VMs).
* Limit playback position to the lesser of the clock time and Max(last played video frame end time, last played audio sample end time). This ensures that when we're playing an audio file with no sound hardware (like on Tinderbox machines...) the system clock won't advance after the end time of the media.
* Refactored Vorbis and Theora packet decodes out of nsOggReader::Decode{Video,Audio}Page().
* Leaks Matthew pointed out *should* be fixed. Matthew: it would be great if you could verify that they're fixed please.
* MediaQueues now have a Finish() method which informs them that they've reached end of stream. We now use this rather than push an end of stream packet onto the queue.
* Refactored nsOggPlayStateMachine::FindStartTime() to not duplicate logic for finding video/audio start times.
* Handled int overflow in the other places pointed out; either abort or continue with safe defaults in non-critical cases.
* Addressed other review comments...

I will post the a patch showing differences between the previous v2.2 patch and this one so that it's easier for reviewers to check that review comments were addressed...
Attachment #432716 - Attachment is obsolete: true
Attachment #436113 - Flags: superreview?(roc)
Attachment #436113 - Flags: review?(chris.double)
Attachment #436113 - Flags: feedback?(kinetik)
Attachment #432716 - Flags: superreview?(roc)
Attachment #432716 - Flags: review?(chris.double)
Attachment #432716 - Flags: feedback?(kinetik)
Comment on attachment 436113 [details] [diff] [review]
Patch v3

+  const PRUint32 mSamples;
+  nsAutoArrayPtr<float> mAudioData;

Switch the order to pack better on 64-bit

> +  PRPackedBool mDuplicate;
>
> Document what mDuplicate means?

This didn't get addressed.

-      mFrameWidth(0),
-      mFrameHeight(0),

For reasons I cannot adequately explain, nsSize does not auto-initialize to 0, so you still need to do that. (nsRect does though.)
Attachment #436113 - Flags: superreview?(roc) → superreview+
Fantastic work overall, by the way.
Comment on attachment 436113 [details] [diff] [review]
Patch v3

NSPR changes need to be moved/reviewed by someone involved in that as discussed in person.
Attachment #436113 - Flags: review?(chris.double) → review+
Also, please file a followup bug to remove media/liboggz, etc.
As we discussed, the NSPR changes need to be moved somewhere in XPCOM, at least temporarily, and broken out into their own patch. Also, the XPCOM monitor change should be broken out into its own patch and reviewed by cjones.
The patch seems to create a zero length file called 'dummy' in the root source directory - I'm guessing that's not intentional?
Depends on: 556214
The XPCOM monitor change in bug 556214 seems to be growing in scope (to changing to not using recursive monitors, etc). I worry that this will hold up this bug and possibly introduce other problems related to the change of monitors. This bug blocks a lot of my existing work. Is it worthwhile to land with the simplest of monitor changes (maybe in a file local to the video work) and move to the new monitor stuff when it's done?
Yeah, I'm OK with either adding a media-local MonitorAutoExit or just using manual exit/enter for now. Up to Chris P to decide what he wants to do.
(In reply to comment #70)
> Yeah, I'm OK with either adding a media-local MonitorAutoExit or just using
> manual exit/enter for now. Up to Chris P to decide what he wants to do.

We talked about this, we'll make a media-local MonitorAutoExit (and PR_INT64_MAX) and then come back and fix them in subsequent patches.
Depends on: 556424
Patch v3 with review comments addressed. r=doublec sr=roc. Ready to land. Depends on patch in bug 556424.
Attachment #436113 - Attachment is obsolete: true
Attachment #436114 - Attachment is obsolete: true
Attachment #436411 - Flags: superreview+
Attachment #436411 - Flags: review+
Attachment #436113 - Flags: feedback?(kinetik)
Comment on attachment 436411 [details] [diff] [review]
Patch v3.1
[Checkin: Comment 74]

>+ * The Initial Developer of the Original Code is the Mozilla Corporation.

s/Corporation/Foundation/, as per http://weblogs.mozillazine.org/gerv/archives/2010/02/mpl_initial_developer_for_mozilla_employ.html.
Pushed patch v3 + the changes Reed suggested:
http://hg.mozilla.org/mozilla-central/rev/1d00691be5f2

I will mark this as closed once it's baked a few cycles...
Attached patch Bustage fix (obsolete) — Splinter Review
Bustage fix.

Had a crash on tinderbox when checking in: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1270180075.1270181344.13387.gz#err2

In this crash, a SoundData object is null in AudioLoop(). This could happen if the audio queue is length 0, but not at EOS when in non-decoding state (like seeking).

This patch makes us wait when the audio queue is empty in AudioLoop(), rather than if the audio queue is empty and we're in decoding state. Also adds an explicit null check for paranoia's sake.
Attachment #436644 - Flags: review?(chris.double)
With revisions as per IRC conversation.
Attachment #436644 - Attachment is obsolete: true
Attachment #436646 - Flags: review?(chris.double)
Attachment #436644 - Flags: review?(chris.double)
Comment on attachment 436646 [details] [diff] [review]
Bustage fix v2
[Checkin: Comment 78]

+ NS_ASSERTION(mReader->mAudioQueue.GetSize(),

GetSize() returns a PRInt32. Although it's highly unlikely this can therefore be negative which is bad for this assertion. It should be GetSize() > 0.

r+ with that.
Attachment #436646 - Flags: review?(chris.double) → review+
Another crash in nsOggPlayStateMachine::AudioLoop():377 http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1270187807.1270188120.27899.gz

My previous bustage fix missed the actual problem. We're dereferencing the nsAutoPtr<SoundData> declared on line 339, but it's null. But the nsAutoPtr is not null because the audio queue was empty when we popped it. I think what's really happening is the nsAutoPtr is being forget()ed on line 373 when we've paused after popping the SoundData off the audio queue, but before acquiring the audio lock. We should move the crashing dereference (line 377) into the "if (!mAudioStream->IsPaused()) block on starting on line 367, so that we're not dereferencing the nsAutoPtr after forget()ing it.
Pushed "Bustage fix again": http://hg.mozilla.org/mozilla-central/rev/fa8b5b822730

No one was around to review the patch, so I took the initiative and pushed it anyway in order to make the tree green again...
(In reply to comment #74)
> http://hg.mozilla.org/mozilla-central/rev/1d00691be5f2

This broke SM and TB debug builds:

{
gkconogg_s.lib(nsOggPlayStateMachine.obj) : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __thiscall mozilla::MonitorAutoExit::~MonitorAutoExit(void)" (__imp_??1MonitorAutoExit@mozilla@@QAE@XZ) referenced in function "private: void __thiscall nsOggPlayStateMachine::StopPlayback(enum nsOggPlayStateMachine::eStopMode)" (?StopPlayback@nsOggPlayStateMachine@@AAEXW4eStopMode@1@@Z)
gkconogg_s.lib(nsOggReader.obj) : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __thiscall mozilla::MonitorAutoExit::~MonitorAutoExit(void)" (__imp_??1MonitorAutoExit@mozilla@@QAE@XZ)
gkconogg_s.lib(nsOggPlayStateMachine.obj) : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __thiscall mozilla::MonitorAutoExit::MonitorAutoExit(class mozilla::Monitor &)" (__imp_??0MonitorAutoExit@mozilla@@QAE@AAVMonitor@1@@Z) referenced in function "private: void __thiscall nsOggPlayStateMachine::StopPlayback(enum nsOggPlayStateMachine::eStopMode)" (?StopPlayback@nsOggPlayStateMachine@@AAEXW4eStopMode@1@@Z)
gkconogg_s.lib(nsOggReader.obj) : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __thiscall mozilla::MonitorAutoExit::MonitorAutoExit(class mozilla::Monitor &)" (__imp_??0MonitorAutoExit@mozilla@@QAE@AAVMonitor@1@@Z)
gklayout.dll : fatal error LNK1120: 2 unresolved externals
}
Audio choppiness is actually much worse for me now.  I'm on a Pentium 3.
(In reply to comment #82)

If there's no fix coming up, PLEASE back out!
(In reply to comment #83)
> Audio choppiness is actually much worse for me now.  I'm on a Pentium 3.

Do you have a URL for that? What OS are you running?
(In reply to comment #82)
> (In reply to comment #74)
> > http://hg.mozilla.org/mozilla-central/rev/1d00691be5f2
> 
> This broke SM and TB debug builds:
> 
> {
> gkconogg_s.lib(nsOggPlayStateMachine.obj) : error LNK2019: unresolved external
> symbol "__declspec(dllimport) public: __thiscall
> mozilla::MonitorAutoExit::~MonitorAutoExit(void)"
> (__imp_??1MonitorAutoExit@mozilla@@QAE@XZ) referenced in function "private:
> void __thiscall nsOggPlayStateMachine::StopPlayback(enum
> nsOggPlayStateMachine::eStopMode)"
> (?StopPlayback@nsOggPlayStateMachine@@AAEXW4eStopMode@1@@Z)
> gkconogg_s.lib(nsOggReader.obj) : error LNK2001: unresolved external symbol
> "__declspec(dllimport) public: __thiscall
> mozilla::MonitorAutoExit::~MonitorAutoExit(void)"
> (__imp_??1MonitorAutoExit@mozilla@@QAE@XZ)
> gkconogg_s.lib(nsOggPlayStateMachine.obj) : error LNK2019: unresolved external
> symbol "__declspec(dllimport) public: __thiscall
> mozilla::MonitorAutoExit::MonitorAutoExit(class mozilla::Monitor &)"
> (__imp_??0MonitorAutoExit@mozilla@@QAE@AAVMonitor@1@@Z) referenced in function
> "private: void __thiscall nsOggPlayStateMachine::StopPlayback(enum
> nsOggPlayStateMachine::eStopMode)"
> (?StopPlayback@nsOggPlayStateMachine@@AAEXW4eStopMode@1@@Z)
> gkconogg_s.lib(nsOggReader.obj) : error LNK2001: unresolved external symbol
> "__declspec(dllimport) public: __thiscall
> mozilla::MonitorAutoExit::MonitorAutoExit(class mozilla::Monitor &)"
> (__imp_??0MonitorAutoExit@mozilla@@QAE@AAVMonitor@1@@Z)
> gklayout.dll : fatal error LNK1120: 2 unresolved externals
> }

I'll try and produce a fix for this. It'll take a few hours, I don't have a SM/TB tree on my machine.

Any ideas what the problem is? Something to do with namespaces? Do we need to export nsOggHacks.h in content/media/ogg/Makefile.in?
(In reply to comment #85)
> Do you have a URL for that? What OS are you running?

Audio of every video is choppier, but here's one example: http://www.spreadfirefox.com/5years/en-US/

Before the choppiness was more intermittent, as though the audio buffer was really small and under-running.  Now it's as though there's no audio buffer and the choppiness is practically constant.

My OS is Windows XP, XP3; Pentium 3 933 MHz

Some important information: This <video> component is very demanding on my CPU and uses 100 % CPU to play.  So maybe the audio component is not given sufficient priority, or whatever, to allow it to smoothly process sound.  However, most flash videos consume 100 % of my CPU and yet the audio is completely smooth.
(In reply to comment #86)

> I'll try and produce a fix for this. It'll take a few hours

A few hours more is fine, if it's unbroken either way after that.

> Any ideas what the problem is? Something to do with namespaces? Do we need to

The first thing I wonder about is: why Debug Windows only?

Fwiw, bug 488175 c-c check-in made no difference. (and there has be a clobber after that, iiuc)

> export nsOggHacks.h in content/media/ogg/Makefile.in?

This may be a good guess...
Depends on: 556789
Depends on: 556821
Depends on: 556889
Depends on: 556893
Pushed bustage fix: http://hg.mozilla.org/mozilla-central/rev/3421c70b573d
Problem was the linker couldn't handle "class NS_COM_GLUE NS_STACK_CLASS
MonitorAutoExit" on comm-central, removing "NS_COM_GLUE NS_STACK_CLASS" fixed
it.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
No longer blocks: 556813
Probably should put NS_STACK_CLASS back, that's only for static analysis.
(In reply to comment #87)
> (In reply to comment #85)
> > Do you have a URL for that? What OS are you running?
> 
> Audio of every video is choppier, but here's one example:
> http://www.spreadfirefox.com/5years/en-US/
> 
> Before the choppiness was more intermittent, as though the audio buffer
> was really small and under-running.  Now it's as though there's no audio
> buffer and the choppiness is practically constant.

The problem is that the decoder turns out far too many tiny blocks of audio data.  For CD-quality audio, liboggplay used to consistently produce 8kb blocks.  This implementation creates 4kb blocks when things are going well and 512b otherwise.  An 8 or 16-fold increase in the number of writes to the backend kills performance on low-end machines.

I had to rewrite my buffering strategy for the OS/2 version of sydneyaudio to make it usable with this decoder.  The result was a significant increase in performance for videos where the decoder writes 4kb blocks.  However, for the video cited - where half the blocks are 512b - the audio was horribly choppy due to constant underruns.  More data less often would likely fix the problem.
Thanks, Rich.

No wonder performance sucks so bad.  Even 8kb is too small, let alone dropping to 512b on a slow system.  Instead of starting with 4kb then dropping, how about starting with 8kb then increasing, up to 256kb?

Given Firefox's relative inefficiency, compared to a stand-alone audio/video player, more buffer space is needed to compensate -- not less.
Please stop with subjective statements such as 'given firefox's relative inefficiency', and continue on constructive comments. Rich seems to have detected a valid issue: too small blocks, which can be addressed easily. Please create a separate bug for this. A bug with a specific, defined and measurable issue can and will be solved quicker.
Depends on: 557103
Comment on attachment 436411 [details] [diff] [review]
Patch v3.1
[Checkin: Comment 74]

>new file mode 100644
>+++ b/content/media/ogg/nsOggHacks.h
>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998

Where's Netscape been hiding this for the past 12 years? ;-)

>new file mode 100644
>+++ b/content/media/ogg/nsOggPlayStateMachine.cpp
>+ * The Initial Developer of the Original Code is the Mozilla Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 2007

2010 for this new file?

If you think this decoder might be incorporated into non-Gecko projects you could get more specific than just "Mozilla code" for the original code. "Mozilla ogg decoder" or something? Totally optional and up to you. Fix the Netscape bit though.
(In reply to comment #91)
> The problem is that the decoder turns out far too many tiny blocks of audio
> data.  For CD-quality audio, liboggplay used to consistently produce 8kb
> blocks.  This implementation creates 4kb blocks when things are going well and
> 512b otherwise.  An 8 or 16-fold increase in the number of writes to the
> backend kills performance on low-end machines.

The size of the writes shouldn't matter much.  There's no particular API requirement that says a write to an sa_stream is expensive or cheap, but the expectation is certainly that it's cheap.  On the Mac and Windows backends a write that doesn't fill a block is a simple memcpy + offset update, which is extremely cheap.

> I had to rewrite my buffering strategy for the OS/2 version of sydneyaudio to
> make it usable with this decoder.  The result was a significant increase in
> performance for videos where the decoder writes 4kb blocks.  However, for the
> video cited - where half the blocks are 512b - the audio was horribly choppy
> due to constant underruns.  More data less often would likely fix the problem.

Buffering data into bigger chunks in the decoder before submitting it for playback is going to make underruns *more* likely because there's less data immediately available for the audio backend.  It sounds like the root problem is that the new backend isn't preparing enough audio data ahead of the current position to avoid underruns.

The old backend aimed for around 250ms of audio.  It looks like the new backend is aiming to be within 100-250ms of audio.  From experimentation with heavily loaded systems, we really want around 500ms of audio data available in the audio backend itself to avoid glitches caused by underruns.  Let's continue any discussion of this particular problem over in bug 484814.
Oops, the new backend targets 250-2000ms, I misread.
Depends on: 557276
(In reply to comment #94)
> (From update of attachment 436411 [details] [diff] [review])
> >new file mode 100644
> >+++ b/content/media/ogg/nsOggHacks.h
> >+ * The Initial Developer of the Original Code is
> >+ * Netscape Communications Corporation.
> >+ * Portions created by the Initial Developer are Copyright (C) 1998
> 
> Where's Netscape been hiding this for the past 12 years? ;-)

Thanks for pointing that out Dan, I fixed it when I checked in.

> >new file mode 100644
> >+++ b/content/media/ogg/nsOggPlayStateMachine.cpp
> >+ * The Initial Developer of the Original Code is the Mozilla Corporation.
> >+ * Portions created by the Initial Developer are Copyright (C) 2007
> 
> 2010 for this new file?

This file contains portions of the original nsOggDecoder.cpp, which was Copyright 2007.

> If you think this decoder might be incorporated into non-Gecko projects you
> could get more specific than just "Mozilla code" for the original code.

I wouldn't expect it to be.

(In reply to comment #90)
> Probably should put NS_STACK_CLASS back, that's only for static analysis.

The attached patch adds this. It doesn't break the comm-central build, FWIW.
Attachment #438657 - Flags: review?(roc)
Attachment #438657 - Attachment description: Cleanup patch: Add NS_STACK_CLASS to MonitorAutoExit → Cleanup patch: Add NS_STACK_CLASS to MonitorAutoExit [Checkin: Comment 98]
Attachment #436651 - Attachment description: Bustage fix again → Bustage fix again [Checkin: Comment 81]
Attachment #436646 - Attachment description: Bustage fix v2 → Bustage fix v2 [Checkin: Comment 78]
Attachment #436411 - Attachment description: Patch v3.1 → Patch v3.1 [Checkin: Comment 74]
(In reply to comment #98)
> Cleanup patch: http://hg.mozilla.org/mozilla-central/rev/9550660e5f7b

for mozilla1.9.3a5.
Flags: in-testsuite+
Target Milestone: --- → mozilla1.9.3a4
ok, so how is this exactly RESO FIXED if the chained ogg file linked in comment 20 does not work and neither do chained ogg/vorbis streams? Am I seeing an regression on Mozilla/5.0 (X11; Linux i686; rv:2.0b2) Gecko/20100720 Firefox/4.0b2? Please enlighten me!
There's no regression. Chained Oggs never worked. This bug is perfectly well described in the summary, and it is most definitely fixed.
Depends on: 601476
Depends on: 601542
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: