Implement buffered time ranges for WebM

Status: RESOLVED FIXED
Component: Core :: Audio/Video
Platform: x86 Mac OS X (Trunk)
Reporter: kinetik
Assignee: kinetik
Blocks: 2 bugs
Firefox Tracking Flags: blocking2.0 betaN+
Reported: 7 years ago
Last modified: 7 years ago
Attachments: 3 attachments, 9 obsolete attachments

(Assignee)

Description

7 years ago
Spinning this off from bug 462957, which contains the general DOM support and implementations for Ogg and Wave.
(Assignee)

Comment 1

7 years ago
Created attachment 450056 [details] [diff] [review]
proof of concept v0

This works, but isn't complete:

1. Need to read the segment timecode scale at startup.
2. End time needs to be calculated using last cached block in end cluster.
3. Need to do more to verify parser sync.
4. Performance is unacceptable while the media cache is downloading.

Other issues (also affects the Ogg backend, I think):

1. The local file media stream fires assertions when called on the main thread.
2. Will get horribly confused with a multi-segment file.  Ogg has some protection via stream serial numbers.

The performance issue is strange and needs investigation.  With every file tested so far, calculating the buffered ranges takes around 1500ms while the media cache is downloading, but drops to 30ms as soon as the download is suspended.
(Assignee)

Comment 2

7 years ago
The perf issue was simply the number of small cached reads performed when searching for parser sync.  The implementation in the current patch searched byte-by-byte, which is quite expensive with our cached reader implementation as it takes locks and issues syscalls to read from a file on disk.

Unlike with Ogg pages (which can be no larger than ~64kB), Cluster sizes in WebM can be almost any size, with typical sizes of 200kB-500kB (but I have a test file with a 4MB cluster).  The current patch was issuing around 100k reads before finding sync.  Using an intermediate buffer, I've reduced this to 1-10 (larger) reads.  Performance is typically < 0.5ms, but I'm still seeing up to 5ms sometimes when testing against YouTube.

Given that GetBuffered() may be called very frequently (e.g. it seems as if YouTube is calling it via timeupdate events, which we fire every frame), taking 5ms on the main thread to calculate the time ranges probably isn't acceptable.
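As a sketch of the approach described above (the class name, chunk size, and simulated read primitive are all illustrative assumptions, not the patch's actual code), an intermediate buffer lets a byte-at-a-time sync scan issue a few large reads instead of one expensive cached read per byte:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: wrap an expensive read primitive (the media cache's
// reader takes locks and issues syscalls on each call) in a chunked
// intermediate buffer, so scanning byte-by-byte for parser sync stays cheap.
class BufferedScanner {
public:
  BufferedScanner(const unsigned char* src, size_t len, size_t chunk = 4096)
    : mSrc(src), mLen(len), mChunk(chunk) {}

  // Returns the next byte via *out, refilling the buffer only when exhausted.
  bool ReadByte(unsigned char* out) {
    if (mPos >= mBufStart + mBuf.size()) {
      if (mPos >= mLen) {
        return false;
      }
      // Simulated "expensive" read: one per chunk rather than one per byte.
      mBufStart = mPos;
      size_t n = std::min(mChunk, mLen - mPos);
      mBuf.assign(mSrc + mPos, mSrc + mPos + n);
      ++mExpensiveReads;
    }
    *out = mBuf[mPos - mBufStart];
    ++mPos;
    return true;
  }

  size_t ExpensiveReads() const { return mExpensiveReads; }

private:
  const unsigned char* mSrc;
  size_t mLen;
  size_t mChunk;
  size_t mPos = 0;
  size_t mBufStart = 0;
  std::vector<unsigned char> mBuf;
  size_t mExpensiveReads = 0;
};
```

Scanning 10000 bytes this way costs three simulated reads at a 4kB chunk size, in line with the 1-10 larger reads per call reported above.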
Blocks: 569993
(Assignee)

Updated

7 years ago
Depends on: 585864
Keywords: dev-doc-needed
(Assignee)

Comment 3

7 years ago
Created attachment 466223 [details] [diff] [review]
nestegg changes v0

Adds a method to query the timecode scale from the nestegg context.
Attachment #450056 - Attachment is obsolete: true
(Assignee)

Comment 4

7 years ago
Created attachment 466225 [details] [diff] [review]
patch v1

Implement buffered for WebM:

- Adds a data discarded notification to the media cache, and feeds that notification through to the decoder's reader.

- Fixes a hang in nsMediaFileStream when trying to read past EOF.

- Adds a hunk of ugly code to nsWebMReader to grovel through WebM files to fetch timecodes.

There are a bunch of optimizations that can be done to this, such as only recomputing the time ranges that are in byte ranges that have changed, but it's not yet clear how much optimization effort is worthwhile for this feature.  On my current machine a typical call (that isn't returning a cached value) takes around 0.5ms.

I've enabled test_buffered in this patch, but it currently fails because the end buffered time is not identical to the duration.  That's a real bug--the end time is actually the start time of the last frame.  I need to think about the correct way to fix this, then I'll post an updated patch.
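A hedged distillation of the idea (the struct, function, and exact semantics here are my own illustration, not the patch's actual API): parsing builds a sorted byte-offset to timecode map, and two binary searches turn a contiguous cached byte range into a buffered time range. Note that an exact match at the end offset must not be stepped over.

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch of the offset -> timecode map built while parsing.
struct TimeDataOffset {
  long long offset;   // byte position of a block in the stream
  long long timeMs;   // absolute timecode of that block
};

// Translates a contiguous cached byte range into a buffered time range.
// Returns true and fills [*startMs, *endMs] if the byte range covers at
// least one mapped block interval.
bool BufferedForByteRange(const std::vector<TimeDataOffset>& map,
                          long long aStartOffset, long long aEndOffset,
                          long long* startMs, long long* endMs) {
  auto cmp = [](const TimeDataOffset& e, long long off) {
    return e.offset < off;
  };
  // First map entry at or after aStartOffset.
  auto first = std::lower_bound(map.begin(), map.end(), aStartOffset, cmp);
  // Last map entry at or before aEndOffset: step back only when there is
  // no exact match at aEndOffset.
  auto last = std::lower_bound(map.begin(), map.end(), aEndOffset, cmp);
  if (last == map.end() || last->offset > aEndOffset) {
    if (last == map.begin()) {
      return false;
    }
    --last;
  }
  if (first == map.end() || first->offset > last->offset) {
    return false;
  }
  *startMs = first->timeMs;
  *endMs = last->timeMs;
  return true;
}
```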
Comment 5

7 years ago

Removing dev-doc-needed, since this is already documented generally and isn't WebM-specific.
Keywords: dev-doc-needed
(Assignee)

Updated

7 years ago
Attachment #466223 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #466225 - Attachment is obsolete: true
blocking2.0: --- → betaN+
(Assignee)

Comment 6

7 years ago
Created attachment 472535 [details] [diff] [review]
second attempt - nestegg changes v0
(Assignee)

Comment 7

7 years ago
Created attachment 472536 [details] [diff] [review]
second attempt - v0

Not quite ready for review, but this works.
(Assignee)

Comment 8

7 years ago
Created attachment 472950 [details] [diff] [review]
second attempt - v1
Attachment #472536 - Attachment is obsolete: true
Attachment #472950 - Flags: review?(roc)
(Assignee)

Updated

7 years ago
Attachment #472535 - Flags: review?(chris.double)
(Assignee)

Updated

7 years ago
Blocks: 587481
(Assignee)

Updated

7 years ago
Blocks: 587482
Comment 9

7 years ago

+  virtual void NotifyDataArrived(const char *aBuffer, PRUint32 aLength, PRUint32 aOffset);

no space between char and *

+  // consumes blocks.  A new parser is created for each distinct byte range
+  // in the media cache and begins parsing from the first WebM cluster
+  // within that range.  Old parsers are destroyed when their range merges

Well, the byte range might not really be in the media cache at any given time. I'd just say "A new parser is created for each distinct byte range of data received"

+    void Run(const unsigned char* aBuffer, PRUint32 aLength,
+             nsTArray<TimeDataOffset>& aMapping);

I think "Append" would be a better name.

+    PRInt64 mOffset;

I think "mCurrentOffset"

+    enum State {
+      SKIP_DATA,
+      CLUSTER_SYNC,
+      READ_VINT,
+      READ_VINT_REST,
+      TIMECODE_SYNC,
+      READ_CLUSTER_TIMECODE,
+      ANY_BLOCK_SYNC,
+      BLOCK_SYNC,
+      READ_BLOCK,
+      READ_BLOCK_TIMECODE
+    };

Better document these a bit.
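For illustration (my own minimal sketch, not the patch's parser), the CLUSTER_SYNC state boils down to incrementally matching the four-byte Matroska Cluster ID 0x1F43B675, carrying the match position across buffer boundaries so an ID split between two NotifyDataArrived calls is still found:

```cpp
#include <cstddef>

// The Matroska/WebM Cluster element ID.
static const unsigned char CLUSTER_ID[] = { 0x1f, 0x43, 0xb6, 0x75 };

// Sketch of the CLUSTER_SYNC state: mIDPos plays the same role as the
// member of the same name quoted above, tracking partial matches.
struct ClusterSync {
  size_t mIDPos = 0;   // how many ID bytes have matched so far

  // Feed a buffer of bytes; returns true once a full Cluster ID is seen.
  bool Feed(const unsigned char* p, size_t len) {
    for (size_t i = 0; i < len; ++i) {
      if (p[i] == CLUSTER_ID[mIDPos]) {
        if (++mIDPos == sizeof(CLUSTER_ID)) {
          return true;
        }
      } else {
        // Restart, re-testing this byte as a possible first ID byte.
        mIDPos = (p[i] == CLUSTER_ID[0]) ? 1 : 0;
      }
    }
    return false;
  }
};
```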

I think ClusterParser could be moved to its own file. It's nice and isolated.

+    PRUint32 mIDPos;
+
+    PRUint64 mVInt;
+    PRUint32 mVIntLength;
+    PRUint32 mVIntLeft;
+
+    PRUint64 mBlockSize;
+
+    PRUint64 mClusterTimecode;
+
+    PRInt64 mBlockOffset;
+    PRInt16 mBlockTimecode;
+    PRUint32 mBlockTimecodeLength;
+
+    PRUint32 mSkipBytes;

These all need a little bit of explanation, even if it's just a reference to a spec

I think you could just overload operators instead of defining your own Comparators.

+  // Merge parsers with overlapping regions and clean up the dead remnants.

Maybe this would be a little simpler if you did

  PRUint32 i = 0;
  while (i + 1 < mRangeParser.Length()) {
    if (mRangeParser[i].mCurrentOffset >= mRangeParser[i + 1].mStartOffset) {
      mRangeParser[i + 1].mStartOffset = mRangeParser[i].mStartOffset;
      mRangeParser.RemoveElementAt(i);
    } else {
      ++i;
    }
  }

+PRBool nsWebMReader::CalculateBufferedForRange(nsTimeRanges* aBuffered,
+                                               PRInt64 aStartOffset, PRInt64 aEndOffset)

Why return a value? You don't use it.

+  // Find the first data->time map entry at or after aStartOffset.
+  PRUint32 start;
+  mTimeMapping.GreatestIndexLtEq(TimeDataOffset(aStartOffset, 0), start,
+                                 TimeDataOffsetComparator());

Aren't you finding the first entry at or before aStartOffset?

+  // Find the first data->time map entry at or after aEndOffset.
+  PRUint32 end;
+  mTimeMapping.GreatestIndexLtEq(TimeDataOffset(aEndOffset, 0), end,
+                                 TimeDataOffsetComparator());

Ditto?
(Assignee)

Comment 10

7 years ago
(In reply to comment #9)
> +  // Find the first data->time map entry at or after aStartOffset.
> +  PRUint32 start;
> +  mTimeMapping.GreatestIndexLtEq(TimeDataOffset(aStartOffset, 0), start,
> +                                 TimeDataOffsetComparator());
> 
> Aren't you finding the first entry at or before aStartOffset?

It's returning an index at which inserting aStartOffset would leave the array sorted.  I don't insert anything, so it ends up pointing to the element >= aStartOffset.  Is it confusing because the method is named for one intended use (ordered insertion) and the comment is talking about a different use?

> +  // Find the first data->time map entry at or after aEndOffset.
> +  PRUint32 end;
> +  mTimeMapping.GreatestIndexLtEq(TimeDataOffset(aEndOffset, 0), end,
> +                                 TimeDataOffsetComparator());
> 
> Ditto?

The above comment for start applies, but then I decrement the index by one to make it the element <= aEndOffset.  The comment is wrong: it should say "at or before".  Also, if there's an exact match, it shouldn't decrement end by 1.  I've fixed both in my latest patch.
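A small demonstration of the semantics being discussed, using std::lower_bound as a stand-in for nsTArray's GreatestIndexLtEq (an assumption that they behave alike for this purpose): the returned index is where inserting the key keeps the array sorted, i.e. the index of the first element >= the key.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index at which inserting key would keep offsets sorted;
// when key is present, that is the index of the matching element itself.
size_t SortedInsertIndex(const std::vector<long long>& offsets, long long key) {
  return std::lower_bound(offsets.begin(), offsets.end(), key) - offsets.begin();
}
```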
Comment 11

7 years ago

OK, I think the comment in nsTArray for GreatestIndexLtEq is just wrong.

+  // Find the first data->time map entry at or after aEndOffset.
+  PRUint32 end;
+  mTimeMapping.GreatestIndexLtEq(TimeDataOffset(aEndOffset, 0), end,
+                                 TimeDataOffsetComparator());
+
+  // Adjust end to be the first entry before aEndOffset.
+  if (end > 0) {
+    end -= 1;
+  }

What if there is a timecode entry for aEndOffset? Seems like we shouldn't subtract 1 in that case; the end time would be the time for aEndOffset.

+        TimeDataOffset entry(mBlockOffset , mClusterTimecode + mBlockTimecode);

Stray space before ,

I'm a bit concerned about the performance impact of using an array to store the timecode map. Usually we'll be inserting at the end of the array, but I suppose if you play some video late in the stream and then seek back to the beginning to play the rest, inserting the timecodes could be expensive. I think it would be premature to optimize it at this point, but you might want to try a pathological testcase to see how bad it can be.
(Assignee)

Updated

7 years ago
Blocks: 595071
(Assignee)

Comment 12

7 years ago
(In reply to comment #11)
> What if there is a timecode entry for aEndOffset? Seems like we shouldn't
> subtract 1 in that case; the end time would be the time for aEndOffset.

Yeah, I noticed that and mentioned it above. :-)

> I'm a bit concerned about the performance impact of using an array to store the
> timecode map. Usually we'll be inserting at the end of the array, but I suppose
> if you play some video late in the stream and then seek back to the beginning
> to play the rest, inserting the timecodes could be expensive. I think it would
> be premature to optimize it at this point, but you might want to try a
> pathological testcase to see how bad it can be.

Yeah, it needs to be optimized once this patch lands.  I've filed bug 595071 to deal with this.  Some numbers:

I timed each NotifyDataArrived call for a 1GB 720p file (preload="metadata") with 260930 blocks and a 4GB media cache.  Seeking to time=duration*0.3, waiting for the rest of the file to download, then seeking to time=0 and waiting until the entire file had downloaded produced 40788 calls: ~31000 were under 1ms (most around 0.01ms), ~9000 were between 1ms and 10ms, and ~120 were over 10ms.  There are ~29000 blocks from duration*0.3 to the end of the file, so almost all of the fast calls were the best case (appending at the end).
(Assignee)

Comment 13

7 years ago
Created attachment 473959 [details] [diff] [review]
second attempt - v2

Everything but FSM documentation.

- Moved most of the parsing stuff to nsWebMBufferedParser
- Removed BLOCK_SYNC state and merged it with ANY_BLOCK_SYNC
- Fixed a bug parsing BlockGroups/Blocks (mBlockOffset wasn't set)
- Added SKIP_ELEMENT state to skip contents of unused elements

I'll post another patch with that and then request review, but feel free to give feedback on this one too.
Attachment #472950 - Attachment is obsolete: true
Attachment #472950 - Flags: review?(roc)
(Assignee)

Comment 14

7 years ago
Created attachment 474622 [details] [diff] [review]
second attempt - v3

Adds comments documenting the parser's state machine.  Also fixes a few Win32 warnings.

Working on an additional test to go with this (we already have test_buffered, but it only tests basic functionality), I'll attach that as a separate patch once it's ready.
Attachment #473959 - Attachment is obsolete: true
Attachment #474622 - Flags: review?(roc)
Comment 15

7 years ago

+    case TIMECODE_SYNC:
+      if (*p++ != TIMECODE_ID) {
+        mState = CLUSTER_SYNC;
+        break;
+      }

If *p is CLUSTER_ID[0], shouldn't we consume it in CLUSTER_SYNC instead of skipping it here? In other words, we should increment p and consume the character only if we successfully advance to READ_VINT.

But what if the first CLUSTER_SYNC was spurious, we start reading a VINT and inside the VINT we encounter a new, real CLUSTER_SYNC? Seems like we'll lose that here too.

It seems like we might need to spawn off a new parser every time we see a CLUSTER_SYNC bit pattern, or something like that. Maybe we shouldn't try to do that now.

Other than that one issue, the rest looks good.
(Assignee)

Comment 16

7 years ago
Created attachment 474639 [details] [diff] [review]
second attempt - v3.1

Fix the issue discussed below, and also treat variable length integers smaller or larger than the spec permits as a sync error.

(In reply to comment #15)
> +    case TIMECODE_SYNC:
> +      if (*p++ != TIMECODE_ID) {
> +        mState = CLUSTER_SYNC;
> +        break;
> +      }
> 
> If *p is CLUSTER_ID[0], shouldn't we consume it in CLUSTER_SYNC instead of
> skipping it here? In other words, we should increment p and consume the
> character only if we successfully advance to READ_VINT.
> 
> But what if the first CLUSTER_SYNC was spurious, we start reading a VINT and
> inside the VINT we encounter a new, real CLUSTER_SYNC? Seems like we'll lose
> that here too.

Ah, good catch.  I handle this in another place by decrementing p before returning to CLUSTER_SYNC.  That should be sufficient to handle this case.

> It seems like we might need to spawn off a new parser every time we see a
> CLUSTER_SYNC bit pattern, or something like that. Maybe we shouldn't try to do
> that now.

Yeah, to handle this problem reliably for the general case of losing sync after we've been through multiple parse steps and consumed a bit of data (and can't necessarily back up far enough to recover) we need to do something like that.

The other change in this patch suffers from this issue--if we fail inside READ_VINT we can't easily get back to the next correct place to find sync, so we'll just scan forward looking for a sync point and possibly miss some data.

We could also have the parser buffer data from potential sync points to allow it to back up, but that could end up consuming a lot of memory, so I prefer your idea.  I'll file a follow up bug for that.
Attachment #474622 - Attachment is obsolete: true
Attachment #474639 - Flags: review?(roc)
Attachment #474622 - Flags: review?(roc)
(Assignee)

Updated

7 years ago
Blocks: 595790
(Assignee)

Comment 17

7 years ago
Created attachment 474647 [details] [diff] [review]
second attempt - v3.1

Gah, forgot to qrefresh.
Attachment #474639 - Attachment is obsolete: true
Attachment #474647 - Flags: review?(roc)
Attachment #474639 - Flags: review?(roc)
(Assignee)

Updated

7 years ago
Blocks: 587480
Comment 18

7 years ago

Comment on attachment 474647 [details] [diff] [review]
second attempt - v3.1

+      // Variable length integers must be 1 to 8 bytes long.  If we got
+      // anything else we must've lost sync.
+      if (mVIntLength < 1 || mVIntLength > 8) {

There is no way VIntLength can return a value greater than 8.

Change the >8 check to an assertion, and r=me
Attachment #474647 - Flags: review?(roc) → review+
(Assignee)

Comment 19

7 years ago
Created attachment 474651 [details] [diff] [review]
second attempt - v3.2

Thanks, that addition was just wrong.  It can't be less than 1, either.  I've added an assertion to VIntLength to document that.  Carrying forward review, thanks!
Attachment #474647 - Attachment is obsolete: true
Attachment #474651 - Flags: review+
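For reference, my reading of the EBML encoding (a sketch, not nestegg's actual implementation): the length of a variable length integer is one plus the number of leading zero bits in its first byte, so any nonzero first byte always yields a length between 1 and 8, which is what the assertion documents.

```cpp
#include <cassert>

// Sketch of EBML variable length integer sizing: the leading zero bits of
// the first byte encode the total length. A zero first byte would imply a
// length greater than 8, which the spec forbids, so we assert instead.
static unsigned VIntLength(unsigned char firstByte) {
  assert(firstByte != 0);
  unsigned length = 1;
  for (unsigned char mask = 0x80; (firstByte & mask) == 0; mask >>= 1) {
    ++length;
  }
  return length;
}
```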

Updated

7 years ago
Attachment #472535 - Flags: review?(chris.double) → review+
(Assignee)

Comment 20

7 years ago
http://hg.mozilla.org/mozilla-central/rev/ffd7d74232e6
http://hg.mozilla.org/mozilla-central/rev/fd21140bec7b
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Comment 21

7 years ago
Caused bustage:

WINNT 5.2 mozilla-central debug test mochitests-1/5 on 2010/09/14 00:15:34  
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1284448534.1284450904.24147.gz#err1

NEXT ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_play_events.html | Exited with code -1073741819 during test run
INFO | automation.py | Application ran for: 0:31:46.500000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmpsftv3wpidlog
==> process 3628 launched child process 412
==> process 3628 launched child process 3356
INFO | automation.py | Checking for orphan process with PID: 412
INFO | automation.py | Checking for orphan process with PID: 3356
PROCESS-CRASH | /tests/content/media/test/test_play_events.html | application crashed (minidump found)
Operating system: Windows NT
                  5.2.3790 Service Pack 2
CPU: x86
     GenuineIntel family 6 model 23 stepping 8
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION
Crash address: 0xb0

Thread 0 (crashed)
 0  xul.dll!nestegg_duration [nestegg.c:c14c5fe155da : 1507 + 0x12]
    eip = 0x10e337d8   esp = 0x0012c444   ebp = 0x0012c478   ebx = 0x03a300b8
    esi = 0x15a3dc3c   edi = 0xffff0007   eax = 0x0012c470   ecx = 0x000000b0
    edx = 0x0012c444   efl = 0x00010206
    Found by: given as instruction pointer in context


Almost certainly caused by mContext being null in nsWebMReader::GetBuffered() when nestegg_duration is called.

Pushed trivial fix: http://hg.mozilla.org/mozilla-central/rev/5588c9796f0b
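The shape of such a guard might look like this; this is a hedged sketch with a stub context type, not the pushed changeset, which is only linked above:

```cpp
// Stub standing in for the real nestegg context and API (assumption: the
// real types and signatures in nestegg.h differ).
struct Context {
  unsigned long long durationNs;
};

static int StubDuration(Context* ctx, unsigned long long* out) {
  *out = ctx->durationNs;
  return 0;
}

// Shape of the fix: bail out early when the demuxer context is still null
// (e.g. GetBuffered() called before metadata has loaded) instead of letting
// the duration query dereference it.
static bool GetDurationSafely(Context* aContext, unsigned long long* aDuration) {
  if (!aContext) {
    return false;
  }
  return StubDuration(aContext, aDuration) == 0;
}
```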
(Assignee)

Comment 22

7 years ago
Created attachment 475019 [details] [diff] [review]
bustage fix as pushed