Closed Bug 1227396 Opened 9 years ago Closed 9 years ago

mp4_demuxer::Index::ConvertByteRangesToTimeRanges not handling incremental updates

Categories

(Core :: Audio/Video: Playback, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: jya, Assigned: jya)

References

Details

Attachments

(14 files, 16 obsolete files)

12.45 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
2.59 KB, patch
jya
: review+
Details | Diff | Splinter Review
16.41 KB, patch
Details | Diff | Splinter Review
840 bytes, patch
jya
: review+
Details | Diff | Splinter Review
642 bytes, patch
jya
: review+
Details | Diff | Splinter Review
929 bytes, patch
jya
: review+
Details | Diff | Splinter Review
2.86 KB, patch
jya
: review+
Details | Diff | Splinter Review
41.08 KB, patch
jya
: review+
Details | Diff | Splinter Review
5.47 KB, patch
jya
: review+
Details | Diff | Splinter Review
17.11 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
808 bytes, patch
cpearce
: review+
Details | Diff | Splinter Review
1.44 KB, patch
jya
: review+
Details | Diff | Splinter Review
8.82 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
5.01 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
ConvertByteRangesToTimeRanges always scan every single samples and perform a quick sort on all the samples currently buffered.

As the video gets big, and ConvertByteRangsToTimeRanges get called often we will often see quick sort being called several times per second on array with over 100,000 elements.

The mp4_demuxer::Index should handle incremental updates and only recalculates new data from the last call.
Attachment #8691697 - Flags: review?(cpearce)
Will facilitate the replacement of MediaByteRange by Interval<int64_t>
Attachment #8691698 - Flags: review?(cpearce)
Attachment #8691699 - Flags: review?(cpearce)
Attachment #8691700 - Flags: review?(gsquelart)
Attachment #8691701 - Flags: review?(gsquelart)
It's now okay to simplify.
Attachment #8691703 - Flags: review?(cpearce)
There was several drawbacks in code:
1- It required that a moof always start with a keyframe
2- You couldn't use it with partial data

An accurate buffered range was required with the old MSE architecture. This code path is no longer used with MSE so we can relax the calculation.
Attachment #8691704 - Flags: review?(ajones)
As we want to process data incrementally, yet have no guarantee that the cached data is contiguous between runs, only considering a frame to be buffered if it's entirely contained causes small gaps (typically 1 frame length) when processing the cached range incrementally.
Attachment #8691705 - Flags: review?(ajones)
Attachment #8691706 - Flags: review?(cpearce)
Attached patch P11. Remove unnecessary monitor. (obsolete) — Splinter Review
The Index and MoofParser are now only used via the MP4Demuxer which is guaranteed to always be called on the same taskqueue.
Attachment #8691707 - Flags: review?(cpearce)
Attachment #8691708 - Flags: review?(cpearce)
Attachment #8691700 - Flags: review?(gsquelart) → review+
Comment on attachment 8691701 [details] [diff] [review]
P5. Add IntervalSet::Clear method.

Review of attachment 8691701 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with small change please.

::: dom/media/Intervals.h
@@ +665,5 @@
>    }
>  
> +  void Clear()
> +  {
> +    return mIntervals.Clear();

Remove 'return '.
Attachment #8691701 - Flags: review?(gsquelart) → review+
With this change, MP4TrackDemuxer::GetBuffered() and ConvertByteRangesToTimeRanges() when playing the first 5 minutes of this video (http://channel9.msdn.com/Shows/Visual-Studio-Toolbox/Visual-Studio-Debugging-Tips-and-Tricks which is much bigger than the one listed in bug 1227299 and served by a much faster server making it easier to test).

before:
mp4_demuxer::Index::ConvertByteRangesToTimeRanges() is called from 5 threads with 28s spent in each in QuickSort
after:
17ms on each 5 threads is spent on QuickSort.

Without patch 10, playback actually stall after 1 minute 30s until the video is fully buffered (which occurs after about 6 minutes) as the media taskqueues is actually too busy normalising the buffered range.
Assignee: nobody → jyavenard
An interval with a length of 0 doesn't really exist and will be removed when used in an IntervalSet. As such, calculating a Span with an empty intervals didn't really make sense
Attachment #8691753 - Flags: review?(gsquelart)
Attachment #8691753 - Flags: review?(gsquelart) → review+
Also fix constness.
Attachment #8691849 - Flags: review?(gsquelart)
Attachment #8691849 - Flags: review?(gsquelart) → review+
Some numbers after all the changes.

Time spent in NS_QuickSort playing https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/Visual-Studio-Debugging-Tips-and-Tricks over 5 minutes:

Before: 28.2s (out of 117.75s total CPU time or 23.94%)
After: 14ms (out of 84s total CPU time or 0.01%)
Above was in opt mode
JW, comparing the time between calculating MediaFormatReader::GetBufferedInternal vs nothing (I add a quick return) shows a difference of 35.6s CPU time still (84s vs 48.4)

Can probably still save a great amount of time in ConvertByteRangesToTimeRanges. While I have drastically reduced the time spent in QuickSort, the profiler shows that the greatest time now is spent in RangeFinder::Contains() which scan every single frames to find out if they are contained or not.

Can probably save some times there still, but re-adding the throttle may still be the quickest method is achieving what we want quickly.

In MediaDecoderReader::NotifyDataArrived, I see that we are often called while the cached range hasn't changed (likely because several 32kB blocks are fired at once, and by the time they are received on the reader's task queue the cache has been updated in between.

Starting to question the whole utility of NotifyDataArrived and immediately calling MediaDecoderReader::GetBuffered() this is what's killing CPU.

Could still call NotifyDataArrived() as often as now, but not calling GetBuffered that often.
Flags: needinfo?(jwwang)
https://hg.mozilla.org/mozilla-central/file/099f695d31326c39595264c34988a0f4b7cbc698/dom/media/mediasource/SourceBuffer.cpp#l484

We still need to call GetBuffered() as often as NotifyDataArrived() when the update is from MSE.

I will file a new bug to bring the throttling back.
Flags: needinfo?(jwwang)
Attachment #8691697 - Flags: review?(cpearce) → review+
Attachment #8691698 - Flags: review?(cpearce) → review+
Comment on attachment 8691699 [details] [diff] [review]
P3. Replace MediaByteBuffer with Interval<int64_t>.

Review of attachment 8691699 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaResource.h
@@ +131,5 @@
>  
>  // Represents a section of contiguous media, with a start and end offset.
>  // Used to denote ranges of data which are cached.
> +
> +class MediaByteRange : public media::Interval<int64_t> {

Any reason to not just "typedef media::Interval<int64_t> MediaByteRange" instead? To enforce that you can't intermix a different Interval<int64_t> with a MediaByteRange maybe?
Attachment #8691699 - Flags: review?(cpearce) → review+
Attachment #8691702 - Flags: review?(cpearce) → review+
Comment on attachment 8691703 [details] [diff] [review]
P7. Replace MediaByteRange with Interval<int64_t> typedef.

Review of attachment 8691703 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, I see this addresses my earlier comment. Great minds think alike. ;)
Attachment #8691703 - Flags: review?(cpearce) → review+
> Any reason to not just "typedef media::Interval<int64_t> MediaByteRange"
> instead? To enforce that you can't intermix a different Interval<int64_t>
> with a MediaByteRange maybe?

Because of the number of times you find forward declaration in the code and how it's used with nsTArray. 

I thought it was cleaner and made the patch smaller by doing it in two steps
Comment on attachment 8691706 [details] [diff] [review]
P10. Calculate mp4 buffered range incrementally.

Review of attachment 8691706 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/fmp4/MP4Demuxer.cpp
@@ +434,5 @@
>  
>    MonitorAutoLock mon(mMonitor);
> +  if (newData.Length() > 0) {
> +    nsTArray<mp4_demuxer::Interval<int64_t>> timeRanges;
> +    mIndex->ConvertByteRangesToTimeRanges(newData, &timeRanges);

It looks like Index::ConvertByteRangesToTimeRanges() only returns timestamps for frames that have a preceeding sync point. i.e. frames that can actually be decoded.

Doesn't that mean that if newData doesn't start with a key frame, which is especially likely in the non-MSE case, we could end up with holes in the buffered ranges?

For example, if newData starts in the middle of a GOP and ends in the middle of the next GOP, then we'll end up assuming everything from the previous keyframe to the following keyframe isn't buffered.

Did you test this in the non-MSE case? Note the built in video controls don't show the complete buffered ranges, they just show us having buffered from 0 to the end of the last buffered ranges.

You can use something like https://people.mozilla.org/~cpearce/buffered-demo.html to get a visual representation of the buffered ranges. You'll want to edit that page to load a bigger MP4 video obviously.

It looks to me like you need to do something to ensure that newData is expanded down to the preceding sync point on all streams, if that's buffered.

@@ +446,4 @@
>    }
> +  if (removedData.Length() > 0) {
> +    nsTArray<mp4_demuxer::Interval<int64_t>> timeRanges;
> +    mIndex->ConvertByteRangesToTimeRanges(removedData, &timeRanges);

Same problem here; you'll only remove time ranges which are decodable, i.e. if we drop a keyframe and half its GOP, the remaining half of the GOP should also be marked as non-buffered, but I don't see how you achieve that here.
Attachment #8691706 - Flags: review?(cpearce) → review-
(In reply to Chris Pearce (:cpearce) from comment #26)

> Doesn't that mean that if newData doesn't start with a key frame, which is
> especially likely in the non-MSE case, we could end up with holes in the
> buffered ranges?

this is what patch 9 correct.

Buffered range being approximate anyway, and that's exactly what webm does too: there's no reliance on keyframe.

So no we can't have hole as it is following patch 9. FWIW: I have been running all the mochitests with both the old and new method in parallel, asserting if there were any differences): there are none.



> 
> For example, if newData starts in the middle of a GOP and ends in the middle
> of the next GOP, then we'll end up assuming everything from the previous
> keyframe to the following keyframe isn't buffered.

not with patch 9 applied.

I'd say that it doesn't matter this information is never used for playback. If we want to play at a particular spot we will have to seek and necessary data will be retrieved then.
But 

> 
> Did you test this in the non-MSE case? Note the built in video controls
> don't show the complete buffered ranges, they just show us having buffered
> from 0 to the end of the last buffered ranges.

this code is only ever used for non-MSE. So of course I have tested with non-MSE :)
the entire bug is about a regression occurring when playing normal MP4.

This patch 10 is the actual patch for this bug. If you don't want this one, may as well close the bug and forget all the other patches.

> It looks to me like you need to do something to ensure that newData is
> expanded down to the preceding sync point on all streams, if that's buffered.

I disagree...

the buffered range is by itself an inaccurate approximated representation on how much data we have cached. It caches thing by block of 32kB.

> 
> Same problem here; you'll only remove time ranges which are decodable, i.e.
> if we drop a keyframe and half its GOP, the remaining half of the GOP should
> also be marked as non-buffered, but I don't see how you achieve that here.

There is no problem, only if you don't apply the entire patch stack
(In reply to Chris Pearce (:cpearce) from comment #26)

> Same problem here; you'll only remove time ranges which are decodable, i.e.
> if we drop a keyframe and half its GOP, the remaining half of the GOP should
> also be marked as non-buffered, but I don't see how you achieve that here.

forgot to add: this isn't MSE, we can't drop a keyframe. things can be removed from the cache, but the buffered range of the data we have cached. I don't see in the spec that the buffered range must be of something that is decodable.
"The buffered attribute must return a new static normalized TimeRanges object that represents the ranges of the media resource,"

as such, we are exactly per spec, the buffered range being of the media resource, not the media element.
The Ogg and WebM backends only report the buffered ranges as the time ranges that are playable, so we should have MP4 as good, if not better, than those backends.
Also to be clear, I think the idea of memoizing something here to make buffered range calculation faster is a good idea. I'm not trying to discourage that.
Comment on attachment 8691704 [details] [diff] [review]
P8. Ignore keyframes data when calculating buffered range.

based on comment for patch 10, this needs to be reworked differently.
Attachment #8691704 - Attachment is obsolete: true
Attachment #8691704 - Flags: review?(ajones)
Comment on attachment 8691705 [details] [diff] [review]
P9. Considered frames to be buffered when they are partially contained within a byte range.

based on comment for patch 10, this can now longer be used if we want an accurate buffered range
Attachment #8691705 - Attachment is obsolete: true
Attachment #8691705 - Flags: review?(ajones)
We now use an index of samples made of block of samples delimited by keyframes. The search is performed using a binary search. This allows to quickly find which blocks are found within the media cache.
On a 8 core mac pro, this leads to a 67% improvement on CPU time spent playing a long MP4 video (from 112s CPU to 45s CPU)

The optimisation is only possible if all mp4 data chunks are continuous (which they almost always are)
Attachment #8692756 - Flags: review?(cpearce)
Will facilitate the replacement of MediaByteRange by Interval<int64_t>
Attachment #8692767 - Flags: superreview+
An interval with a length of 0 doesn't really exist and will be removed when used in an IntervalSet. As such, calculating a Span with an empty intervals didn't really make sense
Attachment #8692770 - Flags: review+
It's now okay to simplify.
Attachment #8692773 - Flags: review+
The Index and MoofParser are now only used via the MP4Demuxer which is guaranteed to always be called on the same taskqueue.
Attachment #8692774 - Flags: review?(cpearce)
Attachment #8692775 - Flags: review?(cpearce)
Also fix constness.
Attachment #8692776 - Flags: review+
Attachment #8691697 - Attachment is obsolete: true
Attachment #8691698 - Attachment is obsolete: true
Attachment #8691699 - Attachment is obsolete: true
Attachment #8691700 - Attachment is obsolete: true
Attachment #8691701 - Attachment is obsolete: true
Attachment #8691702 - Attachment is obsolete: true
Attachment #8691703 - Attachment is obsolete: true
Attachment #8691706 - Attachment is obsolete: true
Attachment #8691707 - Attachment is obsolete: true
Attachment #8691707 - Flags: review?(cpearce)
Attachment #8691708 - Attachment is obsolete: true
Attachment #8691708 - Flags: review?(cpearce)
Attachment #8691753 - Attachment is obsolete: true
Attachment #8691849 - Attachment is obsolete: true
Attachment #8691860 - Attachment is obsolete: true
Attachment #8691860 - Flags: review?(cpearce)
Attachment #8692756 - Attachment is obsolete: true
Attachment #8692756 - Flags: review?(cpearce)
All samples in an audio track are keyframes. As such, use block on 128 samples instead.
Attachment #8692779 - Flags: review?(cpearce)
Attachment #8692774 - Flags: review?(cpearce) → review+
Attachment #8692775 - Flags: review?(cpearce) → review+
Attachment #8692777 - Flags: review?(cpearce) → review+
Attachment #8692756 - Attachment is obsolete: false
Attachment #8692756 - Flags: review?(cpearce)
benchmarking.
STR:
1- Load page https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/Visual-Studio-Debugging-Tips-and-Tricks, got into preferences -> advanced, clear cache, then go into privacy and clear all history
2- Press Play

Time CPU usage after 5 minutes playback. I download the file at 2MB/s

All were run 3 times to ensure the times are consistent (they always are within 1s of one another)
With P14, P13: 53s
With P13: 52s
With none: 117s : playback will pause and stutter after 2 minutes as the array being sorted becomes too big.

CPU usage is around 20% playing the video with P13, and exceed 200% without.

Looks like we may not need bug 1228187 afterall
GetBuffered() now adds less than 2s CPU over 5 minutes playback.
P14 allows to drop memory usage by 3.4MB just for handling the audio track with barely any impact on CPU.
(In reply to Jean-Yves Avenard [:jya] from comment #47)
> Looks like we may not need bug 1228187 afterall
Great!
(In reply to JW Wang [:jwwang] from comment #50)
> (In reply to Jean-Yves Avenard [:jya] from comment #47)
> > Looks like we may not need bug 1228187 afterall
> Great!

I still think it should be done, because updating it every 32kB makes little sense.

roc mentioned on IRC that we should guaranteed that the media resource is pinned for the cycle at which the buffered range is returned. Ensuring that we can seek within the buffered range without never having to redownload new data (when seeking, the media resource is immediately pinned).

If doing so, I suggest that we pass the MediaByteRangeSet (which replaces the nsTArray<MediaByteRange>) as argument to GetBuffered() and have all demuxers and older readers calculate the buffered range from that range rather than having them individually query the MediaResource for the cache range.

I'll open a bug for that
(In reply to Jean-Yves Avenard [:jya] from comment #51)
> roc mentioned on IRC that we should guaranteed that the media resource is
> pinned for the cycle at which the buffered range is returned. Ensuring that
> we can seek within the buffered range without never having to redownload new
> data (when seeking, the media resource is immediately pinned).

Since the buffer ranges are calculated in the reader thread and mirrored to the main thread. The cache data might have been evicted by the time seeking happens on the main thread. I wonder how pinning will help for that case.
All samples in an audio track are keyframes. As such, use block on 128 samples instead.
Attachment #8692900 - Flags: review?(cpearce)
Attachment #8692779 - Attachment is obsolete: true
Attachment #8692779 - Flags: review?(cpearce)
git bz didn't attach my comment so here it is again:

If the last audio data block contained some samples but neither the start nor the end was in the data cache, the samples would have been ignored.
So now check that the last block intersects with the buffered data and if so, look for the frames in it.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ef7840aa0ff

try run with assert comparing the previous algorithm and the new one to check that everything is consistent.
Comment on attachment 8692756 [details] [diff] [review]
P13. Refactor how MP4 buffered range is calculated.

Review of attachment 8692756 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libstagefright/binding/Index.cpp
@@ +254,1 @@
>      for (size_t i = 0; i < aIndex.Length(); i++) {

You could do:

for (const Indice& indice : aIndex) {
  //...

As the C++ standard guarantees the traversal will be in begin() to end() order.

@@ +301,5 @@
> +      auto& last = mDataOffset.LastElement();
> +      last.mEndOffset = aIndex.LastElement().end_offset;
> +      last.mTime = Interval<int64_t>(intervalTime.GetStart(), intervalTime.GetEnd());
> +    } else {
> +      mDataOffset.Clear();

Couldn't you break out of the loop above when you set progressive = false, as you know you'll throw away everything in mDataOffset if progressive is false? That way you don't spend the time doing work you're going to throw away.
Attachment #8692756 - Flags: review?(cpearce) → review+
Attachment #8692900 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #56)

> Couldn't you break out of the loop above when you set progressive = false,
> as you know you'll throw away everything in mDataOffset if progressive is
> false? That way you don't spend the time doing work you're going to throw
> away.

The loop is also filling up the samples table which will we need regardless (to demux)
Blocks: 1229339
Depends on: 1234778
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: