Closed Bug 1558364 Opened 5 years ago Closed 5 years ago

Poor performance of TrackBuffersManager::CodedFrameProcessing() due to mInputBuffer.RemoveElement(0,N) on Twitch

Categories

(Core :: Audio/Video: Playback, defect, P2)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: cpearce, Assigned: cpearce)

Attachments

(2 files)

Profiling https://www.twitch.tv/videos/427206039 on the HD8 tablet (with my patches from Bug 1554075 applied), I see a lot of time spent in TrackBuffersManager::CodedFrameProcessing() calling nsTArray::RemoveElementAt():

https://perfht.ml/2K9Ydb3

I believe this is caused by the call to mInputBuffer->RemoveElementsAt(0, length);.

To test this, I added logging of the number of bytes being removed from the front of the array versus the array's total length. Sure enough, I see output like this (collected on desktop Firefox on Linux):

[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 493) of 20896
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] ProcessTasks AppendBuffer=829613
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] ProcessTasks AppendBuffer=829613 (done) len(mInputBuffer)=829613
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=829613
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=829613
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 6900) of 829613
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=20403
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=20403
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 469) of 20403
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=822713
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=822713
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 27985) of 822713
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19934
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19934
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 434) of 19934
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=794728
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=794728
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19500
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19500
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 489) of 19500
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=794728
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 7683) of 794728
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19011
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=19011
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 496) of 19011
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=787045
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=787045
[Child 14571: MediaPlayback #2]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 6766) of 787045
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=18515
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=18515
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 500) of 18515
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=780279
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=780279
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 22180) of 780279
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=18015
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=18015
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 517) of 18015
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=758099
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=758099
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=758099
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 6471) of 758099
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=17498
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=17498
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 488) of 17498
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=751628
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=751628
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 6029) of 751628
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=17010
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] SegmentParserLoop mProcessedInput=620 len(mInputBuffer)=17010
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c6fd4000] CodedFrameProcessing() removeElements(0, 494) of 17010
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=745599
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] SegmentParserLoop mProcessedInput=734 len(mInputBuffer)=745599
[Child 14571: MediaPlayback #3]: V/cpearce TrackBuffersManager[0x7fc1c3ff6000] CodedFrameProcessing() removeElements(0, 25106) of 745599

This log shows a big append followed by a large number of relatively small RemoveElementsAt(0, N) calls that snip off the bytes consumed. Each of these calls requires memmoving the remaining content of the nsTArray<uint8_t> down by N bytes. So in the extreme cases in the above log, in order to slice off ~500 bytes from the front, we end up having to memmove almost 800,000 bytes, and we keep memmoving the entire remaining data every time we remove ~500 bytes from the front, until mInputBuffer is empty.
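To put a rough number on that cost, here is a minimal C++ sketch that counts how many bytes get shifted while a buffer is drained by repeated front removals. The function name is hypothetical and the figures in the note below are round illustrative values, not measurements from Gecko or from the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts the bytes shifted when a contiguous buffer holding `total` bytes is
// drained by repeatedly removing up to `chunk` bytes from its front.
// Each removal moves every byte that remains after the removed prefix.
std::uint64_t BytesMovedByFrontRemoval(std::size_t total, std::size_t chunk) {
  assert(chunk > 0);
  std::uint64_t moved = 0;
  std::size_t remaining = total;
  while (remaining > 0) {
    const std::size_t n = remaining < chunk ? remaining : chunk;
    remaining -= n;
    moved += remaining;  // the tail is shifted down by n bytes
  }
  return moved;
}
```

Under this model, draining 800,000 buffered bytes in 500-byte removals shifts 639,600,000 bytes in total, whereas advancing a start offset instead would shift none.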

We can speed this up. A couple of ideas:

  • We could change TrackBuffersManager::mInputBuffer to a ring buffer. It's not obvious that all the demuxers would handle that gracefully: if the ring buffer's content wrapped past the end of its storage, we'd need to pass demuxers two slices, one from the start of the content to the end of the storage, and a second from the beginning of the storage to the end of the content. It looks to me like the WebMParser could handle that, but it's not clear the MP4 parser would.
  • Instead of calling RemoveElementsAt(0, N) on TrackBuffersManager::mInputBuffer, we could maintain a slice into this array and increment its start by N elements. We'd need to pass this slice to anything that previously read TrackBuffersManager::mInputBuffer directly. This is only a good idea if we can be sure that we always process everything in mInputBuffer before appending more data onto it. Otherwise the capacity of its underlying storage could grow without bound.
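The two-slice complication of the ring buffer idea can be sketched as below. This is only an illustration using standard-library types: ByteSlice and ReadableSlices are hypothetical names, not existing Gecko APIs, and a real demuxer interface would still need to accept both pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical non-owning view of contiguous bytes.
struct ByteSlice {
  const std::uint8_t* data;
  std::size_t length;
};

// Returns the readable content of a ring buffer as at most two contiguous
// slices. `start` is the index of the first readable byte and `length` is the
// number of readable bytes; the content may wrap past the end of `storage`.
std::pair<ByteSlice, ByteSlice> ReadableSlices(
    const std::vector<std::uint8_t>& storage, std::size_t start,
    std::size_t length) {
  assert(!storage.empty());
  assert(start < storage.size());
  assert(length <= storage.size());
  const std::size_t untilEnd = storage.size() - start;
  if (length <= untilEnd) {
    // No wrap: a single slice covers everything, the second one is empty.
    return {ByteSlice{storage.data() + start, length}, ByteSlice{nullptr, 0}};
  }
  // Wrapped: the first slice runs to the end of the storage, the second
  // continues from the beginning of the storage.
  return {ByteSlice{storage.data() + start, untilEnd},
          ByteSlice{storage.data(), length - untilEnd}};
}
```

A parser that needs one contiguous range would have to copy the two slices together first, which is the concern raised about the MP4 parser.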

As seen in this profile of a Twitch replay: https://perfht.ml/2K9Ydb3 we can
often end up spending time in TrackBuffersManager::CodedFrameProcessing()
shaving off bytes from the front of TrackBuffersManager::mInputBuffer. This
requires all the remaining bytes to be memmove'd down to the start of this
array. Sometimes we have close to 1MB in that buffer, and when we're just
trying to consume a few hundred bytes, that becomes high overhead.

So instead of using this "slice off, shuffle down" approach, change
TrackBuffersManager::mInputBuffer to be a new type MediaSpan, which maintains a
RefPtr to a MediaByteBuffer and a span defining the subregion of the buffer we
care about. This means the RemoveElementsAt(0,N) operation becomes basically
free, and we can eliminate a few other copies we were doing as well.
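
The idea can be sketched with standard-library types as follows. This is an
illustration only, not the MediaSpan class from the patch: SpanSketch and its
method names are hypothetical, and std::shared_ptr and std::vector stand in
for RefPtr and MediaByteBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using ByteBuffer = std::vector<std::uint8_t>;

// Shared ownership of a byte buffer plus a window [mStart, mStart + mLength)
// describing the part of the buffer that is still of interest.
class SpanSketch {
 public:
  explicit SpanSketch(std::shared_ptr<ByteBuffer> aBuffer)
      : mBuffer(std::move(aBuffer)), mStart(0), mLength(0) {
    assert(mBuffer);
    mLength = mBuffer->size();
  }

  const std::uint8_t* Elements() const { return mBuffer->data() + mStart; }
  std::size_t Length() const { return mLength; }

  // Drops aNumBytes from the front by moving the window start forward.
  // No bytes are copied or shifted, so the cost is constant.
  void RemoveFront(std::size_t aNumBytes) {
    assert(aNumBytes <= mLength);
    mStart += aNumBytes;
    mLength -= aNumBytes;
  }

 private:
  std::shared_ptr<ByteBuffer> mBuffer;
  std::size_t mStart;
  std::size_t mLength;
};
```

Copying a SpanSketch copies only the pointer and the two offsets, so two
spans can view the same bytes without duplicating them.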

This allows us to avoid a (probably small) copy when we stash the pending input.

Depends on D34661

Priority: -- → P2
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2e64d8db2b4b
Add MediaSpan and use it for TrackBuffersManager::mInputBuffer. r=jya
https://hg.mozilla.org/integration/autoland/rev/098ce3586133
Convert TrackBuffersManager::mPendingInput into a MediaSpan. r=jya
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6df4eb84932
Add MediaSpan and use it for TrackBuffersManager::mInputBuffer. r=jya
https://hg.mozilla.org/integration/autoland/rev/f53080eb8868
Convert TrackBuffersManager::mPendingInput into a MediaSpan. r=jya
Flags: needinfo?(cpearce)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Regressions: 1565501