Bug 1195864
Opened 10 years ago
Updated 3 years ago
Don't do YUV -> RGB conversion for video frames that will be discarded
Categories
(Core :: Audio/Video: Playback, defect, P3)
NEW
People
(Reporter: mattwoodrow, Unassigned)
Details
Attachments
(1 file)
patch, 31.49 KB, review+ (jya)
When seeking, we seek to the previous keyframe, and then decode frames from there to the actual seek location, discarding all the intermediate frames.
On Windows the YUV -> RGB conversion is optional, but we're doing it for these intermediate frames since the decoder doesn't know that they're going to be discarded.
This can potentially be a lot of work, and we're trying to do it all ASAP rather than at the playback framerate. I believe this is causing us to drop frames, since we wait up to 10ms on the first frame we try to present, without taking into account all the colour conversion work we just scheduled.
I think we could fix this by having MediaFormatReader remember the actual seek time that was requested, and have a way for MediaDataDecoder::Input/MediaRawData to specify that we don't want any Output() calls for this.
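As a standalone illustration of that idea, here is a minimal sketch (all names here are hypothetical, not the actual Gecko API): during a seek, every raw sample whose presentation interval ends at or before the requested seek target is decode-only, so its conversion can be skipped.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the real MediaRawData: a sample carries a flag
// saying its decoded output will be thrown away.
struct RawSample {
  int64_t mTime;      // presentation time, microseconds
  int64_t mDuration;  // microseconds
  bool mWillBeDiscarded = false;
};

// Mark the samples between the previous keyframe and the seek target.
// A frame is discarded if it ends at or before the seek target, so no
// YUV -> RGB conversion needs to be scheduled for it.
void MarkDiscardableSamples(std::vector<RawSample>& aSamples,
                            int64_t aSeekTargetUsecs) {
  for (RawSample& s : aSamples) {
    s.mWillBeDiscarded = (s.mTime + s.mDuration) <= aSeekTargetUsecs;
  }
}
```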
Comment 1•10 years ago
I may not have all the context but I'm thinking of something along the lines of:
For example:
Output(TrackType aType, LazyPromise<MediaData*> aSample)
template<class T>
class LazyPromise {
virtual Promise<T> Get() = 0;
};
So the Output() function can either call Get() to start the YUV conversion, or not.
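A self-contained sketch of that idea, simplified so a plain std::function stands in for Gecko's promise machinery (names are illustrative only):

```cpp
#include <functional>
#include <utility>

// Minimal lazy-evaluation wrapper: the expensive work (e.g. the YUV -> RGB
// conversion) runs only if the consumer actually calls Get().
template <class T>
class LazyPromise {
 public:
  explicit LazyPromise(std::function<T()> aWork) : mWork(std::move(aWork)) {}
  T Get() { return mWork(); }  // triggers the deferred work
 private:
  std::function<T()> mWork;
};
```

Output() would then receive the LazyPromise and simply never call Get() for frames it knows will be discarded, so the conversion is never scheduled.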
Reporter
Comment 2•10 years ago
Assignee: nobody → matt.woodrow
Attachment #8650068 -
Flags: review?(jyavenard)
Reporter
Comment 3•10 years ago
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #1)
> I may not have all the context but I'm thinking of something along the lines
> of:
>
> For example:
>
> Output(TrackType aType, LazyPromise<MediaData*> aSample)
>
> template<class T>
> class LazyPromise {
> virtual Promise<T> Get() = 0;
> };
>
> So the Output() function can either call Get() to start the YUV conversion,
> or not.
This is nice, but I don't think it's worth implementing the LazyPromise machinery just for the current bug.
Reporter
Comment 4•10 years ago
Comment on attachment 8650068 [details] [diff] [review]
Skip RGB conversion for WMF if we can
Review of attachment 8650068 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
@@ +310,5 @@
> }
>
> HRESULT hr = mDecoder->CreateInputSample(aSample->Data(),
> uint32_t(aSample->Size()),
> + aSample->mWillBeDiscarded ? -1 : aSample->mTime,
I think I'll switch this to use INT64_MIN as the magic value instead of -1.
Reporter
Comment 5•10 years ago
Comment on attachment 8650068 [details] [diff] [review]
Skip RGB conversion for WMF if we can
Review of attachment 8650068 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
@@ +310,5 @@
> }
>
> HRESULT hr = mDecoder->CreateInputSample(aSample->Data(),
> uint32_t(aSample->Size()),
> + aSample->mWillBeDiscarded ? -1 : aSample->mTime,
It appears that only specific properties are copied from the input sample to the output sample, so we can't easily pass the discard flag without hijacking some other property.
Switching the presentation timestamp to something crazy (like INT64_MIN) seems like it should be safe, even if a bit ugly.
I'll add an inline comment to explain why we're doing this.
Comment 6•10 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #4)
> Comment on attachment 8650068 [details] [diff] [review]
> Skip RGB conversion for WMF if we can
>
> Review of attachment 8650068 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
> @@ +310,5 @@
> > }
> >
> > HRESULT hr = mDecoder->CreateInputSample(aSample->Data(),
> > uint32_t(aSample->Size()),
> > + aSample->mWillBeDiscarded ? -1 : aSample->mTime,
>
> I think I'll switch this to use INT64_MIN as the magic value instead of -1.
std::optional has better semantics than a null sentinel.
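For comparison, a minimal sketch of the two styles (illustrative only; Gecko code of that era would use mozilla::Maybe rather than std::optional):

```cpp
#include <cstdint>
#include <optional>

// Sentinel style: every caller must remember that INT64_MIN is magic.
bool IsDiscardedSentinel(int64_t aTime) { return aTime == INT64_MIN; }

// Optional style: absence is part of the type, so there is no magic value
// to forget or to collide with a legitimate timestamp.
bool IsDiscardedOptional(const std::optional<int64_t>& aTime) {
  return !aTime.has_value();
}
```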
Comment 7•10 years ago
Comment on attachment 8650068 [details] [diff] [review]
Skip RGB conversion for WMF if we can
Review of attachment 8650068 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with comments addressed.
::: dom/media/AbstractMediaDecoder.h
@@ +54,5 @@
> + , mType(SeekTarget::Invalid)
> + , mEventVisibility(MediaDecoderEventVisibility::Observable)
> + {
> + }
> + SeekTarget(int64_t aTimeUsecs,
This would be a good time to use TimeUnit to avoid any future errors of usecs vs. seconds or something else... they keep popping up.
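A minimal sketch of what such a strongly-typed wrapper buys (illustrative, not the actual media::TimeUnit interface): the constructor is private, so conversions are explicit and a usecs-vs-seconds mixup can't happen silently.

```cpp
#include <cstdint>

// Hypothetical strongly-typed time value in the spirit of media::TimeUnit.
// Raw integers can't be passed where a TimeUnit is expected; the unit is
// always named at the call site.
class TimeUnit {
 public:
  static TimeUnit FromMicroseconds(int64_t aUs) { return TimeUnit(aUs); }
  static TimeUnit FromSeconds(double aSeconds) {
    return TimeUnit(static_cast<int64_t>(aSeconds * 1e6));
  }
  int64_t ToMicroseconds() const { return mUs; }
 private:
  explicit TimeUnit(int64_t aUs) : mUs(aUs) {}
  int64_t mUs;
};
```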
::: dom/media/MediaData.h
@@ +110,5 @@
> , mDuration(0)
> , mFrames(aFrames)
> , mKeyframe(false)
> , mDiscontinuity(false)
> + , mWillBeDiscarded(false)
The addition of Opus support in MP4 will require that we have a "pre-skip" or "pre-roll" field (see bug 1168674). This could be used and provide more information than a simple boolean.
But I guess this will do for now
::: dom/media/MediaDecoderStateMachine.cpp
@@ +1579,5 @@
> // Do the seek.
> nsRefPtr<MediaDecoderStateMachine> self = this;
> mSeekRequest.Begin(ProxyMediaCall(DecodeTaskQueue(), mReader.get(), __func__,
> &MediaDecoderReader::Seek, mCurrentSeek.mTarget.mTime,
> + mCurrentSeek.mTarget.mType,
Why don't we just pass mCurrentSeek.mTarget instead? It would be cleaner.
::: dom/media/MediaFormatReader.h
@@ +417,5 @@
> OnSeekFailed(TrackType::kAudioTrack, aFailure);
> }
> // Temporary seek information while we wait for the data
> Maybe<media::TimeUnit> mOriginalSeekTime;
> Maybe<media::TimeUnit> mPendingSeekTime;
If we passed the SeekTarget object instead, the above would become Maybe<SeekTarget> mOriginalSeek; Maybe<SeekTarget> mPendingSeek;
::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
@@ +310,5 @@
> }
>
> HRESULT hr = mDecoder->CreateInputSample(aSample->Data(),
> uint32_t(aSample->Size()),
> + aSample->mWillBeDiscarded ? -1 : aSample->mTime,
I think passing a crazy value will likely greatly confuse the decoder reordering queue. This controls how quickly the decoder returns something too.
It may work, but it would be something totally unsupported that could break at any time.
There are other fields that you could use that would be much less likely to break stuff.
Attachment #8650068 -
Flags: review?(jyavenard) → review+
Reporter
Comment 8•10 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #7)
>
> I think passing a crazy value will likely greatly confuse the decoder
> reordering queue. This controls how quickly the decoder returns something
> too.
>
> It may work, but it would be something totally unsupported that could break
> at any time.
>
> There are other fields that you could use that would be much less likely to
> break stuff.
Oh man, I totally forgot that the WMF decoder does reordering internally, you're right, this will probably break things.
Any idea what other fields we could use? The docs don't specify which fields get copied from the input to output, and testing a few unrelated ones suggests that only the ones specific to IMFSample do. I don't see any other options on that class that could be (ab)used for this purpose.
Comment 9•10 years ago
I think we could kill two birds with one stone here.
We should simply keep an array (or some other appropriate structure) of the original MediaRawData appended since the last keyframe. So when the image comes out of the decoder, we look up the original MediaRawData and retrieve the data we care about.
The second thing we could do once we have this in place is solve the draining issue that leaves the WMF decoder, once drained, unable to decode anything until a new keyframe is added.
That way once we drain, we have all the information we need to re-inject all the frames since the previous keyframe. All frames that were flushed since the last drain would be marked to be dropped.
And there, drain problem solved. We can start decoding the new frame after a drain.
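A rough sketch of the lookup structure being proposed (hypothetical names; the real implementation would live in the platform decoder wrapper): inputs since the last keyframe are keyed by presentation time, so a decoded frame can be matched back to its input record without round-tripping flags through the decoder.

```cpp
#include <cstdint>
#include <map>

// Per-input bookkeeping we want back when the decoded frame emerges.
struct InputRecord {
  bool mWillBeDiscarded;
};

class InputWindow {
 public:
  void OnInput(int64_t aTime, bool aKeyframe, bool aWillBeDiscarded) {
    if (aKeyframe) {
      mSinceKeyframe.clear();  // a new keyframe starts a fresh window
    }
    mSinceKeyframe[aTime] = InputRecord{aWillBeDiscarded};
  }
  // Look up the input record matching a decoded frame's timestamp.
  bool ShouldDiscardOutput(int64_t aTime) const {
    auto it = mSinceKeyframe.find(aTime);
    return it != mSinceKeyframe.end() && it->second.mWillBeDiscarded;
  }
 private:
  std::map<int64_t, InputRecord> mSinceKeyframe;
};
```

The same window would hold everything needed to re-inject the frames since the previous keyframe after a drain, marking the already-flushed ones as dropped.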
Comment 10•10 years ago
Re-reading this bug description, I don't see how the YUV conversion could cause us to drop frames...
Why would the time measured to determine whether we should deactivate DXVA include the YUV conversion?
What exactly is measured?
It can easily take over 100ms between the time you feed the first frame and the time the decoder outputs its first frame, as the WMF decoder queues a lot of frames (typically over 25 on Windows 7).
25 frames to decode is quite a lot.
The Intel chipset takes between 6 and 30ms to decode a 1080p frame on my laptop. So up to 500ms before anything is output.
So that's probably not what you're measuring. The MDSM will drop the frame as soon as it reaches it; it's never rendered.
So how can the YUV->RGB conversion time matter in deciding if we use DXVA or not?
I must be missing a piece of the puzzle
Reporter
Comment 11•10 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> I think we could kill two birds with one stone here.
>
> We should simply keep an array (or some over appropriate structure) of the
> original MediaRawData appended since the last keyframe. So when the image
> comes out of the decoder, we look for the original MediaRawData and retrieve
> the data we care about.
That seems workable; an array should be fine since N should always be small (<= 25).
>
> The second thing we could do once we have this in place is solve the
> draining issue that causes the WMF decoder once drained to be unable to
> decode anything until a new key frame is added.
> That way once we drain, we have all the information we need to re-inject all
> the frames since the previous keyframe. All frames that were flushed since
> the last drain would be marked to be dropped.
> And there, drain problem solved. We can start decoding the new frame after a
> drain.
I wasn't aware this was an issue, when do we Drain() and then expect to pick up where we left off?
Implementing Anthony's suggestion would probably be the cleanest for my bug, since we can make the decision in MediaFormatReader where we know the seek time, and don't need to worry about copying the data around between input/output samples.
Solving both with a single solution is nice though, so I'm happy to do that.
Reporter
Comment 12•10 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #10)
> Re reading this bug description, I don't see how the YUV conversion could
> cause to drop frames...
> Why would time measured to determine if we should deactivate DXVA be
> including the YUV conversion?
>
> What is exactly measured?
>
> It can take easily over 100ms between the time you're feeding the first
> frame and the time the decoder output its first frame, as the WMF decoder
> queue a lot of frames (typically over 25 on windows 7)
> 25 frames to decode is quite a lot.
> The Intel chipset takes between 6 and 30ms to decode a 1080p frame on my
> laptop. So up to 500ms before anything is output.
>
> So that's probably not what you're measuring. The MDSM will drop the frame
> as soon as the frame reaches it, it's never rendered.
>
> So how can the YUV->RGB conversion time matter in deciding if we use DXVA or
> not?
>
> I must be missing a piece of the puzzle
The YUV->RGB conversion is done asynchronously, and often doesn't finish until long after we've put the frame into MDSM's video queue.
Once the MDSM decides that it wants to present a given frame, it checks to see if this conversion has completed (and gives it up to 10 additional milliseconds). If it's still not ready, then we discard it and record a 'corrupt' frame.
When seeking we're going to queue up a lot of this asynchronous work for the dropped frames, and not wait for it to complete. Then when the frame we actually want is decoded, we try to present it immediately. The colour conversion work for this frame is behind all the dropped frames in the GPU pipeline, but we still only wait 10ms for it to complete.
This could easily happen multiple times in a row depending on how fast the GPU is decoding and colour converting the frames.
It's also just better to avoid doing work that we don't have to do; it should save battery.
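A standalone sketch of the timed wait described above (illustrative only; the real Gecko code lives in the D3D image/compositor path and is not written in std::condition_variable terms): the presenter waits at most a fixed budget, say 10ms, for the asynchronous conversion, and treats the frame as corrupt if it misses.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Models "check whether the async YUV -> RGB conversion has completed,
// waiting up to a small time budget before giving up on the frame".
class AsyncConversion {
 public:
  void MarkDone() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mDone = true;
    }
    mCond.notify_all();
  }
  // Returns true if the conversion finished within the budget.
  bool WaitUpTo(std::chrono::milliseconds aBudget) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCond.wait_for(lock, aBudget, [this] { return mDone; });
  }
 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mDone = false;
};
```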
Updated•10 years ago
Priority: -- → P2
Mass change P2 -> P3
Priority: P2 → P3
Comment 14•3 years ago
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: matt.woodrow → nobody
Updated•3 years ago
Severity: normal → S3