Handle AV sync in MediaDecoderStateMachine for EME HW CDM

NEW
Unassigned

Status

()

4 years ago
11 months ago

People

(Reporter: kikuo, Unassigned)

Tracking

(Depends on: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments, 4 obsolete attachments)

644.21 KB, image/png
Details
104.30 KB, image/png
Details
195.36 KB, patch
Details | Diff | Splinter Review
32.15 KB, patch
jwwang
: feedback+
Details | Diff | Splinter Review
2.77 KB, patch
jwwang
: feedback+
Details | Diff | Splinter Review
18.79 KB, patch
cpearce
: feedback+
Details | Diff | Splinter Review
(Reporter)

Description

4 years ago
1. Abstract general AV sync logic for original MediaDecoderStateMahcine into General AV Sync Controller (AVSyncCtrl)
2. Implement EMEHWOverlayAVSyncCtrl for HW CDM rendering.
(Reporter)

Updated

4 years ago
Blocks: 1146791
(Reporter)

Comment 1

4 years ago
Created attachment 8600697 [details] [diff] [review]
[WIP] Abstract AdvanceFrame logic out of MediaDecoderStateMachine into GeneralAVSynchronizer_242138

1. Attempt to centralize buffering information and operation API into IBufferingInfo class
2. Add a BaseAVSynchronizer class to provide APIs for create derived classes and call DoAdvanceFrame() of inherited classes.
3. Add GeneralAVSynchronizer to handle original AdvanceFrame() logic in DoAdvanceFrame().
Assignee: nobody → kikuo
Status: NEW → ASSIGNED
(Reporter)

Updated

4 years ago
Attachment #8600697 - Attachment is patch: true

Updated

4 years ago
Component: General → General
Product: Firefox OS → Firefox

Updated

4 years ago
Component: General → General
Product: Firefox → Core

Updated

4 years ago
Component: General → Video/Audio
(Reporter)

Comment 2

4 years ago
Created attachment 8602829 [details] [diff] [review]
[WIP] Extract logic of AdvanceFrame into a newly created GeneralAVSynchronizer_242626.diff

1. Remove unnecessary coding style correction.
2. Focus on AdvanceFrame() and centralize the code of checking the need of buffering.
Attachment #8600697 - Attachment is obsolete: true
(Reporter)

Comment 3

4 years ago
Created attachment 8602830 [details]
Class diagram for AVSynchronizer_v1.png
(Reporter)

Comment 4

4 years ago
Comment on attachment 8602829 [details] [diff] [review]
[WIP] Extract logic of AdvanceFrame into a newly created GeneralAVSynchronizer_242626.diff

Hello R.O.C,

Please forgive me for taking the liberty of asking you for a favor.
Would you please take some time to help review the patch and give me some feedbacks ?

The thing is, IIRC, when I had a pitch about new GMP API for EME HW CDM on March, A.Jones mentioned that you seemed to plan a code refactoring regarding AdvanceFrame in MediaDecoderStateMachine.
I'm also trying to do this now to extend the ability so that the MediaDecoderStateMachine will be capable of handling corresponding synchronization and buffering mechanism with a HW CDM which does decrypt, decode and render.

So I try to get my foot in the door.
1. Add additional DECODER_STATE_CREATE_SYNCHRONIZER, this phase should be between DECODER_STATE_WAIT_FOR_CDM and DECODER_STATE_DECODING_FIRSTFRAME. So that StateMachine could decide which kind of synchronizer to be created.
2. Move the part of code from MediaDecoderStateMachine::AdvanceFrame into class GeneralAVSynchronizer.
3. Centralize the code which determines whether to start buffering or not.
(You could see Attachment 8602830 [details], it's a simple UML to describe the newly added attributes/methods/classes. Hope it helps)
4. Add an ExternalAVSynchronzier class, but not implemented yet. Because the original advanceframe and buffering machinery would not be suitable for it (may could be, but a lot of code could be dropped) 

I consider the 2nd step could be that storing some default threshold values in synchronizer and let synchronizer handle buffering statistics. In that case, it would be more adaptive to different situation.
Hope my explanation is clear and thanks for your time.
Please let me know if you need more details, thanks.
Attachment #8602829 - Flags: feedback?(robert)
The work in bug 1143575 is to have MediaDecoderStateMachine submit its entire queue of timestamped images to the compositor, and make the compositor responsible for picking which image to display at any given moment. It doesn't change buffering at all.

I guess that with the EME HW CDM we will not be able to take advantage of that, because our compositor will not be able to composite individual frames from the CDM, right?

Can you explain what kind of API the EME HW CDM exposes for A/V sync? Are we playing audio samples in our code and telling the CDM how to align its video timestamps with real time?

Why do you need to change the buffering logic?
(Reporter)

Comment 7

4 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> The work in bug 1143575 is to have MediaDecoderStateMachine submit its
> entire queue of timestamped images to the compositor, and make the
> compositor responsible for picking which image to display at any given
> moment. It doesn't change buffering at all.
> 
> I guess that with the EME HW CDM we will not be able to take advantage of
> that, because our compositor will not be able to composite individual frames
> from the CDM, right?

Right, here this bug, the goal is aimed to explore a data path for external decoder/renderer through MediaDecoderStateMachine. Hence we're not gonna to use our compositor to composite frames even decode frames and doing av synchronization.

> 
> Can you explain what kind of API the EME HW CDM exposes for A/V sync? Are we
> playing audio samples in our code and telling the CDM how to align its video
> timestamps with real time?

First, the answer to the 2nd question is No. We're not going to play A/V in our code. 
EME HW CDM (especially for 4K TV case) may handle all stuff (including decrypt, decode, render, and av sync), as our TV partner does. We only need to pass raw data(encrypted & undecoded) to CDM, see Bug 1146794.
You could find my previous proposal here, https://goo.gl/K0K45M (Architecture at page 6, APIs at page 7, sequence diagram at page 8)

We provide control-like APIs (Play/Seek/Pause), data requesting callback APIs (NotifyAudioInputNext/NotifyVideoInputNext) and data offering APIs (RenderAudioPacket/RenderVideoPackt)...etc,
so that CDM could pull raw data from MediaDecoderStateMachine and notify MediaDecoderStateMachine about playback elapsed time.


> 
> Why do you need to change the buffering logic?

I'm not going to change original buffering logic, it takes both undecoded and decoded data into consideration during different DECODE_STATEs. It's kinda complicated for EME HW CDM (especially for 4K TV).
So that I think we could simplified the buffering logic for EME HW CDM case.

Besides we're not doing actually decode, the data processing time in MediaDecoderStateMachine would be quite short. We may need to queue more raw data for CDM to request, and that should result to having different default buffering threshold values. I was just thinking about we may need another buffering strategy for this cases. But it's still too early now.

Hope these information could help you having a more complete understanding of it.
Thanks for your WIP, I'll look into it.
(Reporter)

Comment 8

4 years ago
Created attachment 8606100 [details]
Prototype of AdvanceMultiFrames & VideoSink.png

Hi ROC,

I looked your WIP, it seems to treat ImageContainer as a VideoSink(which doesn't exist yet).
Based on your WIP, what I'm trying to do is to create a VideoSink which wraps VideoFrameContainer so that MediaDecoderStateMachine act with AudioSink and VideoSink via GeneralAVSynchronizer.

Please refer to the attached figure to have better understanding.
(Green part would be my 1st step, orange part is the following step)
AudioSink & VideoSink will be the place holding timestamped-"processed"(not necessarily decoded) data which are expected to pass to actual rendering component.

Do you have any concern or suggestion ?
Flags: needinfo?(roc)
(Reporter)

Updated

4 years ago
Attachment #8602829 - Flags: feedback?(robert)
So you won't be using our audio decoding, video decoding, audio playback, or video playback paths. All we'll be doing is fetching data (either via MediaResource or MediaSourceExtensions) and feeding it into this HW module. OK...

I discussed this with Chris and Anthony and I think the best approach is probably to create an alternative implementation of MediaDecoderStateMachine, by copying our current implementation and removing all the parts you don't need (everything that deals with decoded video and audio, basically). That seems more straightforward and "honest" than producing "fake" decoded video and audio or generalizing MediaDecoderStateMachine so that the output of decoders/CDMs may not be real decoded video and audio, which would add complexity to that class which we don't want. You should keep on using our Reader classes for demuxing etc.

With that approach, you can clone MDSM and get it running on your own schedule. Once you have it working, we can look at the code and figure out which which parts should actually be shared with the main MDSM and share them. You will probably need to make some changes to the MDSM interface (making some methods virtual, etc), and the Reader interface (so you can extract compressed, not-decoded data).

For video rendering, you can put an OverlayImage into the VideoFrameContainer.

Does that sound OK?
Flags: needinfo?(roc)
(Reporter)

Comment 10

4 years ago
Thank you ROC !  It sounds good to me.
I've been constrained for a while to leverage MDSM interfaces in order to generalize this data path.
Cloning MDSM to start from scratch might be easier for me and prevents myself to break things :)
I'll start working on this direction.
(Reporter)

Comment 11

4 years ago
Created attachment 8625856 [details] [diff] [review]
[WIP] Create ExternalMDSM (subclass from MDSM) and FrameAdvancer (with ExAudioSink / ExVideoSink) to simulate HW CDM requesting Media RawData

It's a proof of concept WIP patch to demux media raw data (based on MediaFormatReader) and buffer these encoded data in ExMDSM.
Then A FrameAdvancer which contains ExAudioSink & ExVideoSink has 2 threads to simulate a raw packet puller (as a HW CDM Process) and a raw packet transferrer (from ExMDSM to ExAudioSink/ExVideoSink).
Just upload this patch as a record.

I will start working on a new patch based on the discussion in Whistler WW. in several directions.
1. Create BaseAudioSink / BaseVideoSink and instantiate sub class dynamically according to the metadata.
2. We could change the "RequestAudioData/RequestVideoData" API to e.g 
"RequestData(aType)", this will make MDSM more flexible.
3. By moving TrackInfo from MediaRawData to MediaData, we could identify the type of sample(demuxed/decoded).
4. How to make the synchronization work if we're using a OverlayImage & External HWC for VideoData.

Please feel free to comment, if you've got some ideas.
Attachment #8602829 - Attachment is obsolete: true
(Reporter)

Comment 12

4 years ago
Created attachment 8628190 [details] [diff] [review]
[Part1] Change data type of MediaQueue in MDSM & AudioSink to MediaData

Hi JW,

I did some modification to MediaQueue in MDSM & AudioSink, and this could be the first step to make MDSM less involved with the media data received and create the possibility to handle different media data in corresponding Sink.
Could you have a quick look on this patch ? It'd be great if there's any feedback : )
Attachment #8628190 - Flags: feedback?(jwwang)
Comment on attachment 8628190 [details] [diff] [review]
[Part1] Change data type of MediaQueue in MDSM & AudioSink to MediaData

Review of attachment 8628190 [details] [diff] [review]:
-----------------------------------------------------------------

I would prefer to have MediaData::As<T> to convert a MediaData to an AudioData or a VideoData without changing MediaQueue.h which is a container and should know as less item type as possible.
Attachment #8628190 - Flags: feedback?(jwwang)
(Reporter)

Comment 14

4 years ago
(In reply to JW Wang [:jwwang] from comment #13)
> Comment on attachment 8628190 [details] [diff] [review]
> [Part1] Change data type of MediaQueue in MDSM & AudioSink to MediaData
> 
> Review of attachment 8628190 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would prefer to have MediaData::As<T> to convert a MediaData to an
> AudioData or a VideoData without changing MediaQueue.h which is a container
> and should know as less item type as possible.

Thanks JW, that's a good point !! 
In that case, I could still do the type-casting check MediaData::As<T>. 
I think I have some ideas to evolve the patch.
Let me ask you for help again in the next few days, thanks
(Reporter)

Comment 15

4 years ago
Created attachment 8629934 [details] [diff] [review]
[Part1] Change data type of MediaQueue in MDSM/AudioSink/DecodedStream to MediaData

According to Comment 13, 
1. Add MediaData::As<T> to let consumer(AudioSink/VideoSink) do type-casting from MediaData to AudioData/VideoData.
2. MDSM should only care about time/duration-related information from MediaData.
Attachment #8628190 - Attachment is obsolete: true
(Reporter)

Comment 16

4 years ago
Created attachment 8631528 [details] [diff] [review]
[Part1] Make MDSM knowing less about AudioData/VideoData_251933.diff

Hi JW,

I've modified the patch according to Comment 13, and further more I've also replaced AudioData/VideoData from those APIs e.g. |Push/PushFront/DropXXXUpToSeekTarget/CheckTurningOffHardwareDecoder| and several member variables into MediaData.   

I'm not sure if this modification goes too far, could you give me some feedback/review ? Thanks.
Attachment #8629934 - Attachment is obsolete: true
Attachment #8631528 - Flags: feedback?(jwwang)
(Reporter)

Comment 17

4 years ago
Created attachment 8631654 [details] [diff] [review]
[Part2] Add GMP EME can render flag_251933
(Reporter)

Comment 18

4 years ago
Comment on attachment 8631654 [details] [diff] [review]
[Part2] Add GMP EME can render flag_251933

In order to query the capability of rendering, add these definition first.
I think the previous definition for the capability of decryption and decode are too lengthy, e.g. GMP_EME_CAP_DECRYPT_AND_DECODE_AUDIO.
Since a CDM with decode capability implies the ability to decrypt. Maybe it's better to rename it as GMP_EME_CAP_DECODE_AUDIO.
Attachment #8631654 - Attachment description: [Part2]_Add_GMP_EME_Can render_flag_251933.diff → [Part2] Add GMP EME can render flag_251933
Attachment #8631654 - Flags: feedback?(jwwang)
(Reporter)

Comment 19

4 years ago
Created attachment 8631674 [details] [diff] [review]
[Part3] Abstract APIs From AudioSink To BaseSink_251933.diff

Hi JW,

Sorry for the feedback? spam XD
This patch is an infrastructure for the following will-be-created classes RawVideoSink/VideoSink/VideoBaseSink(which are I'm planning to do). 

The idea is 
1. A base class "BaseSink" has APIs for resource initialization and release, e.g. Init()/PrepareShutdown()/Shutdown().
2. AudioBaseSink derived from BaseSink, and there're APIs for audio information configuration and retrieval. e.g. SetVolume/SetPlaybackRate/SetPreservesPitch...
3. Then a regular AudioSink and RawAudioSink should inherit AudioBaseSink and then implement code for their own purpose.

And my next patch(Part4) is to find a timing to create a regular VideoSink and to wrap functions which keep consuming VideoData into VideoSink.
What do you think about this ?
Attachment #8631674 - Flags: feedback?(jwwang)
Comment on attachment 8631528 [details] [diff] [review]
[Part1] Make MDSM knowing less about AudioData/VideoData_251933.diff

Review of attachment 8631528 [details] [diff] [review]:
-----------------------------------------------------------------

The idea looks fine to me.

::: dom/media/AudioSink.cpp
@@ +346,5 @@
>  AudioSink::PlayFromAudioQueue()
>  {
>    AssertOnAudioThread();
>    NS_ASSERTION(!mAudioStream->IsPaused(), "Don't play when paused");
> +  nsRefPtr<AudioData> audio = AudioQueue().PopFront().downcast<AudioData>();

Always call MediaData::As to ensure we don't cast to the wrong type.

::: dom/media/DecodedStream.cpp
@@ +473,5 @@
>    // is ref-counted.
>    aQueue.GetElementsAfter(mData->mNextAudioTime, &audio);
>    for (uint32_t i = 0; i < audio.Length(); ++i) {
> +    AudioData* a = audio[i].get()->As<AudioData>();
> +    SendStreamAudio(mData.get(), aStartTime, a, &output, rate, aVolume);

SendStreamAudio() should do the cast.

::: dom/media/MediaData.h
@@ +73,5 @@
>    // True if this is the first sample after a gap or discontinuity in
>    // the stream. This is true for the first sample in a stream after a seek.
>    bool mDiscontinuity;
>  
> +  int32_t mFrameID;

Why moving mFrameID up the class hierarchy?

@@ +88,5 @@
> +
> +  template <typename ReturnType>
> +  ReturnType* As()
> +  {
> +    MOZ_ASSERT_IF(this, this->mType == ReturnType::sType);

How could |this| be null?

@@ +125,5 @@
>      , mChannels(aChannels)
>      , mRate(aRate)
> +    , mAudioData(aData)
> +  {
> +    mFrames = aFrames;

mFrames should be a const.

::: dom/media/MediaDecoderStateMachine.cpp
@@ +547,5 @@
>  MediaDecoderStateMachine::OnAudioDecoded(AudioData* aAudioSample)
>  {
>    MOZ_ASSERT(OnTaskQueue());
>    ReentrantMonitorAutoEnter mon(mDecoder->GetReentrantMonitor());
> +  nsRefPtr<MediaData> audio(static_cast<MediaData*>(aAudioSample));

You don't need explicit up-cast.

@@ +820,5 @@
>  MediaDecoderStateMachine::OnVideoDecoded(VideoData* aVideoSample)
>  {
>    MOZ_ASSERT(OnTaskQueue());
>    ReentrantMonitorAutoEnter mon(mDecoder->GetReentrantMonitor());
> +  nsRefPtr<MediaData> video(static_cast<MediaData*>(aVideoSample));

Ditto.

@@ +2481,3 @@
>  {
>    MOZ_ASSERT(OnTaskQueue());
> +  MOZ_ASSERT(aData && aData->mType == MediaData::VIDEO_DATA);

This check is already included in MediaData::As<VideoData>.

@@ +2672,5 @@
>          remainingTime = nextFrame->mTime - clockTime;
>          break;
>        }
>        ++framesRemoved;
> +      if (!static_cast<VideoData*>(currentFrame.get())->mSentToCompositor) {

Always call MediaData::As for type-safe.

@@ +2677,5 @@
>          mDecoder->NotifyDecodedFrames(0, 0, 1);
>          VERBOSE_LOG("discarding video frame mTime=%lld clock_time=%lld",
>                      currentFrame->mTime, clockTime);
>        }
> +      CheckTurningOffHardwareDecoder(currentFrame.get());

nsRefPtr<MediaData> is implicitly convertible to Mediadata*

@@ +2825,5 @@
>      return NS_ERROR_FAILURE;
>    }
>    uint32_t frames = audio->mFrames - static_cast<uint32_t>(framesToPrune);
> +
> +  const AudioData* aData = audio.get()->As<AudioData>();

|aFoo| only for function parameters.
Just say audio->As.

::: dom/media/MediaDecoderStateMachine.h
@@ +477,5 @@
>    // If aTimeStamp is non-null, set *aTimeStamp to the TimeStamp corresponding
>    // to the returned stream time.
>    int64_t GetClock(TimeStamp* aTimeStamp = nullptr) const;
>  
> +  nsresult DropAudioUpToSeekTarget(MediaData* aSample);

The function should remain the way it is. It doesn't make sense if you pass a MediaRawData to this function.

@@ +1203,5 @@
>    // This temporarily stores the first frame we decode after we seek.
>    // This is so that if we hit end of stream while we're decoding to reach
>    // the seek target, we will still have a frame that we can display as the
>    // last frame in the media.
> +  nsRefPtr<MediaData> mFirstVideoFrameAfterSeek;

Ditto. See the comment for DropAudioUpToSeekTarget().
Attachment #8631528 - Flags: feedback?(jwwang) → feedback+
Attachment #8631654 - Flags: feedback?(jwwang) → feedback+
Comment on attachment 8631674 [details] [diff] [review]
[Part3] Abstract APIs From AudioSink To BaseSink_251933.diff

Review of attachment 8631674 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see much benefit in BaseSink. I would prefer to have one AbstractAudioSink and 2 sub-classes which are AudioDataSink and AudioRawDataSink to keep the class hierarchy as flat as possible.

::: dom/media/mediasource/MediaSourceReader.cpp
@@ +90,5 @@
>    return !mHasEssentialTrackBuffers;
>  }
>  
>  bool
> +MediaSourceReader::IsCDMRenderer()

I am not sure if this is the right place for we also have MediaFormatReader. Please consult jya about this.
Attachment #8631674 - Flags: feedback?(jwwang)
(Reporter)

Comment 22

4 years ago
(In reply to JW Wang [:jwwang] from comment #21)
> Comment on attachment 8631674 [details] [diff] [review]
> [Part3] Abstract APIs From AudioSink To BaseSink_251933.diff
> 
> Review of attachment 8631674 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't see much benefit in BaseSink. I would prefer to have one
> AbstractAudioSink and 2 sub-classes which are AudioDataSink and
> AudioRawDataSink to keep the class hierarchy as flat as possible.
> 

The reason of having a BaseSink is that APIs like e.g Init()/Shutdown()/PrepareShutdown() could be also available in VideoSink once VideoSink inherits from BaseSink. 
But I also agreed with you that keeping the class hierarchy as flat as possible is good for developer to maintain these stuff. Besides, VideoSink(still not exist) may not need these exact APIs.  
I'll flatter the classes here, thanks !
(Reporter)

Comment 23

4 years ago
(In reply to JW Wang [:jwwang] from comment #21)
> Comment on attachment 8631674 [details] [diff] [review]
> [Part3] Abstract APIs From AudioSink To BaseSink_251933.diff
> 
> Review of attachment 8631674 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: dom/media/mediasource/MediaSourceReader.cpp
> @@ +90,5 @@
> >    return !mHasEssentialTrackBuffers;
> >  }
> >  
> >  bool
> > +MediaSourceReader::IsCDMRenderer()
> 
> I am not sure if this is the right place for we also have MediaFormatReader.
> Please consult jya about this.

Hi JYA,
I was thinking about adding an API |IsCDMRenderer()| in MediaDecoderReader and then this could be used to determine the type of AudioSink/VideoSink which will be created after metadata loaded. The plan is to have different type of sinks for regular decoded frames and raw frames once the CDM has the ability of decrypt/decode/render.

I'm not sure if this is the right place as JW commented, but maybe storing CDM capability into MediaInfo for MDSM will be a better way.
Do you have any concern or suggestion about this API ?
Flags: needinfo?(jyavenard)
Comment on attachment 8631674 [details] [diff] [review]
[Part3] Abstract APIs From AudioSink To BaseSink_251933.diff

Review of attachment 8631674 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/mediasource/MediaSourceReader.cpp
@@ +90,5 @@
>    return !mHasEssentialTrackBuffers;
>  }
>  
>  bool
> +MediaSourceReader::IsCDMRenderer()

We are aiming at having all functions in MediaDecoderReader starts with MOZ_ASSERT(OnTaskQueue()). This breaks the new paradigm.

Now, three problems I see right away:
1- What if video isn't encrypted.
2- What if audio isn't encrypted.
3- What if encryption method changes part way (e.g. from encrypted to decrypted or vice-versa)

Netflix for example use an encrypted video stream, with clear audio. You're now expecting the EME decoder to know how to handle non-encrypted content.

Currently, you seem to only care about audio ; is that for encrypted audio, or to be used for encrypted video?

Anything related to EME is more cpearce cup of tea.

In any case, I would much prefer the MediaDecoderStateMachine handle this, and use the MediaInfo to determine what is encrypted rather than adding one new function that makes use of the MediaDecoder's monitor, when bholley has spent the last few months trying to get rid of it.
Attachment #8631674 - Flags: feedback?(cpearce)
Should add. MediaSourceReader.cpp is going away very soon
Flags: needinfo?(jyavenard)
(Reporter)

Comment 26

4 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #24)
> Comment on attachment 8631674 [details] [diff] [review]
> [Part3] Abstract APIs From AudioSink To BaseSink_251933.diff
> 
> Review of attachment 8631674 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/mediasource/MediaSourceReader.cpp
> @@ +90,5 @@
> >    return !mHasEssentialTrackBuffers;
> >  }
> >  
> >  bool
> > +MediaSourceReader::IsCDMRenderer()
> 
> We are aiming at having all functions in MediaDecoderReader starts with
> MOZ_ASSERT(OnTaskQueue()). This breaks the new paradigm.
>
 
Got it.

> Now, three problems I see right away:
> 1- What if video isn't encrypted.
> 2- What if audio isn't encrypted.
> 3- What if encryption method changes part way (e.g. from encrypted to
> decrypted or vice-versa)

If you don't mind, I'd like to explain more first to make the idea more clear.

The plan is to provide a data path for the situation where partner wants to utilize their own HW (Decrypt)+Decoder+Renderer and av synchronization modules due to the need of high-quality-playback experience regarding high-bandwidth digital content.
All these operations are not done in gecko, but we try to make media playback information could be gathered into MDSM for other components such as HTMLMediaElement.
Also the action that user takes from UI (e.g. pause/play/seek/stop/volume+-...) could be delivered to 3rd-party component or GMP-child-process from MDSM.

If the incoming data is clear,
1. MDSM could handle it by current SW/HW decoder path and pass these decoded frames(or image pointer) into original AudioSink(already exists)/VideoSink(to be created) as currently designed.
2. (For 2k/4k/8k content) MDSM only handles demuxed data and pass these encoded frames into RawAudioSink/RawVideoSink, then these buffered raw data should be pulled out from consumers (e.g. 3rd-party component or GMP-child-process)

If the incoming data is encrypted,
1. MDSM knows the information after reading metadata and buffers these demuxed(encrypted) data into RawAudioSink/RawVideoSink. 3rd-party component or GMP child process then handles the rest of things. 

******

Back to the problems you mentioned, and we have a premise that decode & rendering are both done by HW components (Audio renderer and video renderer may even be different devices)

>1- What if video isn't encrypted.
With encrypted audio ?  IMHO, this case seems rarely be found in reality. But if it happens, the demuxed video data can be buffered into either VideoSink or RawVideoSink. 
 - For VideoSink case, video should be already decoded and buffered in VideoSink and then MDSM could advance frame by checking clock_time reported from remote AudioDecoder/Renderer.
 - For RawVideoSink case, demuxed video data is buffered into RawVideoSink, and wait for being requested by remote video decode/render consumer. 
Demuxed(encrypted) audio is also buffered into RawAudioSink and wait for being requested by remote consumer.

2- What if audio isn't encrypted.
Like Netflix, 
The audio could be either decoded in gecko and buffered in AudioSink then consumed by libcubeb or queued into RawAudioSink and waits for being requested by remote audio decoder/render.
Demuxed(encrypted) video should be buffered into RawVideoSink and wait for being processed by remote consumer.  

3- What if encryption method changes part way (e.g. from encrypted to decrypted or vice-versa)
hmm ... I've not come out an idea yet @@ 

I'd like to think more about case 1 & 2 because decode/rendering in different devices seems to be rarely-discussed issue.

> 
> Netflix for example use an encrypted video stream, with clear audio. You're
> now expecting the EME decoder to know how to handle non-encrypted content.

Not really, we will still need a CDMProxy to accomplish the EME session/key exchange mechanism. But we only need to demux input stream and then pass these demuxed samples (encrypted or not-encrypted) to GMP-child-process for (decrypt)/decode/render.

> 
> Currently, you seem to only care about audio ; is that for encrypted audio,
> or to be used for encrypted video?

Oh, sorry for the misleading.
I was trying to separate the tasks into several stages, so my early step is to construct AudioSink/RawAudioSInk first, then is video. 

> 
> Anything related to EME is more cpearce cup of tea.
> 
> In any case, I would much prefer the MediaDecoderStateMachine handle this,
> and use the MediaInfo to determine what is encrypted rather than adding one
> new function that makes use of the MediaDecoder's monitor, when bholley has
> spent the last few months trying to get rid of it.

Cool, I remembered you said that MediaSourceReader will be gone soon @WhistlerWW and thanks for the question and suggestions. Hope that my explanation is clear.  Please feel free to question me.
(Reporter)

Comment 27

4 years ago
(In reply to JW Wang [:jwwang] from comment #20)
> Comment on attachment 8631528 [details] [diff] [review]
> [Part1] Make MDSM knowing less about AudioData/VideoData_251933.diff
> 
> Review of attachment 8631528 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The idea looks fine to me.
> 
> ::: dom/media/AudioSink.cpp
> @@ +346,5 @@
> >  AudioSink::PlayFromAudioQueue()
> >  {
> >    AssertOnAudioThread();
> >    NS_ASSERTION(!mAudioStream->IsPaused(), "Don't play when paused");
> > +  nsRefPtr<AudioData> audio = AudioQueue().PopFront().downcast<AudioData>();
> 
> Always call MediaData::As to ensure we don't cast to the wrong type.

PopFront() returns an already_AddRefed<T>, in this case, T is MediaData and I found that the |downcast| helper function could do thing like MediaData::As<U>. It's more neat I think, but yeah it's also unsafe.
I could use |nsRefPtr<AudioData> audio = dont_AddRef(AudioQueue().PopFront().take()->As<AudioData>())| instead.   

> 
> ::: dom/media/MediaData.h
> @@ +73,5 @@
> >    // True if this is the first sample after a gap or discontinuity in
> >    // the stream. This is true for the first sample in a stream after a seek.
> >    bool mDiscontinuity;
> >  
> > +  int32_t mFrameID;
> 
> Why moving mFrameID up the class hierarchy?

I was trying to make |MediaDecoderStateMachine::Push| and |MediaDecoderStateMachine::PushFront| handle MediaData* as input, but when pushing (MediaData*)aSample to VideoQueue(), aSample->mFrameID should increase.

One solution is to cast the aSample from MediaData* to VideoData* and make mFrameID increased, another is to make mFrameID as a member variable in MediaData. Since I already move mFrames from AudioData to MediaData, so I considered move mFrameID to MediaData too, and set default zero for AudioData/MediaRawData.  

What do you think ?
Flags: needinfo?(jwwang)
It doesn't make sense to me to extract mFrameID into MediaData since it is a concept for VideoData only. It is also weird for an AudioData to have a frame id since an AudioData contains multiple frames.
Flags: needinfo?(jwwang)
Comment on attachment 8631674 [details] [diff] [review]
[Part3] Abstract APIs From AudioSink To BaseSink_251933.diff

Review of attachment 8631674 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/BaseSink.h
@@ +25,5 @@
> +  virtual nsresult Init() = 0;
> +
> +  // Tell the Sink to stop processing and initiate shutdown.
> +  // Must be called with the decoder monitor held.
> +  virtual void PrepareToShutdown() = 0;

It doesn't seem necessary to have separate PrepareToShutdown() and Shutdown() functions. You should just have a single Shutdown() function that returns a MediaPromise that resolves when shutdown has completed. I feel like you may also want Init() to be async; if talking to GMPs is involved and you have to await a response, the operation will definitely be async.

::: dom/media/MediaDecoderStateMachine.cpp
@@ +1750,5 @@
> +  mAudioEndTime = mAudioStartTime;
> +  MOZ_ASSERT(mAudioStartTime == GetMediaTime());
> +  mAudioCompleted = false;
> +  mAudioSink = aExternalRenderer ?
> +    (AudioBaseSink*)new RawAudioSink(this, mAudioStartTime,

Unnecessary upcast?

@@ +1756,5 @@
> +                                     mDecoder->GetAudioChannel()) :
> +    (AudioBaseSink*)new AudioSink(this, mAudioStartTime,
> +                                  mInfo.mAudio,
> +                                  mDecoder->GetAudioChannel());
> +  // OnAudioSinkError() will be called before Init() returns

If Init() returned a promise, you could have the promise reject on error, instead of calling OnAudioSinkError() inside this function call.

::: dom/media/mediasource/MediaSourceReader.cpp
@@ +92,5 @@
>  
>  bool
> +MediaSourceReader::IsCDMRenderer()
> +{
> +#ifdef MOZ_EME

I agree with jya; we should not be adding new calls into the reader except on the reader's task queue.

Maybe you need to ensure the CDM caps are available before we move out of DECODER_STATE_WAIT_FOR_CDM, and call ReadUpdatedMetadata() once we've moved out of that state so that the state machine knows whether the CDM is rendering.

::: dom/media/moz.build
@@ +247,5 @@
>      'TrackUnionStream.cpp',
>      'VideoFrameContainer.cpp',
>      'VideoPlaybackQuality.cpp',
>      'VideoSegment.cpp',
> +    'VideoSink.cpp',

There's no VideoSink.cpp file...
Attachment #8631674 - Flags: feedback?(cpearce) → feedback+
(Reporter)

Updated

3 years ago
Depends on: 1186367
(Reporter)

Updated

3 years ago
Depends on: 1186375
(Reporter)

Comment 30

3 years ago
(In reply to Chris Pearce (:cpearce) from comment #29)
> Comment on attachment 8631674 [details] [diff] [review]
> [Part3] Abstract APIs From AudioSink To BaseSink_251933.diff
> 
> Review of attachment 8631674 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/BaseSink.h
> @@ +25,5 @@
> > +  virtual nsresult Init() = 0;
> > +
> > +  // Tell the Sink to stop processing and initiate shutdown.
> > +  // Must be called with the decoder monitor held.
> > +  virtual void PrepareToShutdown() = 0;
> 
> It doesn't seem necessary to have separate PrepareToShutdown() and
> Shutdown() functions. You should just have a single Shutdown() function that
> returns a MediaPromise that resolves when shutdown has completed. I feel
> like you may also want Init() to be async; if talking to GMPs is involved
> and you have to await a response, the operation will definitely be async.
> 

I've did some work on this idea, and yes, we could merge AudioSink::PrepareToShutdonw() and AudioSink::Shutdown() into only one function. And JW already landed a refactor code about this.

But later when I was trying to make AudioSink::Init() & AudioSink::Shutdown() as async-functions  then I found that the Init() is already called in MDSM::TaskQueue(), and what MDSM does now is to wait AudioLoop starting to pull sample out after Init() is called.  It seems that we could keep AudioSink::Init() as a sync function for MDSM because the RawAudioSink will also be a pull-based component. MDSM could initialized AudiSink/RawAudioSink and start buffering until AudioSink/RawAudioSink start to pull data.
Further, we could do async-initialization inside AudioSink::Init if needed.

Comes to AudioSink::Shutdown(), it's more complicated. But I think the same idea could be applied.
Currently MDSM still does things after MDSM::StopAudioThread(), e.g (enter dormant, initiate seek, Shutdown, DECODER_STATE_COMPLETED, DispatchAudioCaptured)
And the reason we do sync-Shutdown() to AudioSink is that we don't want AudioSink accesses AudioQueue() while MDSM is going to reset that.

One idea is to set a shutdown flag into AudioSink via AudioSink::Shutdown, and AudioSink stops to access AudioQueue according to the flag and enters asyn-shutdown-stage inside.  After shutdown completed, notify AudioShutdownManager to clear the instance.
In this case, the work-flow in MDSM would be much easier, and by doing so, we make the relation between MDSM and AudioSink looser.

I'd like to file 2 bugs for patch as follow up, one to create base class for AudioSink, the other to make the AudioSink::Init()/AudioSink::Shutdown() async insdie AudioSink, and handled by AudioShutdownManager.

What do you think ?

> ::: dom/media/MediaDecoderStateMachine.cpp
> @@ +1750,5 @@
> > +  mAudioEndTime = mAudioStartTime;
> > +  MOZ_ASSERT(mAudioStartTime == GetMediaTime());
> > +  mAudioCompleted = false;
> > +  mAudioSink = aExternalRenderer ?
> > +    (AudioBaseSink*)new RawAudioSink(this, mAudioStartTime,
> 
> Unnecessary upcast?

If no casting, there's a compile error "conditional expression between distinct pointer types"
Or I can switch to if/else.
(Reporter)

Comment 31

3 years ago
(In reply to Kilik Kuo [:kikuo][:kilikkuo] from comment #30)
> (In reply to Chris Pearce (:cpearce) from comment #29)
> > Comment on attachment 8631674 [details] [diff] [review]
> > [Part3] Abstract APIs From AudioSink To BaseSink_251933.diff
> > 
> > Review of attachment 8631674 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: dom/media/BaseSink.h
> > @@ +25,5 @@
> > > +  virtual nsresult Init() = 0;
> > > +
> > > +  // Tell the Sink to stop processing and initiate shutdown.
> > > +  // Must be called with the decoder monitor held.
> > > +  virtual void PrepareToShutdown() = 0;
> > 
> > It doesn't seem necessary to have separate PrepareToShutdown() and
> > Shutdown() functions. You should just have a single Shutdown() function that
> > returns a MediaPromise that resolves when shutdown has completed. I feel
> > like you may also want Init() to be async; if talking to GMPs is involved
> > and you have to await a response, the operation will definitely be async.
> > 
> 
> I've did some work on this idea, and yes, we could merge
> AudioSink::PrepareToShutdonw() and AudioSink::Shutdown() into only one
> function. And JW already landed a refactor code about this.
> 
> But later when I was trying to make AudioSink::Init() &
> AudioSink::Shutdown() as async-functions  then I found that the Init() is
> already called in MDSM::TaskQueue(), and what MDSM does now is to wait
> AudioLoop starting to pull sample out after Init() is called.  It seems that
> we could keep AudioSink::Init() as a sync function for MDSM because the
> RawAudioSink will also be a pull-based component. MDSM could initialized
> AudiSink/RawAudioSink and start buffering until AudioSink/RawAudioSink start
> to pull data.
> Further, we could do async-initialization inside AudioSink::Init if needed.
> 
> Comes to AudioSink::Shutdown(), it's more complicated. But I think the same
> idea could be applied.
> Currently MDSM still does things after MDSM::StopAudioThread(), e.g (enter
> dormant, initiate seek, Shutdown, DECODER_STATE_COMPLETED,
> DispatchAudioCaptured)
> And the reason we do sync-Shutdown() to AudioSink is that we don't want
> AudioSink accesses AudioQueue() while MDSM is going to reset that.
> 
> One idea is to set a shutdown flag into AudioSink via AudioSink::Shutdown,
> and AudioSink stops to access AudioQueue according to the flag and enters
> asyn-shutdown-stage inside.  After shutdown completed, notify
> AudioShutdownManager to clear the instance.
> In this case, the work-flow in MDSM would be much easier, and by doing so,
> we make the relation between MDSM and AudioSink looser.
> 
> I'd like to file 2 bugs for patch as follow up, one to create base class for
> AudioSink, the other to make the AudioSink::Init()/AudioSink::Shutdown()
> async insdie AudioSink, and handled by AudioShutdownManager.
> 
> What do you think ?
> 

As an aside, If we do aysn-shutdown inside AudioSink and let new proposed AudioSinkShutdownManager handles the destroy of AudioSink instance, MDSM could just call mAudioSink->Shutdown() then null the mAudioSink pointer.  It could also reduce the dependency between MDSM and AudioSink.
Flags: needinfo?(cpearce)
(In reply to Kilik Kuo [:kikuo][:kilikkuo] from comment #31)
> As an aside, If we do aysn-shutdown inside AudioSink and let new proposed
> AudioSinkShutdownManager handles the destroy of AudioSink instance, MDSM
> could just call mAudioSink->Shutdown() then null the mAudioSink pointer.  It
> could also reduce the dependency between MDSM and AudioSink.

Sounds good to me.
Flags: needinfo?(cpearce)
(Reporter)

Updated

3 years ago
Depends on: 1188268, 1188269
(Reporter)

Updated

3 years ago
Depends on: 1188812
(Reporter)

Updated

3 years ago
Depends on: 1194606
(Reporter)

Updated

3 years ago
Depends on: 1194652
(Reporter)

Updated

3 years ago
Depends on: 1194918
Component: Audio/Video → Audio/Video: Playback
(Reporter)

Updated

11 months ago
Assignee: kikuo → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.