Closed Bug 877116 Opened 11 years ago Closed 11 years ago

[RTSP] audio playback latency is serious with the following media streams(Video:H.264 Audio:AAC)

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:1.3+, firefox28 fixed)

RESOLVED FIXED
1.3 Sprint 6 - 12/6
blocking-b2g 1.3+
Tracking Status
firefox28 --- fixed

People

(Reporter: u459114, Assigned: bechen)

References

Details

(Whiteboard: [FT:RIL])

Attachments

(1 file, 1 obsolete file)

RTP Sender send out two streams
  Video: H.264
  Audio: AAC

PTP Receiver playout audio with serious latency
This is a follow up bug for bug 831645.

When we are streaming the url
rtsp://184.72.239.149/vod/mp4:BigBuckBunny_175k.mov
The audio sounds very bad...

E/OMXCodec( 1744): Successfully allocated OMX node 'OMX.qcom.video.decoder.avc'
E/OMXCodec( 1744): Successfully allocated OMX node 'OMX.qcom.audio.decoder.aac'

From RtspMediaResource perspective:
There are two RtspTrackBuffer for video and audio. The incoming rate of data we get from network is OK.
But somehow the audio decoder is more slower than video decoder.
We can observe that the read function |ReadFrameFromTrack| we called for audio track is more slower than video track.
After some investigation at MediaDecoderStateMachine::AudioLoop().
I dump the these variables |sampleTime| |missingFrames| in AudioLoop().

And found that the decoded audio raw data is corrupt, then the MediaDecoderStateMachine insert some silence frames to audio output.

// hw decoder dump log
sampleTime 5119 missingFrames 3071
0x4466c0c0 Decoder playing 3071 frames of silence
audioDuration 5119 audioStartTime 0
sampleTime 5119 missingFrames 0
//

And also we have a workaround now:
Using software decoder for audio track instead hardware decoder.
In OmxDecoder::init(), change the flag kHardwareCodecsOnly to kSoftwareCodecsOnly

// software dump log
 audioDuration 1024 audioStartTime 0
ampleTime 1023 missingFrames -1
audioDuration 2048 audioStartTime 0
sampleTime 2047 missingFrames -1
audioDuration 3072 audioStartTime 0
sampleTime 3071 missingFrames -1
//
We made a mp4 video which contains aac audio, and play it from local sdcard 
v.s. streaming it from rtsp server to see if that we may find some clues.

1. From decoder's output perspective: 
ex: Assume that each frame contains 1s audio data.
    sdcard:
      The 1st read: timestamp 0s, 1 frame
      The 2nd read: timestamp 1s, 1 frame
      The 3rd read: timestamp 2s, 1 frame ...
    rtsp:
      The 1st read: timestamp 0s, 2 frame (missing 3 frames)
      The 2nd read: timestamp 5s, 2 frame (missing 3 frames)
      The 3rd read: timestamp 10s, 2 frame ...

The MediaDecodeStateMechine see the gap between each ReadAudio and insert some silence frame to audio out. As I describe in comment 2, the output audio frames of hw omx decoder are not enough.

2. From decoder's input perspective:
    sdcard:
      The 1st read: timestamp 0s, 1 frame
      The 2nd read: timestamp 1s, 1 frame
      The 3rd read: timestamp 2s, 1 frame ...
    rtsp:
      The 1st read: timestamp 0s, 5 frame
      The 2nd read: timestamp 5s, 5 frame
      The 3rd read: timestamp 10s, 5 frame ...

sdcard: Due to the mp4 is a container file, the omx decoder can read only 1 frame. (container has sampletable so it knows each frame size and each frame timestamp)

rtsp: The data is received from network hence the network packet contains multiple frames. So the omx decoder reads multiple frames every time.

I'm not sure the issue is a hw issue? But since we turn on the software aac decoder, the problem goes away.

Hi Sotaro:
Do we have any constrains that define how to feed data into the hw decoder?
In this case, I feed multiple aac audio frames once per decoder read.
Or how to know it's a hw relative issue?
Flags: needinfo?(sotaro.ikeda.g)
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
(In reply to Benjamin Chen [:bechen] from comment #3)
> 
> Hi Sotaro:
> Do we have any constrains that define how to feed data into the hw decoder?
> In this case, I feed multiple aac audio frames once per decoder read.
> Or how to know it's a hw relative issue?

Sorry, I do not know about it. It is better to ask qcom's people.
Flags: needinfo?(sotaro.ikeda.g)
Diego, do you know about hw audio codec's limitation?
Flags: needinfo?(dwilson)
Is this a requirement for WebRTC?
Flags: needinfo?(dwilson)
No, at last for now.
WebRTC use software OPUS/g711 codec only.
What is the use case for RTSP then?
Hi Diego:
In briefly, we want to support RTSP in a video tag "<source src="rtsp://xxxxx"/>",  in bug 831645.
By applying the 9 patches and 1 gaia app in bug 831645, we can streaming rtsp stream.

We found that when we are streaming a video with aac audio, the audio sounds very bad.
And I suspect it may be a hw issue since I change the aac decoder to software, problem goes away. More detail in comment 2, comment 3.

Since the streaming data is received from network, the aac packet contains 1 timestamp and several audio frames in there. It's impossible to divide them into individual frame. So I feed them(1 packet) to hw decoder once per read, and the output audio frames are not enough.

== rtsp link
rtsp://184.72.239.149/vod/mp4:BigBuckBunny_175k.mov
http://tools.ietf.org/html/rfc3640#section-3.2.3.2

According to the rfc3640, there are multiple Access-Units(AU) in a RTP packet. And the timestamp for each AU can be calculated. So we can divide the RTP packet into several AUs then feed into HW aac decoder.

If the "constantDuration" parameter is present in SDP, means that each AU's audio time is fixed. By calculating the sampling rate, constantDuration, and the timestamp in the first AU, we can reconstruct the other timesatmps for other AU.

The implementation can be done in |AMPEG4ElementaryAssembler::addPacket| and |AMPEG4ElementaryAssembler::submitAccessUnit|, abd check the timestamp here
|RTSPSource::onMessageReceived case MyHandler::kWhatAccessUnit:|.
Hence we need the "constantDuration" from SDP.

Note: 
1. If the "constantDuration" is not present in SDP, we may need another implementation for it.
2. The AU in RTP packet can be interleaved.
Blocks: b2g-RTSP-1.3
Attached patch aac.patch (obsolete) — Splinter Review
Comment on attachment 779090 [details] [diff] [review]
aac.patch

Hi Chris:
In this patch, I divide one RTP packet into individual Access-Unit.
Do you know who might want to review this patch?
Attachment #779090 - Flags: feedback?(chris.double)
(In reply to Benjamin Chen [:bechen] from comment #12)
> Comment on attachment 779090 [details] [diff] [review]
> aac.patch
> 
> Hi Chris:
> In this patch, I divide one RTP packet into individual Access-Unit.
> Do you know who might want to review this patch?

From the looks of it, a network peer.
Attachment #779090 - Flags: feedback?(chris.double)
blocking-b2g: --- → 1.3+
Whiteboard: [FT:RIL]
Comment on attachment 779090 [details] [diff] [review]
aac.patch

Hi workman: 
could you please review this patch?
Attachment #779090 - Flags: review?(sworkman)
Comment on attachment 779090 [details] [diff] [review]
aac.patch

Review of attachment 779090 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly the code looks good. I just have some questions before I r+ it.

::: netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp
@@ +224,5 @@
>      }
> +
> +    // If constantDuration and CTSDelta are not present. We should assume the
> +    // stream has fixed duration and calculate the mConstantDuration.
> +    if (!mConstantDuration && !mCTSDeltaLength && mPreviousAUCount

So, this will not work for the first packet of a stream with no "content-duration", right? How will that affect the rtpTime calculation on line 341?

@@ +378,4 @@
>  #if 0
>      printf(mAccessUnitDamaged ? "X" : ".");
>      fflush(stdout);
>  #endif

Can you remove this code in the same patch please? Just some cleanup.
(In reply to Steve Workman [:sworkman] from comment #15)
> Comment on attachment 779090 [details] [diff] [review]
> aac.patch
> 
> Review of attachment 779090 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Mostly the code looks good. I just have some questions before I r+ it.
> 
> ::: netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp
> @@ +224,5 @@
> >      }
> > +
> > +    // If constantDuration and CTSDelta are not present. We should assume the
> > +    // stream has fixed duration and calculate the mConstantDuration.
> > +    if (!mConstantDuration && !mCTSDeltaLength && mPreviousAUCount
> 
> So, this will not work for the first packet of a stream with no
> "content-duration", right? How will that affect the rtpTime calculation on
> line 341?
> 
For the first rtp packet, it won't go into the submitAccessUnit() because the check condition at line 222.
When the second rtp packet is arrived, the code flow will call submitAccessUnit() for the first rtp packet.
After that we can complete the calculation for mConstantDuration.

So for the case without "content-duration", the mConstantDuration won't affect the rtpTime calculation. The timestamp for the AUs are all the same.

Maybe we could add more codes to correct the first rtp packet.

> @@ +378,4 @@
> >  #if 0
> >      printf(mAccessUnitDamaged ? "X" : ".");
> >      fflush(stdout);
> >  #endif
> 
> Can you remove this code in the same patch please? Just some cleanup.

Sure.
(In reply to Benjamin Chen [:bechen] from comment #16)
> (In reply to Steve Workman [:sworkman] from comment #15)
> > Comment on attachment 779090 [details] [diff] [review]
> > aac.patch
> > 
> > Review of attachment 779090 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Mostly the code looks good. I just have some questions before I r+ it.
> > 
> > ::: netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp
> > @@ +224,5 @@
> > >      }
> > > +
> > > +    // If constantDuration and CTSDelta are not present. We should assume the
> > > +    // stream has fixed duration and calculate the mConstantDuration.
> > > +    if (!mConstantDuration && !mCTSDeltaLength && mPreviousAUCount
> > 
> > So, this will not work for the first packet of a stream with no
> > "content-duration", right? How will that affect the rtpTime calculation on
> > line 341?
> > 
> For the first rtp packet, it won't go into the submitAccessUnit() because
> the check condition at line 222.
> When the second rtp packet is arrived, the code flow will call
> submitAccessUnit() for the first rtp packet.
> After that we can complete the calculation for mConstantDuration.
> 
> So for the case without "content-duration", the mConstantDuration won't
> affect the rtpTime calculation. The timestamp for the AUs are all the same.
> 
> Maybe we could add more codes to correct the first rtp packet.

I think your explanation is enough. So, unless you think you need more code for the first packet, it seems ok to me.
 
> > @@ +378,4 @@
> > >  #if 0
> > >      printf(mAccessUnitDamaged ? "X" : ".");
> > >      fflush(stdout);
> > >  #endif
> > 
> > Can you remove this code in the same patch please? Just some cleanup.
> 
> Sure.

Thanks.

r=me.
Attachment #779090 - Flags: review?(sworkman) → review+
Hi Ken,
Do we need to set the target milestone to Sprint6? Or it was landed already?
Flags: needinfo?(kchang)
(In reply to whuang from comment #18)
> Hi Ken,
> Do we need to set the target milestone to Sprint6? Or it was landed already?
Hi Benjamin, can you answer Wesley’s question.
Flags: needinfo?(kchang)
Target Milestone: --- → 1.3 Sprint 6 - 12/6
(In reply to whuang from comment #18)
> Hi Ken,
> Do we need to set the target milestone to Sprint6? Or it was landed already?

Hi Wesley, Ken, we will land it soon.
Attached patch bug-877116.patchSplinter Review
Remove "#if 0" code block.
r=sworkman
Attachment #779090 - Attachment is obsolete: true
Attachment #830653 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/88a054d270df
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
See Also: → 1149616
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: