Closed Bug 805711 Opened 8 years ago Closed 8 years ago

OMXDecoder trapped in infinite loop while seeking

Categories

(Core :: Audio/Video, defect, P1)

All
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
B2G C2 (20nov-10dec)
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: kanru, Assigned: sinker)

References

Details

(Whiteboard: [caf:blocking])

Attachments

(7 files, 5 obsolete files)

240.02 KB, video/mp4
Details
2.25 KB, patch
cajbir
: review+
cpearce
: review-
Details | Diff | Splinter Review
2.80 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
893.35 KB, video/3gpp
Details
677 bytes, patch
cpearce
: review+
Details | Diff | Splinter Review
756 bytes, patch
sinker
: review+
cpearce
: checkin+
Details | Diff | Splinter Review
WIP
4.12 KB, patch
cpearce
: feedback+
Details | Diff | Splinter Review
Attached video Sample video
While playing a sample mp4 video the seeking doesn't work and eats lots of CPU. From logcat I could observe lots of pmem being allocated and released.
Assignee: nobody → kchen
Attachment #675425 - Flags: review?(chris.double)
Comment on attachment 675425 [details] [diff] [review]
Prevent infinite loop in OMX decoder. v1

r+ on the Media OMX changes. I defer to cpearce to check the changes to his seeking code as he understands that  better than I do.
Attachment #675425 - Flags: review?(cpearce)
Attachment #675425 - Flags: review?(chris.double)
Attachment #675425 - Flags: review+
Comment on attachment 675425 [details] [diff] [review]
Prevent infinite loop in OMX decoder. v1

Review of attachment 675425 [details] [diff] [review]:
-----------------------------------------------------------------

Your patch is hiding the actual bug.

The bug is that nsMediaOmxReader::DecodeVideoFrame() returns the same frame twice when it is called by DecodeToTarget().

DecodeToTarget()'s loop is trying to decode until it gets a frame that ends after the target. nsMediaOmxReader::DecodeVideoFrame() should not return the same frame twice. There must be a bug in nsMediaOmxReader::DecodeVideoFrame()'s "seek hacks".

::: content/media/nsBuiltinDecoderReader.cpp
@@ -403,5 @@
>        }
>        video = mVideoQueue.PeekFront();
>        // If the frame end time is less than the seek target, we won't want
>        // to display this frame after the seek, so discard it.
> -      if (video && video->mEndTime <= aTarget) {

If you change the <= to < do you still get the infinite loop?
Attachment #675425 - Flags: review?(cpearce) → review-
> The bug is that nsMediaOmxReader::DecodeVideoFrame() returns the same frame
> twice when it is called by DecodeToTarget().
> 
> DecodeToTarget()'s loop is trying to decode until it gets a frame that ends
> after the target. nsMediaOmxReader::DecodeVideoFrame() should not return the
> same frame twice. There must be a bug in
> nsMediaOmxReader::DecodeVideoFrame()'s "seek hacks".

Correct. I wasn't sure which is expected behavior.

> 
> ::: content/media/nsBuiltinDecoderReader.cpp
> @@ -403,5 @@
> >        }
> >        video = mVideoQueue.PeekFront();
> >        // If the frame end time is less than the seek target, we won't want
> >        // to display this frame after the seek, so discard it.
> > -      if (video && video->mEndTime <= aTarget) {
> 
> If you change the <= to < do you still get the infinite loop?

No. IIRC this line was changed to <= for correctness sake.
"IIRC this line was changed to <= for correctness sake." what does this mean?
bug 626273 changed this
(In reply to Andreas Gal :gal from comment #5)
> "IIRC this line was changed to <= for correctness sake." what does this mean?

Bug 626273
kanru, can you remove the = and see if the testcase in that bug still passes?
(you probably want a desktop firefox build for that)
ok, building
The "Jump to 00:00:12:21" case is not correct after I remove the =.
The bug here is that nsMediaOmxReader::DecodeVideoFrame() is stopping decoding before it reaches the target.
mVideoSource->read always return the frame at the same timestamp for unknown reason. The log from android OMXCodec.cpp:

V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] FILL_BUFFER_DONE(buffer: 0x43d82510, size: 69120, flags: 0x00000000, timestamp: 4000000 us (4.00 secs))
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] Calling processSEIData
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] Calling fillBuffer on buffer 0x43d82510
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] FILL_BUFFER_DONE(buffer: 0x43d824c0, size: 69120, flags: 0x00000000, timestamp: 4000000 us (4.00 secs))
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] Calling processSEIData
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] Calling fillBuffer on buffer 0x43d824c0
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] FILL_BUFFER_DONE(buffer: 0x43d82510, size: 69120, flags: 0x00000000, timestamp: 4000000 us (4.00 secs))
V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] Calling processSEIData
We are going to need a workaround for this in our part of the stack. We might be able to get a vendor fix, but on android we will run into this as well. Kanru, can you adjust your hack? We will need it unfortunately.
I would *guess* that the problem is that libstagefright can't properly handle a frame which are the same as their predecessor; frames which have a zero-delta.

A better hack would be to detect when we read the same frame again in nsMediaOmxReader::ReadVideoFrame() and set the mEndTime of the frame to the duration of the video, push that frame to mVideoQueue, and return false to say the video stream has ended.

That way playback will finish at the correct time, since all the playback logic of nsBuiltinDecoder* handles variable length frame durations, and we won't get stuck in an infinite loop.
cpearce, are you sure this only happens at the end of the stream? (getting frames with zero delta). If so setting the end time will work.
No, I'm not sure this only happens at the end of stream, but I'm sure that if libstagefright keeps giving us the same frame with the same start time over and over we're never going to get a different frame to display anyway, so this is the best we can do.
This patch should work as well. Testing.
Attachment #679814 - Flags: review?(cpearce)
Comment on attachment 679814 [details] [diff] [review]
detect same-frame-returned as end of video stream

Review of attachment 679814 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, this is what I had in mind.
Attachment #679814 - Flags: review?(cpearce) → review+
To see the seek fail

$ adb push VID_0001.3gp /mnt/sdcard

Then start the Video application, this video will never call onseeked or be displayed (without the current patch)
Attachment #679841 - Flags: review?(cpearce)
Attachment #679841 - Attachment is obsolete: true
Attachment #679841 - Flags: review?(cpearce)
kanru, it took me an hour of my life to understand your patch, you have to comment your changes better for us mere mortals to figure out what you are doing :)
Attachment #679842 - Flags: review?(cpearce)
Attachment #679842 - Attachment is obsolete: true
Attachment #679842 - Flags: review?(cpearce)
Attachment #679844 - Flags: review?(cpearce)
Attachment #679844 - Flags: review?(cpearce) → review+
Comment on attachment 679814 [details] [diff] [review]
detect same-frame-returned as end of video stream

+        mLastVideoFrame && frame.mTimeUS == mLastVideoFrame->mTime) {
should be
+        mLastVideoFrame && frame.mTimeUs == mLastVideoFrame->mTime) {
Once I apply both these patches DecodeVideoFrame completes, however I do get stuck in the inner while loop in nsBuiltinDecoderReader.cpp

https://github.com/mozilla/mozilla-central/blob/master/content/media/nsBuiltinDecoderReader.cpp#L387

This happens with both the video recorded from the device and a video recorded on the Otoro + Android
(In reply to Kan-Ru Chen [:kanru] from comment #13)
> mVideoSource->read always return the frame at the same timestamp for unknown
> reason. The log from android OMXCodec.cpp:
> 
> V/OMXCodec( 2058): [OMX.google.mpeg4.decoder] FILL_BUFFER_DONE(buffer:
> 0x43d82510, size: 69120, flags: 0x00000000, timestamp: 4000000 us (4.00
> secs))

I have a basic question about the log. "OMX.google.mpeg4.decoder" say that software codec provided by google is used for video playback.
Why the software codec is used for video playback?

From performance point of view, hw codec should be used for video playback.
(In reply to Kan-Ru Chen [:kanru] from comment #0)
> Created attachment 675424 [details]

By the way attachment 675424 [details] uses MPEG4 as a video codec not H.264.
I have been debugging this a little

nsMediaOmxReader::DecodeAudioData finds a frame and pushes it to nsaudioqueue

nsBuiltinDecoderReader::DecodeToTarget is waiting for frames to process (while !eof) but mVideoQueue.GetSize() in 

    while (mVideoQueue.GetSize() == 0 && !eof) {

is always 0 despite just being pushed to, we havent hit eof so it just keeps spinning
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> I have a basic question about the log. "OMX.google.mpeg4.decoder" say that
> software codec provided by google is used for video playback.
> Why the software codec is used for video playback?
> 
> From performance point of view, hw codec should be used for video playback.

+1. Stated earlier on IRC, need to copy over /system/lib/libOmxMpeg4Dec.so from backup-otoro to use the hardware decoder. Will make a pull request for extract-files.sh to grab it.
Why are we supporting playback of non-h.264 formats? My understanding was that we'd be supporting H.264/AAC/MP3 only. What exactly are the video formats that B2G expect us to support?
The encoder for video recording records in 3gp, so I believe its a pretty hard requirement to be able to play the videos we record.

I wasnt particularly clear earlier, with the above patches the video decoding complete but the audio decoding gets stuck in an infinite loop. 

Similarly to the Video patch the Audio gets confused at whether it is seeing empty frames due to seeking or whether it has reached the end of the stream and gets caught in an infinite loop. I may be able to make a patch in the morning but it likely needs someone more familiar.
I think infinite loop happens by combination of following

-[1] stagefright's MediaSource has seeking capability
-[2] In mp4 file format case, always seek to sync frame
    normal movie content has sync frame every 1-2 sec. But it depends on a mp4 file.  
-[3] OmxDecoder::ReadVideo() asked to MediaSource previous sync frame than tartget by using SEEK_PREVIOUS_SYNC.
-[4] nsBuiltinDecoderReader::DecodeToTarget() also checks seeking target(then becomes duplicate)

There are 2 places to decide reaching seeking target. stagefright's MediaSource and nsBuiltinDecoderReader::DecodeToTarget(). IMHO, the code to judge seeking target in nsBuiltinDecoderReader::DecodeToTarget() is not necessary in this case.

follo
- OmxDecoder::ReadVideo()
http://mxr.mozilla.org/mozilla-central/source/content/media/omx/OmxDecoder.cpp#445

- nsBuiltinDecoderReader::DecodeToTarget()
http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderReader.cpp#379

- AwesomePlayer::onVideoEvent()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1602

- MediaSource::ReadOptions::SeekMode
http://androidxref.com/4.0.4/xref/frameworks/base/include/media/stagefright/MediaSource.h#62

I think infinite loop happens by combination of following

- stagefright's MediaSource has seeking capability
- In mp4 file format case, always seek to sync frame
    normal movie content has sync frame every 1-2 sec. But it depends on a mp4 file.  
- OmxDecoder::ReadVideo() asked to MediaSource previous sync frame than tartget by using SEEK_PREVIOUS_SYNC.
- nsBuiltinDecoderReader::DecodeToTarget() also checks seeking target(then becomes duplicate)

There are 2 places to decide reaching seeking target. stagefright's MediaSource and nsBuiltinDecoderReader::DecodeToTarget()
IMHO, the code to judge seeking target in nsBuiltinDecoderReader::DecodeToTarget() is not necessary in this case.


- OmxDecoder::ReadVideo()
http://mxr.mozilla.org/mozilla-central/source/content/media/omx/OmxDecoder.cpp#445

- nsBuiltinDecoderReader::DecodeToTarget()
http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderReader.cpp#379

- MediaSource::ReadOptions::SeekMode
http://androidxref.com/4.0.4/xref/frameworks/base/include/media/stagefright/MediaSource.h#62

- MPEG4Source::read()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/MPEG4Extractor.cpp#1963

- AwesomePlayer::onVideoEvent()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1602
(In reply to Sotaro Ikeda [:sotaro] from comment #33)

Sorry, comment #33 is because of failed copy and paste... re-post the comment.

I think infinite loop happens by combination of following
-[1] stagefright's MediaSource has seeking capability
-[2] In mp4 file format case, always seek to sync frame
    normal movie content has sync frame every 1-2 sec. But it depends on a mp4 file.  
-[3] OmxDecoder::ReadVideo() asked to MediaSource previous sync frame than tartget by using SEEK_PREVIOUS_SYNC.
-[4] nsBuiltinDecoderReader::DecodeToTarget() also checks seeking target(then becomes duplicate)

There are 2 places to decide reaching seeking target. stagefright's MediaSource and nsBuiltinDecoderReader::DecodeToTarget(). IMHO, the code to judge seeking target in nsBuiltinDecoderReader::DecodeToTarget() is not necessary in this case.


- OmxDecoder::ReadVideo()
http://mxr.mozilla.org/mozilla-central/source/content/media/omx/OmxDecoder.cpp#445

- nsBuiltinDecoderReader::DecodeToTarget()
http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderReader.cpp#379

- MediaSource::ReadOptions::SeekMode
http://androidxref.com/4.0.4/xref/frameworks/base/include/media/stagefright/MediaSource.h#62

- MPEG4Source::read()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/MPEG4Extractor.cpp#1963

- AwesomePlayer::onVideoEvent()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1602
Attached patch Seeking beyond EOF of audio (obsolete) — Splinter Review
OmxDecoder::ReadAudio() does not handle EOF of the container while OmxDecoder::ReadVideo() does.  Maybe, sometime, the length of audio stream is not as long as video one, then it is in spinning for the audio data beyond the length of audio stream.

This patch make OmxDecoder::ReadAudio() aware of EOF of the container.  It resolves the issue of seeking, but cause the problem of playing of the sample uploaded by Kan-Ru.  At the end of playing the sample, the decode is blocked for staying in OMXCodec::stop() for waiting leaving an intermediate state (FLUSHING).
blocking-basecamp: --- → ?
blocking-basecamp: ? → +
Applying both this patch and the patch of bug 810734 fix infinite loop when playing MP4.  By handling EOS, it avoids spinning for seeking.  Stopping doing skepToNextKeyFrame, fix an infinite waiting at end of playing of the example video uploaded by @kanru.

skipToNextKeyframe make OMXCodec seeking.  For seeking, OMXCodec will SHUTTING_DOWN input & output temporary.  But, ImageContainer returns buffers in async, it makes OMXCodec segfault to return a buffer while the output port is SHUTTING_DOWN.  Then, OMXDecoder::stop() in turn goes into an infinite loop.

In this patch, I just comment out skipToNextKeyFrame to avoid seeking.  Then, the sample video provided by kanru can be played and stopped properly.  If we still want to skip frames, I need to doing sync between returning buffers and skipToNextKeyFrame.  As the suggestion from kanru, we should sent an empty frame to ImageContainer to make it release the previous one frame, waiting for returning the buffer of the previous one, then start doing skipToNextKeyFrame.
Attachment #680494 - Attachment is obsolete: true
Attachment #681870 - Flags: feedback?
Attachment #681870 - Flags: feedback? → feedback?(eflores)
(In reply to Thinker Li [:sinker] from comment #36)
> buffers and skipToNextKeyFrame.  As the suggestion from kanru, we should
> sent an empty frame to ImageContainer to make it release the previous one
> frame, waiting for returning the buffer of the previous one, then start
> doing skipToNextKeyFrame.

You can clear out the frame in ImageContainer using ClearCurrentFrame. We use this in nsMediaOmxReader::ResetDecode so we can shut dow nthe OMXCodec without hanging/crashing too.

With the exception of the commenting out of the skipToNextKeyFrame line the patch in attachment 681870 [details] [diff] [review] looks ok.
Attachment #681870 - Flags: feedback?(eflores) → feedback+
Comment on attachment 681870 [details] [diff] [review]
handle EOS and stop skepToNextKeyFrame

Review of attachment 681870 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsBuiltinDecoderStateMachine.cpp
@@ +843,5 @@
>            (static_cast<uint32_t>(mReader->VideoQueue().GetSize())
>             < LOW_VIDEO_FRAMES))) &&
>          !HasLowUndecodedData())
>      {
> +	//skipToNextKeyframe = true;

We use this, don't break it!
Attachment #681870 - Flags: feedback-
Whiteboard: [caf:blocking]
Attached patch Check EOS (obsolete) — Splinter Review
Check EOS to fix infinite loop in OMX decoder.

skipToNextKeyFrame issue will be fixed in another bug.
Attachment #681870 - Attachment is obsolete: true
Attachment #682731 - Flags: review?(chris.double)
I had filed bug 812756 for handling skipToNextKeyFrame issues.
This bug has been called out as likely having risk to non-B2G platforms. Given that, marking as P1, and moving into the C2 milestone. We should prioritize this landing to mozilla-beta as soon as possible, to prevent late-breaking regressions to other platforms.
Priority: -- → P1
Target Milestone: --- → B2G C2 (20nov-10dec)
Thinker, This bug needs to be marked depends-on bug 812756, right?
Assignee: kchen → tlee
Attachment #682731 - Flags: review?(chris.double) → review+
(In reply to Inder from comment #42)
> Thinker, This bug needs to be marked depends-on bug 812756, right?

Not really!  Bug 812756 is about different symptoms.  I find the symptom resolving on bug 812756
 while doing this bug, but there is not dependence between them.
Guys how are we doing here? Is this ready to land?
(In reply to Andreas Gal :gal from comment #44)
> Guys how are we doing here? Is this ready to land?

Running testcases.
Attached patch Check EOSSplinter Review
r=doublec
Attachment #682731 - Attachment is obsolete: true
Attachment #684829 - Flags: review+
Attachment #684829 - Flags: checkin?
(In reply to Thinker Li [:sinker] from comment #47)
> Created attachment 684829 [details] [diff] [review]
> Check EOS
> 
> r=doublec

Landed on mozilla-inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7841347b558

It's not clear to me that this patch fixes the bug entirely, and it's not clear to me which of the r+ patches need to land, so I added [leave open] to the Whiteboard field. This means the bug won't auto-close when d7841347b558 is merged to mozilla-central.

Could someone outline which of the other patches here need to land please?
Whiteboard: [caf:blocking] → [caf:blocking][leave open]
Attachment #684829 - Flags: checkin? → checkin+
To simplify tracking, please file a new bug if this is still a problem. Also, please make sure this lands on aurora as needed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Thinker, please land on aurora.
Whiteboard: [caf:blocking][leave open] → [caf:blocking]
This still isn't working for me on central.
Attached patch WIPSplinter Review
Interpret duplicate frames as EOS in OMX decoder
Comment on attachment 685409 [details] [diff] [review]
WIP

Review of attachment 685409 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Does it fix the bug?
Attachment #685409 - Flags: feedback+
Comment on attachment 685409 [details] [diff] [review]
WIP

Review of attachment 685409 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +168,1 @@
>        mSkipCount++;

I can not find these lines.  Are you sure that you are working on default branch of mc?  Since mGraphicBuffer is a nsRefPtr<...>, it is initialized as null, right?

::: content/media/omx/OmxDecoder.cpp
@@ +485,5 @@
> +    logcat("time = %lld", timeUs);
> +    if (timeUs == mLastFrameTimeUs) {
> +      // A bug in the OMX MPEG4 decoder causes it to not signal EOS properly so
> +      // we need this hack to detect it properly. Yay.
> +      logcat("BOOM!");

Have you applied the patch in bug 810734?  It have landed.
Yes. It's still broken when using the software decoder.
Comment on attachment 684829 [details] [diff] [review]
Check EOS

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 80571
User impact if declined: infinite loop for video app
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky):
String or UUID changes made by this patch:
Attachment #684829 - Flags: approval-mozilla-aurora?
Comment on attachment 684829 [details] [diff] [review]
Check EOS

For the time being, bb+ bugs don't need explicit approval for uplift.
Attachment #684829 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/releases/mozilla-beta/rev/0a4986420d89

(This is also in my queue to land on Aurora once it reopens)
Testing on an Otoro running ICS I see the same issue of the time not advancing when using the software decoder. It plays fine using the hardware decoder however. Perhaps the Otoro software ICS MPEG4 decoder is broken?
Testing using the stock browser on an Otoro running ICS using the software MPEG4 video decoder crashes. This seems to indicate there is something wrong with it.
You need to log in before you can comment on or make changes to this bug.