Closed Bug 877461 Opened 7 years ago Closed 6 years ago

[Intermittent] YouTube - Video freezes (audio continues to play)

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(blocking-b2g:leo+, firefox22 wontfix, firefox23 wontfix, firefox24 fixed, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

RESOLVED FIXED
1.1 QE2 (6jun)
blocking-b2g leo+
Tracking Status
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- fixed
b2g18 --- verified
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed

People

(Reporter: jsmith, Assigned: roc)

References

Details

(Whiteboard: [tef-triage][YouTubeCertBlocker+])

Attachments

(4 files, 1 obsolete file)

Saw this three times.

Example STR #1

1. Started playing to tell a lie episode (~40 min long) in YouTube certified app
2. One minute in - youtube video frozen, audio kept playing

Example STR #2

1. Was watching a 3 minute video in fullscreen and seeked into the future
2. Video began buffering, then I exited fullscreen
3. Audio kept playing, but the video did not
Blocks: 877024
Filed a separate bug away from bug 876843 until we have exact confirmation that the YouTube certified app bug here is exactly the same as what was seen with viewing YouTube videos in the Video app. It might be the same though.
blocking-b2g: --- → tef?
See Also: → 876843
Summary: [Intermittent] YouTube - Audio continues to play with video frozen → [Intermittent] YouTube - Video freezes (audio continues to play)
Sounds like this is either a media issue or a gfx issue.
Though, in honesty, I wonder if we're getting to the point where there's simply too many issues in the v1.0 codebase and we simply need to let it go and instead aim for v1.1 to be the first release that supports youtube :(

The fact that we're running in to so many fundamental issues is really concerning.
Is this the same as bug 871753?
(In reply to Chris Pearce (:cpearce) from comment #4)
> Is this the same as bug 871753?

It is different from bug 871753. In bug 871753's case, audio and app's ui also freeze because of deadlock.
I tend to agree with comment 3, I think it is too late to try to fix this for 1.0.1 and we should go for do this properly for 1.1
Whiteboard: [tef-triage]
(In reply to Daniel Coloma:dcoloma from comment #6)
> I tend to agree with comment 3, I think it is too late to try to fix this
> for 1.0.1 and we should go for do this properly for 1.1

That would require a decision to pull the YouTube app permanently for 1.01 entirely. Which we haven't reached that decision just yet - we said we would make a final decision here at the latest by June 4th.

I tend to think though this will probably cause us to fail YouTube's certification process, even though it's intermittent across this bug and bug 876843 (i.e. we need to be solid on the quality bar when doing anything involving playing media content).
Jeff, let's see if we can work with Sotaro on this and shed some light on it.
Flags: needinfo?(jmuizelaar)
(In reply to Jason Smith [:jsmith] from comment #7)
> (In reply to Daniel Coloma:dcoloma from comment #6)
> > I tend to agree with comment 3, I think it is too late to try to fix this
> > for 1.0.1 and we should go for do this properly for 1.1
> 
> That would require a decision to pull the YouTube app permanently for 1.01
> entirely. Which we haven't reached that decision just yet - we said we would
> make a final decision here at the latest by June 4th.

Or that we don't need this bug to ship YT. Have you or anybody else been able to reproduce since? What was the video used?
Keywords: qawanted
Able to reproduce 1 time out of 5 times on the 

Unagi Build ID: 20130530070208
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/09ac1fd2959c
Gaia: 1cca9324d4444ad28c6fa99875e17abf7e8230be
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #0)
> Example STR #2
> 
> 1. Was watching a 3 minute video in fullscreen and seeked into the future
> 2. Video began buffering, then I exited fullscreen
> 3. Audio kept playing, but the video did not


I can reproduce the bug intermittently using these STR. It seems that nsMediaOmxReader::DecodeVideoData() is returning false, which shuts down playback of the video stream but not the audio stream. nsMediaOmxReader::DecodeVideoData() is returning false because we're unable to create the the VideoData object (we're hitting the line NS_WARNING("Unable to create VideoData");, I'm not sure why yet.
We're unable to create the VideoData because OmxDecoder::ReadVideo is returning true but not filling out valid data in aFrame. mVideoSource->Read() is failing with err == -ETIMEDOUT.
I'm seeing this in the logcat right before we hit the -ETIMEDOUT return:

E/OMXCodec( 6033): [OMX.qcom.video.decoder.avc] Timed out waiting for output buffers: 0/12

Seems that something is holding onto the gralloc buffer too long.
This patch will recycle the ImageLayers used by recreated nsVideoFrames. It shouldn't make a difference (maybe be a bit more efficient), but it might help avoid some bugs?
(In reply to Chris Pearce (:cpearce) from comment #13)
> I'm seeing this in the logcat right before we hit the -ETIMEDOUT return:
> 
> E/OMXCodec( 6033): [OMX.qcom.video.decoder.avc] Timed out waiting for output
> buffers: 0/12
> 
> Seems that something is holding onto the gralloc buffer too long.

It happens normally when MediaStreamSource::readAt() is blocked longer time.
http://mxr.mozilla.org/mozilla-b2g18/source/content/media/omx/OmxDecoder.cpp#95
This patch basically seems to work.
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> (In reply to Chris Pearce (:cpearce) from comment #13)
> > I'm seeing this in the logcat right before we hit the -ETIMEDOUT return:
> > 
> > E/OMXCodec( 6033): [OMX.qcom.video.decoder.avc] Timed out waiting for output
> > buffers: 0/12
> > 
> > Seems that something is holding onto the gralloc buffer too long.
> 
> It happens normally when MediaStreamSource::readAt() is blocked longer time.
> http://mxr.mozilla.org/mozilla-b2g18/source/content/media/omx/OmxDecoder.
> cpp#95

Can you explain that in more detail? If readAt is blocked, wouldn't that cause the decoder to stop consuming buffers, thus making it more likely we *don't* run out of buffers?
I wrote comment #15, because it is the sympton I saw in the past. It need to make clear why.
OMXCodec uses one mutex to protect everything of OMXCodec. If MediaStreamSource::readAt() takes longer time, following things happen.
- Client can not read decoded data.
- buffers can not be returned to OMXCodec.
- thread in OMXCodec can not run
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> This patch basically seems to work.

I take that back, I don't think it does.
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> I wrote comment #15, because it is the sympton I saw in the past. It need to
> make clear why.
> OMXCodec uses one mutex to protect everything of OMXCodec. If
> MediaStreamSource::readAt() takes longer time, following things happen.
> - Client can not read decoded data.
> - buffers can not be returned to OMXCodec.
> - thread in OMXCodec can not run

OK, but if buffers aren't being returned to OMXCodec, and it's not consuming any, why would we run out?
I've tried pretty hard to reproduce this with a Gecko debug build, without success.
Duplicate of this bug: 876843
FWIW, I got this reproduce again. My STR this time was something along the lines of merely seeking ahead to a portion of the video that has not buffered yet with HQ disabled.
See Also: 876843
roc - what can QA do to get debugging when they're able to reproduce?

We've had a number of instances of this (sometimes within a couple of video plays). We should investigate as if it's a YT blocker.
Assignee: nobody → roc
blocking-b2g: tef? → tef+
Also adding steps-wanted for better STR.
Keywords: steps-wanted
batch update on tef+ milestones. partner to make a final on 6/3 Asia time. TEF+ needs to be resolved by 6/3 to be in the final build. thanks
Target Milestone: --- → 1.0.1 IOT3 (3jun)
QA Contact: jsmith
https://bugzilla.mozilla.org/show_bug.cgi?id=877024#c8
blocking-b2g: tef+ → leo+
Target Milestone: 1.0.1 IOT3 (3jun) → ---
Since we're not caring about 1.01 anymore, I found a reproducible STR that seems to always reproduce this bug with the HTML 5 YouTube app.

STR

1. Go to https://m.youtube.com/?tsp=1&player=html5&monetize=1 in the browser
2. Search for Community
3. Select "Community - Best of Troy (Season 1)
4. Disable the HQ button
5. Play the video
6. Seek forward in the video by a minute or two

Result - You'll see this bug.
Keywords: steps-wanted
Blocks: 878343
Target Milestone: --- → 1.1 QE2 (6jun)
On browser youtube I see this message when the screen freezes

E/libgenlock(  132): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1,        >err=Connection timed out fd=31)
E/omx_vdec(  132): Failed to acquire genlock, ret = 1

I you're seeing the "[OMX.qcom.video.decoder.avc] Timed out" message make sure you have the patches from bug 863441 and bug 864230.
(In reply to Diego Wilson [:diego] from comment #29)
> On browser youtube I see this message when the screen freezes
> 
> E/libgenlock(  132): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK
> failed (lockType0x1,        >err=Connection timed out fd=31)
> E/omx_vdec(  132): Failed to acquire genlock, ret = 1

IIRC, I see this error only when hw composer rendering is enabled.
(In reply to Chris Pearce (:cpearce) from comment #13)
> I'm seeing this in the logcat right before we hit the -ETIMEDOUT return:
> 
> E/OMXCodec( 6033): [OMX.qcom.video.decoder.avc] Timed out waiting for output
> buffers: 0/12

I checked the buri v1-train source code. In the code, when timeout happened in OMXCodec::read(), OMXCodec's state change to ERROR state. But such code does not present in AOSP. And also unagi v1-train source code does not have it.
(In reply to Sotaro Ikeda [:sotaro] from comment #31)
> I checked the buri v1-train source code. In the code, when timeout happened
> in OMXCodec::read(), OMXCodec's state change to ERROR state. But such code
> does not present in AOSP. And also unagi v1-train source code does not have
> it.

Is the libstagefright code for buri different to that in our mozilla-b2g/B2G repository? If so, how does one get access to that?
Sotaro, can you help us understand what is going on here?

What does it mean when the OMXCodec reports "Timed out waiting for output buffers"?
Is it a bug for MediaStreamSource::readAt to block indefinitely? Because the design of the media cache is that reads are allowed to block.
If we return an error, or zero bytes, instead of blocking, would that help?
How exactly can a blocking read from the cache cause the OMXCodec to report "Timed out waiting for output buffers"?
After adapting patch from Bug 870564, This problem is occurred.

I saw this log also.

E/OMXCodec( 6033): [OMX.qcom.video.decoder.avc] Timed out waiting for output buffers: 0/12
(In reply to Chris Double (:doublec) from comment #32)
> Is the libstagefright code for buri different to that in our mozilla-b2g/B2G
> repository? If so, how does one get access to that?

unagi(inari), hamachi(buri), leo use CAF(codeaurora)'s stagefright code. It is modifed from default android source.
It is defined in https://github.com/mozilla-b2g/b2g-manifest
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #33)

> What does it mean when the OMXCodec reports "Timed out waiting for output buffers"?

This error log is printed when OMXCodec can not provide decoded data within specified time(3 sec).

> Is it a bug for MediaStreamSource::readAt to block indefinitely? Because the design of the media cache is that reads are allowed to block.

read blocking is not a problem. stagefright works like that in android.
The problem is read is blocked very longer time than android.
And media cache size and others(like AMPLE_AUDIO_USECS) might affected to it.

> If we return an error, or zero bytes, instead of blocking, would that help?

It does not help in this case:-( OMXCodec expect that data source can be read same as local file system.

> How exactly can a blocking read from the cache cause the OMXCodec to report "Timed out waiting for output buffers"?

I did not investigated about it deeply. I am going to investigate about it.
Sotaro ask CAF for help.
Flags: needinfo?(dwilson)
(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #33)
> > What does it mean when the OMXCodec reports "Timed out waiting for output buffers"?
> 
> This error log is printed when OMXCodec can not provide decoded data within
> specified time(3 sec).

If the codec tries to read data we haven't fetched yet, then it could easily take more than 3 seconds to fetch. Especially if we have to initiate a new HTTP request to get the right byte range. Maybe the problem is as simple as that.

If that's the problem, then maybe we just need to have that timeout increased to a much larger value?
In android, 3 sec is enough for timeout. If media buffer works as expected this problem does not happen. I am wondering that current Firefox OS has some problem around handling MediaCache. If there are not enough pre-fetched data in MediaCache, decoding/playback should be paused. If this works correctly, this timeout problem does not happen.

In android, when prefetched data becomes low water mark, playback is paused. It is in AwesomePlayer::onBufferingUpdate()
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#692

Media cache is handled in NuCachedSource2 class. 
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/NuCachedSource2.cpp

One big different about media chache between FirefoxOS and android is that NuCachedSource2 keeps data continuous area. Firefox OS's media cache does not keep cache continuous area. In Firefox OS, during playback, audio cache read ahead 1sec from video track(this need to be changed to more shorter). If media cache caches data continuous area, video track have already data for decoding. But is is not guarantee in Firefox OS. Sometimes MediaCache does not have data for video track.
And this case is not handled in MediaOmxReader::GetBuffered()
http://mxr.mozilla.org/mozilla-central/source/content/media/omx/MediaOmxReader.cpp#289

Then new http session triggered by OMXCodec::read();
So, Need to fix part is around MediaCache, I think.
This timeout is used also bad content check that the hardware can not handle.
There was a bug to extend timeout, Bug 864181. But I set to WORKSFORME. Because to extend timeout cause bad effect to this case. And extend the timeout did not solve the problem.
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> In android, 3 sec is enough for timeout. If media buffer works as expected
> this problem does not happen.

It seems that an easier way to debug this is revert Bug 863441.
I think you're saying that we should only call into the OMX code to decode frames when we know all the data that it might read is already present in the media cache. But that seems very difficult to guarantee, since we don't know how much data it is going to need.

We can try to tweak our buffering heuristics so we buffer more before calling into the codec. But I don't think we can guarantee that the codec never runs out of cached data.

It looks like AwesomePlayer uses heuristics too. It seems to me that AwesomePlayer could also make the codec read uncached data if those heuristics are wrong; is that correct? Is there something I can't see that makes AwesomePlayer avoid these timeouts we're seeing?
Also, I don't understand why increasing the timeout would not fix this bug.

> And this case is not handled in MediaOmxReader::GetBuffered()
> http://mxr.mozilla.org/mozilla-central/source/content/media/omx/MediaOmxReader.cpp#289

GetBuffered is a heuristic (assumes constant bit rate), so I hope we're not depending on that to be very accurate.

I'll see if I can do something to deal with this audio/video skew issue, anyway.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #43)
> Also, I don't understand why increasing the timeout would not fix this bug.

If network connection is cellular data connection, data disconnect period could be way much longer than wlan. We can not extend the timeout to fix this case. Extend timeout causes following drawbacks.

If we increase the timeout, OMXCodec::read() could be blocked until this timeout at most. It means, even when video tag try to release video decoder, the video tag could continue to hold hw codec the timeout period at most. This could happen in following use cases.
 - a video content is corrupted
 - device does not support a video content 
 - network connection becomes wrong

For example, we extend the timeout to 20 sec

If Sd card has 40 videos that does not supported to playback on the device. In this case, thumbnail processing could becomes 20*40 sec.

Another example. When a video is played in browser, user push home button and changed to video app. But at that time, there is a problem on the network connection. When a browser go to background, a hardware codec is going to be freed. The free could take 20 sec at most. Until the free end, the video app can not do video playback.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #42)
> It looks like AwesomePlayer uses heuristics too. It seems to me that
> AwesomePlayer could also make the codec read uncached data if those
> heuristics are wrong; is that correct? Is there something I can't see that
> makes AwesomePlayer avoid these timeouts we're seeing?

In actual use, AwesomePlayer avoid this problem in almost all use cases. It is not only this heuristics, but also how caching is done in NuCachedSource2.
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> In android, 3 sec is enough for timeout. If media buffer works as expected
> this problem does not happen. 

Correction, 3 sec since got OMXCodec's mutex. If mutex is already held by other task like reading data source, this duration is not counted.

http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/OMXCodec.cpp#3897
I found HwcComposer bug 878977. If I skip rendering canvas layers as described here https://bugzilla.mozilla.org/show_bug.cgi?id=878977#c1 my video doesn't lock anymore!

Of course, skipping canvas layers isn't an option, but it narrows down for me.
Flags: needinfo?(dwilson)
(In reply to Diego Wilson [:diego] from comment #47)
> I found HwcComposer bug 878977. If I skip rendering canvas layers as
> described here https://bugzilla.mozilla.org/show_bug.cgi?id=878977#c1 my
> video doesn't lock anymore!
> 
> Of course, skipping canvas layers isn't an option, but it narrows down for
> me.

nice! Though I am not sure this fix all this Video freezes (audio continues to play) problem.
This bug shows up quickly with the steps in comment #28 because our seeking code doesn't do any buffering. All we do in nsMediaOmxReader::Seek is set mAudioSeekTimeUs and mVideoSeekTimeUs and call nsBuiltinDecoderReader::DecodeToTarget to decode video and audio up to the desired point. That in turn calls OmxDecoder::ReadVideo which simply does options.setSeekTo and then mVideoSource->read. Of course we don't have the data for that point in time, and can't get it in 3 seconds, so we get the timeout error from libstagefright.

This is difficult to fix well because this part of our code, like the rest of our code, is designed around the assumption that it's OK to make codecs block until the data they request is available. The libstagefright design, where we don't know what data the codec will request but apparently it has to be available immediately, seems crazy to me. But it is what it is I guess.

I don't see anything in AwesomePlayer that would avoid this sort of problem. It just sets the seek option and starts reading from the decoder without preloading any data or checking buffers. Sotaro, do you know how AwesomePlayer avoids a timeout when it seeks?
http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/AwesomePlayer.cpp#1600
Flags: needinfo?(sotaro.ikeda.g)
I tried completely disabling HwcComposer2D by adding a "return false;" at the start of HwcComposer2D::TryRender, and it fixed the seeking problem. The codec still does a lot of reads of uncached data but the ETIMEDOUT error is never returned. That makes me think that blocking the codec during a data read doesn't have to be a problem.

So I'm going to focus on 878977 next.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #50)
> I tried completely disabling HwcComposer2D by adding a "return false;" at
> the start of HwcComposer2D::TryRender, and it fixed the seeking problem.

Actually it makes the bug harder to reproduce, but it still can occur. So never mind.
I still don't know how to fix seeking. When we try to seek to a particular time, we don't even know which bytes of the resource are going to be needed by the codecs until they call readAt, and then it's too late because we've only got 3 seconds to fetch the requested data.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #49)
>
> I don't see anything in AwesomePlayer that would avoid this sort of problem.
> It just sets the seek option and starts reading from the decoder without
> preloading any data or checking buffers. Sotaro, do you know how
> AwesomePlayer avoids a timeout when it seeks?
> http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/
> AwesomePlayer.cpp#1600

Actually AwesomePlayer does not fix seek problem. It does not make a problem in most use cases. But there is still a problem. Then Android is going to use NuPlayer for streaming playback. It fixes this problem. In JB, the NuPlayer is used for RTSP and HTTP Live Streaming. 

http://androidxref.com/4.2.2_r1/xref/frameworks/av/media/libmediaplayerservice/MediaPlayerFactory.cpp#198

And seemsto use NuPlayer as default player in tuture.
http://androidxref.com/4.2.2_r1/xref/frameworks/av/media/libmediaplayerservice/MediaPlayerFactory.cpp#63
Flags: needinfo?(sotaro.ikeda.g)
NuPlayer does seek asynchronously. At first it does seek only by using parser without using OMXCodec.

http://androidxref.com/4.2.2_r1/xref/frameworks/av/media/libmediaplayerservice/nuplayer/NuPlayerDriver.cpp#184
Comment on attachment 757919 [details] [diff] [review]
Make nsMediaCache aggressively cache data just behind the current playback position

Review of attachment 757919 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/nsMediaCache.cpp
@@ +932,1 @@
>      BlockOwner* bo = &block->mOwners[i];

While you're here, how about you fix the "Blocks can be belong" typo on line 923?
Attachment #757919 - Flags: review?(cpearce) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #53)
> I still don't know how to fix seeking. When we try to seek to a particular
> time, we don't even know which bytes of the resource are going to be needed
> by the codecs until they call readAt, and then it's too late because we've
> only got 3 seconds to fetch the requested data.

My current best idea for fixing seeking is to handle the timeout error by restarting decoding. If I can get restarting to work properly (which may require reinstantiating the libstagefright decoder, but hopefully not) then this should work pretty well --- there will still be a delay of course, but only as long as it takes the data to arrive, and since you're seeking that's expected.
Whiteboard: [tef-triage] → [tef-triage][YouTubeCertBlocker+]
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #57)
> My current best idea for fixing seeking is to handle the timeout error by
> restarting decoding. If I can get restarting to work properly (which may
> require reinstantiating the libstagefright decoder, but hopefully not) then
> this should work pretty well --- there will still be a delay of course, but
> only as long as it takes the data to arrive, and since you're seeking that's
> expected.

I think if Bug 879297 is fixed, even when timeout happens on video track, the video track continue to playback. OmxDecoder already ignores timeout error as no error.
I confirmed on v1.1 unagi that even when timeout happened on video OMXCodec, video read continued on the track.
https://hg.mozilla.org/integration/mozilla-inbound/rev/45361ebec718
https://hg.mozilla.org/integration/mozilla-inbound/rev/86a166bf1ba7

(In reply to Sotaro Ikeda [:sotaro] from comment #59)
> I confirmed on v1.1 unagi that even when timeout happened on video OMXCodec,
> video read continued on the track.

OK. We still need to do something though since currently we're passing garbage to VideoData::Create. I'll work on it.
Whiteboard: [tef-triage][YouTubeCertBlocker+] → [tef-triage][YouTubeCertBlocker+][leave open]
It's a bad idea to have AudioFrame zero-initialize its members but VideoFrame not do it.
Attachment #758558 - Flags: review?(chris.double)
With this patch (and bug 879297 hacked around by disabling HWC) the HD version of the video from comment #28 works great. I can seek all over the place, and even if we block reading data, we retry after the codec times out, and it works well.

Unfortunately, the non-HD version still breaks when I seek. The weird thing is that from my logs, it seems that after we request the seek in OMXDecoder, libstagefright doesn't do any readAt calls, it just gets stuck in a never-ending series of timeouts and retries. Since it's not doing any readAts, it seems that the media cache can't be the problem. So this must be some other bug.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #64)
>
> Unfortunately, the non-HD version still breaks when I seek. The weird thing
> is that from my logs, it seems that after we request the seek in OMXDecoder,
> libstagefright doesn't do any readAt calls, it just gets stuck in a
> never-ending series of timeouts and retries. Since it's not doing any
> readAts, it seems that the media cache can't be the problem. So this must be
> some other bug.

Yeah, maybe qcom's modification to OMXCodec affect to this.
I'm not sure how to debug that problem, since it seems to be happening without entering code we control, possibly even in Qualcomm's driver for which we don't have the source.
I think if I get review here I'll land the patches and close this bug and open a new one for that seeking issue.
Comment on attachment 758558 [details] [diff] [review]
Part 2: Initialize VideoFrame members to 0

Can you raise a bug to have the equivalent change done in the android code please.
Attachment #758558 - Flags: review?(chris.double) → review+
 (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #63)
> Created attachment 758561 [details] [diff] [review]
> Part 3: Retry libstagefright audio/video decoding if it fails due to a
> timeout

Looks basically good. I have one question. Timeout could happen not only by data source, but also by content. In this case, OmxDecoder::ReadVideo() and OmxDecoder::ReadAudio() fall int almost infinite loop.

In this situation, how could nsBuiltinDecoderStateMachine shut down decoding thread?
(In reply to Sotaro Ikeda [:sotaro] from comment #69)
> Looks basically good. I have one question. Timeout could happen not only by
> data source, but also by content. In this case, OmxDecoder::ReadVideo() and
> OmxDecoder::ReadAudio() fall int almost infinite loop.
> 
> In this situation, how could nsBuiltinDecoderStateMachine shut down decoding
> thread?

Ah, that's a very good point. I need to do something about that.
Attached patch Part 3 v2Splinter Review
Alright, I think this is a lot better. This removes the retry loop from ReadVideo/ReadAudio; the regular video decode loop does the retries implicitly. That way we can shut down cleanly without any extra work.
Attachment #758897 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 758897 [details] [diff] [review]
Part 3 v2

Looks good to me!
Attachment #758897 - Flags: review?(sotaro.ikeda.g) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #64)
> 
> Unfortunately, the non-HD version still breaks when I seek. The weird thing
> is that from my logs, it seems that after we request the seek in OMXDecoder,
> libstagefright doesn't do any readAt calls, it just gets stuck in a
> never-ending series of timeouts and retries. Since it's not doing any
> readAts, it seems that the media cache can't be the problem. So this must be
> some other bug.

To debug around seek timeout, it is necessary to remember about difference between ics_chocolate(unagi, inari) and ics_strawberry(hamachi, buri). See Bug 878981.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #70)
> Part 1 landed on mozilla-b2g18:
> https://hg.mozilla.org/releases/mozilla-b2g18/rev/9778ac50b70b

https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/9778ac50b70b
(In reply to Sotaro Ikeda [:sotaro] from comment #74)
> To debug around seek timeout, it is necessary to remember about difference
> between ics_chocolate(unagi, inari) and ics_strawberry(hamachi, buri). See
> Bug 878981.

Thanks for the reminder. Unfortunately we don't have any ics_strawberry devices around here.
Attachment #758561 - Attachment is obsolete: true
Attachment #758561 - Flags: review?(sotaro.ikeda.g)
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/6a5f8ebc836a - b2g arm opt (the emulator build) says

https://tbpl.mozilla.org/php/getParsedLog.php?id=23845842&tree=Mozilla-Inbound

SVGAnimatedPathSegList.cpp
In file included from /builds/slave/m-in-ics_a7_g-0000000000000000/build/content/media/omx/OmxDecoder.h:12,
                 from /builds/slave/m-in-ics_a7_g-0000000000000000/build/content/media/omx/MediaOmxReader.cpp:16:
/builds/slave/m-in-ics_a7_g-0000000000000000/build/content/media/omx/MPAPI.h: In constructor 'MPAPI::VideoFrame::VideoFrame()':
/builds/slave/m-in-ics_a7_g-0000000000000000/build/content/media/omx/MPAPI.h:48: error: class 'MPAPI::VideoFrame' does not have any field named 'mEndTimeUs'
make[7]: *** [MediaOmxReader.o] Error 1
Is it possible to have this fix in v1.0.1?
(In reply to khu from comment #79)
> Is it possible to have this fix in v1.0.1?

We could, but hasn't that ship already sailed?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #77)
> https://hg.mozilla.org/releases/mozilla-b2g18/rev/50fbee667acb

Interesting, apparently our commit hooks on b2g18 don't care if there's a bug # or not in the commit message.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #80)
> (In reply to khu from comment #79)
> > Is it possible to have this fix in v1.0.1?
> 
> We could, but hasn't that ship already sailed?

But, the carrier would like to have the solution provided via FOTA as soon as possible. It means, we still need to have this solution in v1.0.1. 

Do we know how long we need to put it into v1.0.1? Thanks!
(In reply to khu from comment #82)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #80)
> > (In reply to khu from comment #79)
> > > Is it possible to have this fix in v1.0.1?
> > 
> > We could, but hasn't that ship already sailed?
> 
> But, the carrier would like to have the solution provided via FOTA as soon
> as possible. It means, we still need to have this solution in v1.0.1. 
> 
> Do we know how long we need to put it into v1.0.1? Thanks!

comment 27 already confirmed that this was out for 1.01 per a discussion over the plan for YouTube. I don't think it's wise at all to introduce any distractions involving 1.01 right now given tight schedules and focus needed on 1.1 for YouTube.
I'm going to let this bug close. We'll open new ones for similar issues that are still around.
Whiteboard: [tef-triage][YouTubeCertBlocker+][leave open] → [tef-triage][YouTubeCertBlocker+]
Keywords: verifyme
The YouTube video experience with these patches is definitely better on b2g18. I am seeing the behavior being seen in bug 880601, however.
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap?(dwatson)
Created two cases covering under 10 minutes videos and above 10 minute videos, making sure that they are playable without issues.

https://moztrap.mozilla.org/manage/cases/?filter-id=9395
https://moztrap.mozilla.org/manage/cases/?filter-id=9396
Flags: in-moztrap?(dwatson) → in-moztrap+
You need to log in before you can comment on or make changes to this bug.