crash in mozalloc_abort(char const*) | __android_log_assert | android::ACodec::sendFormatChange(android::sp<T> const&)

RESOLVED WORKSFORME

Status

defect
--
critical
RESOLVED WORKSFORME
4 years ago
4 years ago

People

(Reporter: NicholasN, Assigned: mchiang)

Tracking

unspecified
Unspecified
Android
Points:
---

Firefox Tracking Flags

(blocking-b2g:2.5+, b2g-v2.2 unaffected, b2g-master unaffected)

Details

(Whiteboard: [2.5-Daily-Testing][Spark], crash signature, )

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-77e26bb8-31d5-4c74-a273-f1bc32150723.
=============================================================

Description:
The user opens the browser and navigates to youtube.com. They select any video and play it. If an ad comes up, the video will be stuck on the first frame of the ad and won't play. If an ad does not appear the video will load briefly and then a crash occurs.


Repro Steps:
1) Update a Flame to 20150723010205
2) Open the browser and navigate to youtube.com
3) Select a video and play it.
4) If an ad appears, load a different video until there is no ad.


Actual:
Browser crashes when a YouTube video is played.


Expected:
YouTube video plays.


Notes:

Environmental Variables:
Device: Flame 2.5
Build ID: 20150723010205
Gaia: f04fdbfa1943dddeab8ecd1299a76ab56e590d00
Gecko: 2ddec2dedced
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5)
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0


Repro frequency: 4/4
Link to failed test case: https://moztrap.mozilla.org/manage/case/6073/
See attached: video clip, logcat
This issue also occurs on Aries 2.5, but does not occur on Flame 2.2.

Aries 2.5

Actual:
Browser crashes when a YouTube video is played.

Environmental Variables:
Device: Aries 2.5
BuildID: 20150723134401
Gaia: aa1698251e86c820c50c045b0a3ff65fd6b0eee7
Gecko: eee2d49d055c
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 42.0a1 (2.5) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0


Flame 2.2 

Actual:

YouTube video plays.

Environmental Variables:
Device: Flame 2.2
BuildID: 20150723002503
Gaia: e1e6317f17a840b19af9dbb25f5a771d8d9fa161
Gecko: d8326043baec
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Whiteboard: [2.5-Daily-Testing][Spark]
Component: Gaia::Browser → Video/Audio
Product: Firefox OS → Core
[Blocking Requested - why for this release]:
Severe issue failing smoke tests.
blocking-b2g: --- → 2.5?
Flags: needinfo?(pbylenga)
QA Contact: ktucker
blocking-b2g: 2.5? → 2.5+
Anthony, any idea what's going on?
Flags: needinfo?(ajones)
Mozilla Central Window

Last Working
Device: Flame 2.5
BuildID: 20150722045350
Gaia: b57aef5b7f52c40f88ee4c069ff722404e8e8521
Gecko: 221f20e9523e
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

First Broken
Device: Flame 2.5
BuildID: 20150722050151
Gaia: b57aef5b7f52c40f88ee4c069ff722404e8e8521
Gecko: e7434cafdf2f
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

Gecko pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=221f20e9523e&tochange=e7434cafdf2f

I am still working on the window.
Looking through that, bug 1165772 looks suspect.
Mozilla Inbound

Last Working
Device: Flame 2.5
BuildID: 20150721205351
Gaia: 84c3bf622e211046d905803b34de5d331761f22d
Gecko: f8051d507461
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5)
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

First Broken
Device: Flame 2.5
BuildID: 20150721212751
Gaia: 84c3bf622e211046d905803b34de5d331761f22d
Gecko: de0a5cf8c4f9
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5)
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

Last Working Gaia + First Broken Gecko: Issue DOES reproduce
Gaia: 84c3bf622e211046d905803b34de5d331761f22d
Gecko: de0a5cf8c4f9

First Broken Gaia + Last Working Gecko: Issue DOES NOT reproduce
Gaia: 84c3bf622e211046d905803b34de5d331761f22d
Gecko: f8051d507461
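The Gaia/Gecko swap above decides which half of the build owns the regression. A minimal C++ sketch of that decision, assuming each mixed build gives a deterministic reproduces/does-not-reproduce result; the enum and the function name classifySwap are hypothetical, not part of any Mozilla QA tooling.

```cpp
#include <cassert>

// Hypothetical verdict of the Gaia/Gecko swap test.
enum class Culprit { Gecko, Gaia, NeedsBoth, Ambiguous };

// goodGaiaBadGeckoRepros: does Last Working Gaia + First Broken Gecko reproduce?
// badGaiaGoodGeckoRepros: does First Broken Gaia + Last Working Gecko reproduce?
// Assumes the full First Broken build reproduces and results are deterministic.
Culprit classifySwap(bool goodGaiaBadGeckoRepros, bool badGaiaGoodGeckoRepros) {
    if (goodGaiaBadGeckoRepros && !badGaiaGoodGeckoRepros) {
        return Culprit::Gecko;      // the new Gecko alone is enough to break it
    }
    if (!goodGaiaBadGeckoRepros && badGaiaGoodGeckoRepros) {
        return Culprit::Gaia;       // the new Gaia alone is enough to break it
    }
    if (!goodGaiaBadGeckoRepros && !badGaiaGoodGeckoRepros) {
        return Culprit::NeedsBoth;  // only the combination of both new halves breaks it
    }
    return Culprit::Ambiguous;      // both mixes break: flaky, or the cause lies elsewhere (e.g. Gonk)
}
```

With the inbound results above (first mix reproduces, second does not), classifySwap(true, false) returns Culprit::Gecko. Note that both inbound builds share the same Gaia revision, so the swap could only ever point at Gecko; and, as later comments in this bug show, a timing-dependent crash can break the determinism assumption entirely.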

Gecko pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f8051d507461&tochange=de0a5cf8c4f9

This looks to have been caused by bug 1165772
Blocks: 1165772
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jyavenard)
Jean-Yves, can you take a look at this please? I'm not sure what the exact culprit is in the pushlog but I am guessing it is the landing for bug 1165772. This is a smoke test blocker so we will need this landing backed out if this is the cause.
Flags: needinfo?(ajones)
On Android, the temporary workaround would be to set media.mediasource.format-reader to false.
Flags: needinfo?(jyavenard)
http://hg.mozilla.org/integration/mozilla-inbound/rev/a465aecfd856

However, backing this out would be silly at this stage IMHO. Sounds like another stagefright bug simply being exposed.
snorp, any chance you could provide me with a full backtrace?
Or let me know which place is asserting there?
https://android.googlesource.com/platform/frameworks/av/+/e2d617f5ba7fb90f27b03e2593666b2c927e4dc9/media/libstagefright/ACodec.cpp#L2469
Flags: needinfo?(snorp)
Blake, could you look into this?
I can't find my Flame.
Flags: needinfo?(bwu)
Which version of flame are you using ?
Are you using the v18Dv3 Nightly base build?
Flags: needinfo?(nnelson)
Flags: needinfo?(ktucker)
(In reply to Jean-Yves Avenard [:jya] from comment #11)
> Blake, could you look into this ?
> can't find my flame
I think Munro can help with it.
Munro, could you look into this?
Flags: needinfo?(bwu) → needinfo?(mchiang)
After spending a day on this: I had assumed it was related to the new MSE.

It turned out MSE isn't even active on the Flame. I also built an image without any of the changes listed in the regression range (I checked out a version prior to all of those commits).

And I get the exact same behaviour.
KTucker, how do you check your regression range?
Flags: needinfo?(snorp)
Blake indicated that the issue had happened before and appears to be timing related.
In any case, this has nothing to do with bug 1165772.
No longer blocks: 1165772
Reproduced on today's Flame 2.5 with the following variables. We are on the v18D_nightly_v4 base image.

Environmental Variables:
Device: Aries 2.5
BuildID: 20150709163311
Gaia: fc6643dd3da2ccdf2ab284479643836bb3698644
Gecko: 917e7b01ea54
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 42.0a1 (2.5) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0
Flags: needinfo?(nnelson)
Our regression window process is something like this:

1. We find the last working build in which the issue does not occur and the first broken build where the issue started to occur in Mozilla Central. 
2. After finding those two builds, we swap the gaia and gecko to determine if it is a gaia or gecko issue. 
3. From there we create a gecko or gaia pushlog and bisect that further to determine if the offending commit landed in B2G inbound or Mozilla inbound.
4. Then we find the last working and first broken build in B2G inbound or Mozilla inbound.
5. We create another pushlog and see what the possible culprit is.
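Steps 1 and 4 above are a binary search over an ordered list of builds, and steps 3 and 5 turn the resulting pair into a pushlog. A minimal C++ sketch, assuming a deterministic reproduces() check (hypothetical; in practice a tester flashing a build and playing a video). Build, findWindow and pushlogUrl are illustrative names, not Mozilla QA tooling; the URL shape copies the pushlog links quoted in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical build record holding only what the bisection needs.
struct Build {
    std::string buildId;   // e.g. "20150721205351"
    std::string geckoRev;  // e.g. "f8051d507461"
};

struct RegressionWindow {
    Build lastWorking;
    Build firstBroken;
};

// Binary search over builds ordered oldest to newest. Assumes builds.front()
// is known good, builds.back() is known broken, and reproduces() is deterministic.
RegressionWindow findWindow(const std::vector<Build>& builds,
                            const std::function<bool(const Build&)>& reproduces) {
    assert(builds.size() >= 2);
    std::size_t good = 0;
    std::size_t bad = builds.size() - 1;
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (reproduces(builds[mid])) {
            bad = mid;   // broken: the first broken build is at or before mid
        } else {
            good = mid;  // working: the first broken build is after mid
        }
    }
    return {builds[good], builds[bad]};
}

// Build an hg pushlog URL between the two Gecko revisions of a window.
std::string pushlogUrl(const std::string& repoBase, const RegressionWindow& w) {
    return repoBase + "/pushloghtml?fromchange=" + w.lastWorking.geckoRev +
           "&tochange=" + w.firstBroken.geckoRev;
}
```

Each iteration halves the remaining range, so a window of N nightly builds needs roughly log2(N) test runs instead of N.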

As stated in comment, I am not sure which one of those check-ins caused the issue. We are making a guess from what is in our pushlog. We currently cannot do Gecko bisections, only Gaia ones.

We are using the v18D_nightly_v4 base build.
Flags: needinfo?(ktucker)
Backed out the bug 1171379 pref change after having a build with the pref changed tested and confirming that the breakage did not occur.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
This is totally silly.
The smoke test failing doesn't even use MSE.

The bug is elsewhere, so rather than fix it properly, you just hide it.

And now, we're going to have hundreds of failures elsewhere, because backing out just the pref is going to break all webref and mochitests.
Flags: needinfo?(nhirata.bugzilla)
There's evidence that something in the code path behind this preference could end up preventing our Foxfood end users on Aries, as well as our community members on Flame, from using YouTube and such.  If you want to delay the build due to this and have angry community members (including devs who want their features tested) because they haven't had an update for weeks, I can send them your way if you want.  :)

I would recommend that you figure out where in the code it's causing the break.  If you need a Flame, I can provide one.  The v4 base image is listed here: https://drive.google.com/drive/folders/0B_0LdM1CVycIZXNWeWc1TlM4aUk

As far as the back out is concerned I had asked the group, people have been supportive of backing out of any code that breaks end user usage:  https://groups.google.com/forum/#!topic/mozilla.dev.b2g/STO5PvCYyR4
Flags: needinfo?(nhirata.bugzilla) → needinfo?(jyavenard)
You could just have disabled it on FFOS only to start with, rather than everywhere.

This is going to seriously affect all other platforms and will cause havoc on our plans to enable MSE for all websites.

You have pushed something straight to central, without even checking what else it was going to break. You will cause test failures on all platforms where MSE is enabled (Windows >= 7 and OS X).

It was a hasty change, without careful consideration of the implications.
Something that consultation with the media team would have prevented. We were working hard on a fix, you've hampered all our efforts.
Flags: needinfo?(jyavenard)
#if defined(MOZ_WIDGET_GONK)
pref("media.mediasource.format-reader", false);
#else
// Enable new MediaSource architecture.
pref("media.mediasource.format-reader", true);
#endif

This is what you need as a temporary measure; the failures are starting to show on Treeherder now.
Flags: needinfo?(nhirata.bugzilla)
We asked you to back out the change.

You had the opportunity to do so yourself; I had read comment 9 and followed through with just this backout as per your suggestion.  

This had been escalated to me, so I backed it out.  As per the previous comment, we can work together on a solution that fixes the issue with the pref turned on, or I can escalate this to our managers.
Flags: needinfo?(nhirata.bugzilla) → needinfo?(jyavenard)
Problem is, I can reproduce the crash with my local build, using a Gecko version from 2 weeks ago, one that doesn't have any of the changes mentioned here (including the one you backed out). (And again, YouTube on mobile doesn't use MSE on Flame.) The FFOS team knew of a race in Android's stagefright; it's unfortunate that there's no bug tracking that issue.

As to backing out: a proper fix, or a push with a partial backout, would have taken the same amount of time had proper channels been followed. I prefer the former. I don't like band-aids that just hide problems.

At the end of the day, what I would take away from all this is: why are mochitests and web-platform tests disabled on FFOS??
Flags: needinfo?(jyavenard)
Unbacked out (and including the changes from comment 22) in https://hg.mozilla.org/mozilla-central/rev/d3228c82badd
Re-opening as the core issue isn't fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → mchiang
Blocks: 1187542
(In reply to Jean-Yves Avenard [:jya] from comment #24)
> Problem is, I can reproduce the crash with my local build, with a version of
> gecko of 2 weeks ago, one that doesn't have any of the changes mentioned
> here (including the one you backed out). (and again, youtube on mobile
> doesn't use MSE on flame). The FFOS team knew of a race in android's
> stagefright, unfortunate that there's no bug tracking for that issue.
> 
> As to backing out. A proper fix, or a push with a partial backout would have
> taken the same amount of time should proper channels were followed. I prefer
> the earlier. I don't like band-aid just hiding problems.
> 
> At the end of the day, what I would retain from all this is: why are
> mochitests and webref platform test disabled on FFOS ??
I'm very sorry that you suffered from this.
Currently, mochitests on B2G are only enabled on the Android ICS base. However, MSE on B2G is only supported on JB (API level 18) and later, so it is not covered by mochitests. I am going to check when mochitests on Android KK or L will be ready. It may be better to pref off the MediaFormatReader decoding path by default until mochitests are ready.
There are two commits that led to regressions.

commit 18c83d08736ea7f4a2129f87677619bcedcc1cf6
Author: Alastor Wu <alwu@mozilla.com>
Date:   Fri Jul 17 17:25:25 2015 +0800

    Bug 1184055 - Muted by default in b2g. r=baku

This commit causes audio/video playback to fail to start.
Action: Check with Alastor.

commit a9584eece6f511085380f28478a9847974d89a07
Author: Jean-Yves Avenard <jyavenard@mozilla.com>
Date:   Fri Jul 17 16:51:40 2015 +1000

    Bug 1171379: P1. Enable MediaSourceDemuxer by default. r=kentuckyfriedtakahe

This commit causes the ACodec crash.
Two possible root causes: 1) MediaSourceDemuxer feeds incorrect data to the decoder, which doesn't trigger a fatal error with the more fault-tolerant codecs on other platforms; 2) a bug in Google's AAC decoder.

Action: Need to check further.
Flags: needinfo?(mchiang) → needinfo?(alwu)
The issue where media couldn't start playback initially has been solved by bug 1185051.
Flags: needinfo?(alwu)
Munro, Blake states that MSE isn't used on FFOS as Youtube doesn't serve MSE video (it's currently an action item for them to do so).

So how could MediaSourceDemuxer be involved?

The audio served is the same across all platforms, so if the AAC data was incorrect I would expect to see problems on other platforms.
Jean-Yves,

As far as I know, MediaFormatReader/MediaSourceDemuxer are used by both local playback and MSE. I used gdb to trace it, and they are indeed used in this case.

Breakpoint 2, mozilla::MediaSourceDemuxer::Init (this=<optimized out>) at /home/munro/codebase/flame-kk-2/B2G/gecko/dom/media/mediasource/MediaSourceDemuxer.cpp:40
40	}
(gdb) bt
#0  mozilla::MediaSourceDemuxer::Init (this=<optimized out>) at /home/munro/codebase/flame-kk-2/B2G/gecko/dom/media/mediasource/MediaSourceDemuxer.cpp:40
#1  0xb53d72c2 in mozilla::MediaFormatReader::AsyncReadMetadata (this=0xafb64800) at /home/munro/codebase/flame-kk-2/B2G/gecko/dom/media/MediaFormatReader.cpp:293
Blocks: 1171379
This issue was not seen on the latest Spark 2.5 build during Smoketest.
So chances are bug 1185051 resolves/hides the issue on B2G, while Android may see this issue still.  Leaving open for firefox for android investigation.
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #33)
> So chances are bug 1185051 resolves/hides the issue on B2G, while Android
> may see this issue still.  Leaving open for firefox for android
> investigation.

I don't see how it could hide the issue on B2G.
Okay... found the reason.

With the new MSE on, two AAC decoders are created. The first one is unused and just serves to check whether we can create an AAC decoder.
That decoder is drained, then flushed, before being shut down.

Draining a GonkMediaDataDecoder that hasn't been fed data causes the crash and assert in stagefright::ACodec.

Bug 1185886 added a workaround so that if there are no pending samples in a decoder, it won't attempt to drain it.

However, I believe the bug should be properly fixed in GonkMediaDataDecoder: the decoder is in an invalid state until data is fed, so an early shutdown could still cause the crash to occur.

I tested it locally on my Flame; the new MSE can be enabled by default now.
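The workaround and the stricter fix described above differ in where the guard lives. A minimal C++ sketch of the bookkeeping, assuming a decoder wrapper that counts the samples it has handed to the platform codec; GuardedDecoder and its methods are hypothetical and are neither Gecko's GonkMediaDataDecoder API nor the actual bug 1185886 patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper modelling only the state needed to avoid draining a
// codec that never received data.
class GuardedDecoder {
public:
    // Called for every compressed sample handed to the platform codec.
    void Input() {
        ++mPending;
        mFed = true;
    }

    // Called when the codec returns one decoded frame.
    void Output() {
        if (mPending > 0) --mPending;
    }

    // Workaround in the spirit of the one described above: only drain when
    // samples are outstanding. Returns true if a drain would be sent to the codec.
    bool Drain() {
        if (mPending == 0) {
            return false;  // nothing queued: skip, never signal end-of-stream
        }
        ++mDrainsSent;
        mPending = 0;  // assumption: a drain returns all outstanding output
        return true;
    }

    // Stricter check for shutdown: a codec that was never fed is treated as
    // not yet started, so shutdown skips any drain/flush on it.
    bool ShutdownNeedsCodecFlush() const { return mFed; }

    int DrainsSent() const { return mDrainsSent; }

private:
    std::size_t mPending = 0;
    bool mFed = false;
    int mDrainsSent = 0;
};
```

The unused AAC probe decoder corresponds to a GuardedDecoder that never sees Input(): Drain() returns false and ShutdownNeedsCodecFlush() is false, so nothing reaches the codec. Keeping the check inside the decoder, rather than in each caller, is what would also cover the early-shutdown case.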
Flags: needinfo?(mchiang)
This is an ACodec issue.
Google has fixed it with this patch https://android.googlesource.com/platform/frameworks/av/+/4bdda35319d5f46efea2089b865c8a64816389cd%5E%21/#F0

With this patch, there is no longer an unnecessary port definition query at this stage.
Flags: needinfo?(mchiang)
Changing ni? to Greg as he's handling all stability issues.
Flags: needinfo?(cyang) → needinfo?(ggrisco)
(In reply to Munro Chiang [:mchiang] from comment #37)
> Carol, we found an AOSP issue which has been fixed by Google. Could you help
> cherry-pick the commit to CAF?
> 
> CAF:
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/libstagefright/ACodec.cpp?h=LNX.LA.3.5.2.1.1
> 
> Google patch:
> https://android.googlesource.com/platform/frameworks/av/+/4bdda35319d5f46efea2089b865c8a64816389cd%5E%21/#F0

It looks like the code already has this change:

https://www.codeaurora.org/cgit/external/gigabyte/platform/frameworks/av/tree/media/libstagefright/ACodec.cpp?h=caf/LF.BR.1.2.3#n4700

Can you confirm?
Flags: needinfo?(ggrisco) → needinfo?(mchiang)
Yes, LF.BR.1.2.3 already contains the change (https://www.codeaurora.org/cgit/external/gigabyte/platform/frameworks/av/commit/media/libstagefright/ACodec.cpp?h=caf/LF.BR.1.2.3&id=4bdda35319d5f46efea2089b865c8a64816389cd)

Can we merge the commit from LF.BR.1.2.3 to LNX.LA.3.5.2.1.1?
Flags: needinfo?(mchiang) → needinfo?(ggrisco)
Can you explain why you need this change on LNX.LA.3.5.2.1.1?  Also, it seems that Google already has the patch so I don't think it's something that we'd upstream to AOSP.  But I'm not really sure.
Flags: needinfo?(ggrisco)
Sorry, I indicated the wrong branch.

Our FxOS Flame KK codebase manifest fetches the branch mozilla/b2g_kk_3.5.

The branch doesn't contain this Google fix yet.
https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/libstagefright/ACodec.cpp?h=mozilla%2Fb2g_kk_3.5#n3457

Can you help cherry-pick the patch to mozilla/b2g_kk_3.5?
Flags: needinfo?(ggrisco)
Oh, sorry Munro, but kk_3.5 has already reached EOL and we are not making releases on that branch anymore.
Flags: needinfo?(ggrisco)
Smoketest has not seen this issue for several days. Resolving as WorksForMe.
YouTube is able to play properly without crashing.

Environmental Variables:
Device: Aries 2.5
Build ID: 20150730145424
Gaia: 1d3595836bd55b70478923d771051268a5dabf91
Gecko: 305ba37a62f8
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 42.0a1 (2.5)
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0
Status: REOPENED → RESOLVED
Closed: 4 years ago
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Resolution: --- → WORKSFORME
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(pbylenga)