Open Bug 787319 Opened 8 years ago Updated 1 year ago

[meta] crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs or devices with PowerVR SGX 531 GPU running ICS

Categories

(Core :: Audio/Video: Playback, defect, critical)

17 Branch
ARM
Android
defect
Not set
critical

Tracking

()

Tracking Status
firefox16 --- unaffected
firefox17 - affected
firefox18 --- affected
firefox19 --- affected
fennec + ---

People

(Reporter: scoobidiver, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, meta, regression, Whiteboard: [native-crash], [swdecoder])

Crash Data

Attachments

(1 obsolete file)

It first appeared in 17.0a1/20120824. The regression range might be (discontinuous across builds):
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5650196a8c7d&tochange=1c0ac073dc65

Signature 	libomxplugin.so@0xde3 More Reports Search
UUID	55df875c-562d-4caa-8f7a-e39a12120830
Date Processed	2012-08-30 17:54:32
Uptime	12
Last Crash	36 seconds before submission
Install Age	51 seconds since version was first installed.
Install Time	2012-08-30 17:53:32
Product	FennecAndroid
Version	18.0a1
Build ID	20120830030531
Release Channel	nightly
OS	Linux
OS Version	0.0.0 Linux 3.0.8+1.0.21100-30145-01926-g787c968 #1 SMP PREEMPT Tue May 29 20:15:09 2012 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0xc77a04ec
App Notes 	
AdapterDescription: 'Qualcomm -- Adreno (TM) 220 -- OpenGL ES 2.0 2184622 -- Model: LT26i, Product: LT26i_1257-9042, Manufacturer: Sony Ericsson, Hardware: semc'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ 
Sony Ericsson LT26i
SEMC/LT26i_1257-9042/LT26i:4.0.4/6.1.A.0.453/Ov5_zw:user/release-keys
EMCheckCompatibility	True
Adapter Vendor ID	Qualcomm
Adapter Device ID	Adreno (TM) 220

Frame 	Module 	Signature 	Source
0 		@0xc77a04ec 	
1 	libomxplugin.so 	libomxplugin.so@0xde3 	
2 	libstagefright.so 	libstagefright.so@0x64303 	
3 	libstagefright.so 	libstagefright.so@0x64317 	
4 	libutils.so 	libutils.so@0x20c77 	
5 	libstagefright.so 	libstagefright.so@0x64303 	
6 	libstagefright.so 	libstagefright.so@0x64303 	
7 	libstagefright.so 	libstagefright.so@0x64203 	
8 	libstagefright.so 	libstagefright.so@0x14494e 	
9 	libstagefright.so 	libstagefright.so@0x13de16 	
10 	libstagefright.so 	libstagefright.so@0x74b13 	
11 	libomxplugin.so 	OmxDecoder::Init 	OmxPlugin.cpp:375
12 		@0x7 	
13 	libomxplugin.so 	CreateDecoder 	OmxPlugin.cpp:693
14 	libxul.so 	nsMediaPluginHost::CreateDecoder 	nsMediaPluginHost.cpp:125
15 	libxul.so 	nsMediaPluginReader::ReadMetadata 	nsMediaPluginReader.cpp:42
16 	libxul.so 	nsBuiltinDecoderStateMachine::DecodeMetadata 	nsBuiltinDecoderStateMachine.cpp:1720
17 	libxul.so 	nsBuiltinDecoderStateMachine::DecodeThreadRun 	nsBuiltinDecoderStateMachine.cpp:479
18 	libxul.so 	nsRunnableMethodImpl<void , true>::Run 	nsThreadUtils.h:349
19 	libxul.so 	nsThread::ProcessNextEvent 	nsThread.cpp:624
20 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:220
21 	libxul.so 	nsThread::ThreadFunc 	nsThread.cpp:257
22 	libnspr4.so 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:156
23 	libc.so 	libc.so@0x13496 	
24 	libc.so 	libc.so@0x12fd2 

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libomxplugin.so%400xde3
These crashes are all various Sony devices with Adreno 220 or 225 GPUs running ICS 4.0.4. 

The OmxPlugin.cpp code near line 375 is calling mAudioSource->read(). I wonder if this crash is related to bug 785340, a different mAudioSource->read() crash that also started around 17.0a1/20120824.
Priority: -- → P1
Summary: crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so@0xde3 on ICS → crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so@0xde3 on Sony devices with Adreno 220/225 GPUs running ICS
Crash Signature: [@ libomxplugin.so@0xde3] → [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37]
Summary: crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so@0xde3 on Sony devices with Adreno 220/225 GPUs running ICS → crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs running ICS
It's #2 top crasher in 17.0b1 with many dupes. I assume you can't browse with those devices.
Keywords: topcrash
tracking-fennec: --- → ?
Crash Signature: [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] → [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] [@ libcutils.so@0x61f4] [@ libcutils.so@0x638c] [@ libcutils.so@0x62a0]
Summary: crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs running ICS → crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs or devices with PowerVR SGX 531 GPU running ICS
This has moved up into the #1 spot for crashing - handing over to Chris for further investigation.
Assignee: nobody → cpeterson
AFAICT, when Stagefright's MediaExtractor::Create() is sniffing the media file type, it calls back into our OmxPlugin's MediaStreamSource::readAt() with bad data. This is surprising because Stagefright is just peeking at the first few bytes of the media data; we haven't even begun decoding video yet.

The Sony Ericsson LT26i (Xperia S aka Xperia NX aka "Nozomi") accounts for about half of the crash reports, so it's either very popular or very crashy.
Status: NEW → ASSIGNED
doublec, when you have a chance, can you please take a look at this Stagefright crash? This bug is our #1 topcrash for Fennec Beta 17. I'm supposed to be focusing 100% on B2G IME bugs right now, so my boss blassey has asked me to reassign my high-priority Stagefright bugs to you.
Assignee: cpeterson → chris.double
Blocks: 755364
Blocks: 782508
I don't have access to this type of phone. Can I get one sent to me?
I'll check the local importers today and see if any of them have the device I can expense.
I've been able to get a crash on a Sony Xperia Ion LT28H playing an mp4:

F/libc    ( 6063): Fatal signal 11 (SIGSEGV) at 0xc77a04ec (code=1)
If I use DataSource::CreateFromURI to create the datasource, passing a URI to an MP4 file, playback works although it uses the software decoder due to:

I/OmxPlugin( 7827): Unknown video color format: 0x7fa30c03
I/OmxPlugin( 7827): Falling back to software decoder

So for some reason our custom derived DataSource is causing the crash, much like it was on GingerBread. Does the Xperia have some sort of munged libstagefright?
Crash Signature: [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] [@ libcutils.so@0x61f4] [@ libcutils.so@0x638c] [@ libcutils.so@0x62a0] → [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] [@ libomxplugin.so@0xedb] [@ libstagefright.so@0x76974] [@ libcutils.so@0x61f4] [@ libcutils.so@0x638c] [@ libcutils.so@0x62a0]
All the crashing devices seem to be running the same OS version, ICS 4.0.4.

I saw a possibly related problem with media sniffing (bug 782830). When trying to sniff the media type on the Samsung Galaxy S III (ICS 4.0.4), the call to MediaExtractor::Create() would never return. Stagefright's AwesomePlayer would start playing our audio track, so we never got a chance to play the video ourselves. Strangely, I only saw this problem on the Korean/international model of the Galaxy S III, not North American model.
(In reply to Chris Double (:doublec) from comment #9)
> I/OmxPlugin( 7827): Unknown video color format: 0x7fa30c03
> I/OmxPlugin( 7827): Falling back to software decoder

btw, color format 0x7fa30c03 is QOMX_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka. The vlc source code claims this format is identical to NV12:

http://git.videolan.org/gitweb.cgi/vlc.git/?p=vlc.git;a=blob;f=modules/codec/omxil/qcom.c;h=9f530b2d5a0f0396b387ed88e175695530cfc1e1;hb=4a8d38843ad4012e2ea0c75e0c485edd6a41ce07#l36
Whiteboard: [native-crash] → [native-crash], [swdecoder]
It looks like Sony are using a modified DataSource - the wrong virtual functions are being called in places. I'm trying to track down what their class layout is.
I have playback working now with a modified DataSource.h. The virtual functions in DataSource.h seem to have been re-arranged and there's a new one that gets called. If I return 'OK' for that new one and put readAt in the spot in the table that the device seems to expect it works.

Remaining work:

1) More testing/minimize the patch
2) Change build to produce a shared library for these devices
3) Detect these devices somehow and load that specific shared library
4) Fix color conversion so hardware decoder works.
tracking-fennec: ? → 17+
I'm going to spin out (4) into a separate bug and just fix the crashing here since getting the color conversion working appears to be complicated. It's not NV12, the source mentioned in comment 11 seems to just refer to them naming the YUV420 portion of the identifier.

For (3) I'm unsure how specific the detection should be so I'm appealing to the android people to get an idea of what they'd prefer:

1) Calculate a CRC of the libstagefright shared library and only use our modified code loaded on devices with that specific CRC. 

This will likely break on minor device updates. But we won't know if our code works on those updates anyway.

2) Check specific device and android versions.

This could break if someone installs a 4.0.4 alternate rom with different stagefright libraries.

Thoughts? Other options?
I think option 2 sounds safer and easier. Would Sony's busted libstagefright.so likely have the same CRC on different device models (even if they are all running ICS 4.0.4)?
I don't think there is such a think as anything being "likely" in the android world sadly.
Attached patch Fix crash (obsolete) — Splinter Review
Fixes crash. Software decoding only due to lack of support for the color conversion format. I'll file a followup bug for this.

I've whitelisted the devices and versions I saw in the crashreports. I'm not sure if this is the right approach and I'm open to suggestions from the Android team to decide how they want to enable/switch on this.

I've only tested on a Sony Xperia Ion running 4.0,4 which previously crashed but with this patch plays video.
Attachment #673065 - Flags: review?(cpeterson)
Blocks: 803394
Bug 803394 is the followup bug for getting hardware decoding working on these devices.
(In reply to Chris Double (:doublec) from comment #17)
> I've whitelisted the devices and versions I saw in the crashreports.
Some devices are missing (those with PowerVR SGX 531 GPU and Huawei MediaPad with Adreno 220 GPU) in the code. Indeed, here are correlations per device for each signature in 17.0 Beta extracted from https://crash-analysis.mozilla.com/rkaiser/2012-10-17/2012-10-17.fennecandroid.beta.17.0.devices.weekly.html#sigs:
* libomxplugin.so@0xe37	844
Sony Ericsson LT26i	591
Sony Ericsson LT22i	116
Sony ST25i	29
Sony Ericsson LT28h	22
Sony MT27i	21
Sony Ericsson IS12S	15
Sony Ericsson LT26w	15
Sony LT30p	13
Sony Ericsson LT28at	10
Sony SO-05D	6
Sony ST27i	3
Sony SO-04D	3
* libcutils.so@0x61f4 	66
Alps GT-I9100 	12
GT-I9300 GT-I9300 	11
Samsung GT-I9100 	9
Micromax A90 	8
Alps A9550 	5
Alps B79 	5
Alps Zyrex_ZA987_OneScribe 	5
Iball iball 	4
Alps i-mobile i-style Q1 	2
Alps i-mobile i-style Q2 	2
Samsung GT-P1000 	1
Motorola XT910 	1
LENOVO Lenovo S880 	1
* libstagefright.so@0x76974 	35
HUAWEI MediaPad 	35
* libcutils.so@0x638c 	19
Alps GT-I9100 	13
Alps i-mobile i-style Q3 	6
* libcutils.so@0x63a0 	4
LENOVO Lenovo A789 	3
Fly IQ440 	1
* libcutils.so@0x62a0 	3
Alps I9220 	3
(In reply to Scoobidiver from comment #19)
> Some devices are missing (those with PowerVR SGX 531 GPU and Huawei MediaPad
> with Adreno 220 GPU) in the code. 

I only have a single Sony device for testing and this fix is only for Sony devices. In fact I can only say the fix works on the Sony Xperia Ion. And fixes the crash I encountered in comment 8. I defer to the Android team to tell me what devices they want to whitelist given that information. Hopefully they have more devices that me to test with.

> * libomxplugin.so@0xe37	844
> * libcutils.so@0x61f4 	66
> * libstagefright.so@0x76974 	35
> * libcutils.so@0x63a0 	4
> * libcutils.so@0x62a0 	3

Each of these signatures should be a separate bug report imho. Or each device should have a separate bug for it. Otherwise we're going to end up with one long bug constantly being reopened and closed for any crash in video playback.
(In reply to Chris Double (:doublec) from comment #20)
> Each of these signatures should be a separate bug report imho. Or each
> device should have a separate bug for it.
I disagree. A bug is independent from the way it will be fixed or the device where it's tested.

> Otherwise we're going to end up with one long bug constantly being reopened
> and closed for any crash in video playback.
If we want to add later a device currently not in crash stats to the newly created blacklist, we can file a dedicated bug.
Comment on attachment 673065 [details] [diff] [review]
Fix crash

Review of attachment 673065 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/plugins/nsMediaPluginHost.cpp
@@ +95,5 @@
>    PRLibrary *lib = PR_LoadLibrary(name);
>    if (lib) {
>      Manifest *manifest = static_cast<Manifest *>(PR_FindSymbol(lib, "MPAPI_MANIFEST"));
> +    if (manifest) {
> +      ALOG("Loaded omx plugin: %s", name);

1. Does this ALOG() need to be #ifdef'd ANDROID?

2. Would a "Failed to load" error message be helpful? Or is TryLoad() expected to be called with library names that might not be found (so an error message would cause unnecessary alarm)?

@@ +105,5 @@
>  nsMediaPluginHost::nsMediaPluginHost() {
>    MOZ_COUNT_CTOR(nsMediaPluginHost);
>  #if defined(ANDROID) && !defined(MOZ_WIDGET_GONK)
> +  nsCOMPtr<nsIPropertyBag2> infoService = do_GetService("@mozilla.org/system-info;1");
> +  NS_ASSERTION(infoService, "Could not find a system info service");

I prefer the fatal MOZ_ASSERT() over NS_ASSERTION(). We would crash on a null infoService anyways, so we might as well pinpoint the source of the error with an accurate assert.

@@ +111,5 @@
> +  nsAutoString device;
> +  nsresult rv = infoService->GetPropertyAsAString(NS_LITERAL_STRING("device"), device);
> +  if (NS_SUCCEEDED(rv)) {
> +    ALOG("Android Device is: %s", NS_LossyConvertUTF16toASCII(device).get());
> +  }

If the "device" property is not found, I think we should log an error message and/or fatally MOZ_ASSERT().

@@ +119,5 @@
> +  int32_t version;
> +  rv = infoService->GetPropertyAsInt32(NS_LITERAL_STRING("version"), &version);
> +  if (NS_SUCCEEDED(rv)) {
> +    ALOG("Android Version is: %d", version);
> +  }

If the "version" property is not found, I think we should log an error message and/or fatally MOZ_ASSERT().

@@ +121,5 @@
> +  if (NS_SUCCEEDED(rv)) {
> +    ALOG("Android Version is: %d", version);
> +  }
> +
> +  if (version ==15 &&

Need a space between == and 15.

@@ +132,5 @@
> +    TryLoad("lib/libomxpluginsony.so");
> +  }
> +  else {
> +    TryLoad("lib/libomxplugin.so");
> +  }

I think these tricky device and version checks should be extracted into a separate function. If you refactored this code into a function like `const char* GetOmxPluginName()`, you could replace these multiple calls to TryLoad() with just one.

Also, GetOmxPluginName() would be a good place to add other whitelist/blacklist logic. The function could return NULL for blacklisted and unsupported devices.

::: media/omx-plugin/include/ics/stagefright/DataSource.h
@@ +73,5 @@
> +    virtual void foo1() { }
> +    virtual void foo2() { }
> +    virtual void foo3() { }
> +    virtual void foo4() { }
> +#endif

Since you named these virtual functions "foo", I assume you don't know what Sony expects them to do. Does Sony's libstagefright actually call these functions or do you just need them to pad DataSource's vtable?

I think you should mark them private and add fatal MOZ_CRASH() or abort() even in release builds. If Sony's libstagefright calls our foo() functions, it will likely crash mysteriously later. I think a clean MOZ_CRASH() will make crash reports more informative.
Attachment #673065 - Flags: review?(cpeterson) → review-
(In reply to Chris Double (:doublec) from comment #20)
> Each of these signatures should be a separate bug report imho. Or each
> device should have a separate bug for it. Otherwise we're going to end up
> with one long bug constantly being reopened and closed for any crash in
> video playback.

I agree with doublec that each crash address should probably be a separate bug report. Different addresses suggest that the reason for the crash is different or that the library version is different. Since this bug is about Sony beaking binary compatibility of the libstagefright.so ABI, library version is particularly important for this bug.

Maybe we should file a new dependent bug for doublec's Sony libomxplugin.so@0xe37 fix and make this bug into a meta-bug?
Attachment #673065 - Attachment is obsolete: true
(In reply to Chris Peterson (:cpeterson) from comment #23)
> Maybe we should file a new dependent bug for doublec's Sony
> libomxplugin.so@0xe37 fix and make this bug into a meta-bug?
I agree with that.
Keywords: meta
Depends on: 803794
Removing myself as assignee as I don't work on meta bugs. I'll work on any specific bugs raised for single issues.
Assignee: chris.double → nobody
tracking-fennec: 17+ → ?
Not tracking this bug, because it's a meta bug, but we'll track all the breakout bugs for each signature as their importance to 17 dictates.
Crash Signature: [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] [@ libomxplugin.so@0xedb] [@ libstagefright.so@0x76974] [@ libcutils.so@0x61f4] [@ libcutils.so@0x638c] [@ libcutils.so@0x62a0] → [@ libomxplugin.so@0xde3] [@ libomxplugin.so@0xe37] [@ libomxplugin.so@0xedb] [@ libstagefright.so@0x76974] [@ libcutils.so@0x61f4] [@ libcutils.so@0x638c] [@ libcutils.so@0x62a0] [@ libcutils.so@0x63a0] [@ libcutils.so@0x6770]
Depends on: 804768
tracking-fennec: ? → 17+
Summary: crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs or devices with PowerVR SGX 531 GPU running ICS → [meta] crash in nsMediaPluginHost::CreateDecoder @ libomxplugin.so on Sony devices with Adreno 220/225 GPUs or devices with PowerVR SGX 531 GPU running ICS
Keywords: topcrash
no longer tracking 17 since we have a white list in place
tracking-fennec: 17+ → +
Depends on: 817478
Component: Audio/Video → Audio/Video: Playback
No assignee, updating the status.
Status: ASSIGNED → NEW
No assignee, updating the status.
You need to log in before you can comment on or make changes to this bug.