Android crash in EnterBaseline on ARMv6 devices

RESOLVED WORKSFORME

Status

()

defect
--
critical
RESOLVED WORKSFORME
6 years ago
6 years ago

People

(Reporter: scoobidiver, Assigned: jandem)

Tracking

(Blocks 1 bug, {crash, regression})

23 Branch
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox22 unaffected, firefox23+ unaffected, firefox24 unaffected)

Details

(Whiteboard: [native-crash][ARMv6][leave open], crash signature)

Attachments

(3 attachments)

The fixes of bug 858083 and bug 858002 are not enough to fix crashes on ARMv6 devices.
It's #3 top crasher in 23.0a1.

Signature 	EnterBaseline More Reports Search
UUID	0aacc0e4-2883-4548-903c-2bacf2130419
Date Processed	2013-04-19 11:21:10
Uptime	36
Last Crash	52 seconds before submission
Install Age	16.3 hours since version was first installed.
Install Time	2013-04-18 19:01:06
Product	FennecAndroid
Version	23.0a1
Build ID	20130418031048
Release Channel	nightly
OS	Android
OS Version	0.0.0 Linux 2.6.35.10-perf+cm9 #1 PREEMPT Wed Dec 12 12:45:39 EET 2012 armv6l lge/thunderg/thunderg:2.3.3/GRI40/LG-P500-V20g.19C11F164C:user/release-keys
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x51
App Notes 	
AdapterDescription: 'Qualcomm -- Adreno (TM) 200 -- OpenGL ES 2.0 2131267 -- Model: LG-P500, Product: lge_p500, Manufacturer: LGE, Hardware: p500'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+ 
nothumb Build
LGE LG-P500
lge/thunderg/thunderg:2.3.3/GRI40/LG-P500-V20g.19C11F164C:user/release-keys
Processor Notes 	sp-processor09.phx1.mozilla.com_851:2012; exploitability tool failed: 127
EMCheckCompatibility	True
Adapter Vendor ID	Qualcomm
Adapter Device ID	Adreno (TM) 200
Device	LGE LG-P500
Android API Version	15 (REL)
Android CPU ABI	armeabi-v6l

Frame 	Module 	Signature 	Source
0 		@0x4c10b600 	
1 	libxul.so 	EnterBaseline 	js/src/ion/BaselineJIT.cpp:153

More reports at:
https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=EnterBaseline
Whiteboard: [native-crash][ARMv6]
This is the top crash on Android now even though it only affects ARMv6, which is a rather small set of our user population.

We don't have many URLs, but here's the small sample we have, minus one with adult content:

2 	http://www.boston.com/metrodesk/2013/04/25/VdhQ9rI8u6nGJWsLlGuMuN/story.html
1 	https://apps.facebook.com/inthemafia/?install_source&zy_track&install_link&zy_link&zy_creative&fb_sig_locale&zy_affiliate&sendkey&type&ref=bookmarks&mw_rdcnt2=1&state=90573d3bc40bf5c812618cc5dd525472&code=AQDVdoPofVFTc7ttPdWIPrxke7e_U8ZVb4K9ln-Ia7vDI8g1uJ
1 	about:crashes
1 	http://sourceforge.net/projects/v4l2vd/forums/forum/579262/topic/3317824
1 	http://www.google.com/mobile/+/
1 	about:blank
1 	about:config
1 	about:home
1 	http://www.foxnews.com/us/2013/04/25/woman-who-accused-military-officer-sexual-assault-stunned-by-reversal-his/


I'm assigning this to Jan, who has looked into Baseline crashes before and has access to minidumps to be able to figure this out.
Assignee: general → jdemooij
Also, here is a list of Android devices that saw EnterBaseline crashes in the last 7 days:

EnterBaseline 	33
Samsung GT-S5360 	13
HTC Wildfire 	9
HTC Liberty 	4
Samsung GT-S5830 	2
Samsung GT-S5570 	2
Sony Ericsson E15i 	1
Samsung GT-B5512 	1
Amazon KFTT 	1
Bug 861208 might be contributing to these problems on ARM6 devices.  This issue is related to the constant pools and not the absence of the VFP, so crashes on ARMv6 devices with a VFP would support this being a dup of bug 861208.
Marty and I looked at one of the minidumps. The problem seems to be that we are emitting VFP instructions (vpush in this case) on hardware that doesn't support it.

Although Ion is disabled on such hardware, Baseline still works. We use cx->runtime->jitSupportsFloatingPoint to disable FP-related stubs.

What we should try:

(1) Build an ARM debug-build where HasVFP() always returns |false|.
(2) Add asserts to all VFP-related (Macro)Assembler methods.
(3) Run jit-tests and fix all asserts.

That's what I did for pre-SSE2 hardware in bug 858022 and worked pretty well.

Marty, Douglas: I don't have real ARM hardware, so would one of you guys be interested in this?
(In reply to Jan de Mooij [:jandem] from comment #4)
> Marty and I looked at one of the minidumps. The problem seems to be that we
> are emitting VFP instructions (vpush in this case) on hardware that doesn't
> support it.
> 
> Although Ion is disabled on such hardware, Baseline still works. We use
> cx->runtime->jitSupportsFloatingPoint to disable FP-related stubs.
> 
> What we should try:
> 
> (1) Build an ARM debug-build where HasVFP() always returns |false|.

Bug 857071 adds support for reading a list of feature overrides
from an environment variable and can be used to override HasVFP()
to return |false| for such testing.  Shall revise it and submit
it for review.  Marty, any feedback on this bug would be
appreciated?

> (2) Add asserts to all VFP-related (Macro)Assembler methods.

This is a good idea, thanks.  I'll look into it.  There are
other features that could also use similar debug assertion
checks.
Adding this assertion to a common VFP code path picks up lots of test failures.  

Baseline failure:

#0  0x0054f652 in js::ion::Assembler::writeVFPInst (this=0xf6fdce20, sz=js::ion::Assembler::isDouble, blob=3777822752, dest=0x0) at js/src/ion/arm/Assembler-arm.cpp:1970
#1  0x0054ffe8 in js::ion::Assembler::as_vdtm (this=0xf6fdce20, st=js::ion::IsStore, rn=..., vd=..., length=32, c=js::ion::Assembler::Always) at js/src/ion/arm/Assembler-arm.cpp:2205
#2  0x00464fee in js::ion::Assembler::finishFloatTransfer (this=0xf6fdce20) at js/src/ion/arm/Assembler-arm.h:1718
#3  0x0054739a in GenerateBailoutThunk (masm=..., frameClass=0) at js/src/ion/arm/Trampoline-arm.cpp:481
#4  0x00547750 in js::ion::IonRuntime::generateBailoutTable (this=0xf64d4cf0, cx=0xf64d8750, frameClass=0) at js/src/ion/arm/Trampoline-arm.cpp:566
#5  0x004008c4 in js::ion::IonRuntime::initialize (this=0xf64d4cf0, cx=0xf64d8750) at js/src/ion/Ion.cpp:210
#6  0x000a219c in JSRuntime::createIonRuntime (this=0xf64a2eb8, cx=0xf64d8750) at js/src/jscompartment.cpp:120
#7  0x000a1258 in JSRuntime::getIonRuntime (this=0xf64a2eb8, cx=0xf64d8750) at js/src/jscntxt.h:789
#8  0x000a2220 in JSCompartment::ensureIonCompartmentExists (this=0xf6539858, cx=0xf64d8750) at js/src/jscompartment.cpp:142
#9  0x003f1c7c in js::ion::CanEnterBaselineJIT (cx=0xf64d8750, scriptArg=0xf5c29098, fp=0xf5ea2028, newType=false) at js/src/ion/BaselineJIT.cpp:256
#10 0x0012690c in js::RunScript (cx=0xf64d8750, fp=0xf5ea2028) at js/src/jsinterp.cpp:357
#11 0x001275b4 in js::ExecuteKernel (cx=0xf64d8750, script=..., scopeChainArg=..., thisv=..., type=js::EXECUTE_GLOBAL, evalInFrame=..., result=0x0) at js/src/jsinterp.cpp:573
#12 0x001277c2 in js::Execute (cx=0xf64d8750, script=..., scopeChainArg=..., rval=0x0) at js/src/jsinterp.cpp:613
#13 0x0005a43e in JS_ExecuteScript (cx=0xf64d8750, objArg=0xf5c25040, scriptArg=0xf5c29098, rval=0x0) at js/src/jsapi.cpp:5603

Asm.js failure (probably should not have been attempting to asm.js compile without VFP?):

#0  0x0054f652 in js::ion::Assembler::writeVFPInst (this=0xf6fd8f40, sz=js::ion::Assembler::isDouble, blob=3777822752, dest=0x0) at js/src/ion/arm/Assembler-arm.cpp:1970
#1  0x0054ffe8 in js::ion::Assembler::as_vdtm (this=0xf6fd8f40, st=js::ion::IsStore, rn=..., vd=..., length=32, c=js::ion::Assembler::Always) at js/src/ion/arm/Assembler-arm.cpp:2205
#2  0x00464fee in js::ion::Assembler::finishFloatTransfer (this=0xf6fd8f40) at js/src/ion/arm/Assembler-arm.h:1718
#3  0x0054739a in GenerateBailoutThunk (masm=..., frameClass=0) at js/src/ion/arm/Trampoline-arm.cpp:481
#4  0x00547750 in js::ion::IonRuntime::generateBailoutTable (this=0xf6504288, cx=0xf64d8750, frameClass=0) at js/src/ion/arm/Trampoline-arm.cpp:566
#5  0x004008c4 in js::ion::IonRuntime::initialize (this=0xf6504288, cx=0xf64d8750) at js/src/ion/Ion.cpp:210
#6  0x000a219c in JSRuntime::createIonRuntime (this=0xf64a2eb8, cx=0xf64d8750) at js/src/jscompartment.cpp:120
#7  0x000a1258 in JSRuntime::getIonRuntime (this=0xf64a2eb8, cx=0xf64d8750) at js/src/jscntxt.h:789
#8  0x000a2220 in JSCompartment::ensureIonCompartmentExists (this=0xf6539858, cx=0xf64d8750) at js/src/jscompartment.cpp:142
#9  0x0050432a in ModuleCompiler::init (this=0xf6fda6c8) at js/src/ion/AsmJS.cpp:1101
#10 0x005103d2 in CheckModule (cx=0xf64d8750, ts=..., fn=0xf650c780, module=0xf6fdbf70) at js/src/ion/AsmJS.cpp:5429
#11 0x005107b6 in js::CompileAsmJS (cx=0xf64d8750, ts=..., fn=0xf650c780, options=..., scriptSource=0xf64d60f0, bufStart=0, bufEnd=316, moduleFun=...) at js/src/ion/AsmJS.cpp:5529

Will look into this further.
I wouldn't worry about any assertion failures, since fennec *requires* a vfp unit in order to be supported.  Likely the best course of action is to check if there is a vfp unit present at startup, and bail with an error message if it isn't.  Evidently, the play store version of fennec has blacklisted basically every device without a vfp unit, so these are all from people who have manually installed the .apk.

Other features like mov{w,t} and vfpv3 features can be emulated using other instructions (pc relative load, slower truncation code), but without a vfp unit, there is really nothing that we can do short of stubbing every single fpu operation out to the libc's software implementation of the operation, so we've marked those devices as unsupported.  sse2 is a bit different, since old x86 machines without sse2 can still do floating poit arithmetic using the 8087 floating point stack.
(In reply to Marty Rosenberg [:mjrosenb] from comment #7)
> Evidently, the play store version of fennec has blacklisted
> basically every device without a vfp unit, so these are all from people who
> have manually installed the .apk.

These are all people on Nightly, which isn't available on the Play Store at all, but only through .apk downloads. :)
OK, if this configuration is not officially supported, let's just disable Baseline explicitly on ARM hardware without a vfp unit.
Crash Signature: [@ EnterBaseline] → [@ EnterBaseline] [@ @0x0 | EnterBaseline ]
The VFP feature detection in isVFPPresent() current uses the presence of the __VFP_FP__ to assume that the CPU has VFP support. However __VFP_FP__ only indicates that the floating point format is the VFP format, which is IEEE-754, and does not indicate that the code is compiled to use the VFP CPU feature.  This patch just removes this compile time conditional so that isVFPPresent() will fall back to checking '/proc/cpuinfo' - this would have been unreachable code for most builds, and becomes a new code path to be tested.

The checking of '/proc/cpuinfo' has been changed to also be used when MOZ_B2G is defined because this parallels the detection code used by Ion and is perhaps necessary.

There is some duplication of the ARM cpu feature detection between the code here and the Ion compiler, see js/src/ion/arm/Architecture-arm.cpp.  This duplication also affects code elsewhere, such as JitSupportsFloatingPoint() in jsapi.cpp where it tests both supportsFloatingPoint() and then hasVFP().

Would it be appropriate to move the Ion ARM feature detection into the assembler/assembler/MacroAssemblerARM* ?
Attachment #745757 - Flags: review?
Attachment #745757 - Flags: feedback?(mrosenberg)
Attachment #745757 - Flags: feedback?(jdemooij)
With these patches applied and some build hacks, it has been possible to compiler up Firefox for Android without the VFP support. It passes some limited testing and seems to be using the Baseline compiler.  This has not been tested on real ARM6 hardware with VFP support - it may well still be emitting VFP instructions that have not been noticed.

Getting together suitable tools for test is a challenge. Locating an Android device with an ARM6 cpu that does not have the VFP feature but does have adequate graphics support to run and test Firefox would be helpful.  The Android emulator defaults to an ARM6 device with VFP, but it may be possible to choose another processor without VFP.

Would the mobile team want to revisit the decision to compile the ARM6 version with VFP support?  The JS compilers could dynamically take advantage of the VFP feature if present.  But would it significantly degrade performance in other areas.
Attachment #745760 - Flags: review?(jdemooij)
(In reply to Marty Rosenberg [:mjrosenb] from comment #7)
> I wouldn't worry about any assertion failures, since fennec *requires* a vfp
> unit in order to be supported.  Likely the best course of action is to check
> if there is a vfp unit present at startup, and bail with an error message if
> it isn't.  Evidently, the play store version of fennec has blacklisted
> basically every device without a vfp unit, so these are all from people who
> have manually installed the .apk.

Yes, Firefox for Android is built with the -mfpu=vfp compiler option
so VFP machine code could be expected to be found even in the compiled C
code.  When built with this option it needs to check early and bail out
if the feature is not found.

Interesting, Firefox still seems to basically work even using a soft-float
build, along with some minor changes.  It would be interesting to know
where the VFP support is needed, apart from the JS compilers.
(In reply to Douglas Crosher [:dougc] from comment #11)
> Locating an Android
> device with an ARM6 cpu that does not have the VFP feature but does have
> adequate graphics support to run and test Firefox would be helpful.

Does comment #2 help you there?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> (In reply to Douglas Crosher [:dougc] from comment #11)
> > Locating an Android
> > device with an ARM6 cpu that does not have the VFP feature but does have
> > adequate graphics support to run and test Firefox would be helpful.
> 
> Does comment #2 help you there?

Yes, the comments help.

Looking at the comments shows a number of devices based on the MSM7227
processor.  The MSM7227 does not have the VFP feature (but the
MSM7227A does).  The LG Optimus ONE and HTC Wildfire S stand out as
having 512MB RAM and might be better developer phones.  The GPU is the
Adreno 200, and perhaps would support Firefox Android?  At the least
it should be possible to run some JS shell tests on these.

* LG Optimus ONE P500
  CPU: MSM7227

* HTC Wildfire
  CPU: Qualcomm MSM7225

  Confident this has no VFP, but it appears to have no GPU so
  probably would not run Firefox anyway.

* HTC Liberty
  CPU: MSM7227

* Samsung GT-S5830 AKA: Samsung Galaxy Ace
  CPU: MSM7227-1

* Samsung GT-S5570 AKA: Samsung Galaxy Mini
  CPU: MSM7227

Note the top crasher is the Samsung GT-S5360 which appears to have a
Broadcom BCM21553 CPU and searching for 'VFP BCM21553' returns
/proc/cpuinfo results with VFP in the feature list, so it is
not clear that the Samsung GT-S5360 crashes are VFP related.

I've not had any luck with the Android SDK emulator.  The images
appear to support the ARMV5te and will not run with an ARM11 cpu
selected, and the ARMV5te will not run ARM6 specific code. Perhaps
a custom image could be built for an ARM11 cpu without VFP.
Comment on attachment 745757 [details] [diff] [review]
Incremental fix for the ARM VFP detection.

Review of attachment 745757 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/MacroAssemblerARM.cpp
@@ -72,5 @@
>  
> -#if defined(__GNUC__) && defined(__VFP_FP__)
> -    return true;
> -#endif
> -

This is in the old JSC assembler.  What code is still using that?  also, VFP should always be defined in our builds (not that you can't build without it)
(In reply to Douglas Crosher [:dougc] from comment #12)
> Yes, Firefox for Android is built with the -mfpu=vfp compiler option
> so VFP machine code could be expected to be found even in the compiled C
> code.

Sorry for the delay here. However, how important is it to support ARMv6-without-VFP support if (1) our official builds are not available on such hardware and (2) our nightly builds also assume C/C++ code is available?
(In reply to Jan de Mooij [:jandem] from comment #16)
> (In reply to Douglas Crosher [:dougc] from comment #12)
> > Yes, Firefox for Android is built with the -mfpu=vfp compiler option
> > so VFP machine code could be expected to be found even in the compiled C
> > code.
> 
> Sorry for the delay here. However, how important is it to support
> ARMv6-without-VFP support if (1) our official builds are not available on
> such hardware and (2) our nightly builds also assume C/C++ code is available?

It's not critical.

It would be good to have Firefox bail out early if the build
depends on features not available - and before generating a
crash report.  There are other ARMv6 issues that could cause
crashes and filtering out the VFP related crashes would make
the crash reports more useful.

Some clean up of rotting duplicate code would seem positive.
A supported build working due to programming errors does not
seem a great state.

A few small fixes to support the ARMv6 non-VFP version does
not seem a big issue.  It is quite usable for general web
browsing using only the BC.

I do not know the reasons behind the decision to
support the ARMv6, or to compile the C code to use VFP
instructions, or the reason not to support a non-VFP
build.

Personally I would like to see the ARMv6-VFP R.Pi
supported in the source, and I'm happy to explore
ARMv6 non-VFP support a little if it helps open new
opportunities to gain users of legacy mobile phones.

Marty has only recently added the Ion ARMv6 support
and the BC is new too, so perhaps the state has
changed.
Got hold of a LG Optimus ONE P500 and it does appear
to have VFP support, as reported by /proc/cpuinfo.

Upon further searching it still appears to be consistent
with it having a MSM7227 cpu, as this appears to be an
ARM1136 based design and this has optional VFP support.

Not sure if all the models noted above with MSM7227
cpus have VFP support, but it is possible.

Here's the relevant output from /proc/cpuinfo:
Processor : ARMv6-compatible processor rev 5 (v61)
MogoMIPS  : 599.65
Features  : swp half thumb fastmult vfp edsp java
CPU implementer : 0x41
CPU architecture : 6TEJ
CPU variant : 0x1
CPU part : 0xb36
CPU revision : 5
Hardware : THUNDER Global board (LGE LGP500)

Firefox 21 installs from Google Play Store.
These devices do appear to be currently supported
without even needing to download an ARMv6 specific
build. These were popular devices and are still
quite usable.
Comment on attachment 745757 [details] [diff] [review]
Incremental fix for the ARM VFP detection.

Review of attachment 745757 [details] [diff] [review]:
-----------------------------------------------------------------

See Marty's comment 15.
Attachment #745757 - Flags: feedback?(jdemooij)
Attachment #745760 - Flags: review?(jdemooij) → review+
Keywords: checkin-needed
Whiteboard: [native-crash][ARMv6] → [native-crash][ARMv6][leave open]
It's still #10 top crasher in 24.0a1 despite the patch. It's a low volume crash in 23.0a2 because there are almost no ARMv6 users on Aurora.
(In reply to Scoobidiver from comment #22)
> It's still #10 top crasher in 24.0a1 despite the patch. It's a low volume
> crash in 23.0a2 because there are almost no ARMv6 users on Aurora.

The patch was a small part of ARMv6 no-VFP support, and was not
expected to address the issue alone.  Most of the devices appear
to be ARMv6 with VFP so it's not even relevant.

If there is no other solution on the horizon then perhaps the
proof-of-concept patch in bug 861208 could be landed to at least
try and reduce the frequency of these crashes?
Depends on: 861208
Crashes have stopped since 24.0a1/20130610 and 23.0a2/20130610. The working range (3 days) might be (were discontinuous across builds previously):
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=efbe547a7972&tochange=0414d6d0f60d
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=9eeb7511a906&tochange=6f7c753b3abc
It might have been fixed by the patch of bug 879651 or bug 865820.

(In reply to Douglas Crosher [:dougc] from comment #23)
> The patch was a small part of ARMv6 no-VFP support, and was not
> expected to address the issue alone.  Most of the devices appear
> to be ARMv6 with VFP so it's not even relevant.
Do you plan to uplift the patch to Aurora?
Flags: needinfo?(dtc-moz)
Keywords: topcrash
(In reply to Scoobidiver from comment #24)
> Crashes have stopped since 24.0a1/20130610 and 23.0a2/20130610. The working
> range (3 days) might be (were discontinuous across builds previously):
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=efbe547a7972&tochange=0414d6d0f60d
> http://hg.mozilla.org/releases/mozilla-aurora/
> pushloghtml?fromchange=9eeb7511a906&tochange=6f7c753b3abc
> It might have been fixed by the patch of bug 879651 or bug 865820.

If bug 879651 or bug 865820 address the crashes then my understanding of the problem is wrong.

It might be that these crashes are begin masked by other bugs.  Bug 860838 broke the ARM JS compiler and this is still being resolved.  Let's see if the crashes return when the tree is in better shape.
 
> (In reply to Douglas Crosher [:dougc] from comment #23)
> > The patch was a small part of ARMv6 no-VFP support, and was not
> > expected to address the issue alone.  Most of the devices appear
> > to be ARMv6 with VFP so it's not even relevant.
> Do you plan to uplift the patch to Aurora?

The patch would not help solve this bug - as I understand it. It might only help someone adding support for ARMv6 with no-VFP in future.

It is possible that this bug will be addressed in bug 760642, but I am not yet sure of the scope, and it could be a good block of work.
Flags: needinfo?(dtc-moz)
(In reply to Douglas Crosher [:dougc] from comment #25)
...
> It might be that these crashes are begin masked by other bugs.  Bug 860838
> broke the ARM JS compiler and this is still being resolved.  Let's see if
> the crashes return when the tree is in better shape.

Sorry, a correction: Bug 860838 is only expected to affect asm.js code, and this is not widely used, so it is not a plausible reason for a change in these crash reports.
Note that those crashes are both fixed on Aurora and Nightly.

(In reply to Douglas Crosher [:dougc] from comment #25)
> The patch would not help solve this bug - as I understand it. It might only
> help someone adding support for ARMv6 with no-VFP in future.
We have no reason to keep this bug open.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
(In reply to Scoobidiver from comment #27)
> We have no reason to keep this bug open.

Umm, why not? From all I understand, nothing has been fixed here and there is a good possibility that something has only hidden that crash behind something else.
(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #28)
> Umm, why not? From all I understand, nothing has been fixed here
If something is missing about the ARMv6 support, a dedicated bug should be filed.

> there is a good possibility that something has only hidden that crash behind
> something else.
In Aurora too?
Attachment #745757 - Flags: review?
You need to log in before you can comment on or make changes to this bug.