Closed Bug 1207642 Opened 4 years ago Closed 4 years ago

Crash at hex address on Android x86 from JNI stack misalignment

Categories

(Firefox for Android :: General, defect)

Unspecified
Android
defect
Not set

Tracking

RESOLVED FIXED
Firefox 44
Tracking Status
firefox40 --- unaffected
firefox41 - wontfix
firefox42 + fixed
firefox43 + fixed
firefox44 + fixed
fennec 41+ ---

People

(Reporter: kbrosnan, Assigned: jchen)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash-android-x86)

Crash Data

Attachments

(5 files, 1 obsolete file)

We have a crash that is taking up 27 of the top 50 spots after a day of reporting. All the crash signatures are bare hex addresses.
Crash Signature: [ @0x82c27a8c] [ @0x69127a8c] [ @0x6ad27a8c] [ @0x69227a8c] [ @0x6a727a8c] [ @0x6ac27a8c] [ @0x4007ec79] [ @0x69627a8c] [ @0x83227a8c] [ @0x828a3935] [ @0x82d27a8c] [ @0x5da27a8c] [ @0x6a827a8c] [ @0x83127a8c] [ @0x6b227a8c] [ @0x69327a8c] … → [@ @0x82c27a8c] [@ @0x69127a8c] [@ @0x6ad27a8c] [@ @0x69227a8c] [@ @0x6a727a8c] [@ @0x6ac27a8c] [@ @0x4007ec79] [@ @0x69627a8c] [@ @0x83227a8c] [@ @0x828a3935] [@ @0x82d27a8c] [@ @0x5da27a8c] [@ @0x6a827a8c] [@ @0x83127a8c] [@ @0x6b227a8c] …
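Most of the listed addresses share the low 20 bits 0x27a8c and differ only in their high bits. That pattern would be consistent with one faulting instruction inside a library loaded at different base addresses, but this is an inference, not something established in this bug. The hypothetical triage sketch below buckets the 16 listed addresses by their low bits; the 20-bit window and the names `low_bits_key` and `count_same_key` are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a 20-bit window, chosen only because the listed addresses
// visibly share that suffix; it is not a property of any real loader.
constexpr std::uint32_t kLowMask = 0xFFFFFu;

constexpr std::uint32_t low_bits_key(std::uint32_t addr) {
    return addr & kLowMask;
}

// Counts the addresses whose low-bit key matches the key of `probe`.
constexpr std::size_t count_same_key(const std::uint32_t* addrs, std::size_t n,
                                     std::uint32_t probe) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (low_bits_key(addrs[i]) == low_bits_key(probe)) {
            ++count;
        }
    }
    return count;
}

// The 16 addresses listed in the Crash Signature field, in order.
constexpr std::uint32_t kSignatures[] = {
    0x82c27a8cu, 0x69127a8cu, 0x6ad27a8cu, 0x69227a8cu, 0x6a727a8cu, 0x6ac27a8cu,
    0x4007ec79u, 0x69627a8cu, 0x83227a8cu, 0x828a3935u, 0x82d27a8cu, 0x5da27a8cu,
    0x6a827a8cu, 0x83127a8cu, 0x6b227a8cu, 0x69327a8cu,
};
constexpr std::size_t kNumSignatures = sizeof(kSignatures) / sizeof(kSignatures[0]);
```

With this data, `count_same_key(kSignatures, kNumSignatures, 0x82c27a8cu)` evaluates to 14; the two outliers are 0x4007ec79 and 0x828a3935.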
We had this reported recently in #mobile. See also Bug 1201310 and particularly Bug 1031657, which includes crash report links for x86.
tracking-fennec: --- → ?
Assignee: nobody → snorp
Status: NEW → ASSIGNED
tracking-fennec: ? → 41+
Snorp, you were going to order a MeMoPad. Do you have an order in?
Ioana, would you work with the SV team to test this on x86 devices? I'll send you URLs from the crash reports, though we don't know whether the URLs are key to reproducing this.
Flags: needinfo?(ioana.chiorean)
Attached file x86_crash_logs
I was able to reproduce this crash just by browsing facebook.com, http://www.ayzdorov.ru/lechenie_ykys_blohi.php, or even about:crashes. There are no specific steps; just panning and zooming.
Device: ZTE Grand X In (Android 4.0.4)
Crash: https://crash-stats.mozilla.com/report/index/fc1f91ad-fa96-4309-bf86-55c522150929

Attaching logcat
Flags: needinfo?(ioana.chiorean)
Keywords: crash
Using a ME302C, Firefox crashes very often (mostly on first or second visited page) and is not usable. It seems to me that the crashes occur when scrolling or loading a page. If I stay idle on a loaded page it does not seem to crash. Previous version was not crashing, Fx Beta is crashing the same way.
Duplicate of this bug: 1210288
Snorp, do we better understand this issue now? This was mentioned in channel meeting as a top crash on Mobile. It would be nice if we had a potential fix in the works soon.
Flags: needinfo?(snorp)
KaiRo, I was looking at the crash-stats page and this report: https://crash-stats.mozilla.com/report/list?product=FennecAndroid&range_value=7&range_unit=days&date=2015-10-07&signature=%400x82c27a8c&version=FennecAndroid%3A41.0

Based on the product info, does this mean it is not happening on 41.0.1, or is product version 41.0 the same as 41.0.1 now?
Flags: needinfo?(kairo)
Firefox for Android did not build a 41.0.1.
Right, there is no 41.0.1 for Android - yet. ;-)
Flags: needinfo?(kairo)
Margaret, Snorp seems busy, do you know who else could help? thanks
Flags: needinfo?(margaret.leibovic)
This is causing a huge spike in terms of crashes.
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> Margaret, Snorp seems busy, do you know who else could help? thanks

Snorp is on PTO, but I think he should be back Monday. But maybe jchen can help take a look or redirect in the meantime...
Flags: needinfo?(margaret.leibovic) → needinfo?(nchen)
We suspect the random hex address is a problem with breakpad not generating valid crash reports. Bug 1069556 may fix that; it wouldn't fix the underlying cause of the crashes but would let us identify the crashes better.
Depends on: 1069556
Flags: needinfo?(nchen)
A very similar crash affects Samsung GT-i9195/European LTE devices (I know because v41.0 was delivered by autoupdate this evening). Firefox starts up, but crashes after idling on the home screen for perhaps a second or so. Failure is too quick to permit accessing any menus. Crashes, sends crash report, crashes.

FF is installed on the SD card, claims to have 57MB of "data" somewhere in device storage but I'm unwilling to wipe this for fear of destroying bookmarks etc. The GT-i9195 is not an x86 device and doesn't have an Imagination Technologies CPU either so this is probably a red herring. (Qualcomm Snapdragon 400/Adreno 305, ARM arch.)
Sorry, GPU. The point stands; the GPU is a Qualcomm Adreno 305.
Margaret, Jim, there is a small chance that I will do another dot release for 41 in a day or two. Based on comment 16 and the fact that this is a top-crash on Fennec, do you think we will have a fix ready for uplift soon (today at best)?
Flags: needinfo?(nchen)
Flags: needinfo?(margaret.leibovic)
I have the ASUS device in question and can't get it to crash. I'm going to try applying the system update and see if that changes things.
Flags: needinfo?(snorp)
(In reply to Ritu Kothari (:ritu) from comment #19)
> Margaret, Jim, there is a small chance that I will do another dot release
> for 41 in a day or two. Based on comment 16 and the fact that this is a
> top-crash on Fennec, do you think we will have a fix ready for uplift soon
> (today at best)?

It doesn't look like there's been significant progress here, so I'm skeptical we'll have a patch soon. I defer to snorp.
Flags: needinfo?(margaret.leibovic)
Attached video asus2.mov
Attached a small video, in case it helps. The crash can be triggered quite easily by hiding and showing the address bar (by scrolling) once or twice.
I looked at a Nightly crash dump, and the crash was due to an SSE2 instruction accessing misaligned data because the stack pointer was misaligned. The stack stays in libxul and then goes into libdvm, so the misalignment most likely happened when libdvm called the JNI entry point in libxul. In particular, this commit in Dalvik [1] from Honeycomb fixed a stack misalignment bug, so I think some x86 devices don't have this fix in their Dalvik builds (sigh). I think we can work around the Dalvik bug by telling GCC to realign the stack manually.

[1] https://github.com/android/platform_dalvik/commit/4570ad0a7706d3338d58bd0204e102719e4d68fb
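The arithmetic behind this diagnosis can be sketched as follows. On 32-bit x86 Linux, GCC by default assumes ESP was 16-byte aligned immediately before the `call` instruction, so ESP is 12 modulo 16 on function entry (the 4-byte return address has just been pushed), and it places 16-byte SSE slots relative to that assumption. The model below is hypothetical: the slot offset of 28 bytes and the helper names are invented for illustration, not taken from a real libxul frame. It shows how a caller that breaks the assumption pushes such a slot off a 16-byte boundary, which is what makes an aligned SSE2 access (for example `movdqa` or `movaps`) fault.

```cpp
#include <cstdint>

// Hypothetical frame layout: a 16-byte SSE slot placed kSlotOffset bytes below
// ESP at function entry. 28 = 12 + 16 is illustrative only.
constexpr std::uint32_t kSlotOffset = 28;

// ESP residue at entry that the compiler assumes: 16-byte aligned at the
// `call`, minus the 4-byte return address that `call` pushes.
constexpr std::uint32_t kAssumedEntryResidue = 12;

// Under the ABI assumption, the slot lands exactly on a 16-byte boundary.
static_assert(kSlotOffset % 16 == kAssumedEntryResidue,
              "slot is aligned when the caller honors the ABI");

// ESP on function entry, given ESP at the moment the caller executes `call`.
constexpr std::uint32_t esp_after_call(std::uint32_t esp_at_call) {
    return esp_at_call - 4;  // `call` pushes a 4-byte return address
}

// Distance (0..15 bytes) of the slot from a 16-byte boundary. 0 means aligned
// SSE loads/stores to the slot do not fault.
constexpr std::uint32_t slot_misalignment(std::uint32_t esp_at_entry) {
    return (esp_at_entry - kSlotOffset) % 16;
}
```

In this model, `slot_misalignment(esp_after_call(x))` equals `x % 16`, so any caller that does not keep ESP 16-byte aligned at the call misplaces every such slot by the same amount.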
Flags: needinfo?(nchen)
Assuming comment 24 is the real cause of the crashes, this patch should work.
It uses the force_align_arg_pointer attribute to force realigning the stack at JNI
entry points. Because of recent JNI changes, we will need separate patches for
aurora and beta.
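A minimal sketch of this kind of annotation, assuming GCC or Clang targeting 32-bit x86. `force_align_arg_pointer` is a GCC/Clang x86 function attribute that makes the function's prologue realign the stack itself instead of trusting the caller. The macro `HYPOTHETICAL_JNI_REALIGN` and the function `hypothetical_entry_sum4` are invented for illustration; the patch's actual names, and the real JNI signature pieces (`JNIEnv*`, `jclass`, `JNIEXPORT`, `JNICALL`), are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: apply the realignment attribute only on 32-bit x86 with
// GCC or Clang (both define __GNUC__). Elsewhere it expands to nothing.
#if defined(__i386__) && defined(__GNUC__)
#define HYPOTHETICAL_JNI_REALIGN __attribute__((force_align_arg_pointer))
#else
#define HYPOTHETICAL_JNI_REALIGN
#endif

// Stand-in for a JNI entry point called directly by the VM. With the
// attribute, the prologue realigns ESP, so 16-byte-aligned locals stay aligned
// even if the caller (e.g. an unpatched Dalvik) left ESP misaligned.
extern "C" HYPOTHETICAL_JNI_REALIGN float hypothetical_entry_sum4(const float* in) {
    alignas(16) float buf[4];  // the compiler may access this with aligned SSE ops
    for (int i = 0; i < 4; ++i) {
        buf[i] = in[i];
    }
    // Holds for any ABI-conforming caller; on 32-bit x86 the attribute makes
    // it hold for misaligned callers as well.
    assert(reinterpret_cast<std::uintptr_t>(buf) % 16 == 0);
    return buf[0] + buf[1] + buf[2] + buf[3];
}
```

The design choice is to fix alignment at the boundary where foreign code enters libxul, rather than patching every function that might use SSE2: once the entry point's prologue restores 16-byte alignment, its callees inherit a correctly aligned stack.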
Attachment #8675838 - Flags: review?(snorp)
Attachment #8675838 - Flags: review?(snorp) → review+
Jim, I guess we want that to land ASAP.
Could you fill out the uplift requests for aurora and beta? Thanks
Flags: needinfo?(nchen)
Keywords: checkin-needed
Attached patch Patch for Aurora
Approval Request Comment

[Feature/regressing bug #]: N/A

[User impact if declined]: Random crashes on x86 devices

[Describe test coverage new/current, TreeHerder]: Tested locally

[Risks and why]: Very small; patch makes trivial changes to compiled code

[String/UUID change made/needed]: None
Attachment #8676417 - Flags: review+
Attachment #8676417 - Flags: approval-mozilla-aurora?
Attached patch Patch for Beta
Approval Request Comment

[Feature/regressing bug #]: N/A

[User impact if declined]: Random crashes on x86 devices

[Describe test coverage new/current, TreeHerder]: Tested locally

[Risks and why]: Very small; patch makes trivial changes to compiled code

[String/UUID change made/needed]: None
Attachment #8676418 - Flags: review+
Attachment #8676418 - Flags: approval-mozilla-beta?
this failed to apply:

renamed 1207642 -> Bug-1207642---Work-around-Dalvik-bug-by-realigning.patch
applying Bug-1207642---Work-around-Dalvik-bug-by-realigning.patch
patching file mozglue/android/nsGeckoUtils.cpp
Hunk #2 FAILED at 68
1 out of 2 hunks FAILED -- saving rejects to file mozglue/android/nsGeckoUtils.cpp.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and refresh Bug-1207642---Work-around-Dalvik-bug-by-realigning.patch
Flags: needinfo?(snorp)
Keywords: checkin-needed
Jim, can you fix this up for uplift?
Flags: needinfo?(snorp) → needinfo?(nchen)
Use the force_align_arg_pointer attribute to force realigning the stack at
JNI entry points.
Attachment #8676891 - Flags: review+
Attachment #8675838 - Attachment is obsolete: true
Flags: needinfo?(nchen)
Keywords: checkin-needed
Summary: Crash at hex address in Android 41 → x86 only - Crash at hex address in Android
It is too far into the 41 cycle to do a new dot release; tracking for 42, and we will be doing a 42 beta 9 for this.
Assignee: snorp → nchen
Summary: x86 only - Crash at hex address in Android → Crash at hex address on Android x86 from JNI stack misalignment
https://hg.mozilla.org/mozilla-central/rev/0f85afd327f2
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 44
Comment on attachment 8676417
Patch for Aurora

trying to land that again!
Attachment #8676417 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8676418
Patch for Beta

Should be in 42 beta 9.
Attachment #8676418 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I've verified this on Firefox 42 Beta 9 on ZTE Grand X In (Android 4.0.4):
I wasn't able to reproduce this issue following the steps from comment 6, so it seems to be fixed. However, we might want to check crash-stats to be sure. I'll mark this verified after checking crash-stats.
Jim, James, any idea? :/
Status: RESOLVED → REOPENED
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Resolution: FIXED → ---
Unfortunately, our device is a ZTE Grand X In with Android 4.0.4 (API 15). As far as I can see, the crashes reproduce on API 19 only (no ZTE listed there either), so it might be fine for our device. Does anyone have an API 19 x86 device?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> Sigh, this seems to still happen in 42.0b9: See e.g.
> https://crash-stats.mozilla.com/report/list?signature=%400x4007ed69 or
> https://crash-stats.mozilla.com/report/list?signature=%400x4007ec79

These two signatures are abort() calls (crash address is 0xdeadbaad). These shouldn't be as frequent as the other ones, and are not covered by the fix.
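For triage, the distinction drawn here can be written as a rule on the fault address. Older versions of Android's bionic libc implement `abort()` by deliberately writing to address 0xdeadbaad, so a fault at exactly that address marks a deliberate abort rather than the misaligned SSE2 access this patch addresses. The enum and function names below are hypothetical, and real triage would look at the whole report rather than a single field.

```cpp
#include <cstdint>

// Fault address used by older bionic abort() to force a recognizable crash.
constexpr std::uint64_t kBionicAbortFaultAddr = 0xdeadbaadu;

enum class CrashKind {
    DeliberateAbort,  // an abort() call: not covered by the realignment fix
    Other             // needs further analysis; may be the JNI stack bug
};

constexpr CrashKind classify_by_fault_addr(std::uint64_t fault_addr) {
    return fault_addr == kBionicAbortFaultAddr ? CrashKind::DeliberateAbort
                                               : CrashKind::Other;
}
```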
Flags: needinfo?(nchen)
Jim, could you open a new bug to follow up on this one? Thanks
Status: REOPENED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(nchen)
Resolution: --- → FIXED
Flags: needinfo?(nchen)