Closed Bug 1391802 Opened 7 years ago Closed 6 years ago

Crash in DoTypeMonitorFallback on startup on Ubuntu armhf (many chip versions) in version 55

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

55 Branch
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: jeff, Unassigned)

Details

(Keywords: crash)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux armv7l; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160414070736

Steps to reproduce:

Start Firefox (simple enough with just "firefox" on the command-line).


Actual results:

Immediate crash:
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...



Expected results:

Firefox should have started.
More details:

Version tested:  55.0.2 (Ubuntu package:  55.0.2+build1-0ubuntu0.16.04.1)

Previous stable release personally tested:  52.0.2.  I have heard other users claim that version 54 was stable and that this problem is new in version 55.

I also installed the firefox-dbg package to capture some debugger output (running as "firefox -g"):

Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
js::jit::DoTypeMonitorFallback (cx=0xb6a6b000, frame=0xbeffb8a8, 
    stub=0xab264010, value=..., res=...)
    at /build/firefox-_s6XUY/firefox-55.0.2+build1/js/src/jit/SharedIC.cpp:2370
2370	/build/firefox-_s6XUY/firefox-55.0.2+build1/js/src/jit/SharedIC.cpp: No such file or directory.

(gdb) bt
#0  js::jit::DoTypeMonitorFallback (cx=0xb6a6b000, frame=0xbeffb8a8, 
    stub=0xab264010, value=..., res=...)
    at /build/firefox-_s6XUY/firefox-55.0.2+build1/js/src/jit/SharedIC.cpp:2370
#1  0x42f2fb78 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0xb6ff6000 (LWP 28840) "firefox" js::jit::DoTypeMonitorFallback (
    cx=0xb6a6b000, frame=0xbeffb8a8, stub=0xab264010, value=..., res=...)
    at /build/firefox-_s6XUY/firefox-55.0.2+build1/js/src/jit/SharedIC.cpp:2370
  2    Thread 0xb21ef450 (LWP 28848) "gmain" 0xb6d76b90 in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0xb19ef450 (LWP 28849) "gdbus" 0xb6d76b90 in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  4    Thread 0xb0fff450 (LWP 28858) "Gecko_IOThread" syscall ()
    at ../sysdeps/unix/sysv/linux/arm/syscall.S:38
  6    Thread 0xad671450 (LWP 28860) "Timer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  7    Thread 0xac2ff450 (LWP 28861) "Link Monitor" 0xb6d76b90 in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  8    Thread 0xabaff450 (LWP 28862) "Socket Thread" 0xb6d76b90 in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  9    Thread 0xab1ff450 (LWP 28863) "JS Watchdog" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  10   Thread 0xaa9ff450 (LWP 28864) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  11   Thread 0xaa7ff450 (LWP 28865) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  12   Thread 0xaa5ff450 (LWP 28866) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  13   Thread 0xaa3ff450 (LWP 28867) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  14   Thread 0xaa1ff450 (LWP 28868) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  15   Thread 0xa9fff450 (LWP 28869) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  16   Thread 0xa9dff450 (LWP 28870) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  17   Thread 0xa9bff450 (LWP 28871) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  18   Thread 0xa99ff450 (LWP 28872) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  19   Thread 0xa97ff450 (LWP 28873) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  20   Thread 0xa95ff450 (LWP 28874) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  21   Thread 0xa93ff450 (LWP 28875) "JS Helper" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  22   Thread 0xa8dff450 (LWP 28876) "Hang Monitor" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  23   Thread 0xa85ff450 (LWP 28877) "firefox" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47

(gdb) info registers
r0             0xb6a6b000	3064377344
r1             0x0	0
r2             0x0	0
r3             0xffffff82	4294967170
r4             0xa906acc0	2835786944
r5             0xab264010	2871410704
r6             0xbeffb8a8	3204429992
r7             0xb6a6b000	3064377344
r8             0xab22e024	2871189540
r9             0xbeffb880	3204429952
r10            0xb54b2b00	3041602304
r11            0xbeffb8cc	3204430028
r12            0xb4a15e4d	3030474317
sp             0xbeffb800	0xbeffb800
lr             0x0	0
pc             0xb4a15eb2	0xb4a15eb2 <js::jit::DoTypeMonitorFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICTypeMonitor_Fallback*, JS::HandleValue, JS::MutableHandleValue)+102>
cpsr           0x800d0030	-2146631632
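
A note on reading the register dump above: on 32-bit ARM, bit 5 of cpsr is the Thumb state bit and the low five bits encode the processor mode. The Python sketch below is not part of the original report, and the helper name `decode_cpsr` is hypothetical. It decodes the reported value 0x800d0030, which suggests the fault happened in Thumb code running in user mode. That is consistent with a pc that is 2-byte but not 4-byte aligned.

```python
# Hypothetical helper for reading an ARMv7 CPSR value from a gdb register dump.
ARM_MODES = {
    0x10: "User",
    0x11: "FIQ",
    0x12: "IRQ",
    0x13: "Supervisor",
    0x17: "Abort",
    0x1B: "Undefined",
    0x1F: "System",
}


def decode_cpsr(cpsr):
    """Return the condition flags, Thumb bit, and mode encoded in a CPSR value."""
    return {
        "N": bool((cpsr >> 31) & 1),      # negative flag
        "Z": bool((cpsr >> 30) & 1),      # zero flag
        "C": bool((cpsr >> 29) & 1),      # carry flag
        "V": bool((cpsr >> 28) & 1),      # overflow flag
        "thumb": bool((cpsr >> 5) & 1),   # T bit: set when executing Thumb code
        "mode": ARM_MODES.get(cpsr & 0x1F, "unknown"),
    }


if __name__ == "__main__":
    # Values copied from the "info registers" output in this report.
    print(decode_cpsr(0x800D0030))
    print("pc is 4-byte aligned:", 0xB4A15EB2 % 4 == 0)
```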


I had to revert to the only other version managed by apt (45) but I can switch between the 2 versions without too much difficulty if you need me to collect some more information.
Component: Untriaged → JavaScript Engine: JIT
Keywords: crash
Product: Firefox → Core
This is definitely a problem and checking the odroid forums should provide you enough evidence to confirm that this bug exists.

I have confirmed it on four different XU4s.
I should add that these crashes occur with the Linux 4.9 kernel, using EGL rather than OpenGL.
Summary: Crash on startup on Ubuntu 16.04 armhf (Odroid XU4) in version 55 → Crash in DoTypeMonitorFallback on startup on Ubuntu 16.04 armhf (Odroid XU4) in version 55
I also have the same crash. I went back to v45, but then an update of another piece of software replaced it. I then purged v55 again, reinstalled v45, and also ran apt-mark hold firefox. Let's hope that works until it is corrected.
Let's get it fixed Mozilla!
I don't think we have anyone working on Linux arm support. I suggest you disable the JITs in about:config.
How do you get to about:config if the browser crashes? Is there a command-line way to do this? Also note that it crashes on startup in safe mode as well. No JIT in safe mode.
Since we haven't received these problem reports on arm-linux in general this could be a problem with the specific device.  This is totally speculative, but I see we have a workaround in our code for another specific Exynos where we flush the icache twice to circumvent a bug on the chip.  If we could repro locally we should at least enable that hack to see if it fixes the problem.

(jit/arm/Architecture-arm.cpp, grep for Exynos)
Mozilla, any new info re: fix or recreate?
I don't think you should expect swift action on this.  ARM Linux (other than Android) is a tier-3 platform for us and we do not dedicate resources to it, depending instead on distros to support it and submit patches: https://developer.mozilla.org/en/docs/Supported_build_configurations.
It's not just happening on the XU4 (Samsung Exynos5422, Cortex-A15). I'm also experiencing it on all my Orange Pi boards (Allwinner H3, Cortex-A7). Other users are reporting it on the Raspberry Pi 3 (Broadcom BCM2837, Cortex-A53). All are running Ubuntu 16.04. See https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1711337.

My guess is that the only reason you're not hearing more Raspberry Pi users complaining about this is that Raspbian (the default distribution on the RPi3, which is a modified version of Debian Stretch) uses v52.

Firefox v54 was working for me. The upgrade to v55 caused it to break.
(In reply to Lars T Hansen [:lth] from comment #8)
> Since we haven't received these problem reports on arm-linux in general this
> could be a problem with the specific device.  This is totally speculative,
> but I see we have a workaround in our code for another specific Exynos where
> we flush the icache twice to circumvent a bug on the chip.  If we could
> repro locally we should at least enable that hack to see if it fixes the
> problem.
> 
> (jit/arm/Architecture-arm.cpp, grep for Exynos)

More Raspberry Pi 3 users are reporting the same problem at https://ubuntu-mate.community/t/firefox-55-0-2-doesnt-start-crashes-on-ubuntu-mate-raspberrypi-3/14637.
OK, noted.
OK, I can repro a crash locally on Ubuntu 14.04, NVidia Jetson TK-1, Firefox 55.0.2.  It's not an out-of-memory problem because I had plenty of headroom and 2GB swap.  I don't yet have any call stacks or other diagnostic information.  It could be days before I have time to look any closer.  (No matter how we slice it the platform is still tier-3.)
Summary: Crash in DoTypeMonitorFallback on startup on Ubuntu 16.04 armhf (Odroid XU4) in version 55 → Crash in DoTypeMonitorFallback on startup on Ubuntu armhf (many chip versions) in version 55
Status: UNCONFIRMED → NEW
Ever confirmed: true
So this started in 55? I don't have a Linux ARM machine but if someone could use mozregression to bisect this that would be very useful, see http://mozilla.github.io/mozregression/quickstart.html
(In reply to Jan de Mooij [:jandem] from comment #15)
> So this started in 55? I don't have a Linux ARM machine but if someone could
> use mozregression to bisect this that would be very useful, see
> http://mozilla.github.io/mozregression/quickstart.html

This probably doesn't work if there are no official builds? In any case, bisecting this on mozilla-central would be a great start.
Priority: -- → P3
Any news on this? 

Arch Linux fixed the bug by passing --enable-optimize="-g -O2 -fno-schedule-insns" during compilation; I tested the result on a Raspberry Pi 2 and no segfault occurs. Ubuntu tried the same thing, but it did not fix the crash at runtime, at least for 14.04. Fedora has the same fix in its repos, but its package manager is so crippled that I couldn't find a way to install the latest version of Firefox.

All of them claim it is a compiler bug. Are there any hints whether that means gcc, Rust, or maybe clang? Are there bug reports anywhere with further information?
The compiler bug would have to be in gcc or clang, as there is no Rust code yet in SpiderMonkey, and I think we all assume that the problem is in SpiderMonkey.

If I read our build logs correctly we use GCC 6.4.0 for building 32-bit linux builds (though x86, of course).  Last I looked on Ubuntu 14 on ARM the standard C++ compiler was technically too old for Firefox (4.8, IIRC).
I just checked: while it is true that the default gcc version of trusty/Ubuntu 14.04 is gcc-4.8, they use gcc-4.9.4 to build their trusty Firefox binary packages. I would have been surprised if they were using gcc-4.8; there is probably a reason for Mozilla to no longer support 4.8, and the Ubuntu team usually doesn't write its own patches for such issues when upgrading a dependency would simply solve them ;-)

Arch Linux uses gcc-7.1.1 to compile its Firefox ARM port. For Fedora it is either 7.1 or 7.2; I'm not sure, since I don't have Fedora up and running on my ARM device.

Meanwhile I found a bug report from Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1426850), but it has no further information.

Talking about "we", you probably mean the toolchain Mozilla uses to compile its binaries for linux-i686 and linux-amd64, right?

Thanks for pointing out that the problem is likely to be found in SpiderMonkey; I wasn't aware of that. Has anyone tried to sift through the commits to find anything suspicious?
(In reply to tt_1 from comment #19)

Thanks for digging up this information!

> Talking about we, you probably mean the toolchain mozilla uses to compile
> their binaries for linux-i686 and linux-amd64, right? 

Yes.

> Thanks for pointing out that the problem likely is to be found in
> SpiderMonkey, I wasn't aware of that. Has anyone tried to shuffle through
> the commits to find anything suspicious?

Not to my knowledge.  As I mentioned earlier (comment 10) we basically do not devote any resources to ARM-Linux, beyond Android.

My initial guess would have been that our JITs are doing something wrong on ARMHF-devices; I believe Android uses the softfp ABI on all platforms still (but I have spent no time lately trying to verify that).  But the data you posted in comment 17 points more towards a C++ compiler bug.  So now I am clueless :)
Now that several users have confirmed being able to run Firefox 57 on armhf with workarounds (see https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1711337) the original JIT issue appears to have been fixed. I've tested and can load JavaScript pages with javascript.options.baselinejit set to true.

Lars, should this bug be marked as resolved? We could then open a new ticket for the Skia crash + other fatal regressions in Firefox 58 armhf.

> How do you get to about:config if the browser crashes?

mfkyle, for future reference: you can set preferences without starting the browser by editing ~/.mozilla/firefox/*/prefs.js
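
As a rough sketch of that suggestion, the Python snippet below renders `user_pref(...)` lines that turn the JITs off and appends them to a file in each profile directory. Several things here are assumptions: `javascript.options.baselinejit` is the pref named earlier in this bug, while `javascript.options.ion` is added here and is not mentioned in the thread; the helper names are hypothetical; and the snippet writes `user.js` (which Firefox reads at startup) instead of editing `prefs.js` directly, because Firefox rewrites `prefs.js` itself. It should only be run while Firefox is closed.

```python
from pathlib import Path

# Assumed pref set: baselinejit comes from this bug, ion is an addition.
JIT_PREFS = {
    "javascript.options.baselinejit": False,
    "javascript.options.ion": False,
}


def render_user_prefs(prefs):
    """Render a mapping of pref names to values as user_pref() lines."""
    lines = []
    for name, value in prefs.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            literal = "true" if value else "false"
        elif isinstance(value, int):
            literal = str(value)
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            literal = f'"{escaped}"'
        lines.append(f'user_pref("{name}", {literal});')
    return "\n".join(lines) + "\n"


def append_to_profiles(prefs, root=None, filename="user.js"):
    """Append the rendered prefs to every profile directory under root.

    A directory counts as a profile only if it already contains prefs.js,
    mirroring the ~/.mozilla/firefox/*/prefs.js glob from the comment above.
    """
    root = Path(root) if root is not None else Path.home() / ".mozilla" / "firefox"
    if not root.is_dir():
        return []
    written = []
    for prefs_js in sorted(root.glob("*/prefs.js")):
        target = prefs_js.parent / filename
        with target.open("a", encoding="utf-8") as handle:
            handle.write(render_user_prefs(prefs))
        written.append(target)
    return written


if __name__ == "__main__":
    # Only print the lines; call append_to_profiles(JIT_PREFS) to write them.
    print(render_user_prefs(JIT_PREFS), end="")
```

Calling `append_to_profiles(JIT_PREFS)` returns the list of files it touched, so an empty list means no profile directory was found under the assumed default location.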
Closing this since it appears to have been a C++ compiler instruction scheduling issue and the problem is no longer observed in the JS engine. Marking "INCOMPLETE" since we don't have call stacks or truly actionable information, though see the bugs cited above for more data about the crash.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE