Closed Bug 1452128 Opened 6 years ago Closed 5 years ago

Crash in mozJSComponentLoader::Import (armhf)

Categories

(Core :: XPConnect, defect, P5)

59 Branch
Unspecified
Linux
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: marcia, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is
report bp-e32e8f3b-e4b5-4282-88fc-f7a9f0180406.
=============================================================

Seen while looking at release crash stats: https://bit.ly/2q9FeSB. This signature affects 59.0.1 and 59.0.2, but I believe it has been seen in other releases. Currently #22 top crash in 59.0.2. Linux only.

Many of the comments mention Firefox not working on Ubuntu MATE on a Raspberry Pi 3.

Top 10 frames of crashing thread:

0 libxul.so libxul.so@0x85f8b0 
1 firefox firefox@0xaf4f 
2 libxul.so libxul.so@0x887b1f 
3 libxul.so libxul.so@0x887aeb 
4 libxul.so libxul.so@0x3c9b0b 
5 libxul.so libxul.so@0x3c9c67 
6 libxul.so libxul.so@0x175da43 
7 libxul.so libxul.so@0x339d95e 
8 libxul.so libxul.so@0x889625 
9 firefox firefox@0xaf4f 

=============================================================
All crashes are on ARM CPUs, but we don't get useful stacks :(
Product: Firefox → Core
Flags: needinfo?(mozillamarcia.knous) → needinfo?(ajones)
Anthony, any ideas here? You were referred to as someone who might have some ideas about these crashes on ARM.
There are lots of comments about Raspberry Pi and Ubuntu Mate not working (is this a configuration we support?) Also this signature is confined to 59.x. 751 crashes across all 59 versions - my guess is this is not something big enough to worry about, unless there are lots of other signatures similar to this one.
Component: General → XUL
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #4)
> There are lots of comments about Raspberry Pi and Ubuntu Mate not working
> (is this a configuration we support?) Also this signature is confined to
> 59.x. 751 crashes across all 59 versions - my guess is this is not something
> big enough to worry about, unless there are lots of other signatures similar
> to this one.

I'm going to assume that 59 is the only version available for Ubuntu Mate.
I guess Ubuntu Mate is not uploading symbols, which is why we don't have useful crash reports.
(In reply to Mike Hommey [:glandium] from comment #6)
> I guess Ubuntu Mate is not uploading symbols, which is why we don't have
> useful crash reports.

That's the most likely explanation if this is the packaged version of Firefox they ship. All frames have the '"missing_symbols": true' entry set in the Socorro dump which means we just couldn't find the symbols when symbolicating the stack. I'll look up the packager for Ubuntu Mate on the Pi and point him to this bug.
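
For anyone checking a report by hand, the flag quoted above can be counted in a saved copy of the crash JSON. This is only a minimal sketch: the file name `crash.json` and the exact spacing of the key are assumptions, and `count_missing_symbols` is a hypothetical helper, not part of Socorro.

```shell
# Hypothetical helper: count stack frames flagged as unsymbolicated in a
# saved crash JSON file. Assumes the key appears literally as
# "missing_symbols": true (optional spaces after the colon), as quoted
# in this bug.
count_missing_symbols() {
    # -o prints each match on its own line, so several frames on one
    # line of JSON are still counted individually.
    grep -o '"missing_symbols": *true' "$1" | wc -l | tr -d ' '
}

# Usage (assumed file name):
#   count_missing_symbols crash.json
```

If the count equals the number of frames in the crashing thread, as it appears to for the report in this bug, no symbols were found for any of the modules involved.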
Are you guys only trying to resolve the symbol upload from Launchpad to Socorro? If so, Chris Coulson would be the one to contact about that, but he has been very hard to get ahold of. Users haven't found this a blocker on getting useful stack traces, because Launchpad still creates and uploads the firefox-dbg package:

sudo apt install firefox-dbg

Or if the Mozilla team intends to actually *fix* the Firefox armhf crashes, that would be wonderful. Fatal crashes on startup for armhf are not limited to Firefox 59. Rather, this has been happening since Firefox 55 or further back. There has not been an official working up-to-date Firefox for Raspberry Pi for more than 8 months, and the response we received in bug 1391802 was that this is a tier-3 platform, i.e. do not expect it to work.

The stack traces, analyses, and workarounds for at least two startup crashes are covered in the Launchpad ticket: https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1711337 Known workarounds include forcing off SK_JUMPER_USE_ASSEMBLY or cross-compiling the whole browser with clang.
(In reply to jdonald.x from comment #8)
> Or if the Mozilla team intends to actually *fix* the Firefox armhf crashes,
> that would be wonderful.

If I can reproduce them locally I'd be glad to help.

> Fatal crashes on startup for armhf are not limited
> to Firefox 59. Rather, this has been happening since Firefox 55 or further
> back. There has not been an official working up-to-date Firefox for
> Raspberry Pi for more than 8 months, and the response we received in bug
> 1391802 was that this is a tier-3 platform, i.e. do not expect it to work.

In my eyes the fact that it's a tier-3 platform only means it might take a little more time to fix it.

> The stack traces, analyses, and workarounds for at least two startup crashes
> are covered in the Launchpad ticket:
> https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1711337 Known
> workarounds include forcing off SK_JUMPER_USE_ASSEMBLY or cross-compiling
> the whole browser with clang.

The signature here is for one specific crash. If there's more than one bug involved we should file them separately.
There is a long debate on Launchpad about whether the startup crash is one common crash across all toolchains used or whether there are several crashes with different causes. What I can say is that it is possible to compile at least firefox-60.0-beta16 (the last I tried) with clang-5, and with https://bugzilla.mozilla.org/show_bug.cgi?id=1238661 reverted it works with gcc-7.3/glibc-2.25/binutils-2.29.1 as well. However, reverting https://bugzilla.mozilla.org/show_bug.cgi?id=1238661 does break the clang compile. The compile was done natively, not cross.
(In reply to Gabriele Svelto [:gsvelto] from comment #9)
> If I can reproduce them locally I'd be glad to help.

This initial startup crash should be 100% reproducible on any Raspberry Pi running Raspbian or Ubuntu MATE (both 32-bit systems).

sudo apt install firefox
firefox
# immediately crashes at startup

This can be done on Raspberry Pi or any modern 32-bit ARM Linux system. gsvelto do you have the appropriate hardware on hand?

> The signature here is for one specific crash. If there's more than one bug
> involved we should file them separately.

Ok, let's keep this ticket focused on the first startup crash. The next crash (in Skia) when a window is about to appear does not get hit by users until they get past this startup crash.

(In reply to tt_1 from comment #10)
> There is a long debate on launchpad about whether the startup crash is one
> common crash across all toolchains used or whether they are of a different
> cause.

Very helpful, at least in confirming how unclear the answer to this debate is. I can also say that cross-compiling Firefox 59 to armhf via clang works. We have yet to see a recent gcc-5.4 build (Ubuntu 16.04) make it past this crash, and some gcc 4.9 ones (Ubuntu 14.04) avoid it while others end up crashing at the same spot.
(In reply to jdonald.x from comment #11)
> This initial startup crash should be 100% reproducible on any Raspberry Pi
> running Raspbian or Ubuntu MATE (both 32-bit systems).
> 
> sudo apt install firefox
> firefox
> # immediately crashes at startup
> 
> This can be done on Raspberry Pi or any modern 32-bit ARM Linux system.
> gsvelto do you have the appropriate hardware on hand?

No, but I can get it quickly.

> Very helpful, at least confirming how the answer to this debate is so
> unclear. I can also say that cross-compiling Firefox 59 to armhf via clang
> works. We have yet to see a recent gcc-5.4 build (Ubuntu 16.04) make
> it past this crash, and some gcc 4.9 ones (Ubuntu 14.04) avoid it while
> others end up crashing at the same spot.

What's the default compiler on that platform? We require GCC 6.1 on Linux for building the current development version (61), but version 59 was supposed to be buildable with GCC 4.9.
(In reply to Gabriele Svelto [:gsvelto] from comment #12)
> What's the default compiler on that platform? We require GCC 6.1 on Linux
> for building the current development version (61) but version 59 was
> supposed to be buildable with GCC 4.9.

Assuming you're referring to Ubuntu MATE for Pi, that would be gcc-5.4. It is possible to install gcc-6 via Apt and use that instead.

Some time ago I cross-compiled a test-build under Debian 9 with its default gcc-6.3, and the resulting binary encountered the same error.
(In reply to jdonald.x from comment #11)
> I can also say that cross-compiling Firefox 59 to armhf via clang
> works.

Given that our overall direction is toward clang, why don't we just use clang on Ubuntu Mate?
Flags: needinfo?(ajones)
That sounds sensible to me.

Note that Firefox being unable to start is not specific to Ubuntu MATE on armhf, but affects the majority of 32-bit ARM distros. The known exceptions are Arch Linux which seemed to avoid the problem via --disable-stylo (not recommended), and FreeBSD which already builds with clang.

This startup regression or the Skia armhf crash (apparently also solved with clang) began around August 2017: https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=150438&start=45
Well, it is very much possible to build a binary for armhf/32bit which does not crash at startup. 
For this, it needs a gcc-7.3 toolchain (I skipped 6.4) and to work out this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1434526
The only solution at the moment is to revert https://bugzilla.mozilla.org/show_bug.cgi?id=1238661. The compile then passes with gcc-7.3, but at the cost of breaking the compile with clang (tested with clang-5).

So either revert https://bugzilla.mozilla.org/show_bug.cgi?id=1238661 and build with gcc-7.3, or go for clang straight away.
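
The clang route could be expressed in a mozconfig along these lines. This is a sketch under assumptions, not the configuration any distro ships: the output path `mozconfig.armhf-clang` is hypothetical, and only the compiler overrides are shown (a cross build would need target and sysroot settings on top of this).

```shell
# Write a minimal mozconfig that selects clang instead of gcc.
# MOZCONFIG_OUT is a hypothetical path chosen for this example.
MOZCONFIG_OUT="${MOZCONFIG_OUT:-./mozconfig.armhf-clang}"

cat > "$MOZCONFIG_OUT" <<'EOF'
# Build with clang rather than the distro's default gcc
export CC=clang
export CXX=clang++
EOF

echo "wrote $MOZCONFIG_OUT"
```

The build would then be pointed at that file, for example with `MOZCONFIG=./mozconfig.armhf-clang ./mach build`.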
Component: XUL → General
We'll need to put this in the proper component. @emma: suggestions?
Flags: needinfo?(ehumphries)
Because we're missing symbols, and this is not a build which we do (the distros build and package it), I'm not sure where this one goes.

It does sound like there's an argument for building on Arm64 (which I'd like for my own selfish Pi fangirl reasons) so this becomes a RelEng matter, probably in a separate bug.

Unless we can get symbols, I don't know what we can do here until we get a supported build.
Flags: needinfo?(ehumphries)
Summary: Crash in libxul.so@0x85f8b0 | firefox@0xaf4f → Crash in libxul.so@0x85f8b0 | firefox@0xaf4f (Arm64)
(In reply to tt_1 from comment #19)
> Are you looking for this?

Yes but for our crash infrastructure to properly analyze the crashes these need to be uploaded to our symbol servers. It's something that should be done at packaging time so that the symbols are up-to-date with the build, see here for more info: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Uploading_symbols_to_Mozillas_symbol_server
Well, I'm not one of the maintainers, just a normal user. I might be able to upload a smaller file for once, but this debug data is a bit more than a gigabyte of raw data. That is nothing I can upload anywhere with my super slow connection, and the maintainers are not very responsive either.
> It's something that should be done at packaging time

It doesn't have to be. I've uploaded symbols from debug symbol packages for Debian. My script is Debian-specific, though. The other half of the problem is that our crash infrastructure can't easily deal with reprocessing crashes once we do have the symbols, so we'd need to hope new crashes come in, which they probably would, but otoh, the new crashes would then have a totally different signature, and would be hard to find, until they get a critical mass.
@emma can you help us get this to the proper component?
Flags: needinfo?(ehumphries)
If I understand the state of play now, we need symbols uploaded so we can look at the crash and understand where it is happening. 

We have symbols from Debian on ARM, but not to my knowledge for Ubuntu Mate.
Flags: needinfo?(ehumphries)
I'm going to take a stab at a better component (Product: Firefox Build System > Component: General: Unsupported Platforms). Also, P5 seems best for a tier 3 platform (https://developer.mozilla.org/en-US/docs/Mozilla/Supported_build_configurations).
Component: General → General: Unsupported Platforms
Priority: -- → P5
Product: Core → Firefox Build System
(In reply to Emma Humphries, Bugmaster ☕️🎸🧞‍♀️✨ (she/her) [:emceeaich] (UTC-8) needinfo? me from comment #24)
> If I understand the state of play now, we need symbols uploaded so we can
> look at the crash and understand where it is happening. 
> 
> We have symbols from Debian on ARM, but not to my knowledge for Ubuntu Mate.

I can reproduce what appears to be this bug on Debian ARM; I've just sent in a crash report (with this bug number mentioned in the crash report description so that it's hopefully easy to find).  Let me know if I can do anything else to help debug this.
The bug hits Firefox on Debian stretch armhf (on an Odroid XU4). The messages I get in the terminal where I start Firefox are:

SendContinuousSignalToChild
Wait for continuous signal

previous versions of firefox on an odroid xu4 were very usable; please fix this bug!
Unfortunately we still don't have symbols for these ARM builds. We can't do much about this until the package maintainers start uploading symbols for their builds.
I have now installed firefox 70.0.3538.67-2 from testing on stretch on armhf (odroid xu4) - it seems to have fixed the bug (i.e. it starts and runs - no extensive testing, but it did not start before and now starts). Perhaps others can check whether the bug is fixed in -2.
The latest major Firefox release is 63 and nightly is at 64.0a1. Isn't 70.0.3538.67-2 a version of Chromium?
FYI I got in touch with the package maintainers so hopefully we'll have symbols soon.
I cross compiled FFesr 60.4 for RPi3 [32 bit] with gcc and clang. In both cases, firefox segfaulted immediately when run.

But it ran using libxul.so which had been stripped, but not elfhack'd.

So all I needed to do was build with --disable-elf-hack, even though elfhack hadn't actually failed during the build.

Also had to build without stylo because of this error which I haven't found a fix for:

error: failed to run custom build command for `style v0.0.1 (/mozbuild/mozilla/servo/components/style)`
process didn't exit successfully: `/mozbuild/esr60/firefox-build-dir/toolkit/library/release/build/style-d0a15782b59421df/build-script-build` (exit code: 101)'
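
The configuration described in this comment can be sketched as a mozconfig fragment. The two options are the ones named in this bug; the output path `mozconfig.esr60-armhf` is hypothetical, and the rest of the cross-compile setup is not shown.

```shell
# Write a mozconfig fragment with the two workarounds from this comment.
# MOZCONFIG_OUT is a hypothetical path chosen for this example.
MOZCONFIG_OUT="${MOZCONFIG_OUT:-./mozconfig.esr60-armhf}"

cat > "$MOZCONFIG_OUT" <<'EOF'
# Skip the elfhack post-processing step on libxul.so
ac_add_options --disable-elf-hack
# Build without Stylo (a workaround for the build error above)
ac_add_options --disable-stylo
EOF

echo "wrote $MOZCONFIG_OUT"
```

Keeping the two options on separate lines makes it easy to drop `--disable-stylo` again once the style build error is resolved.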

@Ray thanks for looking into that and making a working ESR build.

--disable-elf-hack is enabled by default for Ubuntu's mozconfig, so I have actually never tested with the ELF hack enabled.

The most common startup crash (although not the only one) that began prior to Firefox 57 is specifically in Skia used by Stylo. Unfortunately, if you build with --disable-stylo you're avoiding the problem at quite an expense. You can read about that in detail in Launchpad bug 1711337, which I linked above. As for troubleshooting your stylo build problem, you may need to look earlier in your log to find the specific compiler error.

Assuming you're targeting Debian Stretch, are you using gcc 6.3? That's good news if you made something functional using a compiler older than gcc 7, even if it's without Skia/Stylo. Lately I've resorted purely to clang for the most reliable builds. Here are my steps last tested on Firefox 64.0: https://github.com/jdonald/firefox-armhf

@gsvelto any word on symbol uploads from the distro maintainers or other ways to unblock armhf debugging?

Why did the title of this ticket change to say "(Arm64)"? It was originally just "Crash in libxul.so@0x85f8b0 | firefox@0xaf4f" and all of the conversation is around armhf 32-bit. The arm64 build is not known to have the startup crash issues described here.

(In reply to jdonald.x from comment #33)

> Why did the title of this ticket change to say "(Arm64)"? It was originally just "Crash in libxul.so@0x85f8b0 | firefox@0xaf4f" and all of the conversation is around armhf 32-bit. The arm64 build is not known to have the startup crash issues described here.

Is the Raspbian distribution of ESR 32 bit or 64 bit?

Is this a problem that can be solved by Raspbian distributing a 64 bit build by default? That would exclude first generation devices, but most Pi are now Bs and B+, and the Zeros are not intended to be used as desktops.

(In reply to Emma Humphries, Bugmaster ☕️🎸🧞‍♀️✨ (she/her) [:emceeaich] (UTC-8) needinfo? me from comment #34)

> Is the Raspbian distribution of ESR 32 bit or 64 bit?

Raspbian sources do not provide firefox-esr, possibly because it's just broken. Debian provides both armhf and arm64 firefox-esr, with the armhf stretch one crashing as described.

> 64 bit build by default? That would exclude first generation devices

This would exclude second- and 1.5-generation devices (Pi 2B v1.1, B+, A+) as well, while the Pi 3B, 3B+, 3A+ and 2B v1.2 have 64-bit CPUs.

Do you have crash ids for stretch armhf firefox-esr crashes? I can upload symbols for those and reprocess the crashes to get better information.

Just submitted one: https://crash-stats.mozilla.com/report/index/31de8be9-7c42-4b38-9082-5b7f00190108

Steps to repro on Raspbian:

sudo apt install dirmngr
echo deb http://security.debian.org/ stretch/updates main contrib non-free | sudo tee /etc/apt/sources.list.d/debian.list
sudo apt-key adv --recv-key --keyserver keyserver.ubuntu.com 8B48AD6246925553
sudo apt update
sudo apt install -y firefox-esr
firefox-esr # crash

I uploaded the symbols for current debian stretch firefox-esr, as well as the symbols for the original crash in this bug (which was from 59.0.2+build1-0ubuntu0.16.04.3). They turn out to be different crashes. I'm modifying the bug title to reflect the original crash signature. As for https://crash-stats.mozilla.com/report/index/31de8be9-7c42-4b38-9082-5b7f00190108 , it would seem to be bug 1309328.

Component: General: Unsupported Platforms → XPConnect
Product: Firefox Build System → Core
Summary: Crash in libxul.so@0x85f8b0 | firefox@0xaf4f (Arm64) → Crash in mozJSComponentLoader::Import (armhf)
See Also: → 1309328

No crashes in recent builds. Resolving as WFM.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

As described in bug 1309328, this is one of many crash signatures caused by the AAPCS armhf change in gcc 5 & 6 mismatched with the XPCOM asm code. On a Raspberry Pi 4 I just tested Firefox ESR 60.6 (the last one available on Debian Stretch armhf) and Firefox 69.0 for Ubuntu 16.04 LTS Xenial armhf. Still all crashing on startup. Here's a link to a crash report: https://crash-stats.mozilla.com/report/index/0e8d1344-cd80-463f-a5c4-d0a850190830

This was quite a pain for the last two years, but users are more willing to accept WNF now that Debian/Raspbian Buster is out. The problem goes away upon upgrading to distros with gcc 8+.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression