Startup crash with OpenSUSE-built Firefox 114.x due to unsupported CPU instruction, not bug #1837201, Mozilla-built Firefox is fine.
Categories: Firefox Build System :: Third Party Packaging, defect
Tracking: firefox-esr102 unaffected, firefox-esr115 fixed, firefox114 wontfix, firefox115 wontfix, firefox116 fixed, firefox117 fixed
People: Reporter: kaykaykay123, Assigned: lsalzman
References: Regression
Details: 4 keywords
Attachments: 1 file (48 bytes, text/x-phabricator-request); flags: diannaS: approval-mozilla-beta+, RyanVM: approval-mozilla-esr115+
Steps to reproduce:
Download/install Firefox 114 or 114.0.1 and run.
Actual results:
Startup crash.
Expected results:
Ordinary run & work.
Comment 1•2 years ago
The Bugbug bot thinks this bug should belong to the 'Firefox::Downloads Panel' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
The SIGILL (illegal instruction) crash appears only on the newest Linux distributions, such as openSUSE Tumbleweed.
openSUSE bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101
This may not be a Firefox bug but a problem with gcc, the compile options, or something else.
Info from "Crash Report" - "Raw Data and Minidumps":
"crash_info": {
"address": "0x00007fe6fbe912e0",
"assertion": null,
"crashing_thread": 0,
"instruction": "vshufps xmm0, xmm0, xmm0, 0x0",
"memory_accesses": [ ],
"type": "SIGILL / ILL_ILLOPN"
vshufps is the VEX-encoded (AVX) form of the SSE instruction shufps, so it requires AVX support: https://www.felixcloutier.com/x86/shufps
CPUs without AVX support cannot execute it and raise SIGILL, as in the crash data above.
Hint: older Intel Celerons and Pentiums don't support AVX; on desktop, support begins with Alder Lake: https://en.wikipedia.org/wiki/List_of_Intel_Celeron_processors
Intel Atom CPUs have supported AVX since the Gracemont microarchitecture (end of 2022): https://en.wikipedia.org/wiki/Gracemont_(microarchitecture)
Users on such CPUs need the instruction in its legacy SSE encoding.
Instead of the AVX-style
vshufps xmm0, xmm0, xmm0, 0x0
I expect the SSE-style
shufps xmm0, xmm0, 0x0
Status confirmed: not a Firefox issue, because Firefox from the Mozilla tarball works OK: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101#c31
The illegal instruction appears because gcc or its optimiser changed the instruction encoding. With that change the code runs faster on compatible CPUs by using AVX and avoiding the penalties for switching between the legacy SSE and VEX encodings: https://en.wikipedia.org/wiki/VEX_prefix
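For anyone wanting to check whether a given Linux machine falls into the affected group, a rough sketch is to look for the avx flag in /proc/cpuinfo. The helper names below are hypothetical and not part of any tool mentioned in this bug; the check assumes an x86 Linux system where the feature line is called "flags".

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text.

    Only the first "flags" line is used; on x86 Linux every logical CPU
    normally repeats the same list.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx(cpuinfo_text):
    """True if the flag list contains the exact token "avx"."""
    return "avx" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as f:
        return f.read()
```

On an affected machine, supports_avx(read_cpuinfo()) would be expected to return False, which matches the hardware where the distro builds hit SIGILL.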
Comment 6•2 years ago
I've updated the symbol scraping scripts and I'm now running them by hand. With some luck we'll have symbols for this crash soon enough.
Comment 7•2 years ago
Here's the proper signature for this crash
Comment 8•2 years ago
Set release status flags based on info from the regressing bug 1821512
:lsalzman, since you are the author of the regressor, bug 1821512, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
(In reply to Gabriele Svelto [:gsvelto] from comment #7)
Here's the proper signature for this crash
I guess you've already got it, but in case not: we had another report of it downstream in Gentoo at https://bugs.gentoo.org/908412 and there's a mostly complete backtrace (first 14 frames, #0 to #13) there:
#0 skvx::Vec<4, float>::VecStorage(float) (this=this@entry=0x7fffffff8eb0, this=<optimized out>) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/base/SkVx.h:274
#1 0x00007ffff19d7402 in skvx::operator*<4, float, float>(skvx::Vec<4, float> const&, float) (x=..., y=<optimized out>)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/base/SkVx.h:539
#2 0x00007ffff19c8ed6 in map_rect_affine (mat=<optimized out>, src=...) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkM44.cpp:151
#3 SkMatrixPriv::MapRect(SkM44 const&, SkRect const&) (m=<optimized out>, src=...) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkM44.cpp:220
#4 0x00007ffff19c8c24 in SkCanvas::computeDeviceClipBounds(bool) const (this=this@entry=0x7fffe2603100, outsetForAA=outsetForAA@entry=true)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkDevice.h:143
#5 0x00007ffff19c8b13 in SkCanvas::init(sk_sp<SkBaseDevice>) (this=this@entry=0x7fffe2603100, device=...)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkCanvas.cpp:425
#6 0x00007ffff19c8340 in SkCanvas::SkCanvas(SkBitmap const&, SkSurfaceProps const&) (props=..., bitmap=..., this=0x7fffe2603100)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkCanvas_Raster.cpp:31
#7 SkSurface_Raster::onNewCanvas() (this=0x7fffde611ba0) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface_Raster.cpp:76
#8 0x00007ffff19c8215 in SkSurface_Base::getCachedCanvas() (this=0x7fffde611ba0) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface_Base.h:210
#9 0x00007ffff19c7a66 in SkSurface::getCanvas() (this=<optimized out>) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface.cpp:80
#10 mozilla::gfx::DrawTargetSkia::Init(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat)
(this=this@entry=0x7fffe27e2400, aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/2d/DrawTargetSkia.cpp:1748
#11 0x00007ffff19c7833 in mozilla::gfx::Factory::CreateDrawTarget(mozilla::gfx::BackendType, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat)
(aBackend=mozilla::gfx::BackendType::SKIA, aSize=..., aFormat=mozilla::gfx::SurfaceFormat::B8G8R8A8) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/2d/Factory.cpp:388
#12 0x00007ffff19c76ef in gfxPlatform::CreateOffscreenContentDrawTarget(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, bool)
(this=<optimized out>, aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8, aFallback=aFallback@entry=false)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/thebes/gfxPlatform.cpp:1684
#13 0x00007ffff19bf147 in gfxPlatform::Init() () at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/thebes/gfxPlatform.cpp:985
Comment 10•2 years ago (Assignee)
My offhand guess as to why our official builds work and these distro builds don't is they're building with AVX flags in the binary. This causes SkVx to include them in the build and use them regardless.
Official builds on x86 are only built with SSE2, with higher SIMD levels only explicitly enabled selectively in various files. I would encourage the distro build maintainers to fix this on their end by not supplying AVX flags to the build. Acceleration for higher levels of SIMD is still used in Mozilla builds, but again, we only selectively enable that in various parts of the moz.build where it is actually safe and properly gated by CPU feature-level checks.
There is no nice way to work around this inside our code, but the downstream build fixes on the distro end should be simple.
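If the guess above holds, a packager could audit the flags passed to the build for anything that enables AVX globally. The following is only a sketch: the function name is hypothetical, the flag list is illustrative rather than exhaustive, and -march=native is included because its effect depends on the build host.

```python
import shlex

# -march values that imply AVX (or may imply it, for "native").
# Illustrative list only; other -march values can enable AVX as well.
AVX_MARCH_FLAGS = {
    "-march=native",
    "-march=x86-64-v3",
    "-march=x86-64-v4",
    "-march=haswell",
}


def avx_enabling_flags(flag_string):
    """Return the flags in a CFLAGS/CXXFLAGS-style string that turn on AVX.

    Matches -mavx, -mavx2, -mavx512* and the -march values listed above.
    Negative options such as -mno-avx are not reported.
    """
    flags = shlex.split(flag_string)
    return [f for f in flags if f.startswith("-mavx") or f in AVX_MARCH_FLAGS]
```

An empty result for the distro's global flags would suggest the AVX code is coming from somewhere else, for example from per-file flags combined with LTO as discussed later in this bug.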
Comment 11•2 years ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 5 desktop browser crashes on Linux on release (startup)
For more information, please visit BugBot documentation.
Comment 12•2 years ago (Reporter)
Supposedly this bug is caused by using gcc 13 (13.1?) with the "--enable-lto" option: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101#c56
Comment 13•2 years ago (Assignee)
That would be another way this could occur: GCC LTO merging multiple versions of a function that were compiled with different SIMD flags. Disabling GCC LTO entirely would be sad, but I am not sure how broken GCC is here, or whether GCC would consider this acceptable behavior. Either way, if that is the case, it clearly causes substantial breakage on our end.
Comment 14•2 years ago (Assignee)
Mike, any ideas about the best way to navigate this if this is indeed a GCC LTO quirk?
Comment 15•2 years ago
This sounds like a bug in GCC, rather than a quirk. Jan would know better.
Comment 16•2 years ago (Reporter)
Firefox 114.0.1 from openSUSE Leap 15.4 ('experimental' repo) works OK with the previous openSUSE compile settings, whereas Firefox 114.0.1 from Tumbleweed breaks on non-AVX hardware. openSUSE Leap 15.4 uses gcc 7.5.
openSUSE uses the 'gnu99' C dialect option together with 'c++17': https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
IMHO 'gnu99' is rather outdated.
Which compiler and build settings does Mozilla use for Firefox (tar) on Linux, which works OK too?
Comment 17•2 years ago
Mozilla uses clang.
Comment 18•2 years ago
Set release status flags based on info from the regressing bug 1821512
Comment 19•2 years ago (Assignee)
It seems that GCC, under certain circumstances, does not completely inline code in the skvx namespace in Skia, even though the code specifies "always_inline". As a side-effect, it leaves around symbols that are generated with different architecture flags supplied. LTO then picks one of the symbols, at what may as well be random.
This could potentially be an issue under clang if it ever failed to inline.
As a workaround for both, we force skvx to exist in arch-specific namespaces, i.e. -Dskvx=skvx_foo, so that even in the worst case, no ambiguous symbols will be generated.
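The effect described in this comment can be illustrated with a toy model of symbol resolution. This is not how GCC's LTO or any real linker is implemented: the model simply keeps the first definition it sees for each symbol name, whereas which copy a real toolchain keeps is an implementation detail. The object file names, the helper name "splat", and the namespace suffixes skvx_avx and skvx_sse2 are all hypothetical.

```python
def compile_unit(obj_name, arch, namespace="skvx"):
    """Pretend to compile one source file with the given arch flags.

    The unit emits an out-of-line copy of a helper that failed to inline,
    named after the namespace it was compiled in.
    """
    return (obj_name, {f"{namespace}::splat": arch})


def link(units):
    """Toy link step: one definition per symbol name, first one wins."""
    table = {}
    for obj_name, symbols in units:
        for symbol, arch in symbols.items():
            table.setdefault(symbol, (obj_name, arch))
    return table


# Same namespace in both units: the names collide and only one copy survives,
# so callers built for SSE2 can end up running the AVX copy.
ambiguous = link([
    compile_unit("opts_avx.o", "avx"),
    compile_unit("core.o", "sse2"),
])

# Arch-specific namespaces (the effect of -Dskvx=skvx_foo): no collision.
disambiguated = link([
    compile_unit("opts_avx.o", "avx", namespace="skvx_avx"),
    compile_unit("core.o", "sse2", namespace="skvx_sse2"),
])
```

In the first case the table holds a single skvx::splat entry taken from the AVX unit; in the second each unit keeps its own copy, so even if inlining fails nothing ambiguous is left for the link step to choose between.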
Comment 20•2 years ago
Comment 21•2 years ago (bugherder)
Comment 23•2 years ago (Assignee)
(In reply to msirringhaus from comment #22)
Could we also get this uplifted to esr?
Can you verify that the fix works first?
Comment 24•2 years ago (Assignee)
Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Firefox builds with GCC don't work properly.
- User impact if declined: Firefox builds with GCC may crash when run on older processors.
- Fix Landed on Version: 117
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Just renames a namespace in different source files to not clash when linking.
Comment 25•2 years ago
:lsalzman will this ride the train with 117 or does it need to be uplifted to 116 beta?
Comment 26•2 years ago (Assignee)
Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium
Beta/Release Uplift Approval Request
- User impact if declined: Firefox builds with GCC don't work properly.
- Is this code covered by automated tests?: Unknown
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String changes made/needed:
- Is Android affected?: Unknown
Comment 27•2 years ago
Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium
Approved for 116.0b4
Comment 28•2 years ago
Comment 29•2 years ago
Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium
Approved for 115.1esr.
Comment 30•2 years ago (uplift)