Closed Bug 1838323 Opened 2 years ago Closed 2 years ago

Startup crash with OpenSUSE-built Firefox 114.x due to unsupported CPU instruction, not bug #1837201, Mozilla-built Firefox is fine.

Categories

(Firefox Build System :: Third Party Packaging, defect)

Firefox 114
x86_64
Linux
defect

Tracking

RESOLVED FIXED
117 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- fixed
firefox114 --- wontfix
firefox115 --- wontfix
firefox116 --- fixed
firefox117 --- fixed

People

(Reporter: kaykaykay123, Assigned: lsalzman)

Steps to reproduce:

Download/install Firefox 114 or 114.0.1 and run.

Actual results:

Startup crash.

Expected results:

Firefox starts and runs normally.

The Bugbug bot thinks this bug should belong to the 'Firefox::Downloads Panel' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Downloads Panel

The SIGILL crash occurs only on the newest Linux distributions, such as openSUSE Tumbleweed.
openSUSE bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101
Possibly this is not a Firefox bug, but a problem with gcc, the compile options, or something else.

Batch of crash reports:
https://crash-stats.mozilla.org/signature/?product=Firefox&signature=libxul.so%400x3c912e0%20%7C%20libxul.so%400x3da095a%20%7C%20firefox%400x17eee&date=%3E%3D2023-06-06T20%3A12%3A00.000Z&date=%3C2023-06-13T20%3A12%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#summary

Info from "Crash Report" - "Raw Data and Minidumps":

"crash_info": {

"address": "0x00007fe6fbe912e0",
"assertion": null,
"crashing_thread": 0,
"instruction": "vshufps xmm0, xmm0, xmm0, 0x0",
"memory_accesses": [ ],
"type": "SIGILL / ILL_ILLOPN"

vshufps is the VEX-encoded (AVX) form of the SSE instruction shufps, which is why it requires AVX support: https://www.felixcloutier.com/x86/shufps
CPUs without AVX support raise an illegal-instruction fault (the SIGILL above) when they reach it.
Hint: older Intel Celerons and Pentiums don't support AVX; on desktop, support begins with Alder Lake: https://en.wikipedia.org/wiki/List_of_Intel_Celeron_processors
Intel Atom CPUs support AVX only since the Gracemont microarchitecture (end of 2022): https://en.wikipedia.org/wiki/Gracemont_(microarchitecture)

Users of these CPUs need the instruction in its legacy SSE encoding.
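
For anyone who wants to check whether a given machine is affected, here is a minimal sketch, assuming GCC or clang on x86. `__builtin_cpu_init` and `__builtin_cpu_supports` are real compiler builtins; the function names `best_simd_level` and `can_run_vex_encoded_code` are made up for this illustration and are not part of Firefox.

```cpp
#include <string>

// Hypothetical helper: reports the highest of a few x86 SIMD levels
// that the running CPU advertises. Falls back to "non-x86" elsewhere.
std::string best_simd_level() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();  // make sure the CPU feature data is populated
  if (__builtin_cpu_supports("avx2")) return "avx2";
  if (__builtin_cpu_supports("avx")) return "avx";
  if (__builtin_cpu_supports("sse2")) return "sse2";
  return "x86-baseline";
#else
  return "non-x86";
#endif
}

// Hypothetical helper: VEX-encoded instructions such as vshufps
// need at least AVX, so anything below that level would hit SIGILL.
bool can_run_vex_encoded_code() {
  const std::string level = best_simd_level();
  return level == "avx" || level == "avx2";
}
```

If `can_run_vex_encoded_code()` returns false on the affected machine, a binary that executes `vshufps` unconditionally would be expected to fail there in the way the crash report shows.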

Component: Downloads Panel → Untriaged
OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Instead of AVX-styled
vshufps xmm0, xmm0, xmm0, 0x0

I expect SSE-styled
shufps xmm0, xmm0, 0x0
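
To illustrate where such an instruction can come from, here is a small sketch of a four-lane float broadcast written with SSE intrinsics. It is not the Skia code; `Float4` and `splat` are names invented for this example. Built without AVX flags, the shuffle typically comes out as the legacy-encoded `shufps`; built with `-mavx`, compilers typically emit a VEX-encoded instruction such as `vshufps` for the same source, though the exact choice depends on compiler and options.

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_shuffle_ps, _mm_storeu_ps
#endif

// Hypothetical four-lane float vector used only for this sketch.
struct Float4 {
  float lane[4];
};

// Broadcasts x into all four lanes.
Float4 splat(float x) {
  Float4 out;
#if defined(__SSE__)
  __m128 v = _mm_set_ss(x);        // x in lane 0, zeros in lanes 1..3
  v = _mm_shuffle_ps(v, v, 0x00);  // copy lane 0 into every lane
  _mm_storeu_ps(out.lane, v);      // unaligned store into the struct
#else
  for (float& l : out.lane) l = x;  // portable fallback for non-SSE targets
#endif
  return out;
}
```

The source is identical in both builds; only the target flags decide which encoding ends up in the binary.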

Status confirmed - not a Firefox issue, because Firefox from tar from Mozilla works OK: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101#c31
ILL as gcc or code optimiser have changed instruction. With that change code works faster on compatible CPUs by using AVX, avoiding penalties for hopping between ordinary and VEX coding scheme https://en.wikipedia.org/wiki/VEX_prefix

Summary: Startup crash with FF 114 due to unsupported CPU instruction, not bug #1837201 → Startup crash with Firefox 114.x due to unsupported CPU instruction, not bug #1837201
Duplicate of this bug: 1838701
Status: UNCONFIRMED → NEW
Crash Signature: [@ libxul.so@0x3c912e0 | libxul.so@0x3da095a | firefox@0x17eee]
Component: Untriaged → Third Party Packaging
Ever confirmed: true
Product: Firefox → Firefox Build System
Summary: Startup crash with Firefox 114.x due to unsupported CPU instruction, not bug #1837201 → Startup crash with OpenSUSE-built Firefox 114.x due to unsupported CPU instruction, not bug #1837201, Mozilla-built Firefox is Fine.
Summary: Startup crash with OpenSUSE-built Firefox 114.x due to unsupported CPU instruction, not bug #1837201, Mozilla-built Firefox is Fine. → Startup crash with OpenSUSE-built Firefox 114.x due to unsupported CPU instruction, not bug #1837201, Mozilla-built Firefox is fine.
See Also: → 1835488

I've updated the symbol scraping scripts and I'm now running them by hand. With some luck we'll have symbols for this crash soon enough.

Here's the proper signature for this crash

Crash Signature: [@ libxul.so@0x3c912e0 | libxul.so@0x3da095a | firefox@0x17eee] → [@ libxul.so@0x3c912e0 | libxul.so@0x3da095a | firefox@0x17eee] [@ skvx::Vec<T>::Vec]
Keywords: regression
Regressed by: 1821512

Set release status flags based on info from the regressing bug 1821512

:lsalzman, since you are the author of the regressor, bug 1821512, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)

(In reply to Gabriele Svelto [:gsvelto] from comment #7)

> Here's the proper signature for this crash

I guess you've already got it, but in case not: we had another report of it downstream in Gentoo at https://bugs.gentoo.org/908412 and there's a mostly complete backtrace (frames #0 through #13) there:

#0  skvx::Vec<4, float>::VecStorage(float) (this=this@entry=0x7fffffff8eb0, this=<optimized out>) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/base/SkVx.h:274
#1  0x00007ffff19d7402 in skvx::operator*<4, float, float>(skvx::Vec<4, float> const&, float) (x=..., y=<optimized out>)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/base/SkVx.h:539
#2  0x00007ffff19c8ed6 in map_rect_affine (mat=<optimized out>, src=...) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkM44.cpp:151
#3  SkMatrixPriv::MapRect(SkM44 const&, SkRect const&) (m=<optimized out>, src=...) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkM44.cpp:220
#4  0x00007ffff19c8c24 in SkCanvas::computeDeviceClipBounds(bool) const (this=this@entry=0x7fffe2603100, outsetForAA=outsetForAA@entry=true)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkDevice.h:143
#5  0x00007ffff19c8b13 in SkCanvas::init(sk_sp<SkBaseDevice>) (this=this@entry=0x7fffe2603100, device=...)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkCanvas.cpp:425
#6  0x00007ffff19c8340 in SkCanvas::SkCanvas(SkBitmap const&, SkSurfaceProps const&) (props=..., bitmap=..., this=0x7fffe2603100)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/core/SkCanvas_Raster.cpp:31
#7  SkSurface_Raster::onNewCanvas() (this=0x7fffde611ba0) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface_Raster.cpp:76
#8  0x00007ffff19c8215 in SkSurface_Base::getCachedCanvas() (this=0x7fffde611ba0) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface_Base.h:210
#9  0x00007ffff19c7a66 in SkSurface::getCanvas() (this=<optimized out>) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/skia/skia/src/image/SkSurface.cpp:80
#10 mozilla::gfx::DrawTargetSkia::Init(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat)
(this=this@entry=0x7fffe27e2400, aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/2d/DrawTargetSkia.cpp:1748
#11 0x00007ffff19c7833 in mozilla::gfx::Factory::CreateDrawTarget(mozilla::gfx::BackendType, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat)
(aBackend=mozilla::gfx::BackendType::SKIA, aSize=..., aFormat=mozilla::gfx::SurfaceFormat::B8G8R8A8) at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/2d/Factory.cpp:388
#12 0x00007ffff19c76ef in gfxPlatform::CreateOffscreenContentDrawTarget(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SurfaceFormat, bool)
(this=<optimized out>, aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8A8, aFallback=aFallback@entry=false)
at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/thebes/gfxPlatform.cpp:1684
#13 0x00007ffff19bf147 in gfxPlatform::Init() () at /var/tmp/portage/www-client/firefox-114.0/work/firefox-114.0/gfx/thebes/gfxPlatform.cpp:985

My offhand guess as to why our official builds work and these distro builds don't is that the distros are passing AVX flags for the whole build. This causes SkVx to compile in its AVX code paths and use them unconditionally.

Official builds on x86 are only built with SSE2, with higher SIMD levels only explicitly enabled selectively in various files. I would encourage the distro build maintainers to fix this on their end by not supplying AVX flags to the build. Acceleration for higher levels of SIMD is still used in Mozilla builds, but again, we only selectively enable that in various parts of the moz.build where it is actually safe and properly gated by CPU feature-level checks.
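
As a rough sketch of what "selectively enabled and gated by CPU feature-level checks" can look like, assuming GCC or clang on x86: one function is compiled for AVX via the `target` attribute and is only called after a runtime check. The names `sum_baseline`, `sum_avx`, `select_sum`, and `HAVE_AVX_DISPATCH` are hypothetical; Mozilla's builds do this per file in moz.build rather than per function, so this only illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_AVX_DISPATCH 1
#else
#define HAVE_AVX_DISPATCH 0
#endif

// Baseline path: built with the default flags (SSE2 on x86-64).
float sum_baseline(const float* data, std::size_t count) {
  assert(data != nullptr || count == 0);
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) total += data[i];
  return total;
}

#if HAVE_AVX_DISPATCH
// AVX path: the compiler may use VEX-encoded instructions here,
// so it must never run on a CPU without AVX.
__attribute__((target("avx")))
float sum_avx(const float* data, std::size_t count) {
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) total += data[i];
  return total;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Picks the AVX path only when the running CPU reports AVX support.
SumFn select_sum() {
#if HAVE_AVX_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) return sum_avx;
#endif
  return sum_baseline;
}
```

A caller would do something like `float total = select_sum()(values, n);`. Supplying AVX flags for the whole build removes this gate, because the baseline path may then contain VEX-encoded instructions as well.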

There is no nice way to work around this inside our code, but the downstream build fixes on the distro end should be simple.

Flags: needinfo?(lsalzman)

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 5 desktop browser crashes on Linux on release (startup)

For more information, please visit BugBot documentation.

Keywords: topcrash

Supposedly this bug is caused by using gcc 13 (13.1?) with the "--enable-lto" option: https://bugzilla.opensuse.org/show_bug.cgi?id=1212101#c56

That would be another way this could occur: GCC's LTO merging multiple versions of a function that were compiled with different SIMD flags. Disabling GCC LTO entirely would be sad, but I am not sure how broken GCC is here, or whether GCC would consider this acceptable behavior. Clearly, though, it causes substantial breakage on our end if that is the case.

Mike, any ideas about the best way to navigate this if this is indeed a GCC LTO quirk?

Flags: needinfo?(mh+mozilla)

This sounds like a bug in GCC, rather than a quirk. Jan would know better.

Flags: needinfo?(mh+mozilla) → needinfo?(jh)

Firefox 114.0.1 from openSUSE Leap 15.4 ('experimental' repo) works OK with the previous openSUSE compile settings, whereas Firefox 114.0.1 from Tumbleweed breaks on non-AVX hardware. openSUSE Leap 15.4 uses gcc 7.5.

openSUSE uses the 'gnu99' C dialect option together with 'c++17': https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
IMHO 'gnu99' is rather outdated.

Which compiler and build settings does Mozilla use for the Firefox tarball on Linux (which also works OK)?

Mozilla uses clang.

See Also: → 1841661

Set release status flags based on info from the regressing bug 1821512

It seems that GCC, under certain circumstances, does not completely inline code in the skvx namespace in Skia, even though the code specifies "always_inline". As a side-effect, it leaves around symbols that are generated with different architecture flags supplied. LTO then picks one of the symbols, at what may as well be random.

This could potentially be an issue under clang if it ever failed to inline.

As a workaround for both, we force skvx to exist in arch-specific namespaces, i.e. -Dskvx=skvx_foo, so that even in the worst case, no ambiguous symbols will be generated.
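
A minimal single-file sketch of the renaming idea, under stated assumptions: the two `#define skvx ...` lines stand in for compiling two translation units with `-Dskvx=skvx_sse2` and `-Dskvx=skvx_avx`, and `DEFINE_SKVX_HELPERS`, `built_for`, and the two suffixes are invented for the illustration. The actual patch applies the define through the build configuration instead.

```cpp
#include <cstring>

// Stand-in for a header-only helper that every translation unit includes.
// The namespace name is the plain token skvx, so a macro can rename it.
#define DEFINE_SKVX_HELPERS(ARCH_TAG)                 \
  namespace skvx {                                    \
  inline const char* built_for() { return ARCH_TAG; } \
  }

// Simulates a file built with baseline flags and -Dskvx=skvx_sse2.
#define skvx skvx_sse2
DEFINE_SKVX_HELPERS("sse2")
#undef skvx

// Simulates a file built with AVX flags and -Dskvx=skvx_avx.
#define skvx skvx_avx
DEFINE_SKVX_HELPERS("avx")
#undef skvx

// The two helpers now live in different namespaces, so they are
// different symbols and the linker has nothing ambiguous to merge.
bool helpers_are_distinct() {
  return std::strcmp(skvx_sse2::built_for(), skvx_avx::built_for()) != 0;
}
```

Without the rename, both definitions would be `skvx::built_for()`, and an out-of-line copy from the AVX build could be the one the linker keeps for every caller.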

Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/07bc853b8799 Disambiguate skvx when building with different arch options. r=glandium
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch

Could we also get this uplifted to esr?

Flags: needinfo?(lsalzman)

(In reply to msirringhaus from comment #22)

> Could we also get this uplifted to esr?

Can you verify that the fix works first?

Flags: needinfo?(lsalzman) → needinfo?(msirringhaus)

Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Firefox builds with GCC don't work properly.
  • User impact if declined: Firefox builds with GCC may crash when run on older processors.
  • Fix Landed on Version: 117
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just renames a namespace in different source files to not clash when linking.
Attachment #9342495 - Flags: approval-mozilla-esr115?

:lsalzman will this ride the train with 117 or does it need to be uplifted to 116 beta?

Flags: needinfo?(lsalzman)

Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium

Beta/Release Uplift Approval Request

  • User impact if declined: Firefox builds with GCC don't work properly.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String changes made/needed:
  • Is Android affected?: Unknown
Flags: needinfo?(lsalzman)
Attachment #9342495 - Flags: approval-mozilla-beta?

Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium

Approved for 116.0b4

Attachment #9342495 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Pushed by dsmith@mozilla.com: https://hg.mozilla.org/releases/mozilla-beta/rev/1dcdc0d37b33 Disambiguate skvx when building with different arch options. r=glandium,a=dsmith

Comment on attachment 9342495 [details]
Bug 1838323 - Disambiguate skvx when building with different arch options. r?glandium

Approved for 115.1esr.

Flags: needinfo?(msirringhaus)
Flags: needinfo?(jh)
Attachment #9342495 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+