Closed Bug 1861365 Opened 1 year ago Closed 11 months ago

Crash in [@ std::locale::operator=]

Categories

(Firefox Build System :: General, defect)

Tracking

(firefox-esr115 unaffected, firefox119 unaffected, firefox120+ fixed, firefox121 fixed, firefox122 wontfix, firefox123 fixed)

RESOLVED FIXED
123 Branch

People

(Reporter: aryx, Assigned: sergesanspaille)

Details

(Keywords: crash)

Attachments

(2 files, 1 obsolete file)

Lee, any chance you can identify what started these crashes? Thank you.

14 crashes from 6 installations of Firefox 120.0b2, all on Linux.

Crash report: https://crash-stats.mozilla.org/report/index/05f43308-6fd4-4ad6-b79c-764620231026

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  firefox  std::locale::operator=  
1  firefox  std::ios_base::imbue  
2  firefox  std::basic_ios<char, std::char_traits<char> >::imbue  
3  libxul.so  sh::InitializeStream<std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > >  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Common.h:238
3  libxul.so  sh::TCompiler::setResourceString  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:1294
4  libxul.so  sh::TCompiler::Init  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:407
5  libxul.so  sh::ConstructCompiler  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/ShaderLang.cpp:368
6  libxul.so  mozilla::webgl::ShaderValidator::Create  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:215
6  libxul.so  mozilla::WebGLContext::CreateShaderValidator const  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:196
7  libxul.so  mozilla::WebGLShader::CompileShader  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShader.cpp:98
Flags: needinfo?(lsalzman)
Assignee: nobody → jgilbert
Severity: -- → S2
Flags: needinfo?(lsalzman)

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on beta
  • Top 5 desktop browser crashes on Linux on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

Friendly reminder that this is the last week of betas before next week's RC if we want to fix this for Firefox 120.

The crashing caller in our code is here: https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/compiler/translator/Common.h#238

template <typename T>
T InitializeStream()
{
    T stream;
    stream.imbue(std::locale::classic());  // <- SIGSEGV
    return stream;
}

This code is not traditionally fallible, and should never realistically crash, as far as I can tell.
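For context, here is a minimal standalone sketch of what the helper is for: pinning a stream to the classic "C" locale so that number formatting does not depend on the global locale. InitializeStream has the same shape as the snippet quoted above; FormatWithClassicLocale is a hypothetical helper added only for illustration and is not part of ANGLE.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Same shape as the helper quoted above: create a stream and imbue it with
// the classic "C" locale before handing it back to the caller.
template <typename T>
T InitializeStream()
{
    T stream;
    stream.imbue(std::locale::classic());
    return stream;
}

// Hypothetical helper (illustration only): format a number through a stream
// that was set up by InitializeStream.
std::string FormatWithClassicLocale(double value)
{
    auto stream = InitializeStream<std::ostringstream>();
    stream << value;
    return stream.str();
}
```

With a correctly working standard library, nothing in this path should be able to fail, which is why the crash points at the library setup rather than at the caller.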

I don't believe we've changed any code here recently. (Calling code is in ANGLE)

I worry that this is due to third-party-builds of Firefox, or perhaps something weird that we've changed in our config?
Most of the crashes have a distro id of "canonical", though some have "mozilla" (but is this 100% us?).

One crash has a comment saying this has happened more than once.

We haven't actually seen this on 121, just apparently on a number of 120 betas.

I don't see a path towards actionability.

We will discuss it in gfx-triage in ~17hrs.

As shown by Kelsey in comment 3, the graphics code is not at fault here. It is rather the interaction with the standard library code around locales that seems to be buggy. There are multiple surprising elements that we are currently investigating with [:sergesanspaille], but here are a few points.

Based on user comments and distribution of the volume across OS versions, I was able to reproduce this crash consistently from a fresh Ubuntu 18.04 install, by installing Firefox as follows:

sudo add-apt-repository ppa:mozillateam/firefox-next 
sudo apt-get update
sudo apt-get install firefox

STR: Navigate to any WebGL page, e.g. Aquarium.

The fact that the crash occurs in these builds for 18.04 specifically could potentially be explained by compiling for/with a libstdc++ version that predates PR libstdc++/59439 (i.e. PR libstdc++/59439 could be preventing the crash even though that was not the original reason behind it). (Edit: this part was a false lead.) The changelog entry in question:

2018-10-03  Jonathan Wakely  <jwakely@redhat.com>

	PR libstdc++/59439
	* src/c++98/locale.cc (locale::locale(const locale&)): Bypass
	reference count updates for the classic locale.
	(locale::~locale()): Likewise.
	(locale::operator=(const locale&)): Likewise.
	* src/c++98/locale_init.cc (locale::locale()): Likewise.
	(locale::global(const locale&)): Likewise.

But the weirdest part is that many libstdc++ functions related to locales are directly incorporated in the firefox binary, and their symbols are exported, thus taking precedence over libstdc++.so.6. In the crash stack, std::locale::operator= is part of firefox, not libstdc++.so.6.

The crash itself may have been introduced as a side effect of bug 1855986, which may have led the compiler to take this surprising decision to incorporate std::locale code directly into the firefox binary.

We have made good progress with [:sergesanspaille]. The crash is likely caused by the use of the -static-libstdc++ in these builds of Firefox, which [:sergesanspaille] suggests could have been introduced initially because of a need to use a more recent version of libstdc++ than available on the platform.

The root cause here seems to be a bug in libstdc++ [1], which occurs when using -static-libstdc++ to compile a main executable and to compile a dynamically loaded library if both make use of std::locale code that reaches std::locale::_S_initialize. The main executable and the shared library will each have their own instance of the local c_locale buffer (declared in locale_init.cc inside an anonymous namespace), but they will share a common global variable for std::locale::_S_once (declared in locale_classes.h as a static class variable). When the shared library reaches std::locale::_S_initialize(), it will consume the global std::locale::_S_once to initialize its local buffer c_locale. So if the main executable later reaches std::locale::_S_initialize(), it will not go through std::locale::_S_initialize_once because std::locale::_S_once is already consumed, and so the local buffer c_locale of the main executable will remain uninitialized. Hence uses of std::locale::classic() from the main executable will return a pointer to an uninitialized locale, and using it will result in a null pointer dereference.

When we reproduce the crash, we observe this behavior, with a first call to liblgpllibs.so's std::locale::_S_initialize occurring from liblgpllibs.so (and consuming the global std::locale::_S_once), and a later call to firefox's std::locale::_S_initialize through std::locale::classic() (called from graphics code in libxul.so) resulting in the crash.
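The sequence described above can be modeled in a few lines. This is only a simulation under stated assumptions, not libstdc++ code and not the reproducer attached to this bug: SharedOnceFlag stands in for the shared std::locale::_S_once, and each LocaleCopy stands in for one statically linked copy of the locale code with its own private c_locale-like storage. All names are hypothetical.

```cpp
// Stand-in for std::locale::_S_once: one flag shared by every module.
struct SharedOnceFlag {
    bool consumed = false;
};

// Stand-in for one statically linked copy of the locale code. Each copy has
// its own private storage, like the c_locale buffer described above.
class LocaleCopy {
public:
    // Mirrors the described flow: initialize the private storage only if the
    // shared flag has not been consumed yet, then return that storage.
    const char* classic(SharedOnceFlag& once) {
        if (!once.consumed) {
            once.consumed = true;
            storage_ = "C";
        }
        return storage_;  // stays nullptr if another copy consumed the flag
    }

private:
    const char* storage_ = nullptr;
};

struct Outcome {
    bool library_initialized;
    bool executable_initialized;
};

// Replays the observed order: the library initializes first, the executable
// second, both going through the same shared flag.
Outcome SimulateCallOrder() {
    SharedOnceFlag once;
    LocaleCopy library;     // plays the role of liblgpllibs.so
    LocaleCopy executable;  // plays the role of the firefox binary
    const char* from_library = library.classic(once);
    const char* from_executable = executable.classic(once);
    return {from_library != nullptr, from_executable != nullptr};
}
```

In this model the library's copy ends up initialized while the executable's copy does not, which corresponds to firefox's std::locale::classic() handing out an unusable locale.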

[:sergesanspaille] has suggestions, which we will forward, about how these Firefox packages could be built without -static-libstdc++ and still use a different version of libstdc++ than the one available on the platform.

Edit: [1] As explained in comment 7, this behavior is only observed when using -static-libstdc++ -Bsymbolic-functions, so it's hard to qualify it as a libstdc++ bug.

Component: Graphics: CanvasWebGL → General

Extra report after another batch of delightful hours with [:yannis].

Some Firefox libs, including lgpllibs, are compiled with both:

  1. -Bsymbolic-functions. This gives more optimization space to the linker at the expense of preventing symbol interposition for functions;

  2. -static-libstdc++. This is hacked in by Ubuntu to support building firefox with a modern toolchain while shipping on an older system.

The combination of these two flags results in lgpllibs calling a non-interposed version of a libstdc++ function symbol, which in turn references a non-function symbol. However, -Bsymbolic-functions does not prevent interposition of non-function symbols, so the non-interposed function references an interposed symbol that comes from the host binary, firefox in that case. This interposed symbol happens to be a guard that got initialized in the host binary, so the guard check passes while it shouldn't, kaboom.

We've crafted a minimal reproducer attached to this bug for those who want to play with it / get a deeper understanding of the behavior.

Going forward: we could investigate whether -Bsymbolic would work as a default, or disable -Bsymbolic-functions for some or all libs, but it seems easier to just link dynamically with libstdc++, bundle it alongside the other firefox libs and set the RPATH accordingly.

Adding one of the Ubuntu maintainers to the thread; he probably has suggestions on the best way forward.

Flags: needinfo?(ricotz)

Maybe we could make configure fail when these two flags are passed together.
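As a rough illustration of that idea only: the sketch below is a hypothetical check written in C++ for consistency with the rest of this thread. It is not the patch that landed, and it is not how Firefox's configure is implemented. HasRiskyFlagCombination is an invented name.

```cpp
#include <string>
#include <vector>

// Hypothetical check (not the landed patch): reports whether a list of
// compiler/linker arguments contains both flags discussed in this bug.
bool HasRiskyFlagCombination(const std::vector<std::string>& args)
{
    bool static_libstdcxx = false;
    bool symbolic_functions = false;
    for (const std::string& arg : args) {
        // Substring match so that forms like "-Wl,-Bsymbolic-functions"
        // are caught as well.
        if (arg.find("-static-libstdc++") != std::string::npos) {
            static_libstdcxx = true;
        }
        if (arg.find("-Bsymbolic-functions") != std::string::npos) {
            symbolic_functions = true;
        }
    }
    return static_libstdcxx && symbolic_functions;
}
```

A build script could refuse to continue when this returns true. Plain -Bsymbolic is deliberately not matched, since it is a different flag.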

No longer blocks: gfx-triage

Although the crashes come from multiple Linux distributions, they are reported with the same debug_id for firefox. For example, aggregating 120.0b8 crashes over platform version yields:

1 Ubuntu 18.04.6 LTS               63 67.02 %
2 Debian GNU/Linux 12 (bookworm)   14 14.89 %
3 Zorin OS 15.3                    14 14.89 %
4 Linux Mint 19.3 Tricia            2  2.13 %
5 Linux Lite 4.8                    1  1.06 %

But aggregating over modules in stack shows:

1 firefox/b69db3399738ef7fe10ba11fc3e870c40 94 100.00 %

This suggests that all these crashes are from users that use the build from the PPA firefox-next package mentioned in comment 4. As far as we understand, while this community-maintained PPA firefox-next package continues to follow our release cycles to provide the latest beta version even for Ubuntu 18.04 LTS, the official firefox package maintained by Canonical which provides the latest release version to Ubuntu 18.04 LTS users is now stuck on version 113.0.2+build1-0ubuntu0.18.04.1 since Ubuntu 18.04 LTS reached end of support on 31 May 2023. Therefore we do not expect to receive additional crash volume with the release of 120.

Thank you for taking notice and the investigation of the problem.

I have dropped the static linking and instead started shipping private copies of the required libraries.

The currently available builds for Ubuntu 18.04 (bionic) fix the problem and no longer show the crash.

Flags: needinfo?(ricotz)

I confirm that I can now watch 3D fish in my 18.04 VM without crashing. Thanks a lot for the quick fix!

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

Reopening for the sake of taking the patch that will forbid this combination of flags in the future, but the crash itself should progressively disappear regardless of that.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This is now affecting users on NixOS 23.05 (Stoat) in 120 release, probably for the same reasons. I will try to reproduce and report the issue.

Edit: Somebody else did that already.

Edit: I confirm that the Firefox version that comes directly installed with NixOS 23.05 is Firefox 120 and it does crash on any WebGL example page. The upcoming crash volume is huge.

Duplicate of this bug: 1866373

We at NixOS have triaged the issue downstream and it has the same root cause. A workaround has been merged, and a proper fix is being backported.

Thanks for the quick reaction!

Adding another crash signature with lower volume for NixOS 23.05. This is the same issue (crash in CanvasRenderer or Renderer thread) but with older build IDs: 20231114100918 and 20231114100649, so maybe a release candidate? I guess this should be fixed by the proper fix mentioned in comment 17 if it isn't fixed already.

Crash Signature: [@ std::locale::operator=] → [@ std::locale::operator=] [@ firefox@0xc3138 ]
Assignee: jgilbert → sguelton
Status: REOPENED → ASSIGNED
Product: Core → Firefox Build System

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3
Attachment #9363888 - Attachment is obsolete: true
Attachment #9365581 - Attachment description: Bug 1861365 - Detect statically linking with libstdc++ r=glandium → Bug 1861365 - Detect statically linking with libstdc++ r=glandium!
Pushed by sguelton@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cc5b886ea3a2 Detect statically linking with libstdc++ r=glandium
Pushed by sguelton@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0930954b46bb Detect statically linking with libstdc++ r=glandium
Status: ASSIGNED → RESOLVED
Closed: 1 year ago → 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch

Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.

Flags: needinfo?(sguelton)

The patch landed in nightly and beta is affected.
:sergesanspaille, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox122 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(sguelton)
Flags: needinfo?(sguelton)