Crash in [@ std::locale::operator=]
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr115 unaffected, firefox119 unaffected, firefox120+ fixed, firefox121 fixed, firefox122 wontfix, firefox123 fixed)
Version | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox119 | --- | unaffected |
firefox120 | + | fixed |
firefox121 | --- | fixed |
firefox122 | --- | wontfix |
firefox123 | --- | fixed |
People
(Reporter: aryx, Assigned: sergesanspaille)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 1 obsolete file)
Lee, any chance you can identify what started these crashes? Thank you.
14 crashes from 6 installations of Firefox 120.0b2, all on Linux.
Crash report: https://crash-stats.mozilla.org/report/index/05f43308-6fd4-4ad6-b79c-764620231026
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 firefox std::locale::operator=
1 firefox std::ios_base::imbue
2 firefox std::basic_ios<char, std::char_traits<char> >::imbue
3 libxul.so sh::InitializeStream<std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > > /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Common.h:238
3 libxul.so sh::TCompiler::setResourceString /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:1294
4 libxul.so sh::TCompiler::Init /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:407
5 libxul.so sh::ConstructCompiler /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/ShaderLang.cpp:368
6 libxul.so mozilla::webgl::ShaderValidator::Create /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:215
6 libxul.so mozilla::WebGLContext::CreateShaderValidator const /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:196
7 libxul.so mozilla::WebGLShader::CompileShader /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShader.cpp:98
Comment 1•1 year ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on beta
- Top 5 desktop browser crashes on Linux on beta
For more information, please visit BugBot documentation.
Comment 2•1 year ago
Friendly reminder: this is the last week of betas in which to fix this for fx120; RC is next week.
Comment 3•1 year ago
The crashing caller in our code is here: https://searchfox.org/mozilla-central/source/gfx/angle/checkout/src/compiler/translator/Common.h#238
template <typename T>
T InitializeStream()
{
    T stream;
    stream.imbue(std::locale::classic());  // <- SIGSEGV
    return stream;
}
This code is not traditionally fallible, and should never realistically crash, as far as I can tell.
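For readers unfamiliar with why this helper exists: imbuing the classic "C" locale makes the stream's number formatting independent of the user's global locale. The sketch below reuses the helper's shape from the snippet above; `FormatFloat` is a hypothetical function added only for illustration and is not part of ANGLE.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Same shape as the helper quoted above.
template <typename T>
T InitializeStream()
{
    T stream;
    // Force "C" locale formatting regardless of the global locale.
    stream.imbue(std::locale::classic());
    return stream;
}

// Hypothetical helper (an assumption, not ANGLE code): formats a float
// through a stream created by InitializeStream.
std::string FormatFloat(float value)
{
    auto stream = InitializeStream<std::ostringstream>();
    stream << value;
    return stream.str();
}
```

With a healthy standard library, `FormatFloat(1.5f)` yields `1.5` with a dot as the decimal separator, whatever the global locale is. Nothing in this path is expected to fail, which is why a SIGSEGV here points at the runtime environment rather than at the caller.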
I don't believe we've changed any code here recently. (Calling code is in ANGLE)
I worry that this is due to third-party-builds of Firefox, or perhaps something weird that we've changed in our config?
Most of the crashes have a distro id of "canonical", though some have "mozilla" (but is this 100% us?).
One crash has a comment saying this has happened more than once.
We haven't actually seen this on 121, just apparently on a number of 120 betas.
I don't see a path towards actionability.
We will discuss it in gfx-triage in ~17hrs.
Comment 4•1 year ago
As shown by Kelsey in comment 3, the graphics code is not at fault here. It is rather the interaction with the standard library code around locales that seems to be buggy. There are multiple surprising elements that we are currently investigating with [:sergesanspaille], but here are a few points.
Based on user comments and distribution of the volume across OS versions, I was able to reproduce this crash consistently from a fresh Ubuntu 18.04 install, by installing Firefox as follows:
sudo add-apt-repository ppa:mozillateam/firefox-next
sudo apt-get update
sudo apt-get install firefox
STR: Navigate to any WebGL page, e.g. Aquarium.
(Edit: this part was a false lead.) The fact that the crash occurs in these builds for 18.04 specifically could potentially be explained by compiling for/with a libstdc++ version that predates PR libstdc++/59439 (i.e. PR libstdc++/59439 could be preventing the crash even though that was not the original reason behind it):
2018-10-03 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/59439
* src/c++98/locale.cc (locale::locale(const locale&)): Bypass
reference count updates for the classic locale.
(locale::~locale()): Likewise.
(locale::operator=(const locale&)): Likewise.
* src/c++98/locale_init.cc (locale::locale()): Likewise.
(locale::global(const locale&)): Likewise.
But the weirdest part is that many libstdc++ functions related to locales are directly incorporated in the firefox binary, and their symbols are exported, thus taking precedence over libstdc++.so.6. In the crash stack, std::locale::operator= is part of firefox, not libstdc++.so.6.
The crash itself may have been introduced as a side effect of bug 1855986, which may have led the compiler to take this surprising decision to incorporate std::locale code directly into the firefox binary.
Comment 5•1 year ago
We have made good progress with [:sergesanspaille]. The crash is likely caused by the use of -static-libstdc++ in these builds of Firefox, which [:sergesanspaille] suggests could have been introduced initially because of a need to use a more recent version of libstdc++ than the one available on the platform.
The root cause here seems to be a bug in libstdc++ [1], which occurs when using -static-libstdc++ to compile a main executable and to compile a dynamically loaded library if both make use of std::locale code that reaches std::locale::_S_initialize. The main executable and the shared library will each have their own instance of the local c_locale buffer (declared in locale_init.cc inside an anonymous namespace), but they will share a common global variable for std::locale::_S_once (declared in locale_classes.h as a static class variable). When the shared library reaches std::locale::_S_initialize(), it will consume the global std::locale::_S_once to initialize its local buffer c_locale. So if the main executable later reaches std::locale::_S_initialize(), it will not go through std::locale::_S_initialize_once because std::locale::_S_once is already consumed, and so the local buffer c_locale of the main executable will remain uninitialized. Hence uses of std::locale::classic() from the main executable will return a pointer to an uninitialized locale, and using it will result in a null pointer dereference.
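The mechanism described in this comment can be mimicked in a single file. This is only a toy model under stated assumptions, not libstdc++ code: `OnceFlag` stands in for the shared std::locale::_S_once, each `Module` stands in for a binary carrying its own statically linked copy of the locale code, and `buffer_` stands in for that module's private c_locale. All of these names are hypothetical. The real code hands back uninitialized storage, which the model represents as a null pointer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for std::locale::_S_once: one flag that every module shares.
struct OnceFlag {
    bool consumed = false;
};

// Stand-in for one module (executable or shared library) with its own
// statically linked copy of the locale code, hence its own buffer.
class Module {
public:
    Module(std::string name, OnceFlag& shared_once)
        : name_(std::move(name)), once_(shared_once) {}

    // Models std::locale::classic(): returns this module's buffer, or
    // nullptr if this module's copy was never initialized.
    const std::string* classic() {
        initialize();
        return initialized_ ? &buffer_ : nullptr;
    }

    const std::string& name() const { return name_; }

private:
    // Models std::locale::_S_initialize(): only the first caller across
    // ALL modules runs the one-time setup, and it fills only its OWN buffer.
    void initialize() {
        if (!once_.consumed) {
            once_.consumed = true;
            buffer_ = "C";
            initialized_ = true;
        }
    }

    std::string name_;
    OnceFlag& once_;
    std::string buffer_;        // stand-in for the module-private c_locale
    bool initialized_ = false;
};
```

If a `Module` named "liblgpllibs.so" and a `Module` named "firefox" share one `OnceFlag` and the library calls `classic()` first, the library gets a valid buffer while the executable's later call finds the flag already consumed and gets nothing usable.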
When we reproduce the crash, we observe this behavior, with a first call to liblgpllibs.so's std::locale::_S_initialize occurring from liblgpllibs.so (and consuming the global std::locale::_S_once), and a later call to firefox's std::locale::_S_initialize through std::locale::classic() (called from graphics code in libxul.so) resulting in the crash.
[:sergesanspaille] has suggestions, which we will forward, about how these Firefox packages could be built without -static-libstdc++ and still use a different version of libstdc++ than the one available on the platform.
Edit: [1] As explained in comment 7, this behavior is only observed when using -static-libstdc++ together with -Bsymbolic-functions, so it's hard to qualify it as a libstdc++ bug.
Assignee | ||
Comment 6•1 year ago
Assignee | ||
Comment 7•1 year ago
Extra report after another batch of delightful hours with [:yannis].
Some Firefox libs, including lgpllibs, are compiled with both:
- -Bsymbolic-functions. This gives more optimization space to the linker at the expense of preventing symbol interposition for functions;
- -static-libstdc++. This is hacked in by Ubuntu to support building firefox with a modern toolchain while shipping on an older system.
The combination of these two flags results in lgpllibs calling a non-interposed version of a libstdc++ function symbol, which in turn references a non-function symbol. However, -Bsymbolic-functions does not prevent interposition of non-function symbols, so the non-interposed function references an interposed symbol that comes from the host binary, firefox in that case. This interposed symbol happens to be a guard that got initialized in the host binary, so the guard check passes while it shouldn't, kaboom.
We've crafted a minimal reproducer attached to this bug for those who want to play with it / get a deeper understanding of the behavior.
Going forward: we could investigate whether -Bsymbolic would work as a default, or disabling -Bsymbolic-functions for some or all libs, but it seems easier to just link dynamically with libstdc++, bundle it alongside the other firefox libs and set the RPATH accordingly.
Assignee | ||
Comment 8•1 year ago
Adding (one of) the Ubuntu maintainers to the thread; he probably has suggestions on the best way forward.
Comment 9•1 year ago
Maybe we could break the configure when these two args are passed
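The logic of such a check is small. The following is a sketch of the idea only, written in C++ to match the other snippets in this bug; it makes no claim about how the actual configure step is or would be implemented, and the function name `HasRiskyFlagCombination` is hypothetical. It matches by substring on the assumption that linker options may arrive wrapped, e.g. as `-Wl,-Bsymbolic-functions`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check: returns true when the flag list contains both
// -static-libstdc++ and -Bsymbolic-functions, the combination identified
// in comment 7 as triggering the crash.
bool HasRiskyFlagCombination(const std::vector<std::string>& flags)
{
    // Substring match, so that "-Wl,-Bsymbolic-functions" is caught too.
    auto contains = [&flags](const std::string& needle) {
        return std::any_of(flags.begin(), flags.end(),
                           [&needle](const std::string& flag) {
                               return flag.find(needle) != std::string::npos;
                           });
    };
    return contains("-static-libstdc++") && contains("-Bsymbolic-functions");
}
```

A build step could abort with an explanatory message when this returns true. Plain `-Bsymbolic` does not match, which agrees with comment 7 treating it as a separate option worth investigating.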
Comment 10•1 year ago
Although the crashes come from multiple Linux distributions, they are reported with the same debug_id for firefox. For example, aggregating 120.0b8 crashes over platform version yields:
1 Ubuntu 18.04.6 LTS 63 67.02 %
2 Debian GNU/Linux 12 (bookworm) 14 14.89 %
3 Zorin OS 15.3 14 14.89 %
4 Linux Mint 19.3 Tricia 2 2.13 %
5 Linux Lite 4.8 1 1.06 %
But aggregating over modules in stack shows:
1 firefox/b69db3399738ef7fe10ba11fc3e870c40 94 100.00 %
This suggests that all these crashes are from users that use the build from the PPA firefox-next package mentioned in comment 4. As far as we understand, while this community-maintained PPA firefox-next package continues to follow our release cycles to provide the latest beta version even for Ubuntu 18.04 LTS, the official firefox package maintained by Canonical, which provides the latest release version to Ubuntu 18.04 LTS users, is now stuck on version 113.0.2+build1-0ubuntu0.18.04.1 since Ubuntu 18.04 LTS reached end of support on 31 May 2023. Therefore we do not expect to receive additional crash volume with the release of 120.
Comment 11•1 year ago
Thank you for taking notice of the problem and investigating it.
I have dropped the static linking and instead started shipping private copies of the required libraries.
The currently available builds for Ubuntu 18.04 (bionic) fix the problem and no longer show the crash.
Comment 12•1 year ago
I confirm that I can now watch 3D fish in my 18.04 VM without crashing. Thanks a lot for the quick fix!
Assignee | ||
Comment 13•1 year ago
Comment 14•1 year ago
Reopening for the sake of taking the patch that will forbid this combination of flags in the future, but the crash itself should progressively disappear regardless of that.
Comment 15•1 year ago
This is now affecting users on NixOS 23.05 (Stoat) in the 120 release, probably for the same reasons. I will try to reproduce and report the issue.
Edit: Somebody else did that already.
Edit: I confirm that the Firefox version that comes directly installed with NixOS 23.05 is Firefox 120 and it does crash on any WebGL example page. The upcoming crash volume is huge.
Comment 17•1 year ago
We at NixOS have triaged the issue downstream and it has the same root cause. A workaround has been merged, and a proper fix is being backported.
Comment 18•1 year ago
Thanks for the quick reaction!
Comment 19•1 year ago
Adding another crash signature with lower volume for NixOS 23.05. This is the same issue (crash in CanvasRenderer or Renderer thread) but with older build IDs: 20231114100918 and 20231114100649, so maybe a release candidate? I guess this should be fixed by the proper fix mentioned in comment 17 if it isn't fixed already.
Assignee | ||
Comment 20•1 year ago
Comment 21•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 22•11 months ago
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit BugBot documentation.
Comment 23•11 months ago
Comment 24•11 months ago
Backed out for causing build bustages.
Backout link: https://hg.mozilla.org/integration/autoland/rev/39d0ed0949301ef5753100b26fd55a1a20651c8e
Log link: https://treeherder.mozilla.org/logviewer?job_id=443078756&repo=autoland&lineNumber=1142
Please also check these SM bustages.
Comment 25•11 months ago
Comment 26•11 months ago
bugherder |
Comment 27•11 months ago
Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 28•11 months ago
The patch landed in nightly and beta is affected.
:sergesanspaille, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox122 to wontfix.
For more information, please visit BugBot documentation.
Assignee | ||