Nvidia/Linux: Crash in [@ getenv]
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
People
(Reporter: sefeng, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, topcrash, topcrash-startup)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/949c820b-c629-4b0f-88c0-979120221009
Reason: SIGSEGV / SI_KERNEL
Top 10 frames of crashing thread:
0 libc.so.6 getenv /usr/src/debug/glibc/stdlib/getenv.c:84
1 libnvidia-eglcore.so.515.65.01 _glNamedBufferAttachMemoryNV
2 libnvidia-eglcore.so.515.65.01 NvGlEglGetFunctions
3 libnvidia-eglcore.so.515.65.01 _glNamedBufferAttachMemoryNV
4 libnvidia-eglcore.so.515.65.01 _glNamedBufferAttachMemoryNV
5 libnvidia-eglcore.so.515.65.01 _glNamedBufferAttachMemoryNV
6 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit
7 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit
8 libEGL_nvidia.so.0 NvEglwlaf47906in
9 libEGL_nvidia.so.0 NvEglwlaf47906in
Just hit this crash in 20221009094451 Nightly twice this morning.
Comment 1•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Linux on release (startup)
For more information, please visit the auto_nag documentation.
Comment 2•2 years ago
Starting on 2022/11/02, the most recent of these crashes are often UAF (use-after-free) e5e5 crashes. Before that, they were all nullptr crashes.
The UAFs seem to be caused by bugs in gfx drivers, in particular libnvidia-eglcore.so.515, libnvidia-eglcore.so.470, libgallium, and libnvidia-glsi.so.515.
getenv() is a footgun in multithreaded code; any pointer it returns can become a UAF if some other thread modifies the environment. Likely the nullptr crashes were just safer variants of the same footgun.
Very likely there's nothing we can do to fix this. Note also that most of these are startup crashes; I don't know what the fallback is or if these bad drivers can cause perma-crash on startup, which would be bad.
Comment 3•2 years ago
I'm duplicating this against bug 1752703 because they're basically the same crash and require the same solution (namely not calling setenv() in our code, or making a getenv() wrapper that leaks the returned string).
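To make the second option concrete, here is a minimal sketch of a getenv() wrapper that leaks the returned string. The name LeakyGetEnv is hypothetical and is not an existing Firefox or NSPR function. It only illustrates the idea that a never-freed private copy stays valid after a later setenv(). It does not protect code that calls libc getenv() directly (such as a GL driver), and its mutex only serializes callers of the wrapper itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <mutex>

// Hypothetical helper (not an existing Firefox/NSPR API).
// Returns a private copy of the variable's value that is never freed,
// so the pointer stays valid even if the environment is modified later.
// Returns nullptr if the variable is unset or the copy cannot be allocated.
const char* LeakyGetEnv(const char* name) {
  assert(name != nullptr);

  // Serializes callers of this wrapper only; direct getenv()/setenv()
  // calls elsewhere in the process are not covered by this lock.
  static std::mutex lock;
  std::lock_guard<std::mutex> guard(lock);

  const char* value = std::getenv(name);
  if (!value) {
    return nullptr;
  }

  // Copy the value, including the terminating NUL, into fresh storage.
  const std::size_t len = std::strlen(value) + 1;
  char* copy = static_cast<char*>(std::malloc(len));
  if (!copy) {
    return nullptr;
  }
  std::memcpy(copy, value, len);
  return copy;  // intentionally leaked: nobody ever frees this
}
```

Each call leaks one small allocation, so a wrapper like this only makes sense for variables that are read a bounded number of times.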
Comment 4•2 years ago
I'm not really a programmer, but I assume this could be related to bug 1784813 and it could be worth a try
- to remove this code:
  https://searchfox.org/mozilla-central/rev/a4a41aafa80bf38f6e456238a60781fed46f9d08/gfx/thebes/gfxPlatformGtk.cpp#129-132
    // Bug 1714483: Force disable FXAA Antialiasing on NV drivers. This is a
    // temporary workaround for a driver bug.
    PR_SetEnv("__GL_ALLOW_FXAA_USAGE=0");
  - This code is to prevent massive glitches caused by manually enabling FXAA in Nvidia settings. It's the user's own fault.
  - bug 1714483 comment 37: Nvidia has added a blocklist rule for Firefox in their driver, so that users can't manually enable it, but it likely wouldn't apply to Thunderbird, etc.
- at least to restrict this to isMesa:
  https://searchfox.org/mozilla-central/rev/a4a41aafa80bf38f6e456238a60781fed46f9d08/gfx/thebes/gfxPlatformGtk.cpp#174-177
    if (feature.IsEnabled() && IsX11Display()) {
      // Enabling glthread crashes on X11/EGL, see bug 1670545
      PR_SetEnv("mesa_glthread=false");
    }
  - This code is to prevent a crash caused by manually setting mesa_glthread=true environment variable. It's the user's own fault.
Comment 5•2 years ago
Thanks, this wouldn't solve the problem since we have other calls but it would at least mitigate it.
Comment 6•2 years ago
__GL_ALLOW_FXAA_USAGE=0 even caused an unexpected slowdown: bug 1736245