Open Bug 1824620 Opened 2 years ago Updated 6 months ago

firefox segfaults inside libxul when memory allocation fails

Categories

(Core :: Widget: Gtk, defect, P3)

Firefox 111
x86_64
Linux
defect

Tracking

()

UNCONFIRMED

People

(Reporter: castilma+mozilla, Unassigned)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0

Steps to reproduce:

  1. Disable memory overcommit on Linux.

echo 2 >/proc/sys/vm/overcommit_memory

echo 100 >/proc/sys/vm/overcommit_ratio # smaller number would make your system run out of memory faster

  2. Start Firefox and open many sites, preferably memory-hungry ones (reddit etc.); maybe play multiple videos.
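
For reference, with overcommit_memory set to 2 the kernel caps total committed address space at CommitLimit, which per proc(5) is (RAM - huge pages) * overcommit_ratio / 100 + swap. The following is a minimal sketch of that arithmetic and is not part of the original report. The function names commit_limit_kib and show_overcommit are hypothetical, and the sketch assumes vm.overcommit_kbytes is 0 (a nonzero value would replace the ratio term).

```shell
#!/bin/sh
# commit_limit_kib RAM_KIB SWAP_KIB RATIO [HUGETLB_KIB]
# Hypothetical helper: the CommitLimit formula from proc(5), in KiB.
# Assumes vm.overcommit_kbytes is 0, so overcommit_ratio applies.
commit_limit_kib() {
    ram_kib=$1
    swap_kib=$2
    ratio=$3
    huge_kib=${4:-0}
    echo $(( (ram_kib - huge_kib) * ratio / 100 + swap_kib ))
}

# show_overcommit: print the live settings and the kernel's own numbers.
# Hypothetical helper; only meaningful on Linux.
show_overcommit() {
    echo "overcommit_memory=$(cat /proc/sys/vm/overcommit_memory)"
    echo "overcommit_ratio=$(cat /proc/sys/vm/overcommit_ratio)"
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
}
```

As an example, `commit_limit_kib 16777216 8388608 100` (16 GiB RAM, 8 GiB swap, ratio 100) prints 25165824, i.e. 24 GiB. Running `show_overcommit` after the two echo commands above should show a CommitLimit in line with this formula.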

Actual results:

Firefox (or sometimes just some tabs) crashes.

dmesg:
[240522.490689] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.491303] Sandbox Forked[11546]: segfault at 0 ip 00007f5a1a3ec490 sp 00007f59fb0bc2d0 error 6 in libxul.so[7f5a18130000+5e7b000] likely on CPU 1 (core 0, socket 0)
[240522.491319] Code: 0d 5d 93 0f 04 48 89 01 c7 04 25 00 00 00 00 c3 02 00 00 ff 15 41 93 0f 04 48 8d 05 c5 1e 16 fc 48 8b 0d 3b 93 0f 04 48 89 01 <c7> 04 25 00 00 00 00 98 02 00 00 ff 15 1f 93 0f 04 e8 5a ea bb 03
[240522.506906] __vm_enough_memory: pid: 10300, comm: IPC Launch, no enough memory for the allocation
[240522.515712] __vm_enough_memory: pid: 10300, comm: IPC Launch, no enough memory for the allocation
[240522.745923] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.745937] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.745942] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.745944] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.745947] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240522.745949] __vm_enough_memory: pid: 11546, comm: Sandbox Forked, no enough memory for the allocation
[240536.229865] __vm_enough_memory: 26885 callbacks suppressed
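
The "error 6" in the segfault line is the x86 page-fault error code. The following is a minimal sketch of decoding its low three bits. The function name decode_pf_error is hypothetical, and higher bits (such as the instruction-fetch bit) are deliberately ignored.

```shell
#!/bin/sh
# decode_pf_error CODE
# Hypothetical helper: describe bits 0-2 of an x86 page-fault error code
# as printed in the kernel's "segfault at ... error N" message.
decode_pf_error() {
    code=$1
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $(( code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access, $mode, $cause"
}
```

`decode_pf_error 6` prints "write, user mode, page not present": a user-space write to an unmapped page, which matches the "segfault at 0" address in the log.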

.local/share/sddm/xorg-session.log:
Exiting due to channel error.
[..].
Exiting due to channel error.
ATTENTION: default value of option mesa_glthread overridden by environment.
[...]
ATTENTION: default value of option mesa_glthread overridden by environment.
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC I/O Parent] WARNING: process 11546 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC I/O Parent] WARNING: process 11981 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC I/O Parent] WARNING: process 12101 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC I/O Parent] WARNING: process 12388 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC I/O Parent] WARNING: process 12416 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC I/O Parent] WARNING: process 12430 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
ATTENTION: default value of option mesa_glthread overridden by environment.
[Parent 10245, IPC I/O Parent] WARNING: process 12536 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC I/O Parent] WARNING: process 12547 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266
[Parent 10245, IPC Launch] WARNING: fork() failed: Nicht genügend Hauptspeicher verfügbar: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_linux.cc:273
[Parent 10245, IPC I/O Parent] WARNING: Failed to launch tab subprocess: file /build/firefox/src/firefox-111.0.1/ipc/glue/GeckoChildProcessHost.cpp:770
[Parent 10245, IPC I/O Parent] WARNING: process 12554 exited on signal 11: file /build/firefox/src/firefox-111.0.1/ipc/chromium/src/base/process_util_posix.cc:266

gdb:
[New LWP 10245]
[New LWP 11543]
[New LWP 11538]
[New LWP 11537]
[New LWP 11536]
[New LWP 11535]
[New LWP 11462]
[New LWP 11420]
[New LWP 11380]
[New LWP 11378]
[New LWP 11377]
[New LWP 11369]
[New LWP 11366]
[New LWP 11365]
[New LWP 11364]
[New LWP 11363]
[New LWP 11362]
[New LWP 11340]
[New LWP 11304]
[New LWP 11288]
[New LWP 11274]
[New LWP 11235]
[New LWP 11219]
[New LWP 11196]
[New LWP 11129]
[New LWP 11098]
[New LWP 11034]
[New LWP 11030]
[New LWP 11025]
[New LWP 11020]
[New LWP 10930]
[New LWP 10898]
[New LWP 10881]
[New LWP 10880]
[New LWP 10879]
[New LWP 10845]
[New LWP 10842]
[New LWP 10788]
[New LWP 10786]
[New LWP 10722]
[New LWP 10718]
[New LWP 10714]
[New LWP 10693]
[New LWP 10692]
[New LWP 10691]
[New LWP 10690]
[New LWP 10677]
[New LWP 10674]
[New LWP 10668]
[New LWP 10663]
[New LWP 10662]
[New LWP 10657]
[New LWP 10656]
[New LWP 10650]
[New LWP 10636]
[New LWP 10589]
[New LWP 10471]
[New LWP 10421]
[New LWP 10418]
[New LWP 10417]
[New LWP 10412]
[New LWP 10408]
[New LWP 10404]
[New LWP 10399]
[New LWP 10395]
[New LWP 10392]
[New LWP 10382]
[New LWP 10358]
[New LWP 10356]
[New LWP 10355]
[New LWP 10352]
[New LWP 10351]
[New LWP 10350]
[New LWP 10349]
[New LWP 10348]
[New LWP 10347]
[New LWP 10336]
[New LWP 10335]
[New LWP 10327]
[New LWP 10326]
[New LWP 10325]
[New LWP 10324]
[New LWP 10322]
[New LWP 10321]
[New LWP 10320]
[New LWP 10319]
[New LWP 10316]
[New LWP 10315]
[New LWP 10314]
[New LWP 10313]
[New LWP 10311]
[New LWP 10304]
[New LWP 10300]
[New LWP 10299]
[New LWP 10298]
[New LWP 10297]
[New LWP 10296]
[New LWP 10295]
[New LWP 10294]
[New LWP 10293]
[New LWP 10292]
[New LWP 10291]
[New LWP 10290]
[New LWP 10289]
[New LWP 10288]
[New LWP 10287]
[New LWP 10286]
[New LWP 10285]
[New LWP 10284]
[New LWP 10283]
[New LWP 10275]
[New LWP 10274]
[New LWP 10272]
[New LWP 10271]
[New LWP 10270]
[New LWP 10269]
[New LWP 10268]
[New LWP 10267]
[New LWP 10265]
[New LWP 10262]
[New LWP 10260]
[New LWP 10259]
[New LWP 10258]
[New LWP 10257]
[New LWP 10256]
[New LWP 10255]
[New LWP 10251]
[New LWP 10250]
[New LWP 10249]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/usr/lib/firefox/firefox'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f5a1a3ec490 in ?? () from /usr/lib/firefox/libxul.so
[Current thread is 1 (Thread 0x7f59ec7ff6c0 (LWP 11546))]
(gdb) bt
#0 0x00007f5a1a3ec490 in () at /usr/lib/firefox/libxul.so
#1 0x00007f5a1a3ebe2c in () at /usr/lib/firefox/libxul.so
#2 0x00007f5a196f122e in () at /usr/lib/firefox/libxul.so
#3 0x00007f5a196f092b in () at /usr/lib/firefox/libxul.so
#4 0x00007f5a19d6c973 in () at /usr/lib/firefox/libxul.so
#5 0x00007f5a19d6d5d3 in () at /usr/lib/firefox/libxul.so
#6 0x00007f5a195655cb in () at /usr/lib/firefox/libxul.so
#7 0x00007f5a18892a69 in () at /usr/lib/firefox/libxul.so
#8 0x00007f5a19d6df07 in () at /usr/lib/firefox/libxul.so
#9 0x00007f5a194da6f1 in () at /usr/lib/firefox/libxul.so
#10 0x00007f5a194da137 in () at /usr/lib/firefox/libxul.so
#11 0x00007f5a203e9c7b in _pt_root (arg=0x7f5a0ba78ca0) at pthreads/ptthread.c:201
#12 0x000055c88b5cbfd0 in set_alt_signal_stack_and_start(PthreadCreateParams*) ()
#13 0x00007f5a20e9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#14 0x00007f5a20f20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)

Expected results:

Firefox (or sometimes just some tabs) should not crash.

Attachment #9325089 - Attachment description: gdb.log → complete gdb output
Attachment #9325089 - Attachment description: complete gdb output → complete gdb output. note that gdb says it downloaded symbols for firefox and all libraries, but the backtrace does not seem to print any function names

The Bugbug bot thinks this bug should belong to the 'Firefox Build System::General' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → General
Product: Firefox → Firefox Build System
Component: General → Untriaged
OS: Unspecified → Linux
Product: Firefox Build System → Firefox
Hardware: Unspecified → x86_64

Setting this to Core > Widget: Gtk, so that our developers could take a look at this issue - if this is not the right component, please move it to a more appropriate one. Thanks!

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

It's expected as we don't support fallible allocations.

Priority: -- → P3

I don't know whether this contradicts what you said, but I had many other warnings from other Firefox processes (which I filtered out in the OP), and I did not notice any of them quitting, let alone segfaulting. At the very least it should exit cleanly in this situation rather than segfault, don't you think?

I'm seeing very similar behavior to castilma's report. The smoking gun here is that I have a shell script reading /proc/meminfo every second, and on my overcommit-disabled box with 64 GiB RAM and 64 GiB swap, I'm seeing vmem usage jump instantly from, e.g., 80 GiB to 104 GiB or more for under a second. When the grabs come near the virtual memory limit (about 127 GiB), at that same moment one of Firefox, another independent Firefox, Steam + Starfield (a game, not required for the issue to occur), or some other program will hit a failed memory allocation and implode, with dmesg blaming libxul.so the vast majority of the time:

[Sun May 11 05:01:46 2025] Sandbox Forked[1960390]: segfault at 0 ip 00007ab83a5dae59 sp 00007ab828efe3f0 error 6 in libxul.so[7ab83a282000+662f000] likely on CPU 10 (core 10, socket 0)

In the 251 cases currently in dmesg, the segfaults were in the following (prefixed by count, and stripped down to the essentially unchanging info):

      1 DOM Worker: firefox 
      1 Isolated Web Co: firefox 
      1 Isolated Web Co: libxul.so 
    230 Sandbox Forked: libxul.so 
      7 Stream firefox 
      1 Stream segfault at 0 ip 0000615d39591fad sp 000074cac3d3b1e0 error 6 
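
A tally of this kind can be produced with a short pipeline. The following is a minimal sketch and not the script used for the table above. The function name tally_segfaults is hypothetical, and it only counts segfault lines that name a module ("in libxul.so[...]"), so lines without one are skipped.

```shell
#!/bin/sh
# tally_segfaults: read dmesg output on stdin and print
# "COUNT COMM: MODULE" for each (thread name, module) pair.
# Hypothetical helper; lines that name no module are ignored.
tally_segfaults() {
    sed -n 's/^.*\] \(.*\)\[[0-9]*\]: segfault at .* in \([^[]*\)\[.*/\1: \2/p' |
    sort | uniq -c | sort -rn
}
```

Typical use would be `dmesg | tally_segfaults`. The exact padding in front of the count depends on the local uniq implementation.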

Here are the results of one where Firefox suffered:

[Sat May  3 14:25:59 2025] __vm_enough_memory: 20 callbacks suppressed
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 3734883, comm: Sandbox Forked, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 3734884, comm: Sandbox Forked, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 199034, comm: IPC Launch, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:25:59 2025] __vm_enough_memory: pid: 146538, comm: firefox, not enough memory for the allocation
[Sat May  3 14:26:00 2025] Sandbox Forked[3734884]: segfault at 0 ip 00007ab83a5dae59 sp 00007ab828efe3f0 error 6 in libxul.so[7ab83a282000+662f000] likely on CPU 7 (core 7, socket 0)
[Sat May  3 14:26:00 2025] Code: 30 01 00 00 0f 85 0c 01 00 00 48 81 c4 38 01 00 00 5b 41 5c 41 5e 41 5f c3 48 8d 05 e1 20 32 06 48 8b 0d c2 63 31 09 48 89 01 <c7> 04 25 00 00 00 00 a8 02 00 00 e8 ff 0b cb ff 48 8d 35 f1 20 32
[Sat May  3 14:26:00 2025] Sandbox Forked[3734883]: segfault at 0 ip 00007cf8f5bdae59 sp 00007cf8e3cbd3f0 error 6 in libxul.so[7cf8f5882000+662f000] likely on CPU 0 (core 0, socket 0)
[Sat May  3 14:26:00 2025] Code: 30 01 00 00 0f 85 0c 01 00 00 48 81 c4 38 01 00 00 5b 41 5c 41 5e 41 5f c3 48 8d 05 e1 20 32 06 48 8b 0d c2 63 31 09 48 89 01 <c7> 04 25 00 00 00 00 a8 02 00 00 e8 ff 0b cb ff 48 8d 35 f1 20 32
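
Because each process maps libxul.so at a different address, the raw ip values differ, but subtracting the mapping base printed in brackets gives a comparable offset. The following is a minimal sketch of that subtraction. The function names module_offset and segv_offsets are hypothetical.

```shell
#!/bin/sh
# module_offset IP BASE
# Hypothetical helper: both arguments are hex strings without a 0x prefix,
# as they appear in the kernel's segfault line.
module_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# segv_offsets: read dmesg output on stdin and print MODULE+OFFSET
# for every segfault line that names a module.
segv_offsets() {
    sed -n 's/.* ip \([0-9a-f]*\) .* in \([^[]*\)\[\([0-9a-f]*\)+[0-9a-f]*\].*/\1 \3 \2/p' |
    while read -r ip base module; do
        printf '%s+%s\n' "$module" "$(module_offset "$ip" "$base")"
    done
}
```

Both segfault lines above resolve to libxul.so+0x358e59, so the two processes faulted at the same instruction, and the identical Code: bytes agree with that. Note that the offset is relative to the start of the mapping the kernel reports, which is not necessarily the ELF load base, so it is best treated as a fingerprint for comparing crashes rather than as a ready-made symbol address.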

This host previously had only 32 GiB of swap, and after getting tired of watching Firefox commit suicide nearly every day for a while (I have a lot of tabs/windows open), I upped it to 64 GiB. Suicides decreased, but with the task watching /proc/meminfo, it's becoming pretty clear that libxul.so is the problem child.
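
A watcher of the kind described can be sketched as follows. This is not the actual script, only a minimal illustration under assumptions: the function names meminfo_kib, commit_headroom_kib and watch_commit are hypothetical, and it assumes the CommitLimit and Committed_AS fields of /proc/meminfo are the values of interest.

```shell
#!/bin/sh
# meminfo_kib FIELD [FILE]
# Hypothetical helper: print the numeric value (in kB) of one meminfo field.
meminfo_kib() {
    sed -n "s/^$1:[[:space:]]*\([0-9]*\) kB\$/\1/p" "${2:-/proc/meminfo}"
}

# commit_headroom_kib [FILE]
# CommitLimit minus Committed_AS: how much more can be committed
# before allocations start failing under overcommit_memory=2.
commit_headroom_kib() {
    limit=$(meminfo_kib CommitLimit "${1:-}")
    used=$(meminfo_kib Committed_AS "${1:-}")
    echo $(( limit - used ))
}

# watch_commit: log the headroom once per second until interrupted.
watch_commit() {
    while :; do
        printf '%s headroom_kib=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(commit_headroom_kib)"
        sleep 1
    done
}
```

Running `watch_commit >> commit.log` alongside `dmesg -w` would make it possible to line up drops in headroom with the segfault timestamps, though a one-second sampling interval can still miss very short spikes.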

The comment by Martin Stránský was especially disturbing:
"It's expected as we don't support fallible allocations."

If that really means not supporting any situation where malloc() could return 0, and it is combined with large opportunistic memory grabs, then the results I'm seeing would be guaranteed.

Currently I don't have a setup that can include the size of the attempted/failed allocation in the dmesg logs, nor have I yet crawled through the libxul code to find where this suspected grandiose allocation style might be happening (where, with some amazing luck, I might discover that someone else realized this would be a problem and provided an environment variable or something to control it). Optimism, I know. I apologize for those gaps, but am including what I have anyway, since:

  • This issue appears to still be a problem
  • More than a few people are subject to this problem (i.e. we run with overcommit disabled, for reasons we really shouldn't have to explain)
  • It's my belief that no normal code should implement the behavior of grabbing all free memory (if that's indeed what's happening), even if it's to release most of it again within the next ~1 second
  • I can't see a reason Firefox would normally be allocating chunks of ~40 GiB unless it's part of the prior bullet point

Grabbing all the remaining virtual memory is a hostile tactic, given that other processes will be affected if overcommit is off, and overcommit should be off if enough swap is available to make overcommit unnecessary. Turning on overcommit spreads instability even into programs that check malloc() returns and do everything correctly, and that would normally be immune to these problems. It makes me miss IRIX and its ability to enable something like overcommit ("logical memory") for specific programs only, so that a program expected to consume more than half of virtual memory could still fork() and exec() to run something else.
