Open Bug 1836528 Opened 11 months ago Updated 4 months ago

DOM Worker segfault, error 6 in libxul.so, likely on CPU

Categories

(Core :: DOM: Workers, defect, P3)

Firefox 114
x86_64
Linux
defect

UNCONFIRMED

People

(Reporter: ab, Unassigned)

Attachments

(1 file)

For a few weeks now, I have been seeing more and more Firefox crashes, where there were none before.
The error displayed in dmesg is:
DOM Worker[1245571]: segfault at 0 ip 00007f7591542be1 sp 00007f757142cc60 error 6 in libxul.so[7f7591015000+5db6000] likely on CPU 4 (core 4, socket 0).
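
The fields in a kernel segfault line like the one above can be decoded mechanically. The following is a minimal sketch, assuming the usual x86 page-fault error-code bits (bit 0: page was present, bit 1: write access, bit 2: user mode) and that the error code is printed in hexadecimal; the function name `decode_segfault` is hypothetical. The offset it reports is relative to the start of the mapping printed by the kernel, which is not necessarily a file offset.

```python
import re

# Matches the "segfault at ... ip ... sp ... error N in lib[base+size]" part.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the fault details from one dmesg segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Instruction pointer relative to the start of the printed mapping.
        "offset_in_mapping": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }
```

Under these assumptions, "segfault at 0 ... error 6" reads as a user-mode write to address 0 on a page that is not mapped, i.e. a write through a null pointer.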

As mentioned in the linked bug report, I'm also using an AMD Ryzen 5 5600X 6-Core Processor, with Debian testing (12, bookworm).
I'm using Firefox Developer Edition 114.0b9 (64-bit).

Is your Firefox transmitting telemetry, and would you mind checking about:crashes for a crash ID? This would give us much more information. Thank you!

Firefox was configured not to send crash reports, since they might include sensitive data.
I've changed that and will wait for the next crash, then post the ID here.
Are those crash dumps public?

Those dumps are public, but the sensitive parts (URL, minidump and others) are only accessible to a few people here under strict restrictions. You can find more information about our privacy policy here: https://www.mozilla.org/en-US/privacy/firefox/. Thanks for your support.

(setting needinfo to reflect current bug state)

Severity: -- → S3
Flags: needinfo?(ab)
Priority: -- → P3

Well, I activated the crash report functionality, but that might not work because Firefox does not really crash, only the DOM Worker does, so there are no crashes to report.
What happens after such a DOM Worker crash is that the Firefox interface no longer reacts.

I then need to kill -15 (not -9) all processes related to Firefox, then restart it.

Not sure what additional info I can send to help you identify the problem.

Flags: needinfo?(ab)

The worker crashing should still take down the process that crashed, unless Firefox is being run in such a way that a segfault handler is intervening and keeping the process alive so that a debugger like gdb or lldb can attach to it. (Or a debugger is actually attaching to it.) This would explain why the process would stay alive and also why Firefox would lock up; the way gdb suspends a process can cause problems in the parent process because it's expecting vsync IPCs to be processed in a timely fashion, or situations like that.

Is it possible you are running Firefox in a way that debuggers would intervene, or that the XPCOM_DEBUG_BREAK environment variable is being set? (Although I think XPCOM_DEBUG_BREAK would produce obvious stdout/stderr output and trigger interrupt 3, which should dump core unless something is configured to trigger a debugger. But I don't know that that would match up with what you're seeing in dmesg.)

(In reply to AciD from comment #5)

Well, I activated the crash report functionality, but that might not work because Firefox does not really crash, only the DOM Worker does, so there are no crashes to report.
What happens after such a DOM Worker crash is that the Firefox interface no longer reacts.

I then need to kill -15 (not -9) all processes related to Firefox, then restart it.

Not sure what additional info I can send to help you identify the problem.

(asuth was faster, but I'll submit this comment for the additional hints where to look)

As you know from killing them, Firefox uses multiple processes, and each of those uses multiple threads. Still, any crash in any of those should be reported, AFAIK. However, in the case of a content process crash the main browser should keep working and you should just see a crash message in the affected tab. The "DOM Worker" in your initial comment most probably stems from this thread creation.

What do you see if you type "about:crashes" as the URL in the browser? There could be a section of "Unsubmitted crash reports" and one of submitted ones. If you see unsubmitted ones, please trigger submission manually.

What happens after such a DOM Worker crash is that the Firefox interface no longer reacts.

That is unfortunate and surely not what we would expect to happen. Do you see any signs of abnormal memory and/or CPU usage while Firefox hangs? You could also try to close it from the "X" of the window (or even kill -9) and wait to see whether it crashes with a shutdown hang after a while (~70 seconds). This might at least help to investigate the abnormal "hang after crash" situation.

Alternatively, you could try to capture the console messages that Firefox produces while running (stdout and stderr); there might be hints there on what is going wrong as well.
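
One way to capture stdout and stderr together is to launch the browser from a small wrapper that appends both streams to a log file. This is a minimal sketch; the helper name `run_with_log`, the log file name, and the assumption that the browser is started with a plain `firefox` command on the PATH are all hypothetical.

```python
import subprocess


def run_with_log(command, log_path):
    """Run command, appending its stdout and stderr to log_path.

    Returns the exit status of the command once it terminates.
    """
    with open(log_path, "ab") as log:
        completed = subprocess.run(
            command,
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
        )
    return completed.returncode
```

For example, `run_with_log(["firefox"], "firefox-console.log")` would block until the browser exits and leave the console messages in the log file for later inspection.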

Running some tests with a clean profile and/or without any extensions could also help to see whether something quirky has happened in your environment (which might still be a bug, of course). There is also some more general information on how to troubleshoot Firefox crashes.

A more sophisticated approach to dig into this (if you can reproduce it fairly easily) could be to configure the profiler for shutdown and see whether you find the configured profiler file after you had to kill the process. If you want to try this, please make sure you have selected the "DOM Worker" thread in the profiler settings before enabling it. This is not feasible for occasional crashes while browsing, though, as the profile would become huge.

Thanks for your support

Flags: needinfo?(ab)

First, echo $XPCOM_DEBUG_BREAK outputs nothing since it's not set.
I'm using Debian testing with KDE, and no debugger is in use that would freeze the application. Do note that I do not use the firefox package that comes with Debian, but an auto-updating version that was installed with umake a while ago.

Second, "about:crashes" only shows the previous crash reports sent up until Nov 2022 (when I first deactivated the feature), and no crash reports since I reactivated it (although I have had a few lockups since my post).

However, I might have found the culprit. I've seen other applications crash lately in dmesg, so I started to suspect faulty RAM (I do not have ECC memory on that rig, so that's possible), but something came up in the log: I saw many lines like VFS: file-max limit 65535 reached, just before a segfault for libespeak-ng.so.1.1.51.
It might be possible that Firefox used up all the available file descriptors (granted, I have a session with 4200+ tabs...), and that would explain some of the weird behavior I saw recently on my desktop.

To test my theory, I changed the value of /proc/sys/fs/inotify/file-max from 65535 to 128000, and will keep you posted.
Perhaps then the bug would be to limit the number of files Firefox opens up?
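
To see which programs hold the most file descriptors, one can count the entries under each process's fd directory in /proc. This is a minimal sketch; the helper name `count_open_fds` is hypothetical, and the per-process descriptor count is only an approximation of what the kernel's file-max limit actually counts (open file handles, which may be shared between descriptors and processes). Processes owned by other users are skipped unless the script runs as root.

```python
import os
from collections import Counter


def count_open_fds(proc_root="/proc"):
    """Return a Counter mapping process name -> number of open descriptors."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "comm")) as f:
                name = f.read().strip()
            counts[name] += len(os.listdir(os.path.join(pid_dir, "fd")))
        except OSError:
            # The process exited meanwhile, or we lack permission; skip it.
            continue
    return counts
```

Calling `count_open_fds().most_common(5)` lists the five process names with the most descriptors, which would show whether Firefox processes dominate.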

Flags: needinfo?(ab)

(In reply to AciD from comment #8)

To test my theory, I changed the value of /proc/sys/fs/inotify/file-max from 65535 to 128000, and will keep you posted.
Perhaps then the bug would be to limit the number of files Firefox opens up?

I think inotify is specifically about watching files for changes, so you may want something like:
/proc/sys/fs/file-max
/proc/sys/fs/file-nr

On my Ubuntu 22.04 system file-max is 9223372036854775807. If your system-wide values are 64k that would seem to be low.
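
The system-wide usage can be read from /proc/sys/fs/file-nr, which holds three numbers: allocated file handles, allocated but unused handles, and the maximum (the same value as file-max). A minimal sketch for comparing usage against the limit follows; the function names are hypothetical.

```python
def parse_file_nr(text):
    """Parse the three whitespace-separated fields of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {
        "allocated": allocated,
        "unused": unused,
        "max": maximum,
        # Fraction of the system-wide limit currently in use.
        "in_use_fraction": (allocated - unused) / maximum,
    }


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read and parse the live values from the running kernel."""
    with open(path) as f:
        return parse_file_nr(f.read())
```

An `in_use_fraction` close to 1.0 would be consistent with the "file-max limit reached" messages in the kernel log.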

(In reply to AciD from comment #8)

First, echo $XPCOM_DEBUG_BREAK outputs nothing since it's not set.

Thanks for checking! That would have been surprising but it helps to check everything.

I'm using Debian testing with KDE, and no debugger is in use that would freeze the application. Do note that I do not use the firefox package that comes with Debian, but an auto-updating version that was installed with umake a while ago.

I'm assuming umake is https://wiki.ubuntu.com/ubuntu-make. Based on its installation logic, it seems to install the official Firefox Developer Edition binaries, so that should be normal. (What's happening is still obviously not normal, but if the file limit is being hit, that's definitely a reason that all kinds of weird things could happen!)

Happens with monotonous regularity. Seems to be most prevalent whilst a video is playing.

Firefox 116.0 (64-bit) on Gentoo Linux running with Fluxbox.
Happens whether I start it from the command line or as part of my Fluxbox startup.
Tried redirecting stdout and stderr to /dev/null, as described elsewhere, to no avail.

32G memory, lots of spare disk.

Linux Lyalls-PC 6.1.41-gentoo #2 SMP PREEMPT_DYNAMIC Thu Aug 3 22:48:50 ACST 2023 x86_64 Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz GenuineIntel GNU/Linux

I have the nvidia driver installed version 525.125.06

$ ldd /usr/lib64/firefox/firefox
linux-vdso.so.1 (0x00007fffb5ebc000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6 (0x00007f801fa00000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libgcc_s.so.1 (0x00007f801fd71000)
libc.so.6 => /lib64/libc.so.6 (0x00007f801f82e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f801fe9b000)
libm.so.6 => /lib64/libm.so.6 (0x00007f801fc98000)

lyall@Lyalls-PC ~
$ ldd /usr/lib64/firefox/libxul.so
linux-vdso.so.1 (0x00007ffcf0bb7000)
libmozsandbox.so => /usr/lib64/firefox/libmozsandbox.so (0x00007fdadfe84000)
liblgpllibs.so => /usr/lib64/firefox/liblgpllibs.so (0x00007fdadfe78000)
libmozsqlite3.so => /usr/lib64/firefox/libmozsqlite3.so (0x00007fdad8abb000)
libmozgtk.so => /usr/lib64/firefox/libmozgtk.so (0x00007fdadfe74000)
libmozwayland.so => /usr/lib64/firefox/libmozwayland.so (0x00007fdadfe6e000)
libicui18n.so.73 => /usr/lib64/libicui18n.so.73 (0x00007fdad8600000)
libicuuc.so.73 => /usr/lib64/libicuuc.so.73 (0x00007fdad8200000)
libaom.so.3 => /usr/lib64/libaom.so.3 (0x00007fdad7c00000)
libdav1d.so.6 => /usr/lib64/libdav1d.so.6 (0x00007fdad8420000)
libasound.so.2 => /usr/lib64/libasound.so.2 (0x00007fdad89cd000)
libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007fdad7abf000)
libXcomposite.so.1 => /usr/lib64/libXcomposite.so.1 (0x00007fdad89c6000)
libXdamage.so.1 => /usr/lib64/libXdamage.so.1 (0x00007fdad89c1000)
libXext.so.6 => /usr/lib64/libXext.so.6 (0x00007fdad89ac000)
libXfixes.so.3 => /usr/lib64/libXfixes.so.3 (0x00007fdad89a4000)
libXrandr.so.2 => /usr/lib64/libXrandr.so.2 (0x00007fdad8997000)
libc.so.6 => /lib64/libc.so.6 (0x00007fdad78ed000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdadfeb4000)
libffi.so.8 => /usr/lib64/libffi.so.8 (0x00007fdad8989000)
libplc4.so => /usr/lib64/libplc4.so (0x00007fdad8982000)
libnspr4.so => /usr/lib64/libnspr4.so (0x00007fdad8940000)
libz.so.1 => /lib64/libz.so.1 (0x00007fdad8406000)
libm.so.6 => /lib64/libm.so.6 (0x00007fdad7814000)
libssl3.so => /usr/lib64/libssl3.so (0x00007fdad8199000)
libsmime3.so => /usr/lib64/libsmime3.so (0x00007fdad77ea000)
libnss3.so => /usr/lib64/libnss3.so (0x00007fdad76af000)
libnssutil3.so => /usr/lib64/libnssutil3.so (0x00007fdad767d000)
libfreetype.so.6 => /usr/lib64/libfreetype.so.6 (0x00007fdad75b5000)
libfontconfig.so.1 => /usr/lib64/libfontconfig.so.1 (0x00007fdad7569000)
libgtk-3.so.0 => /usr/lib64/libgtk-3.so.0 (0x00007fdad6c00000)
libgdk-3.so.0 => /usr/lib64/libgdk-3.so.0 (0x00007fdad7478000)
libpango-1.0.so.0 => /usr/lib64/libpango-1.0.so.0 (0x00007fdad740d000)
libharfbuzz.so.0 => /usr/lib64/libharfbuzz.so.0 (0x00007fdad6ae8000)
libatk-1.0.so.0 => /usr/lib64/libatk-1.0.so.0 (0x00007fdad73e4000)
libcairo-gobject.so.2 => /usr/lib64/libcairo-gobject.so.2 (0x00007fdad8931000)
libcairo.so.2 => /usr/lib64/libcairo.so.2 (0x00007fdad69b0000)
libgdk_pixbuf-2.0.so.0 => /usr/lib64/libgdk_pixbuf-2.0.so.0 (0x00007fdad6983000)
libgio-2.0.so.0 => /usr/lib64/libgio-2.0.so.0 (0x00007fdad679f000)
libgobject-2.0.so.0 => /usr/lib64/libgobject-2.0.so.0 (0x00007fdad673e000)
libglib-2.0.so.0 => /usr/lib64/libglib-2.0.so.0 (0x00007fdad65fa000)
libgraphite2.so.3 => /usr/lib64/libgraphite2.so.3 (0x00007fdad65d5000)
libwebp.so.7 => /usr/lib64/libwebp.so.7 (0x00007fdad6569000)
libwebpdemux.so.2 => /usr/lib64/libwebpdemux.so.2 (0x00007fdad8193000)
libevent-2.1.so.7 => /usr/lib64/libevent-2.1.so.7 (0x00007fdad6517000)
libvpx.so.8 => /usr/lib64/libvpx.so.8 (0x00007fdad6200000)
libpixman-1.so.0 => /usr/lib64/libpixman-1.so.0 (0x00007fdad6473000)
libdbus-glib-1.so.2 => /usr/lib64/libdbus-glib-1.so.2 (0x00007fdad6449000)
libdbus-1.so.3 => /usr/lib64/libdbus-1.so.3 (0x00007fdad61b3000)
libxcb-shm.so.0 => /usr/lib64/libxcb-shm.so.0 (0x00007fdad73df000)
libX11-xcb.so.1 => /usr/lib64/libX11-xcb.so.1 (0x00007fdad73da000)
libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007fdad6189000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6 (0x00007fdad5e00000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libgcc_s.so.1 (0x00007fdad616a000)
libicudata.so.73 => /usr/lib64/libicudata.so.73 (0x00007fdad3e00000)
libXrender.so.1 => /usr/lib64/libXrender.so.1 (0x00007fdad643c000)
libplds4.so => /usr/lib64/libplds4.so (0x00007fdad6437000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fdad6157000)
libpng16.so.16 => /usr/lib64/libpng16.so.16 (0x00007fdad6120000)
libexpat.so.1 => /usr/lib64/libexpat.so.1 (0x00007fdad60f5000)
libgmodule-2.0.so.0 => /usr/lib64/libgmodule-2.0.so.0 (0x00007fdad60ee000)
libpangocairo-1.0.so.0 => /usr/lib64/libpangocairo-1.0.so.0 (0x00007fdad60dd000)
libpangoft2-1.0.so.0 => /usr/lib64/libpangoft2-1.0.so.0 (0x00007fdad60c5000)
libfribidi.so.0 => /usr/lib64/libfribidi.so.0 (0x00007fdad60a5000)
libepoxy.so.0 => /usr/lib64/libepoxy.so.0 (0x00007fdad5cd2000)
libXi.so.6 => /usr/lib64/libXi.so.6 (0x00007fdad6092000)
libatk-bridge-2.0.so.0 => /usr/lib64/libatk-bridge-2.0.so.0 (0x00007fdad6056000)
libwayland-client.so.0 => /usr/lib64/libwayland-client.so.0 (0x00007fdad6044000)
libxkbcommon.so.0 => /usr/lib64/libxkbcommon.so.0 (0x00007fdad3dba000)
libwayland-cursor.so.0 => /usr/lib64/libwayland-cursor.so.0 (0x00007fdad603a000)
libwayland-egl.so.1 => /usr/lib64/libwayland-egl.so.1 (0x00007fdad6033000)
libXcursor.so.1 => /usr/lib64/libXcursor.so.1 (0x00007fdad6026000)
libxcb-render.so.0 => /usr/lib64/libxcb-render.so.0 (0x00007fdad5cc3000)
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007fdad3d25000)
libmount.so.1 => /lib64/libmount.so.1 (0x00007fdad3cc3000)
libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdad3c28000)
libsharpyuv.so.0 => /usr/lib64/libsharpyuv.so.0 (0x00007fdad5cbb000)
libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007fdad601f000)
libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00007fdad5cb3000)
libatspi.so.0 => /usr/lib64/libatspi.so.0 (0x00007fdad3bed000)
libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdad3b95000)
lyall@Lyalls-PC ~
$
Aug 8 20:25:06 lyalls-pc kernel: Web Content[21383]: segfault at 0 ip 00007ff82c9fadbc sp 00007fffdc510760 error 6 in libxul.so[7ff82bfc9000+54fc000] likely on CPU 1 (core 1, socket 0)
Aug 8 20:25:06 lyalls-pc kernel: Code: 48 8b 7c 24 08 48 85 ff 0f 84 72 fc ff ff 48 8b 07 ff 50 10 e9 67 fc ff ff 48 8d 05 52 21 0e fe 48 8b 0d 3f 61 f7 04 48 89 01 <c7> 04 25 00 00 00 00 be 01 00 00 e8 84 e7 ab 04 e8 af e7 ab 04 cc
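
In the kernel's Code: line, the byte in angle brackets marks the start of the faulting instruction. The bytes that follow it here (c7 04 25, then two 32-bit little-endian values) are the x86-64 encoding of a store of a 32-bit immediate to an absolute address. A minimal sketch for extracting them follows; the function names are hypothetical.

```python
def faulting_bytes(code_line):
    """Return the bytes from the faulting instruction (marked <xx>) onward."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_absolute_store(code):
    """Decode 'mov dword ptr [disp32], imm32' (C7 04 25 disp32 imm32).

    Returns (address, value), or None if the bytes do not match that form.
    """
    if len(code) < 11 or code[:3] != b"\xc7\x04\x25":
        return None
    address = int.from_bytes(code[3:7], "little")
    value = int.from_bytes(code[7:11], "little")
    return address, value
```

For the line above this yields address 0 and value 0x1be (446), matching the "segfault at 0" write. A store of a small constant to address 0, followed by calls and an int3 (cc), is consistent with a deliberate crash such as an assertion that records a source line number, but that reading is an assumption and would need the stack trace to confirm.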

Oh, did I mention: since I use Gentoo Linux, Firefox is compiled from source, not a binary download.

(In reply to Lyall Pearce from comment #11)

Oh, did I mention: since I use Gentoo Linux, Firefox is compiled from source, not a binary download.

Is it compiled via the Gentoo package manager or "by hand" by you?

Package manager, so I can tweak the build to include debug flags if that will help.

Rebuilding with debug symbols; I will post a core stack trace when it happens again.

Stack trace of Web Content on my Gentoo Linux box

(In reply to Lyall Pearce from comment #15)

Stack trace of Web Content on my Gentoo Linux box

That seems to be an instance of bug 1833951. Would you mind moving over there with your stack trace and offering help with debugging if you can reproduce this? Thank you for your support!

See Also: → 1833951