Bug 1769616 (Closed): opened 2 years ago, closed 2 years ago

[openSUSE Tumbleweed] Crash in [@ syscall | libnuma.so.1@0x4c0b]

Categories: Core :: Audio/Video: Playback (defect)

Tracking: RESOLVED FIXED in 102 Branch
Tracking Status:
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 blocking fixed

People: Reporter: pascalc, Assigned: gerard-majax

Keywords: crash, topcrash

Attachments: 7 files

Crash report: https://crash-stats.mozilla.org/report/index/6c21669c-d12f-4e1a-b7ce-9f8360220516

Reason: SIGSYS / SYS_SECCOMP

Top 10 frames of crashing thread:

0 libc.so.6 syscall 
1 libnuma.so.1 libnuma.so.1@0x0000000000004c0b 
2 libgomp.so.1 libgomp.so.1@0x000000000003e8a5 
3 libnuma.so.1 libnuma.so.1@0x000000000000da77 
4 libnuma.so.1 libnuma.so.1@0x000000000000da77 
5 ld-linux-x86-64.so.2 call_init /usr/src/debug/glibc-2.35-2.4.x86_64/elf/dl-init.c:26
6 ld-linux-x86-64.so.2 _dl_init /usr/src/debug/glibc-2.35-2.4.x86_64/elf/dl-init.c:117
7 libc.so.6 __GI__dl_catch_exception /usr/src/debug/glibc-2.35-2.4.x86_64/elf/dl-error-skeleton.c:182
8 ld-linux-x86-64.so.2 dl_open_worker /usr/src/debug/glibc-2.35-2.4.x86_64/elf/dl-open.c:808
9 libc.so.6 __GI__dl_catch_exception /usr/src/debug/glibc-2.35-2.4.x86_64/elf/dl-error-skeleton.c:208

I see mozilla::FFmpegRuntimeLinker::Init() in all of these stacks.

Component: General → Audio/Video: Playback

Jed, these are SYS_SECCOMP crashes. Could this be due to sandboxing changes you made?

Flags: needinfo?(jld)

In that range, bug 1769182 is suspicious, as it relates to seccomp. Bug 1759784 is video related, so I suppose that could also cause issues, but that seems less likely to me, as we seem to be hitting the sandbox immediately upon initialization.

Oops, bug 1769182 got backed out immediately so it can't be at fault here.

Flags: needinfo?(jld)

Maybe Bug 1759784 is the regressor then?

Flags: needinfo?(stransky)

Random tabs crashing. I see that I'm not alone. ;-)

Operating System: openSUSE Tumbleweed 20220515
KDE Plasma Version: 5.24.5
KDE Frameworks Version: 5.93.0
Qt Version: 5.15.2
Kernel Version: 5.17.5-1-default (64-bit)
Graphics Platform: X11
Processors: 4 × Intel® Core™ i7-4810MQ CPU @ 2.80GHz
Memory: 31.0 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 4600

openSUSE Tumbleweed contains strange packages, like ffmpeg5.0 + openh264, so I'm not surprised it crashes.
Also, mozregression can't help you much here: openSUSE Tumbleweed is a rolling release whose packages change constantly, so the crash itself may be caused by a particular combination of packages.

It would be great to get a backtrace of the crash. Please install the debuginfo packages for libnuma/libgomp and use coredumpctl to get a backtrace of the crash.
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Using_coredumpctl_to_get_backtrace

Also, does it crash the whole browser or just a web tab, or do you only see a fallback to software decoding?
Thanks.

Flags: needinfo?(stransky) → needinfo?(p7272)

Only random tabs crash.
As for the debugging... I'm a little embarrassed to say it, but it's not working for me. This is new to me, but I would love to learn. ;^))
I run Nightly from this path: /home/jonzn4suse/Downloads/Firefox/Nightly/firefox/firefox-bin, and the commands from the Fedora Project page are not working for me. The only command I got to work is firefox -g -d gdb. Given my path, I'm guessing I should run something like this?
/home/jonzn4suse/Downloads/Firefox/Nightly/firefox/firefox-bin -g -d gdb

Flags: needinfo?(p7272)

(In reply to jonzn4SUSE from comment #9)

Only random tabs crash.
As for the debugging... I'm a little embarrassed to say it, but it's not working for me. This is new to me, but I would love to learn. ;^))
I run Nightly from this path: /home/jonzn4suse/Downloads/Firefox/Nightly/firefox/firefox-bin, and the commands from the Fedora Project page are not working for me. The only command I got to work is firefox -g -d gdb. Given my path, I'm guessing I should run something like this?
/home/jonzn4suse/Downloads/Firefox/Nightly/firefox/firefox-bin -g -d gdb

Running under gdb does not help here, because it is the RDD subprocess that crashes, not the main Firefox process.
You just need to run Nightly, wait until the tab crashes, and then run 'coredumpctl list' in a terminal; the crash should be listed there. coredumpctl is part of systemd, so it should already be installed.

The syscall number is 0xee, which is set_mempolicy. But why is libnuma trying to change the main thread's NUMA memory policy at static initializer time?

(Given the debuginfo packages for libnuma and the existing crash report, addr2line -i would probably give us some useful information.)
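For reference, here is how 0xee maps to a syscall name. The crashing build is x86-64 (the stack loads ld-linux-x86-64.so.2), and in the x86-64 syscall table 0xee (238 decimal) is set_mempolicy, with get_mempolicy at 239. Syscall numbers differ between architectures, so the sketch below relies on the `SYS_*` constants from `<sys/syscall.h>` rather than hard-coding them. The helper names are hypothetical and not part of Firefox or crash-stats.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* Hypothetical helper: name the NUMA memory-policy syscall behind a raw
 * number from a SIGSYS crash report, or return NULL if it is neither. */
const char *numa_policy_syscall_name(long nr) {
    if (nr == SYS_set_mempolicy)
        return "set_mempolicy";
    if (nr == SYS_get_mempolicy)
        return "get_mempolicy";
    return NULL; /* not a memory-policy syscall */
}

/* Hypothetical reverse lookup: syscall name -> number on this architecture. */
long numa_policy_syscall_nr(const char *name) {
    if (strcmp(name, "set_mempolicy") == 0)
        return SYS_set_mempolicy;
    if (strcmp(name, "get_mempolicy") == 0)
        return SYS_get_mempolicy;
    return -1; /* unknown name */
}

#if defined(__x86_64__)
/* On x86-64, the 0xee from the crash report (238 decimal) is set_mempolicy. */
static_assert(SYS_set_mempolicy == 0xee, "0xee is set_mempolicy on x86-64");
static_assert(0xee == 238, "0xee is 238 in decimal");
#endif
```

The decimal value 238 is the same number used in the security.sandbox.content.syscall_whitelist workaround mentioned later in this bug.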

No joy with coredumpctl

Just FYI...

Comment on attachment 9277037
Crash reports_Screenshot_20220518_014453.jpeg

In case someone wants to copy the IDs instead of typing them. ;-)

bp-1f4632bf-970e-45bc-8fc2-778260220518 5/18/22, 01:44
bp-b21e1e91-52b9-40f1-9c5b-b13f80220518 5/18/22, 01:44
bp-e0429aa1-ef52-4112-aa64-521440220518 5/18/22, 01:44
bp-7422512b-4ffb-43ba-b384-1c3cb0220518 5/18/22, 01:44
bp-2dd5ca00-5dd5-4fdc-870e-a43d10220518 5/18/22, 01:44
bp-9c14a463-a5de-4044-944e-36c500220517 5/17/22, 05:30
bp-a0ae7927-f9c2-4329-ac5a-b81700220517 5/17/22, 05:30
bp-c56c8cfb-87ec-479c-9d7b-322790220517 5/17/22, 05:08

Reporter, which libnuma version do you have installed?
Thanks.

Flags: needinfo?(pascalc)
Flags: needinfo?(pascalc) → needinfo?(p7272)
Summary: Crash in [@ syscall | libnuma.so.1@0x4c0b] → [openSUSE Tumbleweed] Crash in [@ syscall | libnuma.so.1@0x4c0b]

My crashing adventure continues even after updating today.

2580c807-57a2-e948-a9fd-2113736e4779 5/18/22, 21:29
2365f90d-fc9f-f676-8798-b647b30fa848 5/18/22, 21:29
7160719c-a682-cdfb-dd3e-ef2d2a353575 5/18/22, 21:28
7957f913-fbe2-9f8c-a9b5-e625a11fad5b 5/18/22, 21:28

Flags: needinfo?(p7272)

Copying crash signatures from duplicate bugs.

Crash Signature: [@ syscall | libnuma.so.1@0x4c0b] → [@ syscall | libnuma.so.1@0x4c0b] [@ syscall | numa_init]

Could it be caused by the tighter sandboxing of the utility process? No: that landed as the default in build 20220516215740, but Pascal's comment mentions the problem starting with 20220514213937.

Assignee: nobody → lissyx+mozillians
Crash Signature: [@ syscall | libnuma.so.1@0x4c0b] [@ syscall | numa_init] → [@ syscall | libnuma.so.1@0x4c0b] [@ syscall | numa_init]

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #11)

The syscall number is 0xee, which is set_mempolicy. But why is libnuma trying to change the main thread's NUMA memory policy at static initializer time?

(Given the debuginfo packages for libnuma and the existing crash report, addr2line -i would probably give us some useful information.)

I can't be 100% sure, but I recall that at some point, while setting up the Utility process sandbox, I had to allow set_mempolicy during my experiments. That need went away, though.

I'll have a look at ffmpeg 5.0; maybe they changed something.

This is where the syscalls are likely coming from. It seems like they're setting memory policies on library load.

Specifically, that function calls set_sizes(), which in turn calls set_kernel_abi(), which makes the syscalls. Interestingly, if get_mempolicy() fails then set_mempolicy() should never be called. Did something change so that get_mempolicy() now succeeds? Or maybe it's a SUSE-specific change to libnuma.
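The crashing stacks show the syscall being made from ld.so's call_init/_dl_init while a library is being opened, and the crash signatures mention numa_init, so the work happens in an initializer that runs as soon as the library is loaded, before any of its functions are called. A minimal sketch of that mechanism, assuming GCC or Clang (both support `__attribute__((constructor))`); the function names are hypothetical stand-ins, not libnuma's code.

```c
#include <stdbool.h>

/* Set by the constructor below; stands in for the work a library's
 * initializer does (hypothetical, for illustration only). */
static bool g_initializer_ran = false;

/* GCC/Clang run functions marked `constructor` when the object containing
 * them is loaded: before main() for an executable, or inside dlopen()
 * (through ld.so's _dl_init/call_init) for a shared library. A syscall
 * made here therefore happens at load time, inside whatever sandbox the
 * loading process is already running under. */
__attribute__((constructor))
static void hypothetical_library_init(void) {
    g_initializer_ran = true;
}

bool initializer_has_run(void) {
    return g_initializer_ran;
}
```

Because the initializer runs during loading, the decoder process can be killed before it ever decodes anything, which matches the sandbox being hit immediately on initialization.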

The Tumbleweed package was updated on May 10, so this could be a vendor-induced regression.

(In reply to Gabriele Svelto [:gsvelto] from comment #23)

Specifically, that function calls set_sizes(), which in turn calls set_kernel_abi(), which makes the syscalls. Interestingly, if get_mempolicy() fails then set_mempolicy() should never be called. Did something change so that get_mempolicy() now succeeds? Or maybe it's a SUSE-specific change to libnuma.

Yes, we allow get_mempolicy: https://searchfox.org/mozilla-central/rev/7f729f601c0b738f870ae0ed49098f9268e250f9/security/sandbox/linux/SandboxFilter.cpp#2057-2059

Alright, then that's the likely culprit. openSUSE Tumbleweed's package carries a couple of patches, but they don't touch that code.
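The reasoning above can be modeled in a few lines. This is a simplified, hypothetical rendering of the control flow described for set_kernel_abi() (probe with get_mempolicy(), and only then call set_mempolicy()), not libnuma's actual code. The two syscalls are injected as function pointers so that the two sandbox answers can be compared without touching the real memory policy.

```c
#include <stdbool.h>

/* Stand-in for a syscall wrapper: returns 0 on success, -1 on failure. */
typedef int (*policy_call_fn)(void);

/* Hypothetical model of the probe (NOT libnuma's code): set_mempolicy()
 * is only attempted after get_mempolicy() succeeds.
 * Returns true if the set call was reached. */
bool probe_reaches_set(policy_call_fn get_policy, policy_call_fn set_policy) {
    if (get_policy() != 0)
        return false;          /* probe failed: fall back, never call set */
    (void)set_policy();        /* probe passed: the set call is issued */
    return true;
}

/* Stand-ins for how a sandbox might answer each call. */
int sandbox_allows(void) { return 0; }
int sandbox_denies(void) { return -1; }
```

In this model, denying get_mempolicy keeps set_mempolicy from ever being reached. Once get_mempolicy is allowed, as in Firefox's filter, the set call is issued and meets whatever the filter does for set_mempolicy, which before the fix was the SIGSYS / SYS_SECCOMP crash.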

Attachment #9277335 - Attachment description: WIP: Bug 1769616 - Allow set_mempolicy where we allow get_mempolicy → Bug 1769616 - Allow set_mempolicy where we allow get_mempolicy r?jld!

(In reply to Gabriele Svelto [:gsvelto] from comment #23)

Specifically that function calls set_sizes() which in turn calls set_kernel_abi() that does the syscalls.

That's… not the ideal way of doing that, since it can race with allocations on other threads and affect what memory they get. (The better way to probe features like this, which usually works, is to pass a null pointer so that you get EFAULT if it's recognized or EINVAL if it's not, with no side effects.)
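The side-effect-free probing idea boils down to classifying the errno returned by a call made with a deliberately invalid pointer argument. A minimal sketch, assuming the caller has already made such a call; the enum and function names are hypothetical. ENOSYS is included because it is what a kernel without the syscall returns, and a seccomp filter can also be told to return it. Which argument must be invalid to get EFAULT depends on the syscall (some accept a null pointer as a valid value), so this only covers interpreting the result.

```c
#include <errno.h>

/* Hypothetical classification of a feature probe's errno. */
enum probe_result {
    PROBE_SUPPORTED,   /* EFAULT: request understood, then the bad pointer was hit */
    PROBE_UNSUPPORTED, /* EINVAL: the request itself was rejected */
    PROBE_NO_SYSCALL,  /* ENOSYS: syscall missing, or refused by a seccomp filter */
    PROBE_UNKNOWN      /* anything else: treat conservatively */
};

enum probe_result classify_probe_errno(int err) {
    switch (err) {
    case EFAULT: return PROBE_SUPPORTED;
    case EINVAL: return PROBE_UNSUPPORTED;
    case ENOSYS: return PROBE_NO_SYSCALL;
    default:     return PROBE_UNKNOWN;
    }
}
```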

Or maybe it's a SUSE-specific change to libnuma.

Looks like it was added in this commit, which is newer than the latest release. SUSE might be tracking the development branch.

Attachment #9277335 - Attachment description: Bug 1769616 - Allow set_mempolicy where we allow get_mempolicy r?jld! → Bug 1769616 - Error(ENOSYS) for set_mempolicy() on Content and Utility AudioDecoder r?jld!
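The final patch makes set_mempolicy() fail with ENOSYS in the Content and Utility AudioDecoder processes instead of letting the call crash them, so libnuma's initializer sees an ordinary error. Firefox expresses this in its own sandbox policy code (SandboxFilter.cpp). The sketch below shows the same idea with a raw seccomp-BPF filter and assumes Linux with seccomp filter support. It is illustrative only: a production filter must also check `seccomp_data.arch`, and this one allows every other syscall. The function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Install a filter that makes set_mempolicy() fail with ENOSYS and allows
 * everything else. Returns 0 on success, -1 on error.
 * NOTE: a real policy must first check seccomp_data.arch; omitted here. */
int deny_set_mempolicy_with_enosys(void) {
    struct sock_filter filter[] = {
        /* Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* set_mempolicy: fall through to the ERRNO return; else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_set_mempolicy, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}

/* Call set_mempolicy(MPOL_DEFAULT, NULL, 0) directly (MPOL_DEFAULT is 0).
 * Returns 0 on success, otherwise the errno of the failure. */
int try_set_mempolicy_default(void) {
    if (syscall(SYS_set_mempolicy, 0, NULL, 0UL) == 0)
        return 0;
    return errno;
}
```

Compared with the earlier revision of the patch, which allowed set_mempolicy, returning an error means the call has no effect on the process's memory policy.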

(In reply to Martin Stránský [:stransky] (ni? me) from comment #8)

openSUSE Tumbleweed contains strange packages, like ffmpeg5.0 + openh264, so I'm not surprised it crashes.
Also, mozregression can't help you much here: openSUSE Tumbleweed is a rolling release whose packages change constantly, so the crash itself may be caused by a particular combination of packages.

I installed openSUSE Tumbleweed, and judging from the crashes it is not using ffmpeg 5 but ffmpeg 4: we have libavcodec58.

Depends on: 1770406

So far I have failed to reproduce the issue in a VM with openSUSE Tumbleweed and a local build. The libnuma1 version does match the one reported here.

I did, however, hit another issue, https://bugzilla.mozilla.org/show_bug.cgi?id=1770406, so I have a few questions:

  • does AAC work with the openSUSE packages?
  • do you reproduce with a self-built Firefox, a Mozilla build, or an openSUSE package?
  • if AAC does not work, can you either do a run with a debug build and MOZ_LOG=PlatformDecoderModule:5 set in the environment to verify, or symlink libavcodec.so.58 to point to libavcodec.so.58.134 in /usr/lib64?

[Note that I'm setting security.sandbox.content.syscall_whitelist to 238 as a workaround on the affected version during this test.]

  • AAC decoding works both with the affected version (with the workaround mentioned above), and with openSUSE's packaged version of firefox.
  • I reproduce this bug with the Mozilla-built Nightlies. I'm not in a position to do a self-build here at the moment; the openSUSE Firefox package is from the release branch (100.0 for openSUSE) and does not reproduce the issue seen in the Nightlies.
  • (AAC was working, so I didn't try debug builds)
  • I haven't had a chance to try AAC decoding without the about:config workaround above applied (sorry!) — it's possible this'd just crash the tab.

As noted in the sandbox log in bug #1769888, there are some errors loading libavcodec, covering both the bundled and distro versions, but they don't seem to be fatal. Looking in /proc/<pid>/maps, it looks like the openSUSE libavcodec is the one being used, not any bundled version: /usr/lib64/libavcodec.so.58.134.100

Relevant log lines:

Sandbox: Recording mapping /home/david/firefox/firefox-bin -> /proc/13676/exe
Sandbox: Failed errno -2 op open flags 02000000 path /home/david/firefox/libavcodec.so.59
Sandbox: Failed errno -2 op open flags 02000000 path /lib64/libavcodec.so.59
Sandbox: Failed errno -2 op open flags 02000000 path /usr/lib64/libavcodec.so.59
<…>
Sandbox: Failed errno -2 op open flags 02000000 path /home/david/firefox/libavcodec.so.58
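These log lines are consistent with a loader trying candidate sonames in order, newest first (libavcodec.so.59 before .58), across several directories, and moving on when a candidate is missing: errno -2 is ENOENT ("No such file or directory"). Loading a library also loads and initializes its dependencies, which is how libnuma's initializer ran inside the dl_open_worker path seen in the crash stacks. A hypothetical sketch of the try-in-order pattern with dlopen(); it is not FFmpegRuntimeLinker's actual code, and on glibc older than 2.34 it needs -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: try each soname in order and return the index of
 * the first one dlopen() can load (storing its handle in *out), or -1 if
 * none loads. A missing candidate is not an error here; the search simply
 * moves on, so "not found" failures for earlier candidates are benign. */
int open_first_available(const char *const names[], size_t count, void **out) {
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL) {
            *out = handle;
            return (int)i;
        }
        /* dlerror() would describe why this candidate failed. */
    }
    *out = NULL;
    return -1;
}
```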

Thanks. I figured out afterwards that openSUSE has patches for the soname. Do you have steps to reproduce the issue reliably? I tried several audio playbacks, but nothing triggered it.

Flags: needinfo?(david)

It reproduces completely reliably on this machine, on every page I've tried that loads media (for example, the YouTube homepage, or just any media file).

That being said, I just tried to reproduce this on a different machine (with the same openSUSE Tumbleweed + Nightly setup), and media elements loaded perfectly. That machine did have a bunch of about:config overrides for VA-API-based decoding, but even with those disabled it seems to work. (The other big differences are that the working one is a laptop with Intel+Mesa hardware and the upstream kernel, while the other is a desktop running a slightly modified NVIDIA proprietary driver and the openSUSE-patched kernel. The machines also differ in locale and installed packages.)

So it seems to depend on something else. I'll upload about:support for both systems and will try to answer any questions you have, but I will be away from them for the next 2–3 weeks, so there's a limit to what I'll be able to do.

Flags: needinfo?(david)

Thanks, that could explain why I can't reproduce it. I'll have a look at those support files, thanks!

Installing ffmpeg-4 (and libavcodec.so.58.134) from the Packman repository, as suggested on https://en.opensuse.org/SDB:Firefox_MP4/H.264_Video_Support, triggers the issue.

It looks like the openSUSE Tumbleweed packages are not built with --enable-libx265 but the Packman ones are, and that is what pulls in libnuma.

(In reply to Alexandre LISSY :gerard-majax from comment #39)

Installing ffmpeg-4 (and libavcodec.so.58.134) from the Packman repository, as suggested on https://en.opensuse.org/SDB:Firefox_MP4/H.264_Video_Support, triggers the issue.

It looks like the openSUSE Tumbleweed packages are not built with --enable-libx265 but the Packman ones are, and that is what pulls in libnuma.

Good news: after that I could confirm that I reproduce the issue with this version of the ffmpeg libraries, and with a build that includes the fix pending review, YouTube playback works.

Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a8056fbf4a4
Error(ENOSYS) for set_mempolicy() on Content and Utility AudioDecoder r=jld
Crash Signature: [@ syscall | libnuma.so.1@0x4c0b] [@ syscall | numa_init] → [@ syscall | libnuma.so.1@0x4c0b] [@ syscall | numa_init] [@ libc.so.6@0x125add | libnuma.so.1@0x4c0b ] [@ libc.so.6@0x1258ad | libnuma.so.1@0x4c0b ] [@ libc.so.6@0x10c8ad | libnuma.so.1@0x4c0b ] [@ libc.so.6@0x12a49d | numa_init ] [@ numa_init ]
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch