1883915 - firefox tabs keep on crashing for no reason on gentoo/musl

Reporter

Description

•

8 months ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:123.0) Gecko/20100101 Firefox/123.0

Steps to reproduce:

I compiled the latest version of Firefox on Gentoo, www-client/firefox-123.0.1::gentoo, against the musl/hardened/selinux profile. I created a fresh profile and also used Troubleshoot Mode to disable everything. When I open any web page, tabs crash at random, and reloading them makes them crash again instantly; only after reloading quite a few times can I view the page. I can't find any error logs or crash reports, so it would be great if you could tell me where to find them. Kind regards.

Actual results:

Tabs crash.

Expected results:

Tabs do not crash.

Anon

Reporter

Comment 1

•

8 months ago

[Parent 32547, IPC I/O Parent] WARNING: process 491 exited on signal 4: file /var/tmp/portage/www-client/firefox-123.0.1/work/firefox-123.0.1/ipc/chromium/src/base/process_util_posix.cc:265

This is the only error I have seen in the terminal.

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

8 months ago

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk

Product: Firefox → Core

Anon

Reporter

Updated

•

8 months ago

Component: Widget: Gtk → Security: Process Sandboxing

OS: Unspecified → Linux

Hardware: Unspecified → x86_64

Gian-Carlo Pascutto [:gcp]

Comment 3

•

8 months ago

•

Edited

Not sure why this was moved to Sandboxing?

Signal 4 is SIGILL, which means you compiled Firefox in a way that makes it try to use instructions your CPU doesn't support. I think this bug is INVALID, unless I'm missing something?
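For anyone reading a line like the IPC warning in comment 1, the signal number can be mapped to its name with Python's standard signal module. This is only a sketch: describe_exit is a hypothetical helper, and the regular expression assumes the "process N exited on signal M" wording seen in that warning.

```python
import re
import signal

# Assumed wording, taken from the warning quoted in comment 1.
EXIT_RE = re.compile(r"process (\d+) exited on signal (\d+)")


def describe_exit(log_line):
    """Return (pid, signal_name) for a matching log line, or None."""
    match = EXIT_RE.search(log_line)
    if match is None:
        return None
    pid, signum = int(match.group(1)), int(match.group(2))
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Number not known to this platform's signal table.
        name = f"unknown signal {signum}"
    return pid, name


line = "[Parent 32547, IPC I/O Parent] WARNING: process 491 exited on signal 4"
print(describe_exit(line))
```

Signal numbers are platform-specific, so the lookup should run on the same kind of system that produced the log.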

If you can attach gdb to it you should see the faulty instruction. You'll need to do the following in gdb first:

handle SIG38 noprint nostop pass
handle SIG64 noprint nostop pass
handle SIGSYS noprint nostop pass

Otherwise the debugger will stop on most system calls. After that, you should be able to run and see where you crash.
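The handle commands above can also be passed on the gdb command line, so they do not have to be retyped for every attach. The sketch below only assembles such a command line; gdb_attach_command is a hypothetical helper and the PID is a placeholder. It relies on gdb's -p (attach to PID) and -ex (run a command after attaching) options.

```python
import shlex

# Signals named in the comment above; gdb should pass them through silently.
QUIET_SIGNALS = ("SIG38", "SIG64", "SIGSYS")


def gdb_attach_command(pid):
    """Build a gdb argv that attaches to pid, quiets the signals, then resumes."""
    argv = ["gdb", "-p", str(pid)]
    for sig in QUIET_SIGNALS:
        argv += ["-ex", f"handle {sig} noprint nostop pass"]
    # Resume the process once the handlers are configured.
    argv += ["-ex", "continue"]
    return argv


# 12345 is a placeholder PID, not one taken from this bug.
print(shlex.join(gdb_attach_command(12345)))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run.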

Component: Security: Process Sandboxing → Widget: Gtk

Flags: needinfo?(glmlrqaoufmbirifie)

Anon

Reporter

Comment 4

•

8 months ago

Hello Gian, sorry for the late response. I have tried using gdb; however, when the tab crashes, gdb does not stop, so I never get a prompt where I could run bt. It just carries on running. I am not sure what else I could do to find the exact reason this is happening.

Flags: needinfo?(glmlrqaoufmbirifie)

Gabriele Svelto [:gsvelto]

Comment 5

•

8 months ago

Our current process architecture means that there will be different processes for different web pages. If there's a web page where you encounter this issue more frequently, try the following:

1. Open a tab with the web page most likely to crash
2. In a separate tab navigate to about:processes
3. Identify the PID of the process corresponding to the web page you loaded
4. Attach with gdb to that process, letting it continue uninterrupted once gdb is attached
5. Keep browsing until the page crashes

Keep in mind that if you browse to a web page on a different domain, the process will change, even within the same tab, and you'll have to reattach to it.
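If about:processes is hard to reach while tabs are crashing, the candidate PIDs can also be listed from /proc. This is a rough sketch, not a replacement for about:processes: list_content_processes is a hypothetical helper, and it assumes Firefox child processes are started with a -contentproc argument. It cannot tell which PID belongs to which page, and it also lists child processes that do not host web content.

```python
from pathlib import Path


def list_content_processes(proc_root="/proc"):
    """Return sorted PIDs whose command line contains -contentproc."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    pids = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        # /proc/<pid>/cmdline separates arguments with NUL bytes.
        if b"-contentproc" in raw.split(b"\0"):
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    print(list_content_processes())
```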

Anon

Reporter

Comment 6

•

3 months ago

Again, sorry for the late reply; I didn't see the response to this. I can confirm this also happens with firefox-128.

I have followed the advice above; however, the page becomes unresponsive (i.e., I can't click on any elements and the page just freezes) when I attach to that process. lldb does give me output the moment I attach, and I have posted its log, though I am not sure whether it is helpful.

Anon

Reporter

Comment 7

•

3 months ago

Attached file lldb log after attatching to process

Martin Stránský [:stransky] (ni? me)

Updated

•

3 months ago

Priority: -- → P5

Gabriele Svelto [:gsvelto]

Comment 8

•

3 months ago

•

Edited

(In reply to Anon from comment #6)

I have followed the advice above; however, the page becomes unresponsive (i.e., I can't click on any elements and the page just freezes) when I attach to that process. lldb does give me output the moment I attach, and I have posted its log, though I am not sure whether it is helpful.

The process freezing is normal behavior; you have to tell lldb to resume the process execution after attaching:

1. Run: lldb --attach-pid <pid>
2. Wait for the process to be attached.
3. At the lldb prompt, execute: process continue
4. The process will resume at this point; keep using it. At the point of the crash the process will freeze again. Go back to the lldb window and you'll find that it displays the crashing thread and the signal that stopped it.
5. At the lldb prompt, execute: bt. This prints the stack trace at the point of the crash; save it and attach it here.
6. At the lldb prompt, use quit to leave the debugging session.
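The same sequence can probably be scripted with lldb's command-line options, which helps when the process has to be reattached often. The sketch below only builds the command line; lldb_attach_command is a hypothetical helper and the PID is a placeholder. It assumes lldb's documented --batch, --one-line and --one-line-on-crash options behave as its help text describes, and it has not been tested against the setup in this bug.

```python
import shlex


def lldb_attach_command(pid):
    """Build an lldb argv that attaches, resumes, and prints a backtrace on a crash."""
    return [
        "lldb",
        "--batch",                     # run the given commands non-interactively
        "--attach-pid", str(pid),
        "--one-line", "process continue",
        "--one-line-on-crash", "bt",   # only runs if the target crashes
    ]


# 12345 is a placeholder PID, not one taken from this bug.
print(shlex.join(lldb_attach_command(12345)))
```

If the backtrace does not appear automatically, running bt by hand at the prompt, as in the steps above, remains the fallback.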

Anon

Reporter

Comment 9

•

3 months ago

I have followed the steps above and obtained a backtrace of what is causing the crash in a particular tab. I am not sure whether this is the crash I was trying to find or a different one. Either way, here it is.

Anon

Reporter

Comment 10

•

3 months ago

Attached file crash.log

Gabriele Svelto [:gsvelto]

Comment 11

•

3 months ago

This appears to be a sandboxing issue. A child process is crashing because it's trying to load a font, and it's being killed by the SIGSYS signal we raise when a syscall that is not allowed within the sandbox gets called. It is bizarre that we're trying to load a font in a child process; under normal conditions that should never happen. However, yours is a peculiar setup using both musl and SELinux. SELinux in particular may alter the control flow significantly enough that we end up in an unexpected situation and try loading a font from within a child process instead of the parent process. Without more information it's hard to know why this happens. We'd need a trace of the whole execution of the code that loads the font in the crashing process in order to debug this. That said, you might want to check whether SELinux is blocking some syscalls right before the crash; that might give us a hint as to what is happening.

Anon

Reporter

Comment 12

•

3 months ago

I would like to mention that I am no longer using SELinux, just a musl/clang profile. This is also an issue on the musl/gcc profile on Gentoo. How would I obtain a trace of the whole execution of the code? I think this issue is related to musl libc, because most if not all of the musl systems that I and some of my colleagues have used show the exact same issue.

Gabriele Svelto [:gsvelto]

Comment 13

•

3 months ago

One possibility is the rr tool, which is packaged for Gentoo, but it generates enormous traces that are hard to share. However, Alpine Linux builds Firefox with musl and they're not encountering this particular issue. Did you change anything else in Firefox's configuration? Have you manually modified any preferences in about:config? If you could attach the contents of the about:support page it would be very helpful.

Anon

Reporter

Comment 14

•

3 months ago

Attached file about_support.txt

I have attached the information from about:support. Please let me know if there is anything else you would like me to share. Unfortunately I don't have experience with a graphical Alpine amd64 system, so I am not sure what it's like on Alpine.

Anon

Reporter

Comment 15

•

3 months ago

(In reply to Gabriele Svelto [:gsvelto] from comment #11)

This appears to be a sandboxing issue. A child process is crashing because it's trying to load a font, and it's being killed by the SIGSYS signal we raise when a syscall that is not allowed within the sandbox gets called. It is bizarre that we're trying to load a font in a child process; under normal conditions that should never happen. However, yours is a peculiar setup using both musl and SELinux. SELinux in particular may alter the control flow significantly enough that we end up in an unexpected situation and try loading a font from within a child process instead of the parent process. Without more information it's hard to know why this happens. We'd need a trace of the whole execution of the code that loads the font in the crashing process in order to debug this. That said, you might want to check whether SELinux is blocking some syscalls right before the crash; that might give us a hint as to what is happening.

Should we move this bug from Gtk → Security: Process Sandboxing?

Gabriele Svelto [:gsvelto]

Comment 16

•

3 months ago

(In reply to Anon from comment #15)

Should we move this bug from Gtk → Security: Process Sandboxing?

Not really, because that font shouldn't be loaded in a child process at all. The sandboxing code is doing what it's expected to do in this case; we have to figure out why the code that loads the fonts is getting confused.

Anon

Reporter

Comment 17

•

3 months ago

I have been looking in the Alpine packages repository to see how they build Firefox. There are two patches related to the sandbox, so the problem on Gentoo may be related to what they change.

https://git.alpinelinux.org/aports/tree/community/firefox/sandbox-fork.patch
https://git.alpinelinux.org/aports/tree/community/firefox/sandbox-sched_setscheduler.patch

I am thinking it might be the first one although I could be wrong entirely.

lldb log after attatching to process, 3 months ago, Anon, 22.50 KB, text/x-log
crash.log, 3 months ago, Anon, 16.46 KB, text/x-log
about_support.txt, 3 months ago, Anon, 24.67 KB, text/plain