Catch SIGXCPU
Categories
(Toolkit :: Crash Reporting, enhancement)
People
(Reporter: gsvelto, Unassigned)
Details
Our exception handler doesn't catch SIGXCPU on Linux, so when we receive the signal Firefox terminates immediately without leaving a crash report behind. Crashpad's exception handler includes it among the signals it catches, so it should be possible to do the same in Breakpad.
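A minimal sketch of what the change amounts to, assuming a `sigaction()`-based handler like the ones crash reporters install on Linux: SIGXCPU is simply added to the set of trapped signals. The array `kCaughtSignals` and the function `install_crash_handlers()` are hypothetical, not Breakpad's actual code, and `crash_handler()` only records the signal number where a real handler would write a minidump. Without such a handler, SIGXCPU's default action terminates the process with a core dump, which matches the behavior described above.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Hypothetical list of signals a crash handler might trap. SIGXCPU (and
 * SIGXFSZ) are the additions discussed here; this is not Breakpad's table. */
static const int kCaughtSignals[] = {
    SIGSEGV, SIGABRT, SIGFPE, SIGILL, SIGBUS, SIGTRAP, SIGXCPU, SIGXFSZ,
};
#define NUM_CAUGHT (sizeof(kCaughtSignals) / sizeof(kCaughtSignals[0]))

static volatile sig_atomic_t last_signal = 0;

/* Stand-in for "write a minidump": only records which signal arrived. */
static void crash_handler(int sig, siginfo_t *info, void *ucontext) {
    (void)info;
    (void)ucontext;
    last_signal = sig;
}

/* Installs crash_handler for every signal in kCaughtSignals.
 * Returns the number of handlers successfully installed. */
static int install_crash_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_handler;
    /* SA_ONSTACK uses an alternate stack if one was set up with sigaltstack(). */
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    int installed = 0;
    for (size_t i = 0; i < NUM_CAUGHT; i++) {
        if (sigaction(kCaughtSignals[i], &sa, NULL) == 0)
            installed++;
    }
    return installed;
}
```

Calling `install_crash_handlers()` and then `raise(SIGXCPU)` runs `crash_handler()` instead of killing the process, which is the point where a real implementation would write its report.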
Note that we might be able to return gracefully from SIGXCPU and avoid crashing, but the process will be killed by a SIGKILL signal anyway some time later.
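On Linux the CPU-time limit behind SIGXCPU has a soft and a hard value: at the soft limit the kernel sends SIGXCPU, keeps sending it about once per second while the process keeps consuming CPU, and sends SIGKILL (which cannot be caught) at the hard limit. The sketch below, built around a hypothetical helper `burn_until_xcpu()`, lowers the soft `RLIMIT_CPU` and shows that a handler can return and let the process continue. It assumes the signal comes from `RLIMIT_CPU`; real-time threads hit the analogous `RLIMIT_RTTIME`.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <time.h>

static volatile sig_atomic_t xcpu_count = 0;

/* SIGXCPU handler that simply returns, so the process keeps running. */
static void on_xcpu(int sig) {
    (void)sig;
    xcpu_count++;
}

/* Hypothetical helper: lowers the soft RLIMIT_CPU to soft_seconds (leaving
 * the hard limit alone), burns CPU until SIGXCPU is handled or max_seconds
 * of CPU time have elapsed, then restores the original soft limit.
 * Returns the number of SIGXCPU signals handled, or -1 on error. */
static int burn_until_xcpu(rlim_t soft_seconds, double max_seconds) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_xcpu;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGXCPU, &sa, NULL) != 0)
        return -1;

    struct rlimit lim;
    if (getrlimit(RLIMIT_CPU, &lim) != 0)
        return -1;
    if (lim.rlim_max < soft_seconds)
        return -1; /* the soft limit may not exceed the hard limit */
    rlim_t original_soft = lim.rlim_cur;
    lim.rlim_cur = soft_seconds; /* crossing this sends SIGXCPU */
    if (setrlimit(RLIMIT_CPU, &lim) != 0)
        return -1;

    volatile unsigned long spin = 0;
    while (xcpu_count == 0 &&
           (double)clock() / CLOCKS_PER_SEC < max_seconds)
        spin++; /* busy loop to consume CPU time */

    lim.rlim_cur = original_soft; /* undo the temporary soft limit */
    setrlimit(RLIMIT_CPU, &lim);
    return (int)xcpu_count;
}
```

For example, `burn_until_xcpu(1, 5.0)` should return 1 after roughly one second of CPU time. Had the loop kept running up to the hard limit, the process would have been killed regardless of the handler.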
I'm getting a few crashes with SIGXCPU on the latest Nightly ASan builds (this might have started about two weeks ago). No crash report is generated.
Is there any hope of getting the crash handler to catch these?
Comment 2 • Gabriele Svelto [:gsvelto] (Reporter) • 3 years ago
How did you encounter those crashes? We've had a SIGXCPU signal handler to deal with media tasks for a while (see here), so it's handled there, but maybe not in other processes.
(In reply to Gabriele Svelto [:gsvelto] from comment #2)
> How did you encounter those crashes? We've had a SIGXCPU signal handler to deal with media tasks for a while (see here), so it's handled there, but maybe not in other processes.
The crashes are seemingly random during normal use with a lot of tabs.
I've had some trouble getting proper core dumps, but so far I've got a truncated core dump which gives the crash as:
Program terminated with signal SIGXCPU, CPU time limit exceeded.
#0 internal_madvise () at /builds/worker/fetches/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:199
I can't get a full backtrace working at the moment due to the truncated core file.
Nothing much is logged. I'll try to capture the Firefox log too.
Process 207380 (firefox-bin) of user 1000 dumped core.
Module /home/.../firefox-bin with build-id 373c8c9a41732639bca63caa2873d85bfce36fd7
Stack trace of thread 207525:
#0 0x000055ff0db9fbea n/a (/home/.../firefox-bin + 0x13ebea)
ELF object binary architecture: AMD x86-64
Comment 4 • Gabriele Svelto [:gsvelto] (Reporter) • 3 years ago
(In reply to Henri from comment #3)
> The crashes are seemingly random during normal use with a lot of tabs.
> I've had some trouble getting proper core dumps, but so far I've got a truncated core dump which gives the crash as:
Are the tabs crashing or is it affecting the whole browser?
(In reply to Gabriele Svelto [:gsvelto] from comment #4)
> (In reply to Henri from comment #3)
>> The crashes are seemingly random during normal use with a lot of tabs.
>> I've had some trouble getting proper core dumps, but so far I've got a truncated core dump which gives the crash as:
> Are the tabs crashing or is it affecting the whole browser?
It's the whole browser.
Comment 6 • Gabriele Svelto [:gsvelto] (Reporter) • 3 years ago
(In reply to Henri from comment #5)
> It's the whole browser.
Thanks!
Paul, I filed this bug because I was mulling over installing a SIGXCPU handler in the main browser process. Given Henri's feedback I think I should raise the priority here and do it sooner rather than later; however, I don't want to interfere with the SIGXCPU handler used by the media code. Do you know in which processes that handler is used? Is there anything else I should be aware of regarding how we already use SIGXCPU signals inside Firefox? I wouldn't want to accidentally introduce a regression by turning harmless operations into full browser crashes.
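One common way to avoid clobbering an existing handler, sketched under the assumption that the crash reporter installs its handler after any other code has installed its own: save the previous `struct sigaction` in the same `sigaction()` call and forward to it. `chaining_handler()`, `install_chaining_xcpu_handler()`, and the `media_xcpu_handler()` stand-in are hypothetical names, not Firefox code. If nothing was installed before (`SIG_DFL`), the sketch restores the default action and re-raises, so the process still terminates on SIGXCPU as it does today.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

static struct sigaction previous_xcpu; /* action that was installed before ours */
static volatile sig_atomic_t crash_handler_ran = 0;
static volatile sig_atomic_t media_handler_ran = 0;

/* Hypothetical stand-in for a pre-existing (e.g. media) SIGXCPU handler. */
static void media_xcpu_handler(int sig) {
    (void)sig;
    media_handler_ran = 1;
}

static int install_media_stand_in(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = media_xcpu_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGXCPU, &sa, NULL);
}

/* Crash-reporter handler that forwards to whatever was installed before. */
static void chaining_handler(int sig, siginfo_t *info, void *ucontext) {
    crash_handler_ran = 1; /* a real handler would decide whether to dump here */
    if (previous_xcpu.sa_flags & SA_SIGINFO) {
        previous_xcpu.sa_sigaction(sig, info, ucontext);
    } else if (previous_xcpu.sa_handler == SIG_DFL) {
        /* Nothing to chain to: restore the default action and re-raise.
         * The signal stays blocked until this handler returns, then the
         * default action (terminate with a core dump) takes effect. */
        sigaction(sig, &previous_xcpu, NULL);
        raise(sig);
    } else if (previous_xcpu.sa_handler != SIG_IGN) {
        previous_xcpu.sa_handler(sig);
    }
}

/* Installs chaining_handler, saving the previous action in the same call. */
static int install_chaining_xcpu_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = chaining_handler;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGXCPU, &sa, &previous_xcpu);
}
```

With the stand-in installed first, a single `raise(SIGXCPU)` runs both handlers, so the pre-existing behavior is preserved.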
Comment 7 • 3 years ago
A Firefox process can receive SIGXCPU if it has anything to do with real-time audio.
The parent process can receive it because it uses PulseAudio, which promotes the thread(s) servicing the real-time audio callbacks to a real-time scheduling class.
Content processes that are playing audio can receive it because some of their threads are promoted to a real-time scheduling class: they are woken up by the aforementioned real-time callback thread in the parent process, and anything else would cause a priority inversion. Real-time being real-time, they need to be scheduled as soon as possible, and delays are unacceptable.
We're only handling it in the child process, because only there can script block the callback for long enough. We catch the signal and demote the thread long before the "hard" limit that causes a process kill, so we have no need to handle it in the parent.
Catching it in the parent should not interfere with what we're doing.
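A sketch of the demotion described above, assuming the signal comes from the soft `RLIMIT_RTTIME` limit and that the handler runs on the real-time thread that exceeded its budget: the handler moves the calling thread back to `SCHED_OTHER`. `demote_on_xcpu()` and `install_demote_handler()` are hypothetical names, not the actual Firefox media code. On Linux, `sched_setscheduler()` with a pid of 0 applies to the calling thread; it is not on POSIX's list of async-signal-safe functions, so calling it from a handler relies on it being a thin system-call wrapper on Linux.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t demoted = 0;

/* Hypothetical SIGXCPU handler: move the calling thread out of a real-time
 * class (SCHED_FIFO/SCHED_RR) and back to the normal SCHED_OTHER class. */
static void demote_on_xcpu(int sig) {
    (void)sig;
    struct sched_param param = { .sched_priority = 0 }; /* must be 0 for SCHED_OTHER */
    if (sched_setscheduler(0, SCHED_OTHER, &param) == 0) /* pid 0: calling thread */
        demoted = 1;
}

static int install_demote_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demote_on_xcpu;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGXCPU, &sa, NULL);
}
```

Run without real-time privileges, the call just keeps the thread in `SCHED_OTHER`; observing an actual demotion needs a thread that was promoted first, for example by PulseAudio as described above.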
Comment 8 • Gabriele Svelto [:gsvelto] (Reporter) • 3 years ago
Excellent, thanks!