Video crash firefox, no crashlog
Categories
(Core :: Audio/Video: cubeb, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox109 | --- | disabled |
firefox110 | --- | disabled |
firefox111 | --- | fixed |
People
(Reporter: grin, Assigned: pehrsons)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
Steps to reproduce:
Firefox crashes on various video chat applications. My test was done with
https://melonapp.com/ because it was crashing pretty fast on setup screen (usually within 1 mnute).
Actual results:
Firefox crashes. No crash log, only this:
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
CPU time limit exceeded
Bisect told me that:
23:47.89 INFO: Last good revision: 23705e7275490be1e8422adcf07ab7df296858ce
23:47.89 INFO: First bad revision: 1fde17a7764ba61cf1f6449c46627a28c54d97bf
I really hope that it was really good, not just "not yet crashed" (waited for 3-4 minutes and did resolution changes in hope to crash faster).
Expected results:
A crashlog would be nice. Or something suggesting what's happening. Best would be no crash.
Comment 1•2 years ago
|
||
The latest Nightly 111.0a1 doesn't crash on my side on https://melonapp.com/, tested on macOS 11.7 and on Ubuntu 20.04.
Setting this to Core: WebRTC so that our developers could take a look at this issue. If this is not the right component, please assign a more appropriate one. Thanks!
Reporter | ||
Comment 2•2 years ago
|
||
I have been talking about this with the Internet, and possible wild guesses:
- FF may have been killed due to prlimit, aka. RTTIME timeout for real-time tasks
- based on bisect it may or may not be related to audio (AND possibly video)
- melonapp was just an easy example, it crashes on microsoft© teams™ or possibly many of the video chat services. It definitely doesn't crash on bbb, but I cannot really offer experience with others.
Assignee | ||
Comment 3•2 years ago
|
||
Hi, thanks for filing!
I guess this is linux? Which distro and version? Also is this a distro-packaged Firefox or did you get it off Mozilla's server?
"CPU time limit exceeded" means we blew the budget for a realtime thread and got killed by SIGXCPU. We do use realtime threads for audio (input/output for "realtime audio" aka MediaStream/WebAudio based audio) . Your wild guesses are not so wild! We do however get a heads up in content processes that we are taking too long in the callback and thereby should disable the realtime mode. Probably this crash happens in the parent process then.
Thanks for bisecting this, however I have not been able to make sense of the revision hashes. They don't seem valid for nightly, beta or release. Could you help me pinpoint them?
Also if this is fairly easy to reproduce, could you try to reproduce it under gdb to get a stack? I reckon the parent process is where it is crashing, so it should be enough to start firefox through gdb and you're already hooked to the right process.
Note, here is an explanation of how we use realtime threads for audio. I also see bug 1795719 which is for a SIGXCPU crash for screen capture. This is surprising because that thread should not be a realtime thread.
Reporter | ||
Comment 4•2 years ago
|
||
(In reply to Andreas Pehrson [:pehrsons] from comment #3)
I guess this is linux? Which distro and version? Also is this a distro-packaged Firefox or did you get it off Mozilla's server?
Oh, sorry, I fogot that there is no detailed crashlog. :-)
Debian sid/amd64, and it includes various bits and pieces from Debian-multimedia.
I do not think v111 is packaged anywhere, so no, this is Nightly autoupdating itself, but if you look at the bisect you'll see that this possibly originates way back in the past.
Thanks for bisecting this, however I have not been able to make sense of the revision hashes. They don't seem valid for nightly, beta or release. Could you help me pinpoint them?
This was offered by the arch's automagic bisect tool and it had to get deep into the revs, so I guess these are not release tags but changesets: this one is the bad and I guess these are the changes before it, two of them seem to be inbetween. I cannot readily tell you how to get unifiied difference of it.
Also if this is fairly easy to reproduce, could you try to reproduce it under gdb to get a stack? I reckon the parent process is where it is crashing, so it should be enough to start firefox through gdb and you're already hooked to the right process.
I hope this helps (I worry that there are multiple subsystems, namely audio and video is interacting):
Thread 210 "CaptureThread" received signal SIGXCPU, CPU time limit exceeded.
[Switching to Thread 0x7fffc85ba6c0 (LWP 30807)]
0x00007fffee8dd690 in CopyRow_ERMS () from /mnt/storage/firefox/libxul.so
(gdb) bt
#0 0x00007fffee8dd690 in CopyRow_ERMS () at /mnt/storage/firefox/libxul.so
#1 0x00007fffee8d11d1 in CopyPlane () at /mnt/storage/firefox/libxul.so
#2 0x00007fffee8ce601 in ScalePlane () at /mnt/storage/firefox/libxul.so
#3 0x00007fffee8c9793 in libyuv::I4xxToI420(unsigned char const*, int, unsigned char const*, int, unsigned char const*, int, unsigned char*, int, unsigned char*, int, unsigned char*, int, int, int, int, int) () at /mnt/storage/firefox/libxul.so
#4 0x00007fffee8c96ff in I422ToI420 () at /mnt/storage/firefox/libxul.so
#5 0x00007fffee8d22f7 in libyuv::JpegI422ToI420(void*, unsigned char const* const*, int const*, int) () at /mnt/storage/firefox/libxul.so
#6 0x00007fffee8cc3f7 in libyuv::MJpegDecoder::DecodeToCallback(void (*)(void*, unsigned char const* const*, int const*, int), void*, int, int) () at /mnt/storage/firefox/libxul.so
#7 0x00007fffee8d21a6 in MJPGToI420 () at /mnt/storage/firefox/libxul.so
#8 0x00007fffee8d2dce in ConvertToI420 () at /mnt/storage/firefox/libxul.so
#9 0x00007fffee4b33cc in webrtc::videocapturemodule::VideoCaptureImpl::IncomingFrame(unsigned char*, unsigned long, webrtc::VideoCaptureCapability const&, long) () at /mnt/storage/firefox/libxul.so
#10 0x00007fffee4b259c in webrtc::videocapturemodule::VideoCaptureModuleV4L2::CaptureProcess() () at /mnt/storage/firefox/libxul.so
#11 0x00007fffee4b2784 in std::_Function_handler<void (), webrtc::videocapturemodule::VideoCaptureModuleV4L2::StartCapture(webrtc::VideoCaptureCapability const&)::$_2>::_M_invoke(std::_Any_data const&) () at /mnt/storage/firefox/libxul.so
#12 0x00007fffee505b37 in std::_Function_handler<void (), rtc::PlatformThread::SpawnThread(std::function<void ()>, absl::string_view, rtc::ThreadAttributes, bool)::$_1>::_M_invoke(std::_Any_data const&) () at /mnt/storage/firefox/libxul.so
#13 0x00007fffee505a46 in rtc::(anonymous namespace)::RunPlatformThread(void*) () at /mnt/storage/firefox/libxul.so
#14 0x000055555560182e in set_alt_signal_stack_and_start(PthreadCreateParams*) ()
#15 0x00007ffff7aa7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#16 0x00007ffff7b2866c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Assignee | ||
Comment 5•2 years ago
|
||
Thanks! This is an rtc::PlatformThread
SIGXCPU
like in bug 1795719. I'm not sure why this thread would have a cpu time limit, since it should not be a realtime thread. But audioipc2 as a regressor makes some sense since it sets realtime priority for its audio threads. Could you run through the steps in bug 1795719 comment 9 also (though in your case use camera instead of screen capture, and "Capture" instead of "Screen" in step 5 when finding the TID)?
Updated•2 years ago
|
Updated•2 years ago
|
Reporter | ||
Comment 7•2 years ago
|
||
~$ chrt -m
SCHED_OTHER min/max priority : 0/0
SCHED_FIFO min/max priority : 1/99
SCHED_RR min/max priority : 1/99
SCHED_BATCH min/max priority : 0/0
SCHED_IDLE min/max priority : 0/0
SCHED_DEADLINE min/max priority : 0/0
$ ulimit -Sa
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 35
file size (blocks, -f) unlimited
pending signals (-i) 127554
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 99
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127554
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ ps -Lp 18719 | grep Cap
18719 28994 pts/77 00:00:00 VideoCapture
18719 29071 pts/77 00:00:00 CaptureThread
$ chrt -ap 28994 | grep 28994
pid 28994's current scheduling policy: SCHED_OTHER
pid 28994's current scheduling priority: 0
$ chrt -ap 28994 | grep 29071
pid 29071's current scheduling policy: SCHED_FIFO
pid 29071's current scheduling priority: 96
Assignee | ||
Comment 8•2 years ago
|
||
Thanks! This makes some sense because "CaptureThread" is a libwebrtc thread with priority kHigh
which ends up having a sched_param.sched_priority
of in this case max(99 - 1 - 2, 2)=96
. Then it ends up in pthread_setschedparam(pthread_self(), policy, ¶m)
, and AH policy
is SCHED_FIFO
.
kinetik, paul, while I can work around this by patching libwebrtc, do you have any clue why audioipc2 would regress this? As in, would it affect whether other threads than those it manages itself may set an rt scheduler? Does it do something differently from audioipc1?
Assignee | ||
Comment 9•2 years ago
|
||
That SCHED_FIFO has been around since the dawn of time, fwiw. We had a local fix back in the FxOS days but it soon got lost. Fixing this upstream was also dismissed.
I'm guessing we were not normally allowed to set SCHED_FIFO on desktop linux systems, so this never had an effect. It doesn't on my local Ubuntu 22.04 either.
Comment 10•2 years ago
|
||
What's happening here is that audioipc2
sets an rlimit(RLIMIT_RTTIME, ...)` equivalent to roughly 5ms, while the previous code used the default value of about 50ms. 5ms is too short for copying large buffers around, especially desktop capture in YUV422 as it seems to be the case, because that's going to be multiple megabytes.
This was never intended to be used on anything else but the real-time audio threads, but looks like it works well enough, we should move it back to 50ms. We're doing cleanups in the audio remoting later this year (after the removal of the old branch), and we'll be able to make this nicer, probably.
Assignee | ||
Comment 11•2 years ago
|
||
My ulimit -Sa
reports real-time priority (-r) 0
, so setting this to 99
would be a prerequisite to reproducing.
It might still make sense to remove these libwebrtc threads' ability to set SCHED_FIFO.
Comment 12•2 years ago
|
||
A fix mitigation / fix for this is up at https://github.com/mozilla/audioipc/pull/165.
Assignee | ||
Comment 13•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 14•2 years ago
|
||
Assignee | ||
Updated•2 years ago
|
Comment 16•2 years ago
|
||
bugherder |
Updated•2 years ago
|
Description
•