Open Bug 1690929 Opened 5 years ago Updated 2 years ago

Firefox crashes and freezes X server and udev when playing youtube videos on linux 5.10.x kernel

Categories

(Core :: Graphics, defect, P3)

Firefox 85
x86_64
Linux
defect

UNCONFIRMED

People

(Reporter: whattheheccamidoing, Unassigned)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36

Steps to reproduce:

Playing youtube videos will randomly crash firefox. In an hour-long youtube session with multiple videos, there's a good chance this will happen once, but certain videos always trigger it. https://www.youtube.com/watch?v=XQVMSpHlWQU is the video I've been using to test this; it reproduces the crash with near perfect certainty, but at completely random times in the video.

Actual results:

Firefox crashes, udev stops responding, and the entire X server freezes except for the mouse cursor. acpid responds to a power button press and starts the shutdown process, but udev prevents shutdown because it is not responding. Since I have a laptop, I use laptop-mode-tools, and when it is enabled it is the process that openRC most often reports as not responding. Disabling laptop-mode-tools shows that udev is the real cause: shutdown behaves the same way, with openRC saying udev is not responding instead of laptop-mode.

Firefox will usually be able to print out an error stating that the "manager is detached:" and naming the MediaDecoderStateMachine.cpp file, but not always the same line of code or function.

It's important to note that, as far as I know, this bug only presents itself on 5.10.x kernels. I spent a couple of hours tracing commits from https://github.com/torvalds/linux/commit/93b694d096cc10994c817730d4d50288f9ae3d66 all the way back to https://github.com/torvalds/linux/commit/4de962300b883cc4aaafd7b625cbd497a299e6e1 and further down this commit tree. I haven't spent enough time compiling, recompiling, and switching commits to know where the bug started, but this is a good start for finding the cause. I have an inspiron 5570 with a core i5-8520U and intel UHD graphics using the i915 kernel driver. The changes to the i915 driver in the above commits are what lead me to believe the driver is involved; however, the bug still occurs with any version of xf86-video-intel. The only thing I've found that fixes it is running a 5.9.x linux kernel, specifically the 5.9.14 kernel with the gentoo patchset that I still have saved on my system. I will also attach my 5.9.14 and 5.10.11 kernel config files for reference.
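For anyone narrowing this down further, the commit hunt described above is a binary search over the kernel history, which is what `git bisect start` / `git bisect good` / `git bisect bad` automate. Below is a minimal sketch of that search in Python. The function name, the `is_bad` callback, and the commit labels are hypothetical. It also assumes the history flips from good to bad exactly once, which a crash that happens "at completely random times" may not respect, so each tested build would need a long enough playback run before being marked good.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good (e.g. a 5.9.x build) and
    commits[-1] is known bad (e.g. a 5.10.x build). `is_bad` is a
    hypothetical callback: build and boot that commit, play the test
    video, and return True if the freeze occurred.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

Each iteration halves the remaining range, so the number of kernel builds needed grows with the logarithm of the number of commits between the two endpoints.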

I'm running a standard gentoo install, and as far as I can tell the precompiled firefox-bin package does not exhibit this behavior. That makes me assume that firefox is somehow not playing well with a compiler option or an odd use flag combination, or that it doesn't like being compiled against a certain version of a package that the precompiled version from mozilla is not compiled with. The issue also shows up in a completely separate gentoo install with the same use flags and kernel config but using musl libc, so it occurs across libcs and OS installs on the same hardware. Regardless, I've tried every combination of use flags that I could think of, making sure to try gcc instead of clang, and the bundled versions of everything that had a use flag rather than the system's. No matter what I compiled with or which use flags I used, the bug still appeared when running a 5.10.x kernel and did not appear on the 5.9.14 kernel with no other changes to the software.

I have tried everything that I could think of to fix this, including disabling hardware acceleration, removing the .mozilla folder, safe mode, new user profiles, and even logging out of my youtube account; all of these still produced the bug.

Expected results:

Firefox should have continued playing the video without stuttering or crashing.

OS: Unspecified → Linux
Hardware: Unspecified → x86_64

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core

Out of a freak coincidence of compiling and installing librewolf for a test, I think that the bug is caused by youtube running ads in videos. Will test further, but librewolf comes with uBlock Origin installed by default, and I noticed that the bug didn't present itself in librewolf after a test that has always shown the bug happening failed to show that it was present. Installed uBlock Origin on the default firefox install, and that worked too! Very glad to finally get somewhere on this!

More testing has shown a very odd behavior. No, uBlock Origin was not the savior, and removing it completely does nothing to prevent or cause the bug. However, librewolf itself seems to be my saving grace. I built librewolf with the same use flags as firefox, with firefox running 85.0-r1 and librewolf on 84.0.2 (however, the bug has been presenting itself since before 84.0.2 firefox anyway) and an odd behavior has been happening. After a clean reboot, starting firefox, immediately opening the testing video from above, and letting it run will crash firefox at some point throughout the video, causing me to need to restart my computer. Doing the same with librewolf however does not present the bug. Weirdly enough, simply running librewolf before running firefox causes the bug to not present itself in firefox after multiple tests. Simply starting librewolf is enough to fix the problem in firefox. Tested with and without uBlock Origin installed, and these findings remain the same: librewolf fixes the problem, and firefox creates it. Although I'm sure the bug was presenting itself before firefox 84.0.2, I will be ensuring that the bug is present on firefox 84.0.2 regardless.

I'm experiencing something kind of similar, but on open source Radeon (RadeonSi) graphics on Xorg/X11 on Fedora 33's version of Firefox 84 (and now 85). I haven't seen my Intel-based laptop (with its Intel Sandybridge graphics) freeze up, only my workstation sporting a radeon. It seems to happen at random when playing videos, but particularly if they are fullscreened or playing picture-in-picture mode.

When it happens, the sound still plays (but sometimes I've heard it crackle when that happens, strangely enough) and the whole display server except the mouse cursor freezes up (but the graphics don't crash, in my case) until either:

  • Firefox unfreezes by itself and video playback resumes
  • Firefox reaches the end of the video, playback stops, and everything unfreezes
  • you kill Firefox via SSH (then everything unfreezes)

I've been trying to find some clues by having the laptop logged into the affected workstation and monitoring these:

  • CPU usage in htop: nothing unusual going on, and barely 30-50% of my cores are used
  • radeontop: nothing unusual going on
  • the terminal output from Firefox (if you launch it with "DISPLAY=:0 firefox" in SSH, for example): nothing unusual going on. The only interesting thing is that if you hit the spacebar to try to stop playback while Firefox is frozen, it will output a Javascript error on the terminal:
    Javascript error: , line 0: NotAllowedError: The play method is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
  • the live output of "journalctl -f", where nothing critical seems to happen, except these mysterious warnings that refer to Firefox's process:
feb 09 20:02:46 workstation pcscd[1028]: 99999999 auth.c:137:IsClientAuthorized() Process 222220 (user: 1000) is NOT authorized for action: access_pcsc
feb 09 20:02:46 workstation pcscd[1028]: 00000308 winscard_svc.c:335:ContextThread() Rejected unauthorized PC/SC client
feb 09 20:03:14 workstation pcscd[1028]: 28120894 auth.c:137:IsClientAuthorized() Process 222220 (user: 1000) is NOT authorized for action: access_pcsc
feb 09 20:03:14 workstation pcscd[1028]: 00000342 winscard_svc.c:335:ContextThread() Rejected unauthorized PC/SC client
feb 09 20:03:19 workstation pcscd[1028]: 04711345 auth.c:137:IsClientAuthorized() Process 222220 (user: 1000) is NOT authorized for action: access_pcsc
feb 09 20:03:19 workstation pcscd[1028]: 00000171 winscard_svc.c:335:ContextThread() Rejected unauthorized PC/SC client
feb 09 20:03:24 workstation pcscd[1028]: 05060165 auth.c:137:IsClientAuthorized() Process 222220 (user: 1000) is NOT authorized for action: access_pcsc
feb 09 20:03:24 workstation pcscd[1028]: 00000258 winscard_svc.c:335:ContextThread() Rejected unauthorized PC/SC client
feb 09 20:03:28 workstation pcscd[1028]: 04391746 auth.c:137:IsClientAuthorized() Process 222220 (user: 1000) is NOT authorized for action: access_pcsc
feb 09 20:03:28 workstation pcscd[1028]: 00000772 winscard_svc.c:335:ContextThread() Rejected unauthorized PC/SC client

...but those warnings don't happen at the time of the freeze, they seem to happen periodically before/during/after, and they seem to be totally unrelated (AFAIU this is about some security card daemon or something).
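Since the pcscd warnings look unrelated, one way to keep them from burying anything else in the journal is to filter them out while watching. The following is only a sketch: the regular expression assumes the short timestamp format shown in the lines above, and the function name and the `noisy` parameter are made up for illustration.

```python
import re

# Matches the short journal format quoted above, for example:
#   "feb 09 20:02:46 workstation pcscd[1028]: message"
# and captures the syslog identifier ("pcscd").
LINE_RE = re.compile(r"^\S+\s+\d+\s+[\d:]+\s+\S+\s+([^\s\[:]+)(?:\[\d+\])?:")

def keep_relevant(lines, noisy=("pcscd",)):
    """Yield journal lines whose identifier is not listed in `noisy`.

    Lines that do not match the expected format are passed through
    unchanged so nothing unexpected is silently dropped.
    """
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) in noisy:
            continue
        yield line
```

Because it is a generator, it can be fed a live stream, for example by iterating over `sys.stdin` in a small script that `journalctl -f` is piped into.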

I'm really puzzled by this. If there is information I could provide to help troubleshoot this (without involving compiling a kernel nor Firefox) I'd be happy to help, as right now I'm unable to see what could be causing this.

I have webrender turned on (and it has been working fine for a year or so), but no dmabuf-related about:config tweaks or anything fancy. It just started happening out of the blue as a regression with Firefox 84 (if I recall correctly) and persists with Firefox 85.

Blocks: wr-linux
Severity: -- → S3
Priority: -- → P3

I have some additional clues/observations to report: it is probably related to audio or animation, not necessarily video decoding, because it also happens on audio-only Discord voice chats! When this happened (multiple times), I was able to SSH into the machine, look at the processes using the CPU in htop, and notice that one "Web Content" (firefox -contentproc) process was eating up a lot of CPU. Killing that process forced the offending Discord tab to crash, instantly unfreezing the operating system's udev/graphics stack and the web browser. I was then able to simply reload the tab and re-join the VoIP conversation in Discord, until it would freeze up again sometime later; rinse, repeat.
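The "find the runaway Web Content process over SSH" step can be scripted instead of eyeballed in htop. This is a rough sketch, not a tested recovery tool: the function names are hypothetical, it assumes a `ps` that accepts `-e` and `-o pid= -o pcpu= -o args=`, and it relies on the content processes having `-contentproc` in their command line as described above. Note that `pcpu` from `ps` is an average over the process lifetime, not the instantaneous load htop shows.

```python
import subprocess

def busiest_content_process(ps_output):
    """Return (pid, cpu_percent) of the busiest '-contentproc' process.

    `ps_output` is headerless text with one "PID %CPU COMMAND..." entry
    per line. Returns None if no content process is listed.
    """
    best = None
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, %cpu, full command line
        if len(parts) < 3 or "-contentproc" not in parts[2]:
            continue
        try:
            pid, cpu = int(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip malformed lines
        if best is None or cpu > best[1]:
            best = (pid, cpu)
    return best

def snapshot():
    """Run ps once and return its headerless output as text."""
    cmd = ["ps", "-e", "-o", "pid=", "-o", "pcpu=", "-o", "args="]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Calling `busiest_content_process(snapshot())` from the SSH session gives the PID to inspect; whether to `kill` it is deliberately left as a manual decision.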

Jeff, can you reproduce this with Webrender disabled? That would be very good to know.

Flags: needinfo?(nekohayo)

Seems like a race condition of some kind... because of course when I tried to reproduce it today, it didn't happen, with or without webrender turned on, even with the older (5.10.15) Linux kernel I was previously on (instead of the newer 5.10.17 kernel I now have).

I tried @beanie's video above in all conditions and it played to the end, and then I played https://www.youtube.com/watch?v=5lMmnfVylEE for 3-4 hours on either of my two monitors, fullscreened or windowed, without any issues either. But maybe the bug would reappear, typically when I'm not looking for it... Damned heisenbugs!

Flags: needinfo?(nekohayo)