Firefox (with extensions) locks up system when unlimited and crashes with oom-kill when running with slightly limited (20G) memory.
Categories: Core :: Widget: Gtk, defect
People: Reporter: vkosd.20.mddlr, Unassigned, NeedInfo
References: Depends on 1 open bug
Details: Keywords: crash, memory-footprint
Attachments: 4 files
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0
Steps to reproduce:
I'm running Firefox with about 6 extensions, including ublock origin, sideberry and violent monkey (though no scripts are active there). Without them I can't run Firefox long enough to trigger this bug without puking at what the web looks like, so I can't test without them.
When I run in a memory restricted environment with only access to 20G of memory, like so
systemd-run --scope -p MemoryLimit=20000M firefox &
firefox crashes with the oom-killer killing the firefox-wrapper process.
After some time, if I run normally, my system will lock up. Sometimes, but not always, it will unfreeze after some time. Sometimes I (sentence continued in comment 4)
Actual results:
The main Firefox process gets killed by the oom-killer via firefox-wrapper
Expected results:
Firefox should manage memory compatibly with the oom-killer so that individual windows responsible for memory overuse get killed first.
I guess this is a problem in some code that is part of a web site and gets upset or leaks memory when blocked from accessing something (e.g. by ublock). The process that is allocating too much memory should be stopped at some point.
Comment 1 • 9 months ago (Reporter)
Comment 2 • 9 months ago
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3 • 9 months ago (Reporter)
Attached memory report from a separate later session - I don't know a way to get the same info from a crashing Firefox, though I will start looking.
Comment 4 • 9 months ago (Reporter)
This doesn't seem to be specific to one site, though it might be some shared code loaded by several. My most suspected website for causing problems is reddit, though I've also had plenty of similar problems in browser sessions that never had reddit open, apparently triggered by various standard news sites, e.g. theguardian.com.
Continuing a sentence in the original report: sometimes, if I am running without memory restrictions, the browser locks up but then unfreezes (typically after an oom-kill of an Isolated Web Component process); sometimes I also catch the browser slowing considerably before it locks up. In both cases, if I kill several of the Web Component processes, selecting the ones that come to the top in atop, everything goes back to working for another hour or more, although I'd say the problem recurs more quickly than from a clean start.
Comment 5 • 9 months ago
This should be a duplicate of bug 1974582.
Comment 6 • 9 months ago (Reporter)
I don't think so; that's a low-memory bug where the system has e.g. 2GB, whereas this is happening on a 32GB system where firefox alone has 20GB allocated. In a sense, surely a sign of something much more seriously wrong.
Comment 7 • 9 months ago
Could you provide the oom_score of the Firefox processes?
What is firefox-wrapper? Is it a shell script that runs firefox? May I also have its oom_score?
Comment 8 • 9 months ago
(In reply to Michael De La Rue from comment #6)
> I don't think so, that's a low memory bug when the system has e.g. 2GB whereas this is happening on a 32GB system when firefox alone has 20GB allocated. In a sense a sign of something much more serious wrong surely.
Bug 1974582 sets the memory limit through a cgroup. The reporter of bug 1974582 has 32GB of DRAM, but restricts firefox to 2GB, 4GB, and 6GB of memory through a cgroup.
As far as I know, the command
systemd-run --scope -p MemoryLimit=20000M firefox &
also creates a cgroup to restrict memory usage, if I am right, although your limit is 20GB here.
The 20GB here includes not only RSS, but also swap size and pages backed by files. According to the memory report you provided earlier, firefox occupies about 5GB of core memory (RSS).
Based on the above, I still believe that this bug is a duplicate of bug 1974582. If you think I have missed anything, please let me know.
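The question of what is charged against the limit can be checked on the affected machine. The following is a minimal sketch in Python, assuming cgroup v2 (the unified hierarchy), where the `memory.stat` file of a cgroup lists byte counters such as `anon` and `file`; cgroup v1 uses different file names and counters. The function names and the sample values are illustrative only and do not come from the attached memory reports.

```python
from pathlib import Path
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a simple "name integer" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats: Dict[str, int]) -> Dict[str, float]:
    """Report anonymous and file-backed memory in GiB."""
    gib = 1024 ** 3
    return {
        "anon_gib": stats.get("anon", 0) / gib,
        "file_gib": stats.get("file", 0) / gib,
    }


def read_cgroup_stat(cgroup_dir: str) -> Dict[str, int]:
    """Read memory.stat from a cgroup directory (path supplied by the caller)."""
    return parse_memory_stat((Path(cgroup_dir) / "memory.stat").read_text())
```

Running `summarize(read_cgroup_stat(...))` on the scope directory that systemd-run created would show how much of the charged memory is anonymous and how much is page cache, which may help compare this report with bug 1974582.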
Comment 9 • 9 months ago (Reporter)
> As far as I know, the command
> systemd-run --scope -p MemoryLimit=20000M firefox &
> also create a cgroup to restrict the memory usage if I am right although you have 20Gb here.
Yes, that's my understanding.
> According to the memory report provided by you earlier, firefox occupies about 5Gb of core memory (RSS).
Please remember that the memory report is not from the crashing process (I don't know how to get that report out at crash time), and I think it's not even close to the state at that time. The memory report was taken relatively soon after starting a new instance, though with a number of the windows that seemed to cause the crash. When I'm killing processes I typically see the Isolated Web Components go up considerably in memory and definitely in CPU.
> I still believe that this bug is a duplication of bug 1974582. If you think I miss anything, please let me know.
The main thing I think I haven't made 100% clear is that I started using cgroups to try to work out why firefox kept locking up my system. The aim was both to limit the system damage and to work out whether Firefox alone, or other processes such as signal (electron) or vscode, could be the trigger of the system stability problems. What seems clear is that since I started keeping firefox in a cgroup there have been no system crashes, so I would say that firefox is at the center of the problems even when there isn't a cgroup.
I think the current cgroup bug is covering up a more general memory overallocation bug.
Let me add a little speculation.
My thought is that there are three or four potential bugs here. The bug which clearly doesn't seem to be covered is the fact that I have to use cgroups in the first place to keep my system stable. In fact I think that's wrong, but if it isn't wrong then Firefox should be installed to use a cgroup by default. To be honest, I'm not sure you aren't right: as written, my report is basically the same one about measuring cgroup memory usage, and I'm definitely happy to try to close or refile if there's a thought about
Inside cgroups:
1. Cgroups: inability to read current status in cgroups, using - this means that firefox itself will hit a memory limit whilst
   Key symptom that makes this a bug: firefox actually crashes as a whole
Outside cgroups:
2. Inability to quickly evict tabs / free up memory in case of sudden need for memory from other applications - this leads to lock ups /
   Key symptom that makes this a bug: system sometimes crashes completely
   Note the key thing here: there's a firefox statement elsewhere that it's the OS's job to protect itself. The Linux tool for that is cgroups. If Linux users aren't
3. Lack of a clear way to prioritize killing tabs / to get the right tab killed
4. Lack of handling of tabs that get OOM-killed
In both cases:
5. Firefox runs close to the limit on memory and sometimes that needlessly locks things up
First, yes, I know about and agree with the firefox philosophy that memory is there to be used. However, if the browser ends up thrashing on memory, and especially if that ends up with the system thrashing on memory, that is slower than if the memory hadn't been used.
That leads to the following proposals from me for potential fixes:
Fix A - start reading cgroups
Obviously, reading the cgroups file will improve things for 1.
Fix B - handle oomkilled windows nicely
If an oomkill looked like unloading a window, and Firefox then reloaded it when the user switched to that window, then oomkilling wouldn't be nearly as big a problem for firefox.
Fix C - prioritize processes for oomkilling (much better after B)
- main firefox process gets highest priority
- visible windows and processes actively outputting media get one step lower
- non visible windows are
Fix D - more rapidly and aggressively return memory to the OS as memory limits approach
- If firefox realizes that memory is low, I think it's currently insufficiently aggressive in returning memory. Making it a bit more active in doing this could solve the system stability problems, I think.
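As an illustration of what Fix A could involve, here is a minimal sketch in Python of how a process might find its own cgroup and work out how much headroom remains under the limit. It assumes cgroup v2 mounted at `/sys/fs/cgroup`, with the standard `memory.max` and `memory.current` files; the function names are hypothetical and this is not how Firefox currently works. It also only looks at the process's own cgroup, whereas a limit set on an ancestor cgroup would apply too.

```python
from pathlib import Path
from typing import Optional


def cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/<pid>/cgroup content, or None.

    The v2 entry has hierarchy ID 0 and an empty controller list: "0::/path".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy == "0" and controllers == "":
            return path
    return None


def parse_limit(text: str) -> Optional[int]:
    """memory.max holds either a byte count or the literal 'max' (no limit)."""
    text = text.strip()
    return None if text == "max" else int(text)


def headroom_bytes(limit: Optional[int], current: int) -> Optional[int]:
    """Bytes left before the limit is hit; None when there is no limit."""
    if limit is None:
        return None
    return max(limit - current, 0)


def read_own_headroom(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Headroom for the calling process, or None if it cannot be determined."""
    try:
        path = cgroup_v2_path(Path("/proc/self/cgroup").read_text())
        if path is None:
            return None
        base = Path(root) / path.lstrip("/")
        limit = parse_limit((base / "memory.max").read_text())
        current = int((base / "memory.current").read_text())
    except (OSError, ValueError):
        # Not on Linux, no cgroup v2, or the memory controller is not enabled.
        return None
    return headroom_bytes(limit, current)
```

A memory watcher along these lines could treat a small headroom value the same way it treats low system-wide memory, for example as a trigger for tab unloading (Fix D).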
Comment 10 • 9 months ago
Thank you for the explanation and details.
As far as I know, firefox is not cgroup-aware in terms of memory management. So it is definitely something that should be fixed.
Tab unloading not working well is a known issue as well.
These issues have been mentioned in bug 1974582. I would make this bug depend on bug 1974582, so we can keep tracking it.
Comment 11 • 8 months ago
Please check this comment. It may help you!
https://bugzilla.mozilla.org/show_bug.cgi?id=1974582#c14
Comment 12 • 8 months ago (Reporter)
This is taken with firefox not running in a cgroup.
This shows the memory just as firefox is about to cause a system lockup. At this point the GUI is unresponsive, but I can still get into a terminal and kill processes. If I don't kill any processes, they may be killed by the oom-killer, or the system may lock up for what seems like an indefinite period, sometimes certainly longer than several hours.
Comment 13 • 8 months ago (Reporter)
Experimenting with the following settings, following the suggestion in
https://bugzilla.mozilla.org/show_bug.cgi?id=1974582#c14
browser.tabs.unloadOnLowMemory False -> True
browser.tabs.min_inactive_duration_before_unload unchanged 60000
browser.low_commit_space_threshold_percent 5% -> 10%
browser.low_commit_space_threshold_mb 500 -> 2000
Comment 14 • 8 months ago (Reporter)
Continuing from this: after applying the above settings, while running firefox with quite a few tabs already open, I hit the problem again by asking sideberry to restore several (10) old tabs. This time I didn't manage to get a console and atop open in time, so X crashed. The log doesn't show the oom-killer having triggered, but it shows many entries like this:
Aug 07 16:31:16 kinga systemd-journald[849]: Under memory pressure, flushing caches.
My feeling is that tab unloading with the above settings has helped over the last two days and meant that firefox ran stably, but that it was too slow to help in the situation where multiple tabs made memory demands at the same time when already close to memory limits.
Comment 15 • 8 months ago
(In reply to Michael De La Rue from comment #14)
> my feeling is that tab unloading with the above settings has helped over the last two days and meant that firefox ran stably, but that it was too slow to help in the situation where mutiple tabs made memory demands at the same time when already close to memory limits.
This is interesting. We can probably do something based on this observation.
Do you see anything in Xorg.*.log? If the X server crashes under memory pressure without being killed by the oom-killer or something else, then, if I am not wrong, the X server itself probably needs to be improved in terms of memory management.
In terms of restoring a large number of tabs, it could be a good idea to throttle the tab opening rate, especially when the main thread of the main process is busy or thrashing. This would not only shape the memory usage pattern but also maintain good system performance (avoiding performance degradation).
Comment 16 • 8 months ago (Reporter)
In my display manager logs, as root, I found this first log set, and in my user journal I found the second, corresponding log (relatively short, so inline; hope that suits).
-- Boot 77752270a5894e0ea53d6a2dc01a3cc9 --
Aug 06 11:15:48 kinga systemd[1]: Starting Display Manager...
Aug 06 11:15:48 kinga systemd[1]: Started Display Manager.
Aug 06 11:16:24 kinga lightdm[1971]: gkr-pam: unable to locate daemon control file
Aug 06 11:16:24 kinga lightdm[1971]: gkr-pam: stashed password to try later in open session
Aug 06 11:16:24 kinga lightdm[1971]: pam_unix(lightdm:session): session opened for user mikedlr(uid=1000) by (uid=0)
Aug 06 11:16:24 kinga lightdm[1971]: gkr-pam: gnome-keyring-daemon started properly and unlocked keyring
Aug 07 16:32:18 kinga lightdm[1661]: Error updating user /org/freedesktop/Accounts/User1000: Timeout was reached
Aug 07 16:32:47 kinga lightdm[1661]: Error updating user /org/freedesktop/Accounts/User1001: Timeout was reached
Aug 07 16:33:22 kinga lightdm[1661]: Error updating user /org/freedesktop/Accounts/User1000: Timeout was reached
Aug 07 16:35:42 kinga systemd[1]: display-manager.service: Unexpected error response from GetNameOwner(): Connection terminated
Aug 07 16:36:38 kinga lightdm[1661]: Error terminating login1 session: Timeout was reached
Aug 07 16:36:41 kinga systemd[1]: display-manager.service: Deactivated successfully.
Aug 07 16:36:41 kinga systemd[1]: display-manager.service: Consumed 41min 9.380s CPU time, 1.5G memory peak, 1.1G read from disk, 824K written to disk.
Aug 07 16:36:41 kinga systemd[1]: display-manager.service: Scheduled restart job, restart counter is at 1.
Aug 07 16:36:41 kinga systemd[1]: Starting Display Manager...
Aug 07 16:36:42 kinga systemd[1]: Started Display Manager.
Aug 07 16:36:57 kinga lightdm[193445]: gkr-pam: unable to locate daemon control file
Aug 07 16:36:57 kinga lightdm[193445]: gkr-pam: stashed password to try later in open session
Aug 07 16:36:57 kinga lightdm[193445]: pam_unix(lightdm:session): session opened for user mikedlr(uid=1000) by (uid=0)
Aug 07 16:36:57 kinga lightdm[193445]: gkr-pam: gnome-keyring-daemon started properly and unlocked keyring
Aug 07 18:02:01 kinga systemd[1]: Stopping Display Manager...
Aug 07 18:02:02 kinga display-manager[193310]: Failed to get D-Bus connection
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Main process exited, code=exited, status=1/FAILURE
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Killing process 193322 (X) with signal SIGKILL.
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Failed with result 'exit-code'.
Aug 07 18:02:02 kinga systemd[1]: Stopped Display Manager.
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Consumed 3min 7.391s CPU time, 724.5M memory peak, 208.6M read from disk, 140K written to disk.
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Triggering OnFailure= dependencies.
Aug 07 18:02:02 kinga systemd[1]: display-manager.service: Failed to enqueue OnFailure=plymouth-quit.service job, ignoring: Unit plymouth-quit.service not found.
Aug 06 07:29:11 kinga xsession[1939]: cinnamon-session-binary[1939]: CinnamonDesktop-WARNING: t+190904.25376s: Failed to acquire idle monitor proxy: Timeout was reached
Aug 06 07:29:11 kinga cinnamon-session-binary[1939]: CinnamonDesktop-WARNING: t+190904.25376s: Failed to acquire idle monitor proxy: Timeout was reached
Aug 06 07:29:24 kinga .cinnamon-wrapp[2170]: Attempting to remove actor of type 'StIcon' from group of class 'ClutterBox', but the container is not the actor's parent.
Aug 06 07:29:25 kinga .cinnamon-wrapp[2170]: Attempting to remove actor of type 'StIcon' from group of class 'ClutterBox', but the container is not the actor's parent.
Aug 06 08:27:24 kinga dbus-daemon[1951]: [session uid=1000 pid=1951] Activating service name='org.cinnamon.ScreenSaver' requested by ':1.189' (uid=1000 pid=585880 comm="/nix/store/9yh9ak97gn659bk4d3n411fx6c0ng7s2-python")
Aug 06 08:27:25 kinga org.cinnamon.ScreenSaver[585887]: Loading AccountsService
Aug 06 08:27:25 kinga org.cinnamon.ScreenSaver[585887]: Fractional scaling active: False
Aug 06 08:27:25 kinga org.cinnamon.ScreenSaver[585887]: Trying to connect to logind...
Aug 06 08:27:25 kinga dbus-daemon[1951]: [session uid=1000 pid=1951] Successfully activated service 'org.cinnamon.ScreenSaver'
Aug 06 08:27:25 kinga org.cinnamon.ScreenSaver[585887]: Starting screensaver...
Aug 06 08:27:25 kinga org.cinnamon.ScreenSaver[585887]: Cinnamon Screensaver support not found in current theme - adding some...
Aug 06 08:27:29 kinga org.cinnamon.ScreenSaver[585887]: AccountsService ready
Aug 06 08:27:29 kinga org.cinnamon.ScreenSaver[585887]: Successfully using logind
Aug 06 08:28:19 kinga .csd-print-noti[1971]: Source ID 3 was not found when attempting to remove it
Aug 06 08:28:19 kinga cinnamon-settings-daemon-keyboard.desktop[1966]: ![1754453545,000,xklavier_evt_xkb.c:xkl_xkb_process_x_event/] ATTENTION! Currently cached group 1 is not equal to the current group from the event: 0
Aug 06 08:28:19 kinga cinnamon-settings-daemon-keyboard.desktop[1966]: ![1754453545,000,xklavier_evt_xkb.c:xkl_xkb_process_x_event/] ATTENTION! Currently cached group 1 is not equal to the current group from the event: 0
Aug 06 08:28:19 kinga cinnamon-settings-daemon-keyboard.desktop[1966]: ![1754453545,000,xklavier_evt_xkb.c:xkl_xkb_process_x_event/] ATTENTION! Currently cached group 1 is not equal to the current group from the event: 0
Aug 06 08:28:19 kinga xsession[1939]: cinnamon-session-binary[1939]: WARNING: t+194451.76018s: Application 'xapp-sn-watcher.desktop' killed by signal 15
Aug 06 08:28:19 kinga cinnamon-session-binary[1939]: WARNING: t+194451.76018s: Application 'xapp-sn-watcher.desktop' killed by signal 15
Aug 06 08:28:19 kinga cinnamon-settings-daemon-keyboard.desktop[1966]: !
Aug 06 08:28:19 kinga xsession[1939]: cinnamon-session-binary[1939]: WARNING: t+194451.81458s: Application 'cinnamon.desktop' killed by signal 15
Aug 06 08:28:19 kinga cinnamon-session-binary[1939]: WARNING: t+194451.81458s: Application 'cinnamon.desktop' killed by signal 15
Aug 06 08:28:19 kinga pipewire[2102]: mod.x11-bell: X11 I/O error handler called on display :0
Aug 06 08:28:19 kinga pipewire[2102]: mod.x11-bell: X11 display (:0) has encountered a fatal I/O error
Comment 17 • 8 months ago (Reporter)
I've also just had a repeat of the problem whilst relaunching my set of windows (I'm using sideberry's snapshot feature to store and restore sets of tabs).
This time, however, I already had an atop running in a terminal and felt the browser and system slowing down. I killed the top two Isolated web browser processes from atop (sorted by CPU). This immediately made the system responsive and working. Judging from the tabs which showed as having crashed, one of those processes was a Whatsapp web tab that had just been started. The second one was a random (technical) reddit page.
Comment 18 • 8 months ago (Reporter)
Realized that I hadn't answered some questions whilst answering others.
> What is firefox-wrapper? Is it a shell script that run firefox? May I also have it's oom_score?
/nix/store/9psgaidji6ngsabm7ml0k988vidbilnr-firefox-141.0.3/bin/.firefox-wrapped: symbolic link to ../lib/firefox/firefox
So in fact it's firefox-wrapped, and it's really firefox.
> Could you provide oom_score of Firefox processes?
the main / wrapped process has
oom_score 706
oom_adj 0
a randomly chosen content process has
oom_score 780
oom_adj 2
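For anyone wanting to collect the same numbers for every Firefox process at once, here is a minimal sketch in Python that reads them from `/proc`. It assumes a Linux system and reads `oom_score` and `oom_score_adj` (the older `oom_adj` file quoted above is the deprecated form of the latter). Matching on the substring "firefox" in the command line is an assumption about how the processes are named on a given system, and the function names are illustrative.

```python
from pathlib import Path
from typing import Dict, List, Optional


def _read_int(path: Path) -> Optional[int]:
    """Read a single integer from a /proc file; None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_oom_info(pid: int, proc_root: str = "/proc") -> Dict[str, object]:
    """Collect the command line and OOM values for one process."""
    base = Path(proc_root) / str(pid)
    try:
        # cmdline is NUL-separated; join the arguments with spaces.
        raw = (base / "cmdline").read_bytes()
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    except OSError:
        cmdline = ""
    return {
        "pid": pid,
        "cmdline": cmdline,
        "oom_score": _read_int(base / "oom_score"),
        "oom_score_adj": _read_int(base / "oom_score_adj"),
    }


def rank_by_oom_score(needle: str = "firefox",
                      proc_root: str = "/proc") -> List[Dict[str, object]]:
    """Processes whose command line contains `needle`, highest oom_score first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        info = read_oom_info(int(entry.name), proc_root)
        if needle in info["cmdline"] and info["oom_score"] is not None:
            rows.append(info)
    return sorted(rows, key=lambda r: r["oom_score"], reverse=True)
```

Calling `rank_by_oom_score()` while the browser is under memory pressure would list the processes in roughly the order the kernel would consider them for killing, which bears on whether the main process or a content process goes first.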
Comment 19 • 8 months ago
(In reply to Michael De La Rue from comment #17)
> I've also just had a repeat of the problem whilst relaunching my set of windows (I'm using sideberry's snapshot feature to store and restore sets of tabs).
> This time, however, I already had an atop running in a terminal and felt the browser and system slowing down. I killed the top two Isolated web browser processes from atop (sorted by CPU). This immediately made the system responsive and working. Judging from the tabs which showed as having crashed, one of those processes was a Whatsapp web tab that had just been started. The second one was a random (technical) reddit page.
It sounds like the unloader doesn't kick in at all.
According to comment 13, these are new tabs (restored by sideberry).
The unloader has to wait for 60000 ms (60 s) before it can unload them.
Doesn't sideberry do lazy loading on restored tabs?
Comment 20 • 8 months ago (Reporter)
> It sounds like that the unloader doesn't kick in at all.
> According to the comment 13, these are new tabs (restored by sideberry).
It is possible that it doesn't. However, whilst these are new tabs, when I see problems there are also plenty of other old tabs which would be candidates for unloading.
> The unloader has to wait for 60000 ms(60s) to unload them.
> Doesn't sideberry do lazy loading on restored tabs?
Sideberry appears not to, at least in this case. Is that expected to be Sideberry's decision or Firefox's? However, each of the tabs that I'm restoring in my case is a communicator program, e.g. gmail or whatsapp. If they were restored lazily I would have to go to each one to trigger it to check if there are new messages, so what sideberry is doing in this case matches a valuable user intent: "get connected to all my communicator services and tell me if I have any new messages".
What I'd "expect" (hope?) to happen is that the new tabs don't load any faster than memory is available, and in the meantime some background tabs are unloaded to make space for them.
Comment 21 • 8 months ago
(In reply to Michael De La Rue from comment #20)
> > It sounds like that the unloader doesn't kick in at all.
> > According to the comment 13, these are new tabs (restored by sideberry).
> That is possible it doesn't. However whilst these are new tabs, when I see problems that's when there are also plenty of other old tabs which would be candidates for unloading.
> > The unloader has to wait for 60000 ms(60s) to unload them.
> > Doesn't sideberry do lazy loading on restored tabs?
> Sideberry appears not to, at least in this case. Is it expected to be Sideberry or is that a Firefox decision?
Sideberry, as the tab creator, should make the decision.
> What I'd "expect" (hope ?) to happen is that the new tabs don't load any faster than memory is available and in the meantime some background tabs are unloaded to make space for them.
This expectation is reasonable.
Stephen,
What do you think about adding a throttle on tab loading?
I mean limiting the number of tabs being loaded in a period of time and/or based on memory usage.
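To make the throttle idea concrete, here is a minimal sketch in Python of a gate that combines both conditions: a cap on loads started per time window and a floor on available memory. The class name, its parameters and the example values are hypothetical; this is not an existing Firefox mechanism, only one possible shape for the proposal.

```python
from collections import deque


class TabLoadThrottle:
    """Allow a tab load only if the rate cap and the memory floor both permit it."""

    def __init__(self, max_loads: int, period_s: float,
                 min_available_fraction: float) -> None:
        self.max_loads = max_loads                # loads allowed per period
        self.period_s = period_s                  # length of the sliding window
        self.min_available_fraction = min_available_fraction
        self._recent = deque()                    # start times of recent loads

    def try_start_load(self, now: float, available_fraction: float) -> bool:
        """Return True and record the load if it may start now, else False."""
        # Forget loads that started a full period or more ago.
        while self._recent and now - self._recent[0] >= self.period_s:
            self._recent.popleft()
        if available_fraction < self.min_available_fraction:
            return False  # too little free memory: defer the load
        if len(self._recent) >= self.max_loads:
            return False  # too many loads started recently: defer the load
        self._recent.append(now)
        return True
```

A caller would retry deferred loads later, which would also give the tab unloader time to free memory from older background tabs, matching the expectation described in comment 20.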
Comment 22 • 8 months ago (Reporter)
I think the systemd-oomd part of this, separate from the lockup and tab unloading suggestions, might be helped by this enhancement proposal: https://bugzilla.mozilla.org/show_bug.cgi?id=1984223
Comment 23 • 7 months ago (Reporter)
running with:
browser.tabs.unloadOnLowMemory true
#note one minute, not ten minutes
browser.tabs.min_inactive_duration_before_unload 60000
browser.low_commit_space_threshold_mb 8000
browser.low_commit_space_threshold_percent 20
I have not experienced a single system crash, browser crash or even proper browser lockup in days. Generally my browser has felt much faster and more responsive than at any time in the past many months, probably better than chromium. In fact, I only just noticed a problem for the first time, when three different sites were running very heavy loads: the browser slowed down briefly to the level it used to be at almost all the time, and one video didn't play well.
Even in the situation of a slow down, it's now possible to use the about:processes page to identify what tabs are triggering problems and close some of them. That's a vast improvement over the previous situation.
This has been combined with running the earlyoom daemon, as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1974582#c22, but that has not had to trigger once in over a week. I think that reducing the inactive duration is important because in situations where there has been a quick switch from one tab to another to another, for example whilst searching for information, it's very easy for some quite memory-heavy tabs to build up quickly.
I still think that there's a need to handle cgroups and cgroup memory limits correctly, but I think that this actual bug could probably count as solved if tab unloading was turned on by default and triggered more aggressively at about 1/3 of memory.
Comment 24 • 6 months ago
Thank you for sharing your experiences.
You mentioned reducing the inactive duration, but you still keep the default value (10 mins). I would like to know why you decided to keep it.
Comment 25 • 6 months ago (Reporter)
Hi; I definitely adjusted the setting and saw an improvement. I don't know at what point I reverted it, but apparently I have, because the default is what I currently have set and it's generally working okay with no crashes, though I have noticed some occasional slowdowns. I do wonder if I was switching windows more rapidly when testing?