Closed Bug 1771712 Opened 2 years ago Closed 2 years ago

Set oom_score_adj in SetProcessPriority() on Linux to make it less likely for the main process to be killed under memory pressure

Categories

(Core :: Hardware Abstraction Layer (HAL), enhancement)

All
Linux
enhancement

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
relnote-firefox --- 105+
firefox105 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

The introduction of PSI-based means our users encountering more often cases where Firefox gets OOM-killed, especially on machines with very small or no swap files (which ironically are often machines with large amounts of physical memory which should never encounter an OOM in the first place).

To mitigate this issue we can implement hal::SetProcessPriority() on Linux and instruct it to adjust the value of /proc/<pid>/oom_score_adj so that background processes are more likely to be targeted by the OOM killer than foreground processes, or the main process itself.

Note that if we can do this right then we might disable tab unloading on Linux and rely on the system OOM killer to reap background tasks for us (a little bit like we did on Firefox OS where unloading an app was entirely done by signaling the OOM killer which pages we cared less about).

Some very quick testing on my machine shows that even small adjustments to oom_score_adj (such as +100 or even +50) are more than enough to make small child processes have a larger score than a much larger main process. This will however make the affected child processes be more likely to be killed compared to other applications too. That's not necessarily a problem though, if we only make sure that background child processes get killed then Firefox will play nicely with other apps under memory pressure without visibly affecting the user. That isn't a bad outcome.

One last thing worth keeping in mind: the process priority manager was written before Fission, we should double-check that it's still detecting background/foreground processes correctly (it might not).

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

One last thing worth keeping in mind: the process priority manager was written before Fission, we should double-check that it's still detecting background/foreground processes correctly (it might not).

I think @mccr8 fixed this recently.

Flags: needinfo?(continuation)

(In reply to Gabriele Svelto [:gsvelto] from comment #0)

Note that if we can do this right then we might disable tab unloading on Linux and rely on the system OOM killer to reap background tasks for us (a little bit like we did on Firefox OS where unloading an app was entirely done by signaling the OOM killer which pages we cared less about).

It'd be helpful if we still get low memory notifications so that the remaining processes can do some compaction. I think I remember someone saying that tab unloading happes "before" low memory notifications so without tab unloading this should actually be better. I don't have a good picture of how tab unloading and low memory notifications interact.

(In reply to Paul Bone [:pbone] from comment #4)

It'd be helpful if we still get low memory notifications so that the remaining processes can do some compaction. I think I remember someone saying that tab unloading happens "before" low memory notifications so without tab unloading this should actually be better. I don't have a good picture of how tab unloading and low memory notifications interact.

Yes, a process being knowingly targeted by the OOM killer could be used in place of the low-memory detection machinery and we could react to it by sending memory pressure events to the other processes. Right now we detect low memory, try to unload tabs, and if that fails also send memory pressure events (thought that's something I wanted to change).

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

One last thing worth keeping in mind: the process priority manager was written before Fission, we should double-check that it's still detecting background/foreground processes correctly (it might not).

Yes, I fixed this in bug 1618547. You just need to flip the pref on Linux, and the tests should pass and everything. (Of course, it isn't actually changing priorities on Linux, just tracking them.)

Flags: needinfo?(continuation)

Thanks! Given the introduction of systemd-oomd in mainstream distros this is a good time as any to start using it.

This patch implements hal::SetProcessPriority() on Linux and leverages
it to make it more likely that the parent process and foreground tab
survive an OOM situation, letting Linux' OOM killer reap preallocated
processes and background tabs first when reclaiming memory.

This is achieved by setting the oom_score_adj values of said processes
such that they will be killed in this order:

  • Preallocated processes will be the first to go, they don't contain
    user data and are not visible, so they're a good candidate to free up
    memory
  • Background tabs will be killed next, we don't generate crash reports
    for thoes nor do we inform the user, we just reload the tab, so in
    most cases one being killed will only be a small annoyance
  • Background tabs playing video come next, but only if they're not also
    playing audio, see below
  • Finally foreground tabs will be killed as a last resort, background
    tabs playing audio are also considered to be in the foreground as the
    user will immediately notice if they crash

Note that this patch only implements the low-level plumbing. The process
priority manager has not been enabled on Linux yet so that needs to
happen before this actually works. Additionally I need to check how
tabs having an active WebRTC session behave, they need to be considered
as foreground tabs and I'm not sure if we're accounting for them yet.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Blocks: 1782720

I'm running a bunch of tests before landing this because I want to be sure this doesn't interfere with other stuff in automation.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/38595fc61412
Make it more likely for child processes to be killed under OOM conditions compared to the parent process on Linux r=jld
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch

Release Note Request (optional, but appreciated)
[Why is this notable]: Firefox on Linux will be less likely to run out of memory and crash. Additionally if a web page is leaking memory it won't cause other applications to run out of memory, even if it's in the background.
[Affects Firefox for Android]: This is already supported on Firefox for Android, it only affects Firefox desktop on Linux
[Suggested wording]: "Firefox is less likely to run out of memory on Linux, and is better behaved towards the rest of the system when memory is running low."
[Links (documentation, blog post, etc)]:

relnote-firefox: --- → ?

Added to the 105 nightly release notes.

This doesn't work as expected when running as a systemd service, which ships OOMScoreAdjust=100 for user@.service and another +100 because of DefaultOOMScoreAdjust is unset.

Maybe Firefox could read its current oom_score_adj early and adjust on that base.

(In reply to lilydjwg from comment #14)

This doesn't work as expected when running as a systemd service, which ships OOMScoreAdjust=100 for user@.service and another +100 because of DefaultOOMScoreAdjust is unset.

Maybe Firefox could read its current oom_score_adj early and adjust on that base.

Yes, we can do that. I'll file a follow-up.

Blocks: 1787638
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: