Open Bug 1782178 Opened 3 years ago Updated 13 days ago

Split low-memory detection between low physical memory and low commit-space using different mechanisms

Categories

(Core :: Memory Allocator, task)

Unspecified
Windows
task

Tracking

()

People

(Reporter: gsvelto, Unassigned)

References

(Blocks 1 open bug)

Details

The current code in the AvailableMemoryTracker/AvailableMemoryWatcher evolved from the old tracker in the hope that we could better detect low-memory condition and unload tabs directly upon hitting them. However we still use a mixed system to detect these conditions, first using Windows' memory resource notifications then polling with GlobalMemoryStatusEx().

The main issue with this system is that memory resources check for low physical memory while what the watcher tries to detect is low commit space. The reason for this mismatch is that outside of memory resources we didn't have a non-polling mechanism and for power-consumption reasons we wanted to move away from polling. We want to remove this mismatch for two reasons:

  • While we used this detection to drive tab unloading it only partially worked, unloading a tab requires memory for shutdown so it lead to more OOM crashes in child processes, as the unloaded process tried to shut down cleanly (main process crash rate also went down, which is something we want to retain)
  • Because low commit-space detection relies on low physical memory we would miss scenarios where we'd be low on commit-space but not on physical memory

Bug 1716727 gives us a new non-polling signal to detect when we're low on commit-space: if an allocation failed and we recovered from that condition. So I propose the following:

  • Use memory resources to detect low physical memory and unload tabs in response to this signal (low physical memory means the user performance is degraded by swapping and unloading a tab is going to help, this would also be useful to kill background tabs that might have been leaking memory)
  • Use low commit-space detection to drive memory-pressure events - these are guarantee to reduce our memory footprint without requiring a memory usage spike in the meantime
Depends on: 1784039

Since we've been talking about this I'd add an extra note for non-Windows platforms. On macOS there is no mechanism to avoid going out-of-memory because macOS always expands the swap file as much as needed (up to two times the size of the physical memory!), but we have a signal that we're swapping, so we should use that for tab unloading but never try to avoid an OOM (because it can't happen). Check this code for the macOS stuff. On Linux on the other hand the situation is more complicated. We used to poll the system memory info but that's a really bad way to measure available memory: it's slow, prone to errors and doesn't take into account things like control groups. We've disabled acting on memory pressure events and unloading tabs because of that (see bug 1780058 and bug 1847396). So, on Linux we can use PSI to detect swapping (:thinker just landed support in bug 1982963) but we need a separate signal to detect OOM conditions. Because we can only observe process kills by the OOM killer from the outside I propose the following: in bug 1771712 I've added machinery to prioritize which processes get killed first when the OOM killer needs to pick one. The preloaded processes are the best candidate for being the very first to be reaped when memory is low, so we can count on them going away first. If we can observe that a preloaded process has received a SIGKILL we could use that as a pretty clear signal that memory is low. That should work with any type of machinery that honors OOM adjustement scores (which I hope should cover most of the possible ways of enforcing memory limitations). Jed would it be possible to use the process watcher to detect this condition?

Flags: needinfo?(jld)

(In reply to Gabriele Svelto [:gsvelto] from comment #1)

If we can observe that a preloaded process has received a SIGKILL we could use that as a pretty clear signal that memory is low. That should work with any type of machinery that honors OOM adjustement scores (which I hope should cover most of the possible ways of enforcing memory limitations). Jed would it be possible to use the process watcher to detect this condition?

We don't have infrastructure for that yet, but making it easier to add that (by centralizing where the process status is obtained) was one of the ideas I had with the ProcessWatcher rewrite, and I filed it a while back as bug 1752625. I've commented there with a sketch of how it could be implemented, and what this use case might look like. (Note that the comment over there about the fork server is obsolete; that was bug 1752638 and it's fixed now.)

Flags: needinfo?(jld)

I am not sure if I read the question and the context correctly. If a preallocated process is killed, it's IPC socket for PContent should be disconnected immediately. Isn't it good enough?

(In reply to Thinker Li [:sinker] from comment #3)

I am not sure if I read the question and the context correctly. If a preallocated process is killed, it's IPC socket for PContent should be disconnected immediately. Isn't it good enough?

That's a good point: if a preallocated process exits unexpectedly (from the IPC channel being disconnected when it isn't already in shutdown), we may not strictly know whether it was SIGKILL, but the OOM killer is probably the reason why (because why else would that process exit unexpectedly when it's idle?).

Very good point, any proxy signal would be suitable and is better than what we have now. Thanks to both.

You need to log in before you can comment on or make changes to this bug.