Document how the low-memory killer and low-memory notification work

RESOLVED FIXED

Status

Developer Documentation
Firefox OS
4 years ago
4 years ago

People

(Reporter: gsvelto, Assigned: cmills)

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
The low-memory killer and low-memory trigger behavior have never been documented, and information about how they work, how they relate to each other and what chains of events they trigger within Gecko is currently scattered among a plethora of bug comments, commits and mailing list posts. We should gather that information in one place (MDN), both to help new contributors willing to work on those areas and vendors wishing to tune them for their handsets.

Here's a non-exhaustive list of what should be documented:

- How the low-memory killer and trigger work (this is really a Linux kernel feature but it's important for understanding what's going on)

- How we set the killer/trigger parameters and why we do so

- How we monitor low-memory notifications, how we propagate them, what type of notifications are available and how various components react to them if what they do is non-trivial (e.g. ongoing memory-pressure notifications and image caches)

- How killer/trigger parameters are related to process priorities and how we react to process priority changes (e.g. memory minimization for background processes)
This sounds perfectly reasonable, and I am happy to start work on this in the next couple of weeks once I have got some other stuff out of the way. Can you give me some links to the most pertinent bugs, mailing list posts, etc. where this information can be found?
(Reporter)

Comment 2

4 years ago
(In reply to Chris Mills from comment #1)
> This sounds perfectly reasonable, and I am happy to start work on this in
> the next couple of weeks once I have got some other stuff out of the way.
> Can you give me some links to the most pertinent bugs, mailing list posts,
> etc. where this information can be found?

Sorry for the delay, here's a list of relevant bugs:

bug 768832
bug 771195
bug 800166
bug 808756
bug 814771
bug 839312
bug 846065
bug 862970
bug 876029
bug 877222

The current behavior is more or less the cumulative result of those bugs, but it might be tricky to figure out all the details from them. It shouldn't be too hard to document the first two points in my comment above. For the other two I'm willing to help you out and write as much documentation as possible myself, as I already know the topic in depth.
Heh, I'm gonna find it very hard to document this by perusing those bugs ;-). If you don't mind, could you write me out a few rough notes on what happens, how the system works, when and how the notifications/events are fired, when/how a developer can respond to that information? 

I've already had a go at starting an article, see attached to the bug. Let me know if this looks reasonable as a structure, if there is anything missing, etc.

FYI, we already published https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_OOMs, which duplicates some of this information, but I'm thinking it would be best to have two articles:

1. intro to how this all works (what we're currently discussing) 
2. how to debug OOMs (what we've currently got published; I could cut the intro explanation out of this one and put it in the other article)

Does that sound like a plan?
Created attachment 8350663 [details]
Rough draft/notes for the article
(Reporter)

Comment 5

4 years ago
Sorry for the huge delay on this, Chris. Here's a description of how the whole thing works (i.e. what you'll want to put in the first article). I'll try to describe the principle of operation of the whole system in sufficient detail, using your article as an outline; if you need more, feel free to ask me. Some details are often subject to change (e.g. which applications run in the main process, or the behavior of special applications such as the homescreen or keyboard).

FxOS operation involves multiple processes: one "main process" running basic system services and potentially many "child processes". In general every app runs in its own child process. Since applications are rarely closed by the user in the FxOS environment, the system automatically manages their lifetime to make room for new apps or for existing apps requiring additional memory.

Two systems are used in this regard: the low memory killer (LMK from now on) and low memory notifications. The LMK [1] is a subsystem of the Android kernel that automatically kills processes to make room for memory requests. In order to choose which process is killed first to free up memory, each process is assigned a priority via the /proc/<pid>/oom_adj or /proc/<pid>/oom_score_adj [2] file. In general, the larger the adjustment score is, the more likely a process is to be killed. The LMK provides multiple levels, each corresponding to a certain amount of free memory and a minimum adjustment score. Whenever the amount of free memory in the system drops below a certain level, all processes with an adjustment score higher than the minimum specified for that level are eligible to be killed. Among those processes the LMK will start by killing the largest one and keep going until it has freed enough memory to go above the threshold again.
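
To make the selection rule concrete, here is a minimal Python sketch of the behaviour described above. It is a simulation only and does not reflect the kernel driver's code: the names `Process`, `Level`, `pick_min_adj` and `run_lmk` and all the numbers are made up for illustration, and eligibility is assumed to be inclusive (score greater than or equal to the level's minimum). The real driver may rank candidates differently in detail.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    adj: int      # OOM adjustment score; higher means more expendable
    rss_kb: int   # memory given back to the system if the process is killed


@dataclass
class Level:
    min_free_kb: int  # the level applies once free memory drops below this
    min_adj: int      # processes at or above this score become eligible


def pick_min_adj(levels, free_kb):
    """Return the lowest min_adj among the levels that have been crossed,
    or None if free memory is above every level."""
    crossed = [lvl.min_adj for lvl in levels if free_kb < lvl.min_free_kb]
    return min(crossed) if crossed else None


def run_lmk(processes, levels, free_kb):
    """Kill eligible processes, largest first, until no level is crossed
    or nothing eligible is left. Returns (killed names, final free memory)."""
    alive = list(processes)
    killed = []
    while True:
        min_adj = pick_min_adj(levels, free_kb)
        if min_adj is None:
            break
        eligible = [p for p in alive if p.adj >= min_adj]
        if not eligible:
            break
        victim = max(eligible, key=lambda p: p.rss_kb)
        alive.remove(victim)
        killed.append(victim.name)
        free_kb += victim.rss_kb
    return killed, free_kb


# Illustrative values only.
levels = [Level(min_free_kb=4096, min_adj=0), Level(min_free_kb=20480, min_adj=10)]
procs = [
    Process("foreground-app", adj=0, rss_kb=30000),
    Process("background-a", adj=10, rss_kb=8000),
    Process("background-b", adj=10, rss_kb=12000),
]
print(run_lmk(procs, levels, free_kb=15000))
```

With these made-up numbers only the second level is crossed, so only the two background processes are candidates, and killing the larger of them is enough to bring free memory back above the threshold.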

In FxOS we use the different levels provided by the LMK to ensure the following policy in order to free memory:

- The first apps to be killed will be the background apps, starting with the least recently used

- The homescreen application is killed next, when all background applications have been killed

- Background applications which are perceivable by the user are killed next (for example a music player playing audio in the background or an app holding a 'high-priority' or 'cpu' wakelock and having a registered handler for system messages)

- If the keyboard application is running, it will be killed next

- The foreground applications will be killed next

- Finally, foreground applications that have requested a 'high-priority' or 'cpu' wakelock are the last to get killed

This policy is enforced by giving each application a priority level and associating an OOM adjustment score to these levels. The current values are set in prefs and can be found in [3].
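
As a rough illustration of how priority levels can be tied to adjustment scores, here is a small Python sketch. The `KillRank` enum, the `HYPOTHETICAL_OOM_SCORE_ADJ` table and the `kill_order` helper are invented for this example, and the scores are placeholders, NOT the values from the prefs in [3]. The only property that matters is that classes killed earlier get a higher score. Ordering background apps by a last-used timestamp is likewise just one way to model "least recently used first".

```python
from enum import IntEnum


class KillRank(IntEnum):
    """App classes in the order the policy kills them (lowest rank first)."""
    BACKGROUND = 0
    HOMESCREEN = 1
    BACKGROUND_PERCEIVABLE = 2
    KEYBOARD = 3
    FOREGROUND = 4
    FOREGROUND_HIGH = 5


# Placeholder scores, not the real pref values: higher score = killed earlier.
HYPOTHETICAL_OOM_SCORE_ADJ = {
    KillRank.BACKGROUND: 600,
    KillRank.HOMESCREEN: 500,
    KillRank.BACKGROUND_PERCEIVABLE: 400,
    KillRank.KEYBOARD: 300,
    KillRank.FOREGROUND: 200,
    KillRank.FOREGROUND_HIGH: 100,
}


def kill_order(apps):
    """apps is a list of (name, KillRank, last_used) tuples.
    Returns names sorted by descending score, oldest last_used first on ties."""
    ranked = sorted(apps, key=lambda a: (-HYPOTHETICAL_OOM_SCORE_ADJ[a[1]], a[2]))
    return [name for name, _rank, _last_used in ranked]


apps = [
    ("dialer", KillRank.FOREGROUND, 50),
    ("music", KillRank.BACKGROUND_PERCEIVABLE, 40),
    ("homescreen", KillRank.HOMESCREEN, 30),
    ("old-game", KillRank.BACKGROUND, 10),
    ("mail", KillRank.BACKGROUND, 20),
    ("keyboard", KillRank.KEYBOARD, 45),
]
print(kill_order(apps))
```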

There are a couple of exceptions to these rules. First of all, the main process is never killed by the LMK, as doing so would restart the operating system. Second, we keep a process around to speed up the startup of new applications; this is called the preallocated process. It is almost always kept alive because it consumes little memory and significantly speeds up application startup. The only case in which it can be killed is if there's not enough memory available for the main process to keep running after all other processes have been killed.

The second mechanism we use to free memory is low memory notifications. The LMK provides a special threshold which, when crossed, can send notifications to userspace that we're running low on memory. Both the system application and regular user apps continuously wait for this condition and react to it by sending a "memory-pressure" event via the observer service. This event is visible only to C++ or chrome JS code and not directly to an application. Throughout the Gecko codebase we use this event to free as much memory as we can, normally by purging internal caches (images, DNS, sqlite, etc.), dropping assets that can be recreated (WebGL contexts, for example) and running the garbage collector and cycle collector.

When a low memory condition is encountered, the first event sent will have the "low-memory" payload. If we're still in a low memory condition after a predefined time (5s), another event will be fired, this time with the "low-memory-ongoing" payload. This payload is used when we continue to be in a low-memory condition and want to flush caches and do other cheap forms of memory minimization, but heavy-handed approaches like a GC are unlikely to succeed.
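
The sequence of payloads can be sketched as a small state machine. This is a Python model, not Gecko code: `MemoryPressureWatcher`, `react` and the polling-style `update` call are hypothetical names chosen for the example. Only the "memory-pressure" topic, the two payload strings and the 5 second delay come from the description above.

```python
ONGOING_INTERVAL_S = 5  # delay before an "ongoing" event, as described above


class MemoryPressureWatcher:
    """Turns 'are we low on memory?' samples into memory-pressure events."""

    def __init__(self, notify):
        self._notify = notify       # callback taking (topic, payload)
        self._last_event_at = None  # None while not under memory pressure

    def update(self, now, is_low):
        if not is_low:
            # Condition cleared: the next low sample starts a fresh sequence.
            self._last_event_at = None
        elif self._last_event_at is None:
            self._last_event_at = now
            self._notify("memory-pressure", "low-memory")
        elif now - self._last_event_at >= ONGOING_INTERVAL_S:
            self._last_event_at = now
            self._notify("memory-pressure", "low-memory-ongoing")


def react(payload):
    """Hypothetical observer: cheap work always, expensive work only at first."""
    actions = ["purge caches"]
    if payload == "low-memory":
        actions.append("run GC/CC")
    return actions


events = []
watcher = MemoryPressureWatcher(lambda topic, payload: events.append(payload))
for now, is_low in [(0, True), (2, True), (5, True), (7, True),
                    (10, True), (11, False), (12, True)]:
    watcher.update(now, is_low)
print(events)
print([react(p) for p in events])
```

The split in `react` mirrors the intent of the two payloads: an observer does its expensive work once on "low-memory" and restricts itself to cheap cleanup while the condition is ongoing.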

Currently the low memory threshold is set above the LMK level for background applications but below the one for the homescreen, see [5]. So the aggregated action of the LMK and low memory notifications is the following when running out of memory:

1. Kill background apps in least-recently-used order
2. If we didn't free enough memory, send memory-pressure events to all remaining applications; if the condition persists, keep sending events every 5 seconds but mark them as ongoing so the GC/CC doesn't react to them
3. Then kill the homescreen
4. Then kill perceivable background or high-priority background applications
5. Then kill the keyboard app
6. Then kill foreground applications
7. Then kill high priority foreground applications
8. Then kill the preallocated process

[1] https://android.googlesource.com/kernel/common.git/+/edd540ea92954f896bfb7ee0ebf5dfdde6e6cb41/drivers/staging/android/lowmemorykiller.txt

[2] https://www.kernel.org/doc/Documentation/filesystems/proc.txt

[3] http://hg.mozilla.org/mozilla-central/file/545c35907eff/b2g/app/b2g.js#l661

[4] https://www.codeaurora.org/cgit/quic/la//kernel/msm/commit/?id=b3f986cba580b14438b77b42070ebbc77b69d4c4

[5] http://hg.mozilla.org/mozilla-central/file/545c35907eff/b2g/app/b2g.js#l722
No problem Gabriele. I have taken your text and turned it into 

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Out_of_memory_management_on_Firefox_OS

Does this read ok to you? Let me know if the mentions of oom_adj and specific priority numbers are ok. I took them from some previous information I had available.

Also, what part of the text should

https://www.codeaurora.org/cgit/quic/la//kernel/msm/commit/?id=b3f986cba580b14438b77b42070ebbc77b69d4c4

be linked to? You provided this as a reference but didn't link it up with a corresponding reference number.

Finally, the article that references this is at 

https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_OOMs
(Reporter)

Comment 7

4 years ago
(In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #6)
> No problem Gabriele. I have taken your text and turned it into 
> 
> https://developer.mozilla.org/en-US/Firefox_OS/Platform/
> Out_of_memory_management_on_Firefox_OS

This is excellent Chris, thanks!

> Does this read ok to you? Let me know if the mentions of oom_adj and
> specific priority numbers are ok. I took them from some previous information
> I had available.

It's all correct. I will adjust it in the near future, because the oom_adj field is deprecated and we should stop using it in favor of oom_score_adj, but until I land that particular fix this matches exactly the behavior we have in the code.

> also, what part of the text should 
> 
> https://www.codeaurora.org/cgit/quic/la//kernel/msm/commit/
> ?id=b3f986cba580b14438b77b42070ebbc77b69d4c4
> 
> be linked to? You provided this as a reference but didn't link it up with a
> corresponding reference number.

Right, I forgot, that's about low memory notifications so it should have been at the end of this phrase:

"The LMK provides a special threshold that, when crossed, can send notifications to the userspace that is running low on memory."

> Finally, the article that references this is at 
> 
> https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_OOMs

That's good, maybe we could also add a mention in:

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Architecture
(In reply to Gabriele Svelto [:gsvelto] from comment #7)
 
> > Does this read ok to you? Let me know if the mentions of oom_adj and
> > specific priority numbers are ok. I took them from some previous information
> > I had available.
> 
> It's all correct, I will adjust it in the near future because the oom_adj
> field is deprecated and we should stop using it in favor of oom_score_adj
> but until I land that particular fix this matches exactly the behavior we
> have in the code.

Cool!

> > also, what part of the text should 
> > 
> > https://www.codeaurora.org/cgit/quic/la//kernel/msm/commit/
> > ?id=b3f986cba580b14438b77b42070ebbc77b69d4c4
> > 
> > be linked to? You provided this as a reference but didn't link it up with a
> > corresponding reference number.
> 
> Right, I forgot, that's about low memory notifications so it should have
> been at the end of this phrase:
> 
> "The LMK provides a special threshold that, when crossed, can send
> notifications to the userspace that is running low on memory."

Great, thanks. I've added a link.

> > Finally, the article that references this is at 
> > 
> > https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_OOMs
> 
> That's good, maybe we could also add a mention in:
> 
> https://developer.mozilla.org/en-US/Firefox_OS/Platform/Architecture

Where would be best to add in a mention, do you think?
(Reporter)

Comment 9

4 years ago
(In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #8)
> Where would be best to add in a mention, do you think?

I'd say after the "Processes and threads" section.

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Architecture#Processes_and_threads
Cool, thanks - I've added a note to that section.

Any more queries or required changes? If not, I'll close the bug.
(Reporter)

Comment 11

4 years ago
(In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #10)
> Cool, thanks - I've added a note to that section.
> 
> Any more queries or required changes? If not, I'll close the bug.

I think we're good :)
Closed!
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED