Last Comment Bug 768832 - Set oom_adj values for background/foreground content processes
: Set oom_adj values for background/foreground content processes
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: IPC (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
: -- normal (vote)
: mozilla17
Assigned To: Justin Lebar (not reading bugmail)
:
:
Mentors:
Depends on: 762802 766437 777135 780437
Blocks: 785064 b2g-e10s-work 785171 800166
  Show dependency treegraph
 
Reported: 2012-06-27 04:29 PDT by Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
Modified: 2012-10-30 14:20 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Initialize oom_adj for the b2g chrome process to 0 (1.20 KB, patch)
2012-07-30 21:21 PDT, Michael Vines [:m1] [:evilmachines]
no flags Details | Diff | Splinter Review
Patch, v1 (25.20 KB, patch)
2012-07-31 19:41 PDT, Justin Lebar (not reading bugmail)
bzbarsky: review+
Details | Diff | Splinter Review
Part 1b: Address cjones's review coments. (4.68 KB, patch)
2012-08-02 17:08 PDT, Justin Lebar (not reading bugmail)
cjones.bugs: review+
Details | Diff | Splinter Review
Part 1c: Address bz's review comments, move ProcessPriorityManager from dom/browser-element to dom/ipc. (10.02 KB, patch)
2012-08-02 17:09 PDT, Justin Lebar (not reading bugmail)
bzbarsky: review-
Details | Diff | Splinter Review

Description Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-27 04:29:35 PDT
We want to make sure that when memory is low in the system, content processes get killed before anything else.  Further, we want background content processes to be killed before foreground.  A good start for this bug is ensuring that content processes are marked at a more vulnerable oom_adj.  (I forget which values correspond to which classes, but that's in the android source.)

The next step is to dynamically adjust oom_adj for foreground/background.  We can do that here or in a followup.  Basically, content processes with "active" docshells should be foreground, all others background.

We can make this finer grained in followup work, but heuristics become less clear after that.
Comment 1 Justin Lebar (not reading bugmail) 2012-06-27 04:56:36 PDT
The oom_adj (/proc/pid/oom_adj) ranges from -17 to +15.  Higher means more likely to be killed [1].

[1] http://lwn.net/Articles/317814/
Comment 2 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-06-27 05:00:59 PDT
Noth worth doing until we're actually using process-per-app.
Comment 3 Justin Lebar (not reading bugmail) 2012-07-23 12:13:25 PDT
Also doesn't make much sense until we fire an event letting us know that a content process has died.
Comment 4 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-23 12:17:49 PDT
Why is that?  (I don't think it really matters since we need both of these bugs, but just out of curiosity :).)
Comment 5 Justin Lebar (not reading bugmail) 2012-07-26 13:20:32 PDT
This depends on bug 777135 because at the moment, BrowserElementChild doesn't even initialize for apps, which means that none of our message listeners get set up, and therefore visibility doesn't get propagated properly.
Comment 6 Justin Lebar (not reading bugmail) 2012-07-27 17:08:26 PDT
We should also renice these background processes, in a separate bug.

But just to sound off here: This is actually trickier than I'd like, because only root can increase a process's priority.  Since the master Gecko process is not running as root, that means none of our code has permission to increase a process's priority, when it comes into the foreground!
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-27 17:17:13 PDT
You want a followup to bug 762782.

Only root can lower the nice rlimit, but conveniently the b2g process is root!  We don't have much reason to raise the rlimit for content processes anyway, we just /might/ want to nice up background processes, for which we don't need root.  But let's have this discussion in another bug.
Comment 8 Justin Lebar (not reading bugmail) 2012-07-27 17:19:55 PDT
> conveniently the b2g process is root!

Oh!  Well then, that simplifies a lot of things.
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-27 17:21:39 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> You want a followup to bug 762782.
> 
> We don't have much reason to raise the rlimit for content processes

By which I of course meant, "lower the rlimit".
Comment 10 Michael Vines [:m1] [:evilmachines] 2012-07-30 21:21:48 PDT
Created attachment 647427 [details] [diff] [review]
Initialize oom_adj for the b2g chrome process to 0

I've attached a patch that is maybe a first step in this bug.  Currently /system/b2g/b2g runs with oom_adj=-16.  This is not very nice because when the LMK activates it may kill something like rild or adbd before /system/b2g/b2g.  Content processes will inherit this oom_adj, where 0 so much better than -16.  Ideally we'd set the oom_adj for a content process at no less than +1 from the chrome process (fg/bg logic aside).  I can't do that patch though because looks like MPL2 code would need to be touched to make that happen.  If somebody can figure a way to do this in A2 code then let me know :)

FYI: The defaults for the LMK begin to whack oom_adj=1 at 8MB free and oom_adj=0 at 6MB free.
Comment 11 Michael Vines [:m1] [:evilmachines] 2012-07-30 21:29:19 PDT
BTW.  oom_adj is deprecated for oom_score_adj according to kernel/Documentation/feature-removal-schedule.txt
Comment 12 Justin Lebar (not reading bugmail) 2012-07-30 21:30:37 PDT
FYI, I really am working on this bug.  But we're not overlapping much, because most of the work is in determining what's a foreground/background process.
Comment 13 Michael Vines [:m1] [:evilmachines] 2012-07-30 21:51:28 PDT
sgtm.  That patch is about all I can do right now due to software license trauma :(
Comment 14 Justin Lebar (not reading bugmail) 2012-07-31 19:41:28 PDT
Created attachment 647826 [details] [diff] [review]
Patch, v1

r?cjones for hal, r?bz for the rest.

I /think/ this works on the device.

I was able to confirm that process foreground/background notifications are sent properly for out-of-process apps on B2G x86.  And I was able to confirm that oom_adj and nice are set properly for the root process and foreground processes.  (I checked nice using busybox's top -o pid,nice,comm.)

I was not able to test background processes on the device because the home button does not work in either the calculator or task apps.

The oom_adj and nice values for root/foreground/background processes are set arbitrarily.  The device does not recover well from process crashes at the moment, so I don't think it's useful to worry too much about the precise values just yet.
Comment 15 Justin Lebar (not reading bugmail) 2012-07-31 19:49:04 PDT
(In reply to Michael Vines [:m1] from comment #11)
> BTW.  oom_adj is deprecated for oom_score_adj according to
> kernel/Documentation/feature-removal-schedule.txt

I missed this.  Do you know if oom_adj is deprecated in our device's kernel?  If so, do you have any guidance on the values I should use for oom_score_adj?
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2012-07-31 20:32:15 PDT
Comment on attachment 647826 [details] [diff] [review]
Patch, v1

>+++ b/dom/browser-element/ProcessPriorityManager.cpp
>+ProcessPriorityManager::OnContentDocumentGlobalCreated(
>+  nsCOMPtr<nsPIDOMWindow> outerWindow = do_GetInterface(aOuterWindow);

do_QueryInterface?  The above works, but sorta by accident.

Why bother having a void Init() method?  Just because you want to run code after you have a ref to yourself, so can't do it from the constructor?  If so, you should documetn that.

Should this class/file be called HalProcessPriorityManager?  Esp. given the export to mozilla/ ?

It's a little weird to me that browser-element expects Hal to be available without any sort of makefile checking or ifdefs...  Is that correct?

r=me modulo the above.
Comment 17 Justin Lebar (not reading bugmail) 2012-07-31 20:47:55 PDT
> do_QueryInterface?  The above works, but sorta by accident.

Indeed!

> Why bother having a void Init() method?  Just because you want to run code
> after you have a ref to yourself, so can't do it from the constructor?

That, and we don't always create a ProcessPriorityManager -- the root process doesn't get one.

> Should this class/file be called HalProcessPriorityManager?  Esp. given the
> export to mozilla/ ?

I'm happy to rename it or move it into a deeper namespace, but we usually reserve "Hal" for the actual hardware-abstraction code.  For example, the mozilla::dom::battery::BatteryManager is roughly analogous to mozilla::ProcessPriorityManager, but for the Hal battery methods.

Would mozilla::dom::ProcessPriorityManager be any better, do you think?

> It's a little weird to me that browser-element expects Hal to be available
> without any sort of makefile checking or ifdefs...  Is that correct?

Yes; hal is always available, but we have empty fallback methods on unsupported platforms.
Comment 18 Michael Vines [:m1] [:evilmachines] 2012-07-31 20:51:31 PDT
(In reply to Justin Lebar [:jlebar] from comment #15)
> (In reply to Michael Vines [:m1] from comment #11)
> > BTW.  oom_adj is deprecated for oom_score_adj according to
> > kernel/Documentation/feature-removal-schedule.txt
> 
> I missed this.  Do you know if oom_adj is deprecated in our device's kernel?
> If so, do you have any guidance on the values I should use for oom_score_adj?

oom_adj works fine in our kernel.  It's not going away in JB either.  adbd uses it, amongst other stuff in Android.  I think we're fine for now and we can all jump off the cliff together.
Comment 19 Boris Zbarsky [:bz] (still a bit busy) 2012-07-31 20:56:21 PDT
> Would mozilla::dom::ProcessPriorityManager be any better, do you think?

I think so, yes.

> Yes; hal is always available, but we have empty fallback methods on unsupported platforms.

Ah, nice.
Comment 20 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-01 18:26:39 PDT
Comment on attachment 647826 [details] [diff] [review]
Patch, v1

I *really* don't like putting process management into hal.

ipc/glue/ is the best place to put this.  We have a small amount of
platform-specific code in there already, which is really all we need
here.

>diff --git a/b2g/app/b2g.js b/b2g/app/b2g.js

>+pref("hal.processPriorityManager.enabled", true);
>+pref("hal.processPriorityManager.gracePeriodMS", 1000);

These are platform-neutral, but per above would not use "hal.".  Other
ipc-related prefs have gone in "ipc." which may be a good place for
these.

>+pref("hal.processPriorityManager.rootOomAdjust", 0);
>+pref("hal.processPriorityManager.foregroundOomAdjust", 1);
>+pref("hal.processPriorityManager.backgroundOomAdjust", 2);
>+pref("hal.processPriorityManager.rootNice", -1);

"root" is a pretty overloaded name in this context.  Would recommend
"master".

>+pref("hal.processPriorityManager.foregroundNice", 0);
>+pref("hal.processPriorityManager.backgroundNice", 10);

These are not only linux specific but also gonk specific.
"ipc.gonk.*" wfm.

>diff --git a/dom/browser-element/ProcessPriorityManager.cpp b/dom/browser-element/ProcessPriorityManager.cpp

This feels pretty awkward in dom/browser-element, but we definitely
require some glue code.  I defer to bz's judgment.  Didn't look over
closely.

>diff --git a/hal/Hal.cpp b/hal/Hal.cpp

>+void
>+SetProcessPriority(int aPid, ProcessPriority aPriority)
>+{
>+  if (InSandbox()) {
>+    hal_sandbox::SetProcessPriority(aPid, aPriority);

Why do content processes control this?  It's not so much a security
issue, but we know in the master process when processes go
visible/hidden.

>diff --git a/hal/gonk/GonkHal.cpp b/hal/gonk/GonkHal.cpp

(--> ipc/glue.)

>+void
>+SetProcessPriority(int aPid, ProcessPriority aPriority)
>+{
>+  HAL_LOG(("SetProcessPriority(pid=%d, priority=%d)", aPid, aPriority));
>+
>+  const char* priorityStr = NULL;
>+  if (aPriority == PROCESS_PRIORITY_BACKGROUND) {

switch (aPriority)

>+    priorityStr = "background";
>+  } else if (aPriority == PROCESS_PRIORITY_FOREGROUND) {
>+    priorityStr = "foreground";
>+  } else if (aPriority == PROCESS_PRIORITY_ROOT) {
>+    priorityStr = "root";
>+  } else {
>+    MOZ_ASSERT(false);

MOZ_NOT_REACHED()

>diff --git a/hal/sandbox/PHal.ipdl b/hal/sandbox/PHal.ipdl

>+  virtual bool
>+  RecvSetProcessPriority(const int& aPid, const ProcessPriority& aPriority)
>+  {
>+    // TODO As a security check, we should ensure that aPid is either the pid
>+    // of our child, or the pid of one of the child's children.

I'm not convinced this is needed, but we can get to the pid from
PHalParent so there's no need to send it.
Comment 21 Justin Lebar (not reading bugmail) 2012-08-01 21:40:17 PDT
> Why do content processes control this?  It's not so much a security
> issue, but we know in the master process when processes go
> visible/hidden.

We do?  Maybe I've gone about this entirely wrong, then.  Could you point me at some code?

> we can get to the pid from PHalParent so there's no need to send it.

...until we have nested content processes, right?  I did it this way specifically to support that case.
Comment 22 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-01 23:45:05 PDT
(In reply to Justin Lebar [:jlebar] from comment #21)
> > Why do content processes control this?  It's not so much a security
> > issue, but we know in the master process when processes go
> > visible/hidden.
> 
> We do?  Maybe I've gone about this entirely wrong, then.  Could you point me
> at some code?
> 

frame.setVisible(true) == foreground, frame.setVisible(false) == background, and recursively from there.

> > we can get to the pid from PHalParent so there's no need to send it.
> 
> ...until we have nested content processes, right?  I did it this way
> specifically to support that case.

Hmmm.  My preference is to propagate these changes down the tree rather than up.  Is it simpler to implement bubbling this upwards?
Comment 23 Justin Lebar (not reading bugmail) 2012-08-02 06:45:47 PDT
> frame.setVisible(true) == foreground, frame.setVisible(false) == background, and recursively from 
> there.

Oh, you're saying that we know because the visibility change comes from a call to BrowserElementParent::SetVisible?  That's interesting; I hadn't thought of that!

BrowserElementParent doesn't currently get an event when the frame is removed from the DOM, for example, or even when the embedding page is navigated.  But I guess we could add a DOM mutation observer, to detect when the frame is added/removed, and a visibilitychange listener to detect when the page is not shown (navigated, or otherwise hidden).

I don't know if that would be an improvement over what we currently have.  I think it's better to be less-tied to mozbrowser where we can be, all things being equal, and the existing code will work for any OOP implementation, current or future.

> Hmmm.  My preference is to propagate these changes down the tree rather than up.

I thought you had to be root to decrease your nice score.  (I also don't know if you can modify your own oom_adj score if you're not root; doubtful, though.)
Comment 24 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-02 08:20:46 PDT
/proc/[plugin-container]/oom_score_adj is +rw for app_0 on my nexus s.  Generally, with these kinds of things any process with equivalent rights can *raise* nice / oom_adj within a fixed resource limit, but it takes root privs to *lower* that limit.
Comment 25 Justin Lebar (not reading bugmail) 2012-08-02 08:23:52 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #24)
> /proc/[plugin-container]/oom_score_adj is +rw for app_0 on my nexus s. 
> Generally, with these kinds of things any process with equivalent rights can
> *raise* nice / oom_adj within a fixed resource limit, but it takes root
> privs to *lower* that limit.

Exactly, and we have to go both up and down with oom_adj and nice.  That's why I was doing it from the master process, which runs as root.
Comment 26 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-02 08:26:34 PDT
The "within fixed resource limit" was important there.  If my limit for nice is 2, e.g., then I can raise up arbitrarily up to 20 and back down to 2, but to go *below* 2, I need root.
Comment 27 Justin Lebar (not reading bugmail) 2012-08-02 13:07:02 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #26)
> The "within fixed resource limit" was important there.  If my limit for nice
> is 2, e.g., then I can raise up arbitrarily up to 20 and back down to 2, but
> to go *below* 2, I need root.

Ah, okay.

On my newly-flashed Otoro, /proc/self/oom_adj and /proc/self/oom_score_adj are owned by root and are rw-r--r--.  So we're still stuck with proxying this up to the main process, right?
Comment 28 Justin Lebar (not reading bugmail) 2012-08-02 16:05:02 PDT
So...I was wrong yet again about why we need to be root here, but I still think the net result is that we need to be root:

On my Otoro, /proc/<pidof plugin-container>/oom_adj is owned by app0.  So the plugin-container can set its own oom_adj.  But it does not appear to have an rlimit; that is, plugin-container can increase its oom_adj, but never decrease it.
Comment 29 Justin Lebar (not reading bugmail) 2012-08-02 17:08:29 PDT
Created attachment 648574 [details] [diff] [review]
Part 1b: Address cjones's review coments.
Comment 30 Justin Lebar (not reading bugmail) 2012-08-02 17:09:36 PDT
Created attachment 648575 [details] [diff] [review]
Part 1c: Address bz's review comments, move ProcessPriorityManager from dom/browser-element to dom/ipc.

Boris, I'd like you to take a quick look particularly to double check that my handling of the refcounting in the ProcessPriorityManager constructor is sane.
Comment 31 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-02 23:52:01 PDT
Comment on attachment 648574 [details] [diff] [review]
Part 1b: Address cjones's review coments.

Best we can do, because teh lunix implements oom_adj in a dumb way.  Oh well.
Comment 32 Boris Zbarsky [:bz] (still a bit busy) 2012-08-03 20:05:21 PDT
Comment on attachment 648575 [details] [diff] [review]
Part 1c: Address bz's review comments, move ProcessPriorityManager from dom/browser-element to dom/ipc.

I don't think this constructor setup is all that great, esp. if someone subclasses this class sometime.

Better to leave the Init() function and document why it's there.
Comment 33 Justin Lebar (not reading bugmail) 2012-08-03 21:00:01 PDT
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #32)
> I don't think this constructor setup is all that great, esp. if someone
> subclasses this class sometime.
> 
> Better to leave the Init() function and document why it's there.

sgtm.  If the rest of the patch looks OK, I'll land with the old Init() function and your r+ on that patch.
Comment 34 Boris Zbarsky [:bz] (still a bit busy) 2012-08-03 21:14:10 PDT
That sounds fine to me.
Comment 35 Ed Morley [:emorley] 2012-08-05 08:49:13 PDT
https://hg.mozilla.org/mozilla-central/rev/0e213ba77dca

Note You need to log in before you can comment on or make changes to this bug.