Closed Bug 1082290 Opened 10 years ago Closed 9 years ago

Implement cgroup swappiness feature for low memory target (256 MB)

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(feature-b2g:2.2+, firefox36 wontfix, firefox37 wontfix, firefox38 fixed, b2g-v2.1 affected, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S5 (6feb)
feature-b2g 2.2+
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- fixed
b2g-v2.1 --- affected
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: tkundu, Assigned: dhylands)

References

Details

(Whiteboard: [caf priority: p2][CR 787261][ft:media] [2.2 target])

Attachments

(4 files)

Hi,

we saw that Android kitkat has a nice feature as below : 

If memory cgroups are available, the ActivityManager will mark lower priority threads as being more swappable than other threads. If memory is needed, the Android kernel will start migrating memory pages to zRAM swap, giving a higher priority to those memory pages that have been marked by ActivityManager. 

ref: https://source.android.com/devices/low-ram.html#zram


This will help not to swap high priority process from memory and it will increase performance over all on zram enabled device.
[Blocking Requested - why for this release]:
No longer blocks: CAF-v2.1-FC-metabug
blocking-b2g: --- → 2.1?
blocking-b2g: 2.1? → 2.2?
This looks like a useful feature. I've filed bug 1081871 just a few days ago to resurrect the experimental cgroup support we had and see if it works well on KK devices.
See Also: → 1081871
blocking-b2g: 2.2? → ---
feature-b2g: --- → 2.2?
It's supported after Linux Kernel 3.10. Since it depends on kernel configure and kernel version. It seems to be better to enable this feature by partner in current stage. For long term, we may need to create a certified API for this kind of use for system app. Then system app can have more flexible control for other apps. But we need to have more discussion with it.
Blocks: 994518
Assignee: nobody → dhylands
(In reply to Ken Chang[:ken] from comment #3)
> It's supported after Linux Kernel 3.10. 

We can assume the 3.10 kernel for the purposes of implementing this bug in the v2.2 timeframe.
This could be implemented as part of the HAL's SetProcessPriority() function and be transparent to the rest of the codebase:

http://hg.mozilla.org/mozilla-central/file/d1adecc7adad/hal/gonk/GonkHal.cpp#l1536

We should probably create and populate the /mnt/gentoo/sys/fs/cgroup/memory group ourselves and verify it's working by looking for cgroup.procs file.

(In reply to Michael Vines [:m1] [:evilmachines] from comment #4)
> We can assume the 3.10 kernel for the purposes of implementing this bug in
> the v2.2 timeframe.

It would be worth tracking bug 1094121 and its dependencies to figure out when we have something usable for testing.
See Also: → gonk-L
Whiteboard: [ft:media] [2.2 target]
Target Milestone: --- → 2.2 S4 (23jan)
Quick note, once this lands we should remember to update the documentation here:

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Architecture#Processes_and_threads
Hema, is it something related to media?
Flags: needinfo?(hkoka)
(In reply to Kevin Hu [:khu] from comment #7)
> Hema, is it something related to media?

No, this touches all applications, it's not media-specific.
Flags: needinfo?(hkoka)
That's why I asked. I don't think it's a media issue. But, ft:media was tagged in the whiteboard.
It's probably ft:media because I'm on the media team, and working on it.
feature-b2g: 2.2? → 2.2+
QA Whiteboard: [2.2-feature-qa+]
Flags: in-moztrap?(twen)
Status: NEW → ASSIGNED
Do we have any update for this bug? Do we plan to finish this bug by 1/12, the branch date? 
Thank you.
Flags: needinfo?(dhylands)
(In reply to Kevin Hu [:khu] from comment #11)
> Do we have any update for this bug? Do we plan to finish this bug by 1/12,
> the branch date? 

This kind-of depends on bug 1081871 which is close to being finished and which we also want to land before the 1/12 date. If that bug lands then implementing this on top of it should be very easy.
(In reply to Kevin Hu [:khu] from comment #11)
> Do we have any update for this bug? Do we plan to finish this bug by 1/12,
> the branch date? 
> Thank you.

I've been reading up on cgroups and just reviewing 1081871 (will be r+'ing today).

My reading of this now, is that once bug 1081871 lands, all that's required are some tweaks in files which are under vendor control (as discussed in https://source.android.com/devices/tech/low-ram.html#zram - there are some kernel configs to enable, and a line that needs to be added to fstab).

khu - the changes needed (kernel config and fstab change) will be phone specific. Is there a particular phone that this is being targeted for?

I can test on my flame, but I don't have any 256 Mb phone.
Flags: needinfo?(dhylands)
Flags: needinfo?(khu)
Depends on: 1081871
(In reply to Dave Hylands [:dhylands]{PTO until Jan 5) from comment #13)
> My reading of this now, is that once bug 1081871 lands, all that's required
> are some tweaks in files which are under vendor control (as discussed in
> https://source.android.com/devices/tech/low-ram.html#zram - there are some
> kernel configs to enable, and a line that needs to be added to fstab).

We should probably add a little bit of logic on top of bug 1081871 too. The cgroup hierarchy which I've implemented in bug 1081871 only applies to the cpu cgroup. The memory cgroup will have a separate hierarchy living under another path (/sys/fs/cgroup/memory apparently). We can re-use the same layout we have for both hierarchies but we'll need to treat them separately: we'll have two cgroup.procs file per priority level, one under /dev/cpuctl/<path>/cgroup.procs and another under /sys/fs/cgroup/memory/<path>/cgroup.procs and the configuration files will also need to be adjusted separately (/dev/cpuctl/<path>/cpu.shares vs /sys/fs/cgroup/memory/<path>/memory.swappiness).
The flame already has zram0 swap enabled, so once bug 1081871 lands, then this should be implemented.
Attachment #8546280 - Flags: review?(mwu)
Attachment #8546280 - Flags: review?(mwu) → review+
(In reply to Dave Hylands [:dhylands] from comment #13)
> (In reply to Kevin Hu [:khu] from comment #11)
> > Do we have any update for this bug? Do we plan to finish this bug by 1/12,
> > the branch date? 
> > Thank you.
> 
> I've been reading up on cgroups and just reviewing 1081871 (will be r+'ing
> today).
> 
> My reading of this now, is that once bug 1081871 lands, all that's required
> are some tweaks in files which are under vendor control (as discussed in
> https://source.android.com/devices/tech/low-ram.html#zram - there are some
> kernel configs to enable, and a line that needs to be added to fstab).
> 
> khu - the changes needed (kernel config and fstab change) will be phone
> specific. Is there a particular phone that this is being targeted for?
> 
> I can test on my flame, but I don't have any 256 Mb phone.

Hi, Dave, I talked with QA. They are also using Flame with 319MB configuration now. We need a better solution for 256MB memory usage testing. I will raise the discussion and figure it out. Sorry for the inconvenience. 

Keep the ni to me for tracking.
https://github.com/dhylands/codeaurora_kernel_msm/commit/aff4ca737d0966bf0498f83e977ce9a3178bdf24
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Reopening, since gsvelto pointed out that I'm missing some stuff.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Android seems to keep the memory cgroups in /sys/fs/cgroup/memory

Sigh - I don't know why I didn't appreciate that this patch would be required. But here it is.

I verified that the groups get created and that the process moves around as it changes priority (from fg to bg).
Attachment #8547794 - Flags: review?(gsvelto)
Comment on attachment 8547794 [details] [diff] [review]
Added memory cgroup to FxOS process control

Review of attachment 8547794 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM save for a few code formatting nits.

::: hal/gonk/GonkHal.cpp
@@ +1400,5 @@
>    NS_NAMED_LITERAL_CSTRING(kDevCpuCtl, "/dev/cpuctl/");
>    NS_NAMED_LITERAL_CSTRING(kSlash, "/");
>  
> +  nsAutoCString groupName(aGroup);
> +  HAL_LOG("EnsureCpuCGroupExists for group '%s'", groupName.get());

nit: Do you think this is really needed? I'd prefer to have log messages only when something unexpected happens so that we aren't too chatty when everything works.

@@ +1432,5 @@
>      }
>  
>      offset++;
>    }
> +  HAL_LOG("EnsureCpuCGroupExists created group '%s'", groupName.get());

nit: Likewise

@@ +1461,5 @@
> +  NS_NAMED_LITERAL_CSTRING(kMemCtl, "/sys/fs/cgroup/memory/");
> +  NS_NAMED_LITERAL_CSTRING(kSlash, "/");
> +
> +  nsAutoCString groupName(aGroup);
> +  HAL_LOG("EnsureMemCGroupExists for group '%s'", groupName.get());

nit: Likewise

@@ +1471,5 @@
> +  if (!aGroup.IsEmpty()) {
> +    prefPrefix += aGroup + NS_LITERAL_CSTRING(".");
> +  }
> +
> +  nsAutoCString memSwappinessPref(prefPrefix + NS_LITERAL_CSTRING("memory_swappiness"));

nit: This and other lines go beyond the 80-characters-per-line limit we have

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Line_Length

I'm not to fond of it, especially for C++ code that tends to be quite verbose, but consistency is also important.

@@ +1488,5 @@
> +      return false;
> +    }
> +
> +    offset++;
> +  }

nit: Since we use it in two places it might be worth extracting this code into its own function, I'd call it MkdirDashP() :-P

@@ +1489,5 @@
> +    }
> +
> +    offset++;
> +  }
> +  HAL_LOG("EnsureMemCGroupExists created group '%s'", groupName.get());

nit: Same as for the previous log messages

@@ +1498,5 @@
> +                   nsPrintfCString("%d", memSwappiness).get())) {
> +    HAL_LOG("Could not set the memory.swappiness for group %s", memSwappinessPath.get());
> +    return false;
> +  }
> +  HAL_LOG("Set memory.swappiness for group %s to %d", memSwappinessPath.get(), memSwappiness);

nit: Same as for the previous log messages

@@ +1569,1 @@
>    }

nit: I'd add an empty line between the closing brace and the following if() for extra clarity
Attachment #8547794 - Flags: review?(gsvelto) → review+
Yeah - the HAL_LOG's were just for debugging and I forgot to remove them.

I factored out the directory creation and created:

static bool MakeCGroupDir(const nsACstring& aRootDir,
                          const nsACString& aGroupName)

rather than MakeDirDashP since it isn't really mkdir -p (i.e. it only creates the directories within the group name.
https://hg.mozilla.org/mozilla-central/rev/f9625445803a
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
(In reply to Dave Hylands [:dhylands] from comment #21)
> Yeah - the HAL_LOG's were just for debugging and I forgot to remove them.
> 
> I factored out the directory creation and created:
> 
> static bool MakeCGroupDir(const nsACstring& aRootDir,
>                           const nsACString& aGroupName)
> 
> rather than MakeDirDashP since it isn't really mkdir -p (i.e. it only
> creates the directories within the group name.

My suggestion was meant only as a joke, this makes the intent clearer too, thanks!
About 256MB testing devices, we're targeting Dolphin 2.1 and Woodduck now, and around the end of January, we will change another one. Please let me know if you want to know more in details, and/or anything I can help here. Thanks.
Flags: needinfo?(khu)
Talked with Teri. Now, the first testing we would like to perform is to see if we have improvement with this bug fix. In this case, before we have better 256MB device to test, we can try to verify it on Flame 319MB.
(In reply to Kevin Hu [:khu] from comment #25)
> About 256MB testing devices, we're targeting Dolphin 2.1 and Woodduck now,
> and around the end of January, we will change another one.

Note that his is not scheduled for uplift to the 2.1 branch. I'm setting the appropriate flags to have it uplifted to 2.2 because it was scheduled for it but for 2.1 we'd need approval for both this and its dependency, bug 1081871.
Backed out per bug 1081871 comment 31.

https://hg.mozilla.org/integration/b2g-inbound/rev/869662d3bd76
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 2.2 S4 (23jan) → ---
Whiteboard: [ft:media] [2.2 target] → [CR 787261][ft:media] [2.2 target]
Whiteboard: [CR 787261][ft:media] [2.2 target] → [caf priority: p2][CR 787261][ft:media] [2.2 target]
Now that bug 1081871 has re-landed we can re-land this too.
Keywords: checkin-needed
This was merged a few days ago and got missed.

https://hg.mozilla.org/mozilla-central/rev/1affa6bf62b8
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Comment on attachment 8547794 [details] [diff] [review]
Added memory cgroup to FxOS process control

[Approval Request Comment]
Same as bug 1081871 comment 48
Attachment #8547794 - Flags: approval-mozilla-b2g37?
Attachment #8547794 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
NI Walter to help with MTBF run.
Flags: needinfo?(wachen)
I happened to finish a run of yesterday's build. I think there is no white screen / hung issue. However, the MTBF time is too short. There must be some memory issues or other issues. I will try to look into it tomorrow.
Blocks: MTBF-2015Q1
The MTBF results is worse comparing to pre-FL. It used to be more than 10 hours. It's now only around 9 hours or less. We see lots of strange stuffs. For example, sudden shutdown of device and high memory usage.

I have attached some of our logs. This is for one devices that shutdown automatically. We suspect there are some issues regarding the NICE value or power. Please let us know what we can do to see further into this issues.

We have our memory bug filed in another bug.
Flags: needinfo?(wachen)
I saw Nexus 5 running in MTBF shutdown in the same way -> device shutdown when we are running test. I think we might need further help into this issue.
Flags: needinfo?(twen)
If your MTBF setup is simple to get going, we can try to run it over here as well.  If the issues are reproducible on our hardware then we're better equipped to debug sudden device shutdowns (ie, kernel or other non-userspace crashes).
No longer blocks: CAF-v3.0-FL-metabug
NI Walter for MTBF help.
Flags: needinfo?(twen) → needinfo?(wachen)
Hi, Michael, I think the setup should be as easy as gaiatest (or even easier).
I believe we met another issue might relate with this bug again.
Let's Alison do the clarification :P
Blocks: MTBF-B2G
Flags: needinfo?(wachen) → needinfo?(ashiue)
I encountered Nexus 5 shut down issue twice when running MTBF testing (about 12 hours). 
Although the device looks like shut down, I still could use adb shell but have no logcat and no data from b2g-ps.
Besides, the serialNumber has been changed to 0123456789ABCDEF. 
Any idea for this symptom?
Flags: needinfo?(ashiue)
Attached file last_kmsg
I hit this issue again yesterday (run MTBF about 8 hours), attached last_kmsg for reference.
(In reply to Alison Shiue from comment #44)
> I hit this issue again yesterday (run MTBF about 8 hours), attached
> last_kmsg for reference.

The warning messages seems to come from one of the kernel drivers; are you sure that the issue is related to the patch here? Otherwise I'd move testing this issue to another bug so that we better track it.
(In reply to Gabriele Svelto [:gsvelto] from comment #45)
> (In reply to Alison Shiue from comment #44)
> > I hit this issue again yesterday (run MTBF about 8 hours), attached
> > last_kmsg for reference.
> 
> The warning messages seems to come from one of the kernel drivers; are you
> sure that the issue is related to the patch here? Otherwise I'd move testing
> this issue to another bug so that we better track it.

No, I'm not sure if this issue is related to this patch. This warning message is just for reference if  it can provide more idea about this problem since I cannot get much information from the test device under the strange status. 

And I think track this issue to another bug is a good idea.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: