Closed Bug 1119467 Opened 10 years ago Closed 8 years ago

Implement a workaround to have working cgroup assignments on ICS-based devices

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: gsvelto, Assigned: gsvelto)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1081871 +++

Almost all Firefox OS ICS-based devices have cgroup functionality enabled by default in their kernel. Unfortunately, that functionality is plagued by a bug that prevents using the cgroup.procs pseudo-file directly to assign a PID to a group. Apparently the assignment doesn't take effect until the PID is moved to a different group, so the process always behaves as if it were still in the previous group it was assigned to. See bug 1081871 comment 25 for the STR for this problem.

One way to work around this issue on ICS devices is to duplicate the cgroup hierarchy by creating a set of dummy control groups with the same characteristics as the real ones, as suggested by :dhylands in bug 1081871 comment 26.

Assigning a process to a group will now first assign it to the dummy group, then to the real one. This way if by chance the device has a working cgroup implementation the process will always end up in the right group; and if the device is buggy it will still behave correctly.
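
A minimal sketch of the double assignment described above, written in Python rather than in the C++ of the actual patch. The function name `assign_to_cgroup`, the `_dummy` suffix, and the `root` parameter are hypothetical and exist only for illustration; the `/dev/cpuctl` default mirrors the mount point that appears elsewhere in this bug, and `cgroup.procs` is the pseudo-file named above.

```python
import os


def assign_to_cgroup(pid, group, root="/dev/cpuctl", dummy_suffix="_dummy"):
    """Write pid to the dummy twin of `group`, then to `group` itself.

    The "_dummy" suffix and the directory layout are assumptions made for
    this sketch, not the naming used by the actual patch.
    """
    real_dir = os.path.join(root, group)
    dummy_dir = real_dir + dummy_suffix
    # Order matters: dummy group first, real group last.
    for directory in (dummy_dir, real_dir):
        # On a real cgroup mount, creating the directory creates the group.
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "cgroup.procs"), "w") as procs:
            procs.write(f"{pid}\n")
    return dummy_dir, real_dir
```

On a kernel without the bug the second write simply wins. On a buggy kernel the second write is the move that makes the assignment take effect, which is the behaviour the workaround relies on.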
Blocks: b2g-nexuss
The Nexus S is affected: following the STR in bug 1081871 comment 25, I see both processes at 50% CPU until I run "echo $pid1 > /dev/cpuctl/cgroup.procs".
This patch creates dummy CPU control groups on ICS and uses them so that we have properly working cgroup functionality. Note that I've removed a couple of exit points from EnsureCpuCGroupExists() so that the function returns on failure only when it makes sense to, because it won't be able to execute what comes afterwards.

One easy way to test this is with the following STR:

1. Download the "Sketchbook Squad" app from the marketplace and launch it
2. While the intro animation is running, quickly go back to the homescreen
3. The icons on the homescreen take forever to reappear, as the bug is causing the newly backgrounded app to run with high priority and the homescreen to run with low priority even though it's in the foreground

After applying my patch the homescreen responds immediately on my hamachi. I've checked this on my Keon too, and it seems that GP has also re-enabled cgroups support there, so it's working flawlessly. I assume this applies to the Peak too. Alexandre, it would be cool if you could test this on your Nexus S too.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Attachment #8552451 - Flags: review?(dhylands)
Attachment #8552451 - Flags: feedback?(lissyx+mozillians)
On my Nexus S the situation seems better than without the nice values and without cgroups, but it still feels quite laggy.
Comment on attachment 8552451 [details] [diff] [review]
[PATCH] Make CPU cgroup assignments work under ICS

This is not working as expected at all: on my Nexus S, this breaks WiFi!

I cannot explain why, because it seems totally unrelated, but when this patch is applied the WiFi driver fails to load properly.

We get this in |dmesg|:
> <6>[   21.720641] wake enabled for irq 164
> <4>[   24.720878] dhd_bus_rxctl: resumed on timeout
> <4>[   26.722281] dhd_bus_txctl: ctrl_frame_stat == TRUE
> <4>[   29.723565] dhd_bus_rxctl: resumed on timeout
> <4>[   29.726282] dhdsdio_probe: failed

Proper |dmesg| is:
> <4>[   21.915321] DHD: dongle ram size is set to 294912(orig 294912)
> <6>[   22.271058] wake enabled for irq 164
> <4>[   22.277337] Firmware version = wl0: Feb 11 2011 17:01:05 version 4.218.248.23
> <4>[   22.443380] wlan0: Broadcom Dongle Host Driver mac=xx:xx:xx:xx:xx:xx
> <4>[   22.444362] 
> <4>[   22.444366] Dongle Host Driver, version 4.218.248.23
> <4>[   22.447995] current firmware_path[]=/vendor/firmware/fw_bcm4329.bin
> <4>[   22.448130] GOT STA FIRMWARE
> <4>[   22.448197] SET firmware_path[]=/vendor/firmware/fw_bcm4329.bin , str_p:bf02b0e0
> <6>[   30.223141] request_suspend_state: wakeup (0->0) at 30212879481 (2015-01-21 22:25:48.083427086 UTC)
> <7>[   35.154903] wlan0: no IPv6 routers present

And the |lsmod| output shows the bcm4329 module in the "Loading" state instead of "Live". This also leaves the device not working as expected, with some kind of lockup where nothing works (B2G frozen at WiFi loading, losing adb, ...).
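
A minimal sketch for checking a module's state programmatically, assuming the usual /proc/modules layout of name, size, use count, dependents, state and address on each line. `module_state` is a hypothetical helper, and the sample line in the usage note is made up for illustration, not captured from the device.

```python
def module_state(name, modules_text):
    """Return the state field for `name` from /proc/modules-style text, or None."""
    for line in modules_text.splitlines():
        fields = line.split()
        # Assumed layout: name size refcount dependents state address
        if len(fields) >= 5 and fields[0] == name:
            return fields[4]
    return None
```

For example, `module_state("bcm4329", "bcm4329 200000 0 - Loading 0xbf000000")` yields "Loading", whereas a correctly loaded driver would report "Live".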
Flags: needinfo?(gsvelto)
Attachment #8552451 - Flags: feedback?(lissyx+mozillians) → feedback-
I've thought a little about this and, save for some gross bug in the kernel that actually spills over to the wifi module, I can't fathom why this is happening :-( One thing that might be worth trying is the following:

Go into the b2g/app/b2g.js file and replace these lines:
 
pref("hal.processPriorityManager.gonk.MASTER.cgroup", "");
pref("hal.processPriorityManager.gonk.PREALLOC.cgroup", "apps/bg_non_interactive");
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.cgroup", "apps/critical");
pref("hal.processPriorityManager.gonk.FOREGROUND.cgroup", "apps");
pref("hal.processPriorityManager.gonk.FOREGROUND_KEYBOARD.cgroup", "apps");
pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.cgroup", "apps/bg_perceivable");
pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.cgroup", "apps/bg_non_interactive");
pref("hal.processPriorityManager.gonk.BACKGROUND.cgroup", "apps/bg_non_interactive");

With the following:

pref("hal.processPriorityManager.gonk.MASTER.cgroup", "");
pref("hal.processPriorityManager.gonk.PREALLOC.cgroup", "");
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.cgroup", "");
pref("hal.processPriorityManager.gonk.FOREGROUND.cgroup", "");
pref("hal.processPriorityManager.gonk.FOREGROUND_KEYBOARD.cgroup", "");
pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.cgroup", "");
pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.cgroup", "");
pref("hal.processPriorityManager.gonk.BACKGROUND.cgroup", "");

This will lump all the apps in the default control group, thus preventing other groups from being created or used. If the problem still happens, then there's something wrong with the kernel. If it doesn't, and going back to the default configuration still reproduces it, then there's something horribly wrong with the kernel :-)
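
For anyone who prefers not to edit the file by hand, the replacement above could be scripted roughly as follows. This is only a sketch: `blank_cgroup_prefs` and `CGROUP_PREF` are hypothetical names, and it assumes every pref sits on its own line in exactly the form shown above.

```python
import re

# Matches lines such as:
#   pref("hal.processPriorityManager.gonk.FOREGROUND.cgroup", "apps");
CGROUP_PREF = re.compile(
    r'^([ \t]*pref\("hal\.processPriorityManager\.gonk\.[A-Z_]+\.cgroup",\s*)'
    r'"[^"]*"(\);)',
    re.MULTILINE,
)


def blank_cgroup_prefs(prefs_text):
    """Return prefs_text with every gonk cgroup pref set to the empty string."""
    return CGROUP_PREF.sub(r'\1""\2', prefs_text)
```

The function takes the contents of b2g.js as a string and returns the modified text, leaving all other prefs untouched.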
Flags: needinfo?(gsvelto)
As I said on IRC, this does not help :(.
There are good chances that we'll have to back out the whole cgroup support, but I'd like to make one final try to figure out what's wrong on your device. Can you try applying this patch on top of the other? It introduces a 100ms delay between cgroup assignments. It's super-clunky and it's not meant as a solution, but I'm trying to figure out if what's causing the problem on your side is the fact that we quickly assign the same PID to one group and then to another one.
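
The idea of the experiment, sketched in Python rather than taken from the attached patch: write the PID to each cgroup.procs file in turn and pause between consecutive writes. `assign_with_delay` and its parameters are hypothetical names; the injectable `sleep` argument is there only so the pause can be observed without waiting.

```python
import time


def assign_with_delay(pid, procs_paths, delay_s=0.1, sleep=time.sleep):
    """Write pid to each cgroup.procs path in order, pausing between writes."""
    for index, path in enumerate(procs_paths):
        if index > 0:
            # 100 ms by default, matching the delay described above.
            sleep(delay_s)
        with open(path, "w") as procs:
            procs.write(f"{pid}\n")
```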
Flags: needinfo?(lissyx+mozillians)
Comment on attachment 8552451 [details] [diff] [review]
[PATCH] Make CPU cgroup assignments work under ICS

Clearing the review until the situation around bug 1122119 has cleared up. No use doing more cgroup work if we'll be forced to back all of this out.
Attachment #8552451 - Flags: review?(dhylands)
Not helping :( Even pushing things to 1000ms :(
Flags: needinfo?(lissyx+mozillians)
Not gonna happen, see bug 1258684.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX