Closed Bug 1161189 Opened 7 years ago Closed 4 years ago

The OOM Strikes Back

Categories

(Firefox OS Graveyard :: Gaia::System, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(b2g-v2.2 unaffected, b2g-master affected)

RESOLVED WONTFIX

People

(Reporter: djf, Unassigned)

References

Details

(Keywords: regression)

Running today's nightly build on a 319MB Flame, I can cause just about any app to OOM by repeatedly pressing the Home button and then tapping the app icon to reopen the app.

This happens pretty consistently within about 5 home/reopen cycles.  When an OOM has occurred and I tap the app icon, I see a blank white screen for a second or so before the app restarts.

When the OOM happens, I see messages like these in dmesg:

<6>[  226.516796] Killing 'Camera' (1687), adj 667,
<6>[  226.516801]    to free 26120kB on behalf of 'b2g' (208) because
<6>[  226.516806]    cache 19892kB is below limit 20480kB for oom_score_adj 588
<6>[  226.516810]    Free memory is 2396kB above reserved.
<6>[  226.516813]    Free CMA is 0kB
<6>[  226.516816]    Total reserve is 2908kB
<6>[  226.516819]    Total free pages is 2396kB
<6>[  226.516822]    Total file cache is 30216kB
<6>[  226.516825]    GFP mask is 0x220da

I know that this happens for lots of different apps because I can grep dmesg for 'Killing':

$ adb shell dmesg | grep Killing
<6>[   66.268594] Killing 'Find My Device' (1250), adj 734,
<6>[   66.396602] Killing 'Camera' (1289), adj 667,
<6>[  246.996437] Killing 'Music' (1936), adj 667,
<6>[  412.936784] Killing 'Find My Device' (2294), adj 667,
<6>[  418.135849] Killing 'Video' (2317), adj 667,
<6>[  639.730420] Killing 'Communications' (3665), adj 667,
<6>[  682.445186] Killing 'Messages' (3829), adj 667,
<6>[  737.924617] Killing 'Gallery' (4317), adj 667,

Memory-hungry apps like Gallery seem to OOM more easily than others do, but they all seem to die.
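
To make the per-app comparison less anecdotal, the grep output can be tallied by app name. The following is a minimal Python sketch, assuming the kill lines keep the format shown above; the function names (parse_kills, kills_per_app) are made up for illustration, and SAMPLE is just the output pasted from this comment.

```python
import re
from collections import Counter

# Matches lines such as:
#   <6>[   66.268594] Killing 'Find My Device' (1250), adj 734,
KILL_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+Killing '(?P<name>[^']+)' "
    r"\((?P<pid>\d+)\), adj (?P<adj>\d+)"
)


def parse_kills(dmesg_text):
    """Return one dict per LMK 'Killing' line found in the dmesg text."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            kills.append({
                "time": float(match.group("time")),
                "name": match.group("name"),
                "pid": int(match.group("pid")),
                "adj": int(match.group("adj")),
            })
    return kills


def kills_per_app(dmesg_text):
    """Count how many times each app was killed."""
    return Counter(kill["name"] for kill in parse_kills(dmesg_text))


# Output of `adb shell dmesg | grep Killing` as pasted in this report.
SAMPLE = """\
<6>[   66.268594] Killing 'Find My Device' (1250), adj 734,
<6>[   66.396602] Killing 'Camera' (1289), adj 667,
<6>[  246.996437] Killing 'Music' (1936), adj 667,
<6>[  412.936784] Killing 'Find My Device' (2294), adj 667,
<6>[  418.135849] Killing 'Video' (2317), adj 667,
<6>[  639.730420] Killing 'Communications' (3665), adj 667,
<6>[  682.445186] Killing 'Messages' (3829), adj 667,
<6>[  737.924617] Killing 'Gallery' (4317), adj 667,
"""

if __name__ == "__main__":
    for name, count in kills_per_app(SAMPLE).most_common():
        print(f"{count:3d}  {name}")
```

To run it against a live device, the text could be captured with `adb shell dmesg` and passed to kills_per_app() in place of SAMPLE.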

Note that I'm not trying to run multiple apps here. A 319MB Flame is not able to keep the homescreen and one other app alive.
Keywords: qawanted
Can we do a branch check?
This is not a completely new issue. I flashed a nightly master build from 2015-04-30-01-02-01 and have the same issue there.
I can't reproduce this on today's 2.2 nightly build.
I thought I was flashing my phone back to 3.0, but apparently I got mozilla-aurora instead of mozilla-central by mistake. So while I'm on this build, I can report that this bug does not reproduce with pvt/mozilla-aurora-flame-kk-eng/latest/flame-kk.zip
Marking this as a regression since it does not occur in 2.2 but does in 3.0.  I'd love to know when in 3.0 it started...

It seems clear that the OOMs occur either when pushing the home button or when tapping the app icon.  Something about the process of launching or closing the app is using up enough memory to push the LMK into killing the app.  

In addition to seeing the app OOM, I also sometimes see the homescreen restart. (It scrolls back to the top, for example)
Keywords: regression
STR:
0) Have a page of pictures in the Gallery app. Run 'adb root' to ensure the console has root access.
1) Open Gallery, and tap to view any picture
2) Press Home button
3) Go back to Gallery, and view a different picture
4) Repeatedly switch between Gallery and Homescreen 20 times, each time viewing a different picture
5) $ adb shell dmesg | grep Killing

Expected: Nothing is displayed in terminal

Actual: Something similar to:
<6>[   60.280129] Killing 'Find My Device' (1215), adj 734,
<6>[   74.779425] Killing 'Gallery' (1237), adj 667,
<6>[  133.923451] Killing 'Gallery' (1459), adj 667,
<6>[  190.438596] Killing 'Gallery' (1780), adj 667,
<6>[  227.863227] Killing 'Gallery' (2051), adj 667,
<6>[  247.933070] Killing 'Gallery' (2317), adj 667,

----

This issue reproduces on Flame 3.0.

Device: Flame 3.0 (319MB KK full flash)
BuildID: 20150504010202
Gaia: e18cce173840d6ff07fb6f1f0e0ffb58b99aab3e
Gecko: dc5f85980a82
Gonk: a9f3f8fb8b0844724de32426b7bcc4e6dc4fa2ed
Version: 40.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0

-----

This issue does NOT reproduce on Flame 2.2. Following the STR, nothing is displayed in the terminal.

Device: Flame 2.2 (319MB KK full flash)
BuildID: 20150504002502
Gaia: 8d14361337e608c8cdf165ea5034db5eda23b618
Gecko: cb7cb6597c91
Gonk: a9f3f8fb8b0844724de32426b7bcc4e6dc4fa2ed
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

---

Although this is a regression, I don't suggest asking for a regression window for a performance bug like this. I have seen the white screen on app launch since March, possibly even earlier (see bug 1153023 comment 3).
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
I think Bug 1151571 may be related to this issue, since the white screen often signals that the app is restarting.
Pi Wei is right: this bug goes back to some time before April.  I just verified that it reproduces with the April 1st nightly build.
Just flashed build 20150504010202. I think it's bug 1151672.

b2g-ps shows that the app processes are forked from b2g, instead of Nuwa:

APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              0 root      212   1     251380 70408 ffffffff b6ebf894 S /system/b2g/b2g
(Nuwa)           0 root      443   212   99644  8404  ffffffff b6ebf894 S /system/b2g/b2g
Homescreen       2 u0_a1137  1137  212   125868 27024 ffffffff b6ee8894 S /system/b2g/plugin-container
Browser          2 u0_a1767  1767  212   110520 26340 ffffffff b6e8c894 S /system/b2g/plugin-container
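
The parent check above can be automated. This is a rough Python sketch, assuming b2g-ps keeps the column layout shown here and that app content processes run as plugin-container (both taken from this output only); parse_b2g_ps and parent_labels are illustrative names, not part of any tool.

```python
def parse_b2g_ps(text):
    """Parse b2g-ps output into a list of dicts.

    Rows are split from the right because the APPLICATION column may
    contain spaces (for example 'Find My Device'). Each data row carries
    an unlabeled state column ('S') before NAME, so it has 11 fields,
    while the header line has only 10 and is skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.rsplit(None, 10)
        if len(parts) != 11 or not parts[3].isdigit():
            continue  # header, blank line, or unexpected layout
        rows.append({
            "app": parts[0].strip(),
            "pid": int(parts[3]),
            "ppid": int(parts[4]),
            "name": parts[10],
        })
    return rows


def parent_labels(rows):
    """Map each app (plugin-container) process to its parent's label."""
    by_pid = {row["pid"]: row["app"] for row in rows}
    return {
        row["app"]: by_pid.get(row["ppid"], "unknown")
        for row in rows
        if row["name"].endswith("plugin-container")
    }


# b2g-ps output as pasted in this comment.
SAMPLE = """\
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              0 root      212   1     251380 70408 ffffffff b6ebf894 S /system/b2g/b2g
(Nuwa)           0 root      443   212   99644  8404  ffffffff b6ebf894 S /system/b2g/b2g
Homescreen       2 u0_a1137  1137  212   125868 27024 ffffffff b6ee8894 S /system/b2g/plugin-container
Browser          2 u0_a1767  1767  212   110520 26340 ffffffff b6e8c894 S /system/b2g/plugin-container
"""

if __name__ == "__main__":
    for app, parent in parent_labels(parse_b2g_ps(SAMPLE)).items():
        print(f"{app}: forked from {parent}")
```

If Nuwa were working as intended, the parent label for each app would be (Nuwa) rather than b2g.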
Depends on: 1151672
Also, the LMK parameters are different. On 3.0:

  notify_trigger 14336 KB

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 20480 KB

And on 2.2:
  notify_trigger 9216 KB

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB

This makes the LMK much more aggressive about killing on 3.0. In addition, bug 1151672 makes app processes consume more memory. The result is that app processes now fall victim to the LMK very easily on 3.0.
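
To illustrate the effect of the 588 threshold, here is a small Python sketch that replays the Camera kill from the description against both tables. It is a simplified model, assuming the LMK picks the first table row for which both free memory and file cache are below min_free; the real driver works in pages and has more conditions. The free and cache figures come from the dmesg excerpt in the description ("Free memory is 2396kB above reserved", "cache 19892kB").

```python
# (oom_adj, min_free in KB) rows copied from the two tables above.
LMK_3_0 = [(0, 4096), (58, 5120), (117, 6144), (352, 7168), (470, 8192), (588, 20480)]
LMK_2_2 = [(0, 4096), (58, 5120), (117, 6144), (352, 7168), (470, 8192), (588, 10240)]


def min_kill_adj(table, free_kb, cache_kb):
    """Return the lowest adj eligible for killing, or None.

    Simplified assumption: a row triggers only when BOTH free memory
    and file cache are below that row's min_free value.
    """
    for adj, min_free_kb in table:
        if free_kb < min_free_kb and cache_kb < min_free_kb:
            return adj
    return None


def would_kill(table, free_kb, cache_kb, process_adj):
    """True if a process with the given adj is a kill candidate."""
    threshold = min_kill_adj(table, free_kb, cache_kb)
    return threshold is not None and process_adj >= threshold


if __name__ == "__main__":
    free_kb, cache_kb, camera_adj = 2396, 19892, 667
    for label, table in (("3.0", LMK_3_0), ("2.2", LMK_2_2)):
        verdict = would_kill(table, free_kb, cache_kb, camera_adj)
        print(f"{label}: kill candidate = {verdict}")
```

Under this model the same memory state makes Camera (adj 667) a kill candidate with the 3.0 table, because 19892 KB of cache is below 20480 KB, but not with the 2.2 table, where the limit for 588 is 10240 KB.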
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX