Closed Bug 1037627 Opened 10 years ago Closed 10 years ago

[B2G][Marketplace] Multiple apps will experience LMK when the device is put into sleep mode after the app launches.

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(b2g-v1.4 unaffected, b2g-v2.0 affected, b2g-v2.1 affected)

RESOLVED WONTFIX
2.1 S1 (1aug)

People

(Reporter: dgomez, Assigned: huseby)

Details

(Keywords: memory-footprint, perf, regression, Whiteboard: [273MB-Flame-Support], [2.0-exploratory] [MemShrink:P2][c=memory p=4 s= u=2.0])

Attachments

(3 files)

Description:
When a user launches an installed Marketplace app (confirmed to happen with Youtube, Facebook, ConnectA2, CutTheRope, and Interactive Firefox Logo app), puts their device into sleep mode for about 15 - 20 seconds, and wakes up their device, the launched app will have crashed.  The user must restart the app in order to get back into it.

Prerequisites: Be connected to the internet via WiFi.

Repro Steps:
1) Update a Flame to 20140711000201
2) Launch the Marketplace app.
3) Download Youtube, Facebook, ConnectA2, CutTheRope, or Interactive Firefox Logo app
4) Launch the downloaded app.
5) Once the app is loaded, put the device into sleep mode by pressing the power button.
6) Wait about 20 seconds and press the power button again to wake the device.
7) Observe the screen.

Actual:
The launched app is killed by the low memory killer (LMK) because the device is out of memory (OOM).
Expected:
The launched app will remain on the screen.

2.0 Environmental Variables:
Device: Flame 2.0 (273MB)
BuildID: 20140711000201
Gaia: 18c44a1bc31b374ba00a069904465a8d07971a60
Gecko: f880dae4fdbe
Version: 32.0a2 (2.0) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Repro frequency: 4/4 - 100%
See attached: Logcat_Marketplace_OOM.txt, Video: http://youtu.be/AMYOw4t6gMA
This issue DOES occur on Flame 2.1 (273MB).

Flame 2.1 (273MB)

2.1 Environmental Variables:
Device: Flame Master
Build ID: 20140711040202
Gaia: c47094a26c87ba71a3da4bae54febd0da21f3393
Gecko: 1b1296d00330
Version: 33.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Result: The app the user had launched is LMKed.

-------------------------------------------------------------------------------------------------------

This issue DOES NOT occur on Flame 2.0 (512MB), Flame 1.4 (273MB), Buri 2.0, Buri 2.1, Buri 1.4, Open_C 2.0, Open_C 1.4, Open_C 2.1, Flame Base v122 (273MB), and Flame Base v121-2 (273MB).


Flame 2.0 (512mb)

2.0 Environmental Variables:
Device: Flame 2.0
BuildID: 20140711000201
Gaia: 18c44a1bc31b374ba00a069904465a8d07971a60
Gecko: f880dae4fdbe
Version: 32.0a2 (2.0) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


Flame 1.4 (273MB)

1.4 Environmental Variables:
Device: Flame 1.4
Build ID: 20140711000202
Gaia: e273c1f52ed7187e4e0b2d66ed5718f0f20c6eeb
Gecko: 896fa800b72d
Version: 30.0 (1.4)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0


Buri 2.1

2.1 Environmental Variables:
Device: Buri Master
Build ID: 20140711040202
Gaia: c47094a26c87ba71a3da4bae54febd0da21f3393
Gecko: 1b1296d00330
Version: 33.0a1 (Master) MOZ
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0



Buri 2.0

2.0 Environmental Variables:
Device: Buri 2.0
Build ID: 20140711000201
Gaia: 18c44a1bc31b374ba00a069904465a8d07971a60
Gecko: f880dae4fdbe
Version: 32.0a2 (2.0)
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


Buri 1.4

1.4 Environmental Variables:
Device: Buri 1.4
Build ID: 20140711000202
Gaia: e273c1f52ed7187e4e0b2d66ed5718f0f20c6eeb
Gecko: 896fa800b72d
Version: 30.0 (1.4) MOZ
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0


Open_C 2.1

2.1 Environmental Variables:
Device: Open_C Master
Build ID: 20140711040202
Gaia: c47094a26c87ba71a3da4bae54febd0da21f3393
Gecko: 1b1296d00330
Version: 33.0a1 (Master)
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0


Open_C 2.0

2.0 Environmental Variables:
Device: Open_C 2.0
Build ID: 20140711000201
Gaia: 18c44a1bc31b374ba00a069904465a8d07971a60
Gecko: f880dae4fdbe
Version: 32.0a2 (2.0) 
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


Open_C 1.4

1.4 Environmental Variables:
Device: Open_C 1.4
Build ID: 20140711000202
Gaia: e273c1f52ed7187e4e0b2d66ed5718f0f20c6eeb
Gecko: 896fa800b72d
Version: 30.0 (1.4) 
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0


Flame Base v122 (273MB)

Base v122 Environmental Variables:
Device: Flame 1.3
Build ID: 20140616171114
Gaia: e1b7152715072d27e0880cdc6b637f82fa42bf4e
Gecko: e181a36ebafaa24e5390db9f597313406edfc794
Version: 28.0 (1.3)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0


Flame Base v121-2 (273MB)

Base v121-2 Environmental Variables:
Device: Flame 1.3
Build ID: 20140610200025
Gaia: e106a3f4a14eb8d4e10348efac7ae6dea2c24657
Gecko: b637b0677e15318dcce703f0358b397e09b018af
Version: 28.0 (1.3)
Firmware Version: v121-2
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0


Actual Result: The user is taken back to the app that was launched earlier.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Component: General → Performance
Product: Marketplace → Firefox OS
Version: Avenir → unspecified
Keywords: footprint, perf
Whiteboard: [273MB-Flame-Support], [2.0-exploratory] → [273MB-Flame-Support], [2.0-exploratory] [MemShrink]
Dean can you flip the tracking flags according to your results?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage-]
Flags: needinfo?(pbylenga) → needinfo?(dgomez)
Updating tracking flags.
I could not set tracking flags earlier due to the Product being set to Marketplace.
QA Whiteboard: [QAnalyst-Triage-] → [QAnalyst-Triage?]
Flags: needinfo?(dgomez) → needinfo?(pbylenga)
Nominating to block 2.0: this is a regression from 1.4 at 273MB and affects multiple apps.  Requesting a regression window.
blocking-b2g: --- → 2.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Priority: -- → P2
Whiteboard: [273MB-Flame-Support], [2.0-exploratory] [MemShrink] → [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p= s= u=]
blocking-b2g: 2.0? → 2.0+
Severity: normal → blocker
Priority: P2 → P1
Whiteboard: [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p= s= u=] → [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p= s= u=2.0]
This is a problem due to memory pressure.  Since the homescreen memory usage issues tracked by Bug 1029902 are the highest priority at the moment, we want to re-test this after Bug 1029902 is resolved.  For now we'll just block on that and see if this goes away when homescreen uses less memory.
Depends on: 1029902
This needs to be resolved regardless of homescreen memory pressure. This is a race condition in the system app. The homescreen should OOM and we should be able to re-launch it.

We are going to improve the homescreen soon, but I fear that is just going to mask the real problem.
No longer depends on: 1029902
QA Contact: ckreinbring
After talking with :kgrandon, we decided that this shouldn't wait for the homescreen memory issues to be fixed before investigating/addressing this one.
Assignee: nobody → dhuseby
Whiteboard: [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p= s= u=2.0] → [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p=4 s= u=2.0]
Regression window:

Last working
Build ID: 20140604142816
Gaia: a38a6a5c6fabc97dd16d5360632b5ac5c7e06241
Gecko: 951e3a671279
Platform Version: 32.0a1
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

First broken
Build ID: 20140604173717
Gaia: d2cfef555dabab415085e548ed44c48a99be5c32
Gecko: 2ebadbba6cc8
Platform Version: 32.0a1
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Working Gaia / Broken Gecko = No repro
Broken Gaia / Working Gecko = Repro
Gaia pushlog: https://github.com/mozilla-b2g/gaia/compare/a38a6a5c6fabc97dd16d5360632b5ac5c7e06241...d2cfef555dabab415085e548ed44c48a99be5c32


B2G Inbound

Last working
Build ID: 20140604084216
Gaia: 2a4c7becdb141d2601e47a040a27eebe52a8db79
Gecko: fd5bb34861d6
Platform Version: 32.0a1
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

First broken
Build ID: 20140604105916
Gaia: 18e2e8dc2d9ff19cd1210026367c14956d04eb0d
Gecko: c36c5f011229
Platform Version: 32.0a1
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Working Gaia / Broken Gecko = No repro
Broken Gaia / Working Gecko = Repro
Gaia pushlog: https://github.com/mozilla-b2g/gaia/compare/2a4c7becdb141d2601e47a040a27eebe52a8db79...18e2e8dc2d9ff19cd1210026367c14956d04eb0d
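
The swap tests in both regression windows follow the same reasoning: pair each half of the broken build with the other half of the working build and see which pairing still reproduces. A minimal sketch of that decision follows; the function and argument names are hypothetical and not part of any QA tooling.

```python
def blame_component(repro_with_broken_gaia, repro_with_broken_gecko):
    """Attribute a regression from two swap tests.

    repro_with_broken_gaia:  bug reproduces on broken Gaia + working Gecko
    repro_with_broken_gecko: bug reproduces on working Gaia + broken Gecko
    """
    if repro_with_broken_gaia and not repro_with_broken_gecko:
        return "gaia"
    if repro_with_broken_gecko and not repro_with_broken_gaia:
        return "gecko"
    if repro_with_broken_gaia and repro_with_broken_gecko:
        return "both"
    return "inconclusive"


# Results reported in this bug: broken Gaia reproduces, broken Gecko does not.
print(blame_component(repro_with_broken_gaia=True, repro_with_broken_gecko=False))
```

With the results reported here the sketch points at Gaia, which is why only the Gaia pushlog is linked.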
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Another pushlog pointing to the first vertical homescreen build as the cause.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
We've already encountered a similar issue on the Tarako. There the issue was triggered by the natural lack of free memory rather than by the homescreen regression we have here, but the root cause is the same once memory starts to get low. For a description of the issue you can read bug 989713 comment 72. Long story short, when the screen is turned off the foreground application's priority drops below the priority of the homescreen. If we're tight on memory this will cause that app to be reaped by the LMK.

I had a fix for this in attachment 8423464 but I refrained from landing it at the time. It would significantly change the current behavior of the process priority manager when dealing with FG/BG transitions, which requires large changes to the existing tests as well as further coverage, and it could also induce startup time regressions when the phone is loaded. Since I didn't have enough time to do extensive testing we decided not to pick it up. If having a fix for this particular scenario is important, however, I should have enough time next week to finish it up.
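
The priority inversion described above can be illustrated with a simplified model of LMK victim selection: among processes at or above a threshold, the one with the highest adj is killed first, with ties going to the larger process. This is only a sketch, not the kernel driver or the process priority manager. The adj value 667 matches the kill lines logged in this bug; the homescreen and foreground adj values, the homescreen size, and the threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    adj: int         # higher adj means "more expendable"
    size_pages: int  # resident size, used only to break ties


def pick_victim(processes, min_adj):
    """Simplified LMK choice: highest adj at or above min_adj, ties go to the larger process."""
    candidates = [p for p in processes if p.adj >= min_adj]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.adj, p.size_pages))


HOMESCREEN_ADJ = 534  # assumption, for illustration only
FOREGROUND_ADJ = 134  # assumption, for illustration only
BACKGROUND_ADJ = 667  # adj seen in the OomLogger kill lines in this bug

# Screen on: the launched app is in the foreground and outranks the homescreen.
screen_on = [
    Process("Homescreen", HOMESCREEN_ADJ, 6000),
    Process("YouTube", FOREGROUND_ADJ, 4879),
]
# Screen off: the app's priority drops below the homescreen's.
screen_off = [
    Process("Homescreen", HOMESCREEN_ADJ, 6000),
    Process("YouTube", BACKGROUND_ADJ, 4879),
]

print(pick_victim(screen_on, min_adj=500).name)
print(pick_victim(screen_off, min_adj=500).name)
```

In this model the homescreen is the first candidate while the screen is on, and the launched app becomes the first candidate once the screen turns off, which matches the behavior reported in this bug.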
I can confirm that the oom kill is happening on my flame flashed with v122+fonts plus gecko+gaia build ID 20140711000201.  It took considerably longer than 20 seconds for the oom kill to happen.  The log shows these oom kill events:

E/OomLogger(  333): [Kill]: send sigkill to 2431 (Marketplace), adj 667, size 4667
E/OomLogger(  333): [Kill]: send sigkill to 3000 (Facebook), adj 667, size 5472
E/OomLogger(  333): [Kill]: send sigkill to 3133 (YouTube), adj 667, size 4879
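
For anyone collecting these events from a longer logcat, the kill lines can be pulled out with a small script. This is a sketch that assumes the line format shown above; treating `size` as a count of 4 KiB pages is an assumption about the logger, not something stated in this bug.

```python
import re

# Matches the kill lines quoted above. The lazy ".*?" tolerates extra text
# between the "[Kill]:" tag and the "send sigkill" message.
KILL_RE = re.compile(
    r"\[Kill\]:.*?send sigkill to (?P<pid>\d+) \((?P<name>[^)]+)\), "
    r"adj (?P<adj>\d+), size (?P<size>\d+)"
)

PAGE_KB = 4  # assumption: "size" is a count of 4 KiB pages


def parse_kills(log_text):
    """Return one dict per OOM kill line found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            size_pages = int(match.group("size"))
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "adj": int(match.group("adj")),
                "size_pages": size_pages,
                "approx_mb": round(size_pages * PAGE_KB / 1024, 1),
            })
    return kills


SAMPLE = """\
E/OomLogger(  333): [Kill]: send sigkill to 2431 (Marketplace), adj 667, size 4667
E/OomLogger(  333): [Kill]: send sigkill to 3000 (Facebook), adj 667, size 5472
E/OomLogger(  333): [Kill]: send sigkill to 3133 (YouTube), adj 667, size 4879
"""

for kill in parse_kills(SAMPLE):
    print(kill["name"], kill["pid"], kill["adj"], kill["approx_mb"])
```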
This is a memory report gathered on a v2.0 flame that is in sleep mode, just before oom killing the twitter app that was running when the device went to sleep.

If I'm reading this memory report correctly, the bottom of the report shows the following:

  168 MB for b2g+child processes
   85 MB for system processes
+  11 MB for zram-compr-data-size
---------------------------------
  264 MB total

That's going to cause the memory pressure event that is oom killing the twitter app.
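
The arithmetic can be checked against the device's nominal memory. This is a rough sketch: the three figures are the ones read from the memory report above, and comparing them against the full 273 MB ignores anything the report does not account for.

```python
# Figures as read from the memory report in this comment, in MB.
usage_mb = {
    "b2g + child processes": 168,
    "system processes": 85,
    "zram-compr-data-size": 11,
}
DEVICE_RAM_MB = 273  # nominal memory of the Flame configuration in this bug

total_mb = sum(usage_mb.values())
headroom_mb = DEVICE_RAM_MB - total_mb
print(f"total: {total_mb} MB, headroom: {headroom_mb} MB")
```

On these numbers only about 9 MB of the nominal 273 MB is left, which is consistent with the memory pressure described above.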
This is the memory report with the twitter app running before the device goes to sleep.  Unfortunately, this doesn't look much different than when the device is asleep, just before oom killing the twitter app.

Hrm....
I just caught the system oom killing the twitter app with this message:

E/OomLogger( 3959): [Kill]: gtp_init_ext_watchdog i2c_transfer send sigkill to 4394 (Twitter), adj 667, size 5939

So gtp_init_ext_watchdog is in the gt9xx.c touchscreen driver.  Is this log error saying that the touchscreen event processing caused the oom kill?
Kyle,

Does the oom kill log message in Comment 14 show that the cause is touch event processing?  The gtp_init_ext_watchdog function is associated with the touchscreen device.
Flags: needinfo?(khuey)
I've never seen it with something else in there before.  Maybe dhylands knows?
Flags: needinfo?(khuey) → needinfo?(dhylands)
Whiteboard: [273MB-Flame-Support], [2.0-exploratory] [MemShrink][c=memory p=4 s= u=2.0] → [273MB-Flame-Support], [2.0-exploratory] [MemShrink:P2][c=memory p=4 s= u=2.0]
It's entirely possible that the gt9xx driver is trying to do a memory allocation.

If no memory is available, then this will trigger an OOM and one of the child processes will be killed.
Flags: needinfo?(dhylands)
It's also possible that it has nothing to do with the gt9xx driver and that it was just in the middle of printing something, got context switched away to something else which tried to allocate and caused the OOM.
(In reply to Dave Hylands [:dhylands][on PTO Wed Jul 23] from comment #17)
> It's entirely possible that the gt9xx driver is trying to do a memory
> allocation.

Since the LMK is triggered by any VM activity then it could be anything touching some space that wasn't previously backed by physical memory at that moment. gtp_init_ext_watchdog() doesn't seem to be doing allocations on its own but it could really be just touching some clean pages and thus triggering the LMK. Unless it's repeatable I wouldn't read too much into it, I don't think that particular function call is responsible for the lack of free memory.
IIRC, I saw it multiple times.  That was two days ago.  Let me see if it happens again.
QA Wanted to retest on 319 MB Flame.
QA Whiteboard: [QAnalyst-Triage+]
Keywords: qawanted
QA Contact: ckreinbring → jmitchell
The issue does NOT repro on Flame 2.0 or 2.1 with 319 MB of memory.

Actual Results - The launched apps remain on the screen when the device is unlocked

Device: Flame Master (319 mem)
Build ID: 20140725102306
Gaia: 3a06aa58245eaf848242d6d1497c1af536fffabd
Gecko: 8da875b402fe
Version: 34.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame 2.0 (319 mem)
Build ID: 20140721082721
Gaia: b9d19011123487009c80d1200937652d58c434a0
Gecko: d69cd84b6824
Version: 32.0a2 (2.0)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: qawanted
The issue does not repro with 319 MB of memory; removing the blocking flag but leaving the bug open for perf investigation.
blocking-b2g: 2.0+ → ---
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(pbylenga)
Since this bug is no longer blocking 2.0 because it cannot be reproduced on the new 319MB minimum RAM spec, I'm closing it as won't fix.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Target Milestone: --- → 2.1 S1 (1aug)