Closed Bug 1087882 Opened 10 years ago Closed 6 years ago

[Tarako] Clock app OOM-killed when trying to display "alarm ringing" screen if the phone is busy

Categories

(Firefox OS Graveyard :: Runtime, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(b2g-v1.4 affected)

RESOLVED WONTFIX
Tracking Status
b2g-v1.4 --- affected

People

(Reporter: atsai, Unassigned)

References

Details

Attachments

(1 file)

46 bytes, text/x-github-pull-request
timdream
: review+
Details | Review
The bug is reproducible on Tarako devices, but it's super hard to reproduce on other devices (and reference phone)

Alarm will not work if the phone is busy.

[STR]
1. Set an alarm ring after 3 minutes
2. Execute multiple apps to keep it busy
   Sample: 2-1. Launch browser to visit website with many pictures
           2-2. Answer a phone call
3. After the alarm is set for 3 minutes, the alarm should ring

[Expected Result]
3. The alarm will ring

[Actual Result]
3. The alarm will not ring
We believe this should be a general bug. Simply due to we are testing w/ built-in apps and it will be harder to reproduce in devices with more RAM. I'll suggest to find out the root cause and fix it in the next version.
Attached file PR to v1.3t
Tim, I think do not OOP clock could be a solution for Tarako alarm issue. I verified on a real device, and it does work properly. Could you review this patch? Thanks!
Attachment #8510142 - Flags: review?(timdream)
Comment on attachment 8510142 [details] [review]
PR to v1.3t

I'll just rubberstamp this ....
Attachment #8510142 - Flags: review?(timdream) → review+
Well, that's really not great to do that since we won't get rid of the clock app on memory pressure. So the phone will be in a worse state as soon as we use the clock.
(In reply to Fabrice Desré [:fabrice] from comment #4)
> Well, that's really not great to do that since we won't get rid of the clock
> app on memory pressure. So the phone will be in a worse state as soon as we
> use the clock.

Yes, I totally agree with you. The current situation is when there is a foreground app and system is in low memory state, clock will be killed as soon as it was launched. There could be another alternative solution, modify the nature of launch an App, but I don't have any idea how to realize it.
Is the issue that the clock is killed between the system messages is sent and the alarm is triggered? Couldn't be use a Wake Lock for this? I see the Clock uses one but maybe it's requested too late in the startup process...
Hi Kai-Zen -

Per Julien's comment, is it possible to use wake lock to prevent the LMK and let Alarm launch?
Flags: needinfo?(kli)
(In reply to Vance Chen [:vchen][vchen@mozilla.com] from comment #7)
> Hi Kai-Zen -
> 
> Per Julien's comment, is it possible to use wake lock to prevent the LMK and
> let Alarm launch?

I think wakelock is the way we used to keep acpu awake or screen on.
Once we met memory pressure, LMK will kill the processes with adj value of corresponds adj.
holding a wakelock may not help for this case :(
Flags: needinfo?(kli)
Guess we are facing a dilemma now. To ensure that the Alarm will ring under any circumstance, or to prevent B2G process consume too much RAM, could anyone help to make a decision here regarding which one is more beneficial to the Tarako device users?

Quoted from the Tarako device review website[1]:

"There isn't even anything to keep the crucial alarm process alive. If the phone is busy when an alarm is supposed to go off, it just doesn't go off. An alarm you can't trust to work 100 percent of the time is useless."

[1]http://arstechnica.com/gadgets/2014/10/testing-a-35-firefox-os-phone-how-bad-could-it-be/2/
Flags: needinfo?(fabrice)
I think getting a WakeLock on 'cpu' actually change the "adj" value for this process. But Fabrice will definitely know.
(In reply to Vance Chen [:vchen][vchen@mozilla.com] from comment #9)

> Quoted from the Tarako device review website[1]:
> 
> "There isn't even anything to keep the crucial alarm process alive. If the
> phone is busy when an alarm is supposed to go off, it just doesn't go off.
> An alarm you can't trust to work 100 percent of the time is useless."

For the record, this comment is totally wrong. There is no "alarm process" that dies, since alarms are managed by the parent process. The alarm goes off, but we fail to launch the app.

Alarms are triggering a system message, and we hold a cpu wakelock when sending system messages to have a better chance to launch the app. It seems it's not enough on Tarako, likely because we drop this wakelock on a timer too early on this slower device.

So... I'm not keen on doing big platform changes here for tarako. What would be acceptable is to:
- launch the clock app in process.
- add to the system app the capability to get rid of in-process apps when we receive memory pressure and the app is not in the foreground, like we do with the keyboard.
Flags: needinfo?(fabrice)
Fabrice, is it possible that the clock app is taking a lock too late because of all the require.js machinery ?
(In reply to Julien Wajsberg [:julienw] from comment #12)
> Fabrice, is it possible that the clock app is taking a lock too late because
> of all the require.js machinery ?

The lock I talked about is taken by gecko when sending the system message, so require.js will not matter.
I understood this, but it's eventually dropped, right? When is it dropped?
The clock app is taking a lock at one moment but only when it executes the system message handler IIRC.
Component: Gaia::Clock → Runtime
Summary: [Tarako] Alarm won't work if the phone is busy → [Tarako] Clock app OOM-killed when trying to display "alarm ringing" screen if the phone is busy
(In reply to Julien Wajsberg [:julienw] (PTO until 1/5) from comment #14)
> I understood this, but it's eventually dropped, right? When is it dropped?
> The clock app is taking a lock at one moment but only when it executes the
> system message handler IIRC.

I was spelunking in this code for other reasons, so this is hopefully the right answer:

The lock is dropped after the handler registered by mozSetMessageHandler is (synchronously) invoked.  Note that there seem to be 2 different wake-locks going on; one held by the parent in SystemMessageInternal.js and one held by the client that was allegedly grabbed by the parent on behalf of the child (and is handled in SystemMessageManager.js).  The parent lock is only dropped once all of the messages have been handled.  The child lock OTOH may be dropped after the first handler has run.  (The inconsistency there is that _dispatchMessage doesn't actually know how many messages there are to be processed but it makes the decision of when to fire.)

- mozSetMessageHandler in the child sends GetPendingMessages to its parent.  Note that the type of the message is sent.
  https://dxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageManager.js#121

- The parent gets the pending messages and sends them to the child as GetPendingMessages:Return.  Note that messages are tracked on a per-type basis so there should be no weird problems if mozSetMessageHandler calls for different types happen in different turns of the event loop.  (AKA you won't footgun yourself by having different modules each doing mozSetMessageHandler for different types.)
  http://dxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageInternal.js#458

- The child receives the messages and dispatches them using _dispatchMessage if there's a handler.  _dispatchMessage synchronously calls the handler and then sends an async "HandleMessagesDone" to the parent.  If there was no handler a "HandleMessagesDone" is sent for all the messages received.  There is also a "handle-system-messages-done" observer notification that's sent within the child process in both cases for that client lock thing I mentioned.  The comment name-checks Bug 874353.
  message handler:
  https://dxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageManager.js#203
  _dispatchMessage:
  https://dxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageManager.js#60

- The parent receives the HandleMessagesDone which includes the number of messages processed in the payload.  It decrements the provided count from the wakelock using _releaseCpuWakeLock and releases the lock if it hits zero.
  http://dxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageInternal.js#523
Ok, so, as far as I understand, we hold a lock during the whole process then. Thanks for the thorough investigation!
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: