Alarm can't go off (clock doesn't start up) under "process buster" stress

VERIFIED FIXED

Status

()

Core
General
VERIFIED FIXED
6 years ago
5 years ago

People

(Reporter: cjones, Assigned: Justin Lebar (not reading bugmail))

Tracking

unspecified
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, b2g18 verified, b2g18-v1.0.1 verified)

Details

(Whiteboard: [target 28/2])

Attachments

(1 attachment)

STR
 (1) Open Clock application
 (2) Set an alarm for 10 minutes or so in the future
 (3) Load http://people.mozilla.com/~cjones/membuster.com
 (4) Choose "Bust processes"
 (5) Wait for tabs to stabilize and load example.com.  Usually takes 30s or so.
 (6) Put phone to sleep

The alarm never goes off.  Unlocking the phone shows a "ghost" Clock app entry in the task switcher.  This indicates b2g tried to launch Clock but it crashed while being initialized.
(Assignee)

Comment 1

6 years ago
This is essentially the same as bug 836199.

There are essentially two problems here, and we probably need to fix both.

The one is that the browser app creates ten foreground processes.  I'm trying to fix that, and it's hard.  We may be hitting APZC bugs or something; I'm not sure yet.  The numerous fg processes make it hard for the alarm app to compete for CPU cycles and make it more likely to be killed.

The other is that the clock app launches with regular fg permission.  We cannot guarantee that the LMK won't kill the alarm app unless it's running with higher permission than the other fg apps.  I'm thinking that we need to elevate the permission of the alarm/communications/etc. app when it is sent a "high-priority" system message (potentially right from the start when it starts up) and drop it down only after it has handled said message.

I wish we weren't hitting this stuff at the eleventh hour.  :(
(Assignee)

Updated

6 years ago
Depends on: 836647
(Assignee)

Updated

6 years ago
Depends on: 836199
No longer depends on: 836647
(In reply to Justin Lebar [:jlebar] from comment #1)
> The other is that the clock app launches with regular fg permission.  We
> cannot guarantee that the LMK won't kill the alarm app unless it's running
> with higher permission than the other fg apps.  I'm thinking that we need to
> elevate the permission of the alarm/communications/etc. app when it is sent
> a "high-priority" system message (potentially right from the start when it
> starts up) and drop it down only after it has handled said message.
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For the alarm, we used to let the AlarmService.jsm listen to the "SystemMessageManager:HandleMessageDone" IPC message to know when to release the CPU wake lock, which might be the same time point to drop down the priority.

For the phone call, I used to introduce the similar mechanism at [1] but was rejected in the long run, which might be useful here. Another alternative place could be RadioInterfaceLayer.handleCallDisconnected() (when the phone call ends) but it doesn't accurately indicate the time point of finishing handling the system message.

[1] attachment #705699 [details] [diff] [review] and attachment #705726 [details] [diff] [review]
(Assignee)

Comment 3

6 years ago
> we used to let the AlarmService.jsm listen to the "SystemMessageManager:HandleMessageDone" IPC 
> message to know when to release the CPU wake lock, which might be the same time point to drop down 
> the priority.

Yes, that's exactly my thinking.
(Assignee)

Comment 4

6 years ago
Bug 836654 may be all we need here.
Depends on: 836654
(Assignee)

Comment 5

6 years ago
Also, in this situation, I think the alarm service will permanently hold on to the CPU wake lock.

We take the CPU wake lock before we send the alarm system message, and we don't release it until we get SystemMessageManager:HandleMessageDone.  But that message never comes if the alarm process crashes.
blocking-b2g: tef? → tef+
Andrew, should we + 836654 as well then?

Justin, you're looking at the other bug (836199) this depends on - are you looking at 836654 and this one as well?
Flags: needinfo?(overholt)
(Assignee)

Comment 7

6 years ago
I'm looking at all these bugs.

(OT, but it would be really helpful if you'd say "bug XXX" so that bugs are linkified and so we can hover over the links to get the bug titles.)
(Assignee)

Comment 8

6 years ago
> I'm looking at all these bugs.

... but I could definitely use help with the Gaia stuff.

Updated

6 years ago
Flags: needinfo?(overholt)
Justin, assigning to you as you're looking at this. We'll try to find some help if you need it.

Clearing needinfo on Andrew as he's +ed bug 836654.
Assignee: nobody → justin.lebar+bug
(Assignee)

Updated

5 years ago
Depends on: 840277
Justin, any news on this ?
Flags: needinfo?(justin.lebar+bug)
(Assignee)

Comment 11

5 years ago
Please see the dependent bugs.
Flags: needinfo?(justin.lebar+bug)
We'll leave this on the blockers for now, but can qa verify if they have seen this in the last 3 weeks or so?
Keywords: qawanted
(Assignee)

Comment 13

5 years ago
I can reproduce this easily.
Thanks jlebar; removing qawanted tag.
Keywords: qawanted
Whiteboard: [target 28/2]
(Assignee)

Comment 15

5 years ago
All of the dependent bugs are fixed here (well, aside from bug 836199, but that doesn't count).  All that remains to be done is to verify that this bug is in fact fixed.
Keywords: qawanted

Comment 16

5 years ago
The alarms go through, after the "Bust processes" was selected and fully loaded. 
Tested on the nightly build: 20130225070200, as well as the master build: 20130225030609. Changing status to WFM
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
(Assignee)

Comment 17

5 years ago
Thanks for testing!

(We usually use WFM when we're not sure what fixed a bug.  But here, it's clear where the fixes came from.)
Resolution: WORKSFORME → FIXED
Resolving branches per comment 16, 17
status-b2g18: --- → fixed
status-b2g18-v1.0.1: --- → fixed
removing qawanted.
Keywords: qawanted
(Assignee)

Comment 21

5 years ago
Created attachment 734171 [details]
membuster testcase

Here's the membuster testcase.
(Assignee)

Comment 22

5 years ago
And for ease of typing on a phone, I made http://bit.ly/membuster.

Comment 23

5 years ago
The alarm went off as it is supposed to, after the "bust process" was ran. 
Issue is Verified as fixed in the following branches-- 

Unagi Build ID: 20130408075816
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/3b51ced33215
Gaia: 8cf8dbd430af5e72a2ad542824421debe6de29d4


Unagi Build ID: 20130408075751
Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/24b27125d907
Gaia: b973e66f328d8e09e364654e17ddab9f2c938bc4
Status: RESOLVED → VERIFIED
status-b2g18: fixed → verified
status-b2g18-v1.0.1: fixed → verified
You need to log in before you can comment on or make changes to this bug.