Closed Bug 866366 Opened 11 years ago Closed 11 years ago

[Buri][Alarm]it will not start alarm until light up the LCD

Categories

(Firefox OS Graveyard :: Gaia::Clock, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 verified)

VERIFIED FIXED
blocking-b2g tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- fixed
b2g18 --- verified
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- verified

People

(Reporter: sync-1, Assigned: airpingu)

References

Details

(Whiteboard: [status: needs review][required_last_cert_round] c=)

Attachments

(4 files, 11 obsolete files)

8.20 KB, patch
airpingu
: review+
Details | Diff | Splinter Review
5.67 KB, text/plain
Details
5.75 KB, patch
fabrice
: review+
justin.lebar+bug
: feedback+
Details | Diff | Splinter Review
14.19 KB, patch
justin.lebar+bug
: review+
fabrice
: feedback+
Details | Diff | Splinter Review
AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.019.085
 Firefox os  v1.0.1
 Mozilla build ID:20130422230201
 
 +++ This bug was initially created as a clone of Bug #446810 +++
 
 DEFECT DESCRIPTION:
 (1)Switch on the Flight mode
 (2)set one alarm ,such as 8 hours later, or 20 minutes later.
 (3)When the time is out, the mobile will not has alarm, only when you light up the LCD, it start the alarm ring.  ---- NOK
 
  REPRODUCING PROCEDURES:
 
  EXPECTED BEHAVIOUR:
 The alarm should work normally, alarm just at the time set.
 
  ASSOCIATE SPECIFICATION:
 
  TEST PLAN REFERENCE:
 
  TOOLS AND PLATFORMS USED:
 
  USER IMPACT:
 serious
 
  REPRODUCING RATE:
 80%
 
  For FT PR, Please list reference mobile's behavior:
 
 ++++++++++ end of initial bug #446810 description ++++++++++
blocking-b2g: --- → tef?
(In reply to sync-1 from comment #0)
>  DEFECT DESCRIPTION:
>  (1)Switch on the Flight mode
>  (2)set one alarm ,such as 8 hours later, or 20 minutes later.

Hi sync-1,
I try to reproduce the issue via above steps with 20 minutes later. But I cannot reproduce it. Need QA's help.
Keywords: qawanted
(In reply to Ian Liu [:ianliu] from comment #1)
> (In reply to sync-1 from comment #0)
> >  DEFECT DESCRIPTION:
> >  (1)Switch on the Flight mode
> >  (2)set one alarm ,such as 8 hours later, or 20 minutes later.
> 
> Hi sync-1,
> I try to reproduce the issue via above steps with 20 minutes later. But I
> cannot reproduce it. Need QA's help.

CC Al.
If this is reproduced, it should be tef+. Does it happen with NITZ enabled or without it?
Flags: needinfo?(sync-1)
Whiteboard: [tef-triage]
NITZ is activated. The main problem here is not that the alarm is not working,just it can not waken up the mobile. If I set the alarm 8 hours later,it can always reproduce the problem.
Flags: needinfo?(sync-1)
Bug 865570 may be a dup of this bug
I can reproduce the issue with build version TEFUS_20130427. Still need to investigate deeply.
Bug 860884 and Bug 865570 all smell like the same thing. It's highly possible due to the CPU wake lock issue, because the alarm won't ring until you turn on the screen.

Hi Justin, we need to add the CPU wake lock thing back to ensure the code path from RTC wake up to firing system message is also under the scope of CPU wake lock.
(In reply to Gene Lian [:gene] from comment #7)
> Bug 860884 and Bug 865570 all smell like the same thing. It's highly
> possible due to the CPU wake lock issue, because the alarm won't ring until
> you turn on the screen.
> 
> Hi Justin, we need to add the CPU wake lock thing back to ensure the code
> path from RTC wake up to firing system message is also under the scope of
> CPU wake lock.

setting ni? to Justin
Flags: needinfo?(justin.lebar+bug)
I try to add some log with time stamp. It's help me to make sure that Clock app have been ran(received system message) when the device be waked up via user. This is a CPU wake lock issue.
> Hi Justin, we need to add the CPU wake lock thing back to ensure the code path from RTC wake up to 
> firing system message is also under the scope of CPU wake lock.

We're covering this with the patch we've been discussing in bug 862322, right?
Flags: needinfo?(justin.lebar+bug)
tef+ given comment 3. Starting with you Ian (please reassign/unassign if you're not the right person).
Assignee: nobody → iliu
blocking-b2g: tef? → tef+
comment 6 confirms a reproduction, so removing qawanted
Keywords: qawanted
Whiteboard: [tef-triage]
Adding c= whiteboard to get this into the scrumbugs backlog.
Whiteboard: c=
(In reply to Justin Lebar [:jlebar] from comment #10)
> > Hi Justin, we need to add the CPU wake lock thing back to ensure the code path from RTC wake up to 
> > firing system message is also under the scope of CPU wake lock.
> 
> We're covering this with the patch we've been discussing in bug 862322,
> right?

Yes. And looks like Gene will fire another issue for fixing CPU wake lock.(https://bugzilla.mozilla.org/show_bug.cgi?id=862322#c34) So, I assign the issue to him. 

Gene,
If you need any help in Gaia side, please feel free to let me know. Thanks.
Assignee: iliu → gene.lian
blocking-b2g: tef+ → tef?
(In reply to Alex Keybl [:akeybl] from comment #12)
> tef+ given comment 3. Starting with you Ian (please reassign/unassign if
> you're not the right person).

Alex,
Could you please help to remark tef+ ? I mark it to unknown state accidentally.
I'm going to move my patch from Bug 862322 to here for solving the CPU wake lock issue. Please stay tuned.
Blocks: alarm-api, 862322
blocking-b2g: tef? → tef+
Gene,
I would like to give you some information for CPU wake lock issue.

Set an alarm 20 mins later. The alarm should goes off at 15:13. But the alarm did not goes off until turn on the screen at 15:27. I add some log to make sure where the process is sleeping in...

It looks like window_manager receive the "mozChromeEvent" in correct alarm time.

======= correct alarm time =======
E/GeckoConsole( 2040): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1517 in anonymous: --> open-app!!!--> 15:13:00:1367478780
E/GeckoConsole( 2040): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1518 in anonymous: --> e.detail.url = app://clock.gaiamobile.org/index.html

======= turn on screen by myself =======
E/GeckoConsole( 2215): Content JS LOG at app://clock.gaiamobile.org/gaia_build_defer_index.js:294 in gotMessage: --> mozSetMessageHandler(): received message..--> 15:27:08:1367479628
E/GeckoConsole( 2215): Content JS LOG at app://clock.gaiamobile.org/gaia_build_defer_index.js:294 in aac_onAlarmFiredHandler: --> onAlarmFiredHandler(): requestWakeLock..--> 15:27:08:1367479628

Hoping the info is useful.
Attached patch WIP Patch (obsolete) — Splinter Review
Provided a WIP patch based on Justin's bug 862322, comment #33. Sadly, I had difficulties building my Buri. I'll test my patch tomorrow ASAP.

Based on the comment #18. It seems the request of opening app (by system message mechanism) has arrived on time, but not for the system message handling. Hmm... maybe our assumption is wrong. However, it's still possible that the app isn't launched completely before system sleeping, thus not yet acquiring the next CPU wake lock for handling system message (bug 836654).

Anyway, let's see if my patch is working tomorrow.
Attached patch Patch V3 (obsolete) — Splinter Review
Hi Justin,

This patch works well. :D This bug cannot be reproduced anymore with this patch applied. This patch is refined based on bug 862322, comment #33. The following addresses your concern:

> Also, is it a problem if AlarmFiredEvent blocks the alarm thread for a long
> time?  Like, if we have two separate alarms set close in time, do we expect
> to fire two separate alarm notifications, or is it OK if they get merged
> into one notification?
> 
> It might be safer if you unlocked the lock from the AlarmFiredEvent's run
> method and didn't dispatch sync.

Thanks for your suggestion! In the new patch, I won't let the main thread block the alarm-watcher thread.

However, the alarm-watcher thread and main thread can now have chance to compete. However, we don't need to worry about that because we can discuss the following two cases:

  1. Alarm-watcher thread runs first to the blocking IO before the next alram is programmed: the blocking IO will just block and wait there even if an alarm hasn't been programmed. It won't release until the next alarm is programmed and needs to go off.

  2. The next alarm is programmed first before the alarm-watcher thread runs to the blocking IO: when the alarm-wather thread runs to the blocking IO later, it will (a) release immediately if the programmed alarm has been expired or (b) block and wait for the programmed alarm to fire.

Both cases will safely fire two separate alarm events, which works as expected. Note that we often run into the case #1 in practice because the alarm-watcher thread has no extra tasks to do after dispatching the alarm event to the main thread.
Attachment #744569 - Attachment is obsolete: true
Attachment #745056 - Flags: review?(justin.lebar+bug)
(In reply to Gene Lian [:gene] from comment #20)
> Created attachment 745056 [details] [diff] [review]
> Patch V3
> 
I just tried to reproduce the issue via Patch V3. I set 5, 20, 25 mins alarm for the test. All of them are working fine for me. Thanks for Gene's help.
Comment on attachment 745056 [details] [diff] [review]
Patch V3

This looks great to me!  I'm really happy to hear it's solving some of our
problems, too.  :)

>@@ -528,16 +528,20 @@ bool WriteToFile(const char *filename, c
>+// Some CPU wake locks may be acquired internally in HAL. We use a counter to
>+// keep track of these needs.
>+int sInternalLockCpuCount = 0;

int32_t, please.  We avoid raw types like int when possible.

Also, please add a comment indicating that one has to hold
sInternalLockCpuMonitor when reading or writing this variable.

>+static Monitor* sInternalLockCpuMonitor = NULL;

nullptr, please.

>+static void
>+UpdateCpuSleepState()
>+{

Please assert that sInternalLockCpuMonitor is held.

>+  bool allowed = sCpuSleepAllowed && !sInternalLockCpuCount;
>+  WriteToFile(allowed ? wakeUnlockFilename : wakeLockFilename, "gecko");
>+}

> bool
> GetCpuSleepAllowed()
> {
>   return sCpuSleepAllowed;
>+  // To Justin: don't need return |sCpuSleepAllowed && !sInternalLockCpuCount|
>+  // because |.cpuSleepAllowed| should be fully handled by Gaia. Right?
>+  // I mean the Gaia end shouldn't be aware of the internal wake locks.
>+  // This comment will be removed after getting confirmed.

Yes, this is exactly correct.

>-// If |sAlarmData| is non-null, it's owned by the watcher thread.
>+// If |sAlarmData| is non-null, it's owned by the alarm-watcher thread.
> typedef struct AlarmData {

You don't need typedef struct in C++.

>+    // The fired alarm event has been delivered to the observer (if needed),
>+    // we can now release a CPU wake lock.

Nit: s/,/;/

>@@ -902,20 +937,25 @@ WaitForAlarm(void* aData)
>-    if (!alarmData->mShuttingDown &&
>-        alarmTypeFlags >= 0 && (alarmTypeFlags & ANDROID_ALARM_RTC_WAKEUP_MASK)) {
>+    if (!alarmData->mShuttingDown && alarmTypeFlags >= 0 &&
>+        (alarmTypeFlags & ANDROID_ALARM_RTC_WAKEUP_MASK)) {
>+      // To make sure the observer can get the alarm firing notification
>+      // *on time* (the system won't sleep during the process in any way),
>+      // we need to acquire a CPU wake lock before firing the alarm event.
>+      InternalLockCpu();
>       NS_DispatchToMainThread(new AlarmFiredEvent(alarmData->mGeneration));

This must be

        nsRefPtr<AlarmFiredEvent> event = new AlarmFiredEvent(...);
        NS_DispatchToMainThread(event);

The reason is, it's incorrect to call an XPCOM function on an object which has
a refcount of 0, as this object does right after its constructor.  If
NS_DispatchToMainThread addref'ed and then released the event, it would end up
deleting the event!  Then when it tried to use it later, that would be a
use-after-free.
Attachment #745056 - Flags: review?(justin.lebar+bug) → review+
Addressing comment #22. r=jlebar a=tef+. Ready to land!
Attachment #745056 - Attachment is obsolete: true
Attachment #745744 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/f209fc8b569a
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
My test also shows it works well on SW128
AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.019.092
Firefox os  v1.0.1
Mozilla build ID:20130430070201.
Verified according to Comment 27
Status: RESOLVED → VERIFIED
Sorry, the Comment 27 is wrong.
092 did not include the patch at all.

I retested on:
AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.019.105
Firefox os  v1.0.1
Mozilla build ID:20130512070209

which is including the patch, but the problem is not fixed. I can still see same issue happens. So I reopen the issue.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(In reply to Amelie Kong from comment #29)
> Sorry, the Comment 27 is wrong.
> 092 did not include the patch at all.
> 
> I retested on:
> AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.019.105
> Firefox os  v1.0.1
> Mozilla build ID:20130512070209
> 
> which is including the patch, but the problem is not fixed. I can still see
> same issue happens. So I reopen the issue.

I a
(In reply to Amelie Kong from comment #29)
> Sorry, the Comment 27 is wrong.
> 092 did not include the patch at all.
> 
> I retested on:
> AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.019.105
> Firefox os  v1.0.1
> Mozilla build ID:20130512070209
> 
> which is including the patch, but the problem is not fixed. I can still see
> same issue happens. So I reopen the issue.

I am sorry to write a mistake version. I mistook 094 for 092. And Maybe I didn't try enough times to test this problem and too hurry to made a conclusion. I have asked verification and they said that these phenomenon comes out randomly. He tried about 30 times and several times this problem was reproduces.
In recent two days, I've tested the following two sets:

  - Unagi (Gecko b2g18 + Gaia v1-train)
  - Buri (Hamachi) (Gecko v1.0.1 + Gaia v1.0.1)

at least 20 times separately, but they're still not reproducing.

Amelie and 田旻, which device are you working on? I'm wondering there could be some platform-dependent issues.
(In reply to Gene Lian [:gene] from comment #32)
> In recent two days, I've tested the following two sets:
> 
>   - Unagi (Gecko b2g18 + Gaia v1-train)
>   - Buri (Hamachi) (Gecko v1.0.1 + Gaia v1.0.1)
> 
> at least 20 times separately, but they're still not reproducing.
> 
> Amelie and 田旻, which device are you working on? I'm wondering there could be
> some platform-dependent issues.
Hi Gene Lian,

You set an alarm and wait for its come tomorrow morning. The next you will find it's silence without any remind at all.  They  all told me reproduce the bug in this way, and I will have a try in the same way tonight.
I can reproduce the issue in the third snooze.
- Buri (Hamachi) (Gecko v1.0.1 + Gaia v1.0.1)(build version 20130514230202)

The situation is same with comment 18. Looks like the process is sleeping in window_manager while opening Clock App. I will add more log to check which asynchronous function call to unlock the CPU.
It the problem is still present, I'm wondering there is a very tricky edge case that the code path is not under the protection of CPU wake lock from:

  - firing 'open-app' chrome event by shell.js (gecko)

to 

  - starting to load the frame in window_manager.js (gaia)

After the frame starts to load, it will acquire another CPU wake lock and don't release it until the system message is handled (Bug 836654). Assuming that the system would fall into sleep at any points if no CPU wake lock is hold, since some of the code-paths mentioned above might be *async*, therefore the CPU wake lock we hold in the HAL doesn't really help. I'll try to dump more logs tomorrow to figure out when does the system actually sleep again.

Justin, may I have your comments on my guess? Especially Bug 836654, I didn't really follow all the details of that.
Flags: needinfo?(justin.lebar+bug)
To clarify my point, I think if we can make sure the Gecko can always deliver the chrome event 'open-app' on time and the system would sleep somewhere in the window_manager.js, then Gaia needs to acquire extra wake locks before the frame (clock app) is loaded.
Whiteboard: [fixed-in-birch] c= → [status: needs new patch][fixed-in-birch] c=
Update my test:
I set an alarm to notify me at 10:05 AM last night. This morning, it goes off at 10:07 AM... In this case, the symptom is more like Bug 861971 - Alarm clock not reliable . Sometimes it could be waked up with delay <n> minutes.

===== window_manager =====
E/GeckoConsole(  976): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1530 in anonymous: --> open-app!!!--> 10:05:00:1368669900
E/GeckoConsole(  976): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1531 in anonymous: --> e.detail.url = app://clock.gaiamobile.org/index.html
E/GeckoConsole(  976): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1556 in anonymous: --> isRunning!!!--> 10:05:00:1368669900

===== Clock App ===== 
E/GeckoConsole( 1146): Content JS LOG at app://clock.gaiamobile.org/gaia_build_defer_index.js:294 in gotMessage: --> mozSetMessageHandler(): received message..--> 10:07:21:1368670041
E/GeckoConsole( 1146): Content JS LOG at app://clock.gaiamobile.org/gaia_build_defer_index.js:294 in aac_onAlarmFiredHandler: --> onAlarmFiredHandler(): requestWakeLock..--> 10:07:21:1368670041
I'm having a log worth discussing and it implies the root cause. I'll paste it later. Please stay tuned.
I tried to dump more info when alarm fires. The log implies the root cause. Let's discuss them based on different cases in details. Note that all the cases shown below did happen in my tests.

1. Clock App is alive

Case A (safe)
------------------------------
15:03:59.989 I/(  143): Gecko: HAL says alarm fired!
15:03:59.989 I/(  143): Gecko: HAL acquires a CPU wake lock.
15:04:00.039 I/(  454): Gaia: Clock App starts to handle the system message.
15:04:00.039 I/(  454): Gaia: Clock App acqures a CPU wake lock.
15:04:00.049 I/(  143): Gaia: the 'open-app' chrome event is received.
15:04:00.049 I/(  143): Gecko: HAL releases the CPU wake lock.

Case B (safe)
------------------------------
15:05:59.989 I/(  143): Gecko: HAL says alarm fired!
15:05:59.989 I/(  143): Gecko: HAL acquires a CPU wake lock.
15:06:00.039 I/(  143): Gaia: the 'open-app' chrome event is received.
15:06:00.039 I/(  454): Gaia: Clock App starts to handle the system message.
15:06:00.039 I/(  454): Gaia: Clock App acqures a CPU wake lock.
15:06:00.049 I/(  143): Gecko: HAL releases the CPU wake lock.

Case C (not safe)
------------------------------
15:06:59.989 I/(  143): Gecko: HAL says alarm fired!
15:06:59.989 I/(  143): Gecko: HAL acquires a CPU wake lock.
15:07:00.039 I/(  143): Gaia: the 'open-app' chrome event is received.
15:07:00.049 I/(  454): Gaia: Clock App starts to handle the system message.
15:07:00.049 I/(  143): Gecko: HAL releases the CPU wake lock.
15:07:00.059 I/(  454): Gaia: Clock App acqures a CPU wake lock.

Case D (not safe)
------------------------------
15:17:59.999 I/(  143): Gecko: HAL says alarm fired!
15:17:59.999 I/(  143): Gecko: HAL acquires a CPU wake lock.
15:18:00.059 I/(  143): Gaia: the 'open-app' chrome event is received.
15:18:00.059 I/(  143): Gecko: HAL releases the CPU wake lock.
15:18:00.069 I/(  454): Gaia: Clock App starts to handle the system message.
15:18:00.069 I/(  454): Gaia: Clock App acqures a CPU wake lock.


2. Clock App is killed

Case A (safe)
------------------------------
15:08:59.989 I/(  143): Gecko: HAL says alarm fired!
15:08:59.989 I/(  143): Gecko: HAL acquires a CPU wake lock.
15:09:00.019 I/(  143): Gaia: the 'open-app' chrome event is received.
15:09:00.029 I/(  143): Gecko: SystemMessageHandledListener acquires a CPU wake lock.
15:09:00.049 I/(  143): Gecko: HAL releases the CPU wake lock.
15:09:00.529 I/(  467): Gaia: Clock App starts to handle the system message.
15:09:00.539 I/(  467): Gaia: Clock App acqures a CPU wake lock.
15:09:00.599 I/(  143): Gecko: SystemMessageHandledListener releases the CPU wake lock.


Please see cases 1-C and 1-D, the system is not under the protection of CPU wake lock because the HAL releases the wake lock *before* the Clock App acquires its own wake lock (for attention screen), letting the system have chance to fall into sleep. Interestingly, the case 2 is always safe, because when the Clock App starts, the Gecko will internally acquire a wake lock and doesn't releases it until the system message is handled.

In summary, these log results meet the bug description (it doesn't mention the app is killed). I'm wondering in the window_manager.js, there are asynchronous codes when bringing the running apps to the foreground. I think Gaia somehow needs to acquire extra wake locks before the HAL releases its one.
Today Gene and Ian discussed this to me.
My understanding is, if clock is not running, the CPU may fall into sleep between:
1) system app got mozChromeEvent: open-app
2) system app creates a new mozbrowser iframe
because there's no CPU wake lock between 1) and 2)

As telephony already has a 5s timer to request CPU wake lock, I would suggest
A) System app request a 5s CPU wake lock after got open-app event.
B) mozAlarm in gecko request a 5s CPU wake lock.

I think it's more important to let the system message be dealt with in time rather than a 5 second CPU power "waste". How about this?
(In reply to Alive Kuo [:alive] from comment #40)
> Today Gene and Ian discussed this to me.
> My understanding is, if clock is not running, the CPU may fall into sleep
> between:
> 1) system app got mozChromeEvent: open-app
> 2) system app creates a new mozbrowser iframe
> because there's no CPU wake lock between 1) and 2)

No, if the clock is not running, it won't fall into sleep (please see comment #39). Actually, the Gecko did acquire a wake lock before firing 'open-app' but will release it when the function (catching 'open-app') returns. During this process, (2) is executed, which will acquire a wake lock and don't release until the system message is handled.

The issue is only happening when the clock is running. We don't have (2) to acquire an extra wake lock, thus the system would have chance to fall into sleep when the function returns (HAL releases the wake lock before the Clock App acquires its own).

> 
> As telephony already has a 5s timer to request CPU wake lock, I would suggest
> A) System app request a 5s CPU wake lock after got open-app event.
> B) mozAlarm in gecko request a 5s CPU wake lock.
> 
> I think it's more important to let the system message be dealt with in time
> rather than a 5 second CPU power "waste". How about this?

It sounds fine to me. Actually, the in-coming call is doing this way. That is, acquiring a wake lock with 5 seconds no matter if the system message is handled. This ensures the *running-app* can process its system message on time.

Justin, what do you think? I understand you might not like the timeout stuff. However, as Alive's mentioning, it's very cheap to acquire a 5-sec wake lock and alarms don't actually fire very often.
Attached patch Follow-Up Patch (obsolete) — Splinter Review
This patch is not yet tested. Just a WIP. Please see the previous comments. Thanks!
Attachment #750379 - Flags: feedback?(justin.lebar+bug)
Attached patch Follow-Up Patch, V1.1 (obsolete) — Splinter Review
Correct the wording things.
Attachment #750379 - Attachment is obsolete: true
Attachment #750379 - Flags: feedback?(justin.lebar+bug)
Attachment #750382 - Flags: feedback?(justin.lebar+bug)
With an alarm set at 14:40 that got fired at 14:41, the debug I added gives:

674 E/GeckoConsole(  107): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1275 in anonymous: WindowManager Event ("2013-05-16T12:40:00.023Z"): open-app
675 E/GeckoConsole(  107): Content JS LOG at app://system.gaiamobile.org/js/window_manager.js:1298 in anonymous: WindowManager Event ("2013-05-16T12:40:00.024Z"): IN open-app

Please also notice that it seems easier to trigger the issue if I open a lot of applications.
Whiteboard: [status: needs new patch][fixed-in-birch] c= → [status: needs review][fixed-in-birch] c=
(In reply to Alexandre LISSY :gerard-majax from comment #45)
> With an alarm set at 14:40 that got fired at 14:41, the debug I added gives:
> 
> 674 E/GeckoConsole(  107): Content JS LOG at
> app://system.gaiamobile.org/js/window_manager.js:1275 in anonymous:
> WindowManager Event ("2013-05-16T12:40:00.023Z"): open-app
> 675 E/GeckoConsole(  107): Content JS LOG at
> app://system.gaiamobile.org/js/window_manager.js:1298 in anonymous:
> WindowManager Event ("2013-05-16T12:40:00.024Z"): IN open-app

The above timings prove the window_manager.js can receive the 'open-app' chrome event on time. Although there are some milliseconds delay, it's inevitable to avoid that as long as we would have some codes to run.

What we're really caring about is why the Clock App rings at 14:41 (one minute late)? I think the root cause is due to the cases 1-C and 1-D at comment #39.
Uploading the codes for dumping the log at comment #39 for the reference.

Btw, the log at comment #39 is tested under the scenario that the power is connected and the screen is on, so that's why the alarm seems still fired on time although we'd have some non-wake-lock gaps at the cases 1-C and 1-D. However, those non-wake-lock gaps will stop us firing the alarm when we attempt to wake up the system from sleeping.
Attached patch Follow-Up Patch, V2 (obsolete) — Splinter Review
Hi Justin,

Please see comment #39 and comment #41 for the root cause and why we need this patch.

If you don't like the way of acquiring wake locks segment by segment, I think we might need to pull back the Part 2 patch at Bug 796255 that you used to strip out. That patch could ensure us to safely acquire the wake lock from the point of firing system message to the point of being handled, to protect the cases 1-C and 1-D at comment #39 when the app is running.
Attachment #750382 - Attachment is obsolete: true
Attachment #750382 - Flags: feedback?(justin.lebar+bug)
Attachment #750862 - Flags: review?(justin.lebar+bug)
Ah, I think I understand what's going on here.

Everything works properly in the case when the clock app is not running.  But we don't take the CPU wake lock in the case when the app is already running.

I don't think this patch is the right solution.  If the clock app happens to take more than 5s to launch, we're in trouble.  This is just opening us up to a bug where, if the CPU happens to be busy when the phone wakes up (e.g. due to a bg process, or the main process), we'll miss an alarm.

Putting the code from bug 796255 part 2 back in is better.  But that only works for the alarm.

It seems to me that we need to do this for every app when firing a system message.  I think we still need the changes I made in bug 796255, because we need to hold the lock while the process is starting, if it's starting.  But we also need to hold the lock if the process is already around.

Does that sound right to you?  It's quite late here, so please tell me if I'm talking crazy.
Flags: needinfo?(justin.lebar+bug)
Attachment #750862 - Flags: review?(justin.lebar+bug) → review-
Sweet! I'm really happy you can understand all the discovers I put here. :D I'll come back with a solution in the System Message mechanism. Thanks for your suggestion!
Attached patch Follow-Up Patch, V3 (obsolete) — Splinter Review
Notes:

1. The system message to be sent or broadcasted has been associated with a message ID. We can reuse it to specify if a system message has been handled. My basic solution is: whenever a .sendAsyncMessage("SystemMessageManager:Message", ...) is called by parent, the parent should expect to receive one "SystemMessageManager:HandleMessageDone" from the child. We cannot release the CPU wake lock until all of them are collected and counted.

2. If an app doesn't register a handler for a certain system message type, we should consider it as a kind of "handled", so that we can ask the parent to release the holding CPU wake lock earlier, instead of by timeout.

3. My understanding to the SystemMessageHandledListener is each process has only one SystemMessageHandledListener to be initiated. In the original logic, we'd fire one "SystemMessageManager:HandleMessageDone" (I'll rename this later) for each system message handled, which means the SystemMessageHandledListener would shut down and release the wake lock by the *first* system message. However, we cannot do that until all the system messages are handled. I feel the way of SystemMessageHandledListener is doing is wrong.

4. I rename "SystemMessageManager:HandleMessageDone" to be "handle-system-messages-done" and put the .notifyObservers(...) in a better place. Also, I reuse "SystemMessageManager:HandleMessageDone" for my new purpose.

5. I'm wondering we don't even need the SystemMessageHandledListener and the "handle-system-messages-done". The new "SystemMessageManager:HandleMessageDone" has already covered a wider scope of the CPU wake lock. I think we can strip out the SystemMessageHandledListener.

The codes are not yet tested. I'll verify them next week. Could you please return some feedback to me at the same time? Thanks!
Attachment #750862 - Attachment is obsolete: true
Attachment #751005 - Flags: feedback?(justin.lebar+bug)
Attachment #751005 - Flags: feedback?(fabrice)
Attached patch Follow-Up Patch, V3.1 (obsolete) — Splinter Review
Please see comment #51 for the patch summary. After some thoughts, I decide to correct the #1 to:

Whenever the parent is sending a system message to a certain page, we require a CPU wake lock for it and will expect that page can handle that system message and send a "SystemMessageManager:HandleMessageDone" back to the parent to attempt to release the holding CPU wake lock.
Attachment #751005 - Attachment is obsolete: true
Attachment #751005 - Flags: feedback?(justin.lebar+bug)
Attachment #751005 - Flags: feedback?(fabrice)
Attachment #751043 - Flags: feedback?(justin.lebar+bug)
Attachment #751043 - Flags: feedback?(fabrice)
>diff --git a/dom/messages/SystemMessageInternal.js b/dom/messages/SystemMessageInternal.js
>--- a/dom/messages/SystemMessageInternal.js
>+++ b/dom/messages/SystemMessageInternal.js

> SystemMessageInternal.prototype = {
>+  _acquireCpuWakeLock: function _acquireCpuWakeLock(aMessageId) {
>+    let cpuWakeLock = this._cpuWakeLocks[aMessageId];
>+    if (!cpuWakeLock) {
>+      // We have to ensure the CPU doesn't sleep during the process of
>+      // handling system messages, so that they can be handled on time.
>+      cpuWakeLock = this._cpuWakeLocks[aMessageId] = {
>+        wakeLock: powerManagerService.newWakeLock("cpu"),
>+        timer: Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer),
>+        pageCount: 1
>+      };
>+    } else {
>+      // We've already acquied the CPU wake lock for this system message,
>+      // so just adding the page count.
>+      ++cpuWakeLock.pageCount;
>+    }
>+
>+    // Set a watchdog to avoid locking the CPU wake lock too long,
>+    // because it'd exhaust the battery quickly which is very bad.
>+    // This could probably happen if the app failed to launch or
>+    // handle the system messages due to any unexpected reasons.
>+    cpuWakeLock.timer.initWithCallback(function timerCb() {
>+      debug("Release the CPU wake lock if the system messages are not " +
>+            "successfully handled before timeout.");
>+      this._releaseCpuWakeLock(aMessageId);

I don't think we want to run releaseCpuWakeLock as currently defined -- we want
to run the version that decreases pageCount and releases the lock only if
pageCount goes to 0.

>+    }.bind(this), 30000, Ci.nsITimer.TYPE_ONE_SHOT);
>+  },
>+
>+  _releaseCpuWakeLock: function _releaseCpuWakeLock(aMessageId) {
>+    let cpuWakeLock = this._cpuWakeLocks[aMessageId];
>+    if (cpuWakeLock) {
>+      cpuWakeLock.wakeLock.unlock();
>+      cpuWakeLock.timer.cancel();
>+      delete this._cpuWakeLocks[aMessageId];

Shouldn't this look at cpuWakeLock.pageCount and only do this if the count goes
to 0?

>@@ -343,25 +386,45 @@ SystemMessageInternal.prototype = {
>             if (pendingMessages[i].msgID === msg.msgID) {
>               pendingMessages.splice(i, 1);
>               break;
>             }
>           }
>         }, this);
>         break;
>       }
>+      case "SystemMessageManager:HandleMessageDone":
>+      {
>+        debug("received SystemMessageManager:HandleMessageDone for " +
>+              "system message ID = " + msg.msgID);
>+
>+        let cpuWakeLock = this._cpuWakeLocks[msg.msgID];
>+        if (cpuWakeLock) {
>+          --cpuWakeLock.pageCount;
>+          if (cpuWakeLock.pageCount == 0) {
>+            debug("Unlock the CPU wake lock after all the system messages " +
>+                  "are successfully handled for sure.");
>+            this._releaseCpuWakeLock(msg.msgID);

Oh, I see.  It's confusing that the pageCount logic is in the addWakeLock
function but not in the releaseWakeLock function.  I see that this is because
you want to call releaseWakeLock in a loop in order to release all of the wake
locks on shutdown.  Anyway, we can figure something out.

>   observe: function observe(aSubject, aTopic, aData) {
>     switch (aTopic) {
>       case "xpcom-shutdown":
>         kMessages.forEach(function(aMsg) {
>           ppmm.removeMessageListener(aMsg, this);
>         }, this);
>+        // Cancel the timers for system message CPU wake locks.
>+        for (let messageId in this._cpuWakeLocks) {
>+          this._releaseCpuWakeLock(messageId);
>+        }

This modifies this._cpuWakeLocks while you iterate over it.  I doubt that's
going to work out well.

Anyway maybe this code isn't necessary.

>diff --git a/dom/messages/SystemMessageManager.js b/dom/messages/SystemMessageManager.js
>--- a/dom/messages/SystemMessageManager.js
>+++ b/dom/messages/SystemMessageManager.js

>@@ -168,29 +174,37 @@ SystemMessageManager.prototype = {
>+    // |messages| is an array of |{ msg, msgID }|s.
>+    let messages = (aMessage.name == "SystemMessageManager:Message")
>+                 ? [{ msg: msg.msg, msgID: msg.msgID }]

Can this be [msg] instead?

This looks good to me, although of course it needs to be tested.

To the question of whether we can remove ContentParent's
SystemMessageHandledObserver, my question is: Is this CPU wake lock taken
before we create the new frame?  It doesn't look like it to me, but you
probably know better than I how this stuff works.
Attachment #751043 - Flags: feedback?(justin.lebar+bug) → feedback+
Comment on attachment 751043 [details] [diff] [review]
Follow-Up Patch, V3.1

Review of attachment 751043 [details] [diff] [review]:
-----------------------------------------------------------------

I think jlebar has a much better grasp on this than me.
Attachment #751043 - Flags: feedback?(fabrice)
Attached patch Follow-Up Patch, V4 (obsolete) — Splinter Review
This patch is verified. Every case is working well. Take the case 1-D at comment #39 as an example. The log becomes:

15:55:59.989 I/(  109): Gecko: HAL says alarm fired!
15:55:59.989 I/(  109): Gecko: HAL acquires a CPU wake lock.
15:56:00.049 I/(  109): Gecko: System Message Internal acqures a CPU wake lock.
15:56:00.059 I/(  109): Gaia: the 'open-app' chrome event is received.
15:56:00.069 I/(  109): Gecko: HAL releases the CPU wake lock.
15:56:00.089 I/(  487): Gaia: Clock App starts to handle the system message.
15:56:00.089 I/(  487): Gaia: Clock App acqures a CPU wake lock.
15:56:00.149 I/(  109): Gecko: System Message Internal releases the CPU wake lock.

You can see the CPU wake lock acquired by the Clock App is between the CPU wake lock pair acquired by System Message Internal.

Regarding the following comment:

> To the question of whether we can remove ContentParent's
> SystemMessageHandledObserver, my question is: Is this CPU wake lock taken
> before we create the new frame?  It doesn't look like it to me, but you
> probably know better than I how this stuff works.

The answer is absolutely yes. The CPU wake lock is taken much earlier before creating new frame. Let's see what the system message internal does when sending a system message:

1. Gecko calls .sendMessage() or .broadcastMessage().
2. System message internal fires the system message(s) through IPC.
3. System message internal queues the fired system message(s).
4. System message internal .notifyObservers("system-messages-open-app")
5. Shell.js gets "system-messages-open-app" and fires "open-app" event.
6. Gaia's window_manager.js gets "open-app" and tries to open the app.

The solution in my patch acquires the CPU wake lock *before* #2, so it can lock the CPU *before* the running app catches and handles the system message. However, the SystemMessageHandledListener acquires the CPU wake lock at #6, at which point the running app might already finish handling the system message. Take the case 2 at comment #39 for example. The log becomes:

15:53:59.992 I/(  109): Gecko: HAL says alarm fired!
15:53:59.992 I/(  109): Gecko: HAL acquires a CPU wake lock.
15:54:00.032 I/(  109): Gecko: System Message Internal acqures a CPU wake lock.
15:54:00.042 I/(  109): Gaia: the 'open-app' chrome event is received.
15:54:00.072 I/(  109): Gecko: SystemMessageHandledListener acquires a CPU wake lock.
15:54:00.092 I/(  109): Gecko: HAL releases the CPU wake lock.
15:54:00.752 I/(  487): Gaia: Clock App starts to handle the system message.
15:54:00.752 I/(  487): Gaia: Clock App acqures a CPU wake lock.
15:54:00.822 I/(  109): Gecko: System Message Internal releases the CPU wake lock.
15:54:00.832 I/(  109): Gecko: SystemMessageHandledListener releases the CPU wake lock.
Attachment #751043 - Attachment is obsolete: true
Attachment #751315 - Flags: review?(justin.lebar+bug)
Whiteboard: [status: needs review][fixed-in-birch] c= → [status: needs review][fixed-in-birch][required_last_cert_round] c=
tested on ikura build 26/04, and it's working fine.
Whiteboard: [status: needs review][fixed-in-birch][required_last_cert_round] c= → [status: needs review][required_last_cert_round] c=
Comment on attachment 751315 [details] [diff] [review]
Follow-Up Patch, V4

Review of attachment 751315 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks good to me, and I tested successfully on device. I'd still like Justin to review though.
Attachment #751315 - Flags: feedback+
>> Is this CPU wake lock taken before we create the new frame?
> The answer is absolutely yes.

Okay, great!  Can you file a new bug to remove this code, then?  (It's good not
to do it in here, so this patch is the minimal change necessary to fix the
bug.)

This is a review of the interdiff, which you can generate with

> interdiff -U8 <(curl -L https://bugzilla.mozilla.org/attachment.cgi?id=751005) <(curl -L https://bugzilla.mozilla.org/attachment.cgi?id=751315)

I have some English nits that aren't very important.  More importantly, I think
there's a serious bug here, so I can't give this r+.

>diff -U8 b/dom/messages/SystemMessageInternal.js b/dom/messages/SystemMessageInternal.js
>--- b/dom/messages/SystemMessageInternal.js
>+++ b/dom/messages/SystemMessageInternal.js

Grammar nit: It's usually better to use the present progressive tense+aspect
for log messages instead of the imperative.  In contrast, it's usually better
to use the imperative in a comment.

http://en.wikipedia.org/wiki/Uses_of_English_verb_forms#Progressive

  Good comment (present imperative): Release the lock.
  Good log message (present progressive): Releasing the lock.

This is idiomatic because a log message is the same sort of thing someone
saying to you "I'm doing this!", which doesn't mean the same thing as if
someone says "Do this!"

(Please don't think that I think the English is bad here; it's actually very
good!)

>@@ -72,48 +72,64 @@
>   _acquireCpuWakeLock: function _acquireCpuWakeLock(aMessageId) {
>     let cpuWakeLock = this._cpuWakeLocks[aMessageId];
>     if (!cpuWakeLock) {
>       // We have to ensure the CPU doesn't sleep during the process of
>-      // handling system messages, so that they can be handled on time.
>+      // handling the system message, so that it can be handled on time.
>+      debug("Acquire a CPU wake lock for system message ID = " + aMessageId);

/Acquiring/ a CPU wake lock

>       cpuWakeLock = this._cpuWakeLocks[aMessageId] = {
>         wakeLock: powerManagerService.newWakeLock("cpu"),
>         timer: Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer),
>-        winCount: 1
>+        pageCount: 1

Nit: Can we call this "lockCount"?  I know you're counting the number of pages
which have acquired this lock, but it's easier (for me, anyway) to think about
this with a smaller scope: "How many times has acquireCpuWakeLock been called
for a particular message ID?"  That has nothing to do with pages.

>       };
>     } else {
>-      // We've already acquied the CPU wake lock for this system message.
>-      ++cpuWakeLock.winCount;
>+      // We've already acquired the CPU wake lock for this system message,
>+      // so just add the page count and extend the timeout.

Nit: "Add /to/ the page count"

>+      ++cpuWakeLock.pageCount;

Nit: I think postfix operator++ would be better here, since then the operator
is closer to the thing it's operating on.

>     // Set a watchdog to avoid locking the CPU wake lock too long,
>     // because it'd exhaust the battery quickly which is very bad.
>     // This could probably happen if the app failed to launch or
>-    // handle the system messages due to any unexpected reasons.
>+    // handle the system message due to any unexpected reasons.
>     cpuWakeLock.timer.initWithCallback(function timerCb() {
>-      debug("Release the CPU wake lock if the system messages are not " +
>-            "successfully handled before timeout.");
>-      this._releaseCpuWakeLock(aMessageId);
>+      debug("Release the CPU wake lock if the system message is not " +
>+            "successfully handled by its registered pages before timeout.");

Nit: /Releasing/ the CPU wake lock /because/ the system message /was/ not
sucessfully handled by its registered pages before /timing out/.

>+      this._cancelCpuWakeLock(aMessageId);
>     }.bind(this), 30000, Ci.nsITimer.TYPE_ONE_SHOT);
>   },
> 
>   _releaseCpuWakeLock: function _releaseCpuWakeLock(aMessageId) {
>     let cpuWakeLock = this._cpuWakeLocks[aMessageId];
>     if (cpuWakeLock) {
>-      cpuWakeLock.wakeLock.unlock();
>-      cpuWakeLock.timer.cancel();
>-      delete this._cpuWakeLocks[aMessageId];
>+      --cpuWakeLock.pageCount;
>+      if (cpuWakeLock.pageCount == 0) {
>+        debug("Unlock the CPU wake lock after the system message has been " +
>+              "successfully handled by its registered pages for sure.");

Nit: /Unlocking/ the CPU wake lock /now that/ the system message has been
successfully handled by its /listeners/.

>+        this._cancelCpuWakeLock(aMessageId);
>+      }
>     }
>   },

>@@ -389,40 +407,30 @@
>         }, this);
>         break;
>       }
>       case "SystemMessageManager:HandleMessageDone":
>       {
>         debug("received SystemMessageManager:HandleMessageDone for " +
>               "system message ID = " + msg.msgID);
> 
>-        let cpuWakeLock = this._cpuWakeLocks[msg.msgID];
>-        if (cpuWakeLock) {
>-          --cpuWakeLock.winCount;
>-          if (cpuWakeLock.winCount == 0) {
>-            debug("Unlock the CPU wake lock after all the system messages " +
>-                  "are successfully handled for sure.");
>-            this._releaseCpuWakeLock(msg.msgID);
>-          }
>-        }
>+        // One of the pages that the system message sent to has finished
>+        // handling that. De-count the need of holding the CPU wake lock.
>+        this._releaseCpuWakeLock(msg.msgID);

"De-count" is not idiomatic; we'd say "decrement", but that's still not quite
what you mean.  How about

s/\. De-count.*$/, so we can release the wake lock we acquired on behalf of that page./

>   observe: function observe(aSubject, aTopic, aData) {
>     switch (aTopic) {
>       case "xpcom-shutdown":
>         kMessages.forEach(function(aMsg) {
>           ppmm.removeMessageListener(aMsg, this);
>         }, this);
>-        // Cancel the timers for system message CPU wake locks.
>-        for (let messageId in this._cpuWakeLocks) {
>-          this._releaseCpuWakeLock(messageId);
>-        }

I guess we know that this doesn't cause "leaks"?  (I was merely hoping that it
doesn't.)  Or, I guess if it does cause leaks, we'll find out once we write a
test that tickles this, which is fine with me.

>         Services.obs.removeObserver(this, "xpcom-shutdown");
>         Services.obs.removeObserver(this, "webapps-registry-start");
>         Services.obs.removeObserver(this, "webapps-registry-ready");
>         ppmm = null;
>         this._pages = null;
>         this._bufferedSysMsgs = null;
>         break;
>       case "webapps-registry-start":

>@@ -496,27 +504,32 @@
>     // Don't send the system message not granted by the app's permissions.
>     if (!SystemMessagePermissionsChecker
>           .isSystemMessagePermittedToSend(aType,
>                                           aPageURI,
>                                           aManifestURI)) {
>       return false;
>     }
> 
>+    // Whenever the parent is sending a system message to a certain page,
>+    // we require a CPU wake lock for it and will expect that page can handle
>+    // that system message and send a "SystemMessageManager:HandleMessageDone"
>+    // back to the parent to attempt to release the holding CPU wake lock.
>+    this._acquireCpuWakeLock(aMessageID);

  Whenever the parent /sends/ a system message to a page, we /acquire/ a CPU wake
  lock for it and /expect/ that /we will receive a/
  "SystemMessageManager:HandleMessageDone" message when the page finishes
  handling the message.  At that point, we'll release the lock we acquired.

>     let targets = this._listeners[aManifestURI];
>     if (targets) {
>       for (let index = 0; index < targets.length; ++index) {
>         let manager = targets[index].target;
>         manager.sendAsyncMessage("SystemMessageManager:Message",
>                                  { type: aType,
>                                    msg: aMessage,
>                                    manifest: aManifestURI,
>                                    uri: aPageURI,
>                                    msgID: aMessageID });
>-        this._acquireCpuWakeLock(aMessageID);

Why did you move thie acquireCpuWakeLock call outside this loop (where it was
in your previous patch)?  Won't we get one
SystemMessageManager:HandleMessageDone event for each call to
SystemMessageManager:Message?

r- until we figure this particular bit out.

>diff -U8 b/dom/messages/SystemMessageManager.js b/dom/messages/SystemMessageManager.js
>--- b/dom/messages/SystemMessageManager.js
>+++ b/dom/messages/SystemMessageManager.js

>@@ -174,34 +174,34 @@
>       cpmm.sendAsyncMessage(
>         "SystemMessageManager:Message:Return:OK",
>         { type: msg.type,
>           manifest: msg.manifest,
>           uri: msg.uri,
>           msgID: msg.msgID });
>     }
> 
>+    // |messages| is an array of |{ msg, msgID, [...] }| elements.
>     let messages = (aMessage.name == "SystemMessageManager:Message")
>-                 ? [{ msg: msg.msg, msgID: msg.msgID }]
>+                 ? [msg]
>                  : msg.msgQueue;

Nit: The ?: should be indented under the (, not the =.  You don't need to break
the line before the :, either.

Normally I'd say that we should put the ? and : on the previous lines (because
we don't start lines with operators), but it's /so much/ clearer the way you
wrote it, I think we should leave it.  :)
 
>-    // We only handle the system messages when the handler is registered.
>-    let handler = this._handlers[msg.type];
>-    if (handler) {
>-      messages.forEach(function(aMessage) {
>+    messages.forEach(function(aMessage) {
>+      // We only handle the system messages when the handler is registered.

s/handle messages/dispatch messages/
s/the handler/a handler/

>+      let handler = this._handlers[msg.type];
>+      if (handler) {
>         this._dispatchMessage(msg.type, handler, aMessage.msg);
>-      }, this);
>-    }
>+      }
> 
>-    // We need to notify the parent the system message has been handled
>-    // (even it no handler is registered for it) to release the CPU wake lock.
>-    messages.forEach(function(aMessage) {
>+      // We need to notify the parent the system messages have been handled
>+      // (even it no handler is registered for it) to release CPU wake locks.

  We need to notify the parent that this system message has been handled, even
  if there's no handler registered for this message, so the parent can release
  the CPU wake lock it took on our behalf.

>       cpmm.sendAsyncMessage("SystemMessageManager:HandleMessageDone",
>-                            { msgID: aMessage.msgID });
>-    }
>+                            { manifest: msg.manifest,
>+                              msgID: aMessage.msgID });
>+    }, this);

This sends SystemMessageManager:HandleMessageDone once for each message in the
array.  But even if you moved the _acquireCpuWakeLock into the loop as
suggested above, we're still only expecting one
SystemMessageManager:HandleMessageDone event per message manager, which isn't
what we get here.

So r- until we clear this up, too.

It sounds like maybe it would be simpler to fire one event after all pages are
done handling their messages, instead of doing it this way.
Attachment #751315 - Flags: review?(justin.lebar+bug) → review-
(In reply to Justin Lebar [:jlebar] from comment #58)
> Why did you move thie acquireCpuWakeLock call outside this loop (where it was
> in your previous patch)?  Won't we get one
> SystemMessageManager:HandleMessageDone event for each call to
> SystemMessageManager:Message?

As my best understanding, the target queues are aimed for keeping track of the aliveness of the running processes (windows). That is, one target implies one process (window) and one app (identified by the manifest URL) can have multiple targets (windows).

What we're sure is: a system message must only be delivered to a specific window page (with a given manifest URL and a given page URL). If a system message is delivered to a page where the manifest URL or page URL doesn't match, that page (i.e. target) will refuse to handle that [1]. My point is: one system message can only be delivered and handled by one target (even there could be multiple targets in the same app), so that's why we don't need to add count for each target.

Why we need page counts here is because it's for the broadcasting.

> This sends SystemMessageManager:HandleMessageDone once for each message in
> the
> array.  But even if you moved the _acquireCpuWakeLock into the loop as
> suggested above, we're still only expecting one
> SystemMessageManager:HandleMessageDone event per message manager, which isn't
> what we get here.

No, it's not good. We should consider the CPU wake locking mechanism as each-system-message-based instead of each-process-based. Supposing we're going to broadcast a system message (like phone-call) to multiple pages (which may be from different applications) registered for it, we cannot release the CPU wake lock until all the pages finish handling it.

Does that sound reasonable to you? Btw, thanks for all the NITs! Good to polish my English.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageManager.js#161
> My point is: one system message can only be delivered and handled by one target (even 
> there could be multiple targets in the same app), so that's why we don't need to add 
> count for each target.

But in your patch, you change it so that we send a HandledMessage message even if the app /did not/ handle the message.  So I don't see how this applies.

Anyway, it sounds like we agree that we should change it to be a much simpler invariant: we get a call back when the SystemMessageManager:Message handler in the child completes, regardless of whether or not there were any listeners in the child.  Right?

> We should consider the CPU wake locking mechanism as each-system-message-based instead of 
> each-process-based.

I'm not sure what you mean.  I think the parent should hold a CPU wake lock until each process which was sent a SystemMessageManager:Message message finishes dispatching that message to its listeners, if any.  That invariant doesn't require any special thinking to reason about.
(In reply to Gene Lian [:gene] from comment #59)
> (In reply to Justin Lebar [:jlebar] from comment #58)
> > This sends SystemMessageManager:HandleMessageDone once for each message in
> > the
> > array.  But even if you moved the _acquireCpuWakeLock into the loop as
> > suggested above, we're still only expecting one
> > SystemMessageManager:HandleMessageDone event per message manager, which isn't
> > what we get here.
> 
> No, it's not good. We should consider the CPU wake locking mechanism as
> each-system-message-based instead of each-process-based. Supposing we're
> going to broadcast a system message (like phone-call) to multiple pages
> (which may be from different applications) registered for it, we cannot
> release the CPU wake lock until all the pages finish handling it.

Also, even for a single page, it can handle multiple system messages. What's happening if we have two system messages where the first one comes from alarm A and the second one comes from alarm B? If the starting app fetches these 2 pending messages from the central and handling them at one time, in your opinion it's going to ask the central to release only one CPU wake lock instead of two. However, two fired alarms should have two CPU wake locks being acquired.

Again, thinking the CPU wake lock as *system-message-based* sounds more reasonable.
(In reply to Justin Lebar [:jlebar] from comment #60)
> > My point is: one system message can only be delivered and handled by one target (even 
> > there could be multiple targets in the same app), so that's why we don't need to add 
> > count for each target.
> 
> But in your patch, you change it so that we send a HandledMessage message
> even if the app /did not/ handle the message.  So I don't see how this
> applies.

That just means a page doesn't set a message handler for it. However, that page could have registration for that system message type in its manifest. In this case, a system message can still be sent or broadcasted to that page.

> 
> Anyway, it sounds like we agree that we should change it to be a much
> simpler invariant: we get a call back when the SystemMessageManager:Message
> handler in the child completes, regardless of whether or not there were any
> listeners in the child.  Right?

Exactly right!

> 
> > We should consider the CPU wake locking mechanism as each-system-message-based instead of 
> > each-process-based.
> 
> I'm not sure what you mean.  I think the parent should hold a CPU wake lock
> until each process which was sent a SystemMessageManager:Message message
> finishes dispatching that message to its listeners, if any.  That invariant
> doesn't require any special thinking to reason about.

Please see another example in my previous comment.
> Exactly right!

Okay, I think we sort of agree.  Let's see how your new patch looks.  :)
Attached patch Follow-Up Patch, V4.1 (obsolete) — Splinter Review
Please do feel free to let me know it there is any detail can be improved even for a tiny grammar nit. :) I can apply these thoughts for my reviews as well.
Attachment #751315 - Attachment is obsolete: true
Attachment #751999 - Flags: review?(justin.lebar+bug)
Attachment #751999 - Flags: feedback+
Looking at your comment again. It's not very clear to me if we're in agreement? Did you already see comment #61?
Blocks: 874353
(In reply to Justin Lebar [:jlebar] from comment #60)
> Anyway, it sounds like we agree that we should change it to be a much
> simpler invariant: we get a call back when the SystemMessageManager:Message
> handler in the child completes, regardless of whether or not there were any
> listeners in the child.  Right?

To elaborate more with the following two cases:

1. When the child process is running, it can receive "SystemMessageManager:Message" sent from the parent through IPC and handle that *single* message in its handler (if a handler is also registered for that type).

2. When the child process is not running, it cannot receive the "SystemMessageManager:Message" through IPC obviously. Instead, the target page will be launched by the "open-app" event first and then run its .mozSetMessageHandler(...). When running that DOM API, the child will actively retrieve the pending messages (can be *multiple*) from the parent [1] and handle all of them at one time.

For case #1, we only have one message to be handled so it doesn't matter to send back the "SystemMessageManager:HandleMessageDone" inside or outside the loop.

My concern is for case #2, which might have multiple messages to be handled. Noticeably, each of them can be fired at different time points, which means the time points of acquiring the CPU wake locks can be different. Therefore, we cannot just return single "SystemMessageManager:HandleMessageDone" (outside the loop) after all the messages are handled. Instead, multiple "SystemMessageManager:HandleMessageDone" have to be returned one by one corresponding to each system message to release its CPU wake lock respectively.

Comment #61 is a practical example. Hope that's clear to you.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/messages/SystemMessageManager.js#109
> Instead, multiple "SystemMessageManager:HandleMessageDone" have to be returned one by one 
> corresponding to each system message to release its CPU wake lock respectively.

That's fine, but then you have to increment the wake lock's count /once for each pending message/, because we will decrement the lock's count once for each pending message.  I see the wake lock's count only being incremented once when we deliver multiple pending messages.

That's why I was suggesting instead a simpler (to my mind) invariant:

Every time the parent sends SMM:Message or SMM:GetPendingMessages:Return:OK, the child must send some message (call it HandleMessageDone, or whatever) when it's done handling that one event.  There is therefore a 1:1 correspondence between messages sent from the parent and messages sent from the child.  The parent holds a lock until it hears back from the child.  That is very simple to reason about.

Another problem with this patch that I still do not understand is why it's safe to call acquireCpuWakeLock outside the loop that dispatches to multiple handlers.  I know you've tried to explain it before -- I'm busy moving (I'd planned to take these days as PTO), so I don't have the time right now to acquire the global understanding necessary to understand whether what you're saying is right.  But it seems to me that we can have a simpler rule which would be obviously correct, and that's better than a more complex rule that may or may not be correct.
Attachment #751999 - Flags: review?(justin.lebar+bug) → review-
Hi all, excuse me, I have one question. Sometimes, I find the sms notification is also blocked before LCD lighted up.Is it a similar problem? Shall I raise another bug? Or this patch can resolve it as well?
(In reply to Justin Lebar [:jlebar] from comment #67)
> > Instead, multiple "SystemMessageManager:HandleMessageDone" have to be returned one by one 
> > corresponding to each system message to release its CPU wake lock respectively.
> 
> That's fine, but then you have to increment the wake lock's count /once for
> each pending message/, because we will decrement the lock's count once for
> each pending message.  I see the wake lock's count only being incremented
> once when we deliver multiple pending messages.

I did! I require and release the wake lock by a *System Message ID*. Right? It means we do increment/decrement one wake lock for one message, instead of for a set of pending messages.

> 
> That's why I was suggesting instead a simpler (to my mind) invariant:
> 
> Every time the parent sends SMM:Message or SMM:GetPendingMessages:Return:OK,
> the child must send some message (call it HandleMessageDone, or whatever)
> when it's done handling that one event.  There is therefore a 1:1
> correspondence between messages sent from the parent and messages sent from
> the child.  The parent holds a lock until it hears back from the child. 
> That is very simple to reason about.

OK. I'll try to work out a solution based on the invariant you're suggesting. However, I'd still prefer my original invariant, though. I'll come back with a new patch. Basically, the new approach is going to be: the CPU wake lock is indexed by the page instead of the system message ID.

Thanks for all the clarifications. :)
(In reply to buri.blff from comment #68)
> Hi all, excuse me, I have one question. Sometimes, I find the sms
> notification is also blocked before LCD lighted up.Is it a similar problem?
> Shall I raise another bug? Or this patch can resolve it as well?

Exactly right! I used to think of the same thing that we should also acquire the wake lock somewhere for SMS/MMS. We've already done for incoming calls and alarms, but not for SMS/MMS yet.

The solution here can solve partial issues for SMS/MMS, which is going to acquire and release CPU wake locks within the system message mechanism. However, we still need to acquire extra CPU wake locks for the code path from RIL wake up to calling .broadcastMessage().
Attached patch Follow-Up Patch, V5 (obsolete) — Splinter Review
Attachment #751999 - Attachment is obsolete: true
Attachment #752520 - Flags: review?(justin.lebar+bug)
Attachment #752520 - Flags: feedback+
Attached patch Follow-Up Patch, V5.1 (obsolete) — Splinter Review
Just refined some comments. Some notes you might want to concern:

1. Why we need |handledCount|?

It's aimed for avoiding a race condition issue. Without this, you can imagine what's happening if the child fires SMM:HandleMessagesDone and the parent fires system message at the same time? The CPU wake lock for the newly fired system message would prematurely be cancelled. Therefore, the child needs to explicitly claim how many system messages have been handled for each run.

2. Why we need |type| to do the index?

Because the child always receives the messages given a specific system message type, no matter it's coming from SMM:Message or SMM:GetPendingMessages:Return. It means the child can only claim that the pending messages given a certain type have been handled. For example, the child can call .mozSetMessageHandler("alarm", ...) and then .mozSetMessageHandler("activity", ...) to retrieve 2 sets of pending messages separately. Therefore, the CPU wake locks maintained in the parent must be indexed by:

  - system message type
  - manifest URL
  - page URL
Attachment #752520 - Attachment is obsolete: true
Attachment #752520 - Flags: review?(justin.lebar+bug)
Attachment #752542 - Flags: review?(justin.lebar+bug)
Attachment #752542 - Flags: feedback+
Try to elaborate more about my new patch:

1. The old approach (V4.1) is system-message-based CPU wake lock mechanism. We cannot release the wake lock until all the pages finish handling that message. The |lockCount| means the number of pages.

2. The new approach (V5.1) is page-based CPU wake lock mechanism. We cannot release the wake lock until all the system messages have been handled. The |lockCount| means the number of system messages.

Approach #2 is the invariant you want (hopefully). Well, #2 uses less IPCs because it fires single SMM:HandleMessagesDone after all the messages are handled. It might be an advantage. :)
> I did! I require and release the wake lock by a *System Message ID*.

You're right, I missed this.  But it's /still/ not right.  The code will release the wake lock after the /first/ message is handled, but we want it to release the wake lock after the /last/ message is handled.

But also, it's important to write code such that it not only /is/ correct, but that it /looks/ correct.  Even if this code was correct (and I don't think it was), it certainly didn't appear to be correct.

I'll have a look at the new patch.  :)
This patch still acquires the CPU wake lock outside the loop.

I think I understand why you believe this is right, and it probably is.  But in order for someone reading the code to believe that this code is right, they have to have global knowledge of how system messages and our app loading code works, and you, me, and Fabrice are basically the only people in that position.

This late in the game I cannot r+ this patch until it is obviously correct.  In order to meet that burden, we need a simpler invariant.  The CPU wake lock needs to be acquired in the loop, and then processes must respond to a SystemMessageManager:Message event, even if they don't match the manifest.

I'm sorry to micromanage your patch like this, but there have been too many subtle bugs in past versions of this patch for me to be comfortable with anything other than the most obviously-correct solution.
Attachment #752542 - Flags: review?(justin.lebar+bug) → review-
Sure. No problem. I'm glad to work out the most optimized and understandable solution. :) I'll come back with a new patch by putting the wake lock acquiring into the loop. Please stay tuned. Thanks!
(In reply to Justin Lebar [:jlebar] from comment #74)
> > I did! I require and release the wake lock by a *System Message ID*.
> 
> You're right, I missed this.  But it's /still/ not right.  The code will
> release the wake lock after the /first/ message is handled, but we want it
> to release the wake lock after the /last/ message is handled.

No, regarding this I believe this is correct. Each "SMM:HandleMessageDone" carries the system message ID to simply *decrement one lock count* of that specific system message, instead of cancelling the entire wake lock.
Hi Justin and Fabrice,

I hope to fix this first. This is obviously wrong to send the system message to a target where the page URL doesn't match. This is absolutely redundant and a waste of IPCs.

The part-2 patch later can benefit from this fix.
Attachment #753203 - Flags: review?(justin.lebar+bug)
Attachment #753203 - Flags: review?(fabrice)
Please see comment #72 for some details, which still stands. This patch needs to be on top of the part-1 patch.
Attachment #752542 - Attachment is obsolete: true
Attachment #753205 - Flags: review?(justin.lebar+bug)
Attachment #753205 - Flags: feedback?(fabrice)
In summary, I hope to strip out

-    if (msg.manifest != this._manifest || msg.uri != this._uri) {
-      return;
-    }

This makes the "SMM:HandleMessageDone" much more sense because it's now an 1:1 return corresponding to the SMM:Message or SMM:GetPendingMessages:Return:OK.

Another benefit is we don't need to waste redundant IPC to send the system message to a App page that doesn't match the manifest URL and page URL.

Hope it's clear to you.
Comment on attachment 753203 [details] [diff] [review]
Follow-Up Patch, Part 1 (don't need to send system message to a mismatching page), V1

I'm in favor of sending fewer messages!

>@@ -232,23 +232,25 @@ SystemMessageInternal.prototype = {
>     }
> 
>     switch(aMessage.name) {
>       case "SystemMessageManager:AskReadyToRegister":
>	 return true;
>	 break;
>       case "SystemMessageManager:Register":
>       {
>-        debug("Got Register from " + msg.manifest);
>+        debug("Got Register from " + msg.uri + " @ " + msg.manifest);
>	 let targets, index;
>	 if (!(targets = this._listeners[msg.manifest])) {
>	   this._listeners[msg.manifest] = [{ target: aMessage.target,
>+                                             uri: msg.uri,

Wow, there are /tabs/ in this file?  I wonder who we have to blame for that.
:)  You might want to match it here and elsewhere so we don't make the problem
worse.

>@@ -438,23 +440,27 @@ SystemMessageInternal.prototype = {
>					   aPageURI,
>					   aManifestURI)) {
>       return false;
>     }
> 
>     let targets = this._listeners[aManifestURI];
>     if (targets) {
>       for (let index = 0; index < targets.length; ++index) {
>-          let manager = targets[index].target;
>-          manager.sendAsyncMessage("SystemMessageManager:Message",
>-                                   { type: aType,
>-                                     msg: aMessage,
>-                                     manifest: aManifestURI,
>-                                     uri: aPageURI,
>-                                     msgID: aMessageID });
>+        let target = targets[index];
>+        // We only need to send the system message to the targets which match
>+        // the manifest URL and page URL of the destination of system message.
>+        if (target.uri != aPageURI) {

Why is it OK to check only the page URI?  I'd think we'd have to check the
manifest URI as well, since two different apps could be at the same page URI,
when only one of the apps is supposed to get the system message.

>diff --git a/dom/messages/SystemMessageManager.js b/dom/messages/SystemMessageManager.js
>--- a/dom/messages/SystemMessageManager.js
>+++ b/dom/messages/SystemMessageManager.js

>@@ -234,18 +228,19 @@ SystemMessageManager.prototype = {
>       debug("the app loaded in the browser doesn't need to register " +
>	     "the manifest for listening to the system messages");
>       return;
>     }
> 
>     if (!this._registerManifestReady) {
>       cpmm.sendAsyncMessage("SystemMessageManager:Register",
>			     { manifest: this._manifest,
>-                              innerWindowID: this.innerWindowID
>-                            });
>+                              uri: this._uri,
>+                              innerWindowID: this.innerWindowID });
>+
>       this._registerManifestReady = true;
>     }
>   },

We want to modify SystemMessageManager:GetPendingMessages:Return to remove the
URI too, right?

This looks good to me, if we can get the question about checking the manifest
URIs cleared up.  I'd like Fabrice to look at the next version, though.
Attachment #753203 - Flags: review?(justin.lebar+bug) → feedback+
Comment on attachment 753205 [details] [diff] [review]
Follow-Up Patch, Part 2 (acquire CPU wake locks to make sure system messages are handled on time), V6

This looks really good to me, Gene; I'm as confident as I can be that we're not
breaking anything!  Thanks for being patient with me.

>+  _acquireCpuWakeLock: function _acquireCpuWakeLock(aPageKey) {
>+    let cpuWakeLock = this._cpuWakeLocks[aPageKey];
>+    if (!cpuWakeLock) {
>+      // We have to ensure the CPU doesn't sleep during the process of the page
>+      // handling the system messages, so that they can be handled on time.
>+      debug("Acquiring a CPU wake lock for page key = " + aPageKey);
>+      cpuWakeLock = this._cpuWakeLocks[aPageKey] = {
>+        wakeLock: powerManagerService.newWakeLock("cpu"),
>+        timer: Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer),
>+        lockCount: 1
>+      };
>+    } else {
>+      // We've already acquired the CPU wake lock for this page,
>+      // so just add to the lock count and extend the time out.

s/time out/timeout

>@@ -345,16 +404,26 @@ SystemMessageInternal.prototype = {
>             if (pendingMessages[i].msgID === msg.msgID) {
>               pendingMessages.splice(i, 1);
>               break;
>             }
>           }
>         }, this);
>         break;
>       }
>+      case "SystemMessageManager:HandleMessagesDone":
>+      {
>+        debug("received SystemMessageManager:HandleMessagesDone " + msg.type +
>+          " with " + msg.handledCount + " for " + msg.uri + " @ " + msg.manifest);
>+
>+        // A page has finished handling part of its system messages, so we try

Nit: s/part/some/  "Part of" usually refers to taking one thing and dividing it
into pieces; "He ate part of the pie."  If you have something that's already
in discrete units ("its system messages"), we use "some of".  You can often use
"some of" where you use "part of", but not vice versa.

>+        // to release the CPU wake lock we acquired on behalf of that page.
>+        this._releaseCpuWakeLock(this._createKeyForPage(msg), msg.handledCount);
>+        break;
>+      }
>     }
>   },
> 

>@@ -437,33 +506,55 @@ SystemMessageInternal.prototype = {
>     // Don't send the system message not granted by the app's permissions.
>     if (!SystemMessagePermissionsChecker
>           .isSystemMessagePermittedToSend(aType,
>                                           aPageURI,
>                                           aManifestURI)) {
>       return false;
>     }
> 
>+    let isAppPageRunning = false;

Nit: appPageIsRunning, or appIsAlive, or something like that.

>+    // The app page isn't running and relies on the 'open-app' chrome event to
>+    // wake it up. We still need to acquire a CPU wake lock for that page and
>+    // expect that we will receive a "SystemMessageManager:HandleMessagesDone"
>+    // message when the page finishes handling the system message with other
>+    // pending messages. At that point, we'll release the lock we acquired.
>+    if (!isAppPageRunning) {
>+      this._acquireCpuWakeLock(pageKey);
>+    }

Nit: As written, this comment is fine, but it needs to be inside the if
statement, because it's only true inside the if.  Otherwise you can change it
to say "If the app page isn't running, we rely on the 'open-app chrome event to
wake it up."

>diff --git a/dom/messages/SystemMessageManager.js b/dom/messages/SystemMessageManager.js
>--- a/dom/messages/SystemMessageManager.js
>+++ b/dom/messages/SystemMessageManager.js

Thanks for writing this helpful comment.

>+  // Possible messages:
>+  //
>+  //   - SystemMessageManager:Message
>+  //     This one will only be received when the child process is alive,
>+  //     carrying the single system message fired from the parent.

Nit: The child process doesn't receive any messages if it's not alive.  :)

I think you mean "This one will only be received when the child process is
alive /when the message is initially sent/."

>+  //   - SystemMessageManager:GetPendingMessages:Return
>+  //     This one will be received when the starting child process wants to
>+  //     retrieve the pending system messages from the parent (i.e. sending
>+  //     SystemMessageManager:GetPendingMessages).

Nit: "i.e. /after/ sending SystemMessageManager:GetPendingMessages
Attachment #753205 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 753203 [details] [diff] [review]
Follow-Up Patch, Part 1 (don't need to send system message to a mismatching page), V1

Review of attachment 753203 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, with the removal of uri in SystemMessageManager:GetPendingMessages:Return
Attachment #753203 - Flags: review?(fabrice) → review+
(In reply to Justin Lebar [:jlebar] from comment #81)

> 
> >@@ -438,23 +440,27 @@ SystemMessageInternal.prototype = {
> >					   aPageURI,
> >					   aManifestURI)) {
> >       return false;
> >     }
> > 
> >     let targets = this._listeners[aManifestURI];
> >     if (targets) {
> >       for (let index = 0; index < targets.length; ++index) {
> >-          let manager = targets[index].target;
> >-          manager.sendAsyncMessage("SystemMessageManager:Message",
> >-                                   { type: aType,
> >-                                     msg: aMessage,
> >-                                     manifest: aManifestURI,
> >-                                     uri: aPageURI,
> >-                                     msgID: aMessageID });
> >+        let target = targets[index];
> >+        // We only need to send the system message to the targets which match
> >+        // the manifest URL and page URL of the destination of system message.
> >+        if (target.uri != aPageURI) {
> 
> Why is it OK to check only the page URI?  I'd think we'd have to check the
> manifest URI as well, since two different apps could be at the same page URI,
> when only one of the apps is supposed to get the system message.

It's ok because 'targets' is already filtering only the listeners for a given manifest uri.
Attachment #753205 - Flags: feedback?(fabrice) → feedback+
> It's ok because 'targets' is already filtering only the listeners for a given manifest 
> uri.

Ah, duh.

Thanks.
Thanks Justin and Fabrice's help! :D

----------
The following isn't directly related to this bug. Just found something we can improve. Will fire another bug for this.

The current ._listners[aManifestURI] mechanism is indexed by the manifest URL. I suggest we should make it indexed by (manifest URL + page URL) when registering the target, because one (manifest URL + page URL) must be equivalent to *one* target (i.e. one SystemMessageManager). I mean, only *one* target is going to be delivered with the system message, since the system message's target (manifest URL + page URL) is always unique in any way.

If my assumption is yes, then we don't need the following |targets| array and the page URL check. We should just focus on that unique target to send our system message.

let targets = this._listeners[aManifestURI];
if (targets) {
  for (let index = 0; index < targets.length; ++index) {
     let target = targets[index];
     // We only need to send the system message to the targets which match
     // the manifest URL and page URL of the destination of system message.
     if (target.uri != aPageURI) {
       continue;
     }
     ...
     let manager = target.target;
     manager.sendAsyncMessage("SystemMessageManager:Message", ...);
  }
}

Does it sound reasonable?
(In reply to Gene Lian [:gene] from comment #88)
> The current ._listners[aManifestURI] mechanism is indexed by the manifest
> URL. I suggest we should make it indexed by (manifest URL + page URL) when
> registering the target.

Fire Bug 875694 for this.
(In reply to Gene Lian [:gene] from comment #70)
> (In reply to buri.blff from comment #68)
> > Hi all, excuse me, I have one question. Sometimes, I find the sms
> > notification is also blocked before LCD lighted up.Is it a similar problem?
> > Shall I raise another bug? Or this patch can resolve it as well?
> 
> Exactly right! I used to think of the same thing that we should also acquire
> the wake lock somewhere for SMS/MMS. We've already done for incoming calls
> and alarms, but not for SMS/MMS yet.
> 
> The solution here can solve partial issues for SMS/MMS, which is going to
> acquire and release CPU wake locks within the system message mechanism.
> However, we still need to acquire extra CPU wake locks for the code path
> from RIL wake up to calling .broadcastMessage().

Fire Bug 875703 for this.
https://hg.mozilla.org/mozilla-central/rev/d78a82d19889
https://hg.mozilla.org/mozilla-central/rev/4b96a52d6ce1
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Amelie and 田旻,

Could you please help verify this? We need more eyes on this kind of random issue. I believe it's working well now. A reminder: please do make sure your build includes the check-ins at comment #87. Thanks!
(In reply to Gene Lian [:gene] from comment #92)
> Amelie and 田旻,
> 
> Could you please help verify this? We need more eyes on this kind of random
> issue. I believe it's working well now. A reminder: please do make sure your
> build includes the check-ins at comment #87. Thanks!

OK, but I am still waiting for the update of this patch in our build.
This issue no longer reproduces on both V1.1 and V1.0.1 The alarm is set off and functions correctly at the set time chosen by the user.

Leo Mozz build V1.1: 20130530070208
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/09ac1fd2959c
Gaia: 1cca9324d4444ad28c6fa99875e17abf7e8230be
Version 18.0

Buri Mozz build V1.0.1: 20130530070213
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/11b55d3ada71
Gaia: ac293ce59acc3bede083fad1b973794fa8bf0253
Version 18.0
Status: RESOLVED → VERIFIED
Comment from Mozilla:This issue no longer reproduces on both V1.1 and V1.0.1 The alarm is set off and functions correctly at the set time chosen by the user.
 
 Leo Mozz build V1.1: 20130530070208
 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/09ac1fd2959c
 Gaia: 1cca9324d4444ad28c6fa99875e17abf7e8230be
 Version 18.0
 
 Buri Mozz build V1.0.1: 20130530070213
 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/11b55d3ada71
 Gaia: ac293ce59acc3bede083fad1b973794fa8bf0253
 Version 18.0
Great! Thank you guys for all the double checks. :)
Comment from Mozilla:Great! Thank you guys for all the double checks. :)
You need to log in before you can comment on or make changes to this bug.