Closed Bug 1181489 Opened 10 years ago Closed 10 years ago

mozAlarm mixes up 2 alarms: one for a regular CalDav sync, the other to wake you up

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla44

Project Flags:

blocking-b2g

2.5+

Tracking Flags:

Tracking

Status

firefox44

---

fixed

b2g-master

---

affected

People

(Reporter: jlorenzo, Assigned: timhuang)

References

Details

(Keywords: foxfood)

Attachments

(3 files, 1 obsolete file)

Logcat 10 years ago Johan Lorenzo [:jlorenzo] 14.52 KB, text/plain		Details
dev-log-main.log 10 years ago Léonard 73.01 KB, text/plain		Details
0001-Bug-1181489-enable-debug-for-AlarmService.jsm.patch 10 years ago Julien Wajsberg [:julienw] (PTO -> Aug 14) 1004 bytes, patch	fabrice : review+	Details \| Diff \| Splinter Review
bug1181489.patch 10 years ago Tim Huang[:timhuang] 2.93 KB, patch	fabrice : review+	Details \| Diff \| Splinter Review

Johan Lorenzo [:jlorenzo]

Reporter

Description

•

10 years ago

Attached file Logcat — Details

Pre-requisites On Firefox OS: * Have a CalDav set up in Calendar * Have an alarm in Clock Steps to reproduce 1. Don't do anything[1]: don't change the time, don't change the time zone, don't wait for daylight saving switch to happen, don't create a new alarm, don't deactivate and reactivate it. 2. When the alarm rings in the middle of the night, capture the logs. Results We have a race condition in MozAlarm. In the logs attached, you can see that the Calendar creates a new alarm for itself to sync up. > 07-08 02:58:21.334 4776 4776 I Calendar: Content JS LOG: controllers/periodic_sync [calendar] "Will save alarm:" {"alarmId":3853,"start":"2015-07-08T00:58:21.241Z","end":"2015-07-08T01:28:21.241Z"} Then, at 01:28:22 UTC (3:28 in my timezone), an alarm is indeed emitted, but it's not the right one: > 07-08 03:28:22.104 4905 4905 I Clock : Content JS LOG: [Clock] ### ALARM FIRED! ### Details: {"id":3579,"date":"2015-07-08T05:05:00.000Z","respectTimezone":"ignoreTimezone","data":{"id":1,"type":"normal","date":"2015-07-08T05:05:00.000Z"}} Repro frequency: As said above, that's a race condition. However, I've had 2 alarms that rang in the middle of the night for 3 weeks Potential dupes: bug 1031573, bug 1180257 Build ID 20150707071412 Gaia Revision e6506d68e01489b57616f8b74e8773f938ce62b3 Gaia Date 2015-07-06 18:14:41 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/e7e69cc8c07b Gecko Version 42.0a1 Device Name aries Firmware(Release) 4.4.2 Firmware(Incremental) eng.worker.20150619.224059 Firmware Date Fri Jun 19 22:41:08 UTC 2015 Bootloader s1 [1] These are the actions I've not done since the last weekend. The alarm worked perfectly on Monday and Tuesday.

Johan Lorenzo [:jlorenzo]

Reporter

Comment 1

•

10 years ago

Gene Lian was apparently the main contributor for MozAlarm, but he left the company. From the last reviews that happened on /dom/alarms and regarding Fernando's feedback, it seems like you could help investigating this bug Fabrice. Is that right? If not, please redirect.

Severity: normal → blocker

Component: DOM: Device Interfaces → DOM

Flags: needinfo?(fabrice)

Bobby Chien

Updated

•

10 years ago

QA Whiteboard: [foxfood-triage]

Léonard

Comment 2

•

10 years ago

Attached file dev-log-main.log — Details

This night my firefox OS 2.2 started an alarm at 4h34 AM. But I only have an alarm set at 5h23 UTC. In the log, we can see that OpenWapp is setting an alarm every hour (first is at 1h34). Do not hesitate to ask for more test or log.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 3

•

10 years ago

Reading [1], I wonder if we should check whether the currentAlarm belongs to the requesting app? I see the previous condition checks it. It would help to set the DEBUG const to true in that file and test similar cases. [1] https://dxr.mozilla.org/mozilla-central/source/dom/alarm/AlarmService.jsm#489-491 Reading the file, I think the dance around "this._currentAlarm" is really complicated and error-prone. There is code like [2] that comes from this and could just be removed if we'd use only the queue (and the "current" alarm would be this._alarmQueue[0]). [2] https://dxr.mozilla.org/mozilla-central/source/dom/alarm/AlarmService.jsm#430-436

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 4

•

10 years ago

Attached patch 0001-Bug-1181489-enable-debug-for-AlarmService.jsm.patch (obsolete) — Details — Splinter Review

hey Fabrice, what do you think about adding some more debug here? Do you think we should enable it in AlertsManager as well?

Attachment #8638912 - Flags: review?(fabrice)

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 5

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #3) > Reading the file, I think the dance around "this._currentAlarm" is really > complicated and error-prone. There is code like [2] that comes from this and > could just be removed if we'd use only the queue (and the "current" alarm > would be this._alarmQueue[0]). I see that setting _currentAlarm actually set the underlying alarm. I don't really like this because this is quite implicit and I prefer explicit code, but now I understand better why this is used.

[:fabrice] Fabrice Desré

Comment 6

•

10 years ago

Comment on attachment 8638912 [details] [diff] [review] 0001-Bug-1181489-enable-debug-for-AlarmService.jsm.patch Review of attachment 8638912 [details] [diff] [review]: ----------------------------------------------------------------- How much debug spew do we get from that? I would be fine doing that on non-release builds only.

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 7

•

10 years ago

I think the debug lines would be quite small: only logs when the alarms are set or triggered. We have already such logs in the clock app (and OpenWapp too from the previous logs). But logs in the API would help knowing if the alarms are triggered from the HAL backend or directly from the queue (because some code would find out that they're expired).

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 8

•

10 years ago

We need more logs in the wild and we can disable them once we find the issue.

[:fabrice] Fabrice Desré

Updated

•

10 years ago

Attachment #8638912 - Flags: review?(fabrice) → review+

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Updated

•

10 years ago

Keywords: checkin-needed, leave-open

Pulsebot

Comment 9

•

10 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/f7b98d678e3e

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 10

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/f7b98d678e3e

Mahendra Potharaju [:mahe]

Comment 11

•

10 years ago

Johan, Can you reproduce this again? Based on that we can 2.5+ it. Thanks

Flags: needinfo?(jlorenzo)

Johan Lorenzo [:jlorenzo]

Reporter

Comment 12

•

10 years ago

Not consistently. The last time was last week at 3 in the morning. I didn't have the log patch at that time. I updated my phone this morning so I'll get some more logs next time it happens.

Flags: needinfo?(jlorenzo)

Al Tsai [:atsai]

Comment 13

•

10 years ago

[Triage] Blocked by this.

blocking-b2g: 2.5? → 2.5+

Julien Wajsberg [:julienw] (PTO -> Aug 14)

Comment 14

•

10 years ago

I also got numerous reports of similar issues, but nobody gave me any log yet. (and I don't reproduce myself, but likely because I don't use Whatsapp/Telegram/Calendar that set other alarms).

Mahendra Potharaju [:mahe]

Comment 15

•

10 years ago

Johan, Any luck with the logs on this one? Or, should we request QAWanted to try reproducing it? This is a 2.5 blocker and is waiting on logs. Thanks

Flags: needinfo?(jlorenzo)

Johan Lorenzo [:jlorenzo]

Reporter

Comment 16

•

10 years ago

On the dogfood POV, I didn't get waken by a nightly alarm. I tried to force the race condition by creating 20 alarms, plus 5 calendars, 1 timer and all. I couldn't get it. We could try a qawanted, but I'm afraid we need a more mechanical way to make this race condition to appear. For example, create 10,000 mozAlarms and see if the id logged is always the correct one. At the moment, I don't think the QA team has the bandwidth to ramp up on the mozAlarm API, in order to create that many alarms. Can somebody from the dev team give it a try?

Flags: needinfo?(jlorenzo) → needinfo?(mpotharaju)

Mahendra Potharaju [:mahe]

Comment 17

•

10 years ago

Ken, Can you please find an assignee for this one too? Thanks!

Flags: needinfo?(mpotharaju) → needinfo?(kchang)

Tim Huang[:timhuang]

Assignee

Updated

•

10 years ago

Assignee: nobody → tihuang

Ken Chang[:kenkai]

Comment 18

•

10 years ago

Thanks, Tim.

Flags: needinfo?(kchang)

Tim Huang[:timhuang]

Assignee

Comment 19

•

10 years ago

I believe that the root cause of this bug is triggered by async operations between the alarms service and the child thread which listens the RTC timer when it involves the “remove alarm” operation. I, however, cannot fully confirm that this is the root cause due to the lack of debug messages. The reason that I think the “remove” involves is based on the second log. In this log, there were always 5 alarms set for OpenWapp, and 5 awake signals had been fired. At the very last round of set-and-trigger before the alarm of clock app fired, there were 5 alarms set, but only 4 awake signals. Thus, I believe there was one alarm removed in one of these awake signals. In a certain situation, the remove operation may cause the next alarm to be fired. Before I illustrate the process which triggers this bug, there is a very important premise need to describe first. That is, even the child thread fires an alarm on time, the actual time that this alarm been processed by the alarms service may be delayed due to some reasons, like the system is busy or some events need to be process first. The process triggers this bug which involves three alarms, two of these alarms will expire close to each other, say they separate in timing for 100ms, and the first alarm will remove the second alarm in its handle function. The third alarm will expire later than these two alarms. Following is the process in a chronological order and I named these alarms with numbers according to the order of their expired timing. 1. Three alarms been set. Current alarm = Alarm1, Queued alarm = Alarm2, 3, underling alarm = Alarm1 2. Alarm 1 expired, the child thread fires the alarm 1. Current alarm = Alarm1, Queued alarm = Alarm2, 3, underling alarm = NULL 3. The alarms service catches the fired event from child thread for alarm 1, send alarm 2 to app, and set the second alarm. Current alarm = Alarm2, Queued alarm = Alarm3, underling alarm = Alarm2 4. The remove request has been sent from App, but the alarms service is not going to set current alarm at this moment, it will set the alarm after it finishes the database removal. Current alarm = Alarm2, Queued alarm = Alarm3, underling alarm = Alarm2 5. Alarm 2 expired, the child thread fires the alarm 2. Current alarm = Alarm2, Queued alarm = Alarm3, underling alarm = NULL 6. The database removal returns, the alarms service set the current alarm here. Current alarm = Alarm3, Queued alarm = NULL, underling alarm = alarm 3 7. The alarms service catches the fired even from child thread for alarm 2, sent alarm 3 to app. (The bug happens here) Current alarm = NULL, Queued alarm = NULL, underling alarm = alarm 3 Right now, I can reproduce the similar phenomena by manually adding a short delay when alarms manager handling the fired alarm. And this bug could be solved by adding a sanity check when alarm been fired. If the alarm does not expire at that time, I reset this alarm to prevent this issue.

Tim Huang[:timhuang]

Assignee

Comment 20

•

10 years ago

I also found another bug may cause that an alarm would be fired earlier. The actual timing of setting the underlying alarm would be delayed when the system is extremely busy. So it will set an alarm which already expired. This may trigger some problems of the alarm service. This could be solved by adding a sanity check when the alarm writing to underlying alarm. If this alarm already expired at that moment, it directly notify this alarm instead of setting it.

Tim Huang[:timhuang]

Assignee

Comment 21

•

10 years ago

Attached patch bug1181489.patch — Details — Splinter Review

Attachment #8664614 - Flags: review?(fabrice)

Tim Huang[:timhuang]

Assignee

Comment 22

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7721606645

[:fabrice] Fabrice Desré

Updated

•

10 years ago

Attachment #8664614 - Flags: review?(fabrice) → review+

Tim Huang[:timhuang]

Assignee

Updated

•

10 years ago

Attachment #8638912 - Attachment is obsolete: true

Tim Huang[:timhuang]

Assignee

Updated

•

10 years ago

Keywords: leave-open → checkin-needed

Pulsebot

Comment 23

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a91c821d85b7

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 24

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/a91c821d85b7

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox44: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla44

Marcus Cavanaugh [:mcav]

Updated

•

10 years ago

Updated

•

10 years ago

Blocks: 1232256

[:fabrice] Fabrice Desré

Updated

•

9 years ago

Flags: needinfo?(fabrice)

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.