Closed Bug 1220032 Opened 9 years ago Closed 7 years ago

SystemMessageInternal may fail to (re) launch an app if the app shuts down/unregisters around the time the message is sent

Categories

(Core :: DOM: Device Interfaces, defect)

defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
blocking-b2g 2.5+
Tracking Status
firefox45 --- affected

People

(Reporter: asuth, Unassigned)

Details

Attachments

(1 file)

In bug 1213169 comment 13 the following scenario was observed:
- The email app releases the cpu wake-lock keeping the device awake
- The email app calls window.close on itself
- The device goes to sleep
- The device wakes up because mozAlarms fires.  mozAlarms notifies the requestsync API, which then decides to send a "request-sync" message to the email app
- The (old) content process the email app was in actually gets around to finishing shutting down the email app.
- SystemMessagesInternal gets the "Unregister" event for the email app and rejects the message send.
- The "request-sync" message stays queued for the email app and will be delivered when the email app ends up getting launched, but that won't happen automatically now.

While that specific trace involves wake-locks and other complexity, the key idea is that this is fundamentally a race that results in system messages for an app not waking the app up.

While there may be legitimate reasons to stop waking an app up for reasons of bad behavior, or to temporarily delay waking the app due to resource constraints, that's not happening in this scenario.

If the transition to serviceworkers and its eventing/wakeup system will moot SystemMessageInternal in a timely fashion, then maybe this doesn't need to be dealt with before it's removed.  Otherwise, I'm assuming this will make a variety of APIs be flakey and unreliable, like simple push notifications.
[Blocking Requested - why for this release]:
This bug causes APIs like mozAlarms and push notifications to become unreliable.

I've now seen the bug trigger for both "alarms" and "request-sync" messages with an aries device on trunk.
blocking-b2g: --- → 2.5?
Summary: SystemMessageInternal may fail to launch an app if the app shuts down/unregisters around the time the message is sent → SystemMessageInternal may fail to (re) launch an app if the app shuts down/unregisters around the time the message is sent
There also appear to be no trivial workarounds on the part of an app to ensure its closure completes before the device goes to sleep.  In a particular reproduction case from just now, the email app calls window.close() while holding an active cpu wakelock (that it does not explicitly unlock).  (This specific case is a failsafe desperation measure and not just the email app being a good citizen.)  The device goes into deep power-sleep until the email-scheduled alarm fires.  When the device wakes up to service the alarm, the window.close() completes, racing the system message again, resulting in the email app not being (re)launched.

(Email could of course schedule a second alarm shortly after the first alarm, but that's no good for things being woken up by push notifications.  Also, it's admitting failure and creates a potential nightmare scenario of infinitely multiplying alarms.)


Solution-wise, I would propose that each system message should be given the opportunity to launch their target app at most once.  This avoids any infinite re-launch cycles while still accounting for the potential "lame duck" period for apps and their content process where they've irrevocably entered a stage of closing/shutdown.

SystemMessageInternal sorta tracks the concept of "in-flight" messages through its _cpuWakeLock table and _pendingPromises tables, although neither has a rich enough understanding right now to decide to launch a process when the Unregister event comes in.   Hygienically, it is nice to keep _cpuWakeLock separate, but as an example of the failings of that in this log, since it doesn't understand Unregister, it holds a wakelock for 30 wasted seconds.

The candidate implementation I'd propose is to beef up _pendingPromises to also track the message that was sent and whether a launch was triggered.  In the Unregister case, we find the entry, and if !launched, we cause the message to be sent again, causing it to be counted as a launch.  Note that the current Unregister implementation seems over-zealous in that it only matches based on manifestURL, but it seems like it should also be matching on pageURL.  (And _pageURL is constant in SystemMessageManager, even if there are history pushState shenanigans.)
Seems like broken functionality so 2.5+.

Fabrice/baku/ferjm: any takers?
blocking-b2g: 2.5? → 2.5+
Flags: needinfo?(ferjmoreno)
Flags: needinfo?(fabrice)
Flags: needinfo?(amarchesini)
I am quite busy atm :(

Andrew, you already did an excellent analysis of the code and the possible solution, maybe you can take this? Thank you! :)
Flags: needinfo?(ferjmoreno) → needinfo?(bugmail)
I can provide a fix if the owner/vision-holder for this code either signs-off on what I propose or proposes something better.  (I could see it being preferable to centralize the wake-lock and promise management in some cases.)  I suppose this code falls under the mozApps API stuff (https://wiki.mozilla.org/Modules/All#mozApps_API_.26_UI) since only FxOS stuff likes system messages.  Based on the module, that seems like you or fabrice.  Based on the hg log, :kanru also seems possible (maybe under the auspices of https://wiki.mozilla.org/Modules/All#Browser_WebAPI ?).

My primary concern is that a lot of the B2G DOM APIs and such suffer from technical debt, and I'd like to be helping move things forward according to a plan rather than just contributing another layer of unowned random hacks that creates even worse nightmares for the next person.

So, uh, if you are the person with ownership, or if I should pester :fabrice, etc., that would be much appreciated!  We do aspire to have the email app reliably check email for people!
Flags: needinfo?(bugmail) → needinfo?(ferjmoreno)
The approach you suggest LGTM, but I didn't write this code, so I'll defer to Fabrice.
Flags: needinfo?(ferjmoreno)
Flags: needinfo?(amarchesini)
Flags: needinfo?(fabrice)
Cleaning up Device Interfaces component, and mass-marking old FxOS bugs as incomplete.

If any of these bugs are still valid, please let me know.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: