Closed Bug 1182358 Opened 4 years ago Closed 4 years ago

[request-sync] request-sync uses an nsITimer for wake-up instead of a mozAlarm, so request-sync will never cause the device to wake up from power-saving sleep

Categories

(Core :: DOM: Device Interfaces, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla43
blocking-b2g 2.2?
Tracking Status
firefox43 --- fixed
b2g-v2.1 --- unaffected
b2g-v2.2 --- fixed
b2g-v2.2r --- fixed
b2g-master --- fixed

People

(Reporter: mhoye, Assigned: baku)

References

Details

(Keywords: foxfood, regression)

Attachments

(2 files)

Go check your email, open an email, look at it. Click the button on the side of the phone and put it away.

Wait for however long you like.

If I do this, the phone stops looking for my mail. If the last thing I leave on a screen before sleeping the phone is an open email, I won't get any email - not just no email notifications, but no new email - until I turn the phone _and_ flip back to the homescreen. I've got two email accounts here, set to check every ten minutes each, but if I hit this bug I can go hours without it checking.
Hi Mike, thanks for the bug report!

Can you attach a logcat using the steps outlined here:
https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo

That generally helps the email guys nail down the problem quicker.
Flags: needinfo?(mhoye)
[Blocking Requested - why for this release]:
Not checking mail when it really should is pretty bad, no?
blocking-b2g: --- → 2.5?
:mhenretty is right that a log would be invaluable.  It can also be super-hard to gather while reproducing at the current time.  If you do have time to help reproduce to gather a log, a trick is that if you go into the secret debug menu (https://wiki.mozilla.org/Gaia/Email/SecretDebugMode#Getting_to_the_Secret_Debug_Menu) you can hit a button that will make the sync interval UI have 20 second and 60 second options.  Unfortunately I think the pref 
dom.requestSync.minInterval defaults to 100 and so may also need to be manipulated prior to changing the email app settings.

(In reply to Mike Hoye [:mhoye] from comment #0)
> until I turn the phone _and_ flip back to the homescreen.

I think there's an important word missing around "turn the phone".  Do you mean turn the phone off or do you mean physically rotating the phone?  Most failure modes for the email app that are just the email app's fault can be addressed by force closing it by long-pressing the home button to bring the task switcher up and then closing the mail app by swiping it up to its doom or hitting the little "x" button.


The failure vectors that jump out at me as possible are:

- The duplicate event suppression logic could maybe be eating requestsync API alarms.

- requestsync is being flakey.

- The job/op queue or Inbox mutex is getting badly broken.
Once my eyes work right again - early next week! - I'll get a log for you. 

For what it's worth on the blocking status question, I'd rather have a phone that doesn't reliably make phone calls before I'd want to rely on one that periodically decides not to go get my email.
Flags: needinfo?(mhoye)
Ok, I'm going to set this up for reproduction now - should have logs in, well, 15 minutes or so, since I'm set to refresh every 10.

This bug is _killing me_, fwiw.  Do we have a [killing me] bugzilla keyword? We should.
(In reply to Mike Hoye [:mhoye] from comment #5)
> Ok, I'm going to set this up for reproduction now - should have logs in,
> well, 15 minutes or so, since I'm set to refresh every 10.

Great, thanks!  Please remember that because adb is a circular buffer and there's a lot of chatty things on the device, you need to have adb talking to a computer that is logging the stream to disk.  (And unfortunately, if using usb, this may impact how various APIs on the device operate.  Such as if you're only using wi-fi to get internet.  Which would actually explain your symptoms, in which case you want to change wi-fi to always be connected or file bugs against request-sync for not establishing wi-fi connections before waking up apps.)
Attached file jrburke logcat
I have a spark device and I was able to reproduce, the details on my setup:

* one gmail (non-moz) account.
* phone configured with wifi enabled but no sim card in the device. 
* Configured email account, set for 10 minute sync interval.
* Viewed a message, then pressed power button.
* Unplugged from USB.


At 16 minutes after that, I plugged in USB then woke up the phone and went back to message list from the message reader. The "Last Synced" said 17 minutes, which jives with the last sync I did before turning off the device.

So, no request-sync message to the email app since my last action.

This logcat shows all that happened in that 16 minutes. The top of it, with the iframe_shims messages comes from the message reader displaying the body of the message. There is a "pushCard for type: confirm_dialog" after that because I accidentally hit the delete button when trying to turn off the device. I said no to the delete, then powered it off.

No email app log messages after that. After plugging in the phone via USB, and then starting to type this comment, I noticed a sync come through about 10 minutes later from the 16 minute mark when I started this test.

So it seems like maybe request-sync goes into a deep sleep if no USB connection and the phone goes into sleep? Asking :baku for insight, as my request-sync contact.

Note that email uses `wifiOnly: false` when registering the request-sync action. The hope was that we would get woken up if cellular was available, not just if wifi.

I did an earlier test on a 5 minute interval but with the USB connection plugged in, and I did get consistent request-sync messages on the 5 minute interval.

If this is expected behavior for request-sync, then maybe we need to file an enhancement bug to have the request-sync machinery try to wake up network and connect if not exactly on the set interval specified by the app, then some approximate order of magnitude of wake would be nice.
Flags: needinfo?(amarchesini)
Keywords: foxfood
"then some approximate order of magnitude of wake would be nice."

Approximate is really not good enough here. There are a small handful of things a phone absolutely has to do reliably and correctly, and email is one of them.  If we're going to tell people we're checking email every ten minutes, then we'd damn well better be checking mail every ten minutes.
Can you clarify if you have a SIM providing (reliable) cellular data or if you're only using wi-fi?  If you're using wi-fi, can you please indicate what your "Wi-Fi Sleep" setting is on the Wi-Fi settings page?
Flags: needinfo?(mhoye)
I've got a data sim in and data is reliable where I am. I've tried it with Wi-Fi Sleep enabled or disabled, and get the same result.
Flags: needinfo?(mhoye)
I confirm comment 10. I went out in Paris with full H+ coverage during an entire day (no subway, no tunnel, Wi-Fi disabled), I end up to have no update for more than 5 hours straight. After noticing it, I performed a manual refresh.

Regarding Wi-Fi sleep, I have it enabled. But even if I turn my device on multiple times a day, and I leave it the time to connect to the Wi-Fi, my inbox ends up not to be updated for 10 hours.
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #11)
> I confirm comment 10. I went out in Paris with full H+ coverage during an
> entire day (no subway, no tunnel, Wi-Fi disabled), I end up to have no
> update for more than 5 hours straight. After noticing it, I performed a
> manual refresh.

is it a regression? It was supposed to work, right?
Flags: needinfo?(amarchesini)
QA wanted for a branch check.
Keywords: qawanted
(In reply to Andrea Marchesini (:baku) from comment #12) 
> is it a regression? It was supposed to work, right?

I'm not sure it is. As far as I remember, the Flame didn't show the issue. Let's verify that with a branch check.
This issue is reproducible on Aries, Flame 2.5, Flame 2.2:

My STR:
1) Connect to both Wifi and Cell Data, disconnect from USB the entire time
2) Log in to a gmail account, set auto sync interval to 5 min
3) Press hardware power button to turn screen off while viewing an email, and start a stopwatch
4) From PC send an email to the account at step 2 within 5 min after step 3
5) Wake the phone by pressing power button after 10 min, check how long ago last sync was, and whether or not the email at step 4 was auto retrieved during screen off time

Bug behavior: Last sync time was however long ago step 3 was at, and email at step 4 was not auto retrieved.

Device: Aries (dogfood debug build)
BuildID: 20150717151354
Gaia: 8c009877aff6b8b2f4a60756e2d09c0182393721
Gecko: 911935404233
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 42.0a1 (2.5 Master)
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

Device: Flame (full flashed, 319MB, KK)
BuildID: 20150717010206
Gaia: 77bc0d940bde2a5d2d4dfadfcccc6d8d77456d36
Gecko: 8d262d1d0ae5
Gonk: a4f6f31d1fe213ac935ca8ede7d05e47324101a4
Version: 42.0a1 (2.5 Master) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

Device: Flame (full flashed, 319MB, KK)
BuildID: 20150717002506
Gaia: e1e6317f17a840b19af9dbb25f5a771d8d9fa161
Gecko: 418b2388edb0
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

--------

This issue does NOT reproduce on Flame 2.1. Observed behavior: At around 6 minutes after I started the stopwatch, I received a sound notification (screen did not turn on) indicating some sort of activity, and after 10 minutes when I turned on screen, email was auto retrieved and shown on lockscreen as notification, as well as shown in email inbox. Last sync time was 1 min ago.

Device: Flame (KK, 319MB, full flashed)
BuildID: 20150717001205
Gaia: cc24cce17ab2ebf79f6505103da714fc65bc5ec1
Gecko: fe4c34c770e5
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 34.0 (2.1) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawantedregression
:baku, perhaps it is an issue with whatever timing mechanism that request-sync uses to check when to notify apps? Perhaps the mechanism backing it gets shut down in these low power, no USB connection sleep modes, so the request-sync emitting to apps does not occur?

I am going to look at a backout of email using request-sync and go back to mozAlarms, but it would be good to get to the root of the problem in case others have started to use request-sync. It will likely be an issue going forward too.

Also pretty sure it will not be a clean backout as the email app will need to make sure to deregister any registered request-syncs.
Flags: needinfo?(amarchesini)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
I turned on the debug() messages for the RequestSync*.js(m) files in a build of gecko, and I see this when first registering a request-sync:

V/WLAN_PSA(  214): NL MSG, len[048], NL type[0x11] WNI type[0x5050] len[028]
I/GeckoDump( 6808): LOG: sync interval changed to 300000
I/Gecko   ( 6808): WLOG: cronsync: ensureSync called
I/Gecko   ( 6808): 
I/GeckoDump( 6808): LOG: cronsync-main: ensureSync called
I/Gecko   ( 6808): DEBUG RequestSyncScheduler: registrations
I/GeckoDump( 6808): LOG: WRITING LOCAL STORAGE ITEM: data_has_account yes
I/Gecko   ( 6171): DEBUG RequestSyncService: receiveMessage
I/Gecko   ( 6171): DEBUG RequestSyncService: registrations
I/Gecko   ( 6808): DEBUG RequestSyncScheduler: receiveMessage
I/GeckoDump( 6808): LOG: cronsync-main: success!
I/Gecko   ( 6808): DEBUG RequestSyncScheduler: register
I/Gecko   ( 6171): DEBUG RequestSyncService: receiveMessage
I/Gecko   ( 6171): DEBUG RequestSyncService: register
I/Gecko   ( 6171): DEBUG RequestSyncService: dbTxn
I/Gecko   ( 6171): DEBUG RequestSyncService: pendingOperationStarted
I/Gecko   ( 6171): DEBUG RequestSyncService: pendingOperationDone
I/Gecko   ( 6171): DEBUG RequestSyncService: addRegistration
I/Gecko   ( 6171): DEBUG RequestSyncService: scheduleTimer
I/Gecko   ( 6808): DEBUG RequestSyncScheduler: receiveMessage
I/GeckoDump( 6808): LOG: cronsync-main: success: navigator.sync.register for IDs: 0 at 300000ms
I/GeckoDump( 6808): LOG: cronsync-main: ensureSync completed
I/Gecko   ( 6808): WLOG: cronsync: received an syncEnsured via a message handler

... later, after device was awoken

I/Gecko   ( 6171): DEBUG RequestSyncService: timeout
I/Gecko   ( 6171): DEBUG RequestSyncService: Pending messages: false
I/Gecko   ( 6171): DEBUG RequestSyncService: Sending message.
I/GeckoDump( 6808): LOG: mozSetMessageHandler: received a request-sync
I/GeckoDump( 6808): LOG: cronsync-main: wake locks acquired: [object MozWakeLock] for account IDs: 0
I/GeckoDump( 6808): LOG: cronsync-main: request-sync started at Mon Jul 20 2015 15:40:55 GMT-0700 (PDT)
I/Gecko   ( 6808): WLOG: cronsync: received a requestSync via a message handler

However, if the device is unplugged from USB and is just off for the 5 minute sync interval, the "RequestSyncService: timeout" does not show up, and no sync happens.

It looks like the Cc["@mozilla.org/timer;1"] in RequestSyncService's createTimer() method does not fire the timeout method.

So the bug seems to be related to getting the timer RequestSyncService sets to fire. Maybe instead of using a Cc["@mozilla.org/timer;1"], a mozAlarm (or its underpinnings) to get a reliable firing of the RequestSyncService timer would be a way to fix it. Or some setting on the Cc["@mozilla.org/timer;1"] to allow it to still fire in the phone sleep mode? I am speculating, likely not helpful as I do not know the options in gecko code.

Moving this bug over to Core::DOM for better tracking of a fix. A fix for this issue might also need to be ported to the B2G 2.2 branch(es), as QA was able to repro in that branch. This bug is also affecting foxfooding.
Component: Gaia::E-Mail → DOM
Product: Firefox OS → Core
Summary: Email won't check for updates if email is open when you shut off the phone. → [request-sync] Email won't check for updates if email is open when you shut off the phone.
Component: DOM → DOM: Device Interfaces
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][foxfood-traige]
QA Whiteboard: [QAnalyst-Triage+][foxfood-traige] → [QAnalyst-Triage+][foxfood-triage]
Summary: [request-sync] Email won't check for updates if email is open when you shut off the phone. → [request-sync] request-sync uses an nsITimer for wake-up instead of a mozAlarm, so request-sync will never cause the device to wake up from power-saving sleep
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Duplicate of this bug: 1176902
Duplicate of this bug: 1176902
The patch is written but I want to test it with the emulator and compiling b2g from scratch is taking a while.
:baku, apologies for the ping as I expect you are very busy. However, since this also affects 2.2 and email uses it on 2.2, I thought it best if a resolution could be reached and see about a 2.2 uplift before the 2.2 window closed. I think it may be hard as it is given that 2.2 is considered done, but as far as I know has not been shipped with any phones yet so there still may be hope.
Flags: needinfo?(amarchesini)
[Triage] We should be blocked by this due to it's out of functionality
blocking-b2g: 2.5? → 2.5+
James, you are totally right. I wrote the patch but I'm having hard time to test it with the emulator because of some issue with ssltunnel and marionette server. I try again today and in case, I'll ask for help on IRC.
Flags: needinfo?(amarchesini)
Attached patch rs2.patchSplinter Review
Basically I replace a timer with a AlarmService timer ID.
Attachment #8647606 - Flags: review?(nsm.nikhil)
Comment on attachment 8647606 [details] [diff] [review]
rs2.patch

Review of attachment 8647606 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/requestsync/RequestSyncService.jsm
@@ +6,5 @@
>  
>  const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;
>  
>  function debug(s) {
> +  dump('DEBUG RequestSyncService: ' + s + '\n');

I'll comment it out again. I just used it for debugging.
Comment on attachment 8647606 [details] [diff] [review]
rs2.patch

Review of attachment 8647606 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/requestsync/RequestSyncService.jsm
@@ +874,5 @@
>      let self = this;
> +    AlarmService.add(
> +      { date: new Date(Date.now() + interval * 1000),
> +        ignoreTimezone: true },
> +      function() { self.timeout(aObj); },

Could you just switch to fat arrow syntax for all of these and get rid of self?

AlarmService.add({ ... },
  () => this.timeout(aObj),
  aTimerId => this._timers[aObj.dbKey] = aTimerId)

The error callback can be null. Thanks!
Attachment #8647606 - Flags: review?(nsm.nikhil) → review+
ignoreTimezone does the opposite of what this patch thinks it does; we want it set to false, I think.  Here's the code from AlarmService.jsm:

    // For an alarm specified with "ignoreTimezone", it must be fired respect
    // to the user's timezone.  Supposing an alarm was set at 7:00pm at Tokyo,
    // it must be gone off at 7:00pm respect to Paris' local time when the user
    // is located at Paris.  We can adjust the alarm UTC time by calculating
    // the difference of the orginal timezone and the current timezone.
    if (aAlarm.ignoreTimezone) {
      alarmTime +=
        (this._currentTimezoneOffset - aAlarm.timezoneOffset) * 60000;
    }
    return alarmTime;
Comment on attachment 8647606 [details] [diff] [review]
rs2.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): RequestSync API
User impact if declined: the scheduling of some events can be broken.
Testing completed: yes
Risk to taking this patch (and alternatives if risky): low.
String or UUID changes made by this patch: none
Attachment #8647606 - Flags: approval-mozilla-b2g37?
https://hg.mozilla.org/integration/mozilla-inbound/rev/5f2d07e6f367

mochitests have to be disabled for android when AlarmService is involved.
https://hg.mozilla.org/mozilla-central/rev/5f2d07e6f367
https://hg.mozilla.org/mozilla-central/rev/1e3f628235f7
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Blocks: 1194978
Flags: needinfo?(amarchesini)
Tested on latest nightly flame-kk engineering build, using email 10 minute sync interval, not plugged in to USB, and it works for me.

Since this affects 2.2, is it possible to get it uplifted to mozilla-b2g37_v2_2, will try with setting blocking-b2g to 2.2? and see how that goes.
blocking-b2g: 2.5+ → 2.2?
Per guidance by :doliver, asking :josh if he knows if uplifts are still considered for mozilla-b2g37_v2_2.

:josh, without some sort of fix (either this gecko one or a gaia one changing the email app to go back to mozAlarms, which would be a more invasive change) background sync of email messages will likely not occur in deep sleep/USB-disconnected usage.
Flags: needinfo?(jocheng)
Hi James,
Please raise uplift request for 2.2 and Mahi the RM will decide whether to accept this patch for 2.2/2.2r.
Thanks.
Flags: needinfo?(jocheng) → needinfo?(jrburke)
Flags: needinfo?(mpotharaju)
For raising the uplift request, it looks like approval‑mozilla‑b2g37? is set on the fix attachment, hopefully that is enough, not sure if approval‑mozilla‑b2g37_v2_2r? is needed too. Will wait to see what Mahi says about it.
Flags: needinfo?(jrburke)
Comment on attachment 8647606 [details] [diff] [review]
rs2.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 
Testing completed: 
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch:
Flags: needinfo?(mpotharaju)
Attachment #8647606 - Flags: approval‑mozilla‑b2g37_v2_2r?
Approved for uplift. Please uplift to 2.2/2.2R.


Thanks
You need to change the approval flags on the attachment from ? to + if you expect it to show up in the uplift queries.
Flags: needinfo?(mpotharaju)
Comment on attachment 8647606 [details] [diff] [review]
rs2.patch

Changed ? to + on approval flags. 

Thanks Ryan
Flags: needinfo?(mpotharaju)
Attachment #8647606 - Flags: approval‑mozilla‑b2g37_v2_2r?
Attachment #8647606 - Flags: approval‑mozilla‑b2g37_v2_2r+
Attachment #8647606 - Flags: approval-mozilla-b2g37?
Attachment #8647606 - Flags: approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.