Closed Bug 1197625 Opened 9 years ago Closed 9 years ago

Sync-on-wake doesn't work correctly if there is no network available immediately after a wake.

Categories

(Firefox :: Sync, defect, P2)

All
macOS
defect

Tracking

()

RESOLVED FIXED
Iteration:
44.2 - Oct 19
Tracking Status
firefox45 --- fixed

People

(Reporter: rnewman, Assigned: markh)

References

Details

Attachments

(1 file, 1 obsolete file)

1. Sleep your Mac for a protracted period of time, to ensure that no backoffs or timers might impede a post-wake sync.
2. Send a tab from another device to your Mac. Force a sync if necessary to be sure.
3. Wake your Mac. Give it some time, click around.
4. Tools > Sync Now.


Expected: after 3, the sent tab opens.
Actual: after 4, the sent tab opens.


We're supposed to listen for wake:

https://dxr.mozilla.org/mozilla-central/source/services/sync/modules/policies.js#99


and the log (below) shows that we do indeed detect it… but the sync doesn't work until forced.

This really impacts the perceived reliability of Send Tab.



1440359435612	Sync.SyncScheduler	DEBUG	Woke from sleep.
1440359435613	Sync.SyncScheduler	DEBUG	More than 1 client. Syncing.
1440359435613	Sync.Service	DEBUG	User-Agent: Firefox/43.0a1 FxSync/1.45.0.20150822030206.
1440359435613	Sync.Service	INFO	Starting sync at 2015-08-23 12:50:35
1440359435614	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
1440359435615	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
1440359435615	browserwindow.syncui	DEBUG	onActivityStart with numActive: 0
1440359435615	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435616	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
1440359435616	browserwindow.syncui	DEBUG	onActivityStart with numActive: 0
1440359435616	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435617	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
1440359435617	browserwindow.syncui	DEBUG	onActivityStart with numActive: 0
1440359435617	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435618	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
1440359435619	browserwindow.syncui	DEBUG	onActivityStart with numActive: 0
1440359435621	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435623	Sync.SyncScheduler	DEBUG	Clearing sync triggers and the global score.
1440359435624	Sync.Status	INFO	Resetting Status.
1440359435624	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1440359435624	Sync.Status	DEBUG	Status.sync: success.sync => error.login.reason.network
1440359435624	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed
1440359435625	browserwindow.syncui	DEBUG	observed: weave:service:sync:error
1440359435625	browserwindow.syncui	DEBUG	observed: weave:service:sync:error
1440359435625	browserwindow.syncui	DEBUG	onActivityStop with numActive: 1
1440359435625	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435626	browserwindow.syncui	DEBUG	observed: weave:service:sync:error
1440359435627	browserwindow.syncui	DEBUG	onActivityStop with numActive: 1
1440359435627	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435627	browserwindow.syncui	DEBUG	observed: weave:service:sync:error
1440359435627	browserwindow.syncui	DEBUG	onActivityStop with numActive: 1
1440359435627	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435627	browserwindow.syncui	DEBUG	observed: weave:service:sync:error
1440359435627	browserwindow.syncui	DEBUG	onActivityStop with numActive: 1
1440359435627	browserwindow.syncui	DEBUG	_loginFailed has sync state=success.login
1440359435628	Sync.SyncScheduler	DEBUG	Next sync in 3600000 ms.
1440359435628	Sync.ErrorHandler	ERROR	Sync encountered an error
Flags: firefox-backlog+
Priority: -- → P1
Assignee: nobody → markh
Iteration: --- → 43.2 - Sep 7
This is possibly caused by Bug 1197152.
Depends on: 1197152
You can verify by using the debug patch there that dumps out the timer values that are modified, or use the disable patch and see if it fixes it
I can't reproduce this - on my Mac I see:

1441159299684	Sync.SyncScheduler	DEBUG	Woke from sleep.
1441159299685	Sync.SyncScheduler	DEBUG	More than 1 client. Syncing.
1441159299685	Sync.Service	DEBUG	User-Agent: Firefox/43.0a1 FxSync/1.45.0.20150902101036.
1441159299686	Sync.Service	INFO	Starting sync at 2015-09-02 12:01:39
1441159299687	browserwindow.syncui	DEBUG	observed: weave:service:sync:start
...
1441159299695	Sync.BrowserIDManager	INFO	Getting an assertion from: https://token.services.mozilla.com/1.0/sync/1.5
...
1441159305795	Sync.BrowserIDManager	DEBUG	Successfully got a sync token
1441159307061	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-231-us-west-2.sync.services.mozilla.com/1.5/25821319/info/collections
1441159307061	Sync.Resource	DEBUG	GET success 200 https://sync-231-us-west-2.sync.services.mozilla.com/1.5/25821319/info/collections

And Sync continues as normal.

(In reply to Chris Karlof [:ckarlof] from comment #1)
> This is possibly caused by Bug 1197152.

I'm not sure about that - IIUC, this isn't relying on timers, just on a notification we get about being woken.

So from Richard's report:

1440359435623	Sync.SyncScheduler	DEBUG	Clearing sync triggers and the global score.
1440359435624	Sync.Status	INFO	Resetting Status.
1440359435624	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1440359435624	Sync.Status	DEBUG	Status.sync: success.sync => error.login.reason.network
1440359435624	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed

The logging is letting us down, but it seems possible the issue simply *is* a network error. I'm not sure how the wireless hardware works with sleep, but it seems possible in my testing the network adapter itself wasn't fully asleep, so by the time Firefox woke the network was established - but after a long sleep I could imagine the network adapter being fully powered down, thus taking longer to wake and reconnect (eg, I occasionally see it take a number of seconds after wake), so Firefox is simply attempting to synch before the network is up, and Sync doesn't make any attempt to either mitigate or retry that.

Richard, does that sound feasible to you? Are you able to reproduce this every time?
Flags: needinfo?(rnewman)
Iteration: 43.2 - Sep 7 → ---
Priority: P1 → P2
I'm still seeing this happen (e.g., this morning in London).

Trying to repro now.
This does not _always_ fail. If I drop lastSync to a few hours ago, sleep then immediately wake the machine while the wireless network is live and happy, I'll get an instant sync.

So if I had to suggest STR, it'd be:

* Sleep the machine at work.
* Go home and make dinner.
* Wake the machine.

It'll take a few seconds to find the new wireless network, and in that time the sync will fail.

Probably the right thing to do is for Sync to listen to network change notifications, and also to speculatively check for a working network prior to actually running a sync on wake -- failure due to trying too early shouldn't cause an error log to be written. If there's no network, the network change event should trigger a sync a few seconds later.
Flags: needinfo?(rnewman)
Summary: Sync-on-wake no longer works (on Mac?), reporting _loginFailed has sync state=success.login and network failure → Sync-on-wake doesn't work correctly if there is no network available immediately after a wake.
Iteration: --- → 44.1 - Oct 5
(In reply to Richard Newman [:rnewman] from comment #5)
> Probably the right thing to do is for Sync to listen to network change
> notifications,

Best I can tell, these are unreliable. The offline and connectivity attributes of the IOService no longer attempt to track network connection states and the nsINetworkLinkService can get confused when there are multiple networks (ie, it checks that any network thinks it is connected, not whether it has connectivity). The (somewhat new) captive portal code also knows how to check for internet connectivity by hitting a specific endpoint, but for reasons I haven't explored, always tells me the captive portal state is "unknown".

But - I suspect nsINetworkLinkService probably is OK for the vast majority of users and seems to be the best we have.

This patch is untested on OSX, so only requesting feedback for now. It also extends all the "are we offline" checks to include nsINetworkLinkService.isLinkUp, which isn't strictly necessary but should hopefully make our offline-aware code actually do something again.

Richard, what do you think?
Attachment #8664661 - Flags: feedback?(rnewman)
Comment on attachment 8664661 [details] [diff] [review]
0001-Bug-1197625-if-we-resume-from-sleep-when-no-network-.patch

Review of attachment 8664661 [details] [diff] [review]:
-----------------------------------------------------------------

Looks sane to me!

::: services/sync/modules/policies.js
@@ +258,5 @@
>          this._log.debug("Woke from sleep.");
>          Utils.nextTick(() => {
>            // Trigger a sync if we have multiple clients.
>            if (this.numClients > 1) {
> +            let lns = Cc["@mozilla.org/network/network-link-service;1"]

nls?

@@ +575,5 @@
> +  /**
> +   * Return true if the app appears to have no network connectivity, either
> +   * due to the user explicitly entering offline mode or for any other reason.
> +   */
> +  get offline() {

isOffline

@@ +586,5 @@
> +    // But there is a network link service that tracks if there is any network
> +    // adaptor connected. Sadly this too is unreliable - eg, an adaptor on a
> +    // local private network (eg, a VMWare local network) may be technically
> +    // connected but still unable to hit the public network - but it should
> +    // only be unreliable in terms of indicating we online when we aren't but

s/online/are online
Attachment #8664661 - Flags: feedback?(rnewman) → feedback+
Status: NEW → ASSIGNED
Sadly that patch looks like it will be unreliable - it seems the order of the link change notifications and the wake notification is indeterminate - ie, that we are likely to see wake, followed by notification the network is down - by which point we will have already tried to start syncing.

So I think we need to get a bit smarter about how we handle network failure and rescheduling when we think the network may be back up.
Iteration: 44.1 - Oct 5 → 44.2 - Oct 19
Not sure if it helps, but on some platforms we're hooking into the system network status to drive the value of navigator.onLine. Might be able to use that same hook to determine whether to trigger sync.
Mark, how would you feel about just a few-second delay here as a stopgap? It seems like that would cover most home networks, maybe a large portion of WPA2 LDAP-authenticated work networks, and generally make life better without much work…

Keen to not make perfect the enemy of good.
Flags: needinfo?(markh)
(In reply to Richard Newman [:rnewman] from comment #10)
> Mark, how would you feel about just a few-second delay here as a stopgap? It
> seems like that would cover most home networks, maybe a large portion of
> WPA2 LDAP-authenticated work networks, and generally make life better
> without much work…
> 
> Keen to not make perfect the enemy of good.

While I think the scheduler needs love for various reasons, I don't see me finishing that work for a couple of weeks, so yeah, I agree that would be a reasonable short-term fix. Leaving ni? on myself to actually do it (but feel free to take it ;)
schedule a sync on wake after 5 seconds.
Attachment #8664661 - Attachment is obsolete: true
Flags: needinfo?(markh)
Attachment #8684726 - Flags: review?(rnewman)
Comment on attachment 8684726 [details] [diff] [review]
0001-Bug-1197625-delay-sync-on-wake-for-5-seconds-in-the-.patch

Review of attachment 8684726 [details] [diff] [review]:
-----------------------------------------------------------------

Rock n roll.
Attachment #8684726 - Flags: review?(rnewman) → review+
https://hg.mozilla.org/mozilla-central/rev/6eb43418296a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 1416322
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: