Last Comment Bug 671378 - Sync interval doesn't adjust after error during initial Sync on additional device
: Sync interval doesn't adjust after error during initial Sync on additional de...
Status: VERIFIED FIXED
[verified in services]
: regression
Product: Cloud Services
Classification: Client Software
Component: Firefox Sync: Backend (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla8
Assigned To: Marina Samuel [:emtwo]
:
:
Mentors:
Depends on:
Blocks: 664792
  Show dependency treegraph
 
Reported: 2011-07-13 11:56 PDT by Philipp von Weitershausen [:philikon]
Modified: 2011-08-03 10:58 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
fixed


Attachments
v1: updateClientMode() on sync:error (3.64 KB, patch)
2011-07-13 12:52 PDT, Marina Samuel [:emtwo]
no flags Details | Diff | Splinter Review
v2: updateClientMode() on sync:error & adjust syncTimer appropriately (11.18 KB, patch)
2011-07-20 15:13 PDT, Marina Samuel [:emtwo]
philipp: review+
Details | Diff | Splinter Review
v3: updateClientMode() on sync:error & adjust syncTimer appropriately (11.31 KB, patch)
2011-07-21 11:45 PDT, Marina Samuel [:emtwo]
philipp: review+
asa: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Philipp von Weitershausen [:philikon] 2011-07-13 11:56:00 PDT
Let's say the first sync (on an additional device) fails somehow so, by default, we're in single device mode:

1310582104951	Sync.Service	TRACE	Event: weave:service:sync:error
1310582104984	Sync.SyncScheduler	TRACE	Adjusting syncInterval to singleDeviceInterval.
1310582105009	Sync.SyncScheduler	TRACE	Next sync in 86400 sec.

That all makes sense. Now for some reason the problem has gone away, and we sync successfully. This is the excerpt from that successful sync:

1310582546663	Sync.SyncScheduler	DEBUG	Client count: 0 -> 3
1310582546664	Sync.SyncScheduler	TRACE	Adjusting syncThreshold to MULTI_DEVICE_THRESHOLD
1310582546664	Sync.SyncScheduler	TRACE	Adjusting syncInterval to immediateInterval.
1310582546664	Sync.Service	INFO	Updating enabled engines: 3 clients.
1310582546666	Sync.Engine.Bookmarks	TRACE	Event: weave:engine:sync:start
1310582546669	Sync.Engine.Bookmarks	DEBUG	Engine syncIDs: aR0ssBQLNy5T,5xzEZ69rbuxh
1310582546670	Sync.Engine.Bookmarks	DEBUG	Resetting bookmarks last sync time
1310582546672	Sync.Engine.Bookmarks	DEBUG	First sync, uploading all items
1310582546677	Sync.Store.Bookmarks	TRACE	Creating SQL statement: SELECT guid FROM moz_bookmarks WHERE id = :item_id
1310582546838	Sync.Tracker.Bookmarks	TRACE	Clearing changed ID list
1310582546839	Sync.Engine.Bookmarks	INFO	145 outgoing items pre-reconciliation
1310582546962	Sync.Engine.Bookmarks	TRACE	Downloading & applying server changes
1310582547077	Sync.Tracker.Tabs	TRACE	onTab event: TabSelect
1310582547164	Sync.SyncScheduler	TRACE	bookmarks: score: 0
1310582547164	Sync.SyncScheduler	TRACE	forms: score: 0
1310582547165	Sync.SyncScheduler	TRACE	history: score: 0
1310582547165	Sync.SyncScheduler	TRACE	passwords: score: 0
1310582547165	Sync.SyncScheduler	TRACE	prefs: score: 0
1310582547166	Sync.SyncScheduler	TRACE	tabs: score: 1
1310582547167	Sync.SyncScheduler	TRACE	Global score updated: 5
1310582547167	Sync.SyncScheduler	TRACE	_checkSync returned "".
1310582547180	Sync.SyncScheduler	TRACE	Next sync in 86399 sec.

Why are we still only syncing once per day now? syncInterval is immediateInterval now!
Comment 1 Marina Samuel [:emtwo] 2011-07-13 12:52:01 PDT
Created attachment 545727 [details] [diff] [review]
v1: updateClientMode() on sync:error
Comment 2 Philipp von Weitershausen [:philikon] 2011-07-13 13:26:10 PDT
Summarizing face to face discussion:

The problem is that adjustSyncInterval doesn't actually change the timer, so what we should do is call scheduleNextSync from there (if we ended up changing the interval, that is).

This has two implications, though:

* scheduleNextSync uses nextSync to track whether we already have a timer going. From looking at the code, it will prevent the timeout from being reduced (e.g. going from 86400 to 300 seconds). We should change that and allow a reduction of the timeout (but not an increase). We should also ensure (via a test) that if the new timeout is greater than the time the timer has already done, the sync gets triggered (immediately-ish).

* adjustSyncInterval is called from the "idle" observer which we don't remove after somebody's stopped using sync. That means we would schedule syncs past that point, which isn't a good idea. To fix, we should

  - only register SyncScheduler as an idle observer in SyncScheduler.init() if Status.checkSetup() == STATUS_OK
  - register it on weave:service:setup-complete
  - unregsiter it on weave:service:start-over
Comment 3 Philipp von Weitershausen [:philikon] 2011-07-13 13:28:27 PDT
(In reply to comment #2)
>   - unregsiter it on weave:service:start-over

We can also move the SyncScheduler.setDefaults() call from Service.startOver to that observer, making Service a bit more agnostic of SyncScheduler.
Comment 4 Philipp von Weitershausen [:philikon] 2011-07-13 15:07:03 PDT
Comment on attachment 545727 [details] [diff] [review]
v1: updateClientMode() on sync:error

>       case "weave:service:sync:error":
>+        // There may be multiple clients on first sync but if the sync fails
>+        // client mode should still be updated so that the next sync has a correct interval.
>+        this.updateClientMode();
>         this.adjustSyncInterval();
>         this.handleSyncError();
>         break;

As discussed, this patch addresses a genuine problem, but not the first sync problem. Please adjust the comment accordingly.

>+  Records.clearCache();
>+  Svc.Prefs.resetBranch("");
>+  SyncScheduler.setDefaults();
>+  Clients.resetClient();

Please use Service.startOver() to clean up (replaces all of these).
Comment 5 Marina Samuel [:emtwo] 2011-07-14 08:43:26 PDT
(In reply to comment #2)
> Summarizing face to face discussion:
> 
> The problem is that adjustSyncInterval doesn't actually change the timer, so
> what we should do is call scheduleNextSync from there (if we ended up
> changing the interval, that is).

As I started coding this up, testing, etc. I realized that I don't think this part is necessary. 

I looked through the code for every call to adjustSyncInterval and we already have a call to scheduleNextSync everywhere (except on "idle") alongside a call to adjustSyncInterval. And by everywhere I mean on sync:finish and sync:error. I believe the reason this bug appeared wasn't because adjustSyncInterval didn't scheduleNextSync as well but because of the first point you mention next.

Also, a similar issue to this came up when we talked about going from active to idle state and having to wait for one more sync to happen before the timer got updated to idle. Since we won't be allowing the timeout to increase (as you mentioned in the next point), this suggested change won't affect that issue.

> * scheduleNextSync uses nextSync to track whether we already have a timer
> going. From looking at the code, it will prevent the timeout from being
> reduced (e.g. going from 86400 to 300 seconds). We should change that and
> allow a reduction of the timeout (but not an increase). 

This change seems good enough for this bug since we already call scheduleNextSync or scheduleAtInterval on sync:error through handleSyncError.  

> We should also
> ensure (via a test) that if the new timeout is greater than the time the
> timer has already done, the sync gets triggered (immediately-ish).

I'm not exactly sure what you mean here. Why would we be syncing immediately-ish? Don't we just sync when the current timer is up and not use the new timer?

> * adjustSyncInterval is called from the "idle" observer which we don't
> remove after somebody's stopped using sync. That means we would schedule
> syncs past that point, which isn't a good idea. To fix, we should
> 
>   - only register SyncScheduler as an idle observer in SyncScheduler.init()
> if Status.checkSetup() == STATUS_OK
>   - register it on weave:service:setup-complete
>   - unregsiter it on weave:service:start-over

I think that if it's possible to not have to call scheduleNextSync from adjustSyncInterval it would be better since (1) it would avoid doing this (2) It's technically already being called when we need it to be and (3) you said we usually only call scheduleNextSync on sync:finish etc. and this is more of an "internal" or "system" call.

Of course I could be missing something here but I just wanted to share my analysis thus far and I will attempt to prove this works with tests etc.
Comment 6 Philipp von Weitershausen [:philikon] 2011-07-14 12:10:50 PDT
(In reply to comment #5)
> > * scheduleNextSync uses nextSync to track whether we already have a timer
> > going. From looking at the code, it will prevent the timeout from being
> > reduced (e.g. going from 86400 to 300 seconds). We should change that and
> > allow a reduction of the timeout (but not an increase). 
> 
> This change seems good enough for this bug since we already call
> scheduleNextSync or scheduleAtInterval on sync:error through
> handleSyncError. 

Yes, you're right.

> > We should also
> > ensure (via a test) that if the new timeout is greater than the time the
> > timer has already done, the sync gets triggered (immediately-ish).
> 
> I'm not exactly sure what you mean here. Why would we be syncing
> immediately-ish? Don't we just sync when the current timer is up and not use
> the new timer?

Well, there's only one timer. Utils.namedTimer detects if a timer is already there and then just modifies its delay attribute. I think it will Just Work(tm), but we should verify that the following scenario works:

* A timer is set for, say, 86400 seconds
* 10 minutes we change the Sync interval in adjustSyncInterval to 5 minutes and then call scheduleNextSync to change the timer's delay.
* We sync 5 minutes later.

(Use more appropriate unites in unit test, of course :))

> > * adjustSyncInterval is called from the "idle" observer which we don't
> > remove after somebody's stopped using sync. That means we would schedule
> > syncs past that point, which isn't a good idea. To fix, we should
> > 
> >   - only register SyncScheduler as an idle observer in SyncScheduler.init()
> > if Status.checkSetup() == STATUS_OK
> >   - register it on weave:service:setup-complete
> >   - unregsiter it on weave:service:start-over
> 
> I think that if it's possible to not have to call scheduleNextSync from
> adjustSyncInterval it would be better since (1) it would avoid doing this
> (2) It's technically already being called when we need it to be and (3) you
> said we usually only call scheduleNextSync on sync:finish etc. and this is
> more of an "internal" or "system" call.

Yes. But I think we should still ensure the SyncScheduler is only registered as an idle observer when we need it to be (also gives us the opportunity to do the clean up mentioned in comment 3).
Comment 7 Marina Samuel [:emtwo] 2011-07-18 10:36:06 PDT
(In reply to comment #3)
> (In reply to comment #2)
> >   - unregsiter it on weave:service:start-over
> 
> We can also move the SyncScheduler.setDefaults() call from Service.startOver
> to that observer, making Service a bit more agnostic of SyncScheduler.

is the weave:service:start-over observer you're talking about the one in browser-syncui.js? If so, isn't moving SyncScheduler.setDefaults() to this observer worse than keeping it in Service.startOver since it's now in the UI code?
Comment 8 Philipp von Weitershausen [:philikon] 2011-07-20 13:32:36 PDT
(In reply to comment #7)
> is the weave:service:start-over observer you're talking about the one in
> browser-syncui.js? If so, isn't moving SyncScheduler.setDefaults() to this
> observer worse than keeping it in Service.startOver since it's now in the UI
> code?

I meant having SyncScheduler observe weave:service:start-over and then calling this.setDefaults() there, as well as having it unregister itself as an idle observer.
Comment 9 Marina Samuel [:emtwo] 2011-07-20 15:13:04 PDT
Created attachment 547275 [details] [diff] [review]
v2: updateClientMode() on sync:error & adjust syncTimer appropriately

In case the comments in test_adjust_timer are not clear enough, here are the steps that reproduce this bug exactly:

1) Sync - sets nextSync = 0.
2) scheduleNextSync is called before updateClientMode is called during the sync (eg. due to a score increment) - sets nextSync to a non-zero value using the initial large interval.
3) updateClientMode is called during the sync - sets the syncInterval to a smaller value.
4) scheduleNextSync (due to score increment or sync:finish)is called again but does not change the timer since nextSync is now a non-zero value.

the fix as discussed is to reset the timer to syncInterval if syncInterval is smaller than time left.
Comment 10 Philipp von Weitershausen [:philikon] 2011-07-20 18:13:40 PDT
Comment on attachment 547275 [details] [diff] [review]
v2: updateClientMode() on sync:error & adjust syncTimer appropriately

>+add_test(function test_adjust_timer() {
>+  _("Test that if syncInterval is smaller than current timout period it is used.");
>+  // Test scenario similar to bug 671378. nextSync value is set to non-zero
>+  // value prior to a call to adjustSyncInterval during a sync. Test on 
>+  // sync:finish syncTimer.delay is set to the min of syncInterval & current delay.

Can you please clarify this a bit (including the name of the test). The problem was that I apparently got a score update that wasn't quite big enough to trigger a sync, to scheduleNextSync() was called without an explicit time interval, which set nextSync and prevented the actual sync scheduling after the first sync to adjust the timer.

>+  // Wrap scheduleNextSync so we are notified when it is finished.
>+  SyncScheduler._scheduleNextSync = SyncScheduler.scheduleNextSync;
>+  SyncScheduler.scheduleNextSync = function() {
>+    SyncScheduler._scheduleNextSync();
>+    Svc.Obs.notify("scheduleNextSync:finish");
>+  };
...
>+  // Check on sync:finish scheduleNextSync sets the appropriate
>+  // syncInterval and syncTimer values.
>+  Svc.Obs.add("scheduleNextSync:finish", function onSchedNextSyncFinish() {
>+    if (syncSuccesses == 2) {
...

If you're already monkey patching scheduleNextSync, you might as well avoid the observer and put the if clause there.

>+  // Set nextSync != 0 
>+  // syncInterval still hasn't been set by call to updateClientMode.
>+  Svc.Obs.add("weave:service:sync:start", function onSyncStart() {

Please add a comment here that we're explicitly trying to invoke scheduleNextSync during a sync (a score update that's not quite big enough)

r=me with that.
Comment 11 Marina Samuel [:emtwo] 2011-07-21 11:45:29 PDT
Created attachment 547455 [details] [diff] [review]
v3: updateClientMode() on sync:error & adjust syncTimer appropriately

* addressing comment 10
* applied patch on top of bug 668622 with resolutions
Comment 12 Philipp von Weitershausen [:philikon] 2011-07-21 12:16:21 PDT
Comment on attachment 547455 [details] [diff] [review]
v3: updateClientMode() on sync:error & adjust syncTimer appropriately

Rock. r=me

Requesting approval for Aurora since this is a regression caused by bug 664792 which is in Aurora. This fixes an edge case in certain error cases, so the risk is minimal.
Comment 13 Philipp von Weitershausen [:philikon] 2011-07-21 12:23:55 PDT
http://hg.mozilla.org/services/services-central/rev/7648521ecb5a
Comment 14 Philipp von Weitershausen [:philikon] 2011-07-21 12:29:49 PDT
STRs for QA:

1. Connect a new profile to an existing Sync account that has other devices connected to it but that will cause the first sync to produce an error somehow (e.g. it has invalid records or HMAC mismatches or similar)
2. During the initial sync, cause minor score updates, e.g. by switching tabs a lot and/or producing some new history.
3. Ensure that the sync interval is no longer singleDeviceInteral (but in all likelihood activeInterval)
Comment 15 Asa Dotzler [:asa] 2011-07-21 14:50:52 PDT
Comment on attachment 547455 [details] [diff] [review]
v3: updateClientMode() on sync:error & adjust syncTimer appropriately

We're going to take this fix, but we're also now deep into Aurora and this has the new instant sync feature on our radar as potentially not ready for 7. I'm crossing my fingers that we don't find any additional blockers in the feature because we're now in that part of the release where turning off or backing out incomplete features is very appealing.
Comment 16 Philipp von Weitershausen [:philikon] 2011-07-21 23:23:31 PDT
(In reply to comment #15)
> Comment on attachment 547455 [details] [diff] [review] [review]
> v3: updateClientMode() on sync:error & adjust syncTimer appropriately
> 
> We're going to take this fix, but we're also now deep into Aurora and this
> has the new instant sync feature on our radar as potentially not ready for
> 7. I'm crossing my fingers that we don't find any additional blockers in the
> feature because we're now in that part of the release where turning off or
> backing out incomplete features is very appealing.

Thanks for the approval. I completely understand the reasoning. Indeed this how Aurora is supposed to work.

That said, AFAIUI Aurora is also meant as a stabilizing period. For Instant Sync this unfortunately meant that we had three edge case regressions (not major ones, mind you... after all, QA signed off on the original implementation before we even landed on m-c). IMHO the good news is that we caught the regression during the Aurora phase and not later! ;)

I do agree about the timescale. I too wish we that we would have fixed this bug earlier in the Aurora stage, along with the other two regressions. Alas, the Services Engineering off-site punched a small productivity hole in that plan ;)

Anyway, landed on Aurora... let's hope that this was it...
http://hg.mozilla.org/releases/mozilla-aurora/rev/72beea1d4cc7
Comment 17 Tracy Walker [:tracy] 2011-07-26 11:25:48 PDT
(In reply to comment #14)
> STRs for QA:
> 
> 1. Connect a new profile to an existing Sync account that has other devices
> connected to it but that will cause the first sync to produce an error
> somehow (e.g. it has invalid records or HMAC mismatches or similar)
Do you have a method to create/modify an account that will exhibit this?

> 2. During the initial sync, cause minor score updates, e.g. by switching
> tabs a lot and/or producing some new history.
do you mean literally during the initial sync or to trigger a first successful sync?

> 3. Ensure that the sync interval is no longer singleDeviceInteral (but in
> all likelihood activeInterval)
Ensure the sync interval is set at what point; before the first successful sync is triggered or after?
Comment 18 Tracy Walker [:tracy] 2011-07-26 15:16:45 PDT
testing on s-c build of 20110725:

After erroring on hmac mismatch, every successful sync on the added client is Adjusting syncInterval to singleDeviceInterval
Comment 19 Tracy Walker [:tracy] 2011-07-26 16:18:42 PDT
I followed comment #14 STR's. and the syncInterval is set to singleDeviceInterval.

Then hit Sync Now in the error bar. sync interva1 is still one day.  Philipp thinks it may be the way the hmac mismatch is breaking all records from the initial client. (not a usual thing) 

adding a third client:
erros from the hmac mismatch but then gets: Adjusting syncInterval to immediateInterval. 
then a minute later: Adjusting syncInterval to activeInterval.

based on that, I'll verify this fixed on s-c
Comment 20 Philipp von Weitershausen [:philikon] 2011-07-26 16:32:58 PDT
http://hg.mozilla.org/mozilla-central/rev/7648521ecb5a

Note You need to log in before you can comment on or make changes to this bug.