Closed Bug 732605 Opened 9 years ago Closed 9 years ago

Next sync scheduled if only failed records were applied

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla13
Tracking Status
firefox12 - fixed
firefox-esr10 - ---

People

(Reporter: gps, Assigned: gps)

Details

(Whiteboard: [verified in services])

Attachments

(2 files)

If a sync fails to successfully apply any records, the SyncScheduler will count this as "records applied" and will schedule a sync using the immediate interval (90s). i.e. if a bookmark record fails to apply and the browser is otherwise doing nothing else that would trigger a sync, syncs will run every 90 seconds forever, even though nothing is changing.

The proper behavior should be to only schedule an immediate sync interval if records were successfully applied.

There is a related bug of failed records never being removed from the failed state. We'll deal with that in another bug.

Patch will come shortly.
I believe this addresses the problem adequately.

I used Math.max() out of paranoia. The counting logic in that function is complex enough that I wouldn't doubt there is a corner case where negative values could be introduced.
Attachment #602519 - Flags: review?(rnewman)
Comment on attachment 602519 [details] [diff] [review]
scheduler fix, v1

Review of attachment 602519 [details] [diff] [review]:
-----------------------------------------------------------------

Bootiful. Requesting sr from philikon or mconnor, particularly because we want to get this into Aurora and Beta if we can.

(Philipp is on PTO, so we'll see...)
Attachment #602519 - Flags: review?(rnewman)
Attachment #602519 - Flags: review?(mconnor)
Attachment #602519 - Flags: review+
I'd like to get this in Firefox 12. Hopefully we'll have enough time to get it while it is still technically in Aurora.
Attachment #602519 - Flags: review?(mconnor) → review+
https://hg.mozilla.org/services/services-central/rev/fca77ac7cdf9

Tracy: We'd like QA ASAP so I can merge into m-c. I'll email the QA list with builds once they are cranked out. I'll also try to get STR into this bug.
Whiteboard: [fixed-in-services]
Whiteboard: [fixed-in-services] → [fixed in services]
Comment on attachment 602519 [details] [diff] [review]
scheduler fix, v1

[Approval Request Comment]
Regression caused by (bug #): Long-standing Sync issue
User impact if declined: Continued excessive syncing. People may notice excessive syncing through UI pauses associated with syncing. The real win is on the Sync servers: this patch may reduce load on the Sync service.
Testing completed (on m-c, etc.): None yet. Will be performed shortly.
Risk to taking this patch (and alternatives if risky): Low.
String changes made by this patch: None

I'm requesting approval before it hits m-c so it is on the release drivers' radar. I want this patch to get in the hands of users ASAP. However, it isn't worth a stop-the-world for 11, so I'm targeting 12. I'm hoping to get it while 12 is still in Aurora.

I'm requesting ESR uplift because I don't want this issue persisting in the wild for much longer than a few more months.
Attachment #602519 - Flags: approval-mozilla-esr10?
Attachment #602519 - Flags: approval-mozilla-aurora?
Comment on attachment 602519 [details] [diff] [review]
scheduler fix, v1

rnewman never put me on the review list like he said in comment 2, so I only saw this now. My apologies.

Yes, this looks good and should obviously have been implemented like this in the first place... My bad.
Attachment #602519 - Flags: review+
We'll have to resort to record munging to get steps to reliably reproduce.

1) Configure Sync. Set up verbose Sync logging.
2) Wait for a sync to run
3) Go to about:about
4) Open up a privileged web console via developer tools
5) Run the following:

---

Components.utils.import("resource://gre/modules/Services.jsm");
Components.utils.import("resource://services-sync/record.js");
Components.utils.import("resource://services-sync/engines.js");
Components.utils.import("resource://services-sync/resource.js");
var record = new CryptoWrapper("addons", "testid");
record.cleartext = {id: "testid", addonID: "DOES_NOT_EXIST_SYNC_TESTING", source: "amo", applicationID: Services.appinfo.ID};
record.encrypt();

var url = Engines.get("addons").engineURL + "/testid";
record.upload(url);

---

This uploads an add-on sync record that points to an invalid Add-on ID, so it should always fail to apply.

6) Perform a sync

In the sync logs, you should see something like:

1331078405245	Sync.Store.Addons	WARN	Failed to apply incoming record testid
1331078405245	Sync.Store.Addons	WARN	Encountered exception: Add-on not found after install: DOES_NOT_EXIST_SYNC_TESTING JS Stack trace: create([object Object])@addons.js:341 < Store_applyIncoming([object Object])@engines.js:275 < applyIncoming([object Object])@addons.js:312 < applyIncomingBatch([object Array])@engines.js:243 < doApplyBatch()@engines.js:795 < ([object Object])@engines.js:902 < ()@record.js:834 < Channel_onDataAvail([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper],0,802)@resource.js:605
1331078405247	Sync.Engine.Addons	DEBUG	Records that failed to apply: testid

A few lines later you will see something like:

1331078495552	Sync.SyncScheduler	DEBUG	Next sync in 90000 ms.

The next sync time before this patch will be "90000 ms." After the patch, it should be something much bigger, like "86400000 ms" for a single device account. If the device is active or you have multiple clients configured, you may see "3600000 ms" or "600000 ms." Under no circumstances should you see "90000 ms."

Also, TBPL is having issues currently. It might be a little longer on the builds.
So, Mac handled this correctly right away.

1331085839321	Sync.Store.Addons	WARN	Encountered exception: Add-on not found after install: DOES_NOT_EXIST_SYNC_TESTING JS Stack trace: create([object Object])@addons.js:341 < Store_applyIncoming([object Object])@engines.js:275 < applyIncoming([object Object])@addons.js:312 < applyIncomingBatch([object Array])@engines.js:243 < doApplyBatch()@engines.js:795 < ([object Object])@engines.js:902 < ()@record.js:834 < Channel_onDataAvail([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper],0,366)@resource.js:605
1331085839323	Sync.Collection	DEBUG	mesg: GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1
1331085839323	Sync.Collection	DEBUG	GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1
1331085839324	Sync.Engine.Addons	DEBUG	Records that failed to apply: testid
1331085839324	Sync.Engine.Addons	INFO	Records: 1 applied, 0 successfully, 1 failed to apply, 1 newly failed to apply, 0 reconciled.
1331085839324	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:applied
1331085839324	Sync.SyncScheduler	TRACE	Engine addons successfully applied 0 items.
1331085839325	Sync.ErrorHandler	TRACE	Handling weave:engine:sync:applied
1331085839325	Sync.Status	DEBUG	Status for engine addons: error.engine.reason.apply_fail
1331085839325	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1331085839325	Sync.ErrorHandler	DEBUG	addons failed to apply some records.
1331085839325	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:finish
1331085839325	Sync.Service	INFO	Sync completed at 2012-03-06 20:03:59 after 1.86 secs.
1331085839325	Sync.Service	TRACE	Event: weave:service:sync:finish
1331085839325	Sync.SyncScheduler	TRACE	Handling weave:service:sync:finish
1331085839326	Sync.SyncScheduler	TRACE	Adjusting syncInterval to activeInterval.
1331085839326	Sync.SyncScheduler	DEBUG	Next sync in 600000 ms.
1331085839326	Sync.ErrorHandler	TRACE	Handling weave:service:sync:finish
1331085839326	Sync.ErrorHandler	TRACE	Status.service is error.sync.failed_partial
1331085839326	Sync.ErrorHandler	DEBUG	Some engines did not sync correctly.


Win XP to come...
on Win XP initial failure:

1331085856386	Sync.Engine.Addons	INFO	0 outgoing items pre-reconciliation
1331085856832	Sync.Store.Addons	INFO	Found 0/1 add-ons during repository search.
1331085856833	Sync.Store.Addons	WARN	Failed to apply incoming record testid
1331085856833	Sync.Store.Addons	WARN	Encountered exception: Add-on not found after install: DOES_NOT_EXIST_SYNC_TESTING JS Stack trace: create([object Object])@addons.js:341 < Store_applyIncoming([object Object])@engines.js:275 < applyIncoming([object Object])@addons.js:312 < applyIncomingBatch([object Array])@engines.js:243 < doApplyBatch()@engines.js:795 < ([object Object])@engines.js:902 < ()@record.js:834 < Channel_onDataAvail([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper],0,366)@resource.js:605
1331085856836	Sync.Collection	DEBUG	mesg: GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1
1331085856836	Sync.Collection	DEBUG	GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1
1331085856837	Sync.Engine.Addons	DEBUG	Records that failed to apply: testid
1331085856837	Sync.Engine.Addons	INFO	Records: 1 applied, 0 successfully, 1 failed to apply, 1 newly failed to apply, 0 reconciled.
1331085856838	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:applied
1331085856838	Sync.SyncScheduler	TRACE	Engine addons successfully applied 0 items.
1331085856838	Sync.ErrorHandler	TRACE	Handling weave:engine:sync:applied
1331085856838	Sync.Status	DEBUG	Status for engine addons: error.engine.reason.apply_fail
1331085856838	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1331085856838	Sync.ErrorHandler	DEBUG	addons failed to apply some records.
1331085856839	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:finish
1331085856839	Sync.Service	INFO	Sync completed at 2012-03-06 20:04:16 after 1.85 secs.
1331085856839	Sync.Service	TRACE	Event: weave:service:sync:finish
1331085856839	Sync.SyncScheduler	TRACE	Handling weave:service:sync:finish
1331085856839	Sync.SyncScheduler	TRACE	Adjusting syncInterval to immediateInterval.
1331085856840	Sync.SyncScheduler	DEBUG	Next sync in 90000 ms.
1331085856840	Sync.ErrorHandler	TRACE	Handling weave:service:sync:finish
1331085856840	Sync.ErrorHandler	TRACE	Status.service is error.sync.failed_partial
1331085856840	Sync.ErrorHandler	DEBUG	Some engines did not sync correctly.

then another incorrect handling:
1331085945326	Sync.Engine.Addons	INFO	0 outgoing items pre-reconciliation
1331085948753	Sync.Store.Addons	INFO	Found 0/1 add-ons during repository search.
1331085948754	Sync.Store.Addons	WARN	Failed to apply incoming record testid
1331085948755	Sync.Store.Addons	WARN	Encountered exception: Add-on not found after install: DOES_NOT_EXIST_SYNC_TESTING JS Stack trace: create([object Object])@addons.js:341 < Store_applyIncoming([object Object])@engines.js:275 < applyIncoming([object Object])@addons.js:312 < applyIncomingBatch([object Array])@engines.js:243 < doApplyBatch()@engines.js:795 < ([object Object])@engines.js:902 < ()@record.js:834 < Channel_onDataAvail([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper],0,366)@resource.js:605
1331085948756	Sync.Collection	DEBUG	mesg: GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1&ids=testid
1331085948756	Sync.Collection	DEBUG	GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1&ids=testid
1331085948757	Sync.Engine.Addons	DEBUG	Records that failed to apply: testid
1331085948757	Sync.Engine.Addons	INFO	Records: 1 applied, 0 successfully, 1 failed to apply, 0 newly failed to apply, 0 reconciled.
1331085948758	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:applied
1331085948758	Sync.SyncScheduler	TRACE	Engine addons successfully applied 0 items.
1331085948758	Sync.ErrorHandler	TRACE	Handling weave:engine:sync:applied
1331085948758	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:finish
1331085948759	Sync.Status	DEBUG	Status.sync: success.sync => success.sync
1331085948759	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1331085948759	Sync.Service	INFO	Sync completed at 2012-03-06 20:05:48 after 4.87 secs.
1331085948759	Sync.Service	TRACE	Event: weave:service:sync:finish
1331085948759	Sync.SyncScheduler	TRACE	Handling weave:service:sync:finish
1331085948760	Sync.SyncScheduler	TRACE	Adjusting syncInterval to immediateInterval.
1331085948760	Sync.SyncScheduler	DEBUG	Next sync in 90000 ms.
1331085948760	Sync.ErrorHandler	TRACE	Handling weave:service:sync:finish
1331085948760	Sync.ErrorHandler	TRACE	Status.service is success.status_ok
1331085948760	Sync.ErrorHandler	TRACE	Clearing lastSyncReassigned.

finally, correctly setting to activeSync interval:

1331086039922	Sync.Engine.Addons	INFO	0 outgoing items pre-reconciliation
1331086040319	Sync.Store.Addons	INFO	Found 0/1 add-ons during repository search.
1331086040320	Sync.Store.Addons	WARN	Failed to apply incoming record testid
1331086040320	Sync.Store.Addons	WARN	Encountered exception: Add-on not found after install: DOES_NOT_EXIST_SYNC_TESTING JS Stack trace: create([object Object])@addons.js:341 < Store_applyIncoming([object Object])@engines.js:275 < applyIncoming([object Object])@addons.js:312 < applyIncomingBatch([object Array])@engines.js:243 < doApplyBatch()@engines.js:795 < ([object Object])@engines.js:902 < ()@record.js:834 < Channel_onDataAvail([object XPCWrappedNative_NoHelper],null,[object XPCWrappedNative_NoHelper],0,366)@resource.js:605
1331086040322	Sync.Collection	DEBUG	mesg: GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1&ids=testid
1331086040322	Sync.Collection	DEBUG	GET success 200 https://scl2-sync1185.services.mozilla.com/1.1/upghcf5kx3nrau4332iiutoebm66fj7q/storage/addons?full=1&ids=testid
1331086040322	Sync.Engine.Addons	DEBUG	Records that failed to apply: testid
1331086040323	Sync.Engine.Addons	INFO	Records: 1 applied, 0 successfully, 1 failed to apply, 0 newly failed to apply, 0 reconciled.
1331086040323	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:applied
1331086040323	Sync.SyncScheduler	TRACE	Engine addons successfully applied 0 items.
1331086040323	Sync.ErrorHandler	TRACE	Handling weave:engine:sync:applied
1331086040324	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:finish
1331086040325	Sync.Status	DEBUG	Status.sync: success.sync => success.sync
1331086040325	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1331086040325	Sync.Service	INFO	Sync completed at 2012-03-06 20:07:20 after 1.56 secs.
1331086040325	Sync.Service	TRACE	Event: weave:service:sync:finish
1331086040325	Sync.SyncScheduler	TRACE	Handling weave:service:sync:finish
1331086040325	Sync.SyncScheduler	TRACE	Adjusting syncInterval to activeInterval.
1331086040325	Sync.SyncScheduler	DEBUG	Next sync in 600000 ms.
1331086040326	Sync.ErrorHandler	TRACE	Handling weave:service:sync:finish
1331086040326	Sync.ErrorHandler	TRACE	Status.service is success.status_ok
1331086040326	Sync.ErrorHandler	TRACE	Clearing lastSyncReassigned.
disregard logs.  new tab records were getting involved here.  I so know better than to look at logs in new tabs while testing.  This is verified on this afternoons s-c builds... go to m-c
Whiteboard: [fixed in services] → [verified in services]
https://hg.mozilla.org/mozilla-central/rev/fca77ac7cdf9

Thanks for the prompt verification, Tracy!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
did that make it to m-c in time?  

I've turned off all engines 'cept addons to make  sure no other records are polluting results. (for sanity sake).  My m-c builds from this morning are stuck in the 90 second loop.

typical log attached.
No need to track for 12, but we'll likely approve for Aurora nonetheless once this has had a day or two of bake time.
Tracy: your most recent log reveals that you are using a build pre-patch! I can tell from the line:

1331133438958	Sync.Engine.Addons	INFO	Records: 1 applied, 1 failed to apply, 0 newly failed to apply, 0 reconciled.

In the patch, we introduced a "{n} successfully" bit between "applied" and "failed."
verified pm m-c with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20120308 Firefox/13.0a1

1331217433066	Sync.Engine.Addons	INFO	Records: 1 applied, 0 successfully, 1 failed to apply, 0 newly failed to apply, 0 reconciled.
1331217433066	Sync.SyncScheduler	TRACE	Handling weave:engine:sync:applied
1331217433066	Sync.SyncScheduler	TRACE	Engine addons successfully applied 0 items.


1331217433067	Sync.SyncScheduler	TRACE	Adjusting syncInterval to activeInterval.
1331217433067	Sync.SyncScheduler	DEBUG	Next sync in 600000 ms.
Status: RESOLVED → VERIFIED
Comment on attachment 602519 [details] [diff] [review]
scheduler fix, v1

[Triage Comment]
Approved for Aurora 12. Let's first verify that this did help server-side load before taking on the ESR.
Attachment #602519 - Flags: approval-mozilla-esr10?
Attachment #602519 - Flags: approval-mozilla-aurora?
Attachment #602519 - Flags: approval-mozilla-aurora+
(In reply to Alex Keybl [:akeybl] from comment #17)

> [Triage Comment]
> Approved for Aurora 12. Let's first verify that this did help server-side
> load before taking on the ESR.

N.B., in case it wasn't considered -- there's a client impact, too: affected clients will be doing a bunch of event-loop spinning and synchronous IO every ninety seconds, forever and ever.
(In reply to Richard Newman [:rnewman] from comment #18)
> N.B., in case it wasn't considered -- there's a client impact, too: affected
> clients will be doing a bunch of event-loop spinning and synchronous IO
> every ninety seconds, forever and ever.

Understood, but I don't believe we have a repro case that proves that this bug leads to perf regressions. Given the lesser testing coverage we receive with the ESR prior to release, we have to be very cautious w/r/t what fixes we take.
[Triage Comment]
Please re-nominate if there is data to support putting this out to ESR would help server load.  We don't expect that many ESR deploys are using Sync.
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.