Closed Bug 950522 Opened 11 years ago Closed 10 years ago

CloudSync API

Categories: Cloud Services :: General (defect)

RESOLVED DUPLICATE of bug 993584

People: Reporter: gal; Assigned: akligman

References: Depends on 1 open bug

Attachments: 2 files, 4 obsolete files

We are discussing supporting the ability to plug various sync adaptors into FF. Instead of just supporting the Mozilla cloud, we could support other cloud vendors to plug their clouds into FF. A couple extensions already exist for this. All of them tend to interact pretty poorly with the data sources and do main thread IO and such because the underlying APIs are complex and not very obvious how to use right. This patch explores what a clean API for that could look like that sync adaptors can plug into. It might also make sense to implement the rebooted sync implementation in the future on top of something like this.

API:

/*
 * Cloud Sync API
 *
 * This API allows sync adaptors to listen to changes to open tabs, bookmarks,
 * logins and history (visits).
 *
 * Changes to tabs, bookmarks and logins are reported as a complete JSON
 * dump of the new state. This is acceptable since these data sets tend to
 * be small, and are updated infrequently.
 *
 * adaptor.ontabschanged = function (tabs) { }
 * adaptor.onbookmarkschanged = function (bookmarks) { }
 * adaptor.onloginschanged = function (logins) { }
 *
 * Since the browser history can be very extensive and changes frequently,
 * a different API is used that first reports a complete JSON dump of the
 * last N entries in the history when the browser starts, and from there
 * on visits are reported incrementally.
 *
 * adaptor.onhistory = function (visits) { } // initial dump
 * adaptor.onvisit = function (visits) { } // incremental updates
 *
 * When the history is cleared this is reported to the adaptor. The adaptor
 * should try to respect this and clear the history in the cloud:
 *
 * adaptor.onclearhistory = function () { }
 *
 * When the browser starts, it will, after some initial delay, fire a start
 * event on the adaptor to kick off the initial download of new sync state.
 * The event carries an API object that can be used by the adaptor to deliver
 * updates to the browser.
 */
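For illustration, a minimal adaptor against this API could look like the sketch below. MyCloudClient and its upload/append/remove methods are hypothetical stand-ins for whatever client code a cloud vendor would bring along; this is not part of the patch.

let adaptor = {
  onstart: function (api) {
    // Keep the API object around so remote updates can be pushed back
    // into the browser later.
    this.api = api;
  },
  // Complete JSON dumps for the small, rarely-changing data sets.
  ontabschanged: function (tabs) {
    MyCloudClient.upload("tabs", JSON.stringify(tabs));
  },
  onbookmarkschanged: function (bookmarks) {
    MyCloudClient.upload("bookmarks", JSON.stringify(bookmarks));
  },
  onloginschanged: function (logins) {
    MyCloudClient.upload("logins", JSON.stringify(logins));
  },
  // Initial dump of the last N history entries...
  onhistory: function (visits) {
    MyCloudClient.upload("history", JSON.stringify(visits));
  },
  // ...followed by incremental visit updates.
  onvisit: function (visits) {
    MyCloudClient.append("history", JSON.stringify(visits));
  },
  onclearhistory: function () {
    MyCloudClient.remove("history");
  }
};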
Attached patch patch (obsolete) — Splinter Review
Assignee: nobody → gal
(In reply to Andreas Gal :gal from comment #0)
> All of them tend to interact pretty poorly with the data
> sources and do main thread IO and such because the underlying APIs are
> complex and not very obvious how to use right. This patch explores what a
> clean API for that could look like that sync adaptors can plug into.

+1.


>  * Changes to tabs, bookmarks and logins are reported as a complete JSON
>  * dump of the new state. This is acceptable since these data sets tend to
>  * be small, and are updated infrequently.

At the risk of being all debbie downer, these assumptions aren't correct, and so this part needs rethinking for a number of reasons.

Last I checked, we have way more users with lots of bookmarks than you might think. To cover the usage of just 99% of our bookmark users, we're talking about ~1500 bookmarks. Up to 99.9% is ~8000 bookmarks. (Source: FHR.)

Bookmarks are stored in a DB; walking the tree, calling SELECT * FROM * and serializing the results every time you want to notify a sync adapter of a change is going to hurt perf.

But let's ignore bookmarks for the moment.

Sync *already* does monolithic uploads of both prefs and tabs (and it can do these monolithically, despite receiving incremental notifications from prefs and Satchel, simply by asking the data sources for a snapshot). That monolithic record format has been the cause of user-facing bugs in prefs sync, and we would like to move in the opposite direction and introduce granularity (Bug 645259).

Tabs change a lot. We already receive user feedback that indicates that our infrequent updates of tab state are inadequate and non-competitive, and that we should be syncing near-instantaneously and incrementally. Switching to a monolithic update makes that much more expensive, particularly as we move toward syncing more tab state (e.g., form fields).

Instant, incremental sync is a goal.

If the proposal is to *upload* all of this state in one bundle, then we'd be doing a disservice to the user's data cap. (And feedback from recent user studies in Asia indicates that we should be aiming for _less_ dependence on reliable network connections, rather than more. "Small and often" is the way forward.)

If it's not, then it implies that the sync adapter is maintaining a mirror of both local and remote data in order to do effective diffing. That's needless complexity, IMO; rather than having some component see a record change, bundle up the entire data store, call this method, and have the sync adapter do a bunch of work to find out which change occurred, why not just cut out those middle steps?

Re passwords: I don't have any data for signons (we should get some!), but I believe that your assumption is correct, and it's small. But note that for signons the need for correct merging is much more rigorous (because if you get it wrong you lose a password change), so some care would have to be taken that the representation includes enough data to allow clients to do so.


>  * Since the browser history can be very extensive and changes frequently,
>  * a different API is used that first reports a complete JSON dump of the
>  * last N entries in the history when the browser starts, and from there
>  * on visits are reported incrementally.

This seems like a decent idea, assuming a push-oriented sync system. (Right now, Desktop is push-oriented (observer-based), while Android is pull-oriented.)
Oh, another thing just occurred to me.

We've seen plenty of bug reports from users who hit Sync's generous max record sizes due to large form history, large prefs, etc.

Those reports indicate that Firefox's data stores are a wild west. Not only do we want to avoid monolithic blobs due to raw size, but also because if there's one thing worse than having to sync a huge record, it's having to sync it *every time something unrelated changed*.
> At the risk of being all debbie downer, these assumptions aren't correct,
> and so this part needs rethinking for a number of reasons.
> 
> Last I checked, we have way more users with lots of bookmarks than you might
> think. To cover the usage of just 99% of our bookmark users, we're talking
> about ~1500 bookmarks. Up to 99.9% is ~8000 bookmarks. (Source: FHR.)
> 
> Bookmarks are stored in a DB; walking the tree, calling SELECT * FROM * and
> serializing the results every time you want to notify a sync adapter of a
> change is going to hurt perf.

For 99% of our users the bookmarks make up less than 150k of data. The I/O is async and we assemble it into a tree in memory. There won't be any measurable latency.

As I said in the comment, bookmarks essentially never change. Even at 8000 bookmarks (covering 99.9% of our users) we're assembling a JSON graph of less than 1MB. Happy to measure and optimize, but I don't see any significant problems here.

> 
> But let's ignore bookmarks for the moment.
> 
> Sync *already* does monolithic uploads of both prefs and tabs (and it can do
> these monolithically, despite receiving incremental notifications from prefs
> and Satchel, simply by asking the data sources for a snapshot). That
> monolithic record format has been the cause of user-facing bugs in prefs
> sync, and we would like to move in the opposite direction and introduce
> granularity (Bug 645259).

Upload is a different story. While I am convinced that you are wrong, and that the bug above is caused by the current implementation rather than by the general approach, I don't think we have to decide that here. I think it's the right approach to communicate the state as one object graph to the sync plugin. The sync plugin can do with it whatever it wants, and I am sending JSON because the intent is to run the sync plugins in workers to avoid any processing latency.

> 
> Tabs change a lot. We already receive user feedback that indicates that our
> infrequent updates of tab state are inadequate and non-competitive, and that
> we should be syncing near-instantaneously and incrementally. Switching to a
> monolithic update makes that much more expensive, particularly as we move
> toward syncing more tab state (e.g., form fields).

See above. Updates pushed to the plugin should be a single monolithic graph to keep the state machine simple. We can discuss the format of tab updates. Trying to do delta updates seems very questionable to me; I would go with zip compression first. Again, happy to change my mind based on numbers for the network protocol.

> 
> Instant, incremental sync is a goal.
> 
> If the proposal is to *upload* all of this state in one bundle, then we'd be
> doing a disservice to the user's data cap. (And feedback from recent user
> studies in Asia indicates that we should be aiming for _less_ dependence on
> reliable network connections, rather than more. "Small and often" is the way
> forward.)
> 
> If it's not, then it implies that the sync adapter is maintaining a mirror
> of both local and remote data in order to do effective diffing. That's
> needless complexity, IMO; rather than having some component see a record
> change, bundle up the entire data store, call this method, and have the sync
> adapter do a bunch of work to find out which change occurred, why not just
> cut out those middle steps?

Some sync adaptors will do one, others the other. Even if we do delta compression, I would much prefer to do that at a more abstract protocol level. A simple diff algorithm comparing against the last state sent will likely send fewer bytes than any custom protocol we invent, and it's less stateful.

In any case, I want to see hard data first that we need to optimize.

> 
> Re passwords: I don't have any data for signons (we should get some!), but I
> believe that your assumption is correct, and it's small. But note that for
> signons the need for correct merging is much more rigorous (because if you
> get it wrong you lose a password change), so some care would have to be
> taken that the representation includes enough data to allow clients to do so.

Merging data is overrated. Passwords change rarely, and when they change, they tend to change in the same direction. Keep in mind that the same user is behind all the devices. A user won't change the password for a site to different values on different devices at the same time; that's simply not a use case. A password will change somewhere, and there will be a reasonable window to propagate that before the user touches the other device and has a chance to trigger a password dialog. This lag will be in the range of minutes. Our goal will be to sync faster than that.

> 
> 
> >  * Since the browser history can be very extensive and changes frequently,
> >  * a different API is used that first reports a complete JSON dump of the
> >  * last N entries in the history when the browser starts, and from there
> >  * on visits are reported incrementally.
> 
> This seems like a decent idea, assuming a push-oriented sync system. (Right
> now, Desktop is push-oriented (observer-based), while Android is
> pull-oriented.)

We can use the same underlying data sources API for either approach. I looked at two non-Mozilla implementations; neither uses real push, and I expect most adaptors to use pull. One of the implementations I looked at uses the activity -> idle and idle -> activity transitions to trigger syncs. That might be a pretty good compromise: the moment the user stops using the device is probably a good time to update the other device, and the moment the user starts using a device is probably a good time to check for updates.
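To make the idea concrete, here is a rough sketch of that compromise using the idle observer service. The exact notification topics, the idle threshold, and the sync()/checkForUpdates() helpers are assumptions for illustration, not part of the patch.

const idleService = Cc["@mozilla.org/widget/idleservice;1"]
                      .getService(Ci.nsIIdleService);
const IDLE_THRESHOLD_SECONDS = 60;

let idleObserver = {
  observe: function (subject, topic, data) {
    if (topic === "idle") {
      // User stopped using this device: a good time to push local changes.
      sync();
    } else if (topic === "active" || topic === "back") {
      // User came back: a good time to pull updates from the cloud.
      checkForUpdates();
    }
  }
};
idleService.addIdleObserver(idleObserver, IDLE_THRESHOLD_SECONDS);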

I will do a couple benchmarks on my implementation for my own sanity. Feel free to post additional measurements along with a description of the test case.
(In reply to Andreas Gal :gal from comment #0)
> We are discussing supporting the ability to plug various sync adaptors into
> FF. Instead of just supporting the Mozilla cloud, we could support other
> cloud vendors to plug their clouds into FF. A couple extensions already
> exist for this. All of them tend to interact pretty poorly with the data
> sources and do main thread IO and such because the underlying APIs are
> complex and not very obvious how to use right. This patch explores what a
> clean API for that could look like that sync adaptors can plug into. It
> might also make sense to implement the rebooted sync implementation in the
> future on top of something like this.

This is a nice clear API. It reminds me a little of a JS port of the Google sync API used to test syncing bookmarks and history to Dropbox and Google Drive. I think the code is here:
https://github.com/vladikoff/picl-third-party

Your API has the advantage of simplicity, which makes it easier to understand IMO. There might be some concepts in the JS port that could be borrowed.

>  * Since the browser history can be very extensive and changes frequently,
>  * a different API is used that first reports a complete JSON dump of the
>  * last N entries in the history when the browser starts, and from there
>  * on visits are reported incrementally.
>  *
>  * adaptor.onhistory = function (visits) { } // initial dump
>  * adaptor.onvisit = function (visits) { } // incremental updates

I like this separation (full and incremental), but since we need it for at least one data type, we might as well consider it for more data types too.

>  * When the history is cleared this is reported to the adaptor. The adaptor
>  * should try to respect this and clear the history in the cloud:
>  *
>  * adaptor.onclearhistory = function () { }

In Fennec, we had issues where users would clear history and not realize it would cascade to their desktop browser history too. We have kicked around the idea of "sync profiles": desktop and mobile being a simple example. I'm not saying we need this support right away, but we should consider the possibility at some point.

>  * When the browser starts it will call after some initial delay the a start
>  * event on the adaptor to kick off the initial download of new sync state.
>  * The event carries an API object that can be used by the adaptor to deliver
>  * updates to the browser.
>  */

I'm stating the obvious here, but let's make sure the adapter system could be run from non-JS platforms, like Android's background Sync system.
(In reply to Andreas Gal :gal from comment #4)
> I think its the right
> approach to communicate the state as one object graph to the sync plugin.
> The sync plugin can do with it whatever it wants, and I am sending JSON
> because the intent is to have the sync plugins in workers to avoid any
> processing latency.

Let me make sure I'm reading this right: the reason to do

  foo.onchange(json_dump)

rather than

  foo.onchange(storage_instance)

is because it's more efficient for workers to send all the data at once (even accounting for serialization), rather than to chat back and forth across a boundary for incremental changes?

(I would feel better about this if that JSON tree wasn't recomputed in memory every time. Perhaps you'd assumed that it wouldn't be?)

The latter, incidentally, is pretty much how Android works -- a SyncAdapter is notified with onPerformSync for a particular ContentProvider, and it's responsible for figuring out what changed (by asking the storage, typically via range queries) and doing whatever it needs to do with the network.

The Android scheduler is responsible for noticing changes to the data sources and deciding when to ask the SyncAdapter to do up/down/bidi.

I think that's the division of work you're proposing, and that part makes sense to me.


> Merging data is over-rated. Passwords change rarely, and when they change,
> they tend to change into the same direction. Keep in mind that the same
> users is behind all the devices. A user won't change the password for a site
> at the same time to different values on different devices. 

That's not the kind of merging I'm talking about; that's an intra-record conflict.

If you have a monolithic record containing all of your passwords, and you don't do even non-conflicting merging of the contents, then you're asserting that a user must successfully sync on both devices between every password operation on different devices. I don't think that's realistic, and it currently bites us with prefs sync.

I suspect we're simply talking with different terminology, so let me rephrase: I should be able to change a password on my desktop, and save a new password on my phone, have both then sync (in either order), and both new passwords should be on both devices. That seems worthwhile to me, given real-world constraints. What about you?


> > This seems like a decent idea, assuming a push-oriented sync system. (Right
> > now, Desktop is push-oriented (observer-based), while Android is
> > pull-oriented.)
> 
> We can use the same underlying data sources API for either approach. I
> looked at 2 non-Mozilla implementations. Neither uses real push.

Just to be clear: by "push" in this context, I mean "hey sync code, here's a change". By "pull" I mean "OK, data source, it's time to sync; what's changed since I last synced?".

Sync on desktop does the former -- it's driven by observers, and uses trackers to maintain changes in memory. On Android we use timestamps to retrieve changes, which fits much better with the system architecture.
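In code, the two shapes look roughly like this; the names and the query helper are illustrative only, not existing Sync code.

// Push: each change is handed to the sync code as it happens, and a tracker
// accumulates the changes in memory until the next sync.
historyObserver.onVisit = function (uri, visitId, time) {
  tracker.recordChange({ url: uri.spec, time: time });
};

// Pull: at sync time, ask the data source what changed since the last
// successful sync, typically with a timestamp-bounded query.
function pullVisitsSince(db, lastSyncTime) {
  return db.execute(
    "SELECT p.url, v.visit_date FROM moz_historyvisits v " +
    "JOIN moz_places p ON p.id = v.place_id " +
    "WHERE v.visit_date > :last ORDER BY v.visit_date",
    { last: lastSyncTime });
}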


> One of the implementations I looked at uses the
> transition activity -> idle and idle -> activity to sync. That might be a
> pretty good compromise. When the user stops using the device is probably a
> good time to update the other device. When the user starts using a device is
> probably a good time to check for updates.

We do some of this in Sync on desktop. It's a bit flaky because our idle service is dodgy, but it mostly works. On Android we delegate to when the system decides to sync, which is much smarter: it's based on idle, app launch, network transitions, data changes, etc.
I measured the latency of reading the bookmarks from sqlite. The first number is the latency in ms; the second is the length of the accumulated list of bookmarks. This is completely unoptimized, just whatever I see right now. Executive summary: reading 4000 bookmarks from disk and putting them into a tree format doesn't cause any latency the user can observe (5ms << 16ms).

********* bookmarks duration: 5 count: 16
********* bookmarks duration: 5 count: 31
********* bookmarks duration: 5 count: 46
********* bookmarks duration: 5 count: 61
********* bookmarks duration: 5 count: 76
********* bookmarks duration: 5 count: 91
********* bookmarks duration: 5 count: 106
********* bookmarks duration: 6 count: 121
********* bookmarks duration: 5 count: 136
********* bookmarks duration: 5 count: 151
********* bookmarks duration: 5 count: 166
********* bookmarks duration: 5 count: 181
********* bookmarks duration: 5 count: 196
********* bookmarks duration: 4 count: 211
********* bookmarks duration: 4 count: 226
********* bookmarks duration: 4 count: 241
********* bookmarks duration: 5 count: 256
********* bookmarks duration: 5 count: 271
********* bookmarks duration: 5 count: 286
********* bookmarks duration: 5 count: 301
********* bookmarks duration: 5 count: 316
********* bookmarks duration: 6 count: 331
********* bookmarks duration: 5 count: 346
********* bookmarks duration: 5 count: 361
********* bookmarks duration: 5 count: 376
********* bookmarks duration: 5 count: 391
********* bookmarks duration: 5 count: 406
********* bookmarks duration: 5 count: 421
********* bookmarks duration: 5 count: 436
********* bookmarks duration: 5 count: 451
********* bookmarks duration: 6 count: 466
********* bookmarks duration: 5 count: 481
********* bookmarks duration: 5 count: 496
********* bookmarks duration: 4 count: 511
********* bookmarks duration: 5 count: 526
********* bookmarks duration: 5 count: 541
********* bookmarks duration: 5 count: 556
********* bookmarks duration: 5 count: 571
********* bookmarks duration: 5 count: 586
********* bookmarks duration: 4 count: 601
********* bookmarks duration: 4 count: 616
********* bookmarks duration: 5 count: 631
********* bookmarks duration: 5 count: 646
********* bookmarks duration: 5 count: 661
********* bookmarks duration: 6 count: 676
********* bookmarks duration: 5 count: 691
********* bookmarks duration: 5 count: 706
********* bookmarks duration: 4 count: 721
********* bookmarks duration: 5 count: 736
********* bookmarks duration: 5 count: 751
********* bookmarks duration: 5 count: 766
********* bookmarks duration: 4 count: 781
********* bookmarks duration: 6 count: 796
********* bookmarks duration: 5 count: 811
********* bookmarks duration: 5 count: 826
********* bookmarks duration: 5 count: 841
********* bookmarks duration: 4 count: 856
********* bookmarks duration: 5 count: 871
********* bookmarks duration: 5 count: 886
********* bookmarks duration: 5 count: 901
********* bookmarks duration: 5 count: 916
********* bookmarks duration: 4 count: 931
********* bookmarks duration: 5 count: 946
********* bookmarks duration: 5 count: 961
********* bookmarks duration: 5 count: 976
********* bookmarks duration: 5 count: 991
********* bookmarks duration: 4 count: 1006
********* bookmarks duration: 6 count: 1021
********* bookmarks duration: 5 count: 1036
********* bookmarks duration: 5 count: 1051
********* bookmarks duration: 5 count: 1066
********* bookmarks duration: 5 count: 1081
********* bookmarks duration: 4 count: 1096
********* bookmarks duration: 5 count: 1111
********* bookmarks duration: 6 count: 1126
********* bookmarks duration: 5 count: 1141
********* bookmarks duration: 4 count: 1156
********* bookmarks duration: 5 count: 1171
********* bookmarks duration: 5 count: 1186
********* bookmarks duration: 5 count: 1201
********* bookmarks duration: 5 count: 1216
********* bookmarks duration: 5 count: 1231
********* bookmarks duration: 5 count: 1246
********* bookmarks duration: 5 count: 1261
********* bookmarks duration: 5 count: 1276
********* bookmarks duration: 5 count: 1291
********* bookmarks duration: 5 count: 1306
********* bookmarks duration: 5 count: 1321
********* bookmarks duration: 5 count: 1336
********* bookmarks duration: 4 count: 1351
********* bookmarks duration: 4 count: 1366
********* bookmarks duration: 4 count: 1381
********* bookmarks duration: 5 count: 1396
********* bookmarks duration: 5 count: 1411
********* bookmarks duration: 5 count: 1426
********* bookmarks duration: 5 count: 1441
********* bookmarks duration: 5 count: 1456
********* bookmarks duration: 5 count: 1471
********* bookmarks duration: 5 count: 1486
********* bookmarks duration: 5 count: 1501
********* bookmarks duration: 5 count: 1516
********* bookmarks duration: 5 count: 1531
********* bookmarks duration: 5 count: 1546
********* bookmarks duration: 5 count: 1561
********* bookmarks duration: 6 count: 1576
********* bookmarks duration: 5 count: 1591
********* bookmarks duration: 5 count: 1606
********* bookmarks duration: 5 count: 1621
********* bookmarks duration: 5 count: 1636
********* bookmarks duration: 6 count: 1651
********* bookmarks duration: 5 count: 1666
********* bookmarks duration: 5 count: 1681
********* bookmarks duration: 5 count: 1696
********* bookmarks duration: 5 count: 1711
********* bookmarks duration: 6 count: 1726
********* bookmarks duration: 5 count: 1741
********* bookmarks duration: 5 count: 1756
********* bookmarks duration: 5 count: 1771
********* bookmarks duration: 5 count: 1786
********* bookmarks duration: 6 count: 1801
********* bookmarks duration: 5 count: 1816
********* bookmarks duration: 5 count: 1831
********* bookmarks duration: 5 count: 1846
********* bookmarks duration: 5 count: 1861
********* bookmarks duration: 5 count: 1876
********* bookmarks duration: 6 count: 1891
********* bookmarks duration: 5 count: 1906
********* bookmarks duration: 4 count: 1921
********* bookmarks duration: 5 count: 1936
********* bookmarks duration: 5 count: 1951
********* bookmarks duration: 5 count: 1966
********* bookmarks duration: 4 count: 1981
********* bookmarks duration: 5 count: 1996
********* bookmarks duration: 5 count: 2011
********* bookmarks duration: 5 count: 2026
********* bookmarks duration: 5 count: 2041
********* bookmarks duration: 5 count: 2056
********* bookmarks duration: 5 count: 2071
********* bookmarks duration: 5 count: 2086
********* bookmarks duration: 4 count: 2101
********* bookmarks duration: 5 count: 2116
********* bookmarks duration: 4 count: 2131
********* bookmarks duration: 5 count: 2146
********* bookmarks duration: 5 count: 2161
********* bookmarks duration: 5 count: 2176
********* bookmarks duration: 4 count: 2191
********* bookmarks duration: 5 count: 2206
********* bookmarks duration: 5 count: 2221
********* bookmarks duration: 5 count: 2236
********* bookmarks duration: 5 count: 2251
********* bookmarks duration: 5 count: 2266
********* bookmarks duration: 5 count: 2281
********* bookmarks duration: 5 count: 2296
********* bookmarks duration: 5 count: 2311
********* bookmarks duration: 4 count: 2326
********* bookmarks duration: 5 count: 2341
********* bookmarks duration: 5 count: 2356
********* bookmarks duration: 5 count: 2371
********* bookmarks duration: 5 count: 2386
********* bookmarks duration: 5 count: 2401
********* bookmarks duration: 5 count: 2416
********* bookmarks duration: 5 count: 2431
********* bookmarks duration: 5 count: 2446
********* bookmarks duration: 4 count: 2461
********* bookmarks duration: 5 count: 2476
********* bookmarks duration: 5 count: 2491
********* bookmarks duration: 4 count: 2506
********* bookmarks duration: 5 count: 2521
********* bookmarks duration: 5 count: 2536
********* bookmarks duration: 5 count: 2551
********* bookmarks duration: 5 count: 2566
********* bookmarks duration: 5 count: 2581
********* bookmarks duration: 5 count: 2596
********* bookmarks duration: 5 count: 2611
********* bookmarks duration: 5 count: 2626
********* bookmarks duration: 5 count: 2641
********* bookmarks duration: 5 count: 2656
********* bookmarks duration: 5 count: 2671
********* bookmarks duration: 5 count: 2686
********* bookmarks duration: 5 count: 2701
********* bookmarks duration: 5 count: 2716
********* bookmarks duration: 5 count: 2731
********* bookmarks duration: 5 count: 2746
********* bookmarks duration: 7 count: 2761
********* bookmarks duration: 8 count: 2776
********* bookmarks duration: 5 count: 2791
********* bookmarks duration: 7 count: 2806
********* bookmarks duration: 6 count: 2821
********* bookmarks duration: 5 count: 2836
********* bookmarks duration: 5 count: 2851
********* bookmarks duration: 5 count: 2866
********* bookmarks duration: 6 count: 2881
********* bookmarks duration: 5 count: 2896
********* bookmarks duration: 5 count: 2911
********* bookmarks duration: 5 count: 2926
********* bookmarks duration: 5 count: 2941
********* bookmarks duration: 5 count: 2956
********* bookmarks duration: 5 count: 2971
********* bookmarks duration: 6 count: 2986
********* bookmarks duration: 5 count: 3001
********* bookmarks duration: 5 count: 3016
********* bookmarks duration: 5 count: 3031
********* bookmarks duration: 5 count: 3046
********* bookmarks duration: 5 count: 3061
********* bookmarks duration: 5 count: 3076
********* bookmarks duration: 5 count: 3091
********* bookmarks duration: 5 count: 3106
********* bookmarks duration: 5 count: 3121
********* bookmarks duration: 5 count: 3136
********* bookmarks duration: 5 count: 3151
********* bookmarks duration: 5 count: 3166
********* bookmarks duration: 6 count: 3181
********* bookmarks duration: 5 count: 3196
********* bookmarks duration: 5 count: 3211
********* bookmarks duration: 5 count: 3226
********* bookmarks duration: 5 count: 3241
********* bookmarks duration: 5 count: 3256
********* bookmarks duration: 5 count: 3271
********* bookmarks duration: 5 count: 3286
********* bookmarks duration: 4 count: 3301
********* bookmarks duration: 5 count: 3316
********* bookmarks duration: 5 count: 3331
********* bookmarks duration: 6 count: 3346
********* bookmarks duration: 5 count: 3361
********* bookmarks duration: 5 count: 3376
********* bookmarks duration: 5 count: 3391
********* bookmarks duration: 5 count: 3406
********* bookmarks duration: 5 count: 3421
********* bookmarks duration: 5 count: 3436
********* bookmarks duration: 5 count: 3451
********* bookmarks duration: 5 count: 3466
********* bookmarks duration: 5 count: 3481
********* bookmarks duration: 5 count: 3496
********* bookmarks duration: 5 count: 3511
********* bookmarks duration: 5 count: 3526
********* bookmarks duration: 5 count: 3541
********* bookmarks duration: 5 count: 3556
********* bookmarks duration: 4 count: 3571
********* bookmarks duration: 5 count: 3586
********* bookmarks duration: 6 count: 3601
********* bookmarks duration: 5 count: 3616
********* bookmarks duration: 5 count: 3631
********* bookmarks duration: 5 count: 3646
********* bookmarks duration: 6 count: 3661
********* bookmarks duration: 5 count: 3676
********* bookmarks duration: 4 count: 3691
********* bookmarks duration: 5 count: 3706
********* bookmarks duration: 5 count: 3721
********* bookmarks duration: 5 count: 3736
********* bookmarks duration: 4 count: 3751
********* bookmarks duration: 6 count: 3766
********* bookmarks duration: 5 count: 3781
********* bookmarks duration: 5 count: 3796
********* bookmarks duration: 5 count: 3811
********* bookmarks duration: 5 count: 3826
********* bookmarks duration: 5 count: 3841
********* bookmarks duration: 5 count: 3856
********* bookmarks duration: 6 count: 3871
********* bookmarks duration: 6 count: 3886
********* bookmarks duration: 5 count: 3901
********* bookmarks duration: 5 count: 3916
********* bookmarks duration: 5 count: 3931
********* bookmarks duration: 5 count: 3946
********* bookmarks duration: 6 count: 3961
********* bookmarks duration: 5 count: 3976
********* bookmarks duration: 6 count: 3991
********* bookmarks duration: 5 count: 4006
********* bookmarks duration: 5 count: 4021
********* bookmarks duration: 5 count: 4036
********* bookmarks duration: 4 count: 4047

I will run the same experiment for the initial history dump as well.
(In reply to Richard Newman [:rnewman] from comment #3)
> Oh, another thing just occurred to me.
> 
> We've seen plenty of bug reports from users who hit Sync's generous max
> record sizes due to large form history, large prefs, etc.

We should only look at whether the API here is good enough to serve as an abstraction between a sync protocol and our data sources. Even at the API level we can choose meaningful limits; capping history at 5000 entries, for example, is perfectly fine to enforce at the CloudSync API level. A user wanting more than 5000 items of history will be a one-in-a-million bug report. Worst case, we make it a pref.

> 
> Those reports indicate that Firefox's data stores are a wild west. Not only
> do we want to avoid monolithic blobs due to raw size, but also because if
> there's one thing worse than having to sync a huge record, it's having to
> sync it *every time something unrelated changed*.

Again, this is an API between FF and plugins, not a network protocol.
> I like this separation (full and incremental) but since we need it for at
> least one data type, we might as well consider it form more types too.

I was mulling over this, too. Everything but history changes so rarely that it didn't seem worthwhile at the browser<>plugin level. I change one password a month, tops. I can't convince myself we need to optimize for that.

> In Fennec, we had issues where users would clear history and not realize it
> would cascade to their desktop browser history too. We have kicked around
> the idea of "sync profiles": desktop and mobile being a simple example. I'm
> not saying we need this support right away, but we should consider the
> possibility at some point.

Yeah, I should drop the comment. I do think sync plugins want to have a notification when you clear history, but we probably shouldn't overload that with "clear cloud".

> 
> >  * When the browser starts it will call after some initial delay the a start
> >  * event on the adaptor to kick off the initial download of new sync state.
> >  * The event carries an API object that can be used by the adaptor to deliver
> >  * updates to the browser.
> >  */
> 
> I'm stating the obvious here, but let's make sure the adapter system could
> be run from non-JS platforms, like Android's background Sync system.

Yeah, good point, and no idea tbh. For Android the pure Java implementation seems like a really big win.

I am mulling over using a hidden webapp as the plugin implementation container. That could work with Android as well, firing up a web view of some sort. Maybe we want to stick to all Java though. Don't know.
The same for the history dump:

********* history duration: 0 count: 2
********* history duration: 4 count: 17
********* history duration: 3 count: 32
********* history duration: 4 count: 47
********* history duration: 3 count: 62
********* history duration: 3 count: 77
********* history duration: 3 count: 92
********* history duration: 3 count: 107
********* history duration: 4 count: 122
********* history duration: 3 count: 137
********* history duration: 4 count: 152
********* history duration: 4 count: 167
********* history duration: 3 count: 182
********* history duration: 3 count: 197
********* history duration: 3 count: 212
********* history duration: 3 count: 227
********* history duration: 3 count: 242
********* history duration: 3 count: 257
********* history duration: 3 count: 272
********* history duration: 4 count: 287
********* history duration: 3 count: 302
********* history duration: 4 count: 317
********* history duration: 3 count: 332
********* history duration: 4 count: 347
********* history duration: 3 count: 362
********* history duration: 3 count: 377
********* history duration: 3 count: 392
********* history duration: 3 count: 407
********* history duration: 4 count: 422
********* history duration: 3 count: 437
********* history duration: 3 count: 452
********* history duration: 3 count: 467
********* history duration: 4 count: 482
********* history duration: 3 count: 497
********* history duration: 3 count: 512
********* history duration: 3 count: 527
********* history duration: 3 count: 542
********* history duration: 3 count: 557
********* history duration: 3 count: 572
********* history duration: 3 count: 587
********* history duration: 4 count: 602
********* history duration: 3 count: 617
********* history duration: 3 count: 632
********* history duration: 3 count: 647
********* history duration: 3 count: 662
********* history duration: 3 count: 677
********* history duration: 3 count: 692
********* history duration: 3 count: 707
********* history duration: 3 count: 722
********* history duration: 4 count: 737
********* history duration: 3 count: 752
********* history duration: 3 count: 767
********* history duration: 3 count: 782
********* history duration: 4 count: 797
********* history duration: 3 count: 812
********* history duration: 3 count: 827
********* history duration: 3 count: 842
********* history duration: 3 count: 857
********* history duration: 3 count: 872
********* history duration: 3 count: 887
********* history duration: 3 count: 902
********* history duration: 3 count: 917
********* history duration: 3 count: 932
********* history duration: 3 count: 947
********* history duration: 3 count: 962
********* history duration: 4 count: 977
********* history duration: 4 count: 992
********* history duration: 2 count: 1001

This doesn't cause any troubling latency either.

Logins are read synchronously because the APIs around them are all synchronous. If push comes to shove I might have to read directly from sqlite.
(In reply to Andreas Gal :gal from comment #10)

> This doesn't cause any troubling latency either.

That's awesome. Does this include stringifying and re-parsing? (Last I checked we could only send strings to workers, and it was expensive enough that it altered some design decisions, but that was a year ago.)


> logins are sync because the APIs around them are all sync. If push comes to
> shove I might have to directly read from sqlite.

Bug 853549 is replacing that with a JSON-backed API, FWIW -- at least for desktop. Flush-and-snapshotting that should be easy enough.

(On Android we need concurrent access from different threads, both with and without Gecko running, and so we have Bug 946857.)
Some data points:

x is a JSON dump of my history, with limit = 1000.

js> x.length
113497
js> JSON.parse(x).length
1000
js> var start = Date.now(); JSON.parse(x); print(Date.now() - start)
22
js> var y = JSON.parse(x)
js> var start = Date.now(); JSON.stringify(y); print(Date.now() - start)
30

We don't have to worry about JSON.parse times. That would run in a worker. JSON.stringify on the main thread for a large history (>> 1000) is probably a bad idea. We have to chunk that (history is a list, easy).

Chunking the tree-ified bookmarks is tricky. We should probably stringify the bookmarks list and tree-ify on the worker.

I will add some ChromeWorker code to the API. We should definitely force the worker behavior and make that path zero resistance to avoid any sync code on the MT.
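A sketch of the chunked hand-off described above; the message shape, CHUNK_LIMIT value, and the dispatch helper are illustrative, not the exact protocol in the patch.

// Main thread: post the list in slices so no single postMessage has to
// flatten a huge object graph at once.
const CHUNK_LIMIT = 500;

function postInChunks(worker, name, list) {
  for (let i = 0; i < list.length; i += CHUNK_LIMIT) {
    worker.postMessage({
      name: name,
      chunk: list.slice(i, i + CHUNK_LIMIT),
      done: i + CHUNK_LIMIT >= list.length
    });
  }
}

// Worker: reassemble the chunks and only then hand the full list (or the
// tree-ified bookmarks) to the adaptors.
let buffers = {};
onmessage = function (event) {
  let msg = event.data;
  buffers[msg.name] = (buffers[msg.name] || []).concat(msg.chunk);
  if (msg.done) {
    let full = buffers[msg.name];
    delete buffers[msg.name];
    dispatchToAdaptors(msg.name, full); // hypothetical dispatch helper
  }
};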
(In reply to Andreas Gal :gal from comment #7)
> I measured the latency of reading the bookmarks from sqlite. The first
> number is the latency in ms, the 2nd is the length of the accumulated list
> of bookmarks. This is completely unoptimized and just whatever I see right
> now. Executive summary: reading 4000 bookmarks from disk and putting them
> into a tree format doesn't cause any latency the user can observe (5ms <<
> 16ms).

I have doubts that the ~5ms measurements achieved for SQLite queries are indicative of results on the typical client machine. My belief is influenced by Telemetry data showing a high variance between any kind of I/O operation (with a long tail often going up to hundreds of milliseconds for even the simplest of operations) and the simple fact that 5ms is faster than the seek time for most mechanical hard drives (and I believe much flash memory found on mobile devices).

I don't doubt you were able to achieve ~5ms SQLite read times. But I suspect this result was heavily influenced by having an SSD (or similar) and/or having the SQLite database in the page cache. I'd like to learn a little bit more about the methodology employed before accepting the 5ms read times from SQLite as typical.
(In reply to Gregory Szorc [:gps] (in southeast Asia until Dec 14) from comment #13)
> (In reply to Andreas Gal :gal from comment #7)
> > I measured the latency of reading the bookmarks from sqlite. The first
> > number is the latency in ms, the 2nd is the length of the accumulated list
> > of bookmarks. This is completely unoptimized and just whatever I see right
> > now. Executive summary: reading 4000 bookmarks from disk and putting them
> > into a tree format doesn't cause any latency the user can observe (5ms <<
> > 16ms).
> 
> I have doubts that the ~5ms measurements achieved for SQLite queries are
> indicative of results on the typical client machine. My belief is influenced
> by Telemetry data showing a high variance between any kind of I/O operation
> (with a long tail often going up to hundreds of milliseconds for even the
> simplest of operations) and the simple fact that 5ms is faster than the seek
> time for most mechanical hard drives (and I believe much flash memory found
> on mobile devices).
> 
> I don't doubt you were able to achieve ~5ms SQLite read times. But I suspect
> this result was heavily influenced by having an SSD (or similar) and/or
> having the SQLite database in the page cache. I'd like to learn a little bit
> more about the methodology employed before accepting the 5ms read times from
> SQLite as typical.

I didn't measure I/O read latency, nor is that of any interest. I measured the time it takes to pull data out of our I/O layer as the async callbacks signal that data is available. I don't care how slowly sqlite queries run as long as those reads don't block the main thread. The 5ms measurement above would be the same even if pigeons were providing the data via RFC 1149.

As for the methodology, I measured the duration of executing the handleResult callback while processing the sqlite data that's coming back. This assumes that sqlite is running on an I/O thread and doesn't block the main thread, which is the case if we use executeAsync, which we do. It's assumed that the message-passing and data-copying overhead is negligible, which tends to be true for such small amounts of data.
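For reference, the measurement loop looks roughly like this; accumulate() is a stand-in for building the in-memory list/tree, and the exact statement setup is omitted.

let total = 0;
stmt.executeAsync({
  handleResult: function (resultSet) {
    let start = Date.now();
    for (let row = resultSet.getNextRow(); row; row = resultSet.getNextRow()) {
      total++;
      accumulate(row); // build the in-memory list/tree from the row
    }
    // Only the time spent inside this callback is measured, not the I/O.
    dump("********* bookmarks duration: " + (Date.now() - start) +
         " count: " + total + "\n");
  },
  handleError: function (error) {
    Cu.reportError("AsyncStmt error (" + error.result + "): " + error.message);
  },
  handleCompletion: function (reason) {}
});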
Measure twice, cut once. I measured the patch sending data via postMessage, and it turns out we don't use JSON there since the text format isn't externally visible. The binary encoding we use is more than 10x faster, and sending a large 1MB object graph (15k bookmarks) is very fast (other costs, like syscalls, dominate). I will go back to the simplified ChromeWorker protocol that sends monolithic units. We can always break that up later if needed.
Attached patch patch (obsolete) — Splinter Review
Attachment #8347833 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #8350686 - Attachment is obsolete: true
This WIP offloads processing of the data onto a worker (sync-worker.js). I/O is initiated on the main thread but is async. Results are sent to the worker in chunks to avoid lengthy pauses due to flattening when the message is sent. Even with many consumers (sync adaptors) and very large databases (megabytes of history and bookmarks), pause times are < 5ms on my machine.
Attached patch WIP (obsolete) — Splinter Review
Latest WIP. There is now one worker that runs all the sync adaptors. Providers are spun up on demand after the adaptors are all loaded. Adaptors can be loaded over the network and are configured via a pref.
Attachment #8350689 - Attachment is obsolete: true
Attached patch WIP — Splinter Review
Attachment #8350946 - Attachment is obsolete: true
Comment on attachment 8355805 [details] [diff] [review]
WIP

Can you please take a look and let me know what you think of the current state? This is much lower priority than the Android work you are doing, so if you don't have time for it then just punt it. Thanks.
Attachment #8355805 - Flags: feedback?(rnewman)
(In reply to Andreas Gal :gal from comment #21)

> Can you please take a look and let me know what you think of the current
> state. This is much less priority than the Android work you are doing, so if
> you don't have time for it then just punt it. Thanks.

Predictably swamped indeed; hope to get to this early next week. (Fingers crossed.)
Comment on attachment 8355805 [details] [diff] [review]
WIP

Review of attachment 8355805 [details] [diff] [review]:
-----------------------------------------------------------------

Firstly, apologies for the delayed review. (But deadline hit, so it was worth it.)

Secondly, this is really interesting stuff.

You anticipated many of my observations/questions (epochs, for example). Lots of real-world edge cases and additions still to address, but the concept is really firming up.

I still have a reflexive objection to a single change => total reserialization => JSON => worker => adapter => figure out what changed => upload single modified record, but there are advantages in that pipeline (e.g., transport-level deltas), and if you can prove the perf, I can be persuaded. Some of this might be addressable by batching/buffering, some by using native representations (e.g., signons are now in-memory and serialized to JSON, so no sqlite pain), but beyond that who knows?

f? on Taras to see if he has any insights I've missed at this stage.

::: toolkit/components/cloudsync/CloudSyncWorker.js
@@ +19,5 @@
> +
> +// These are defined in nsINavBookmarksService.idl
> +const BOOKMARK = 1;
> +const FOLDER = 2;
> +const SEPARATOR = 3;

There are some others, too -- queries whose Places IDs need to be translated, for example.

@@ +49,5 @@
> +  list.forEach(function (entry) {
> +    let obj = map[entry.id];
> +    // Skip branches with a missing parent.
> +    if (!(entry.parent in map)) {
> +      return;

Skim comment: as far as I can tell, you're not enforcing a strict tree order for the listed items, so you can encounter a child before its parent, and so this clause will lose data.

To test: create a bookmark (id = N); create a folder (id = N + 1); move the bookmark into the folder.

@@ +135,5 @@
> +      return;
> +    }
> +    this.adaptors.forEach(function (adaptor) {
> +      if (eventName in adaptor && typeof adaptor[eventName] === "function") {
> +        adaptor[eventName](data);

I think it's worth considering whether these adaptors should be async. I contend that they should be, in which case you probably want to 'pause' an adaptor until it's ready for more data, buffering changes (only one, because you're doing total updates) until then.

::: toolkit/modules/CloudSync.jsm
@@ +18,5 @@
> + * logins and history (visits).
> + *
> + * Changes to tabs, bookmarks and logins are reported as a complete JSON
> + * dump of the new state. This is acceptable since these data sets tend to
> + * be small, and are updated infrequently.

Worth discussing batching/buffering/debouncing here.

There are plenty of add-ons that munge a bunch of bookmarks, passwords, or tabs at a time (and there are user behaviors like bookmarking a bunch of open tabs), so responding to individual Places events is likely to cause thrashing, potentially even causing adapters to race with themselves. (Places has batch notifications, but I don't think they're widely applicable enough to solve this issue.)

With your chosen approach it's probably enough to simply buffer future state changes until the adapter notifies that it's done handling the last one.

(This has to be async, of course -- e.g., have onbookmarkschanged return a promise.)
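Roughly, the buffering I have in mind (illustrative sketch, not part of the patch; names are hypothetical):

let pendingBookmarks = null;
let adaptorBusy = false;

function notifyBookmarksChanged(adaptor, snapshot) {
  if (adaptorBusy) {
    // Total updates: only the most recent snapshot matters, so keep just one.
    pendingBookmarks = snapshot;
    return;
  }
  adaptorBusy = true;
  Promise.resolve(adaptor.onbookmarkschanged(snapshot)).then(function () {
    adaptorBusy = false;
    if (pendingBookmarks) {
      let next = pendingBookmarks;
      pendingBookmarks = null;
      notifyBookmarksChanged(adaptor, next);
    }
  });
}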

@@ +58,5 @@
> +  Services.tm.mainThread.dispatch(closure, Ci.nsIThread.DISPATCH_NORMAL);
> +}
> +
> +function log(msg) {
> +  dump("-- CLOUDSYNC: " + msg + "\n");

Future: use Log.jsm. (I know you're still sketching.)

@@ +75,5 @@
> + *
> + * The caller can manually request an update to the receiver fromt he provider
> + * using:
> + *
> + * provider.update();

Because the browser can die at any time without waiting for operations to complete (fast shutdown or crash), an adapter needs a way to find out the current state when it's created, rather than waiting for the next change. `update` appears to be that thing.

But it also needs a way to recognize when the state on init is the same as the last one it successfully handled. That implies an incrementing version number as part of the payload, passed in to an onfoochanged call and retrievable with the current state.

I think your epoch concept can fill this need, but it would need to be persistent, and perhaps linked to storage generations (at least, you don't want storage to change before you can start tracking epochs).

Something to think about.
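For example (payload shape purely illustrative, and upload()/persistEpoch() are hypothetical helpers):

// Tag each payload with a persistent, monotonically increasing epoch so an
// adaptor can tell a replayed startup state from a genuinely new one.
adaptor.onbookmarkschanged = function (payload) {
  // payload = { epoch: <number>, bookmarks: [...] }
  if (payload.epoch <= lastHandledEpoch) {
    return; // already uploaded this state before the last shutdown/crash
  }
  return upload(payload.bookmarks).then(function () {
    lastHandledEpoch = payload.epoch;
    persistEpoch(lastHandledEpoch); // hypothetical persistence helper
  });
};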

@@ +80,5 @@
> + *
> + */
> +
> +let TabsProvider = {
> +  start: function TabsProvider_start(browser, receiver) {

Nit: our tools can extract method names from the field, so this is current style to avoid some redundancy:

  start: function (browser, receiver) {

@@ +116,5 @@
> +        let uri = browser.currentURI;
> +        let spec = uri.spec;
> +        // Don't enumerate any Firefox-internal URIs since other rendering
> +        // engines might consume these URIs.
> +        if (spec.substr(0, 6) === "about:")

spec.startsWith("about:")... or look at the URI's scheme directly. You also want to consider some of the other URLs we ignore:

  pref("services.sync.engine.tabs.filteredUrls", "^(about:.*|chrome://weave/.*|wyciwyg:.*|file:.*)$");

@@ +141,5 @@
> +    this.receiver_ = null;
> +    this.observer_ = null;
> +  },
> +
> +  update: function LoginsProvider_update() {

Future: you probably need to trigger a Master Password dialog here, or otherwise handle the unavailability of the data source.

@@ +221,5 @@
> +        handleError: function withBookmarksRoots_handleError(err) {
> +          Cu.reportError("AsyncStmt error (" + err.result + "): '" + err.message);
> +        },
> +        handleCompletion: function withBookmarksRoots_handleCompletion() {
> +          closure(roots);

handleCompletion can be called even if handleError is called. You should check the reason code passed to handleCompletion:

https://developer.mozilla.org/en-US/docs/MozIStorageStatementCallback#eatCookie.28.29
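Something along these lines (the reason constants live on mozIStorageStatementCallback); this slots into the statement callback quoted above:

handleCompletion: function withBookmarksRoots_handleCompletion(reason) {
  if (reason !== Ci.mozIStorageStatementCallback.REASON_FINISHED) {
    // REASON_CANCELED or REASON_ERROR: don't hand partial roots to the closure.
    return;
  }
  closure(roots);
}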
Attachment #8355805 - Flags: feedback?(taras.mozilla)
Attachment #8355805 - Flags: feedback?(rnewman)
Attachment #8355805 - Flags: feedback+
(In reply to Richard Newman [:rnewman] from comment #23)
> I still have a reflexive objection to a single change => total
> reserialization => JSON => worker => adapter => figure out what changed =>
> upload single modified record, but there are advantages in that pipeline
> (e.g., transport-level deltas), and if you can prove the perf, I can be
> persuaded. Some of this might be addressable by batching/buffering, some by
> using native representations (e.g., signons are now in-memory and serialized
> to JSON, so no sqlite pain), but beyond that who knows?
> 
> f? on Taras to see if he has any insights I've missed at this stage.


Experience with Session Restore indicates that this model doesn't scale too well when there's lots of data:
- single change => total collection caused plenty of jank, even when we added caching, and refactoring away from this took a three-man team nearly one year;
- serializing a big object to send it to a worker still causes quite some jank, regardless of whether we send it immediately or we go through JSON and chunkification – we are in the process of refactoring away from it.

What we are converging on instead:
1. Minimize collection time: events cause small recollections, batched to avoid recollecting the same stuff several times in a row (tested with Session Restore, works nicely);
2. Minimize communications with the worker: the data store lives in a worker, recollections cause coarse-grained diffs to be sent to the worker (we're experimenting with this in Session Restore, no final design yet);
3. Minimize bandwidth usage: use the worker to rework coarse-grained diffs into fine-grained diffs, compress them, send them to the network (we haven't worked on this at all).

Additionally, we already have a SessionWorker that has access to some of the data you need (in particular, the history). It might be interesting to use this worker for both tasks.
Comment on attachment 8355805 [details] [diff] [review]
WIP

I don't have the bandwidth for a detailed review. However, in general, dumping stuff as one big JSON document has horrible perf implications (memory, jank, server load, etc.). This pattern should not happen.

As long as we do one JSON object per record, that's fine. E.g., line-separated, size-limited JSON records or even plain CSV-style records are fine (see the sketch below). Otherwise you end up in the perf hell that is our session store. Yoric might be able to offer more perf insight into this design, as he has been fighting this kind of stuff for a long time.

Note that another option here is to read from the Places db via SQL in a worker and not do anything on the main thread, to avoid horrible perf corner cases.
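On the line-separated records point, a sketch of that shape (MAX_RECORD_SIZE and the stream object are illustrative):

// One JSON object per line instead of one monolithic document.
function emitRecords(stream, records) {
  for (let record of records) {
    let line = JSON.stringify(record);
    if (line.length > MAX_RECORD_SIZE) {
      continue; // size-limited: skip (or split) oversized records
    }
    stream.write(line + "\n");
  }
}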
Attachment #8355805 - Flags: feedback?(taras.mozilla) → feedback?(dteller)
> Note another option here is to read from the places db sql in a worker and not do anything on the main thread to avoid horrible perf corner cases.

Ah, good point. We are actually working on mozStorage for chrome workers. This is currently very low priority as we do not have a use case, but we can bump this up.
Comment on attachment 8355805 [details] [diff] [review]
WIP

Review of attachment 8355805 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser.js
@@ +1097,5 @@
>          Cu.reportError(ex);
>        }
>      }, 10000);
>  
> +    CloudSync.init(gBrowser);

Nit: _delayedStartup looks like a weird place to initialize cloud synchronization.

::: toolkit/components/cloudsync/CloudSyncWorker.js
@@ +10,5 @@
> +let crypto = require("resource://gre/modules/workers/crypto.js");
> +
> +function request(fn) {
> +  postMessage({ fn: fn, args: Array.prototype.slice.call(arguments, 1) });
> +}

Please use |...args| instead of |arguments|.
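For instance, applied to the helper above (same behavior, no |arguments| object):

function request(fn, ...args) {
  postMessage({ fn: fn, args: args });
}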

Also, if adaptors are synchronous, we might end up with this worker frozen for unpredictable amounts of time. We need to make sure that this cannot cause us to re-collect the same piece of data several times in a row.

@@ +103,5 @@
> +    try {
> +      // Import the adaptor script. We expect it to export a single global
> +      // variable 'ADAPTOR'.
> +      GLOBAL.ADAPTOR = null;
> +      importScripts(config.url);

Ugly and fragile. Please use require(); this will save us plenty of headaches once add-on authors start writing their adaptors.

@@ +118,5 @@
> +      adaptor.onstart(config);
> +
> +      this.adaptors.push(adaptor);
> +    } catch (e) {
> +      log("unable to start adaptor " + config.url + " (" + e.toString() + ")");

e.moduleStack would be useful, too.

::: toolkit/components/cloudsync/moz.build
@@ +1,5 @@
> +# -*- Mode: python; c-basic-offset: 4; indent-tabs-mode: nil; tab-width: 40 -*-
> +# vim: set filetype=python:
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.

That file looks surprisingly empty.

::: toolkit/modules/CloudSync.jsm
@@ +13,5 @@
> +
> +/*
> + * Cloud Sync API
> + *
> + * This API allows sync adaptors to listen to changes to open tabs, bookmarks,

Side-note: Is there a way to call this something other than "sync"?

@@ +79,5 @@
> + * provider.update();
> + *
> + */
> +
> +let TabsProvider = {

This provider seems to implement a (largely broken) subset of session restore. I'm not sure what the objective is, but it doesn't look very useful as such.

Also, an update to a tab is often followed by a second update. I'd buffer and regroup updates using e.g. DeferredTask.
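Sketch of what that could look like; the 250 ms delay is arbitrary and this assumes the arm()-style DeferredTask API:

Cu.import("resource://gre/modules/DeferredTask.jsm");

let tabsUpdateTask = new DeferredTask(function () {
  // Collect once per burst of tab events instead of once per event.
  TabsProvider.update();
}, 250);

// Every tab notification just arms the task; rapid-fire events coalesce.
function onTabNotification() {
  tabsUpdateTask.arm();
}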

@@ +203,5 @@
> +    this.receiver_ = null;
> +    this.observer_ = null;
> +  },
> +
> +  update: function BookmarksProvider_update() {

I don't see what we gain by collecting data using a SQL statement instead of just using onItem{Added, Removed, Changed, Moved} to immediately generate diffs. On the other hand, I see a possible source of jank – if the 4-7 millisecond cost you measure in comment 7 is correct, it can contribute to jank if the CPU is otherwise loaded or if Gecko is otherwise busy doing stuff between frames.

There would be a gain if we did this entirely from the ChromeWorker (see bug 853438), though.

@@ +207,5 @@
> +  update: function BookmarksProvider_update() {
> +    function withBookmarksRoots(closure) {
> +      let db = PlacesUtils.history.QueryInterface(Ci.nsPIPlacesDatabase).DBConnection;
> +      let stmt = db.createAsyncStatement("SELECT root_name, folder_id FROM moz_bookmarks_roots " +
> +                                         "ORDER BY folder_id");

Note: On the main thread, Sqlite.jsm generally makes for more readable, less error-prone code.

@@ +266,5 @@
> +};
> +
> +const HISTORY_LIMIT = 5000;
> +
> +let HistoryProvider = {

This should move to a ChromeWorker, too.

Also, here, too, I would buffer the changes using e.g. DeferredTask.

@@ +271,5 @@
> +  start: function HistoryProvider_start(browser, receiver) {
> +    this.receiver_ = receiver;
> +    this.observer_ = {
> +      onVisit: function(uri, visitID, time, sessionID, referreringID, transitionType, added) {
> +        if (transitionType < Ci.nsINavHistoryService.TRANSITION_EMBED) {

Ignoring frames?

@@ +307,5 @@
> +    if (visitId) {
> +      stmt = this.visitQuery_;
> +      if (!stmt) {
> +        let db = PlacesUtils.history.QueryInterface(Ci.nsPIPlacesDatabase).DBConnection;
> +        this.visitQuery_ = db.createAsyncStatement("SELECT p.url, p.frecency, v.visit_date " +

Ok, because of frecency, this probably cannot be handled directly by a history observer, which is a shame.

@@ +420,5 @@
> +  }
> +};
> +
> +const CHUNK_LIMIT = 500;
> +const PREF_NAME = "browser.cloudsync.config";

Nit: These names are near useless.

@@ +434,5 @@
> +  },
> +
> +  // To avoid blocking the main thread by having to JSON stringify large object
> +  // graphs for bookmarks and history we post them in chunks. The worker will
> +  // reassemble the data.

I have toyed with this kind of thing and was left unsatisfied with the result. I believe that we should rather invest in efficient postMessage for messages containing huge strings or, even better, in efficient async postMessage for objects that we guarantee we are not going to mutate during the execution of the postMessage.

Note that I have somewhere JS code that performs JSON stringification asynchronously. I can dust it up if you want.

@@ +506,5 @@
> +  restart: function CS_restart() {
> +    this.stopWorker();
> +
> +    try {
> +      let config = Services.prefs.getComplexValue(PREF_NAME, Ci.nsIPrefLocalizedString).data;

Localized string? What's the idea?

@@ +507,5 @@
> +    this.stopWorker();
> +
> +    try {
> +      let config = Services.prefs.getComplexValue(PREF_NAME, Ci.nsIPrefLocalizedString).data;
> +      let adaptors = JSON.parse(config);

My oh-gosh-we're-going-to-bloat-the-pref-system sense is tingling. I'd like this to be stored somewhere else; we have enough trouble with the size of prefs as it is.

@@ +552,5 @@
> +      case "updateAll":
> +        Providers.updateAll();
> +        break;
> +      }
> +    };

Nit: A facility to report errors would be nice.

@@ +571,5 @@
> +    worker.onmessage = null;
> +
> +    worker.terminate();
> +  }
> +};

If the use case involves "I'm turning off my laptop and I want data to synchronize immediately on my phone", we need to plug this into AsyncShutdown.
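Roughly, with flushPendingSync() being a hypothetical promise-returning helper:

Cu.import("resource://gre/modules/AsyncShutdown.jsm");

AsyncShutdown.profileBeforeChange.addBlocker(
  "CloudSync: flush pending updates to adaptors",
  function () {
    // Shutdown waits until this promise resolves.
    return flushPendingSync();
  }
);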
Attachment #8355805 - Flags: feedback?(dteller) → feedback+
It looks like I'll be adopting this patch.
As a general matter of principle, it's always OK to steal any patch from me, even without asking. Thanks Alan!
Assignee: gal → akligman
It looks like this needs a manifest as well. The other manifests have GUIDs; are those generated by some tool? I'm looking through docs right now, but I welcome any advice that might save some time.
on irc "firebot uuid" in any channel with firebot (or message it directly).
I don't see resource://gre/modules/workers/crypto.js in my source tree; where does this module live?
Attached patch crypto — Splinter Review
This is my hack for this.
> Adaptors can be loaded over the network and are configured via a pref.

Is the intent here that adapters should be loaded from remote URLs rather than from installed addons? I don't understand why that's desirable.
It's highly desirable for testing and development. It might also allow us to host the adapter code and update it independently as the server side changes, and it might make it easier to stay in step with third-party servers as they are updated. Not a strict requirement.
(In reply to Andreas Gal :gal from comment #33)
> Created attachment 8376672 [details] [diff] [review]
> crypto
> 
> This is my hack for this.

As this would be useful not just for CloudSync, let's move this to bug 974642.
> My oh-gosh-we-re-going-to-bloat-the-pref-system sense is tingling. I'd like this to be stored somewhere else, we have enough trouble with the size of prefs as it is.

This whole thing should probably be an XPCOM service, no? So instead of registering via pref, get service and register?
New patches and discussion continued in bug 993584.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE