Closed
Bug 434816
Opened 17 years ago
Closed 17 years ago
Make sync service prioritize between engines to sync
Categories
(Cloud Services :: General, defect, P1)
Cloud Services
General
Tracking
(Not tracked)
RESOLVED
FIXED
0.2
People
(Reporter: hello, Assigned: myk)
References
Details
Attachments
(2 files, 2 obsolete files)
1.14 KB,
patch
|
hello
:
review+
|
Details | Diff | Splinter Review |
6.19 KB,
patch
|
hello
:
review+
|
Details | Diff | Splinter Review |
The service should prioritize which engines to sync. It can ask the engines how much a sync is needed for them (see bug 434812) and decide accordingly.
For example, we could establish a cutoff (e.g., '25' - see bug 434812), below which we won't sync an engine even if it has data.
Reporter | ||
Updated•17 years ago
|
Priority: -- → P1
Target Milestone: -- → 0.2
Comment 1•17 years ago
|
||
This patch changes the scheduler timer to 2 minutes, and on every cycle, the sync is performed only if the score returned by the engines tracker is >= 30. After a sync, we call resetScore.
TODO: We may need to add more fine-grained analysis of the score in the future, and also handle cases when the score is set to -1. This would need a way for us to differentiate between a sync triggered by the user and the scheduler.
Reporter | ||
Comment 2•17 years ago
|
||
Comment on attachment 322275 [details] [diff] [review]
Initial attempt at heuristic-sync
Wow, this was easier than I thought!
r+, pending we investigate what happens if sync takes longer than 2 mins. I want to make sure we won't attempt to start another sync while one is already going.
Attachment #322275 -
Flags: review?(thunder) → review+
Comment 3•17 years ago
|
||
Good catch! I think the simplest solution would be to use an object property in the engine as a sort of lock - and check it it's unset before attempting to sync. If a sync is in progress, we simply ignore the request, thereby waiting for the next 2 mins.
Updated patch attached.
Attachment #322275 -
Attachment is obsolete: true
Attachment #322281 -
Flags: review?(thunder)
Reporter | ||
Comment 4•17 years ago
|
||
I think the first patch might be best, since the service's sync method has a lock wrapper.
I would just like to make sure it's ok by testing an artificially long sync and checking there isn't a race.
Reporter | ||
Comment 5•17 years ago
|
||
Comment on attachment 322281 [details] [diff] [review]
Heuristics with checks against simultaneous syncs.
Yeah, we definitely want the first patch. The lock wrapper in the service's sync method (this._lock(...) will protect us from simultaneous syncs.
Attachment #322281 -
Flags: review?(thunder) → review-
Reporter | ||
Comment 6•17 years ago
|
||
form bug 434817,
the service should decrease the required score to sync over time, to ensure services with small changes do eventually get to sync them.
Assignee | ||
Comment 7•17 years ago
|
||
Comment on attachment 322275 [details] [diff] [review]
Initial attempt at heuristic-sync
Anant has checked in this patch, and I'm going to attach a followup that makes it use the approach described in comment 6.
Attachment #322275 -
Attachment is obsolete: false
Assignee | ||
Updated•17 years ago
|
Attachment #322281 -
Attachment is obsolete: true
Assignee | ||
Comment 8•17 years ago
|
||
This patch implements the approach described in comment 6. The code looks right, but I haven't tested it yet, as sm-labs01 is down again.
Assignee: anarayanan → myk
Attachment #322275 -
Attachment is obsolete: true
Attachment #324560 -
Flags: review?(thunder)
Assignee | ||
Updated•17 years ago
|
Attachment #322275 -
Attachment is obsolete: false
Assignee | ||
Comment 9•17 years ago
|
||
Which just goes to show the value of testing, as there were two bugs in the version posted earlier:
1. the interval between scheduled syncs was set to one hour rather than one minute;
2. user-initiated syncs used the same conditional-sync algorithm rather than unconditionally syncing.
This version corrects both problems and has been tested and confirmed to work.
Note that the algorithm supports a variety of variations. Most obviously, we can fiddle with the three constants to make syncing faster, either by reducing the total time it takes for the threshold to go from 100 to 1 or by increasing the frequency of scheduled syncs (f.e. we could make total time 10 minutes instead of 20, or we could leave it at 20 minutes but schedule syncs every 15 seconds instead of every minute, or both).
Here are two less obvious variations:
1. Once a threshold reaches 1, we could leave it there until something changes rather than resetting the threshold to its initial value. This would make rare changes get synced more frequently (which could be a good thing), although it also creates a disparity in sync speeds between regular and irregular trivial changes.
2. We could unconditionally sync once the threshold reaches 0. That would catch any changes that the engine doesn't see (which could be a good thing), but it might also mask bugs in the engine that we'd otherwise catch (which could be a bad thing).
Of these variations, I'm favorably inclined towards increasing the frequency of scheduled syncs while keeping the 20 minute total time stable. I'm also inclined to consider leaving the threshold at 1 once it reaches that value until something changes. I'm not sure about periodic unconditional syncs, though. That seems like it could be trouble (although it might make the user experience better at this early, buggy stage, which is significant).
Attachment #324560 -
Attachment is obsolete: true
Attachment #324588 -
Flags: review?(thunder)
Attachment #324560 -
Flags: review?(thunder)
Assignee | ||
Comment 10•17 years ago
|
||
Note: when I say "while keeping the 20 minute total time stable", I should clarify that I picked "20 minutes" as the cycle time off the top of my head, and it's perfectly reasonable to decide that we want a different time. Basically, we should figure out the maximum amount of time we're willing to let the server copy be stale for the most trivial change. That's the total cycle time.
Reporter | ||
Comment 11•17 years ago
|
||
Comment on attachment 324588 [details] [diff] [review]
followup patch v2: working version
This patch is hawt.
Attachment #324588 -
Flags: review?(thunder) → review+
Reporter | ||
Comment 12•17 years ago
|
||
I think we should leave the threshold at 1 once it reaches that level. My rationale is that right now the behavior for trivial changes is essentially random: so long as no sync occurs, every minute we unconditionally decrease the threshold and then reset it, so the probabilities are the same for any multiple of 5 (right? I didn't do so well in probability ;)
Thus, making it 'stick' at 1 is not going to distort stable, predictable behavior. Changing it will actually make it more predictable (though I suspect most users won't notice anyway).
About syncing unconditionally (which would make the above a moot point), I think we shouldn't for now and see how it goes. I think our trackers will work well, and they are simple enough that fixes should be easy to work out.
Note that there are other algorithms we could try, if this one doesn't work out for some reason, or just to experiment. In particular, we could try a "back off" rather than a "creep up" approach. That is, we sync as fast as possible, but delay trivial changes more and more the faster they come.
Anyway, looks great!
Assignee | ||
Comment 13•17 years ago
|
||
changeset: 510:9dbc5af4f284
date: Wed Jun 11 10:38:25 2008 -0700
summary: bug 434816: use a decreasing threshold algorithm for the periodic scheduled sync to make sure we eventually sync even small changes to data; r=thunder
I'll check in a followup that makes the threshold hover at 1 until it syncs something.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 14•17 years ago
|
||
(In reply to comment #12)
> I think we should leave the threshold at 1 once it reaches that level. My
> rationale is that right now the behavior for trivial changes is essentially
> random: so long as no sync occurs, every minute we unconditionally decrease
> the threshold and then reset it, so the probabilities are the same for any
> multiple of 5 (right? I didn't do so well in probability ;)
Hrm, I'm not super-smart at probability either, but I think in general you are correct: other things being equal, right now any trivial change (which I'll define for the purposes of this exercise as a change that adds one point to the score) has an equal chance of appearing at any point in the cycle, and thus the time it takes to sync it is evenly dispersed across a 20 minute spread.
> Thus, making it 'stick' at 1 is not going to distort stable, predictable
> behavior. Changing it will actually make it more predictable (though I
> suspect most users won't notice anyway).
Yeah, it certainly does mean that infrequent changes will sync more quickly (and this behavior will be consistent), which could be a good thing.
On the other hand, an engine with frequent trivial changes will sync those changes to the server on a very different schedule from one with infrequent trivial changes, which could confuse users into thinking that the frequent one is buggy.
For example, if engine A records a trivial change every minute, while engine B records a trivial change every half hour, then sticking at 1 means that engine B's change will always get synced within a minute of having been made, while engine A's first change will take 17 minutes to get synced (at which point its threshold would be 15 and its score 17).
But this is all theoretical, and either approach seems good enough, so I've changed it to stick at 1. We can always change it back later if needed.
> Note that there are other algorithms we could try, if this one doesn't work
> out for some reason, or just to experiment. In particular, we could try a
> "back off" rather than a "creep up" approach. That is, we sync as fast as
> possible, but delay trivial changes more and more the faster they come.
Ah, that's a good idea! Let's keep it in mind.
Updated•16 years ago
|
Component: Weave → General
Product: Mozilla Labs → Weave
Updated•16 years ago
|
QA Contact: weave → general
You need to log in
before you can comment on or make changes to this bug.
Description
•