Closed Bug 659107 Opened 9 years ago Closed 9 years ago

SyncEngine._processIncoming: only report failure for newly failed items

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
mozilla7

People

(Reporter: philikon, Assigned: emtwo)

References

Details

(Whiteboard: [verified in services])

Attachments

(1 file, 7 obsolete files)

No description provided.
Attached patch WIP (obsolete) — Splinter Review
Comment on attachment 534796 [details] [diff] [review]
WIP

>   set toFetch(val) {
>     this._toFetch = val;
>+    // Coerce the array to a string for more efficient comparison.
>+    if (val + "" == this._toFetch) {
>+      return;
>+    }

philikon: what is this doing? this._toFetch gets set to val then we compare the value of val & this._toFetch as strings. Wouldn't it always evaluate to true?
Comment on attachment 534796 [details] [diff] [review]
WIP

     this._toFetch = val;
+    // Coerce the array to a string for more efficient comparison.
+    if (val + "" == this._toFetch) {
+      return;
+    }

This doesn't make too much sense to me... you assign, then do an equality check, which will always return. Don't you want to switch the assignment to after the returning clause? Or am I misunderstanding something here?

(Sidenote: humph, this check also doesn't work if the array is in a different order. Really wish we had a set datatype!)

Another thought: add the count of previous failed to the reported count map.

Otherwise, looks like a decent start!
(In reply to comment #2)
> >   set toFetch(val) {
> >     this._toFetch = val;
> >+    // Coerce the array to a string for more efficient comparison.
> >+    if (val + "" == this._toFetch) {
> >+      return;
> >+    }
> 
> philikon: what is this doing? this._toFetch gets set to val then we compare
> the value of val & this._toFetch as strings. Wouldn't it always evaluate to
> true?

Yeah. Duh. We want this check before we set this._toFetch to val. Thanks for catching that!

(In reply to comment #3)
>      this._toFetch = val;
> +    // Coerce the array to a string for more efficient comparison.
> +    if (val + "" == this._toFetch) {
> +      return;
> +    }
> 
> This doesn't make too much sense to me... you assign, then do an equality
> check, which will always return. Don't you want to switch the assignment to
> after the returning clause? Or am I misunderstanding something here?

Nope, you're quite correct. Marina and I were pairing on this yesterday and she's now catching up to my mistakes! Good job! :)

> (Sidenote: humph, this check also doesn't work if the array is in a
> different order. Really wish we had a set datatype!)

Yeah, I know, but at least it will catch the by far most common case (an empty array) and the second next common one (one failing items) quite nicely.

> Another thought: add the count of previous failed to the reported count map.

Good idea.
Comment on attachment 534796 [details] [diff] [review]
WIP

philikon: another question:
>+    this.previousFailed = [];
why is this being set?

>+    this.previousFailed = Utils.arrayUnion(failed, this.previousFailed);

>+    count.newFailedGUIDs = Utils.arraySub(this.previousFailed, failedInPreviousSync);

My understanding from these lines is that this.previousFailed keeps track of *all* previously failed items including the newly added items from the current sync, whereas failedInPreviousSync is a copy of this.previousFailed *before* new failed items from the current sync are added. So why do we set this.previouslyFailed = [] ?  What am I missing?
(In reply to comment #5)
> philikon: another question:
> >+    this.previousFailed = [];
> why is this being set?

To reset the list at the beginning of a sync. 

> >+    this.previousFailed = Utils.arrayUnion(failed, this.previousFailed);
> 
> >+    count.newFailedGUIDs = Utils.arraySub(this.previousFailed, failedInPreviousSync);
> 
> My understanding from these lines is that this.previousFailed keeps track of
> *all* previously failed items including the newly added items from the
> current sync, whereas failedInPreviousSync is a copy of this.previousFailed
> *before* new failed items from the current sync are added. So why do we set
> this.previouslyFailed = [] ?  What am I missing?

Because an item might no longer fail to apply this time around. In that case we don't want to keep it around in that list, right? So this.previousFailed will keep track of items that only fail in the current on-going sync, which means we should reset it at the beginning.
(In reply to comment #6)
> (In reply to comment #5)
> > philikon: another question:
> > >+    this.previousFailed = [];
> > why is this being set?
> 
> To reset the list at the beginning of a sync. 
> 
> > >+    this.previousFailed = Utils.arrayUnion(failed, this.previousFailed);
> > 
> > >+    count.newFailedGUIDs = Utils.arraySub(this.previousFailed, failedInPreviousSync);
> > 
> > My understanding from these lines is that this.previousFailed keeps track of
> > *all* previously failed items including the newly added items from the
> > current sync, whereas failedInPreviousSync is a copy of this.previousFailed
> > *before* new failed items from the current sync are added. So why do we set
> > this.previouslyFailed = [] ?  What am I missing?
> 
> Because an item might no longer fail to apply this time around. In that case
> we don't want to keep it around in that list, right? So this.previousFailed
> will keep track of items that only fail in the current on-going sync, which
> means we should reset it at the beginning.

ah good point, thanks
Attached patch WIP (obsolete) — Splinter Review
* fixed problem from comment 2
* added a test
* added previous failed to count logging as in comment 3 as well
(In reply to comment #8)

> * added a test

I seem to recall reviewing a patch from Philipp that altered the syntax for this kind of thing:

+    syncTesting = new SyncTestingInfrastructure(makeSteamEngine);

Philipp, am I going crazy?


Marina: while you're working on tests, two I would appreciate as I rebase some complicated code on top of this are:

* One which verifies that previously failed items are refetched in the next sync (see the hmacErrorCount machinery in test_corrupt_keys.js for an idea of how you might do this), and

* One which verifies that a failed item is only reported once.

Thanks!
(In reply to comment #10)
> I seem to recall reviewing a patch from Philipp that altered the syntax for
> this kind of thing:
> 
> +    syncTesting = new SyncTestingInfrastructure(makeSteamEngine);
> 
> Philipp, am I going crazy?

You're not. I missed test_syncengine.js and a few cases in test_syncengine_sync.js.

> * One which verifies that previously failed items are refetched in the next
> sync (see the hmacErrorCount machinery in test_corrupt_keys.js for an idea
> of how you might do this), and

test_syncengine_sync.js has a test for this already (test_processIncoming_failed_records), or is that not enough for you?

> * One which verifies that a failed item is only reported once.

Yes, we definitely need a new test for that.
(In reply to comment #11)

> test_syncengine_sync.js has a test for this already
> (test_processIncoming_failed_records), or is that not enough for you?

Heh, I should have remembered that, given that I have the darn file open in Vim right now :)
Attached patch WIP (obsolete) — Splinter Review
checking in previousFailed instead of toFetch array for failed items
Attachment #534796 - Attachment is obsolete: true
Attachment #534903 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
* added another test

rnewman: is this what you had in mind for failed item only reported once?

philikon: rnewman: any ideas for other tests?
Attachment #534943 - Attachment is obsolete: true
Comment on attachment 535221 [details] [diff] [review]
WIP

Review of attachment 535221 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #535221 - Flags: review?(philipp)
(In reply to comment #14)
> Created attachment 535221 [details] [diff] [review] [review]
> WIP
> 
> * added another test
> 
> rnewman: is this what you had in mind for failed item only reported once?

Comment is wrong:

+    // toFetch is written asynchronously


Not a waste of time, but this actually isn't what I was intending. I meant something like this:

  // Pseudocode.
  let called = 0;
  let counts;
  
  // Put failing items on the server. Sync once so that our lastSync
  // excludes the earlier items in the next try.
  // TODO: also instrument this part; we shouldn't duplicate fetch
  // here, either.
  engine._processIncoming();
 
  // previousFailed is now populated. Sync, and verify that
  // the counts match up.
  Svc.Obs.add("weave:engine:sync:apply-failed", function(c) {
    _("Called with " + JSON.stringify(c));
    counts = c;
    ++called;
  });

  engine._processIncoming();

  do_check_eq(called, 1);
  do_check_eq(called.failed, XXX);  // Equals number of failed records

Make sense?
Comment on attachment 535221 [details] [diff] [review]
WIP

Looks great!

>+  loadPreviousFailed: function loadPreviousFailed() {
>+    // Initialize to empty if there's no file
>+    this._previousFailed = [];
>+    Utils.jsonLoad("toFetch/" + this.name, this, function(previousFailed) {

You want to replace "toFetch/" with "failed/" here. I don't see how test_previousFailed would've ever passed for you this way. Oh wait, I see it now, because you didn't add it to run_test()! See below.

>+    // Reset previousFailed for each sync since previously failed items may not fail again.
>+    this.previousFailed = [];

Yay comments!

>     // Apply remaining items.
>     doApplyBatchAndPersistFailed.call(this);
>-
>-    if (count.failed) {
>+  
>+    count.newFailedGUIDs = Utils.arraySub(this.previousFailed, failedInPreviousSync);
>+    if (count.newFailedGUIDs.length) {

Nit: You're adding whitespace here, please remove it :)

>     this._log.info(["Records:",
>                     count.applied, "applied,",
>                     count.failed, "failed to apply,",
>+                    count.newFailedGUIDs, "newly failed to apply,",
>                     count.reconciled, "reconciled."].join(" "));

I think you want to sai cont.newFailedGUIDs.length here.

>   _resetClient: function SyncEngine__resetClient() {
>     this.resetLastSync();
>+    this.previousFailed = [];
>     this.toFetch = [];
>   },

Please test this change in test_syncengine.js (test_resetClient).

>+function test_previousFailed() {

You forgot to call this function to run_test() which is probably why you never ran into the problems mentioned above because it never ran ;)

>+  _("SyncEngine.previousFailed corresponds to file on disk");
>+  let syncTesting = new SyncTestingInfrastructure();
>+  const filename = "weave/previousFailed/steam.json";

I think for the directory name we chose "failed/" not "previousFailed/"

>+  let engine = makeSteamEngine();
>+  try {
>+    // Ensure pristine environment
>+    do_check_eq(engine.previousFailed.length, 0);
>+
>+    // Write file to disk
>+    let previousFailed = [Utils.makeGUID(), Utils.makeGUID(), Utils.makeGUID()];
>+    engine.previousFailed = previousFailed;
>+    do_check_eq(engine.previousFailed, previousFailed);
>+    // toFetch is written asynchronously

s/toFetch/previousFailed/

>+function test_processIncoming_failed_items_reported_once() {
>+  _("Ensure that failed records are reported only once");
>+  let syncTesting = new SyncTestingInfrastructure();
>+  Svc.Prefs.set("clusterURL", "http://localhost:8080/");
>+  Svc.Prefs.set("username", "foo");

We should set the client.type preference to "mobile" here so that we also test the mobile batching. You then also want to set engine.mobileGUIDFetchBatchSize to a number smaller than NUMBER_OF_RECORDS, e.g. you can keep it the same as APPLY_BATCH_SIZE.

>+  const APPLY_BATCH_SIZE = 5;
>+  const NUMBER_OF_RECORDS = 5;

It would be good to test this with multiple batches, so perhaps bump NUMBER_OF_RECORDS to, say, 12?

Also, we should put some initial failed items into engines.previousFailed to verify that the list actually gets reset. You might have to adapt your custom applyIncomingBatch() method for that.

>+  // Engine that fails the first 2 records.
>+  let engine = makeSteamEngine();
>+  engine.applyIncomingBatchSize = APPLY_BATCH_SIZE;
>+	engine._store._applyIncomingBatch = engine._store.applyIncomingBatch;
>+  engine._store.applyIncomingBatch = function (records) {
>+    let failed = [];
>+    for (var i = 0; i < 2; i++) {
>+        failed.push(records.shift().id);
>+    }
>+		engine._store._applyIncomingBatch(records);
>+    return failed;
>+  };

You can simplify this function a bit by using Array.slice():

engine._store.applyIncomingBatch = function (records) {
  engine._store._applyIncomingBatch(records.slice(2));
  return records.slice(0, 2);
};

But the function may have to be a bit more complicated than that for the reasons mentioned above.

>+    // Failed items should only be reported once.
>+    // A second sync with the same failed items should not add the same items again.
>+    do_check_eq([id for (id in engine._store.items)].length, APPLY_BATCH_SIZE-2);

nit: spaces around the minus operator.

>+    // Ensure failed item that passed on second attempt is removed
>+    do_check_eq([id for (id in engine._store.items)].length, APPLY_BATCH_SIZE-1);

ditto.


Apart from these nits, the only request I have is that we also test that previousFailed items are refetched on the next sync. That could be either done by amending test_processIncoming_resume_toFetch() or by creating a similar test function just for previousFailed. Whatever you think is easier.
Attachment #535221 - Flags: review?(philipp) → feedback+
Attached patch patch for 659107 (obsolete) — Splinter Review
Attachment #535221 - Attachment is obsolete: true
Attachment #535699 - Flags: review?
> Also, we should put some initial failed items into engines.previousFailed to
> verify that the list actually gets reset. You might have to adapt your
> custom applyIncomingBatch() method for that.

philikon: I added initial failed items but wasn't sure why I'd need to adapt applyIncomingBatch().

rnewman: I think this is what you meant? As for instrumenting...did you mean having a separate function for generating failed records so code isn't repeated? If so, I agree & maybe that needs a separate bug to be filed.
Comment on attachment 535699 [details] [diff] [review]
patch for 659107

Looks great! Just minor niggles:

>+function test_processIncoming_failed_items_reported_once() {
...
>+  engine._store.applyIncomingBatch = function (records) {
>+    engine._store._applyIncomingBatch(records.slice(1));
>+    var arr = records.slice(0, 1);
>+    return [arr[0].id];
>+  };

The last two lines here could be replaced by

  return [records[0].id];

>+    // Logging only happened once so these values are not up to date
>+    do_check_eq(called, 1);
>+    do_check_eq(counts.failed, 3);
>+    do_check_eq(counts.applied, 15);
>+    do_check_eq(counts.newFailedGUIDs.length, 3);

I suggest rephrasing that comment a bit:

  // Failures weren't notified again because there were no newly failed items.

>+function test_processIncoming_previousFailed() {
...
>+  engine._store.applyIncomingBatch = function (records) {
>+    engine._store._applyIncomingBatch(records.slice(2));
>+    var arr = records.slice(0, 2);
>+    return [arr[0].id, arr[1].id];
>+  };

Same as above: last two lines in that functino could be replaced by

  return [records[0].id, records[1].id];

>+    // A second sync with the same failed items should not add the same items again.
>+    // Items that did not fail a second time should no longer be in previousFailed.
>+    do_check_eq([id for (id in engine._store.items)].length, 10);
>+    do_check_eq(engine.previousFailed.length, 4);
>+    do_check_eq(engine.previousFailed[0], "record-no-0");
>+    do_check_eq(engine.previousFailed[1], "record-no-1");
>+    do_check_eq(engine.previousFailed[2], "record-no-8");
>+    do_check_eq(engine.previousFailed[3], "record-no-9");

Please also verify that the refetched items that didn't fail have also been applied in the store by verifying their existence in engine._store.items (c.f. other tests)

r=me with those, but I'm also looping in rnewman since this touches the bowels of Sync that he just rewrote.
Attachment #535699 - Flags: review?(rnewman)
Attachment #535699 - Flags: review?
Attachment #535699 - Flags: review+
Attached patch patch for 659107 (obsolete) — Splinter Review
* changes requested by philikon
Attachment #535699 - Attachment is obsolete: true
Attachment #535699 - Flags: review?(rnewman)
Attachment #535735 - Flags: review?(rnewman)
Comment on attachment 535735 [details] [diff] [review]
patch for 659107

       count.failed += failed.length;
-      this._log.debug("Records that failed to apply: " + failed);
+      if (failed.length) {
+        this._log.debug("Records that failed to apply: " + failed);
+      }

Might as well move the count increment inside that check.

I'd personally be happier if get/set previousFailed and get/set toFetch shared some code; they're both "compare some value, then delayed jsonSave to some string path".

+    count.newFailedGUIDs = Utils.arraySub(this.previousFailed, failedInPreviousSync);

You're adding an array to an object which tracks counts. Is that necessary? How 'bout just adding the count?

It's also worth documenting somewhere whether count.failed now includes only old failures or both old and new. And I'd change the name to something like "previousFailed" and "failed"; after all, the title of the change is "only report failure for newly failed items".

+    do_check_eq(engine.previousFailed, 0);

How does this pass? .previousFailed is an array.

r- for now, and I'll read your other messages!
Attachment #535735 - Flags: review?(rnewman) → review-
(In reply to comment #22)
> I'd personally be happier if get/set previousFailed and get/set toFetch
> shared some code; they're both "compare some value, then delayed jsonSave to
> some string path".

Follow-up?

> +    count.newFailedGUIDs = Utils.arraySub(this.previousFailed,
> failedInPreviousSync);
> 
> You're adding an array to an object which tracks counts. Is that necessary?
> How 'bout just adding the count?

The idea was that consumers of the apply-failed notification find out about the GUIDs that failed to apply and could do things with it. The vision here is that the Error Handling Policy object (part of the Death To Unknown Error feature) could log this, find out more about the object, and then display more informative error messages.

This is all sci-fi for now, so might as well leave it out for now and do it properly when we have an idea of what the consumers actually want to know.

> +    do_check_eq(engine.previousFailed, 0);
> 
> How does this pass? .previousFailed is an array.

Because [] == 0 :) But good catch, this should be engine.previousFailed.length.
(In reply to comment #23)
> Follow-up?

Sure. I can tackle when I rebase.
 
> This is all sci-fi for now, so might as well leave it out for now and do it
> properly when we have an idea of what the consumers actually want to know.

Yup; at that point we can change the 'contract' of the observer, and maybe pass a much richer object of which "counts" is just a property.
> +    count.newFailedGUIDs = Utils.arraySub(this.previousFailed,
> failedInPreviousSync);

> It's also worth documenting somewhere whether count.failed now includes only
> old failures or both old and new. And I'd change the name to something like
> "previousFailed" and "failed"; after all, the title of the change is "only
> report failure for newly failed items".

did you want to change the name of count.newFailedGUIDs? count.failed contains all failures of the current sync while count.newFailedGUIDs contains all new failures (that didn't occur in the last sync) ... not sure if calling it previousFailed makes sense
Attached patch patch for 659107 (obsolete) — Splinter Review
Attachment #535735 - Attachment is obsolete: true
Attachment #535776 - Flags: review?(rnewman)
Comment on attachment 535776 [details] [diff] [review]
patch for 659107

+    let failedInPreviousSync = this.previousFailed.slice();

This no longer needs to be .slice(), because we set previousFailed immediately below. If tests pass with that, go ahead and remove it.

r=rnewman with that.
Attachment #535776 - Flags: review?(rnewman) → review+
Attached patch patch for 659107Splinter Review
Attachment #535776 - Attachment is obsolete: true
http://hg.mozilla.org/services/services-central/rev/6ac4e4601c95
Status: NEW → ASSIGNED
Whiteboard: [fixed in services]
STRs for QA: Need to have a Sync account with an item on the server that fails to apply locally. Here's a bit of JavaScript to create such a bad record on the server:

  Components.utils.import("resource://services-sync/main.js");
  Components.utils.import("resource://services-sync/engines/history.js");
  var eng = Weave.Engines.get("history");
  var guid = "sampleguid12";
  var rec = new HistoryRec("history", guid);
  rec.histUri = "invalid uri";
  rec.title = "foobar";
  rec.visits = [{type: 0, date: 1234567890}];
  rec.encrypt();
  new Weave.Resource(eng.engineURL + "/" + guid).put(JSON.stringify(rec))

1. Enter this in the Error Console or the Web Developer Console for about:config.

2. Sync. You should see the infamous Unknown Error bar.

3. Close the error bar and sync again. Previously the bar would reappear for each subsequent sync. It should now no longer appear.
Whiteboard: [fixed in services] → [verified in services]
http://hg.mozilla.org/mozilla-central/rev/6ac4e4601c95
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
verified in m-c build from 20110602
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla7
Blocks: 692556
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.