Closed Bug 591126 Opened 10 years ago Closed 9 years ago

handle upload interruption gracefully

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta8+

People

(Reporter: mconnor, Assigned: philikon)

References

Details

Attachments

(1 file)

Right now we fail on initial upload if any of the upload chunks fails.  We can probably just add all IDs to changedIDs, and use that array as the base for large uploads.
I'm pretty sure we do that already:
http://mxr.mozilla.org/services-central/source/fx-sync/services/sync/modules/engines.js#429
OS: Mac OS X → All
Hardware: x86 → All
Blocks: 592375
We do half of it, we don't cull changedIDs until everything completes.  We need to cull the IDs in changedIDs during doUpload, right before http://mxr.mozilla.org/services-central/source/fx-sync/services/sync/modules/engines.js#677

Failed IDs are all added back at the end, but we need to be sure to add those back if we're throwing, otherwise we'll lose that changeset (this could be a separate bug, but I'll fix it here).
Assignee: nobody → mconnor
blocking2.0: --- → beta8+
Flags: blocking-fx-sync1.5+
Things have changed slightly with bug 610923, so in the new scheme we should do this:

* set lastSyncLocal as soon as we call engine.getChangedIDs()
* remove successfully uploaded IDs from the backup
* on upload fail or any other exception, add all the backup IDs to the tracker (like we do now) but keep lastSyncLocal where it is.

This ensures that we don't reupload *everything* when only a certain batch fails to upload while still ensuring consistent state on the next sync.
Attached patch v1Splinter Review
Instead of rolling back a whole sync, we now do:

* Set the lastSyncLocal timestamp immediately as we fetch the changed IDs, then clear the tracker.

* Remove records from changed IDs as they have successfully uploaded.

* At the end of a sync, if there are remaining changed IDs, add them to the tracker. This can happen for items that failed to upload as well as when the engine throws an exception during sync (e.g. server responded with non-200).
Assignee: mconnor → philipp
Attachment #492462 - Flags: review?(mconnor)
Comment on attachment 492462 [details] [diff] [review]
v1

>diff --git a/services/sync/modules/engines.js b/services/sync/modules/engines.js

>-        if (failed_ids.length)
>-          this._log.debug("Records that will be uploaded again because "
>-                          + "the server couldn't store them: "
>-                          + failed_ids.join(", "));

This logging is important, if it happens.  It's rare now, but I think we should detact the case and log it.
>-    }
>-    catch (e) {
>-      this._rollback();
>-      this._log.warn("Sync failed");

I suppose we'll get the exception in the log instead here.

>-      throw e;
>+    } finally {
>+      this._syncCleanup();
Attachment #492462 - Flags: review?(mconnor) → review+
(In reply to comment #5)
> >-        if (failed_ids.length)
> >-          this._log.debug("Records that will be uploaded again because "
> >-                          + "the server couldn't store them: "
> >-                          + failed_ids.join(", "));
> 
> This logging is important, if it happens.  It's rare now, but I think we should
> detact the case and log it.

Will do.

> >-    }
> >-    catch (e) {
> >-      this._rollback();
> >-      this._log.warn("Sync failed");
> 
> I suppose we'll get the exception in the log instead here.

Correct. Service will catch the exception and log it. But now that you bring this up, I remember that the "Sync failed" string has been helpful when analyzing logs (e.g. you can just search for "fail"). I'll change service.js to prepend this string when we log the exception.
http://hg.mozilla.org/services/fx-sync/rev/5723ade26502
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 615021
Is there a way this can be verified from the user end?
(In reply to comment #8)
> Is there a way this can be verified from the user end?

Upload a large set of data (e.g. loads of history) from mobile, kill the network in between. Look at about:sync-log and verify that a few batches of POST requests have been successful in the beginning. Resume Sync, verify that it resumes where it left off instead of reuploading everything again.
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.