Closed Bug 1337978 Opened 7 years ago Closed 7 years ago

Unify the two notions of "weak upload" in sync.

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 56
Tracking Status
firefox56 --- fixed

People

(Reporter: tcsc, Assigned: tcsc)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 676563 introduces the notion of a "weak" bookmark upload, but it's "outgoing" telemetry record is shared with the outgoing telemetry for the main upload. This is not ideal.

I don't necessarily think this is high priority, but seems work getting on file that we don't do this already.
Thom to expand on the requirements for this.
Flags: needinfo?(tchiovoloni)
Following 1:1 with markh, expanding this to really be to unify the two notions of weak upload.

One of them was added by the repair work and exists to upload items that -- in this case weak means the data to upload is only stored in memory.

The other was added by the bookmark dateAdded syncing patch, and uses weak to mean that the data to upload is only stored in memory *and* that failure to upload it does not fail the sync. (It also is reset every sync and doesn't support tombstones -- these would likely have to change)

I think the 2nd implementation (the one written to support dateAdded) should be expanded to work for the repair case as well.

Biggest issue (or at least an important one I want to call out so that it doesn't get missed) needed to solve for this case: Items that are marked as changed locally (because it happened during the sync) are removed from the weak upload set, since uploading them could introduce corruption, and we'll get them in the next sync anyway.

The records we drop on the floor here don't cause the weak upload to fail and should/will be uploaded in the next sync (as a strong upload) anyway. As a result, we don't even save telemetry saying we dropped some records on the floor for this case (which was the original issue this bug was opened to fix).

Anyway, doing this silently would cause us to send the wrong set of IDs in our repairResponse client command, so we'd need to either:

1. upload them anyway if they're flagged as needing upload due to repair

2. be sure that when we sent the repairResponse command, their IDs weren't present in the result.

3. defer sending the repair response until the next sync, when they are uploaded

I think 3 is the only way to guarantee no introduction of corruption, but it also complicates repair and in practice might be somewhat unnecessary. I kind of think we want bug 1335891 to land first if we go this route to avoid dragging out repair for too long.
Flags: needinfo?(tchiovoloni)
Summary: Add telemetry for weak bookmark upload → Unify the two notions of "weak upload" in sync.
Priority: -- → P2
Assignee: nobody → tchiovoloni
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review130080

::: services/sync/modules/bookmark_repair.js:692
(Diff revision 1)
>      if (data != "bookmarks") {
>        return;
>      }
> -    Svc.Obs.remove("weave:engine:sync:uploaded", this.onUploaded, this);
> -    log.debug(`bookmarks engine has uploaded stuff - creating a repair response`);
> -    this._finishRepair();
> +    if (subject.failed) {
> +      // Weak upload failed, but it might work next time, so we don't remove our
> +      // listener, and just wait until we happen to sync again

Since we allow multiple concurrent repairs, could keeping the listener around leak? (I think it shouldn't, and this is OK...otherwise, we'll never finish the repair, anyway. But this change caught my eye).

::: services/sync/modules/engines.js:1657
(Diff revision 1)
> +          // uploaded as a tombstone. That said, this check seems worth making
> +          // for the sake of sanity.
> +          let weakUploadMetadata = this._needWeakUpload.get(id);
> +          if (!weakUploadMetadata || !weakUploadMetadata.forceTombstone) {
> +            this._needWeakUpload.delete(id);
> +            uploadedAsStrong.push(id);

Just so I understand the new logic: we don't want to weakly upload items that were already in `_modified`, unless they're tombstones. If they're tombstones, we keep them for weak upload anyway, just in case we strongly uploaded an item that shouldn't be on the server. Is that right?

::: services/sync/modules/engines.js:1721
(Diff revision 1)
>        } catch (e) {
>          if (Async.isShutdownException(e)) {
>            throw e;
>          }
>          this._log.warn("Weak reupload failed", e);
>        }

I'm guessing `weave:engine:sync:weak-uploaded` is fired elsewhere for the success case?

::: services/sync/modules/engines.js:1762
(Diff revision 1)
>          encodedRecord.encrypt(this.service.collectionKeys.keyForCollection(this.name));
>        } catch (ex) {
>          if (Async.isShutdownException(ex)) {
>            throw ex;
>          }
>          this._log.warn(`Failed to encrypt record "${id}" during weak reupload`, ex);

Nit: s/reupload/upload

::: services/sync/modules/engines.js:1787
(Diff revision 1)
> +        pendingSent.push(id);
>        }
>        this._store._sleep(0);
>      }
>      postQueue.flush(true);
> +    Observers.notify("weave:engine:sync:weak-uploaded",

Aha, found it.

::: services/sync/modules/engines.js:1886
(Diff revision 1)
>  
>    _resetClient() {
>      this.resetLastSync();
>      this.previousFailed = [];
>      this.toFetch = [];
> +    // this._needWeakUpload.clear();

Uncomment?

::: services/sync/modules/engines/bookmarks.js:554
(Diff revision 1)
>      }
>      return record;
>    },
>  
> -  buildWeakReuploadMap(idSet) {
> +  buildWeakUploadMap(idToMetadata) {
>      // We want to avoid uploading records which have changed, since that could

Thank you for the thorough comments here and elsewhere!

::: services/sync/modules/engines/bookmarks.js:561
(Diff revision 1)
>      //
>      // Strictly speaking, it would be correct to just call getChangedIds() after
> -    // building the initial weak reupload map, however this is quite slow, since
> +    // building the initial weak upload map, however this is quite slow, since
>      // we might end up doing createRecord() (which runs at least one, and
>      // sometimes multiple database queries) for a potentially large number of
>      // items.

The two `getChangedIds` calls threw me for a loop, too. (I guess it's possible for them to be more expensive if we have more strong changes, and fewer weak changes...but we don't expect that to be common?)

::: services/sync/modules/engines/bookmarks.js:597
(Diff revision 1)
> +      if (!forcedTombstones.has(id)) {
> +        discarded.add(id);
> +      }
> +    }
> +    // Grab any changes that happened in the (fairly short) window between when
> +    // we built initialChanges and now.

Because, if they were changed strongly, we should wait to pick them up on the next sync? (Unless they're tombstones, which we'll upload anyway).

::: toolkit/components/places/PlacesSyncUtils.jsm:121
(Diff revision 1)
>     * Resolves to an array of the syncIDs of bookmarks that have a nonzero change
>     * counter
>     */
> -  getChangedIds: Task.async(function* () {
> -    let db = yield PlacesUtils.promiseDBConnection();
> -    let result = yield db.executeCached(`
> +  async getChangedIds() {
> +    let db = await PlacesUtils.promiseDBConnection();
> +    let changes = await pullSyncChanges(db, false);

I'd be OK with copy-pasting the query from `pullChanges` and changing it to return only the GUID, and maybe whether it's a tombstone, for logging? OTOH, we don't expect many changes for the common case, so building the change records shouldn't be expensive. (I say "shouldn't," without any numbers to back up my claim). :-P Your call.
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review130194

Thanks - this patch is complex (nothing against the patch - the problem is complex to get right!) I think it's worth addressing Kit's and my comments and giving us another look!

::: services/sync/modules/bookmark_repair.js:712
(Diff revision 1)
>      let response = {
>        request: this._currentState.request.request,
>        collection: "bookmarks",
>        clientID: clientsEngine.localID,
>        flowID,
> -      ids: this._currentState.ids,
> +      ids: Array.from(ids),

IIUC, discarded is also going to contain items that are locally changed and so *will* be uploaded next sync. It seems a little wrong to not record them as being uploaded (as then the repair code will choose a different device to request upload from, even though this device probably has uploaded it by them).

IOW, I'm not sure treating "can't create record and almost certainly never will be able to create record" the same as "record's fine an will be uploaded shortly".

::: services/sync/modules/engines.js:1657
(Diff revision 1)
> +          // uploaded as a tombstone. That said, this check seems worth making
> +          // for the sake of sanity.
> +          let weakUploadMetadata = this._needWeakUpload.get(id);
> +          if (!weakUploadMetadata || !weakUploadMetadata.forceTombstone) {
> +            this._needWeakUpload.delete(id);
> +            uploadedAsStrong.push(id);

This uploadedAsStrong seems to be quite alot of complexity for what seems to be an optimization for an edge-case. Would it really hurt if we uploaded the same record twice in this edge case?

I believe this is an edge-case because the 2 use-cases for weak uploads are:

* re-uploading an incoming record with a corrected dateAdded - it seems extremely unlikely a user would have happened to touch that same bookmark on this device in that window.
* For repair, where it seems extremely unlikely that an item another device believes is missing happens to get touched by the user at the same instant we upload it.

::: services/sync/modules/engines.js:1777
(Diff revision 1)
>        if (!enqueued) {
> +        this._log.warn(`(weak upload) Failed to enqueue ${id}, discarding...`, error);
> +        discarded.add(id);
>          ++counts.failed;
> +        // Give up on ever uploading this record since it's probably just too
> +        // big, and will never succeed.

I think it's worth explicitly calling out something like "and we don't consult this.allowSkippedRecord because weak uploads are never used for tree correctness" or similar.

::: services/sync/modules/engines/bookmarks.js:572
(Diff revision 1)
>      // unlikely case that a record was modified while we were building the map.
>      let initialChanges = Async.promiseSpinningly(PlacesSyncUtils.bookmarks.getChangedIds());
> +
> +    // Build a table to easily check if a record was a forced tombstone.
> +    // In these cases we shouldn't discard due to local changes, since we aren't
> +    // uploading the "real" local value anyway.

what will go wrong if we avoid this complexity and treat tombstones just like normal items?

::: services/sync/modules/engines/bookmarks.js:596
(Diff revision 1)
> +    for (let id of initialChanges) {
> +      if (!forcedTombstones.has(id)) {
> +        discarded.add(id);
> +      }
> +    }
> +    // Grab any changes that happened in the (fairly short) window between when

I think it would help to clarify this comment to reflect that we've now called createRecord, so if an item is still not changed we are guaranteed to have the unchanged record safely stashed away here.

I'm also not clear on why this is all bookmark specific (ie, why much of this isn't in engine.js's version)

::: toolkit/components/places/PlacesSyncUtils.jsm:1718
(Diff revision 1)
>   *        The Sqlite.jsm connection handle.
>   * @return {Promise} resolved once all items have been fetched.
>   * @resolves to an object containing records for changed bookmarks, keyed by
>   *           the sync ID.
>   */
> -var pullSyncChanges = Task.async(function* (db) {
> +var pullSyncChanges = Task.async(function* (db, markAsSyncing = true) {

Given you replaces Task.async with async above, you might as well do it here too?
Attachment #8855533 - Flags: review?(markh)
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review130080

> Since we allow multiple concurrent repairs, could keeping the listener around leak? (I think it shouldn't, and this is OK...otherwise, we'll never finish the repair, anyway. But this change caught my eye).

If we end up with multiple concurrent repairs then we are probably screwed anyway - ie, we'd probably need a counter and ensure we only remove the observer when it hits zero. We go to alot of effort to prevent multiple repairs in the requestor, and I don't think this patch makes the situation any worse, so I think this is probably OK.
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review130080

> If we end up with multiple concurrent repairs then we are probably screwed anyway - ie, we'd probably need a counter and ensure we only remove the observer when it hits zero. We go to alot of effort to prevent multiple repairs in the requestor, and I don't think this patch makes the situation any worse, so I think this is probably OK.

Thank god, my head was briefly spinning with the thought of multiple concurrent repairs.

> Just so I understand the new logic: we don't want to weakly upload items that were already in `_modified`, unless they're tombstones. If they're tombstones, we keep them for weak upload anyway, just in case we strongly uploaded an item that shouldn't be on the server. Is that right?

Exactly.

> The two `getChangedIds` calls threw me for a loop, too. (I guess it's possible for them to be more expensive if we have more strong changes, and fewer weak changes...but we don't expect that to be common?)

I think that is the common case, isn't it?

Usually you aren't repairing, and usually all records should have reasonably accurate dateAdded values (even if it is not now, eventually this should be true, once all clients implement dateAdded).

> Because, if they were changed strongly, we should wait to pick them up on the next sync? (Unless they're tombstones, which we'll upload anyway).

Yes, consider if a child was added to a folder that is flagged for (non-tombstone) weak upload. If we upload that folder without the child, we will have introduced corruption. This is actually somewhat likely in the case where we trigger the sync due to changes when you're reorganizing your bookmarks, or similar.

Forced tombstones don't suffer from this, since they shouldn't be there to begin with.

> I'd be OK with copy-pasting the query from `pullChanges` and changing it to return only the GUID, and maybe whether it's a tombstone, for logging? OTOH, we don't expect many changes for the common case, so building the change records shouldn't be expensive. (I say "shouldn't," without any numbers to back up my claim). :-P Your call.

I think having one version of the query is probably better, and the difference doesn't seem significant here?

Might be worth profiling since that query might be slow even in the cases when no records match though...
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review130194

> IIUC, discarded is also going to contain items that are locally changed and so *will* be uploaded next sync. It seems a little wrong to not record them as being uploaded (as then the repair code will choose a different device to request upload from, even though this device probably has uploaded it by them).
> 
> IOW, I'm not sure treating "can't create record and almost certainly never will be able to create record" the same as "record's fine an will be uploaded shortly".

I waffled about this a decent amount. We'll need discarded in some cases anyway because if we truly failed to upload it (it's too big), we shouldn't include it in the ids (IMO), and so I think we'll need most of this complexity anyway.

Given that they're roughly the same complexity, it seems like not including the IDs is the right call, since otherwise e.g. iOS won't be able to rely on getting all the responses to assume repair is done. It might not be done until this device syncs at some unknown point in the future.

> This uploadedAsStrong seems to be quite alot of complexity for what seems to be an optimization for an edge-case. Would it really hurt if we uploaded the same record twice in this edge case?
> 
> I believe this is an edge-case because the 2 use-cases for weak uploads are:
> 
> * re-uploading an incoming record with a corrected dateAdded - it seems extremely unlikely a user would have happened to touch that same bookmark on this device in that window.
> * For repair, where it seems extremely unlikely that an item another device believes is missing happens to get touched by the user at the same instant we upload it.

Quite a lot of complexity is much stronger than I'd put it, it's about 9 lines in one function (not counting logging or comments).

We don't remove items that have changed from the weak upload set, and so it's possible that these didn't change due to a repair (or a dateAdded update) that happened this sync, but one that happened earlier, which makes this more likely (although maybe you'd still consider it an edge case). -- Same for if the sync failed due to XIUS check failing or similar.

Removing the items that changed from the weak upload set would be the wrong choice, because we want to eventually send the weak upload notification, and wouldn't know to do this otherwise.

But I'm willing to take it out if you think its warranted, it shouldn't be needed for correctness.

> what will go wrong if we avoid this complexity and treat tombstones just like normal items?

I tried that first. If we don't remember that "no really, these should be a tombstone" we won't upload a tombstone for records that do exist locally but shouldn't be on the server.

> I think it would help to clarify this comment to reflect that we've now called createRecord, so if an item is still not changed we are guaranteed to have the unchanged record safely stashed away here.
> 
> I'm also not clear on why this is all bookmark specific (ie, why much of this isn't in engine.js's version)

This is bookmark-specific because most engines don't have cross-record invariants to maintain. Uploading a changed record doesn't break anything. 

I've adjusted the comment, although what it says was already part of the comment at the start of the function (although perhaps not as clearly, and it doesn't hurt to be explicit).
(In reply to Thom Chiovoloni [:tcsc] from comment #8)
> I waffled about this a decent amount. We'll need discarded in some cases
> anyway because if we truly failed to upload it (it's too big), we shouldn't
> include it in the ids (IMO), and so I think we'll need most of this
> complexity anyway.

Yeah, I agree we don't want to record it in this case (or possibly better would be to report that fact to the requestor, but not in this bug :)

> Given that they're roughly the same complexity, it seems like not including
> the IDs is the right call, since otherwise e.g. iOS won't be able to rely on
> getting all the responses to assume repair is done. It might not be done
> until this device syncs at some unknown point in the future.

My concern is that we request id X. The responder finds that along with X we also need to upload 10 children records. However, the response indicates we didn't supply X because it was locally changed - however, X and all its children *do* end up on the server. The requestor tries another device for X which uploads a *different* representation of X, possibly with mismatched children. This seems bad.

It seems there are 2 edge-cases here in which we might hit this:
* At repair time the item in question was already in the changeset.
* At repair time the item was not in the changeset, but appeared in the changeset between the strong upload and us getting to the weak upload.

The latter seems a much shorter window of opportunity. However, IIUC, in the former, we are declining to list the ID even though it is *already* on the server, courtesy of the strong upload we just did.

Would it be possible to have the strong upload simply mark the item in the weaklist as already uploaded? Then the weak upload could decline to upload it, but still list it in the IDs of the items that were uploaded?

> Quite a lot of complexity is much stronger than I'd put it, it's about 9
> lines in one function (not counting logging or comments).

TBH I disagree - it looks alot like duplicated code with subtle changes, and it led Kit to ponder about where a matching notification of the same topic might live.  IOW, it seems both Kit and I were some level of confused on first reading the code.

> We don't remove items that have changed from the weak upload set, and so it's possible
> that these didn't change due to a repair (or a dateAdded update) that happened this 
> sync, but one that happened earlier, which makes this more likely (although maybe
> you'd still consider it an edge case).

I'm not clear on what you are saying here. The patch does remove items from this._needWeakUpload during the "strong" upload, but I suspect that's not what you mean.

> -- Same for if the sync failed due to XIUS check failing or similar.

That's actually an excellent point and one that is worth considering - it seems possible that's going to be relatively common. IIUC, in the current code without your patch, we'll simply not get the "weave:engine:sync:uploaded" notification, so a future sync after the 412 failure will still trigger our observer and do the right thing. I suspect your patch should work the same there?

> But I'm willing to take it out if you think its warranted, it shouldn't be
> needed for correctness.

I think that's the yardstick - if it's not needed for correctness I think it shouldn't exist (or to look at it another way, if it's not for correctness, what *is* it for?)

> > what will go wrong if we avoid this complexity and treat tombstones just like normal items?
> 
> I tried that first. If we don't remember that "no really, these should be a
> tombstone" we won't upload a tombstone for records that do exist locally but
> shouldn't be on the server.

I understand why we track it's a tombstone, but I don't understand why we need the special handling of them here - we are removing "real" changes from the set under the assumption they will be strong-uploaded later, which seems like it will work fine for tombstones too. Unless the *real* change isn't for a tombstone, in which case we probably have bigger problems?

> This is bookmark-specific because most engines don't have cross-record
> invariants to maintain. Uploading a changed record doesn't break anything. 

OTOH though, it also can't hurt those other collections. I'm concerned we now have multiple distinct variables in play to rationalize how this works. However, this patch doesn't really introduce that, so I guess it's really just a meta-comment/meta-sob.
(In reply to Mark Hammond [:markh] from comment #10)
>
> My concern is that we request id X. The responder finds that along with X we
> also need to upload 10 children records. However, the response indicates we
> didn't supply X because it was locally changed - however, X and all its
> children *do* end up on the server. The requestor tries another device for X
> which uploads a *different* representation of X, possibly with mismatched
> children. This seems bad.
> 
> It seems there are 2 edge-cases here in which we might hit this:
> * At repair time the item in question was already in the changeset.

It will still appear in the ids in this case. We remove it from the weak set, but don't add it to discarded. Which is effectively the same as your suggestion, although more implicit (it is probably worth calling this out in a comment, if we keep these semantics -- which, given the later comments, remains to be seen)

> 
> > Quite a lot of complexity is much stronger than I'd put it, it's about 9
> > lines in one function (not counting logging or comments).
> 
> TBH I disagree - it looks alot like duplicated code with subtle changes, and
> it led Kit to ponder about where a matching notification of the same topic
> might live.  IOW, it seems both Kit and I were some level of confused on
> first reading the code.

I think this would be all we could avoid: https://gist.github.com/thomcc/47319c68649b9d3347620b9aaf71d3bd. I'm willing to do it, but it doesn't seem like that much additional complexity or duplicated code (and even what's there could be simplified), so maybe there's some other part I'm missing that we could simplify if this change is made?

Really, I think I'm misunderstanding how much you're saying we should simplify. Are you arguing against having weak upload at all? I interpreted your argument as being against the uploadedAsStrong array.

> 
> > We don't remove items that have changed from the weak upload set, and so it's possible
> > that these didn't change due to a repair (or a dateAdded update) that happened this 
> > sync, but one that happened earlier, which makes this more likely (although maybe
> > you'd still consider it an edge case).
> 
> I'm not clear on what you are saying here. The patch does remove items from
> this._needWeakUpload during the "strong" upload, but I suspect that's not
> what you mean.

We'll remove them in the *next* strong upload, but not from the current one, is what I meant. And so every item in the weak upload set *will* eventually be uploaded as part of a weak batch, regardless of whether or not we have a reason to upload it "strongly".

> 
> That's actually an excellent point and one that is worth considering - it
> seems possible that's going to be relatively common. IIUC, in the current
> code without your patch, we'll simply not get the
> "weave:engine:sync:uploaded" notification, so a future sync after the 412
> failure will still trigger our observer and do the right thing. I suspect
> your patch should work the same there?

We'll actually still get it, that notification is sent on failures as well. And so it will still signal that we did upload the records (which is untrue). My patch will fix this issue, and we won't trigger the observer until the future sync (this is part of why I made it a separate notification).

> I think that's the yardstick - if it's not needed for correctness I think it
> shouldn't exist (or to look at it another way, if it's not for correctness,
> what *is* it for?)

Performance, lower network usage, etc (of course, whether these are good reasons depends on "how much").

Or, really, to maintain the same semantics (we don't upload records twice) that we had in both the dateAdded weak upload and the bookmark repair one (which isn't a good reason to keep it, so long as nobody relies on those semantics, which is almost certainly the case).

> I understand why we track it's a tombstone, but I don't understand why we
> need the special handling of them here ...

Ah, I missed that this was just about *this* code, and not why we have forceTombstone in general.

> ... Unless the *real* change isn't
> for a tombstone, in which case we probably have bigger problems?

That's basically what the code is avoiding, a case where getChangedIds returns an id of a record that something else requested as a forceTombstone. It's possible (likely even) we should consider that a bug in getChangedIds, though.
(In reply to Thom Chiovoloni [:tcsc] from comment #11)
> (In reply to Mark Hammond [:markh] from comment #10)
> >
> > My concern is that we request id X. The responder finds that along with X we
> > also need to upload 10 children records. However, the response indicates we
> > didn't supply X because it was locally changed - however, X and all its
> > children *do* end up on the server. The requestor tries another device for X
> > which uploads a *different* representation of X, possibly with mismatched
> > children. This seems bad.
> > 
> > It seems there are 2 edge-cases here in which we might hit this:
> > * At repair time the item in question was already in the changeset.
> 
> It will still appear in the ids in this case. We remove it from the weak
> set, but don't add it to discarded. Which is effectively the same as your
> suggestion, although more implicit (it is probably worth calling this out in
> a comment, if we keep these semantics -- which, given the later comments,
> remains to be seen)

Ah, yeah, thanks.

> I think this would be all we could avoid:
> https://gist.github.com/thomcc/47319c68649b9d3347620b9aaf71d3bd. I'm willing
> to do it, but it doesn't seem like that much additional complexity or
> duplicated code (and even what's there could be simplified), so maybe
> there's some other part I'm missing that we could simplify if this change is
> made?

Yeah, that's what I had in mind, and yeah, if that doesn't actually change the semantics of how the patch works, IMO it should be removed.

> 
> Really, I think I'm misunderstanding how much you're saying we should
> simplify. Are you arguing against having weak upload at all? I interpreted
> your argument as being against the uploadedAsStrong array.

Yes, the gist above, and I was also suggesting we might be able to avoid the |let forcedTombstones = new Set();| code (note - not tombstones in general, just the special handling of them in buildWeakUploadMap() in bookmarks.js. It's (obviously:) possible I'm missing something subtle there, but ISTM that the semantics should be the same without that code (at the cost of very very rarely uploading a tombstone twice)

> > That's actually an excellent point and one that is worth considering - it
> > seems possible that's going to be relatively common. IIUC, in the current
> > code without your patch, we'll simply not get the
> > "weave:engine:sync:uploaded" notification, so a future sync after the 412
> > failure will still trigger our observer and do the right thing. I suspect
> > your patch should work the same there?
> 
> We'll actually still get it, that notification is sent on failures as well.

Ouch, yeah, thanks.

> And so it will still signal that we did upload the records (which is
> untrue). My patch will fix this issue, and we won't trigger the observer
> until the future sync (this is part of why I made it a separate
> notification).

Makes sense, thanks.

> > I think that's the yardstick - if it's not needed for correctness I think it
> > shouldn't exist (or to look at it another way, if it's not for correctness,
> > what *is* it for?)
> 
> Performance, lower network usage, etc (of course, whether these are good
> reasons depends on "how much").

I think at this stage I'd call that premature - eg, I could imagine performance was actually worse (albeit trivially so) creating a map that is almost certainly always empty. This code is already so complicated that I personally value clarity very highly :)

> That's basically what the code is avoiding, a case where getChangedIds
> returns an id of a record that something else requested as a forceTombstone.
> It's possible (likely even) we should consider that a bug in getChangedIds,
> though.

Right - so yeah, in general, I think I'd rather have "assertions" (ie, throwing) than work-arounds for these should-be-impossible cases as it (IMO) actually increases clarity.

It also seems that many of these cases we are guarding against and improvements we have made don't have explicit tests, which I think would be worthwhile - eg, items in both weak and string changesets, failure to perform the upload, etc. I really wish we had decent coverage tooling :/
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review138010

This version is much simpler, but I don't understand how repair will work in the 412 or other general failure case. It may be that we could just re-do the repair response in that case? Either way, I think we probably want a repair test specifically for that case.

::: services/sync/modules/bookmark_repair.js:698
(Diff revision 3)
> +      // listener, and just wait until we happen to sync again
> +      return;
> +    }
>      Svc.Obs.remove("weave:engine:sync:uploaded", this.onUploaded, this);
> -    log.debug(`bookmarks engine has uploaded stuff - creating a repair response`);
> -    this._finishRepair();
> +    log.debug(`bookmarks engine has uploaded stuff - creating a repair response`, subject);
> +    this._finishRepair(subject);

_finishRepair doesn't take a param

::: services/sync/modules/engines.js:777
(Diff revision 3)
> +  //   of records that should exist locally, but should never be uploaded to the
> +  //   server (note that not all sync engines support tombstones)
> +  //
> +  // The difference between this and a "normal" upload is that these records
> +  // are only tracked in memory, and if the upload attempt fails (shutdown,
>    // 412, etc), we abort uploading the "weak" set.

hrmph - I missed this before and it appears to still be true - but I don't think we want this behaviour for repair, right?
Attachment #8855533 - Flags: review?(markh)
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review138010

> hrmph - I missed this before and it appears to still be true - but I don't think we want this behaviour for repair, right?

We do. We'll restart the whole repair response process the next sync, if we didn't abort, we'd upload it twice.
Priority: P2 → P1
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review142778

Clearing review per Mark's feedback in comment 14. Please let me know if you'd like for me to take another look. Thanks!
Attachment #8855533 - Flags: review?(kit)
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review149880

::: services/sync/modules/bookmark_repair.js:691
(Diff revision 4)
>  
>    onUploaded(subject, data) {
>      if (data != "bookmarks") {
>        return;
>      }
> +    if (subject.failed) {

In a previous comment you mentioned that "We'll restart the whole repair response process the next sync", so ISTM this observer may never be removed?

::: services/sync/modules/bookmark_repair.js:710
(Diff revision 4)
>        request: this._currentState.request.request,
>        collection: "bookmarks",
>        clientID: clientsEngine.localID,
>        flowID,
>        ids: this._currentState.ids,
> -    }
> +    };

surprised that either eslint wasn't complaining about this before, or isn't complaining about it now!

::: services/sync/tests/unit/test_bookmark_engine.js
(Diff revision 4)
>  
>      let record5 = await store.createRecord(item5GUID);
>      equal(record5.dateAdded, item5LastModified * 1000,
>            "If no dateAdded is provided, lastModified should be used (even if it's in the future)");
>  
> -    let item6WBO = JSON.parse(JSON.parse(collection._wbos[item6GUID].payload).ciphertext);

I understand this test would require changes, but don't understand why it can be removed?
Attachment #8855533 - Flags: review?(markh)
Attached patch new test (obsolete) — Splinter Review
Bugger - hit the wrong button. I think we need a new test for this error case (I've attached a WIP, which appears to confirm that not removing the observer is a problem), but you also mentioned other issues this should fix (which I'm running out of time to find), and I think we need tests for them - in general, it seems this has reduced test coverage which seems the wrong direction :)
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review149914

I'd like to have another look with Mark's feedback addressed. Thanks!

::: services/sync/modules/engines.js:779
(Diff revision 4)
> +  //   about the record, and write a tombstone for it anyway -- e.g. in the case
> +  //   of records that should exist locally, but should never be uploaded to the
> +  //   server (note that not all sync engines support tombstones)
> +  //
> +  // The difference between this and a "normal" upload is that these records
> +  // are only tracked in memory, and if the upload attempt fails (shutdown,

The second part is no longer true, right? They're uploaded in the same batch as strong records?

::: services/sync/modules/engines.js:1648
(Diff revision 4)
>  
>          counts.failed += failed.length;
>  
>          for (let id of successful) {
>            this._modified.delete(id);
> +          this._needWeakUpload.delete(id);

In case a record that we've already flagged for strong upload appears in the weak upload set?
Attachment #8855533 - Flags: review?(kit)
(In reply to Mark Hammond [:markh] from comment #19)
> Created attachment 8875616 [details] [diff] [review]
> new test
> 
> Bugger - hit the wrong button. I think we need a new test for this error
> case (I've attached a WIP, which appears to confirm that not removing the
> observer is a problem), 

Fixed, I hadn't realized that repair() would be called every sync until we either explicitly failed or finished. (Which makes perfect sense, since we haven't removed the command)

> but you also mentioned other issues this should fix
> (which I'm running out of time to find), and I think we need tests for them

This (there's really only one issue) is covered by your test -- if we failed to commit a bookmarks batch (due to some failure) we would previously report that we did upload all the items. (To be explicit, this is covered by the `checkRecordedEvents([]);`).

> - in general, it seems this has reduced test coverage which seems the wrong
> direction :)

I think most of this is because we are relaxing the guarantees made by the previous system. E.g. there are fewer tests, but most of those are were testing the parts of the code we decided to delete (specifically, that we wouldn't upload records that have changed locally). We could have kept that test code, but it would require the complexity that was present in earlier versions of this patch (some of it, at least).
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review149914

> The second part is no longer true, right? They're uploaded in the same batch as strong records?

It wasn't really true. Now it's true in that we don't persist the map past the sync. I've made the comment clearer.

> In case a record that we've already flagged for strong upload appears in the weak upload set?

This part was actually unnecessary now.
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review149880

> I understand this test would require changes, but don't understand why it can be removed?

We no longer make the guarantee that we don't upload records that have changed locally. The purpose of buildWeakReuploadMap was to support that guaratee, and because we don't make it anymore, there's nothing to test.
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review163866

LGTM. So now, the only difference between strong and weak uploads is that weak IDs are tracked in memory, correct? They're not uploaded in a separate batch, and a conflict would cause us to abort the batch?

::: services/sync/modules/engines.js:1684
(Diff revision 6)
>            ok = true;
>          } catch (ex) {
>            this._log.warn("Error creating record", ex);
>            ++counts.failed;
> -          if (Async.isShutdownException(ex) || !this.allowSkippedRecord) {
> +          if (!this.allowSkippedRecord) {
> +            // Don't bother for shutdown errors

This will still notify if `Async.isShutdownException`, IIUC. Did you mean to skip notifying in that case?
Attachment #8855533 - Flags: review?(kit) → review+
Comment on attachment 8855533 [details]
Bug 1337978 - Unify the multiple notions of 'weak upload' in sync.

https://reviewboard.mozilla.org/r/127366/#review164960

Awesome, thanks! r=me with comments addressed.

::: services/sync/modules/bookmark_repair.js:731
(Diff revision 6)
>      }
>      this._currentState = null;
>    }
>  
>    async _abortRepair(request, rawCommand, why) {
> +    Svc.Obs.remove("weave:engine:sync:uploaded", this.onUploaded, this);

why this change? There's only 1 `_abortRepair` call I can see and it looks impossible that it would have added the observer.

(FWIW, I also opened bug 1382913, as currently failure to remove an observer is silent, which I think is bad)

::: services/sync/modules/engines.js:1684
(Diff revision 6)
>            ok = true;
>          } catch (ex) {
>            this._log.warn("Error creating record", ex);
>            ++counts.failed;
> -          if (Async.isShutdownException(ex) || !this.allowSkippedRecord) {
> +          if (!this.allowSkippedRecord) {
> +            // Don't bother for shutdown errors

Yeah, ISTM we should probably skip notification here.

::: services/sync/modules/engines/bookmarks.js
(Diff revision 6)
>  };
>  
>  class BookmarksChangeset extends Changeset {
>    constructor() {
>      super();
> -    // Weak changes are part of the changeset, but don't bump the change

can we just remove the constructor completely? (Maybe not - I'm too lazy to google/test if this is needed when empty)

::: services/sync/tests/unit/test_bookmark_repair_responder.js:609
(Diff revision 6)
> +  ])
> +
> +  equal(numFailures, 1);
> +  equal(numSuccesses, 1);
> +
> +  Svc.Obs.remove("weave:engine:sync:uploaded", onUploaded);

oops - this came from me, but we need a trailing |this| arg here (another reason why I think bug 1382913 is important - this observer silently failed to be removed). I only noticed this because I forced failure when the observer wasn't found, and this triggered it.
Attachment #8855533 - Flags: review?(markh) → review+
Attachment #8875616 - Attachment is obsolete: true
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e631feaa72e9
Unify the multiple notions of 'weak upload' in sync. r=kitcambridge,markh
https://hg.mozilla.org/mozilla-central/rev/e631feaa72e9
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
You need to log in before you can comment on or make changes to this bug.