Closed Bug 1346438 Opened 5 years ago Closed 5 years ago

Use 'X-If-Unmodified-Since' header when uploading meta/global and crypto/keys to avoid race condition

Categories

(Firefox for Android Graveyard :: Android Sync, enhancement, P1)

enhancement

Tracking

(firefox55 fixed)

RESOLVED FIXED
Firefox 55
Tracking Status
firefox55 --- fixed

People

(Reporter: Grisha, Assigned: Grisha)

References

Details

Attachments

(2 files)

Assignee: nobody → gkruglov
Status: NEW → ASSIGNED
Priority: -- → P1
We upload m/g and c/k during a "freshStart". Those happen in response to:
- 404 when fetching either m/g or c/k
- missing storage version in fetched m/g
- outdated storage version in fetched m/g
- missing sync ID in fetched m/g

During a "freshStart", we first wipe the server, then upload m/g, then upload c/k. It seems sufficient here to specify XIUS=0, which will ensure that no other client can upload m/g or c/k before us.
Additionally, we also upload m/g:
- at the end of sync, if necessary
- after aborting sync for a non-http-failure reason

In those cases XIUS should be set to m/g's last-modified timestamp we received from info/collections.
Comment on attachment 8847376 [details]
Bug 1346438 - Specify X-I-U-S header value as 0 when uploading crypto/keys

https://reviewboard.mozilla.org/r/120344/#review122314

Are you confident you understand what happens on a 412?  Can you explain to me what happens?

I don't know if this runs in automation right now, but there are tests for much of this meta/global handling in https://dxr.mozilla.org/mozilla-central/source/mobile/android/tests/background/junit4/src/org/mozilla/gecko/sync/stage/test/TestFetchMetaGlobalStage.java?q=path%3A%2Ajava+testfresh&redirect_type=single#355.  I'd like to see if you can add a 412 test to assert what you expect.
Comment on attachment 8847375 [details]
Bug 1346438 - Specify X-I-U-S header value while uploading meta/global

https://reviewboard.mozilla.org/r/120342/#review122318

I've suggested a better approach, I think.  rnewman may have additional comments, or might do the final review instead.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java:414
(Diff revision 1)
>      Runnable doUpload = new Runnable() {
>        @Override
>        public void run() {
> -        config.metaGlobal.upload(new MetaGlobalDelegate() {
> +        // During regular m/g upload, set X-I-U-S to the last-modified value of m/g in info/collections,
> +        // to ensure we catch concurrent modifications by other clients. See Bug 1346438.
> +        config.metaGlobal.upload(config.infoCollections.getTimestamp("meta"), new MetaGlobalDelegate() {

This will silently coerce `null` to an NPE if for some reason `info/collections` does not include `meta/global`.  I think you should be defensive here and either:
- value check `null` and use 0
- accept `Long` and handle it in the `MetaGlobal.upload`.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/net/SyncStorageRequest.java:63
(Diff revision 1)
>        return SERVER_ERROR_MESSAGES.get(body);
>      }
>      return body;
>    }
>  
> +  private String ifUnmodifiedSinceTimestamp;

Whatever else happens here, this should be a `long` (or `Long`, if you want it to be optional).

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/net/SyncStorageRequest.java:161
(Diff revision 1)
>        this.request.delegate.handleRequestError(e);
>      }
>  
>      @Override
>      public void addHeaders(HttpRequestBase request, DefaultHttpClient client) {
> +      final String requestIfUnmodifiedSince = this.request.ifUnmodifiedSinceTimestamp;

Yes, things are terribly awkward in this code, but this solution is even more awkward.  Why not make `MetaGlobal` not be its own delegate -- or produce a delegate that wraps `this` if you must -- and have the delegate own the X-I-U-S, as it should?  This fall-through is a dicy approach.
Attachment #8847375 - Flags: review-
Nick's Comment 6 makes sense to me, so I'll await a new patch before reviewing.

(Also I should be asleep.)
Comment on attachment 8847375 [details]
Bug 1346438 - Specify X-I-U-S header value while uploading meta/global

https://reviewboard.mozilla.org/r/120342/#review122318

> Yes, things are terribly awkward in this code, but this solution is even more awkward.  Why not make `MetaGlobal` not be its own delegate -- or produce a delegate that wraps `this` if you must -- and have the delegate own the X-I-U-S, as it should?  This fall-through is a dicy approach.

Can you elaborate on why you think this particular thing is dicy?

To me it seems _less_ dicy than having delegate own some of the headers, which comes with its own set of interesting properties - namely, it makes it harder to reason about how your request will actually look like, contents+header, since you're populating those in different places, at different time, and perhaps from different threads. Having to jump through additional hoops as you suggest instead of simply setting xius header at the same as constructing the rest of the payload is strange to me.
Comment on attachment 8847376 [details]
Bug 1346438 - Specify X-I-U-S header value as 0 when uploading crypto/keys

https://reviewboard.mozilla.org/r/120344/#review122964

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java:788
(Diff revision 1)
>              if (keys == null) {
>                Logger.warn(LOG_TAG, "Got null keys from generateNewKeys; failing fresh start.");
>                freshStartDelegate.onFreshStartFailed(null);
>              }
>  
> -            // Upload new keys.
> +            // Upload new keys. Implementation will set X-I-U-S to 0, as per Bug 1346438.

I would prefer we pass `0` as an argument to `uploadKeys`; it's strictly more general, it matches what we do with `meta/global` just above, and this is the spot where you explain why we're doing it!

Also probably worth fleshing out that comment: we're using `0` because we're in the process of populating a server that we just wiped, so we expect there to be no keys on the server yet.
Attachment #8847376 - Flags: review?(rnewman) → review-
(In reply to Nick Alexander :nalexander from comment #5)
> Are you confident you understand what happens on a 412?  Can you explain to
> me what happens?

There are a few possibilities here. For meta/global, we may receive a 412 during:
1) fresh start
2) upload of changed meta after a successful sync
3) upload of changed meta after aborted sync

If (2) or (3) happen, this might signify that significant changes happened to some/all of our storage. Requesting an immediate re-sync seems like the thing to do.

If (1) happens, I think that means one of two things, and I'm not sure we can tell them apart:
- our wipe failed, even though it was reported as success. This seems very unlikely.
- somehow another client uploaded a meta/global before us. This seems "unlikely but possible". It seems that requesting an immediate re-sync is useful in this case?

In either case, we need to ensure there's no chance of getting sync adapter into a loop - sync, 412, sync, 412... Is it possible that we'll keep getting hit by 412s time and time again for this particular collection? To me that scenarios seems unlikely.
Heh, seems like I've accidentally broken sync by setting X-I-U-S on meta/global fetch as well.
That's better. But, there's an unnecessary follow-up sync being requested; I'll push an update.
Comment on attachment 8847375 [details]
Bug 1346438 - Specify X-I-U-S header value while uploading meta/global

https://reviewboard.mozilla.org/r/120342/#review124032

Man, that test code just exploded in size :)  I'm OK with this if you are.

::: commit-message-cef93:11
(Diff revision 3)
> +- when it was modified after an aborted sync
> +
> +Use X-I-U-S header to assert what we believe about meta/global's presence (during freshStart)
> +and last-modified timestamp (in all other cases).
> +
> +We might encounter a concurrent modification condition, manifesting as a 412 error:

nit: ... as a 412 error.  If we see such an error:
- on fresh start, we restart ...
- on regular upload, we ...

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/sync/FxAccountSyncAdapter.java:617
(Diff revision 3)
>  
> +    lastSyncRealtimeMillis = SystemClock.elapsedRealtime();
> +
> +    // We got to this point without being offered a result, and so it's unwise to proceed with
> +    // trying to sync stages again. Nothing else we can do but log an error.
> +    if (offeredResult == null) {

Looking at the `latch.offer` calls, this seems like unnecessary paranoia.  Can you get `null` here?  How does that happen?

I note you're not checking if the offered result is `Result.success`; is that intentional?

nit: consider renaming `latch` to `blockingQueue` or something as a post/review commit, 'cuz it's not a `CountdownLatch` anymore :)

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java:749
(Diff revision 3)
> +          globalSession.abort(e, "Fresh start failed.");
> +          return;
> +        }
> +
> +        if (((HTTPFailureException) e).response.getStatusCode() != 412) {
> -        globalSession.abort(e, "Fresh start failed.");
> +          globalSession.abort(e, "Fresh start failed.");

nit: failed with non-412 status code."  Just to differentiate from the above, although I bet the exception gets logged containing this information as well.
Attachment #8847375 - Flags: review?(nalexander) → review+
Comment on attachment 8847376 [details]
Bug 1346438 - Specify X-I-U-S header value as 0 when uploading crypto/keys

https://reviewboard.mozilla.org/r/120344/#review124044

You might as well fold this into the earlier patch, since it did all the work (and includes the tests, I think).
Attachment #8847376 - Flags: review?(nalexander) → review+
Comment on attachment 8847375 [details]
Bug 1346438 - Specify X-I-U-S header value while uploading meta/global

https://reviewboard.mozilla.org/r/120342/#review124032

> Looking at the `latch.offer` calls, this seems like unnecessary paranoia.  Can you get `null` here?  How does that happen?
> 
> I note you're not checking if the offered result is `Result.success`; is that intentional?
> 
> nit: consider renaming `latch` to `blockingQueue` or something as a post/review commit, 'cuz it's not a `CountdownLatch` anymore :)

This seems to really cover the "something threw an exception before we got to the `offer` call" case. So we could probably also just return in the catch statement above... Meh.

Regarding checking for Success - yes, that's intentional. One of the prime examples of "stage needs to be re-synced" is failing with 412.
(In reply to :Grisha Kruglov from comment #16)
> That's better. But, there's an unnecessary follow-up sync being requested;
> I'll push an update.

An `if` statement was checking the wrong thing. All good now.
Pushed by gkruglov@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3250c8e72a98
Specify X-I-U-S header value while uploading meta/global r=nalexander
https://hg.mozilla.org/integration/autoland/rev/0db18c13ea0a
Specify X-I-U-S header value as 0 when uploading crypto/keys r=nalexander
https://hg.mozilla.org/mozilla-central/rev/3250c8e72a98
https://hg.mozilla.org/mozilla-central/rev/0db18c13ea0a
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Attachment #8847375 - Flags: review?(rnewman)
Attachment #8847376 - Flags: review?(rnewman)
Product: Android Background Services → Firefox for Android
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.