Closed Bug 1253111 Opened 4 years ago Closed 3 years ago

[Android] Upload records atomically

Categories

(Firefox for Android :: Android Sync, defect, P1)

All
Android
defect

Tracking


RESOLVED FIXED
Firefox 51
Iteration:
1.3
Tracking Status
relnote-firefox --- 51+
firefox51 --- fixed

People

(Reporter: rnewman, Assigned: Grisha)

References

Details

(Whiteboard: [sync-atomic][MobileAS])

Attachments

(2 files, 1 obsolete file)

This is the Android Sync equivalent of Bug 1253051. See that bug for details.
See Also: → 1253112
Priority: -- → P1
Whiteboard: [data-integrity]
Assignee: nobody → gkruglov
An early WIP (mostly TODOs in hopefully the right places); Richard, is this roughly on the right track?
Flags: needinfo?(rnewman)
https://reviewboard.mozilla.org/r/66264/#review63642

Yes, this looks like a good start.

Note that it's not OK to call upload delegate methods until the batch is committed, so I would encourage you to introduce another layer of indirection here -- a batching uploader that handles the batching bookkeeping and invokes delegate methods at the right time. Use or don't use the batching version according to info/configuration, but calling code doesn't really care -- it just streams in records and gets told when they made it to the server.
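The "layer of indirection" being suggested might look something like the following minimal sketch. All names here are illustrative, not the actual Sync API: the point is only that records are buffered and the store delegate is notified strictly after the batch commit succeeds.

```java
import java.util.ArrayList;
import java.util.List;

public class UploaderSketch {
    interface StoreDelegate { void onRecordStoreSucceeded(String guid); }

    static class BatchingUploader {
        private final StoreDelegate delegate;
        private final List<String> pendingGuids = new ArrayList<>();

        BatchingUploader(StoreDelegate delegate) { this.delegate = delegate; }

        // Buffer the record; do NOT notify the delegate yet.
        void process(String guid) { pendingGuids.add(guid); }

        // Only once the batch commit succeeds on the server is it safe to
        // tell the delegate that the records were stored.
        void onBatchCommitted() {
            for (String guid : pendingGuids) {
                delegate.onRecordStoreSucceeded(guid);
            }
            pendingGuids.clear();
        }
    }
}
```

Calling code just streams records in via `process` and gets told when they made it to the server, regardless of whether batching is in use underneath.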

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoConfiguration.java:1
(Diff revision 1)
> +package org.mozilla.gecko.sync;

License block.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/stage/FetchInfoConfigurationStage.java:1
(Diff revision 1)
> +package org.mozilla.gecko.sync.stage;

License block.
Flags: needinfo?(rnewman)
Attachment #8773569 - Flags: feedback+
You might also want to take a skim over the discussion in the PR for Bug 1253112.
Status: NEW → ASSIGNED
Summary: Upload records atomically → [Android] Upload records atomically
Whiteboard: [data-integrity] → [data-integrity][MobileAS s1.1]
Whiteboard: [data-integrity][MobileAS s1.1] → [sync-atomic][MobileAS s1.1]
Comment on attachment 8773569 [details]
Bug 1253111 - Part 1: Introduce new sync stage to handle info/configuration

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66264/diff/1-2/
Attachment #8773569 - Attachment description: Bug 1253111: WIP for atomic uploads f=rnewman → Bug 1253111: WIP atomic uploads - refactor serverRepositorySession into two uploaders, and implement batching uploader f=rnewman
Attachment #8773569 - Flags: feedback+
I think these patches should be getting somewhat close. Most (all?) of the major pieces should be in place. I still need to ensure this actually works as expected against servers, that unsafe mode didn't regress after the refactor, that comments are in place, that tests are running, etc.
Flags: needinfo?(rnewman)
I'm getting a patch series ready for an actual review today, that should be easier to read through than one big diff.
Review commit: https://reviewboard.mozilla.org/r/68122/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/68122/
Attachment #8773569 - Attachment description: Bug 1253111: WIP atomic uploads - refactor serverRepositorySession into two uploaders, and implement batching uploader f=rnewman → Bug 1253111 - WIP Part 1: Introduce new sync stage to handle info/configuration
Attachment #8776224 - Flags: review?(rnewman)
Attachment #8776225 - Flags: review?(rnewman)
Attachment #8773569 - Flags: review?(rnewman)
Comment on attachment 8773569 [details]
Bug 1253111 - Part 1: Introduce new sync stage to handle info/configuration

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66264/diff/2-3/
Whiteboard: [sync-atomic][MobileAS s1.1] → [sync-atomic][MobileAS s1.2]
Comment on attachment 8773569 [details]
Bug 1253111 - Part 1: Introduce new sync stage to handle info/configuration

https://reviewboard.mozilla.org/r/66264/#review67374

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java:178
(Diff revision 3)
>    }
>  
>    protected void prepareStages() {
>      Map<Stage, GlobalSyncStage> stages = new EnumMap<Stage, GlobalSyncStage>(Stage.class);
>  
> +    // TODO easy/clean we can retrofit a storage-server-version-dependent list of stages?

Either don't bother or file a bug :)

We don't support Sync 1.1 now…

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoConfiguration.java:15
(Diff revision 3)
> +
> +import java.util.HashMap;
> +
> +/**
> + * Wraps and provides access to configuration data returned from info/configuration.
> + * WIP docs: https://github.com/mozilla-services/docs/pull/60/files

This was merged. You can now refer to

https://docs.services.mozilla.com/storage/apis-1.5.html#general-info

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoConfiguration.java:50
(Diff revision 3)
> +    private static final int DEFAULT_MAX_POST_BYTES = 1048576;
> +    private static final int DEFAULT_MAX_TOTAL_RECORDS = 10000;
> +    private static final int DEFAULT_MAX_TOTAL_BYTES = 104857600;
> +
> +    private final boolean requestLimitDataAvailable;
> +    private final HashMap<String, Integer> requestLimits = new HashMap<>(5);

Consider `SimpleArrayMap`, or just make these instance fields!

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoConfiguration.java:68
(Diff revision 3)
> +
> +    public boolean isBatchingModeSupported() {
> +        return requestLimitDataAvailable;
> +    }
> +
> +    public Integer getRequestLimitValue(String key) {

Seems like having known fields is an easy win for readability, type-broadening, and efficiency:

  final InfoConfiguration ic = ...;
  long limitBytes = ic.maxPostBytes;
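A fuller sketch of that "known fields" suggestion, with typed final fields parsed once from info/configuration. Field names and defaults are taken from the WIP patch; the JSON type is simplified to a plain `Map` for illustration.

```java
import java.util.Map;

public class InfoConfigurationSketch {
    public static final long DEFAULT_MAX_POST_BYTES = 1048576;   // 1 MB
    public static final long DEFAULT_MAX_POST_RECORDS = 100;

    public final long maxPostBytes;
    public final long maxPostRecords;

    public InfoConfigurationSketch(Map<String, Long> json) {
        // Parse once; callers then read plain final fields, no map lookups.
        maxPostBytes = json.getOrDefault("max_post_bytes", DEFAULT_MAX_POST_BYTES);
        maxPostRecords = json.getOrDefault("max_post_records", DEFAULT_MAX_POST_RECORDS);
    }
}
```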

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/stage/FetchInfoConfigurationStage.java:43
(Diff revision 3)
> +            if (response.getStatusCode() == 404) {
> +                session.config.infoConfiguration = new InfoConfiguration();
> +                session.advance();
> +
> +            // Handle all other failures upstream.
> +            } else {

Prefer early returns.
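An early-return version of that 404 handling might look like this. The session and stage types are stubbed out; only the control flow is the point.

```java
public class EarlyReturnExample {
    interface Session {
        void advance();
        void abort(String reason);
    }

    static void handleFailure(int statusCode, Session session) {
        if (statusCode == 404) {
            // Old server: no info/configuration. Fall back to defaults and move on.
            session.advance();
            return;
        }
        // Handle all other failures upstream.
        session.abort("info/configuration fetch failed: " + statusCode);
    }
}
```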
Attachment #8773569 - Flags: review?(rnewman) → review+
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review67380

I'm assuming this is a conservative refactor. But check your lock safety…

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/MayUploadProvider.java:7
(Diff revision 1)
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +package org.mozilla.gecko.sync.repositories.uploaders;
> +
> +// TODO name this better

…

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:33
(Diff revision 1)
> +    private static byte[] recordSeparator;
> +    private static byte[] recordsEnd;
> +
> +    static {
> +        try {
> +            recordsStart    = "[\n".getBytes("UTF-8");

`java.nio.charset.StandardCharsets.UTF_8`
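A minimal sketch of that change: `StandardCharsets.UTF_8` is guaranteed to exist on every JVM, so the checked `UnsupportedEncodingException` dance disappears and the fields can be `final`.

```java
import java.nio.charset.StandardCharsets;

public class RecordSeparators {
    // No try/catch needed: StandardCharsets.UTF_8 never throws.
    public static final byte[] RECORDS_START    = "[\n".getBytes(StandardCharsets.UTF_8);
    public static final byte[] RECORD_SEPARATOR = ",\n".getBytes(StandardCharsets.UTF_8);
    public static final byte[] RECORDS_END      = "\n]\n".getBytes(StandardCharsets.UTF_8);
}
```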

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:54
(Diff revision 1)
> +                                final URI postUri,
> +                                final SyncStorageRequestDelegate uploadDelegate,
> +                                ArrayList<byte[]> outgoing,
> +                                long byteCount) {
> +        this.mayUploadProvider = mayUploadProvider;
> +        this.postUri = postUri;

I have a preference for `postURI`.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploader.java:31
(Diff revision 1)
> +    protected final Server11RepositorySession repositorySession;
> +
> +    protected AtomicLong uploadTimestamp = new AtomicLong(0);
> +
> +    // Comma, newline.
> +    protected static final int PER_RECORD_OVERHEAD = 2;

`PER_RECORD_OVERHEAD_BYTE_COUNT`

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/UnsafeBatchingUploader.java:33
(Diff revision 1)
> + *
> + * A call to processDone would also trigger a flush.
> + *
> + * Thresholds we consider are either number of records or their total byte size (see constants below).
> + */
> +public class UnsafeBatchingUploader extends RecordUploader {

This name seems misleading. How about just NonAtomicUploader? (And make the LOG_TAG match.)

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/UnsafeBatchingUploader.java:73
(Diff revision 1)
> +    }
> +
> +    // Asynchronously upload records.
> +    // Must be locked!
> +    private void flush() {
> +        if (recordsBuffer.size() > 0) {

Firstly, `!recordsBuffer.isEmpty()`.

Secondly, you forgot to take the lock!
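Both points combined might look like the following sketch. The class and field names are illustrative, not the actual Sync uploader: `flush()` synchronizes on the same lock as the code that appends to the buffer, and uses `isEmpty()`.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedUploader {
    private final Object bufferLock = new Object();
    private final List<byte[]> recordsBuffer = new ArrayList<>();

    public void enqueue(byte[] record) {
        synchronized (bufferLock) {
            recordsBuffer.add(record);
        }
    }

    // Drain the buffer for upload; must hold the same lock as enqueue().
    public List<byte[]> flush() {
        synchronized (bufferLock) {
            if (recordsBuffer.isEmpty()) {
                return new ArrayList<>();
            }
            List<byte[]> outgoing = new ArrayList<>(recordsBuffer);
            recordsBuffer.clear();
            return outgoing;
        }
    }
}
```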
Attachment #8776224 - Flags: review?(rnewman) → review+
Comment on attachment 8776225 [details]
Bug 1253111 - WIP Part 3: Introduce BatchingAtomicUploader and use it when server supports batching mode

https://reviewboard.mozilla.org/r/68124/#review67394

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingAtomicUploader.java:31
(Diff revision 1)
> +/**
> + * Uploader which implements batching semantics introduced in Sync 1.5.
> + *
> + * Records are uploaded in batches according to limits defined in InfoConfiguration object.
> + * Batch ID is returned once first post succeeds, and is maintained across uploads, to ensure our
> + * uploads are grouped together. Last batch commits the series (with commit=true GET param).

You haven't addressed the case where the upload is larger than a batch. Your solution should include `MAX_TOTAL_*`, and probably involves 'bundles' -- see <https://github.com/mozilla/firefox-ios/pull/2014>.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingAtomicUploader.java:66
(Diff revision 1)
> +        flush(batchId, lastBatch);
> +    }
> +
> +    @Override
> +    public boolean shouldFlush(final int delta, final int byteCount, final ArrayList<byte[]> recordsBuffer) {
> +        return (delta + byteCount > configuration.getRequestLimitValue(InfoConfiguration.MAX_POST_BYTES)) ||

`delta` does not include record separators, so you're missing a `+ 2` here… unless this is the only record and it'll fill the batch.

You can avoid doing some edge-case checks because the max record size will always be smaller than the max POST size, fortunately.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingAtomicUploader.java:67
(Diff revision 1)
> +    }
> +
> +    @Override
> +    public boolean shouldFlush(final int delta, final int byteCount, final ArrayList<byte[]> recordsBuffer) {
> +        return (delta + byteCount > configuration.getRequestLimitValue(InfoConfiguration.MAX_POST_BYTES)) ||
> +                (recordsBuffer.size() >= configuration.getRequestLimitValue(InfoConfiguration.MAX_POST_RECORDS));

Keep a record count alongside the byte count.
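Both review points together might be sketched like this: add the separator overhead to `delta` before comparing against the byte limit, and track a record count alongside the byte count. Limits and names here are illustrative, not the patch's actual code.

```java
public class FlushPolicy {
    // Comma, newline.
    private static final int PER_RECORD_OVERHEAD_BYTE_COUNT = 2;

    private final long maxPostBytes;
    private final long maxPostRecords;

    public FlushPolicy(long maxPostBytes, long maxPostRecords) {
        this.maxPostBytes = maxPostBytes;
        this.maxPostRecords = maxPostRecords;
    }

    // recordDelta is the serialized record alone; account for its separator too.
    public boolean shouldFlush(long recordDelta, long bufferedBytes, long bufferedRecords) {
        return (recordDelta + PER_RECORD_OVERHEAD_BYTE_COUNT + bufferedBytes > maxPostBytes)
                || (bufferedRecords + 1 > maxPostRecords);
    }
}
```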

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingAtomicUploaderDelegate.java:119
(Diff revision 1)
> +        } catch (NonArrayJSONException e) {
> +            handleRequestError(e);
> +            return;
> +        }
> +
> +        if (success != null && success.size() > 0) {

isEmpty (throughout)
Attachment #8776225 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #13)
> You haven't addressed the case where the upload is larger than a batch. Your
> solution should include `MAX_TOTAL_*`, and probably involves 'bundles' --
> see <https://github.com/mozilla/firefox-ios/pull/2014>.

The current iOS PR splits up an upload into "batches" which consist of "payloads" (individual bunches of records which get POSTed); if the total upload is larger than what one batch allows for, then multiple batches are used.

However, once we go past one batch per upload, wouldn't this cease to be an "atomic" uploader? If I understand the server side of this correctly, once a batch is committed, other clients will be able to see the changes right away. Until the last batch commits (and it might not), other clients will see partial changes - which is essentially exactly the same problem as we have now, only with (much) better safety margins.

Unless we ensure that per-batch limits are high enough, and refuse to upload if we can't fit data into one batch, are we essentially accepting a risk of data corruption, but for a hopefully lower percentage of users?

If this work is coupled with a two-phase application which checks for consistency of the remote state before applying it (iOS already does this for bookmarks, but Android is in early WIP stage - not sure about desktop?), then I suppose it's not necessarily a problem for structured record types.
(In reply to :Grisha Kruglov from comment #14)

> However, once we go past one batch per upload, wouldn't this cease to be an
> "atomic" uploader?

Correct.

> Unless we ensure that per-batch limits are high enough, and refuse to upload
> if we can't fit data into one batch, are we essentially accepting a risk of
> data corruption, but for a hopefully lower percentage of users?

Correct.

To be clear, the limits are high. But we have three choices:

* The current behavior: keep sending batches until we get an error from the server. This is the worst possible option: it consumes maximum bandwidth per sync and never succeeds.
* Refuse to upload at all if we can't fit in one commit. This requires checking in advance. Offers a poor user experience.
* Do our best and hope. This is what we do now without atomic uploads, and it works pretty well (for non-bookmark collections, at least).

 
> If this work is coupled with a two-phase application which checks for
> consistency of the remote state before applying it (iOS already does this
> for bookmarks, but Android is in early WIP stage - not sure about desktop?),
> then I suppose it's not necessarily a problem for structured record types.

Probably 99.9% of users have fewer bookmarks than the limit. Probably 99% of those will never need to upload *from* their mobile device.
Flags: needinfo?(rnewman)
Attachment #8776225 - Attachment is obsolete: true
Attachment #8776224 - Flags: review+ → review?(rnewman)
Comment on attachment 8773569 [details]
Bug 1253111 - Part 1: Introduce new sync stage to handle info/configuration

https://reviewboard.mozilla.org/r/66264/#review69204
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review69208
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review69464

First pass. More later today.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:42
(Diff revision 3)
> + * Once we go past using one batch this uploader is no longer "atomic". Partial state is exposed
> + * to other clients after our first batch is committed and before our last batch is committed.
> + * However, our per-batch limits are high, X-I-U-S mechanics help protect downloading clients
> + * (as long as they implement X-I-U-S) with 412 error codes in case of interleaving upload and download,
> + * and most mobile clients will not be uploading large-enough amounts of data (especially structured
> + * data, such as bookmarks).

Nice commentary.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:65
(Diff revision 3)
> +        // Access must by synchronized on payloadLock.
> +        private long byteCount = PER_PAYLOAD_OVERHEAD_BYTE_COUNT;
> +        private long recordCount = 0;
> +
> +        // Accessed by synchronously running threads.
> +        private final ArrayList<String> successRecordGuids = new ArrayList<>();

If this is accessed by more than one thread, you either need to synchronize or to use an appropriate data structure. They don't need to be racing threads in order for this to be a concern.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:83
(Diff revision 3)
> +        }
> +
> +        protected boolean canFitRecord(long recordDelta, final InfoConfiguration configuration) {
> +            final boolean canFit;
> +
> +            synchronized (payloadLock) {

Just return here and remove `canFit` altogether. The bytecode is the same.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:184
(Diff revision 3)
> +        protected final ArrayList<String> recordGuidsBuffer = new ArrayList<>();
> +
> +        protected boolean canFitRecord(final Record record, final InfoConfiguration configuration) {
> +            final boolean canFit;
> +
> +            synchronized (payloadLock) {

Same remark. I don't think this trick does anything.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:206
(Diff revision 3)
> +            }
> +        }
> +
> +        protected long getRecordByteDelta(final Record record) {
> +            final long delta;
> +            synchronized (payloadLock) {

Same.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:230
(Diff revision 3)
> +    private final InfoConfiguration configuration;
> +    private final Uri collectionUri;
> +
> +    private volatile boolean recordUploadFailed = false;
> +
> +    // Want to ensure that any updates to batchMeta (token, last-modified) are visible to all workers right away.

I'm not sure this comment makes sense. Did you mean to make this `volatile`? If so, are you sure that `volatile` applies to the members of an object, not just the reference?
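The distinction being made here: `volatile` on a field guarantees visibility of writes to the *reference*, not of mutations to the object it points to. A hypothetical sketch (names are not the actual patch's):

```java
import java.util.concurrent.atomic.AtomicLong;

public class VolatileScope {
    static class BatchMeta {
        // A plain write to this member gets NO visibility guarantee from the
        // volatile reference below; it would need its own synchronization.
        long lastModified;

        // Safe across threads regardless: the member itself is atomic.
        final AtomicLong token = new AtomicLong();
    }

    // volatile only ensures other threads see the latest *reference*,
    // i.e. it helps when batchMeta is reassigned, not when it is mutated.
    volatile BatchMeta batchMeta = new BatchMeta();
}
```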
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review69496

Back to you, grisha. Nearly there!

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:185
(Diff revision 3)
> +
> +        protected boolean canFitRecord(final Record record, final InfoConfiguration configuration) {
> +            final boolean canFit;
> +
> +            synchronized (payloadLock) {
> +                canFit = (getRecordByteDelta(record) + byteCount <= configuration.maxPostBytes) &&

This is the most expensive way to do this. You're serializing the record to JSON… twice, once to figure out the size, and once to actually add it.

Don't do that. Work in byte[] as soon as you get to the front of the queue.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:268
(Diff revision 3)
> +    public void process(final Record record) {
> +        synchronized (payloadLock) {
> +            Logger.debug(LOG_TAG, "Processing a record with guid: " + record.guid);
> +
> +            final boolean canFitRecordIntoPayload = payload.canFitRecord(record, configuration);
> +            final long payloadByteDelta = payload.getRecordByteDelta(record);

Oh gosh, *three* JSONifications! Or is it four?! Do this *once*.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:353
(Diff revision 3)
> +        }
> +
> +        // All of the records have been committed, so let's notify our store delegate.
> +        if (considerRecordsCommitted) {
> +            for (String guid : batchMeta.successRecordGuids) {
> +                sessionStoreDelegate.onRecordStoreSucceeded(guid);

We don't have a [guid] method for this?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:396
(Diff revision 3)
> +        if (!isBatchingSupported) {
> +            expectLastModifiedToChange = true;
> +
> +        // In batching mode, we only expect Last-Modified to change when we commit a batch.
> +        } else {
> +            expectLastModifiedToChange = isCommit;

```
expectLastModifiedToChange = isCommit || !isBatchingSupported;
```

seems clearer to me than `if-else`.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:438
(Diff revision 3)
> +     * @return Normalized timestamp in milliseconds.
> +     */
> +    public static long getNormalizedTimestamp(final String logTag, final SyncStorageResponse response) {
> +        long normalizedTimestamp = -1;
> +        try {
> +            normalizedTimestamp = response.normalizedWeaveTimestamp();

I would prefer to use `X-Last-Modified` here, if you're willing to do the work. An uploader doesn't care about the current time on the server, it cares about the last modified time of the collection.

I hypothesize that, while these will usually be the same, there are situations in which they are not (e.g., after uploading a POST with no successful records).

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:463
(Diff revision 3)
> +        }
> +    }
> +
> +    protected static long getRecordSize(final Record record, boolean withOverhead) {
> +        long byteCount = record.toJSONBytes().length;
> +        // No additional record overhead is when payload just has one record.

Note that the buffer will contain one separator for each record (the 'separator' for the first record is `[`), plus the final `]`, so you don't need this logic tweak.

One record:

[ + record + ]

Two records:

[ + record1 + , + record2 + ]

= (b + 1) * N + 1

But this method should die anyway :)
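The arithmetic in that comment can be checked with a small helper (illustrative only, since the method is slated for removal): with the opening `[` counted as the first record's separator, each of N records of b bytes costs b + 1, plus one closing `]`, giving (b + 1) * N + 1.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PayloadSize {
    // Total bytes for [r1,r2,...,rN] with one-byte separators and brackets.
    public static long totalBytes(List<byte[]> records) {
        long count = 1; // closing ']'
        for (byte[] r : records) {
            count += r.length + 1; // one separator byte per record ('[' or ',')
        }
        return count;
    }
}
```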

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadUploadDelegate.java:107
(Diff revision 3)
> +        }
> +
> +        try {
> +            uploader.setLastModified(
> +                    // We just checked for header presence above, so this should be fine.
> +                    response.httpResponse().getFirstHeader(X_LAST_MODIFIED).getValue(),

You should make this a method on `SyncStorageResponse`, complete with any validation or conversion it needs.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadUploadDelegate.java:135
(Diff revision 3)
> +                    Logger.error(LOG_TAG, "Got exception parsing POST success guid.", e);
> +                    // Not much to be done.
> +                }
> +            }
> +        }
> +        // GC

Sufficiently smart compiler, eh?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:38
(Diff revision 3)
> +    public final static byte[] RECORD_SEPARATOR = ",\n".getBytes(StandardCharsets.UTF_8);
> +    public final static byte[] RECORDS_END = "\n]\n".getBytes(StandardCharsets.UTF_8);
> +
> +//    public final static byte[] RECORDS_START = StandardCharsets.UTF_8.encode("[\n").array();
> +//    public final static byte[] RECORD_SEPARATOR = StandardCharsets.UTF_8.encode(",\n").array();
> +//    public final static byte[] RECORDS_END = StandardCharsets.UTF_8.encode("\n]\n").array();

You can kill these now.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:101
(Diff revision 3)
> +            for (int i = 0; i < numberOfRecords; i++) {
> +                count += outgoing.get(i).length;
> +            }
> +
> +            // Account for separators between the records.
> +            // There's one less separator than there are records.

Not if you think of the opening [ as a separator. :0
Attachment #8776224 - Flags: review?(rnewman)
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review69702

Addressed a bunch of the comments, but not yet all. While testing this before pushing, I had an assertion thrown that the token changed midway through a batch - likely that's me screwing up threading and not the server's fault, so there will be more pushes.
Another pass on this. As far as I can tell, batching and non-batching modes are now working correctly in various conditions. Added some tests (around payload and batch limits, token, l-m header), more to come.
Note to self, I need to revisit some of the sanity checks and make sure we're not throwing runtime exceptions because data we're receiving is weird; these should be handled upstream as "sync errors".
Whiteboard: [sync-atomic][MobileAS s1.2] → [sync-atomic][MobileAS s1.3]
Priority: P1 → P2
Whiteboard: [sync-atomic][MobileAS s1.3] → [sync-atomic][MobileAS]
Note that since we don't know whether or not batching is supported until the first payload succeeds, the current way in which payloads and batches are composed could be non-ideal when the server doesn't support batching. Specifically, if a record arrives for processing and the batch is full, we will commit the current payload - but it's possible that it isn't as full as it could have been.

Example: current payload has 5 records, and "maxRecordsPerPayload" limit is 10. However, current batch is at 20/20 records, so we commit the half-empty payload. This is "the way" of the batching mode, but obviously non-optimal when batch limits don't apply.
(note that actual limits are much higher)

I feel that for the sake of simplicity, this is a fair trade-off. The alternative would be to do more bookkeeping around incoming records, and either reshuffle them between payloads/batches once we know that we can do batching, or somehow delay composing payloads/batches until after the first payload succeeds. Either way, this adds complexity which I don't feel is warranted to optimize for server behaviour which is shortly getting phased out.

A counter-argument to this is that non-batching uploads are prone to screwing up our data, so if we're forced to do non-batching uploads, we should be trying to "minimize damage", so to speak. It seems to me that breaking records down into the smallest number of payloads is strictly safer and less error-prone than having more, smaller payloads, and in certain cases the current batching logic won't achieve that. Worst case, the current logic might even post a payload consisting of just one or two records, and then proceed to post dozens more in subsequent payloads.

However, this patch does introduce some additional measure of safety to non-batching mode:
- Last-Modified headers are now used while processing uploads; this will help constrain certain concurrent-modification issues
- Limits have been increased two-fold. Payload limit used to be 50 records, and now the default (and likely what non-batching will use) is 100 records. This should help reduce our data corruption as well, especially for structured record types.

Another thing to consider is that our batch limits are high (I believe on the order of 10,000 records). So switching between batches should be a rare occurrence, and thus all of this is probably moot anyway for all but the edgier of cases.
> A counter argument to this is that non-batching uploads are prone
> to screwing up our data, so if we're forced to do non-batching
> uploads, we should be trying to "minimize damage", so to speak.

For a while we will be using this code with batching turned off, then gradually dial it up. We shouldn't be acting as if atomic upload support is the default just yet.

There might be one reasonable change: track whether or not we are using batching. If we know we're not -- that is, we either have no info/configuration or we didn't get a batch identifier -- then don't check the batch record count limits at all.

We'll then simply fill 100-record/1MB payloads over and over.

Another way of phrasing that: if batching is disabled, make the per-batch record count MAX_INT. We'll then never submit an individual payload just to avoid overflowing the batch limit.

Behavior around the edge cases of the 10,000-record batches, with batching enabled, I don't think is worth worrying about.
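That "MAX_INT" phrasing reduces to a one-liner (names here are illustrative, not the patch's):

```java
public class BatchLimits {
    // When batching is unsupported, treat the per-batch record limit as
    // unbounded, so payloads fill to 100 records / 1 MB and are never cut
    // short just to respect a batch boundary.
    public static long effectiveMaxBatchRecords(boolean batchingSupported,
                                                long serverMaxTotalRecords) {
        return batchingSupported ? serverMaxTotalRecords : Long.MAX_VALUE;
    }
}
```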
(In reply to Richard Newman [:rnewman] from comment #33)
> > A counter argument to this is that non-batching uploads are prone
> > to screwing up our data, so if we're forced to do non-batching
> > uploads, we should be trying to "minimize damage", so to speak.
> 
> For a while we will be using this code with batching turned off, then
> gradually dial it up. We shouldn't be acting as if atomic upload support is
> the default just yet.
> 
> There might be one reasonable change: track whether or not we are using
> batching. If we know we're not -- that is, we either have no
> info/configuration or we didn't get a batch identifier -- then don't check
> the batch record count limits at all.
> 
> We'll then simply fill 100-record/1MB payloads over and over.
> 
> Another way of phrasing that: if batching is disabled, make the per-batch
> record count MAX_INT. We'll then never submit an individual payload just to
> avoid overflowing the batch limit.

That is what I've been doing up to this point - check if we know batching isn't supported, and consider batches to be of unlimited size. I've "simplified this away", but I suppose it's a perfectly valid approach. I'll add that check back in; thankfully it's just one if statement.
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70776

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/Server11RepositorySession.java:161
(Diff revision 8)
> +    return serverRepository;
> +  }
> +
> +  @Override
> +  public void setStoreDelegate(RepositorySessionStoreDelegate delegate) {
> +    Logger.debug(LOG_TAG, "Setting store delegate to " + delegate);

No need for this.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchMeta.java:17
(Diff revision 8)
> +import org.mozilla.gecko.sync.repositories.uploaders.BatchingUploader.TokenModifiedException;
> +import org.mozilla.gecko.sync.repositories.uploaders.BatchingUploader.LastModifiedChangedUnexpectedly;
> +import org.mozilla.gecko.sync.repositories.uploaders.BatchingUploader.LastModifiedDidNotChange;
> +
> +import ch.boye.httpclientandroidlib.annotation.GuardedBy;
> +import ch.boye.httpclientandroidlib.annotation.ThreadSafe;

Are these the annotations you intend to use?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchMeta.java:43
(Diff revision 8)
> +
> +    public BatchMeta(Object payloadLock, long maxBytes, long maxRecords, Long collectionLastModified) {
> +        super(payloadLock, maxBytes, maxRecords);
> +
> +        if (collectionLastModified != null) {
> +            this.collectionLastModified = collectionLastModified.toString();

If you're storing it as a string, why not pass in the raw header value here?

::: mobile/android/tests/background/junit4/src/org/mozilla/android/sync/test/TestServer11RepositorySession.java:97
(Diff revision 8)
>    }
>  
>    private HTTPServerTestHelper data     = new HTTPServerTestHelper();
>  
> -  public class MockServer11RepositorySession extends Server11RepositorySession {
> -    public MockServer11RepositorySession(Repository repository) {
> +  // TODO structure of things is now different (UploadRunnable is outside of session, etc),
> +  // TODO so need to rewrite these:

Don't forget!
Attachment #8776224 - Flags: review?(rnewman)
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review69496

> We don't have a [guid] method for this?

Nope. Could add it in, but that just moves for loops around a bit. (...but isn't that what most of our work is anyway...)

> You should make this a method on `SyncStorageResponse`, complete with any validation or conversion it needs.

I moved a few things down to SyncStorageResponse and cleaned up SyncResponse a bit. I feel like doing more would be a bit off-topic for this work (I'm primarily concerned about negatively affecting other parts of sync), but let me know if you think otherwise.
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70776

> Are these the annotations you intend to use?

That is what we have available in the codebase; I figure if this lib gets removed, they aren't hard to re-add. Also, these annotations are primarily for documentation purposes.

> If you're storing it as a string, why not pass in the raw header value here?

Collection's L-M value is available to us as an integer via the InfoCollections object, so this seemed like the most straightforward way to get at it. The raw L-M header values received in payload responses are used, though (except at the very end, when we convert them to a long).

> Don't forget!

I'm considering dropping these (upload-related) tests entirely. Unit tests for the batching/payload-splitting stuff are in place now, and the remainder (payloadrunnable, batchingdelegate, some callbacks) could easily be tested in a similar fashion by mocking out certain bits and pieces.

I like the granularity of that approach, as well as the ability to drill down more easily into various edge cases. Mocking out a whole server exercises things well, but I feel it's more cumbersome to write good unit tests around. It's a good "integration" test for this stuff, though, so perhaps I'll add some simpler ones here, while largely focusing on unit testing.
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70818

Looks pretty good. Need to update tests. Comments inline.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchMeta.java:41
(Diff revision 10)
> +    // Accessed by synchronously running threads.
> +    @GuardedBy("accessLock") private final List<String> successRecordGuids = new ArrayList<>();
> +
> +    protected final String collectionLastModified;
> +
> +    public BatchMeta(Object payloadLock, long maxBytes, long maxRecords, Long collectionLastModified) {

If elsewhere you're treating `X-Last-Modified` as a string, and you're stringifying it here, then just pass a `String` and make your life easier.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchMeta.java:45
(Diff revision 10)
> +
> +    public BatchMeta(Object payloadLock, long maxBytes, long maxRecords, Long collectionLastModified) {
> +        super(payloadLock, maxBytes, maxRecords);
> +
> +        if (collectionLastModified != null) {
> +            this.collectionLastModified = collectionLastModified.toString();

Indeed, I'm not sure how this is supposed to work.

Here you set `this.collectionLastModified` to the string representation of a Long -- `"1471645412480"`.

Then elsewhere you set it to `response.getLastModified()`, which returns the header as a raw string -- `"1471645412.48"`.

I think there's a bug here. And if there isn't, it's definitely confusing!

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchMeta.java:123
(Diff revision 10)
> +
> +    protected ArrayList<String> getSuccessRecordGuids() {
> +        synchronized (accessLock) {
> +            final ArrayList<String> clonedSuccessGuids = new ArrayList<>(successRecordGuids.size());
> +            clonedSuccessGuids.addAll(successRecordGuids);
> +            return clonedSuccessGuids;

```
return new ArrayList<>(this.successRecordGuids);
```

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:112
(Diff revision 10)
> +
> +        synchronized (payloadLock) {
> +            Logger.debug(LOG_TAG, "Processing a record with guid: " + guid);
> +
> +            // We can't upload individual records which exceed our payload byte limit.
> +            if ((recordDeltaByteCount + PER_PAYLOAD_OVERHEAD_BYTE_COUNT) > payload.maxBytes) {

`maxBytes` is `final`, so probably you're safe to move the `synchronized` block start to somewhere below this line.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:119
(Diff revision 10)
> +                return;
> +            }
> +
> +            // If we know for sure that batching isn't supported,
> +            // consider our batch to be of unlimited size.
> +            batchMeta.setIsUnlimited(isBatchingSupported != null && !isBatchingSupported);

I don't think you need to do this on every queued record.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:134
(Diff revision 10)
> +                // At this point, we could check if our payload/batch is considered "full", and upload.
> +                // This becomes a little tricky. We can't just look at the record counts, and need to
> +                // somehow "predict" how large the next record might be, and if it will fit into the
> +                // remaining space. However, if we allow for a large variation in record sizes, this
> +                // becomes increasingly complicated, as a single outlier might throw us off.
> +                // The simple thing then is to just wait until the next record comes in, and decide then.

You could certainly check the payload record limit. You could also figure out the bare minimum envelope size, and check number of bytes. That's a pretty small amount of code:


```
payload.add(recordDeltaByteCount, recordBytes, guid);
batchMeta.add(recordDeltaByteCount);

// Now also check to see if we'll for sure be unable to fit another.
if (!payload.canFit(minimumRecordSize)) {
    flush(false, false);
}
```

This is probably worth doing versus maintaining the comment!

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:142
(Diff revision 10)
> +            } else if (canFitRecordIntoBatch) {
> +                Logger.debug(LOG_TAG, "Current payload is full, uploading it.");
> +                flush(false, false);
> +
> +                Logger.debug(LOG_TAG, "Payload finished uploading; resetting it and recording the incoming record");
> +                payload.reset();

`flush` resets the payload already.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:154
(Diff revision 10)
> +            } else {
> +                Logger.debug(LOG_TAG, "Current batch is full, committing it.");
> +                flush(true, false);
> +
> +                Logger.debug(LOG_TAG, "Batch committed; resetting it and recording the incoming record");
> +                payload.reset();

Same.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:168
(Diff revision 10)
> +
> +    public void noMoreRecordsToUpload() {
> +        Logger.debug(LOG_TAG, "Received 'no more records to upload' signal. Flushing the payload.");
> +
> +        synchronized (payload) {
> +            if (!payload.isEmpty()) {

Careful. If you ever do speculatively upload -- as I suggested you do above -- and perhaps even so, you might have started a batch, uploaded precisely 100 records, and now have an empty payload that needs to be committed.

One approach to solving this is a dirty flag on `payload` itself. Your first `payload` instance has `needsCommit = false`. When you add a record to it, it flips to `true`. Every subsequent payload, apart from those that are created after committing a transaction, starts with `needsCommit = true`.

Think carefully about whether you can hit this situation.
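A minimal sketch of the dirty-flag idea above, with hypothetical names (the real `Payload` class in the patch will look different): a payload remembers whether the batch it belongs to still needs a commit POST, even when the payload itself currently holds zero records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual patch.
class SketchPayload {
    private final List<String> guids = new ArrayList<>();
    private boolean needsCommit;

    SketchPayload(boolean needsCommit) {
        this.needsCommit = needsCommit;
    }

    void add(String guid) {
        guids.add(guid);
        needsCommit = true; // queueing a record always dirties the batch
    }

    boolean needsCommit() {
        return needsCommit;
    }

    // After a plain (non-committing) upload, the next payload starts dirty:
    // the server is still holding uncommitted records for this batch.
    SketchPayload successorAfterUpload() {
        return new SketchPayload(true);
    }

    // After a commit, the batch is done; the next payload starts clean.
    SketchPayload successorAfterCommit() {
        return new SketchPayload(false);
    }
}
```

With this, "uploaded exactly 100 records, payload now empty" is handled naturally: the empty successor payload still reports `needsCommit() == true`, so `noMoreRecordsToUpload` knows a commit POST is outstanding.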

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:215
(Diff revision 10)
> +        bumpTimestampTo(uploadTimestamp, lastModifiedTimestamp);
> +        finished(uploadTimestamp);
> +    }
> +
> +    private void finished(AtomicLong lastModifiedTimestamp) {
> +        repositorySession.storeDone(lastModifiedTimestamp.get());

Why is this not the same as the other `finished`? Why does it not just call

```
this.finished(lastModifiedTimestamp.get());
```
?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:294
(Diff revision 10)
> +                outgoing,
> +                byteCount,
> +                isCommit
> +        ));
> +
> +        payload.reset();

I have a tentative preference for simply:

```
    this.payload = new Payload(…);
```

but that's an immutability style thing…

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BufferSizeTracker.java:22
(Diff revision 10)
> + */
> +@ThreadSafe
> +public abstract class BufferSizeTracker {
> +    protected final Object accessLock;
> +
> +    // Access must by synchronized on payloadLock.

You mean `accessLock`?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/Payload.java:52
(Diff revision 10)
> +    }
> +
> +    protected ArrayList<byte[]> getRecordsBuffer() {
> +        synchronized (accessLock) {
> +            final ArrayList<byte[]> clonedBuffer = new ArrayList<>(recordsBuffer.size());
> +            clonedBuffer.addAll(recordsBuffer);

Same constructor comment.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadUploadDelegate.java:164
(Diff revision 10)
> +    }
> +
> +    @Override
> +    public void handleRequestError(Exception e) {
> +        ArrayList<String> failedOutgoingGuids = postedRecordGuids;
> +        postedRecordGuids = null; // Want to GC this ASAP.

It won't be GCed until after your `for` loop; you just assigned the reference to a local variable. You might as well just null `postedRecordGuids` after the loop.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:47
(Diff revision 10)
> +    private final SyncStorageRequestDelegate uploadDelegate;
> +
> +    private final ArrayList<byte[]> outgoing;
> +    private final long byteCount;
> +
> +    // Used to construct post uri during run().

POST URI

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/RecordUploadRunnable.java:79
(Diff revision 10)
> +
> +        @Override
> +        public void writeTo(OutputStream outstream) throws IOException {
> +            int count = outgoing.size();
> +            outstream.write(RECORDS_START);
> +            outstream.write(outgoing.get(0));

You'd better hope that `outgoing` isn't empty!
Attachment #8776224 - Flags: review?(rnewman)
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70818

> Indeed, I'm not sure how this is supposed to work.
> 
> Here you set `this.collectionLastModified` to the string representation of a Long -- `"1471645412480"`.
> 
> Then elsewhere you set it to `response.getLastModified()`, which returns the header as a raw string -- `"1471645412.48"`.
> 
> I think there's a bug here. And if there isn't, it's definitely confusing!

Hm, yeah. It does work, and I imagine that's because the server treats the two formats here as the same thing. But this certainly needs a fix.

> You could certainly check the payload record limit. You could also figure out the bare minimum envelope size, and check number of bytes. That's a pretty small amount of code:
> 
> 
> ```
> payload.add(recordDeltaByteCount, recordBytes, guid);
> batchMeta.add(recordDeltaByteCount);
> 
> // Now also check to see if we'll for sure be unable to fit another.
> if (!payload.canFit(minimumRecordSize)) {
>     flush(false, false);
> }
> ```
> 
> This is probably worth doing versus maintaining the comment!

This will work, yeah. Between additional complexity this will introduce for `noMoreRecordsToUpload`, generally introducing a concept of an empty payload which we might need to commit being a valid thing (it's not currently!), and a very insignificant performance win here, I'm not sure it's worth doing.

What do you think? I'm curious specifically around any wins this provides. It's not strictly "simpler" (in fact it seems to be the opposite of that), so any wins will come from us starting to upload a few milliseconds earlier... But they're probably outweighed by the fact that we then might have to upload empty payloads simply to commit a batch!

> Careful. If you ever do speculatively upload -- as I suggested you do above -- and perhaps even so, you might have started a batch, uploaded precisely 100 records, and now have an empty payload that needs to be committed.
> 
> One approach to solving this is a dirty flag on `payload` itself. Your first `payload` instance has `needsCommit = false`. When you add a record to it, it flips to `true`. Every subsequent payload, apart from those that are created after committing a transaction, starts with `needsCommit = true`.
> 
> Think carefully about whether you can hit this situation.

If I don't speculatively upload, I don't think it's possible to get into the state that we have an empty payload which needs to be committed.

Say the payload limit is 100. I process 100 records. We don't upload them yet, either waiting for the 101st record, or for the `noMoreRecordsToUpload` call. The 101st record will upload the payload and commit the batch, and will ensure that the payload is non-empty (with 1 record); `noMoreRecordsToUpload` will commit the batch with 100 records in it. Either way, we have a guarantee that we either commit, or ensure that the payload is non-empty.

As you rightly point out, if I speculatively upload records, I certainly do get into this situation.

> Why is this not the same as the other `finished`? Why does it not just call
> 
> ```
> this.finished(lastModifiedTimestamp.get());
> ```
> ?

I think the idea was that the only `finished` that actually calls `repositorySession.storeDone` is the one that's dealing with the AtomicLong.

> I have a tentative preference for simply:
> 
> ```
>     this.payload = new Payload(…);
> ```
> 
> but that's an immutability style thing…

In an earlier version of this code, I've played around with an idea of making both Payload and Batch objects completely immutable. I.e. instead of an "add" method I would just create new objects, and so on. Then I had to worry about visibility of object changes across threads, etc., so marking them as final and synchronizing their internal state seemed a bit easier to reason about. Generally I suppose going full-on immutable like this is probably worth it, but just replacing the .reset call in this case with a new object probably won't win us anything.

> You'd better hope that `outgoing` isn't empty!

Indeed. If there are no programming mistakes upstream of this, this won't be empty. If this is empty, something is screwed up, at which point we can only fail anyway.
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70818

> This will work, yeah. Between additional complexity this will introduce for `noMoreRecordsToUpload`, generally introducing a concept of an empty payload which we might need to commit being a valid thing (it's not currently!), and a very insignificant performance win here, I'm not sure it's worth doing.
> 
> What do you think? I'm curious specifically around any wins this provides. It's not strictly "simpler" (in fact it seems to be the opposite of that), so any wins will come from us starting to upload a few milliseconds earlier... But they're probably outweighed by the fact that we then might have to upload empty payloads simply to commit a batch!

That depends on the calling code.

If this were iOS, where typically we run one or two queries to grab all of the outgoing results, queueing them all for upload at about the same time, then there's little benefit to be had -- as you note, only a few milliseconds of pipelining.

If Android is instead taking longer to queue up records, then perhaps we're sitting here with a full batch and an idle network uplink for 200msec while we grab another (or another 10, or…). You'll need to look at the calling code and/or some logs to see if that's the case.

> In an earlier version of this code, I've played around with an idea of making both Payload and Batch objects completely immutable. I.e. instead of an "add" method I would just create new objects, and so on. Then I had to worry about visibility of object changes across threads, etc., so marking them as final and synchronizing their internal state seemed a bit easier to reason about. Generally I suppose going full-on immutable like this is probably worth it, but just replacing the .reset call in this case with a new object probably won't win us anything.

Quite!
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

Clearing flag while waiting for a new revision.
Attachment #8776224 - Flags: review?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #43)
> If Android is instead taking longer to queue up records, then perhaps we're
> sitting here with a full batch and an idle network uplink for 200msec while
> we grab another (or another 10, or…). You'll need to look at the calling
> code and/or some logs to see if that's the case.

That depends on what we're uploading. Tabs, Passwords, and History should be quick - we fetch the records we need to upload, and before each is sent for processing we convert it to a Record object.

Bookmarks are potentially much slower, as conversion into a BookmarkRecord includes running additional queries to obtain information about the bookmark's parent, and/or (if it's a folder) to build a list of children. All of this happens as we're iterating over the cursor results, so it's possible we'll spend some time waiting for the next call to `process`. I don't have a good sense of how long that wait might be - it's essentially however long it takes us to run one or two SQL queries, process the results, and construct the necessary objects out of them. It'll also vary record to record.

Form history is split into two fetches, and we start handing off records to `process` right away - so at some point we'll have to wait for a SQL query to complete before receiving a follow-up `process` call.
Priority: P2 → P1
Depends on: 1297561
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70818

> Hm, yeah. It does work, and I imagine that's because the server treats the two formats here as the same thing. But this certainly needs a fix.

Instead of storing raw L-M header, I'm normalizing it first and storing the long, which is then "toStringed" for XIUS.
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review70818

> That depends on the calling code.
> 
> If this were iOS, where typically we run one or two queries to grab all of the outgoing results, queueing them all for upload at about the same time, then there's little benefit to be had -- as you note, only a few milliseconds of pipelining.
> 
> If Android is instead taking longer to queue up records, then perhaps we're sitting here with a full batch and an idle network uplink for 200msec while we grab another (or another 10, or…). You'll need to look at the calling code and/or some logs to see if that's the case.

Added pre-emptive uploading of payloads, and pre-emptive committing of batches. I track the smallest record (by byte count) that we've seen during the overall upload, and consider the payload/batch full either if the record count is at the limit, or if it won't fit half of the smallest record seen.

Tracking the "smallest seen" record is a way to side-step the fact that different record types will have different byte profiles, and checking for "half of the smallest" is an attempt to be conservative/optimistic and not flush payloads/commit batches too early.

Added a flag on BatchMeta to track if it needs to be committed.
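In rough pseudo-form, the heuristic amounts to something like the following (hypothetical names and class; not the actual BatchMeta/Payload code): a buffer is considered full once the record count hits the limit, or once the remaining byte budget couldn't hold even half of the smallest record observed so far.

```java
// Hypothetical sketch of the "smallest seen" fullness estimate.
class SmallestSeenEstimator {
    private long smallestRecordBytes = Long.MAX_VALUE;

    // Called once per queued record with its serialized size.
    void track(long recordBytes) {
        if (recordBytes < smallestRecordBytes) {
            smallestRecordBytes = recordBytes;
        }
    }

    // "Full" if at the record limit, or if half the smallest record
    // seen so far would not fit into the remaining byte budget.
    boolean estimatedFull(long curRecords, long maxRecords,
                          long curBytes, long maxBytes) {
        if (curRecords >= maxRecords) {
            return true;
        }
        return (maxBytes - curBytes) < (smallestRecordBytes / 2);
    }
}
```

The "half" divisor is what makes this optimistic: it risks one extra round-trip for an overflow record rather than flushing a payload that could still have fit another record.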
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review72054

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/net/MozResponse.java:46
(Diff revision 14)
>    public int getStatusCode() {
>      return this.response.getStatusLine().getStatusCode();
>    }
>  
>    public boolean wasSuccessful() {
> -    return this.getStatusCode() == 200;
> +    return this.getStatusCode() >= 200 && this.getStatusCode() < 300;

This change isn't congruent with the rest of the codebase.

`JSONFetchHandler` uses `wasSuccessful` to decide if it should expect a response body. That implies a check for 200, but is wrong for a 204.

Similar for `MetaGlobal.handleDownloadSuccess`.

There might be other dependencies for `SyncStorageResourceDelegate.handleHttpResponse`; I don't have IntelliJ open to check.

If you're touching this method, you should audit and correct those (e.g., change them to check that the status code is 200), or introduce another method (`was200`), or whatever makes sense to you.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/Server11RepositorySession.java:164
(Diff revision 14)
> +  @Override
> +  public void setStoreDelegate(RepositorySessionStoreDelegate delegate) {
> +    this.delegate = delegate;
> +
> +    // Now that we have the delegate, we can initialize our uploader.
> +    uploader = new BatchingUploader(this, storeWorkQueue, delegate);

Nit: stick with the style. this.uploader =

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:120
(Diff revision 14)
> +        }
> +
> +        synchronized (payloadLock) {
> +            // If we know for sure that we're not in a batching mode,
> +            // consider our batch to be of unlimited size.
> +            batchMeta.setIsUnlimited(inBatchingMode != null && !inBatchingMode);

Set this in `setInBatchingMode`, not here.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:132
(Diff revision 14)
> +                payload.add(recordDeltaByteCount, recordBytes, guid);
> +                batchMeta.add(recordDeltaByteCount);
> +
> +                // If we anticipate that we won't be able to fit another record,
> +                // either commit and reset the batch or upload the payload.
> +                if (!batchMeta.mightFitAnotherRecord()) {

I hate to churn ya, but: every record you add, you call `batchMeta.canFit`, `payload.canFit`, `batchMeta.mightFitAnotherRecord()`, `payload.mightFitAnotherRecord()`.

These calls are actually not cheap -- you're taking and releasing a lock four times.

There's no point calling the latter two until we're done with a sequence of adding records -- there's a caller somewhere who's just waiting for this function to return so it can provide the next record!

On iOS we have an API that accepts a sequence of records, so we naturally have a pausing point.

Given that this is an optimization, you could split out the 'might fit' stuff into a "take a breath" method. Callers with a delay until they gather the next record can call that to speculatively upload.

The alternative is to encapsulate adding more closely. On iOS we have a one step queue/flush if necessary method, which we can do because the batchmeta bookkeeping isn't isolated. You could imagine that `payload.add` might have the side-effect of calling `flush`!

Probably the smallest change to this PR is a tri-value "can fit" -- that is, "yes, I can fit this record, but I don't think I can fit another afterwards". If this is the 50th record, we want to add it and immediately flush.

You could implement this with a pair: `[thisRecord: bool, nextRecord: bool]`. Or you could implement this with an int: how many records do I think I can take -- 0, 1, many?
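The int variant could look something like this (a sketch with hypothetical names, not the actual patch): 0 means "this record doesn't fit", 1 means "it fits, but flush immediately afterwards", 2 means "it fits with room to spare".

```java
// Hypothetical tri-value "can fit" check: how many records can we take -- 0, 1, many?
class FitCheck {
    static int roomFor(long deltaBytes, long minRecordBytes,
                       long curBytes, long curRecords,
                       long maxBytes, long maxRecords) {
        // Can't take this record at all: record or byte limit would be exceeded.
        if (curRecords >= maxRecords || curBytes + deltaBytes > maxBytes) {
            return 0;
        }
        // Take it, but a subsequent minimum-sized record can't possibly fit:
        // flush right after adding.
        if (curRecords + 1 >= maxRecords
                || curBytes + deltaBytes + minRecordBytes > maxBytes) {
            return 1;
        }
        // Take it and keep queueing.
        return 2;
    }
}
```

A single call like this answers both "can I add this record?" and "should I flush immediately afterwards?", so the caller takes the lock once per record instead of four times.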

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:260
(Diff revision 14)
> +        }
> +
> +        // In non-batching mode, every time we receive a Last-Modified timestamp, we expect it to change
> +        // since records are "committed" (become visible to other clients) on every payload.
> +        // In batching mode, we only expect Last-Modified to change when we commit a batch.
> +        batchMeta.setLastModified(lastModified, !inBatchingMode || isCommit);

Reverse the `isCommit` conditional. Might save a memory barrier on the volatile.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadUploadDelegate.java:48
(Diff revision 14)
> +    public String ifUnmodifiedSince() {
> +        final Long lastModified = uploader.getCurrentBatch().getLastModified();
> +        if (lastModified == null) {
> +            return null;
> +        }
> +        return lastModified.toString();

```
return org.mozilla.gecko.sync.Utils.millisecondsToDecimalSecondsString(lastModified);
```

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadUploadDelegate.java:175
(Diff revision 14)
> +    }
> +
> +    @Override
> +    public void handleRequestError(Exception e) {
> +        ArrayList<String> failedOutgoingGuids = postedRecordGuids;
> +        for (String guid : failedOutgoingGuids) {

Just use `postedRecordGuids` here directly.
Attachment #8776224 - Flags: review?(rnewman)
Comment on attachment 8776224 [details]
Bug 1253111 - Part 2: Add support for batching uploads

https://reviewboard.mozilla.org/r/68122/#review72186

Please manually test the ever-living fuck out of this.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:52
(Diff revision 15)
> + * to guard against concurrent-modification errors (different uploader commits before we're done).
> + *
> + * Non-batching mode notes:
> + * We also support Sync servers which don't enable batching for uploads. In this case, we respect
> + * payload limits for individual uploads, and every upload is considered a commit. Batching limits
> + * do not apply, and batch token is irrelevent.

s/irrelevent/irrelevant

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:138
(Diff revision 15)
> +                // Keep track of the overflow record.
> +                addAndFlushIfNecessary(recordDeltaByteCount, recordBytes, guid);
> +
> +            // Batch won't fit the record.
> +            } else {
> +                Logger.debug(LOG_TAG, "Current batch won't fit incoming record, committing it.");

s/committing it/committing batch.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:153
(Diff revision 15)
> +    }
> +
> +    // Convenience function used from the process method; caller must hold a payloadLock.
> +    private void addAndFlushIfNecessary(long byteCount, byte[] recordBytes, String guid) {
> +        boolean isPayloadFull = payload.addAndEstimateIfFull(byteCount, recordBytes, guid);
> +        boolean isBatchFull = batchMeta.addAndEstimateIfFull(byteCount);

Nice; much clearer.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BufferSizeTracker.java:100
(Diff revision 15)
> +        }
> +    }
> +
> +    private boolean canFitRecordByteDelta(long byteDelta, long recordCount, long byteCount) {
> +        return (byteCount + byteDelta) <= maxBytes
> +                && (recordCount + 1) <= maxRecords;

```
return recordCount < maxRecords &&
       (byteCount + byteDelta) <= maxBytes;
```

-- avoid a little arithmetic by using `<` not `<=`.
Attachment #8776224 - Flags: review?(rnewman) → review+
(In reply to Richard Newman [:rnewman] from comment #55)
> Please manually test the ever-living fuck out of this.

Aye, and thanks a ton for all the feedback and guidance!
Happy to help. You touch it, you own it ;)
Pushed by gkruglov@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/567759a4ab88
Part 1: Introduce new sync stage to handle info/configuration r=rnewman
https://hg.mozilla.org/integration/autoland/rev/737bdb5815e6
Part 2: Add support for batching uploads r=rnewman
https://hg.mozilla.org/mozilla-central/rev/567759a4ab88
https://hg.mozilla.org/mozilla-central/rev/737bdb5815e6
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Depends on: 1299997
See Also: → 1300221
Iteration: --- → 1.3
Included in the Firefox 51 release notes with the wording "Improved reliability of browser data syncing with atomic uploads".
Depends on: 1336001
See Also: → 1343726
Duplicate of this bug: 1300450
Depends on: 1362206
Product: Android Background Services → Firefox for Android
You need to log in before you can comment on or make changes to this bug.