Closed Bug 1389233 Opened 5 years ago Closed 5 years ago

Android sync ping "outgoing" field format is incorrect

Categories

(Firefox for Android Graveyard :: Android Sync, enhancement, P1)

enhancement

Tracking

(firefox60 fixed)

RESOLVED FIXED
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: tcsc, Assigned: tcsc)

References

Details

Attachments

(1 file)

Noticed while working on 1291822. This should be an array that represents outgoing batches, e.g. each one should have the count/failed for that batch. As it is it uploads a single object, and not an array.

As it is, the scala code will report null for the outgoing data [0], which it allows [1], but it means we won't have anything for this data.

In addition to making android actually report batch information, it's possible that we should modify the scala code so that it will accept an object with {sent, failed} so that we can use data from clients before this gets fixed (but it's also possible this doesn't matter)

[0] https://github.com/mozilla/telemetry-batch-view/blob/72e448d4ae0b90ce4745d414137c45f1413b3c50/src/main/scala/com/mozilla/telemetry/views/SyncView.scala#L265

[1] https://github.com/mozilla/telemetry-batch-view/blob/72e448d4ae0b90ce4745d414137c45f1413b3c50/src/main/scala/com/mozilla/telemetry/views/SyncView.scala#L171
Blocks: 1308337
See Also: → 1390315
What to do:
- fix scala code to understand pings that'll keep flowing until this is fixed (Thom says it should be easy)
- adjust sync ping docs to make the correct format more obvious

And:
- "quick fix this" by wrapping the outgoing object in [] on the clients, OR
- actually "fix this", and report per-batch/per-post results
- do one of the above for iOS in Bug 1390315

I'm assuming that the per-batch/per-post outgoing data will be useful once we start looking at it, and so it's worth fixing this properly.
Priority: -- → P1
Assignee: nobody → gkruglov
Priority: P1 → P2
Assignee: gkruglov → tchiovoloni
(I was going to request feedback on this to ask about the payload dispatcher/batching uploader/upload delegate bits, but I figured it out/came up with a way to do that part so that it's not necessary).
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review191270

::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/telemetry/pingbuilders/TelemetrySyncPingBuilderTest.java:10
(Diff revision 1)
>  
>  import android.os.Bundle;
>  import android.os.Parcelable;
>  
>  import org.json.JSONException;
> -import org.json.JSONObject;
> +import org.json.simple.JSONObject;

(I didn't read the rest of the patch yet) In Bug 1204559 I'm slowly working toward removing any and all mentions of org.json.simple from the codebase, so please don't make use of it in the new code.
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review191270

> (I didn't read the rest of the patch yet) In Bug 1204559 I'm slowly working toward removing any and all mentions of org.json.simple from the codebase, so please don't make use of it in the new code.

Wow, the org.json.simple API is poorly named in comparison. org.json is a much nicer/simpler to use API.

I didn't end up needing that import anyway. I'll try and see if I can use org.json over org.json.simple in the rest of this patch, but given ExtendedJSONObject's ubiquity, it seems tricky.
Product: Android Background Services → Firefox for Android
Priority: P2 → P1
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review202198

r- just so that I look at it again with your changes - if you think they're necessary. Can't say that I like this patch, but it is just as convoluted as the rest of this code, so that's not really a comment on the patch itself. I do think you can clean it up a bit.

::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetrySyncPingBuilder.java:109
(Diff revision 2)
> +            return null;
> +        }
> +        JSONArray arr = new JSONArray();
> +        for (int i = 0; i < batches.size(); ++i) {
> +            RecordsChannel.StoreBatchData batch = batches.get(i);
> +            if (batch.sent == 0 && batch.failed == 0) {

Don't you explicitely ensure that 'batches' like these won't be in the list?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:98
(Diff revision 2)
> +  private final AtomicInteger currentStoreBatchAccepted = new AtomicInteger();
> +  private final AtomicInteger currentStoreBatchFailed = new AtomicInteger();
> +  private final ConcurrentLinkedQueue<StoreBatchData> completedBatches = new ConcurrentLinkedQueue<>();
> +
> +
> +  public static class StoreBatchData {

You can encapsulate functionality such as `haveUncommittedBatchData` into this object, and have it own the `currentStore*` counters as well as logic to bump them, etc., - most things you've added here.

This will allow you to actually test this easily, if you choose to write unit tests, and probably guarantee some form of thread-safety.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:205
(Diff revision 2)
>      fetchFailedCount.set(0);
>      storeAttemptedCount.set(0);
>      storeAcceptedCount.set(0);
>      storeFailedCount.set(0);
>      storeReconciledCount.set(0);
> +    currentStoreBatchFailed.set(0);

currentStoreBatchAccepted?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:242
(Diff revision 2)
> +    return currentStoreBatchAccepted.get() != 0 ||
> +            currentStoreBatchAttempted.get() != 0 ||
> +            currentStoreBatchFailed.get() != 0;
> +  }
> +
> +  /* package-local */ ConcurrentLinkedQueue<StoreBatchData> finishAndGetStoreBatches() {

You should be able to hide the 'finish' bit from the consumers of this by encapsulating that logic inside of StoreBatchData itself.

Otherwise this is a pretty ugly "leak" in the abstraction layer.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:291
(Diff revision 2)
> +    // These might be different if we "forced" a failure due to an error unrelated to uploading
> +    // a record.
> +    int failed = Math.max(knownFailed, sent - knownSucceeded);
> +
> +    // Don't record data about "batches" that are really empty
> +    if (sent != 0 || failed != 0) {

How can we have sent==0 && failed>0? Shouldn't this be `sent > 0`?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:291
(Diff revision 2)
> +    // These might be different if we "forced" a failure due to an error unrelated to uploading
> +    // a record.
> +    int failed = Math.max(knownFailed, sent - knownSucceeded);
> +
> +    // Don't record data about "batches" that are really empty
> +    if (sent != 0 || failed != 0) {

Also, it's possible to have a legit empty `commit` batch. I suppose the counts for it (0) aren't interesting, but it might be interesting to record its presence nevertheless. E.g. its presence indicates that we didn't "pack" our payloads as effectively as we could have, and are wasting a bit of resources as a result.
Attachment #8914904 - Flags: review?(gkruglov) → review-
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review202198

I don't really like it either, but I figured getting it up for review might point out some ways it could improve, since most of the ugliness is interacting with the existing code that I don't understand very well.

> Don't you explicitely ensure that 'batches' like these won't be in the list?

Yes, but there's no reason to as you point out later.

> You can encapsulate functionality such as `haveUncommittedBatchData` into this object, and have it own the `currentStore*` counters as well as logic to bump them, etc., - most things you've added here.
> 
> This will allow you to actually test this easily, if you choose to write unit tests, and probably guarantee some form of thread-safety.

Hm, Okay. I think I'd need another class that represents an entry in the list, though.

> currentStoreBatchAccepted?

Nice catch!

> How can we have sent==0 && failed>0? Shouldn't this be `sent > 0`?

That's true, but your point about the empty batches is good, so I'm going to remove this check.

> Also, it's possible to have a legit empty `commit` batch. I suppose the counts for it (0) aren't interesting, but it might be interesting to record its presence nevertheless. E.g. its presence indicates that we didn't "pack" our payloads as effectively as we could have, and are wasting a bit of resources as a result.

Right now this will be hard to determine from the scala, but I think you're actually correct. This will tell how much we are paying for the 'optimization' where we do an early commit (I think most of the cost is in code complexity, though).
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review217940

This looks good to me, thanks Thom! A few questions and minor notes and nits.

::: mobile/android/app/src/test/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetrySyncPingBuilderTest.java:121
(Diff revision 3)
>          assertEquals(0, stage.validation.getArray("problems").size());
> +
> +        JSONArray outgoing = engine.getArray("outgoing");
> +        assertEquals(outgoing.size(), 3);
> +
> +        ExtendedJSONObject firstBatch = (ExtendedJSONObject)outgoing.get(0);

nit: spacing (i'd expect linter to have covered this though).

::: mobile/android/app/src/test/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetrySyncPingBuilderTest.java:130
(Diff revision 3)
> +        ExtendedJSONObject secondBatch = (ExtendedJSONObject)outgoing.get(1);
> +        assertEquals(secondBatch.getLong("sent", -1), 9);
> +        assertFalse(secondBatch.containsKey("failed"));
> +
> +        // Ensure we include "all zero" batches, since we can actually send those on android.
> +        ExtendedJSONObject lastBatch = (ExtendedJSONObject)outgoing.get(2);

So, we're expecting to send something like:

[
 {sent: 1, failed: 1}, {sent: 9}, {}
]

I'm assuming there's a good reason we're sending that {} - you say we're sending it on Android, does that mean desktop isn't sending it? Are we using it to count number of batches?

::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetrySyncPingBuilder.java:105
(Diff revision 3)
>      }
>  
> +    @Nullable
> +    private static JSONArray buildOutgoing(List<StoreBatchTracker.Batch> batches) {
> +        if (batches == null || batches.size() == 0) {
> +            return null;

I think this condition isn't tested.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/delegates/RepositorySessionStoreDelegate.java:28
(Diff revision 3)
>  
>    // Called with a GUID when store has succeeded.
>    void onRecordStoreSucceeded(String guid);
>    void onStoreCompleted();
>    void onStoreFailed(Exception e);
> +  void onBatchCommitted();

Can you add an explanatory comment here regarding the asymetric nature of this API, similar to one for `onRecordStoreReconciled`?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadDispatcher.java:99
(Diff revision 3)
>              bumpTimestampTo(uploadTimestamp, response.normalizedTimestampForHeader(SyncResponse.X_LAST_MODIFIED));
>              uploader.setLastStoreTimestamp(uploadTimestamp);
>              batchWhiteboard.clearSuccessRecordGuids();
>          }
>  
> +        if (isCommit || !batchWhiteboard.getInBatchingMode() || isLastPayload) {

Do you need isLastPayload in that conditional? I think it's redundant? Seems like you'll want to do this after any "committing" upload.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadDispatcher.java:100
(Diff revision 3)
>              uploader.setLastStoreTimestamp(uploadTimestamp);
>              batchWhiteboard.clearSuccessRecordGuids();
>          }
>  
> +        if (isCommit || !batchWhiteboard.getInBatchingMode() || isLastPayload) {
> +            uploader.sessionStoreDelegate.onBatchCommitted();

I don't think this is covered anywhere in tests.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/RecordsChannel.java:89
(Diff revision 3)
>    // reconciled <= accepted <= attempted
>    // reconciled = accepted - `new`, where `new` is inferred.
>    private final AtomicInteger storeAttemptedCount = new AtomicInteger();
>    private final AtomicInteger storeAcceptedCount = new AtomicInteger();
>    private final AtomicInteger storeFailedCount = new AtomicInteger();
>    private final AtomicInteger storeReconciledCount = new AtomicInteger();

You _could_ move all of these 'store' counters into StoreBatchTracker as well, right? And simplify delegate methods here a tiny bit. But you're choosing not to to minimize amount of changes here and to keep that class more focused?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/StoreBatchTracker.java:81
(Diff revision 3)
> +    }
> +
> +    // Note that this finishes the current batch (if any exists).
> +    /* package-local */ ArrayList<Batch> getStoreBatches() {
> +        if (haveUncommittedBatchData()) {
> +            onBatchFinished();

In what cases would this be necessary?

onBatchFinished is called whenever onBatchCommitted is fired; and we expect it to fire before we collect the batches in onSecondFlowCompleted.

So this conditional seems to indicate that there might be some weird out-of-order sitation, which seems like a bug?

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/synchronizer/SynchronizerSession.java:194
(Diff revision 3)
>  
>    public int getInboundCountReconciled() {
>      return numInboundRecordsReconciled.get();
>    }
>  
> +  // Returns outboundBatches, and replaces the currently stored list with null.

This will only ever run once; you're setting the list to `null` to help with GC, or..?
Attachment #8914904 - Flags: review?(gkruglov) → review+
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review217940

> So, we're expecting to send something like:
> 
> [
>  {sent: 1, failed: 1}, {sent: 9}, {}
> ]
> 
> I'm assuming there's a good reason we're sending that {} - you say we're sending it on Android, does that mean desktop isn't sending it? Are we using it to count number of batches?

Desktop will never send a batch that has no data, android only will in cases where it has done a speculative POST, which desktop doesn't do.

The scala will replace missing `sent` or `failed` keys with 0, and so `{}` is the same as `{"sent": 0, "failed": 0}`, except that it uses slighly less bandwidth. In fact, I think the schema will complain if you put a 0 in there, since its supposed to only be used for non-zero numbers (although, it's not like android follows the schema strictly).

> I think this condition isn't tested.

I'll add a test for it, but it's not like we have (or arguably even want) 100% test coverage.

> I don't think this is covered anywhere in tests.

We don't have any tests that do a sync and then look at the telemetry. I agree this is unfortunate, but it doesn't seem easy to add them, and it doesn't seem to be worth blocking fixing our telemetry on android on getting those.

> You _could_ move all of these 'store' counters into StoreBatchTracker as well, right? And simplify delegate methods here a tiny bit. But you're choosing not to to minimize amount of changes here and to keep that class more focused?

Yes.

> This will only ever run once; you're setting the list to `null` to help with GC, or..?

It wasn't clear to me that this would only ever run once.
Comment on attachment 8914904 [details]
Bug 1389233 - Record outgoing batches in the android sync ping

https://reviewboard.mozilla.org/r/186150/#review217940

> In what cases would this be necessary?
> 
> onBatchFinished is called whenever onBatchCommitted is fired; and we expect it to fire before we collect the batches in onSecondFlowCompleted.
> 
> So this conditional seems to indicate that there might be some weird out-of-order sitation, which seems like a bug?

It can happen if we early abort the batch because e.g. a bookmark is too big. Not every item will have been marked as failed.
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e5209339907d
Record outgoing batches in the android sync ping r=Grisha
https://hg.mozilla.org/mozilla-central/rev/e5209339907d
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.