Closed Bug 1362206 Opened 3 years ago Closed 3 years ago

Stop uploading bookmark records on encountering a processing error

Categories

(Firefox for Android :: Android Sync, enhancement, P1)

Tracking

RESOLVED FIXED
Firefox 58
Tracking Status
firefox58 --- fixed

People

(Reporter: Grisha, Assigned: tcsc)

References

Details

Attachments

(1 file)

While processing records in the BatchingUploader, we might encounter various errors (record too large, illegal record, JSON parsing problems). Currently we skip uploading that particular record but proceed to upload all the rest; the current SyncStage is then marked as failed at the end of the upload.

This behaviour can leave corrupt bookmark trees on the server.

We _do_ stop uploading records if the "failed" list in the response is non-empty (indicating a malformed or too-large record).

Desktop allows configuring this behaviour depending on what's being uploaded. For bookmarks, it aborts the upload in case of record errors. E.g. [0],[1].

Android should probably follow suit and match desktop's behaviour (ignore for flat collections, abort otherwise).

[0] http://searchfox.org/mozilla-central/source/services/sync/modules/engines.js#1678
[1] http://searchfox.org/mozilla-central/source/services/sync/modules/engines/bookmarks.js#291
Priority: -- → P2
Assignee: nobody → tchiovoloni
This isn't ideal, but it's very unclear to me how we could add specific handling for this in the delegate (which seems like the more natural place for it). Or at least, I tried to do that initially, and it ended up being a lot more work.
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review182462

I'm really not sure this is worth making configurable on a repository level. The simplest possible patch would conditionally (record instanceof BookmarkRecord) set the 'recordUploadFailed' flag while processing a bad record in BatchingUploader. You just need to be careful to do the right thing when reading/writing that flag; you might need to synchronize, since volatile won't be enough. What do you think?
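A minimal sketch of what that "simplest possible patch" could look like. The real BatchingUploader is much more involved; Record, BookmarkRecord, and the class structure here are illustrative stand-ins modeled on names mentioned in this bug:

```java
// Illustrative sketch only: real code lives in BatchingUploader and uses
// the actual Record/BookmarkRecord types from the sync codebase.
class Record {}
class BookmarkRecord extends Record {}

class UploaderSketch {
    // Set when a bad bookmark record should abort the rest of the batch.
    volatile boolean recordUploadFailed = false;

    void processBadRecord(Record record) {
        // Only bookmark uploads are aborted; flat collections keep going.
        if (record instanceof BookmarkRecord) {
            recordUploadFailed = true;
        }
    }
}
```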
Attachment #8905271 - Flags: review?(gkruglov)
I couldn't figure out what you want for synchronization, but I've applied the other changes.
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review184142

This looks good, but there's a problem with how you access storeFailed flag.

If we simplify things, this is what you're doing:
```
volatile boolean storeFailed = false;

if (!storeFailed) {
    if (badRecord) {
        storeFailed = true;
    }
}
```
That is, writing to a volatile variable based on its current value.

This would have been fine if we could guarantee that we only ever write to `storeFailed` on a single thread. The above code will run on some threads that RecordsChannel sets up to process the queue of records. But we'll also sometimes write to `storeFailed` from threads running on BatchingUploader's single-threaded executor queue (which isn't guaranteed to always use the same thread, just that only one thread is ever active at a time).

You should be able to fix this by synchronizing access to storeFailed. Since PayloadDispatcher owns it, just synchronize reads and writes to it on that object. Also, add a /* GuardedBy('this') */ comment by that variable, to make things clearer to future maintainers.

It's a bit heavy-handed... But I hope impact from locking won't be too much here.
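To make the suggestion concrete, here's a hedged sketch of what synchronizing on the owning object could look like. PayloadDispatcherSketch and its method names are illustrative, not the real PayloadDispatcher API:

```java
// Sketch: reads and writes of storeFailed both go through methods
// synchronized on the owning dispatcher, so the check-then-set is atomic.
class PayloadDispatcherSketch {
    /* GuardedBy("this") */
    private boolean storeFailed = false;

    // Returns true only for the first caller to mark the store as failed,
    // so a delegate failure callback can fire at most once.
    synchronized boolean markStoreFailed() {
        if (storeFailed) {
            return false;
        }
        storeFailed = true;
        return true;
    }

    synchronized boolean didStoreFail() {
        return storeFailed;
    }
}
```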

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:241
(Diff revision 2)
>              }
>          });
>      }
>  
> +    @VisibleForTesting
> +    /* package-local */ boolean shouldFailBatchOnFailure(Record record) {

Can you add a comment here describing why we do this, and why only for bookmarks?
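For illustration, a sketch of how the method from the diff above might read with the requested comment. The instanceof check mirrors the patch; the comment wording and surrounding class are assumptions based on this bug's description:

```java
// Illustrative stand-ins for the real sync record types.
class Record {}
class BookmarkRecord extends Record {}

class BatchingUploaderSketch {
    // Bookmarks form a tree on the server: dropping a single folder record
    // can orphan its children and corrupt the tree, so one failed bookmark
    // aborts the whole batch. Flat collections (history, etc.) can safely
    // skip individual bad records and keep uploading.
    /* package-local */ static boolean shouldFailBatchOnFailure(Record record) {
        return record instanceof BookmarkRecord;
    }
}
```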
Attachment #8905271 - Flags: review?(gkruglov)
We've talked about this some, but this *should* be fine, as long as deferredStoreDelegate either does whatever it needs to after this function returns, or it does nothing that would cause it to synchronize on the payloadDispatcher.
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review185180

Our main concern here is that we might end up calling the delegate's onStoreFail more than once. Your "did we already fail?" check should be sufficient here. While you're looking at this, can you make sure that PayloadDispatcher follows a similar pattern (or perhaps just calls into the new method)? In the context of your patch it appears to "do its own thing".

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadDispatcher.java:137
(Diff revision 3)
>          Logger.debug(LOG_TAG, "Record store failed for guid " + recordGuid + " with exception: " + e.toString());
>          recordUploadFailed = true;
>          uploader.sessionStoreDelegate.onRecordStoreFailed(e, recordGuid);
>      }
>  
> -    void concurrentModificationDetected() {
> +    synchronized void concurrentModificationDetected() {

This seems largely useless without also synchronizing reads.
Attachment #8905271 - Flags: review?(gkruglov)
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review185242

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/PayloadDispatcher.java:137
(Diff revision 3)
>          Logger.debug(LOG_TAG, "Record store failed for guid " + recordGuid + " with exception: " + e.toString());
>          recordUploadFailed = true;
>          uploader.sessionStoreDelegate.onRecordStoreFailed(e, recordGuid);
>      }
>  
> -    void concurrentModificationDetected() {
> +    synchronized void concurrentModificationDetected() {

I think this is mainly to make sure that this happens before the other callback. But it probably should check the storeFailed flag to enforce mutual exclusion around the delegate callback. Possibly in the recordFailed method as well... :(
Status: NEW → ASSIGNED
Priority: P2 → P1
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review190702

Sorry Thom, I think we overlooked "this one weird thing" :/

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:253
(Diff revision 4)
>  
> +    // Common handling for marking a record failure and calling our delegate's onRecordStoreFailed.
> +    private void failRecordStore(Exception e, Record record) {
> +        if (shouldFailBatchOnFailure(record)) {
> +            Logger.debug(LOG_TAG, "Batch failed with exception: " + e.toString());
> +            payloadDispatcher.doStoreFailed(e);

PayloadDispatcher's doStoreFailed will call into RecordsChannel's onStoreFailed, which will trigger onFlowCompleted, and we'll end up inspecting how the flow went in ServerLocalSynchronizerSession to determine if it succeeded. Currently that method looks either for "reflow exceptions" or for store failure counts.

When we flow bookmarks, we ensure that folders go first, since they're more likely to fail our size checks. If we have a folder that's too large and it fails the size check, we'll fail here, but won't bump any store failure counts, which will end up being treated as a successful synchronization, and we'll bump our timestamps, etc. Thankfully bookmarks are version-tracked, and we won't bump their sync versions (via the session's cleanUp method) until the onStoreCompleted delegate callback is called - so this flow might "just work" - but please double check :/

Another problem: ServerLocalSynchronizerSession treats a synchronization as failed if there have been any store failures while uploading (via a count). So your logic here is unfortunately incomplete for non-bookmark records: we'll continue uploading history/etc. items if they fail sanity/size checks, but we'd still mark the synchronization as failed, won't bump the timestamps, and will end up re-uploading the same items on the next sync.

I think the simplest thing (keeping in mind that we're hoping to simplify this mess soon, somehow) is to also touch "is success?" logic in ServerLocalSynchronizerSession and account for this distinction in behaviour.
Attachment #8905271 - Flags: review?(gkruglov)
This makes it so that we always call the recordStoreFailed for bookmark records, and doesn't fail for size overflow on other types of records.

I'm pretty confident this isn't everything you want here, but maybe it's some of it? Sorry, maybe we should meet up in Vidyo to clarify -- I'm also pretty confused as to what you want me to do in ServerLocalSynchronizerSession.
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review193292

I think this is good enough - see my comment below.

What I had in mind about ServerLocalSynchronizerSession is that you could try to limit the decision making on pass/fail of synchronization of different types of records in different scenarios to one place (the 'onSecondFlowCompleted' method). As it stands, we make this decision in "passes" in at least two different places. But I now feel that it's a larger and somewhat non-obvious change that's much better suited to the upcoming "make sync simple" work.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/repositories/uploaders/BatchingUploader.java:254
(Diff revision 5)
> +    private void failRecordStore(Exception e, Record record, boolean sizeOverflow) {
> +        if (shouldFailBatchOnFailure(record)) {
> +            Logger.debug(LOG_TAG, "Batch failed with exception: " + e.toString());
> +            sessionStoreDelegate.deferredStoreDelegate(executor).onRecordStoreFailed(e, record.guid);
> +            payloadDispatcher.doStoreFailed(e);
> +        } else if (!sizeOverflow) {

To paraphrase what you have here:

case 0)
- if record is not a bookmark (say, history item)
- and it failed sanity checks for reasons other than "it's too large" (say, record's json is 0 bytes)
- then mark record's store as 'failed' and continue uploading

case 1)
- if record is not a bookmark
- and it failed sanity checks because it's too large
- continue uploading, and don't fail synchronization on the account of this one record

case 2)
- if record is a bookmark
- and it failed for any reason
- stop uploading

For case 0:
After we finish the flow, we'll examine how it went in ServerLocalSynchronizerSession's onSecondFlowCompleted, see that we failed to store at least one record, and declare this synchronization as failed.

On the next sync, we'll attempt to do this whole thing again, since failed syncs don't bump last-synced timestamp of the sync stage. It seems like if we encounter such a record at some point, we'll just keep trying to re-upload all of the modified records for that collection, again and again, until something changes with the bad local record.

Uploader already follows case 0, so maintaining that behavior for a subset of records is fine - although it seems far from ideal.

The "what this bug attempts to resolve" case 2 is well handled, and case 1 brings android inline with desktop's behaviour.

It'd be nice if you copy-pasted some of these comments into the code, to make things easier for whoever's unfortunate enough to maintain this.
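For reference, the three cases above could be folded into code comments roughly like this. The names mirror the diff (failRecordStore's record/sizeOverflow parameters), but the enum and classification helper are purely illustrative:

```java
// Illustrative stand-ins for the real sync record types.
class Record {}
class BookmarkRecord extends Record {}

// Sketch of the three outcomes discussed in this review.
enum Outcome { FAIL_BATCH, FAIL_RECORD_AND_CONTINUE, CONTINUE_SILENTLY }

class FailureSketch {
    static Outcome classify(Record record, boolean sizeOverflow) {
        if (record instanceof BookmarkRecord) {
            // Case 2: any bad bookmark aborts the upload, to avoid
            // corrupting the server-side tree.
            return Outcome.FAIL_BATCH;
        }
        if (sizeOverflow) {
            // Case 1: an oversized non-bookmark record is skipped without
            // failing the synchronization (matches desktop's behaviour).
            return Outcome.CONTINUE_SILENTLY;
        }
        // Case 0: other sanity-check failures mark the record's store as
        // failed; the sync is later declared failed and will be retried.
        return Outcome.FAIL_RECORD_AND_CONTINUE;
    }
}
```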
Attachment #8905271 - Flags: review?(gkruglov) → review+
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/83aa9559d548
Have android abort bookmark syncs if a record is too large to upload to the server r=Grisha
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

This should fix it, clearing grisha's review to make sure this looks sane.
Flags: needinfo?(tchiovoloni)
Attachment #8905271 - Flags: review+ → review?(gkruglov)
Comment on attachment 8905271 [details]
Bug 1362206 - Have android abort bookmark syncs if a record is too large to upload to the server

https://reviewboard.mozilla.org/r/177068/#review194212

That's what we talked about, seems sane. I wonder if this breaks any unit tests. And if it doesn't, perhaps this flow isn't well tested?
Attachment #8905271 - Flags: review?(gkruglov) → review+
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2ad9ea63f3cc
Have android abort bookmark syncs if a record is too large to upload to the server r=Grisha
Product: Android Background Services → Firefox for Android
https://hg.mozilla.org/mozilla-central/rev/2ad9ea63f3cc
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
Depends on: 1409472