Closed Bug 1408555 Opened 7 years ago Closed 3 years ago

Remove pickling/bundling from AndroidFxAccount

Categories

(Firefox for Android Graveyard :: Firefox Accounts, enhancement, P5)

enhancement

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: eoger, Unassigned)

Details

Attachments

(2 obsolete files)

      No description provided.
Priority: -- → P1
Component: Android Sync → Firefox Accounts
Summary: Remove bundling in AndroidFxAccount → Remove pickling/bundling from AndroidFxAccount
Comment on attachment 8918472 [details]
Bug 1408555 p1 - Remove Account pickling.

https://reviewboard.mozilla.org/r/189318/#review194624

Taking a (very brief) look before going away for the weekend. I'm not sure we can just remove pickling outright. We _shouldn't_ pickle, yes - especially for new accounts created after this patch. But consider what happens if an account was created sometime ago. With your patch, getAccountUID() will return 'null' if user is upgrading from a version prior to Bug 1368147 (so, from a version prior to 57), which will break things upstream (we won't be able to read Sync prefs, etc).

The most sensible thing to do, for now, seems to be:
- stop pickling, but live with the unpickling logic for a while (until we're confident no one with an account is running 56 or earlier, and might upgrade)

There is a possible alternative:
- on upgrade, if sync prefs haven't been migrated yet, get the accountUID out of the pickle file, trigger the necessary sync prefs migrations, then finally delete the pickle file. However, I don't think this is feasible - you're likely going to run into https://bugzilla.mozilla.org/show_bug.cgi?id=1368147#c49
Attachment #8918472 - Flags: review?(gkruglov) → review-
Thank you Grisha, I kept the unpicking logic and even added more keys (KEY_STATE_LABEL etc) to make sure tests pass.
Comment on attachment 8918464 [details]
Bug 1408555 p2 - Remove bundling from AndroidFxAccount.

https://reviewboard.mozilla.org/r/189308/#review196504

Let's outline the 'bundle' changes you're making here. I'll quote Nick from https://bugzilla.mozilla.org/show_bug.cgi?id=1368147#c46:

> * the layered user data store was introduced to allow us to be flexible, versioning different pieces of the account, while working
> around bugs with Android's `get/setUserData` implementations.  I think those bugs are gone.  We might dramatically simplify this
> whole thing by just using `get/setUserData` and not layeirng the data store.

> It's not at all obvious, but the original implementation has a 2-layer representation.  The outer layer includes stuff for Android
> (email, type, a little bit more) and then an inner bundle of stuff for FxA.  The idea was that the inner bundle could get
> corrupted and the Android account wouldn't be damaged; this was necessary because the `get/setUserData` API was mysteriously buggy
> (on some devices?  on some versions?), and we didn't trust that we could store `N` pieces of data robustly.  (And we definitely
> couldn't do so in a single method invocation.)

> I'm not sure that the original spirit is represented in the current implementation at this point; or if it should be.

And so, we're making an assumption here that the original concerns about data corruption do not apply anymore, and that the simplified representation of the data layer is going to be robust enough. Additionally, we're assuming that the maintenance cost of keeping this logic around is higher than any risk we're taking on by doing this work. 

In a bundle, we currently store:
- state label
- state's JSON representation
- profile's JSON representation

Bundle is stored as a single entry in the account's userdata, as a JSON dump of the above key/values.

Current change moves these individual pieces of information into separate key/value entries in the userdata. Userdata is backed by a SQL database, and calls such as 'setUserData' end up performing individual INSERTs or UPDATEs. So what before was an atomic operation, now becomes inherently unsafe.

You also perform a necessary migration - from the single JSON bundle into three separate keys. This migration might fail in unexpected ways - it won't be the first time we see failures parsing JSON - and will leave us in a very broken state - see notes below.

And so, I have two major concerns about this work, after seeing it sketched out (apologies for not thinking this through before you did it!):
- state label & state json need to be a single key entry, otherwise we're risking partial read/write failures due to a non-atomic nature of the writes/reads.
- bundle migration has very sharp edges, sharp enough to not just make firefox account unusable, but to also make the app crash constantly in the worst cases.

At this point, I think it's a good to consider:
- how likely are these sharp edges?
- what will it take to recover from them? In theory we should be able to surface up some UX to initialize some sort of "account recovery" (delete account and show a login flow?)
- given the added cost and risks, how much do we care about this cleanup?

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java:181
(Diff revision 4)
>     */
>    public AndroidFxAccount(Context applicationContext, Account account) {
>      this.context = applicationContext;
>      this.account = account;
>      this.accountManager = AccountManager.get(this.context);
> +    maybeMigrateBundle();

So, Murphy's law...

Depending on a failure mode, we'll have an account with a broken internal state from which we can't recover.

In fact, we're likely to crash hard in AndroidFxAccount's getState.

If we somehow loose our state information, the most sane thing to do seems to be to remove our system account entirely, somehow, in order to let the user recover. The experience will be ugly. But, this is probably not a good place to do this - AndroidFxAccount is init'd in a lot of different places at different times, from UI to backend and push services. I'm also not sure we have a precedent in the codebase.

Another approach is to ensure that at the very least we won't crash elsewhere if things go wrong during a migration, and somehow surface a notification or an error state to the user, letting them recover manually...

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java:323
(Diff revision 4)
>      final String product = GlobalConstants.BROWSER_INTENT_PACKAGE + ".reading";
>      final long version = CURRENT_RL_PREFS_VERSION;
>      return constructPrefsPath(accountKey, product, version, "");
>    }
>  
> +  private void maybeMigrateBundle() {

Failure here means a broken account, iiuc. If we fail to migrate state representation, we loose keyFetchToken, sessionToken, and unwrapkB, so we won't be able to sync or talk to FxA.

This is as sharp of an edge as we might hit. It's worth considering all of the failure scenarious in detail.

E.g. if we fail to migrate over profileJSON - that's fine, we will re-fetch it later (please double check that flow).

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java:340
(Diff revision 4)
> +      // Unwrap bundle values and save them individually.
> +      accountManager.setUserData(account, ACCOUNT_KEY_STATE_LABEL, stateLabel);
> +      accountManager.setUserData(account, ACCOUNT_KEY_STATE, state);
> +      accountManager.setUserData(account, ACCOUNT_KEY_PROFILE_JSON, profileJSON);
> +    } catch (Exception e) {
> +      Log.e(LOG_TAG, "Error in maybeMigrateBundle", e);

What exactly are you catching here? Where can we fail? Please be more specific with such exceptions.

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java:481
(Diff revision 4)
>        @NonNull String tokenServerURI,
>        @NonNull String profileServerURI,
>        @NonNull State state,
>        final Map<String, Boolean> authoritiesToSyncAutomaticallyMap,
>        final int accountVersion,
>        final boolean fromPickle,

I wonder, what about adding a telemetry event here to track whenever "fromPickle" path is taken? This will help us understand what % of users are still hitting this case.

If I understand this correctly, now that users can't install firefox to an SD card, this will be... 0?
Attachment #8918464 - Flags: review?(gkruglov) → review-
(In reply to :Grisha Kruglov from comment #10)
> Comment on attachment 8918464 [details]
> Bug 1408555 p2 - Remove bundling from AndroidFxAccount.
> 
> https://reviewboard.mozilla.org/r/189308/#review196504
> 
> Let's outline the 'bundle' changes you're making here. I'll quote Nick from
> https://bugzilla.mozilla.org/show_bug.cgi?id=1368147#c46:
> 
> > * the layered user data store was introduced to allow us to be flexible, versioning different pieces of the account, while working
> > around bugs with Android's `get/setUserData` implementations.  I think those bugs are gone.  We might dramatically simplify this
> > whole thing by just using `get/setUserData` and not layeirng the data store.
> 
> > It's not at all obvious, but the original implementation has a 2-layer representation.  The outer layer includes stuff for Android
> > (email, type, a little bit more) and then an inner bundle of stuff for FxA.  The idea was that the inner bundle could get
> > corrupted and the Android account wouldn't be damaged; this was necessary because the `get/setUserData` API was mysteriously buggy
> > (on some devices?  on some versions?), and we didn't trust that we could store `N` pieces of data robustly.  (And we definitely
> > couldn't do so in a single method invocation.)
> 
> > I'm not sure that the original spirit is represented in the current implementation at this point; or if it should be.
> 
> And so, we're making an assumption here that the original concerns about
> data corruption do not apply anymore, and that the simplified representation
> of the data layer is going to be robust enough. Additionally, we're assuming
> that the maintenance cost of keeping this logic around is higher than any
> risk we're taking on by doing this work. 
> 
> In a bundle, we currently store:
> - state label
> - state's JSON representation
> - profile's JSON representation
> 
> Bundle is stored as a single entry in the account's userdata, as a JSON dump
> of the above key/values.
> 
> Current change moves these individual pieces of information into separate
> key/value entries in the userdata. Userdata is backed by a SQL database, and
> calls such as 'setUserData' end up performing individual INSERTs or UPDATEs.
> So what before was an atomic operation, now becomes inherently unsafe.
> 
> You also perform a necessary migration - from the single JSON bundle into
> three separate keys. This migration might fail in unexpected ways - it won't
> be the first time we see failures parsing JSON - and will leave us in a very
> broken state - see notes below.
> 
> And so, I have two major concerns about this work, after seeing it sketched
> out (apologies for not thinking this through before you did it!):
> - state label & state json need to be a single key entry, otherwise we're
> risking partial read/write failures due to a non-atomic nature of the
> writes/reads.
> - bundle migration has very sharp edges, sharp enough to not just make
> firefox account unusable, but to also make the app crash constantly in the
> worst cases.

I think this is a good summary of the issues.  The existing code has grow horrific warts to work around issues, but the fundamental reality hasn't changed:

- the Android Account API isn't atomic;
- we do need to version the account data

> At this point, I think it's a good to consider:
> - how likely are these sharp edges?
> - what will it take to recover from them? In theory we should be able to
> surface up some UX to initialize some sort of "account recovery" (delete
> account and show a login flow?)
> - given the added cost and risks, how much do we care about this cleanup?

This is what the Doghouse state was intended for, although I think I didn't quite get it correct!  See http://searchfox.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/fxa/login/Doghouse.java.

The idea is that you can _always_ drive the Android account into the Doghouse, but the reality is that you can't, since the Doghouse state also wants to capture the email/UID or something else that you might have already lost.  In that situation, you really do need to delete the account or somehow work around the missing data.  (Which might already be done in FxAccountStatusActivity, but probably _isn't_ done elsewhere in the App.)  This was improved in Firefox for iOS, IIRC.
 
> :::
> mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/
> AndroidFxAccount.java:181
> (Diff revision 4)
> >     */
> >    public AndroidFxAccount(Context applicationContext, Account account) {
> >      this.context = applicationContext;
> >      this.account = account;
> >      this.accountManager = AccountManager.get(this.context);
> > +    maybeMigrateBundle();
> 
> So, Murphy's law...
> 
> Depending on a failure mode, we'll have an account with a broken internal
> state from which we can't recover.
> 
> In fact, we're likely to crash hard in AndroidFxAccount's getState.
> 
> If we somehow loose our state information, the most sane thing to do seems
> to be to remove our system account entirely, somehow, in order to let the
> user recover. The experience will be ugly. But, this is probably not a good
> place to do this - AndroidFxAccount is init'd in a lot of different places
> at different times, from UI to backend and push services. I'm also not sure
> we have a precedent in the codebase.
> 
> Another approach is to ensure that at the very least we won't crash
> elsewhere if things go wrong during a migration, and somehow surface a
> notification or an error state to the user, letting them recover manually...
> 
> :::
> mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/
> AndroidFxAccount.java:323
> (Diff revision 4)
> >      final String product = GlobalConstants.BROWSER_INTENT_PACKAGE + ".reading";
> >      final long version = CURRENT_RL_PREFS_VERSION;
> >      return constructPrefsPath(accountKey, product, version, "");
> >    }
> >  
> > +  private void maybeMigrateBundle() {
> 
> Failure here means a broken account, iiuc. If we fail to migrate state
> representation, we loose keyFetchToken, sessionToken, and unwrapkB, so we
> won't be able to sync or talk to FxA.
> 
> This is as sharp of an edge as we might hit. It's worth considering all of
> the failure scenarious in detail.
> 
> E.g. if we fail to migrate over profileJSON - that's fine, we will re-fetch
> it later (please double check that flow).
> 
> :::
> mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/
> AndroidFxAccount.java:340
> (Diff revision 4)
> > +      // Unwrap bundle values and save them individually.
> > +      accountManager.setUserData(account, ACCOUNT_KEY_STATE_LABEL, stateLabel);
> > +      accountManager.setUserData(account, ACCOUNT_KEY_STATE, state);
> > +      accountManager.setUserData(account, ACCOUNT_KEY_PROFILE_JSON, profileJSON);
> > +    } catch (Exception e) {
> > +      Log.e(LOG_TAG, "Error in maybeMigrateBundle", e);
> 
> What exactly are you catching here? Where can we fail? Please be more
> specific with such exceptions.
> 
> :::
> mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/
> AndroidFxAccount.java:481
> (Diff revision 4)
> >        @NonNull String tokenServerURI,
> >        @NonNull String profileServerURI,
> >        @NonNull State state,
> >        final Map<String, Boolean> authoritiesToSyncAutomaticallyMap,
> >        final int accountVersion,
> >        final boolean fromPickle,
> 
> I wonder, what about adding a telemetry event here to track whenever
> "fromPickle" path is taken? This will help us understand what % of users are
> still hitting this case.
> 
> If I understand this correctly, now that users can't install firefox to an
> SD card, this will be... 0?

Sadly, I think the only safe way to achieve this goal is to "stage" the migration, handling both the "old-style" and the "new-style" accounts in parallel, and then rolling out the "new-style" features piece-meal as we gain confidence :(
My vote for this work: unless we're willing to do this clean-up properly, don't do it at all.

There is higher benefit, less risk accounts work we can tackle instead. Bug 1407316 is worth doing to clean up tech debt and probably get some perf benefits while we're at it.
Comment on attachment 8918472 [details]
Bug 1408555 p1 - Remove Account pickling.

https://reviewboard.mozilla.org/r/189318/#review196468

Clearing the flag for now; in light of the other patch, I'm not sure this is worth the effort by itself.

::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AccountPickler.java:139
(Diff revision 3)
>     * @param account the AndroidFxAccount to persist to disk
>     * @param filename name of file to persist to; must not contain path separators.
> +   * @deprecated Pickling/Un-pickling logic will be removed entirely in the future. This method is
> +   *             still present to make sure un-pickling tests pass.
>     */
>    public synchronized static void pickle(final AndroidFxAccount account, final String filename) {

FYI, adding @VisibleForTesting annotation will trigger warnings in Android Studio if this method is used outside of tests.
Attachment #8918472 - Flags: review?(gkruglov)
Let's get back to this later :) Resetting the priority so we can re-triage this.
Assignee: eoger → nobody
Status: ASSIGNED → NEW
Priority: P1 → --
Attachment #8918472 - Attachment is obsolete: true
Attachment #8918464 - Attachment is obsolete: true
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: -- → P5
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: