Startup Crash in java.lang.IllegalStateException: at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.getState(AndroidFxAccount.java)
Categories
(Firefox for Android Graveyard :: Firefox Accounts, defect, P2)
Tracking
(firefox-esr68 affected, firefox63 wontfix, firefox64 wontfix, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix)
People
(Reporter: pascalc, Assigned: andrei.a.lazar)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
Comment 9•7 years ago
Bug 1505720 is about state misalignment that can be triggered by user action.
This bug is a startup crash that's triggered within the "require re-authentication if application data was cleared" code path. Bug 1429735 landed that functionality.
Comment 10•7 years ago
Reading the relevant code a bit more along with the stack traces, it seems that Bug 1429735 is connected only incidentally, in the sense that it makes a certain error condition more likely to show up (since we're hitting that code path more often now).
The specific error condition is misalignment of account's 'state' information with what we expect to see. Our expectation is that an account that we get from an Android Account manager will have a stateLabel & state json dump, which are necessary to recreate this account's FxA state machine state. We assume we'll have this state because we set it explicitly during account creation, never delete it and only update it. See [0] for when state is created initially.
However, when we actually query the account manager, at least one of those is null, meaning that we can't recreate the account state accurately. In fact, since this situation should be impossible given our expectations above, we bail out at this point and throw a runtime exception, which is what those stack traces are.
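To make the failing check concrete, here is a minimal sketch of the invariant described above. It is not the actual AndroidFxAccount code: the class name, the key names, and the plain Map standing in for the Android account manager's user data are all assumptions made for illustration. Only the wording of the exception message is taken from the crash reports on this bug.

```java
import java.util.HashMap;
import java.util.Map;

public class StateCheckSketch {
    // Hypothetical keys; the real keys used by AndroidFxAccount may differ.
    public static final String KEY_STATE_LABEL = "stateLabel";
    public static final String KEY_STATE = "state";

    /**
     * Reads both pieces of persisted state from a map that stands in for the
     * account manager's user data, and throws if either one is missing.
     */
    public static String describeState(Map<String, String> userData) {
        String stateLabelString = userData.get(KEY_STATE_LABEL);
        String stateString = userData.get(KEY_STATE);
        if (stateLabelString == null || stateString == null) {
            // Message format mirrors the one seen in the crash reports.
            throw new IllegalStateException(
                "stateLabelString and stateString must not be null, but: "
                + "(stateLabelString == null) = " + (stateLabelString == null)
                + " and (stateString == null) = " + (stateString == null));
        }
        return stateLabelString + ":" + stateString;
    }

    public static void main(String[] args) {
        Map<String, String> userData = new HashMap<>();
        userData.put(KEY_STATE_LABEL, "ExampleLabel");
        userData.put(KEY_STATE, "{}");
        System.out.println(describeState(userData));

        // Simulate the unexpected situation: one value has gone missing.
        userData.remove(KEY_STATE);
        try {
            describeState(userData);
        } catch (IllegalStateException e) {
            System.out.println("crash path: " + e.getMessage());
        }
    }
}
```

In the sketch the exception is caught in main only to show the message; in the scenario described in this comment nothing catches it, so the first caller to query the state crashes.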
Digging into the stacktraces a bit, 70% of these crashes happen on Huawei phones, mostly running API26. It's certainly not an even distribution over the user base, and so likely the cause of these crashes is some bad interaction with the accounts system on these specific devices. Perhaps the accounts system is buggy in these cases, or modified in a way that doesn't follow common patterns (wouldn't be a surprise to me).
There are two main directions to go here.
- Investigate what exactly happens on these specific devices. That is: how can we create an Android account, pass in userdata via addAccountExplicitly, make sure it's really set by also iterating over userdata and setting it again (see [1]), and then still get back 'null' when we query it later on?
- Provide some kind of escape hatch once we get into this state. This could look like:
-- re-initializing an account and hoping it'll work this time,
-- returning a "fake" state and trying to persist that state in the background,
-- deleting an account entirely, which in practice means that the FxA system just won't work at all for the affected devices,
-- something else..? How can we do a sensible thing here when our core expectations aren't met?
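As a rough illustration of the "fake state" escape hatch listed above, the sketch below returns a conservative fallback state instead of throwing, and writes it back so later reads agree. Everything in it is hypothetical: the State class, the NEEDS_REAUTH label, the key names, and the Map standing in for the account manager's user data. It also persists synchronously for brevity, whereas the option above suggests doing so in the background.

```java
import java.util.HashMap;
import java.util.Map;

public class StateRecoverySketch {
    /** Minimal stand-in for an account state: a label plus a JSON dump. */
    public static final class State {
        public final String label;
        public final String json;

        public State(String label, String json) {
            this.label = label;
            this.json = json;
        }
    }

    // Hypothetical label meaning "the user has to sign in again".
    public static final String NEEDS_REAUTH = "NeedsReauth";

    /**
     * Returns the persisted state when both values are present. Otherwise
     * synthesizes a fallback state, persists it, and returns it.
     */
    public static State getStateOrFallback(Map<String, String> userData) {
        String label = userData.get("stateLabel");
        String json = userData.get("state");
        if (label != null && json != null) {
            return new State(label, json);
        }
        State fallback = new State(NEEDS_REAUTH, "{}");
        userData.put("stateLabel", fallback.label);
        userData.put("state", fallback.json);
        return fallback;
    }

    public static void main(String[] args) {
        Map<String, String> userData = new HashMap<>(); // nothing persisted
        State state = getStateOrFallback(userData);
        System.out.println(state.label + " " + userData);
    }
}
```

Whether forcing re-authentication is acceptable for affected users is a product decision this sketch does not settle; it only shows that the crash path can be replaced by a recoverable one.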
I have a feeling that none of these options are going to be particularly easy or straightforward, but they're worth investigating. A key hurdle to get around might be that we're hitting this code path from many different entry points. There isn't a single owner of the account state, and so whoever queries it first after the problem starts to manifest itself will hit this problem. This makes our life harder, since it means that by the time we're in getState and crashing, we're deep in one of many top-level interactions.
Back when I actually worked on this codebase, the plan was to unify account management under a long-lived singleton-like object with a very explicit lifecycle, so that we would have a single point of entry and ownership of these states, and thus much easier recovery paths. Alas, that hasn't happened.
[0] https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java?q=path%3AAndroidFxAccount.java&redirect_type=single#579
[1] https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/fxa/authenticator/AndroidFxAccount.java?q=path%3AAndroidFxAccount.java&redirect_type=single#601
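The single-owner idea mentioned in this comment could look roughly like the sketch below. It was never implemented in this codebase according to the comment, so every name here (AccountStateOwnerSketch, Lifecycle, the callback methods) is invented for illustration. The point is only that one object owns the state, records a BROKEN lifecycle when the persisted values are missing, and lets callers react without crashing.

```java
import java.util.Optional;

public final class AccountStateOwnerSketch {
    /** Explicit lifecycle of the single account this object owns. */
    public enum Lifecycle { NO_ACCOUNT, READY, BROKEN }

    private static final AccountStateOwnerSketch INSTANCE = new AccountStateOwnerSketch();

    private Lifecycle lifecycle = Lifecycle.NO_ACCOUNT;
    private String stateLabel;
    private String stateJson;

    private AccountStateOwnerSketch() {}

    /** Single point of entry for every caller that needs account state. */
    public static AccountStateOwnerSketch get() {
        return INSTANCE;
    }

    /** Called once the persisted values have been read from storage. */
    public synchronized void onAccountLoaded(String label, String json) {
        if (label == null || json == null) {
            // Record the problem in one place instead of throwing at each caller.
            lifecycle = Lifecycle.BROKEN;
            stateLabel = null;
            stateJson = null;
        } else {
            lifecycle = Lifecycle.READY;
            stateLabel = label;
            stateJson = json;
        }
    }

    public synchronized void onAccountRemoved() {
        lifecycle = Lifecycle.NO_ACCOUNT;
        stateLabel = null;
        stateJson = null;
    }

    public synchronized Lifecycle lifecycle() {
        return lifecycle;
    }

    /** Empty unless the account is READY, so callers must handle absence. */
    public synchronized Optional<String> stateLabel() {
        return lifecycle == Lifecycle.READY ? Optional.of(stateLabel) : Optional.empty();
    }

    public static void main(String[] args) {
        AccountStateOwnerSketch owner = AccountStateOwnerSketch.get();
        owner.onAccountLoaded(null, null);
        System.out.println(owner.lifecycle() + " " + owner.stateLabel().isPresent());
    }
}
```

With a design along these lines, recovery logic would live next to the BROKEN transition rather than deep inside whichever top-level interaction happens to query the state first.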
Comment 11•6 years ago
Hi all,
I encountered this crash today on Beta 66.0b11 ARM64 build (https://hg.mozilla.org/releases/mozilla-beta/rev/b08f44691ec3b9cdfb99bfd1a09c464e0a35785b), on Google Pixel (Android 9.0).
It was reproducible 2/3 times.
Not reproducible on Beta 66.0b11 from the Play Store.
Should I file a new bug?
Crash signature:
java.lang.NullPointerException: at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.renameAccountIfNecessary(AndroidFxAccount.java)
Comment 12•6 years ago
Andrei, what are the next steps for this bug? The overall crash volume is pretty low, but it sounds like the crash is reproducible.
Comment 13•6 years ago
It sounds like this is not only reproducible but is a code path many users are likely to hit. Bumping the priority to P1.
Mirabela, did you have any clear STR to hit this bug?
Comment 14•6 years ago
Hi Liz,
The STR are:
Steps to reproduce:
- Go to Settings -> Sign in, and sign in with valid credentials.
- Go to Settings to your FxA and Disconnect
- Beta crashes
Thanks!
Comment 15•6 years ago
(In reply to Chris Peterson [:cpeterson] from comment #12)
Andrei, what are the next steps for this bug? The overall crash volume is pretty low, but it sounds like the crash is reproducible.
I tried a fix for this issue, but it looks like I didn't cover all the cases. Grisha suggested another approach that requires far more resources, but Susheel said we shouldn't invest in maintaining this component unless the crash volume is high.
(In reply to miralobontiu from comment #14)
Hi Liz,
The STR are:
Steps to reproduce:
- Go to Settings -> Sign in, and sign in with valid credentials.
- Go to Settings to your FxA and Disconnect
- Beta crashes
Thanks!
This is low volume because it happens only if you disconnect quickly enough after signing in (within about 3 seconds, depending on the internet connection).
I would add this detail to the STR.
Mira, can you please confirm?
Comment 16•6 years ago
Hi Andrei,
Yes, I confirm that Beta crashes only if I disconnect really fast.
If I wait for about 10 seconds, Beta doesn't crash.
Thanks!
Comment 17•6 years ago
Stefan, it sounds like this startup crash in Accounts may increase in volume for Fennec users running ARM builds in the future, though it isn't high volume now. I'm not sure who might investigate further. Possibly glandium? What do you think?
Comment 18•6 years ago
Why is this related to ARM builds?
Comment 19•6 years ago
(In reply to Andrei Lazar from comment #15)
(In reply to Chris Peterson [:cpeterson] from comment #12)
Andrei, what are the next steps for this bug? The overall crash volume is pretty low, but it sounds like the crash is reproducible.
I tried a fix for this issue, but it looks like I didn't cover all the cases. Grisha suggested another approach that requires far more resources, but Susheel said we shouldn't invest in maintaining this component unless the crash volume is high.
That's a good point. This is Fennec FxA code and is probably not reused in Fenix, so this bug's priority is not critical.
(In reply to Stefan Arentz [:st3fan] from comment #18)
Why is this related to ARM builds?
This bug is not ARM-specific. This Java exception indicates that the cached FxA bundle does not exist or its JSON data is malformed. There are a few x86 reports, too.
java.lang.IllegalStateException: stateLabelString and stateString must not be null, but: (stateLabelString == null) = true and (stateString == null) = true
Comment 20•6 years ago
Hi, I tried to reproduce this issue on the latest version of Nightly 68.0a1 (2019-03-27) on a Google Pixel (Android 9), using the information from comment 14 and comment 16, but I wasn't able to.
Comment 21•6 years ago
This crash is still in 68, but the volume is extremely low. Let's see what happens when 68 is released.
Comment 22•6 years ago
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 23•6 years ago
Still present in 68 betas, but volume is fairly minimal.
Comment 24•6 years ago
I was able to reproduce this crash 2 times consecutively on Fennec 68.1.1 using Google Pixel (Android 8.0.0):
bp-0c514db7-481c-4e9b-8172-939980191007
bp-0d9d45ff-c532-4021-ab3f-e23c60191007