Closed
Bug 1401318
Opened 7 years ago
Closed 7 years ago
Crash in java.lang.NullPointerException: Attempt to read from field ''java.lang.String android.accounts.Account.name'' on a null object reference at android.os.Parcel.readException(Parcel.java)
Categories
(Firefox for Android Graveyard :: Firefox Accounts, defect)
Tracking
(firefox55 unaffected, firefox56 unaffected, firefox57 fixed, firefox58 fixed)
RESOLVED
FIXED
Firefox 57
Tracking | Status | |
---|---|---|
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | fixed |
firefox58 | --- | fixed |
People
(Reporter: Grisha, Assigned: Grisha)
References
Details
(Keywords: crash)
Crash Data
Attachments
(2 files)
59 bytes,
text/x-review-board-request
|
nalexander
:
review+
|
Details |
8.25 KB,
patch
|
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-e3bb7cfd-f51d-4089-9e09-970ed0170919.
=============================================================
Observed this in Sync Fest after changing primary email on an API21+ device.
Assignee | ||
Comment 1•7 years ago
|
||
java.lang.NullPointerException: Attempt to read from field 'java.lang.String android.accounts.Account.name' on a null object reference
at android.os.Parcel.readException(Parcel.java:1699)
at android.os.Parcel.readException(Parcel.java:1646)
at android.accounts.IAccountManager$Stub$Proxy.renameAccount(IAccountManager.java:1535)
at android.accounts.AccountManager$5.doWork(AccountManager.java:804)
at android.accounts.AccountManager$BaseFutureTask.startTask(AccountManager.java:2073)
at android.accounts.AccountManager$Future2Task.start(AccountManager.java:2132)
at android.accounts.AccountManager.renameAccount(AccountManager.java:801)
at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.doOptionalProfileRename21Plus(AndroidFxAccount.java:1112)
at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.renameAccountIfNecessary(AndroidFxAccount.java:1100)
at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.access$300(AndroidFxAccount.java:69)
at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount$ProfileResultReceiver.onReceiveResult(AndroidFxAccount.java:1023)
at android.os.ResultReceiver$MyRunnable.run(ResultReceiver.java:50)
at android.os.Handler.handleCallback(Handler.java:751)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:154)
at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43)
Assignee | ||
Comment 2•7 years ago
|
||
Somehow we're passing null account into a call to accountManager.renameAccount.
Assignee | ||
Comment 3•7 years ago
|
||
Actually, no. First thing that renameAccount does is checks if the account==null, and throws. And so, account becomes null sometime later.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 6•7 years ago
|
||
The patch above seems to do the trick - I can't reproduce the issue locally. But I don't really understand how Comment 3 can happen :(
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8910517 [details]
Bug 1401318 - Fix some of the 'shared-state' access problems around Account
https://reviewboard.mozilla.org/r/181938/#review187290
Attachment #8910517 -
Flags: review?(nalexander) → review+
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → gkruglov
Status: NEW → ASSIGNED
Pushed by gkruglov@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dd9b01ec0df2
Make sure account won't be null when it's processed by Android APIs r=nalexander
Comment 9•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Updated•7 years ago
|
status-firefox55:
--- → unaffected
status-firefox56:
--- → unaffected
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #9)
> https://hg.mozilla.org/mozilla-central/rev/dd9b01ec0df2
Sebastian, you've marked this as fixed in 57, but the push page says that the first build with this patch is 20170921220243, which it says is 58. Can you clarify which one it is? This needs to be in 57, so it'd be good to uplift this soon if we need to.
Same question for Bug 1400742.
Flags: needinfo?(aryx.bugmail)
Comment 11•7 years ago
|
||
It's in 57, see e.g. the milestone for https://hg.mozilla.org/mozilla-central/rev/dd9b01ec0df2 It was part of the last autoland merge before m-c stopped syncing to beta. The situation is similar for 1400742. First _Nightly_ builds with the pushes are 58.0a1 because they ran >8 hours later (with the version bump added).
Flags: needinfo?(aryx.bugmail)
Assignee | ||
Comment 12•7 years ago
|
||
Well, the patch that didn't quite make sense also didn't quite fixed this (surprise!). I'm still seeing these crashes on my device after changing the email on a different device (specifically, in the emulator).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 13•7 years ago
|
||
Not yet sure if this is related, but I'm seeing now in the emulator that the accounts.db file is locked:
10-05 19:59:30.378 1548 2172 W SQLiteConnectionPool: The connection pool for database '/data/system_de/0/accounts_de.db' has been unable to grant a connection to thread 180 (Binder:1548_B) with flags 0x2 for 5550.1 seconds.
generic_x86:/data/system_de/0 # sqlite3 accounts_de.db
sqlite> .tables
Error: database is locked
The strange thing is that I never did manage to crash the emulator, so not sure how it got to this state yet - but it did seem to happen after changing an email on the physical device and triggering a manual sync on the emulator after not observing an email change. Restarting the emulator seems to have reset things back to normal - and the email change was picked up.
Assignee | ||
Comment 14•7 years ago
|
||
As a back-up plan if we can't figure this out in time, I've confirmed that the manual pre-21 API flow appears to works well on the new emulator. Next up, try it on a device.
Assignee | ||
Comment 15•7 years ago
|
||
Switching to 'doOptionalProfileRenamePre21' as the primary code path appears to work well on-device (running API26). I can't reproduce the crashes anymore. Given the tight 57 uplift deadline, this might be our best bet - same code path for all API levels.
Assignee | ||
Comment 16•7 years ago
|
||
Let's read the source code of the crashing API to understand this better.
(This seems like a reasonable guess at a version my device is running)
We might be crashing in one of three places:
1) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1376
2) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1425
3) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1456
There's a check for 'accountToRename' not being null, and so it's likely that we're crashing in (1).
Notice how a call to renameAccountInternal might actually return 'null' if the isSuccessful flag isn't set. Given our somewhat ambiguous stack trace, my guess is that this happens because 'getAccountIdLocked' call returns <0 (see https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1418).
There is no error checking or sanity checks in place if that happens, and it ends up crashing via an NPE in the caller.
Here's the getAccountIdLocked method impl: https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#3741
So if the above reading is correct, it's failing to actually find the "account to rename" by name & type in an internal accounts db. Not sure why yet!
Assignee | ||
Comment 17•7 years ago
|
||
Reading android services source code, I can't quite see where this might be going wrong. However, one theory that seems plausible: we might be performing renameAccount call twice for a given 'account' object. The second call will execute normally until it tries to lookup the account by name&type tuple, and fails (since it's already been renamed).
Assignee | ||
Comment 18•7 years ago
|
||
After some trial and error with debug logging in place, replicated the crash in a way that now makes sense:
10-05 16:49:32.993 30398 30430 I GeckoLogger: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: grigory.kruglov@gmail.com NEW NAME: gkruglov@mozilla.com
10-05 16:49:33.319 30398 30430 I GeckoLogger: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: grigory.kruglov@gmail.com NEW NAME: gkruglov@mozilla.com
... and it blows up.
Oddly enough, I couldn't replicate the usual "change email on device A, it crashes o device B" scenario. The above is simply "change email on device A, it crashes on device A".
Most likely we're racing to change the email twice via two channels: via a push message and via a webchannel. Timestamp difference above is roughly how long it'll take for the device to get a push message.
Assignee | ||
Comment 19•7 years ago
|
||
Here's what a bad race looks like:
10-05 17:20:09.069 7625 7680 I GRIGRI : WEBCHANNEL: Accounts:ProfileUpdated
10-05 17:20:09.093 7625 12860 I GRIGRI : Fetching https://profile.accounts.firefox.com/v1/profile
10-05 17:20:09.289 7625 12862 D GeckoPushGCM: Message received. Processing on background thread.
10-05 17:20:09.296 7625 7799 I GeckoPushService: Google Play Services GCM message received; delivering.
10-05 17:20:09.296 7625 7799 I GRIGRI : GOT PUSH!
10-05 17:20:09.488 7625 7799 I GRIGRI : HANDLING PROFILE UPDATE PUSH
10-05 17:20:09.584 7625 12860 I GRIGRI : Fetching https://profile.accounts.firefox.com/v1/profile
10-05 17:20:09.584 7625 7865 I GRIGRI : 200 response! JSON: {"email":"grigory.kruglov@gmail.com","uid":"****","displayName":"grisha","avatar":"https:\/\/firefoxusercontent.com\/***"}
10-05 17:20:09.590 7625 7799 I FxAccounts: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: gkruglov@mozilla.com NEW NAME: grigory.kruglov@gmail.com
10-05 17:20:09.692 7625 7865 I GRIGRI : 200 response! JSON: {"email":"grigory.kruglov@gmail.com","uid":"****","displayName":"grisha","avatar":"https:\/\/firefoxusercontent.com\/***"}
10-05 17:20:09.984 7625 7799 I FxAccounts: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: gkruglov@mozilla.com NEW NAME: grigory.kruglov@gmail.com
The simplest thing to do is probably to synchronize profile renaming.
Assignee | ||
Comment 20•7 years ago
|
||
The higher order problem here is that we have many different representations of the shared state, and they're not kept in-sync; i.e. it's possible to have many instances of AndroidFxAccount present with their internal 'account' object out-of-sync with reality. The problem is pretty narrow - we tend to create these object right before using them - but it's clear in this bug that in presence of variable-time IO and exacerbating circumstances (two things happening at once) it very much is a problem. I vaguely suspect we've seen this in other "strange" accounts crashes as well. I've started addressing this a while back in some exploratory patches by moving towards a shared singleton holder for the accounts data out of perf considerations, but now it's clear that there are correctness benefits as well.
Smallest possible patch upcoming...
Comment hidden (mozreview-request) |
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8910517 [details]
Bug 1401318 - Fix some of the 'shared-state' access problems around Account
https://reviewboard.mozilla.org/r/181938/#review193242
I have seen strange shared account locking log messages that I have never investigated, but hopefully we don't see deadlocks in the wild.
Comment 23•7 years ago
|
||
Pushed by gkruglov@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e464d1cc05a6
Fix some of the 'shared-state' access problems around Account r=nalexander
Assignee | ||
Comment 24•7 years ago
|
||
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1368147
[User impact if declined]: A crash might occur after user changes primary email associated with the account.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: Not yet, just landed now. Bug is reproducible locally, verified that crashes do not occur.
[Needs manual test from QE? If yes, steps to reproduce]: Wouldn't hurt. Crash won't happen every time. On a logged-in device, open FxA management page, append &canChangeEmail=true to the URI, and under Secondary Email change the primary email a bunch of times. Similarly, change the primary email on a different device a bunch of times. And finally, change primary email a bunch of times on a different device, and wait until device-under-test misses a push notification and doesn't update the primary email. Then, in Sync Prefs, trigger a sync. None of these cases should crash, and a sensible thing should always happen (primary email is changed).
[List of other uplifts needed for the feature/fix]: Already uplited!
[Is the change risky?]: Not very.
[Why is the change risky/not risky?]: The primary fix is simple - refreshing internal state. The necessary locking is also simple, but it's possible it'll cause unforeseen issues. In my tests, it does not, and other users of the lock behave correctly when interrupted.
[String changes made/needed]: N/A.
Attachment #8917106 -
Flags: approval-mozilla-beta?
Comment 25•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Comment on attachment 8917106 [details] [diff] [review]
account-shared-state.patch
Fixes a new crash issue, Beta57+
Attachment #8917106 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8917106 [details] [diff] [review]
account-shared-state.patch
Actually on second thoughts if the crash volume is not large, is this a must fix for 57? It doesn't look like a simple not-null check to me, the size of the patch is usually directly proportional to the risk to quality.
Hi Snorp, could I get a please second opinion from you on this one?
Flags: needinfo?(snorp)
Attachment #8917106 -
Flags: approval-mozilla-beta+ → approval-mozilla-beta?
Assignee | ||
Comment 28•7 years ago
|
||
The crash volume is low for this particular signature because the feature is currently behind the flag on FxA side and must be explicitly enabled (via 'canChangeEmail' GET param). The crashes you currently see are all from devs, testers or early adopters. Once the feature is enabled, crash volume will rise proportionally to its usage.
The aim is to ship this feature with 57, so the uplift is unfortunately necessary if we are to follow through with that plan.
I also expect that this patch will fix Bug 1349147, which is unrelated to the "email change" feature, and has a considerable volume of crashes already.
Comment on attachment 8917106 [details] [diff] [review]
account-shared-state.patch
Thanks Grisha for the additional details. That makes sense and justifies the uplift. beta57+
Flags: needinfo?(snorp)
Attachment #8917106 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 30•7 years ago
|
||
bugherder uplift |
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•