1401318 - Crash in java.lang.NullPointerException: Attempt to read from field ''java.lang.String android.accounts.Account.name'' on a null object reference at android.os.Parcel.readException(Parcel.java)

Assignee

Description

•

7 years ago

This bug was filed from the Socorro interface and is report bp-e3bb7cfd-f51d-4089-9e09-970ed0170919. ============================================================= Observed this in Sync Fest after changing primary email on an API21+ device.

:Grisha Kruglov

Assignee

Comment 1

•

7 years ago

java.lang.NullPointerException: Attempt to read from field 'java.lang.String android.accounts.Account.name' on a null object reference at android.os.Parcel.readException(Parcel.java:1699) at android.os.Parcel.readException(Parcel.java:1646) at android.accounts.IAccountManager$Stub$Proxy.renameAccount(IAccountManager.java:1535) at android.accounts.AccountManager$5.doWork(AccountManager.java:804) at android.accounts.AccountManager$BaseFutureTask.startTask(AccountManager.java:2073) at android.accounts.AccountManager$Future2Task.start(AccountManager.java:2132) at android.accounts.AccountManager.renameAccount(AccountManager.java:801) at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.doOptionalProfileRename21Plus(AndroidFxAccount.java:1112) at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.renameAccountIfNecessary(AndroidFxAccount.java:1100) at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.access$300(AndroidFxAccount.java:69) at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount$ProfileResultReceiver.onReceiveResult(AndroidFxAccount.java:1023) at android.os.ResultReceiver$MyRunnable.run(ResultReceiver.java:50) at android.os.Handler.handleCallback(Handler.java:751) at android.os.Handler.dispatchMessage(Handler.java:95) at android.os.Looper.loop(Looper.java:154) at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43)

:Grisha Kruglov

Assignee

Comment 2

•

7 years ago

Somehow we're passing null account into a call to accountManager.renameAccount.

:Grisha Kruglov

Assignee

Comment 3

•

7 years ago

Actually, no. First thing that renameAccount does is checks if the account==null, and throws. And so, account becomes null sometime later.

Comment hidden (mozreview-request)

:Grisha Kruglov

Assignee

Comment 6

•

7 years ago

The patch above seems to do the trick - I can't reproduce the issue locally. But I don't really understand how Comment 3 can happen :(

Nick Alexander :nalexander [he/him]

Comment 7

•

7 years ago

mozreview-review

Comment on attachment 8910517 [details] Bug 1401318 - Fix some of the 'shared-state' access problems around Account https://reviewboard.mozilla.org/r/181938/#review187290

Attachment #8910517 - Flags: review?(nalexander) → review+

:Grisha Kruglov

Assignee

Updated

•

7 years ago

Assignee: nobody → gkruglov

Status: NEW → ASSIGNED

Pulsebot

Comment 8

•

7 years ago

Pushed by gkruglov@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dd9b01ec0df2 Make sure account won't be null when it's processed by Android APIs r=nalexander

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 9

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/dd9b01ec0df2

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 57

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox55: --- → unaffected

status-firefox56: --- → unaffected

:Grisha Kruglov

Assignee

Comment 10

•

7 years ago

(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #9) > https://hg.mozilla.org/mozilla-central/rev/dd9b01ec0df2 Sebastian, you've marked this as fixed in 57, but the push page says that the first build with this patch is 20170921220243, which it says is 58. Can you clarify which one it is? This needs to be in 57, so it'd be good to uplift this soon if we need to. Same question for Bug 1400742.

Flags: needinfo?(aryx.bugmail)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 11

•

7 years ago

It's in 57, see e.g. the milestone for https://hg.mozilla.org/mozilla-central/rev/dd9b01ec0df2 It was part of the last autoland merge before m-c stopped syncing to beta. The situation is similar for 1400742. First _Nightly_ builds with the pushes are 58.0a1 because they ran >8 hours later (with the version bump added).

Flags: needinfo?(aryx.bugmail)

:Grisha Kruglov

Assignee

Comment 12

•

7 years ago

Well, the patch that didn't quite make sense also didn't quite fixed this (surprise!). I'm still seeing these crashes on my device after changing the email on a different device (specifically, in the emulator).

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

:Grisha Kruglov

Assignee

Comment 13

•

7 years ago

Not yet sure if this is related, but I'm seeing now in the emulator that the accounts.db file is locked: 10-05 19:59:30.378 1548 2172 W SQLiteConnectionPool: The connection pool for database '/data/system_de/0/accounts_de.db' has been unable to grant a connection to thread 180 (Binder:1548_B) with flags 0x2 for 5550.1 seconds. generic_x86:/data/system_de/0 # sqlite3 accounts_de.db sqlite> .tables Error: database is locked The strange thing is that I never did manage to crash the emulator, so not sure how it got to this state yet - but it did seem to happen after changing an email on the physical device and triggering a manual sync on the emulator after not observing an email change. Restarting the emulator seems to have reset things back to normal - and the email change was picked up.

:Grisha Kruglov

Assignee

Comment 14

•

7 years ago

As a back-up plan if we can't figure this out in time, I've confirmed that the manual pre-21 API flow appears to works well on the new emulator. Next up, try it on a device.

:Grisha Kruglov

Assignee

Comment 15

•

7 years ago

Switching to 'doOptionalProfileRenamePre21' as the primary code path appears to work well on-device (running API26). I can't reproduce the crashes anymore. Given the tight 57 uplift deadline, this might be our best bet - same code path for all API levels.

:Grisha Kruglov

Assignee

Comment 16

•

7 years ago

Let's read the source code of the crashing API to understand this better. (This seems like a reasonable guess at a version my device is running) We might be crashing in one of three places: 1) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1376 2) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1425 3) https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1456 There's a check for 'accountToRename' not being null, and so it's likely that we're crashing in (1). Notice how a call to renameAccountInternal might actually return 'null' if the isSuccessful flag isn't set. Given our somewhat ambiguous stack trace, my guess is that this happens because 'getAccountIdLocked' call returns <0 (see https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#1418). There is no error checking or sanity checks in place if that happens, and it ends up crashing via an NPE in the caller. Here's the getAccountIdLocked method impl: https://android.googlesource.com/platform/frameworks/base/+/android-7.0.0_r34/services/core/java/com/android/server/accounts/AccountManagerService.java?autodive=0%2F%2F#3741 So if the above reading is correct, it's failing to actually find the "account to rename" by name & type in an internal accounts db. Not sure why yet!

:Grisha Kruglov

Assignee

Comment 17

•

7 years ago

Reading android services source code, I can't quite see where this might be going wrong. However, one theory that seems plausible: we might be performing renameAccount call twice for a given 'account' object. The second call will execute normally until it tries to lookup the account by name&type tuple, and fails (since it's already been renamed).

:Grisha Kruglov

Assignee

Comment 18

•

7 years ago

After some trial and error with debug logging in place, replicated the crash in a way that now makes sense: 10-05 16:49:32.993 30398 30430 I GeckoLogger: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: grigory.kruglov@gmail.com NEW NAME: gkruglov@mozilla.com 10-05 16:49:33.319 30398 30430 I GeckoLogger: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: grigory.kruglov@gmail.com NEW NAME: gkruglov@mozilla.com ... and it blows up. Oddly enough, I couldn't replicate the usual "change email on device A, it crashes o device B" scenario. The above is simply "change email on device A, it crashes on device A". Most likely we're racing to change the email twice via two channels: via a push message and via a webchannel. Timestamp difference above is roughly how long it'll take for the device to get a push message.

:Grisha Kruglov

Assignee

Comment 19

•

7 years ago

Here's what a bad race looks like: 10-05 17:20:09.069 7625 7680 I GRIGRI : WEBCHANNEL: Accounts:ProfileUpdated 10-05 17:20:09.093 7625 12860 I GRIGRI : Fetching https://profile.accounts.firefox.com/v1/profile 10-05 17:20:09.289 7625 12862 D GeckoPushGCM: Message received. Processing on background thread. 10-05 17:20:09.296 7625 7799 I GeckoPushService: Google Play Services GCM message received; delivering. 10-05 17:20:09.296 7625 7799 I GRIGRI : GOT PUSH! 10-05 17:20:09.488 7625 7799 I GRIGRI : HANDLING PROFILE UPDATE PUSH 10-05 17:20:09.584 7625 12860 I GRIGRI : Fetching https://profile.accounts.firefox.com/v1/profile 10-05 17:20:09.584 7625 7865 I GRIGRI : 200 response! JSON: {"email":"grigory.kruglov@gmail.com","uid":"****","displayName":"grisha","avatar":"https:\/\/firefoxusercontent.com\/***"} 10-05 17:20:09.590 7625 7799 I FxAccounts: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: gkruglov@mozilla.com NEW NAME: grigory.kruglov@gmail.com 10-05 17:20:09.692 7625 7865 I GRIGRI : 200 response! JSON: {"email":"grigory.kruglov@gmail.com","uid":"****","displayName":"grisha","avatar":"https:\/\/firefoxusercontent.com\/***"} 10-05 17:20:09.984 7625 7799 I FxAccounts: fennec_grisha :: GRIGRI :: RENAMING ACCOUNT! CURRENT NAME: gkruglov@mozilla.com NEW NAME: grigory.kruglov@gmail.com The simplest thing to do is probably to synchronize profile renaming.

:Grisha Kruglov

Assignee

Comment 20

•

7 years ago

The higher order problem here is that we have many different representations of the shared state, and they're not kept in-sync; i.e. it's possible to have many instances of AndroidFxAccount present with their internal 'account' object out-of-sync with reality. The problem is pretty narrow - we tend to create these object right before using them - but it's clear in this bug that in presence of variable-time IO and exacerbating circumstances (two things happening at once) it very much is a problem. I vaguely suspect we've seen this in other "strange" accounts crashes as well. I've started addressing this a while back in some exploratory patches by moving towards a shared singleton holder for the accounts data out of perf considerations, but now it's clear that there are correctness benefits as well. Smallest possible patch upcoming...

:Grisha Kruglov

Assignee

Updated

•

7 years ago

Comment 22

•

7 years ago

mozreview-review

Comment on attachment 8910517 [details] Bug 1401318 - Fix some of the 'shared-state' access problems around Account https://reviewboard.mozilla.org/r/181938/#review193242 I have seen strange shared account locking log messages that I have never investigated, but hopefully we don't see deadlocks in the wild.

Pulsebot

Comment 23

•

7 years ago

Pushed by gkruglov@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e464d1cc05a6 Fix some of the 'shared-state' access problems around Account r=nalexander

:Grisha Kruglov

Assignee

Comment 24

•

7 years ago

Attached patch account-shared-state.patch — Details — Splinter Review

Approval Request Comment [Feature/Bug causing the regression]: Bug 1368147 [User impact if declined]: A crash might occur after user changes primary email associated with the account. [Is this code covered by automated tests?]: No. [Has the fix been verified in Nightly?]: Not yet, just landed now. Bug is reproducible locally, verified that crashes do not occur. [Needs manual test from QE? If yes, steps to reproduce]: Wouldn't hurt. Crash won't happen every time. On a logged-in device, open FxA management page, append &canChangeEmail=true to the URI, and under Secondary Email change the primary email a bunch of times. Similarly, change the primary email on a different device a bunch of times. And finally, change primary email a bunch of times on a different device, and wait until device-under-test misses a push notification and doesn't update the primary email. Then, in Sync Prefs, trigger a sync. None of these cases should crash, and a sensible thing should always happen (primary email is changed). [List of other uplifts needed for the feature/fix]: Already uplited! [Is the change risky?]: Not very. [Why is the change risky/not risky?]: The primary fix is simple - refreshing internal state. The necessary locking is also simple, but it's possible it'll cause unforeseen issues. In my tests, it does not, and other users of the lock behave correctly when interrupted. [String changes made/needed]: N/A.

Attachment #8917106 - Flags: approval-mozilla-beta?

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 25

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e464d1cc05a6

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox58: --- → fixed

Resolution: --- → FIXED

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 26

•

7 years ago

Comment on attachment 8917106 [details] [diff] [review] account-shared-state.patch Fixes a new crash issue, Beta57+

Attachment #8917106 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

7 years ago

status-firefox57: fixed → affected

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 27

•

7 years ago

Comment on attachment 8917106 [details] [diff] [review] account-shared-state.patch Actually on second thoughts if the crash volume is not large, is this a must fix for 57? It doesn't look like a simple not-null check to me, the size of the patch is usually directly proportional to the risk to quality. Hi Snorp, could I get a please second opinion from you on this one?

Flags: needinfo?(snorp)

Attachment #8917106 - Flags: approval-mozilla-beta+ → approval-mozilla-beta?

:Grisha Kruglov

Assignee

Comment 28

•

7 years ago

The crash volume is low for this particular signature because the feature is currently behind the flag on FxA side and must be explicitly enabled (via 'canChangeEmail' GET param). The crashes you currently see are all from devs, testers or early adopters. Once the feature is enabled, crash volume will rise proportionally to its usage. The aim is to ship this feature with 57, so the uplift is unfortunately necessary if we are to follow through with that plan. I also expect that this patch will fix Bug 1349147, which is unrelated to the "email change" feature, and has a considerable volume of crashes already.

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 29

•

7 years ago

Comment on attachment 8917106 [details] [diff] [review] account-shared-state.patch Thanks Grisha for the additional details. That makes sense and justifies the uplift. beta57+

Flags: needinfo?(snorp)

Attachment #8917106 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 30

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/9ea0df8dba2b

status-firefox57: affected → fixed

:Grisha Kruglov

Assignee

Updated

•

7 years ago

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Bug 1401318 - Fix some of the 'shared-state' access problems around Account 7 years ago :Grisha Kruglov 59 bytes, text/x-review-board-request	nalexander : review+	Details
account-shared-state.patch 7 years ago :Grisha Kruglov 8.25 KB, patch	ritu : approval-mozilla-beta+	Details \| Diff \| Splinter Review