Closed Bug 961895 Opened 10 years ago Closed 10 years ago

Fennec Nightly (soon-after-)startup crash in PasswordsProvider process (with Master Password enabled?)

Categories

(Firefox for Android Graveyard :: General, defect)

29 Branch
ARM
Android
defect
Not set
critical

Tracking

(fennec+)

RESOLVED WORKSFORME
Tracking Status
fennec + ---

People

(Reporter: ehsan.akhgari, Assigned: wesj)

References

Details

(Keywords: crash, dogfood, Whiteboard: [likely workarounds: disable password sync or disable master password])

Attachments

(1 file)

Attached file adb logcat
This started to happen for me yesterday, I'm attaching the relevant bits of adb logcat.

What I see in Fennec is a message box saying "Unfortunately, Nightly has stopped." shortly after startup, and then Fennec is killed in the background.

Because of this bug, Fennec nightly is unusable for me.  It would be nice if we backed out whatever caused this to regress.
Bug 947506 looks suspect. Margaret?
Blocks: 947506
Severity: normal → critical
Flags: needinfo?(margaret.leibovic)
Keywords: crash
OS: Mac OS X → Android
Hardware: x86 → ARM
Ehsan: do you have Master Password enabled?
(In reply to comment #2)
> Ehsan: do you have Master Password enabled?

I do.
But note that none of the websites that I visited in my testing require a password, so I never get this prompt (well, Fennec dies way too early for that anyway)
(In reply to comment #1)
> Bug 947506 looks suspect. Margaret?

Hmm, that just renames a class.  I'm not sure how that could have any runtime effects like this.. :/
Sync can't sync your passwords if MP is enabled.

It shouldn't cause a crash, but the logspew is normal, and has been that way for a year. This log doesn't look any different from what I expect, except that there's a native-code crash in there for some reason.
The workaround, by the way, is to turn off password syncing (which will affect all of your devices) or turn off MP on your phone.

MP is weak and many of us (including myself and Brian Smith) advise against its use and for its removal from Firefox.

MP is approximately equivalent to just asking you for a PIN to prevent your kids from accessing your passwords, actually using no encryption on disk. But somehow Fennec can access your passwords without entering your MP: Bug 939900. I see no point in having it enabled. Use Android's security features instead.
Turning off the master password does this fix this crash for me.
Thanks for checking, Ehsan.
Summary: Fennec Nightly startup crash in PasswordsProvider → Fennec Nightly startup crash in PasswordsProvider process with Master Password enabled
Dave reported similar symptoms without MP enabled. Waiting for confirmation and a log.
Flags: needinfo?(justdave)
Summary: Fennec Nightly startup crash in PasswordsProvider process with Master Password enabled → Fennec Nightly startup crash in PasswordsProvider process (with Master Password enabled?)
Version: unspecified → Firefox 29
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> (In reply to comment #1)
> > Bug 947506 looks suspect. Margaret?
> 
> Hmm, that just renames a class.  I'm not sure how that could have any
> runtime effects like this.. :/

I agree this doesn't sound like a regression that would have been caused by bug 947506, but I also don't know why Ehsan would have just started seeing this recently. Could it be related to the profile work that wesj has been working on? How recently did those patches land?
Flags: needinfo?(margaret.leibovic) → needinfo?(wjohnston)
No longer blocks: 947506
Nothing recent should have caused this that I know of. I added some logging, and we're not sending any profile info over, just like we weren't before. We intentionally throw when we can't do the crypto, but I'm not sure we want to. Or it seems like we should catch it and return something that tells Sync we couldn't insert the item.

I have a vague memory of talking about this when we wrote it, but can't remember why we decided on doing things this way. Richard, do you?
Flags: needinfo?(wjohnston) → needinfo?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #10)
> Dave reported similar symptoms without MP enabled. Waiting for confirmation
> and a log.

I'm pretty sure the crash I've been getting is unrelated.  It seems to have nothing to do with page load (it'll randomly crash after sitting there for an hour in the background).  It's still been happening to me, but I haven't been lucky enough to be near a computer to hook it up and grab a logcat when it happens anytime recently.
Flags: needinfo?(justdave)
(In reply to Dave Miller [:justdave] (justdave@bugzilla.org) from comment #13)

> I'm pretty sure the crash I've been getting is unrelated.  It seems to have
> nothing to do with page load (it'll randomly crash after sitting there for
> an hour in the background). 

You mean... like an hourly timed sync? :P
Flags: needinfo?(rnewman)
(In reply to Wesley Johnston (:wesj) from comment #12)

> I have a vague memory of talking about this when we wrote it, but can't
> remember why we decided on doing things this way. Richard, do you?

We're limited in what we can return over the CR interface, for one thing. I imagine the throwing was to differentiate, to some extent, between returning no value and being unable to return a value. But I think the throwing is a red herring: the throw is being caught and translated into a cross-process return value, as always; the issue here is a new native crash. Maybe NSS shutdown or something?

With any luck, later this year we can tackle Bug 711636 and its dependencies, and all of this will be moot.
I'm getting the same behavior & logcat results to what ehsan desscribes. (and I do have a Master Password enabled)

Adding 'dogfood' keyword, as this has been making Fennec unusable for dogfooding, for me at least. (Reliably crashes within a few minutes of launching.)

I'm uncomfortable turning off my master password; I may just disable sync (now that I know that's involved), for the time being.
Keywords: dogfood
tracking-fennec: --- → ?
Just to confirm: disabling sync (in android Settings app | "Accounts") does seem to work around this for me.
--> Adding that to whiteboard as a known workaround (and MP-disabling as another workaround, per comment 8).
Whiteboard: [likely workarounds: disable sync or disable master password]
(In reply to Daniel Holbert [:dholbert] from comment #17)
> Just to confirm: disabling sync (in android Settings app | "Accounts") does
> seem to work around this for me.

You can also just disable syncing passwords -- uncheck the box.

> I'm uncomfortable turning off my master password; I may just disable sync
> (now that I know that's involved), for the time being.

You should consider checking Settings > Storage > Phone storage encryption, and disable Master Password. Android's encryption is very very likely to be stronger and more robust; Firefox's MP can be cracked in a few minutes using a handful of widely available tools, and amounts to little more than an inconvenience.
Whiteboard: [likely workarounds: disable sync or disable master password] → [likely workarounds: disable password sync or disable master password]
(In reply to Richard Newman [:rnewman] from comment #18)
> > I'm uncomfortable turning off my master password; I may just disable sync
> > (now that I know that's involved), for the time being.
> 
> You should consider checking Settings > Storage > Phone storage encryption,

(Yup, I use that as well.)
Assignee: nobody → wjohnston
tracking-fennec: ? → 29+
Summary: Fennec Nightly startup crash in PasswordsProvider process (with Master Password enabled?) → Fennec Nightly (soon-after-)startup crash in PasswordsProvider process (with Master Password enabled?)
(In reply to Richard Newman [:rnewman] from comment #15)
> (In reply to Wesley Johnston (:wesj) from comment #12)
> 
> > I have a vague memory of talking about this when we wrote it, but can't
> > remember why we decided on doing things this way. Richard, do you?
> 
> We're limited in what we can return over the CR interface, for one thing. I
> imagine the throwing was to differentiate, to some extent, between returning
> no value and being unable to return a value. But I think the throwing is a
> red herring: the throw is being caught and translated into a cross-process
> return value, as always; the issue here is a new native crash. Maybe NSS
> shutdown or something?

I don't think the throwing is a red herring. We "catch" the exception, but we just rethrow something else:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/PasswordsProvider.java#217

which is what causes this crash AFAICT.
The lib loader throws, so NSSBridge fails to decrypt, so the PasswordsProvider throws a RuntimeException:

E/GeckoLibLoad(18615): Throw
E/GeckPasswordsProvider(18615): Error in NSSBridge


But this is a CP in a separate process, so the JavaBinder logs the exception:

E/JavaBinder(18615): *** Uncaught remote exception!  (Exceptions are not yet supported across processes.)
E/JavaBinder(18615): java.lang.RuntimeException: java.lang.Exception: PK11SDR_Decrypt returned error -8177: The security password entered is incorrect.
E/JavaBinder(18615):
E/JavaBinder(18615): 	at org.mozilla.gecko.db.PasswordsProvider.doCrypto(PasswordsProvider.java:218)
E/JavaBinder(18615): 	at org.mozilla.gecko.db.PasswordsProvider.onPostQuery$3e5a7f14(PasswordsProvider.java:281)
E/JavaBinder(18615): 	at org.mozilla.gecko.db.SQLiteBridgeContentProvider.query(SQLiteBridgeContentProvider.java:378)
E/JavaBinder(18615): 	at android.content.ContentProvider.query(ContentProvider.java:744)
E/JavaBinder(18615): 	at android.content.ContentProvider$Transport.query(ContentProvider.java:199)
E/JavaBinder(18615): 	at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:114)
E/JavaBinder(18615): 	at android.os.Binder.execTransact(Binder.java:388)
E/JavaBinder(18615): 	at dalvik.system.NativeStart.run(Native Method)
E/JavaBinder(18615): Caused by: java.lang.Exception: PK11SDR_Decrypt returned error -8177: The security password entered is incorrect.
E/JavaBinder(18615):
E/JavaBinder(18615): 	at org.mozilla.gecko.NSSBridge.nativeDecrypt(Native Method)
E/JavaBinder(18615): 	at org.mozilla.gecko.NSSBridge.decrypt(NSSBridge.java:44)
E/JavaBinder(18615): 	at org.mozilla.gecko.db.PasswordsProvider.doCrypto(PasswordsProvider.java:213)
E/JavaBinder(18615): 	... 7 more


… and has the remote content resolver client receive null, which Sync cheerfully handles:

E/FxSync  (18593): fennec :: PasswordsRepoSession :: Got null cursor exception in PasswordsRepoSession.store
W/FxSync  (18593): fennec :: SynchronizerSession :: First RecordsChannel onFlowStoreFailed. Logging local store error.
W/FxSync  (18593): org.mozilla.gecko.sync.repositories.NullCursorException
W/FxSync  (18593): 	at org.mozilla.gecko.sync.repositories.android.RepoUtils$QueryHelper.checkAndLogCursor(RepoUtils.java:86)
W/FxSync  (18593): 	at org.mozilla.gecko.sync.repositories.android.RepoUtils$QueryHelper.safeQuery(RepoUtils.java:65)
W/FxSync  (18593): 	at org.mozilla.gecko.sync.repositories.android.PasswordsRepositorySession.retrieveByGUID(PasswordsRepositorySession.java:562)
W/FxSync  (18593): 	at org.mozilla.gecko.sync.repositories.android.PasswordsRepositorySession.access$1200(PasswordsRepositorySession.java:38)
W/FxSync  (18593): 	at org.mozilla.gecko.sync.repositories.android.PasswordsRepositorySession$5.run(PasswordsRepositorySession.java:272)
W/FxSync  (18593): 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1080)
W/FxSync  (18593): 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573)
W/FxSync  (18593): 	at java.lang.Thread.run(Thread.java:841)


The CP apparently doesn't die, because it then handles multiple other calls under the same PID (which all fail in the same way).


The crash happens immediately after the umpteenth GeckoLibLoad failure, before that RuntimeException is thrown, and it's a native memory hygiene error.

As such, I don't believe there's anything on the Java side that would cause a crash (almost by definition).


From the CP client's perspective, the throw is identical to a null, because JavaBinder will just return a null value for a cross-process exception. The throw is simply there to cause this handy log to be created.

If you think that the logging or JavaBinder handling might be crashing, then by all means replace the throw with an explicit `return null`, and see if Ehsan can repro with and without that change.
Ahh. Makes sense. Thanks.
This is tracking Fx29 folks. What's the plan?
I don't think we have one, short of dropping the existing PasswordsProvider implementation, or making semi-random changes and trying to provoke a crash.

This doesn't seem conclusively linked to MP. It doesn't seem to have been caused by any recent changes in our code. It's nothing to do with the cross-process CP logic. Something is dying in native code. That might be due to a findable regression in NSS or storage, or it might not.

Certainly we're unlikely to uplift Bug 946857, or an NSS backout, or the removal of MP, so I suggest bumping the tracking flag.

If there's any short-term action, it should be to try to muffle the process death.
Another short-term fix I'd suggest: hide the Master Password setting in Settings unless you have it turned on. We shouldn't be encouraging users to get into this state.
See Also: → 989398
I could reproduce this when the bug was filed, but can't seem to anymore, on Release, Beta, or Nightly. I wonder if we're catching whatever was causing the crash now. I see tons of "Throw" log messages from:

http://mxr.mozilla.org/mozilla-central/source/mozglue/android/APKOpen.cpp#113
Can we retest this one?
tracking-fennec: 29+ → +
Flags: needinfo?(kbrosnan)
Flags: needinfo?(ehsan)
I can't reproduce in Nightly either, FWIW. I do have Sync enabled (including for passwords), and I have the master password checkbox checked, and I don't recall hitting a startup crash any time recently. So I think that means this worksforme.

(It looks like my device doesn't actually have any saved passwords on it -- I'm never actually prompted for my MP and nothing autofills, which I suppose makes sense per comment 6. Not sure if that impacts the reproducibility of this.)
Removing my ni - there is no crash id to go on here and I did a super search on crash-stats looking for the stacks in comment 21 nothing was found.
Flags: needinfo?(kbrosnan)
(In reply to comment #27)
> Can we retest this one?

What should I do to retest this?
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #30)
> (In reply to comment #27)
> > Can we retest this one?
> 
> What should I do to retest this?

Clearing the needinfo? unless someone replies to this.  Please needinfo? me again.
Flags: needinfo?(ehsan)
No STR and no signs this is still happening. Lets close this.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: