Closed Bug 1072332 Opened 5 years ago Closed 5 years ago

crash in java.lang.NullPointerException: at org.mozilla.gecko.db.PerProfileDatabases.getDatabaseHelperForProfile(PerProfileDatabases.java)

Categories

(Firefox for Android :: Data Providers, defect, critical)

35 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 35
Tracking Status
firefox35 --- verified
fennec 34+ ---

People

(Reporter: aaronmt, Assigned: wesj)

References

Details

(Keywords: crash, topcrash-android-armv7)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-6916cdc4-785b-4181-9bc5-c6fb02140923.
=============================================================

java.lang.NullPointerException
	at org.mozilla.gecko.db.PerProfileDatabases.getDatabaseHelperForProfile(PerProfileDatabases.java:53)
	at org.mozilla.gecko.db.AbstractPerProfileDatabaseProvider.getWritableDatabase(AbstractPerProfileDatabaseProvider.java:61)
	at org.mozilla.gecko.db.AbstractTransactionalProvider.delete(AbstractTransactionalProvider.java:184)
	at android.content.ContentProvider$Transport.delete(ContentProvider.java:273)
	at android.content.ContentResolver.delete(ContentResolver.java:1282)
	at org.mozilla.gecko.db.URLMetadata.deleteUnused(URLMetadata.java:204)
	at org.mozilla.gecko.db.BrowserProvider.deleteUnusedImages(BrowserProvider.java:1361)
	at org.mozilla.gecko.db.BrowserProvider.deleteInTransaction(BrowserProvider.java:417)
	at org.mozilla.gecko.db.AbstractTransactionalProvider.delete(AbstractTransactionalProvider.java:188)
	at android.content.ContentProvider$Transport.delete(ContentProvider.java:273)
	at android.content.ContentResolver.delete(ContentResolver.java:1282)
	at org.mozilla.gecko.db.LocalBrowserDB.expireHistory(LocalBrowserDB.java:657)
	at org.mozilla.gecko.db.BrowserDB.expireHistory(BrowserDB.java:141)
	at org.mozilla.gecko.GeckoApplication$1.run(GeckoApplication.java:95)
	at android.os.Handler.handleCallback(Handler.java:733)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:136)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:32)
Android CPU ABI
Comments

  I was reading an article on theverge.com , hit the home button, then the crash happened when trying to go back to nightly
Is this bug 1072091?
tracking-fennec: --- → ?
Flags: needinfo?(wjohnston)
Duplicate of this bug: 1072333
Flags: needinfo?(wjohnston)
This is likely a problem with URLMetadata not passing the CONTENT_URI with a profile
Assignee: nobody → wjohnston
Attached patch bandaidSplinter Review
This passes any profile we have along to the URLMetadata class (who then pass it back to us :)). I'm still not sure why (where?) we're getting null though... Does this actually show up in crash-stats enough to spend time on?
Attachment #8495061 - Flags: review?(mark.finkle)
tracking-fennec: ? → 34+
(In reply to Wesley Johnston (:wesj) from comment #5)
> Created attachment 8495061 [details] [diff] [review]
> bandaid
> 
> This passes any profile we have along to the URLMetadata class (who then
> pass it back to us :)). I'm still not sure why (where?) we're getting null
> though... Does this actually show up in crash-stats enough to spend time on?

This is the #6 crash on Nightly
Comment on attachment 8495061 [details] [diff] [review]
bandaid

If we intend to pull more code out of LocalBrowserDB into encapsulated classes like URLMetadata, we'll need to handle this better.
Attachment #8495061 - Flags: review?(mark.finkle) → review+
https://hg.mozilla.org/mozilla-central/rev/5cd5fb2dc27e
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 35
Updating status flag as per comment 9.
Adding topcrash keyword since this is #2 on Nightly as of today.

Looking at the latest reports for this signature, I'm still seeing several people who are crashing with builds generated after the fix landed. Either these new crashes are something different with the same signature or this crash is not fixed.
Flags: needinfo?(wjohnston)
(In reply to Aaron Train [:aaronmt] from comment #11)
> E.g, 09/29 build report
> https://crash-stats.mozilla.com/report/index/838ac433-58a5-44af-8ca6-
> bba192140929

Different crash:
Original crash has URLMetadata.java in the stack:
at org.mozilla.gecko.db.URLMetadata.deleteUnused(URLMetadata.java:204)

New crash has TabsAccessor.java:
at org.mozilla.gecko.TabsAccessor.persistLocalTabs(TabsAccessor.java:376)
File a new bug please
See Also: → 1074319
See bug 1074319
I'm adding the signature from bug 1072333 to this bug since it was duped here.  Note that signature is up 8 positions to #5 on Nightly @ 3.98%.
Crash Signature: [@ java.lang.NullPointerException: at org.mozilla.gecko.db.PerProfileDatabases.getDatabaseHelperForProfile(PerProfileDatabases.java)] → [@ java.lang.NullPointerException: at org.mozilla.gecko.db.PerProfileDatabases.getDatabaseHelperForProfile(PerProfileDatabases.java)] [@ java.lang.NullPointerException: Attempt to invoke virtual method ''java.lang.String org.mozilla.gecko.GeckoProfile.ge…
Attached patch Patch 2Splinter Review
I'm still thinking about this, but I ran into something that could cause this problem.

Callers are expecting that GeckoProfile.get(Context) always returns a profile. I recently broke that with some of the lock widget stuff:

1.) GeckoProfile.getGuestProfile can return null. It always could. We've used that previously to determine if we should use the guest profile. In fact, GeckoProfile.get(Context)'s old logic was "if it exists, use it". However, I recently changed it so that GeckoProfile.get(Context) does more/different checks to determine if it should return a Guest profile. i.e. it now checks if the keyguard is up OR if a profile exists and is locked. Because of that it needs to use something infallible now. i.e. createGeckoProfile().

2.) createGeckoProfile can also return null right now, but AFAICT, there is no reason for that. Just a try-catch that has been there since it landed. I removed the try-catch and made it a little smarter about not "initializing" (i.e. adding distribution files to) a profile that already exists. It should now throw if something unexpected goes wrong. Its only called from one place (BrowserApp.onCreate()), and that place doesn't check the return explicitly either, so this feels pretty safe.

I think fixing those will fix this crash. Maybe a better question is if we should be returning the guest profile in these cases? For instance, in the URLMetadata case I have to imagine that

1.) You hit home while Fennec is running.
2.) We call expireHistory as we go in the background
3.) it apparently tries to use browserProvider.getContext() to get the profile/db. PerProfileDatabase will throw this away and use its cached context if it needs a profile. Somewhere in here we wind up with a less than ideal context. I'm not exactly sure how yet. When I've tried to repro locally, I always get a BrowserApp, but it seems possible that the BrowserApp context is deleted and the ContentProvider gets its own context.
4.) If this context doesn't have a cached database AND the keyguard is up we'll wind up trying to get the Guest profile and deleting from it instead of the normal one... This is fixed now because we use the profile passed to expireHistory (and with this we shouldn't crash from the GuestMode stuff any more either)...

The TabsAccessor code doesn't seem to pass a profile, so we can't rely on that to make sure things are in the right place (yet). Its also not called when we go in the background. It might be hit when a locationchange happens though. Maybe a page load while the lockscreen is up? I'll put a fix in that bug to pass the profile otherwise we'll wind up storing them in the guest profile....

I hate having to hack this in everywhere. I wonder if using the ApplicationContext for storing/getting the profile would help us keep hold of something more stable. Maybe just using a static field for mProfile would do as much. Scary changes....
Attachment #8497828 - Flags: review?(rnewman)
Flags: needinfo?(wjohnston)
(In reply to Wesley Johnston (:wesj) from comment #16)

> I hate having to hack this in everywhere. I wonder if using the
> ApplicationContext for storing/getting the profile would help us keep hold
> of something more stable. Maybe just using a static field for mProfile would
> do as much. Scary changes....

Yeah, this is a real state management problem!

Setting down my thoughts before looking at your code, just to make sure we get two sets of eyes on this.


I think we have three orthogonal problem sets here.

0. We expect to always have a profile. Fixing this requires either ensuring that we do (even if it's "default") or auditing call sites. The latter is probably best, because…

1. We don't correctly check whether the profile is a guest profile to decide what to do. This definitely requires auditing call sites, and walking up to where a runnable is queued. Indeed…

2. We sometimes expect a delayed background action to grab the correct profile when it's run, rather than grabbing it at point-of-occurrence and having the right thing happen thanks to a closure. BrowserDB helps us to get this wrong, because it's a single piece of state that changes at the wrong time.

Example of this: visit a page in Guest Mode, then immediately Exit Guest Mode. Do delayed runnables end up hitting the main profile? If so, sadness.

I suspect the approach we should take is to allow a caller to go from a GeckoProfile instance to a LocalBrowserDB instance, and audit all of our BrowserDB/GeckoProfile uses to make sure they (a) grab the profile there-and-then, (b) check if it's null/guest and do the right thing, and (c) use the profile's LocalBrowserDB instead of the default one.

Then we should eliminate BrowserDB, because it encourages us to get this wrong. We should have a globally scoped profile that's cheap to grab, rather than a global BrowserDB that we typically use from background runnables.

Do you agree with that parallel analysis?

Now I should read your code.

Oh, on a procedural note: wanna fork this bug for the follow-up work, so we don't stomp all over the bandaid patch that already landed?
(In reply to Richard Newman [:rnewman] from comment #17)
> I suspect the approach we should take is to allow a caller to go from a
> GeckoProfile instance to a LocalBrowserDB instance, and audit all of our
> BrowserDB/GeckoProfile uses to make sure they (a) grab the profile
> there-and-then, (b) check if it's null/guest and do the right thing, and (c)
> use the profile's LocalBrowserDB instead of the default one.

Yeah, I think we probably need to audit. I'm not sure I entirely understand the BrowserDB hatred. If anything it does one good thing for us. It grabs the profile before it does any requests and makes sure it sends that along to the ContentProvider. The problems we're hitting come when we don't get one. I almost wonder if we should just bail in those cases for PerProfileDatabases. If you don't specify a profile, we just quit. Then we have to audit all callers to make sure they always pass us something (and we always use it).

> Oh, on a procedural note: wanna fork this bug for the follow-up work, so we
> don't stomp all over the bandaid patch that already landed?

I don't mind moving to a different bug if you want. I probably should have. Lazy awesomebar :)
(In reply to Wesley Johnston (:wesj) from comment #18)

> I'm not sure I entirely understand
> the BrowserDB hatred. If anything it does one good thing for us. It grabs
> the profile before it does any requests and makes sure it sends that along
> to the ContentProvider.

The problem I'm postulating is that:

  BrowserDB.doFoo()

does two things:

  * It resolves to a LocalBrowserDB instance (and implicitly to a particular profile)
  * It performs a ContentProvider operation on the current thread via LocalBrowserDB.

My Point #2 is exactly this: we ought to separate those two stages, because we should capture the *inputs* to the operation (profile, URL, etc.) at one instance in time (right now), and then do the write on the background thread, possibly hundreds of milliseconds later.

This might be an abstract problem -- because the profile never changes, right? -- but in principle, having a singleton that *must* be used on the background thread but is only accurate on the UI thread is a bad idea. At the very least it's less clear than it could be.

(This actually was a concrete problem when building the share overlay: BrowserDB in the current process might be pointing to the guest mode profile, so the share overlay can't use it, and must instead make its own LocalBrowserDB.

Within the scope of BrowserApp, where we've initialized the chosen profile, then sure, BrowserDB makes sense.)


Avoiding this and the other issues is why I suggest this kind of idiom, where we're very explicit about the profile that we're using:

  final GeckoProfile profile = GeckoProfile.get();
  if (profile == null) {
    // Well, that shouldn't happen; we should always have a profile!
    return;
  }

  if (profile.isGuestProfile()) {
    // Maybe we don't want to do this write...
    return;
  }

  ThreadUtils.postToBackgroundThread(new Runnable() {
    @Override
    public void run() {
      // Or capture browserDB outside the runnable. Doesn't matter.
      profile.browserDB().writeWhatever(someURL);
    }
  });


This way the background runnable is always 100% certain that:

* It has the current profile
* It knows whether the current profile is a guest profile, etc. etc.
* It's working on the right DB.

This also accurately models the per-profile storage logic -- profiles 'own' databases -- so in that sense it's "natural", for whatever that's worth.


> The problems we're hitting come when we don't get
> one. I almost wonder if we should just bail in those cases for
> PerProfileDatabases. If you don't specify a profile, we just quit.

That's a reasonable defensive step! It would avoid leakage of data into the default profile, anyway.

(At this point mfinkle normally shouts about hiding bugs, so perhaps we should whisper.)


> I don't mind moving to a different bug if you want. I probably should have.
> Lazy awesomebar :)

I understand :D

If we were really early in a cycle I might opt to reuse this one, but things get tricky when we cross target milestone boundaries.
Blocks: 1077590
Attachment #8497828 - Flags: review?(rnewman)
I'm not seeing any instances of these signatures anymore in Fennec Aurora 35.0a2.
You need to log in before you can comment on or make changes to this bug.