Closed Bug 1407217 Opened 2 years ago Closed 9 months ago

Crash in java.lang.IllegalStateException: Null cursor in isPinnedByUrl at org.mozilla.gecko.db.LocalBrowserDB.isPinnedForAS(LocalBrowserDB.java)

Categories

(Firefox for Android :: Activity Stream, defect, P3, critical)

Firefox 57
All
Android
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-20301b68-f269-4e5c-aef1-6bf450171009.
=============================================================
Java Stack Trace 	

java.lang.IllegalStateException: Null cursor in isPinnedByUrl
	at org.mozilla.gecko.db.LocalBrowserDB.isPinnedForAS(LocalBrowserDB.java:1768)
	at org.mozilla.gecko.BrowserApp$43.run(BrowserApp.java:3773)
	at android.os.Handler.handleCallback(Handler.java:733)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:149)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43)

these crashes start showing up in 57.0b.
Tim, is this related to increased Activity Stream usage in 57?
Flags: needinfo?(tspurway)
Andrew, we don't do much on Android.  Grisha, do you know who should own this?
Flags: needinfo?(tspurway) → needinfo?(gkruglov)
A-S is slated for 57 on Fennec, and this seems related.
Component: General → Activity Stream
Flags: needinfo?(gkruglov) → needinfo?(michael.l.comella)
There's a pile of possible causes for a null cursor:

- There's already a cursor open somewhere. I haven't seen this problem in a while, but it might still happen.
- The URI used for the CR isn't valid, or it refers to a provider that doesn't exist.
- An exception is thrown, perhaps because the query is invalid.

The CP code doesn't seem to do anything obviously wrong.

If anyone can get a log (or, better, repro!) that should narrow things down.
(In reply to Richard Newman [:rnewman] from comment #4)
> - There's already a cursor open somewhere. I haven't seen this problem in a
> while, but it might still happen.

I added this code in 57, which calls isPinnedForAS on a background thread in response to a user action (opening the options menu) – I wonder if another Cursor is being accessed concurrently but this only became a problem now because it's related to a user action whereas before we just never happened to open multiple cursors.

Richard, can only one Cursor be open per ContentResolver at a time? Or is there just an overall limit? If one, how does that work with classes like CursorAdapter that, afaik, keep a cursor open to fill the UI with data. (I was unaware of this limitation)

> - The URI used for the CR isn't valid, or it refers to a provider that
> doesn't exist.

This method existed before so I don't think it's a problem.
Flags: needinfo?(michael.l.comella) → needinfo?(rnewman)
Flags: needinfo?(michael.l.comella)
Curiously, all other times we query for `Cursor == null`, we don't throw but rather log and return a default value [1]. If the problem is that there is still a Cursor open, this may be an existing problem that's only uncovered because we throw here.

Thus hack fix: return a default value rather than throwing.

[1]: http://searchfox.org/mozilla-central/search?q=path%3Amobile%2Fandroid**LocalBrows+c+%3D%3D+null&case=false&regexp=false&path=
`isPinnedForAS` was introduced in bug 1319274 and there does not appear to have been an `isPinned` for old top sites so it's unlikely we would have seen this problem under another name in old top sites.
Depends on: 1319274
I spoke with Richard on IRC:

(In reply to Michael Comella (:mcomella) from comment #6)
> Richard, can only one Cursor be open per ContentResolver at a time? Or is
> there just an overall limit? If one, how does that work with classes like
> CursorAdapter that, afaik, keep a cursor open to fill the UI with data. (I
> was unaware of this limitation)

<@rnewman> CR/CP is an IPC mechanism
15:39 I could imagine that an allocation failure somewhere causes a null
15:39 I know for sure that some CR/CP calls turn exceptions into nulls
15:39 but IIRC that's only if the CP is running in a separate process
15:40 We really need the ADB log from one of these failures

Additionally:
- Returning null is a sign something is exceptional so we should try to find the source of the problem rather than covering it up
- Given ^, the fact that other methods do not catch null (comment 7) is likely a mistake and ideally we'd find the source of any problems these methods have too
Flags: needinfo?(rnewman)
> Richard, can only one Cursor be open per ContentResolver at a time? 

Typically that should be fine.

Whether this is a problem or not depends on a whole bunch of things.

In my experience we would get a null cursor when SQLite says 'nope'. Typically that has been a syntax error or some other genuine query error.

IIRC it could also happen if there's a cursor open that's somehow holding the DB in a locked state, or a concurrent writer, which would cause your query to fail. You could try this: kick off a thread that just runs `BEGIN EXCLUSIVE` on the shared SQLite connection, then see if your `isPinnedForAS` call fails with a null cursor. Or take the connection you got from `getReadableConnection()` and close it before you run the query. It _should_ throw an exception, but perhaps it doesn't.

Or directly try throwing a RuntimeException from inside your CP `query` method, and see how that's propagated to your caller through the CR interface.

There's a very steep stack of Android abstractions at play here, so it's hard to predict where the issue lies.
This is the 33rd top crasher in 57.0b5 and is not a top crasher in 57.0b7 (the current one). As such and given the challenge and my split responsibilities, I won't continue investigating this. I'll keep an eye out to make sure the crash rates don't spike and occasionally look for the logs we're hoping for (leaving NI).
Now the top 48th crasher: I wonder if there's a specific use case for which this happens that users will only run into once, e.g.:
- They tried something once, crashed, and have learned to avoid.
- They changed a setting, for which we don't refresh correctly and puts the app into a bad state (bug 1402327?), that they only need to change once and don't crash anymore since they don't change the setting anymore
Sebastian, would you mind verifying this (and bug 1402327) doesn't become an out-of-control crash when 57 moves to release on Nov. 14 + propagation time? I'll be PTO next week - thanks! (Note: I decided to stop investigating this in comment 12)
Flags: needinfo?(s.kaspari)
Alright. :)
Assignee: nobody → s.kaspari
Status: NEW → ASSIGNED
Flags: needinfo?(s.kaspari)
Handing over to Grisha.
Assignee: s.kaspari → gkruglov
Grisha isn't actively working on this.

It looks like the crashes are still fairly low, as anticipated (comment 11), so I'm going to stop looking into this regularly – we can wait for it to appear in the top crashers again.
Assignee: gkruglov → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(michael.l.comella)
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.