Closed Bug 1014603 Opened 6 years ago Closed 2 years ago

crash in java.lang.NullPointerException: at org.mozilla.gecko.db.TopSitesCursorWrapper.moveToPosition(TopSitesCursorWrapper.java)

Categories

(Firefox for Android :: General, defect)

32 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- wontfix
firefox35 --- affected
firefox36 --- affected
firefox37 --- affected
firefox38 --- affected

People

(Reporter: aaronmt, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-42ec7b20-4b03-4b77-837a-ece282140515.
=============================================================

java.lang.NullPointerException
	at org.mozilla.gecko.db.TopSitesCursorWrapper.moveToPosition(TopSitesCursorWrapper.java:313)
	at android.support.v4.widget.CursorAdapter.getView(CursorAdapter.java:247)
	at android.widget.HeaderViewListAdapter.getView(HeaderViewListAdapter.java:220)
	at android.widget.AbsListView.obtainView(AbsListView.java:2263)
	at android.widget.ListView.makeAndAddView(ListView.java:1790)
	at android.widget.ListView.fillUp(ListView.java:725)
	at android.widget.ListView.fillGap(ListView.java:664)
	at android.widget.AbsListView.trackMotionScroll(AbsListView.java:5136)
	at android.widget.AbsListView$FlingRunnable.run(AbsListView.java:4247)
	at android.os.Handler.handleCallback(Handler.java:733)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:136)
	at android.app.ActivityThread.main(ActivityThread.java:5017)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:515)
	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:779)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:595)
	at de.robv.android.xposed.XposedBridge.main(XposedBridge.java:132)
	at dalvik.system.NativeStart.main(Native Method)
Single crash reported on a Tab 10.1 (GT-P7510)
Attached patch Proposed patch. v1 (obsolete) — Splinter Review
This crash could occur if:

* topCursor is null. I added a sanity check for this which will make the exception more obvious. This could only occur if something odd happened in the SQLite layer.

* pinnedPositions is null: e.g., if moveToPosition was called after close (or if my reading of the constructor is wrong).

This patch does the following:

* Fixes all warnings, typos, and style errors.
* Adds a few sanity checks.
* Makes pinnedPositions final and inits it during construction, rather than as a side-effect of something that happens during construction. We never null it out, only clear it.
Attachment #8447523 - Flags: review?(michael.l.comella)
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Comment on attachment 8447523 [details] [diff] [review]
Proposed patch. v1

Review of attachment 8447523 [details] [diff] [review]:
-----------------------------------------------------------------

This patch prevents NullPointerExceptions, yes, but does it actually solve the logic error? When is pinnedPositions getting used such that it throws a NullPointerException? After `moveToPosition` was called after `close()`, as you mentioned in comment 3? Then shouldn't we ensure close was not called as the first step in `moveToPosition`?

It sounds like there is more going on here. If you think the issue's solved, please elaborate.

::: mobile/android/base/db/TopSitesCursorWrapper.java
@@ +251,5 @@
>                  map = suggestedIndexes;
>                  break;
> +
> +            default:
> +                return -1;

You can remove the null check and `return -1` below this.
(In reply to Michael Comella (:mcomella) from comment #4)
> Comment on attachment 8447523 [details] [diff] [review]
> Proposed patch. v1
> 
> Review of attachment 8447523 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch prevents NullPointerExceptions, yes, but does it actually solve
> the logic error? 
> When is pinnedPositions getting used such that it throws a
> NullPointerException?


This bug is somewhat complicated because evidently code is being inlined (there's nothing in that section of the method that would directly crash with an NPE), and this is evidently not a common flow -- four crashes between Nightly and Aurora.

The commonality between lines 311 and 313 is that pinnedPositions is touched by both called methods, and is thus the null culprit.

So I deduce that either:

1. The cursor is being closed then used, or
2. updatePinnedPositions is never called.

> After `moveToPosition` was called after `close()`, as
> you mentioned in comment 3? Then shouldn't we ensure close was not called as
> the first step in `moveToPosition`?

uPP is called by requery and in the constructor, so the only way pinnedPositions can be null is if the cursor was closed then used. My guess, then, is that SimpleCursorLoader -- literally the only consumer of TopSitesCursorWrapper -- is buggy, or Android is buggy. SCL does a lot of cursor closing and close checking, and the checks are worrisome.
(In reply to Richard Newman [:rnewman] from comment #5)
> (In reply to Michael Comella (:mcomella) from comment #4)
> > Comment on attachment 8447523 [details] [diff] [review]
> > Proposed patch. v1
> > 
> > Review of attachment 8447523 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > This patch prevents NullPointerExceptions, yes, but does it actually solve
> > the logic error? 
> > When is pinnedPositions getting used such that it throws a
> > NullPointerException?
> 
> 
> This bug is somewhat complicated because evidently code is being inlined
> (there's nothing in that section of the method that would directly crash
> with an NPE), and this is evidently not a common flow -- four crashes
> between Nightly and Aurora.
> 
> The commonality between lines 311 and 313 is that pinnedPositions is touched
> by both called methods, and is thus the null culprit.
> 
> So I deduce that either:
> 
> 1. The cursor is being closed then used, or
> 2. updatePinnedPositions is never called.
> 
> > After `moveToPosition` was called after `close()`, as
> > you mentioned in comment 3? Then shouldn't we ensure close was not called as
> > the first step in `moveToPosition`?
> 
> uPP is called by requery and in the constructor, so the only way
> pinnedPositions can be null is if the cursor was closed then used. My guess,
> then, is that SimpleCursorLoader -- literally the only consumer of
> TopSitesCursorWrapper -- is buggy, or Android is buggy. SCL does a lot of
> cursor closing and close checking, and the checks are worrisome.

FWIW, the stack trace tells us that this crash happens during just after a fling gesture. The fling implementation in ListView involves posting runnables in the main Looper handler. These runnables are cancelled in several situations, including when the view is detached from the window, for example.

My informed guess here is that there's a very short window of time between the cursor being closed and the next fling runnable being executed that might cause the adapter's getView() to be called after the cursor has been closed and before the view is actually detached.
Looking at the implementation of CursorAdapter#getView (which we don't override), it looks kinda like there's no sane action we can take here:

 public View getView(int position, View convertView, ViewGroup parent) {
   if (!mDataValid) {
     throw new IllegalStateException("this should only be called when the cursor is valid");
   }
   if (!mCursor.moveToPosition(position)) {
     throw new IllegalStateException("couldn't move cursor to position " + position);
   }
   ...


That is, by the time we get to having getView be called, our only options are to throw or return a bound view.

So the only real fix for this bug is to make sure that we're not trying to make views from closed cursors at all -- higher up the stack.

Lucas, can you suggest the right fix? And are you happy with the small fixes I describe in Comment 3, which are in this patch, as an additional layer?
Flags: needinfo?(lucasr.at.mozilla)
This seems quite similar to Bug 760956.
See Also: → 760956
Comment on attachment 8447523 [details] [diff] [review]
Proposed patch. v1

Clearing r?: waiting for more info.
Attachment #8447523 - Flags: review?(michael.l.comella)
Got into this crash on Firefox 34 Beta 10.
Device: Samsung Galaxy R (Android 2.3.4).
Michael, do you have an updated opinion on this?
Flags: needinfo?(lucasr.at.mozilla) → needinfo?(michael.l.comella)
I think this is important enough that someone should work on it -- 8 crashes in the past month on Nightly, 1200+ crashes in the last month on Release, making this our #32 crasher and risihg -- but I don't have the time to dig into it.

Michael, consider taking this over?
Assignee: rnewman → nobody
Status: ASSIGNED → NEW
Assignee: nobody → michael.l.comella
Flags: needinfo?(michael.l.comella)
(In reply to Richard Newman [:rnewman] from comment #7)
> That is, by the time we get to having getView be called, our only options
> are to throw or return a bound view.

Assuming the problem is actually that we're closing cursors when we shouldn't, an escape hatch with questionable side effects could be overriding the newView and bindView methods in our CursorAdapter implementations that use TopSitesCursorWrapper to ignore a closed cursor and return a dummy view. Context (CursorAdapter.getView):

241    public View getView(int position, View convertView, ViewGroup parent) {
242        if (!mDataValid) {
243            throw new IllegalStateException("this should only be called when the cursor is valid");
244        }
245        if (!mCursor.moveToPosition(position)) {
246            throw new IllegalStateException("couldn't move cursor to position " + position);
247        }
248        View v;
249        if (convertView == null) {
250            v = newView(mContext, mCursor, parent);
251        } else {
252            v = convertView;
253        }
254        bindView(v, mContext, mCursor);
255        return v;

But let's try not to close those Cursors.
* This method is called *twice* in some circumstances. [1]
...
protected void onLoadFinishedAfterTransitions(Loader<Cursor> loader, Cursor c) {

I wonder if the first call closes the Cursor and the second throws the Exception in a race condition.

[1]: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/home/TopSitesPanel.java#701
This is difficult to diagnose so my plan is as follows: we land Richard's patch to get more information. If we start getting closed Cursor errors instead, then Lucas' analysis in comment 6 is likely. A workaround would be my suggestion in comment 13 (except override getView instead of newView or bindView), though it might be worthwhile to dig into further.

It's difficult because we have to determine where the LoaderManager code releases the Cursors (i.e. the Loader's deliverResult, onReset, and onCancelled) runs in relation to where the FlingRunnable is removed from the runnable queue, i.e. when the window is detached.

If I analyzed this correctly and the stack trace is not modified further up the stack (e.g. catch and throw new), after Richard's patch we should start getting closed Cursor errors from:

SQLiteClosable.acquireReference [1]
 SQLiteQuery.fillWindow [2]
  SQLiteCursor.fillWindow [3]
   SQLiteCursor.getCount [4] / SQLiteCursor.onMove [5]
    AbstractCursor.moveToPosition [6]

Since mQuery gets closed when a SQLiteCursor is closed [7] which releases the final reference in the SQLiteQuery extends SQLiteProgram extends SQLiteClosable [8].

Note that the stack trace from comment 0 indicates this bug only occurs on the mListAdapter (note: I have not looked at the other traces to see if it affects the Grid but I don't see why it shouldn't).

An alternative solution is to simplify the code by replacing the TopSitesPanel with the speed dial, as we previously discussed.

[1]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteClosable.java#52
[2]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteQuery.java#57
[3]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteCursor.java#138
[4]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteCursor.java#131
[5]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteCursor.java#120
[6]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/AbstractCursor.java#195

[7]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteCursor.java#205
[8]: http://androidxref.com/4.4.4_r1/xref/frameworks/base/core/java/android/database/sqlite/SQLiteClosable.java#105
Comment on attachment 8447523 [details] [diff] [review]
Proposed patch. v1

r+, noting that it does not fix the bug.
Attachment #8447523 - Flags: review+
Attached patch Rebased patchSplinter Review
Updated the description; author is still Richard.
https://hg.mozilla.org/integration/fx-team/rev/03487443c9e5

Aaron, this bug is not fixed but I expect the crash signature will change to something like:

  IllegalStateException("attempt to re-open an already-closed object: " + this);

Do you mind letting me know if you see this signature, or anything similar (regarding closed Cursors or TopSitesCursorWrapper)?

Also, how would you recommend I watch for these signatures myself?
Flags: needinfo?(aaron.train)
Keywords: leave-open
(In reply to Michael Comella (:mcomella) from comment #18)
> https://hg.mozilla.org/integration/fx-team/rev/03487443c9e5
> 
> Aaron, this bug is not fixed but I expect the crash signature will change to
> something like:
> 
>   IllegalStateException("attempt to re-open an already-closed object: " +
> this);
> 
> Do you mind letting me know if you see this signature, or anything similar
> (regarding closed Cursors or TopSitesCursorWrapper)?
> 
> Also, how would you recommend I watch for these signatures myself?

Sure, I can watch this.

I visit https://crash-stats.mozilla.com, Firefox for Android, 38.0a1, Top Crashers, [Type All, Days: 1] and then double click on first appearance sort.
Flags: needinfo?(aaron.train)
Unassigning because I'm not actively working on this.

I can't find the signature mentioned in comment 18 and the steps in comment 19 didn't help me to find it. I also searched for the signature directly:

https://crash-stats.mozilla.com/search/?signature=~attempt+to+re-open&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Assignee: michael.l.comella → nobody
Crash Signature: [@ java.lang.NullPointerException: at org.mozilla.gecko.db.TopSitesCursorWrapper.moveToPosition(TopSitesCursorWrapper.java)] → [@ java.lang.NullPointerException: at org.mozilla.gecko.db.TopSitesCursorWrapper.moveToPosition(TopSitesCursorWrapper.java)] [@ java.lang.NullPointerException: at org.mozilla.gecko.db.TopSitesCursorWrapper.moveToPosition]
Attached file Recent stack trace
I opened the doorhanger and maybe did some scrolling on Google Maps when it crashed.

NI self to investigate.
Flags: needinfo?(michael.l.comella)
I think the work in bug 760956 may fix this.
Flags: needinfo?(michael.l.comella)
No recent crashes in crash stats.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.