Closed Bug 1254468 Opened 8 years ago Closed 8 years ago

(TopSitesPanel) android.database.StaleDataException: Attempting to access a closed CursorWindow

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(firefox46+ fixed, firefox47+ fixed, firefox48+ fixed, fennec47+)

RESOLVED FIXED
Firefox 48
Tracking Status
firefox46 + fixed
firefox47 + fixed
firefox48 + fixed
fennec 47+ ---

People

(Reporter: sebastian, Assigned: ahunt)

References

Details

(Keywords: crash)

Crash Data

Attachments

(5 files)

I was surfing around and switching between websites and about:home to debug bug 	1254465 when this crash happened:

> GeckoCrashHandler  E  >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 1 ("main")
> E  android.database.StaleDataException: Attempting to access a closed CursorWindow.
>    Most probable cause: cursor is deactivated prior to calling this method.
> E      at android.database.AbstractWindowedCursor.checkPosition(AbstractWindowedCursor.java:139)
> E      at android.database.AbstractWindowedCursor.getString(AbstractWindowedCursor.java:50)
> E      at android.database.CursorWrapper.getString(CursorWrapper.java:137)
> E      at org.mozilla.gecko.home.TopSitesPanel$TopSitesGridAdapter.bindView(TopSitesPanel.java:589)
> E      at android.support.v4.widget.CursorAdapter.getView(Unknown Source)
> E      at android.widget.AbsListView.obtainView(AbsListView.java:2346)
> E      at android.widget.GridView.onMeasure(GridView.java:1065)
> E      at org.mozilla.gecko.home.TopSitesGridView.onMeasure(TopSitesGridView.java:114)
> E      at android.view.View.measure(View.java:18794)
> E      at android.widget.ListView.setupChild(ListView.java:1958)
> E      at android.widget.ListView.makeAndAddView(ListView.java:1879)
> E      at android.widget.ListView.fillSpecific(ListView.java:1355)
> E      at android.widget.ListView.layoutChildren(ListView.java:1696)
> E      at android.widget.AbsListView.onLayout(AbsListView.java:2148)
> [..]
I was supposed to file a followup bug to make sure our adapter/cursorloader lifecycle was correct (since that was independent of our single cursor query change)... I'll use this bug for that instead!
Assignee: nobody → ahunt
Blocks: 760956
tracking-fennec: ? → 47+
Notes for myself:

"The Loader will release the data once it knows the application is no longer using it. For example, if the data is a Cursor from a CursorLoader, you should not call close() on it yourself. If the Cursor is being placed in a CursorAdapter, you should use the swapCursor(android.database.Cursor) method so that the old Cursor is not closed. "
[ http://developer.android.com/reference/android/app/LoaderManager.LoaderCallbacks.html#onLoadFinished%28android.content.Loader%3CD%3E,%20D%29 ]

As far as I can tell we're already doing this in CursorLoaderCallbacks (see TopSitesPanel), however we're using our own "TransitionAwareCursorLoaderCallbacks" to handle onLoadFinished, so maybe the bug lives somewhere in there?
[Tracking Requested - why for this release]:
This crash has more than 40 000 occurrences over the last 7 days
https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=java.lang.IllegalStateException%3A+at+android.database.sqlite.SQLiteClosable.acquireReference%28SQLiteClosable.java%29&page=12#tab-sigsummary

Bug 760956 provides a fix but too risky for uplift.
We should probably consider an uplift to 46 of this bug when ready.
So I think the issue is:

1) We get the cursor, and run SimpleCursorLoader.deliverResult()
2) deliverResult() calls CursorLoaderCallbacks.onLoadComplete() (this goes via Android Frameworks Loader.deliverResult() which calls onLoadComplete())
3) This is implemented in TransitionAwareCursorLoaderCallbacks, which schedules the onLoadFinishedAfterTransitions to be run after transitions have completed (TransitionsTracker.runAfterTransitions)
4) We get a race since SimpleCursorLoader now attempts to close the cursor, and we try to replace this cursor in the adapter using swapCursor in our SimpleCursorLoader.onLoadFinishedTranstions(). (We need to replace the old cursor since it's now being closed.)

Usually the TransitionsTracker runs our cursor swap soon enough, but sometimes we might close the cursor first, the adapter then might read the cursor before we managed to swap it, and we crash. I'm thinking that TransitionsAwareCursorLoaderCallbacks could be fundamentally flawed? (It looks this was first added in Bug 1100904.)

The crash seems to depend on three conditions:
1) The grid or topsites adapter is still processing items (i.e. iterating over the cursor).
2) A tracked transition is happening (I'm still figuring out what these are, but some are the doorhanger appearing/disappearing, and the url bar switching into editing mode)
3) The cursor is reloaded (e.g. because of pinning, or because another tab finished loading which modifies history which forces a reload).

I'm able to reproduce the crash by adding various sleep()s and hardcoding a BrowserDB.updateVisitedHistory call to onLoadFinished.
Here's the patch with added/modified timeouts that causes fennec to crash.

I'm still trying to figure out if there's an elegant way of fixing this. Maybe we just have to scrap TransitionAwareCursorLoaderCallbacks, alternatively we could build a special CursorLoader that doesn't close cursors, and close the cursor in TransitionAwareCursorLoaderCallbacks?
TransitionAwareCursorLoaderCallbacks is fundamentally flawed: old CursorLoader
cursors _must_ not be used after onLoadFinished has been called. However
we sometimes queue the cursor swapping (which is implemented by subclasses
in onLoadFinishedAfterTransitions) until after transitions have finished.
CursorLoader.deliverResult() closes the old cursor immediately after calling
onLoadFinished (with the new cursor). At this stage the adapter is
still holding onto the old (but now closed cursor), and will crash if it tries
to read this cursor (which can happen if the adapter is still iterating over the
cursor).

Instead we should ensure that we swap the cursors during onLoadFinished - the simplest
way to do this is by eliminating TransitionAwareCursorLoader and using onLoadFinished
the way the Android framework expects.

It's worth noting that TransitionAwareCursorLoader is obsolete: at the time it was added,
home panels were placed in the HomePagerTabStrip, which notified TransitionsTracker about
its transitions. However HomePagerTabStrip no longer exists, hence there's no need
for us to care about these transitions anymore. (The crash seems to happen because we
try to hide the doorhanger every time we reveid LOCATION_CHANGE, and each of these starts
a hide transition - even if no doorhanger is shown - hence we often have a transition
in progress every time we show topsites.)

Review commit: https://reviewboard.mozilla.org/r/39863/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/39863/
Attachment #8730435 - Flags: review?(nalexander)
Attachment #8730435 - Flags: review?(nalexander)
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

https://reviewboard.mozilla.org/r/39863/#review36459

There's a typo (s/reveid/receive/) in the commit message.

There's a part 2 needed for this to land: remove `TransitionsTracker` altogether!  See https://dxr.mozilla.org/mozilla-central/search?q=TransitionsTracker&redirect=true&case=false.

After that, I'm not knowledgeable enough to review this, so flag mcomella or sebastian instead.
(In reply to Nick Alexander :nalexander from comment #7)
> Comment on attachment 8730435 [details]
> MozReview Request: Bug 1254468 - Remove broken
> TransitionAwareCursorLoaderCallbacks r?nalexander
> 
> https://reviewboard.mozilla.org/r/39863/#review36459
> 
> There's a typo (s/reveid/receive/) in the commit message.
> 
> There's a part 2 needed for this to land: remove `TransitionsTracker`
> altogether!  See
> https://dxr.mozilla.org/mozilla-central/
> search?q=TransitionsTracker&redirect=true&case=false.
> 

Oops, I forgot to check whether we still needed TransitionsTracker (and assumed it might be used elsewhere). I wonder if we want to still use it, but waiting for transitions elsewhere. E.g. we could either not reload the cursor if we're still animating, or wait for transitions to end before returning a cursor? (Cursor loading happens in a worker thread, but making that wait on the UI transitions might be finicky?)

Is using a RecyclerView for home panels a better solution for the issues TransitionsTracker tried to solve? (I'm not hugely familiar with it yet, but it claims to have better performance.) If so, it might be worth focusing on upgrading at least TopSites to using a RecycleView given its probably the most regularly seen home-panel?
I've just realised the TransitionAwareCursorLoaderCallbacks isn't actually obsolete - we don't care about home-pager transitions, however we do still track the url-bar/awesomescreen transition (tapping on the urlbar brings up topsites/awesomescreen).

So, with my patch, the transition when tapping the urlbar is less smooth: on nightly we initially show an empty topsites panel (but still show the homepager), and then add the topsites later (because we don't pass the cursor to the adapter until the animation has stopped), with this patch we try to render everything including the topsites at once which is a bit slower.

(I suspect that if the DB query is slow enough then we'll initially draw the empty panel again, but on my device the cursor returns fast enough for all data to be available during the initial layouting pass.)
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/39863/diff/1-2/
Attachment #8730435 - Attachment description: MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?nalexander → MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian
Attachment #8730435 - Flags: review?(s.kaspari)
This is no longer needed - TransitionAwareCursorLoaderCallbacks was the only
consumer - it was removed as it caused race conditions. The ideal future solution
is probably to use recyclerviews to avoid jank, rather than trying to wait for
transitions to happen.

It's also extremely difficult to use this correctly - the
TransitionAwareCursorLoaderCallbacks simply held the cursor that would usually
be swapped in onLoadFinished until transitions have finished (which is incorrect,
since cursors need to be swapped in before onLoadFinished returns). It's hard to imagine
any alternative solutions, short of avoiding loading cursors in the first place (which
isn't too useful, since cursor loading happens in the background, at which point the UI
status is irrelevant), or hacking the CursorLoader to not return from its worker thread
until UI transitions are done (which would require a new thread-safe implementation of
TransitionsTracker), or maybe even hacking Android Framework's AsyncTaskLoader  to not run Loader.deliverResult
while transitions are running (which seems awfully brittle and hacky).

Review commit: https://reviewboard.mozilla.org/r/40151/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/40151/
Attachment #8730783 - Flags: review?(s.kaspari)
Based on my profiling (on an x86 emulator to avoid the keyboard interfering with the profiling bars), there is no noticeable difference in frame rendering time between nightly and my patch. It seems that the performance difference would therefore be highly subjective, and it's possible to argue that having everything appear at once is actually nicer - so I'm now less worried about undoing what Bug 871651 tried to solve.
A hacky alternative solution would be to keep the current implementation, but only delay handling the new cursor _if_ this is the first time we've received a cursor (i.e. the adapter has no cursor / adapter.getCursor() == null). In this case there's no old cursor that might be being read / that could get closed.

However personally I'm not a huge fan of the delay in loading topsites, I think that actually looks worse than being slightly slower in showing the panel. (The topsites themselves seem to appear more quickly if we don't do the delay, this just affects how quickly we show the potentially blank panel.)
Marking affected and tracking for 46+.
Okay, I need some more time to fully understand what we are doing here. I just triggered a try-run via reviewboard so that we get some feedback from the tests. I'd also like to add mcomella as second reviewer while I read through the code. :)
Attachment #8730435 - Flags: review?(michael.l.comella)
Attachment #8730783 - Flags: review?(michael.l.comella)
Here's a video of existing behaviour
(Nexus 4, Android 5, running latest Nightly)

Note: initially the topsites panel is blank, we only start painting TopSites after the transition has finished. We do this by not passing the cursor to the adapter until all transitions are complete.

I personally think this behaviour is more annoying, and it feels like this is slower at displaying the actual topsites content.
And here's a video of behaviour without TransitionsAwareCursorLoader
(Nexus 4, Android 5, running latest fx-team + my patches)

Note: everything appears at once, it feels like it takes slightly longer for the panel to appear, however at this stage the panel already contains our topsites tiles.

Note 2: if cursor loading takes a long time the homepanel transition should still happen with a blank topsites panel: cursor loading is performed in the background (in an AsyncTask), and we only pass the cursor to the adapter once it's finished loading. However in general the cursor load is fast enough for the adapter to receive a cursor before we start layouting the homepanels (which means the layouting is a bit slower, but we still see the topsites slightly earlier).
Let's make sure Anthony is aware of this change.

Also, you should try this out on a low-end device, ideally with a lot of bookmarks/history.
Flags: needinfo?(alam)
(In reply to :Margaret Leibovic from comment #18)
> Let's make sure Anthony is aware of this change.
> 
> Also, you should try this out on a low-end device, ideally with a lot of
> bookmarks/history.

I've been testing on a Nexus S (which is really slow), and there's effectively no difference in behaviour (compared to the old delayed load). I think the Cursor loading just takes long enough that we don't receive a cursor until after we've already drawn the homepager. I.e. all we're really doing is affecting behaviour on new devices, where we're able to load the cursor before we start layouting.

Unfortunately that device is too old to actually create a screencast, but it looks like a much much slower version of the "existing behaviour" video.
I tested this patch out on my 6P. 

After talking to :ahunt, we think there is a difference when it comes to the "transition" of the tiles. But its not necessarily bad. Basically, they "crash" in too fast. But we should get this fix in and polish the transition in bug 1258561.

Thanks :ahunt!
Flags: needinfo?(alam)
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

https://reviewboard.mozilla.org/r/39863/#review38107

Thank you for documenting all your findings in the bug. This was immensely helpful to understand what's going on here.

This makes sense and TransitionAwareCursorLoaderCallbacks seems to be flawed and it seems like this has been crashing for quite some time (I can only go back to Summer 2015).
Attachment #8730435 - Flags: review?(s.kaspari) → review+
Comment on attachment 8730783 [details]
MozReview Request: Bug 1254468 - Post: Remove unused TransitionsTracker r?sebastian

https://reviewboard.mozilla.org/r/40151/#review38109
Attachment #8730783 - Flags: review?(s.kaspari) → review+
(In reply to Andrzej Hunt :ahunt from comment #8)
> Is using a RecyclerView for home panels a better solution for the issues
> TransitionsTracker tried to solve? (I'm not hugely familiar with it yet, but
> it claims to have better performance.) If so, it might be worth focusing on
> upgrading at least TopSites to using a RecycleView given its probably the
> most regularly seen home-panel?

We have been looking into improving the top sites panel considerably in Q2. Moving to a RecyclerView-based implementation is something I'd do along with that. This is also the best time to fix animation/transition issues.
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

via irc: Sebastian said he was confident on these fixes.
Attachment #8730435 - Flags: review?(michael.l.comella)
https://hg.mozilla.org/mozilla-central/rev/adb1d935641a
https://hg.mozilla.org/mozilla-central/rev/916b9e55af42
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

Approval Request Comment
[Feature/regressing bug #]: Maybe Bug 1100904?
[User impact if declined]: Frequent crashes when viewing homepanels (e.g. TopSites). This is probably our top crash on release, 2nd-top crash on beta, 3rd-top on aurora.
[Describe test coverage new/current, TreeHerder]: Manual testing.
[Risks and why]: Low risk - we're removing a delay in cursor loading, i.e. avoiding a race condition by sticking to the default Android cursor loading implmentations.
[String/UUID change made/needed]: none.

(The followup commit isn't strictly needed for uplift, but is low risk since it just removes a now unneeded class.)
Attachment #8730435 - Flags: approval-mozilla-aurora?
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

Crash fix, Aurora47+
Attachment #8730435 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Hi Sebastian, if this is also a top-crasher on Beta46, should we uplift it there as well after stabilizing in Aurora for a few days?
Flags: needinfo?(s.kaspari)
(In reply to Ritu Kothari (:ritu) from comment #31)
> Hi Sebastian, if this is also a top-crasher on Beta46, should we uplift it
> there as well after stabilizing in Aurora for a few days?

Yeah, that sounds reasonable. I'll forward the NI to ahunt. He knows more about the surrounding code and if this is easy to uplift.
Flags: needinfo?(s.kaspari) → needinfo?(ahunt)
(In reply to Ritu Kothari (:ritu) from comment #31)
> Hi Sebastian, if this is also a top-crasher on Beta46, should we uplift it
> there as well after stabilizing in Aurora for a few days?

The patches apply cleanly to beta, and it looks like this fixes the crash on  both nightly and aurora (only 2 days of aurora data so far, but there aren't any reports of it happening with builds containing the fix). I can request beta uplift in a day or two, just to give us time to get more data?
Crash Signature: FennecAndroid 47.0a2 Crash Report [@ java.lang.IllegalStateException: at android.database.sqlite.SQLiteClosable.acquireReference(SQLiteClosable.java) ] FennecAndroid 48.0a1 Crash Report [@ java.lang.IllegalStateException: at android.database.sqlite.SQLit…
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

Approval Request Comment
[Feature/regressing bug #]: Maybe Bug 1100904?
[User impact if declined]: Frequent crashes when viewing homepanels (e.g. TopSites). This is probably our top crash on release, 3rd-top crash on beta, 3rd-top on aurora.
[Describe test coverage new/current, TreeHerder]: Manual testing. Patch seems to have not had any effects on nightly or aurora, the crash is no longer happening on nightly or aurora.
[Risks and why]: Low risk - we're removing a delay in cursor loading, i.e. avoiding a race condition by sticking to the default Android cursor loading implmentations.
[String/UUID change made/needed]: none.

(The followup commit isn't strictly needed for uplift, but is low risk since it just removes a now unneeded class.)
Flags: needinfo?(ahunt)
Attachment #8730435 - Flags: approval-mozilla-beta?
Comment on attachment 8730435 [details]
MozReview Request: Bug 1254468 - Remove broken TransitionAwareCursorLoaderCallbacks r?sebastian

Fix for #3 topcrash for Fennec, please uplift for beta 8 build today
Attachment #8730435 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.