Closed Bug 1280409 Opened 8 years ago Closed 8 years ago

Crash in java.lang.IllegalStateException: Couldn''t read row 0, col 0 from CursorWindow. Make sure the Cursor is initialized correctly before accessing data from it. at android.database.CursorWindow.nativeGetString(Native Method)

Categories

(Firefox for Android Graveyard :: General, defect)

defect
Not set
critical

Tracking

(firefox47 unaffected, firefox48+ fixed, firefox49 fixed, fennec48+, firefox50 fixed)

RESOLVED FIXED
Firefox 50
Tracking Status
firefox47 --- unaffected
firefox48 + fixed
firefox49 --- fixed
fennec 48+ ---
firefox50 --- fixed

People

(Reporter: kbrosnan, Assigned: Grisha)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-8c7a124f-fe81-4ac5-a447-0bded2160609.
=============================================================

java.lang.IllegalStateException: Couldn't read row 0, col 0 from CursorWindow.  Make sure the Cursor is initialized correctly before accessing data from it.
	at android.database.CursorWindow.nativeGetString(Native Method)
	at android.database.CursorWindow.getString(CursorWindow.java:438)
	at android.database.AbstractWindowedCursor.getString(AbstractWindowedCursor.java:51)
	at org.mozilla.gecko.db.BrowserDatabaseHelper.upgradeDatabaseFrom31to32(BrowserDatabaseHelper.java:1539)
	at org.mozilla.gecko.db.BrowserDatabaseHelper.onUpgrade(BrowserDatabaseHelper.java:1736)
	at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:256)
	at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
	at org.mozilla.gecko.db.AbstractPerProfileDatabaseProvider.getWritableDatabase(AbstractPerProfileDatabaseProvider.java:61)
	at org.mozilla.gecko.db.AbstractTransactionalProvider.insert(AbstractTransactionalProvider.java:207)
	at android.content.ContentProvider$Transport.insert(ContentProvider.java:263)
	at android.content.ContentResolver.insert(ContentResolver.java:1236)
	at org.mozilla.gecko.db.LocalUrlAnnotations.insertAnnotation(LocalUrlAnnotations.java:173)
	at org.mozilla.gecko.db.LocalUrlAnnotations.insertReaderViewUrl(LocalUrlAnnotations.java:224)
	at org.mozilla.gecko.reader.SavedReaderViewHelper$1.run(SavedReaderViewHelper.java:148)
	at android.os.Handler.handleCallback(Handler.java:739)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:148)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43)

New(ish) crash in 48. Still see some very old reports < FxAndroid 38 see bug 863980 for the old bug.

Mfgr      Model    Andr API           CPU ABI    #  
samsung  SM-G900F  19 (REL)         armeabi-v7a  10  16.4%
samsung  GT-P5110  22 (UNOFFICIAL)  armeabi-v7a  5  8.2%
samsung  GT-I9195  19 (REL)         armeabi-v7a  2  3.3%
sony     SOL26     21 (REL)         armeabi-v7a  7  11.5%
sony     E6653     23 (REL)         armeabi-v7a  3  4.9%
Grisha, can you take a look at this?
Assignee: nobody → gkruglov
tracking-fennec: ? → 48+
Flags: needinfo?(gkruglov)
Status: NEW → ASSIGNED
Flags: needinfo?(gkruglov)
This exception is thrown during the 31->32 migration, which is the local visits migration. During this migration, we're doing two things which involve reading data from cursors:

1) synthesizing visit records based on the data in history table (cursor is used to read data from the History table)
2) migrating data from history extensions database into the visits table. (cursor is used to read data from the History Extensions DB)

Cursor operations in both cases are careful to check for existence of records, etc.

History extensions db has one table of data which we're migrating, and this is roughly its schema:
history_guid | visits_json_blob

visits_json_blob is a json array of visits recorded for the history record GUID, each visit described by a visit type and a timestamp.

My theory is that the visits JSON blob is the culprit. Android's CursorWindow implementation uses a cursorWindowSize of 2mb (see https://github.com/android/platform_frameworks_base/blob/master/core/res/res/values/config.xml#L1436), so if our column data exceeds that limit we might start running into problems. I've certainly seen visit JSON blobs contain more than the expected 20 visit max, so it's not inconceivable to imagine that it might contain more than 2mb worth of visit data, somehow, for some strange Sync edge-case-of-a-reason. In fact, Fennec's own history sync code used to upload all of the (at the time faked) visit data, without any limits on it - e.g. it would happily synthesize and upload hundreds of visit records - as opposed to desktop's limit of 20 (current fennec sync has a similar limit now).

It's also possible that we're hitting a combination of "a little too much json data" and "low on memory" and fail to allocate our cursor as well.

One problem with the theory of hitting cursorWindowSize during this migration is that since HistoryExtensionsDB used to be read during Sync operations I would expect to see this failure more often, and coming from the Sync code. In this migration, we're supposed to be only copying data over from the HistoryExtensionsDB if Sync is set up. it's still possible that Sync is setup, yet syncing of History is disabled, and we have a big blob of JSON lying around untouched until we try to port it over during this migration... It seems like a pretty narrow case, but number of failures is also very small.

Either way, I suggest wrapping migration code which deals with copying data over from the history extensions DB in try/catch for IllegalStateException, and just move on silently if we hit any exceptions. If we fail to port over visit data from the legacy file, it's not the end of the world - and, in this case, I don't think there's anything else we can do anyway.
Browsing code for the affected build number (well, very close) confirms that problem is occurring while copying data from the HistoryExtensionsDB.

Crash logs point to this line: https://hg.mozilla.org/releases/mozilla-beta/file/c6290de3ba99/mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java#l1539
Comment on attachment 8764570 [details]
Bug 1280409 - Guard against cursor failures while migrating data from HistoryExtensionsDB

https://reviewboard.mozilla.org/r/60370/#review57606

I'm not sure about just catching the exception. The message seems to mean that we access a Cursor that is not initialized and that's something we could check ahead. But I can't really find where exactly this happens. The last stacktrace is from 48.0. Do you have an idea?

What is the history extension DB btw.? I can't find any references of it except here in the code.

I'm clearing the review for now, but feel free to reflag me!
Attachment #8764570 - Flags: review?(s.kaspari)
(In reply to Sebastian Kaspari (:sebastian) from comment #5)
> Comment on attachment 8764570 [details]
> Bug 1280409 - Guard against cursor failures while migrating data from
> HistoryExtensionsDB
> 
> https://reviewboard.mozilla.org/r/60370/#review57606
> 
> I'm not sure about just catching the exception. The message seems to mean
> that we access a Cursor that is not initialized and that's something we
> could check ahead. But I can't really find where exactly this happens. The
> last stacktrace is from 48.0. Do you have an idea?


We're failing in a .getString, so I'm guessing that it happens at the only call to cursor.getString in the copyHistoryExtensionDataToVisitsTable: 
https://hg.mozilla.org/releases/mozilla-beta/file/c6290de3ba99/mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java#l560

As I've described in Comment 2, it's possible that these JSON blobs of data we receive would be rather large. Doing some napkin math, we'll need around 30k visits in one JSON array to hit 2mb CursorWindow limit... This seems rather suspect, it's a very large number of visits. But perhaps theoretically possible, since there was no cap for number of visits fennec sync will upload, before the visits table migration changes..?

If our cursorwindowsize is smaller than 2mb (although I believe that's the size of it, see Comment 2), this starts to look more likely.

> What is the history extension DB btw.? I can't find any references of it
> except here in the code.

HistoryExtensionDB was a database file which we used for storing history visits data, before we had the visits table in the main database. We needed to store the visits data for syncing purposes. While syncing history, we would receive JSON arrays describing individual visits for each history GUID (see Comment 2 for schema of that data). We would then store these blobs in the history extensions db (replacing anything we might already have in there), and when syncing next time, we'd combine these JSON blobs with "fake visits" which we'd generate locally based on local visit counts (History.VISITS), and send them up to Sync. This cycle would repeat during every sync op.

When visits table was introduced, any mention of the history extensions db was wiped away (other than these migrations, of course).

> I'm clearing the review for now, but feel free to reflag me!

IF large json arrays are the problem, we could use something like sqlite's substr (http://www.sqlite.org/lang_corefunc.html#substr) to carefully read JSON data in, avoiding cursor window size limits. This will add a bit of complexity into that migration, and I'm not sure it matters enough in this case. IF we are hitting this case, the only source of these visits AFAIK would have been the faulty older sync code on the android side - and the visits it would generate aren't "real" - timestamps are faked, etc. - so I wouldn't feel bad just dropping that data on the floor.

We might also be failing to initialize cursor for some other reason. (...insert plausible reason here...) and I'm not sure what it might be. Thoughts?

Generally speaking, due to the nature of how we were using historyextensiondb, I would treat this migration as "best effort" - e.g. it we fail to read data from it and port it over successfully, "synthesize" visits data from what's available in the history table, and move on with our lives. Visits table will be "bootstrapped" with semi-real data, and over time we'll populate it with real data as user browses around the web. Downside is that we *might* lose real visit data if we aren't as robust as possible here... E.g. even if we have large json arrays of visits for some GUIDs, it's very unlikely we'll have that problem for all of them.
Comment on attachment 8764570 [details]
Bug 1280409 - Guard against cursor failures while migrating data from HistoryExtensionsDB

https://reviewboard.mozilla.org/r/60370/#review58082

Okay, assuming it's the big JSON blob. Looking at the code I can't see any reason why the cursor should not be initialized at this point... so catching the exception seems to be our only sane way out here.
Attachment #8764570 - Flags: review+
https://hg.mozilla.org/integration/fx-team/rev/94763b7450fd1dae8cae6135c1ffbe4cebe07a1e
Bug 1280409 - Guard against cursor failures while migrating data from HistoryExtensionsDB r=sebastian
https://hg.mozilla.org/mozilla-central/rev/94763b7450fd
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 50
Track this as the crash happened in 48.
Hi Grisha, do you want to uplift this for 48/49 if these patches are not too risky?
Requesting uplift for the patch into 48 (where crashing migration landed) and 49.

[Feature/regressing bug #]: Bug 1046709

[User impact if declined]: in certain (likely profile-dependent, so fairly rate, but...) cases Firefox crashes on launch during a database migration.

[Describe test coverage new/current, TreeHerder]: These explicit failing cases aren't well covered by tests and coverage can't be easily added at this point. Migrations in general aren't well covered with automated tests, unfortunately.

[Risks and why]: low; very simple code change, wrapping a failing function call with exception handling.

[String/UUID change made/needed]: none
Attachment #8768669 - Flags: approval-mozilla-beta?
Attachment #8768669 - Flags: approval-mozilla-aurora?
Comment on attachment 8768669 [details] [diff] [review]
guard-against-cursor-failures-during-visits-migration.patch

Review of attachment 8768669 [details] [diff] [review]:
-----------------------------------------------------------------

This is a possible fix for the crash. Take it in 48 beta 7 and aurora.
Attachment #8768669 - Flags: approval-mozilla-beta?
Attachment #8768669 - Flags: approval-mozilla-beta+
Attachment #8768669 - Flags: approval-mozilla-aurora?
Attachment #8768669 - Flags: approval-mozilla-aurora+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: