Closed Bug 847021 Opened 7 years ago Closed 6 years ago

android.database.sqlite.SQLiteException: Can't upgrade read-only database from version <n> to 16 at android.database.sqlite.SQLiteOpenHelper.getReadableDatabase(SQLiteOpenHelper.java) at org.mozilla.gecko.db.LocalBrowserDB.filterAllSites

Categories

(Firefox for Android :: Data Providers, defect, critical)

21 Branch
ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Firefox 29
Tracking Status
firefox21 - wontfix
firefox22 --- wontfix
firefox23 --- wontfix
firefox26 --- wontfix
firefox27 --- fixed
firefox28 --- fixed
firefox29 --- fixed
fennec + ---

People

(Reporter: scoobidiver, Assigned: rnewman)

Details

(Keywords: crash, steps-wanted, verifyme, Whiteboard: [native-crash][startupcrash])

Crash Data

Attachments

(3 files)

It happened from time to time with database versions previous to 14 but it seems to have spiked since Firefox 21 with database version 14 especially coming from database version 2 or 3.
Are XUL Fennec users trying to install Native Fennec?

android.database.sqlite.SQLiteException: Can't upgrade read-only database from version 1 to 14: /data/data/org.mozilla.fennec_aurora/files/mozilla/m107wyfl.default/browser.db
	at android.database.sqlite.SQLiteOpenHelper.getReadableDatabase(SQLiteOpenHelper.java:199)
	at org.mozilla.fennec_aurora.db.BrowserProvider.getReadableDatabase(BrowserProvider.java:1838)
	at org.mozilla.fennec_aurora.db.BrowserProvider.query(BrowserProvider.java:2440)
	at android.content.ContentProvider$Transport.query(ContentProvider.java:212)
	at android.content.ContentResolver.query(ContentResolver.java:262)
	at org.mozilla.gecko.db.LocalBrowserDB.filterAllSites(LocalBrowserDB.java:176)
	at org.mozilla.gecko.db.LocalBrowserDB.getTopSites(LocalBrowserDB.java:234)
	at org.mozilla.gecko.db.BrowserDB.getTopSites(BrowserDB.java:137)
	at org.mozilla.gecko.AboutHomeContent.loadTopSites(AboutHomeContent.java:299)
	at org.mozilla.gecko.AboutHomeContent.access$400(AboutHomeContent.java:69)
	at org.mozilla.gecko.AboutHomeContent$10.run(AboutHomeContent.java:437)
	at android.os.Handler.handleCallback(Handler.java:587)
	at android.os.Handler.dispatchMessage(Handler.java:92)
	at android.os.Looper.loop(Looper.java:130)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:31)

More reports at:
https://crash-stats.mozilla.com/query/query?product=FennecAndroid&query_search=signature&query_type=contains&query=android.database.sqlite.SQLiteOpenHelper.getReadableDatabase&do_query=1
It's #5 top crasher in 21.0a2.
tracking-fennec: --- → ?
Keywords: topcrash
Need steps here. Is this, i.e, filterAllSites used in cursor access when visiting the top-sites tab or on start-up with about:home or is this in profile-migration?
Keywords: steps-wanted
(In reply to Aaron Train [:aaronmt] from comment #2)
> Need steps here. Is this, i.e, filterAllSites used in cursor access when
> visiting the top-sites tab or on start-up with about:home or is this in
> profile-migration?
I assume the third case because of the upgrade assertion and a max startup time to 16 seconds.
(In reply to Scoobidiver from comment #0)
> Are XUL Fennec users trying to install Native Fennec?
> 
> android.database.sqlite.SQLiteException: Can't upgrade read-only database
> from version 1 to 14:
> /data/data/org.mozilla.fennec_aurora/files/mozilla/m107wyfl.default/browser.
> db

Matt - any ideas on what may be going on here? Want to get the opinion of mobile engineering (what versions/paths to test, etc.) before asking QA to take a stab at this.
Flags: needinfo?(mbrubeck)
I don't know this code; looking for a more suitable victim...
Flags: needinfo?(mbrubeck) → needinfo?(lucasr.at.mozilla)
Is there a recent forced upgrade for Aurora and Nightly channels in Firefox for Android?
It might be that some people haven't opened their aurora firefox in a long time and just recently updated using the update button?

I'm not sure (one way or the other) if we would see these issues on beta or release due to it going through the Google Play...
Is it the new version of bug 791441 which was restricted to Aurora and Nightly channels?
(In reply to Aaron Train [:aaronmt] from comment #2)
> Need steps here. Is this, i.e, filterAllSites used in cursor access when
> visiting the top-sites tab or on start-up with about:home or is this in
> profile-migration?

Since "AboutHomeContent" appears in the logs, I'd bet it's not a XUL migration issue. Looks like <n> is 3 or less, so these are some old Aurora versions being updated to current release.

The next question becomes "Why are these DBs read-only?" I honestly don't remember any "read-only" issues from Fx14 or Fx15.

I do remember that we were using the Android system DB back then. Maybe that has something to do with it. Maybe we need to drop all history/bookmark data for any DB version before we were local.
Looking at the code for "getReadableDatabase", I think this exception is a red herring (not the real problem). See the code:
http://androidxref.com/4.0.4/xref/frameworks/base/core/java/android/database/sqlite/SQLiteOpenHelper.java#216

The flow for "getReadableDatabase" seems to be:
* Return the existing mDatabase if ready
* Attempt to get a writeable database
  * getWriteableDatabase will attempt to update an old DB to a new version, but will silently fail
* Check to see if the DB version and the expected version are the same (getWriteableDatabase succeeded)
* If the versions are not the same, fire the exception that causes this crash.

So the real issue is: getWriteableDatabase failed to update the DB to the new version. We really don't know why the failure happened unless we find some Android 16 logs with additional logcat info.
One theory is that some code is trying to use the BrowserProvider while it is trying to update. I am trying to figure out if the ContentProvider system has protection for this situation. I don't see any of the system Android content providers trying to explicitly rpotect updates, except the ContactsProvider - which seems to do it's updating outside the SqliteOpenHelper framework.
I am getting tired of bandaids for Sqlite concurrency issues... but here's another one!

This patch tries to keep the content provider closed for business while an update might be happening. It uses the same style/pattern as the ContactsProvider in Android uses.

I really want to know if we need to worry about concurrency during update since we use the standard SqliteOpenHelper - ContactsProvider does not. No other content provider in the Android source code seems to need this pattern.
Attachment #724036 - Flags: feedback?(rnewman)
Assignee: nobody → mark.finkle
tracking-fennec: ? → 21+
Attachment #724036 - Flags: feedback?(lucasr.at.mozilla)
Flags: needinfo?(lucasr.at.mozilla)
Comment on attachment 724036 [details] [diff] [review]
WIP (wait for access)

Review of attachment 724036 [details] [diff] [review]:
-----------------------------------------------------------------

This looks sane to me. But I'm sure rnewman will have more insight here.
Attachment #724036 - Flags: feedback?(lucasr.at.mozilla) → feedback+
Comment on attachment 724036 [details] [diff] [review]
WIP (wait for access)

Review of attachment 724036 [details] [diff] [review]:
-----------------------------------------------------------------

Ouch, so sorry this wasn't done sooner -- somehow slipped past me, and only Bugzilla's outstanding requests reminder saved the day.

I'm not sure I agree with your analysis in Comment 11 and onward. getReadableDatabase is synchronized, and uses mInitialized as a flag. If the database is already open for use, it'll be returned without crossing the upgrade barrier (even if somehow it needs an upgrade). If it's already being initialized, then the synchronized call will wait. And if you're accessing the same DB through multiple helpers (which would circumvent that synchronization), you wouldn't get this exception.

My analysis is:

* A single call is made to getReadableDatabase (e.g., by Awesomebar).
* The DB is not already initialized.
* getWriteableDatabase throws a SQLiteException (bad query?) during update.
  -- any failure in onCreate, onUpgrade, onDowngrade, or onOpen (which we don't use) will bubble up.
* SQLiteOpenHelper attempts to open the DB read-only -- "Couldn't open " + mName + " for writing (will try read-only):" -- detects that the upgrade failed (version mismatch), and throws this exception.

getWriteableDatabase -- through synchronization -- implements exactly the mutual exclusion that you're trying to apply here, so this patch is unnecessary for this bug. By the time gWD returns, it will give you a DB handle that's correctly upgraded, or it'll throw.

By throwing during an upgrade we will cause the exception reported in this bug.

My suggestion, then: don't throw a SQLiteException in an onCreate/onUpgrade/onDowngrade/onOpen implementation. Better to catch it and throw something else that won't trigger the fallback into read-only/version check/throw.

Now, there is one area that might be concurrency/transaction related.

I could imagine a situation in which a database was marked as open and/or in a transaction, and then our upgrade handlers come to run. That would cause an exception even if our code was 'safe' -- database in use. Perhaps it would be worthwhile checking for db.inTransaction() at the start of onUpgrade, and either throwing a unique exception or terminating the orphaned transaction prior to beginning the upgrade?


Code comments inline for posterity.

::: mobile/android/base/db/BrowserProvider.java.in
@@ +1795,5 @@
> +            try {
> +                latch.await();
> +                return;
> +            } catch (InterruptedException e) {
> +                Thread.currentThread().interrupt();

I'm not sure it makes sense to do this in a loop. If you're interrupted, it's for a good reason; you should stop doing work, and that means aborting the content provider request that you're handling.

And to wait for a fixed period of time (which you should, to avoid zombiedom), use the long form of await(), not a loop.

I'd suggest:

  try {
      if (latch.await(60, TimeUnit.SECONDS)) {
          return;
      }
  } catch (InterruptedException e) {
      Log.w(LOGTAG, "Interrupted.", e);
      Thread.currentThread().interrupt();
  }
  throw new IllegalStateException("Cannot acquire access.");

The callers should catch that exception if they wish, or just allow it to propagate and have "consistent failure" rather than a DB problem.

Even if you *do* decide to do this in a loop, it should be:

  while (!Thread.interrupted()) {
      ...
Attachment #724036 - Flags: feedback?(rnewman) → feedback-
Adding needinfo on Mark to help with next steps here. Mark, is there a new patch upcoming to help with this top-crasher based on the feedback ?
Flags: needinfo?(mark.finkle)
(In reply to bhavana bajaj [:bajaj] from comment #16)
> Adding needinfo on Mark to help with next steps here. Mark, is there a new
> patch upcoming to help with this top-crasher based on the feedback ?

No patch is upcoming. We still do not have a good handle on any cause. The patch attached in this bug is blindly guessing. I hate it.

We have been working in other bugs, finding places where the DB is locking or hanging. I am hoping that by fixing some of those real bugs, we might see some affect on this (and other DB locking) ghost bugs.
Flags: needinfo?(mark.finkle)
It still happens in the trunk and Beta at a low volume but has never happened in Aurora.
tracking-fennec: 21+ → ?
Keywords: topcrash
note that next time we upgrade our DBs we should track it for that release
tracking-fennec: ? → +
Bug 858994 and Bug 860759 involve a DB upgrade. Those bugs should get uplifted shortly.
Summary: android.database.sqlite.SQLiteException: Can't upgrade read-only database from version <n> to 14 at android.database.sqlite.SQLiteOpenHelper.getReadableDatabase(SQLiteOpenHelper.java) at org.mozilla.gecko.db.LocalBrowserDB.filterAllSites → android.database.sqlite.SQLiteException: Can't upgrade read-only database from version <n> to 16 at android.database.sqlite.SQLiteOpenHelper.getReadableDatabase(SQLiteOpenHelper.java) at org.mozilla.gecko.db.LocalBrowserDB.filterAllSites
We're seeing this mysterious error message in our app DigiCal as well, but only for <= 4.0 devices. Which is a shame, as we collect LogCat messages for those.


We're going to deploy a blind fix that basically calls getWritableDatabase() once per process, which 

- forces getDatabaseLocked(true) to either return an upgraded, writable database which is closed instantly again.
- or throw the SQLiteException (4.3 source: line 230) instead of silently ignoring that and trying to return a read only database. I'm sure this will happen and show up on crash reports in the next version so we can get the underlying cause fixed.


You might consider doing the same, it probably won't fix the crashes, but at least the new stack traces won't be totally useless. The cost of calling getWritableDatabase().close() once per process is worth it in my opinion.
mfinkle I can hit this on my t-mobile g2 running on Android 2.2. Is there anything useful to do here?
Flags: needinfo?(mark.finkle)
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> mfinkle I can hit this on my t-mobile g2 running on Android 2.2. Is there
> anything useful to do here?

Hooray!

Yes, logs please.

adb setprop log.tag.GeckoLocalBrowserDB VERBOSE
adb setprop log.tag.GeckoBrowserProvider VERBOSE
adb setprop log.tag.GeckoDBUtils VERBOSE
adb setprop log.tag.SQLiteOpenHelper VERBOSE
adb setprop log.tag.Database VERBOSE

Run those, restart the process (force-quit the app) and repro.

Massive extra points if you can repro the actual failure (not just this symptom): all the way from having the DB open successfully, through an upgrade (we presume), to having the exception in the log.

Please grab the whole log:

  adb logcat -v time > everything.log

-- no filtering. The root cause might be outside of our classes, so we don't know what the tag'll be.
Flags: needinfo?(mark.finkle)
Flags: needinfo?(rnewman)
Attachment #8359466 - Attachment mime type: text/x-log → text/plain
Comment on attachment 8359466 [details]
logcat of the crash

I love you, Kevin.
Flags: needinfo?(rnewman)
Comment on attachment 8359512 [details] [diff] [review]
Correctly drop bookmarks_type_deleted_index during migration. v1

This'll be a non-trivial one to test. I suggest that we make sure it doesn't negatively affect local builds, then Nightly, then uplift it to Beta, where Kevin can verify.

It's possible there will be another index or view that causes trouble at that point, but we can iterate now that we know which log tag needs to be bumped to see what's wrong!
Attachment #8359512 - Attachment description: Correctly drop bookmarks_type_deleted_index during migration. → Correctly drop bookmarks_type_deleted_index during migration. v1
Attachment #8359512 - Flags: review?(mark.finkle)
Attachment #8359512 - Flags: review?(mark.finkle) → review+
Nothing looked dodgy locally, so:

https://hg.mozilla.org/integration/fx-team/rev/f6b1edbb2afc
Assignee: mark.finkle → rnewman
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/f6b1edbb2afc
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
Comment on attachment 8359512 [details] [diff] [review]
Correctly drop bookmarks_type_deleted_index during migration. v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
  Aaaaaancient code.

User impact if declined: 
  Total failure to do database-related operations after upgrade.

Testing completed (on m-c, etc.):
  Baked on Nightly. Can't easily verify without first getting this into Beta.
 
Risk to taking this patch (and alternatives if risky): 
  Change is 100% safe.

String or IDL/UUID changes made by this patch:
  None.
Attachment #8359512 - Flags: approval-mozilla-beta?
Attachment #8359512 - Flags: approval-mozilla-aurora?
Attachment #8359512 - Flags: approval-mozilla-beta?
Attachment #8359512 - Flags: approval-mozilla-beta+
Attachment #8359512 - Flags: approval-mozilla-aurora?
Attachment #8359512 - Flags: approval-mozilla-aurora+
This looks good. Crash is not gone from crash stats however the rate is extremely low. Less than 5 crashes a week on release. If someone has a reproducible case for this file a new bug.
Status: RESOLVED → VERIFIED
Flags: needinfo?(kbrosnan)
^5
You need to log in before you can comment on or make changes to this bug.