Closed Bug 947939 Opened 11 years ago Closed 11 years ago

SQLiteDatabaseLockedException from BrowserDB.expireHistory

Categories

(Firefox for Android Graveyard :: Data Providers, defect)

Product:

Component:

Platform:

All

Android

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

(firefox29 fixed, firefox30 fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 30

Tracking Flags:

Tracking

Status

firefox29

---

fixed

firefox30

---

fixed

People

(Reporter: mfinkle, Assigned: rnewman)

References

Details

Attachments

(3 files, 8 obsolete files)

Part 0: general cleanup in BrowserProvider. 11 years ago Richard Newman [:rnewman] 14.60 KB, patch		Details \| Diff \| Splinter Review
Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. 11 years ago Richard Newman [:rnewman] 5.32 KB, patch	nalexander : review+	Details \| Diff \| Splinter Review
Part 2: transaction handling in deletes and updates. 11 years ago Richard Newman [:rnewman] 24.67 KB, patch	nalexander : feedback+	Details \| Diff \| Splinter Review
Part 3: more efficient bulk updates of bookmarks. 11 years ago Richard Newman [:rnewman] 5.70 KB, patch		Details \| Diff \| Splinter Review
Part 2: transaction handling in deletes and updates.* * * 11 years ago Richard Newman [:rnewman] 27.22 KB, patch		Details \| Diff \| Splinter Review
Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query, ensure it's transaction-sane. 11 years ago Richard Newman [:rnewman] 6.40 KB, patch		Details \| Diff \| Splinter Review
Pre: fix testBrowserProvider#ensureEmptyDatabase. 11 years ago Richard Newman [:rnewman] 4.95 KB, patch		Details \| Diff \| Splinter Review
Part 0: general cleanup in BrowserProvider. 11 years ago Richard Newman [:rnewman] 14.60 KB, patch	nalexander : review+	Details \| Diff \| Splinter Review
Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. * * 11 years ago Richard Newman [:rnewman] 6.40 KB, patch	nalexander : review+	Details \| Diff \| Splinter Review
Part 2: transaction handling in deletes and updates.* * * 11 years ago Richard Newman [:rnewman] 37.28 KB, patch		Details \| Diff \| Splinter Review
Part 2: transaction handling in deletes and updates, more efficient bulk updates. 11 years ago Richard Newman [:rnewman] 37.01 KB, patch	nalexander : review+ Sylvestre : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Description

•

11 years ago

https://crash-stats.mozilla.com/report/index/f67e87d1-b194-4e2c-971f-97e162131208 Technically, we could catch this exception and ignore it. We will attempt to expireHistory again at some point. If we really need to expireHistory, we should just re-try the call once or twice.

Kevin Brosnan [Ex-Mozilla]

Updated

•

11 years ago

Blocks: 752828

OS: Mac OS X → Android

Hardware: x86 → All

Richard Newman [:rnewman]

Assignee

Comment 1

•

11 years ago

See retry comments here: http://developer.android.com/reference/android/database/sqlite/SQLiteDatabaseLockedException.html My intuition is that immediately retrying won't help, and just muffling will cause the same error in the next write call. There has to be a reason for this. Much ponder, so think.

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 2

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #1) > See retry comments here: > > http://developer.android.com/reference/android/database/sqlite/ > SQLiteDatabaseLockedException.html > > My intuition is that immediately retrying won't help, and just muffling will > cause the same error in the next write call. > > There has to be a reason for this. Much ponder, so think. We were thinking about delay posting back onto the thread.

Richard Newman [:rnewman]

Assignee

Comment 3

•

11 years ago

Here are my guesses. This exception is thrown because somewhere down in the bowels, sqlite is returning SQLITE_BUSY. That means one of these things: * Two simultaneous writes. * A write and a read, in some circumstances (WAL should eliminate this). * An attempt to upgrade a read transaction to a write after a parallel change has been committed (<http://sqlite.1065341.n5.nabble.com/Begin-immediate-transaction-gt-SQLITE-BUSY-database-is-locked-td64878.html>). * A write while WAL checkpointing is occurring? BrowserProvider: // From Honeycomb on, it's possible to run several db // commands in parallel using multiple connections. if (Build.VERSION.SDK_INT >= 11) { db.enableWriteAheadLogging(); db.setLockingEnabled(false); Firstly, setLockingEnabled does nothing on 16+, which probably explains why we only see this problem on API15: different threads cannot issue writes at the same time because of the exclusive locking. That Android now forces locking is working to hide our bug. Secondly, disabling the sqlite connection locking means that we must be sure never to issue two simultaneous writes, or otherwise hit one of the events above. My theories as to what 'shapes' in our code are the causes here: * We write to BrowserProvider from multiple threads. Sync has its own threads, for one thing, but we also might hit this from within Fennec's main code. * We retain cursors in order to draw our list UI. These might overlap with writes in some dangerous way. * If we *ever* run a writing command after a read command within a transaction begun on that thread.

Richard Newman [:rnewman]

Assignee

Comment 4

•

11 years ago

Oh hey! Check this out! We upgrade a read to a write inside a transaction in expireHistory! >> * Call this method within a transaction. */ private void expireHistory(final SQLiteDatabase db, final int retain, final long keepAfter) { Log.d(LOGTAG, "Expiring history."); RR final long rows = DatabaseUtils.queryNumEntries(db, TABLE_HISTORY); ... sql = "DELETE FROM " + TABLE_HISTORY + " WHERE " + History._ID + " " + "IN ( SELECT " + History._ID + " FROM " + TABLE_HISTORY + " " + "ORDER BY " + sortOrder + " LIMIT " + toRemove + ")"; } trace("Deleting using query: " + sql); WW db.execSQL(sql);

Richard Newman [:rnewman]

Assignee

Comment 5

•

11 years ago

Possible solutions (all or some): 1. The hammer: re-enable locking. It's probably not buying us much. 2. Find situations in which a lock must be upgraded within a transaction (a write after a read), and handle this exception by finishing the transaction and starting a new one to perform the write. This assumes that the competing write didn't invalid the results of the previous read. 3. ?

Richard Newman [:rnewman]

Assignee

Comment 6

•

11 years ago

Bug 876589 involves a crash in setLocale while executing a query. setLocale involves a write. The stack implies that the connection pool was exhausted, and a new connection was being opened -- and I'd bet that one of those other connections was in the middle of a write. We can mitigate that somewhat by closing cursors more aggressively, so the pool stays free, but otherwise we simply have to assume that any *read* might involve a write, just in order to open a connection!

Richard Newman [:rnewman]

Assignee

Comment 7

•

11 years ago

I think it's fair to summarize: any concurrent query -- read or write -- issued against SQLite on Android, with locking disabled, can raise this error. It can do so for simple reasons (two simultaneous writes), or for complex ones (upgrading locks within a transaction), but it can also do it *simply because you issued a read that required a write under-the-hood*. Could someone check my math on this?

Richard Newman [:rnewman]

Assignee

Comment 8

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #3) > * If we *ever* run a writing command after a read command within a > transaction begun on that thread. Bugzilla ate the rest of this comment, which was a link to http://sqlite.1065341.n5.nabble.com/Begin-immediate-transaction-gt-SQLITE-BUSY-database-is-locked-td64878.html

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 9

•

11 years ago

Just throwing some background into this bug. I landed the code to disable the locking in bug 843005. We did that as a possible way to reduce the DB locking crashes, basically the same kind of bug as this. Obviously this did not help, mainly I think because the SQLITE_BUSY is a native problem, not happening in the Java layer.

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 10

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #4) > Oh hey! Check this out! We upgrade a read to a write inside a transaction in > expireHistory! > > > >> * Call this method within a transaction. > */ > private void expireHistory(final SQLiteDatabase db, final int retain, > final long keepAfter) { > Log.d(LOGTAG, "Expiring history."); > RR final long rows = DatabaseUtils.queryNumEntries(db, TABLE_HISTORY); > > ... > > sql = "DELETE FROM " + TABLE_HISTORY + " WHERE " + History._ID + > " " + > "IN ( SELECT " + History._ID + " FROM " + TABLE_HISTORY + > " " + > "ORDER BY " + sortOrder + " LIMIT " + toRemove + ")"; > } > trace("Deleting using query: " + sql); > WW db.execSQL(sql); I wonder if we can just drop the query for number of entries, or move it into the DELETE sql as a subselect.

Richard Newman [:rnewman]

Assignee

Comment 11

•

11 years ago

I'm going to mark this as assigned to me, hoping that at some point I'll have time to revisit it. Feel free to steal, future dev with intimate knowledge of the Android database system.

Assignee: nobody → rnewman

Comment 12

•

11 years ago

Commit pushed to master at https://github.com/mozilla/MakeAPI https://github.com/mozilla/MakeAPI/commit/956e8b6d8e12159f4a40ecbb7da96158feaddec6 Bug 947939 - Fix mixed up logic in the search query generator

Richard Newman [:rnewman]

Assignee

Comment 13

•

11 years ago

(In reply to [github robot] from comment #12) > Bug 947939 - Fix mixed up logic in the search query generator That should have been Bug 974939, according to cade. Disregard!

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Blocks: 975024

Richard Newman [:rnewman]

Assignee

Comment 14

•

11 years ago

Tactics I plan to employ: * Primarily: before writing on a thread that has seen previous reads, and where consistency isn't required, finish then restart the transaction. * Secondarily: when performing an operation that might try to write (primarily UPDATE and DELETE, but also anything that might open a DB connection!), catch SQLiteDatabaseLockedException and muffle it. * Thirdly: try to reduce interleaving of reads and writes. * Fourthly: try to reduce the likelihood of writes occurring on multiple threads.

Status: NEW → ASSIGNED

Richard Newman [:rnewman]

Assignee

Comment 15

•

11 years ago

The following sequence of patches do the following: Part 0: generally clean up BrowserProvider: style nits, correctness and perf nits (e.g., moving DB operations closer together, removing needless null checks), adding final where it makes sense, adding comments, minor reordering of operations for transaction safety (e.g., moving a delete before a query in order to avoid a write upgrade). Part 1: rewrites cleanupSomeDeletedRecords. Introduces the use of SQL "IN" to avoid iterating over live cursors issuing individual deletes. Not only does that avoid overlapping reads and writes (albeit on the same thread), but it turns six queries into two. Part 2: threads a "should I use transactions?" context through various delete and update operations. This is used to only start transactions immediately before a write, rather than before a read-then-write sequence. It also answers one of the TODOs from Part 0, replacing another iteration with an IN. I could be persuaded to use a different approach here (e.g., a static method "shouldUseTransactions()" instead of threading), but short of abandoning the idea of using transactions altogether, we need to delegate the choice of when to establish a transaction to the method that's actually doing the writing. (Or, I suppose, we could terminate existing transactions immediately before starting a write... but that seems painful too.) Part 3: refactors some "IN" handling from earlier parts, using it to reimplement `updateBookmarks` to not make N+1 queries with interleaved read/writes. About to run Robocop on these.

Richard Newman [:rnewman]

Assignee

Comment 16

•

11 years ago

Attached patch Part 0: general cleanup in BrowserProvider. (obsolete) — Details — Splinter Review

Attachment #8380067 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 17

•

11 years ago

Attached patch Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. (obsolete) — Details — Splinter Review

Attachment #8380068 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 18

•

11 years ago

Attached patch Part 2: transaction handling in deletes and updates. (obsolete) — Details — Splinter Review

Attachment #8380069 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 19

•

11 years ago

Attached patch Part 3: more efficient bulk updates of bookmarks. (obsolete) — Details — Splinter Review

Attachment #8380070 - Flags: review?(nalexander)

Nick Alexander :nalexander [he/him]

Comment 20

•

11 years ago

Comment on attachment 8380067 [details] [diff] [review] Part 0: general cleanup in BrowserProvider. Review of attachment 8380067 [details] [diff] [review]: ----------------------------------------------------------------- rs=me, but this isn't just cleanup -- there's DB re-ordering. ::: mobile/android/base/db/BrowserProvider.java @@ +1069,3 @@ > } > } finally { > + cursor.close(); Hmm, you don't like the null check? I think FindBugs will want it.

Nick Alexander :nalexander [he/him]

Comment 21

•

11 years ago

Comment on attachment 8380068 [details] [diff] [review] Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. Review of attachment 8380068 [details] [diff] [review]: ----------------------------------------------------------------- Looks good modulo doc confusion. ::: mobile/android/base/db/BrowserProvider.java @@ +271,5 @@ > Log.d(LOGTAG, message); > } > } > > + public static String computeSQLInClause(int items, String field) { public? And does this not already exist in Fennec? Should it exist somewhere else? @@ +337,5 @@ > + > + // We don't need to finish the transaction here, because it will > + // necessarily already be a writing transaction. > + final String inClause = computeSQLInClause(ids.length, > + CommonColumns._ID); I find this comment confusing. I see no transaction in this method, so any transaction in play must be the caller's responsibility. As such, any comments about transactions should be removed or in the method's Javadoc, no?

Attachment #8380068 - Flags: review?(nalexander) → review+

Nick Alexander :nalexander [he/him]

Comment 22

•

11 years ago

Comment on attachment 8380069 [details] [diff] [review] Part 2: transaction handling in deletes and updates. Review of attachment 8380069 [details] [diff] [review]: ----------------------------------------------------------------- I don't have the energy to address BP.java right now, but I'd like to discuss why we're having callers manage |shouldUseTransactions| at all. Is this a method that all providers should implement (and test code can override so that we can see both behaviours)? ::: mobile/android/base/db/TransactionalProvider.java @@ +54,5 @@ > */ > abstract protected Uri insertInTransaction(Uri uri, ContentValues values); > > /* > * Deletes items from the database within a DB transaction. nit: trailing ws. @@ +62,5 @@ > + * @param selection an optional filter to match rows to update > + * > + * @param selectionArgs arguments for the selection > + * > + * @param shouldUseTransactions true if the method is expected to start a use transactions? If it only uses one, can we call this shouldStartTransaction? @@ +70,2 @@ > */ > + abstract protected int deleteInTransaction(Uri uri, String selection, String[] selectionArgs, boolean shouldUseTransactions); So, this is pretty weird. Name is inTransaction, arg is useTransactions? @@ +75,5 @@ > * > * @param uri Query URI > * @param values A set of column_name/value pairs to add to the database. > * @param selection An optional filter to match rows to update. > + * @param selectionArgs Arguments for the selection nit: ws, full sentence. @@ +83,3 @@ > * @return number of rows impacted by the update > */ > + abstract protected int updateInTransaction(Uri uri, ContentValues values, String selection, String[] selectionArgs, boolean shouldUseTransactions); Ditto weird name. @@ +137,5 @@ > @Override > public int delete(Uri uri, String selection, String[] selectionArgs) { > trace("Calling delete on URI: " + uri); > > + final boolean shouldUseTransactions = Build.VERSION.SDK_INT >= 11; Break this out into a method, javadoc what the hell it means.

Attachment #8380069 - Flags: review?(nalexander) → feedback+

Richard Newman [:rnewman]

Assignee

Comment 23

•

11 years ago

Thanks for swift reviews! (In reply to Nick Alexander :nalexander from comment #21) > > + public static String computeSQLInClause(int items, String field) { > > public? And does this not already exist in Fennec? Should it exist > somewhere else? It already exists as part of RepoUtils. It was so small I didn't see a point in importing that. At some point it might be nice to break out some of our very reusable code as totally separate libraries, and stick them on GitHub. It's changed to private in a later commit. > I find this comment confusing. I see no transaction in this method, so any > transaction in play must be the caller's responsibility. As such, any > comments about transactions should be removed or in the method's Javadoc, no? Perhaps. What this is getting at is: if we're already in a transaction, if it has issued a read query, we would have to kill it with fire in order to be confident in our next operation (because a read followed by a write can cause SQLITE_BUSY). Knowing the state of the transaction requires non-local knowledge -- in that way this is a lot like understanding thread safety. I'll clarify the comments. (In reply to Nick Alexander :nalexander from comment #20) > rs=me, but this isn't just cleanup -- there's DB re-ordering. Yeah, see Comment 15: "minor reordering of operations for transaction safety (e.g., moving a delete before a query in order to avoid a write upgrade)". > Hmm, you don't like the null check? I think FindBugs will want it. That null check pattern is bogus. SQLiteDatabase#query isn't documented as returning null in any circumstance, and it turns out that it only does so if the query is malformed. In that case we'll have already triggered a NPE inside the body of the `try`, when we try to use the cursor; checking for `null` here wins us nothing. Fennec style is to throw the NPE in this kind of scenario, so that we get a crash report in the right place.

Richard Newman [:rnewman]

Assignee

Comment 24

•

11 years ago

(In reply to Nick Alexander :nalexander from comment #22) > Comment on attachment 8380069 [details] [diff] [review] > Part 2: transaction handling in deletes and updates. > > Review of attachment 8380069 [details] [diff] [review]: > ----------------------------------------------------------------- > > I don't have the energy to address BP.java right now, but I'd like to > discuss why we're having callers manage |shouldUseTransactions| at all. See details in Comment 15. The alternative is to make it a field or method on the Provider, which is becoming more attractive to me. > Is this a method that all providers should implement (and test code can > override so that we can see both behaviours)? In principle they could. Everything that derives from TransactionalProvider needs to implement this API, of course. > > + * @param shouldUseTransactions true if the method is expected to start a > > use transactions? If it only uses one, can we call this > shouldStartTransaction? It can use as many as it wishes. This argument means, in full: based on the device you're running on, we reckon you should use transactions to bundle writes together; you're responsible for using as many as you need to; and you may return without finishing the transaction. > So, this is pretty weird. Name is inTransaction, arg is useTransactions? Legacy. I didn't have the energy to delete TransactionalProvider and inline this stuff into delete(). But note that this has always been "deleteInTransactionWhereTheCallerWillHaveStartedOneIfNecessary", and now it's "deleteUsingTransactionsIfYouThinkYouShould,WeThinkYouShould". The only purpose of this method now is to (a) do the check for whether transactions should be used, and (b) to do the repetitive cleanup work afterwards. > > * @return number of rows impacted by the update > > */ > > + abstract protected int updateInTransaction(Uri uri, ContentValues values, String selection, String[] selectionArgs, boolean shouldUseTransactions); > > Ditto weird name. Ain't my fault, buddy :D

Richard Newman [:rnewman]

Assignee

Comment 25

•

11 years ago

Attached patch Part 2: transaction handling in deletes and updates.* * * (obsolete) — Details — Splinter Review

Bug 947939 - Part 2b: review comments.

Attachment #8380109 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380069 - Attachment is obsolete: true

Richard Newman [:rnewman]

Assignee

Comment 26

•

11 years ago

Attached patch Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query, ensure it's transaction-sane. (obsolete) — Details — Splinter Review

It occurred to me that this method's recursive calling of CP methods with a URI would lead to it closing the existing transaction prematurely. It also did a stupid amount of extra work: composing URIs just so it could parse them again elsewhere. I stripped all of that out, and just call methods on the right DB instead.

Attachment #8380283 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380068 - Attachment is obsolete: true

Richard Newman [:rnewman]

Assignee

Comment 27

•

11 years ago

Attached patch Pre: fix testBrowserProvider#ensureEmptyDatabase. (obsolete) — Details — Splinter Review

I noticed that this code was spewing errors about non-closed cursors. So let's do that.

Attachment #8380308 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 28

•

11 years ago

Comment on attachment 8380308 [details] [diff] [review] Pre: fix testBrowserProvider#ensureEmptyDatabase. Obsoleted: moved to Bug 975792.

Attachment #8380308 - Attachment is obsolete: true

Attachment #8380308 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 29

•

11 years ago

Lots of additional rework gone on: batching also tries to do transaction handling, and that doesn't play nicely with the implicit transaction handling that we do inside insert/delete/update. Fixed. testBrowserProvider now passes on my machine. Try build: https://tbpl.mozilla.org/?tree=Try&rev=5e3a64acff62

Nick Alexander :nalexander [he/him]

Comment 30

•

11 years ago

> https://tbpl.mozilla.org/?tree=Try&rev=5e3a64acff62 Oh Christmas tree, oh Christmas tree, ...

Richard Newman [:rnewman]

Assignee

Comment 31

•

11 years ago

Yeah, that's what happens when you say "oh, I'll just take out this totally unnecessary bit" immediately before pushing to try. :P

Richard Newman [:rnewman]

Assignee

Comment 32

•

11 years ago

Much better: https://tbpl.mozilla.org/?tree=Try&rev=4c210bb5bd93

Richard Newman [:rnewman]

Assignee

Comment 33

•

11 years ago

Attached patch Part 0: general cleanup in BrowserProvider. — Details — Splinter Review

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380067 - Attachment is obsolete: true

Attachment #8380067 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 34

•

11 years ago

Attached patch Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. * * — Details — Splinter Review

Bug 947939 - Part 1b: ensure that cleanupSomeDeletedRecords doesn't inadvertently close existing transactions.

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380283 - Attachment is obsolete: true

Attachment #8380283 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 35

•

11 years ago

Attached patch Part 2: transaction handling in deletes and updates.* * * (obsolete) — Details — Splinter Review

Bug 947939 - Part 2b: review comments. * * * Bug 947939 - Part 2c: trivial cleanup. * * * Bug 947939 - Part 3: more efficient bulk updates of bookmarks. * * * Bug 947939 - Part 4: miscellaneous cleanup.

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380109 - Attachment is obsolete: true

Attachment #8380109 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 36

•

11 years ago

Comment on attachment 8380826 [details] [diff] [review] Part 2: transaction handling in deletes and updates.* * * This subsumes all remaining parts, because it got quite intertwingled. Note in particular this comment: + * This is a ThreadLocal separate from `db.inTransaction` because batched

Richard Newman [:rnewman]

Assignee

Comment 37

•

11 years ago

Comment on attachment 8380070 [details] [diff] [review] Part 3: more efficient bulk updates of bookmarks. Folded into part 2.

Attachment #8380070 - Attachment is obsolete: true

Attachment #8380070 - Flags: review?(nalexander)

Richard Newman [:rnewman]

Assignee

Comment 38

•

11 years ago

Attached patch Part 2: transaction handling in deletes and updates, more efficient bulk updates. — Details — Splinter Review

Richard Newman [:rnewman]

Assignee

Updated

•

11 years ago

Attachment #8380826 - Attachment is obsolete: true

Nick Alexander :nalexander [he/him]

Comment 39

•

11 years ago

Comment on attachment 8380823 [details] [diff] [review] Part 0: general cleanup in BrowserProvider. Review of attachment 8380823 [details] [diff] [review]: ----------------------------------------------------------------- Knowledge translation for the win. ::: mobile/android/base/db/BrowserProvider.java @@ +967,5 @@ > } > b.append(" WHEN ? THEN " + i); > } > > + // TODO: use computeSQLInClause nit: full sentence. @@ +1345,5 @@ > values.put(History.VISITS, 0); > values.put(History.DATE_MODIFIED, System.currentTimeMillis()); > > + // Doing this UPDATE (or the DELETE above) first ensures that the > + // first operation within this transaction is a write. How do we know we're in a transaction, and why do we know this is the first operation? Do the caller's expectations need to be stated in the JavaDoc too? @@ +1347,5 @@ > > + // Doing this UPDATE (or the DELETE above) first ensures that the > + // first operation within this transaction is a write. > + // The cleanup call below will do a SELECT first, and thus would > + // require the transaction to be upgraded from a reader to a writer. Which isn't kosher? What's the bad behaviour? How will people realize they've violated this? Nice to have this in the commit message, too.

Attachment #8380823 - Flags: review+

Nick Alexander :nalexander [he/him]

Comment 40

•

11 years ago

Comment on attachment 8380823 [details] [diff] [review] Part 0: general cleanup in BrowserProvider. Review of attachment 8380823 [details] [diff] [review]: ----------------------------------------------------------------- Knowledge translation for the win. ::: mobile/android/base/db/BrowserProvider.java @@ +967,5 @@ > } > b.append(" WHEN ? THEN " + i); > } > > + // TODO: use computeSQLInClause nit: full sentence. @@ +1345,5 @@ > values.put(History.VISITS, 0); > values.put(History.DATE_MODIFIED, System.currentTimeMillis()); > > + // Doing this UPDATE (or the DELETE above) first ensures that the > + // first operation within this transaction is a write. How do we know we're in a transaction, and why do we know this is the first operation? Do the caller's expectations need to be stated in the JavaDoc too? @@ +1347,5 @@ > > + // Doing this UPDATE (or the DELETE above) first ensures that the > + // first operation within this transaction is a write. > + // The cleanup call below will do a SELECT first, and thus would > + // require the transaction to be upgraded from a reader to a writer. Which isn't kosher? What's the bad behaviour? How will people realize they've violated this? Nice to have this in the commit message, too.

Nick Alexander :nalexander [he/him]

Comment 41

•

11 years ago

Comment on attachment 8380824 [details] [diff] [review] Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query. * * Review of attachment 8380824 [details] [diff] [review]: ----------------------------------------------------------------- lgtm. ::: mobile/android/base/db/BrowserProvider.java @@ +292,5 @@ > + /** > + * Clean up some deleted records from the specified table. > + * > + * If called in an existing transaction, it is the caller's responsibility > + * to ensure that we're already the only writer. If not called in an I find "only writer" confusing. Already in a writing transaction? Per discussion in IRC, "only writer" is by necessity, and "already in a writing transaction" is a little clearer, so try to work in both.

Attachment #8380824 - Flags: review+

Nick Alexander :nalexander [he/him]

Comment 42

•

11 years ago

Comment on attachment 8380831 [details] [diff] [review] Part 2: transaction handling in deletes and updates, more efficient bulk updates. Review of attachment 8380831 [details] [diff] [review]: ----------------------------------------------------------------- I really gave this the college try. It looks good to me, but I'd be *shocked* if we haven't missed a case. ::: mobile/android/base/db/BrowserProvider.java @@ +654,5 @@ > new String[] { Long.toString(ContentUris.parseId(uri)) }); > // fall through > case HISTORY: { > debug("Updating history: " + uri); > + if (shouldUpdateOrInsert(uri)) { Would have preferred the bracing noise separately. Out of interest, post to mobile-firefox-dev if it's possible to make Eclipse do this (and only this) automatically. @@ +924,5 @@ > > /** > * Update the positions of bookmarks in batches. > * > + * Establishes and finishes its own transactions. Begins and ends. Why introduce new nomenclature? @@ +1099,2 @@ > try { > + ids = computeSQLInArgumentsFromLongs(cursor); Is it worth skipping the big String[] allocation and building inClause below inline? I know you're a big efficiency fan. ::: mobile/android/base/db/TransactionalProvider.java @@ +56,5 @@ > > /* > * Deletes items from the database within a DB transaction. > * > + * @param uri query URI Harmonize this formatting with the formatting (and nicer descriptions) for |updateInTransaction| below. @@ +60,5 @@ > + * @param uri query URI > + * @param selection an optional filter to match rows to update > + * @param selectionArgs arguments for the selection > + * > + * @return number of rows impacted by the deletion Is this different than "number of rows deleted"? (Perhaps due to db constraint propagation?) @@ +140,5 @@ > + return Build.VERSION.SDK_INT >= 11; > + } > + > + /** > + * This is a ThreadLocal separate from `db.inTransaction` because batched This comment is great, but I'd like to make it even greater. Give me a one line summary: "Track whether we're in a batch operation." Then include the detail line "When we're in a batch...". Can we then talk about intended use? Then the caveat, and finally the note. @@ +142,5 @@ > + > + /** > + * This is a ThreadLocal separate from `db.inTransaction` because batched > + * operations start transactions independent of individual ContentProvider > + * operations. This doesn't work well with the entire concept of this What is this? Be explicit. @@ +169,5 @@ > + > + /** > + * If we're not currently in a transaction, and we should be, start one. > + */ > + protected void beginWrite(final SQLiteDatabase db) { These functions are a breath of fresh air in this file, but I *still* find them confusing, because |beginWrite| and |endWrite| don't bookend each other. I really had to dig into the callers to get this straight. Is there some semantic that means "start a transaction and set isInBatch", versus "start a transaction if I'm not already in batch"? @@ +181,5 @@ > + db.beginTransaction(); > + } > + } > + > + protected void markWriteSuccessful(final SQLiteDatabase db) { nit: the Javadoc above is nice. @@ +183,5 @@ > + } > + > + protected void markWriteSuccessful(final SQLiteDatabase db) { > + if (isInBatch()) { > + return; Any reason not to trace here, too? Too frequent? @@ +284,5 @@ > > final SQLiteDatabase db = getWritableDatabase(uri); > > + debug("bulkInsert: explicitly starting transaction."); > + isInBatchOperation.set(Boolean.TRUE); OK, I can see that this isInBatchOperation assignment bookends the one below. It's weird that we manually set this flag and call |beginTransaction| ourselves. Should this be wrapped up in something more transparent? I suppose this is in TP; somebody's gotta do the actual transaction.

Attachment #8380831 - Flags: review+

Richard Newman [:rnewman]

Assignee

Comment 43

•

11 years ago

(In reply to Nick Alexander :nalexander from comment #40) > > + // TODO: use computeSQLInClause > > nit: full sentence. 'sok, the later parts actually do this, so the comment is gone. > How do we know we're in a transaction, and why do we know this is the first > operation? Do the caller's expectations need to be stated in the JavaDoc > too? Global analysis, but I'll add a comment! > Which isn't kosher? What's the bad behaviour? How will people realize > they've violated this? Nice to have this in the commit message, too. Alas, there's little we can do to programmatically ensure that this failure mode is avoided, but at least there are comments everywhere, and it's hard to ignore all of the transaction-oriented stuff hanging around now. Will augment the commit message.

Richard Newman [:rnewman]

Assignee

Comment 44

•

11 years ago

(In reply to Nick Alexander :nalexander from comment #42) > Is it worth skipping the big String[] allocation and building inClause below > inline? I know you're a big efficiency fan. Good spot. > > + * @return number of rows impacted by the deletion > > Is this different than "number of rows deleted"? (Perhaps due to db > constraint propagation?) It returns the value that the CP's own `deleteInTransaction` method returns, so I'd call that "it depends". > Is there some semantic that means "start a transaction and set isInBatch", > versus "start a transaction if I'm not already in batch"? Introduced {begin,end}Batch, markBatchSuccessful to mirror *Write.

Richard Newman [:rnewman]

Assignee

Comment 45

•

11 years ago

Final try push with review comments: https://tbpl.mozilla.org/?tree=Try&rev=1f684f108906

Richard Newman [:rnewman]

Assignee

Comment 46

•

11 years ago

Try looks like Boston in mid-March. I'll do some manual smoke testing in the morning, then land this. If the general approach is supported by crash stats, I'll prepare the smallest possible fix -- likely a "busy? let's end that transaction and try again" patch -- for Aurora and Beta.

Richard Newman [:rnewman]

Assignee

Comment 47

•

11 years ago

Hand-tested Android import, syncing, browsing. Looks good, even under extensive logging. https://hg.mozilla.org/integration/fx-team/rev/63059742d136 https://hg.mozilla.org/integration/fx-team/rev/ba49fd7dcf7e https://hg.mozilla.org/integration/fx-team/rev/8e8a2c4025ee

status-firefox30: --- → fixed

Target Milestone: --- → Firefox 30

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 48

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/63059742d136 https://hg.mozilla.org/mozilla-central/rev/ba49fd7dcf7e https://hg.mozilla.org/mozilla-central/rev/8e8a2c4025ee

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Richard Newman [:rnewman]

Assignee

Comment 49

•

11 years ago

There's nothing in crash reports to indicate that this regressed, so it's time to think about uplift. Arguments in favor of uplifting all three parts: * It seems to work, whereas a re-implementation on Aurora might not. For example, some of the superficially superfluous changes (removing interleaved reads and writes during expiry) might be a key part of the overall solution -- just making a transaction-related change might not be enough. * It's not that much code. * It's less work to uplift a completed patch set than to try something new. * It removes risk from the next cycle by taking it in 29. Arguments in favor of trying to distill some workaround for 29: * This patch set is not a minimal change, so there's some inherent risk. Does anyone have additional arguments for or against?

Lucas Rocha (:lucasr)

Comment 50

•

11 years ago

I'm in favor of uplift just for the fact that we'll only know if this patch actually has a positive effect with a user base larger than Nightly's.

Richard Newman [:rnewman]

Assignee

Comment 51

•

11 years ago

Comment on attachment 8380831 [details] [diff] [review] Part 2: transaction handling in deletes and updates, more efficient bulk updates. Batch approval for all three parts. [Approval Request Comment] Bug caused by (feature/regressing bug #): Historical SQL usage patterns. User impact if declined: Continued baseline of crashes. Testing completed (on m-c, etc.): Been baking on Nightly for some time. Automated tests pass. Risk to taking this patch (and alternatives if risky): Moderate but worthwhile. These patches aren't trivial, but they are thorough. At worst they should produce obvious failures, likely no worse than the current concurrency problems. At best this will eliminate a whole class of SQL bugs (meta: Bug 975024). String or IDL/UUID changes made by this patch: None.

Attachment #8380831 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

status-firefox29: --- → affected

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

Attachment #8380831 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 52

•

11 years ago

This needs a lot of love for uplifting to Aurora.

Flags: needinfo?(rnewman)

Keywords: branch-patch-needed

Richard Newman [:rnewman]

Assignee

Comment 53

•

11 years ago

This built and Robocop tests passed on my device. I inlined all of the TransactionalProvider bits into BrowserProvider. https://hg.mozilla.org/releases/mozilla-aurora/rev/3fcdc52fc85f

status-firefox29: affected → fixed

Flags: needinfo?(rnewman)

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Keywords: branch-patch-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 54

•

11 years ago

This seems to have broken troboprovider tests on tbpl: https://tbpl.mozilla.org/php/getParsedLog.php?id=35759962&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=35762432&tree=Mozilla-Aurora https://tbpl.mozilla.org/php/getParsedLog.php?id=35763845&tree=Mozilla-Aurora

Flags: needinfo?(rnewman)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 55

•

11 years ago

(In reply to Wes Kocher (:KWierso) from comment #54) > This seems to have broken troboprovider tests on tbpl: > https://tbpl.mozilla.org/php/getParsedLog.php?id=35759962&tree=Mozilla-Aurora > https://tbpl.mozilla.org/php/getParsedLog.php?id=35762432&tree=Mozilla-Aurora > https://tbpl.mozilla.org/php/getParsedLog.php?id=35763845&tree=Mozilla-Aurora Backed out in https://hg.mozilla.org/releases/mozilla-aurora/rev/1cd27831f2b1

Richard Newman [:rnewman]

Assignee

Comment 56

•

11 years ago

I was just looking at that myself. Probably hg fuzz-factored wrong, or I screwed up a manual conflict resolution: 03-06 18:55:48.242 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93 19:00:20 INFO - 03-06 18:56:18.265 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93 19:00:20 INFO - 03-06 18:56:48.289 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93 19:00:20 INFO - 03-06 18:57:18.312 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93 19:00:20 INFO - 03-06 18:57:48.312 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93 19:00:20 INFO - 03-06 18:58:18.367 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

status-firefox29: fixed → affected

Richard Newman [:rnewman]

Assignee

Comment 57

•

11 years ago

I re-merged, ran a try build with some retriggers for good measure, and it's green so far: https://tbpl.mozilla.org/?tree=Try&rev=aa3f376a5afa Will reland when all of those are done.

Richard Newman [:rnewman]

Assignee

Comment 58

•

11 years ago

Such green. Much x86. Wow. https://hg.mozilla.org/releases/mozilla-aurora/rev/f51e42ad12e2 https://hg.mozilla.org/releases/mozilla-aurora/rev/ebf3963ed46a

status-firefox29: affected → fixed

Flags: needinfo?(rnewman)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 59

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #57) > I re-merged, ran a try build with some retriggers for good measure, and it's > green so far: > > https://tbpl.mozilla.org/?tree=Try&rev=aa3f376a5afa > > Will reland when all of those are done. Turns out this try run didn't run anything from the Talos suite (including the trobocheck test that caused the backout.

Phil Ringnalda (:philor)

Comment 60

•

11 years ago

Backed out in https://hg.mozilla.org/releases/mozilla-aurora/rev/fbbdc60eea00 - roboprovider, yeah, but given how many retries there are already on my few little retriggers of robocop-1, I think the massive retries on Try weren't a coincidence, either.

status-firefox29: fixed → affected

Richard Newman [:rnewman]

Assignee

Comment 61

•

11 years ago

Thanks, chaps.

Richard Newman [:rnewman]

Assignee

Comment 62

•

11 years ago

Found it. createFavicon was reimplemented in 30; in 29 it still calls insertFavicon, which opens but doesn't close a transaction. Simple solution. I think testBrowserProviderPerf is a pretty bad Talos test, though -- it opens two databases, inserting into one and querying the other, as far as I can tell. Nick might have fixed that in Bug 938821.

Nick Alexander :nalexander [he/him]

Comment 63

•

11 years ago

> I think testBrowserProviderPerf is a pretty bad Talos test, though -- it > opens two databases, inserting into one and querying the other, as far as I > can tell. Nick might have fixed that in Bug 938821. If I did, it was unintentional.

Richard Newman [:rnewman]

Assignee

Comment 64

•

11 years ago

I have a clean try build for this: https://tbpl.mozilla.org/?tree=Try&rev=b6089b2ea2b3 but distributed favicons don't get loaded correctly for a clean-install Aurora build. Investigating.

Richard Newman [:rnewman]

Assignee

Comment 65

•

11 years ago

Excellent, a derp. Works fine if I... mark the transaction as successful before committing it. Relanded. If this bounces, I give up. https://hg.mozilla.org/releases/mozilla-aurora/rev/72c55d43d54a https://hg.mozilla.org/releases/mozilla-aurora/rev/6b2e2ac0d425

status-firefox29: affected → fixed

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

You need to log in before you can comment on or make changes to this bug.