SQLiteDatabaseLockedException from BrowserDB.expireHistory

RESOLVED FIXED in Firefox 29

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: mfinkle, Assigned: rnewman)

Tracking

(Blocks 1 bug)

unspecified
Firefox 30
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox29 fixed, firefox30 fixed)

Details

Attachments

(3 attachments, 8 obsolete attachments)

14.60 KB, patch
nalexander
: review+
Details | Diff | Splinter Review
6.40 KB, patch
nalexander
: review+
Details | Diff | Splinter Review
37.01 KB, patch
nalexander
: review+
Details | Diff | Splinter Review
https://crash-stats.mozilla.com/report/index/f67e87d1-b194-4e2c-971f-97e162131208

Technically, we could catch this exception and ignore it. We will attempt to expireHistory again at some point.

If we really need to expireHistory, we should just re-try the call once or twice.
Blocks: 752828
OS: Mac OS X → Android
Hardware: x86 → All
See retry comments here:

http://developer.android.com/reference/android/database/sqlite/SQLiteDatabaseLockedException.html

My intuition is that immediately retrying won't help, and just muffling will cause the same error in the next write call.

There has to be a reason for this. Much ponder, so think.
(In reply to Richard Newman [:rnewman] from comment #1)
> See retry comments here:
> 
> http://developer.android.com/reference/android/database/sqlite/
> SQLiteDatabaseLockedException.html
> 
> My intuition is that immediately retrying won't help, and just muffling will
> cause the same error in the next write call.
> 
> There has to be a reason for this. Much ponder, so think.

We were thinking about delay posting back onto the thread.
Here are my guesses.


This exception is thrown because somewhere down in the bowels, sqlite is returning SQLITE_BUSY. That means one of these things:

* Two simultaneous writes.
* A write and a read, in some circumstances (WAL should eliminate this).
* An attempt to upgrade a read transaction to a write after a parallel change has been committed (<http://sqlite.1065341.n5.nabble.com/Begin-immediate-transaction-gt-SQLITE-BUSY-database-is-locked-td64878.html>).
* A write while WAL checkpointing is occurring?


BrowserProvider:

            // From Honeycomb on, it's possible to run several db
            // commands in parallel using multiple connections.
            if (Build.VERSION.SDK_INT >= 11) {
                db.enableWriteAheadLogging();
                db.setLockingEnabled(false);


Firstly, setLockingEnabled does nothing on 16+, which probably explains why we only see this problem on API15: different threads cannot issue writes at the same time because of the exclusive locking. That Android now forces locking is working to hide our bug.

Secondly, disabling the sqlite connection locking means that we must be sure never to issue two simultaneous writes, or otherwise hit one of the events above.

My theories as to what 'shapes' in our code are the causes here:

* We write to BrowserProvider from multiple threads. Sync has its own threads, for one thing, but we also might hit this from within Fennec's main code.
* We retain cursors in order to draw our list UI. These might overlap with writes in some dangerous way.
* If we *ever* run a writing command after a read command within a transaction begun on that thread.
Oh hey! Check this out! We upgrade a read to a write inside a transaction in expireHistory!


>>     * Call this method within a transaction.
       */
      private void expireHistory(final SQLiteDatabase db, final int retain, final long keepAfter) {
          Log.d(LOGTAG, "Expiring history.");
RR        final long rows = DatabaseUtils.queryNumEntries(db, TABLE_HISTORY);

...

            sql = "DELETE FROM " + TABLE_HISTORY + " WHERE " + History._ID + " " +
                  "IN ( SELECT " + History._ID + " FROM " + TABLE_HISTORY + " " +
                  "ORDER BY " + sortOrder + " LIMIT " + toRemove + ")";
        }
        trace("Deleting using query: " + sql);
WW      db.execSQL(sql);
Possible solutions (all or some):

1. The hammer: re-enable locking. It's probably not buying us much.

2. Find situations in which a lock must be upgraded within a transaction (a write after a read), and handle this exception by finishing the transaction and starting a new one to perform the write. This assumes that the competing write didn't invalid the results of the previous read.

3. ?
Bug 876589 involves a crash in setLocale while executing a query. setLocale involves a write. The stack implies that the connection pool was exhausted, and a new connection was being opened -- and I'd bet that one of those other connections was in the middle of a write.

We can mitigate that somewhat by closing cursors more aggressively, so the pool stays free, but otherwise we simply have to assume that any *read* might involve a write, just in order to open a connection!
I think it's fair to summarize: any concurrent query -- read or write -- issued against SQLite on Android, with locking disabled, can raise this error.

It can do so for simple reasons (two simultaneous writes), or for complex ones (upgrading locks within a transaction), but it can also do it *simply because you issued a read that required a write under-the-hood*.

Could someone check my math on this?
(In reply to Richard Newman [:rnewman] from comment #3)

> * If we *ever* run a writing command after a read command within a
> transaction begun on that thread.

Bugzilla ate the rest of this comment, which was a link to 

http://sqlite.1065341.n5.nabble.com/Begin-immediate-transaction-gt-SQLITE-BUSY-database-is-locked-td64878.html
Just throwing some background into this bug. I landed the code to disable the locking in bug 843005. We did that as a possible way to reduce the DB locking crashes, basically the same kind of bug as this.

Obviously this did not help, mainly I think because the SQLITE_BUSY is a native problem, not happening in the Java layer.
(In reply to Richard Newman [:rnewman] from comment #4)
> Oh hey! Check this out! We upgrade a read to a write inside a transaction in
> expireHistory!
> 
> 
> >>     * Call this method within a transaction.
>        */
>       private void expireHistory(final SQLiteDatabase db, final int retain,
> final long keepAfter) {
>           Log.d(LOGTAG, "Expiring history.");
> RR        final long rows = DatabaseUtils.queryNumEntries(db, TABLE_HISTORY);
> 
> ...
> 
>             sql = "DELETE FROM " + TABLE_HISTORY + " WHERE " + History._ID +
> " " +
>                   "IN ( SELECT " + History._ID + " FROM " + TABLE_HISTORY +
> " " +
>                   "ORDER BY " + sortOrder + " LIMIT " + toRemove + ")";
>         }
>         trace("Deleting using query: " + sql);
> WW      db.execSQL(sql);

I wonder if we can just drop the query for number of entries, or move it into the DELETE sql as a subselect.
I'm going to mark this as assigned to me, hoping that at some point I'll have time to revisit it. Feel free to steal, future dev with intimate knowledge of the Android database system.
Assignee: nobody → rnewman
(In reply to [github robot] from comment #12)

> Bug 947939 - Fix mixed up logic in the search query generator

That should have been Bug 974939, according to cade. Disregard!
Blocks: 975024
Tactics I plan to employ:

* Primarily: before writing on a thread that has seen previous reads, and where consistency isn't required, finish then restart the transaction.

* Secondarily: when performing an operation that might try to write (primarily UPDATE and DELETE, but also anything that might open a DB connection!), catch SQLiteDatabaseLockedException and muffle it.

* Thirdly: try to reduce interleaving of reads and writes.

* Fourthly: try to reduce the likelihood of writes occurring on multiple threads.
Status: NEW → ASSIGNED
The following sequence of patches do the following:

Part 0: generally clean up BrowserProvider: style nits, correctness and perf nits (e.g., moving DB operations closer together, removing needless null checks), adding final where it makes sense, adding comments, minor reordering of operations for transaction safety (e.g., moving a delete before a query in order to avoid a write upgrade).

Part 1: rewrites cleanupSomeDeletedRecords. Introduces the use of SQL "IN" to avoid iterating over live cursors issuing individual deletes. Not only does that avoid overlapping reads and writes (albeit on the same thread), but it turns six queries into two.

Part 2: threads a "should I use transactions?" context through various delete and update operations. This is used to only start transactions immediately before a write, rather than before a read-then-write sequence. It also answers one of the TODOs from Part 0, replacing another iteration with an IN.

I could be persuaded to use a different approach here (e.g., a static method "shouldUseTransactions()" instead of threading), but short of abandoning the idea of using transactions altogether, we need to delegate the choice of when to establish a transaction to the method that's actually doing the writing. (Or, I suppose, we could terminate existing transactions immediately before starting a write... but that seems painful too.)

Part 3: refactors some "IN" handling from earlier parts, using it to reimplement `updateBookmarks` to not make N+1 queries with interleaved read/writes.

About to run Robocop on these.
Attachment #8380067 - Flags: review?(nalexander)
Attachment #8380069 - Flags: review?(nalexander)
Attachment #8380070 - Flags: review?(nalexander)
Comment on attachment 8380067 [details] [diff] [review]
Part 0: general cleanup in BrowserProvider.

Review of attachment 8380067 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me, but this isn't just cleanup -- there's DB re-ordering.

::: mobile/android/base/db/BrowserProvider.java
@@ +1069,3 @@
>              }
>          } finally {
> +            cursor.close();

Hmm, you don't like the null check?  I think FindBugs will want it.
Comment on attachment 8380068 [details] [diff] [review]
Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query.

Review of attachment 8380068 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good modulo doc confusion.

::: mobile/android/base/db/BrowserProvider.java
@@ +271,5 @@
>              Log.d(LOGTAG, message);
>          }
>      }
>  
> +    public static String computeSQLInClause(int items, String field) {

public?  And does this not already exist in Fennec?  Should it exist somewhere else?

@@ +337,5 @@
> +
> +        // We don't need to finish the transaction here, because it will
> +        // necessarily already be a writing transaction.
> +        final String inClause = computeSQLInClause(ids.length,
> +                                                   CommonColumns._ID);

I find this comment confusing.  I see no transaction in this method, so any transaction in play must be the caller's responsibility.  As such, any comments about transactions should be removed or in the method's Javadoc, no?
Attachment #8380068 - Flags: review?(nalexander) → review+
Comment on attachment 8380069 [details] [diff] [review]
Part 2: transaction handling in deletes and updates.

Review of attachment 8380069 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have the energy to address BP.java right now, but I'd like to discuss why we're having callers manage |shouldUseTransactions| at all.  Is this a method that all providers should implement (and test code can override so that we can see both behaviours)?

::: mobile/android/base/db/TransactionalProvider.java
@@ +54,5 @@
>       */
>      abstract protected Uri insertInTransaction(Uri uri, ContentValues values);
>  
>      /*
>       * Deletes items from the database within a DB transaction.

nit: trailing ws.

@@ +62,5 @@
> +     * @param selection an optional filter to match rows to update
> +     * 
> +     * @param selectionArgs arguments for the selection
> +     * 
> +     * @param shouldUseTransactions true if the method is expected to start a

use transactions?  If it only uses one, can we call this shouldStartTransaction?

@@ +70,2 @@
>       */
> +    abstract protected int deleteInTransaction(Uri uri, String selection, String[] selectionArgs, boolean shouldUseTransactions);

So, this is pretty weird.  Name is inTransaction, arg is useTransactions?

@@ +75,5 @@
>       *
>       * @param uri            Query URI
>       * @param values         A set of column_name/value pairs to add to the database.
>       * @param selection      An optional filter to match rows to update.
> +     * @param selectionArgs  Arguments for the selection  

nit: ws, full sentence.

@@ +83,3 @@
>       * @return               number of rows impacted by the update
>       */
> +    abstract protected int updateInTransaction(Uri uri, ContentValues values, String selection, String[] selectionArgs, boolean shouldUseTransactions);

Ditto weird name.

@@ +137,5 @@
>      @Override
>      public int delete(Uri uri, String selection, String[] selectionArgs) {
>          trace("Calling delete on URI: " + uri);
>  
> +        final boolean shouldUseTransactions = Build.VERSION.SDK_INT >= 11;

Break this out into a method, javadoc what the hell it means.
Attachment #8380069 - Flags: review?(nalexander) → feedback+
Thanks for swift reviews!


(In reply to Nick Alexander :nalexander from comment #21)

> > +    public static String computeSQLInClause(int items, String field) {
> 
> public?  And does this not already exist in Fennec?  Should it exist
> somewhere else?

It already exists as part of RepoUtils. It was so small I didn't see a point in importing that.

At some point it might be nice to break out some of our very reusable code as totally separate libraries, and stick them on GitHub.

It's changed to private in a later commit.


> I find this comment confusing.  I see no transaction in this method, so any
> transaction in play must be the caller's responsibility.  As such, any
> comments about transactions should be removed or in the method's Javadoc, no?

Perhaps. What this is getting at is: if we're already in a transaction, if it has issued a read query, we would have to kill it with fire in order to be confident in our next operation (because a read followed by a write can cause SQLITE_BUSY). Knowing the state of the transaction requires non-local knowledge -- in that way this is a lot like understanding thread safety.

I'll clarify the comments.



(In reply to Nick Alexander :nalexander from comment #20)

> rs=me, but this isn't just cleanup -- there's DB re-ordering.

Yeah, see Comment 15: "minor reordering of operations for transaction safety (e.g., moving a delete before a query in order to avoid a write upgrade)".
 

> Hmm, you don't like the null check?  I think FindBugs will want it.

That null check pattern is bogus. SQLiteDatabase#query isn't documented as returning null in any circumstance, and it turns out that it only does so if the query is malformed. In that case we'll have already triggered a NPE inside the body of the `try`, when we try to use the cursor; checking for `null` here wins us nothing.

Fennec style is to throw the NPE in this kind of scenario, so that we get a crash report in the right place.
(In reply to Nick Alexander :nalexander from comment #22)
> Comment on attachment 8380069 [details] [diff] [review]
> Part 2: transaction handling in deletes and updates.
> 
> Review of attachment 8380069 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't have the energy to address BP.java right now, but I'd like to
> discuss why we're having callers manage |shouldUseTransactions| at all.

See details in Comment 15. 

The alternative is to make it a field or method on the Provider, which is becoming more attractive to me.


> Is this a method that all providers should implement (and test code can
> override so that we can see both behaviours)?

In principle they could.

Everything that derives from TransactionalProvider needs to implement this API, of course. 


> > +     * @param shouldUseTransactions true if the method is expected to start a
> 
> use transactions?  If it only uses one, can we call this
> shouldStartTransaction?

It can use as many as it wishes. This argument means, in full: based on the device you're running on, we reckon you should use transactions to bundle writes together; you're responsible for using as many as you need to; and you may return without finishing the transaction.
 

> So, this is pretty weird.  Name is inTransaction, arg is useTransactions?

Legacy. I didn't have the energy to delete TransactionalProvider and inline this stuff into delete().

But note that this has always been "deleteInTransactionWhereTheCallerWillHaveStartedOneIfNecessary", and now it's "deleteUsingTransactionsIfYouThinkYouShould,WeThinkYouShould".

The only purpose of this method now is to (a) do the check for whether transactions should be used, and (b) to do the repetitive cleanup work afterwards.


> >       * @return               number of rows impacted by the update
> >       */
> > +    abstract protected int updateInTransaction(Uri uri, ContentValues values, String selection, String[] selectionArgs, boolean shouldUseTransactions);
> 
> Ditto weird name.

Ain't my fault, buddy :D
Bug 947939 - Part 2b: review comments.
Attachment #8380109 - Flags: review?(nalexander)
Attachment #8380069 - Attachment is obsolete: true
It occurred to me that this method's recursive calling of CP methods with a URI would lead to it closing the existing transaction prematurely. It also did a stupid amount of extra work: composing URIs just so it could parse them again elsewhere. I stripped all of that out, and just call methods on the right DB instead.
Attachment #8380283 - Flags: review?(nalexander)
Attachment #8380068 - Attachment is obsolete: true
I noticed that this code was spewing errors about non-closed cursors. So let's do that.
Attachment #8380308 - Flags: review?(nalexander)
Comment on attachment 8380308 [details] [diff] [review]
Pre: fix testBrowserProvider#ensureEmptyDatabase.

Obsoleted: moved to Bug 975792.
Attachment #8380308 - Attachment is obsolete: true
Attachment #8380308 - Flags: review?(nalexander)
Lots of additional rework gone on: batching also tries to do transaction handling, and that doesn't play nicely with the implicit transaction handling that we do inside insert/delete/update.

Fixed. testBrowserProvider now passes on my machine.

Try build:

https://tbpl.mozilla.org/?tree=Try&rev=5e3a64acff62
Yeah, that's what happens when you say "oh, I'll just take out this totally unnecessary bit" immediately before pushing to try. :P
Attachment #8380067 - Attachment is obsolete: true
Attachment #8380067 - Flags: review?(nalexander)
Bug 947939 - Part 1b: ensure that cleanupSomeDeletedRecords doesn't inadvertently close existing transactions.
Attachment #8380283 - Attachment is obsolete: true
Attachment #8380283 - Flags: review?(nalexander)
Bug 947939 - Part 2b: review comments.
* * *
Bug 947939 - Part 2c: trivial cleanup.
* * *
Bug 947939 - Part 3: more efficient bulk updates of bookmarks.
* * *
Bug 947939 - Part 4: miscellaneous cleanup.
Attachment #8380109 - Attachment is obsolete: true
Attachment #8380109 - Flags: review?(nalexander)
Comment on attachment 8380826 [details] [diff] [review]
Part 2: transaction handling in deletes and updates.* * *

This subsumes all remaining parts, because it got quite intertwingled.

Note in particular this comment:

+     * This is a ThreadLocal separate from `db.inTransaction` because batched
Comment on attachment 8380070 [details] [diff] [review]
Part 3: more efficient bulk updates of bookmarks.

Folded into part 2.
Attachment #8380070 - Attachment is obsolete: true
Attachment #8380070 - Flags: review?(nalexander)
Attachment #8380826 - Attachment is obsolete: true
Comment on attachment 8380823 [details] [diff] [review]
Part 0: general cleanup in BrowserProvider.

Review of attachment 8380823 [details] [diff] [review]:
-----------------------------------------------------------------

Knowledge translation for the win.

::: mobile/android/base/db/BrowserProvider.java
@@ +967,5 @@
>              }
>              b.append(" WHEN ? THEN " + i);
>          }
>  
> +        // TODO: use computeSQLInClause

nit: full sentence.

@@ +1345,5 @@
>          values.put(History.VISITS, 0);
>          values.put(History.DATE_MODIFIED, System.currentTimeMillis());
>  
> +        // Doing this UPDATE (or the DELETE above) first ensures that the
> +        // first operation within this transaction is a write.

How do we know we're in a transaction, and why do we know this is the first operation?  Do the caller's expectations need to be stated in the JavaDoc too?

@@ +1347,5 @@
>  
> +        // Doing this UPDATE (or the DELETE above) first ensures that the
> +        // first operation within this transaction is a write.
> +        // The cleanup call below will do a SELECT first, and thus would
> +        // require the transaction to be upgraded from a reader to a writer.

Which isn't kosher?  What's the bad behaviour?  How will people realize they've violated this?  Nice to have this in the commit message, too.
Attachment #8380823 - Flags: review+
Comment on attachment 8380823 [details] [diff] [review]
Part 0: general cleanup in BrowserProvider.

Review of attachment 8380823 [details] [diff] [review]:
-----------------------------------------------------------------

Knowledge translation for the win.

::: mobile/android/base/db/BrowserProvider.java
@@ +967,5 @@
>              }
>              b.append(" WHEN ? THEN " + i);
>          }
>  
> +        // TODO: use computeSQLInClause

nit: full sentence.

@@ +1345,5 @@
>          values.put(History.VISITS, 0);
>          values.put(History.DATE_MODIFIED, System.currentTimeMillis());
>  
> +        // Doing this UPDATE (or the DELETE above) first ensures that the
> +        // first operation within this transaction is a write.

How do we know we're in a transaction, and why do we know this is the first operation?  Do the caller's expectations need to be stated in the JavaDoc too?

@@ +1347,5 @@
>  
> +        // Doing this UPDATE (or the DELETE above) first ensures that the
> +        // first operation within this transaction is a write.
> +        // The cleanup call below will do a SELECT first, and thus would
> +        // require the transaction to be upgraded from a reader to a writer.

Which isn't kosher?  What's the bad behaviour?  How will people realize they've violated this?  Nice to have this in the commit message, too.
Comment on attachment 8380824 [details] [diff] [review]
Part 1: reimplement cleanupSomeDeletedRecords to make only one delete query.  * *

Review of attachment 8380824 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm.

::: mobile/android/base/db/BrowserProvider.java
@@ +292,5 @@
> +    /**
> +     * Clean up some deleted records from the specified table.
> +     *
> +     * If called in an existing transaction, it is the caller's responsibility
> +     * to ensure that we're already the only writer. If not called in an

I find "only writer" confusing.  Already in a writing transaction?  Per discussion in IRC, "only writer" is by necessity, and "already in a writing transaction" is a little clearer, so try to work in both.
Attachment #8380824 - Flags: review+
Comment on attachment 8380831 [details] [diff] [review]
Part 2: transaction handling in deletes and updates, more efficient bulk updates.

Review of attachment 8380831 [details] [diff] [review]:
-----------------------------------------------------------------

I really gave this the college try.  It looks good to me, but I'd be *shocked* if we haven't missed a case.

::: mobile/android/base/db/BrowserProvider.java
@@ +654,5 @@
>                          new String[] { Long.toString(ContentUris.parseId(uri)) });
>                  // fall through
>              case HISTORY: {
>                  debug("Updating history: " + uri);
> +                if (shouldUpdateOrInsert(uri)) {

Would have preferred the bracing noise separately.  Out of interest, post to mobile-firefox-dev if it's possible to make Eclipse do this (and only this) automatically.

@@ +924,5 @@
>  
>      /**
>       * Update the positions of bookmarks in batches.
>       *
> +     * Establishes and finishes its own transactions.

Begins and ends.  Why introduce new nomenclature?

@@ +1099,2 @@
>          try {
> +            ids = computeSQLInArgumentsFromLongs(cursor);

Is it worth skipping the big String[] allocation and building inClause below inline?  I know you're a big efficiency fan.

::: mobile/android/base/db/TransactionalProvider.java
@@ +56,5 @@
>  
>      /*
>       * Deletes items from the database within a DB transaction.
>       *
> +     * @param uri query URI

Harmonize this formatting with the formatting (and nicer descriptions) for |updateInTransaction| below.

@@ +60,5 @@
> +     * @param uri query URI
> +     * @param selection an optional filter to match rows to update
> +     * @param selectionArgs arguments for the selection
> +     *
> +     * @return number of rows impacted by the deletion

Is this different than "number of rows deleted"?  (Perhaps due to db constraint propagation?)

@@ +140,5 @@
> +        return Build.VERSION.SDK_INT >= 11;
> +    }
> +
> +    /**
> +     * This is a ThreadLocal separate from `db.inTransaction` because batched

This comment is great, but I'd like to make it even greater.  Give me a one line summary: "Track whether we're in a batch operation."

Then include the detail line "When we're in a batch...".

Can we then talk about intended use?

Then the caveat, and finally the note.

@@ +142,5 @@
> +
> +    /**
> +     * This is a ThreadLocal separate from `db.inTransaction` because batched
> +     * operations start transactions independent of individual ContentProvider
> +     * operations. This doesn't work well with the entire concept of this

What is this?  Be explicit.

@@ +169,5 @@
> +
> +    /**
> +     * If we're not currently in a transaction, and we should be, start one.
> +     */
> +    protected void beginWrite(final SQLiteDatabase db) {

These functions are a breath of fresh air in this file, but I *still* find them confusing, because |beginWrite| and |endWrite| don't bookend each other.  I really had to dig into the callers to get this straight.

Is there some semantic that means "start a transaction and set isInBatch", versus "start a transaction if I'm not already in batch"?

@@ +181,5 @@
> +            db.beginTransaction();
> +        }
> +    }
> +
> +    protected void markWriteSuccessful(final SQLiteDatabase db) {

nit: the Javadoc above is nice.

@@ +183,5 @@
> +    }
> +
> +    protected void markWriteSuccessful(final SQLiteDatabase db) {
> +        if (isInBatch()) {
> +            return;

Any reason not to trace here, too?  Too frequent?

@@ +284,5 @@
>  
>          final SQLiteDatabase db = getWritableDatabase(uri);
>  
> +        debug("bulkInsert: explicitly starting transaction.");
> +        isInBatchOperation.set(Boolean.TRUE);

OK, I can see that this isInBatchOperation assignment bookends the one below.  It's weird that we manually set this flag and call |beginTransaction| ourselves.  Should this be wrapped up in something more transparent?  I suppose this is in TP; somebody's gotta do the actual transaction.
Attachment #8380831 - Flags: review+
(In reply to Nick Alexander :nalexander from comment #40)

> > +        // TODO: use computeSQLInClause
> 
> nit: full sentence.

'sok, the later parts actually do this, so the comment is gone.


> How do we know we're in a transaction, and why do we know this is the first
> operation?  Do the caller's expectations need to be stated in the JavaDoc
> too?

Global analysis, but I'll add a comment!

 
> Which isn't kosher?  What's the bad behaviour?  How will people realize
> they've violated this?  Nice to have this in the commit message, too.

Alas, there's little we can do to programmatically ensure that this failure mode is avoided, but at least there are comments everywhere, and it's hard to ignore all of the transaction-oriented stuff hanging around now. Will augment the commit message.
(In reply to Nick Alexander :nalexander from comment #42)

> Is it worth skipping the big String[] allocation and building inClause below
> inline?  I know you're a big efficiency fan.

Good spot.

> > +     * @return number of rows impacted by the deletion
> 
> Is this different than "number of rows deleted"?  (Perhaps due to db
> constraint propagation?)

It returns the value that the CP's own `deleteInTransaction` method returns, so I'd call that "it depends".

> Is there some semantic that means "start a transaction and set isInBatch",
> versus "start a transaction if I'm not already in batch"?

Introduced {begin,end}Batch, markBatchSuccessful to mirror *Write.
Final try push with review comments:

https://tbpl.mozilla.org/?tree=Try&rev=1f684f108906
Try looks like Boston in mid-March. I'll do some manual smoke testing in the morning, then land this.

If the general approach is supported by crash stats, I'll prepare the smallest possible fix -- likely a "busy? let's end that transaction and try again" patch -- for Aurora and Beta.
Hand-tested Android import, syncing, browsing. Looks good, even under extensive logging.

   https://hg.mozilla.org/integration/fx-team/rev/63059742d136
   https://hg.mozilla.org/integration/fx-team/rev/ba49fd7dcf7e
   https://hg.mozilla.org/integration/fx-team/rev/8e8a2c4025ee
Target Milestone: --- → Firefox 30
There's nothing in crash reports to indicate that this regressed, so it's time to think about uplift.


Arguments in favor of uplifting all three parts:

* It seems to work, whereas a re-implementation on Aurora might not. For example, some of the superficially superfluous changes (removing interleaved reads and writes during expiry) might be a key part of the overall solution -- just making a transaction-related change might not be enough.
* It's not that much code.
* It's less work to uplift a completed patch set than to try something new.
* It removes risk from the next cycle by taking it in 29.


Arguments in favor of trying to distill some workaround for 29:

* This patch set is not a minimal change, so there's some inherent risk.


Does anyone have additional arguments for or against?
I'm in favor of uplift just for the fact that we'll only know if this patch actually has a positive effect with a user base larger than Nightly's.
Comment on attachment 8380831 [details] [diff] [review]
Part 2: transaction handling in deletes and updates, more efficient bulk updates.

Batch approval for all three parts.

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
  Historical SQL usage patterns.
 
User impact if declined: 
  Continued baseline of crashes.

Testing completed (on m-c, etc.):
  Been baking on Nightly for some time. Automated tests pass.

Risk to taking this patch (and alternatives if risky): 
  Moderate but worthwhile. These patches aren't trivial, but they are thorough. At worst they should produce obvious failures, likely no worse than the current concurrency problems. At best this will eliminate a whole class of SQL bugs (meta: Bug 975024).

String or IDL/UUID changes made by this patch:
  None.
Attachment #8380831 - Flags: approval-mozilla-aurora?
Attachment #8380831 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This needs a lot of love for uplifting to Aurora.
Flags: needinfo?(rnewman)
This built and Robocop tests passed on my device. I inlined all of the TransactionalProvider bits into BrowserProvider.

https://hg.mozilla.org/releases/mozilla-aurora/rev/3fcdc52fc85f
Flags: needinfo?(rnewman)
I was just looking at that myself. Probably hg fuzz-factored wrong, or I screwed up a manual conflict resolution:

  03-06 18:55:48.242 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
19:00:20     INFO -  03-06 18:56:18.265 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
19:00:20     INFO -  03-06 18:56:48.289 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
19:00:20     INFO -  03-06 18:57:18.312 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
19:00:20     INFO -  03-06 18:57:48.312 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
19:00:20     INFO -  03-06 18:58:18.367 W/SQLiteDatabase( 2198): database lock has not been available for 30 sec. Current Owner of the lock is 94. Continuing to wait in thread: 93
I re-merged, ran a try build with some retriggers for good measure, and it's green so far:

https://tbpl.mozilla.org/?tree=Try&rev=aa3f376a5afa

Will reland when all of those are done.
(In reply to Richard Newman [:rnewman] from comment #57)
> I re-merged, ran a try build with some retriggers for good measure, and it's
> green so far:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=aa3f376a5afa
> 
> Will reland when all of those are done.

Turns out this try run didn't run anything from the Talos suite (including the trobocheck test that caused the backout.
Backed out in https://hg.mozilla.org/releases/mozilla-aurora/rev/fbbdc60eea00 - roboprovider, yeah, but given how many retries there are already on my few little retriggers of robocop-1, I think the massive retries on Try weren't a coincidence, either.
Thanks, chaps.
Found it. createFavicon was reimplemented in 30; in 29 it still calls insertFavicon, which opens but doesn't close a transaction.

Simple solution.

I think testBrowserProviderPerf is a pretty bad Talos test, though -- it opens two databases, inserting into one and querying the other, as far as I can tell. Nick might have fixed that in Bug 938821.
> I think testBrowserProviderPerf is a pretty bad Talos test, though -- it
> opens two databases, inserting into one and querying the other, as far as I
> can tell. Nick might have fixed that in Bug 938821.

If I did, it was unintentional.
I have a clean try build for this:

https://tbpl.mozilla.org/?tree=Try&rev=b6089b2ea2b3

but distributed favicons don't get loaded correctly for a clean-install Aurora build. Investigating.
Excellent, a derp. Works fine if I... mark the transaction as successful before committing it.

Relanded. If this bounces, I give up.

   https://hg.mozilla.org/releases/mozilla-aurora/rev/72c55d43d54a
   https://hg.mozilla.org/releases/mozilla-aurora/rev/6b2e2ac0d425
You need to log in before you can comment on or make changes to this bug.