Closed Bug 689229 Opened 13 years ago Closed 13 years ago

mozStoragePrivateHelpers functions should not call sqlite3_extended_result_codes, because that function blocks

Categories: Toolkit :: Storage, defect
Severity: normal
Status: RESOLVED INVALID
Reporter: justin.lebar+bug (Unassigned)

+++ This bug was initially created as a clone of Bug 686025 +++

In sqlite.c:

SQLITE_API int sqlite3_extended_result_codes(sqlite3 *db, int onoff){
  sqlite3_mutex_enter(db->mutex);
  db->errMask = onoff ? 0xffffffff : 0xff;
  sqlite3_mutex_leave(db->mutex);
  return SQLITE_OK;
}

mozStoragePrivateHelpers.cpp's prepareStmt seems not to believe that this function can block:

int
prepareStmt(sqlite3* aDatabase,
            const nsCString &aSQL,
            sqlite3_stmt **_stmt)
{
  bool checkedMainThread = false;

  (void)::sqlite3_extended_result_codes(aDatabase, 1);

  int srv;
  while((srv = ::sqlite3_prepare_v2(aDatabase, aSQL.get(), -1, _stmt, NULL)) ==
        SQLITE_LOCKED_SHAREDCACHE) {
    if (!checkedMainThread) {
      checkedMainThread = true;
      if (NS_IsMainThread()) {
        NS_WARNING("We won't allow blocking on the main thread!");
        break;
      }
    }

The main thread is blocked on sqlite3_extended_result_codes in both attachments from bug 686025 (attachment 559589 [details] and attachment 559674 [details]).
I think you are misunderstanding the intent of the code.  sqlite3_prepare_v2 will also acquire the connection mutex.  I read the code as attempting to avoid a busy-wait by transitioning to a mutex-notification scheme rather than looping tightly on a call to sqlite3_prepare_v2.  The specific return value, I believe, is related to another connection (and therefore protected by a different mutex) having acquired the shared cache, thereby making it impossible for our connection to do anything.

This is the same problem as the other bug; if you use the same connection for both sync and async purposes, you can get into a heap of trouble.
Asuth, check my reading of sqlite3.c itself in light of the main thread and thread 20 stacks in bug 686025 attachment 559589 [details] -- it seems to me sqlite3.c (concatenated for my reading convenience) holds a mutex across a single opcode step. But some of the opcodes seem to do quite a bit. And sqlite3_step loops holding the mutex in case of SQLITE_SCHEMA (schema change).

This all smells really bad. There should be no mutex held for long, more than a few dozens of instructions.

Evidently something is going wrong. The hangs are not deadlocks or iloops, though, since reporters say they make (very slow, painful) progress.

Could we have spurious schema changes, which take a long time, finally hitting the retry limit (5)?

That attachment blizzard captured seems like it has the perpetrator dead to rights.

/be
(In reply to Andrew Sutherland (:asuth) from comment #1)
> I think you are misunderstanding the intent of the code.  sqlite3_prepare_v2
> will also acquire the connection mutex as well.  I read the code as
> attempting to avoid a busy-wait by transitioning to a mutex-notification
> scheme rather than looping in a tight loop on a call to sqlite3_prepare_v2. 

I might still be misunderstanding, but I don't think this is what's happening here.  I didn't post the whole contents of the loop above; those two close braces close two if statements.  The loop appears to be a condvar-wait loop; in no case will we busy-wait (thankfully!).

Here's the whole function:

int
prepareStmt(sqlite3* aDatabase,
            const nsCString &aSQL,
            sqlite3_stmt **_stmt)
{
  bool checkedMainThread = false;

  (void)::sqlite3_extended_result_codes(aDatabase, 1);

  int srv;
  while((srv = ::sqlite3_prepare_v2(aDatabase, aSQL.get(), -1, _stmt, NULL)) ==
        SQLITE_LOCKED_SHAREDCACHE) {
    if (!checkedMainThread) {
      checkedMainThread = true;
      if (NS_IsMainThread()) {
        NS_WARNING("We won't allow blocking on the main thread!");
        break;
      }
    }

    srv = WaitForUnlockNotify(aDatabase);
    if (srv != SQLITE_OK)
      break;
  }

  if (srv != SQLITE_OK) {
    // [snip] log an error
  }

  (void)::sqlite3_extended_result_codes(aDatabase, 0);
  // Drop off the extended result bits of the result code.
  return srv & 0xFF;
}

Notice that if we break, we return an error from prepareStmt.  This suggests to me that this function is not supposed to block.

> sqlite3_prepare_v2 will also acquire the connection mutex as well.

Ah, the db->mutex in |sqlite3_mutex_enter(db->mutex);| is a per-connection mutex.  See sqlite3.c:110024.

So now I'm really confused.  In the stack trace, the main thread and the storage thread are both blocked on this mutex.

So the main thread and the storage thread both use the same connection?  Sync versus async issues aside, shouldn't we change this?  Wouldn't that remove all contention for this lock?

> This is the same problem as the other bug; if you use the same connection
> for both sync and async purposes, you can get into a heap of trouble.

Isn't the problem that we're using the same connection on two threads?

> [re: SQLITE_LOCKED_SHAREDCACHE]
> The specific return value, I believe, is related to another connection (and
> therefore protected by a different mutex) having acquired the shared-cache,
> thereby making it impossible for our connection to do anything.

AFAICT, we never call sqlite3_enable_shared_cache(), so I'm not even sure how we could get this error code.
(In reply to Brendan Eich [:brendan] from comment #2)
> Asuth, check my reading of sqlite3.c itself in light of the main thread and
> thread 20 stacks in bug 686025 attachment 559589 [details] -- it seems to me
> sqlite3.c (concatenated for my reading convenience) holds a mutex across a
> single opcode step. But some of the opcodes seem to do quite a bit. And
> sqlite3_step loops holding the mutex in case of SQLITE_SCHEMA (schema
> change).

The mutex can actually be held much longer than that; sqlite3_step only returns when it has a result row to provide to the caller or the program terminates.  A single call to sqlite3_step can end up doing all the I/O work if the program builds a temporary table.

Unfortunately, the stack details suggest there may be some inlining obscuring what is actually happening.  I would presume that the cursor movement may be blocking on some type of I/O, but without statistical profiling info, it's hard to know for sure.

> Could we have spurious schema changes, which take a long time, finally
> hitting the retry limit (5)?

I would doubt this unless Places gets up to schema changes at runtime...
 
> That attachment blizzard captured seems like it has the perpetrator dead to
> rights.

Yes, the immediate problem is definitely the Places code causing the main thread to attempt to take the mutex that the async thread is using.  It would still be good to know what statement is capable of taking that long.
(In reply to Andrew Sutherland (:asuth) from comment #4)
> > Could we have spurious schema changes, which take a long time, finally
> > hitting the retry limit (5)?
> 
> I would doubt this unless Places gets up to schema changes at runtime...

Not just schema changes cause these; some pragmas cause statement reparses too. For example, indexedDB spawns a bunch of connections, and each time one of these is initialized, PRAGMA foreign_keys=1 causes all connections to reparse their statements. When lots of connections are opened in a loop, we end up reparsing statements multiple times (I've counted up to 8 consecutive reparses).
(In reply to Justin Lebar [:jlebar] from comment #3)
> Notice that if we break, we return an error from prepareStmt.  This suggests
> to me that this function is not supposed to block.

prepareStmt will necessarily acquire the connection mutex; there is no way to prepare the statement without doing so.  For async statements we perform the preparation on the async thread to deal with this.  What I meant in my previous comment was that the bug is calling createStatement which calls prepareStmt which will attempt to acquire the lock.
 
> So now I'm really confused.  In the stack trace, the main thread and the
> storage thread are both blocked on this mutex.

Both stacks show the (well, an) async thread successfully executing in sqlite3VdbeExec, so I would characterize it as the main thread is blocked on the mutex and the storage thread wants a copy.
 
> So the main thread and the storage thread both use the same connection? 
> Sync versus async issues aside, shouldn't we change this?  Wouldn't that
> remove all contention for this lock?

Yes, although Places lives a very complicated life, because I believe it still exposes some synchronous APIs.  I believe it tries to keep supporting those APIs while avoiding the problem by using the write-ahead log and separate connections.  I can't speak to that, however, and am not fully up on the nuances of the WAL.

> Isn't the problem that we're using the same connection on two threads?

Yes, though again, the source bug was that the wrong call was used.  That connection could indeed be a pure-async connection, but if you call createStatement instead of createAsyncStatement, you just shot yourself in the foot.

> AFAICT, we never call sqlite3_enable_shared_cache(), so I'm not even sure
> how we could get this error code.

You want to look for the SQLITE_OPEN_SHAREDCACHE open flag.
(In reply to Marco Bonardo [:mak] from comment #5)
> (In reply to Andrew Sutherland (:asuth) from comment #4)
> > > Could we have spurious schema changes, which take a long time, finally
> > > hitting the retry limit (5)?
> > 
> > I would doubt this unless Places gets up to schema changes at runtime...
> 
> Not just schema changes cause these, some pragmas causes statements reparse
> too, for example indexedDB spawn a bunch of connections and each time one of
> this is inited PRAGMA foreign_keys=1 causes all connections to reparse
> statements, when lots of connections are open in a loop we end up reparsing
> statements multiple times (I've accounted up to 8 consecutive reparse).

It appears this is connection-local:
- flagPragma generates an OP_Expire
- OP_Expire calls sqlite3ExpirePreparedStatements
- that dude just expires all prepared statements on the connection

So unless places is issuing pragmas itself and interleaving them with extremely complicated SQL statement invocations, this does not seem like it should be a problem in this case?
(In reply to Andrew Sutherland (:asuth) from comment #7)
> So unless places is issuing pragmas itself and interleaving them with
> extremely complicated SQL statement invocations, this does not seem like it
> should be a problem in this case?

Right, it's not a problem in this case; I just wanted to point out that changing the schema is not the only way to cause expiration of statements. Sorry if that was unclear.
Could a popular add-on be involved? Blizzard?

In any case, we have very hazardous APIs that are being misused. This is a bug to fix. Please file it.

Separately, I'm horrified by the mutex usage in sqlite3.c. It violates long-standing rules we've promulgated via NSPR (going back to NSPR 1; Alan Freier used to admonish people to avoid using mutexes for anything more than very short critical sections that write and read coherent memory, and to use condvars for indefinite or just-too-long waits and synchs). I'd like a bug on this, but would it go upstream to the sqlite folks?

/be
> Separately, I'm horrified by the mutex usage in sqlite3.c.

AFAICT, the mutexes that are acquired all over the place (db->mutex) are, confusingly, per-connection, not per db.  So as I understand the problem isn't so much the mutexes but the fact that they're ever contended -- if we never touched a connection on two different threads, we'd never block on the mutex.
(In reply to Justin Lebar [:jlebar] from comment #10)
> > Separately, I'm horrified by the mutex usage in sqlite3.c.
> 
> AFAICT, the mutexes that are acquired all over the place (db->mutex) are,
> confusingly, per-connection, not per db.

The critical sections are still way overlarge.

> So as I understand the problem
> isn't so much the mutexes but the fact that they're ever contended -- if we
> never touched a connection on two different threads, we'd never block on the
> mutex.

This is the API hazard I mentioned in comment 9, or one of them. We really ought not expose blocking APIs to the main thread. Second, we really ought not expose connections to more than one thread.

/be
(In reply to Brendan Eich [:brendan] from comment #9)
> In any case, we have very hazardous APIs that are being misused. This is a
> bug to fix. Please file it.

I'm not sure there's anyone in charge of this or with the time/resources to actually do anything about the hazardous APIs.  As far as I understand, sdwilsh was the platform engineer in charge of storage, and his new day job no longer allows him to work full-time on the mozilla platform.  mak seems to work on Places?  (And I no longer work on Thunderbird/gloda, which was how I got involved.)
 
> Separately, I'm horrified by the mutex usage in sqlite3.c. It violates
> long-standing rules we've promulgated via NSPR (going back to NSPR 1; Alan
> Freier used to admonish people to avoid using mutexes for more than very
> short critical sections writing and reading coherent memory, and to use
> condvars for indefinite or just too-long waits and synchs). I'd like a bug
> on this, but would it go upstream to sqlite folks?

I think at this point, all of our problems are our own and not SQLite's.  With the write-ahead-log and proper use of multiple connections, most (all?) contention is avoidable.  You need some kind of protection against illegal usage of an API to maintain consistency, and they use a mutex to perform that protection.  Our calls could certainly use sqlite3_mutex_try to avoid getting into a blocking situation, but our semantics don't support that unless we want to play a game of nested event loop russian roulette.

In case it's not clear I 100% wish we only had a pure-async API that had no foot-guns.  Although a sync API that is main-thread only is okay, and likely required if we ever implement that bit of the IndexedDB spec with IndexedDB still backed by SQLite/mozStorage.
> In case it's not clear I 100% wish we only had a pure-async API that had no
> foot-guns.  Although a sync API that is main-thread only is okay, and likely

or rather, that is NON-main-thread-only, and is used only on that one thread.
(In reply to Andrew Sutherland (:asuth) from comment #12)
> In case it's not clear I 100% wish we only had a pure-async API that had no
> foot-guns.  Although a sync API that is main-thread only is okay, and likely
> required if we ever implement that bit of the IndexedDB spec with IndexedDB
> still backed by SQLite/mozStorage.

We'll almost certainly be backing IndexedDB with something other than SQLite by the time we get around to implementing the sync versions of the API.
Just to be clear, the sync bits of IndexedDB are specified to exist only on workers, never on the main thread.
(In reply to Brendan Eich [:brendan] from comment #2)
> Asuth, check my reading of sqlite3.c itself in light of the main thread and
> thread 20 stacks in bug 686025 attachment 559589 [details] -- it seems to me
> sqlite3.c (concatenated for my reading convenience) holds a mutex across a
> single opcode step. But some of the opcodes seem to do quite a bit. And
> sqlite3_step loops holding the mutex in case of SQLITE_SCHEMA (schema
> change).
> 
> This all smells really bad. There should be no mutex held for long, more
> than a few dozens of instructions.
SQLite's threadsafety guarantee is simple: All access to the connection is serialized.
This has always been the case, we've just used it horribly.  mak and I have spent lots of time trying to fix it, but there are lots of database consumers, and Places in general is/has been understaffed to fix longstanding poor API choices we inherited when Places was "gifted" to us.

(In reply to Justin Lebar [:jlebar] from comment #3)
> So the main thread and the storage thread both use the same connection? 
> Sync versus async issues aside, shouldn't we change this?  Wouldn't that
> remove all contention for this lock?
Yes, we should!  Work has been underway to do this for years, but sadly we can't rewrite all of Places at a moment's notice.  We have to slowly change things to be asynchronous, which then leaves us in this weird state (and because we still do lots of main-thread disk I/O, we have temporary tables to make that pain less severe, but that means we need to use the same connection to access the temporary table on both threads).

> AFAICT, we never call sqlite3_enable_shared_cache(), so I'm not even sure
> how we could get this error code.
http://hg.mozilla.org/mozilla-central/annotate/7f4867717226/storage/src/mozStorageService.cpp#l481, although Places shouldn't be opening a connection with a shared cache, because then SQLite serializes all access to the cache for all connections to the same database.  `mozIStorageService::openUnsharedDatabase` is your friend here.

(In reply to Brendan Eich [:brendan] from comment #11)
> This is the API hazard I mentioned in comment 9, or one of them. We really
> ought not expose blocking APIs to the main thread. Second, we really ought
> not expose connections to more than one thread.
Had we been smarter when we introduced the Storage API, we could have avoided these problems.  Alas, hindsight is 20/20.

(In reply to Andrew Sutherland (:asuth) from comment #12)
> I think at this point, all of our problems are our own and not SQLite's. 
> With the write-ahead-log and proper use of multiple connections (properly
> used), most (all?) contention is avoidable.  You need some kind of
> protection against illegal usage of an API to maintain consistency, and they
> use a mutex to perform that protection.  Our calls could certainly use
> sqlite3_mutex_try to avoid getting into a blocking situation, but our
> semantics don't support that unless we want to play a game of nested event
> loop russian roulette.
BINGO
> > So the main thread and the storage thread both use the same connection? 
> > Sync versus async issues aside, shouldn't we change this?  Wouldn't that
> > remove all contention for this lock?
> Yes, we should!  Work has been underway to do this for years, but sadly we can't rewrite all of 
> Places in a moments notice.  We have to slowly change things to be asynchronous

What I still don't understand is, why do we have to change things to be asynchronous in order to not use the same sqlite connection on two threads?

Does code care which sqlite connection it uses?  That is, do we rely on the connection's serialization guarantees?  (It sounds like we don't, since async doesn't have those guarantees.)  If not, why can't we add a level of indirection around the connections?  When you make a database query on the main thread using MozDBConnection C, you use sqlite connection X, but when you query the db off the main thread using the same MozDBConnection, you use sqlite connection Y.
Ah, I missed the parenthetical about temporary tables.

Is that all that's keeping us from using the scheme in comment 17?  Is there no way to share an in-memory table across connections?
(In reply to Justin Lebar [:jlebar] from comment #18)
> Is that all that's keeping us from using the scheme in comment 17?  Is there
> no way to share an in-memory table across connections?
Pretty much, yeah.  No, because then you'd have to lock around access to those (and again, SQLite doesn't do table-level locking [mostly, but it's complicated and we don't want to go down that codepath]).

Note, however, that writers still block writers, even in the WAL world, so we can still get hit with the async connection doing some writing, and then the main thread comes along and tries to do some as well.  The only way to solve this forever is to make everything use the async API.
(In reply to Shawn Wilsher :sdwilsh from comment #19)
> Note, however, that writers still block writers, even in the WAL world

True, the main advantage is that writing to the WAL is much cheaper than writing to the database; in NORMAL mode (the Storage default) there is no fsync involved. So in the WAL world, writers block writers for a shorter time.
Actually, in Places the only remaining TEMP problem nowadays is 2 temp triggers that ideally could be non-temp; they are temp for downgrade compatibility (if we made them physical and the user downgraded to a previous version that has the temp ones, they would end up with duplicate triggers). I have been planning for some time to make the Places downgrade path disruptive, i.e. start using places2.sqlite, places3.sqlite, and on downgrade restore a partial history backup. At that point we'd be able to use a new schema without TEMP requirements and easily force all sync APIs to use a separate connection from the async APIs. This should be faster than rewriting all the sync APIs (which would still be the long-term goal).
Actually, we may already be able to do that, with the exception that any query writing to moz_historyvisits should live on the connection with the triggers.
I'll file a bug to try to do this, since I was already planning to improve concurrency in a similar way.
Thanks for explaining, Shawn, and thanks for filing that bug, Mak.  I'm going to close this bug, since it's not valid.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID