1371945 - Possible statement finalization crash on shutdown

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Description

•

7 years ago

While looking at some fancy crashes for a broken Try, where I forgot to finalize a statement, I noticed that we may be doing something unsafe on shutdown.
You can see some of the stacks here:
https://treeherder.mozilla.org/logviewer.html#?job_id=105904418&repo=try&lineNumber=30342
https://treeherder.mozilla.org/logviewer.html#?job_id=105904381&repo=try&lineNumber=13479
https://treeherder.mozilla.org/logviewer.html#?job_id=105904452&repo=try&lineNumber=23678

If we invoke sqlite3_close and it fails with SQLITE_BUSY, there are likely unfinalized statements.
At this point we use sqlite3_next_stmt() to find them and sqlite3_finalize() them.

The problem is that from this point on the stmt pointer is invalid.
Since all of our statements are created through the API, there is somewhere an object with a dangling mDBStatement pointer, that object will sooner or later be destroyed and its destructor may try to use that pointer.

Statement::internalFinalize checks mDBConnection->isClosed before trying to finalize, that in the end reads (with sharedAsyncExecutionMutex) mConnectionClosed.
internalClose sets mConnectionClosed to true (with the same mutex).

Is is possible that Statement::internalFinalize reads a false value, then the control passes to Connection::internalClose that sets it to true and finalizes the statement, then Statement::internalFinalize continues and tries to finalize it again (crashing at this point). Maybe we should keep sharedAsyncExecutionMutex longer in one of the 2 points to be sure the finalizations are mutually exclusive?

Andrew, do you think this is plausible or am I reading something wrong? Any further insight?
Note, here I'm assuming that it should be fine to invoke sqlite_finalize on the async thread, if it's not, then we are in bigger troubles.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Comment 1

•

7 years ago

Yeah, this regressed in bug 914070 and I missed it in review.  The internalFinalize condition changed like so there:

-  if (!mDBConnection->isClosing(true)) {
+  if (!mDBConnection->isClosed()) {

The pre-patch isClosing(true) was equivalent to post-patch (isClosing() || isClosed()) and the correctness of the code below that depended on the isClosing() check to be there.  According to bug 914070 comment 6 you did this intentionally so that (sync) statements could finalize themselves up to the close actually happening.  (Which makes sense given that Places likes to use the sync API on the async thread, and I now grok that properly.)

Ideally, we'd have something like isConnectionReadyOnThisThread() that captures the nuance that the Sync API can be used on the owning thread or the async thread but they have different life-cycles.  That is, I think it's reasonable to:
* Enforce that the sync API is no longer legal to use on the owning thread once Close() or AsyncClose() has been invoked.
* Allow use of the sync API on the async thread up until internalClose() is invoked.

isConnectionReadyOnThisThread() could look something like the following and would make the internalFinalize logic correct again:
```
MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
// NB: we would no longer forget() this in AsyncClose() but
// instead keep it around until Connection::shutdownAsyncThread.
// This would mean bug 1166166's isAsyncExecutionThreadAvailable
// would need to change too.
if (mAsyncExecutionThread &&
    mAsyncExecutionThread->isOnCurrentThread()) {
  // inlined isClosed() without redundant lock-taking.
  return !mConnectionClosed;
}
return connectionReady();
```

Depending on whether Places is going to need to do more sync API things on the async thread that will potentially run after AsyncClose() has been called on the main thread, then it could make sense to introduce something like DBConnForThisThread() too.  isConnectionReadyOnInThisThread() or its callers could be re-written in terms of that.  If we get concerned about the overhead for consumers like QuotaManager clients that will never have an async thread, this could be "cleaned up" by templating that bit with traits.

Just holding the lock over the duration of the isClosed() and sqlite3_finalize call also works if the comments are updated to capture what's actually happening now.  My natural preference is to favor something more like isConnectionReadyOnThisThread() because it moves us closer towards a Communicating Sequential Processes implementation, but I'm not sure it's worth the churn.

Flags: needinfo?(bugmail)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 2

•

7 years ago

(In reply to Andrew Sutherland [:asuth] from comment #1)
> According to bug 914070 comment 6 you did
> this intentionally so that (sync) statements could finalize themselves up to
> the close actually happening.  (Which makes sense given that Places likes to
> use the sync API on the async thread, and I now grok that properly.)

Places uses the sync API on the async thread, but the kind of work we may do on shutdown is not critical, so it's fine if statement creation fails and we bail out. And I think that's what happens already, since we set Places mClosed = true before asyncClose, and that stops creating any kind of statements.
Moreover asyncClose invokes setClosedState immediately, and that, by setting mDBConn = nullptr disallows the creation of new statements, regardless.

The problem is we keep statements cached using a mozilla::storage::StatementCache created on the async thread, and that StatementCache needs to be finalized on the thread where it was created, afaict. What happened at that point was that:
1. mAsyncExecutionThreadShuttingDown was set to true immediately by asyncClose
2. the async thread runnable to finalize the statements in the StatementCache ran
3. internalFinalize was checking isClosing(true), that returned true, thus it thought it was too late to finalize statements
4. we were entering internalClose with unfinalized statements

Thus, I allowed finalizing statements up to the point where the connection is really closing, to allow a StatementCache to be used on the async thread. I missed this thread-safety issue though.

So, it's not a problem with the sync API being available after asyncClose has been invoked (We don't care about that), the problem is with allowing statements to be finalized after asyncClose has been invoked.

isConnectionReadyOnThisThread() sounds like a nice solution to have the sync API usable up to the real close, but that's not what we need, we just need internalFinalize to be "usable" up to that point. Surely we could use isConnectionReadyOnThisThread() in internalFinalize, but the current status tracking for closing/closed sounds enough bookkeeping for that single thing.
Thus, I was more prone towards enlarging the mutex.

The problem is that enlarging the mutex in internalFinalize means isClosed cannot autolock it, thus I'd need to wrap all the isClosed calls around :(
Could we instead autoLock the whole ::sqlite3_next_stmt loop?

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Comment 3

•

7 years ago

That's good news about places!  Although, to be clear, I wasn't thinking that Places was doing crazy complex stuff at shutdown.  More just that Places could have interesting tasks that were queued on the async thread before AsyncClose() was issued and that one would assume would run to completion like the connection hadn't been closed.  At least unless they were explicitly canceled.  It sounds like the status quo is good enough, though.


So we're on the same page, AIUI, the race you proposed in comment 0 can only happen where we have a Statement being finalized on the main thread.  This is because for the async thread to even exist, we required that internalClose() is being executed on the async thread because Close() will just error out on the main thread (for now).  Given your previous comments, it seems like this would be most likely to happen because of a Statement's destructor being invoked.


In terms of isClosed(), it's also possible to add an overloaded isClosed(MutexAutoLock&) where the MutexAutoLock reference is a "proof of lock".

I think I'd suggest:
- Add isClosed(MutexAutoLock&)
- Use it in Statement::internalFinalize, with the sharedAsyncExecutionMutex held for the isClosed(locked) call and the sqlite3_finalize call.  Expanding the autolock in Connection::internalClose() so that mConnectionClosed=true happens atomically with all the finalizations doesn't address the underlying race you're talking about.  internalFinalize's isClosed() check could still return false on the main thread, we'd then drop the lock and inevitably head towards the risky sqlite3_finalize.  We must continue to hold a lock after checking isClosed() or all is lost, so we need to pass it in.
- Add a `SQLiteMutexAutoLock lockedScope(sharedDBMutex);` to the (srv == SQLITE_BUSY) block in Connection::internalClose().  That absolutely seems like it should be a single atomic manipulation and that no other code should be able to do anything with our SQLite connection while we're doing that.
- That raises the issue that once we invoke sqlite3_close() our sharedDBMutex is also a dangling pointer.  We generation check mDBConn, but it does seem like maybe we should be nulling its mMutex out again.  Also, we should change SQLiteMutex's NS_ASSERTIONs to MOZ_ASSERTs I think.

Flags: needinfo?(bugmail)

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/148534/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/148534/

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Assignee: nobody → mak77

Status: NEW → ASSIGNED

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Blocks: 1371677

Andrew Sutherland [:asuth] (he/him)

Comment 5

•

7 years ago

mozreview-review

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

https://reviewboard.mozilla.org/r/148534/#review152948

Great cleanup!  Just a few comment-updating nits.

::: storage/mozStorageStatement.cpp:355
(Diff revision 1)
>  <span class="indent">>></span>    //
>  <span class="indent">>></span>    // The connection is still open. While statement finalization and
>  <span class="indent">>></span>    // closing may, in some cases, take place in two distinct threads,
>  <span class="indent">>></span>    // we have a guarantee that the connection will remain open until
>  <span class="indent">>></span>    // this method terminates:
>  <span class="indent">>></span>    //
>  <span class="indent">>></span>    // a. The connection will be closed synchronously. In this case,
>  <span class="indent">>></span>    // there is no race condition, as everything takes place on the
>  <span class="indent">>></span>    // same thread.
>  <span class="indent">>></span>    //
>  <span class="indent">>></span>    // b. The connection is closed asynchronously and this code is
>  <span class="indent">>></span>    // executed on the opener thread. In this case, asyncClose() has
>  <span class="indent">>></span>    // not been called yet and will not be called before we return
>  <span class="indent">>></span>    // from this function.
>  <span class="indent">>></span>    //
>  <span class="indent">>></span>    // c. The connection is closed asynchronously and this code is
>  <span class="indent">>></span>    // executed on the async execution thread. In this case,
>  <span class="indent">>></span>    // AsyncCloseConnection::Run() has not been called yet and will
>  <span class="indent">>></span>    // not be called before we return from this function.
>  <span class="indent">>></span>    //
>  <span class="indent">>></span>    // In either case, the connection is still valid, hence closing
>  <span class="indent">>></span>    // here is safe.
>  <span class="indent">>></span>    //

I suggest just removing this comment block that's from the isClosing(true) era.  The intro paragraph's first sentence is ambiguous and case 'b' does not hold any longer.  Your new comment covers the important thing, which is that internalClose() won't be allowed to progress to its close logic.

::: storage/mozStorageStatement.cpp:385
(Diff revision 1)
>  <span class="indent">>></span>    srv = ::sqlite3_finalize(mDBStatement);
>  <span class="indent">>></span>  }
>  #ifdef DEBUG
> -  else {
> +    else {
> -    //
> +      //
> -    // The database connection is either closed or closing. The sqlite
> +      // The database connection is either closed or closing. The sqlite

This coudl be updated too: s/closed or closing/closed/.

Attachment #8877163 - Flags: review?(bugmail) → review+

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 6

•

7 years ago

I'm hitting some interesting failures in cookies:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e981636f4b14747b009e7e97d6eb21d146e757e

I wonder if there's a relation with the problem with asyncClose they have in bug 1158387.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 7

•

7 years ago

In particular, some statements in moz_cookies and moz_perms appear to not be finalized.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 8

•

7 years ago

And, we still end up crashing on a Sqlite assertion.
/home/worker/workspace/build/src/db/sqlite3/src/sqlite3.c:23350: pthreadMutexFree: Assertion `p->nRef==0' failed.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 9

•

7 years ago

Attached file refcnt_log.txt — Details

The cookies and permissions problem is due to the fact to finalize their cached statements they just nullify their owning pointers. Looks like something else has an owning pointer on those statements, and at the time internalClose runs, that pointer is still alive, so statements are not finalized and internalClose complains.
I tried dumping the refcnt log, and indeed I see that after the cookie service nullifies the statements in CleanupCachedStatements (just before invoking asyncClose), there is still one ref in AsyncExecuteStatements::mStatements, that is cleared in AsyncExecuteStatements::notifyComplete.

I'd actually expect the finalization to happen before internalClose(), but at this point I'm not sure, destructorAsyncFinalize() is a bit special.

It's late now and I don't have enough concentration to figure out the whole story, so let's see tomorrow.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 10

•

7 years ago

So what happens is that asyncClose forget()s mAsyncExecutionThread to AsyncCloseConnection, then AsyncExecuteStatements removes the last strong ref and the async statement destructor runs, but it can't getAsyncExecutionTarget, thus it just sets mAsyncStatement = nullptr and let the connection do the finalization (but the connection ASSERTs at that point since not all the statements have been finalized). Clearly we could ->Finalize() the statements to workaround the problem, but it would be nicer if async execution could do that by itself :(

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 11

•

7 years ago

Comment 1 may be right, that we should stop forgetting mAsyncExecutionThread until shutdownAsyncThread. My only fear is that we may have code that bases its decisions on the existance of mAsyncExecutionThread like we do with mDBConn.

Another possibility could be, in destructorAsyncFinalize, to assume that if we are not on the owner thread we must be on the async thread, and avoid the getAsyncExecutionTarget call if so. On the other side, this would assume the consumers are good to us and won't take references on other threads, that's not a given.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 12

•

7 years ago

Even not forgetting mAsyncExecutionThread, we still set mAsyncExecutionThreadShuttingDown and getAsyncExecutionTarget doesn't return the thread if we are shutting down. I'll try to also change that so we won't create it, but return it if present, and see what happens.

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/3-4/

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 16

•

7 years ago

Testing on Try, will re-ask for review when ready, in any case you can use the interdiff 1-latest to see the new changes, in case you have comments/suggestions before Try tells us if we are doomed.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Blocks: 1355561

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/7-8/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/8-9/

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/9-10/

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 23

•

7 years ago

Things would look good... if not for this intermittent Statement::internalFinalize crash in Jetpack, for which so far I have no idea:
https://treeherder.mozilla.org/logviewer.html#?job_id=107028059&repo=try&lineNumber=3566

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 24

•

7 years ago

If the line reference is correct, the crash is at sqlite3_mutex_enter usage inside sqlite3_finalize.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 25

•

7 years ago

This is in a debug build instead:
https://treeherder.mozilla.org/logviewer.html#?job_id=107041113&repo=try&lineNumber=3952

In both stacks, the async thread is executing a statement, and is waiting on isClosed().

I can't see a way where this may be caused by internalClose, since the enlarged mutex now would make us bailout.
I'm trying to figure out if there's any case where we could enter internalFinalize multiple times, but on the same thread we'd be serialized and bailout on the initial null check. Can both threads enter internalFinalize? If so (how?) maybe we should enlarge the async mutex to also protect the checks on mDBStatement?
Andrew, after lots of brainstorming, I'm a bit lost now. Any idea is welcome.

Flags: needinfo?(bugmail)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 26

•

7 years ago

Also, apart from the internalFinalize problem, the patch is ready for review of the 1-latest interdiff.

Andrew Sutherland [:asuth] (he/him)

Comment 27

•

7 years ago

Ugh, so we have a locking discipline problem going on.

In your crash, in thread 26, AsyncExecuteStatements::executeStatement is holding the SQL lock while calling into Connection::stepStatement. stepStatement calls isClosed which attempts to acquire the async lock. So our ordering is [SQL, async].

That is running up against thread 0 where Statement::internalFinalize acquires the async mutex, then invokes sqlite3_finalize which de facto attempts to acquire the SQL lock. So the ordering is [async, SQL] and that's how we get deadlock. And it's also what I explicitly suggested :(.

I think, like you proposed in comment 11 for destructorAsyncFinalize, and :dthayer raised and I responded to in bug 1166166 comment 25, the method needs to be explicitly thread-aware. There's a straightforward rationale for each thread that does not require taking the async mutex.

I suggest we:
- Maybe let bug 1166166 and the thread-checking finalizing land first? Or just have you take over that patch? There's a little too much overlap happening right now. That would take care of destructorAsyncFinalize initially.
- Perhaps extract out the core check that destructorAsyncFinalize grows to give us a lock-free isConnectionReadyOnThisThread() after all, since I think that's again what the check amounts to and it seems better to centralize the logic/comment/rationale on the Connection rather than repeatedly re-create it.
- It might make sense to build on your changes here to have mAsyncExecutionThread live longer and to use that for the check instead of threadOpenedOn. (Which also does invert what I told :dthayer in bug 1166166...) There's definitely only ever one async thread, so checking if it exists and if we're on that thread should avoid false positives. And with your changes where we only zero mAsyncExecutionThread when the thread is shutdown, it's a safe check.
- Have internalFinalize use that check.

Andrew Sutherland [:asuth] (he/him)

Updated

•

7 years ago

Flags: needinfo?(bugmail)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Depends on: 1166166

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

No longer blocks: 1371677

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 28

•

7 years ago

(In reply to Andrew Sutherland [:asuth] from comment #27)
> - Maybe let bug 1166166 and the thread-checking finalizing land first?

Absolutely yes, I clarified the dependencies.
Afaict that bug is ready to land?

> - Perhaps extract out the core check that destructorAsyncFinalize grows to
> give us a lock-free isConnectionReadyOnThisThread()

Sure, I can look into that.

>   - It might make sense to build on your changes here to have
> mAsyncExecutionThread live longer and to use that for the check instead of
> threadOpenedOn.

Yep, the changes to nullify mAsyncExecutionThread in ShutdownAsyncThread are already in this patch.

> - Have internalFinalize use that check.

SGTM, I could probably rebuild the tree of patches locally and see what try thinks about it with some tens retriggers.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 29

•

7 years ago

(In reply to Andrew Sutherland [:asuth] from comment #27)
>   - It might make sense to build on your changes here to have
> mAsyncExecutionThread live longer and to use that for the check instead of
> threadOpenedOn.  (Which also does invert what I told :dthayer in bug
> 1166166...)  There's definitely only ever one async thread, so checking if
> it exists and if we're on that thread should avoid false positives.

This is unclear, if now mAsyncExecutionThread should only be accessed from the opener thread, per the patch in bug 1166166, can we use it for a check in a method like isConnectionReadyOnThisThread that could be invoked from both threads?

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 30

•

7 years ago

Also, if we are on the async thread, doesn't that already mean the connection is still on?

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 31

•

7 years ago

finally, if we stop acquiring the shared execution mutex in internalFinalize, how do we prevent it from clashing with internalClose, so that only one of the 2 tries to finalize the same statement? We need some kind of synchronization between these two.
Are you maybe suggesting to use a mutex free isConnectionReadyOnThisThread in stepStatement instead?

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 32

•

7 years ago

I assume the comment in mAsyncExecutionThread should read "modified" and not "accessed" in bug 1166166.
What I did so far, is trying to replace as much isClosed() as possible with isConnectionReadyOnThisThread(), where the code doesn't seem to need specific synchronization. stepStatement, prepareStatement and a bunch of MOZ_ASSERTs should do with it.

I also removed isClosing() and made mConnectionClosed better mirror the connection status (for example the default value is true, changes to false in initializeInternal in case of success, and is set to true in internalClose). Clearly this requires an additional lock in initializeInternal.
This should reduce the number of states we can have, and maybe make things easier to follow.

I left internalClose and internalFinalize synchronization through the asyncExecutionMutex, I'm not sure how we'd proceed there, but now we should not deadlock anymore with stepStatement.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 33

•

7 years ago

For reference current changes are visible at
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e909d71e7dd41122a4f7ffcae8e8b5b17b3544e
results may be positive or negative, I only ran the Storage tests locally.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 34

•

7 years ago

Had to reintroduce isClosing... after all we can wait for connections that inited shutdown, not for the ones that weren't asked to close.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=780a3397f88b5ee45b40060a812a7ca6b42520e9

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/10-11/

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 36

•

7 years ago

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

The latest patch is rebased on top of bug 1166166

This is the corresponding Try.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=224680347698b47a09cd29a9b5ca0fdd7ec320fa

I'm not sure if I got all of your suggestions, I'm happy to make and test changes.

Attachment #8877163 - Flags: review+ → review?(bugmail)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 37

•

7 years ago

Note: I tried to add a couple asserts, but that didn't work for 2 reasons:
1. I tried to make the destructor assert if Close() or asyncClose() were not invoked before. In general it's better if the consumer takes care of that. Unfortunately looks like we have quite some connections relying on the destructor.
2. I tried to make isConnectionReadyOnThisThread work only on the one of the 2 threads (opener or helper), but looks like we have code that used the connection out of those 2 threads. For example the memory reporters seem to invoke our API on the main-thread even if the connection was created elsewhere :(

Andrew Sutherland [:asuth] (he/him)

Comment 38

•

7 years ago

mozreview-review

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

https://reviewboard.mozilla.org/r/148534/#review153552

A few nits/minutae, but I think we're there!  Thanks for taking on this non-trivial and churn-heavy cleanup and refactoring.  Between this and bug 1166166 I think we're making great progress forward in terms of our correctness and ability to reason about the multi-threaded logic thanks to improved primitives and their documentation.

::: storage/mozStorageConnection.h:145
(Diff revision 11)
>    };
>  
>    /**
>     * Lazily creates and returns a background execution thread.  In the future,
>     * the thread may be re-claimed if left idle, so you should call this
>     * method just before you dispatch and not save the reference.

I think we should also add the text: "This method will return null once AsyncClose() has been called."

I also have an action item below to make this true.

This is consistent with our existing semantics prior to our life extension of mAsyncExecutionThread.

::: storage/mozStorageConnection.h:232
(Diff revision 11)
>    nsresult rollbackTransactionInternal(sqlite3 *aNativeConnection);
>  
>    bool connectionReady();
>  
>    /**
> -   * True if this connection is shutting down but not yet closed.
> +   * Similar to connectionReady, but usable from both threads.

I think we should expand the info here, I suggest updating this to something like the following (feel free to use verbatim, as is always the case with my proposed comments):

Thread-aware version of connectionReady, results per caller's thread are:
- owner thread: Same as connectionReady().  True means we have a valid, un-closed database connection  and it's not going away until you invoke Close() or AsyncClose().
- async thread: Returns true at all times because you can't schedule runnables against the async thread after AsyncClose() has been called.  Therefore, the connection is still around if your code is running.
- any other thread: Race-prone Lies!  If you are main-thread code in mozStorageService iterating over the list of connections, you need to acquire the sharedAsyncExecutionMutex for the connection, invoke connectionReady() while holding it, and then continue to hold it while you do whatever you need to do.  This is because of off-main-thread consumers like dom/cache and IndexedDB and other QuotaManager clients.

::: storage/mozStorageConnection.cpp:504
(Diff revision 11)
>  : sharedAsyncExecutionMutex("Connection::sharedAsyncExecutionMutex")
>  , sharedDBMutex("Connection::sharedDBMutex")
>  , threadOpenedOn(do_GetCurrentThread())
>  , mDBConn(nullptr)
>  , mAsyncExecutionThreadShuttingDown(false)
> -#ifdef DEBUG
> +, mConnectionClosed(true)

I'm almost positive we want this to be false here, unless this is part of an aggressive deprecation strategy ;)

This suggests we probably want something like `MOZ_ASSERT_IF(mDBConn, !mConnectionClosed);` in isConnectionReadyOnThisThread() or elsewhere to ensure that the relationship between these state-tracking variables holds.

::: storage/mozStorageConnection.cpp:585
(Diff revision 11)
>  
> -#ifdef DEBUG
> -  mAsyncExecutionThreadIsAlive = true;
> -#endif
> -
>    return mAsyncExecutionThread;

We need to return nullptr here if mAsyncExecutionThreadShuttingDown per our contract.  I might suggest having the high-level control flow be:

* ENSURE_TRUE
* if (shuttingDown) return nullptr;
* if (!thread) { make thread }
* return thread

That flow avoids the more complex conditional.

::: storage/mozStorageConnection.cpp:687
(Diff revision 11)
>  nsresult
>  Connection::initializeInternal()
>  {
>    MOZ_ASSERT(mDBConn);
>  
> +  bool initSucceeded = false;

nit: The ScopeExit idiom would be to lose initSucceeded and just call guard.release(); instead.  That way you can assume initSucceeded==false if this logic runs.

::: storage/mozStorageConnection.cpp:742
(Diff revision 11)
>    nsAutoCString pageSizeQuery(MOZ_STORAGE_UNIQUIFY_QUERY_STR
>                                "PRAGMA page_size = ");
>    pageSizeQuery.AppendInt(pageSize);
> -  nsresult rv = ExecuteSimpleSQL(pageSizeQuery);
> -  NS_ENSURE_SUCCESS(rv, rv);
> +  int srv = executeSql(mDBConn, pageSizeQuery.get());
> +  if (srv != SQLITE_OK) {
> +    MOZ_ALWAYS_TRUE(::sqlite3_close(mDBConn) == SQLITE_OK);

nit: It seems like we should be able to move the close inside the ScopeExit lambda.  The stack will be slightly weirder than it is this way, but that gets our boilerplate down to the minimum and it's very common for RAII classes to assert at destruction time.  (For example, ErrorResult.)

::: storage/mozStorageConnection.cpp:942
(Diff revision 11)
>  Connection::isClosing()
>  {
> -  bool shuttingDown = false;
> -  {
> -    MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
> +  MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
> -    shuttingDown = mAsyncExecutionThreadShuttingDown;
> +  return mAsyncExecutionThreadShuttingDown && !mConnectionClosed;

I like this cleanup!  The previous separate isClosed() call with its own lock was sufficiently correct but hurt my brain.

::: storage/mozStorageConnection.cpp:962
(Diff revision 11)
> +
> +bool
>  Connection::isAsyncExecutionThreadAvailable()
>  {
> -  MOZ_ASSERT(NS_IsMainThread());
> +  MOZ_ASSERT(threadOpenedOn == NS_GetCurrentThread());
>    return !!mAsyncExecutionThread;

This may be a fixup you're going to do when but 1371945 lands, but to be clear, this needs to be changed to:
`return mAsyncExecutionThread && !mAsyncExecutionThreadShuttingDown;` in your patch since our contract is that you can safely Dispatch() to the returned thread, which does not hold true once mAsyncExecutionThreadShuttingDown becomes true.

(Noting that I did previously explicitly ask for what's here right now on that bug, having requested a change from `mAsyncExecutionThreadIsAlive && ! mAsyncExecutionThreadShuttingDown` but several things have changed since then.  I mention this to minimize my own confusion, and hopefully yours too!)

::: storage/mozStorageConnection.cpp:971
(Diff revision 11)
> -Connection::shutdownAsyncThread(nsIThread *aThread) {
> -  MOZ_ASSERT(!mAsyncExecutionThread);
> -  MOZ_ASSERT(mAsyncExecutionThreadIsAlive);
> +Connection::shutdownAsyncThread() {
> +  MOZ_ASSERT(threadOpenedOn == NS_GetCurrentThread());
> +  MOZ_ASSERT(mAsyncExecutionThread);
>    MOZ_ASSERT(mAsyncExecutionThreadShuttingDown);
>  
> -  DebugOnly<nsresult> rv = aThread->Shutdown();
> +  MOZ_ALWAYS_SUCCEEDS(mAsyncExecutionThread->Shutdown());

FYI I just learned: nsIThread has an asyncShutdown() method that prevents further dispatch to the thread but doesn't spin the event loop waiting for shutdown.  It would be desirable to use because it provides the additional guarantee that nothing else can get dispatched to the thread after we dispatch AsyncCloseConnection.  However, we don't actually need it as long as getAsyncExecutionTarget() returns null after AsyncClose() completes (and as long as Places is well-behaved and doesn't save the returned nsIEventTarget off).

We can't quite use it because of our new initializeOnAsyncThread() method.  Both Shutdown() and AsyncShutdown() can't be called on the async thread itself, which means initializeOnAsyncThread can't invoke it.  And here in shutdownAsyncThread(), calling Shutdown() on an already-AsyncShutdown() thread results in NS_ERROR_UNEXPECTED.  I suppose it would work if we relaxed MOZ_ALWAYS_SUCCEEDS or conditionalized it, but the latter likely requires an additional state variable.

So, in summary: fyi asyncShutdown() is neat but it seems like more complexity than it's worth for us to use it.

::: storage/mozStorageConnection.cpp:974
(Diff revision 11)
>    MOZ_ASSERT(mAsyncExecutionThreadShuttingDown);
>  
> -  DebugOnly<nsresult> rv = aThread->Shutdown();
> -  MOZ_ASSERT(NS_SUCCEEDED(rv));
> -#ifdef DEBUG
> -  mAsyncExecutionThreadIsAlive = false;
> +  MOZ_ALWAYS_SUCCEEDS(mAsyncExecutionThread->Shutdown());
> +
> +  {
> +    MutexAutoLock lockedScope(sharedAsyncExecutionMutex);

It's not clear to me why you're acquiring this lock.  Given that Shutdown() completed, the thread is super dead.  Please add a comment if you're keeping it.

::: storage/mozStorageService.cpp:132
(Diff revision 11)
>  
>      for (uint32_t i = 0; i < connections.Length(); i++) {
>        RefPtr<Connection> &conn = connections[i];
>  
>        // Someone may have closed the Connection, in which case we skip it.
> -      bool isReady;
> +      if (conn->isClosed()) {

Aha, so it turns out our current implementation is unsound because of the off-main-thread connection owners like dom/cache and IndexedDB.

Particularly relevant to any check we perform here is that ReportConn() invokes Connection::getSqliteRuntimeStatus() which does `sqlite3_db_status(mDBConn,...)` so we need to ensure that we have an mDBConn here.  And for the safety we currently lack, we need to ensure it doesn't go away.

I think this needs to be:
- acquire the connection's async mutex (new!)
- check conn->connectionReady() and `continue` if it's closed because we need the mDBConn.
- then proceed to acquire the sharedDBMutex and run the checks with the async lock still held.

And additionally:
- change Connection::setClosedState() so that the `mDBConn = nullptr;` happens inside the async mutex block.

With these changes, holding the async lock protects us from off-main-thread connection owners closing the connection.  Most specifically, it means that once we acquire the lock on the main thread and have checked mDBConn (via connectionReady), we know that if the off-thread owner tries to race against us by calling Close(), they will end up blocked inside Connection::setClosedState(), unable to progress to nulling out mDBConn and invoking internalClose().  We also know that if they raced us and won, we will connectionReady() will have returned false in our check.

We won't deadlock anymore because

::: storage/mozStorageService.cpp:349
(Diff revision 11)
>    nsTArray<RefPtr<Connection> > connections;
>    getConnections(connections);
>  
>    for (uint32_t i = 0; i < connections.Length(); i++) {
>      RefPtr<Connection> conn = connections[i];
> -    if (!conn->connectionReady())
> +    if (!conn->isConnectionReadyOnThisThread())

Since we can guarantee we're not the async thread and this therefore amounts to connectionReady(), I think it's better to change this back to connectionReady() and add a comment like:
"""
For non-main-thread owning/opening threads, we may be racing against them closing their connection or their thread.  That's okay, see below.
"""

And then in the wrong thread case, add:
"""
It's possible the connection is already closed or will be closed by the time our runnable runs.  ExecuteSimpleSQL will safely return with a failure in that case.  If the thread is shutting down or shut down, the dispatch will fail and that's okay.
"""

And maybe we should add "Unused <<" ahead of the Dispatch call to make it clear we don't care about its fate?

Attachment #8877163 - Flags: review?(bugmail) → review+

Andrew Sutherland [:asuth] (he/him)

Comment 39

•

7 years ago

re: your other questions/comments, I believe all your conclusions were correct and/or you found the answer to all of them and/or my comments on the review cover them.  Please needinfo me if there was something you wanted an answer to still that I missed.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Updated

•

7 years ago

Whiteboard: [fxsearch]

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/11-12/

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Assignee

Comment 41

•

7 years ago

Thank you Andrew, as usual your suggestions were extremely valuable.
Version 12 is the last one usable for interdiffs, the next version will be rebased on top of mozilla-central. Try is green.

Comment hidden (mozreview-request)

Comment on attachment 8877163 [details]
Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/148534/diff/12-13/

Pulsebot

Comment 43

•

7 years ago

Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/eacdba502064
Avoid a possible thread-safety problem with unfinalized statements. r=asuth

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 44

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eacdba502064

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox56: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla56

Bug 1371945 - Avoid a possible thread-safety problem with unfinalized statements. 7 years ago Marco Bonardo [:mak] (Away Apr 25 - May 5) 59 bytes, text/x-review-board-request	asuth : review+	Details
refcnt_log.txt 7 years ago Marco Bonardo [:mak] (Away Apr 25 - May 5) 10.21 KB, text/plain		Details