Closed Bug 624051 Opened 9 years ago Closed 9 years ago

AsyncStatement destructor should finalize statements on the asynchronous thread when available to avoid blocking the main thread

Categories

(Toolkit :: Storage, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b10
Tracking Status
blocking2.0 --- final+

People

(Reporter: sdwilsh, Assigned: asuth)

References

Details

(Whiteboard: [hardblocker][fixed-in-places])

Attachments

(2 files, 1 obsolete file)

dolske pastebined a stack a little while ago that never made it into a bug.  The stack indicated that we were not finalizing a statement, and so we ended up finalizing it on the main thread when the callback (it was running async) released its reference to it.

We need to audit our code to make sure we finalize all our statements.
This blocks the final release because it causes lock contention.
blocking2.0: --- → final+
We might be able to fix this generically in storage too (which would be preferred actually).
gavin found the stack, luckily.  Here is the important bit:
nsRunnable::Release()
  mozilla::storage::(anonymous namespace)::CompletionNotifier::~CompletionNotifier()
    mozilla::storage::AsyncExecuteStatements::Release()
      mozilla::storage::AsyncStatement::Release()
        mozilla::storage::AsyncStatement::~AsyncStatement()
          mozilla::storage::StorageBaseStatementInternal::internalAsyncFinalize()
            sqlite3_finalize
              pthread_mutex_lock
                semaphore_wait_signal_trap
Whiteboard: [hard blocker]
Unassigned hard blocker, over to zpao.

sd
Assignee: nobody → paul
Bah. Started to add:

sdwilsh: if you've already got plans in your head or distributing the load to zpao doesn't make sense, feel free to yank back.
(In reply to comment #5)
> sdwilsh: if you've already got plans in your head or distributing the load to
> zpao doesn't make sense, feel free to yank back.
I think we can fix this in storage code generically instead of having a footgun like this around, but I've had no time to investigate it.  It's fun with threading, which I'm sure zpao will love.
Whiteboard: [hard blocker] → [hardblocker]
As a wallpaper, I've audited our code to check for unfinalized async statements, and I've not found any. I guess if this unfinalized stmt could be part of another module or a add-on instead.
(In reply to comment #7)
> As a wallpaper, I've audited our code to check for unfinalized async
> statements, and I've not found any. I guess if this unfinalized stmt could be
> part of another module or a add-on instead.
No, the stack I was seeing was totally us.  The only other thread that was holding the lock was one of our runnables in History.cpp.  This is likely a very subtle issue to spot in our own code, which is why the storage solution is the way to go.
(In reply to comment #8)
> (In reply to comment #7)
> No, the stack I was seeing was totally us.  The only other thread that was
> holding the lock was one of our runnables in History.cpp.  This is likely a
> very subtle issue to spot in our own code, which is why the storage solution is
> the way to go.

I thought in cpp land the statements were finalized when exiting scope, plus all statements in history.cpp are cached and finalized on shutdown...
(In reply to comment #9)
> I thought in cpp land the statements were finalized when exiting scope, plus
> all statements in history.cpp are cached and finalized on shutdown...
This is true, however the finalization that occurs when all references are released is different than calling finalize for async statements.  It causes the finalize to be called on the current thread, not the background one.

Like I said, this is a foot gun.
Basically, what we need to do is what asyncFinalize does now, but with less structure:
https://mxr.mozilla.org/mozilla-central/source/storage/src/StorageBaseStatementInternal.cpp#97

How do we do that, you ask?  Well, by making AsyncStatementFinalizer take a sqlite3_stmt object instead!  At this point, it makes sense to just merge internalAsyncFinalize and asyncFinalize (merge to the latter, not the former!).

If this seems way over anybody's head, feel free to unassign and I'll pick it up next week once I'm done with bug 606966.
Component: Places → Storage
QA Contact: places → storage
I believe the rationale for having the AsyncStatementFinalizer have a reference to the statement rather than the sqlite3_stmt was the threading analysis was simplified by having mAsyncStatement only ever initialized or accessed on the asynchronous thread.  Also, there is the very expected correct use-case where the caller issues an executeAsync immediately followed by a finalization; in that case mAsyncStatement will not exist at the time the finalization occurs.  I'm not sure how that fits in with your proposal, but I'm probably missing something.

I agree that it does make sense to have the "oh no! we are getting deleted and still have an mAsyncStatement" case dispatch the finalization to the async thread if it exists seems like a solid improvement.  (And from a threading analysis perspective, the fact that our reference count is hitting zero means that the asynchronous thread finished with our statement and released its refcount on our thread, which means that sufficient synchronization has occurred that we can reliably access mAsyncStatement at that point without hazard.)
(In reply to comment #12)
> I believe the rationale for having the AsyncStatementFinalizer have a reference
> to the statement rather than the sqlite3_stmt was the threading analysis was
> simplified by having mAsyncStatement only ever initialized or accessed on the
> asynchronous thread.  Also, there is the very expected correct use-case where
> the caller issues an executeAsync immediately followed by a finalization; in
> that case mAsyncStatement will not exist at the time the finalization occurs. 
> I'm not sure how that fits in with your proposal, but I'm probably missing
> something.
Hmm, yeah.  My proposal isn't quite right; we still need these semantics.

> I agree that it does make sense to have the "oh no! we are getting deleted and
> still have an mAsyncStatement" case dispatch the finalization to the async
> thread if it exists seems like a solid improvement.  (And from a threading
> analysis perspective, the fact that our reference count is hitting zero means
> that the asynchronous thread finished with our statement and released its
> refcount on our thread, which means that sufficient synchronization has
> occurred that we can reliably access mAsyncStatement at that point without
> hazard.)
That is what was happening in this case.

This requires a bit more care, but I think we have the right general approach to move forward.
Unassigning... I know we don't like unassigned blockers, but this is out of my league if we want this fixed in a reasonable amount of time. Shawn said he would pick it up next week, but if asuth, marco want to pick it up, I'm sure that would be fine too.
Assignee: paul → nobody
I can take this; it's one of the few firefox blockers I am well qualified to fix. :)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Summary: Not finalizing a statement → AsyncStatement destructor should finalize statements on the asynchronous thread when available to avoid blocking the main thread
FWIW, I cannot think of a situation where the background thread would not be available (apart from shutdown, I guess).  We hold a strong reference to the connection object and it holds a strong reference to the the thread.
(In reply to comment #16)
> FWIW, I cannot think of a situation where the background thread would not be
> available (apart from shutdown, I guess).  We hold a strong reference to the
> connection object and it holds a strong reference to the the thread.

AsyncClose invokes setClosedState which causes getAsyncExecutionTarget to start returning null, so the sequence of:

conn.AsyncClose();
statementIDidNotFinalize = nsnull;

will fail to gain access to the async thread.

We could probably change our idiom to avoid or reduce this scenario, but I'm not sure it's a major benefit.
Attached patch unit test v1Splinter Review
Here is a unit test that fails without the fix.
Attached patch fix v1 (obsolete) — Splinter Review
Here is the fix that makes the unit test happy.

I have pushed to try and will also see if anything jumps out at me as dumb after lunch before asking review.  Feel free to skim test and fix for feedback before then if you are bored, sdwilsh.
(In reply to comment #17)
> AsyncClose invokes setClosedState which causes getAsyncExecutionTarget to start
> returning null, so the sequence of:
> 
> conn.AsyncClose();
> statementIDidNotFinalize = nsnull;
> 
> will fail to gain access to the async thread.
> 
> We could probably change our idiom to avoid or reduce this scenario, but I'm
> not sure it's a major benefit.
Sure, but closing the connection and then finalizing statements is silly anyway (although it could happen in C++ land if you close at odd times and don't finalize your local statements).
Did not look to terribly closely at this, but first pass comments:
http://reviews.visophyte.org/r/503936/

on file: storage/src/StorageBaseStatementInternal.cpp line 87
>   // It is vital that this remain an nsCOMPtr for NS_ProxyRelease's benefit.
>   nsCOMPtr<StorageBaseStatementInternal> mStatement;

This is actually fixed on trunk, and it could be an nsRefPtr again.


on file: storage/src/StorageBaseStatementInternal.cpp line 93
>  * Finalize a sqlite3_stmt on the background thread for a statement whose
>  * destructor was invoked and the statement was non-null and not yet finalized.

We NULL it out after we finalize, so just saying not yet finalized will be
sufficient.


on file: storage/src/StorageBaseStatementInternal.cpp line 120
>     // NS_ProxyRelease wants an nsCOMPtr, but our ambiguous nsISupports means
>     // trouble.  This is a cop-out.
>     nsCOMPtr<nsISupports> releaseRef(
>       NS_ISUPPORTS_CAST(mozIStorageConnection *, mConnection));
>     (void)::NS_ProxyRelease(mConnection->threadOpenedOn, releaseRef);

nsRefPtrs work now as of
https://hg.mozilla.org/mozilla-central/rev/d65c65ef13a4


on file: storage/src/StorageBaseStatementInternal.cpp line 170
>     nsCOMPtr<nsIRunnable> event =
>       new LastDitchSqliteStatementFinalizer(mDBConnection, mAsyncStatement);
>     if (event &&

|new| won't return NULL on trunk anymore, so we don't need to check |event|
(In reply to comment #12)
> I believe the rationale for having the AsyncStatementFinalizer have a reference
> to the statement rather than the sqlite3_stmt was the threading analysis was
> simplified by having mAsyncStatement only ever initialized or accessed on the
> asynchronous thread.  Also, there is the very expected correct use-case where
> the caller issues an executeAsync immediately followed by a finalization; in
> that case mAsyncStatement will not exist at the time the finalization occurs. 
> I'm not sure how that fits in with your proposal, but I'm probably missing
> something.
We could fix this by creating a new object that stores the sqlite3_stmt pointer (and could act like one with operator overloading), have it have threadsafe refcounting, and pass it around between the threads.  At least, I think so.  But doing so probably isn't worth it for this bug.
(In reply to comment #21)
> on file: storage/src/StorageBaseStatementInternal.cpp line 120
> >     // NS_ProxyRelease wants an nsCOMPtr, but our ambiguous nsISupports means
> >     // trouble.  This is a cop-out.
> >     nsCOMPtr<nsISupports> releaseRef(
> >       NS_ISUPPORTS_CAST(mozIStorageConnection *, mConnection));
> >     (void)::NS_ProxyRelease(mConnection->threadOpenedOn, releaseRef);
> 
> nsRefPtrs work now as of
> https://hg.mozilla.org/mozilla-central/rev/d65c65ef13a4

The problem is that Connection has an ambiguous nsISupports.  The act of providing a non-ambiguous nsISupports loses the type information that triggers either of the templates.

My hack here was dangerously wrong since it did not guarantee our runnable forgot about its reference; I was in too much of a hurry to get to lunch and stem the horrible pain of XPCOM and glitched.  I am resolving the correctness and XPCOM issues by just inlining what the template gets up to so we avoid spurious or erroneous reference count manipulation.  If you would like to introduce a new proxy release helper that handles the ambiguity case in a separate bug, I can block on that bug.
I pushed this to try as well.  The last try push generally succeeded, although not everything got to run before the co-lo fell over.
Attachment #503936 - Attachment is obsolete: true
Attachment #504145 - Flags: review?(sdwilsh)
Attachment #503935 - Flags: review?(sdwilsh)
Although I left my patches stacked on top of an orange prone revision, the try server seems fine with the revised patch.
I don't care enough to fix the ambiguous case (and I don't think there is a trivial fix there).
Comment on attachment 503935 [details] [diff] [review]
unit test v1

r=sdwilsh
Attachment #503935 - Flags: review?(sdwilsh) → review+
Comment on attachment 504145 [details] [diff] [review]
v2 fix with requested changes and corrected proxy release

>+++ b/storage/src/StorageBaseStatementInternal.cpp
>+  LastDitchSqliteStatementFinalizer(Connection *aConnection,
>+                                    sqlite3_stmt *aStatement)
nit: make this nsRefPtr<Connection> & instead

>+  : mConnection(aConnection)
>+  , mAsyncStatement(aStatement)
>+  {
>+  }
And add an NS_PRECONDITION here to assert that aConnection is not null.

r=sdwilsh
Attachment #504145 - Flags: review?(sdwilsh) → review+
nits fixed, patches folded together.

pushed to places branch per sdwilsh:
http://hg.mozilla.org/projects/places/rev/68addc97be35

re-assigning to sdwilsh for merge to mozilla-central
Assignee: bugmail → sdwilsh
Whiteboard: [hardblocker] → [hardblocker][fixed-in-places]
The landing of the fix made a serious bug in places' test_tags.js much more evident.  I filed bug 627300 on the problem and landed a fix on the places branch.
Depends on: 627300
http://hg.mozilla.org/mozilla-central/rev/68addc97be35
Assignee: sdwilsh → bugmail
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b10
You need to log in before you can comment on or make changes to this bug.