Closed Bug 528802 Opened 11 years ago Closed 11 years ago

Automatic construction of synchronous sqlite3 statement precludes complete asynchronous operation

Categories

(Toolkit :: Storage, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 507414

People

(Reporter: asuth, Assigned: asuth)

References

Details

Attachments

(1 file)

Any application meaningfully using the asynchronous storage support is going to want to completely avoid the sqlite mutex ever being taken from the main thread once it passes a certain point.

The current implementation has two main issues:
1) Synchronous sqlite3 statements are immediately created whenever new mozStorageStatements are created.  This acquires the sqlite3 mutex.
2) Asynchronous sqlite3 statements are created on the main thread (which takes the mutex) and they cannot be primed from the public XPCOM interface.

The attached preliminary patch addresses each problem:
1) It defers the creation of the synchronous sqlite3 statements until they are needed.
2) It creates the asynchronous sqlite3 statements on the asynchronous thread.

Notable implementation ramifications are:
- We save the SQL as a string on the Statement.
- You must bind by index (unless you are exclusively using statements prepared before you crossed the 'only async' line and you compelled the sync statements to be created and their name maps to be populated.)  Since actually knowing the number of binding slots requires that the synchronous statement be available, binding to an illegal index becomes an async runtime error rather than an error at bind time.  (However, if you had the sync statement available, the binding error will happen.)
- State management on the Statement becomes slightly trickier because the existence of the sync and async sqlite3_stmt's can no longer also serve as indicators of other things (like whether the statement has been finalized).

The patch adds a new unit test test_async_is_really_async.js which is really only intended for human verification that the mutex is not taken.  The idea is that you do a check-interactive on a debug build.  We use the 'debugger' hook (which you set a breakpoint on) as an indicator of when we are crossing into async-only territory.  When you hit that breakpoint, you set a conditional breakpoint on sqlite3_mutex_enter so that it only fires if the mutex is entered on the main thread.  We use 'debugger' again to mark when the test is completed.  If you get to the second 'debugger' breakpoint without sqlite3_mutex_enter firing, then the test can be thought of as passing.

Because the close-spinning-the-event-loop bug (bug 496019) is not yet landed and I haven't seen its final proposed version, this patch is not yet adapted for that patch.

I would appreciate feedback/input on this patch.  I am not entirely crazy about the potential subtle changes in semantics due to synchronous statements being created on demand.  I was trying to avoid changing any of the interfaces so that there is some chance of this getting into 1.9.2 so Thunderbird can either not suck or not have to do horrible branching things with 1.9.2. (I guess a fork is also an option.)  I really wish I had tried to do this earlier...

Gloda definitely needs to create SQL statements on the fly, so this is not a hypothetical issue.  The on-the-fly SQL is required both because you can't bind a set of things against an "IN" and because gloda builds complicated queries that just can't be known a priori.
Attachment #412471 - Flags: review?(sdwilsh)
It sounds like what you really want is bug 507414 where we can optimize the hell out of pure async usage.

(In reply to comment #0)
> The current implementation has two main issues:
> 1) Synchronous sqlite3 statements are immediately created whenever new
> mozStorageStatements are created.  This acquires the sqlite3 mutex.
Some consumers (namely, places) depend on the behavior for performance reasons.  They create statements at startup so they don't hang the UI creating them when they are first used.

> 2) Asynchronous sqlite3 statements are created on the main thread (which takes
> the mutex) and they cannot be primed from the public XPCOM interface.
This would likely be solved by bug 507414, right?
 
> 2) It creates the asynchronous sqlite3 statements on the asynchronous thread.
I strongly support this.

> - We save the SQL as a string on the Statement.
Oh sadfaces.  We used to do that and I removed it since we could get to it from sqlite3_sql to save memory.  I guess if we do this, we should make an accessor method that returns either our locally stored string, or the sqlite3_sql string.

> - You must bind by index (unless you are exclusively using statements prepared
> before you crossed the 'only async' line and you compelled the sync statements
> to be created and their name maps to be populated.)  Since actually knowing the
> number of binding slots requires that the synchronous statement be available,
> binding to an illegal index becomes an async runtime error rather than an error
> at bind time.  (However, if you had the sync statement available, the binding
> error will happen.)
We should be able to do this without imposing this, right?  I've been advocating using bindByName for a while now, and I'd hate to then break folks.  We'd just have to change how we store bindings, but this is certainly doable.  Having it be a runtime error (as in, when it's running on the background thread) is fine by me though.

> - State management on the Statement becomes slightly trickier because the
> existence of the sync and async sqlite3_stmt's can no longer also serve as
> indicators of other things (like whether the statement has been finalized).
This is the same probably I hit with bug 496019.  It's fun, isn't it!?

> The patch adds a new unit test test_async_is_really_async.js which is really
> only intended for human verification that the mutex is not taken.  The idea is
> that you do a check-interactive on a debug build.  We use the 'debugger' hook
> (which you set a breakpoint on) as an indicator of when we are crossing into
> async-only territory.  When you hit that breakpoint, you set a conditional
> breakpoint on sqlite3_mutex_enter so that it only fires if the mutex is entered
> on the main thread.  We use 'debugger' again to mark when the test is
> completed.  If you get to the second 'debugger' breakpoint without
> sqlite3_mutex_enter firing, then the test can be thought of as passing.
Realistically, you could write a native code test for this that will deadlock (and thus fail in debug builds and timeout in opt builds) if you really want to make sure it never regresses.  This feels like the right long-term approach, because we don't know who might be maintaining this module in the future.  It's not too terrible to write the test either, and I'd be happy to provide more details if you are unsure on how we'd go about it.

> Gloda definitely needs to create SQL statements on the fly, so this is not a
> hypothetical issue.  The on-the-fly SQL is required both because you can't bind
> a set of things against an "IN" and because gloda builds complicated queries
> that just can't be known a priori.
FWIW, the first issue is solveable with bound statements, but you'd have to use a temporary table to store the values in the IN statements.  It's not pretty, by any means.
(In reply to comment #1)
> It sounds like what you really want is bug 507414 where we can optimize the
> hell out of pure async usage.

Yes, that would address my use-case.  I presume if I want the functionality in 1.9.2, said new function should be exposed in the form of a new interface somehow?

 
> > 2) Asynchronous sqlite3 statements are created on the main thread (which takes
> > the mutex) and they cannot be primed from the public XPCOM interface.
> This would likely be solved by bug 507414, right?

As long as bug 507414 is fixed by creating them on the async thread, yes.


> > - We save the SQL as a string on the Statement.
> Oh sadfaces.  We used to do that and I removed it since we could get to it from
> sqlite3_sql to save memory.  I guess if we do this, we should make an accessor
> method that returns either our locally stored string, or the sqlite3_sql
> string.

I did this because sqlite3_sql acquires the mutex.  With the explicit async-API we could temporarily store it on a short-lived helper class that is just responsible for creating the async statement (like the close helper).  On-demand async creation can then revert to just stealing it from the sync statement.


> > - You must bind by index (unless you are exclusively using statements prepared
> > before you crossed the 'only async' line and you compelled the sync statements
> > to be created and their name maps to be populated.)  Since actually knowing the
> > number of binding slots requires that the synchronous statement be available,
> > binding to an illegal index becomes an async runtime error rather than an error
> > at bind time.  (However, if you had the sync statement available, the binding
> > error will happen.)
> We should be able to do this without imposing this, right?  I've been
> advocating using bindByName for a while now, and I'd hate to then break folks. 
> We'd just have to change how we store bindings, but this is certainly doable. 
> Having it be a runtime error (as in, when it's running on the background
> thread) is fine by me though.

Yes, this can be avoided.  In fact, if we could guarantee that we had a sqlite3_stmt that had already had sqlite3_bind_parameter_index/sqlite3_bind_parameter_name invoked on it, we could keep our efficient index-based storage mechanism.  Unfortunately, this requires 1 of 3 things:

1) The user does not attempt to bind to an async statement until we have notified them that the async statement has been created on the async thread (where we could force a call to sqlite3_bind_parameter_index).

2) Somehow establish our own mapping on the sync thread in a synchronous fashion.  This would require somehow getting sqlite to parse the SQL for us which seems likely to end up convoluted and dangerous.

3) Have a string-based fallback mode for the time before #1 effectively happens.
 

> > - State management on the Statement becomes slightly trickier because the
> > existence of the sync and async sqlite3_stmt's can no longer also serve as
> > indicators of other things (like whether the statement has been finalized).
> This is the same probably I hit with bug 496019.  It's fun, isn't it!?

Yeah, I was really sad I didn't have your patch available to work from or just use...


> Realistically, you could write a native code test for this that will deadlock
> (and thus fail in debug builds and timeout in opt builds) if you really want to
> make sure it never regresses.  This feels like the right long-term approach,
> because we don't know who might be maintaining this module in the future.  It's
> not too terrible to write the test either, and I'd be happy to provide more
> details if you are unsure on how we'd go about it.

Yes, the automated test sounds like a good idea.  I don't think we need to force a deadlock though.  The mutex routines are exposed via the configuration mechanism using SQLITE_CONFIG_GETMUTEX/SQLITE_CONFIG_MUTEX.  We should be able to wrap the mutex acquisition routine with one that sets a flag if invoked from the main thread.

 
> FWIW, the first issue is solveable with bound statements, but you'd have to use
> a temporary table to store the values in the IN statements.  It's not pretty,
> by any means.

If this is the only way for me to get fully async things happening in 1.9.2, then it is certainly an option.  (Unfortunately it is an option that requires me to speculatively pre-create a fairly large number of permutations of SQL statements and has an annoyingly high additional dispatch cost to prepare the temporary tables, I presume...)


I should stress that the patch I put up was entirely for discussion purposes*.  My goal is to have 1.9.2 be usable without patches (or the scary ugly stuff with temporary tables).  It sounds like perhaps we can just dupe this bug to 507414...

* The patch is going to be used on Thunderbird's (likely-happening regrettable) mozilla-1.9.1 branch since we need 1.9.2 storage and this patch does not require gloda to be changed if the patch is added or removed.  (Well, apart from the change to bind by index.)
(In reply to comment #2)
> Yes, this can be avoided.  In fact, if we could guarantee that we had a
> sqlite3_stmt that had already had
> sqlite3_bind_parameter_index/sqlite3_bind_parameter_name invoked on it, we
> could keep our efficient index-based storage mechanism.  Unfortunately, this
> requires 1 of 3 things:

Which is to say, sqlite3_bind_parameter_index/sqlite3_bind_parameter_name call createVarMap if the map is not already created.  createVarMap acquires the mutex, but sqlite3_bind_parameter_index/sqlite3_bind_parameter_name themselves do not.
(In reply to comment #2)
> Yes, that would address my use-case.  I presume if I want the functionality in
> 1.9.2, said new function should be exposed in the form of a new interface
> somehow?
For mozilla-central, you would add it on to mozIStorageConnection, and then for 1.9.2, you would just add a new interface - mozIStorageConnection_MOZILLA_1_9_2_ADDITIONS that has it on it.  Connection would implement both.

> As long as bug 507414 is fixed by creating them on the async thread, yes.
I don't see why we couldn't do this.

> I did this because sqlite3_sql acquires the mutex.  With the explicit async-API
> we could temporarily store it on a short-lived helper class that is just
> responsible for creating the async statement (like the close helper). 
> On-demand async creation can then revert to just stealing it from the sync
> statement.
It does?  Hrm, that's sad.  It's not like it's a lot of memory, so I'm fine with storing it locally.  No need to make things any more complicated than they currently are.

> Yes, this can be avoided.  In fact, if we could guarantee that we had a
> sqlite3_stmt that had already had
> sqlite3_bind_parameter_index/sqlite3_bind_parameter_name invoked on it, we
> could keep our efficient index-based storage mechanism.  Unfortunately, this
> requires 1 of 3 things:
The hash table to store bindings by name would be constant time look-up too, so the only concern is memory usage, right?  I think we can throw that out for speed and ease of use.

> Yes, the automated test sounds like a good idea.  I don't think we need to
> force a deadlock though.  The mutex routines are exposed via the configuration
> mechanism using SQLITE_CONFIG_GETMUTEX/SQLITE_CONFIG_MUTEX.  We should be able
> to wrap the mutex acquisition routine with one that sets a flag if invoked from
> the main thread.
True - I had forgotten about that interface.

> I should stress that the patch I put up was entirely for discussion purposes*. 
> My goal is to have 1.9.2 be usable without patches (or the scary ugly stuff
> with temporary tables).  It sounds like perhaps we can just dupe this bug to
> 507414...
That's why I didn't do a full code review :).  I'm happy with the duplicate.  Note that I won't have time to work on that bug until at least December, so you'll likely have to patch it.  I promise to iterate quick on the reviews!
Flags: blocking-thunderbird3.1+
(In reply to comment #2)
> I should stress that the patch I put up was entirely for discussion purposes*. 
> My goal is to have 1.9.2 be usable without patches (or the scary ugly stuff
> with temporary tables).  It sounds like perhaps we can just dupe this bug to
> 507414...
I can clear this review request then, right?
Comment on attachment 412471 [details] [diff] [review]
allow fully async operation v1

Yeah, sorry, I should have told you to r- or clear when you were done.
Attachment #412471 - Flags: review?(sdwilsh)
Whiteboard: [tb31needs]
Clearing obsolete flag; we'd still very much like to see this on the 1.9.2 branch for 3.1 beta 1 (which is still being scheduled).
Flags: blocking-thunderbird3.1+
Andrew, has this bug effectively become a DUP of bug 507414 and should just be marked as such?  If not, should we still keep this on the [tb31needs] list?
Indeed.  This could probably be resolved invalid but it's useful to make the conversation accessible from 507414.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Whiteboard: [tb31needs]
Duplicate of bug: 507414
Depends on: 545625
You need to log in before you can comment on or make changes to this bug.