Closed Bug 914070 Opened 6 years ago Closed 6 years ago

[Places] Many statements seem to be left unfinalized

Categories

(Toolkit :: Storage, defect)


Tracking


VERIFIED FIXED
mozilla31

People

(Reporter: Yoric, Assigned: mak)


Details

(Whiteboard: p=5 s=it-31c-30a-29b.3 [qa-])

Attachments

(3 files, 6 obsolete files)

See bug 914005 and its attachment 801340 [details].

List of statements:
* UPDATE moz_places SET hidden = 0 WHERE id = :page_id AND frecency <> 0
* UPDATE moz_places SET frecency = CALCULATE_FRECENCY(:page_id) WHERE id = :page_id
* SELECT id, visit_date FROM moz_historyvisits WHERE place_id = (SELECT id FROM moz_places WHERE url = :page_url) AND visit_date = :visit_date
* INSERT INTO moz_historyvisits (from_visit, place_id, visit_date, visit_type, session) VALUES (:from_visit, :page_id, :visit_date, :visit_type, 0)
* INSERT INTO moz_places (url, title, rev_host, hidden, typed, frecency, guid) VALUES (:url, :title, :rev_host, :hidden, :typed, :frecency, :guid)
* SELECT guid, id, title, hidden, typed, frecency FROM moz_places WHERE url = :page_url
* UPDATE moz_places SET favicon_id = :icon_id WHERE id = :page_id
* INSERT OR REPLACE INTO moz_favicons (id, url, data, mime_type, expiration, guid) VALUES ((SELECT id FROM moz_favicons WHERE url = :icon_url), :icon_url, :data, :mime_type, :expiration, COALESCE(:guid, (SELECT guid FROM moz_favicons WHERE url = :icon_url), GENERATE_GUID()))
* SELECT h.id, h.favicon_id, h.guid, ( SELECT h.url FROM moz_bookmarks b WHERE b.fk = h.id UNION ALL SELECT url FROM moz_places WHERE id = ( SELECT COALESCE(grandparent.place_id, parent.place_id) as r_place_id FROM moz_historyvisits dest LEFT JOIN moz_historyvisits parent ON parent.id = dest.from_visit AND dest.visit_type IN (5, 6) LEFT JOIN moz_historyvisits grandparent ON parent.from_visit = grandparent.id AND parent.visit_type IN (5, 6) WHERE dest.place_id = h.id AND EXISTS(SELECT 1 FROM moz_bookmarks b WHERE b.fk = r_place_id) LIMIT 1 ) ) FROM moz_places h WHERE h.url = :page_url
* SELECT id, expiration, data, mime_type FROM moz_favicons WHERE url = :icon_url
* /* do not warn (bug 659740 - SQLite may ignore index if few visits exist) */SELECT ROUND((strftime('%s','now','localtime','utc') - v.visit_date/1000000)/86400), IFNULL(r.visit_type, v.visit_type), v.visit_date FROM moz_historyvisits v LEFT JOIN moz_historyvisits r ON r.id = v.from_visit AND v.visit_type BETWEEN 5 AND 6 WHERE v.place_id = :page_id ORDER BY v.visit_date DESC
* SELECT typed, hidden, visit_count, (SELECT count(*) FROM moz_historyvisits WHERE place_id = :page_id), EXISTS (SELECT 1 FROM moz_bookmarks WHERE fk = :page_id), (url > 'place:' AND url < 'place;') FROM moz_places WHERE id = :page_id
It's not quite clear to me why we have these warnings. If they are pertinent, this might indicate that we have a race condition somewhere and that we create some of these statements during shutdown of the database.
these statements are async...
Blocks: 914005
All these statements seem to be finalized with the same stack:

* thread #16: tid = 0x2f03, 0x000000010363ce01 XUL`mozilla::storage::Statement::internalFinalize(this=0x0000000115f29970, aDestructing=false) + 193 at mozStorageStatement.cpp:402, stop reason = breakpoint 1.1
    frame #0: 0x000000010363ce01 XUL`mozilla::storage::Statement::internalFinalize(this=0x0000000115f29970, aDestructing=false) + 193 at mozStorageStatement.cpp:402
    frame #1: 0x000000010363dfaa XUL`mozilla::storage::Statement::Finalize(this=0x0000000115f29970) + 26 at mozStorageStatement.cpp:351
    frame #2: 0x0000000102b51024 XUL`mozilla::storage::StatementCache<mozIStorageStatement>::FinalizeCachedStatements(aKey=0x000000011450e868, aStatement=0x000000011450e878) + 52 at StatementCache.h:98
    frame #3: 0x0000000102b5113d XUL`nsBaseHashtable<nsCStringHashKey, nsCOMPtr<mozIStorageStatement>, mozIStorageStatement*>::s_EnumStub(table=0x00000001145d86e0, hdr=0x000000011450e860, number=0, arg=0x0000000114a80b78) + 93 at nsBaseHashtable.h:397
    frame #4: 0x0000000104690840 XUL`PL_DHashTableEnumerate(table=0x00000001145d86e0, etor=0x0000000102b510e0, arg=0x0000000114a80b78) + 288 at pldhash.cpp:713
    frame #5: 0x0000000102b50fe1 XUL`nsBaseHashtable<nsCStringHashKey, nsCOMPtr<mozIStorageStatement>, mozIStorageStatement*>::Enumerate(this=0x00000001145d86e0, enumFunc=0x0000000102b50ff0, userArg=0x0000000000000000)(nsACString_internal const&, nsCOMPtr<mozIStorageStatement>&, void*), void*) + 129 at nsBaseHashtable.h:204
    frame #6: 0x0000000102b4e0e0 XUL`mozilla::storage::StatementCache<mozIStorageStatement>::FinalizeStatements(this=0x00000001145d86e0) + 48 at StatementCache.h:82
    frame #7: 0x000000010380d920 XUL`mozilla::places::FinalizeStatementCacheProxy<mozIStorageStatement>::Run(this=0x0000000110005430) + 32 at Helpers.h:180
    frame #8: 0x0000000104734d53 XUL`nsThread::ProcessNextEvent(this=0x0000000113c1d6d0, mayWait=true, result=0x0000000114a80dde) + 1731 at nsThread.cpp:622
    frame #9: 0x000000010468ce59 XUL`NS_ProcessNextEvent(thread=0x0000000113c1d6d0, mayWait=true) + 169 at nsThreadUtils.cpp:238
    frame #10: 0x00000001047336c7 XUL`nsThread::ThreadFunc(arg=0x0000000113c1d6d0) + 295 at nsThread.cpp:250
    frame #11: 0x0000000101238465 libnss3.dylib`_pt_root + 357
    frame #12: 0x00007fff944158bf libsystem_c.dylib`_pthread_start + 335
    frame #13: 0x00007fff94418b75 libsystem_c.dylib`thread_start + 13

So I guess we are doing something wrong with the FinalizeStatementCacheProxy.

A quick look at the code seems to point here:
http://dxr.mozilla.org/mozilla-central/source/toolkit/components/places/Database.cpp#l1926

The dispatch to finalization seems to correctly take place before we start async closing. However, the blockingConnectionCloseCallback might undo all our async efforts.
So, the issue seems to be related to the fact that, for some reason, we have statements that belong to the async execution thread.

I believe that we could fix the issue by having FinalizeStatementCacheProxy perform the call to AsyncClose. This would require passing as argument to the FinalizeStatementCacheProxy both the database and |closeListener|. Does that sound correct to you, mak?
Flags: needinfo?(mak77)
taking to keep this on my radar, I should dedicate some time to figure it out
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Flags: needinfo?(mak77)
Attached patch patch v1 (obsolete) — Splinter Review
The fact is that we dispatch a runnable that finalizes statements to the async thread, then we call asyncClose().
asyncClose() sets mAsyncExecutionThreadShuttingDown to true immediately; that's fine.
Then our finalizing runnable runs and tries to finalize the statements, but it ends up in Statement::internalFinalize, where isClosing(true) returns true since we are shutting down. However, to finalize our statement the only thing we need is that the connection is alive, nothing else.
I think we should allow finalizing between the asyncClose call and the actual db closing (note that we null mDBConn before calling sqlite3_close).
Though, since you wrote the original code, it's possible I'm missing something here.

PS: the Places changes are not needed to solve the warnings, they are just a nice-to-have; there's no point in spinning since we do the same in Storage now...
Attachment #8383787 - Flags: feedback?(dteller)
Comment on attachment 8383787 [details] [diff] [review]
patch v1

Review of attachment 8383787 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, except perhaps the locking issue below.

::: storage/src/mozStorageStatement.cpp
@@ +355,5 @@
>      return NS_OK;
>  
>    int srv = SQLITE_OK;
>  
> +  if (mDBConnection->ConnectionReady()) {

I'm trying to remember: can we call ConnectionReady() in this context? isClosing() acquires a mutex before checking it, shouldn't we also?
Attachment #8383787 - Flags: feedback?(dteller) → feedback+
Comment on attachment 8383787 [details] [diff] [review]
patch v1

Review of attachment 8383787 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageStatement.cpp
@@ +355,5 @@
>      return NS_OK;
>  
>    int srv = SQLITE_OK;
>  
> +  if (mDBConnection->ConnectionReady()) {

the mutex in isClosing() is for mAsyncExecutionThreadShuttingDown
Attachment #8383787 - Flags: review?(dteller)
Comment on attachment 8383787 [details] [diff] [review]
patch v1

Review of attachment 8383787 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageStatement.cpp
@@ +355,5 @@
>      return NS_OK;
>  
>    int srv = SQLITE_OK;
>  
> +  if (mDBConnection->ConnectionReady()) {

Ok, I'm willing to take your word on this, but in that case, please document that we're on the right thread to access ConnectionReady().
Attachment #8383787 - Flags: review?(dteller) → review+
hm, I think we broke some assumptions when we moved close to the async thread. Mostly the fact that now mDBConn may be set to nullptr on either thread (indeed we inadvertently, but expectedly, broke this assertion http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#816; those 2 lines are orphans).
I wonder if it's worth adding a mutex protecting mDBConn changes; on the other hand, that has a code maintainability and perf impact :(

Let's see what Andrew thinks
Flags: needinfo?(bugmail)
just as a side note, since this patch is not making things worse than they were, we may take it, just to stop annoying hundreds of developers with warnings, and move the discussion about properly handling mDBConn to a separate bug.
From comment 6 and looking at the code, it seems like Places is violating all the assumptions I had about how consumers would use mozStorage.  Specifically, dispatching FinalizeStatementCacheProxy or anything else to the async execution thread seems like a huge warranty-voider.  FinalizeStatementCacheProxy is exactly the kind of thing that should either exist in mozStorage or not be messing with the async execution thread.

API-wise, calling AsyncClose is meant to close up shop for the connection and all public APIs.  What FinalizeStatementCacheProxy gets up to consists of calls to public APIs.

It seems like :Yoric's proposal in comment 4 to not call AsyncClose until Places is actually done calling public APIs is the correct thing to do based on how things are implemented in mozStorage right now.  The problem with this strategy is Places' event-loop spinning can't be abandoned in favor of mozStorage's in that case because mozStorage's spinning depends on asyncClose having been called.


I'm generally not a fan of increasing the complexity of mozStorage threading analysis by introducing an additional weird state things can be in, which I feel is what this patch implements/allows to continue happening.  I think the right thing to do is:
- Lose FinalizeStatementCacheProxy.
- Add Statement::AsyncFinalize. Its semantics are to clean up the synchronously-held statement as well as any asynchronously-held statement, if present.
- Make StatementCache have an AsyncFinalizeStatements helper.  Have it call AsyncFinalize.

For AsyncFinalize, I would suggest:
- Move the mStatementParamsHolder/mStatementRowHolder logic out into a cleanupReferences() private method.
- Have an internalAsyncFinalize method that does the actual :sqlite3_finalize() for any non-null statements, nulling them afterwards.
- Implement AsyncFinalize so that:
  - It complains loudly if isClosing() already.  That suggests an ordering error.
  - It calls cleanupReferences()
  - It dispatches a runnable to the async thread that is basically AsyncStatementFinalizer but with the direct sqlite-hitting changed to a call to internalAsyncFinalize and it explicitly holding a Statement instead of a base-statement.


This is more code, and I realize that Places is an unfortunate situation that will not be repeated so spending more effort on it seems wasteful.  But since I fear places isn't going anywhere soon, I think it's useful to be able to reason about what the system is doing without the extra complexity of Places misusing APIs in confusing ways in code that lives outside the mozStorage module.
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland (:asuth) from comment #12)
> From comment 6 and looking at the code, it seems like Places is violating
> all the assumptions I had about how consumers would use mozStorage. 
> Specifically, dispatching  FinalizeStatementCacheProxy or anything to the
> async execution thread seems like a huge warranty-voider. 

From the API and code, it's not clear that runnables using the Storage API should not be sent to the async thread. Here we have a runnable sent to the async thread; the fact that asyncClose is invoked before this runnable is executed is something one may avoid only by spinning the event loop until the async thread is empty, before calling AsyncClose.
Any consumer sending a runnable to the async thread would then have to do this, since there's no other way to ensure it will run before an AsyncClose call.
I was not aware of this assumption, since IIRC Shawn first started using the async thread as an execution target for our runnables.

> I'm generally not a fan of increasing the complexity of mozStorage threading
> analysis by introducing an additional weird state things can be in, which I
> feel is where what this patch implements/allows to continue happening.

Unfortunately, by making mDBConn nullable on both threads, we already added an additional weird state. It may change on any of the two threads, but we never protected mDBConn for that. This issue should be solved regardless of the path we take here. The fact solving that would also solve this issue, was sort of a side note.

> think the right thing to do is:
> - Lose FinalizeStatementCacheProxy.
> - Add Statement::AsyncFinalize. Its semantics are to clean up the
> synchronously-held statement as well as any any asynchronously held
> statement, if present.
> - Make StatementCache have an AsyncFinalizeStatements helper.  Have it call
> AsyncFinalize.

The StatementCache that requires FinalizeStatementCacheProxy is created and lives on the async thread; FinalizeStatementCacheProxy exists just because of that, to avoid handling this object on the wrong thread. We don't have any problem with the other StatementCache on the main thread, for which we can just call FinalizeStatements().
I'm not sure how AsyncFinalize would solve the fact that we have a StatementCache that should be finalized on the async thread. The only solution I see is spinning the event loop until FinalizeStatementCacheProxy is invoked, then calling AsyncClose(), though this may end up happening too late in the shutdown process.
(In reply to Marco Bonardo [:mak] from comment #13)
> Unfortunately, by making mDBConn nullable on both threads, we already added
> an additional weird state. It may change on any of the two threads, but we
> never protected mDBConn for that. This issue should be solved regardless of
> the path we take here. The fact solving that would also solve this issue,
> was sort of a side note.

mDBConn gets zeroed by Connection::internalClose.  There's 2 ways in which internalClose gets called:

1) By a normal/synchronous connection that never did anything asynchronous and calls Close().  If it tries to call Close() and something asynchronous happened, we bail with NS_ERROR_UNEXPECTED.  In this case mDBConn gets zeroed on the main thread.  By definition there is no asynchronous thread.

2) By the AsyncCloseConnection triggered by AsyncClose.  By definition this happens on the async thread.


So I'm not sure I see what you mean when you say either thread can nullify mDBConn.  There is exactly one thread we will call internalClose on based on the usage of the connection.  This decision is made on the main thread when it spins up an asynchronous execution thread, for which there inherently can be no race.

Are you talking about the isClosing stuff?  Because that's also a state change that happens on the main thread too.


It's possible I'm missing other things too.  I did not realize that the statement cache was initially created on the async thread, although as I think back, I probably was aware that :sdwilsh was going to use the asynchronous thread for something nefarious.

It sounds like :Yoric is pretty familiar with mozStorage and Places' usage of mozStorage now.  Given that I'm not up to the task of understanding Places' use of mozStorage and I'm no longer involved in Thunderbird as a mozStorage consumer, now might just be a good time for me to step down as a peer and :Yoric to step up.  My underlying concern about the patch is the amount of effort it takes for me to make sure I'm understanding what's going on in storage; someone more actively involved is less likely to have that problem.
(In reply to Andrew Sutherland (:asuth) from comment #14)
> mDBConn gets zeroed by Connection::internalClose.  There's 2 ways in which
> internalClose gets called:
> 
> 1) By a a normal/synchronous connection that never did anything asynchronous
> and calls Close().  If it tries to call Close() and something asynchronous
> happened, we bail with NS_ERROR_UNEXPECTED.  In this case mDBConn gets
> zeroed on the main thread.  By definition there is no asynchronous thread.
> 
> 2) By the AsyncCloseConnection triggered by AsyncClose.  By definition this
> happens on the async thread.
> 
> 
> So I'm not sure I see what you mean when you say either thread can nullify
> mDBConn.  There is exactly one thread we will call internalClose on based on
> the usage of the connection.

That is right. Indeed we can nullify it on just one thread, but we read it from both threads (many synchronous APIs, isClosing...) to check that the connection is valid. The reads happening on the thread that is not setting it to null may see it as valid when it's just about to be invalidated (I think this is what Yoric was pointing out).
The only solution I see to this is that we move back to mDBConn being nulled always on the main thread (in setClosedState), and introduce a new async-thread-only property that changes when sqlite3_close is invoked. We may then have a ConnectionIsValid that, depending on the thread it is invoked on, returns !!mDBConn or the async property.

> I probably was aware that :sdwilsh was going to use the
> asynchronous thread for something nefarious.

Heh :)

> It sounds like :Yoric is pretty familiar with mozStorage and Places' usage
> of mozStorage now.  Given that I'm not up to the task of understanding
> Places' use of mozStorage and I'm no longer involved in Thunderbird as a
> mozStorage consumer, now might just be a good time for me to step down as a
> peer and :Yoric to step up.

Actually, your knowledge of the Storage (and general) multi-threading handling can't be matched by anyone, so we'd really lose precious information without you. I figure maybe my reply sounded a bit harsh, I'm sorry for that. I was trying to explain the situation in the fewest words possible, and ended up a bit too wild. Sorry :(

In my opinion the fact that we can reuse the async thread and have natural enqueueing with SQLite is a very nice feature, allowing a good mixture of hitting SQLite and doing work in the middle (not everything can be done with just queries). So it would be sad to lose it. And the API leaves space for that by the fact that it allows us to just enqueue work until asyncClose is invoked. That is why I think we should allow any work enqueued -before- asyncClose to complete correctly. Clearly from that point on, no new work should be allowed to enqueue or run on the main thread.
(In reply to Marco Bonardo [:mak] from comment #15)
> That is right. Indeed we can nullify it just in one thread, but we read it
> from both threads (many synchronous APIs, isClosing...) to check the
> connection is valid. The reads happening on the thread we are not setting it
> to null, may read it valid when it's just about to be invalidated (I think
> this is what Yoric was pointing out).

Ah, right, so I think we've got different expectations on the consumer code here.  isClosing() is intended to moot checking mDBConn; its use by internalFinalize in that weird aResultOnClosed path means we don't look at mDBConn at all.

But a fundamental assumption on my part is that when consumer code calls AsyncClose it will stop making API calls so there is no need for us to go super-paranoid about stopping the code from doing crazy things.  The exceptional case is destructors where garbage-collection/reference counting and their complicated interaction patterns means we can expect serious races that risk crashing us.  And that's why isClosing definitely gets used where it does.

I think it's clear this was a misguided/lazy assumption on my part.  Strictly speaking, any place we currently check mDBConn in a guard needs to actually be doing something like "if (isClosing(true) || !mDBConn)".  The mDBConn check covers initialization; the isClosing() check covers the falling edge.  The correct thing to do under the current regime is to make ConnectionReady have that logic and have the current ConnectionReady check in isClosing directly check mDBConn.  isClosing's mutex acquisition is a little weird (excessive?) to me right now and our class comment isn't amazing about its usage, but I think the async thread avoids holding it during I/O...


> The only solution I see to this, is that we move back mDBConn to be nulled
> always on the main-thread (in setClosedState), and introduce a new async
> thread only property that changes when sqlite_close is invoked. We may then
> have a ConnectionIsValid that, depending on the thread it is invoked on,
> returns !!mDBConn or the async property.

It sounds to me like you're saying:
- mDBConn gets swapped with null
- We do basically what LastDitchSqliteStatementFinalizer does and save off that sqlite3* and close it there.  Maybe it doesn't hold the bare pointer but there's just some other member on the class.

It seems like this would still break the existing Places usage, though.

I do agree that generally having a very well defined semantics for mDBConn being null/non-null and such would be a good simplification.


> > It sounds like :Yoric is pretty familiar with mozStorage and Places' usage
> > of mozStorage now.  Given that I'm not up to the task of understanding
> > Places' use of mozStorage and I'm no longer involved in Thunderbird as a
> > mozStorage consumer, now might just be a good time for me to step down as a
> > peer and :Yoric to step up.
> 
> Actually, your knowledge of the Storage (and general) multi-threading
> handling can't be paired by anyone, so we'd really lose precious information
> without you. I figure maybe my reply sounded a bit harsh, I'm sorry for
> that. I was trying to explain the situation in the fewer words possible, and
> ended up a bit too much wild. Sorry :(

I have zero problems with your response; it did not come across as harsh.  Apologies if I sounded reactionary.  My concern is that it takes me a lot of time to respond to these bugs and while I try and do a good job, I'm still cutting some corners as I'm feeling a lot of time/schedule pressure from my primary tasks in Firefox OS.  I worry that this is a net disservice to mozStorage since many of our problems/complexity with the threading and async stuff is due to me similarly cutting corners in my initial incremental development of the async functionality required by Thunderbird.

I am happy to stick around if I'm adding value.  For the future I'll try and see about providing patches as part of any review process to flesh out comments to the code to help both myself and others reading the code.
 
> In my opinion the fact we can reuse the async thread and have a natural
> enqueueing with Sqlite is a very nice feature, allowing a good mixture of
> hitting sqlite and doing work in the middle (not everything can be done with
> just queries). So it would be sad to lose it. And the API leaves space for
> that by the fact it allows to just enqueue work until asyncClose is invoked.
> Thus is why I think we should allow any work enqueued -before- asyncClose to
> complete correctly. Clearly from that point on, no new work should be
> allowed to enqueue or run on the main-thread.

It is nice, it's just potentially more complex than my proposed band-aid that requires less re-evaluation of stuff :).  I think if you can clarify your proposal to simplify mDBConn in a way that lets Places keep doing its thing and we can maybe augment the code comments a little to explain why we aren't adding a mutex hazard / the exact life-cycle of mDBConn, that would probably be best for all involved parties.
I will try to make a patch and we could hammer its edges then.
(In reply to Andrew Sutherland (:asuth) from comment #14)
> It sounds like :Yoric is pretty familiar with mozStorage and Places' usage
> of mozStorage now.  Given that I'm not up to the task of understanding
> Places' use of mozStorage and I'm no longer involved in Thunderbird as a
> mozStorage consumer, now might just be a good time for me to step down as a
> peer and :Yoric to step up.  My underlying concern about the patch is the
> amount of effort it takes for me to make sure I'm understanding what's going
> on in storage; someone more actively involved is less likely to have that
> problem.

I'm starting to become familiar with it, but my understanding of the code is not nearly as good as that of either of you. I'm willing to try getting more involved, but I'll need considerable help.
Blocks: fxdesktopbacklog
No longer blocks: fxdesktoptriage
Whiteboard: p=0
Whiteboard: p=0 → p=5 s=it-31c-30a-29b.2
Hi mak, if this is something you think QA should be testing and verifying, can you give us some advice on how to do best do that?  If you think it isn't, also let us know here so that we can mark the bug [qa-] for this iteration. Thanks!
Flags: needinfo?(mak77)
Whiteboard: p=5 s=it-31c-30a-29b.2 → p=5 s=it-31c-30a-29b.2 [qa?]
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
(In reply to Liz Henry :lizzard from comment #19)
> Hi mak, if this is something you think QA should be testing and verifying,
> can you give us some advice on how to do best do that?  If you think it
> isn't, also let us know here so that we can mark the bug [qa-] for this
> iteration. Thanks!

I don't think there's anything QA needs to verify here, it's very trivial for us to check the warnings disappear, and it's very non-trivial to verify a possible thread-safety issue.
Flags: needinfo?(mak77)
Whiteboard: p=5 s=it-31c-30a-29b.2 [qa?] → p=5 s=it-31c-30a-29b.2 [qa-]
Attached patch patch v2 (obsolete) — Splinter Review
ok, this is sort-of what I had in mind (please ignore the Places changes, those are already r+ by David):

1. setClosedState goes back to its origins: it sets mDBConn to nullptr, so that now this only happens on the opened thread
2. isClosing and connectionReady can now only be used on the opened thread, since they rely on mDBConn
3. due to 1), I pass the raw connection handle over to finish the closing work (either sync or async closing)
4. a new isConnectionClosed tracks whether sqlite3_close has been invoked already
5. isConnectionClosed may be invoked from both threads, so there are various possibilities here:
5a. make it call connectionReady on the main thread and use its own tracking property on the async thread. I don't like this much because it makes its behavior variable and error-prone, but it avoids using a mutex.
5b. provide its own mConnectionClosedMutex. If we can avoid further complication, I'd say we should try.
5c. re-use sharedAsyncExecutionMutex; this is what I did. It should be fine considering this code runs on only one of the two threads, and very late in shutdown, but I'd really love Andrew's feedback about this.
Attachment #8403301 - Flags: feedback?(dteller)
Attachment #8403301 - Flags: feedback?(bugmail)
Comment on attachment 8403301 [details] [diff] [review]
patch v2

Review of attachment 8403301 [details] [diff] [review]:
-----------------------------------------------------------------

> 2. isClosing and connectionReady can now only be used on the opened thread, since they rely on mDBConn

That doesn't seem to be the case for your implementation of isClosing()

> 5c. re-use sharedAsyncExecutionMutex, this is what I did, it should be fine considered this code runs only one of the two threads, and very late on shutdown, but I'd really love Andrew's feedback about this.

I don't remember much about this mutex, but this makes me a little nervous about 1/ deadlocks; 2/ unexpected contentions. I would be less nervous if we had a less widely used mutex.

::: storage/src/mozStorageConnection.cpp
@@ -826,5 @@
>  
> -  // Set the property to null before closing the connection, otherwise the other
> -  // functions in the module may try to use the connection after it is closed.
> -  sqlite3 *dbConn = mDBConn;
> -  mDBConn = nullptr;

So, what happens now if we try to use mDBConn after the connection is closed?
Attachment #8403301 - Flags: feedback?(dteller) → feedback+
:mak, in comment 13 you seem to be making the argument that synchronous-callers should be able to go crazy using the synchronous APIs on the async thread before the sqlite3* is closed for realsies.  I want to make sure that we're all on the same page that this is not what you're implementing with this patch.

This *does* make Statement::Finalize work in the way that FinalizeStatementCacheProxy needs, since it does not look at mDBConn and just directly does ::sqlite3_finalize(mDBStatement);

But everything else likes to check mDBConn and will explode.  For example, Statement::ExecuteStep uses Connection::stepStatement which will explode.  Of course Statement::GetReady is potentially made of lies and I think there are potential crashers all over the place in the binding and result retrieval stages anyways.

So I'm fine with this limitation, because it's sane, but it does mean that we need to be absolutely sure that *the only thing Places ever does on the async thread is finalize statements*.  If it is running statements on the async thread synchronously, then one of the following should happen:
- it needs some type of barrier so it doesn't call AsyncClose before they complete
- we need to make the synchronous API not freak out if mDBConn has already been cleared
(Because our async execution target is guarded and we'll assume Places is not saving it off, it's generally fine for the sync APIs to do whatever they want as long as they can survive mDBConn being null.)


(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #22)
> > 5c. re-use sharedAsyncExecutionMutex, this is what I did, it should be fine considered this code runs only one of the two threads, and very late on shutdown, but I'd really love Andrew's feedback about this.

I just audited the mutex.  Before the patch, sharedAsyncExecutionMutex protects:
- Connection.mAsyncExecutionThreadShuttingDown
- Connection.mAsyncExecutionThread
- AsyncExecuteStatements.mCancelRequested

After the patch, sharedAsyncExecutionMutex protects:
- Connection.mAsyncExecutionThreadShuttingDown
- Connection.mAsyncExecutionThread
- Connection.mConnectionClosed
- AsyncExecuteStatements.mCancelRequested

(It would probably be good to explicitly list these in the mutex's comment.  And maybe also note in the variable comments that they are guarded by sharedAsyncExecutionMutex where not already noted.  (And thanks for fixing that out-of-date comment citing its previous name already!))

The big risk in the patch is that nulling mDBConn on the main thread makes the async thread unhappy.  That does seem to be a bug with the patch currently.  AsyncExecuteStatements uses GetNativeConnection() to retrieve the sqlite3* for execution and error-code retrieval.  GetNativeConnection is just a wrapper around fetching mDBConn.  You either need to give AsyncExecuteStatements the bare sqlite3* every time or save mDBConn off somewhere else that the async thread can grab it, like in mAsyncDBConn or something.  I think giving the bare sqlite3* may be the least head-hurty option and consistent with the other changes in the patch.
Comment on attachment 8403301 [details] [diff] [review]
patch v2

I think this looks pretty good and is likely good to go once the GetNativeConnection issue is addressed.  I would like to know the answer to the question about whether there is any potential for Places to be doing synchronous things other than Finalize on the async thread in such a way that they will race AsyncClose.

I should be able to turn around a review on this pretty quickly now, assuming you're not addressing any more than that.
Attachment #8403301 - Flags: feedback?(bugmail) → feedback+
Whiteboard: p=5 s=it-31c-30a-29b.2 [qa-] → p=5 s=it-31c-30a-29b.3 [qa-]
(In reply to Andrew Sutherland (:asuth) from comment #23)
> :mak, in comment 13 you seem to be making the argument that
> synchronous-callers should be able to go crazy using the synchronous APIs on
> the async thread before the sqlite3* is closed for realsies.  I want to make
> sure that we're all on the same page that this is not what you're
> implementing with this patch.

Yes, I still think we should allow any work in the async thread to complete before we actually close... though that is becoming scope creep for this bug. So for now I'd be fine fixing this specific case and living with the assumption that some data may be lost.

> But everything else likes to check mDBConn and will explode.  For example,
> Statement::ExecuteStep uses Connection::stepStatement which will explode.

By explode, do you mean fail, or crash? Because to me it looks like it would throw SQLITE_BUSY, and the callers should be handling errors already.

> we need to be absolutely sure that *the only thing Places ever does on the
> async thread is finalize statements*.

As I said, it may end up trying to do some work late, and we are fine with losing that so far. Clearly, I agree we need more guards against null derefs.
 
> The big risk in the patch is that nulling mDBConn on the main thread makes
> the async thread unhappy.  That does seem to be a bug with the patch
> currently.  AsyncExecuteStatements uses GetNativeConnection() to retrieve
> the sqlite3* for execution and error code retrieval.  GetNativeConnection is
> just a wrapper around fetching mDBConn.

My thoughts about this are a little bit wider:
- I think the implicit sqlite3 *() operator is a footgun. It basically hides GetNativeConnection(), making it hard to audit call points in MXR and making the code more obscure. I'd like to remove it and make its callers use GetNativeConnection.
- GetNativeConnection is sort-of abused. While it protects the code from mDBConn null-derefs, we should reduce its usage to a bare minimum (or even completely remove it). Most of the cases seem to require it just to get an error message, for which we may pass the handle, provide a Connection::GetErrMsg(), or try to use ::sqlite3_db_handle wherever possible.

And actually, I could maybe have just used ::sqlite3_db_handle for finalization from the beginning, avoiding all of this:
sqlite3 *conn = ::sqlite3_db_handle(mDBStatement);
if (conn) { /* finalize */ }

This makes for a very simple patch and sounds like it would do what we want here, without scope-creeping too much. If this works for you, I could file the other changes as follow-ups, because I still think it may be worth removing some potential pitfalls from the code.
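For illustration, a toy model of that finalization pattern — NativeStatement and dbHandle() are stand-ins for sqlite3_stmt* and ::sqlite3_db_handle(), not the real SQLite API: the owning connection is recovered from the statement itself, so finalization no longer depends on Connection::mDBConn still being set.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for sqlite3 / sqlite3_stmt; dbHandle() models
// ::sqlite3_db_handle(), which returns the statement's owning connection.
struct NativeConnection { bool open = true; };
struct NativeStatement {
  NativeConnection* owner = nullptr;
  bool finalized = false;
};

NativeConnection* dbHandle(NativeStatement* aStmt) {
  return aStmt ? aStmt->owner : nullptr;
}

// The proposed pattern: ask the statement for its connection, and only
// finalize if a connection is there, instead of reading mDBConn.
void finalizeStatement(NativeStatement* aStmt) {
  NativeConnection* conn = dbHandle(aStmt);
  if (conn) {
    aStmt->finalized = true;  // models ::sqlite3_finalize(aStmt)
  }
}
```

This only sketches the control flow; as noted below, a null handle on a statement we still own would actually mean freed memory, which is why the explicit isClosing/isClosed tracking was preferred.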
(In reply to Marco Bonardo [:mak] from comment #25)
> (In reply to Andrew Sutherland (:asuth) from comment #23)
> > But everything else likes to check mDBConn and will explode.  For example,
> > Statement::ExecuteStep uses Connection::stepStatement which will explode.
> 
> By explode, do you mean fail, or crash? Cause to me looks like it would
> throw SQLITE_BUSY, the callers should be handling errors already.

I did a limited analysis.  There were definitely places that checked mDBConn and would error out.  There are definitely places where the already-freed sqlite3 data structures are theoretically accessible from rampaging rogue code because of a lack of defensive checks.  I think Places' uses are likely limited to erroring out when they should not.  (Truly rogue code would need to keep making calls after the actual close occurred.)
 
> - I think the implicit sqlite3 *() operator is a foot gun. it basically
> hides GetNativeConnection(), making hard to audit call points in mxr, and
> making code more obscure. I'd like to remove it and make its callers use
> GetNativeConnection

Agreed.

> - GetNativeConnection is sort-of abused. In spite of protecting the code
> from mDBConn null-deref, we should reduce its usage to a bare minimum (or
> even completely remove it). Most of the cases seem to require it to get an
> error message, for which we may pass the handle, provide a
> Connection->GetErrMsg() or try to use ::sqlite3_db_handle wherever possible.

Both sound good.  It seems analytically desirable to make sure any place using ::sqlite3_db_handle has some assurance that !isClosed(), although practically speaking I'm not sure that's something we need to guard against.

> And actually, I could have maybe just used ::sqlite3_db_handle for
> finalization from the beginning, avoiding all of this:
> sqlite3 *conn = ::sqlite3_db_handle(mDBStatement);
> if (conn) { /* finalize */ }

Since we explicitly finalize all statements during our last-ditch finalization, !conn implies that you just accessed free()d memory and were lucky enough not to crash.  That's misleading.

> This makes a very simple patch and sounds like would do what we want here,
> without scope-creeping too much. If this would work for you, I could file
> the other changes as follow-ups, cause I still think it may be worth to
> remove some potential pitfalls from the code.

This works for me.  I do really like the explicit isClosing/isClosed logic for clarity and would still be happy for that to land.  Your call!
Attached patch Part 1 - v1 (obsolete) — Splinter Review
This part removes the implicit sqlite3* conversion operator and the GetNativeConnection() helper.

I'm passing the native connection pointer to:
- AsyncExecuteStatements::execute (that passes it over to AsyncExecuteStatements::AsyncExecuteStatements)
- AsyncStatement::initialize
- Statement::initialize (for symmetry with the async case)

All of these need the native connection just for calling ::sqlite3_errmsg; we can't just provide a helper in Connection for that, since the sqlite API call may run after we have nulled mDBConn.
I suppose we could avoid nulling mDBConn and use mConnectionClosed and isClosed() to track whether it's valid (which I'm introducing in part 2), though that would require changing all of the null checks we have on mDBConn, so I'd probably file it as a follow-up if we want to do that.
Attachment #8383787 - Attachment is obsolete: true
Attachment #8403301 - Attachment is obsolete: true
Attached patch part 2 - v2 (obsolete) — Splinter Review
And this is the same patch as before, with minor corrections. Applies on top of part 1 clearly...

I'm going to do a try run just to check I'm not hitting some unexpected assertions, then I will re-ask for review. Though if you have comments in the meanwhile, they are welcome.
(In reply to Marco Bonardo [:mak] from comment #27)
> I suppose we may avoid nulling mDBConn and use mConnectionClosed and
> isClosed() to track whether it's valid (that I'm introducing in part2),
> though that would require changing all of the null checks we have on
> mDBConn, so I'd probably file it as a follow-up if we want to do that.

Or we may have an "mDBConn2" just to feed GetLastErrorString (or a simpler LastErrorString() getter). This would be easier to implement, even if not very clean.
Attached patch Part 1 - v2Splinter Review
Attachment #8407800 - Attachment is obsolete: true
Attachment #8407804 - Attachment is obsolete: true
Attached patch Part 2 - v2 (obsolete) — Splinter Review
This seems to pass all tests (locally; now going through Try again). It's a few more changes than expected, because both stepStatement and prepareStatement were either trying to use mDBConn late (in common patterns like executeAsync(); asyncClose()) or their waitForUnlock callbacks were returning after mDBConn had been nulled out... So I ended up passing the native connection around more thoroughly... I added more isClosed() checks to guard against misuses.
Overall, I think it's not that bad that mDBConn usage is now more restricted to mozStorageConnection; it's just unfortunate we must pass the underlying connection around in so many places, when in most cases we need it for simple things like error messages.

Btw, I'd say these 2 parts are good for feedback; I'm mostly asking Andrew at this point since he knows the origins of the code I'm touching better, though if David has any comments, they are more than welcome.

Please consider this request global for the 2 parts.
Attachment #8408460 - Flags: feedback?(bugmail)
Comment on attachment 8408460 [details] [diff] [review]
Part 2 - v2

r=asuth contingent on a change to isClosing and some related stuff:

isClosing() as implemented under both patches is going to return false under all circumstances.  Its value is effectively:
  mAsyncExecutionThreadShuttingDown && (mDBConn != nullptr)

setClosedState() sets:
 mAsyncExecutionThreadShuttingDown = true;
 mDBConn = nullptr;
within a few lines of each other.

I think isClosing wants to effectively be: 
  mAsyncExecutionThreadShuttingDown && !isClosed()


This is consistent with our only consumer, the xpcom-shutdown-threads observer logic, whose intent is to give anything that's closing a chance to finish closing down.  Note that it has an outdated/incorrect comment about us only giving async connections a chance.  Synchronous connections have a chance; they just don't need any help.

That logic is then followed by the SCM_CRASH logic.  It uses ConnectionReady, which is correct-ish if we've fixed isClosing(), but arguably !isClosed() is a more accurate check (although I don't super care).
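A minimal sketch of the corrected logic (a simplified model, not the real mozStorage code; note the flags are read directly under a single lock acquisition, since having isClosing() call isClosed() would try to take the mutex twice):

```cpp
#include <cassert>
#include <mutex>

// Simplified model of the connection lifecycle flags discussed above.
class Connection {
public:
  bool isClosed() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mConnectionClosed;
  }

  // The corrected check: "a close was requested but has not completed yet".
  // Comparing against mDBConn instead would always yield false, because
  // setClosedState() nulls mDBConn at the same time it sets the flag.
  bool isClosing() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mAsyncExecutionThreadShuttingDown && !mConnectionClosed;
  }

  // Main thread: asyncClose() was requested.
  void setClosedState() {
    std::lock_guard<std::mutex> lock(mMutex);
    mAsyncExecutionThreadShuttingDown = true;
  }

  // Async thread: the underlying handle is now really closed.
  void connectionClosed() {
    std::lock_guard<std::mutex> lock(mMutex);
    mConnectionClosed = true;
  }

private:
  std::mutex mMutex;  // models sharedAsyncExecutionMutex
  bool mAsyncExecutionThreadShuttingDown = false;
  bool mConnectionClosed = false;
};
```

With this, the shutdown observer can give connections where isClosing() is true a chance to finish, and skip ones that are already fully closed.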


Thanks for all the cleanup effort here!  This makes my brain much happier!
Attachment #8408460 - Flags: feedback?(bugmail) → review+
Thank you Andrew! I will address your comments shortly.

Actually, the patch already found its first bug: minimizeMemory works on the main thread even if the connection was created on another thread. I'm going to file a dependency and work on it separately (likely just dispatching a runnable in this specific case). This bug also affects B2G.

And test_cookies_async_failure.js is failing (hitting the SQLITE_MISUSE case in createStatement, I think). I must investigate this; off-hand it looks like it may be an actual bug in the code, but let's see.
Depends on: 998330
OK, after a long debugging session, it looks like the cookies problem is that
AsyncExecuteStatements::notifyComplete() may end up calling into Connection::CommitTransaction() after asyncClose() has been invoked; at that point mDBConn is nullptr and CommitTransaction returns an error, which ends up invoking handleError. The cookies service, on any handleError, assumes the database is corrupt, and we end up in a strange situation with an invalid database.

Moreover, mozStorageAsyncTransaction calls directly into Connection::CommitTransaction, which is the public CommitTransaction API (it can't work without mDBConn), and CommitTransaction internally uses ExecuteSimpleSQL (again a public API). We may have to handle transactions differently in the async case :(
Attached patch part 3 - v1Splinter Review
This is the fix to allow transaction management in async statement execution. We don't actually need the full mozStorageTransaction helper, since the logic is quite serialized here.

I'm now going to fix bug 998330, but that should be the "easy" part.
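A rough sketch of the idea (a toy model, not the actual patch: NativeConnection just records the SQL it runs): the executor drives BEGIN/COMMIT/ROLLBACK directly on the native handle it captured, instead of going through the Connection's transaction API, which requires mDBConn and would fail after asyncClose().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for sqlite3*: records the SQL statements it executes.
struct NativeConnection {
  std::vector<std::string> executed;
  void exec(const std::string& aSql) { executed.push_back(aSql); }
};

// The async executor manages the transaction itself, on the handle it
// owns, so notifyComplete() never needs Connection::CommitTransaction.
class AsyncExecutor {
public:
  explicit AsyncExecutor(NativeConnection* aNativeConn)
    : mNativeConn(aNativeConn) {}

  void begin() {
    mNativeConn->exec("BEGIN TRANSACTION");
    mInTransaction = true;
  }

  // Commit on success, roll back on failure; either way the transaction
  // is resolved on the async thread before the handle is closed.
  void notifyComplete(bool aSucceeded) {
    if (!mInTransaction) {
      return;
    }
    mNativeConn->exec(aSucceeded ? "COMMIT" : "ROLLBACK");
    mInTransaction = false;
  }

private:
  NativeConnection* mNativeConn;
  bool mInTransaction = false;
};
```

The serialized flow (begin, run statements, resolve once) is why the full mozStorageTransaction helper is unnecessary here.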
Attachment #8410589 - Flags: review?(bugmail)
Attached patch Part 2 - v3 (obsolete) — Splinter Review
address comments.
Attachment #8408460 - Attachment is obsolete: true
Attached patch Part 2 - v4Splinter Review
oops, isClosing() was now trying to acquire the mutex twice.
Attachment #8410597 - Attachment is obsolete: true
Attachment #8410589 - Flags: review?(bugmail) → review+
This is a green Try run with the 3 parts:
https://tbpl.mozilla.org/?tree=Try&rev=99b0689cc632

I had to remove the assertion check from connectionReady; see https://bugzilla.mozilla.org/show_bug.cgi?id=998330#c3 for the reasoning. I think I will file a separate bug to investigate the approach, but it's out of scope here, and by doing this I'm not really introducing any new hazards that were not already there.
By removing the assertion, bug 998330 is no longer mandatory to land this, so I will fix it later (there's a leak I must investigate there).
Status: RESOLVED → VERIFIED
Depends on: 1002891