Last Comment Bug 867798 - Cookie Service closes DB on main thread
: Cookie Service closes DB on main thread
Status: ASSIGNED
[necko-backlog]
:
Product: Core
Classification: Components
Component: Networking: Cookies (show other bugs)
: unspecified
: All All
: -- normal with 1 vote (vote)
: ---
Assigned To: Jason Duell [:jduell] (needinfo me)
:
:
Mentors:
Depends on: OMTConnectionClose
Blocks: AsyncConnConversion
  Show dependency treegraph
 
Reported: 2013-05-01 15:38 PDT by Vladan Djeric (:vladan)
Modified: 2016-02-19 13:55 PST (History)
18 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
v1: close syncConn asynchronously (1.12 KB, patch)
2013-05-02 18:56 PDT, Jason Duell [:jduell] (needinfo me)
mak77: review-
Details | Diff | Splinter Review

Description Vladan Djeric (:vladan) 2013-05-01 15:38:49 PDT
We're closing the cookie DB on the main thread after reading it in. Can we do an async close to prevent main-thread jank?

Profile: http://people.mozilla.com/~bgirard/cleopatra/#report=3749141e55c007c281f874795d9d0cd6cd751e15&search=ReadCookieDBListener
Comment 1 Doug Turner (:dougt) 2013-05-02 15:05:49 PDT
mak, asuth, what can be done here.  Why is the db being closed at all during normal running of ff?
Comment 2 Jason Duell [:jduell] (needinfo me) 2013-05-02 18:39:56 PDT
:vladan, the profile you link to confuses me.  Am I correct in reading it as saying that 100% of the time here is in the bottom call (NtClose)?  Just how long is the jank here, and when is it happening (shutdown)?

I think I've got the fix--patch coming...
Comment 3 Jason Duell [:jduell] (needinfo me) 2013-05-02 18:56:01 PDT
Created attachment 744961 [details] [diff] [review]
v1: close syncConn asynchronously

bent: one-line patch (with long-winded explanation :)  

So what's going on here is that we do this stupid thing with the cookie code where we

1) at first use of the cookie service we start an async load of the entire DB into memory (I think we should do this much earlier in startup and skip step 2).

2) we then issue a sync DB request (on a different db connection) to get just the single cookie that's immediately needed.  You and I had a long IRC chat where we concluded that it's likely this sync request winds up actually blocking for the async query to complete :(

3) we then null out the syncConn connection pointer, and that's currently causing the connection's Release function to call synchronous DB close on the main thread (see stack in profile):

   http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#436

   http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#413

So this patch just calls AsyncClose on the syncConn before we release it, which should make the close happen off-main thread. (Or specifically, it should cause the connection to be flushed off main thread, which will make the actual main thread close() much faster.  Or so is my weak understanding of mozStorageConnection):

   http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#347

I'm kind of surprised that the synchronous close of the sync conn is taking so much time.  I'd expect, for instance, that the blocking wait for the sync cookie SQL lookup would take longer. In any case, I guess this should help.

bent: is there any reason we can't make the connection::Release logic call AsyncClose(), so we don't hit sync Close() by default?  Seems like a jank footgun.
Comment 4 Jason Duell [:jduell] (needinfo me) 2013-05-02 19:03:03 PDT
bent: oh, and IIRC we open all sqlite connections (or at least the cookie ones) on the main thread, which does main thread I/O.  Could we move that to another thread?
Comment 5 Vladan Djeric (:vladan) 2013-05-02 22:46:17 PDT
(In reply to Jason Duell (:jduell) from comment #2)
> :vladan, the profile you link to confuses me.  Am I correct in reading it as
> saying that 100% of the time here is in the bottom call (NtClose)? 

The profile I linked to is filtered on "ReadCookieDBListener". You can see the full profile with samples before & after NtClose by removing the "search" argument from the URL:

http://people.mozilla.com/~bgirard/cleopatra/#report=3749141e55c007c281f874795d9d0cd6cd751e15

> Just how long is the jank here, and when is it happening (shutdown)?

The sampling frequency is 50 ms (see info in panel on the left) and there are 58 samples with this stack, so 58 * 50 ms = 2.9 seconds to perform NtClose

Note that I captured this profile while running Firefox with its profile directory on an SD card:

https://blog.mozilla.org/vdjeric/2013/05/01/running-nightly-23-from-an-sd-card/
Comment 6 Jason Duell [:jduell] (needinfo me) 2013-05-03 17:06:39 PDT
Comment on attachment 744961 [details] [diff] [review]
v1: close syncConn asynchronously

Review of attachment 744961 [details] [diff] [review]:
-----------------------------------------------------------------

bent suggests :mak for advice here.  I currently suspect that this patch won't actually do very much, since it's the actual NtClose that's blocking for 2.9 seconds, not anything that's flushing the db connection first.  I'm guessing the right fix here is to close the the connection on a different thread.  If that's right just r- this patch and I'll write a new one.
Comment 7 Jason Duell [:jduell] (needinfo me) 2013-05-03 17:09:20 PDT
FYI: opening the DB connection is also blocking I/O on main thread, but it only causes a single sample delay, so 50ms:

   http://people.mozilla.com/~bgirard/cleopatra/#report=3749141e55c007c281f874795d9d0cd6cd751e15&search=TryInitDB
Comment 8 :Ehsan Akhgari 2013-05-04 12:27:03 PDT
(In reply to comment #7)
> FYI: opening the DB connection is also blocking I/O on main thread, but it only
> causes a single sample delay, so 50ms:
> 
>   
> http://people.mozilla.com/~bgirard/cleopatra/#report=3749141e55c007c281f874795d9d0cd6cd751e15&search=TryInitDB

But that could become a much worse jank on other systems if the I/O subsystem is busy doing other things, etc.  We should fix that too, but perhaps in a different bug.
Comment 9 Vladan Djeric (:vladan) 2013-05-06 19:10:00 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> (In reply to comment #7)
> > FYI: opening the DB connection is also blocking I/O on main thread, but it only
> > causes a single sample delay, so 50ms:
> But that could become a much worse jank on other systems if the I/O
> subsystem is busy doing other things, etc.  We should fix that too, but
> perhaps in a different bug.

Yoric is working on making this possible in bug 702559
Comment 10 Marco Bonardo [::mak] 2013-05-09 07:19:11 PDT
Comment on attachment 744961 [details] [diff] [review]
v1: close syncConn asynchronously

Review of attachment 744961 [details] [diff] [review]:
-----------------------------------------------------------------

Calling AsyncClose, despite the name, won't help you here.
It just enqueues an event to the async thread that, when executed, enqueues another event to the main thread, that does the actual closing. So you would just be delaying the hang. Would be mandatory only if you had executed something async on that connection.
As said, Yoric is indeed working on a pure-async connection that can do open and close off main-thread, though again that won't help us here, since it doesn't do that for synchronous connections, that is our case.
May we evaluate closing synchronous connections off main-thread? well, it could be feasible I suppose.

What we can do for now is checking if there are obvious reasons for close to take this much.
Since cookies are using wal, they should not block each other. The wal should also not be merged when we close this sync connection, since according to the documentation it's merged when the last connection closes, and at the point this one is closed the async one is still open.  So doesn't look like an fsync.

One thing we may try is to make the synchronous connection read-only, while I'm not sure that will change much, it should conceptually reduce the needed synchronization work on closing it.
We should also ensure all prepared statements are finalized (or nullified, that is the same thing due to the destructor) before the connection is closed, this stmt is not http://mxr.mozilla.org/mozilla-central/source/netwerk/cookie/nsCookieService.cpp#2180 would be enough to scope it.  Again, I'm not sure this helps, but it's surely wrong atm.

I took a look at the code in cookies service and couldn't find other misuses, so far.
That said, looks like all of the time is spent by the OS in closing the handle, so I'm not sure we can anything else than evaluating moving all sqlite_close calls off main-thread.
Comment 11 Jason Duell [:jduell] (needinfo me) 2013-05-09 11:29:27 PDT
marco, thanks for looking at this.

Since this is one-time jank that only occurs on really fairly crappy storage, I don't think it's a top priority.

I see two ways to fix this when the time is right.  

1) we can wait for bug 702559 and we'll get async open/close "for free".

2) we should move the cookie service init to an earlier part of startup, since even if we use async DB calls we'll still block the main thread during the 1st channel load.  I filed bug 870460 for that.

For now I'll mark this depending on 702559.  If we do #2 this would become INVALID.
Comment 12 Marco Bonardo [::mak] 2013-05-10 05:23:41 PDT
(In reply to Jason Duell (:jduell) from comment #11)
> marco, thanks for looking at this.
> 
> Since this is one-time jank that only occurs on really fairly crappy
> storage, I don't think it's a top priority.
> 
> I see two ways to fix this when the time is right.  
> 
> 1) we can wait for bug 702559 and we'll get async open/close "for free".

As I said, that bug won't help us, since it's only for async connections, that means you open a connection and can only run async operations on it. Here we have a synchronous connection on main-thread.

What you want is to make sync connections (that live on main-thread) call sqlite_close on another thread.  For this we need a separate bug, that may probably re-use some code from bug 702559, but it's not the same thing.
Comment 13 Marco Bonardo [::mak] 2015-04-09 03:35:13 PDT
Now we can use asyncClose on a synchronous connection, and it will close on the async thread.
the suggested patch should do, but we could also try to make syncConn a readonly clone of dbConn and ensure stmtReadDomain is finalized before asyncClose.

Note You need to log in before you can comment on or make changes to this bug.