Shrink Sqlite memory on the async thread when possible.

RESOLVED FIXED in Firefox 56

Status

()

defect
P2
normal
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: mak, Assigned: dthayer, Mentored)

Tracking

(Blocks 2 bugs)

unspecified
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [good first bug][lang=cpp][qf:p1])

Attachments

(1 attachment)

for mixed connections we currently execute shrink memory on the main-thread, we could convert mAsyncExecutionThreadIsAlive to release and use that to execute on the async thread.
Places.sqlite takes about 200ms to shrink memory, that's a long time on the main-thread.
Priority: -- → P2
Whiteboard: [good first bug][lang=cpp] → [good first bug][lang=cpp][qf]
Whiteboard: [good first bug][lang=cpp][qf] → [good first bug][lang=cpp][qf:p1]
Does the Places cleanup work happen at idle time? If so, this shouldn't be qf:p1.
Flags: needinfo?(mak77)
it happens on idle AND on the "memory-pressure" observer topic (at that time it happens for all the known connections, included Places obviously)
Flags: needinfo?(mak77)
Assignee: nobody → dothayer
Marco, I couldn't find the cleanup that happens on idle. I did find this: http://searchfox.org/mozilla-central/source/toolkit/modules/Sqlite.jsm#660, which happens on connection idle if shrinkMemoryOnConnectionIdleMS is specified, but that doesn't seem to be passed in anywhere outside of test code. Am I missing something?
Flags: needinfo?(mak77)
(In reply to Doug Thayer [:dthayer] from comment #4)
> Marco, I couldn't find the cleanup that happens on idle.

What you found is correct, indeed I suspect that what happens here is just a consequence of the "memory-pressure" notification, that is very likely to hit us on 32bit systems (especially win32). In case of memory-pressure Storage will invoke minimizeMemory() that goes through all the known connections and runs the shrink memory pragma on them. If a connection is synchronous (majority atm) that shrinking happens on the main-thread even if the connection has an async thread available.

We don't seem to make use of the idle helper yet, so I was probably just remembering a plan that never concretized. We don't minimize memory on idle. On the other side the Sqlite.jsm helper would already run that pragma off the main-thread.
Flags: needinfo?(mak77)
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review149710

Thanks for looking into this.

::: storage/mozStorageConnection.cpp:943
(Diff revision 1)
>  }
>  
> +bool
> +Connection::isAsyncExecutionThreadAvailable()
> +{
> +  return mAsyncExecutionThreadIsAlive && ! mAsyncExecutionThreadShuttingDown;

Both properties should be accessed while keeping a lock on sharedAsyncExecutionMutex, and the current code doesn't seem to do that.

Here you sould probably add a  sharedAsyncExecutionMutex.AssertCurrentThreadOwns(); and add a MutexAutoLock when accessing isAsyncExecutionThreadAvailable in the else if scope.

You should also update the sharedAsyncExecutionMutex and mAsyncExecutionThreadIsAlive javadocs to point that out. And add a javadoc to isAsyncExecutionThreadAvailable stating what it does and the mutex lock requirement.

::: storage/mozStorageService.cpp:368
(Diff revision 1)
>        DebugOnly<nsresult> rv =
>          conn->ExecuteSimpleSQLAsync(shrinkPragma, nullptr, getter_AddRefs(ps));
>        MOZ_ASSERT(NS_SUCCEEDED(rv), "Should have purged sqlite caches");
>      } else if (NS_SUCCEEDED(conn->threadOpenedOn->IsOnCurrentThread(&onOpenedThread)) &&
>                 onOpenedThread) {
> -      // We are on the opener thread, so we can just proceed.
> +    if (conn->isAsyncExecutionThreadAvailable()) {

maybe it's just a mozreview artifact, but the indentation here looks wrong.

::: storage/mozStorageService.cpp:376
(Diff revision 1)
> +        conn->ExecuteSimpleSQLAsync(shrinkPragma, nullptr, getter_AddRefs(ps));
> +      MOZ_ASSERT(NS_SUCCEEDED(rv), "Should have purged sqlite caches");
> +    } else {
>        conn->ExecuteSimpleSQL(shrinkPragma);
> +    }
> +      // We are on the opener thread, so we can just proceed.

This comment here doesn't work anymore, just remove it.
Attachment #8873622 - Flags: review?(mak77) → review-
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review149710

> Both properties should be accessed while keeping a lock on sharedAsyncExecutionMutex, and the current code doesn't seem to do that.
> 
> Here you sould probably add a  sharedAsyncExecutionMutex.AssertCurrentThreadOwns(); and add a MutexAutoLock when accessing isAsyncExecutionThreadAvailable in the else if scope.
> 
> You should also update the sharedAsyncExecutionMutex and mAsyncExecutionThreadIsAlive javadocs to point that out. And add a javadoc to isAsyncExecutionThreadAvailable stating what it does and the mutex lock requirement.

Just for my understanding, what's our policy on locks? Is it to use as little as possible or to try to be future/bullet proof? I was avoiding locking because mAsyncExecutionThreadIsAlive is only ever set to false from the main thread (or the owning thread when initializing), and mAsyncExecutionThreadShuttingDown is only ever set from the owning thread. Since we only call this from the main thread which is also the owning thread, it should be correct to not lock, no? (The worst case should be a false negative on mAsyncExecutionThreadIsAlive.) However if any of those subtleties change then this could break, so I could see the value in locking (also I could be missing something that I don't know about, so maybe just better safe than sorry.) In any case asserting seems like a good idea, but if there's any more insight you could give me into our policy on this that would be great :)
(In reply to Doug Thayer [:dthayer] from comment #7)
> I was avoiding
> locking because mAsyncExecutionThreadIsAlive is only ever set to false from
> the main thread

But it's not atomic. Though, if it's only always set and read from the owner thread, then we could avoid the lock if we'd only use a setter and a getter to modify and read from it, and those should assert that we are always on the same thread.
I agree this is less critic than mAsyncExecutionThreadShuttingDown, but we still need some way to check we would not abuse it in the future.

> (or the owning thread when initializing), and
> mAsyncExecutionThreadShuttingDown is only ever set from the owning thread.

mAsyncExecutionThreadShuttingDown is accessed from both threads, and it's not atomic as well. So afaict, in this case you need a lock regardless (or some other less expensive synchronization, but it seems pointless to overengineer a particular solution for just this fix) for your isAsyncExecutionThreadAvailable method.

> Since we only call this from the main thread which is also the owning
> thread, it should be correct to not lock, no? (The worst case should be a
> false negative on mAsyncExecutionThreadIsAlive.)

That'd be bad imo, a false negative could cause obscure bugs in future consumers of it, since they'd expect it to be correct. I can't predict how it will be used, but I'm sure it will be useful for other cases.
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review150042

r=me with a green(ish) Try run (at least xpcshell, cpptests, mochitests for the tier1 platforms).
Attachment #8873622 - Flags: review?(mak77) → review+
Quick update on this: the try run made it clear that something was deadlocking, and indeed there was a trivial (embarrassing) deadlock in that we hold the lock while calling ExecuteSimpleSQLAsync. I then did a follow-up run with this deadlock resolved, and I was still seeing a lot of orange ending on RunWatchdog, so something is clearly still amiss. In any case, just updating the thread, I'll continue digging into this in between other things but I just figured I should post in case there might be some obvious candidate for the failures, or if anyone has any suggestions on how to test this with more depth locally. I've currently just been calling heapMinimize(true) on nsIMemory.
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review151964

::: storage/mozStorageConnection.cpp:944
(Diff revision 2)
>  
> +bool
> +Connection::isAsyncExecutionThreadAvailable()
> +{
> +  sharedAsyncExecutionMutex.AssertCurrentThreadOwns();
> +  return mAsyncExecutionThreadIsAlive && !mAsyncExecutionThreadShuttingDown;

Disclaimer: I am sorry for dropping in at this late time.  I've been involved with writing or reviewing most of the changes around this code, and I still have to do some brain bootstrapping every time I look at it.  I actually skimmed this patch the other day and thought it was fine, but now that I'm looking at bug 1355561, I'm concerned about adding redundant state-tracking because it adds more interactions to consider and there are already so many in this code :(.  (Although having isAsyncExecutionThreadAvailable() is great!)

This could just be `return mAsyncExecutionThread;` (which will coerce to a bool, but we could also prefix with `!!` for clarity).  Sort-of like how our contract with mDBConn is that if it's around it's usable, same deal with mAsyncExecutionThread.

More specifically, mAsyncExecutionThreadIsAlive becomes true when mAsyncExecutionThread becomes non-null. And then mAsyncExecutionThread becomes null at the same time mAsyncExecutionThreadShuttingDown becomes true, making `!!mAsyncExecutionThread == (mAsyncExecutionThreadIsAlive && !mAsyncExecutionThreadShuttingDown)` equivalent in all cases.  mAsyncExecutionThreadIsAlive lives longer so we can have something to assert on at destruction time without holding onto the should-be-dead nsIThread.

This avoids needing to make the assertion-focused mAsyncExecutionThreadIsAlive into an additional/redundant piece of exposed state.

So my suggestion would be that we change the above and lose the parts of this patch that make mAsyncExecutionThreadIsAlive non-DEBUG if you're still touching this patch to deal with the orange.  I can also provide a follow-up patch if you're done and :mak agrees with me.
(In reply to Doug Thayer [:dthayer] from comment #11)
> I then did a follow-up
> run with this deadlock resolved, and I was still seeing a lot of orange
> ending on RunWatchdog, so something is clearly still amiss.

The deadlock is here:

 void
 Connection::shutdownAsyncThread(nsIThread *aThread) {
+  MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
   MOZ_ASSERT(!mAsyncExecutionThread);
   MOZ_ASSERT(mAsyncExecutionThreadIsAlive);
   MOZ_ASSERT(mAsyncExecutionThreadShuttingDown);
 
   DebugOnly<nsresult> rv = aThread->Shutdown();
                            ^^^

You're holding the lock while spinning the event loop, which is almost certainly not what you intended to do! :)

Happily, making the variable debug-only again also means removing this added lock and the source of the problem!
Er, I realized that wasn't descriptive enough.

AsyncCloseConnection::Run does NS_NewRunnableMethod so Connection::shutdownAsyncThread runs on the main thread, then it calls internalClose().  internalClose() attempts to acquire the mutex to set mConnectionClosed = true.  However, the shutdownAsyncThread got to the main thread and began executing really quickly, acquiring the lock before the async thread was able to acquire the lock.  The main thread is spinning until the async thread shuts down, which requires internalClose to complete, but the async thread is stuck in internalClose waiting for the lock, hence deadlock.
Thanks, Andrew! That did in fact resolve it. Marco, would you mind taking another look with the changes Andrew proposed? And I'm now not locking around the whole else if block, to avoid deadlocking on getAsyncExecutionTarget. I believe this should be all right though since AsyncClose is only called from the main thread?

Here's the latest try run. The orange looks like all intermittents to my eyes:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d073e935d927eb638698348b5a16982b18b9dce3
Flags: needinfo?(mak77)
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review152434

::: storage/mozStorageService.cpp:370
(Diff revision 3)
>        MOZ_ASSERT(NS_SUCCEEDED(rv), "Should have purged sqlite caches");
>      } else if (NS_SUCCEEDED(conn->threadOpenedOn->IsOnCurrentThread(&onOpenedThread)) &&
>                 onOpenedThread) {
> -      // We are on the opener thread, so we can just proceed.
> +      bool isAsyncExecutionThreadAvailable;
> +      {
> +        MutexAutoLock lockedScope(conn->sharedAsyncExecutionMutex);

you could probably move the autolock inside isAsyncExecutionThreadAvailable() and here just "if (isAsyncExecutionThreadAvailable())"

But, mAsyncExecutionThread is assigned and nullified on the opener thread, so potentially we may not need a lock at all here.
We could make isAsyncExecutionThreadAvailable callable only by the opener thread for now, through an assert, and not having a lock.
But I don't know if Andrew prefer to have a lock per the sharedAsyncExecutionMutex documentation. Close is getting a lock for mAsyncExecutionThread, even if doesn't seem necessary, and that code predates my historical knowledge.

I'd like to hear from Andrew here, since he may have knowledge about why Close needs the mutex.
Flags: needinfo?(mak77) → needinfo?(bugmail)
Flags: needinfo?(bugmail)
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review152434

> you could probably move the autolock inside isAsyncExecutionThreadAvailable() and here just "if (isAsyncExecutionThreadAvailable())"
> 
> But, mAsyncExecutionThread is assigned and nullified on the opener thread, so potentially we may not need a lock at all here.
> We could make isAsyncExecutionThreadAvailable callable only by the opener thread for now, through an assert, and not having a lock.
> But I don't know if Andrew prefer to have a lock per the sharedAsyncExecutionMutex documentation. Close is getting a lock for mAsyncExecutionThread, even if doesn't seem necessary, and that code predates my historical knowledge.
> 
> I'd like to hear from Andrew here, since he may have knowledge about why Close needs the mutex.

It looks like mAsyncExecutionThread was originally not protected by the mutex, but this changed in bug 496019 during a refactoring.

Before that patch we were calling mAsyncExecutionThread->Shutdown() and nulling it out on the async thread without holding the mutex in internalClose().  I probably wrote that bit and figured it was sound because we set mAsyncExecutionThreadShuttingDown immediately prior to that.  Without a mutex to act as a memory barrier the main thread would potentially see an inconsistent state there, but it hypothetically wouldn't matter because the shutting down flag would dominate.  I'm not sure that was totally sound given that nsCOMPtr operations aren't atomic.  These days, I certainly feel like it's appropriate to use mutexes at least as memory barriers for the compiler's benefit so it doesn't betray us.

I did some reasonably thorough code skimming and I think we're good in terms of all manipulations being on the owning thread.  getAsyncExecutionTarget is used in various places, but it seems to stick to the same rules.

In summary:
- :mak's plan sounds good.  It's fine to not protect it with the mutex as long as we assert what thread we're on.
- Let's update the sharedAsyncExecutionMutex documentation.  If you want to clean up all uses of mAsyncExecutionThread throughout the code to no longer use the guard and add similar assertions in those places, that would be fantastic and I'm happy to review.  But if you'd like to punt on that for now, I totally understand, let's just have the documentation say: "mAsyncExecutionThread was previously protected by this mutex but we realized we only read/write it on the main thread.  We're in the process of cleaning it up." or something to that effect.
er, "only read/write it on the main thread" should be "only read/write it on the owning thread".
(In reply to Andrew Sutherland [:asuth] from comment #19)
> er, "only read/write it on the main thread" should be "only read/write it on
> the owning thread".

I'm not sure if I'm missing something or not, but isn't it nullified on the main thread?

http://searchfox.org/mozilla-central/source/storage/mozStorageConnection.cpp#1240 -> http://searchfox.org/mozilla-central/source/storage/mozStorageConnection.cpp#1331
Flags: needinfo?(bugmail)
there is a bit of confusion between main-thread and opener thread since the async API started being stricter about checking for main-thread, but some older API still checks for opener thread. The async thread only makes sense for main-thread APIs.
Basically, the opener thread and the main thread FOR THE ASYNC API are the same thing, the code just needs cleanup...
Ah okay. Makes sense.
Flags: needinfo?(bugmail)
I mid-aired on this, but I think the following still holds.  (Although :mak is leaning towards main-thread assertions, which I think in my meandering comment below I agree makes the most sense until we have a fully thought-out thread ownership plan.)

(In reply to Doug Thayer [:dthayer] from comment #20)
> I'm not sure if I'm missing something or not, but isn't it nullified on the
> main thread?

I'm being overly pedantic and looking to the future.  Right now it's a given that the owning thread for a connection using the async API is the main thread.  There are a bunch of places we explicitly do things like NS_ReleaseOnMainThread() that bake it in.  That might change in the future[1].

Especially as we move towards more persistently stored data being managed on PBackground or by QuotaManager, it's desirable to try and separate things that must happen on the main thread (the observer service, XPConnect), and things that just usually tend to happen on the main thread.  Referring to owning thread instead of main thread is just a step towards that.

Also, it may be a dumb step to do prematurely.  I know when first was looking at various PBackground things I got frustrated/confused by everything saying owning thread instead of background thread.  I think it turns out a lot of the convention comes from the fact that NS_INLINE_DECL_REFCOUNTING and NS_INLINE_DECL_REFCOUNTING_WITH_DESTROY provide classes with NS_DECL_OWNINGTHREAD and thereby NS_ASSERT_OWNINGTHREAD "for free" (since it needs to save off the thread it was created on to assert at you if you addref/release on the wrong-thread with the non-threadsafe ref-counting).

Disclaimer: I've probably added a lot of NS_IsMainThread() assertions and !NS_IsMainThread() assertions because it's so easy to use and does cover 90% of our use-cases and potential mis-uses.

Note: Bug 1121282 covers how our current threadOpenedOn assertions may be too limiting already and suggest we can't/shouldn't change how we do thread assertions until that's fixed.

1: For example I've proposed that we could let ChromeWorkers use the SQLite.jsm API via WebIDL bindings to mozStorage to help avoid having the Firefox Sync functionality have to do so much on the main thread.  In that case, there would be an async thread and an owning thread that is not the main thread.
(In reply to Andrew Sutherland [:asuth] from comment #18)
> I did some reasonably thorough code skimming and I think we're good in terms
> of all manipulations being on the owning thread.  getAsyncExecutionTarget is
> used in various places, but it seems to stick to the same rules.

Hmm, what about StorageBaseStatementInternal::destructorAsyncFinalize()? It seems that is being called from an unknown thread and trying to figure out if it's on the async thread or not in order to either run the statement directly or dispatch it.
Flags: needinfo?(bugmail)
(In reply to Doug Thayer [:dthayer] from comment #24)
> Hmm, what about StorageBaseStatementInternal::destructorAsyncFinalize()? It
> seems that is being called from an unknown thread and trying to figure out
> if it's on the async thread or not in order to either run the statement
> directly or dispatch it.

Good eye!  I think in this case we want to invert the check from "am I on the async thread" to "am I on the owning thread".  If we're on the owning thread, then we are allowed to invoke getAsyncExecutionTarget() and do the dispatch if it exists.  (And if it doesn't exist, assume that the AsyncCloseRunnable is handling the finalization of the statement so we should take no other action other than nulling out mAsyncStatement.)  If we're not on the owning thread, we're allowed to assume we're on the async thread and run the last ditch finalizer directly.
Flags: needinfo?(bugmail)
(In reply to Marco Bonardo [::mak] from comment #5)
> What you found is correct, indeed I suspect that what happens here is just a
> consequence of the "memory-pressure" notification, that is very likely to
> hit us on 32bit systems (especially win32).

Just my 2p since I dealt with memory-pressure events for quite a while a couple of years ago. I don't have recent data on it but apparently we almost never triggered memory-pressure events on Win32; we tend to OOM because of fragmentation before we get to the memory-pressure threshold. I tried making memory-pressure events more common in bug 1301667 but it didn't work because we don't have a back-off mechanism to prevent a storm of events to be sent back-to-back if memory pressure doesn't ease before we check again. Long story short, check if we're actually hitting this often because we might not.
What do you thing re: gsvelto's comment, Andrew? I'm not sure of the best way to measure how often we are hitting it. I can't find stacks for this in our BHR data, which might be somewhat revealing. In any case I updated the patch per your suggestions, since I could see it having value outside of strong performance needs, and it was already almost there.
Flags: needinfo?(bugmail)
Summary: use mAsyncExecutionThreadIsAlive to shrink memory on the async thread when possible. → Shrink Sqlite memory on the async thread when possible.
I think the issues of how we trigger memory shrinking versus what we do when we memory shrink are separable.  I could imagine mozStorage using its own idle-timer-based memory shrinking as a more proactive approach.  (And a less dramatic one than SQLite.jsm tearing down the entire connection.)  It's also definitely the case that your cleanups are helpful, both in the fixes themselves, and in raising relevant and timely questions.  For example, I think :mak ran smack dab into the destructorAsyncFinalize problem you encountered in bug 1371945 as well, which I'm going to look into momentarily.
Flags: needinfo?(bugmail)
Blocks: 1371945
Doug, how far are we from landing this? We have a chain of Storage changes that we'd prefer to handle sequentially since they touch related code. Is there anything blocking you we could help with?
Hey Marco, I was just waiting on Andrew to review, since the patch had some moderately significant changes from his suggestions. Reading back I don't think I made that explicit though, so sorry about that. I'm ready to land it as soon as it's had a second pair of eyes though, so if you have the time to just take a look instead that works for me.
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review154764

yes, if Try is happy with this please land it, we will handle further work and cleanups in bug 1371945.

::: storage/mozStorageConnection.cpp
(Diff revision 4)
>    {
>      MOZ_ASSERT(NS_IsMainThread());
>    }
>  
>    NS_IMETHOD Run() override {
> -    MOZ_ASSERT (NS_GetCurrentThread() == mConnection->getAsyncExecutionTarget());

Could replace this with the opposite check? (!mDBConnection->threadOpenedOn->IsOnCurrentThread()) or MOZ_ASSERT(!NS_IsOnMainThread())

::: storage/mozStorageConnection.cpp:1337
(Diff revision 4)
> -    // We need to lock because we're modifying mAsyncExecutionThread
> -    MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
> -    closeEvent = new AsyncCloseConnection(this,
> -                                          nativeConn,
> +                                                              nativeConn,
> -                                          completeEvent,
> +                                                              completeEvent,
> -                                          mAsyncExecutionThread.forget());
> +                                                              mAsyncExecutionThread.forget());

check indentation here please, mozreview seems to think the .forget() call is not on its line.
I also like the patch! :)
Comment on attachment 8873622 [details]
Bug 1166166 - Shrink storage cache off main thread

https://reviewboard.mozilla.org/r/145010/#review154764

Ran this a couple days ago: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f05196c258ef6c88047ee288d2e9e2673111040

Not sure what the OSX failures are, running another OSX-specific run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a136a5d7af79f1a255348eafcc959ae6d2e6b5d

> check indentation here please, mozreview seems to think the .forget() call is not on its line.

I'm not actually sure what mozreview is thinking here. I double checked the new indentation and it's all spaces and looks correct.
something went wrong on infra, your tests seem to be forever pending, while other pushes to try are proceeding... I'd suggest to push again.
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec459f72f2ea
Shrink storage cache off main thread r=mak
Thank you, much appreciated.
https://hg.mozilla.org/mozilla-central/rev/ec459f72f2ea
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.