Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after

RESOLVED FIXED in Firefox 56

Status

()

Toolkit
Storage
P1
normal
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: mak, Assigned: mak)

Tracking

Trunk
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 wontfix, firefox56 fixed)

Details

(Whiteboard: [fxsearch])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

Places is hitting one special case:
Schema migration is mostly synchronous because most consumers expect to do things like history.something(), and something() must be able to use a valid database connection.
If the database is corrupt, we need to close the connection, remove it and generate a new one.
Unfortunately migration includes a mixture of synchronous and asynchronous statements, cause some tasks are not critical or are heavy.
Due to the mixture, Close() bails out because we should have called asyncClose().
But we cannot use asyncClose() because we need everything to be synchronous for now.

Thus, I came with the proposal to spin the events loop in close(), and Andrew suggested to add a specific API (https://bugzilla.mozilla.org/show_bug.cgi?id=1355414#c2)

This is the current proposal:
1. add a spinningSynchronousClose() (or similar name) to the connection, that can be used in these specific cases, like the corruption during migration one.
2. make close() call spinningSynchronousClose() if asyncClose should have been invoked, but do that after a MOZ_DIAGNOSTIC_ASSERT
Comment hidden (mozreview-request)

Comment 2

a year ago
mozreview-review
Comment on attachment 8857537 [details]
Bug 1355561 - Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after.

https://reviewboard.mozilla.org/r/129506/#review132100

Looks good, please revise close logic and the test per the comment below.

::: storage/mozStorageConnection.cpp:1255
(Diff revision 1)
>  }
>  
>  NS_IMETHODIMP
> +Connection::SpinningSynchronousClose()
> +{
> +  nsresult rv = AsyncClose(nullptr);

I'm pretty sure you want to provide a callback to AsyncClose and have that decide when to terminate the loop instead of using the shared memory style approach.  For three reasons:

1) The loop could hang as written (although it's unlikely since we expect the main thread to be spammed with events).  Our AsyncCloseConnection runnable does:

* dispatch NewRunnableMethod(&shutdownAsyncThread)
* internalClose()
* dispatch mCallbackEvent if AsyncClose was provided an aCallback (that got wrapped).

In order to avoid hanging, we have to guarantee that there will be a runnable posted to the main thread after the state transitions to closed in internalClose().  SpinningSynchronousClose does not provide a callback, so the guarantee is not met.

2) The existing test to check that the event loop gets spun is race-prone.  There a sleep(50ms) to try and hack around this, but it's happening on the wrong thread.  To reduce the risk of losing the race, the sleep would want to be on the async thread.  By using the callback, you can change your test to verify the event loop is spun by just dispatching a runnable to the main thread that sets a flag without having to worry about races and sleeps because the runnable will deterministically be executed.

3) Since it's a given that there are asynchronous operations in flight, it makes a lot of sense for the call to Close() to ensure that all of the callbacks associated with those async operations have already run before returning.  As written, there's a very real risk that the close will take effect and then the async callbacks will be invoked on the main thread, which seems dangerous.  This does mean that the async callbacks are guaranteed to run with whatever C++ code is on the stack, which sounds like a nightmare for Places, but this just makes it a guaranteed nightmare instead of a very probably nightmare.

Comment 3

a year ago
mozreview-review
Comment on attachment 8857537 [details]
Bug 1355561 - Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after.

https://reviewboard.mozilla.org/r/129506/#review132116

Uh, forgot to set review flag.
Attachment #8857537 - Flags: review?(bugmail) → review-

Comment 4

a year ago
mozreview-review
Comment on attachment 8857537 [details]
Bug 1355561 - Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after.

https://reviewboard.mozilla.org/r/129506/#review132140

::: storage/mozStorageConnection.cpp:1236
(Diff revision 1)
> -    NS_ENSURE_TRUE(asyncCloseWasCalled, NS_ERROR_UNEXPECTED);
> +  }
> +  if (!asyncCloseWasCalled) {
> +    MOZ_ASSERT(false, "Close() was invoked on a connection that executed asynchronous statements. Should have used asyncClose().");
> +#ifdef DEBUG
> +    nsCOMPtr<nsIXPConnect> xpc = do_GetService(nsIXPConnect::GetCID());
> +    Unused << xpc->DebugDumpJSStack(false, false, false);

If the `MOZ_ASSERT` fires above, will we ever get here and print the JS stack before the process crashes?
(In reply to Kit Cambridge [:kitcambridge] (He/him; UTC-8) from comment #4)
> If the `MOZ_ASSERT` fires above, will we ever get here and print the JS
> stack before the process crashes?

good point!

(In reply to Andrew Sutherland [:asuth] from comment #2)
> 1) The loop could hang as written

I was actually evaluating to add a timeout, then I thought that likely async shutdown would have crashed already.
But you are right, I must guarantee that there's something on the main-thread to be safe.

> 2) The existing test to check that the event loop gets spun is race-prone. 
> There a sleep(50ms) to try and hack around this, but it's happening on the
> wrong thread. 

I didn't have lots of ideas on how to test this, so I thought to rely on the fact the spinning would have dispatched the callback before the close call, since that's what asyncClose does normally.
If I get this right, the risk you are pointing out is that asyncClose continues on the async thread, while the main thread sleeps. Basically point 3.

> 3) Since it's a given that there are asynchronous operations in flight

well, this may not be true, if one directly invokes the spinningClose, it may also be on a connection that never ran any async statement. For example in the case of migration, we don't know if we invoked any async statement or not at the point where the corruption is detected. That's the point of this API, we need it to close the connection synchronously regardless of its status.

> makes a lot of sense for the call to Close() to ensure that all of the
> callbacks associated with those async operations have already run before
> returning.  As written, there's a very real risk that the close will take
> effect and then the async callbacks will be invoked on the main thread,
> which seems dangerous.

ugh, now I feel bad for the horrible patch :/
Thank you for finding out this problem, while I originally thought to use the callback, using internal status looked having less overhead. My fault for not having noticed the pitfall in doing that.
Priority: P1 → P2
Whiteboard: [fxsearch]
Comment hidden (mozreview-request)
Is this sort-of what you had in mind?
hm, I have some failures on Try to check.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Ok, the latest version is green(ish).

Comment 13

11 months ago
mozreview-review
Comment on attachment 8857537 [details]
Bug 1355561 - Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after.

https://reviewboard.mozilla.org/r/129506/#review151888

Looks good!  I don't think this needs another review pass, but feel free to re-r? if you want me to do a quick sanity check or anything like that.

::: storage/mozStorageConnection.h:335
(Diff revision 5)
> +  enum ConnectionClosedState {
> +    eUnknown = 0,
> +    eClosing,
> +    eClosed,
> +  };

This additional state tracking no longer accomplishes anything in the patch and I think it should be removed to avoid adding more things to think about.  Especially since eClosing's name overlaps with the very different isClosing().

NB: You will need to update your debug assertion that the close completed.

Restating for my own benefit, I understand our visible state transitions to be these:
- Connection opened:
  - connectionReady() && !isClosing() && !isClosed() && getAsyncExecutionTarget()
- Connection close initiated via AsyncClose(), which invokes setClosedState() on the owning thread:
  - !connectionReady() && isClosing() && !isClosed() && !getAsyncExecutionTarget()
- Sync Close() or AsyncClose() completes on the async thread via internalClose():
  - !connectionReady() && !isClosing() && isClosed() && !getAsyncExecutionTarget()

::: storage/mozStorageConnection.cpp:515
(Diff revision 5)
> +  {
> +    *mClosed = true;
> +    return NS_OK;
> +  }
> +
> +  bool* mClosed;

This seems a lot more dangerous than it needs to be.   As the patch stands, `AsyncClose(); SpinningSynchronousClose()` (which no code does) will result in this writing into stack memory no longer owned by our creator.  (And which could be dangerous if other code decides to spin an event loop on the stack so that the stack would be deep enough for in-use stack to get clobbered.)

While other changes should correct that specific edge case, it seems safer to just expose the closed flag and avoid pointers/references entirely.

::: storage/mozStorageConnection.cpp:1277
(Diff revision 5)
> +Connection::SpinningSynchronousClose()
> +{

I think we need a guard like the following at the top here.  It can go after the thread check.
```
// As currently implemented, we can't spin to wait for an existing AsyncClose.
// Our only existing caller will never have called close; assert if misused
// so that no new callers assume this works after an AsyncClose.
MOZ_DIAGNOSTIC_ASSERT(connectionReady());
if (!connectionReady()) {
  return NS_ERROR_UNEXPECTED;
}
```

The previously alluded to `AsyncClose(); SpinningSynchronousClose();` doom which this would prevent is:
- first AsyncClose(): goes async to do stuff.
- SpinningSycnronousClose() hands off to a second AsyncClose(callback)
  - because !asyncThread (because mAsyncExecutionThreadShuttingDown)
  - does NS_DispatchToMainThread(callback)
  - calls and returns the rv of Close() which will return NS_ERROR_NOT_INITIALIZED.  This will cause the NS_ENSURE_SUCCESS(rv, rv) after the AsyncClose() call to cause us to return without spinning the event loop, but leave our scheduled-runnable with a pointer into stack memory which it will write into when it runs, corrupting the stack.


I think AsyncClose should probably have the contract that if it returns success then any callback will execute in the future, and that if it returns failure, the callback will not be executed.  So I would suggest we change its !asyncThread branch to do `MOZ_ALWAYS_SUCCEEDS(Close());` and unconditionally return NS_OK.  The assert is desirable because:
1. By adding bail logic, this path no longer handles the "too many AsyncClose()/Close() calls" case which is probably not crash-worthy.
2. We don't expect actual closing to fail, and this is pretty deep into edge-case territory.

I also think bail logic at the top of SpinningSynchronousClose makes sense because the current behavior is non-obvious.  I had to trace that the call to SpinningSynchronousClose() that calls AsyncClose() that calls Close() works out (other than the stack trasher).  (It does because Close() fast-bails when !mDBConn, but let's save future us some work! :)

::: storage/mozStorageConnection.cpp:1288
(Diff revision 5)
> +  bool processed;
> +  nsCOMPtr<nsIThread> thread(::do_GetCurrentThread());
> +  while (!connectionClosed) {
> +    Unused << thread->ProcessNextEvent(true, &processed);
> +  }

This idiom changed in the interim in bug 1359490.  Now it should be something like the following:

MOZ_ALWAYS_TRUE(SpinEventLoopUntil([&]() { return connectionClosed; }));

::: storage/test/gtest/test_spinningSynchronousClose.cpp:10
(Diff revision 5)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "storage_test_harness.h"
> +#include "prinrval.h"
> +
> +class CompletionRunnable final : public Runnable

I'd suggest adding a comment like the following (for my benefit in the future ;)

Helper to verify that the event loop was spun.  As long as this is dispatched prior to a call to Close()/SpinningSynchronousClose() we are guaranteed this will be run if the event loop is spun to perform a close.  This is because SpinningSynchronousClose must spin the event loop to realize the close completed and our runnable will already be enqueued and therefore run before the AsyncCloseConnection's callback.  Note that this invariant may be violated if our runnables end up in different queues thanks to Quantum changes, so this test may need to be updated if the close dispatch changes.

::: storage/test/gtest/test_spinningSynchronousClose.cpp:50
(Diff revision 5)
> +  nsCOMPtr<nsIRunnable> event = new CompletionRunnable(&done);
> +  NS_DispatchToMainThread(event, NS_DISPATCH_NORMAL);
> +  do_check_false(NS_SUCCEEDED(db->Close()));
> +  do_check_true(done);
> +}
> +#endif

Maybe it would also make sense to add an "AsyncClose(); SpinningSynchronousClose();" test in here to verify that we explicitly error out in that case?
Attachment #8857537 - Flags: review?(bugmail) → review+
Comment hidden (mozreview-request)
(In reply to Andrew Sutherland [:asuth] from comment #13)
> Maybe it would also make sense to add an "AsyncClose();
> SpinningSynchronousClose();" test in here to verify that we explicitly error
> out in that case?

The problem in doing this is that we hit the MOZ_DIAGNOSTIC_ASSERT(connectionReady());, so either I change that to a MOZ_ASSERT, or don't add a test.
(In reply to Marco Bonardo [::mak] from comment #15)
> The problem in doing this is that we hit the
> MOZ_DIAGNOSTIC_ASSERT(connectionReady());, so either I change that to a
> MOZ_ASSERT, or don't add a test.

Ah, yes, that would seem to be a problem... I think the diagnostic assert is more valuable than the test.
The change uncovered some other problems:
1. Some code is invoking asyncClose multiple times, now we would invoke Close() and assert the second time. That breaks some tests and thus I just added an early  if (!mDBConn) { return NS_ERROR_NOT_INITIALIZED; } to asyncClose() similar to the Close() one.

2. AsyncInitDatabase invokes the connection initialize() method. If that fails, we create an asyncClose runnable and dispatch it to the main thread to eventually try to clean up an half open connection.
But looks like the connection destructor may run before, invoking Close(), probably because nothing holds onto the connection. I am evaluating to maybe redispatch the same runnable that at least has a strong ref onto the connection. Let's see
hm, 2. is instead caused by 1.
test_asyncClose_failed_open creates a fake invalid database and tries to asyncOpen it, that creates the async thread... Then when initialize() fails on the connection we try to invoke asyncClose, that finds mDBConn null. At this point we hit the ~Connection assert about mAsyncExecutionThreadIsAlive, because nobody shutdown the thread.
On the other side, without the early return, we hit the MOZ_ALWAYS_SUCCEEDS(Close());
I guess my early return is not great, since we should proceed and shutdown the thread.
Given that initialize() and friends take care to ensure the connection is actually closed and it's clear the use of AsyncClose() was just a shortcut to avoid a special code path... I think we should add your early bail and introduce something like Connection::initializeOnAsyncThread(nsIFile* aStorageFile) and have AsyncInitDatabase use it.  All that really needs to be done at that point is to shutdown the async thread.

The code would basically be:
  MOZ_ASSERT(!NS_IsMainThread());
  nsresult rv = aStorageFile ? initialize(aStorageFile)
                             : initialize();
  if (NS_FAILED(rv)) {
    MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
    mAsyncExecutionThreadShuttingDown = true;
    nsCOMPtr<nsIRunnable> event = NewRunnableMethod<nsCOMPtr<nsIThread>>
      (mConnection, &Connection::shutdownAsyncThread, mAsyncExecutionThread.forget());
    (void)NS_DispatchToMainThread(event);
  }
  return rv;


Also, I think this raises the issue that a failed initialize() may also want to do `mConnectionClosed = true;` so that isClosed() returns true and all our state invariants hold.  Given the amount of duplicate boilerplate we have in initializeInternal() (especially now with sharedDBMutex.destroy();) I think we should adopt use of MFBT's ScopeExit (https://searchfox.org/mozilla-central/source/mfbt/ScopeExit.h) helper.
I fixed the failing tests, that indeed were either invoking close() before asyncClose(), or double invoking asyncClose().
In the end the assert looks useful to find those cases.

With the tests fixed, AsyncClose works as expected, if the async thread is not present, it will invoke Close(), that will eventually assert, for example if the connection has already been closed. if the async thread is present, it will proceed trying to close the connection (and finalizing the async thread).
Though, if (!asyncThread && !mDBConn) I will both assert and return an error, or a mistake may be invisible in opt builds.

I'll also evaluate your suggestion in comment 19 for the async opening, then I hope to stop there, since there's enough material already here :) But only Try will tell me.
note, there are a few points in initializeInternal() that don't properly invoke ::sqlite_close() and friends in case of error.
I agree that using a ScopeExit sounds like the best thing here. I'll land bug 1371945 when possible, then rebase on top of it.
Depends on: 1371945
Priority: P2 → P1
Comment hidden (mozreview-request)

Comment 23

10 months ago
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/73edd4e1acef
Add a new API to spinningly close the database when strictly needed, and ensure Close() does what it's named after. r=asuth

Comment 24

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/73edd4e1acef
Status: ASSIGNED → RESOLVED
Last Resolved: 10 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Blocks: 1355084
Too late for 55. Mark 55 won't fix.
status-firefox55: affected → wontfix
You need to log in before you can comment on or make changes to this bug.