Closed Bug 494828 Opened 15 years ago Closed 15 years ago

Stop using our own mutexes and use SQLite's where possible.

Categories

(Toolkit :: Storage, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: sdwilsh, Assigned: sdwilsh)

References

Details

Attachments

(2 files, 6 obsolete files)

Each Connection object has four different mutexes associated with it, on top of the one that SQLite already has for the database connection.  Of our four, I think only one still needs to be separate, but all others can use the mutex exposed via the SQLite API.  sqlte3_mutex is a recursive mutex, so the thread can obtain the mutex multiple times without deadlocking.

The four mutexes currently being used:
mAsyncExecutionMutex
This protects access to our asynchronous execution thread.  Since it does not interact at all with the database connection, it should not use the database handle's mutex.

mTransactionMutex
This protects mTransactionInProgress, and should be safe to use the connection's mutex. The mutex will already be obtained when these methods do their work anyway.

mFunctionsMutex
This protects our mFunctions hash table.  I don't see why we can't use the database connection's mutex here.

mProgressHandlerMutex
This guards our progress handler (one per connection).  It is used in the setting and removing of progress handlers, calling the progress handler (which is wrong - we are holding a lock while calling outside of our module), and when we close the database connection, which I think we can avoid as well.  This should be able to use the database connection's mutex.


We can effectively remove three mutexes per connection, which should reduce the number of locks (and system resources) quite a bit.  Bug 488148 also helps here in reducing the number of mutexes storage uses.  That use case cannot use the connections mutex, however, since the object could last longer than Connection::mDBConn.

Note: This will require the creation of two helper objects that automagically lock upon construction, and unlock on destruction (or vice versa) for sqlite3_mutex.  We'll need to have debug builds of SQLite (bug 494826) in our own debug builds to assert that a mutex is held or not held (or at least should).
Other than the issue with calling the progress handler, is reusing the SQLite mutex for mFunctions and mProgressHandler going to create any performance problems by unnecessarily blocking SQLite?  (I don't know what the DB connection mutex protects.)

You should also (in theory) be able to integrate the SQLite mutex into our deadlock detector by wrapping it in a class that inherits from mozilla::BlockingResourceBase.  Depending on SQLite's API, this may not work (or be too hard).  We can chat on IRC about what this would involve.
(In reply to comment #1)
> Other than the issue with calling the progress handler, is reusing the SQLite
> mutex for mFunctions and mProgressHandler going to create any performance
> problems by unnecessarily blocking SQLite?  (I don't know what the DB
> connection mutex protects.)
The mutex is used to serialize access to the connection since only one thread can access it at a time.  Modifying progress handlers is rare, so I don't think it'll add any contention worth noting.  Adding functions is likely only going to be done when you open the connection and set it up, which is inherently serial, and people tend to not remove functions.  As a result, I don't think it would add any contention (we'd also acquire the lock anyway when we register the function/progress handler with SQLite)

> You should also (in theory) be able to integrate the SQLite mutex into our
> deadlock detector by wrapping it in a class that inherits from
> mozilla::BlockingResourceBase.  Depending on SQLite's API, this may not work
> (or be too hard).  We can chat on IRC about what this would involve.
I wasn't sure if I could do that.  The sqlite mutex is a recursive mutex, which is something the deadlock detector checks for, right?
(In reply to comment #2)
> (In reply to comment #1)
> > You should also (in theory) be able to integrate the SQLite mutex into our
> > deadlock detector by wrapping it in a class that inherits from
> > mozilla::BlockingResourceBase.  Depending on SQLite's API, this may not work
> > (or be too hard).  We can chat on IRC about what this would involve.
> I wasn't sure if I could do that.  The sqlite mutex is a recursive mutex, which
> is something the deadlock detector checks for, right?

The subclass defines what "recursive mutex" means; mozilla::Monitor does almost exactly this already.  The deadlock detector and BlockingResourceBase class only understand "really" acquiring (as opposed to recursively entering) and releasing primitives; i.e., "I now own this lock," "I no longer own this monitor."
Summary: Stop using our own mutexes and use SQLite's where possible. → p
Summary: p → Stop using our own mutexes and use SQLite's where possible.
Wow, don't know what happened with the title there.  Apologies.
Attached patch helpers v1.0 (obsolete) — Splinter Review
Step one - get some helper objects.
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Attachment #380211 - Flags: review?(jones.chris.g)
Attachment #380211 - Flags: review?(bugmail)
Comment on attachment 380211 [details] [diff] [review]
helpers v1.0

>new file mode 100644
>--- /dev/null
>+++ b/storage/src/SQLiteMutex.h
>+  /**
>+   * Constructs a wrapper for a sqlite3_mutex that has deadlock detecting.
>+   *
>+   * @param aName
>+   *        A name which can be used to reference this mutex.
>+   */
>+  SQLiteMutex(const char *aName)
>+  : BlockingResourceBase(aName, eMutex)

Nit: I seem to recall some (such as jst-review) liking the ':' on the same line as the constructor, but I don't care.

>+  /**
>+   * Sets the mutex that we are wrapping.  We generally do not have access to
>+   * our mutex at class construction, so we have to set it once we get access to
>+   * it.
>+   *
>+   * @param aMutex
>+   *        The sqlite3_mutex that we are going to wrap.
>+   */
>+  void setMutex(sqlite3_mutex *aMutex)
>+  {
>+    NS_ASSERTION(aMutex, "You must pass in a valid mutex!");
>+    mMutex = aMutex;
>+  }
>+

Big non-nit: BlockingResourceBase expects that each instance is associated with one and only one underlying primitive.  That is, the sqlite3_mutex should be synonymous with its SQLiteMutex wrapper, and no two wrappers should own the same mutex, and no single wrapper should own multiple mutexes in its lifetime.

If this isn't true, bad things will happen.  You'll need a single SQLiteMutex per sqlite3_mutex.  The best way to do this would be to create the sqlite3_mutex in the SQLiteMutex constructor.  Next best would be making the sqlite3_mutex should a param of the constructor, and least preferable would be an |Init()| method with suitable protection against multiple calls.  I don't know what API constraints you have, though.

>+#else
>+  void lock()
>+  {
>+    NS_ASSERTION(mMutex, "No mutex associated with this wrapper!");
>+

My personal preference would be to set up mMutex in the constructor and do away with these checks.  But I don't know what your constraints are.


>+  void assertCurrentThreadOwns()
>+  {

Missing a check that mMutex is non-null, unless SQLite does this for you.  But see above regarding the constructor.

>+    NS_ASSERTION(sqlite3_mutex_held(mMutex),
>+                 "Mutex is not held, but we expect it to be!");
>+  }
>+

You'll want to make certain that |sqlite3_mutex_held()| tests that /this/ thread holds the mutex rather than /some/ thread holds the mutex.  If it's the latter, you should change the name of this function to |SQLiteMutex::assertHeld| or something to avoid confusion with the mozilla::Mutex methods.

>+  void assertNotCurrentThreadOwns()
>+  {

Missing a check that mMutex is non-null, unless SQLite does this for you.  But see above regarding the constructor.

--- a/storage/test/test_mutex.cpp
+++ a/storage/test/test_mutex.cpp

This test looks OK, but it wouldn't be a bad idea to ensure that deadlocks involving SQLiteMutex are being checked properly.  It's inconvenient to write a unit test for this, but a one-off program wouldn't hurt.  There are some examples you can reference in xpcom/tests/TestDeadlockDetector.cpp.
Attachment #380211 - Flags: review?(jones.chris.g) → review-
(In reply to comment #6)
> Big non-nit: BlockingResourceBase expects that each instance is associated with
> one and only one underlying primitive.  That is, the sqlite3_mutex should be
> synonymous with its SQLiteMutex wrapper, and no two wrappers should own the
> same mutex, and no single wrapper should own multiple mutexes in its lifetime.
Right, and I would love to be able to pass the lock in the constructor, but I won't be able to do that.  The primary (only only, so far) use case for this is to wrap the mutex obtained from sqlite3_db_mutex [1], which we can't use until have a SQLite database connection, which we won't have until we are inside Connection::initialize.  I want this mutex to be a class member variable, and not one that I allocate.  I never actually plan on allocating my own sqlite3_mutex to use with this (I'd just use mozilla::Mutex then).

setMutex is supposed to be the equivalent of an initialization method.  I've added another assertion in there to make sure we aren't setting a mutex after we have already set one.  I'm not sure if it's worth the extra code to check that we haven't wrapped this mutex before, and could be something easily enforced by reviews (especially with a big warning above SQLiteMutex).

Thoughts on this?

> Missing a check that mMutex is non-null, unless SQLite does this for you.  But
> see above regarding the constructor.
[snip]
> You'll want to make certain that |sqlite3_mutex_held()| tests that /this/
> thread holds the mutex rather than /some/ thread holds the mutex.  If it's the
> latter, you should change the name of this function to
> |SQLiteMutex::assertHeld| or something to avoid confusion with the
> mozilla::Mutex methods.
Per http://www.sqlite.org/c3ref/mutex_held.html, we are OK here.

> +++ a/storage/test/test_mutex.cpp
> 
> This test looks OK, but it wouldn't be a bad idea to ensure that deadlocks
> involving SQLiteMutex are being checked properly.  It's inconvenient to write a
> unit test for this, but a one-off program wouldn't hurt.  There are some
> examples you can reference in xpcom/tests/TestDeadlockDetector.cpp.

[1] http://www.sqlite.org/c3ref/db_mutex.html
Attached patch code change v1.0 (obsolete) — Splinter Review
And to give you a better idea of my invariants, here's the code changes using the helper object.
Attachment #380272 - Flags: review?(bugmail)
Attached patch code change v1.0 (obsolete) — Splinter Review
forgot to qrefresh after a comment change...
Attachment #380272 - Attachment is obsolete: true
Attachment #380273 - Flags: review?(bugmail)
Attachment #380272 - Flags: review?(bugmail)
Attached patch helpers v1.1 (obsolete) — Splinter Review
Might be ready for review.  Depends what cjones says about comment 7.  This brings in the deadlock detector test from xpcom/tests and re-purposes it for our use.
Attachment #380211 - Attachment is obsolete: true
Attachment #380211 - Flags: review?(bugmail)
(In reply to comment #7)
> (In reply to comment #6)
> > Big non-nit: BlockingResourceBase expects that each instance is associated with
> > one and only one underlying primitive.  That is, the sqlite3_mutex should be
> > synonymous with its SQLiteMutex wrapper, and no two wrappers should own the
> > same mutex, and no single wrapper should own multiple mutexes in its lifetime.
> Right, and I would love to be able to pass the lock in the constructor, but I
> won't be able to do that.  The primary (only only, so far) use case for this is
> to wrap the mutex obtained from sqlite3_db_mutex [1], which we can't use until
> have a SQLite database connection, which we won't have until we are inside
> Connection::initialize.  I want this mutex to be a class member variable, and
> not one that I allocate.  I never actually plan on allocating my own
> sqlite3_mutex to use with this (I'd just use mozilla::Mutex then).
> 
> setMutex is supposed to be the equivalent of an initialization method.  I've
> added another assertion in there to make sure we aren't setting a mutex after
> we have already set one.  I'm not sure if it's worth the extra code to check
> that we haven't wrapped this mutex before, and could be something easily
> enforced by reviews (especially with a big warning above SQLiteMutex).
> 
> Thoughts on this?
> 

I would call it |Init()| to make it clear what's happening, but that's a point of style.  It sounds like |SQLite.Init()| will be called at exactly one site, yes?  If so, just the |assert(!mMutex)| in Init might be sufficient.

I'll r+ a second version with the assertion guarding setMutex/Init, and comments warning that if there isn't a one-to-one mapping between sqlite mutexes and SQLiteMutexes over the lifetime of the program, you void the deadlock detector's warranty.

> > Missing a check that mMutex is non-null, unless SQLite does this for you.  But
> > see above regarding the constructor.
> [snip]
> > You'll want to make certain that |sqlite3_mutex_held()| tests that /this/
> > thread holds the mutex rather than /some/ thread holds the mutex.  If it's the
> > latter, you should change the name of this function to
> > |SQLiteMutex::assertHeld| or something to avoid confusion with the
> > mozilla::Mutex methods.
> Per http://www.sqlite.org/c3ref/mutex_held.html, we are OK here.
> 

According to the doc, sqlite_mutex_held returns 1 when the mutex param is NULL.  They say this is for builds not requesting thread safety.  However, we'll never do this, so this "feature" might cause you to miss real bugs.  I'd still add the non-NULL assertions.
Attached patch helpers v1.2 (obsolete) — Splinter Review
Attachment #380290 - Attachment is obsolete: true
Attachment #380300 - Flags: review?(jones.chris.g)
Attachment #380300 - Flags: review?(bugmail)
Attached patch code change v1.1 (obsolete) — Splinter Review
Attachment #380273 - Attachment is obsolete: true
Attachment #380301 - Flags: review?(bugmail)
Attachment #380273 - Flags: review?(bugmail)
Whiteboard: [needs review asuth][needs review cjones]
Comment on attachment 380300 [details] [diff] [review]
helpers v1.2

>diff --git a/storage/src/SQLiteMutex.h b/storage/src/SQLiteMutex.h
>new file mode 100644
>--- /dev/null
>+++ b/storage/src/SQLiteMutex.h
>+/**
>+ * Wrapper class for sqlite3_mutexes.  To be used whenever we want to use a
>+ * sqlite3_mutex.
>+ *
>+ * @warning Never EVER wrap the same sqlite3_mutex with different a SQLiteMutex.

Nit: "with a different".

>+  void assertNotCurrentThreadOwns()
>+  {
>+    NS_ASSERTION(mMutex, "No mutex associated with this wrapper!");
>+    NS_ASSERTION(sqlite3_mutex_notheld(mMutex),
>+                 "Mutex is held, but we expect to not be!");

Nit: "expect it to not be".

Looks good, thanks.
Attachment #380300 - Flags: review?(jones.chris.g) → review+
Attached patch helpers v1.3Splinter Review
Addresses review comments.
Attachment #380300 - Attachment is obsolete: true
Attachment #380304 - Flags: review?(bugmail)
Attachment #380300 - Flags: review?(bugmail)
Whiteboard: [needs review asuth][needs review cjones] → [needs review asuth]
Try server indicates that I need to add sqlite3_db_mutex to sqlite.def as well.  I'll upload a new patch tomorrow.
Attached patch code change v1.2Splinter Review
Attachment #380301 - Attachment is obsolete: true
Attachment #380485 - Flags: review?(bugmail)
Attachment #380301 - Flags: review?(bugmail)
Attachment #380304 - Flags: review?(bugmail) → review+
Comment on attachment 380485 [details] [diff] [review]
code change v1.2

Removal of the explicit progress handler removal seems safe enough.  Even if the connection close fails, there should be no way left for it to generate events (especially now that people can't get at the native connection.)
Attachment #380485 - Flags: review?(bugmail) → review+
Whiteboard: [needs review asuth] → [can land]
Blocks: 505550
Depends on: 505586
http://hg.mozilla.org/mozilla-central/rev/50281086dddd
http://hg.mozilla.org/mozilla-central/rev/0f19329641be
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [fixed-in-places]
Target Milestone: --- → mozilla1.9.2a1
I backed out the changes to our code but not the adding of the mutex helpers on mozilla-central.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 505550
I'm pretty sure the crash I was seeing on tinderbox was the one that was fixed by bug 507199.  I'll try to re-land this tomorrow.
Status: REOPENED → ASSIGNED
http://hg.mozilla.org/mozilla-central/rev/59a6c227df86
Status: ASSIGNED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.9.2a1 → mozilla1.9.3a1
I backed this out because there were a lot of failures in various Places tests after this landed:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254430662.1254433566.20000.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254431703.1254435076.4193.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254433691.1254437796.2542.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254440804.1254443040.26924.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254432240.1254435242.6144.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254436169.1254439463.20551.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254436406.1254439933.25592.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254434999.1254437746.1813.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254444213.1254446779.2280.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254448561.1254451001.15653.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254439069.1254442470.20921.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254440388.1254443652.1135.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254443311.1254446525.32149.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254445524.1254449213.28616.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254447133.1254449530.32037.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254453041.1254455230.29107.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254456793.1254459444.10166.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254456945.1254459123.6622.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254458152.1254460120.17541.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254456894.1254458744.2472.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254457138.1254460120.17542.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254457666.1254460118.17256.gz

Bug 520113 was filed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 520113
This passed the try server just now without issue.  I'm going to try this once more on mozilla-central.  I don't think it was an intermittent failure at all, and I'll have time to watch the tree for a while.
No longer blocks: 520113
Depends on: 520113
I am going to push this with one change though.  I want to add this line at the start of a few methods (like BeginTransactionAs) since we are accessing some of the sqlite3 objects members:
  if (!mDBConn) return NS_ERROR_NOT_INITIALIZED;
r=asuth for the safe-to-add-everywhere error-checking line
http://hg.mozilla.org/mozilla-central/rev/0ceb1ee6ad55

Things look good so far.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Since the push, including the run from the push, every single Windows Ed run has failed with "bug 510219." If that's a coincidence, it's a pretty alarming one, since the last time we hit that on Windows was in early November.
(In reply to comment #29)
> Since the push, including the run from the push, every single Windows Ed run
> has failed with "bug 510219." If that's a coincidence, it's a pretty alarming
> one, since the last time we hit that on Windows was in early November.
I find it pretty unlikely that this changed any observable behavior, but it certainly could have changed some slight timing since we are using a different mutex now.  That slight timing change will certainly be exaggerated on debug boxes in a VM.  I'll take a look at the test right away this morning.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: