Closed Bug 420505 Opened 13 years ago Closed 13 years ago

mozStorageService isn't as threadsafe as it claims to be

Categories

(Toolkit :: Storage, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta5

People

(Reporter: sdwilsh, Assigned: sdwilsh)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 6 obsolete files)

Two things - one I know is an issue, the other bsmedberg should comment on:
1) We need a lock around calls that initialize the database connections since for shared connections we enable and disabled the shared cache.  I missed this in my review of myk's patch to enable fts.

2) I'm not sure on this, but our GetSingleton method [1] doesn't seem so threadsafe to me.  Maybe it's the cause of bug 408518?

[1] http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/storage/src/mozStorageService.cpp&rev=1.15#54
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
I need issue two addressed - but regardless, this should take less than an hour.
Flags: in-testsuite-
Flags: in-litmus-
Whiteboard: [needs patch][swag 1hr]
The getsingleton method isn't very threadsafe: making it threadsafe involves either a lock or a sentinel and a busy-wait.
When is a good time to create the lock required for that then?
How much activity goes on in the storageservice constructor? A busy-wait is a lot simpler if it doesn't take long.
Very little actually.  Busy-wait is probably best.
Attached patch v1.0 (obsolete) — Splinter Review
My swag estimate was a bit off...

So, after much discussion on irc I think I finally have the right solution here.  Requesting r from bsmedberg and sr (although I can't actually request sr in toolkit it seems) from brendan because I'm using lots of new stuff that I've never used before.
Attachment #308004 - Flags: review?(brendan)
Attachment #308004 - Flags: review?(benjamin)
Whiteboard: [needs patch][swag 1hr] → [has patch][needs review bsmedberg, brendan][swag 2hr]
Comment on attachment 308004 [details] [diff] [review]
v1.0

I forgot to run the unit tests, which demonstrates that this doesn't work...
Attachment #308004 - Attachment is obsolete: true
Attachment #308004 - Flags: review?(brendan)
Attachment #308004 - Flags: review?(benjamin)
Attached patch v1.1 (obsolete) — Splinter Review
OK, that was a *stupid* mistake...
Attachment #308006 - Flags: review?(brendan)
Attachment #308006 - Flags: review?(benjamin)
Comment on attachment 308006 [details] [diff] [review]
v1.1

>+static PRCallOnceType initOnce;

Nit: sInitOnce, to match gFoo?

> mozStorageService::~mozStorageService()
> {
>     gStorageService = nsnull;
>+    PR_DestroyLock(mLock);
>+    PR_DestroyLock(gSingletonLock);

Null gSingletonLock after destroying it.

Looks good to me otherwise.

/be
Attachment #308006 - Flags: review?(brendan) → review+
Whiteboard: [has patch][needs review bsmedberg, brendan][swag 2hr] → [has patch][needs review bsmedberg][swag 2hr]
Comment on attachment 308006 [details] [diff] [review]
v1.1

> mozStorageService::~mozStorageService()
> {
>     gStorageService = nsnull;
>+    PR_DestroyLock(mLock);
>+    PR_DestroyLock(gSingletonLock);
> }

As brendan said, you need to null out gSingletonLock here... but more concerning to me, how do you know that somebody won't try to re-create the storage service again? When that happens, the runonce call will have already happened, but gSingletonLock will be null. You should probably test gSingletonLock at that point and if it's null, fail early.

>     // We want to return a valid connection regardless if it succeeded or not so
>     // that consumers can backup the database if it failed.
>+    PR_Lock(mLock);
>     (void)msc->Initialize(aDatabaseFile);
>+    PR_Unlock(mLock);
>     NS_ADDREF(*_retval = msc);

Use nsAutoLock with braces for scoping:

    {
      nsAutoLock lock(mLock);
      (void)msc->Initialize(aDatabaseFile);
    }

>+    PR_Lock(mLock);
>     sqlite3_enable_shared_cache(0);
>     (void)msc->Initialize(aDatabaseFile);
>     sqlite3_enable_shared_cache(1);
>+    PR_Unlock(mLock);

ditto
Attachment #308006 - Flags: review?(benjamin) → review-
Whiteboard: [has patch][needs review bsmedberg][swag 2hr] → [needs new patch][swag 2hr]
Attached patch v1.2 (obsolete) — Splinter Review
Addresses review comments (as well as one of shaver's via irc a long time ago).  I only wish that we had reader writer locks because we now have contention when creating database connections, when most of the time it wouldn't matter.
Attachment #308006 - Attachment is obsolete: true
Attachment #309670 - Flags: review?(benjamin)
Attachment #309670 - Attachment is patch: true
Attachment #309670 - Attachment mime type: application/octet-stream → text/plain
Attached patch v1.3 (obsolete) — Splinter Review
caught a minor nit
Attachment #309670 - Attachment is obsolete: true
Attachment #309671 - Flags: review?(benjamin)
Attachment #309670 - Flags: review?(benjamin)
(In reply to comment #11)
>  I only wish that we had reader writer locks because we now have contention
> when creating database connections, when most of the time it wouldn't matter.

PRRWLock is in NSPR. Wan-Teh has reported users finding it does not avoid contention as much as hoped.

/be
(In reply to comment #13)
> PRRWLock is in NSPR. Wan-Teh has reported users finding it does not avoid
> contention as much as hoped.
The only documentation I found on that (the NSPR 3.0 release notes) indicated that mozilla.org projects could not use that because it was not licensed under the MPL.
Whiteboard: [needs new patch][swag 2hr] → [has patch][needs review bsmedberg][swag 2hr]
NSPR reader writer locks are now licensed under the same triple
license as the rest of NSPR.
gah.

there's code in the component manager guarded by XPCOM_CHECK_PENDING_CIDS which would prevent the need for this junk. instead of adding messy code to each module that needs to be guarded, let's just unconditionally enable it.

with XPCOM_CHECK_PENDING_CIDS, there is a promise by xpcom that it will not call a factory for a getservice until it finishes the current attempt. and when it succeeds, it would not need to call the factory again.
Attachment #309801 - Flags: review?(benjamin)
(In reply to comment #16)
> with XPCOM_CHECK_PENDING_CIDS, there is a promise by xpcom that it will not
> call a factory for a getservice until it finishes the current attempt. and when
> it succeeds, it would not need to call the factory again.
I was wondering why we didn't just do that by default...

We'd still need locks for sqlite3_enabled_shared_cache calls though
Comment on attachment 309671 [details] [diff] [review]
v1.3

i'll give you a free r- :)

>+static PRLock *gSingletonLock;

lose this stuff as soon as bsmedberg approves ;-)

>+mozStorageService::mozStorageService() :
this should be done in Init so that failure can be handled:
>+    mLock(PR_NewLock())

> mozStorageService::~mozStorageService()
this needs to be guarded by a null check:
>+    PR_DestroyLock(mLock);

> mozStorageService::Init()
>+    NS_ENSURE_TRUE(mLock, NS_ERROR_OUT_OF_MEMORY);
instead of this (which isn't compatible w/ your destructor = crash)

> mozStorageService::OpenDatabase(nsIFile *aDatabaseFile, mozIStorageConnection **_retval)
>+        (void)msc->Initialize(aDatabaseFile);

why void? (having seen things that can fail)

> mozStorageService::OpenUnsharedDatabase(nsIFile *aDatabaseFile, mozIStorageConnection **_retval)
>     // without affecting the caches currently in use by other connections.
>     // We want to return a valid connection regardless if it succeeded or not so
>     // that consumers can backup the database if it failed.
>+        nsAutoLock lock(mLock);
>+        sqlite3_enable_shared_cache(0);
>+        (void)msc->Initialize(aDatabaseFile);
>+        sqlite3_enable_shared_cache(1);

this is strange, is it a global thing or a per database thing? it seems like this should be done somewhere else.
Attachment #309671 - Flags: review-
ugh - I don't remember seeing this when we did it:
It is illegal to call sqlite3_enable_shared_cache() if one or more open database connections were opened by the calling thread. If the argument is non-zero, shared-cache mode is enabled. If the argument is zero, shared-cache mode is disabled. The return value is either SQLITE_OK (if the operation was successful), SQLITE_NOMEM (if a malloc() failed), or SQLITE_MISUSE (if the thread has open database connections).

Myk, I think we have an issue with the shared DB patch and we may need to back it out.
Whiteboard: [has patch][needs review bsmedberg][swag 2hr] → [needs new patch][swag 1hr]
Comment on attachment 309671 [details] [diff] [review]
v1.3

Does anything other than CreateInstance call mozStorageService::GetSingleton? If so, the PENDING_CIDS thing won't help solve this particular bug.
(In reply to comment #20)
> (From update of attachment 309671 [details] [diff] [review])
> Does anything other than CreateInstance call mozStorageService::GetSingleton?
> If so, the PENDING_CIDS thing won't help solve this particular bug.
The only thing that should call mozStorageService::GetSingleton is the factor object.  Does this mean I don't need to worry about all that locking after all?
(In reply to comment #19)
> ugh - I don't remember seeing this when we did it:
> It is illegal to call sqlite3_enable_shared_cache() if one or more open
> database connections were opened by the calling thread. If the argument is
> non-zero, shared-cache mode is enabled. If the argument is zero, shared-cache
> mode is disabled. The return value is either SQLITE_OK (if the operation was
> successful), SQLITE_NOMEM (if a malloc() failed), or SQLITE_MISUSE (if the
> thread has open database connections).

Note: over in bug 413589 we determined through testing and communication with a SQLite developer that this documentation is out of date, and it is now fine to call sqlite3_enable_shared_cache when one or more open database connections have been opened by the calling thread, so we don't need to change the behavior of the openUnsharedDatabase method added in that bug.
Depends on: 423633
it should, all cases where it doesn't are bugs elsewhere (the one mentioned above in component manager, and bug 423633)

a short explanation of bug 423633; with attachment 309801 [details] [diff] [review] applied, when someone with nothing better to do asks the component manager for the factory directly:

f=Components.manager.getClassObjectByContractID("@mozilla.org/storage/service;1",Components.interfaces.nsIFactory);

then they can trigger nsIFactory.createInstance on multiple threads w/o guards. sadly this is not technically illegal (and in fact it is at times even useful), however standard practices tell everyone to use the component/service manager, so i believe it's well worth overlooking in this bug (the alternative would be fixing GenericFactory to add some locking)

for reference, the component manager ensures singleton-ness of factories:
nsComponentManagerImpl::GetFactoryEntry/nsAutoMonitor mon(mMon);
Attachment #309671 - Flags: review?(benjamin)
Attached patch v1.4 (obsolete) — Splinter Review
per comment 20, I'm keeping the singleton lock in.

Now that the sqlite docs are up to date, I noticed that sqlite3_enable_shared_cache returns something now (it didn't used to as far as I know), so I'm checking that as well.  Had to use an nsRefPtr now so we don't leak the object if we fail.

I think we should get this in for the beta, so this needs to get landed.
Attachment #309671 - Attachment is obsolete: true
Attachment #309801 - Attachment is obsolete: true
Attachment #310290 - Flags: review?(benjamin)
Attachment #309801 - Flags: review?(benjamin)
Whiteboard: [needs new patch][swag 1hr] → [needs patch][needs review bsmedberg][swag 1hr]
Comment on attachment 310290 [details] [diff] [review]
v1.4

> mozStorageService *
> mozStorageService::GetSingleton()
> {
>+    // Since this service can be called by multiple threads, we have to use a
>+    // a lock to test and possibly create gStorageService
>+    static PRCallOnceType sInitOnce;
>+    PRStatus rc = PR_CallOnce(&sInitOnce, SingletonInit);
>+    if (rc != PR_SUCCESS) return nsnull;

separate line please

>+
>+    // If someone managed to start us twice, error out early.
>+    if (!gSingletonLock) return nsnull;

separate line please
Attachment #310290 - Flags: review?(benjamin) → review+
Attached patch v1.5Splinter Review
addresses review comments
Attachment #310290 - Attachment is obsolete: true
I won't have time to land this today.  This *might* hit Ts a bit, but we really can't do anything about it except maybe use reader-writer locks.
Keywords: checkin-needed
Whiteboard: [needs patch][needs review bsmedberg][swag 1hr]
Comment on attachment 309801 [details] [diff] [review]
eschew bad cludges by fixing root problem

moved to bug 423821
Target Milestone: mozilla1.9 → mozilla1.9beta5
Checking in and watching Ts for Shawn.

Checking in storage/src/mozStorageService.cpp;
/cvsroot/mozilla/storage/src/mozStorageService.cpp,v  <--  mozStorageService.cpp
new revision: 1.16; previous revision: 1.15
done
Checking in storage/src/mozStorageService.h;
/cvsroot/mozilla/storage/src/mozStorageService.h,v  <--  mozStorageService.h
new revision: 1.7; previous revision: 1.6
done
Status: NEW → RESOLVED
Closed: 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.