Closed Bug 326334 Opened 15 years ago Closed 15 years ago

Implement mozStorage as a multithreaded server

Categories

(Toolkit :: Storage, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8.1

People

(Reporter: brettw, Assigned: brettw)

Details

(Keywords: fixed1.8.1)

Attachments

(1 file, 2 obsolete files)

 
Severity: normal → major
Priority: -- → P1
Target Milestone: --- → Firefox 2 alpha2
design doc please! :)
I imagine the design doc will have to come after me figuring out what the heck it's supposed to do.
Attached patch Patch (obsolete) — Splinter Review
This patch implements the asynchronous sqlite flushing system.
Attachment #211445 - Flags: review?(vladimir)
Component: Places → Storage
Flags: review?(vladimir)
Product: Firefox → Toolkit
Target Milestone: Firefox 2 alpha2 → mozilla1.8.1
Comment on attachment 211445 [details] [diff] [review]
Patch

I think changing it to the storage component cleared the review request.
Attachment #211445 - Flags: first-review?(vladimir)
Is there some information somewhere describing what problem this is trying to solve and the prospective solution(s)?  ... maybe some information on the wiki that can be linked to from this bug?  Thanks!
Comment on attachment 211445 [details] [diff] [review]
Patch


>+
>+// this is resquired for 
>+#define SQLITE_ASYNC_TWO_FILEHANDLES 1

"required" for ... ?

>+// AsyncMessage
>+//
>+//    Entries on the write-op queue are instances of the AsyncWrite
^ AsyncMessage


>+//    This space is sqliteMalloc()d along with the AsyncWrite structure in 
^ AsyncMessage

>+// message queue
>+static AsyncMessage* AsyncQueueFirst = nsnull;
>+static AsyncMessage* AsyncQueueLast = nsnull;

There's an nsDeque class, but it's not nsTDeque, so it probably doesn't buy you any type-safety or code cleanliness to use it.. and there's no way to synchronize on it, so you'd still be stuck with rolling your own mutex/cond.  So just ignore this, I'm just whining about our generic data classes :)


>+  // Add the record to the end of the write-op queue
>+  NS_ASSERTION(! aMessage->mNext, "New messages should not have next pointers");
>+  if (AsyncQueueLast) {
>+    NS_ASSERTION(AsyncQueueLast, "If we have a last item, we need to have a first one");

NS_ASSERTION(AsyncQueueFirst, not Last

>+
>+// AsyncClose
>+//
>+//    Close the file. This just adds an entry to the write-op list, the file is
>+//    not actually closed.
>+
>+int // static
>+AsyncClose(OsFile** aFile)
>+{
>+  if (AsyncWriteError != SQLITE_OK)
>+    return AsyncWriteError;
>+  AsyncOsFile* asyncfile = NS_REINTERPRET_CAST(AsyncOsFile*, *aFile);
>+  return AppendNewAsyncMessage(asyncfile, ASYNC_CLOSE, 0, 0, 0);
>+}

AsyncClose should set a flag in the OsFile struct indicating that it's closed; any ops (read/write) should check that flag first and return an error if set, before appending their operation to the queue.  Otherwise, it's possible to append a write op for a closed file, which sounds bad.
All of the Async* functions that take an OsFile should probably have checks for whether the file has already been closed.

>+// AsyncOpenDirectory
>+//
>+//    Open the directory identified by zName and associate it with the
>+//    specified file. This just adds an entry to the write-op list, the
>+//    directory is opened later by sqlite3_async_flush().
>+
>+int // static
>+AsyncOpenDirectory(OsFile* aFile, const char* aName)
>+{
>+  if (AsyncWriteError != SQLITE_OK)
>+    return AsyncWriteError;
>+  AsyncOsFile* asyncfile = NS_REINTERPRET_CAST(AsyncOsFile*, aFile);
>+  return AppendNewAsyncMessage(asyncfile, ASYNC_OPENDIRECTORY, 0,
>+                          strlen(aName) + 1, aName);
>+}

I'm not really sure why OpenDirectory can be async, but looking at the code the only time the result is actually used is to call Sync on it, so I guess that's ok.
> There's an nsDeque class, but it's not nsTDeque, so it probably doesn't buy you
> any type-safety or code cleanliness to use it.. and there's no way to
> synchronize on it, so you'd still be stuck with rolling your own mutex/cond. 
> So just ignore this, I'm just whining about our generic data classes :)

FYI, I've written a re-usable task queue for the new threading/event-queue implementation that I'm working on (see bug 326273).  It is basically a deque of nsIRunnable objects w/ support for thread synchronization.  Let me know if you are interested, and I can see about hoisting it onto the trunk/1.8-branch sooner rather than later.
Attached patch Path addressing Vlad's comments (obsolete) — Splinter Review
Attachment #211445 - Attachment is obsolete: true
Attachment #212065 - Flags: first-review?(vladimir)
Attachment #211445 - Flags: first-review?(vladimir)
Attachment #212065 - Flags: second-review?(darin)
Attached patch Real new patchSplinter Review
Attachment #212065 - Attachment is obsolete: true
Attachment #212067 - Flags: second-review?(darin)
Attachment #212067 - Flags: first-review?(vladimir)
Attachment #212065 - Flags: second-review?(darin)
Attachment #212065 - Flags: first-review?(vladimir)
Comment on attachment 212067 [details] [diff] [review]
Real new patch

r=me
Attachment #212067 - Flags: first-review?(vladimir) → first-review+
Comment on attachment 212067 [details] [diff] [review]
Real new patch

>Index: src/mozStorageAsyncIO.cpp

>+ * your program crashes or if you take a power lose after the database

s/lose/loss/


>+nsresult
>+mozStorageService::FinishAsyncIO()
...
>+  PR_Lock(AsyncQueueLock);
...
>+  PR_WaitCondVar(AsyncCompleteCondition, PR_INTERVAL_NO_TIMEOUT);
>+
>+  // at this point, the writer thread should be finished and we should be on
>+  // the main thread. Nobody else should have any use for the lock. Marking

This comment is wrong.  The writer thread may have just notified
the condition variable, but it might still be waiting to acquire
the lock again.  If that happens, then this function may end up
destroying the lock while the writer thread is trying to acquire
it.  The canonical solution to this sort of problem is to employ
reference counting.  Each thread that has an interest in the lock
should take an owning reference to some containing data structure
that holds the lock.  When the reference count goes to zero, then
destroy the lock.


>+  //   THREAD pthread_mutex_lock(&async.queueMutex);

Would be good to remove these comments since they don't really
apply to this code.


>+  return AppendNewAsyncMessage(asyncfile, ASYNC_OPENDIRECTORY, 0,
>+                          strlen(aName) + 1, aName);

nit: nice to line up start of parameter lists vertically.


>+      #ifdef IO_DELAY_INTERVAL_MS
>+        // this simulates slow disk
>+        PR_Sleep(PR_MillisecondsToInterval(IO_DELAY_INTERVAL_MS));
>+      #else
>+        // yield so the UI thread is more responsive
>+        PR_Sleep(PR_INTERVAL_NO_WAIT);
>+      #endif

I noticed that the PR_Sleep codepath is disabled in this version.
Have you tested to see if it is worth enabling?


>Index: src/mozStorageService.cpp

> mozStorageService::~mozStorageService()
> {
>+    if (mObserverService) {
>+        mObserverService->RemoveObserver(this, gQuitApplicationMessage);
>+    }

So, it is never the case that your destructor would be called with someone
referencing you.  So, this RemoveObserver call is bogus.


>+    rv = mObserverService->AddObserver(this, gQuitApplicationMessage, PR_FALSE);

... PR_FALSE implies AddRef on this from mObserverService.

If you are a service (singleton) like this guy, then it is okay
to not call RemoveObserver.  The reference will be removed at
xpcom-shutdown automatically.


When you implement the fix for referencing counting the locks,
be sure to handle the case where Init fails or is not called
such that the lock pointers are null.


Brett and I discussed a fix for the shutdown race condition and
we came up with fairly trivial solution.  r=darin with that fix.
Attachment #212067 - Flags: second-review?(darin) → second-review+
Looks like this makes some issues on x86_64.

sqlite3 typedefines sqlite_int64 to "long long int" but PRInt64 is typedefined to "long" on 64 bit hosts. Thus we have this error:

mozStorageAsyncIO.cpp:570: erreur: invalid conversion from ‘int (* const)(OsFile*, sqlite_int64*)’ to ‘int (*)(OsFile*, PRInt64*)’

Maybe sqlite should behave as prtypes.h for sqlite_int64:
http://lxr.mozilla.org/mozilla/source/nsprpub/pr/include/prtypes.h#357
On branch and trunk
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.