Closed Bug 408872 Opened 17 years ago Closed 17 years ago

hang on shutdown involving URL classifier and storage AsyncIO

Categories

(Toolkit :: Storage, defect, P1)

x86
Linux
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: dbaron, Unassigned)

References

Details

(Keywords: hang)

Yesterday I saw a hang on shutdown (in a build from a few days before that).  I think I've seen this hang before, although I'm not sure; this was the first time I've debugged in any detail.

At the point of the hang, all webshells had been destroyed, though there were still a bunch of DOM windows left.

There were 4 threads still around.  Thread #1 is the main thread, thread #2 is the URL classifier's worker thread, thread #3 is unknown, and thread #4 is the timer thread.

The hang seems to be because the main thread is trying to join thread 2, which is waiting on a condvar that nobody is notifying.  Vlad says the problem seems to be that the storage service shuts down the async I/O thread when it gets a shutdown notification, whether or not there are still connections alive.  This means thread #2 here is waiting for a thread that no longer exists to do something.

The stacks, and some other details from gdb, are as follows:

#0  nsThread::Shutdown (this=0xa29acd0)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:478
#1  0x056ab61c in nsUrlClassifierDBService::Shutdown (this=0x9f7f558)
    at /builds/trunk/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:2621
#2  0x056acbef in nsUrlClassifierDBService::Observe (this=0x9f7f558, 
    aSubject=0x9c4a4c8, aTopic=0x170fc1 "profile-before-change", 
    aData=0x171200)
    at /builds/trunk/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:2581
#3  0x00644d72 in nsObserverList::NotifyObservers (this=0x98fcc6c, 
    aSubject=0x9c4a4c8, aTopic=0x170fc1 "profile-before-change", 
    someData=0x171200) at /builds/trunk/mozilla/xpcom/ds/nsObserverList.cpp:128
#4  0x006462a5 in nsObserverService::NotifyObservers (this=0x98cd4c0, 
    aSubject=0x9c4a4c8, aTopic=0x170fc1 "profile-before-change", 
    someData=0x171200)
    at /builds/trunk/mozilla/xpcom/ds/nsObserverService.cpp:181
#5  0x0015d90e in nsXREDirProvider::DoShutdown (this=0xbfa07100)
    at /builds/trunk/mozilla/toolkit/xre/nsXREDirProvider.cpp:848
#6  0x00152367 in ~ScopedXPCOMStartup (this=0xbfa0722c)
    at /builds/trunk/mozilla/toolkit/xre/nsAppRunner.cpp:857
#7  0x001568a0 in XRE_main (argc=<value optimized out>, argv=0xbfa073d4, 
    aAppData=0x9816f98)
    at /builds/trunk/mozilla/toolkit/xre/nsAppRunner.cpp:3190
#8  0x08048cf1 in main (argc=4, argv=0x129dd2)
    at /builds/trunk/mozilla/browser/app/nsBrowserApp.cpp:153

(gdb) p context
$15 = {joiningThread = 0x984bd38, shutdownAck = 0}
(gdb) p *$.joiningThread
$16 = (nsThread) {<nsIThreadInternal> = {<nsIThread> = {<nsIEventTarget> = {<nsISupports> = {
          _vptr.nsISupports = 0x6e98a8}, <No data fields>}, <No data fields>}, <No data fields>}, <nsISupportsPriority> = {<nsISupports> = {
      _vptr.nsISupports = 0x6e98f0}, <No data fields>}, mRefCnt = {
    mValue = 2673}, _mOwningThread = {mThread = 0x9811548}, 
  static sGlobalObserver = 0x98ee89c, mLock = 0x984ce88, mObserver = {
    mRawPtr = 0x990fb84}, mEvents = 0x984bd54, mEventsRoot = {mNext = 0x0, 
    mFilter = {mRawPtr = 0x0}, mQueue = {mMonitor = 0x9850a00, 
      mHead = 0xd152388, mTail = 0xd152388, mOffsetHead = 27, 
      mOffsetTail = 27}}, mPriority = 0, mThread = 0x9811548, 
  mRunningEvent = 0, mShutdownContext = 0x0, mShutdownRequired = 0 '\0', 
  mShutdownPending = 105 'i', mEventsAreDoomed = 0 '\0'}
(gdb) inf thr
  4 Thread -1221719152 (LWP 22615)  0x00110402 in __kernel_vsyscall ()
  3 Thread -1233544304 (LWP 22616)  0x00110402 in __kernel_vsyscall ()
  2 Thread -1302852720 (LWP 22625)  0x00110402 in __kernel_vsyscall ()
* 1 Thread -1209112352 (LWP 22613)  nsThread::Shutdown (this=0xa29acd0)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:478
(gdb) thr 2
[Switching to thread 2 (Thread -1302852720 (LWP 22625))]#0  0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x007ed206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00715312 in PR_WaitCondVar (cvar=0xb2a58e38, timeout=4294967295)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:405
#3  0x06155be7 in mozStorageService::FlushAsyncIO (this=0x9bc68f0)
    at /builds/trunk/mozilla/storage/src/mozStorageAsyncIO.cpp:581
#4  0x0615bcab in mozStorageConnection::Close (this=0xa29bef0)
    at /builds/trunk/mozilla/storage/src/mozStorageConnection.cpp:187
#5  0x0615c07a in ~mozStorageConnection (this=0xa29bef0)
    at /builds/trunk/mozilla/storage/src/mozStorageConnection.cpp:81
#6  0x0615c3c8 in mozStorageConnection::Release (this=0xa29bef0)
    at /builds/trunk/mozilla/storage/src/mozStorageConnection.cpp:69
#7  0x056a372c in nsCOMPtr<mozIStorageConnection>::assign_assuming_AddRef (
    this=0xa29b80c, newPtr=0x0)
    at ../../../../dist/include/xpcom/nsCOMPtr.h:568
#8  0x056b6245 in nsCOMPtr<mozIStorageConnection>::assign_with_AddRef (
    this=0xa29b80c, rawPtr=0x0)
    at ../../../../dist/include/xpcom/nsCOMPtr.h:1267
#9  0x056b6fc9 in nsCOMPtr<mozIStorageConnection>::operator= (this=0xa29b80c, 
    rhs=0x0) at ../../../../dist/include/xpcom/nsCOMPtr.h:713
#10 0x056ab7f4 in nsUrlClassifierDBServiceWorker::CloseDb (this=0xa29b7f8)
    at /builds/trunk/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp:1988
#11 0x006b3ea9 in NS_InvokeByIndex_P ()
    at /builds/trunk/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp:73
#12 0x006a48b1 in nsProxyObjectCallInfo::Run (this=0xa29ac50)
    at /builds/trunk/mozilla/xpcom/proxy/src/nsProxyEvent.cpp:181
#13 0x0069b1df in nsThread::ProcessNextEvent (this=0xa29acd0, mayWait=1, 
    result=0xb2580330) at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:524
#14 0x0062f38f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1)
    at nsThreadUtils.cpp:227
#15 0x0069bc91 in nsThread::ThreadFunc (arg=0xa29acd0)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:254
#16 0x0071c975 in _pt_root (arg=0xa29af48)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221
#17 0x007e944b in start_thread (arg=0xb2580b90) at pthread_create.c:296
#18 0x00e0780e in clone () from /lib/libc.so.6
(gdb) thr 3
[Switching to thread 3 (Thread -1233544304 (LWP 22616))]#0  0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x007ed206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00715312 in PR_WaitCondVar (cvar=0x9bc6bd0, timeout=4294967295)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:405
#3  0x00715424 in PR_Wait (mon=0x9bc6c10, timeout=4294967295)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:584
#4  0x0066d4c7 in nsAutoMonitor::Wait (this=0xb6799278, interval=4294967295)
    at ../../dist/include/xpcom/nsAutoLock.h:340
#5  0x006998bb in nsEventQueue::GetEvent (this=0x9bc6b44, mayWait=1, 
    result=0xb67992f0)
    at /builds/trunk/mozilla/xpcom/threads/nsEventQueue.cpp:85
#6  0x0069d26e in nsThread::nsChainedEventQueue::GetEvent (this=0x9bc6b3c, 
    mayWait=1, event=0xb67992f0)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.h:112
#7  0x0069b181 in nsThread::ProcessNextEvent (this=0x9bc6b20, mayWait=1, 
    result=0xb6799330) at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:515
#8  0x0062f38f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1)
    at nsThreadUtils.cpp:227
#9  0x0069bc91 in nsThread::ThreadFunc (arg=0x9bc6b20)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:254
#10 0x0071c975 in _pt_root (arg=0x9bc6d98)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221
#11 0x007e944b in start_thread (arg=0xb6799b90) at pthread_create.c:296
#12 0x00e0780e in clone () from /lib/libc.so.6
(gdb) thr 4
[Switching to thread 4 (Thread -1221719152 (LWP 22615))]#0  0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x007ed48c in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#2  0x00713af6 in pt_TimedWait (cv=0x984b9bc, ml=0x984d2d8, timeout=10328)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:280
#3  0x007151de in PR_WaitCondVar (cvar=0x984b9b8, timeout=10328)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:407
#4  0x006a27fd in TimerThread::Run (this=0x984d188)
    at /builds/trunk/mozilla/xpcom/threads/TimerThread.cpp:317
#5  0x0069b1df in nsThread::ProcessNextEvent (this=0x9963450, mayWait=1, 
    result=0xb72e0330) at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:524
#6  0x0062f38f in NS_ProcessNextEvent_P (thread=0x0, mayWait=1)
    at nsThreadUtils.cpp:227
#7  0x0069bc91 in nsThread::ThreadFunc (arg=0x9963450)
    at /builds/trunk/mozilla/xpcom/threads/nsThread.cpp:254
#8  0x0071c975 in _pt_root (arg=0x9963660)
    at /builds/trunk/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221
#9  0x007e944b in start_thread (arg=0xb72e0b90) at pthread_create.c:296
#10 0x00e0780e in clone () from /lib/libc.so.6
Could this cause DB corruption?
Yes, this could be very bad; mozStorageService observes the shutdown message and then shuts down the async IO thread, ignoring the fact that other services (such as URL classifier) may be doing sqlite-related work.

However, the async IO layer defaults to non-async IO when the thread is destroyed or not present; the problem is that the URL classifier service is doing async IO from another thread.

I'm guessing that at http://mxr.mozilla.org/mozilla/source/storage/src/mozStorageAsyncIO.cpp#610 we need to set AsyncWriteThreadInstance to nsnull before we call Shutdown on it, so that when we release the lock nothing can get in on the queue (which is no longer being processed).

Version: 1.8 Branch → Trunk
Um yea.  Sounds bad. -->blocker P1
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Should we close this since we killer AsyncIO?  DBaron you seen this recently?
I haven't seen it recently.
Moving to fixed as a result of bug 408914. Please re-open if you see different.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.