Closed
Bug 422472
Opened 16 years ago
Closed 16 years ago
Lock re-entrance in TimerThread, deadlock
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla1.9beta5
People
(Reporter: mayhemer, Assigned: mayhemer)
References
()
Details
(Keywords: hang)
Attachments
(1 file, 2 obsolete files)
3.68 KB,
patch
|
damons
:
approval1.9+
|
Details | Diff | Splinter Review |
Happened with one day old TRUNK build during exit. This is back trace of the main thread: 0 0x900248c7 in semaphore_wait_signal_trap () #1 0x90001582 in pthread_mutex_lock () #2 0x00677bab in PR_Lock (lock=0x2312220) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:206 #3 0x0177c839 in nsAutoLock::nsAutoLock (this=0xbfffef3c, aLock=0x2312220) at ../../dist/include/xpcom/nsAutoLock.h:219 #4 0x0172d80f in TimerThread::RemoveTimer (this=0x23120f0, aTimer=0x23de5f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/TimerThread.cpp:378 #5 0x0172bfd5 in nsTimerImpl::Cancel (this=0x23de5f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/nsTimerImpl.cpp:286 #6 0x28b97b1d in imgContainer::Anim::~Anim (this=0x3450bd80) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/modules/libpr0n/src/imgContainer.h:194 #7 0x28b4d792 in imgContainer::~imgContainer (this=0x2d6ca2f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/modules/libpr0n/src/imgContainer.cpp:94 #8 0x28b4a884 in imgContainer::Release (this=0x2d6ca2f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/modules/libpr0n/src/imgContainer.cpp:73 #9 0x017915d9 in nsTimerImpl::ReleaseCallback (this=0x23de5f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/nsTimerImpl.h:119 #10 0x0172de17 in TimerThread::Shutdown (this=0x23120f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/TimerThread.cpp:169 #11 0x0172bf6d in nsTimerImpl::Shutdown () at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/nsTimerImpl.cpp:199 #12 0x016d54ce in NS_ShutdownXPCOM_P (servMgr=0x2312e04) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/build/nsXPComInit.cpp:739 #13 0x00207972 in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0xbffff4ac) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/toolkit/xre/nsAppRunner.cpp:909 #14 0x0020f3b5 in XRE_main (argc=4, argv=0xbffff7d4, aAppData=0x230b9e0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/toolkit/xre/nsAppRunner.cpp:3195 #15 0x00002798 in main (argc=4, argv=0xbffff7d4) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/browser/app/nsBrowserApp.cpp:158 TimerThread.cpp:169 iterates all timers and calls ReleaseCallback on each under the mLock being held. TimerThread.cpp:378 enters the same mLock a second time. I am curios why the assertion !mLocked didn't fail as I am used to on windows. Just to report everything, this is back trace of the timer thread: #0 0x900248c7 in semaphore_wait_signal_trap () #1 0x90001582 in pthread_mutex_lock () #2 0x90047d7c in pthread_cond_timedwait () #3 0x00677f5a in pt_TimedWait (cv=0x2311dd4, ml=0x2312220, timeout=97) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:280 #4 0x00678472 in PR_WaitCondVar (cvar=0x2311dd0, timeout=97) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:407 #5 0x0172dbce in TimerThread::Run (this=0x23120f0) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/TimerThread.cpp:334 #6 0x0172893d in nsThread::ProcessNextEvent (this=0x2357400, mayWait=1, result=0xb011ee8c) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/nsThread.cpp:510 #7 0x016cc8f3 in NS_ProcessNextEvent_P (thread=0x2357400, mayWait=1) at nsThreadUtils.cpp:227 #8 0x01728b4a in nsThread::ThreadFunc (arg=0x2357400) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/xpcom/threads/nsThread.cpp:254 #9 0x0067efab in _pt_root (arg=0x2357610) at /Users/starapica/Documents/mozilla/TRUNK/mozilla/nsprpub/pr/src/pthreads/ptthread.c:221 #10 0x90024227 in _pthread_body () It simply waits for notification. STR: 1. go to http://images.google.cz/imgres?imgurl=http://www.getoggz.com/anim/AnimationWizard8.gif&imgrefurl=http://www.getoggz.com/GetOggz_Com/About_Oggz/&h=240&w=327&sz=700&hl=cs&start=9&um=1&tbnid=X8qxaVeqzz53NM:&tbnh=87&tbnw=118&prev=/images%3Fq%3Doggz%26um%3D1%26hl%3Dcs%26lr%3Dlang_cs%26sa%3DN 2. use Minefield > Quit to exit 3. Minefield hangs sometimes. This seems not to be reproducible. My suggestion is to copy mTimers array to a local array copy, delete mTimers under the lock and then release the timers using local array copy outside the lock.
Comment 1•16 years ago
|
||
That sounds correct. Can you submit a patch?
Assignee | ||
Comment 2•16 years ago
|
||
Tested on Win and Mac, unfortunately I cannot reproduce this hang anymore to be sure this helps and doesn't break anything.
Attachment #308940 -
Flags: review?(benjamin)
Comment 3•16 years ago
|
||
Comment on attachment 308940 [details] [diff] [review] Fix >Index: xpcom/threads/TimerThread.cpp >+ nsTimerImpl *timer; >+ PRInt32 timersCount = timers.Count(); >+ for (PRInt32 i = 0; i < timersCount; i++) { >+ timer = static_cast<nsTimerImpl*>(timers[i]); No need for "timer" to be outside the loop: nsTimerImpl *timer = static_cast<nsTimerImpl*>(timers[i]); with that nit fixed, r=me... I'd like sr=brendan on this one for double-checking drivers, we should really take this for 1.9/190x, since it's a timing-dependent hang that you can experience in the real world. Honza, can you attach an updated patch so we can request sr+approval on something that's ready for checkin?
Attachment #308940 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 4•16 years ago
|
||
I also tested for leaks - none is present.
Attachment #308940 -
Attachment is obsolete: true
Attachment #308946 -
Flags: superreview?(brendan)
Attachment #308946 -
Flags: review?(benjamin)
Updated•16 years ago
|
Attachment #308946 -
Flags: review?(benjamin) → review+
Updated•16 years ago
|
Flags: blocking1.9?
Comment 5•16 years ago
|
||
Please cc: me in future when requesting review (wish bugzilla did that for you). Thanks, /be
Comment 6•16 years ago
|
||
Comment on attachment 308946 [details] [diff] [review] Fix final This is a good fix in general, to avoid deadlock calling out of module through a destructor to who-knows-what lock-taking code. /be
Attachment #308946 -
Flags: superreview?(brendan) → superreview+
Comment 7•16 years ago
|
||
Comment on attachment 308946 [details] [diff] [review] Fix final Wouldn't nsTArray<nsTimerImpl*> be type-safe without modifying reference counts? nsVoidArray is rarely the right answer. Also would be good to describe in comments (citing this bug by number) why the copy-to-local-array dance has to happen. > // These two internal helper methods must be called while mLock is held. > // AddTimerInternal returns the position where the timer was added in the > // list, or -1 if it failed. > PRInt32 AddTimerInternal(nsTimerImpl *aTimer); > PRBool RemoveTimerInternal(nsTimerImpl *aTimer); >+ void ReleaseTimerInternal(nsTimerImpl *aTimer); Can you update two to three here, please?
Comment 8•16 years ago
|
||
I thought about nsTArray, but it would require doing some magic with the existing mTimers array, which isn't worth it. We already have a copy-constructor that makes copying a nsVoidArray easy.
Assignee | ||
Comment 9•16 years ago
|
||
(In reply to comment #7) > (From update of attachment 308946 [details] [diff] [review]) > Can you update two to three here, please? > No, because ReleaseTimerInternal may not be called under the lock during shutdown - that is fix for this bug together with call of timer->ReleaseCallback() outside the lock. However, ReleaseTimerInternal might be called under the lock when called from other places through RemoveTimer(Internal) but it will not release the timer its self because there is always left one more strong reference to it. (In reply to comment #7) > (From update of attachment 308946 [details] [diff] [review]) > Wouldn't nsTArray<nsTimerImpl*> be type-safe without modifying reference > counts? nsVoidArray is rarely the right answer. Also would be good to > describe in comments (citing this bug by number) why the copy-to-local-array > dance has to happen. I wouldn't change this now, it is too risky IMHO. But filling a new bug for this issue (more cleanup then issue) might be appropriate. I agree that commenting the code change is in place. I will submit one more patch to reflect it.
Status: NEW → ASSIGNED
Assignee | ||
Comment 10•16 years ago
|
||
This patch should be checked in.
Attachment #308946 -
Attachment is obsolete: true
Updated•16 years ago
|
Attachment #309070 -
Flags: approval1.9?
Comment 11•16 years ago
|
||
Comment on attachment 309070 [details] [diff] [review] Fix final + added explanatory comment a1.9+=damons
Attachment #309070 -
Flags: approval1.9? → approval1.9+
Updated•16 years ago
|
Keywords: checkin-needed
Comment 12•16 years ago
|
||
Checking in xpcom/threads/TimerThread.cpp; /cvsroot/mozilla/xpcom/threads/TimerThread.cpp,v <-- TimerThread.cpp new revision: 1.30; previous revision: 1.29 done Checking in xpcom/threads/TimerThread.h; /cvsroot/mozilla/xpcom/threads/TimerThread.h,v <-- TimerThread.h new revision: 1.12; previous revision: 1.11 done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
OS: Mac OS X → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta5
You need to log in
before you can comment on or make changes to this bug.
Description
•