Closed
Bug 166371
Opened 22 years ago
Closed 22 years ago
Crashes at [@ nsThreadPool::Shutdown] in Trunk, N700
Categories
(Core :: Networking, defect)
Tracking
()
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: greer, Assigned: timeless)
References
()
Details
(Keywords: crash, qawanted, topcrash)
Crash Data
Attachments
(1 file, 3 obsolete files)
3.29 KB,
text/plain
|
Details |
Crashes with the nsThreadPool::Shutdown signature are near the top of the list of N7.00 crashes. The signature is also showing up in Trunk builds as recently as 8/26. This needs to be fixed in the next point release. The stack for this crash is mentioned in bug 134724 comment #19, but may be a different issue. Assigning to timeless (owner of bug 134724) for first look. Stack trace(Frame) nsThreadPool::Shutdown [d:\builds\seamonkey\mozilla\xpcom\threads\nsThread.cpp line 745] nsFileTransportService::Shutdown [d:\builds\seamonkey\mozilla\netwerk\base\src\nsFileTransportService.cpp line 226] nsIOService::Observe [d:\builds\seamonkey\mozilla\netwerk\base\src\nsIOService.cpp line 1045] nsObserverService::NotifyObservers [d:\builds\seamonkey\mozilla\xpcom\ds\nsObserverService.cpp line 213] NS_ShutdownXPCOM [d:\builds\seamonkey\mozilla\xpcom\build\nsXPComInit.cpp line 544] main [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp line 1847] WinMain [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp line 1857] WinMainCRTStartup() KERNEL32.DLL + 0x1b560 (0xbff8b560) KERNEL32.DLL + 0x1b412 (0xbff8b412) KERNEL32.DLL + 0x19dd5 (0xbff89dd5)
:-(, i'll look, but i could probably use some help
Assignee: timeless → timeless
ok, i'm going to bet that multiple threads call shutdown thread 1 and 2 enter: 721 nsThreadPool::Shutdown thread 1 reaches 733 nsAutoLock lock(mLock); thread 2 reaches it and gets stuck thread 1 reaches 756 mThreads = nsnull; 757 return rv; thread 2 gets the lock and decides to do: 742 rv = mThreads->EnumerateForwards(nsThreadPool::InterruptThreads, nsnull which is reported as line 745 by talkback just to annoy me. How's my driving?
Status: NEW → ASSIGNED
so this patch should theoretically work assuming i got the right cause. mShuttingDown and mThreads are really equivalent conditions both are made false by ::Shutdown() after it acquires the lock. This might also be a caller bug, but I don't have an environment here to try to figure that out. For 1.2a I think a variant of this should do. darin said he'd look at it.
Comment 7•22 years ago
|
||
Comment on attachment 97723 [details] [diff] [review] condition and assertion are negotiable >Index: mozilla/xpcom/threads/nsThread.cpp >+ NS_ASSERTION(mThreads, "Bug 166371 was Shutdown() called more than once or did someone forget to call Init()?"); can a bug really be shutdown?? perhaps a ':' would do the trick ;-) this patch seems like the right way to protect against multiple calls to Shutdown() on different threads. i'd really like to know if that's actually the problem here or not. there is the possibility that nsFileTransportService::Shutdown is being called while another thread is inside nsFileTransportService::Add/RemoveSuspendedTransport because the locking in nsFileTransportService seems pretty broken to me. sr=darin (fwiw)
Attachment #97723 -
Flags: superreview+
Comment 8•22 years ago
|
||
Comment on attachment 97727 [details] [diff] [review] same patch, but for the caller >Index: mozilla/netwerk/base/src/nsFileTransportService.cpp > PR_Lock(mLock); >- mShuttingDown = 1; >+ if (mShuttingDown) { >+ NS_ASSERTION(!mShuttingDown, "Bug 166371 was Shutdown() called more than once!"); >+ return NS_OK; >+ } >+ mShuttingDown = PR_TRUE; > PR_Unlock(mLock); this locking is silly... assignment to a PRUint32 is atomic on all the processors we care about. if you want, you could rewrite this like so: if (mShuttingDown) return NS_OK; mShuttingDown = PR_TRUE;
Attachment #97727 -
Flags: needs-work+
Attachment #97723 -
Attachment is obsolete: true
Attachment #97727 -
Attachment is obsolete: true
Comment 10•22 years ago
|
||
Comment on attachment 97733 [details] [diff] [review] combined version per darin (fixed1.0.2) sr=darin, what about the assertion text? :-) we also need to seriously look into protecting mSuspendedTransportList inside Shutdown. things would blow up in a big way if Add/RemoveSuspendedTransport were called while inside Shutdown :(
Attachment #97733 -
Flags: superreview+
Assignee | ||
Comment 11•22 years ago
|
||
Comment on attachment 97733 [details] [diff] [review] combined version per darin (fixed1.0.2) r=bz on irc checked in leaving open to ponder locking for mSuspendedTransportList
Attachment #97733 -
Attachment is obsolete: true
Attachment #97733 -
Flags: review+
Comment 12•22 years ago
|
||
The assignment is atomic, but a check + assignment is not atomic, is it? i.e. could we have a situation where two threads enter Shutdown at approximately the same time, thread 1 finds mShuttingDown to be false, and before it sets mShuttingDown to true, thread 2 has checked mShuttingDown and found it to be false also? Then both threads would execute the entire function... it seems like we need a lock around the check _and_ assignment.
Comment 13•22 years ago
|
||
i should have qualified my last comment with : nsFileTransportService::Shutdown() is only called on the UI thread ;-) we know this because it is called from nsIOService::Observe() in response to the "xpcom-shutdown" event. i don't see any way for this notification to arrive on any thread other than the UI thread. i think the crash may actually be caused by "xpcom-shutdown" happening twice. or, it may be caused by "xpcom-shutdown" happening while a background file transport thread is inside Add/RemoveSuspendedTransport. see comment #7.
Comment 14•22 years ago
|
||
timeless: Is the patch checked into Trunk only ?
Comment 15•22 years ago
|
||
namachi - seeing that it does not have any approval, I'd hope that it's only in trunk.
Comment 16•22 years ago
|
||
Have we verified timeless's fix on the Trunk? This is currently the #3 topcrasher for Netscape 7.0. If his patch is safe and we know it fixed most of the crashes...it might be worth a shot getting it into Blackbird? I doubt it, but just thought I would throw that out there.
According to Talkback: http://climate/reports/VeryFastSearchStackSigNEW.cfm?stacksig=nsThreadPool%3A%3AShutdown this frame hasn't crashed since 8/26/2002.
Comment 18•22 years ago
|
||
Isn't there a (very small) hole in this code? nsresult nsFileTransportService::Shutdown(void) { - PR_Lock(mLock); - mShuttingDown = 1; - PR_Unlock(mLock); + if (mShuttingDown) + return NS_OK; + + mShuttingDown = PR_TRUE; Shouldn't that be more like: PR_Lock(mLock); if (mShuttingDown) { PR_Unlock(mLock); return NS_OK; } mShuttingDown = PR_TRUE; PR_Unlock(mLock);
Component: Networking → Accessibility APIs
Comment 19•22 years ago
|
||
no, because Shutdown is only called on the UI thread.
Comment 20•22 years ago
|
||
As timeless pointed out, Darin already answered the race-condition question. I still prefer using locking, since it's not good to make this sort of unchecked assumption about how other modules will call you, but in practice we're ok for now with it.
Updated•22 years ago
|
Attachment #97733 -
Flags: approval+
Comment 21•22 years ago
|
||
Comment on attachment 97733 [details] [diff] [review] combined version per darin (fixed1.0.2) a=rjesup@wgate.com for 1.0 branch; change mozilla1.0.2+ to fixed1.0.2 when checked in
Updated•22 years ago
|
Keywords: mozilla1.0.2+
Comment 22•22 years ago
|
||
rjesup: adding a lock here would buy you very little. we'd crash and burn elsewhere if someone tried to shutdown the file transport service on a background thread. shutdown is only called from the io service when it receives a xpcom-shutdown notification, which always occurs on the main thread. if that ever changes, much code would need to be rev'd.
Component: Accessibility APIs → Networking
Keywords: mozilla1.0.2+ → fixed1.0.2
Comment 23•22 years ago
|
||
timeless: can you explain how this is keyword==fixed1.0.2 while the status==assigned?
Comment 24•22 years ago
|
||
I see checkins on 9/3 for the TRUNK. I cannot find any for the BRANCH. Ben, please verify this fix on the TRUNK. If it's good, then we should feel better about it landing on the BRANCH. Thanks.
Assignee | ||
Comment 25•22 years ago
|
||
Comment on attachment 97733 [details] [diff] [review] combined version per darin (fixed1.0.2) http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&branch=MOZILL A_1_0(_0)?_BRANCH&branchtype=regexp&cvsroot=/cvsroot&date=explicit&mindate=1034 201640&maxdate=1034202180&who=timeless%25mac.com
Attachment #97733 -
Attachment description: combined version per darin → combined version per darin (fixed1.0.2)
Keywords: fixed1.0.2
Comment 26•22 years ago
|
||
Sorry. I'm drawing the line here. Someeone else is going to have to take this verification.
Comment 27•22 years ago
|
||
timeless, darin. Is this being left open for additional work (comment# 11)? Or can it be marked fixed? I am not seeing this crash in any recent talkback reports and the patch is checked in.
Comment 28•22 years ago
|
||
marking FIXED
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Crash Signature: [@ nsThreadPool::Shutdown]
You need to log in
before you can comment on or make changes to this bug.
Description
•