This stack signature is a trunk topcrash.

First Appearance Date : 2003-01-24
Last Appearance Date : 2003-01-25
First Build ID : 2003012408
Latest Build ID : 2003012508
Source File : prfdcach.c line : 135

Count Offset Real Signature
[ 12 _PR_Getfd a71d2f28 - _PR_Getfd ]

Crash date range: 2003-01-19 to 2003-01-25
Min/Max Seconds since last crash: 653 - 155636
Min/Max Runtime: 653 - 155636

Keyword List :

Count Platform List
12 Windows NT 5.0 build 2195

Count Build Id List
3 2003012105
3 2003012008
1 2003012508
1 2003012408
1 2003012308
1 2003012108
1 2003012005
1 2003011908

No of Unique Users 12

Stack trace(Frame)
_PR_Getfd [prfdcach.c line 135]
PR_AllocFileDesc [prio.c line 130]
PR_Socket [prsocket.c line 1309]
PR_OpenTCPSocket [prsocket.c line 1347]
nsSocketTransport::BuildSocket [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 790]
nsSocketTransport::InitiateSocket [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 887]
nsSocketTransport::OnSocketEvent [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1180]
nsSocketTransportService::Run [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 542]

COMMENTS/URLs:
(16606757) URL: http://www.nvnews.com/
(16606757) Comments: Clicked on a link to a forum within NVNews.com though I'm not sure where since the program shut down before I could get it.
(16579822) URL: http://www.w3.org/TR/
(16579822) Comments: Trying to load a doc from a link in a tab.
(16475701) URL: http://www.w3.org/TR/CSS21/visudet.html#propdef-height
(16475701) Comments: I was browsing from http://www.w3.org/TR/CSS21/visudet.html#propdef-height to http://www.w3.org/TR/CSS21/visufx.html#propdef-overflow via the CSS2.1 Quick Reference Sidebar (downloaded from http://devedge.netscape.com/toolbox/sidebars/). When I clicked on the link indicated above the browser crashed. Subsequent efforts to reproduce seem to indicate that this is not a link-specific issue but in fact occurs randomly when using this particular sidebar. I am unable to reproduce it with other sidebars so far.
(16474975) URL: http://www.ars-technica.com/
(16459588) URL: www.mikeandmaureen.net/House
(16459588) Comments: Just loading the above URL. -Robert

SIMILAR STACK SIGNATURE COMMENTS/URLs:
[ 11 _PR_Getfd a127e0f6 - _PR_Getfd ]
(16572111) URL: slashdot.org
(16572111) Comments: Hit reload Moz crashed. Mail was still working though. Weird
(16483979) URL: www.fotki.com
(16473323) URL: http://www.presence-pc.com/sqlforum/forum2.php3?post=826&cat=1&page=1&interface=&config=root42.inc
(16473323) Comments: opening the page by clicking on a mail notification of a new reply in this thread
(16458583) URL: http://sports.espn.go.com/nhl/
Severity: normal → critical
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.3beta
This is still crashing on the MozillaTrunk. Adding qawanted to see if anyone can reproduce this crash.
wtc: do you see anything interesting in this stack trace? PR_OpenTCPSocket is crashing, but clearly it's not like necko is passing NSPR a bad parameter!! ;-)
Darin: No, sorry.
here's another crash in _PR_Getfd that does not originate from the socket transport thread (Incident ID 16778095):

_PR_Getfd [prfdcach.c, line 135]
PR_AllocFileDesc [prio.c, line 130]
PR_Open [prfile.c, line 372]
nsLocalFile::OpenNSPRFileDesc [xpcom/io/nsLocalFileWin.cpp, line 768]
ValidateOrigin [docshell/base/nsDocShell.cpp, line 959]

is there some kind of thread-safety problem in prfdcach.c perhaps?
_PR_Getfd() is a commonly used NSPR function. I can only say that if there is a thread-safety problem in prfdcach.c, it would most likely have been encountered during stress testing of our heavily threaded server products. There has been no change to prfdcach.c since NSPR 4.1.2, a release that many server products have used.
wtc: yeah most likely necko is doing something to corrupt NSPR's fd cache.
Proposing this as 1.3 blocker because it is one of the really "top" crashes - twice as many crashes as bug 186132 (which is already blocking1.3+ because it is "high on the topcrash radar").
ok, so while i doubt this is truly a NSPR bug, i'm going to start investigating from there since that's the only info i've got. the stack trace blames line 135 of prfdcach.c, but as we know talkback is usually off by one line... the real crash is occurring on line 134 of that file, which has this line:

    memset(fd->secret, 0, sizeof(PRFilePrivate));

so, if something has corrupted |secret|.. maybe deleted it, while we're trying to memset it, then perhaps that would explain this crash. perhaps a file descriptor is being closed twice?!
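To make the "closed twice" hypothesis concrete, here is a minimal sketch of how a double close could poison a descriptor cache. It assumes the cache is a simple LIFO free list; the types and functions (ToyFd, ToyCache, toy_put, toy_get, double_put_aliases) are hypothetical stand-ins and not NSPR's actual code or data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for a descriptor and its private data; NOT NSPR's types. */
typedef struct ToyPrivate { int state; } ToyPrivate;

typedef struct ToyFd {
    struct ToyFd *next;   /* link used while the descriptor sits in the cache */
    ToyPrivate *secret;   /* private data, reset each time it is handed out */
} ToyFd;

typedef struct ToyCache { ToyFd *head; } ToyCache;

/* Return a descriptor to the cache (what a close would do in this toy). */
static void toy_put(ToyCache *c, ToyFd *fd) {
    fd->next = c->head;
    c->head = fd;
}

/* Take a descriptor from the cache, or NULL if the cache is empty. */
static ToyFd *toy_get(ToyCache *c) {
    ToyFd *fd = c->head;
    if (fd != NULL) {
        c->head = fd->next;
        fd->next = NULL;
        /* Same shape as the memset quoted above: reset the private data. */
        memset(fd->secret, 0, sizeof(ToyPrivate));
    }
    return fd;
}

/* Returns 1 if a double put makes two gets hand out the same descriptor. */
static int double_put_aliases(void) {
    ToyPrivate priv = { 1 };
    ToyFd fd = { NULL, &priv };
    ToyCache cache = { NULL };

    toy_put(&cache, &fd);
    toy_put(&cache, &fd);          /* the erroneous second close */

    ToyFd *a = toy_get(&cache);    /* first caller gets fd */
    ToyFd *b = toy_get(&cache);    /* second caller gets the same fd */
    return a == &fd && b == &fd;
}
```

In this toy, after the double put the descriptor links to itself, so two unrelated callers end up owning the same object. If one of them later closes it and releases its private data, the other's memset would touch stale memory, which would be consistent with the crash point.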
here's the stack from an instance of this crash on linux. OS->ALL

_PR_Getfd() [prfdcach.c, line 133]
pt_SetMethods() [ptio.c, line 3273]
PR_OpenFile() [ptio.c, line 3516]
PR_Open() [ptio.c, line 3516]
nsLocalFile::OpenNSPRFileDesc() [nsLocalFileUnix.cpp, line 356]
nsDiskCacheStreamIO::OpenCacheFile() [nsDiskCacheStreams.cpp, line 609]
nsDiskCacheStreamIO::GetInputStream() [nsDiskCacheStreams.cpp, line 362]
nsDiskCacheDevice::OpenInputStreamForEntry() [nsDiskCacheDevice.cpp, line 610]
nsCacheService::OpenInputStreamForEntry() [nsCacheService.cpp, line 1252]
LazyInit() [nsCacheEntryDescriptor.cpp, line 499]
Read() [nsCacheEntryDescriptor.cpp, line 96]
nsInputStreamTransport::FillPipeSegment() [nsStreamTransportService.cpp, line 161]
nsPipeOutputStream::WriteSegments() [nsPipe3.cpp, line 1045]
nsInputStreamTransport::Run() [nsStreamTransportService.cpp, line 202]
nsThreadPoolRunnable::Run() [nsThread.cpp, line 896]
nsThread::Main() [nsThread.cpp, line 583]
_pt_root() [ptthread.c, line 217]
libpthread.so.0 + 0x6f87 (0x400d9f87)
OS: Windows NT → All
Created attachment 114304
v1 patch

OK, this was a pain to track down. This problem was caused by PR_Close being called on the same PRFileDesc from two different threads at roughly the same time. The bug is in the nsInputStreamPump code. It is mistakenly calling Close on mStream, which violates the nsIInputStreamPump interface contract: namely that the stream being loaded should be accessed from only one thread at a time.
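For illustration only, and not a description of what the attached patch does: a generic defensive pattern against this class of bug is a close-once guard, so that a second Close on the same object cannot release the underlying descriptor again. The sketch below uses C11 atomics; GuardedStream, guarded_init, underlying_close and guarded_close are hypothetical names, not Necko or NSPR APIs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stream wrapper; not a Necko or NSPR type. */
typedef struct GuardedStream {
    atomic_int closed;   /* 0 = open, 1 = a close has been claimed */
    int close_calls;     /* how many times the underlying close ran */
} GuardedStream;

static void guarded_init(GuardedStream *s) {
    atomic_init(&s->closed, 0);
    s->close_calls = 0;
}

/* Stand-in for releasing the real descriptor (assumption for the demo). */
static void underlying_close(GuardedStream *s) {
    s->close_calls++;
}

/* Returns 1 if this call performed the close, 0 if another call already
 * claimed it. atomic_exchange lets exactly one caller see the old value 0,
 * even when several threads call this at the same time. */
static int guarded_close(GuardedStream *s) {
    if (atomic_exchange(&s->closed, 1) != 0) {
        return 0;
    }
    underlying_close(s);
    return 1;
}
```

A guard like this only stops the descriptor from being released twice. It does not make concurrent reads safe, so honoring the one-thread-at-a-time contract is still the real requirement.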
i should mention that i was fortunately able to repro this bug in a short time with a debug build by loading 8 or so tabs full of big pages (cnn caliber). sometimes it would require a bunch of reloads from cache to trigger the crash. i also had to add a PR_Sleep(1) inside _PR_Getfd just before the crash point to force a context switch. this combination allowed me to easily repro the bug ;-) most of the time, i'd hit the PR_ASSERT at the top of _PR_Putfd, indicating that the PRFileDesc had already been inserted into the fd cache, so it was easy to determine where the duplicate PR_Close was coming from (usually nsDiskCacheInputStream::Close, which is not MT-safe).
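The detection half of that approach can be sketched as a state check on the put path: a descriptor that is already marked as freed must not be inserted again. This is a simplified stand-in, not the actual PR_ASSERT in _PR_Putfd; Slot, SlotCache, the SLOT_* states and checked_put are hypothetical names, and a debug build would typically assert where this sketch returns an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor states for the sketch. */
enum { SLOT_OPEN = 1, SLOT_FREED = 2 };

typedef struct Slot {
    struct Slot *next;   /* free-list link */
    int state;           /* SLOT_OPEN while in use, SLOT_FREED once cached */
} Slot;

typedef struct SlotCache {
    Slot *head;          /* LIFO free list */
    int double_puts;     /* number of rejected duplicate inserts */
} SlotCache;

/* Insert a slot into the cache.
 * Returns 0 on success, or -1 if the slot was not open, which means it
 * was already handed back once (the duplicate-close case). */
static int checked_put(SlotCache *c, Slot *s) {
    if (s->state != SLOT_OPEN) {
        c->double_puts++;    /* a debug build could assert here instead */
        return -1;
    }
    s->state = SLOT_FREED;
    s->next = c->head;
    c->head = s;
    return 0;
}
```

Because the check fires at the moment of the second insert, the call stack at that point identifies the duplicate closer directly, rather than the unrelated caller that crashes later.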
Comment on attachment 114304
v1 patch

Nice catch....
Attachment #114304 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 114304
v1 patch

r=dougt
Attachment #114304 - Flags: review?(dougt) → review+
Comment on attachment 114304
v1 patch

a=asa (on behalf of drivers) for checkin to 1.3 final.
Attachment #114304 - Flags: approval1.3? → approval1.3+
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
No crashes in Talkback data since 2/13. Verifying.
Status: RESOLVED → VERIFIED