trunk topcrash [@ _PR_Getfd]

VERIFIED FIXED in mozilla1.3beta

Status

Core
Networking: HTTP
P1
critical
VERIFIED FIXED
16 years ago
7 years ago

People

(Reporter: Jan Carpenter, Assigned: Darin Fisher)

Tracking

({crash, qawanted, topcrash})

Trunk
mozilla1.3beta
x86
All
crash, qawanted, topcrash
Bug Flags:
blocking1.3 +

Details

(Whiteboard: fixed1.3, crash signature)

Attachments

(1 attachment)

(Reporter)

Description

16 years ago
This stack signature is a trunk topcrash
 
     First Appearance Date : 2003-01-24
     Last Appearance Date : 2003-01-25
     First Build ID : 2003012408
     Latest Build ID : 2003012508

Source File : prfdcach.c line : 135
 
  Count   Offset    Real Signature
[ 12   _PR_Getfd a71d2f28 - _PR_Getfd ]
 
     Crash date range: 2003-01-19 to 2003-01-25
     Min/Max Seconds since last crash: 653 - 155636
     Min/Max Runtime: 653 - 155636
     Keyword List :  
     Count   Platform List 
     12   Windows NT 5.0 build 2195
 
     Count   Build Id List 
     3   2003012105
     3   2003012008
     1   2003012508
     1   2003012408
     1   2003012308
     1   2003012108
     1   2003012005
     1   2003011908
 
     No of Unique Users        12
 
 Stack trace(Frame) 

	 _PR_Getfd	[prfdcach.c  line 135]
	 PR_AllocFileDesc	[prio.c  line 130]
	 PR_Socket	[prsocket.c  line 1309]
	 PR_OpenTCPSocket	[prsocket.c  line 1347]
	 nsSocketTransport::BuildSocket	[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 790]
	 nsSocketTransport::InitiateSocket	[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 887]
	 nsSocketTransport::OnSocketEvent	[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 1180]
	 nsSocketTransportService::Run	[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp  line 542]
 
 
 COMMENTS/URLs:
     (16606757)	URL: http://www.nvnews.com/
     (16606757)	Comments: Clicked on a link to a forum within NVNews.com  though
I'm not sure where since the program shut down before I could get it.
     (16579822)	URL: http://www.w3.org/TR/
     (16579822)	Comments: Trying to load a doc from a link in a tab.  
     (16475701)	URL: http://www.w3.org/TR/CSS21/visudet.html#propdef-height
     (16475701)	Comments: I was browsing from
http://www.w3.org/TR/CSS21/visudet.html#propdef-height to
http://www.w3.org/TR/CSS21/visufx.html#propdef-overflow via the CSS2.1 Quick
Reference Sidebar (downloaded from
http://devedge.netscape.com/toolbox/sidebars/). When I clicked on the link
indicated above, the browser crashed. Subsequent efforts to reproduce seem to
indicate that this is not a link-specific issue but in fact occurs randomly
when using this particular sidebar. I am unable to reproduce it with other
sidebars so far.
     (16474975)	URL: http://www.ars-technica.com/
     (16459588)	URL: www.mikeandmaureen.net/House
     (16459588)	Comments: Just loading the above URL.    -Robert
 

SIMILAR STACK SIGNATURE
COMMENTS/URLs:

[ 11   _PR_Getfd a127e0f6 - _PR_Getfd ]
     (16572111)	URL: slashdot.org
     (16572111)	Comments: Hit reload, Moz crashed.  Mail was still working
though.  Weird
     (16483979)	URL: www.fotki.com
     (16473323)	URL:
http://www.presence-pc.com/sqlforum/forum2.php3?post=826&cat=1&page=1&interface=&config=root42.inc
     (16473323)	Comments: opening the page  by clicking on a mail notification
of a new reply in this thread
     (16458583)	URL: http://sports.espn.go.com/nhl/
(Assignee)

Updated

16 years ago
Severity: normal → critical
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.3beta

Comment 1

16 years ago
This is still crashing on the MozillaTrunk.  Adding qawanted to see if anyone
can reproduce this crash.
Keywords: qawanted
(Assignee)

Comment 2

16 years ago
wtc: do you see anything interesting in this stack trace?  PR_OpenTCPSocket is
crashing, but clearly it's not like necko is passing NSPR a bad parameter!! ;-)

Comment 3

16 years ago
Darin: No, sorry.
(Assignee)

Comment 4

16 years ago
here's another crash in _PR_Getfd that does not originate from the socket
transport thread (Incident ID 16778095):

_PR_Getfd [prfdcach.c, line 135]
PR_AllocFileDesc [prio.c, line 130]
PR_Open [prfile.c, line 372]
nsLocalFile::OpenNSPRFileDesc [xpcom/io/nsLocalFileWin.cpp, line 768]
ValidateOrigin [docshell/base/nsDocShell.cpp, line 959]

is there some kind of thread-safety problem in prfdcach.c perhaps?

Comment 5

16 years ago
_PR_Getfd() is a commonly used NSPR function.  I can
only say that if there is a thread-safety problem in
prfdcach.c, it would most likely have been encountered
during stress testing of our heavily threaded server
products.  There has been no change to prfdcach.c since
NSPR 4.1.2, a release that many server products have used.
(Assignee)

Comment 6

16 years ago
wtc: yeah most likely necko is doing something to corrupt NSPR's fd cache.

Comment 7

16 years ago
Proposing this as 1.3 blocker because it is one of the really "top" crashes -
twice as many crashes as bug 186132 (which is already blocking1.3+ because it is
"high on the topcrash radar").
Flags: blocking1.3?

Updated

16 years ago
Flags: blocking1.3? → blocking1.3+
(Assignee)

Comment 8

16 years ago
ok, so while i doubt this is truly a NSPR bug, i'm going to start investigating
from there since that's the only info i've got.  the stack trace blames line 135
of prfdcach.c, but as we know talkback is usually off by one line... the real
crash is occurring on line 134 of that file, which has this line:

  memset(fd->secret, 0, sizeof(PRFilePrivate));

so, if something has corrupted |secret|.. maybe deleted it, while we're trying
to memset it, then perhaps that would explain this crash.  perhaps a file
descriptor is being closed twice?!
(Assignee)

Comment 9

16 years ago
here's the stack from an instance of this crash on linux.  OS->ALL

_PR_Getfd() [prfdcach.c, line 133]
pt_SetMethods() [ptio.c, line 3273]
PR_OpenFile() [ptio.c, line 3516]
PR_Open() [ptio.c, line 3516]
nsLocalFile::OpenNSPRFileDesc() [nsLocalFileUnix.cpp, line 356]
nsDiskCacheStreamIO::OpenCacheFile() [nsDiskCacheStreams.cpp, line 609]
nsDiskCacheStreamIO::GetInputStream() [nsDiskCacheStreams.cpp, line 362]
nsDiskCacheDevice::OpenInputStreamForEntry() [nsDiskCacheDevice.cpp, line 610]
nsCacheService::OpenInputStreamForEntry() [nsCacheService.cpp, line 1252]
LazyInit() [nsCacheEntryDescriptor.cpp, line 499]
Read() [nsCacheEntryDescriptor.cpp, line 96]
nsInputStreamTransport::FillPipeSegment() [nsStreamTransportService.cpp, line 161]
nsPipeOutputStream::WriteSegments() [nsPipe3.cpp, line 1045]
nsInputStreamTransport::Run() [nsStreamTransportService.cpp, line 202]
nsThreadPoolRunnable::Run() [nsThread.cpp, line 896]
nsThread::Main() [nsThread.cpp, line 583]
_pt_root() [ptthread.c, line 217]
libpthread.so.0 + 0x6f87 (0x400d9f87) 
OS: Windows NT → All
(Assignee)

Comment 10

16 years ago
Created attachment 114304
v1 patch

OK, this was a pain to track down.  This problem was caused by PR_Close being
called on the same PRFileDesc from two different threads at roughly the same
time.  The bug is in the nsInputStreamPump code.  It is mistakenly calling
Close on mStream, which violates the nsIInputStreamPump interface contract:
namely that the stream being loaded should be accessed from only one thread at
a time.
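
For readers unfamiliar with this class of bug, the sketch below shows one generic way to make a close operation safe against a second caller, using a C11 atomic flag. This is an assumption-laden illustration only: it is not what the v1 patch does (the patch addresses the stray Close call in nsInputStreamPump), and `Stream`, `stream_close` and `release_count` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream wrapper; not an NSPR or necko type. */
typedef struct Stream {
    atomic_bool closed;   /* set by the first caller that closes the stream */
    int release_count;    /* times the underlying resource was released */
} Stream;

/* Returns true only for the caller that actually performed the close. */
static bool stream_close(Stream *s) {
    assert(s != NULL);
    /* atomic_exchange returns the previous value, so exactly one caller
       observes false and goes on to release the resource. */
    if (atomic_exchange(&s->closed, true)) {
        return false;     /* already closed by someone else */
    }
    s->release_count++;   /* stand-in for the real release, e.g. PR_Close */
    return true;
}
```

A guard like this only hides the symptom. The contract quoted above (one thread at a time) is the stronger fix, because it also protects reads that race with the close.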
(Assignee)

Updated

16 years ago
Attachment #114304 - Flags: superreview?(bzbarsky)
Attachment #114304 - Flags: review?(dougt)
(Assignee)

Comment 11

16 years ago
i should mention that i was fortunately able to repro this bug in a short time
with a debug build by loading 8 or so tabs full of big pages (cnn caliber). 
sometimes it would require a bunch of reloads from cache to trigger the crash.
i also had to add a PR_Sleep(1) inside _PR_Getfd just before the crash point to
force a context switch.  this combination allowed me to easily repro the bug ;-)
most of the time, i'd hit the PR_ASSERT at the top of _PR_Putfd, indicating that
the PRFileDesc had already been inserted into the fd cache, so it was easy to
determine where the duplicate PR_Close was coming from (usually
nsDiskCacheInputStream::Close, which is not MT-safe).
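
The debugging approach described here, an assertion that fires when a descriptor is released while already cached, can be sketched as follows. This is a hypothetical check in the same spirit, not the actual PR_ASSERT in _PR_Putfd; `Node`, `in_free_list` and `checked_put` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor; the real PRFileDesc layout is different. */
typedef struct Node {
    struct Node *next;
} Node;

/* Walk the free list and report whether d is already on it. */
static bool in_free_list(const Node *head, const Node *d) {
    for (const Node *p = head; p != NULL; p = p->next) {
        if (p == d) return true;
    }
    return false;
}

/* Debug-style put: trap a duplicate release before it corrupts the list. */
static void checked_put(Node **head, Node *d) {
    assert(!in_free_list(*head, d) && "descriptor released twice");
    d->next = *head;
    *head = d;
}
```

Stopping at the second release is what makes the bug tractable: the failing call stack then points at the duplicate close rather than at a later, unrelated allocation.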

Comment 12

16 years ago
Comment on attachment 114304
v1 patch

Nice catch....
Attachment #114304 - Flags: superreview?(bzbarsky) → superreview+

Comment 13

16 years ago
Comment on attachment 114304
v1 patch

r=dougt
Attachment #114304 - Flags: review?(dougt) → review+
(Assignee)

Updated

16 years ago
Attachment #114304 - Flags: approval1.3?

Comment 14

16 years ago
Comment on attachment 114304
v1 patch

a=asa (on behalf of drivers) for checkin to 1.3 final.
Attachment #114304 - Flags: approval1.3? → approval1.3+
(Assignee)

Comment 15

16 years ago
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Comment 16

16 years ago
No crashes in Talkback data since 2/13.  Verifying.
Status: RESOLVED → VERIFIED

Updated

16 years ago
Whiteboard: fixed1.3
Crash Signature: [@ _PR_Getfd]