Closed Bug 192207 Opened 21 years ago Closed 21 years ago

M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.3final

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(4 keywords, Whiteboard: fixed1.3)

Crash Data

Attachments

(1 file, 3 obsolete files)

topcrash @nsSocketTransportService::RememberHost

0x00000000
PL_DHashTableOperate [xpcom/ds/pldhash.c line 480]
nsSocketTransportService::RememberHost
[netwerk/base/src/nsSocketTransportService2.cpp line 294]
nsSocketTransport::OnSocketConnected [netwerk/base/src/nsSocketTransport2.cpp
line 1114]
nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp line 1262]
nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp
line 516]
nsThread::Main [xpcom/threads/nsThread.cpp line 134]
_PR_NativeRunThread [pruthr.c line 455]
msvcrt.dll + 0x27fb8 (0x77c07fb8)
kernel32.dll + 0x1d33b (0x77e5d33b)
also, this stack trace:

0x00000000
PL_DHashTableOperate [xpcom/ds/pldhash.c, line 480]
nsSocketTransportService::LookupHost
[netwerk/base/src/nsSocketTransportService2.cpp, line 275]
nsSocketTransport::ResolveHost [netwerk/base/src/nsSocketTransport2.cpp, line 747]
nsSocketTransport::OnSocketEvent [netwerk/base/src/nsSocketTransport2.cpp, line
1208]
nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp,
line 548]
nsThread::Main [xpcom/threads/nsThread.cpp, line 134]
_PR_NativeRunThread [pruthr.c, line 455]
msvcrt.dll + 0x27fb8 (0x77c07fb8)
kernel32.dll + 0x1d33b (0x77e5d33b) 
Status: NEW → ASSIGNED
Flags: blocking1.3?
Priority: -- → P1
Target Milestone: --- → mozilla1.3beta
Yes, this looks good to get. 
Flags: blocking1.3? → blocking1.3+
Keywords: topembedtopembed+
Target Milestone: mozilla1.3beta → mozilla1.3final
Darin, any hope for 1.3?  We're supposed to be done with it (or at least have it
on a branch) by now.

/be
brendan: nope, this bug is a real mystery to me.  i don't understand how the
PLDHashTable could not be initialized.  any thoughts?
Updating summary for tracking.  Here are a few sets of crashes from Talkback data:

Count   Offset    Real Signature
[ 46   0x00000000 59282756 - PL_DHashTableOperate ]
 
     Crash date range: 2003-02-08 to 2003-02-16
     Min/Max Seconds since last crash: 34 - 296449
     Min/Max Runtime: 34 - 449139
 
     Count   Platform List 
     44   Windows NT 5.0 build 2195
     2   Windows NT 4.0 build 1381
 
     Count   Build Id List 
     40   2003021008
     3   2003021108
     1   2003020904
     1   2003020808
     1   2003020708
 
     No of Unique Users        19
 
 Stack trace(Frame) 

	 0x00000000  
	 PL_DHashTableOperate	[c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c  line 480] 
	 nsSocketTransportService::RememberHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 357]  
 
     (17253177)	URL: www.gbcentrum.com
     (17250720)	Comments: The system was idle  and the LAN was not connected to
the Internet at the time the failure occurred.  It was probably the mail client
trying to reach the mail server.
     (17216963)	URL: http://www.cnn.com/WORLD
     (17206814)	URL: yahoo.de
     (17164310)	Comments: Opened one to many windows? It was working so well
then BLAM! when I clicked a link.
     (17134954)	URL: ftp://priede.bf.lu.lv
     (17134954)	Comments: Get files with Leech plugin. 
     (17132263)	URL: ftp://priede.bf.lu.lv
     (17132263)	Comments: Trying to download files with leech plugin from
ftp://pride.bf.lu.lv
     (17088991)	Comments: moz running over night.  hello. it's a new day - it's
a new bug. :)
 
====================================================================================================
     Count   Offset    Real Signature
[ 17   0x00000000 6094ca9a - PL_DHashTableOperate ]
[ 7   0x00000000 12f0253a - PL_DHashTableOperate ]
 
     Crash date range: 2003-02-08 to 2003-02-16
     Min/Max Seconds since last crash: 20 - 232263
     Min/Max Runtime: 68 - 232263
 
     Count   Platform List 
     24   Windows NT 5.1 build 2600
 
     Count   Build Id List 
     9   2003021104
     5   2003020908
     4   2003021408
     3   2003021008
     1   2003021508
     1   2003021504
     1   2003020808
 
     No of Unique Users         5
 
 Stack trace(Frame) 

	 0x00000000  
	 PL_DHashTableOperate	[c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c  line 480] 
	 nsSocketTransportService::LookupHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 334] 
	 nsSocketTransport::ResolveHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 747] 
	 nsSocketTransport::OnSocketEvent
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 1211] 
	 nsSocketTransportService::ServiceEventQ
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 277] 
	 nsSocketTransportService::Run
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 613] 
	 nsThread::Main	[c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp  line 134] 
	 _PR_NativeRunThread	[pruthr.c  line 455] 
	 msvcrt.dll + 0x27fb8 (0x77c07fb8)  
	 kernel32.dll + 0x1d33b (0x77e5d33b)   
 
     (17253235)	URL: naral.org
     (17253235)	Comments: Trying to read email from Naral.org
     (17131755)	URL: zapo.net
     (17131755)	Comments: Logging into web based email..... the 1.3b release
seems crash prone
 
====================================================================================================
     Count   Offset    Real Signature
[ 16   0x00000000 0c76354a - PL_DHashTableOperate ]
[ 12   0x00000000 7e12daea - PL_DHashTableOperate ]
 
     Crash date range: 2003-02-09 to 2003-02-16
     Min/Max Seconds since last crash: 102 - 137451
     Min/Max Runtime: 716 - 224712
 
     Count   Platform List 
     27   Windows NT 5.1 build 2600
     1   Windows NT 5.0 build 2195
 
     Count   Build Id List 
     13   2003021008
     7   2003021104
     3   2003021408
     2   2003021508
     2   2003020908
     1   2003021504
 
     No of Unique Users        10
 
 Stack trace(Frame) 

	 0x00000000  
	 PL_DHashTableOperate	[c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c  line 480] 
	 nsSocketTransportService::RememberHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 353] 
	 nsSocketTransport::OnSocketConnected
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 1117] 
	 nsSocketTransport::OnSocketReady
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 1269] 
	 nsSocketTransportService::Run
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 584] 
	 nsThread::Main	[c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp  line 134] 
	 _PR_NativeRunThread	[pruthr.c  line 455] 
	 msvcrt.dll + 0x27fb8 (0x77c37fb8)  
	 kernel32.dll + 0x1d33b (0x77e7d33b)   
 
     (17253138)	URL: naral.org
     (17253138)	Comments: Trying to access their Donate To Naral link
     (17232094)	Comments: Trying to access open a bookmark in a new tab
     (17207040)	Comments: another groupmark crash see bug 192744
     (17166422)	Comments: opening tabs in the background
     (17146778)	Comments: turning off junk mail controls didn't help . . .
     (17144443)	Comments: Another crash while automatically checking email.
Might try disabling the junk controls  see if that has any effect.
     (17143454)	Comments: Nothing. I think it was automatically checking for
emails at the time.
     (17128908)	Comments: opening groupmark
     (17106677)	Comments: opening groupmark
     (17098969)	Comments: opening group of tabs
     (17098646)	URL: http://www.acme.com/heartmaker/hearts.cgi
 
====================================================================================================
     Count   Offset    Real Signature
[ 8   0x00000000 09ce90c5 - PL_DHashTableOperate ]
 
     Crash date range: 2003-02-11 to 2003-02-16
     Min/Max Seconds since last crash: 43 - 209555
     Min/Max Runtime: 171 - 351058
 
     Count   Platform List 
     8   Windows NT 5.0 build 2195
 
     Count   Build Id List 
     7   2003021008
     1   2003021404
 
     No of Unique Users         7
 
 Stack trace(Frame) 

	 0x00000000  
	 PL_DHashTableOperate	[c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c  line 480] 
	 nsSocketTransportService::LookupHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp 
line 338] 
	 nsSocketTransport::ResolveHost
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 747] 
	 nsSocketTransport::OnSocketEvent
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp  line 1208]  
 
     (17221006)	Comments: Had just opened browser - one window open to my mail
web gateway.  Opened Freenet gateway - clicked on a link there - started to load
- program crashed.  Fix it!  ;-)  
     (17179078)	URL: pop-up window at www.lacasadelcdvirgen.com.ar
 
Added qawanted to see if we can get this reproduced.
Keywords: qawanted
Summary: topcrash @nsSocketTransportService::RememberHost → M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)
*** Bug 192744 has been marked as a duplicate of this bug. ***
I couldn't duplicate this bug based on "user comments" from talkback report. I
tried naral.org, zapo.net, opening groupmarks, marking mail msgs as junk etc. 
:(
suresh: right, me either.  there's something really screwy going on here.
For the record, I've been trying to reproduce this as well...with no luck.  I'll
keep an eye out for more user comments/urls to try.
Hello,

I was the reporter for bug 192744 which was marked as a duplicate of this bug. I
could reproduce this bug :

opened groupmarks with usual account -> crash

Cleaned profile (deleted *.rdf files, xul.mfl, chrome folder, Cache folder,
*.dat files), repeated operation : another crash

Created new profile, imported bookmarks, clicked on the same groupmark -> no crash

added user.js customizations from old profile into new profile -> no crash

directly copied prefs.js from old profile to new profile -> CRASH

The bug seems to be somewhere into the preferences set in my prefs file.

TB17278040Q
If a programmer wants my prefs.js I can send it to him by email but I'd rather
not have it attached to a bugzilla file report since it countains information
about my email accounts and some of them do not receive spam yet :-)
Ok, it looks like i spoke too soon. Now none of my profiles crashes with this
groupmarks...

This groupmark had the following sites :
http://www.osnews.com/
http://solutions.journaldunet.com/
http://www.journaldunet.com/
http://fr.news.yahoo.com/101/
http://www.mozillazine.org/
http://news.zdnet.fr/
http://www.webstandards.org/
http://fr.news.yahoo.com/32/

Since most of these sites are news sites, perhaps the problem is not the site
but a certain type of banner add that would make mozilla crash when it is
displayed...














spoke with brendan about this.  no chance that pldhash is doing something wrong
here.  in fact, it never nulls out the |ops| member var.  chances are something
is corrupting memory.  after inspecting the socket transport service a bit, it
appears that some of the array bounds are not completely protected.  i'm going
to put together a safety patch to eliminate any possibility of memory corruption
when too many sockets are in use.
Attached patch patch: stronger bounds checking (obsolete) — Splinter Review
Comment on attachment 114861 [details] [diff] [review]
patch: stronger bounds checking

summary of changes:

1- AttachSocket: utilize AddToIdleList so that we are only incrementing
mIdleCount in one place.

2- add runtime checks to enforce array bounds.

3- make MoveToPollList and MoveToIdleList handle errors.

4- eliminate unnecessary mServicingEventQ flag.
Attachment #114861 - Flags: superreview?(bzbarsky)
Attachment #114861 - Flags: review?(brendan)
also, it's likely that socket transport leaks (e.g., bug 191835) could explain
the ABR problem.
Comment on attachment 114861 [details] [diff] [review]
patch: stronger bounds checking

>-    // find out what list this is on...
>-    PRInt32 index = sock - mActiveList;
>-    if (index > 0 && index <= NS_SOCKET_MAX_COUNT)
>+    // find out what list this is on.
>+    PRUint32 index = sock - mActiveList;
>+    if (index <= NS_SOCKET_MAX_COUNT)
>         RemoveFromPollList(sock);
>     else
>         RemoveFromIdleList(sock);

mActiveList[0] seems to be unused, given the ++mActiveCount below:

> nsresult
> nsSocketTransportService::AddToPollList(SocketContext *sock)
> {
>     LOG(("nsSocketTransportService::AddToPollList [handler=%x]\n", sock->mHandler));
> 
>-    NS_ASSERTION(mActiveCount < NS_SOCKET_MAX_COUNT, "too many active sockets");
>+    if (mActiveCount == NS_SOCKET_MAX_COUNT) {
>+        NS_ERROR("too many active sockets");
>+        return NS_ERROR_UNEXPECTED;
>+    }
> 
>     memcpy(&mActiveList[++mActiveCount], sock, sizeof(SocketContext));

Do you really need mActiveList and mPollList to be NS_SOCKET_MAX_COUNT+1 in
length, while mIdleList is NS_SOCKET_MAX_COUNT?  My suggestion over irc about
PRUint32 based single-ended range checks allows an mActiveList index of 0,
where the old assertions did not.  The for loop from NS_SOCKET_MAX_COUNT down
to 1 would have to change, of course....

/be
brendan:

yeah, ultimately i want to change mActiveList to be based at index 0 instead of
index 1.  that was not a good design decision on my part to base it at index 1.
 i was thinking of making that change post 1.3 final since it seems more risky,
and i don't fear my checks allowing access to mActiveList[0] since at least that
wouldn't be corrupting memory :-/

that said, i can certainly put together a patch to change mActiveList to be
based at index 0 instead.  let me know if that's what you would prefer.  maybe
it won't turn out to be that much more complex.  hmm...
i have a better patch in mind...
Blocks: 193368
Attachment #114861 - Attachment is obsolete: true
Comment on attachment 115108 [details] [diff] [review]
v1.1 patch: revised per brendan's comments

ok, i feel a lot better about this patch.  mActiveList[i] => mPollList[i+1]. 
this actually ended up simplifying a bunch of the array manipulations since
mActiveList is now based at index 0 :)
Attachment #115108 - Flags: review?(brendan)
Comment on attachment 115108 [details] [diff] [review]
v1.1 patch: revised per brendan's comments

Only thought is to use struct assignment and not memcpy, as SocketContext is a
two-word struct.  Gcc may inline equivalently; wonder what MSVC does.

Nit: "take care to only check idle sockets that were idle to begin with ;-)" --
transpose "only" and "check".

Where's the .h file patch?

/be
Attachment #115108 - Flags: review?(brendan) → review+
*** Bug 194402 has been marked as a duplicate of this bug. ***
thx brendan!
Attachment #115108 - Attachment is obsolete: true
Comment on attachment 115167 [details] [diff] [review]
v1.2 patch: yeah, structure assignment does make a lot more sense

carrying forward r=brendan, requesting sr= from bz.
Attachment #115167 - Flags: superreview?(bzbarsky)
Attachment #115167 - Flags: review+
Comment on attachment 115167 [details] [diff] [review]
v1.2 patch: yeah, structure assignment does make a lot more sense

Do you need an assignment operator?  If struct SocketContext is plain old data,
I think you should let the compiler select the best instructions (64-bit if
possible on new architectures).

/be
Attachment #115167 - Flags: review+
brendan: whoa.. good point.  new patch coming up.
Attached patch v1.3 patchSplinter Review
Attachment #115167 - Attachment is obsolete: true
Attachment #115168 - Flags: superreview?(bzbarsky)
Attachment #115168 - Flags: review?(brendan)
Attachment #114861 - Flags: superreview?(bzbarsky)
Attachment #114861 - Flags: review?(brendan)
Attachment #115167 - Flags: superreview?(bzbarsky)
Comment on attachment 115168 [details] [diff] [review]
v1.3 patch

r=me, thanks.

/be
Attachment #115168 - Flags: review?(brendan) → review+
Blocks: 194402
I won't be able to super-review this until at least a week from now....
Attachment #115168 - Flags: superreview?(bzbarsky) → superreview?(dougt)
Comment on attachment 115168 [details] [diff] [review]
v1.3 patch



+    nsresult rv = AddToIdleList(&sock);
+    if (NS_SUCCEEDED(rv))
+	 NS_ADDREF(handler);

Does it make sense to hide the addref within AddToIdleList?

the patch doesn't apply cleanly to the tree.
Attachment #115168 - Flags: superreview?(dougt) → superreview+
doug: AddToIdleList is also called from MoveToIdleList.  in that case the STS
already owns a reference to the handler, and it just needs to move the reference
from the active list to the idle list.  so, i think it makes sense for
AttachSocket to be responsible for the ADDREF.  NOTE: the release happens in
DetachSocket.  thx for the review!
Comment on attachment 115168 [details] [diff] [review]
v1.3 patch

requesting drivers approval for the 1.3 branch.  this fixes a topcrash in the
socket code that is most easily hit when publishing a big document via FTP (see
bug 194402).  reviewed by dougt and brendan.  thx!
Attachment #115168 - Flags: approval1.3?
fixed-on-trunk
fixed1.3
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: fixed1.3
Crash Signature: [@ 0x00000000 - PL_DHashTableOperate]
You need to log in before you can comment on or make changes to this bug.