Closed Bug 192207 Opened 22 years ago Closed 22 years ago

M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.3final

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(4 keywords, Whiteboard: fixed1.3)

Crash Data

Attachments

(1 file, 3 obsolete files)

topcrash @nsSocketTransportService::RememberHost 0x00000000 PL_DHashTableOperate [xpcom/ds/pldhash.c line 480] nsSocketTransportService::RememberHost [netwerk/base/src/nsSocketTransportService2.cpp line 294] nsSocketTransport::OnSocketConnected [netwerk/base/src/nsSocketTransport2.cpp line 1114] nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp line 1262] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp line 516] nsThread::Main [xpcom/threads/nsThread.cpp line 134] _PR_NativeRunThread [pruthr.c line 455] msvcrt.dll + 0x27fb8 (0x77c07fb8) kernel32.dll + 0x1d33b (0x77e5d33b)
also, this stack trace: 0x00000000 PL_DHashTableOperate [xpcom/ds/pldhash.c, line 480] nsSocketTransportService::LookupHost [netwerk/base/src/nsSocketTransportService2.cpp, line 275] nsSocketTransport::ResolveHost [netwerk/base/src/nsSocketTransport2.cpp, line 747] nsSocketTransport::OnSocketEvent [netwerk/base/src/nsSocketTransport2.cpp, line 1208] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp, line 548] nsThread::Main [xpcom/threads/nsThread.cpp, line 134] _PR_NativeRunThread [pruthr.c, line 455] msvcrt.dll + 0x27fb8 (0x77c07fb8) kernel32.dll + 0x1d33b (0x77e5d33b)
Status: NEW → ASSIGNED
Flags: blocking1.3?
Priority: -- → P1
Target Milestone: --- → mozilla1.3beta
Yes, this looks good to get.
Flags: blocking1.3? → blocking1.3+
Keywords: topembedtopembed+
Target Milestone: mozilla1.3beta → mozilla1.3final
Darin, any hope for 1.3? We're supposed to be done with it (or at least have it on a branch) by now. /be
brendan: nope, this bug is a real mystery to me. i don't understand how the PLDHashTable could not be initialized. any thoughts?
Updating summary for tracking. Here are a few sets of crashes from Talkback data: Count Offset Real Signature [ 46 0x00000000 59282756 - PL_DHashTableOperate ] Crash date range: 2003-02-08 to 2003-02-16 Min/Max Seconds since last crash: 34 - 296449 Min/Max Runtime: 34 - 449139 Count Platform List 44 Windows NT 5.0 build 2195 2 Windows NT 4.0 build 1381 Count Build Id List 40 2003021008 3 2003021108 1 2003020904 1 2003020808 1 2003020708 No of Unique Users 19 Stack trace(Frame) 0x00000000 PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480] nsSocketTransportService::RememberHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 357] (17253177) URL: www.gbcentrum.com (17250720) Comments: The system was idle and the LAN was not connected to the Internet at the time the failure occurred. It was probably the mail client trying to reach the mail server. (17216963) URL: http://www.cnn.com/WORLD (17206814) URL: yahoo.de (17164310) Comments: Opened one to many windows? It was working so well then BLAM! when I clicked a link. (17134954) URL: ftp://priede.bf.lu.lv (17134954) Comments: Get files with Leech plugin. (17132263) URL: ftp://priede.bf.lu.lv (17132263) Comments: Trying to download files with leech plugin from ftp://pride.bf.lu.lv (17088991) Comments: moz running over night. hello. it's a new day - it's a new bug. :) ==================================================================================================== Count Offset Real Signature [ 17 0x00000000 6094ca9a - PL_DHashTableOperate ] [ 7 0x00000000 12f0253a - PL_DHashTableOperate ] Crash date range: 2003-02-08 to 2003-02-16 Min/Max Seconds since last crash: 20 - 232263 Min/Max Runtime: 68 - 232263 Count Platform List 24 Windows NT 5.1 build 2600 Count Build Id List 9 2003021104 5 2003020908 4 2003021408 3 2003021008 1 2003021508 1 2003021504 1 2003020808 No of Unique Users 5 Stack trace(Frame) 0x00000000 PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480] nsSocketTransportService::LookupHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 334] nsSocketTransport::ResolveHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 747] nsSocketTransport::OnSocketEvent [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1211] nsSocketTransportService::ServiceEventQ [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 277] nsSocketTransportService::Run [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 613] nsThread::Main [c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp line 134] _PR_NativeRunThread [pruthr.c line 455] msvcrt.dll + 0x27fb8 (0x77c07fb8) kernel32.dll + 0x1d33b (0x77e5d33b) (17253235) URL: naral.org (17253235) Comments: Trying to read email from Naral.org (17131755) URL: zapo.net (17131755) Comments: Logging into web based email..... the 1.3b release seems crash prone ==================================================================================================== Count Offset Real Signature [ 16 0x00000000 0c76354a - PL_DHashTableOperate ] [ 12 0x00000000 7e12daea - PL_DHashTableOperate ] Crash date range: 2003-02-09 to 2003-02-16 Min/Max Seconds since last crash: 102 - 137451 Min/Max Runtime: 716 - 224712 Count Platform List 27 Windows NT 5.1 build 2600 1 Windows NT 5.0 build 2195 Count Build Id List 13 2003021008 7 2003021104 3 2003021408 2 2003021508 2 2003020908 1 2003021504 No of Unique Users 10 Stack trace(Frame) 0x00000000 PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480] nsSocketTransportService::RememberHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 353] nsSocketTransport::OnSocketConnected [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1117] nsSocketTransport::OnSocketReady [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1269] nsSocketTransportService::Run [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 584] nsThread::Main [c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp line 134] _PR_NativeRunThread [pruthr.c line 455] msvcrt.dll + 0x27fb8 (0x77c37fb8) kernel32.dll + 0x1d33b (0x77e7d33b) (17253138) URL: naral.org (17253138) Comments: Trying to access their Donate To Naral link (17232094) Comments: Trying to access open a bookmark in a new tab (17207040) Comments: another groupmark crash see bug 192744 (17166422) Comments: opening tabs in the background (17146778) Comments: turning off junk mail controls didn't help . . . (17144443) Comments: Another crash while automatically checking email. Might try disabling the junk controls see if that has any effect. (17143454) Comments: Nothing. I think it was automatically checking for emails at the time. (17128908) Comments: opening groupmark (17106677) Comments: opening groupmark (17098969) Comments: opening group of tabs (17098646) URL: http://www.acme.com/heartmaker/hearts.cgi ==================================================================================================== Count Offset Real Signature [ 8 0x00000000 09ce90c5 - PL_DHashTableOperate ] Crash date range: 2003-02-11 to 2003-02-16 Min/Max Seconds since last crash: 43 - 209555 Min/Max Runtime: 171 - 351058 Count Platform List 8 Windows NT 5.0 build 2195 Count Build Id List 7 2003021008 1 2003021404 No of Unique Users 7 Stack trace(Frame) 0x00000000 PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480] nsSocketTransportService::LookupHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 338] nsSocketTransport::ResolveHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 747] nsSocketTransport::OnSocketEvent [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1208] (17221006) Comments: Had just opened browser - one window open to my mail web gateway. Opened Freenet gateway - clicked on a link there - started to load - program crashed. Fix it! ;-) (17179078) URL: pop-up window at www.lacasadelcdvirgen.com.ar Added qawanted to see if we can get this reproduced.
Keywords: qawanted
Summary: topcrash @nsSocketTransportService::RememberHost → M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)
*** Bug 192744 has been marked as a duplicate of this bug. ***
I couldn't duplicate this bug based on "user comments" from talkback report. I tried naral.org, zapo.net, opening groupmarks, marking mail msgs as junk etc. :(
suresh: right, me either. there's something really screwy going on here.
For the record, I've been trying to reproduce this as well...with no luck. I'll keep an eye out for more user comments/urls to try.
Hello, I was the reporter for bug 192744 which was marked as a duplicate of this bug. I could reproduce this bug : opened groupmarks with usual account -> crash Cleaned profile (deleted *.rdf files, xul.mfl, chrome folder, Cache folder, *.dat files), repeated operation : another crash Created new profile, imported bookmarks, clicked on the same groupmark -> no crash added user.js customizations from old profile into new profile -> no crash directly copied prefs.js from old profile to new profile -> CRASH The bug seems to be somewhere into the preferences set in my prefs file. TB17278040Q
If a programmer wants my prefs.js I can send it to him by email but I'd rather not have it attached to a bugzilla file report since it countains information about my email accounts and some of them do not receive spam yet :-)
Ok, it looks like i spoke too soon. Now none of my profiles crashes with this groupmarks... This groupmark had the following sites : http://www.osnews.com/ http://solutions.journaldunet.com/ http://www.journaldunet.com/ http://fr.news.yahoo.com/101/ http://www.mozillazine.org/ http://news.zdnet.fr/ http://www.webstandards.org/ http://fr.news.yahoo.com/32/ Since most of these sites are news sites, perhaps the problem is not the site but a certain type of banner add that would make mozilla crash when it is displayed...
spoke with brendan about this. no chance that pldhash is doing something wrong here. in fact, it never nulls out the |ops| member var. chances are something is corrupting memory. after inspecting the socket transport service a bit, it appears that some of the array bounds are not completely protected. i'm going to put together a safety patch to eliminate any possibility of memory corruption when too many sockets are in use.
Attached patch patch: stronger bounds checking (obsolete) — Splinter Review
Comment on attachment 114861 [details] [diff] [review] patch: stronger bounds checking summary of changes: 1- AttachSocket: utilize AddToIdleList so that we are only incrementing mIdleCount in one place. 2- add runtime checks to enforce array bounds. 3- make MoveToPollList and MoveToIdleList handle errors. 4- eliminate unnecessary mServicingEventQ flag.
Attachment #114861 - Flags: superreview?(bzbarsky)
Attachment #114861 - Flags: review?(brendan)
also, it's likely that socket transport leaks (e.g., bug 191835) could explain the ABR problem.
Comment on attachment 114861 [details] [diff] [review] patch: stronger bounds checking >- // find out what list this is on... >- PRInt32 index = sock - mActiveList; >- if (index > 0 && index <= NS_SOCKET_MAX_COUNT) >+ // find out what list this is on. >+ PRUint32 index = sock - mActiveList; >+ if (index <= NS_SOCKET_MAX_COUNT) > RemoveFromPollList(sock); > else > RemoveFromIdleList(sock); mActiveList[0] seems to be unused, given the ++mActiveCount below: > nsresult > nsSocketTransportService::AddToPollList(SocketContext *sock) > { > LOG(("nsSocketTransportService::AddToPollList [handler=%x]\n", sock->mHandler)); > >- NS_ASSERTION(mActiveCount < NS_SOCKET_MAX_COUNT, "too many active sockets"); >+ if (mActiveCount == NS_SOCKET_MAX_COUNT) { >+ NS_ERROR("too many active sockets"); >+ return NS_ERROR_UNEXPECTED; >+ } > > memcpy(&mActiveList[++mActiveCount], sock, sizeof(SocketContext)); Do you really need mActiveList and mPollList to be NS_SOCKET_MAX_COUNT+1 in length, while mIdleList is NS_SOCKET_MAX_COUNT? My suggestion over irc about PRUint32 based single-ended range checks allows an mActiveList index of 0, where the old assertions did not. The for loop from NS_SOCKET_MAX_COUNT down to 1 would have to change, of course.... /be
brendan: yeah, ultimately i want to change mActiveList to be based at index 0 instead of index 1. that was not a good design decision on my part to base it at index 1. i was thinking of making that change post 1.3 final since it seems more risky, and i don't fear my checks allowing access to mActiveList[0] since at least that wouldn't be corrupting memory :-/ that said, i can certainly put together a patch to change mActiveList to be based at index 0 instead. let me know if that's what you would prefer. maybe it won't turn out to be that much more complex. hmm...
i have a better patch in mind...
Blocks: 193368
Attachment #114861 - Attachment is obsolete: true
Comment on attachment 115108 [details] [diff] [review] v1.1 patch: revised per brendan's comments ok, i feel a lot better about this patch. mActiveList[i] => mPollList[i+1]. this actually ended up simplifying a bunch of the array manipulations since mActiveList is now based at index 0 :)
Attachment #115108 - Flags: review?(brendan)
Comment on attachment 115108 [details] [diff] [review] v1.1 patch: revised per brendan's comments Only thought is to use struct assignment and not memcpy, as SocketContext is a two-word struct. Gcc may inline equivalently; wonder what MSVC does. Nit: "take care to only check idle sockets that were idle to begin with ;-)" -- transpose "only" and "check". Where's the .h file patch? /be
Attachment #115108 - Flags: review?(brendan) → review+
*** Bug 194402 has been marked as a duplicate of this bug. ***
thx brendan!
Attachment #115108 - Attachment is obsolete: true
Comment on attachment 115167 [details] [diff] [review] v1.2 patch: yeah, structure assignment does make a lot more sense carrying forward r=brendan, requesting sr= from bz.
Attachment #115167 - Flags: superreview?(bzbarsky)
Attachment #115167 - Flags: review+
Comment on attachment 115167 [details] [diff] [review] v1.2 patch: yeah, structure assignment does make a lot more sense Do you need an assignment operator? If struct SocketContext is plain old data, I think you should let the compiler select the best instructions (64-bit if possible on new architectures). /be
Attachment #115167 - Flags: review+
brendan: whoa.. good point. new patch coming up.
Attached patch v1.3 patchSplinter Review
Attachment #115167 - Attachment is obsolete: true
Attachment #115168 - Flags: superreview?(bzbarsky)
Attachment #115168 - Flags: review?(brendan)
Attachment #114861 - Flags: superreview?(bzbarsky)
Attachment #114861 - Flags: review?(brendan)
Attachment #115167 - Flags: superreview?(bzbarsky)
Comment on attachment 115168 [details] [diff] [review] v1.3 patch r=me, thanks. /be
Attachment #115168 - Flags: review?(brendan) → review+
Blocks: 194402
I won't be able to super-review this until at least a week from now....
Attachment #115168 - Flags: superreview?(bzbarsky) → superreview?(dougt)
Comment on attachment 115168 [details] [diff] [review] v1.3 patch + nsresult rv = AddToIdleList(&sock); + if (NS_SUCCEEDED(rv)) + NS_ADDREF(handler); Does it make sense to hide the addref within AddToIdleList? the patch doesn't apply cleanly to the tree.
Attachment #115168 - Flags: superreview?(dougt) → superreview+
doug: AddToIdleList is also called from MoveToIdleList. in that case the STS already owns a reference to the handler, and it just needs to move the reference from the active list to the idle list. so, i think it makes sense for AttachSocket to be responsible for the ADDREF. NOTE: the release happens in DetachSocket. thx for the review!
Comment on attachment 115168 [details] [diff] [review] v1.3 patch requesting drivers approval for the 1.3 branch. this fixes a topcrash in the socket code that is most easily hit when publishing a big document via FTP (see bug 194402). reviewed by dougt and brendan. thx!
Attachment #115168 - Flags: approval1.3?
fixed-on-trunk
fixed1.3
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: fixed1.3
Crash Signature: [@ 0x00000000 - PL_DHashTableOperate]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: