Closed
Bug 192207
Opened 20 years ago
Closed 20 years ago
M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)
Categories
(Core :: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla1.3final
People
(Reporter: darin.moz, Assigned: darin.moz)
References
Details
(4 keywords, Whiteboard: fixed1.3)
Crash Data
Attachments
(1 file, 3 obsolete files)
18.12 KB, patch | brendan: review+ | dougt: superreview+ | dbaron: approval1.3+
topcrash @nsSocketTransportService::RememberHost

0x00000000
PL_DHashTableOperate [xpcom/ds/pldhash.c line 480]
nsSocketTransportService::RememberHost [netwerk/base/src/nsSocketTransportService2.cpp line 294]
nsSocketTransport::OnSocketConnected [netwerk/base/src/nsSocketTransport2.cpp line 1114]
nsSocketTransport::OnSocketReady [netwerk/base/src/nsSocketTransport2.cpp line 1262]
nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp line 516]
nsThread::Main [xpcom/threads/nsThread.cpp line 134]
_PR_NativeRunThread [pruthr.c line 455]
msvcrt.dll + 0x27fb8 (0x77c07fb8)
kernel32.dll + 0x1d33b (0x77e5d33b)
Assignee | ||
Comment 1•20 years ago
|
||
also, this stack trace:

0x00000000
PL_DHashTableOperate [xpcom/ds/pldhash.c, line 480]
nsSocketTransportService::LookupHost [netwerk/base/src/nsSocketTransportService2.cpp, line 275]
nsSocketTransport::ResolveHost [netwerk/base/src/nsSocketTransport2.cpp, line 747]
nsSocketTransport::OnSocketEvent [netwerk/base/src/nsSocketTransport2.cpp, line 1208]
nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp, line 548]
nsThread::Main [xpcom/threads/nsThread.cpp, line 134]
_PR_NativeRunThread [pruthr.c, line 455]
msvcrt.dll + 0x27fb8 (0x77c07fb8)
kernel32.dll + 0x1d33b (0x77e5d33b)
Status: NEW → ASSIGNED
Flags: blocking1.3?
Priority: -- → P1
Target Milestone: --- → mozilla1.3beta
Updated•20 years ago
|
Assignee | ||
Updated•20 years ago
|
Target Milestone: mozilla1.3beta → mozilla1.3final
Comment 3•20 years ago
|
||
Darin, any hope for 1.3? We're supposed to be done with it (or at least have it on a branch) by now. /be
Assignee | ||
Comment 4•20 years ago
|
||
brendan: nope, this bug is a real mystery to me. i don't understand how the PLDHashTable could not be initialized. any thoughts?
Comment 5•20 years ago
|
||
Updating summary for tracking. Here are a few sets of crashes from Talkback data:

Count Offset Real Signature
[ 46 0x00000000 59282756 - PL_DHashTableOperate ]

Crash date range: 2003-02-08 to 2003-02-16
Min/Max Seconds since last crash: 34 - 296449
Min/Max Runtime: 34 - 449139

Count Platform List
44 Windows NT 5.0 build 2195
2 Windows NT 4.0 build 1381

Count Build Id List
40 2003021008
3 2003021108
1 2003020904
1 2003020808
1 2003020708

No of Unique Users 19

Stack trace(Frame)
0x00000000
PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480]
nsSocketTransportService::RememberHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 357]

(17253177) URL: www.gbcentrum.com
(17250720) Comments: The system was idle and the LAN was not connected to the Internet at the time the failure occurred. It was probably the mail client trying to reach the mail server.
(17216963) URL: http://www.cnn.com/WORLD
(17206814) URL: yahoo.de
(17164310) Comments: Opened one to many windows? It was working so well then BLAM! when I clicked a link.
(17134954) URL: ftp://priede.bf.lu.lv
(17134954) Comments: Get files with Leech plugin.
(17132263) URL: ftp://priede.bf.lu.lv
(17132263) Comments: Trying to download files with leech plugin from ftp://pride.bf.lu.lv
(17088991) Comments: moz running over night. hello. it's a new day - it's a new bug. :)

====================================================================================================

Count Offset Real Signature
[ 17 0x00000000 6094ca9a - PL_DHashTableOperate ]
[ 7 0x00000000 12f0253a - PL_DHashTableOperate ]

Crash date range: 2003-02-08 to 2003-02-16
Min/Max Seconds since last crash: 20 - 232263
Min/Max Runtime: 68 - 232263

Count Platform List
24 Windows NT 5.1 build 2600

Count Build Id List
9 2003021104
5 2003020908
4 2003021408
3 2003021008
1 2003021508
1 2003021504
1 2003020808

No of Unique Users 5

Stack trace(Frame)
0x00000000
PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480]
nsSocketTransportService::LookupHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 334]
nsSocketTransport::ResolveHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 747]
nsSocketTransport::OnSocketEvent [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1211]
nsSocketTransportService::ServiceEventQ [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 277]
nsSocketTransportService::Run [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 613]
nsThread::Main [c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp line 134]
_PR_NativeRunThread [pruthr.c line 455]
msvcrt.dll + 0x27fb8 (0x77c07fb8)
kernel32.dll + 0x1d33b (0x77e5d33b)

(17253235) URL: naral.org
(17253235) Comments: Trying to read email from Naral.org
(17131755) URL: zapo.net
(17131755) Comments: Logging into web based email..... the 1.3b release seems crash prone

====================================================================================================

Count Offset Real Signature
[ 16 0x00000000 0c76354a - PL_DHashTableOperate ]
[ 12 0x00000000 7e12daea - PL_DHashTableOperate ]

Crash date range: 2003-02-09 to 2003-02-16
Min/Max Seconds since last crash: 102 - 137451
Min/Max Runtime: 716 - 224712

Count Platform List
27 Windows NT 5.1 build 2600
1 Windows NT 5.0 build 2195

Count Build Id List
13 2003021008
7 2003021104
3 2003021408
2 2003021508
2 2003020908
1 2003021504

No of Unique Users 10

Stack trace(Frame)
0x00000000
PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480]
nsSocketTransportService::RememberHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 353]
nsSocketTransport::OnSocketConnected [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1117]
nsSocketTransport::OnSocketReady [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1269]
nsSocketTransportService::Run [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 584]
nsThread::Main [c:/builds/seamonkey/mozilla/xpcom/threads/nsThread.cpp line 134]
_PR_NativeRunThread [pruthr.c line 455]
msvcrt.dll + 0x27fb8 (0x77c37fb8)
kernel32.dll + 0x1d33b (0x77e7d33b)

(17253138) URL: naral.org
(17253138) Comments: Trying to access their Donate To Naral link
(17232094) Comments: Trying to access open a bookmark in a new tab
(17207040) Comments: another groupmark crash see bug 192744
(17166422) Comments: opening tabs in the background
(17146778) Comments: turning off junk mail controls didn't help . . .
(17144443) Comments: Another crash while automatically checking email. Might try disabling the junk controls see if that has any effect.
(17143454) Comments: Nothing. I think it was automatically checking for emails at the time.
(17128908) Comments: opening groupmark
(17106677) Comments: opening groupmark
(17098969) Comments: opening group of tabs
(17098646) URL: http://www.acme.com/heartmaker/hearts.cgi

====================================================================================================

Count Offset Real Signature
[ 8 0x00000000 09ce90c5 - PL_DHashTableOperate ]

Crash date range: 2003-02-11 to 2003-02-16
Min/Max Seconds since last crash: 43 - 209555
Min/Max Runtime: 171 - 351058

Count Platform List
8 Windows NT 5.0 build 2195

Count Build Id List
7 2003021008
1 2003021404

No of Unique Users 7

Stack trace(Frame)
0x00000000
PL_DHashTableOperate [c:/builds/seamonkey/mozilla/xpcom/ds/pldhash.c line 480]
nsSocketTransportService::LookupHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransportService2.cpp line 338]
nsSocketTransport::ResolveHost [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 747]
nsSocketTransport::OnSocketEvent [c:/builds/seamonkey/mozilla/netwerk/base/src/nsSocketTransport2.cpp line 1208]

(17221006) Comments: Had just opened browser - one window open to my mail web gateway. Opened Freenet gateway - clicked on a link there - started to load - program crashed. Fix it! ;-)
(17179078) URL: pop-up window at www.lacasadelcdvirgen.com.ar

Added qawanted to see if we can get this reproduced.
Keywords: qawanted
Summary: topcrash @nsSocketTransportService::RememberHost → M130B Trunk crash [@ 0x00000000 - PL_DHashTableOperate] (really in nsSocketTransportService::RememberHost)
Comment 6•20 years ago
|
||
*** Bug 192744 has been marked as a duplicate of this bug. ***
I couldn't duplicate this bug based on "user comments" from talkback report. I tried naral.org, zapo.net, opening groupmarks, marking mail msgs as junk etc. :(
Assignee | ||
Comment 8•20 years ago
|
||
suresh: right, me either. there's something really screwy going on here.
Comment 9•20 years ago
|
||
For the record, I've been trying to reproduce this as well...with no luck. I'll keep an eye out for more user comments/urls to try.
Comment 10•20 years ago
|
||
Hello, I was the reporter for bug 192744, which was marked as a duplicate of this bug. I could reproduce this bug:
opened groupmarks with usual account -> crash
cleaned profile (deleted *.rdf files, xul.mfl, chrome folder, Cache folder, *.dat files), repeated operation -> another crash
created new profile, imported bookmarks, clicked on the same groupmark -> no crash
added user.js customizations from old profile into new profile -> no crash
directly copied prefs.js from old profile to new profile -> CRASH
The bug seems to be somewhere in the preferences set in my prefs file.
TB17278040Q
Comment 11•20 years ago
|
||
If a programmer wants my prefs.js I can send it by email, but I'd rather not have it attached to a bugzilla report since it contains information about my email accounts and some of them do not receive spam yet :-)
Comment 12•20 years ago
|
||
Ok, it looks like I spoke too soon. Now none of my profiles crashes with this groupmark... This groupmark had the following sites:
http://www.osnews.com/
http://solutions.journaldunet.com/
http://www.journaldunet.com/
http://fr.news.yahoo.com/101/
http://www.mozillazine.org/
http://news.zdnet.fr/
http://www.webstandards.org/
http://fr.news.yahoo.com/32/
Since most of these sites are news sites, perhaps the problem is not the site but a certain type of banner ad that makes mozilla crash when it is displayed...
Assignee | ||
Comment 13•20 years ago
|
||
spoke with brendan about this. no chance that pldhash is doing something wrong here. in fact, it never nulls out the |ops| member var. chances are something is corrupting memory. after inspecting the socket transport service a bit, it appears that some of the array bounds are not completely protected. i'm going to put together a safety patch to eliminate any possibility of memory corruption when too many sockets are in use.
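The idea of a safety patch that enforces array bounds at runtime can be illustrated with a minimal sketch. This is not the socket transport service code: `kListCapacity`, `EntrySketch`, and `FixedListSketch` are hypothetical names, and the capacity is arbitrary. The point is only that a full list is reported as an error in every build type, where a debug-only assertion would let a release build write past the end of the array.

```cpp
#include <cstddef>

// Hypothetical capacity; the real limit in the bug is NS_SOCKET_MAX_COUNT.
constexpr std::size_t kListCapacity = 4;

// Hypothetical stand-in for a per-socket record.
struct EntrySketch {
    int fd;
    void* handler;
};

class FixedListSketch {
public:
    // Returns false when the list is full instead of writing out of bounds.
    // Unlike an assertion, this check also runs in release builds.
    bool Add(const EntrySketch& e) {
        if (mCount == kListCapacity)
            return false;
        mEntries[mCount++] = e;
        return true;
    }

    std::size_t Count() const { return mCount; }

private:
    EntrySketch mEntries[kListCapacity] = {};
    std::size_t mCount = 0;
};
```

A caller is expected to check the return value and propagate the failure, which is what "make MoveToPollList and MoveToIdleList handle errors" amounts to in the patch summary later in this thread.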
Assignee | ||
Comment 14•20 years ago
|
||
Assignee | ||
Comment 15•20 years ago
|
||
Comment on attachment 114861 [details] [diff] [review] patch: stronger bounds checking
summary of changes:
1- AttachSocket: utilize AddToIdleList so that we are only incrementing mIdleCount in one place.
2- add runtime checks to enforce array bounds.
3- make MoveToPollList and MoveToIdleList handle errors.
4- eliminate unnecessary mServicingEventQ flag.
Attachment #114861 -
Flags: superreview?(bzbarsky)
Attachment #114861 -
Flags: review?(brendan)
Assignee | ||
Comment 16•20 years ago
|
||
also, it's likely that socket transport leaks (e.g., bug 191835) could explain the ABR problem.
Comment 17•20 years ago
|
||
Comment on attachment 114861 [details] [diff] [review] patch: stronger bounds checking

>-    // find out what list this is on...
>-    PRInt32 index = sock - mActiveList;
>-    if (index > 0 && index <= NS_SOCKET_MAX_COUNT)
>+    // find out what list this is on.
>+    PRUint32 index = sock - mActiveList;
>+    if (index <= NS_SOCKET_MAX_COUNT)
>         RemoveFromPollList(sock);
>     else
>         RemoveFromIdleList(sock);

mActiveList[0] seems to be unused, given the ++mActiveCount below:

> nsresult
> nsSocketTransportService::AddToPollList(SocketContext *sock)
> {
>     LOG(("nsSocketTransportService::AddToPollList [handler=%x]\n", sock->mHandler));
>
>-    NS_ASSERTION(mActiveCount < NS_SOCKET_MAX_COUNT, "too many active sockets");
>+    if (mActiveCount == NS_SOCKET_MAX_COUNT) {
>+        NS_ERROR("too many active sockets");
>+        return NS_ERROR_UNEXPECTED;
>+    }
>
>     memcpy(&mActiveList[++mActiveCount], sock, sizeof(SocketContext));

Do you really need mActiveList and mPollList to be NS_SOCKET_MAX_COUNT+1 in length, while mIdleList is NS_SOCKET_MAX_COUNT? My suggestion over irc about PRUint32 based single-ended range checks allows an mActiveList index of 0, where the old assertions did not. The for loop from NS_SOCKET_MAX_COUNT down to 1 would have to change, of course.... /be
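The unsigned single-ended range check mentioned above can be shown in isolation. The sketch below is a hedged illustration, not code from the patch: `kSocketMax` and both function names are hypothetical, and it works on a plain integer index, not on a pointer difference. Converting a negative 32-bit value to an unsigned 32-bit value yields a very large number, so one comparison against the upper bound rejects negative indices too. As noted in the comment, it also admits index 0, which the old `index > 0` test excluded.

```cpp
#include <cstdint>

// Hypothetical upper bound standing in for NS_SOCKET_MAX_COUNT.
constexpr std::uint32_t kSocketMax = 50;

// Old-style check from the removed lines: signed, two comparisons,
// and index 0 is rejected.
inline bool InRangeOldStyle(std::int32_t index) {
    return index > 0 && index <= static_cast<std::int32_t>(kSocketMax);
}

// Single-ended check: a negative index wraps to a value far above
// kSocketMax when converted to unsigned, so one comparison is enough.
// Index 0 is accepted.
inline bool InRangeSingleEnded(std::int32_t index) {
    return static_cast<std::uint32_t>(index) <= kSocketMax;
}
```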
Assignee | ||
Comment 18•20 years ago
|
||
brendan: yeah, ultimately i want to change mActiveList to be based at index 0 instead of index 1. that was not a good design decision on my part to base it at index 1. i was thinking of making that change post 1.3 final since it seems more risky, and i don't fear my checks allowing access to mActiveList[0] since at least that wouldn't be corrupting memory :-/ that said, i can certainly put together a patch to change mActiveList to be based at index 0 instead. let me know if that's what you would prefer. maybe it won't turn out to be that much more complex. hmm...
Assignee | ||
Comment 19•20 years ago
|
||
i have a better patch in mind...
Assignee | ||
Comment 20•20 years ago
|
||
Attachment #114861 -
Attachment is obsolete: true
Assignee | ||
Comment 21•20 years ago
|
||
Comment on attachment 115108 [details] [diff] [review] v1.1 patch: revised per brendan's comments ok, i feel a lot better about this patch. mActiveList[i] => mPollList[i+1]. this actually ended up simplifying a bunch of the array manipulations since mActiveList is now based at index 0 :)
Attachment #115108 -
Flags: review?(brendan)
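The index mapping mActiveList[i] => mPollList[i+1] can be sketched with two parallel arrays. Everything here is an assumption made for illustration: `kMaxSockets`, `ParallelListsSketch`, and the use of plain `int` entries are hypothetical, and the thread does not say what occupies slot 0 of the poll list, so the sketch simply leaves it untouched.

```cpp
#include <cstddef>

// Hypothetical capacity for illustration only.
constexpr std::size_t kMaxSockets = 4;

struct ParallelListsSketch {
    int active[kMaxSockets] = {};      // based at index 0
    int poll[kMaxSockets + 1] = {};    // entry i of active lives at poll[i + 1]
    std::size_t count = 0;

    // Appends to both arrays, refusing to go past the capacity.
    bool Add(int fd) {
        if (count == kMaxSockets)
            return false;
        active[count] = fd;
        poll[count + 1] = fd;
        ++count;
        return true;
    }

    // Removes entry i by moving the last entry into its place in both arrays.
    bool RemoveAt(std::size_t i) {
        if (i >= count)
            return false;
        --count;
        active[i] = active[count];
        poll[i + 1] = poll[count + 1];
        return true;
    }
};
```

With a 0-based active list, `count` is both the number of entries and the next free index, so the capacity test and the append use the same value.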
Comment 22•20 years ago
|
||
Comment on attachment 115108 [details] [diff] [review] v1.1 patch: revised per brendan's comments Only thought is to use struct assignment and not memcpy, as SocketContext is a two-word struct. Gcc may inline equivalently; wonder what MSVC does. Nit: "take care to only check idle sockets that were idle to begin with ;-)" -- transpose "only" and "check". Where's the .h file patch? /be
Attachment #115108 -
Flags: review?(brendan) → review+
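The struct-assignment suggestion can be compared with memcpy in a small sketch. `ContextSketch` is a hypothetical two-word struct, not the real SocketContext definition, and the `static_assert` uses a standard-library trait that postdates this bug; it is only there to state the assumption that the struct is plain old data. For such a type both functions copy the same bytes, and plain assignment leaves instruction selection to the compiler.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical two-word struct standing in for SocketContext.
struct ContextSketch {
    void* fd;
    void* handler;
};

// Assumption made explicit: the struct is plain old data, so the
// compiler-generated copy is a bitwise copy.
static_assert(std::is_trivially_copyable<ContextSketch>::value,
              "ContextSketch must be trivially copyable");

// Copy through memcpy, as in the earlier revision of the patch.
inline void CopyWithMemcpy(ContextSketch* dst, const ContextSketch* src) {
    std::memcpy(dst, src, sizeof(ContextSketch));
}

// Copy through struct assignment; no user-defined operator= is needed.
inline void CopyWithAssignment(ContextSketch* dst, const ContextSketch* src) {
    *dst = *src;
}
```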
Assignee | ||
Comment 23•20 years ago
|
||
*** Bug 194402 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 25•20 years ago
|
||
Comment on attachment 115167 [details] [diff] [review] v1.2 patch: yeah, structure assignment does make a lot more sense carrying forward r=brendan, requesting sr= from bz.
Attachment #115167 -
Flags: superreview?(bzbarsky)
Attachment #115167 -
Flags: review+
Comment 26•20 years ago
|
||
Comment on attachment 115167 [details] [diff] [review] v1.2 patch: yeah, structure assignment does make a lot more sense Do you need an assignment operator? If struct SocketContext is plain old data, I think you should let the compiler select the best instructions (64-bit if possible on new architectures). /be
Attachment #115167 -
Flags: review+
Assignee | ||
Comment 27•20 years ago
|
||
brendan: whoa.. good point. new patch coming up.
Assignee | ||
Comment 28•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #115167 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #115168 -
Flags: superreview?(bzbarsky)
Attachment #115168 -
Flags: review?(brendan)
Assignee | ||
Updated•20 years ago
|
Attachment #114861 -
Flags: superreview?(bzbarsky)
Attachment #114861 -
Flags: review?(brendan)
Assignee | ||
Updated•20 years ago
|
Attachment #115167 -
Flags: superreview?(bzbarsky)
Comment 29•20 years ago
|
||
Comment on attachment 115168 [details] [diff] [review] v1.3 patch r=me, thanks. /be
Attachment #115168 -
Flags: review?(brendan) → review+
Comment 30•20 years ago
|
||
I won't be able to super-review this until at least a week from now....
Assignee | ||
Updated•20 years ago
|
Attachment #115168 -
Flags: superreview?(bzbarsky) → superreview?(dougt)
Comment 31•20 years ago
|
||
Comment on attachment 115168 [details] [diff] [review] v1.3 patch

+    nsresult rv = AddToIdleList(&sock);
+    if (NS_SUCCEEDED(rv))
+        NS_ADDREF(handler);

Does it make sense to hide the addref within AddToIdleList? the patch doesn't apply cleanly to the tree.
Attachment #115168 -
Flags: superreview?(dougt) → superreview+
Assignee | ||
Comment 32•20 years ago
|
||
doug: AddToIdleList is also called from MoveToIdleList. in that case the STS already owns a reference to the handler, and it just needs to move the reference from the active list to the idle list. so, i think it makes sense for AttachSocket to be responsible for the ADDREF. NOTE: the release happens in DetachSocket. thx for the review!
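The ownership rule described above (attach takes the reference, moving between lists transfers it, detach releases it) can be sketched as follows. This is a hedged illustration with hypothetical names (`HandlerSketch`, `ServiceSketch`) and `std::vector` in place of the fixed arrays; it is not the socket transport service implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical reference-counted handler.
struct HandlerSketch {
    int refCount = 0;
    void AddRef() { ++refCount; }
    void Release() { --refCount; }
};

class ServiceSketch {
public:
    explicit ServiceSketch(std::size_t maxIdle) : mMaxIdle(maxIdle) {}

    // The only place a reference is taken, and only if the add succeeded.
    bool Attach(HandlerSketch* h) {
        if (!AddToIdle(h))
            return false;
        h->AddRef();
        return true;
    }

    // Moving between lists transfers the existing reference unchanged.
    bool MoveToActive(HandlerSketch* h) {
        if (!Remove(mIdle, h))
            return false;
        mActive.push_back(h);
        return true;
    }

    bool MoveToIdle(HandlerSketch* h) {
        auto it = std::find(mActive.begin(), mActive.end(), h);
        if (it == mActive.end() || !AddToIdle(h))
            return false;
        mActive.erase(it);
        return true;
    }

    // The only place the reference is released.
    void Detach(HandlerSketch* h) {
        if (Remove(mIdle, h) || Remove(mActive, h))
            h->Release();
    }

private:
    // Shared by Attach and MoveToIdle, so it must not touch the refcount.
    bool AddToIdle(HandlerSketch* h) {
        if (mIdle.size() >= mMaxIdle)
            return false;
        mIdle.push_back(h);
        return true;
    }

    static bool Remove(std::vector<HandlerSketch*>& list, HandlerSketch* h) {
        auto it = std::find(list.begin(), list.end(), h);
        if (it == list.end())
            return false;
        list.erase(it);
        return true;
    }

    std::size_t mMaxIdle;
    std::vector<HandlerSketch*> mIdle;
    std::vector<HandlerSketch*> mActive;
};
```

If the add-ref lived inside `AddToIdle`, every move back to the idle list would take an extra reference that nothing releases, which is the reason given for keeping it in the attach path.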
Assignee | ||
Comment 33•20 years ago
|
||
Comment on attachment 115168 [details] [diff] [review] v1.3 patch requesting drivers approval for the 1.3 branch. this fixes a topcrash in the socket code that is most easily hit when publishing a big document via FTP (see bug 194402). reviewed by dougt and brendan. thx!
Attachment #115168 -
Flags: approval1.3?
Updated•20 years ago
|
Attachment #115168 -
Flags: approval1.3? → approval1.3+
Assignee | ||
Comment 34•20 years ago
|
||
fixed-on-trunk
Assignee | ||
Comment 35•20 years ago
|
||
fixed1.3
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Whiteboard: fixed1.3
Updated•12 years ago
|
Crash Signature: [@ 0x00000000 - PL_DHashTableOperate]