Closed
Bug 73780
Opened 23 years ago
Closed 23 years ago
Segmentation fault in nsCacheService::DoomEntry_Locked [@ nkcache.dll]
Categories
(Core :: Networking: Cache, defect)
RESOLVED
FIXED
People
(Reporter: db, Assigned: beard)
Details
(Keywords: crash, topcrash, Whiteboard: [cache] fix checked in on branch, need review)
Crash Data
Attachments
(1 file)
6.02 KB,
patch
When I open the page http://www.itbutikken.dk/sweden, Mozilla 2001032708 crashes. When I run with option -g I can see this in the debugger:

Error loading URL http://www.itbutikken.se/: 804b001e
[Switching to Thread 24084 (initial thread)]
Program received signal SIGSEGV, Segmentation fault.
0x4121e7b0 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so

You probably don't need it, but if I do a backtrace I get:

(gdb) backtrace
#0 0x4121e7b0 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#1 0x4121e488 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#2 0x41220a61 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#3 0x4121e2ff in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#4 0x4121e58d in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#5 0x4121c4f6 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#6 0x4121c68b in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so
#7 0x409000e3 in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#8 0x40902ebc in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#9 0x40902a1b in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#10 0x409087b4 in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#11 0x409071d9 in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#12 0x408c59e2 in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#13 0x408c4dac in NSGetModule () from /home/dennis/bin/mozilla/components/libnecko.so
#14 0x400c2377 in PL_HandleEvent () from /home/dennis/bin/mozilla/./libxpcom.so
#15 0x400c2296 in PL_ProcessPendingEvents () from /home/dennis/bin/mozilla/./libxpcom.so
#16 0x400c3179 in nsEventQueueImpl::ProcessPendingEvents () from /home/dennis/bin/mozilla/./libxpcom.so
#17 0x404ca0f3 in NSGetModule () from /home/dennis/bin/mozilla/components/libwidget_gtk.so
#18 0x404c9e6d in NSGetModule () from /home/dennis/bin/mozilla/components/libwidget_gtk.so
#19 0x40686360 in g_io_unix_dispatch () from /usr/lib/libglib-1.2.so.0
#20 0x40687bf6 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#21 0x40688213 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#22 0x406883dc in g_main_run () from /usr/lib/libglib-1.2.so.0
#23 0x405a376c in gtk_main () from /usr/lib/libgtk-1.2.so.0
#24 0x404ca5ec in NSGetModule () from /home/dennis/bin/mozilla/components/libwidget_gtk.so
#25 0x403a554a in NSGetModule () from /home/dennis/bin/mozilla/components/libnsappshell.so
#26 0x804dfa5 in JS_PushArguments ()
#27 0x804e805 in JS_PushArguments ()
#28 0x4025bb5c in __libc_start_main (main=0x804e6d8 <JS_PushArguments+12836>, argc=1, ubp_av=0xbffffa34, init=0x804aff4 <_init>, fini=0x8054394 <_fini>, rtld_fini=0x4000d634 <_dl_fini>, stack_end=0xbffffa2c) at ../sysdeps/generic/libc-start.c:129

I have had some friends try this page and it seems to be working for them. I don't know what's different on my computer. But no matter what, it should not segfault even if some module is missing or there is something that NSGetModule can't do. It should print an error message and quit, or something like that.
Comment 1•23 years ago
Confirming, moving to Networking: Cache, and adding crash keyword. I'm attaching a backtrace with symbols; gdb sometimes gives NSGetModule as the symbol if you run it on a non-debug build of Mozilla.

#0 0x4188d5e2 in nsCacheService::DoomEntry_Locked (this=0x824d290, entry=0x42963ae8) at nsCacheService.cpp:581
#1 0x41891982 in nsDiskCacheDevice::BindEntry (this=0x420661f8, newEntry=0x42995e50) at nsCacheService.h:91
#2 0x4188d3a6 in nsCacheService::EnsureEntryHasDevice (this=0x824d290, entry=0x42995e50) at nsCacheService.cpp:536
#3 0x4188d7f1 in nsCacheService::GetTransportForEntry (this=0x824d290, entry=0x42995e50, mode=2, result=0x42995ec0) at nsCacheService.cpp:627
#4 0x4188a459 in nsCacheEntryDescriptor::nsTransportWrapper::EnsureTransportWithAccess (this=0x42995ebc, mode=2) at nsCacheService.h:91
#5 0x4188a879 in nsCacheEntryDescriptor::nsTransportWrapper::OpenOutputStream (this=0x42995ebc, offset=0, count=4294967295, flags=0, result=0xbfffe800) at nsCacheEntryDescriptor.cpp:529
#6 0x40b3a9e4 in nsHTTPChannel::CacheReceivedResponse (this=0x42995c50, aListener=0x42995da0, aResult=0xbfffe920) at ../../../../dist/include/nsCOMPtr.h:648
#7 0x40b405f4 in nsHTTPChannel::ProcessStatusCode (this=0x42995c50) at ../../../../dist/include/nsCOMPtr.h:641
#8 0x40b3fce1 in nsHTTPChannel::FinishedResponseHeaders (this=0x42995c50) at nsHTTPChannel.cpp:2831
#9 0x40b4ba53 in nsHTTPServerListener::FinishedResponseHeaders (this=0x42495cd8) at nsHTTPResponseListener.cpp:1020
#10 0x40b494eb in nsHTTPServerListener::OnDataAvailable (this=0x42495cd8, request=0x424bf978, context=0x42995c50, i_pStream=0x4293e560, i_SourceOffset=0, i_Length=1967) at nsHTTPResponseListener.cpp:418
#11 0x40adbb43 in nsOnDataAvailableEvent::HandleEvent (this=0x41c01820) at ../../../dist/include/nsCOMPtr.h:648
#12 0x40ada591 in nsStreamObserverEvent::HandlePLEvent (aEvent=0x41c01820) at nsStreamObserverProxy.cpp:78
#13 0x400e54bb in PL_HandleEvent (self=0x41c01820) at plevent.c:588
#14 0x400e531b in PL_ProcessPendingEvents (self=0x80a3f18) at plevent.c:518
#15 0x400e72a6 in nsEventQueueImpl::ProcessPendingEvents (this=0x80a3ef0) at nsEventQueue.cpp:361

and then down into gtk.
Assignee: asa → neeti
Severity: major → critical
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking: Cache
Ever confirmed: true
Keywords: crash
QA Contact: doronr → gordon
Summary: Segmentation fault in NSGetModule → Segmentation fault in nsCacheService::DoomEntry_Locked
Comment 2•23 years ago
I can confirm this bug on Win2k with build 2001032304. Changing OS from Linux to All. Talkback: TB28380423Q
OS: Linux → All
Assignee | ||
Comment 4•23 years ago
We can't seem to get through to this site. Is it currently up?
Comment 6•23 years ago
adding topcrash keyword and [@ nkcache.dll] for tracking. this bug was in one of the comments for the nkcache.dll crash reported by talkback. the stack trace from talkback doesn't have the function symbols so it's not very helpful, but here is the entry:

nkcache.dll + 0x3e83 (0x60793e83) 75c15a00 line
Build: 2001032309 CrashDate: 2001-03-28 UptimeMinutes: 14 Total: 456
OS: Windows NT 5.0 build 2195
URL: http://bugzilla.mozilla.org/show_bug.cgi?id=73780
Comment:
Detailed : http://cyclone/reports/incidenttemplate.cfm?bbid=28380423
StackTrace: http://cyclone/reports/stackcommentemail.cfm?dynamicBBID=28380423

there are a lot of crashes showing up under the nkcache.dll stack signature, but i'm not sure they are all the same. here are all the comments from the latest talkback report for those crashes:

(28194948) URL: http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1223549089
(28219753) URL: http://abcnews.go.com/
(28223470) URL: http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1223549089
(28228945) URL: http://abcnews.go.com/
(28228945) Comments: Never ending Moz crash on close bug.
(28343470) URL: http://www.weather.com/weather/local/30339
(28343990) URL: www.epost.de
(28346384) URL: rain.ra.rockwell.com
(28347711) Comments: win32 2001-03-27-12-trunk build. Composing a message. Was not typing or anything
(28351940) URL: www.epost.de
(28352213) URL: www.epost.de
(28353132) URL: http://slashdot.org
(28353132) Comments: reading
(28364146) URL: imp.pro.proxad.net
(28364146) Comments: Trying to type some text in ...
(28380423) URL: http://bugzilla.mozilla.org/show_bug.cgi?id=73780
(28380664) URL: http://www.weather.com/weather/local/30082
(28380664) Comments: I had clicked a link in the 10 day forecast section. This bug is the same behavior as bugzilla 73657 that I logged 3-27-2001. That bug was marked as Linux
(28391550) URL: http://www.time.com/time/health/article/0
(28391550) Comments: Crashed loading page.
Keywords: topcrash
Summary: Segmentation fault in nsCacheService::DoomEntry_Locked → Segmentation fault in nsCacheService::DoomEntry_Locked [@ nkcache.dll]
I am unable to connect to this URL with either Mozilla or Netscape 4.7.
That's typical. It's a fairly big Scandinavian shop which used to be up all the time, but over the last two days it has been going up and down constantly. Now it's down again, but it's morning in Scandinavia so I guess it will be up again when they get to work :-) I have quite a lot of problems where Mozilla crashes, but this site is the only one where it happens every time. Otherwise it's just random crashes after having visited 20-30 pages or so. Other times it can run for days with maybe 100-200 pages visited. I started running it in ddd by default to get some info out, and it almost always looks the same when it crashes: it's always in NSGetModule. But it seems like it's useless for me to run it like that, since the information I get from ddd is useless for you, right?
Comment 9•23 years ago
To reproduce, go to http://geocities.com/the_firey/, scroll to the bottom of the page, and click on the "Enter Lightning!" image.
Comment 10•23 years ago
I'm sure the two www.epost.de crashes are mine. I crashed there twice after I reopened Mozilla following another crash. The problem with Win2k is file locking: Mozilla locks files (cache, profile, ...). If Mozilla crashes, Win2k does not clear these file locks, so the files stay locked and you can't write to them. I solved the problem by logging off and back on, and now have no problems with www.epost.de.
Comment 11•23 years ago
*** Bug 74271 has been marked as a duplicate of this bug. ***
Comment 12•23 years ago
Here's a "full" stack trace:

(gdb) where full 3
#0 0x41f998fd in nsCacheService::DoomEntry_Locked (this=0x8282178, entry=0x87d6fc0) at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:763
        this = (nsCacheService *) 0x8282178
        rv = 0
        device = (nsCacheDevice *) 0x1
#1 0x41f9da6c in nsDiskCacheDevice::BindEntry (this=0x8641d58, newEntry=0x895db28) at ../../../../mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:803
        this = (nsDiskCacheDevice *) 0x8641d58
        rv = 136847736
        newDiskEntry = (nsDiskCacheEntry *) 0x8741a20
        oldDiskEntry = (nsDiskCacheEntry *) 0x89696d0
        dataSize = 1106952861
#2 0x41f996e4 in nsCacheService::EnsureEntryHasDevice (this=0x8282178, entry=0x895db28) at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:718
        this = (nsCacheService *) 0x8282178
        device = (nsCacheDevice *) 0x8641d58
        rv = 136847736
(More stack frames follow...)

As you can see, "device" is corrupt with value 0x1. Thus the crash on line 763:

753 nsresult
754 nsCacheService::DoomEntry_Locked(nsCacheEntry * entry)
755 {
756     if (this == nsnull) return NS_ERROR_NOT_AVAILABLE;
757     if (entry->IsDoomed()) return NS_OK;
758
759     nsresult rv = NS_OK;
760     entry->MarkDoomed();
761
762     nsCacheDevice * device = entry->CacheDevice();
763     if (device) device->DoomEntry(entry);
Comment 13•23 years ago
*** Bug 74280 has been marked as a duplicate of this bug. ***
Comment 14•23 years ago
Another data point: gdb seems to be consistently hanging when it tries to print the value for "lock" in frame #3:

nsCacheService::GetTransportForEntry (this=0x8282818, entry=0x8c05e00, mode=2, result=0x8c05ef8) at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:827
        this = (nsCacheService *) 0x8282818
        lock = {<nsAutoLockBase> = {mAddr = 0x8282890, mDown = 0x0, mType = eAutoLock
        ^ here gdb hangs while using approx. 100% CPU

I'm using the gdb snapshot from 20010102.

Also, when I "shar cache" manually, this is usually followed by a segfault with the following stack:

(gdb) where full
#0 pthread_cond_signal (cond=0x814220c) at queue.h:40
        th = 0xbf1ffe78
        cond = (pthread_cond_t *) 0x814220c
#1 0x4031268c in pt_PostNotifies (lock=0x81421b0, unlock=1) at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:106
        cv = (PRCondVar *) 0x8142208
        index = 0
        rv = 0
        post = {length = 1, cv = {{cv = 0x8142208, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}}, link = 0x0}
        notified = (_PT_Notified *) 0xbfffe9b8
        prev = (_PT_Notified *) 0x0
#2 0x40312c3b in PR_Unlock (lock=0x81421b0) at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:195
        rv = -1073747416
#3 0x4018edb7 in nsAutoLock::~nsAutoLock (this=0xbfffea74, __in_chrg=2) at ../../../../dist/include/nsAutoLock.h:140
        this = (nsAutoLock *) 0xbfffea74
        __in_chrg = 2
#4 0x4014613f in nsThreadPool::DispatchRequest (this=0x81437b8, runnable=0x86689f0) at ../../../mozilla/xpcom/threads/nsThread.cpp:513
        rv = 0
        lock = {<nsAutoLockBase> = {mAddr = 0x81421b0, mDown = 0xbfffeb84, mType = eAutoLock
        ^ gdb hangs here (using 100% CPU)

The call stack looks something like:

#0 pthread_cond_signal (cond=0x814220c) at queue.h:40
#1 0x4031268c in pt_PostNotifies (lock=0x81421b0, unlock=1) at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:106
#2 0x40312c3b in PR_Unlock (lock=0x81421b0) at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:195
#3 0x4018edb7 in nsAutoLock::~nsAutoLock (this=0xbfffea74, __in_chrg=2) at ../../../../dist/include/nsAutoLock.h:140
#4 0x4014613f in nsThreadPool::DispatchRequest (this=0x81437b8, runnable=0x8735a30) at ../../../mozilla/xpcom/threads/nsThread.cpp:513
#5 0x40d954f0 in nsFileTransportService::DispatchRequest (this=0x8143718, runnable=0x8735a30) at ../../../../mozilla/netwerk/base/src/nsFileTransportService.cpp:171
#6 0x40d91fb5 in nsFileTransport::AsyncRead (this=0x8735a28, aListener=0x87c47cc, aContext=0x0, aTransferOffset=0, aTransferCount=4294967295, aFlags=0, aResult=0x87c4830) at ../../../../mozilla/netwerk/base/src/nsFileTransport.cpp:477
#7 0x40e21688 in nsJARChannel::AsyncReadJARElement (this=0x87c47c8) at ../../../../../mozilla/netwerk/protocol/jar/src/nsJARChannel.cpp:377
#8 0x40e22186 in nsJARChannel::OnDownloadComplete (this=0x87c47c8, aDownloader=0x87c4700, aClosure=0x0, aStatus=0, aFile=0x87c3d08) at ../../../../../mozilla/netwerk/protocol/jar/src/nsJARChannel.cpp:574
#9 0x4016742d in XPTC_InvokeByIndex (that=0x87c47d4, methodIndex=3, paramCount=4, params=0x87c3c80) at ../../../../../../../mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:138
#10 0x40149328 in EventHandler (self=0x87c3bf8) at ../../../../mozilla/xpcom/proxy/src/nsProxyEvent.cpp:506
#11 0x4014040b in PL_HandleEvent (self=0x87c3bf8) at ../../../mozilla/xpcom/threads/plevent.c:588
#12 0x401401b9 in PL_ProcessPendingEvents (self=0x80b47f8) at ../../../mozilla/xpcom/threads/plevent.c:518
#13 0x401428f9 in nsEventQueueImpl::ProcessPendingEvents (this=0x80b47d0) at ../../../mozilla/xpcom/threads/nsEventQueue.cpp:361
Assignee | ||
Comment 15•23 years ago
This is reproducible on the Mac, now that I can load the site.
Assignee: gordon → beard
Assignee | ||
Comment 16•23 years ago
Here's the problem: I'm getting a disk cache entry collision, and the entry that shows up as colliding is somehow already marked doomed. When nsDiskCacheDevice::BindEntry() sees the collision, it calls nsCacheService::DoomEntry_Locked(), which returns immediately because the entry is already marked doomed. However, since the cache service isn't calling nsDiskCacheDevice::Doom() immediately after, the disk cache entry is left dangling, and we get a crash sometime later.
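To make the sequence described here concrete, the following is a minimal standalone C++ sketch, not the Mozilla code. `Entry`, `Device`, `DoomEntryLocked` and the two scenario functions are hypothetical stand-ins for nsCacheEntry, the disk cache device and nsCacheService::DoomEntry_Locked(); only the early return on an already-doomed entry is taken from the listing in comment 12.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for nsCacheEntry.
struct Entry {
    std::string key;
    bool doomed = false;
};

// Hypothetical stand-in for a cache device with a bound-entries table.
struct Device {
    std::unordered_map<std::string, Entry*> bound;

    void Bind(Entry* e) { bound[e->key] = e; }

    // Drops the device's record of the entry.
    void DoomEntry(Entry* e) {
        auto it = bound.find(e->key);
        if (it != bound.end() && it->second == e) bound.erase(it);
    }
};

// Same shape as the quoted DoomEntry_Locked(): an already-doomed entry
// returns early, so the device is never asked to drop its record.
// Returns true only if the device was actually notified.
bool DoomEntryLocked(Device& device, Entry* e) {
    if (e->doomed) return false;
    e->doomed = true;
    device.DoomEntry(e);
    return true;
}

// Scenario: the entry was marked doomed by some path that did not tell
// the device. A later DoomEntryLocked() call leaves the record in place.
bool StaleBindingSurvivesDoom() {
    Device device;
    Entry old_entry;
    old_entry.key = "http://example.invalid/";
    device.Bind(&old_entry);
    old_entry.doomed = true;
    bool device_was_told = DoomEntryLocked(device, &old_entry);
    return !device_was_told && device.bound.count(old_entry.key) == 1;
}

// Scenario: the normal path, where dooming also unbinds.
bool NormalDoomUnbinds() {
    Device device;
    Entry entry;
    entry.key = "http://example.invalid/";
    device.Bind(&entry);
    bool device_was_told = DoomEntryLocked(device, &entry);
    return device_was_told && device.bound.count(entry.key) == 0;
}
```

In this model, `assert(StaleBindingSurvivesDoom())` holds: the table still points at the doomed entry, which is the kind of dangling record that can be dereferenced later.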
Assignee | ||
Comment 17•23 years ago
OK, here's the REAL problem: somehow, bound nsCacheEntry objects are getting deleted behind the back of the disk cache device, leaving live, but invalid nsDiskCacheEntry objects in the bound entries hash table. I've added some code to detect this condition to nsDiskCacheDevice::BindEntry(). If an nsDiskCacheEntry is found in mBoundEntries that has a reference count of 1, then the disk cache entry is invalid. For some reason, this page generates this condition repeatably. Now to discover how this is happening.
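The reference-count check can be illustrated with a small standalone C++ sketch. This is an analogy only: it uses `std::shared_ptr::use_count()` in place of the XPCOM reference count, and `DiskEntry`, `BoundEntries` and `EvictIfOrphaned` are hypothetical names, not the code that was checked in. The idea is that if the table holds the only remaining reference, nothing outside the device can still be using the entry.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for nsDiskCacheEntry.
struct DiskEntry {
    std::string key;
};

// Hypothetical stand-in for the device's bound-entries hash table.
class BoundEntries {
public:
    void Bind(const std::shared_ptr<DiskEntry>& e) { table_[e->key] = e; }

    // If the table holds the only reference, treat the entry as orphaned
    // and drop it. Returns true if an orphan was evicted.
    bool EvictIfOrphaned(const std::string& key) {
        auto it = table_.find(key);
        if (it == table_.end()) return false;
        if (it->second.use_count() != 1) return false;  // still in use elsewhere
        table_.erase(it);
        return true;
    }

    bool Contains(const std::string& key) const { return table_.count(key) != 0; }

private:
    std::unordered_map<std::string, std::shared_ptr<DiskEntry>> table_;
};

// Scenario: the outside owner goes away, leaving only the table's reference.
bool OrphanIsEvicted() {
    BoundEntries bound;
    {
        auto e = std::make_shared<DiskEntry>();
        e->key = "k";
        bound.Bind(e);
    }  // external reference released here
    return bound.EvictIfOrphaned("k") && !bound.Contains("k");
}

// Scenario: an outside owner still holds the entry, so it is kept.
bool LiveEntryIsKept() {
    BoundEntries bound;
    auto e = std::make_shared<DiskEntry>();
    e->key = "k";
    bound.Bind(e);
    return !bound.EvictIfOrphaned("k") && bound.Contains("k");
}
```

Note that `use_count()` is only a reliable signal here because the sketch is single-threaded; whether reading the count directly is safe in the real code is a separate question.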
Whiteboard: [cache] → [cache] fix checked in on branch.
Assignee | ||
Comment 18•23 years ago
Assignee | ||
Updated•23 years ago
Whiteboard: [cache] fix checked in on branch. → [cache] fix checked in on branch, need review
Comment 19•23 years ago
is it safe to access mRefCnt directly? do we have to worry about XPCOM drift?
Comment 20•23 years ago
Okay, the REAL REAL problem (really this time) is that it is legal for cache devices to get a BindEntry() call for an entry that is already doomed. One way this can happen is if an HTTP FORCE-WRITE request dooms an existing entry before it is bound. The holder of the descriptor for the existing (now doomed) entry has no knowledge that the entry has been doomed, and doesn't really care; it may still need to provide data to its client. The late binding of cache entries to devices was introduced fairly late in the design, and neither the disk nor the memory cache device handled the binding of doomed entries properly. This has been fixed on the DISKCACHE1_BRANCH, and we hope to land it on the trunk in the next day or so.
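One way a device could tolerate a late bind of a doomed entry is sketched below in standalone C++. This is an assumption for illustration, not the DISKCACHE1_BRANCH change, which this report does not show. `Entry`, `Device`, `BindEntry` and `FindActive` are hypothetical names. In the sketch, a doomed entry is still bound so its holder can keep using it, but it is never placed in the lookup table, so it cannot collide with the active entry for the same key.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for nsCacheEntry.
struct Entry {
    std::string key;
    bool doomed = false;
    bool bound = false;
};

// Hypothetical device that accepts doomed entries without indexing them.
class Device {
public:
    void BindEntry(Entry* e) {
        e->bound = true;        // the holder can use the entry either way
        if (e->doomed) return;  // doomed: usable, but never findable by key
        active_[e->key] = e;
    }

    // Returns the active (non-doomed) entry for the key, or nullptr.
    Entry* FindActive(const std::string& key) const {
        auto it = active_.find(key);
        return it == active_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, Entry*> active_;
};

// Scenario: a replacement entry is bound first, then the doomed entry
// for the same key is bound late. The lookup still returns the new one.
bool DoomedEntryBindsWithoutColliding() {
    Device device;
    Entry old_entry;
    old_entry.key = "k";
    old_entry.doomed = true;
    Entry new_entry;
    new_entry.key = "k";
    device.BindEntry(&new_entry);
    device.BindEntry(&old_entry);
    return old_entry.bound && device.FindActive("k") == &new_entry;
}

// Scenario: only a doomed entry is bound, so nothing is findable.
bool DoomedEntryIsNeverFindable() {
    Device device;
    Entry old_entry;
    old_entry.key = "k";
    old_entry.doomed = true;
    device.BindEntry(&old_entry);
    return old_entry.bound && device.FindActive("k") == nullptr;
}
```

The design choice in this sketch is to keep "bound" and "findable by key" as separate properties, so that a doomed entry can serve its existing holder without taking part in collision checks.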
Comment 21•23 years ago
This has been fixed with the landing of the DISKCACHE1_BRANCH. Marking FIXED. Please verify with a build from 2001/04/04 or later and reopen if it occurs again. Thanks.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 22•23 years ago
verifying fixed on recent Linux CVS build.
Updated•13 years ago
Crash Signature: [@ nkcache.dll]