Closed Bug 73780 Opened 23 years ago Closed 23 years ago

Segmentation fault in nsCacheService::DoomEntry_Locked [@ nkcache.dll]

Categories

(Core :: Networking: Cache, defect)

x86
All
defect
Not set
critical

RESOLVED FIXED

People

(Reporter: db, Assigned: beard)

Details

(Keywords: crash, topcrash, Whiteboard: [cache] fix checked in on branch, need review)

Attachments

(1 file)

When I open the page http://www.itbutikken.dk/sweden, Mozilla 2001032708 crashes.
When I run with the -g option I can see this in the debugger:

Error loading URL http://www.itbutikken.se/: 804b001e 
[Switching to Thread 24084 (initial thread)]

Program received signal SIGSEGV, Segmentation fault.
0x4121e7b0 in NSGetModule () from /home/dennis/bin/mozilla/components/libnkcache.so


You probably don't need it, but if I do a backtrace I get:

(gdb) backtrace
#0  0x4121e7b0 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#1  0x4121e488 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#2  0x41220a61 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#3  0x4121e2ff in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#4  0x4121e58d in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#5  0x4121c4f6 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#6  0x4121c68b in NSGetModule () from
/home/dennis/bin/mozilla/components/libnkcache.so
#7  0x409000e3 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#8  0x40902ebc in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#9  0x40902a1b in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#10 0x409087b4 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#11 0x409071d9 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#12 0x408c59e2 in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#13 0x408c4dac in NSGetModule () from
/home/dennis/bin/mozilla/components/libnecko.so
#14 0x400c2377 in PL_HandleEvent () from /home/dennis/bin/mozilla/./libxpcom.so
#15 0x400c2296 in PL_ProcessPendingEvents () from
/home/dennis/bin/mozilla/./libxpcom.so
#16 0x400c3179 in nsEventQueueImpl::ProcessPendingEvents () from
/home/dennis/bin/mozilla/./libxpcom.so
#17 0x404ca0f3 in NSGetModule () from
/home/dennis/bin/mozilla/components/libwidget_gtk.so
#18 0x404c9e6d in NSGetModule () from
/home/dennis/bin/mozilla/components/libwidget_gtk.so
#19 0x40686360 in g_io_unix_dispatch () from /usr/lib/libglib-1.2.so.0
#20 0x40687bf6 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#21 0x40688213 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#22 0x406883dc in g_main_run () from /usr/lib/libglib-1.2.so.0
#23 0x405a376c in gtk_main () from /usr/lib/libgtk-1.2.so.0
#24 0x404ca5ec in NSGetModule () from
/home/dennis/bin/mozilla/components/libwidget_gtk.so
#25 0x403a554a in NSGetModule () from
/home/dennis/bin/mozilla/components/libnsappshell.so
#26 0x804dfa5 in JS_PushArguments ()
#27 0x804e805 in JS_PushArguments ()
#28 0x4025bb5c in __libc_start_main (main=0x804e6d8 <JS_PushArguments+12836>,
argc=1, ubp_av=0xbffffa34, init=0x804aff4 <_init>, fini=0x8054394 <_fini>,
rtld_fini=0x4000d634 <_dl_fini>, stack_end=0xbffffa2c) at
../sysdeps/generic/libc-start.c:129


I have had some friends try this page and it seems to work for them. I
don't know what's different on my computer.

But no matter what, it should not segfault even if some module is missing
or there is something that NSGetModule can't do. It should print an error
message and quit, or something like that.
Confirming, moving to Networking: Cache, and adding crash keyword. I'm attaching
a backtrace with symbols. gdb sometimes gives NSGetModule as the symbol if you
run it on a non-debug build of Mozilla.

#0  0x4188d5e2 in nsCacheService::DoomEntry_Locked (this=0x824d290, 
    entry=0x42963ae8) at nsCacheService.cpp:581
#1  0x41891982 in nsDiskCacheDevice::BindEntry (this=0x420661f8, 
    newEntry=0x42995e50) at nsCacheService.h:91
#2  0x4188d3a6 in nsCacheService::EnsureEntryHasDevice (this=0x824d290, 
    entry=0x42995e50) at nsCacheService.cpp:536
#3  0x4188d7f1 in nsCacheService::GetTransportForEntry (this=0x824d290, 
    entry=0x42995e50, mode=2, result=0x42995ec0) at nsCacheService.cpp:627
#4  0x4188a459 in
nsCacheEntryDescriptor::nsTransportWrapper::EnsureTransportWithAccess
(this=0x42995ebc, mode=2) at nsCacheService.h:91
#5  0x4188a879 in nsCacheEntryDescriptor::nsTransportWrapper::OpenOutputStream
    (this=0x42995ebc, offset=0, count=4294967295, flags=0, result=0xbfffe800)
    at nsCacheEntryDescriptor.cpp:529
#6  0x40b3a9e4 in nsHTTPChannel::CacheReceivedResponse (this=0x42995c50, 
    aListener=0x42995da0, aResult=0xbfffe920)
    at ../../../../dist/include/nsCOMPtr.h:648
#7  0x40b405f4 in nsHTTPChannel::ProcessStatusCode (this=0x42995c50)
    at ../../../../dist/include/nsCOMPtr.h:641
#8  0x40b3fce1 in nsHTTPChannel::FinishedResponseHeaders (this=0x42995c50)
    at nsHTTPChannel.cpp:2831
#9  0x40b4ba53 in nsHTTPServerListener::FinishedResponseHeaders (
    this=0x42495cd8) at nsHTTPResponseListener.cpp:1020
#10 0x40b494eb in nsHTTPServerListener::OnDataAvailable (this=0x42495cd8, 
    request=0x424bf978, context=0x42995c50, i_pStream=0x4293e560, 
    i_SourceOffset=0, i_Length=1967) at nsHTTPResponseListener.cpp:418
#11 0x40adbb43 in nsOnDataAvailableEvent::HandleEvent (this=0x41c01820)
    at ../../../dist/include/nsCOMPtr.h:648
#12 0x40ada591 in nsStreamObserverEvent::HandlePLEvent (aEvent=0x41c01820)
    at nsStreamObserverProxy.cpp:78
#13 0x400e54bb in PL_HandleEvent (self=0x41c01820) at plevent.c:588
#14 0x400e531b in PL_ProcessPendingEvents (self=0x80a3f18) at plevent.c:518
#15 0x400e72a6 in nsEventQueueImpl::ProcessPendingEvents (this=0x80a3ef0)
    at nsEventQueue.cpp:361

and then down into gtk.
Assignee: asa → neeti
Severity: major → critical
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking: Cache
Ever confirmed: true
Keywords: crash
QA Contact: doronr → gordon
Summary: Segmentation fault in NSGetModule → Segmentation fault in nsCacheService::DoomEntry_Locked
I can confirm this bug with win2k / build 2001032304

OS from Linux -> All
Talkback: TB28380423Q
OS: Linux → All
--->gordon
Assignee: neeti → gordon
Whiteboard: [cache]
We can't seem to get through to this site. Is it currently up?
Now it's up at least.
Adding the topcrash keyword and [@ nkcache.dll] for tracking. This bug was
mentioned in one of the comments for the nkcache.dll crash reported by Talkback.
The stack trace from Talkback doesn't have function symbols, so it's not very
helpful, but here is the entry:

nkcache.dll + 0x3e83 (0x60793e83) 75c15a00
         line 
        Build: 2001032309 CrashDate: 2001-03-28 UptimeMinutes: 14  Total: 456 
        OS: Windows NT  5.0 build 2195
        URL: http://bugzilla.mozilla.org/show_bug.cgi?id=73780
        Comment: 
         Detailed : http://cyclone/reports/incidenttemplate.cfm?bbid=28380423
         StackTrace: 
http://cyclone/reports/stackcommentemail.cfm?dynamicBBID=28380423

There are a lot of crashes showing up under the nkcache.dll stack signature, but
I'm not sure they are all the same. Here are all the comments from the latest
Talkback report for those crashes:

(28194948) URL: 
http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1223549089
     (28219753) URL: http://abcnews.go.com/
     (28223470) URL: 
http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1223549089
     (28228945) URL: http://abcnews.go.com/
     (28228945) Comments: Never ending Moz crash on close bug.
     (28343470) URL: http://www.weather.com/weather/local/30339
     (28343990) URL: www.epost.de
     (28346384) URL: rain.ra.rockwell.com
     (28347711) Comments: win32 2001-03-27-12-trunk build.  Composing a message.  
Was not typing or anything
     (28351940) URL: www.epost.de
     (28352213) URL: www.epost.de
     (28353132) URL: http://slashdot.org
     (28353132) Comments: reading
     (28364146) URL: imp.pro.proxad.net
     (28364146) Comments: Trying to type some text in ...
     (28380423) URL: http://bugzilla.mozilla.org/show_bug.cgi?id=73780
     (28380664) URL: http://www.weather.com/weather/local/30082
     (28380664) Comments: I had clicked a link in the 10 day forecast 
section. This bug is the same behavior as bugzilla 73657 that I logged 3-27-2001.  
That bug was marked as Linux
     (28391550) URL: http://www.time.com/time/health/article/0
     (28391550) Comments: Crashed loading page.
Keywords: topcrash
Summary: Segmentation fault in nsCacheService::DoomEntry_Locked → Segmentation fault in nsCacheService::DoomEntry_Locked [@ nkcache.dll]
I am unable to connect to this URL with either Mozilla or Netscape 4.7.
That's typical. It's a fairly big Scandinavian shop which used to be up all the
time, but for the last two days it seems to go up and down constantly. Now
it's down again, but it's morning in Scandinavia, so I guess it will be up again
when they get to work :-)

I have quite a lot of problems with Mozilla crashing, but this site is the only
one where it happens every time. Otherwise it's just random crashes after
visiting 20-30 pages or so. Other times it can run for days, with maybe
100-200 pages visited. I started running it in ddd by default to get some info
out, and it almost always looks the same when it crashes: it's always in
NSGetModule. But it seems pointless for me to run it like that, since the
information I get from ddd is useless for you, right?
To reproduce, go to http://geocities.com/the_firey/, scroll to the bottom
of the page, and click on the "Enter Lightning!" image.
I'm sure the two www.epost.de crashes are mine.
I crashed there twice after I reopened Mozilla following another crash.
The problem on Win2k is file locking.
Mozilla locks files (cache, profile, ...). If Mozilla crashes, Win2k does not
clear these file locks. The files stay locked and you can't write to them.

I solved the problem by logging off and back in, and now have no problems with
www.epost.de.

*** Bug 74271 has been marked as a duplicate of this bug. ***
Here's a "full" stack trace:

(gdb) where full 3
#0  0x41f998fd in nsCacheService::DoomEntry_Locked (this=0x8282178, 
    entry=0x87d6fc0)
    at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:763
        this = (nsCacheService *) 0x8282178
        rv = 0
        device = (nsCacheDevice *) 0x1
#1  0x41f9da6c in nsDiskCacheDevice::BindEntry (this=0x8641d58, 
    newEntry=0x895db28)
    at ../../../../mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:803
        this = (nsDiskCacheDevice *) 0x8641d58
        rv = 136847736
        newDiskEntry = (nsDiskCacheEntry *) 0x8741a20
        oldDiskEntry = (nsDiskCacheEntry *) 0x89696d0
        dataSize = 1106952861
#2  0x41f996e4 in nsCacheService::EnsureEntryHasDevice (this=0x8282178, 
    entry=0x895db28)
    at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:718
        this = (nsCacheService *) 0x8282178
        device = (nsCacheDevice *) 0x8641d58
        rv = 136847736
(More stack frames follow...)

As you can see, "device" is corrupt with value 0x1. Thus the crash on line 763:

753 nsresult
754 nsCacheService::DoomEntry_Locked(nsCacheEntry * entry)
755 {
756     if (this == nsnull)  return NS_ERROR_NOT_AVAILABLE;
757     if (entry->IsDoomed())  return NS_OK;
758 
759     nsresult  rv = NS_OK;
760     entry->MarkDoomed();
761 
762     nsCacheDevice * device = entry->CacheDevice();
763     if (device)  device->DoomEntry(entry);
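
The null test on line 763 only rules out a null pointer, so a corrupt value like 0x1 passes it and the call is made through a bogus address. A minimal sketch of that point follows; `Device`, `passesNullCheck`, `pointerFromAddress` and `isPlausiblyAligned` are hypothetical names for illustration, not part of the Mozilla source, and the alignment test assumes the device class holds at least one pointer-sized member (such as a vtable pointer).

```cpp
#include <cstdint>

// Hypothetical stand-in for nsCacheDevice. It is never dereferenced here;
// only the pointer value matters.
struct Device;

// Mirrors the `if (device)` test: true for any non-null pointer value.
bool passesNullCheck(const Device* device) {
    return device != nullptr;
}

// Builds a pointer from a raw address without dereferencing it.
const Device* pointerFromAddress(std::uintptr_t address) {
    return reinterpret_cast<const Device*>(address);
}

// Debug-style sanity check (an assumption, not something the cache code does):
// an object that contains a pointer cannot live at a misaligned address.
bool isPlausiblyAligned(const Device* device) {
    return reinterpret_cast<std::uintptr_t>(device) % alignof(void*) == 0;
}
```

With the value from the trace, `passesNullCheck(pointerFromAddress(0x1))` is true while `isPlausiblyAligned` rejects it, which is why the guard on line 763 cannot protect against this kind of corruption.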
*** Bug 74280 has been marked as a duplicate of this bug. ***
Another data point: gdb seems to be consistently hanging when it tries to print
the value for "lock" in frame #3:

nsCacheService::GetTransportForEntry (this=0x8282818, 
    entry=0x8c05e00, mode=2, result=0x8c05ef8)
    at ../../../../mozilla/netwerk/cache/src/nsCacheService.cpp:827
        this = (nsCacheService *) 0x8282818
        lock = {<nsAutoLockBase> = {mAddr = 0x8282890, mDown = 0x0, 
    mType = eAutoLock
                     ^here gdb hangs while using approx. 100% CPU

I'm using the gdb snapshot from 20010102.

Also, when I "shar cache" manually, this is usually followed by a segfault with
the following stack:

(gdb) where full
#0  pthread_cond_signal (cond=0x814220c) at queue.h:40
        th = 0xbf1ffe78
        cond = (pthread_cond_t *) 0x814220c
#1  0x4031268c in pt_PostNotifies (lock=0x81421b0, unlock=1)
    at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:106
        cv = (PRCondVar *) 0x8142208
        index = 0
        rv = 0
        post = {length = 1, cv = {{cv = 0x8142208, times = 0}, {cv = 0x0, 
      times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, 
      times = 0}, {cv = 0x0, times = 0}}, link = 0x0}
        notified = (_PT_Notified *) 0xbfffe9b8
        prev = (_PT_Notified *) 0x0
#2  0x40312c3b in PR_Unlock (lock=0x81421b0)
    at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:195
        rv = -1073747416
#3  0x4018edb7 in nsAutoLock::~nsAutoLock (this=0xbfffea74, __in_chrg=2)
    at ../../../../dist/include/nsAutoLock.h:140
        this = (nsAutoLock *) 0xbfffea74
        __in_chrg = 2
#4  0x4014613f in nsThreadPool::DispatchRequest (this=0x81437b8, 
    runnable=0x86689f0) at ../../../mozilla/xpcom/threads/nsThread.cpp:513
        rv = 0
        lock = {<nsAutoLockBase> = {mAddr = 0x81421b0, mDown = 0xbfffeb84, 
    mType = eAutoLock
                     ^gdb hangs here (using 100% CPU)

The call stack looks something like:

#0  pthread_cond_signal (cond=0x814220c) at queue.h:40
#1  0x4031268c in pt_PostNotifies (lock=0x81421b0, unlock=1)
    at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:106
#2  0x40312c3b in PR_Unlock (lock=0x81421b0)
    at ../../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:195
#3  0x4018edb7 in nsAutoLock::~nsAutoLock (this=0xbfffea74, __in_chrg=2)
    at ../../../../dist/include/nsAutoLock.h:140
#4  0x4014613f in nsThreadPool::DispatchRequest (this=0x81437b8, 
    runnable=0x8735a30) at ../../../mozilla/xpcom/threads/nsThread.cpp:513
#5  0x40d954f0 in nsFileTransportService::DispatchRequest (this=0x8143718, 
    runnable=0x8735a30)
    at ../../../../mozilla/netwerk/base/src/nsFileTransportService.cpp:171
#6  0x40d91fb5 in nsFileTransport::AsyncRead (this=0x8735a28, 
    aListener=0x87c47cc, aContext=0x0, aTransferOffset=0, 
    aTransferCount=4294967295, aFlags=0, aResult=0x87c4830)
    at ../../../../mozilla/netwerk/base/src/nsFileTransport.cpp:477
#7  0x40e21688 in nsJARChannel::AsyncReadJARElement (this=0x87c47c8)
    at ../../../../../mozilla/netwerk/protocol/jar/src/nsJARChannel.cpp:377
#8  0x40e22186 in nsJARChannel::OnDownloadComplete (this=0x87c47c8, 
    aDownloader=0x87c4700, aClosure=0x0, aStatus=0, aFile=0x87c3d08)
    at ../../../../../mozilla/netwerk/protocol/jar/src/nsJARChannel.cpp:574
#9  0x4016742d in XPTC_InvokeByIndex (that=0x87c47d4, methodIndex=3, 
    paramCount=4, params=0x87c3c80)
    at
../../../../../../../mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:138
#10 0x40149328 in EventHandler (self=0x87c3bf8)
    at ../../../../mozilla/xpcom/proxy/src/nsProxyEvent.cpp:506
#11 0x4014040b in PL_HandleEvent (self=0x87c3bf8)
    at ../../../mozilla/xpcom/threads/plevent.c:588
#12 0x401401b9 in PL_ProcessPendingEvents (self=0x80b47f8)
    at ../../../mozilla/xpcom/threads/plevent.c:518
#13 0x401428f9 in nsEventQueueImpl::ProcessPendingEvents (this=0x80b47d0)
    at ../../../mozilla/xpcom/threads/nsEventQueue.cpp:361
This is reproducible on the Mac, now that I can load the site.
Assignee: gordon → beard
Here's the problem: I'm getting a disk cache entry collision, and the entry that 
I am showing as colliding is somehow already marked doomed. When 
nsDiskCacheDevice::BindEntry() sees the collision, it calls 
nsCacheService::DoomEntry_Locked(), which returns immediately because the entry 
is already marked doomed. However, since the cache service isn't calling 
nsDiskCacheDevice::Doom() immediately after, the disk cache entry is left 
dangling, and we get a crash sometime later.
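
The interaction described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the Mozilla code: `Entry`, `BoundTable`, `deviceDoom`, `serviceDoom` and `collidingEntryStillBound` are hypothetical names, and the device's bookkeeping is reduced to a key-to-pointer map.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the cache entry and the device's
// table of bound entries.
struct Entry {
    std::string key;
    bool doomed;
};
using BoundTable = std::unordered_map<std::string, Entry*>;

// Device-side doom: drop the entry from the bound table.
void deviceDoom(BoundTable& bound, Entry* entry) {
    auto it = bound.find(entry->key);
    if (it != bound.end() && it->second == entry) bound.erase(it);
}

// Service-side doom with the same shape as DoomEntry_Locked: an entry that is
// already marked doomed returns early, so the device is never told about it.
void serviceDoom(BoundTable& bound, Entry* entry) {
    if (entry->doomed) return;
    entry->doomed = true;
    deviceDoom(bound, entry);
}

// Collision handling at bind time. Returns true if the old colliding entry is
// still in the device's table after the service was asked to doom it.
bool collidingEntryStillBound(BoundTable& bound, Entry* newEntry) {
    auto it = bound.find(newEntry->key);
    if (it == bound.end() || it->second == newEntry) return false;
    Entry* old = it->second;
    serviceDoom(bound, old);
    it = bound.find(newEntry->key);
    return it != bound.end() && it->second == old;
}
```

In this model a colliding entry that was doomed earlier stays in the table, while a live one is removed as expected.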
OK, here's the REAL problem:  somehow, bound nsCacheEntry objects are getting 
deleted behind the back of the disk cache device, leaving live, but invalid 
nsDiskCacheEntry objects in the bound entries hash table. I've added some code to 
detect this condition to nsDiskCacheDevice::BindEntry(). If an nsDiskCacheEntry 
is found in mBoundEntries that has a reference count of 1, then the disk cache 
entry is invalid. For some reason, this page generates this condition repeatably. 
Now to discover how this is happening.
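
A rough sketch of that detection idea, for readers following along. It swaps in `std::shared_ptr` and `use_count()` for XPCOM's intrusive `mRefCnt`, and `DiskEntry`, `BoundEntries` and `isOrphaned` are hypothetical names, so treat it as an illustration of the check rather than the code that was added to nsDiskCacheDevice::BindEntry().

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for nsDiskCacheEntry.
struct DiskEntry {
    std::string key;
};

// Stand-in for mBoundEntries: the table itself holds one reference per entry.
using BoundEntries =
    std::unordered_map<std::string, std::shared_ptr<DiskEntry>>;

// True if the table holds the only remaining reference to the entry for `key`,
// i.e. whoever it was bound to has gone away behind the table's back.
bool isOrphaned(const BoundEntries& bound, const std::string& key) {
    auto it = bound.find(key);
    return it != bound.end() && it->second.use_count() == 1;
}
```

Note that `use_count()` is only a reliable signal here if no other thread is taking or releasing references at the same time.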
Whiteboard: [cache] → [cache] fix checked in on branch.
Whiteboard: [cache] fix checked in on branch. → [cache] fix checked in on branch, need review
Is it safe to access mRefCnt directly? Do we have to worry about XPCOM drift?
Okay, the REAL REAL problem (really this time) is that it is legal for cache 
devices to get a BindEntry() call for an entry that is already doomed.  One way 
this can happen is if an http FORCE-WRITE request dooms an existing entry before 
it is bound.  The holder of the descriptor for the existing (now doomed) entry 
has no knowledge that the entry has been doomed, and doesn't really care; it may 
still need to provide data to its client.

The late binding of cache entries to devices was introduced fairly late in the 
design, and neither the disk nor the memory cache device handled the binding of 
doomed entries properly.
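
To make the requirement concrete, here is one possible way a device could tolerate being asked to bind an entry that is already doomed. This is purely an assumption for illustration and is not the change made on DISKCACHE1_BRANCH, which this report does not show; `CacheEntry`, `Device` and `Find` are hypothetical names.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified cache entry.
struct CacheEntry {
    std::string key;
    bool doomed;
    bool bound;
};

// Hypothetical device that accepts doomed entries at bind time.
class Device {
public:
    // A doomed entry is still bound, so its descriptor can keep serving data,
    // but it is not published in the key index, so it can neither collide
    // with nor shadow a live entry that uses the same key.
    void BindEntry(CacheEntry& entry) {
        entry.bound = true;
        if (entry.doomed) return;
        active_[entry.key] = &entry;
    }

    // Looks up the live entry for a key, or nullptr if there is none.
    const CacheEntry* Find(const std::string& key) const {
        auto it = active_.find(key);
        return it == active_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, CacheEntry*> active_;
};
```

The design choice in this sketch is to separate "has storage on the device" from "is reachable by key", which are the two things a late-bound, already-doomed entry pulls apart.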

This has been fixed on the DISKCACHE1_BRANCH, and we hope to land it on the trunk 
in the next day or so.
This has been fixed with the landing of the DISKCACHE1_BRANCH.  Marking FIXED. 
Please verify with a build from 2001/04/04 or later and reopen if it occurs again.

Thanks.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verifying fixed on a recent Linux CVS build.
Crash Signature: [@ nkcache.dll]