Closed Bug 73491 Opened 23 years ago Closed 23 years ago

Solaris (Linux?) optimized gtkEmbed core dumps on startup

Categories

(Core :: Preferences: Backend, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: mcafee, Assigned: bnesse)

Details

(Keywords: regression, smoketest)

Attachments

(1 file)

Solaris optimized gtkEmbed core dumps on startup.
Linux opt and debug builds work, and Solaris debug works;
only the Solaris optimized build crashes.  This also matches
the speedracer tinderbox build, which has been orange since
last week for this reason.
My linux opt builds crash when running gtkEmbed.

(gdb) bt
#0  0x2ab4ea8e in PL_DHashTableEnumerate ()
   from /usr/cls/moz/main/obj-deps/dist/bin/./libxpcom.so
#1  0x2bfb39cf in nsDiskCacheEntryHashTable::VisitEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#2  0x2bfb1d17 in nsDiskCacheDevice::updateDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#3  0x2bfb2698 in nsDiskCacheDevice::scanDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#4  0x2bfb2e44 in nsDiskCacheDevice::evictDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#5  0x2bfb0d91 in nsDiskCacheDevice::~nsDiskCacheDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#6  0x2bfadc39 in nsCacheService::CreateDiskDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#7  0x2bfae657 in nsCacheService::SearchCacheDevices ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#8  0x2bfae4f3 in nsCacheService::ActivateEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#9  0x2bfae151 in nsCacheService::ProcessRequest ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#10 0x2bfae3d7 in nsCacheService::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#11 0x2bfaf265 in nsCacheSession::AsyncOpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#12 0x2b531f4c in nsHTTPChannel::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#13 0x2b533158 in nsHTTPChannel::Connect ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#14 0x2b530f1a in nsHTTPChannel::AsyncOpen ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#15 0x2bf8ed1d in nsDocumentOpenInfo::Open ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#16 0x2bf91370 in nsURILoader::OpenURIVia ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#17 0x2bf8fec3 in nsURILoader::OpenURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#18 0x2afdc840 in nsDocShell::DoChannelLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#19 0x2afdc1d3 in nsDocShell::DoURILoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#20 0x2afdb061 in nsDocShell::InternalLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#21 0x2afd4030 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#22 0x2afd73b7 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#23 0x2b9b0def in nsWebBrowser::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libwebbrwsr.so
#24 0x804d8da in OpenWebPage ()
#25 0x804ddba in main ()
#26 0x2ae7f9cb in ?? () from /lib/libc.so.6
(gdb) 
cls, I believe chak just checked in a fix for that?
Jud: I checked in the fix for Windows only (for Bug #73225), since cls mentioned 
he's not seeing it on Unix with gtkEmbed.

This looks like a different issue. 

Looking at the Solaris truss output, we load libnkcache.so and
crash soon after.

open("/export2/mcafee/cmonkey/mozilla/dist/bin/components/libnkcache.so", O_RDONLY) = 11
fstat(11, 0xEFFFD8AC)               = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEE390000
mmap(0x00000000, 540672, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEBC00000
munmap(0xEBC6E000, 57344)           = 0
mmap(0xEBC7C000, 32024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 442368) = 0xEBC7C000
close(11)                   = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_EXEC) = 0
munmap(0xEE390000, 8192)            = 0
brk(0x001C02B0)                 = 0
brk(0x001C22B0)                 = 0
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [caught]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
sigprocmask(SIG_SETMASK, 0xEEF663D8, 0x00000000) = 0
sigaction(SIGSEGV, 0xEFFFDAA0, 0x00000000)  = 0
setcontext(0xEFFFDBE8)
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    *** process killed ***
c'ing necko cache folks.
patrick: any thoughts?
I built a few files as debug by hand and got the
stack trace below.  entry is null in this crash; the
ENTRY_IS_LIVE macro dereferences it without checking for null.

[xpcom/ds/pldhash.c:438]
for (i = j = 0; i < capacity; i++) {
        entry = (PLDHashEntryHdr *)entryAddr;
      =>if (ENTRY_IS_LIVE(entry)) {
            op = etor(table, entry, j++, arg);
            if (op & PL_DHASH_REMOVE) {
                METER(table->stats.removeEnums++);
                PL_DHashTableRawRemove(table, entry);
            }
            if (op & PL_DHASH_STOP)
                break;
        }



#0  0xef497310 in PL_DHashTableEnumerate (table=0x1b947c,
    etor=0xebdf64e4 <nsDiskCacheEntryHashTable::VisitEntry(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)>, arg=0xefffdf98) at pldhash.c:438
#1  0xebdf64dc in nsDiskCacheEntryHashTable::VisitEntries (this=0x1b947c,
    visitor=0xefffdf98) at nsDiskCacheEntry.cpp:121
#2  0xebde888c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:984
#3  0xebde9ce4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1b9470,
    result=0xefffe1e0) at nsDiskCacheDevice.cpp:1238
#4  0xebdead88 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:1329
#5  0xebde6b64 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1b9470,
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#6  0xebdd0dfc in nsCacheService::CreateDiskDevice (this=0x1b92d0)
    at nsCacheService.cpp:242
#7  0xebdd1d0c in nsCacheService::SearchCacheDevices (this=0x1b92d0,
    key=0x1b20d0, policy=0) at nsCacheService.cpp:504
#8  0xebdd1974 in nsCacheService::ActivateEntry (this=0x1b92d0,
    request=0x1afc58, result=0xefffe448) at nsCacheService.cpp:431
#9  0xebdd14a4 in nsCacheService::ProcessRequest (this=0x1b92d0,
    request=0x1afc58, result=0x0) at nsCacheService.cpp:340
#10 0xebdd183c in nsCacheService::OpenCacheEntry (this=0x1b92d0,
    session=0x1b6e38,
    key=0x11f9b8 "http://www.mozilla.org/projects/embedding",
    accessRequested=3, listener=0x1b5624, result=0x0) at nsCacheService.cpp:405
#11 0xebddc304 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsID.h:68
#12 0xee1c3634 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#13 0xee1c498c in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#14 0xee1c25a8 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
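
The null dereference described above can be illustrated in miniature. The following is only a sketch under stated assumptions: MiniEntry, MiniTable, and EnumerateLive are hypothetical names, not the pldhash API, and the real fix may belong in the caller rather than in the enumerator. It shows an enumerator that treats a table with no entry storage as empty instead of dereferencing it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a hash table whose entry storage may never
// have been allocated. These names are illustrative, not the pldhash API.
struct MiniEntry {
    unsigned keyHash;   // 0 marks a free slot, anything else is live
    int value;
};

struct MiniTable {
    MiniEntry* entryStore;  // stays nullptr until initialization succeeds
    std::size_t capacity;
};

// Visits every live entry and returns how many were visited.
// A table with a null entryStore is treated as empty rather than
// being dereferenced.
inline std::size_t EnumerateLive(const MiniTable& table,
                                 void (*visit)(const MiniEntry&))
{
    assert(visit != nullptr);
    if (!table.entryStore)
        return 0;

    std::size_t visited = 0;
    for (std::size_t i = 0; i < table.capacity; ++i) {
        const MiniEntry& entry = table.entryStore[i];
        if (entry.keyHash != 0) {
            visit(entry);
            ++visited;
        }
    }
    return visited;
}
```

Whether the check lives in the enumerator or in its callers is a design choice; the sketch only shows that the uninitialized case has to be handled somewhere before the first entry is touched.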
over to gagan.  I have solaris tree ready to go:

  mocha:/export2/mcafee/cmonkey/mozilla
Assignee: kandrot → gagan
Component: Embedding APIs → Networking: Cache
Summary: Solaris optimized gtkEmbed core dumps on startup → Solaris (Linux?) optimized gtkEmbed core dumps on startup
It looks like we're failing to fully initialize the disk cache, and dying when 
its destructor is called.  I'm guessing that something goes wrong in 
installObservers() so the hashtable never gets initialized; then things go 
badly wrong in the destructor when we try to enumerate the uninitialized hashtable.

Patrick can you fix this tonight, or do we need to land your branch tomorrow 
afternoon?
But what if initialization of the hash table itself fails? You need to be more 
careful in your destructor, I'd argue. Try this patch to see if it makes things 
more stable:

Index: mozilla/netwerk/cache/src/nsCacheEntry.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/src/nsCacheEntry.cpp,v
retrieving revision 1.30.2.1
diff -b -u -2 -r1.30.2.1 nsCacheEntry.cpp
--- nsCacheEntry.cpp	2001/03/27 03:05:36	1.30.2.1
+++ nsCacheEntry.cpp	2001/03/27 08:41:18
@@ -409,5 +409,5 @@
 
 nsCacheEntryHashTable::nsCacheEntryHashTable()
-    : initialized(0)
+    : initialized(PR_FALSE)
 {
 }
@@ -416,4 +416,5 @@
 nsCacheEntryHashTable::~nsCacheEntryHashTable()
 {
+    if (initialized)
     PL_DHashTableFinish(&table);
 }
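
The idea behind the patch above, tearing down only what was actually set up, can be sketched in a self-contained form. GuardedTable and its teardownCount hook are hypothetical and exist only for illustration; this is not the Mozilla nsCacheEntryHashTable, and the allocation stands in for whatever the real Init step does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical class: it only illustrates an "initialized" flag
// guarding teardown in the destructor.
class GuardedTable {
public:
    // teardownCount is an illustrative hook so callers can observe whether
    // the destructor actually ran its teardown step.
    explicit GuardedTable(int* teardownCount)
        : mStore(nullptr), mCapacity(0), mInitialized(false),
          mTeardownCount(teardownCount)
    {
        assert(teardownCount != nullptr);
    }

    GuardedTable(const GuardedTable&) = delete;
    GuardedTable& operator=(const GuardedTable&) = delete;

    // Second-phase initialization. Returns false and leaves the object
    // uninitialized if capacity is zero or the allocation fails.
    bool Init(std::size_t capacity) {
        if (mInitialized || capacity == 0)
            return false;
        mStore = static_cast<int*>(std::calloc(capacity, sizeof(int)));
        if (!mStore)
            return false;
        mCapacity = capacity;
        mInitialized = true;
        return true;
    }

    ~GuardedTable() {
        // Tear down only what Init() actually set up.
        if (mInitialized) {
            std::free(mStore);
            ++*mTeardownCount;
        }
    }

    bool IsInitialized() const { return mInitialized; }
    std::size_t Capacity() const { return mCapacity; }

private:
    int* mStore;
    std::size_t mCapacity;
    bool mInitialized;
    int* mTeardownCount;
};
```

An object whose Init() was never called, or failed, is destroyed without running any teardown, which is the property the destructor check in the patch is after.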
beard patch looks good, tried it but solaris/opt still crashes.
The patch is for the wrong hashtable.  It's crashing in the 
nsDiskCacheEntryHashTable.  We need those checks everywhere.
I'll take this.

mcafee, let me know when you're in, and I'll work with you to track this down.
Assignee: gagan → gordon
Whiteboard: [cache]
gordon and I patched the disk cache entry HT, no go.  New stack:

#0  0xee37b81c in nsDiskCacheEntry::getCacheEntry ()
#1  0xee3688c8 in nsDiskCacheDevice::updateDiskCacheEntry (this=0x1bd4f0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:990
#2  0xee370ecc in UpdateEntryVisitor::VisitEntry (this=0xefffdfc0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:976
#3  0xee376564 in nsDiskCacheEntryHashTable::VisitEntry ()
#4  0xef497344 in ?? ()
   from /builds/mcafee/cmonkey/mozilla/dist/bin/libxpcom.so
#5  0xee3764fc in nsDiskCacheEntryHashTable::VisitEntries ()
#6  0xee36889c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:984
#7  0xee369cf4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1bd4f0, 
    result=0xefffe208) at nsDiskCacheDevice.cpp:1238
#8  0xee36ad98 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:1329
#9  0xee366b74 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1bd4f0, 
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#10 0xee350e0c in nsCacheService::CreateDiskDevice (this=0x1bd380)
    at nsCacheService.cpp:242
#11 0xee351d1c in nsCacheService::SearchCacheDevices (this=0x1bd380, 
    key=0x1b6b18, policy=0) at nsCacheService.cpp:504
#12 0xee351984 in nsCacheService::ActivateEntry (this=0x1bd380, 
    request=0x1b3ee0, result=0xefffe470) at nsCacheService.cpp:431
#13 0xee3514b4 in nsCacheService::ProcessRequest (this=0x1bd380, 
    request=0x1b3ee0, result=0x0) at nsCacheService.cpp:340
#14 0xee35184c in nsCacheService::OpenCacheEntry (this=0x1bd380, 
    session=0x1bd440, 
    key=0x1286b8 "http://www.mozilla.org/projects/embedding", 
    accessRequested=3, listener=0x1ba124, result=0x0) at nsCacheService.cpp:405
#15 0xee35c314 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsIObserverService.h:36
Tried gordon's latest patch; now we die in PREF_Init().  The cache is failing
to get the pref service, so the hot potato has been tossed to
another part of the code.  Pref service bug?



#0  0xee341228 in js_LockRuntime ()
#1  0xee31e498 in js_NewContext ()
#2  0xee314d48 in JS_NewContext ()
#3  0xedfd0f8c in PREF_Init ()
#4  0xedfd5904 in nsPref::StartUp ()
#5  0xedfd55f0 in nsPref::GetInstance ()
#6  0xedfdb168 in CreateNewPref ()
#7  0xef5866d8 in nsGenericFactory::CreateInstance ()
#8  0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#9  0xef5a160c in nsComponentManager::CreateInstance ()
#10 0xef5a2370 in nsServiceManagerImpl::GetService ()
#11 0xef5a29a0 in nsServiceManager::GetService ()
#12 0xef5a1a1c in nsGetServiceByCID::operator() ()
#13 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#14 0xedac5020 in nsCOMPtr<nsIPref>::nsCOMPtr ()
#15 0xedab1534 in nsChromeRegistry::nsChromeRegistry ()
#16 0xedaa9fa4 in nsChromeRegistryConstructor ()
#17 0xef5866d8 in nsGenericFactory::CreateInstance ()
#18 0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#19 0xef5a160c in nsComponentManager::CreateInstance ()
#20 0xef5a2370 in nsServiceManagerImpl::GetService ()
#21 0xef5a29a0 in nsServiceManager::GetService ()
#22 0xef5a1a1c in nsGetServiceByCID::operator() ()
#23 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#24 0xedad5d6c in nsCOMPtr<nsIChromeRegistry>::nsCOMPtr ()
#25 0xedacf390 in nsChromeProtocolHandler::NewChannel ()
#26 0xedbc6128 in nsIOService::NewChannelFromURI ()
#27 0xeee2fccc in nsStringBundle::OpenInputStream ()
#28 0xeee2fa9c in nsStringBundle::GetInputStream ()
#29 0xeee2e9d4 in nsStringBundle::InitSyncStream ()
#30 0xeee31194 in nsStringBundleService::getStringBundle ()
#31 0xeee3138c in nsStringBundleService::CreateBundle ()
#32 0x32174 in NS_InitEmbedding ()
#33 0x2ac18 in main ()
we should be handling all these failure cases gracefully, but I think there's
another problem here. does the Embed dir on solaris match what's in the Embed
dir on a linux box?
Neeti, you're listed as module owner for prefs on mozilla.org.  I know that's
out of date, but do you know who is current owner?  This bug no longer pertains
to the cache.
Assignee: gordon → neeti
Component: Networking: Cache → Preferences: Backend
Whiteboard: [cache]
I believe alecf and bnesse are the new owners.
The only thing that I can see which would cause that stack is if there is no 
JSRuntimeService. This should fix that problem:

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c	2001/03/20 14:34:54	3.87
+++ prefapi.c	2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            goto out;
+    }
 
     if (!gMochaContext)
Reassigning to bnesse for now
Assignee: neeti → bnesse
seems reasonable to me... as long as it's been tested with seamonkey,
winEmbed, and gtkEmbed, then sr=alecf
Actually, after looking at it again... I believe this is better...

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c   2001/03/20 14:34:54     3.87
+++ prefapi.c   2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            return PR_FALSE;
+    }
 
     if (!gMochaContext)

The previous patch will return PR_TRUE due to the initialization of 'ok' when it 
is declared.
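
The difference between the two patches can be shown with a small stand-alone sketch. Everything here is hypothetical (Runtime, GetRuntime, InitWithGoto, InitWithEarlyReturn); it assumes only what the comment above states, namely that the status variable starts out true and that the shared exit label returns it unchanged.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime lookup that can fail.
struct Runtime {};

inline Runtime* GetRuntime(bool available) {
    static Runtime runtime;
    return available ? &runtime : nullptr;
}

// Mirrors the first patch: 'ok' starts out true, so jumping to the shared
// exit label reports success even though no runtime was found.
inline bool InitWithGoto(bool runtimeAvailable) {
    bool ok = true;
    Runtime* rt = GetRuntime(runtimeAvailable);
    if (!rt)
        goto out;
    // ... further setup would go here and could set ok = false ...
out:
    return ok;
}

// Mirrors the second patch: a missing runtime is reported as failure
// immediately, before any later code can use the null pointer.
inline bool InitWithEarlyReturn(bool runtimeAvailable) {
    Runtime* rt = GetRuntime(runtimeAvailable);
    if (!rt)
        return false;
    assert(rt != nullptr);  // later setup may rely on a valid runtime
    return true;
}
```

With no runtime available, the goto variant still reports success while the early-return variant reports failure, which is why the second patch is the safer of the two.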
Keywords: patch
The patch to nsDiskCacheDevice was checked in long ago.
Anyone with a winEmbed and/or gtkEmbed, can you please build & test this so that 
the fix could be checked in? Otherwise this will miss the 0.9.1 train...
Cc:ing Conrad and blizzard...
This patch worksforme on linux, debug and optimized.
Since Solaris isn't currently crashing, it will be hard
to verify this is actually fixing the crash.  I think
we should still check this in if we think this is a
good change.  r=mcafee
Target Milestone: --- → mozilla0.9.2
the patch helped my solaris7sparc SunForteC5 build, please get approval 
and check it in [r=timeless].
Keywords: approval
Ok, I've already had r's and sr's on this patch, we just didn't check in because 
there was no verification that it did anything. I have to wipe my current 
prefapi.c (working on other patches in the same file) and reapply this patch. 
Then I will check it in.
Checked in. Closing as fixed as bug 81436 is now tracking the question of "Why 
does libxpconnect fail to load"
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Clean up verification of dated code change bugs
Status: RESOLVED → VERIFIED