73491 - Solaris (Linux?) optimized gtkEmbed core dumps on startup

Reporter

Description

23 years ago

Solaris optimized gtkEmbed dumps core on startup.
Linux opt and debug builds work, and Solaris debug works;
only the optimized Solaris build crashes. This also matches
the speedracer tinderbox build, which has been orange since
last week for this reason.

cls

Comment 1

23 years ago

My Linux opt builds crash when running gtkEmbed.

(gdb) bt
#0  0x2ab4ea8e in PL_DHashTableEnumerate ()
   from /usr/cls/moz/main/obj-deps/dist/bin/./libxpcom.so
#1  0x2bfb39cf in nsDiskCacheEntryHashTable::VisitEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#2  0x2bfb1d17 in nsDiskCacheDevice::updateDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#3  0x2bfb2698 in nsDiskCacheDevice::scanDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#4  0x2bfb2e44 in nsDiskCacheDevice::evictDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#5  0x2bfb0d91 in nsDiskCacheDevice::~nsDiskCacheDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#6  0x2bfadc39 in nsCacheService::CreateDiskDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#7  0x2bfae657 in nsCacheService::SearchCacheDevices ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#8  0x2bfae4f3 in nsCacheService::ActivateEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#9  0x2bfae151 in nsCacheService::ProcessRequest ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#10 0x2bfae3d7 in nsCacheService::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#11 0x2bfaf265 in nsCacheSession::AsyncOpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#12 0x2b531f4c in nsHTTPChannel::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#13 0x2b533158 in nsHTTPChannel::Connect ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#14 0x2b530f1a in nsHTTPChannel::AsyncOpen ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#15 0x2bf8ed1d in nsDocumentOpenInfo::Open ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#16 0x2bf91370 in nsURILoader::OpenURIVia ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#17 0x2bf8fec3 in nsURILoader::OpenURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#18 0x2afdc840 in nsDocShell::DoChannelLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#19 0x2afdc1d3 in nsDocShell::DoURILoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#20 0x2afdb061 in nsDocShell::InternalLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#21 0x2afd4030 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#22 0x2afd73b7 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#23 0x2b9b0def in nsWebBrowser::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libwebbrwsr.so
#24 0x804d8da in OpenWebPage ()
#25 0x804ddba in main ()
#26 0x2ae7f9cb in ?? () from /lib/libc.so.6
(gdb)

Judson Valeski

Comment 2

23 years ago

cls, I believe chak just checked in a fix for that?

Chak Nanga

Comment 3

23 years ago

Jud: I checked in the fix for Windows only (for bug 73225), since cls mentioned 
he's not seeing it on Unix with gtkEmbed.

This looks like a different issue.

Chris McAfee

Reporter

Comment 4

23 years ago

Looking at the Solaris truss output, we load libnkcache.so and
crash shortly after.

open("/export2/mcafee/cmonkey/mozilla/dist/bin/components/libnkcache.so", O_RDONLY) = 11
fstat(11, 0xEFFFD8AC)               = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEE390000
mmap(0x00000000, 540672, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEBC00000
munmap(0xEBC6E000, 57344)           = 0
mmap(0xEBC7C000, 32024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 442368) = 0xEBC7C000
close(11)                   = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_EXEC) = 0
munmap(0xEE390000, 8192)            = 0
brk(0x001C02B0)                 = 0
brk(0x001C22B0)                 = 0
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [caught]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
sigprocmask(SIG_SETMASK, 0xEEF663D8, 0x00000000) = 0
sigaction(SIGSEGV, 0xEFFFDAA0, 0x00000000)  = 0
setcontext(0xEFFFDBE8)
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    *** process killed ***

Chris McAfee

Reporter

Updated

23 years ago

Keywords: regression, smoketest

Judson Valeski

Comment 5

23 years ago

cc'ing necko cache folks.

Darin Fisher

Comment 6

23 years ago

patrick: any thoughts?

Chris McAfee

Reporter

Comment 7

23 years ago

I built a few files debug by hand and got the
stack trace below. entry is null for this crash, and
the ENTRY_IS_LIVE macro dereferences it without checking for null.

[xpcom/ds/pldhash.c:438]
    for (i = j = 0; i < capacity; i++) {
        entry = (PLDHashEntryHdr *)entryAddr;
        if (ENTRY_IS_LIVE(entry)) {    /* <= line 438: crashes here */
            op = etor(table, entry, j++, arg);
            if (op & PL_DHASH_REMOVE) {
                METER(table->stats.removeEnums++);
                PL_DHashTableRawRemove(table, entry);
            }
            if (op & PL_DHASH_STOP)
                break;
        }



#0  0xef497310 in PL_DHashTableEnumerate (table=0x1b947c,
    etor=0xebdf64e4 <nsDiskCacheEntryHashTable::VisitEntry(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)>, arg=0xefffdf98) at pldhash.c:438
#1  0xebdf64dc in nsDiskCacheEntryHashTable::VisitEntries (this=0x1b947c,
    visitor=0xefffdf98) at nsDiskCacheEntry.cpp:121
#2  0xebde888c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:984
#3  0xebde9ce4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1b9470,
    result=0xefffe1e0) at nsDiskCacheDevice.cpp:1238
#4  0xebdead88 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:1329
#5  0xebde6b64 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1b9470,
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#6  0xebdd0dfc in nsCacheService::CreateDiskDevice (this=0x1b92d0)
    at nsCacheService.cpp:242
#7  0xebdd1d0c in nsCacheService::SearchCacheDevices (this=0x1b92d0,
    key=0x1b20d0, policy=0) at nsCacheService.cpp:504
#8  0xebdd1974 in nsCacheService::ActivateEntry (this=0x1b92d0,
    request=0x1afc58, result=0xefffe448) at nsCacheService.cpp:431
#9  0xebdd14a4 in nsCacheService::ProcessRequest (this=0x1b92d0,
    request=0x1afc58, result=0x0) at nsCacheService.cpp:340
#10 0xebdd183c in nsCacheService::OpenCacheEntry (this=0x1b92d0,
    session=0x1b6e38,
    key=0x11f9b8 "http://www.mozilla.org/projects/embedding",
    accessRequested=3, listener=0x1b5624, result=0x0) at nsCacheService.cpp:405
#11 0xebddc304 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsID.h:68
#12 0xee1c3634 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#13 0xee1c498c in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#14 0xee1c25a8 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
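To make the "entry is null" finding concrete, here is a minimal C sketch of the enumeration loop quoted above with one extra check: it refuses to walk a table whose entry storage is NULL, which is the state that would make the very first computed entry a null pointer. ToyTable, ToyEntryHdr, TOY_ENTRY_IS_LIVE, toy_enumerate and count_live are hypothetical simplifications, not the real pldhash types or API, and the free/removed/live encoding of keyHash is assumed for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for PLDHashEntryHdr / PLDHashTable.
   Assumed encoding: keyHash 0 = free, 1 = removed, >= 2 = live. */
typedef struct {
    uint32_t keyHash;
} ToyEntryHdr;

#define TOY_ENTRY_IS_LIVE(e) ((e)->keyHash >= 2)

typedef struct {
    char    *entryStore;   /* NULL until the table has been initialized */
    uint32_t entrySize;
    uint32_t capacity;
} ToyTable;

/* Visitor callback; a nonzero return stops the walk (simplified). */
typedef int (*ToyEnumerator)(ToyTable *table, ToyEntryHdr *entry,
                             uint32_t number, void *arg);

/* Walk live entries; bail out early if there is no entry storage,
   instead of letting the first entry pointer be NULL. */
uint32_t toy_enumerate(ToyTable *table, ToyEnumerator etor, void *arg)
{
    uint32_t i, j = 0;
    char *entryAddr;

    if (table == NULL || table->entryStore == NULL)
        return 0;  /* uninitialized or already finished: nothing to visit */

    entryAddr = table->entryStore;
    for (i = 0; i < table->capacity; i++) {
        ToyEntryHdr *entry = (ToyEntryHdr *)entryAddr;
        if (TOY_ENTRY_IS_LIVE(entry)) {
            if (etor(table, entry, j++, arg) != 0)
                break;
        }
        entryAddr += table->entrySize;
    }
    return j;  /* number of live entries visited */
}

/* Example visitor: increments the uint32_t counter passed in arg. */
static int count_live(ToyTable *table, ToyEntryHdr *entry,
                      uint32_t number, void *arg)
{
    (void)table; (void)entry; (void)number;
    (*(uint32_t *)arg)++;
    return 0;
}
```

A table with a nonzero capacity but NULL storage (roughly what an uninitialized header looks like) now yields zero visits rather than a SIGSEGV at address 0.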

Chris McAfee

Reporter

Comment 8

23 years ago

over to gagan.  I have solaris tree ready to go:

  mocha:/export2/mcafee/cmonkey/mozilla

Assignee: kandrot → gagan

Component: Embedding APIs → Networking: Cache

Chris McAfee

Reporter

Updated

23 years ago

Summary: Solaris optimized gtkEmbed core dumps on startup → Solaris (Linux?) optimized gtkEmbed core dumps on startup

gordon

Comment 9

23 years ago

It looks like we're failing to fully initialize the disk cache and dying when 
its destructor is called. I'm guessing that something goes wrong in 
installObservers(), so the hashtable never gets initialized; then things go badly 
wrong in the destructor when we try to enumerate the uninitialized hashtable.

Patrick, can you fix this tonight, or do we need to land your branch tomorrow 
afternoon?
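A minimal C sketch of the failure mode described above, assuming a device whose Init() installs observers first and sets up its hashtable second: if the first step fails, teardown has to skip the enumeration/eviction work because the table was never set up. ToyDiskDevice and all toy_* functions are hypothetical; toy_create_device only loosely mirrors the CreateDiskDevice path that ends up in the destructor in the stack traces.

```c
#include <stdlib.h>

/* Hypothetical device model; not the real nsDiskCacheDevice members. */
typedef struct {
    int observersInstalled;
    int tableInitialized;
    int evictions;            /* counts teardown work actually performed */
} ToyDiskDevice;

/* Stand-in for installObservers(): fails on request. */
static int toy_install_observers(int shouldFail)
{
    return shouldFail ? -1 : 0;
}

/* Two-step init; the hashtable step never runs if observers fail. */
int toy_device_init(ToyDiskDevice *dev, int observersFail)
{
    if (toy_install_observers(observersFail) != 0)
        return -1;
    dev->observersInstalled = 1;
    dev->tableInitialized = 1;   /* stand-in for the hashtable Init() */
    return 0;
}

/* Teardown undoes only what init actually completed. */
void toy_device_destroy(ToyDiskDevice *dev)
{
    if (dev->tableInitialized) {
        dev->evictions++;        /* safe to enumerate/evict entries here */
        dev->tableInitialized = 0;
    }
    dev->observersInstalled = 0;
}

/* On failed init, tear down the partial device and report failure. */
ToyDiskDevice *toy_create_device(int observersFail)
{
    ToyDiskDevice *dev = calloc(1, sizeof *dev);
    if (dev == NULL)
        return NULL;
    if (toy_device_init(dev, observersFail) != 0) {
        toy_device_destroy(dev);
        free(dev);
        return NULL;
    }
    return dev;
}
```

Because destroy consults the per-step flags, it is safe after a partial init and harmless if called twice.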

Patrick C. Beard

Comment 10

23 years ago

But what if initialization of the hash table itself fails? You need to be more 
careful in your destructor, I'd argue. Try this patch to see if it makes things 
more stable:

Index: mozilla/netwerk/cache/src/nsCacheEntry.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/src/nsCacheEntry.cpp,v
retrieving revision 1.30.2.1
diff -b -u -2 -r1.30.2.1 nsCacheEntry.cpp
--- nsCacheEntry.cpp	2001/03/27 03:05:36	1.30.2.1
+++ nsCacheEntry.cpp	2001/03/27 08:41:18
@@ -409,5 +409,5 @@
 
 nsCacheEntryHashTable::nsCacheEntryHashTable()
-    : initialized(0)
+    : initialized(PR_FALSE)
 {
 }
@@ -416,4 +416,5 @@
 nsCacheEntryHashTable::~nsCacheEntryHashTable()
 {
+    if (initialized)
     PL_DHashTableFinish(&table);
 }

Chris McAfee

Reporter

Comment 11

23 years ago

beard's patch looks good; I tried it, but Solaris opt still crashes.

gordon

Comment 12

23 years ago

The patch is for the wrong hashtable.  It's crashing in the 
nsDiskCacheEntryHashTable.  We need those checks everywhere.
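One way to read "those checks everywhere": the initialized flag from beard's patch guards not only Finish but every method, including the entry visit that eviction runs from the destructor. This is a hypothetical C sketch; ToyEntryTable and the toy_table_* functions stand in for a hashtable wrapper such as nsDiskCacheEntryHashTable and are not its real interface.

```c
#include <stdlib.h>

/* Hypothetical wrapper; `initialized` is set only on a successful init. */
typedef struct {
    int  initialized;
    int *slots;        /* stand-in for the underlying hashtable storage */
    int  capacity;
} ToyEntryTable;

int toy_table_init(ToyEntryTable *t, int capacity)
{
    t->initialized = 0;
    t->slots = calloc((size_t)capacity, sizeof *t->slots);
    if (t->slots == NULL)
        return -1;     /* stays uninitialized; other methods become no-ops */
    t->capacity = capacity;
    t->initialized = 1;
    return 0;
}

/* Counts occupied (nonzero) slots; returns 0 for an uninitialized table. */
int toy_table_visit_entries(const ToyEntryTable *t)
{
    int i, n = 0;
    if (!t->initialized)
        return 0;
    for (i = 0; i < t->capacity; i++)
        if (t->slots[i] != 0)
            n++;
    return n;
}

void toy_table_finish(ToyEntryTable *t)
{
    if (!t->initialized)
        return;        /* same guard as the destructor in the patch above */
    free(t->slots);
    t->slots = NULL;
    t->initialized = 0;  /* makes a second finish harmless */
}
```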

gordon

Comment 13

23 years ago

I'll take this.

mcafee, let me know when you're in, and I'll work with you to track this down.

Assignee: gagan → gordon

Whiteboard: [cache]

Chris McAfee

Reporter

Comment 14

23 years ago

gordon and I patched the disk cache entry hashtable; no go. New stack:

#0  0xee37b81c in nsDiskCacheEntry::getCacheEntry ()
#1  0xee3688c8 in nsDiskCacheDevice::updateDiskCacheEntry (this=0x1bd4f0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:990
#2  0xee370ecc in UpdateEntryVisitor::VisitEntry (this=0xefffdfc0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:976
#3  0xee376564 in nsDiskCacheEntryHashTable::VisitEntry ()
#4  0xef497344 in ?? ()
   from /builds/mcafee/cmonkey/mozilla/dist/bin/libxpcom.so
#5  0xee3764fc in nsDiskCacheEntryHashTable::VisitEntries ()
#6  0xee36889c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:984
#7  0xee369cf4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1bd4f0, 
    result=0xefffe208) at nsDiskCacheDevice.cpp:1238
#8  0xee36ad98 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:1329
#9  0xee366b74 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1bd4f0, 
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#10 0xee350e0c in nsCacheService::CreateDiskDevice (this=0x1bd380)
    at nsCacheService.cpp:242
#11 0xee351d1c in nsCacheService::SearchCacheDevices (this=0x1bd380, 
    key=0x1b6b18, policy=0) at nsCacheService.cpp:504
#12 0xee351984 in nsCacheService::ActivateEntry (this=0x1bd380, 
    request=0x1b3ee0, result=0xefffe470) at nsCacheService.cpp:431
#13 0xee3514b4 in nsCacheService::ProcessRequest (this=0x1bd380, 
    request=0x1b3ee0, result=0x0) at nsCacheService.cpp:340
#14 0xee35184c in nsCacheService::OpenCacheEntry (this=0x1bd380, 
    session=0x1bd440, 
    key=0x1286b8 "http://www.mozilla.org/projects/embedding", 
    accessRequested=3, listener=0x1ba124, result=0x0) at nsCacheService.cpp:405
#15 0xee35c314 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsIObserverService.h:36

gordon

Comment 15

23 years ago

Attached patch: patch to gracefully shut down on failed Init()

Chris McAfee

Reporter

Comment 16

23 years ago

Tried gordon's last patch; now we die in PREF_Init(). The cache is failing
to get the pref service. Has the hot potato now been tossed to
another part of the code? Pref service bug?



#0  0xee341228 in js_LockRuntime ()
#1  0xee31e498 in js_NewContext ()
#2  0xee314d48 in JS_NewContext ()
#3  0xedfd0f8c in PREF_Init ()
#4  0xedfd5904 in nsPref::StartUp ()
#5  0xedfd55f0 in nsPref::GetInstance ()
#6  0xedfdb168 in CreateNewPref ()
#7  0xef5866d8 in nsGenericFactory::CreateInstance ()
#8  0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#9  0xef5a160c in nsComponentManager::CreateInstance ()
#10 0xef5a2370 in nsServiceManagerImpl::GetService ()
#11 0xef5a29a0 in nsServiceManager::GetService ()
#12 0xef5a1a1c in nsGetServiceByCID::operator() ()
#13 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#14 0xedac5020 in nsCOMPtr<nsIPref>::nsCOMPtr ()
#15 0xedab1534 in nsChromeRegistry::nsChromeRegistry ()
#16 0xedaa9fa4 in nsChromeRegistryConstructor ()
#17 0xef5866d8 in nsGenericFactory::CreateInstance ()
#18 0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#19 0xef5a160c in nsComponentManager::CreateInstance ()
#20 0xef5a2370 in nsServiceManagerImpl::GetService ()
#21 0xef5a29a0 in nsServiceManager::GetService ()
#22 0xef5a1a1c in nsGetServiceByCID::operator() ()
#23 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#24 0xedad5d6c in nsCOMPtr<nsIChromeRegistry>::nsCOMPtr ()
#25 0xedacf390 in nsChromeProtocolHandler::NewChannel ()
#26 0xedbc6128 in nsIOService::NewChannelFromURI ()
#27 0xeee2fccc in nsStringBundle::OpenInputStream ()
#28 0xeee2fa9c in nsStringBundle::GetInputStream ()
#29 0xeee2e9d4 in nsStringBundle::InitSyncStream ()
#30 0xeee31194 in nsStringBundleService::getStringBundle ()
#31 0xeee3138c in nsStringBundleService::CreateBundle ()
#32 0x32174 in NS_InitEmbedding ()
#33 0x2ac18 in main ()

Judson Valeski

Comment 17

23 years ago

We should be handling all these failure cases gracefully, but I think there's
another problem here. Does the Embed dir on Solaris match what's in the Embed
dir on a Linux box?

gordon

Comment 18

23 years ago

Neeti, you're listed as module owner for prefs on mozilla.org. I know that's
out of date, but do you know who the current owner is? This bug no longer pertains
to the cache.

Assignee: gordon → neeti

Component: Networking: Cache → Preferences: Backend

Whiteboard: [cache]

Judson Valeski

Comment 19

23 years ago

alecf and bnesse are the new owners, I believe.

Brian Nesse (gone)

Assignee

Comment 20

23 years ago

The only thing that I can see which would cause that stack is if there is no 
JSRuntimeService. This should fix that problem:

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c	2001/03/20 14:34:54	3.87
+++ prefapi.c	2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            goto out;
+    }
 
     if (!gMochaContext)

neeti

Comment 21

23 years ago

Reassigning to bnesse for now

Assignee: neeti → bnesse

Alec Flett

Comment 22

23 years ago

Seems reasonable to me. As long as it's been tested with seamonkey,
winEmbed, and gtkEmbed, sr=alecf

Brian Nesse (gone)

Assignee

Comment 23

23 years ago

Actually, after looking at it again, I believe this is better:

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c   2001/03/20 14:34:54     3.87
+++ prefapi.c   2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            return PR_FALSE;
+    }
 
     if (!gMochaContext)

The previous patch would return PR_TRUE on failure, because 'ok' is initialized
to PR_TRUE when it is declared.
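The difference between the two patches can be sketched in C as follows. This assumes, as the comment above says, that ok starts out as PR_TRUE and is what the out: label returns; toy_get_runtime, gRuntime and the toy_init_* functions are hypothetical stand-ins for PREF_GetJSRuntime(), gMochaTaskState and PREF_Init(), and PRBool/PR_TRUE/PR_FALSE are redefined locally rather than taken from NSPR.

```c
#include <stddef.h>

/* Local stand-ins for the NSPR boolean type and constants. */
typedef int PRBool;
#define PR_TRUE  1
#define PR_FALSE 0

/* Hypothetical stand-in for the global runtime pointer. */
static void *gRuntime = NULL;

/* Hypothetical stand-in for PREF_GetJSRuntime(): NULL when unavailable. */
static void *toy_get_runtime(int available)
{
    static int dummy;
    return available ? (void *)&dummy : NULL;
}

/* Shape of the first patch: `ok` starts as PR_TRUE, so jumping to
   `out` on failure reports success to the caller. */
PRBool toy_init_goto(int runtimeAvailable)
{
    PRBool ok = PR_TRUE;
    if (!gRuntime) {
        gRuntime = toy_get_runtime(runtimeAvailable);
        if (!gRuntime)
            goto out;
    }
    /* ... the rest of initialization would run here ... */
out:
    return ok;
}

/* Shape of the second patch: fail explicitly. */
PRBool toy_init_return(int runtimeAvailable)
{
    if (!gRuntime) {
        gRuntime = toy_get_runtime(runtimeAvailable);
        if (!gRuntime)
            return PR_FALSE;
    }
    return PR_TRUE;
}
```

With no runtime available, toy_init_goto(0) returns PR_TRUE even though nothing was initialized, while toy_init_return(0) returns PR_FALSE.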

Chris McAfee

Reporter

Updated

23 years ago

Keywords: patch

gordon

Comment 24

23 years ago

The patch to nsDiskCacheDevice was checked in long ago.

Jussi-Pekka Mantere

Comment 25

23 years ago

Could anyone with a winEmbed and/or gtkEmbed build please build and test this so that 
the fix can be checked in? Otherwise this will miss the 0.9.1 train.

Chak Nanga

Comment 26

23 years ago

Cc'ing Conrad and blizzard.

Chris McAfee

Reporter

Comment 27

23 years ago

This patch works for me on Linux, debug and optimized.
Since Solaris isn't currently crashing, it will be hard
to verify that this actually fixes the crash. I think
we should still check this in if we think it is a
good change. r=mcafee

Jussi-Pekka Mantere

Updated

23 years ago

Target Milestone: --- → mozilla0.9.2

timeless

Comment 28

23 years ago

the patch helped my solaris7sparc SunForteC5 build, please get approval 
and check it in [r=timeless].

Keywords: approval

Brian Nesse (gone)

Assignee

Comment 29

23 years ago

OK, I've already had r's and sr's on this patch; we just didn't check it in because 
there was no verification that it did anything. I have to wipe my current 
prefapi.c (I'm working on other patches in the same file) and reapply this patch. 
Then I will check it in.

Brian Nesse (gone)

Assignee

Comment 30

23 years ago

Checked in. Closing as fixed, since bug 81436 is now tracking the question "Why 
does libxpconnect fail to load?"

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Michael Dunn

Comment 31

22 years ago

Clean up verification of dated code change bugs

Status: RESOLVED → VERIFIED