Solaris (Linux?) optimized gtkEmbed core dumps on startup

VERIFIED FIXED in mozilla0.9.2

Status


Core
Preferences: Backend
VERIFIED FIXED
17 years ago
15 years ago

People

(Reporter: Chris McAfee, Assigned: Brian Nesse (gone))

Tracking

({regression, smoketest})

Trunk
mozilla0.9.2
x86
Linux
regression, smoketest

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

17 years ago
Solaris optimized gtkEmbed core dumps on startup.
Linux opt and debug builds work, and Solaris debug works;
only the optimized Solaris build crashes. This also matches
the speedracer tinderbox build, which has been orange since
last week for this reason.

Comment 1

17 years ago
My Linux opt builds crash when running gtkEmbed.

(gdb) bt
#0  0x2ab4ea8e in PL_DHashTableEnumerate ()
   from /usr/cls/moz/main/obj-deps/dist/bin/./libxpcom.so
#1  0x2bfb39cf in nsDiskCacheEntryHashTable::VisitEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#2  0x2bfb1d17 in nsDiskCacheDevice::updateDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#3  0x2bfb2698 in nsDiskCacheDevice::scanDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#4  0x2bfb2e44 in nsDiskCacheDevice::evictDiskCacheEntries ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#5  0x2bfb0d91 in nsDiskCacheDevice::~nsDiskCacheDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#6  0x2bfadc39 in nsCacheService::CreateDiskDevice ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#7  0x2bfae657 in nsCacheService::SearchCacheDevices ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#8  0x2bfae4f3 in nsCacheService::ActivateEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#9  0x2bfae151 in nsCacheService::ProcessRequest ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#10 0x2bfae3d7 in nsCacheService::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#11 0x2bfaf265 in nsCacheSession::AsyncOpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#12 0x2b531f4c in nsHTTPChannel::OpenCacheEntry ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#13 0x2b533158 in nsHTTPChannel::Connect ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#14 0x2b530f1a in nsHTTPChannel::AsyncOpen ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#15 0x2bf8ed1d in nsDocumentOpenInfo::Open ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#16 0x2bf91370 in nsURILoader::OpenURIVia ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#17 0x2bf8fec3 in nsURILoader::OpenURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#18 0x2afdc840 in nsDocShell::DoChannelLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#19 0x2afdc1d3 in nsDocShell::DoURILoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#20 0x2afdb061 in nsDocShell::InternalLoad ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#21 0x2afd4030 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#22 0x2afd73b7 in nsDocShell::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#23 0x2b9b0def in nsWebBrowser::LoadURI ()
   from /usr/cls/moz/main/obj-deps/dist/bin/components/libwebbrwsr.so
#24 0x804d8da in OpenWebPage ()
#25 0x804ddba in main ()
#26 0x2ae7f9cb in ?? () from /lib/libc.so.6
(gdb) 

Comment 2

17 years ago
cls, I believe chak just checked in a fix for that?

Comment 3

17 years ago
Jud: I checked in the fix for Windows only (for bug 73225) since cls mentioned
he's not seeing it on Unix with gtkEmbed.

This looks like a different issue. 

(Reporter)

Comment 4

17 years ago
Looking at Solaris truss output, we load libnkcache.so and
crash soon after.

open("/export2/mcafee/cmonkey/mozilla/dist/bin/components/libnkcache.so", O_RDONLY) = 11
fstat(11, 0xEFFFD8AC)               = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEE390000
mmap(0x00000000, 540672, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEBC00000
munmap(0xEBC6E000, 57344)           = 0
mmap(0xEBC7C000, 32024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 442368) = 0xEBC7C000
close(11)                   = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_EXEC) = 0
munmap(0xEE390000, 8192)            = 0
brk(0x001C02B0)                 = 0
brk(0x001C22B0)                 = 0
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [caught]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
sigprocmask(SIG_SETMASK, 0xEEF663D8, 0x00000000) = 0
sigaction(SIGSEGV, 0xEFFFDAA0, 0x00000000)  = 0
setcontext(0xEFFFDBE8)
    Incurred fault #6, FLTBOUNDS  %pc = 0xEF496D0C
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
    *** process killed ***
(Reporter)

Updated

17 years ago
Keywords: regression, smoketest

Comment 5

17 years ago
Cc'ing necko cache folks.

Comment 6

17 years ago
patrick: any thoughts?
(Reporter)

Comment 7

17 years ago
I built a few files debug by hand and got the
stack trace below. entry is null for this crash; the
ENTRY_IS_LIVE macro dereferences it without checking for null.

[xpcom/ds/pldhash.c:438]
for (i = j = 0; i < capacity; i++) {
        entry = (PLDHashEntryHdr *)entryAddr;
      =>if (ENTRY_IS_LIVE(entry)) {
            op = etor(table, entry, j++, arg);
            if (op & PL_DHASH_REMOVE) {
                METER(table->stats.removeEnums++);
                PL_DHashTableRawRemove(table, entry);
            }
            if (op & PL_DHASH_STOP)
                break;
        }



#0  0xef497310 in PL_DHashTableEnumerate (table=0x1b947c,
    etor=0xebdf64e4 <nsDiskCacheEntryHashTable::VisitEntry(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)>, arg=0xefffdf98) at pldhash.c:438
#1  0xebdf64dc in nsDiskCacheEntryHashTable::VisitEntries (this=0x1b947c,
    visitor=0xefffdf98) at nsDiskCacheEntry.cpp:121
#2  0xebde888c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:984
#3  0xebde9ce4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1b9470,
    result=0xefffe1e0) at nsDiskCacheDevice.cpp:1238
#4  0xebdead88 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1b9470)
    at nsDiskCacheDevice.cpp:1329
#5  0xebde6b64 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1b9470,
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#6  0xebdd0dfc in nsCacheService::CreateDiskDevice (this=0x1b92d0)
    at nsCacheService.cpp:242
#7  0xebdd1d0c in nsCacheService::SearchCacheDevices (this=0x1b92d0,
    key=0x1b20d0, policy=0) at nsCacheService.cpp:504
#8  0xebdd1974 in nsCacheService::ActivateEntry (this=0x1b92d0,
    request=0x1afc58, result=0xefffe448) at nsCacheService.cpp:431
#9  0xebdd14a4 in nsCacheService::ProcessRequest (this=0x1b92d0,
    request=0x1afc58, result=0x0) at nsCacheService.cpp:340
#10 0xebdd183c in nsCacheService::OpenCacheEntry (this=0x1b92d0,
    session=0x1b6e38,
    key=0x11f9b8 "http://www.mozilla.org/projects/embedding",
    accessRequested=3, listener=0x1b5624, result=0x0) at nsCacheService.cpp:405
#11 0xebddc304 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsID.h:68
#12 0xee1c3634 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#13 0xee1c498c in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#14 0xee1c25a8 in ?? ()
   from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
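Comment 7 traces the crash to the live-entry test dereferencing a null entry while enumerating a table that was never set up. The following is a minimal sketch, not pldhash's actual code, of an enumerator that refuses to walk an uninitialized table. MiniTable, EntryIsLive, EnumerateLive and SumValues are hypothetical names, and the keyHash encoding (0 = free, 1 = removed, >= 2 = live) is assumed to mirror pldhash's convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for a PLDHashTable entry.
struct Entry {
    std::uint32_t keyHash;  // assumed: 0 = free, 1 = removed, >= 2 = live
    int value;
};

// Hypothetical table: a flat entry array plus its capacity.
// A table that was never initialized has a null entry store.
struct MiniTable {
    Entry* entryStore = nullptr;  // stays null until a successful Init
    std::size_t capacity = 0;
};

// Live-entry test: dereferences `entry`, so it must never see null.
static bool EntryIsLive(const Entry* entry) { return entry->keyHash >= 2; }

// Callback type: return true to stop the enumeration early.
typedef bool (*Enumerator)(Entry* entry, std::size_t index, void* arg);

// Enumerate live entries, bailing out on an uninitialized table instead of
// dereferencing a null entry store. Returns the number of entries visited.
std::size_t EnumerateLive(MiniTable* table, Enumerator etor, void* arg) {
    if (!table || !table->entryStore)
        return 0;  // nothing to visit; this is the check the crash lacked
    std::size_t visited = 0;
    for (std::size_t i = 0; i < table->capacity; i++) {
        Entry* entry = &table->entryStore[i];
        if (EntryIsLive(entry)) {
            if (etor(entry, visited++, arg))
                break;
        }
    }
    return visited;
}

// Example callback: adds each live entry's value to *(int*)arg.
static bool SumValues(Entry* entry, std::size_t, void* arg) {
    *static_cast<int*>(arg) += entry->value;
    return false;  // never stop early
}
```

With an empty MiniTable, EnumerateLive returns 0 without touching memory; with a populated store it only calls back for live slots.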
(Reporter)

Comment 8

17 years ago
Over to gagan. I have a Solaris tree ready to go:

  mocha:/export2/mcafee/cmonkey/mozilla
Assignee: kandrot → gagan
Component: Embedding APIs → Networking: Cache
(Reporter)

Updated

17 years ago
Summary: Solaris optimized gtkEmbed core dumps on startup → Solaris (Linux?) optimized gtkEmbed core dumps on startup

Comment 9

17 years ago
It looks like we're failing to fully initialize the disk cache and dying when
its destructor is called. I'm guessing that something goes wrong in
installObservers() so the hashtable never gets initialized; then things go very
wrong in the destructor when we try to enumerate the uninitialized hashtable.

Patrick, can you fix this tonight, or do we need to land your branch tomorrow
afternoon?

Comment 10

17 years ago
But what if initialization of the hash table itself fails? You need to be more 
careful in your destructor, I'd argue. Try this patch to see if it makes things 
more stable:

Index: mozilla/netwerk/cache/src/nsCacheEntry.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/src/nsCacheEntry.cpp,v
retrieving revision 1.30.2.1
diff -b -u -2 -r1.30.2.1 nsCacheEntry.cpp
--- nsCacheEntry.cpp	2001/03/27 03:05:36	1.30.2.1
+++ nsCacheEntry.cpp	2001/03/27 08:41:18
@@ -409,5 +409,5 @@
 
 nsCacheEntryHashTable::nsCacheEntryHashTable()
-    : initialized(0)
+    : initialized(PR_FALSE)
 {
 }
@@ -416,4 +416,5 @@
 nsCacheEntryHashTable::~nsCacheEntryHashTable()
 {
+    if (initialized)
     PL_DHashTableFinish(&table);
 }
(Reporter)

Comment 11

17 years ago
beard's patch looks good; I tried it, but Solaris opt still crashes.

Comment 12

17 years ago
The patch is for the wrong hashtable. It's crashing in
nsDiskCacheEntryHashTable. We need those checks everywhere.
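Comment 12's point is that the `initialized` guard from comment 10's patch has to cover every entry point of the hashtable that actually crashes, not just one destructor. Below is a minimal sketch of that pattern, assuming a hypothetical GuardedEntryTable class (loosely modeled on nsDiskCacheEntryHashTable, not its real code), a std::vector standing in for the PLDHashTable storage, and a hypothetical gFinishCalls counter so the teardown path is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter so the teardown path can be checked.
int gFinishCalls = 0;

// Hypothetical class: every entry point that touches the table storage
// checks `initialized` first, not only the destructor.
class GuardedEntryTable {
public:
    GuardedEntryTable() : initialized(false) {}

    ~GuardedEntryTable() {
        if (initialized)
            Finish();  // tear down only what Init() actually built
    }

    // `succeed` lets a caller simulate Init() failing part-way.
    bool Init(bool succeed) {
        if (!succeed)
            return false;  // `initialized` stays false
        slots.assign(8, 0);
        initialized = true;
        return true;
    }

    // Returns the number of slots visited; 0 if the table was never set up.
    std::size_t VisitEntries() const {
        if (!initialized)
            return 0;
        std::size_t visited = 0;
        for (int slot : slots) {
            (void)slot;  // a real visitor would call back per entry here
            ++visited;
        }
        return visited;
    }

private:
    void Finish() {
        slots.clear();
        initialized = false;
        ++gFinishCalls;
    }

    std::vector<int> slots;  // stand-in for the underlying hash table
    bool initialized;
};
```

A table whose Init() failed can still be visited and destroyed safely: VisitEntries() returns 0 and the destructor skips Finish().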

Comment 13

17 years ago
I'll take this.

mcafee, let me know when you're in, and I'll work with you to track this down.
Assignee: gagan → gordon
Whiteboard: [cache]
(Reporter)

Comment 14

17 years ago
gordon and I patched the disk cache entry hashtable; no go. New stack:

#0  0xee37b81c in nsDiskCacheEntry::getCacheEntry ()
#1  0xee3688c8 in nsDiskCacheDevice::updateDiskCacheEntry (this=0x1bd4f0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:990
#2  0xee370ecc in UpdateEntryVisitor::VisitEntry (this=0xefffdfc0, 
    diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:976
#3  0xee376564 in nsDiskCacheEntryHashTable::VisitEntry ()
#4  0xef497344 in ?? ()
   from /builds/mcafee/cmonkey/mozilla/dist/bin/libxpcom.so
#5  0xee3764fc in nsDiskCacheEntryHashTable::VisitEntries ()
#6  0xee36889c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:984
#7  0xee369cf4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1bd4f0, 
    result=0xefffe208) at nsDiskCacheDevice.cpp:1238
#8  0xee36ad98 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1bd4f0)
    at nsDiskCacheDevice.cpp:1329
#9  0xee366b74 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1bd4f0, 
    __in_chrg=3) at nsDiskCacheDevice.cpp:569
#10 0xee350e0c in nsCacheService::CreateDiskDevice (this=0x1bd380)
    at nsCacheService.cpp:242
#11 0xee351d1c in nsCacheService::SearchCacheDevices (this=0x1bd380, 
    key=0x1b6b18, policy=0) at nsCacheService.cpp:504
#12 0xee351984 in nsCacheService::ActivateEntry (this=0x1bd380, 
    request=0x1b3ee0, result=0xefffe470) at nsCacheService.cpp:431
#13 0xee3514b4 in nsCacheService::ProcessRequest (this=0x1bd380, 
    request=0x1b3ee0, result=0x0) at nsCacheService.cpp:340
#14 0xee35184c in nsCacheService::OpenCacheEntry (this=0x1bd380, 
    session=0x1bd440, 
    key=0x1286b8 "http://www.mozilla.org/projects/embedding", 
    accessRequested=3, listener=0x1ba124, result=0x0) at nsCacheService.cpp:405
#15 0xee35c314 in nsCacheSession::AsyncOpenCacheEntry ()
    at ../../../dist/include/nsIObserverService.h:36

Comment 15

17 years ago
Created attachment 28949 [details] [diff] [review]
patch to gracefully shutdown on failed Init()
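The attachment's contents are not reproduced here, so the following is only a sketch of the general idea its title describes, under stated assumptions: the device remembers whether Init() completed, its destructor skips eviction (the path that walked the uninitialized table in the stacks above) when it did not, and the creator destroys a device whose Init() failed instead of returning it. DiskDevice, CreateDiskDevice, the prefServiceAvailable flag and the evictLog counter are all hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical device: records whether Init() completed so the destructor
// knows whether eviction (which walks the entry table) is safe.
struct DiskDevice {
    bool initialized = false;
    int* evictLog = nullptr;  // hypothetical counter observed after destruction

    // Assumption: Init() fails when a required service is unavailable.
    bool Init(bool prefServiceAvailable) {
        if (!prefServiceAvailable)
            return false;
        initialized = true;
        return true;
    }

    ~DiskDevice() {
        // Stand-in for evictDiskCacheEntries(): only run it on a fully
        // initialized device.
        if (initialized && evictLog)
            ++*evictLog;
    }
};

// Create and initialize a device; on failure, destroy it and return null
// instead of handing back a half-built object.
std::unique_ptr<DiskDevice> CreateDiskDevice(bool prefServiceAvailable,
                                             int* evictLog) {
    auto device = std::make_unique<DiskDevice>();
    device->evictLog = evictLog;
    if (!device->Init(prefServiceAvailable))
        return nullptr;  // `device` is destroyed here; no eviction runs
    return device;
}
```

The design choice is that a failed Init() leaves nothing behind for callers to tear down, and destruction of a half-built device is a no-op.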
(Reporter)

Comment 16

17 years ago
Tried gordon's latest patch; now we die in PREF_Init(). The cache is failing
to get the pref service, so has the hot potato been tossed to
another part of the code? A pref service bug?



#0  0xee341228 in js_LockRuntime ()
#1  0xee31e498 in js_NewContext ()
#2  0xee314d48 in JS_NewContext ()
#3  0xedfd0f8c in PREF_Init ()
#4  0xedfd5904 in nsPref::StartUp ()
#5  0xedfd55f0 in nsPref::GetInstance ()
#6  0xedfdb168 in CreateNewPref ()
#7  0xef5866d8 in nsGenericFactory::CreateInstance ()
#8  0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#9  0xef5a160c in nsComponentManager::CreateInstance ()
#10 0xef5a2370 in nsServiceManagerImpl::GetService ()
#11 0xef5a29a0 in nsServiceManager::GetService ()
#12 0xef5a1a1c in nsGetServiceByCID::operator() ()
#13 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#14 0xedac5020 in nsCOMPtr<nsIPref>::nsCOMPtr ()
#15 0xedab1534 in nsChromeRegistry::nsChromeRegistry ()
#16 0xedaa9fa4 in nsChromeRegistryConstructor ()
#17 0xef5866d8 in nsGenericFactory::CreateInstance ()
#18 0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#19 0xef5a160c in nsComponentManager::CreateInstance ()
#20 0xef5a2370 in nsServiceManagerImpl::GetService ()
#21 0xef5a29a0 in nsServiceManager::GetService ()
#22 0xef5a1a1c in nsGetServiceByCID::operator() ()
#23 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#24 0xedad5d6c in nsCOMPtr<nsIChromeRegistry>::nsCOMPtr ()
#25 0xedacf390 in nsChromeProtocolHandler::NewChannel ()
#26 0xedbc6128 in nsIOService::NewChannelFromURI ()
#27 0xeee2fccc in nsStringBundle::OpenInputStream ()
#28 0xeee2fa9c in nsStringBundle::GetInputStream ()
#29 0xeee2e9d4 in nsStringBundle::InitSyncStream ()
#30 0xeee31194 in nsStringBundleService::getStringBundle ()
#31 0xeee3138c in nsStringBundleService::CreateBundle ()
#32 0x32174 in NS_InitEmbedding ()
#33 0x2ac18 in main ()

Comment 17

17 years ago
We should be handling all these failure cases gracefully, but I think there's
another problem here. Does the Embed dir on Solaris match what's in the Embed
dir on a Linux box?

Comment 18

17 years ago
Neeti, you're listed as module owner for prefs on mozilla.org. I know that's
out of date, but do you know who the current owner is? This bug no longer
pertains to the cache.
Assignee: gordon → neeti
Component: Networking: Cache → Preferences: Backend
Whiteboard: [cache]

Comment 19

17 years ago
alecf and bnesse are the new owners, I believe.
(Assignee)

Comment 20

17 years ago
The only thing I can see that would cause that stack is a missing
JSRuntimeService. This should fix that problem:

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c	2001/03/20 14:34:54	3.87
+++ prefapi.c	2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            goto out;
+    }
 
     if (!gMochaContext)

Comment 21

17 years ago
Reassigning to bnesse for now
Assignee: neeti → bnesse

Comment 22

17 years ago
Seems reasonable to me. As long as it's been tested with Seamonkey,
winEmbed, and gtkEmbed, sr=alecf.
(Assignee)

Comment 23

17 years ago
Actually, after looking at it again, I believe this is better:

Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c   2001/03/20 14:34:54     3.87
+++ prefapi.c   2001/03/29 17:47:29
@@ -284,5 +284,9 @@
 
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            return PR_FALSE;
+    }
 
     if (!gMochaContext)

The previous patch would still return PR_TRUE, because 'ok' is initialized to
PR_TRUE when it is declared.
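The difference between the two prefapi.c patches can be shown in isolation. This is a minimal sketch, not the real PREF_Init(): GetRuntime, InitWithGotoOut and InitWithEarlyReturn are hypothetical names, bool stands in for PRBool, and the detail that `ok` starts out true is taken from the comment above.

```cpp
#include <cassert>

// Hypothetical stand-in for PREF_GetJSRuntime(): returns null when no
// JS runtime service is available.
static void* GetRuntime(bool available) {
    static int runtime;
    return available ? &runtime : nullptr;
}

// Shape of the first patch: `ok` starts out true, so jumping to `out`
// on failure still reports success to the caller.
bool InitWithGotoOut(bool runtimeAvailable) {
    bool ok = true;
    void* rt = GetRuntime(runtimeAvailable);
    if (!rt)
        goto out;  // bug: ok is still true here
    // ... create the JS context, read default prefs, update ok ...
out:
    return ok;
}

// Shape of the revised patch: report the failure explicitly.
bool InitWithEarlyReturn(bool runtimeAvailable) {
    bool ok = true;
    void* rt = GetRuntime(runtimeAvailable);
    if (!rt)
        return false;
    // ... remaining initialization would update ok ...
    return ok;
}
```

Without a runtime, InitWithGotoOut(false) still returns true, masking the failure, while InitWithEarlyReturn(false) returns false so callers can avoid going on to create a JS context with no runtime.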
(Reporter)

Updated

17 years ago
Keywords: patch

Comment 24

17 years ago
The patch to nsDiskCacheDevice was checked in long ago.

Comment 25

17 years ago
Could anyone with a winEmbed and/or gtkEmbed build please build and test this so
that the fix can be checked in? Otherwise this will miss the 0.9.1 train.

Comment 26

17 years ago
Cc'ing Conrad and blizzard.
(Reporter)

Comment 27

17 years ago
This patch works for me on Linux, debug and optimized.
Since Solaris isn't currently crashing, it will be hard
to verify that this actually fixes the crash. I think
we should still check this in if we think this is a
good change. r=mcafee

Updated

17 years ago
Target Milestone: --- → mozilla0.9.2

Comment 28

17 years ago
The patch helped my solaris7sparc SunForteC5 build; please get approval
and check it in. [r=timeless]
Keywords: approval
(Assignee)

Comment 29

17 years ago
OK, this patch already has r= and sr=; we just didn't check it in because
there was no verification that it did anything. I have to wipe my current
prefapi.c (I'm working on other patches in the same file) and reapply this
patch. Then I will check it in.
(Assignee)

Comment 30

17 years ago
Checked in. Closing as fixed, since bug 81436 is now tracking the question
"Why does libxpconnect fail to load?"
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 31

15 years ago
Clean up verification of dated code change bugs
Status: RESOLVED → VERIFIED