Closed
Bug 73491
Opened 23 years ago
Closed 23 years ago
Solaris (Linux?) optimized gtkEmbed core dumps on startup
Categories
(Core :: Preferences: Backend, defect)
Tracking
()
VERIFIED
FIXED
mozilla0.9.2
People
(Reporter: mcafee, Assigned: bnesse)
Details
(Keywords: regression, smoketest)
Attachments
(1 file)
715 bytes,
patch
Solaris optimized gtkEmbed core dumps on startup. Linux opt and debug work and Solaris debug works; only the Solaris optimized build fails. This matches the speedracer tinderbox build, which has been orange since last week for this reason.
My linux opt builds crash when running gtkEmbed.
(gdb) bt
#0 0x2ab4ea8e in PL_DHashTableEnumerate () from /usr/cls/moz/main/obj-deps/dist/bin/./libxpcom.so
#1 0x2bfb39cf in nsDiskCacheEntryHashTable::VisitEntries () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#2 0x2bfb1d17 in nsDiskCacheDevice::updateDiskCacheEntries () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#3 0x2bfb2698 in nsDiskCacheDevice::scanDiskCacheEntries () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#4 0x2bfb2e44 in nsDiskCacheDevice::evictDiskCacheEntries () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#5 0x2bfb0d91 in nsDiskCacheDevice::~nsDiskCacheDevice () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#6 0x2bfadc39 in nsCacheService::CreateDiskDevice () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#7 0x2bfae657 in nsCacheService::SearchCacheDevices () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#8 0x2bfae4f3 in nsCacheService::ActivateEntry () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#9 0x2bfae151 in nsCacheService::ProcessRequest () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#10 0x2bfae3d7 in nsCacheService::OpenCacheEntry () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#11 0x2bfaf265 in nsCacheSession::AsyncOpenCacheEntry () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnkcache.so
#12 0x2b531f4c in nsHTTPChannel::OpenCacheEntry () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#13 0x2b533158 in nsHTTPChannel::Connect () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#14 0x2b530f1a in nsHTTPChannel::AsyncOpen () from /usr/cls/moz/main/obj-deps/dist/bin/components/libnecko.so
#15 0x2bf8ed1d in nsDocumentOpenInfo::Open () from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#16 0x2bf91370 in nsURILoader::OpenURIVia () from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#17 0x2bf8fec3 in nsURILoader::OpenURI () from /usr/cls/moz/main/obj-deps/dist/bin/components/liburiloader.so
#18 0x2afdc840 in nsDocShell::DoChannelLoad () from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#19 0x2afdc1d3 in nsDocShell::DoURILoad () from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#20 0x2afdb061 in nsDocShell::InternalLoad () from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#21 0x2afd4030 in nsDocShell::LoadURI () from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#22 0x2afd73b7 in nsDocShell::LoadURI () from /usr/cls/moz/main/obj-deps/dist/bin/components/libdocshell.so
#23 0x2b9b0def in nsWebBrowser::LoadURI () from /usr/cls/moz/main/obj-deps/dist/bin/components/libwebbrwsr.so
#24 0x804d8da in OpenWebPage ()
#25 0x804ddba in main ()
#26 0x2ae7f9cb in ?? () from /lib/libc.so.6
(gdb)
Comment 2•23 years ago
cls, I believe chak just checked in a fix for that?
Comment 3•23 years ago
Jud: I checked in the fix for Windows only (for Bug #73225), since cls mentioned he's not seeing it on Unix with gtkEmbed. This looks like a different issue.
Reporter
Comment 4•23 years ago
Looking at the Solaris truss output, we load libnkcache.so and crash soon after.
open("/export2/mcafee/cmonkey/mozilla/dist/bin/components/libnkcache.so", O_RDONLY) = 11
fstat(11, 0xEFFFD8AC) = 0
mmap(0x00000000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEE390000
mmap(0x00000000, 540672, PROT_READ|PROT_EXEC, MAP_PRIVATE, 11, 0) = 0xEBC00000
munmap(0xEBC6E000, 57344) = 0
mmap(0xEBC7C000, 32024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 11, 442368) = 0xEBC7C000
close(11) = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0xEBC00000, 447412, PROT_READ|PROT_EXEC) = 0
munmap(0xEE390000, 8192) = 0
brk(0x001C02B0) = 0
brk(0x001C22B0) = 0
Incurred fault #6, FLTBOUNDS %pc = 0xEF496D0C
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
sigprocmask(SIG_SETMASK, 0xEEF663D8, 0x00000000) = 0
sigaction(SIGSEGV, 0xEFFFDAA0, 0x00000000) = 0
setcontext(0xEFFFDBE8)
Incurred fault #6, FLTBOUNDS %pc = 0xEF496D0C
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
*** process killed ***
Reporter
Updated•23 years ago
Keywords: regression, smoketest
Comment 5•23 years ago
cc'ing necko cache folks.
Comment 6•23 years ago
patrick: any thoughts?
Reporter
Comment 7•23 years ago
I built a few files debug by hand and got the stack trace below. entry is null for this crash; the macro dereferences it without checking for null.
[xpcom/ds/pldhash.c:438]
    for (i = j = 0; i < capacity; i++) {
        entry = (PLDHashEntryHdr *)entryAddr;
=>      if (ENTRY_IS_LIVE(entry)) {
            op = etor(table, entry, j++, arg);
            if (op & PL_DHASH_REMOVE) {
                METER(table->stats.removeEnums++);
                PL_DHashTableRawRemove(table, entry);
            }
            if (op & PL_DHASH_STOP)
                break;
        }
#0 0xef497310 in PL_DHashTableEnumerate (table=0x1b947c, etor=0xebdf64e4 <nsDiskCacheEntryHashTable::VisitEntry(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)>, arg=0xefffdf98) at pldhash.c:438
#1 0xebdf64dc in nsDiskCacheEntryHashTable::VisitEntries (this=0x1b947c, visitor=0xefffdf98) at nsDiskCacheEntry.cpp:121
#2 0xebde888c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1b9470) at nsDiskCacheDevice.cpp:984
#3 0xebde9ce4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1b9470, result=0xefffe1e0) at nsDiskCacheDevice.cpp:1238
#4 0xebdead88 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1b9470) at nsDiskCacheDevice.cpp:1329
#5 0xebde6b64 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1b9470, __in_chrg=3) at nsDiskCacheDevice.cpp:569
#6 0xebdd0dfc in nsCacheService::CreateDiskDevice (this=0x1b92d0) at nsCacheService.cpp:242
#7 0xebdd1d0c in nsCacheService::SearchCacheDevices (this=0x1b92d0, key=0x1b20d0, policy=0) at nsCacheService.cpp:504
#8 0xebdd1974 in nsCacheService::ActivateEntry (this=0x1b92d0, request=0x1afc58, result=0xefffe448) at nsCacheService.cpp:431
#9 0xebdd14a4 in nsCacheService::ProcessRequest (this=0x1b92d0, request=0x1afc58, result=0x0) at nsCacheService.cpp:340
#10 0xebdd183c in nsCacheService::OpenCacheEntry (this=0x1b92d0, session=0x1b6e38, key=0x11f9b8 "http://www.mozilla.org/projects/embedding", accessRequested=3, listener=0x1b5624, result=0x0) at nsCacheService.cpp:405
#11 0xebddc304 in nsCacheSession::AsyncOpenCacheEntry () at ../../../dist/include/nsID.h:68
#12 0xee1c3634 in ?? () from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#13 0xee1c498c in ?? () from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
#14 0xee1c25a8 in ?? () from /export2/mcafee/cmonkey/mozilla/dist/bin/components/libnecko.so
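For illustration only, here is a minimal C sketch of the kind of guard the comment above says is missing: an enumerate loop that refuses to walk entry storage that was never allocated. DemoEntry, DemoStore, demo_enumerate and demo_sum are hypothetical names invented for this sketch, and the "key_hash == 0 means free" rule is a simplifying assumption; none of this is the actual pldhash.c code.

```c
#include <stddef.h>

/* Hypothetical, simplified entry: key_hash == 0 marks a free slot. */
typedef struct {
    unsigned key_hash;
    int value;
} DemoEntry;

/* Hypothetical table: entries stays NULL until the store is initialized. */
typedef struct {
    DemoEntry *entries;
    size_t capacity;
} DemoStore;

typedef int (*DemoVisitor)(const DemoEntry *entry, void *arg);

/* Visits every live entry and returns how many were visited.
 * A nonzero return from the visitor stops the walk early. */
size_t demo_enumerate(const DemoStore *store, DemoVisitor visit, void *arg)
{
    size_t visited = 0;
    size_t i;

    /* Guard: never dereference storage that was not set up. */
    if (store == NULL || store->entries == NULL)
        return 0;

    for (i = 0; i < store->capacity; i++) {
        const DemoEntry *entry = &store->entries[i];
        if (entry->key_hash == 0)
            continue;               /* free slot, skip it */
        visited++;
        if (visit(entry, arg) != 0)
            break;
    }
    return visited;
}

/* Example visitor: adds each live value into the int that arg points to. */
int demo_sum(const DemoEntry *entry, void *arg)
{
    *(int *)arg += entry->value;
    return 0;
}
```

With this guard, enumerating a store whose entries pointer is still NULL visits nothing instead of faulting at address 0.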
Reporter
Comment 8•23 years ago
over to gagan. I have solaris tree ready to go: mocha:/export2/mcafee/cmonkey/mozilla
Assignee: kandrot → gagan
Component: Embedding APIs → Networking: Cache
Reporter
Updated•23 years ago
Summary: Solaris optimized gtkEmbed core dumps on startup → Solaris (Linux?) optimized gtkEmbed core dumps on startup
It looks like we're failing to fully initialize the disk cache, and dying when its destructor is called. I'm guessing that something goes wrong in installObservers() so the hashtable never gets initialized; then things go way bad in the destructor when we try to enumerate the uninitialized hashtable. Patrick, can you fix this tonight, or do we need to land your branch tomorrow afternoon?
Comment 10•23 years ago
But what if initialization of the hash table itself fails? You need to be more careful in your destructor, I'd argue. Try this patch to see if it makes things more stable:
Index: mozilla/netwerk/cache/src/nsCacheEntry.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/src/nsCacheEntry.cpp,v
retrieving revision 1.30.2.1
diff -b -u -2 -r1.30.2.1 nsCacheEntry.cpp
--- nsCacheEntry.cpp 2001/03/27 03:05:36 1.30.2.1
+++ nsCacheEntry.cpp 2001/03/27 08:41:18
@@ -409,5 +409,5 @@
 nsCacheEntryHashTable::nsCacheEntryHashTable()
-    : initialized(0)
+    : initialized(PR_FALSE)
 {
 }
@@ -416,4 +416,5 @@
 nsCacheEntryHashTable::~nsCacheEntryHashTable()
 {
+    if (initialized)
     PL_DHashTableFinish(&table);
 }
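The idea behind that patch, only tearing a table down if it was actually set up, can be sketched in plain C as follows. This is a hedged illustration under stated assumptions: DemoTable and the demo_table_* functions are hypothetical stand-ins and do not reproduce nsCacheEntryHashTable or PL_DHashTableFinish.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper that remembers whether init ever succeeded. */
typedef struct {
    bool initialized;
    int *slots;
    size_t capacity;
} DemoTable;

/* "Constructor": puts the table into a known, uninitialized state. */
void demo_table_construct(DemoTable *t)
{
    t->initialized = false;
    t->slots = NULL;
    t->capacity = 0;
}

/* Initialization can fail, so the flag is set only on success. */
bool demo_table_init(DemoTable *t, size_t capacity)
{
    t->slots = calloc(capacity, sizeof *t->slots);
    if (t->slots == NULL)
        return false;
    t->capacity = capacity;
    t->initialized = true;
    return true;
}

/* "Destructor": returns true only if there was something to tear down. */
bool demo_table_finish(DemoTable *t)
{
    if (!t->initialized)
        return false;           /* init never ran or failed: nothing to do */
    free(t->slots);
    t->slots = NULL;
    t->capacity = 0;
    t->initialized = false;
    return true;
}
```

Calling demo_table_finish on a table that was constructed but never initialized is then a harmless no-op, which is the behaviour the guarded destructor in the patch is aiming for.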
Reporter
Comment 11•23 years ago
beard patch looks good, tried it but solaris/opt still crashes.
Comment 12•23 years ago
The patch is for the wrong hashtable. It's crashing in the nsDiskCacheEntryHashTable. We need those checks everywhere.
Comment 13•23 years ago
I'll take this. mcafee, let me know when you're in, and I'll work with you to track this down.
Assignee: gagan → gordon
Whiteboard: [cache]
Reporter
Comment 14•23 years ago
gordon and I patched the disk cache entry HT, no go. New stack:
#0 0xee37b81c in nsDiskCacheEntry::getCacheEntry ()
#1 0xee3688c8 in nsDiskCacheDevice::updateDiskCacheEntry (this=0x1bd4f0, diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:990
#2 0xee370ecc in UpdateEntryVisitor::VisitEntry (this=0xefffdfc0, diskEntry=0xf9d211d2) at nsDiskCacheDevice.cpp:976
#3 0xee376564 in nsDiskCacheEntryHashTable::VisitEntry ()
#4 0xef497344 in ?? () from /builds/mcafee/cmonkey/mozilla/dist/bin/libxpcom.so
#5 0xee3764fc in nsDiskCacheEntryHashTable::VisitEntries ()
#6 0xee36889c in nsDiskCacheDevice::updateDiskCacheEntries (this=0x1bd4f0) at nsDiskCacheDevice.cpp:984
#7 0xee369cf4 in nsDiskCacheDevice::scanDiskCacheEntries (this=0x1bd4f0, result=0xefffe208) at nsDiskCacheDevice.cpp:1238
#8 0xee36ad98 in nsDiskCacheDevice::evictDiskCacheEntries (this=0x1bd4f0) at nsDiskCacheDevice.cpp:1329
#9 0xee366b74 in nsDiskCacheDevice::~nsDiskCacheDevice (this=0x1bd4f0, __in_chrg=3) at nsDiskCacheDevice.cpp:569
#10 0xee350e0c in nsCacheService::CreateDiskDevice (this=0x1bd380) at nsCacheService.cpp:242
#11 0xee351d1c in nsCacheService::SearchCacheDevices (this=0x1bd380, key=0x1b6b18, policy=0) at nsCacheService.cpp:504
#12 0xee351984 in nsCacheService::ActivateEntry (this=0x1bd380, request=0x1b3ee0, result=0xefffe470) at nsCacheService.cpp:431
#13 0xee3514b4 in nsCacheService::ProcessRequest (this=0x1bd380, request=0x1b3ee0, result=0x0) at nsCacheService.cpp:340
#14 0xee35184c in nsCacheService::OpenCacheEntry (this=0x1bd380, session=0x1bd440, key=0x1286b8 "http://www.mozilla.org/projects/embedding", accessRequested=3, listener=0x1ba124, result=0x0) at nsCacheService.cpp:405
#15 0xee35c314 in nsCacheSession::AsyncOpenCacheEntry () at ../../../dist/include/nsIObserverService.h:36
Comment 15•23 years ago
Reporter
Comment 16•23 years ago
tried last gordon patch, now we die in PREF_Init(). Cache is failing to get the pref service, now the hot potato has been tossed to another part of the code? Pref service bug?
#0 0xee341228 in js_LockRuntime ()
#1 0xee31e498 in js_NewContext ()
#2 0xee314d48 in JS_NewContext ()
#3 0xedfd0f8c in PREF_Init ()
#4 0xedfd5904 in nsPref::StartUp ()
#5 0xedfd55f0 in nsPref::GetInstance ()
#6 0xedfdb168 in CreateNewPref ()
#7 0xef5866d8 in nsGenericFactory::CreateInstance ()
#8 0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#9 0xef5a160c in nsComponentManager::CreateInstance ()
#10 0xef5a2370 in nsServiceManagerImpl::GetService ()
#11 0xef5a29a0 in nsServiceManager::GetService ()
#12 0xef5a1a1c in nsGetServiceByCID::operator() ()
#13 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#14 0xedac5020 in nsCOMPtr<nsIPref>::nsCOMPtr ()
#15 0xedab1534 in nsChromeRegistry::nsChromeRegistry ()
#16 0xedaa9fa4 in nsChromeRegistryConstructor ()
#17 0xef5866d8 in nsGenericFactory::CreateInstance ()
#18 0xef57c710 in nsComponentManagerImpl::CreateInstance ()
#19 0xef5a160c in nsComponentManager::CreateInstance ()
#20 0xef5a2370 in nsServiceManagerImpl::GetService ()
#21 0xef5a29a0 in nsServiceManager::GetService ()
#22 0xef5a1a1c in nsGetServiceByCID::operator() ()
#23 0xef602dd0 in nsCOMPtr_base::assign_from_helper ()
#24 0xedad5d6c in nsCOMPtr<nsIChromeRegistry>::nsCOMPtr ()
#25 0xedacf390 in nsChromeProtocolHandler::NewChannel ()
#26 0xedbc6128 in nsIOService::NewChannelFromURI ()
#27 0xeee2fccc in nsStringBundle::OpenInputStream ()
#28 0xeee2fa9c in nsStringBundle::GetInputStream ()
#29 0xeee2e9d4 in nsStringBundle::InitSyncStream ()
#30 0xeee31194 in nsStringBundleService::getStringBundle ()
#31 0xeee3138c in nsStringBundleService::CreateBundle ()
#32 0x32174 in NS_InitEmbedding ()
#33 0x2ac18 in main ()
Comment 17•23 years ago
we should be handling all these failure cases gracefully, but I think there's another problem here. does the Embed dir on solaris match what's in the Embed dir on a linux box?
Comment 18•23 years ago
Neeti, you're listed as module owner for prefs on mozilla.org. I know that's out of date, but do you know who is current owner? This bug no longer pertains to the cache.
Assignee: gordon → neeti
Component: Networking: Cache → Preferences: Backend
Whiteboard: [cache]
Comment 19•23 years ago
alecf and bnesse are the new owners, I believe.
Assignee
Comment 20•23 years ago
The only thing that I can see which would cause that stack is if there is no JSRuntimeService. This should fix that problem:
Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c 2001/03/20 14:34:54 3.87
+++ prefapi.c 2001/03/29 17:47:29
@@ -284,5 +284,9 @@
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            goto out;
+    }
     if (!gMochaContext)
Comment 22•23 years ago
seems reasonable to me... as long as it's been tested with seamonkey, winEmbed, and gtkEmbed, then sr=alecf
Assignee
Comment 23•23 years ago
Actually, after looking at it again... I believe this is better...
Index: mozilla/modules/libpref/src/prefapi.c
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/prefapi.c,v
retrieving revision 3.87
diff -u -2 -r3.87 prefapi.c
--- prefapi.c 2001/03/20 14:34:54 3.87
+++ prefapi.c 2001/03/29 17:47:29
@@ -284,5 +284,9 @@
     if (!gMochaTaskState)
+    {
         gMochaTaskState = PREF_GetJSRuntime();
+        if (!gMochaTaskState)
+            return PR_FALSE;
+    }
     if (!gMochaContext)
The previous patch will return PR_TRUE due to the initialization of 'ok' when it is declared.
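To make the difference between the two patches concrete, here is a small C sketch of the pitfall described above: a result flag that starts out true, so jumping to the common exit label on failure still reports success. The function names and demo_get_runtime are hypothetical and only mimic the shape of the code in question; this is not the actual prefapi.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fetching a runtime; returns NULL when the
 * runtime service is not available. */
static void *demo_get_runtime(bool available)
{
    static int runtime;
    return available ? (void *)&runtime : NULL;
}

/* Shape of the first patch: bail out through the shared exit label.
 * Because ok was initialized to true, a failure still returns true. */
bool demo_init_with_goto(bool runtime_available)
{
    bool ok = true;
    void *runtime = demo_get_runtime(runtime_available);

    if (!runtime)
        goto out;               /* ok is still true here */

    /* ...further setup that might clear ok would go here... */

out:
    return ok;
}

/* Shape of the second patch: report the failure explicitly. */
bool demo_init_with_return(bool runtime_available)
{
    void *runtime = demo_get_runtime(runtime_available);

    if (!runtime)
        return false;

    /* ...further setup would go here... */
    return true;
}
```

In this sketch, demo_init_with_goto(false) returns true even though no runtime was obtained, while demo_init_with_return(false) returns false, so a caller can tell that initialization did not happen.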
Comment 24•23 years ago
The patch to nsDiskCacheDevice was checked in long ago.
Comment 25•23 years ago
Anyone with a winEmbed and/or gtkEmbed, can you please build & test this so that the fix could be checked in? Otherwise this will miss the 0.9.1 train...
Comment 26•23 years ago
Cc:ing Conrad and blizzard...
Reporter
Comment 27•23 years ago
This patch worksforme on linux, debug and optimized. Since Solaris isn't currently crashing, it will be hard to verify this is actually fixing the crash. I think we should still check this in if we think this is a good change. r=mcafee
Updated•23 years ago
Target Milestone: --- → mozilla0.9.2
Comment 28•23 years ago
the patch helped my solaris7sparc SunForteC5 build, please get approval and check it in [r=timeless].
Keywords: approval
Assignee
Comment 29•23 years ago
Ok, I've already had r's and sr's on this patch, we just didn't check in because there was no verification that it did anything. I have to wipe my current prefapi.c (working on other patches in the same file) and reapply this patch. Then I will check it in.
Assignee
Comment 30•23 years ago
Checked in. Closing as fixed as bug 81436 is now tracking the question of "Why does libxpconnect fail to load"
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED