Closed
Bug 234620
Opened 21 years ago
Closed 20 years ago
Unknown random SEGV/seg fault/core dumps/crashes, only thing on is Mail/IMAP [@ 0x00000001 - nsSupportsArray::ElementAt][@ nsSupportsArray::Clear][@ NSS_CMSArray_Sort][@ nsSupportsArray::Clear][@ nsSupportsArray::DeleteArray]
Categories
(MailNews Core :: Networking: IMAP, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.8beta1
People
(Reporter: jerry.lundstrom, Assigned: darin.moz)
References
Details
(4 keywords, Whiteboard: [not fixed in firefox1.0])
Crash Data
Attachments
(6 files, 3 obsolete files)
2.32 KB, text/plain
2.76 KB, text/plain
2.60 KB, text/plain
6.89 KB, text/plain
1.07 KB, patch (dbaron: review+, Bienvenu: superreview+, asa: approval-aviary+, mkaply: approval1.7.5+)
1.33 KB, patch (dbaron: review+, Bienvenu: superreview+, mkaply: approval1.7.5+, asa: approval1.8a5+)
User-Agent:
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7a) Gecko/20040215

Mozilla randomly crashes for me when I just leave it running. The only thing it is running is IMAP to one mail account, checking mail once a minute. I get a lot of mail and have about 30 filters and 15 IMAP boxes. All mail is on the server; nothing is moved to local disk. I'm running the nightly from 15/2, and this has been going on since 1.6 (1.6 hangs instead of crashing). I'm only using the complete install, no other modules. The only plugin I have is flashplayer.

Reproducible: Always

Steps to Reproduce:
1. Just leave it running

Actual Results: It crashes, core dumps, SEGVs.

Expected Results: Keep running =)

One thing to think about is that my IMAP server is a round-robin to 2-3 backends; maybe that's the problem.

Here is the backtrace from gdb:

(gdb) bt
#0 0x403f9bf1 in kill () from /lib/libc.so.6
#1 0x400e783d in pthread_kill () from /lib/libpthread.so.0
#2 0x400e7b5b in raise () from /lib/libpthread.so.0
#3 0x40e7d408 in NSGetModule () from /home/prox/mozilla/components/libprofile.so
#4 0x400ea905 in __pthread_sighandler () from /lib/libpthread.so.0
#5 <signal handler called>
#6 0x41c657d9 in NSGetModule () from /home/prox/mozilla/components/libxpinstall.so
#7 0x41c658bc in NSGetModule () from /home/prox/mozilla/components/libxpinstall.so
#8 0x40624028 in nsSupportsArray::Clear() () from /home/prox/mozilla/libxpcom.so
#9 0x40623988 in nsSupportsArray::DeleteArray() () from /home/prox/mozilla/libxpcom.so
#10 0x406233fa in nsSupportsArray::~nsSupportsArray() () from /home/prox/mozilla/libxpcom.so
#11 0x4062365c in nsSupportsArray::Release() () from /home/prox/mozilla/libxpcom.so
#12 0x0807665b in nsCOMPtr_base::~nsCOMPtr_base() ()
#13 0x4061e54f in nsObserverList::~nsObserverList() () from /home/prox/mozilla/libxpcom.so
#14 0x4061efe6 in nsObserverService::Create(nsISupports*, nsID const&, void**) () from /home/prox/mozilla/libxpcom.so
#15 0x4061c606 in nsHashtable::Enumerate(int (*)(nsHashKey*, void*, void*), void*) () from /home/prox/mozilla/libxpcom.so
#16 0x4061715f in PL_DHashTableEnumerate () from /home/prox/mozilla/libxpcom.so
#17 0x4061c6ac in nsHashtable::Reset(int (*)(nsHashKey*, void*, void*), void*) () from /home/prox/mozilla/libxpcom.so
#18 0x4061ddcd in nsObjectHashtable::Reset() () from /home/prox/mozilla/libxpcom.so
#19 0x4061dc5d in nsObjectHashtable::~nsObjectHashtable() () from /home/prox/mozilla/libxpcom.so
#20 0x4061ef48 in nsObserverService::~nsObserverService() () from /home/prox/mozilla/libxpcom.so
#21 0x4061ed8c in nsObserverService::Release() () from /home/prox/mozilla/libxpcom.so
#22 0x080766ae in nsCOMPtr_base::assign_with_AddRef(nsISupports*) ()
#23 0x4065514b in nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) () from /home/prox/mozilla/libxpcom.so
#24 0x4061715f in PL_DHashTableEnumerate () from /home/prox/mozilla/libxpcom.so
#25 0x406551d3 in nsComponentManagerImpl::FreeServices() () from /home/prox/mozilla/libxpcom.so
#26 0x4061627d in NS_ShutdownXPCOM () from /home/prox/mozilla/libxpcom.so
#27 0x08077029 in NS_ShutdownXPCOM ()
#28 0x080774d0 in GRE_Shutdown ()
#29 0x0805b7c5 in main ()
Reporter | ||
Comment 1•21 years ago
confirmed on mozilla build id: 2004021608 also.
Comment 2•21 years ago
that stack trace looks like the app is trying to shut down. Is that possible?
Reporter | ||
Comment 3•21 years ago
Hmmm, shutdown, I don't think so; all it does is fetch my mail. I'll pay closer attention to this (checking if the Mail window is up when I close browser windows), but I doubt that's the problem. My current dist is lunar-linux (www.lunar-linux.org); most things are compiled with pentium4, sse/mmx, fpu=both. Maybe the problem is that libc is compiled with pentium4 under gcc 3.2.3, but these kinds of crashes happened on my debian/unstable system also, just not as frequently. I'll start dumping core and see if it crashes at the same place every time.
Reporter | ||
Comment 4•21 years ago
more backtraces:

(gdb) bt
#0 0x403f9bf1 in kill () from /lib/libc.so.6
#1 0x400e783d in pthread_kill () from /lib/libpthread.so.0
#2 0x400e7b5b in raise () from /lib/libpthread.so.0
#3 0x40e7d408 in NSGetModule () from /home/prox/mozilla/components/libprofile.so
#4 0x400ea905 in __pthread_sighandler () from /lib/libpthread.so.0
#5 <signal handler called>
#6 0x00000011 in ?? ()
#7 0x40623aa6 in nsSupportsArray::ElementAt(unsigned) () from /home/prox/mozilla/libxpcom.so
#8 0x406243d8 in nsSupportsArray::GetElementAt(unsigned, nsISupports**) () from /home/prox/mozilla/libxpcom.so
#9 0x4061ec5d in ObserverListEnumerator::GetNext(nsISupports**) () from /home/prox/mozilla/libxpcom.so
#10 0x4061f332 in nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) () from /home/prox/mozilla/libxpcom.so
#11 0x4065f81f in nsEventQueueImpl::NotifyObservers(char const*) () from /home/prox/mozilla/libxpcom.so
#12 0x4065f3cb in nsEventQueueImpl::InitFromPRThread(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#13 0x40660bf9 in nsEventQueueServiceImpl::MakeNewQueue(PRThread*, int, nsIEventQueue**) () from /home/prox/mozilla/libxpcom.so
#14 0x40660c97 in nsEventQueueServiceImpl::CreateEventQueue(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#15 0x40660ad3 in nsEventQueueServiceImpl::CreateMonitoredThreadEventQueue() () from /home/prox/mozilla/libxpcom.so
#16 0x406659cb in nsProxyObject::PostAndWait(nsProxyObjectCallInfo*) () from /home/prox/mozilla/libxpcom.so
#17 0x40665ce7 in nsProxyObject::Post(unsigned, nsXPTMethodInfo*, nsXPTCMiniVariant*, nsIInterfaceInfo*) () from /home/prox/mozilla/libxpcom.so
#18 0x40667bd6 in nsProxyEventObject::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) () from /home/prox/mozilla/libxpcom.so
#19 0x4067a4b7 in XPTC_InvokeByIndex () from /home/prox/mozilla/libxpcom.so
#20 0x417737d6 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#21 0x41777c64 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#22 0x41769fb9 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#23 0x41769694 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#24 0x41768cb0 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#25 0x406617ab in nsThread::Main(void*) () from /home/prox/mozilla/libxpcom.so
#26 0x400c6639 in PR_Select () from mozilla/libnspr4.so
#27 0x400e4d03 in pthread_start_thread () from /lib/libpthread.so.0
#28 0x404b1d97 in clone () from /lib/libc.so.6

Here you can clearly see that it's a SEGV in nsSupportsArray::ElementAt().
Reporter | ||
Comment 5•21 years ago
Another core, looks the same as before. I'm going to update to the 20040217 nightly now.

(gdb) bt
#0 0x403f9bf1 in kill () from /lib/libc.so.6
#1 0x400e783d in pthread_kill () from /lib/libpthread.so.0
#2 0x400e7b5b in raise () from /lib/libpthread.so.0
#3 0x40e7d408 in NSGetModule () from /home/prox/mozilla/components/libprofile.so
#4 0x400ea905 in __pthread_sighandler () from /lib/libpthread.so.0
#5 <signal handler called>
#6 0x00000011 in ?? ()
#7 0x40623aa6 in nsSupportsArray::ElementAt(unsigned) () from /home/prox/mozilla/libxpcom.so
#8 0x406243d8 in nsSupportsArray::GetElementAt(unsigned, nsISupports**) () from /home/prox/mozilla/libxpcom.so
#9 0x4061ec5d in ObserverListEnumerator::GetNext(nsISupports**) () from /home/prox/mozilla/libxpcom.so
#10 0x4061f332 in nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) () from /home/prox/mozilla/libxpcom.so
#11 0x4065f81f in nsEventQueueImpl::NotifyObservers(char const*) () from /home/prox/mozilla/libxpcom.so
#12 0x4065f3cb in nsEventQueueImpl::InitFromPRThread(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#13 0x40660bf9 in nsEventQueueServiceImpl::MakeNewQueue(PRThread*, int, nsIEventQueue**) () from /home/prox/mozilla/libxpcom.so
#14 0x40660c97 in nsEventQueueServiceImpl::CreateEventQueue(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#15 0x40660ad3 in nsEventQueueServiceImpl::CreateMonitoredThreadEventQueue() () from /home/prox/mozilla/libxpcom.so
#16 0x406659cb in nsProxyObject::PostAndWait(nsProxyObjectCallInfo*) () from /home/prox/mozilla/libxpcom.so
#17 0x40665ce7 in nsProxyObject::Post(unsigned, nsXPTMethodInfo*, nsXPTCMiniVariant*, nsIInterfaceInfo*) () from /home/prox/mozilla/libxpcom.so
#18 0x40667bd6 in nsProxyEventObject::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) () from /home/prox/mozilla/libxpcom.so
#19 0x4067a4b7 in XPTC_InvokeByIndex () from /home/prox/mozilla/libxpcom.so
#20 0x419d07d6 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#21 0x419d4c64 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#22 0x419c6fb9 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#23 0x419c6694 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#24 0x419c5cb0 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#25 0x406617ab in nsThread::Main(void*) () from /home/prox/mozilla/libxpcom.so
#26 0x400c6639 in PR_Select () from mozilla/libnspr4.so
#27 0x400e4d03 in pthread_start_thread () from /lib/libpthread.so.0
#28 0x404b1d97 in clone () from /lib/libc.so.6
Comment 6•21 years ago
I guess it would be nice to have symbols for the imap part of the stack trace. Why does xpcom have symbols and not imap? We're supposed to work fine mixing debug and non-debug components, but it always makes me nervous. The stack trace itself points to a problem in the observer service, or a ref-counting problem with the observers.
Components are compiled with only the necessary symbols exported. (See mozilla/build/unix/gnu-ld-scripts/.) In optimized builds, this means they don't have any symbol data other than NSGetModule (or equivalent). This is the way we've distributed builds for years. It's not a mix of debug and non-debug components -- it's just that libraries that are linked against need symbols to link against, but component libraries only need a single symbol as an entry point.
Reporter | ||
Comment 8•21 years ago
more bt's. Also, if someone could build me a dbg nightly, I'll be happy to wait for it to core dump =)

(gdb) bt
#0 0x403f9bf1 in kill () from /lib/libc.so.6
#1 0x400e783d in pthread_kill () from /lib/libpthread.so.0
#2 0x400e7b5b in raise () from /lib/libpthread.so.0
#3 0x40e58408 in NSGetModule () from /home/prox/mozilla/components/libprofile.so
#4 0x400ea905 in __pthread_sighandler () from /lib/libpthread.so.0
#5 <signal handler called>
#6 0x00000011 in ?? ()
#7 0x405feaa6 in nsSupportsArray::ElementAt(unsigned) () from /home/prox/mozilla/libxpcom.so
#8 0x405ff3d8 in nsSupportsArray::GetElementAt(unsigned, nsISupports**) () from /home/prox/mozilla/libxpcom.so
#9 0x405f9c5d in ObserverListEnumerator::GetNext(nsISupports**) () from /home/prox/mozilla/libxpcom.so
#10 0x405fa332 in nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) () from /home/prox/mozilla/libxpcom.so
#11 0x4063a81f in nsEventQueueImpl::NotifyObservers(char const*) () from /home/prox/mozilla/libxpcom.so
#12 0x4063a3cb in nsEventQueueImpl::InitFromPRThread(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#13 0x4063bbf9 in nsEventQueueServiceImpl::MakeNewQueue(PRThread*, int, nsIEventQueue**) () from /home/prox/mozilla/libxpcom.so
#14 0x4063bc97 in nsEventQueueServiceImpl::CreateEventQueue(PRThread*, int) () from /home/prox/mozilla/libxpcom.so
#15 0x4063bad3 in nsEventQueueServiceImpl::CreateMonitoredThreadEventQueue() () from /home/prox/mozilla/libxpcom.so
#16 0x406409cb in nsProxyObject::PostAndWait(nsProxyObjectCallInfo*) () from /home/prox/mozilla/libxpcom.so
#17 0x40640ce7 in nsProxyObject::Post(unsigned, nsXPTMethodInfo*, nsXPTCMiniVariant*, nsIInterfaceInfo*) () from /home/prox/mozilla/libxpcom.so
#18 0x40642bd6 in nsProxyEventObject::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) () from /home/prox/mozilla/libxpcom.so
#19 0x406554b7 in XPTC_InvokeByIndex () from /home/prox/mozilla/libxpcom.so
#20 0x418997d6 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#21 0x4189dc64 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#22 0x4188ffb9 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#23 0x4188f694 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#24 0x4188ecb0 in NSGetModule () from /home/prox/mozilla/components/libmsgimap.so
#25 0x4063c7ab in nsThread::Main(void*) () from /home/prox/mozilla/libxpcom.so
#26 0x400c6639 in PR_Select () from mozilla/libnspr4.so
#27 0x400e4d03 in pthread_start_thread () from /lib/libpthread.so.0
#28 0x404b1d97 in clone () from /lib/libc.so.6
The stack traces have a slightly higher chance of being useful if you also include the output of /proc/<pid>/maps, where <pid> is the process ID of the process that crashed. (Slightly higher means that it becomes possible to extract the necessary information given:
* the stack
* the exact nightly you were using
* the maps file
but it's still quite difficult.)
Also, if you attach further stacks, it's probably better to attach them (see the "Create an attachment" link above) so that the bug stays more readable.
Reporter | ||
Comment 10•21 years ago
How can I copy the map file? Doesn't it disappear after the process crashes? My nightly id right now is 2004021708.
You do need to get the map file before the process exits, but it doesn't need to be immediately before -- anytime after it's fully started up should be fine. (I was thinking you had the crash in gdb rather than debugging a core file, in which case the map file would still have been there.) Also, do you have a dual CPU machine?
(Note that any of those three pieces of information other than the stack isn't useful without having all of them for the same crash. Also, don't worry too much about getting them, because it's not all that likely they'll lead to anything useful.)
It's worth noting that the observer being notified here is probably the appshell service, but that if the problem is a refcounting error it would be a refcounting error on the weak reference object and not the appshell service itself. (Both the appshell service code and the observer code do a bunch of rather nasty things, but nothing obvious that would crash.)
Reporter | ||
Comment 14•21 years ago
Yes, it's a dual P4 2.4 GHz with HyperThreading on, so it says there are 4 CPUs. No, I don't run mozilla via gdb; I just gdb the core. I don't have enough time to run it via gdb.
Reporter | ||
Comment 15•20 years ago
bump, what has happened? Did anyone find anything? It still crashes for me (build id: 2004030109).
Reporter | ||
Comment 16•20 years ago
Is there a debug version of the nightly somewhere that I can use to get a better dump? Or can I find the build schema for the nightly somewhere?
Reporter | ||
Comment 17•20 years ago
this is the builtin stacktrace from nightly built with debug
Reporter | ||
Comment 18•20 years ago
Please look at the attached stack trace. I can replicate this if you want other types of information, like the map file, etc.
Comment 19•20 years ago
thx, that stack trace is much more useful. I suspect it's a race condition exacerbated by your cpu setup. I also suspect that you're encountering a lot of different problems, from all your stack traces. I see a race condition that could result in m_transport getting cleared between the time that it's checked for null and the time it's used. I can try to fix that...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 20•20 years ago
This is the most common stack trace; it breaks at nsSupportsArray::ElementAt(). I have not seen any other array breaks.
Reporter | ||
Comment 21•20 years ago
btw, if you want me to test a patch, just send it to me and I will, but include the build configuration (.mozconfig).
Comment 22•20 years ago
this is just a possibility - but we should be protecting the clearing of m_transport with a monitor in case the code that's checking the non-nullness of the m_transport has it cleared out from under it. I'm not sure why you need a .mozconfig from me - the one you have should be fine - do you have a tree that builds? If so, you can just apply the patch and rebuild. But as I said before, I think you have a lot of different problems, that we'll have to try to knock off one at a time. I'll look at the stack you just posted, but I have a fear it's not in the imap code...
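The check-then-use race described above can be sketched in portable C++. This is only an illustration under stated assumptions: `Transport`, `Connection`, `ClearTransport` and `IsTransportAlive` are hypothetical names, and `std::mutex` with `std::shared_ptr` stands in for the monitor and XPCOM reference counting that the real IMAP code would use.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the IMAP transport object; not the real Mozilla type.
struct Transport {
    bool IsAlive() const { return true; }
};

class Connection {
public:
    Connection() : mTransport(std::make_shared<Transport>()) {}

    // Called from one thread when the connection is torn down.
    void ClearTransport() {
        std::lock_guard<std::mutex> guard(mLock);
        mTransport.reset();
    }

    // Called from another thread. The null check and the use both operate on
    // a local strong reference copied under the lock, so a concurrent
    // ClearTransport() cannot pull the object out from under the caller.
    bool IsTransportAlive() {
        std::shared_ptr<Transport> transport;
        {
            std::lock_guard<std::mutex> guard(mLock);
            transport = mTransport;
        }
        return transport && transport->IsAlive();
    }

private:
    std::mutex mLock;
    std::shared_ptr<Transport> mTransport;
};
```

The lock is held only long enough to copy the pointer, so the potentially slow use of the transport happens outside the critical section.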
Comment 23•20 years ago
Re the event queue stuff, I stepped through the code a bit. Is it possible that the app shell event queue stuff isn't thread-safe? The stack in http://bugzilla.mozilla.org/attachment.cgi?id=142984&action=view is from the imap thread. I wonder what happens when the observers array gets changed while we're iterating over it. The observer service uses a lock when things are added or removed, but I don't see any locking when we're iterating over the list of observers via an enumerator...Also, does the linux code use native event queues? It's not clear to me if the native event queue code path modifies the array we're enumerating over or not, but if it did, that could cause more possibilities of race conditions...
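One common way to avoid iterating over a list that another thread may be changing is to copy the list under the same lock used for add and remove, then notify from the copy. The sketch below is an assumption-laden illustration, not the observer service's actual code: `ObserverList`, `Add`, `Remove` and `Notify` are hypothetical names, and `std::function` stands in for XPCOM observers.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class ObserverList {
public:
    using Observer = std::function<void(const std::string&)>;

    // Registers an observer and returns a handle that can be passed to Remove().
    int Add(Observer obs) {
        std::lock_guard<std::mutex> guard(mLock);
        mObservers.push_back(Entry{mNextId, std::move(obs)});
        return mNextId++;
    }

    void Remove(int id) {
        std::lock_guard<std::mutex> guard(mLock);
        mObservers.erase(
            std::remove_if(mObservers.begin(), mObservers.end(),
                           [id](const Entry& e) { return e.id == id; }),
            mObservers.end());
    }

    // Copies the list under the lock, then calls observers without holding it.
    // Observers may therefore add or remove entries during notification; an
    // observer removed mid-notification may still be called once from the copy.
    void Notify(const std::string& topic) {
        std::vector<Entry> snapshot;
        {
            std::lock_guard<std::mutex> guard(mLock);
            snapshot = mObservers;
        }
        for (const Entry& e : snapshot) {
            e.callback(topic);
        }
    }

private:
    struct Entry {
        int id;
        Observer callback;
    };

    std::mutex mLock;
    std::vector<Entry> mObservers;
    int mNextId = 0;
};
```

The trade-off is one copy per notification in exchange for never exposing a half-modified array to the iterating thread.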
Comment 24•20 years ago
Jerry, what's the date of the most recent build you've been running? Darin says that Brendan fixed some crash in nsSupportsArray, though I doubt that's involved here, since I believe you've crashed before and after his checkin of 02/25/04
Reporter | ||
Comment 25•20 years ago
I don't know if this has anything to do with the race, but I just got this:
###!!! ASSERTION: nsTDependentString must wrap only null-terminated strings: 'mData[mLength] == 0', file ../../../dist/include/string/nsTDependentString.h, line 67
Break: at file ../../../dist/include/string/nsTDependentString.h, line 67
Other than that, I've been running the latest cvs with the patch you added for a few hours now.
Reporter | ||
Comment 26•20 years ago
stack with the patch :/ seems like it's still racing.
Comment 27•20 years ago
As I said, I think you're running into several different race conditions. My fix has nothing to do with the event queue race conditions, but rather an internal race condition in the imap code (the stack trace with CanHandleUrl in it, - http://bugzilla.mozilla.org/attachment.cgi?id=142864&action=view ). The string assertion is probably just because of some new string changes and is most likely not related. I could take some stabs at using locks in the event queue code, for you to try, but it would be just a stab in the dark. It's also possible that it's a ref-counting problem, as dbaron points out, but the fact that you've got a 4 cpu system makes me suspect a race condition (though race conditions can expose ref-counting problems too).
Reporter | ||
Comment 28•20 years ago
I'm wondering if it could be HyperThreading too: if linux treats it as just 2 more cpus when it really isn't, maybe that has to do with the instability. The only other thing I notice about my machine is that it can sometimes lock up for a sec or two if it's doing MASSIVE memory swapping. I will reboot and disable HyperThreading and see if that's the problem.
Reporter | ||
Comment 29•20 years ago
Turning off HT makes it more stable, but it still races. It has crashed two times now since yesterday, and both are at the place shown in http://bugzilla.mozilla.org/attachment.cgi?id=142984&action=view .
Reporter | ||
Comment 30•20 years ago
This stack is of 3 processes that crashed at the same time. Before the segv you will see 1 2 3 4; they are printfs in nsSupportsArray::ElementAt:

NS_IMETHODIMP_(nsISupports*)
nsSupportsArray::ElementAt(PRUint32 aIndex)
{
  printf("1 %lu %lu\n", aIndex, mCount);
  if (aIndex < mCount) {
    printf("2 %p %p\n", mArray, mAutoArray);
    nsISupports* element = mArray[aIndex];
    printf("3 %p\n", element);
    NS_IF_ADDREF(element);
    printf("4\n");
    return element;
    printf("5\n");
  }
  printf("6\n");
  return 0;
}

As you can see, it clearly crashes between 3 and 4, doing the ref count. And as you can see from my other gdb backtrace, the value is 0x00000011. So something sets the element to 0x11. I am currently running HT again, using gcc 3.3.3, and all things (except mozilla) are optimized with pentium4, mmx/sse/sse2, fpu=x387/sse -O2.
Comment 31•20 years ago
I can't see that someone's set it to 11 from the stack trace - am I missing something? It definitely seems that multiple threads are accessing the array, though that's not necessarily a problem (though it's not protected by a monitor, so if someone's altering the queue at the same time, maybe bad things could happen). You might try adding printfs in the code that removes elements from the nsSupportsArray...
Comment 32•20 years ago
Jerry: if you think memory is getting overwritten, one of the best ways to track that down is valgrind: http://valgrind.kde.org/ It handles threading, but I'm not sure how well.
Keywords: crash
Comment 33•20 years ago
I believe this bug still exists at least in mozilla 1.7. A customer of mine reported a crash and had the same stack trace as comment #4
Comment 34•20 years ago
Following is my investigation based on the core file I got. The crash also happened on a 2 AMD CPU machine running solaris. HIH

The crash happened at nsSupportsArray::ElementAt(PRUint32 aIndex), which is:

NS_IMETHODIMP_(nsISupports*)
nsSupportsArray::ElementAt(PRUint32 aIndex)
{
  if (aIndex < mCount) {
    nsISupports* element = mArray[aIndex];
    NS_IF_ADDREF(element); //return expr ? expr->AddRef() : 0;
    return element;
  }
  return 0;
}

In the core file, besides the sighandler, the top of the call stack is:

libxpcom.so`__1cPnsSupportsArrayJElementAt6MI_pnLnsISupports__+0x27(81e6b78, 0)
0xcd250831(81e6b78, 0, cb21f7bc)

Checking the assembly code:

: pushl %ebp
+1: movl %esp,%ebp
+3: pushl %ebx
+4: call +0x5 <libxpcom.so`__1cPnsSupportsArrayJElementAt6MI_pnLnsISupports__+9>
+9: popl %ebx
+0xa: addl $0x85b7f,%ebx
+0x10: movl 0xc(%ebp),%ecx
+0x13: movl 0x8(%ebp),%eax
+0x16: cmpl 0x10(%eax),%ecx
+0x19: jae +0x19 <libxpcom.so`__1cPnsSupportsArrayJElementAt6MI_pnLnsISupports__+0x32>
+0x1b: movl 0x8(%eax),%eax
+0x1e: movl (%eax,%ecx,4),%ebx
+0x21: testl %ebx,%ebx
+0x23: je +0x11 <libxpcom.so`__1cPnsSupportsArrayJElementAt6MI_pnLnsISupports__+0x34>
+0x25: movl (%ebx),%eax
+0x27: movl 0xc(%eax),%eax
+0x2a: pushl %ebx
+0x2b: call *%eax
+0x2d: addl $0x4,%esp
+0x30: jmp +0x4 <libxpcom.so`__1cPnsSupportsArrayJElementAt6MI_pnLnsISupports__+0x34>
+0x32: xorl %ebx,%ebx
+0x34: movl %ebx,%eax
+0x36: popl %ebx
+0x37: movl %ebp,%esp
+0x39: popl %ebp
+0x3a: ret

We can see that at +0x25, where %ebx already holds the "element", %eax gets the vtable of the object. Checking the registers and memory, we get:

$r
%cs = 0x0017 %eax = 0x00000000
%ds = 0x001f %ebx = 0xceba8000
%ss = 0x001f %ecx = 0xcb21f408
%es = 0x001f %edx = 0xd362fa00
%fs = 0x0000 %esi = 0x0000000b
%gs = 0x012f %edi = 0xcb21f480

0xceba8000/X
0xceba8000: c8b18

So %eax is supposed to be c8b18. However, we found %eax = 0x00000000, and that caused the crash.
Comment 35•20 years ago
I found that the cause of http://bugzilla.mozilla.org/attachment.cgi?id=143388&action=view might be that ObserverListEnumerator is not thread safe. It may need to share the lock with the nsObserverList that the enumerator is obtained from.
Reporter | ||
Comment 36•20 years ago
Can this be verified to exist in thunderbird also? I'm running thunderbird now; it's a bit more stable, but it still crashes sometimes.
Comment 37•20 years ago
Jerry, can you try these (assuming you are using bash)?
1. export NSPR_LOG_MODULES=ObserverService:5
   export NSPR_LOG_FILE=nspr.log
2. run mozilla mail as you usually do until it crashes
3. post the file nspr.log here
Thanks
Comment 38•20 years ago
I think the root cause may be in nsObserverService::EnumerateObservers(). I found that two threads access this method.

One thread's call stack is:
nsWeakReference::AddRef()
nsSupportsArray::ElementAt()
nsSupportsArray::GetElementAt()
ObserverListEnumerator::GetNext()
ObserverService::NotifyObservers()
nsEventQueueImpl::NotifyObservers()
nsEventQueueImpl::~nsEventQueueImpl()
nsEventQueueServiceImpl::PopThreadEventQueue()
...

The other thread's call stack is:
nsWeakReference::AddRef()
nsSupportsArray::ElementAt()
nsSupportsArray::GetElementAt()
ObserverListEnumerator::GetNext()
ObserverService::NotifyObservers()
nsEventQueueImpl::NotifyObservers()
nsEventQueueImpl::InitFromPRThread()
nsEventQueueServiceImpl::MakeNewQueue()
nsEventQueueServiceImpl::CreateEventQueue()
nsProxyObject::PostAndWait()
...
Comment 39•20 years ago
Comment 40•20 years ago
Comment on attachment 164365 (add a monitor)
Can you give r? Thanks
Attachment #164365 -
Flags: review?(bienvenu)
Reporter | ||
Comment 41•20 years ago
Hi, sorry for the delay. As of now I'm running with the suggested NSPR_* env variables, but I'm not running mozilla any longer. I'm running thunderbird 0.7.3, and I don't have the opportunity to run mozilla because it would interfere with my work. Although thunderbird is more stable, it too crashes from time to time.
Comment 42•20 years ago
Darin, biesi, this is the same issue as I uncovered in bug 266873 - the global observer events for nsIEventQueueCreated and nsIEventQueueDestroyed are being fired on multiple threads: I presume the appshellservice doesn't even want those notifications for non-main-thread event queues. In this case things are being compounded by the weak reference, which appears to be racing to a dual-release or something like that.
Comment 43•20 years ago
Comment on attachment 164365 (add a monitor)
No, I'm not a module owner - dougt or darin would be your best bets...
Attachment #164365 -
Flags: review?(bienvenu) → review?(darin)
Comment 44•20 years ago
Jerry, this crash would be pretty much just as likely to happen in Thunderbird. And when it's fixed in Mozilla, it will be fixed in thunderbird at the same time...
Comment 45•20 years ago
Comment on attachment 164365 (add a monitor)
Don't use a monitor where a lock will do. Do use a lock, or if possible, atomic instructions in AddRef and Release, which is what NS_IMPL_THREADSAFE_ISUPPORTS will give you. Looks like nsObserverList is thread-safe but ObserverListEnumerator is not, which is a bug too. It's not clear to me that there's a double-release bug too, but let's fix the above two bugs and see what we can see. This would be good to get for thunderbird 1.0.
/be
Attachment #164365 -
Flags: superreview-
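As a rough illustration of the "atomic instructions in AddRef and Release" suggestion, the following sketch uses `std::atomic` from standard C++. It is not the expansion of NS_IMPL_THREADSAFE_ISUPPORTS; the class name `RefCounted` and the `destroyedFlag` parameter are hypothetical and exist only so the example can be checked.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; the count is updated atomically so
// concurrent AddRef()/Release() calls from different threads cannot lose an
// increment or free the object twice.
class RefCounted {
public:
    explicit RefCounted(bool* destroyedFlag)
        : mRefCnt(0), mDestroyed(destroyedFlag) {}

    unsigned AddRef() {
        // fetch_add returns the previous value, so add 1 for the new count.
        return mRefCnt.fetch_add(1, std::memory_order_relaxed) + 1;
    }

    unsigned Release() {
        // acq_rel ordering makes all prior writes to the object visible to
        // whichever thread ends up running the destructor.
        unsigned count = mRefCnt.fetch_sub(1, std::memory_order_acq_rel) - 1;
        if (count == 0) {
            delete this;
        }
        return count;
    }

private:
    // Private destructor: the object can only be destroyed through Release().
    ~RefCounted() { *mDestroyed = true; }

    std::atomic<unsigned> mRefCnt;
    bool* mDestroyed;
};
```

An atomic count protects only the count itself; a container that hands out raw pointers, as the array in this bug does, still needs its own synchronization.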
Updated•20 years ago
Flags: blocking-aviary1.0?
Assignee | ||
Comment 46•20 years ago
Here's a better patch. It makes no sense to invoke the observer service from a background thread. The observers don't expect to be called on the background thread, and there is no contract that requires them to be threadsafe. Moreover, we don't make any effort to proxy notifications from a background thread over to the "right" thread. Lastly, the only consumer of this particular notification expects to be called on the main thread and definitely has no interest in non-native event queues such as the ones created by IMAP.
Assignee: bienvenu → darin
Attachment #143299 -
Attachment is obsolete: true
Attachment #164365 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
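The idea of skipping observer notifications on background threads can be sketched as a thread-identity guard. This is not the contents of the actual patch to nsEventQueue.cpp; `MainThreadNotifier`, `AddObserver` and `NotifyIfMainThread` are hypothetical names, and `std::thread::id` stands in for whatever thread check the real code performs.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

class MainThreadNotifier {
public:
    // The id of the thread that is allowed to notify is fixed at construction.
    explicit MainThreadNotifier(std::thread::id mainThread)
        : mMainThread(mainThread) {}

    void AddObserver(std::function<void()> obs) {
        mObservers.push_back(std::move(obs));
    }

    // Returns true if observers were called, or false if the call was skipped
    // because the calling thread is not the designated main thread.
    bool NotifyIfMainThread() {
        if (std::this_thread::get_id() != mMainThread) {
            return false;
        }
        for (auto& obs : mObservers) {
            obs();
        }
        return true;
    }

private:
    const std::thread::id mMainThread;
    std::vector<std::function<void()>> mObservers;
};
```

Because observers are only ever touched from one thread in this scheme, the observer list itself needs no locking.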
Assignee | ||
Updated•20 years ago
Attachment #164484 -
Flags: superreview?(bienvenu)
Attachment #164484 -
Flags: review?(bsmedberg)
Assignee | ||
Updated•20 years ago
Target Milestone: --- → mozilla1.8beta
Updated•20 years ago
Attachment #164484 -
Flags: superreview?(bienvenu) → superreview+
Comment 47•20 years ago
erm, all observers must be able to live on the main thread? what if my observer doesn't want to?
Assignee | ||
Comment 48•20 years ago
> erm, all observers must be able to live on the main thread? what if my observer
> doesn't want to?
timeless: i don't know... you may be SOL, or perhaps the observer service will
work properly if your observers and the guy calling NotifyObservers all live on
the same thread. clearly, there is no code to support notifying observers from
a background thread and having those observers execute on the main thread (or
whatever appropriate thread).
in the long run, the observer service should either build proxies or partition
the observers by thread such that any notifications for topic "foo" on thread 1
will only affect observers registered for topic "foo" on thread 1.
note: my patch only affects nsEventQueue.cpp... it leaves nsObserverService.cpp
completely untouched.
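The "partition the observers by thread" idea from the comment above might look roughly like the sketch below, where each registration is keyed by topic and owning thread. All names here (`PerThreadObserverService`, `AddObserver`, `NotifyObservers`) are hypothetical, and this design was only proposed in the discussion, not implemented by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class PerThreadObserverService {
public:
    using Observer = std::function<void()>;
    using Key = std::pair<std::string, std::thread::id>;

    // Registers an observer for `topic` that belongs to thread `owner`.
    void AddObserver(const std::string& topic, std::thread::id owner,
                     Observer obs) {
        std::lock_guard<std::mutex> guard(mLock);
        mObservers[Key(topic, owner)].push_back(std::move(obs));
    }

    // Calls only the observers registered for `topic` on thread `caller`
    // and returns how many were called.
    std::size_t NotifyObservers(const std::string& topic,
                                std::thread::id caller) {
        std::vector<Observer> snapshot;
        {
            std::lock_guard<std::mutex> guard(mLock);
            auto it = mObservers.find(Key(topic, caller));
            if (it != mObservers.end()) {
                snapshot = it->second;
            }
        }
        for (auto& obs : snapshot) {
            obs();
        }
        return snapshot.size();
    }

private:
    std::mutex mLock;
    std::map<Key, std::vector<Observer>> mObservers;
};
```

In normal use a caller would pass `std::this_thread::get_id()` for both `owner` and `caller`; the ids are explicit parameters here only to keep the sketch easy to exercise.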
Comment 49•20 years ago
I think we should take darin's minimal patch for the branches, and leave this bug open for a bigger trunk patch that removes bogus threadsafe-isupports wallpaper in observer-service and -list land, instead asserting or testing is-main-thread and enforcing single-threadedness. Timeless: do you have any real requirements, or were you just wondering whether the o.s. might not be MT? It's reasonable to want it that way, but we need a new design and interface contracts. /be
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Comment 50•20 years ago
*** Bug 245820 has been marked as a duplicate of this bug. ***
Attachment #164484 -
Flags: review?(bsmedberg) → review+
Assignee | ||
Updated•20 years ago
Attachment #164365 -
Flags: review?(darin)
Assignee | ||
Comment 51•20 years ago
fixed-on-trunk
brendan: i'd rather file a new bug for the enhancements to observer service since this bug has the crash keyword :)
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 52•20 years ago
i'm pretty sure i have real requirements. we have js components which want to (and do) live on other threads, they want to be able to observe things, but they don't want to be dragged across threads. keep in mind that the "proxy" object offered by xpcom will *drag* the object across threads, resulting in threadsafety errors. this isn't urgent, and having working imap is more important to me as an end user, but the fix does rub me the wrong way and is quite likely to hose me eventually. as long as someone files a bug about that, i'll be ok, i suppose.
Assignee | ||
Comment 53•20 years ago
Comment on attachment 164484 (v2 patch)
This is good for the 1.7 branch as well as the aviary 1.0 branch. It's a very safe fix. Should only apply to IMAP since that's our only consumer of nsEventQueue on background threads.
Attachment #164484 -
Flags: approval1.7.x?
Attachment #164484 -
Flags: approval-aviary?
Comment 54•20 years ago
i just looked at the patch (after reading darin's note). am i understanding that as saying that we're changing the contract for eventqueue push/pop? that's very unsettling.
Assignee | ||
Comment 55•20 years ago
timeless: the observer events are a private backdoor mechanism used to enable native event queues. background threads don't need to use native event queues. UI in mozilla only runs from the main event queue. nothing changes for event queues managed on the UI thread.
Comment 56•20 years ago
|
||
I'd like to plus this for the aviary 1.0 branch. However, I understand if Ben would rather wait and have us check this in after Firefox 1.0 is out the door to minimize risk, since the problem only affects Thunderbird.
Comment 57•20 years ago
|
||
Comment on attachment 164484 [details] [diff] [review]
v2 patch

a=mkaply for 1.7
Attachment #164484 -
Flags: approval1.7.x? → approval1.7.x+
Comment 58•20 years ago
|
||
Comment on attachment 164484 [details] [diff] [review]
v2 patch

a=asa for aviary checkin but it would be nice if we could wait until after firefox 1.0 ships.
Attachment #164484 -
Flags: approval-aviary? → approval-aviary+
Comment 59•20 years ago
|
||
yeah I already told Darin we'd wait until after firefox 1.0 is out the door...
Comment 60•20 years ago
|
||
*** Bug 264935 has been marked as a duplicate of this bug. ***
Comment 61•20 years ago
|
||
*** Bug 268313 has been marked as a duplicate of this bug. ***
Comment 62•20 years ago
|
||
Darin, you can go ahead and check this into the aviary 1.0 branch now. I can do it for you if you want, too. Thanks again.
Comment 64•20 years ago
|
||
Hey Darin, I think this patch may have introduced a crash on linux builds that seems to affect Firefox, Thunderbird and mozilla. See crash reports in:
https://bugzilla.mozilla.org/show_bug.cgi?id=269076
https://bugzilla.mozilla.org/show_bug.cgi?id=269585
https://bugzilla.mozilla.org/show_bug.cgi?id=268402
https://bugzilla.mozilla.org/show_bug.cgi?id=270064
They all seem to die in event_process_queue. The crashes show up on both branch and trunk, and popped up around the time this fix went into the branch and trunk.
Comment 65•20 years ago
|
||
(In reply to comment #64)
> I think this patch may have introduced a crash on linux builds that seems to
> effect Firefox, Thunderbird and mozilla.

this is being tracked in bug 269585, which is a topcrasher (and also affects aviary1.0-tbird bits).
Comment 66•20 years ago
|
||
note to self: I temporarily backed this out of the aviary branch until we fix Bug #269585 (sounds like Darin is getting close)
Keywords: fixed-aviary1.0
Assignee | ||
Comment 67•20 years ago
|
||
It's as if nsIThread::IsMainThread is lying to us :(
Assignee | ||
Comment 68•20 years ago
|
||
alternate patch. this version bypasses NotifyObservers when the event queue is not native. that should solve this bug, and should hopefully avoid the crashes in event_processor_callback that seem to have resulted from the v2 patch.
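The idea described above (skip the observer notifications whenever the event queue is not native) can be illustrated with a minimal, self-contained sketch. This is not the actual nsEventQueue.cpp patch: `ObserverLog`, `SketchEventQueue`, and the topic strings are hypothetical names invented for illustration, and the real observer service API is not used here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an observer service. It only records the
// topics it is asked to broadcast; it is NOT the real nsIObserverService.
struct ObserverLog {
    std::vector<std::string> topics;
    void Notify(const std::string& topic) { topics.push_back(topic); }
};

// Hypothetical event queue. `native` marks a queue that is tied to the
// platform (UI) event loop; background-thread queues would pass false.
class SketchEventQueue {
public:
    SketchEventQueue(bool native, ObserverLog& log)
        : native_(native), log_(log) {}

    // Topic names below are made up for this sketch.
    void Activate() { NotifyIfNative("queue-activated"); }
    void Destroy()  { NotifyIfNative("queue-destroyed"); }

private:
    void NotifyIfNative(const std::string& topic) {
        // Non-native queue: bypass the observer service entirely, so a
        // background thread never touches the shared observer lists.
        if (!native_) {
            return;
        }
        log_.Notify(topic);
    }

    bool native_;
    ObserverLog& log_;
};
```

With this shape, a queue created with `native == false` (the background IMAP case in this sketch) never reaches the shared observer state, while a native queue still notifies as before.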
Assignee | ||
Comment 69•20 years ago
|
||
Attachment #166375 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #166379 -
Flags: superreview?(bienvenu)
Attachment #166379 -
Flags: review?(dbaron)
Attachment #166379 -
Flags: review?(dbaron) → review+
Updated•20 years ago
|
Attachment #166379 -
Flags: superreview?(bienvenu) → superreview+
Assignee | ||
Comment 70•20 years ago
|
||
Comment on attachment 166379 [details] [diff] [review]
v3.1 patch - same thing, but with an assertion about being on the main thread

I would like to try out this fix on the trunk. If all goes well, it should fix the topcrasher, bug 269585 (which is blocking 1.8a5)
Attachment #166379 -
Flags: approval1.8a5?
Updated•20 years ago
|
Attachment #166379 -
Flags: review+ → review?(dbaron)
Attachment #166379 -
Flags: review?(dbaron) → review+
Comment 71•20 years ago
|
||
Comment on attachment 166379 [details] [diff] [review]
v3.1 patch - same thing, but with an assertion about being on the main thread

a=asa for 1.8a5 checkin.
Attachment #166379 -
Flags: approval1.8a5? → approval1.8a5+
Assignee | ||
Comment 72•20 years ago
|
||
v3.1 patch fixed-on-trunk:

Checking in nsEventQueue.cpp;
/cvsroot/mozilla/xpcom/threads/nsEventQueue.cpp,v  <--  nsEventQueue.cpp
new revision: 3.43; previous revision: 3.42
done
Comment 73•20 years ago
|
||
Someone going to check this into aviary then?
Updated•20 years ago
|
Product: MailNews → Core
Comment 74•20 years ago
|
||
I just checked the alternate fix into the aviary 1.0 branch since talkback shows it fixed the crash regression.
Keywords: fixed-aviary1.0
Comment 75•20 years ago
|
||
What about the 1.7 branch?
Updated•20 years ago
|
Attachment #166379 -
Flags: approval1.7.x?
Comment 76•20 years ago
|
||
Adding topcrash info from duped bug 264935 for tracking.
Keywords: topcrash
Summary: Unknown random SEGV/seg fault/core dumps/crashes, only thing on is Mail/IMAP → Unknown random SEGV/seg fault/core dumps/crashes, only thing on is Mail/IMAP [@ 0x00000001 - nsSupportsArray::ElementAt][@ nsSupportsArray::Clear][@ NSS_CMSArray_Sort][@ nsSupportsArray::Clear][@ nsSupportsArray::DeleteArray]
Comment 77•20 years ago
|
||
Comment on attachment 166379 [details] [diff] [review]
v3.1 patch - same thing, but with an assertion about being on the main thread

a=mkaply
Attachment #166379 -
Flags: approval1.7.x? → approval1.7.x+
Assignee | ||
Comment 78•20 years ago
|
||
v3.1 patch fixed1.7.x
Comment 79•20 years ago
|
||
(In reply to comment #74)
> I just checked the alternate fix into the aviary 1.0 branch since talkback shows
> it fixed the crash regression.

thunderbird built on 11/23 has been running since 11/23, where previously it was crashing every couple of hours. I'm immensely happy. I think thunderbird 1.0 should be released now :) Thanks all!
Comment 80•20 years ago
|
||
Since my originally reported bug (bug 268313) was closed as a duplicate of this one, I'm noting here that there are still some crashes when closing thunderbird (version 0.9+ (20041129)): TB2275925H, TB2266547Q
Updated•16 years ago
|
Product: Core → MailNews Core
Updated•13 years ago
|
Crash Signature: [@ 0x00000001 - nsSupportsArray::ElementAt]
[@ nsSupportsArray::Clear]
[@ NSS_CMSArray_Sort]
[@ nsSupportsArray::Clear]
[@ nsSupportsArray::DeleteArray]