Closed Bug 565065 Opened 14 years ago Closed 13 years ago

Thunderbird stuck at shutdown, infinite poll gettimeofday read sequence in strace

Categories

(MailNews Core :: Backend, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 5.0b1

People

(Reporter: protz, Assigned: Bienvenu)

References

Details

(Keywords: hang, qawanted, stackwanted)

Attachments

(4 files)

I haven't found a reliable way to trigger this. Basically, when I shutdown Thunderbird, it goes 100%cpu and doesn't respond anymore, and I have to kill it.

The last time this happened, I managed to strace the running process. Here's what I got:

Process 4156 attached - interrupt to quit
read(3, 0xb625a058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 1 ([{fd=18, revents=POLLIN}])
read(18, "\372", 1)                     = 1
gettimeofday({1273581961, 969398}, NULL) = 0
read(3, 0xb625a058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 0 (Timeout)
gettimeofday({1273581961, 972734}, NULL) = 0
read(3, 0xb625a058, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 0 (Timeout)

I filled 5mb of logs with the same sequence repeated over and over again, until I got bored and killed Thunderbird.

There's a few reports of similar issues but I'm confused by the insane number of calls to gettimeofday, and I don't think I can reliably consider this as a duplicate of other report issues involving 100% cpu at shutdown, so I'm reporting here in case we're missing something important.

I'm using the official b2 build FWIW.
strace output is not super useful unless you also happened to do an "ls -l /proc/4156/fd" and logged it.  Most useful would be stack traces via gdb or 'perf' although I must concede I'm not clear on where the debug symbols for official linux builds end up... one would ideally have that already installed and available so those tools could use them.  If there's a way to kill it to get breakpad to report it, that might also work...
protz, adding critical + hang. you can add hang keyword when creating the bug if you do "Show Advanced Fields" on the bug entry form.
Severity: normal → critical
Keywords: hang
I just managed to reproduce the bug! I also managed to kill the process using a SIGSEGV. That triggered breakpad and I submitted a crash report.

    bp-e1da6913-ec18-49cf-a591-b180f2100522	05/22/2010	12:08 PM

Plus, I also did ls -l and a strace -p. The files are here.

http://jonathan.protzenko.free.fr/bug565065

And the crash report is there.

http://crash-stats.mozilla.com/report/index/e1da6913-ec18-49cf-a591-b180f2100522

It might be an extension, but it's hard to tell.
bienvenu, you'll probably want to look at this crash report.  It looks like nsMsgCompressIStream has gone off the rails and just spends its entire life being spoon fed the byte 0xfa which presumably causes it/zlib to think there is still more work to do.  Unfortunately, it's not immediately obvious (to me) from the crash who keeps posting events to the main thread that keeps it alive, but I would assume it's related.
protz, are you using the fastmail imap server, or an other imap server that supports compress? You can try toggling mail.server.default.use_compress_deflate to see if it makes the crash go away.

Do you have empty trash/expunge inbox on exit set? I wonder if this is some failure to connection problems in the decompress code...
(In reply to comment #5)
> protz, are you using the fastmail imap server, or an other imap server that
> supports compress? You can try toggling
> mail.server.default.use_compress_deflate to see if it makes the crash go away.
> 
How can I know if a IMAP server supports deflate? One is imap.googlemail.com, the other two are:

jonathan@nala:~ $ nc imap.ens-lyon.fr 143
* OK bubulcus Cyrus IMAP4 v2.2.13-Debian-2.2.13-14+lenny3 server ready
^C
jonathan@nala:~ $ nc imap.free.fr 143
* OK [CAPABILITY IMAP4REV1 X-NETSCAPE LOGIN-REFERRALS AUTH=LOGIN] IMAP4rev1 Free
^C

> Do you have empty trash/expunge inbox on exit set? I wonder if this is some
> failure to connection problems in the decompress code...

None of my accouts has this setting.

It's kind of hard to reproduce. I'll try a tcpdump next to see which connections are still open when it loops like that. I haven't any reliable way to reproduce this.
You'd need an imap protocol log to see what the capability response was for each server - https://wiki.mozilla.org/MailNews:Logging
Protz can we get somewhere with this bug ?
Not really. For a while I exported the right environment variables when starting Thunderbird, so that I would get a protocol dump when it got stuck, but I haven't been able to reproduce it in such conditions. I don't have a reliable way to reproduce, and as I said, it's very random.

David, do you need the protocol log right when it crashes, or just the first few lines?
I don't need a protocol log of the getting stuck part, just the initial part of the log where we ask the server what its capabilities are. But I believe gmail has added compress/deflate support.
Here's the log. It's not the same computer that experiences the bug, but it's the same setup for accounts.
so yeah, that confirms that your gmail server does support COMPRESS, so that would be consistent with asuth's diagnosis. But that's only marginally helpful. I suspect this is a linux-y issue, since we have a ton of gmail users, and have not heard other reports of this.
What can I do to help track down this issue?
You could try toggling
mail.server.default.use_compress_deflate to see if it makes the hang go away, though if it's random and hard to reproduce, that may not be that helpful. How random is it, in the sense of how frequent is the hang?  It's unfortunate that turning on protocol logging seems to make this problem go away because my first suggestion would be to try to turn on other sorts of logging, but that may not be helpful.
It used to be ~twice a week, now I don't know, I'm using momo's mac as my primary machine :).

You can tell me which parts of logging to enable, and I'll make sure whenever I run Thunderbird on the faulty machine, these are enabled, just in case.
IMAP:5 is where I'd start, since it tells us what's happening with server communications at shutdown, and the other forms of logging (e.g., socket transport) are too fire-hosey.
Attached file Hang log
I managed to reproduce the bug. Seems to happen after Tb has been open for a long time (overnight). Tb feels sluggish and unresponsive, feel like restarting it, and then the bug appears.

Tb went 100% cpu, waited ~10 seconds, and then killed it. Here's the protocol log.
Nikolay, Jay, does anything in the imap log pop out?  :)

protz, can you reasonably rule out extensions?  Which ones do you run?
That error is NS_ERROR_ABORT, which is the status we give when trying to close the transport because the app is getting shut down. In this case, I guess a stack trace of the hang might be helpful.
I feel like this is happening when some IMAP connection is open and my network connection is shut down (i.e. I loose the wifi connection, for instance).

I already put a crashreport in comment #3. What do you mean by stack trace? Do you want me to attach gdb to the running process and bt once it's killed? Do I need to run a debug build for this or is a shared build ok?
(In reply to comment #20)

> I already put a crashreport in comment #3.
Yeah, that didn't contain much useful that I could see.

> Do you want me to attach gdb to the running process and bt once it's killed?
Yes, I want to see what the threads are doing that's preventing the app from shutting down. So attach and break into the debugger.


> need to run a debug build for this or is a shared build ok?
If you can get symbols, a shared build is fine.
Yay! Managed to get a backtrace.

Please note that the symptoms are linked to my connection being particularly crappy (mobile phone tethering) and sockets timing out. I event got a "too many imap connections for this account" error message.

After I understood that Thunderbird was not closing properly, I attached gdb to it, killed Thunderbird, and ran a backtrace. Here comes the full backtrace.

0x00007f7ea666f0bd in read () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007f7ea666f0bd in read () from /lib/libpthread.so.0
#1  0x00007f7ea538cbba in nsAppShell::EventProcessorCallback (
    source=<value optimized out>, condition=<value optimized out>, 
    data=0x7f7e9b3d1fd0)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/gtk2/nsAppShell.cpp:66
#2  0x00007f7ea10b86f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
#3  0x00007f7ea10bc568 in ?? () from /lib/libglib-2.0.so.0
#4  0x00007f7ea10bc71c in g_main_context_iteration ()
   from /lib/libglib-2.0.so.0
#5  0x00007f7ea53a23c7 in nsBaseAppShell::DoProcessNextNativeEvent (this=0xf, 
    mayWait=-1871547345)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:161
#6  0x00007f7ea53a25b1 in nsBaseAppShell::OnProcessNextEvent (
    this=0x7f7e9b3d1fd0, thr=0x7f7e9b3364c0, mayWait=1, 
    recursionDepth=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:299
#7  0x00007f7ea569fd11 in nsThread::ProcessNextEvent (this=0x7f7e9b3364c0, 
    mayWait=1, result=0x7fff9072766c)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/threads/nsThread.cpp:517
#8  0x00007f7ea566fcd1 in NS_ProcessNextEvent_P (thread=0xf, 
---Type <return> to continue, or q <return> to quit---
    mayWait=-1871547345) at nsThreadUtils.cpp:250
#9  0x00007f7ea4b5d8e6 in nsSyncStreamListener::WaitForData (
    this=0x7f7e80e7d250)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:58
#10 0x00007f7ea4b5d93a in nsSyncStreamListener::Available (
    this=0x7f7e80e7d250, result=0x7fff907276cc)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:160
#11 0x00007f7ea4b5d7c4 in nsSyncStreamListener::Read (this=0xf, 
    buf=0x7f7e80e7f000 "", bufLen=4096, result=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:178
#12 0x00007f7ea5443293 in unsigned int NS_ReadLine<char, nsIInputStream, nsCAutoString>(nsIInputStream*, nsLineBuffer<char>*, nsCAutoString&, int*) ()
   from /home/jonathan/Code/objdir-comm-central/mozilla/dist/bin/libxul.so
#13 0x00007f7ea543783f in nsMsgDBFolder::GetMsgTextFromStream (
    this=<value optimized out>, stream=<value optimized out>, 
    aCharset=<value optimized out>, bytesToRead=<value optimized out>, 
    aMaxOutputLen=<value optimized out>, 
    aCompressQuotes=<value optimized out>, aStripHTMLTags=1, aContentType=..., 
    aMsgText=...)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgDBFolder.cpp:533---Type <return> to continue, or q <return> to quit---
5
#14 0x00007f7ea56abc4a in NS_InvokeByIndex_P (that=0xf, 
    methodIndex=2423419951, paramCount=1, params=0xffffffffffffffff)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:208
#15 0x00007f7ea51af8e4 in CallMethodHelper::Invoke (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:3081
#16 CallMethodHelper::Call (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2348
#17 XPCWrappedNative::CallMethod (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2312
#18 0x00007f7ea51b5021 in XPC_WN_CallMethod (cx=0x7f7e936df800, argc=7, 
    vp=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1752
#19 0x00007f7ea61fb3ff in js::Interpret (cx=0x7f7e936df800, 
    entryFrame=<value optimized out>, inlineCallCount=6, 
---Type <return> to continue, or q <return> to quit---
    interpFlags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:4477
#20 0x00007f7ea62062a9 in RunScript (cx=0x7f7e936df800, 
    argsRef=<value optimized out>, flags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:484
#21 js::Invoke (cx=0x7f7e936df800, argsRef=<value optimized out>, 
    flags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:592
#22 0x00007f7ea6206bb4 in js::ExternalInvoke (cx=0x7f7e936df800, thisv=..., 
    fval=..., argc=1, argv=<value optimized out>, rval=0x7fff90729568)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:622
#23 0x00007f7ea619ef71 in ExternalInvoke (cx=0x7f7e936df800, 
    obj=<value optimized out>, fval=18445617617794269632, argc=4294967295, 
    argv=0x7f7ea695e040, rval=0x0)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.h:910
#24 JS_CallFunctionValue (cx=0x7f7e936df800, obj=<value optimized out>, 
    fval=18445617617794269632, argc=4294967295, argv=0x7f7ea695e040, rval=0x0)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsapi.cpp:4854
#25 0x00007f7ea51a9bdb in nsXPCWrappedJSClass::CallMethod (
    this=0x7f7e8969a400, wrapper=<value optimized out>, 
    methodIndex=<value optimized out>, info=0x7f7e95dcf450, 
    nativeParams=0x7fff90729620)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappedj---Type <return> to continue, or q <return> to quit---
sclass.cpp:1692
#26 0x00007f7ea56ac815 in PrepareAndDispatch (self=0x7f7e94e441c0, 
    methodIndex=<value optimized out>, args=0x7fff907297a0, 
    gpregs=0x7fff90729720, fpregs=0x7fff90729750)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:153
#27 0x00007f7ea56abcd7 in SharedStub ()
   from /home/jonathan/Code/objdir-comm-central/mozilla/dist/bin/libxul.so
#28 0x00007f7ea5351457 in Run (this=0x7f7e82ca4370)
    at /home/jonathan/Code/comm-central/mozilla/storage/src/mozStorageAsyncStatementExecution.cpp:164
#29 0x00007f7ea569fd67 in nsThread::ProcessNextEvent (this=0x7f7e9b3364c0, 
    mayWait=1, result=0x7fff9072981c)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/threads/nsThread.cpp:547
#30 0x00007f7ea566fcd1 in NS_ProcessNextEvent_P (thread=0xf, 
    mayWait=-1871547345) at nsThreadUtils.cpp:250
#31 0x00007f7ea4b5d8e6 in nsSyncStreamListener::WaitForData (
    this=0x7f7e88e43400)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:58
#32 0x00007f7ea4b5d93a in nsSyncStreamListener::Available (
    this=0x7f7e88e43400, result=0x7fff9072987c)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamLis---Type <return> to continue, or q <return> to quit---
tener.cpp:160
#33 0x00007f7ea4b5d7c4 in nsSyncStreamListener::Read (this=0xf, 
    buf=0x7f7e7e542000 "", bufLen=4096, result=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:178
#34 0x00007f7ea5443293 in unsigned int NS_ReadLine<char, nsIInputStream, nsCAutoString>(nsIInputStream*, nsLineBuffer<char>*, nsCAutoString&, int*) ()
   from /home/jonathan/Code/objdir-comm-central/mozilla/dist/bin/libxul.so
#35 0x00007f7ea543783f in nsMsgDBFolder::GetMsgTextFromStream (
    this=<value optimized out>, stream=<value optimized out>, 
    aCharset=<value optimized out>, bytesToRead=<value optimized out>, 
    aMaxOutputLen=<value optimized out>, 
    aCompressQuotes=<value optimized out>, aStripHTMLTags=1, aContentType=..., 
    aMsgText=...)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgDBFolder.cpp:5335
#36 0x00007f7ea56abc4a in NS_InvokeByIndex_P (that=0xf, 
    methodIndex=2423419951, paramCount=1, params=0xffffffffffffffff)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:208
#37 0x00007f7ea51af8e4 in CallMethodHelper::Invoke (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappedn---Type <return> to continue, or q <return> to quit---
ative.cpp:3081
#38 CallMethodHelper::Call (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2348
#39 XPCWrappedNative::CallMethod (ccx=<value optimized out>, 
    mode=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2312
#40 0x00007f7ea51b5021 in XPC_WN_CallMethod (cx=0x7f7e936df800, argc=7, 
    vp=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1752
#41 0x00007f7ea61fb3ff in js::Interpret (cx=0x7f7e936df800, 
    entryFrame=<value optimized out>, inlineCallCount=6, 
    interpFlags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:4477
#42 0x00007f7ea62062a9 in RunScript (cx=0x7f7e936df800, 
    argsRef=<value optimized out>, flags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:484
#43 js::Invoke (cx=0x7f7e936df800, argsRef=<value optimized out>, 
    flags=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:592
---Type <return> to continue, or q <return> to quit---
#44 0x00007f7ea6206bb4 in js::ExternalInvoke (cx=0x7f7e936df800, thisv=..., 
    fval=..., argc=1, argv=<value optimized out>, rval=0x7fff9072b718)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.cpp:622
#45 0x00007f7ea619ef71 in ExternalInvoke (cx=0x7f7e936df800, 
    obj=<value optimized out>, fval=18445617617794269632, argc=4294967295, 
    argv=0x7f7ea695e040, rval=0x0)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsinterp.h:910
#46 JS_CallFunctionValue (cx=0x7f7e936df800, obj=<value optimized out>, 
    fval=18445617617794269632, argc=4294967295, argv=0x7f7ea695e040, rval=0x0)
    at /home/jonathan/Code/comm-central/mozilla/js/src/jsapi.cpp:4854
#47 0x00007f7ea51a9bdb in nsXPCWrappedJSClass::CallMethod (
    this=0x7f7e8969a400, wrapper=<value optimized out>, 
    methodIndex=<value optimized out>, info=0x7f7e95dcf450, 
    nativeParams=0x7fff9072b7d0)
    at /home/jonathan/Code/comm-central/mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp:1692
#48 0x00007f7ea56ac815 in PrepareAndDispatch (self=0x7f7e8ed48380, 
    methodIndex=<value optimized out>, args=0x7fff9072b950, 
    gpregs=0x7fff9072b8d0, fpregs=0x7fff9072b900)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:153
#49 0x00007f7ea56abcd7 in SharedStub ()
   from /home/jonathan/Code/objdir-comm-central/mozilla/dist/bin/libxul.so
---Type <return> to continue, or q <return> to quit---
#50 0x00007f7ea5351457 in Run (this=0x7f7e7f0cc790)
    at /home/jonathan/Code/comm-central/mozilla/storage/src/mozStorageAsyncStatementExecution.cpp:164
#51 0x00007f7ea569fd67 in nsThread::ProcessNextEvent (this=0x7f7e9b3364c0, 
    mayWait=1, result=0x7fff9072b9cc)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/threads/nsThread.cpp:547
#52 0x00007f7ea566fcd1 in NS_ProcessNextEvent_P (thread=0xf, 
    mayWait=-1871547345) at nsThreadUtils.cpp:250
#53 0x00007f7ea53a26ed in nsBaseAppShell::Run (this=0x7f7e9b3d1fd0)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:183
#54 0x00007f7ea5293fee in nsAppStartup::Run (this=0x7f7e937d8640)
    at /home/jonathan/Code/comm-central/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:191
#55 0x00007f7ea4b2c244 in XRE_main (argc=<value optimized out>, 
    argv=<value optimized out>, aAppData=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:3673
#56 0x0000000000401a16 in main (argc=5, argv=0x7fff9072c308)
    at /home/jonathan/Code/comm-central/mail/app/nsMailApp.cpp:101
(gdb)
So that just means we're trying to get some message text for the new mail alert (most likely). It doesn't look like a hang, per se. Does linux display message text for the new mail alert?
I've had issues with the popup alert, yes, the most notable one being the popup flickering for a fraction of a second and not showing properly, although because I wasn't able to reliably trigger it, I never filed a bug.

I'll try to get more backtraces to see if we're always stuck in the same area.
(In reply to comment #23)
> It doesn't look like a hang, per se. Does linux display message text for the new mail alert?

what is this, if it isn't a hang?
if it's a linux new message pop up perf issue, see Bug 367431 - New Mail alert popup takes 100% CPU for 10 secs
(In reply to comment #25)

> 
> what is this, if it isn't a hang?
I'm just saying that stack trace doesn't look like it's hung.

In the past, we've had bugs where the code on linux and windows that's supposed to shut down the app when the last window is closed gets the count wrong, and leaves the app spinning the event loop, but no window up. And the new mail alert window might confuse that count.
Happened again. The computer is an 8-core i7 Xeon, and I waited for 30s+ before attaching gdb, and the CPU kept going through the roof at 100% all the time, so I think it's definitely a hang. One more backtrace (not very helpful, I'm afraid).

(gdb) bt
#0  0x00007f674b4b2113 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f674c6dd2c9 in ?? () from /lib/libglib-2.0.so.0
#2  0x00007f674c6dd71c in g_main_context_iteration () from /lib/libglib-2.0.so.0
#3  0x00007f673b5dd337 in nsBaseAppShell::DoProcessNextNativeEvent (this=0x7f672e7aae10, mayWait=10)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:155
#4  0x00007f673b5dd5cd in nsBaseAppShell::OnProcessNextEvent (this=0x7f67337457c0, thr=0x7f6747b28550, 
    mayWait=0, recursionDepth=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:311
#5  0x00007f674f807e05 in nsThread::ProcessNextEvent (this=0x7f6747b28550, mayWait=1, result=0x7fff00dc7d2c)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:508
#6  0x00007f674f7d439a in NS_ProcessNextEvent_P (thread=0x7f672e7aae10, mayWait=10) at nsThreadUtils.cpp:250
#7  0x00007f674f80815f in nsThread::Shutdown (this=0x7f671b0c4ee0)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:468
#8  0x00007f674f809058 in nsThreadManager::Shutdown (this=0x7f674fa6e880)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThreadManager.cpp:158
#9  0x00007f674f7d7534 in mozilla::ShutdownXPCOM (servMgr=0x7f6747ba9148)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/build/nsXPComInit.cpp:873
#10 0x00007f674fc94dc5 in ~ScopedXPCOMStartup (this=0x7fff00dc8440, __in_chrg=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:1052
#11 0x00007f674fc984c2 in XRE_main (argc=<value optimized out>, argv=<value optimized out>, 
    aAppData=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:3527
#12 0x0000000000401b04 in main (argc=1, argv=0x7fff00dc8728)
    at /home/jonathan/Sources/comm-1.9.2/mail/app/nsMailApp.cpp:101

Is there anything useful you want me to do next time?
0x00007fda3be9f113 in *__GI___poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
87	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
	in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0  0x00007fda3be9f113 in *__GI___poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fda3d0ca2c9 in ?? () from /lib/libglib-2.0.so.0
#2  0x00007fda3d0ca71c in g_main_context_iteration ()
   from /lib/libglib-2.0.so.0
#3  0x00007fda2f61b337 in nsBaseAppShell::DoProcessNextNativeEvent (
    this=0x7fda1c39d350, mayWait=10)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:155
#4  0x00007fda2f61b5cd in nsBaseAppShell::OnProcessNextEvent (
    this=0x7fda32b8a400, thr=0x7fda38528550, mayWait=0, 
    recursionDepth=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:311
#5  0x00007fda401f4e05 in nsThread::ProcessNextEvent (this=0x7fda38528550, 
    mayWait=1, result=0x7ffffc74e1cc)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:508
#6  0x00007fda401c139a in NS_ProcessNextEvent_P (thread=0x7fda1c39d350, 
    mayWait=10) at nsThreadUtils.cpp:250
#7  0x00007fda401f515f in nsThread::Shutdown (this=0x7fda27ac3fb0)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:468
#8  0x00007fda401f6058 in nsThreadManager::Shutdown (this=0x7fda4045b880)
---Type <return> to continue, or q <return> to quit---
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThreadManager.cpp:158
#9  0x00007fda401c4534 in mozilla::ShutdownXPCOM (servMgr=0x7fda385a9148)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/build/nsXPComInit.cpp:873
#10 0x00007fda40681dc5 in ~ScopedXPCOMStartup (this=0x7ffffc74e8e0, 
    __in_chrg=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:1052
#11 0x00007fda406854c2 in XRE_main (argc=<value optimized out>, 
    argv=<value optimized out>, aAppData=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:3527
#12 0x0000000000401b04 in main (argc=1, argv=0x7ffffc74ebc8)
    at /home/jonathan/Sources/comm-1.9.2/mail/app/nsMailApp.cpp:101

Seems to be correlated with a flaky connection.
not sure if this is what you saw as well, Ben.
Yes, I saw exactly the same as in initial description, including the very same "\372".
many major/hang bugs cite gettimeofday - Bug 533464, Bug 361730, Bug 508263. needs a deeper look. dup of bug 508263?
Attached file GDB log
David, I've had another hang at shutdown, but not 100% cpu this time. I did a complete GDB session, and inspected every one of the 15 threads running. One is indeed hanging in the IMAP code. The full backtrace is near the end of the file. Relevant parts:

#5  nsPipeInputStream::Wait (this=0x7fe274372910)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/io/nsPipe3.cpp:652
No locals.
#6  0x00007fe2a1cebfd5 in nsPipeInputStream::ReadSegments (
    this=0x7fe274372910,
    writer=0x7fe2a1cec8fb <NS_CopySegmentToBuffer(nsIInputStream*, void*, char const*, P
    readCount=0x7fe26d9fc80c)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/io/nsPipe3.cpp:777
        writeCount = 1924833280
        originalLen = <value optimized out>
        rv = <value optimized out>
        segment = 0x7fe26d9fc858 "\n"
        segmentLen = 32738
#7  0x00007fe2a20616b8 in nsMsgCompressIStream::Read (this=0x7fe275d70950, 
    aBuf=0x7fe272baa000 "+ idling\r\n", aCount=7008, aResult=0x7fe26d9fc868)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgCompressIStream.cpp:175
        bytesRead = 0
        rv = <value optimized out>
        rv = <value optimized out>
#8  0x00007fe2a2044619 in nsMsgLineStreamBuffer::ReadNextLine (
    this=0x7fe270a5d310, aInputStream=0x7fe275d70950,
    aNumBytesInLine=@0x7fe26d9fc928, aPauseForMoreData=@0x7fe26d9fc92c,
    prv=0x7fe26d9fc924, addLineTerminator=0)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgLineBuffer.cpp:403
        startOfNewData = <value optimized out>
        i = <value optimized out>
        rv = <value optimized out>
        numBytesInStream = 7008
        numFreeBytesInBuffer = <value optimized out>
        numBytesCopied = 0
        nonBlockingStream = 0
        endOfLine = <value optimized out>
        startOfLine = 0x7fe272baa000 "+ idling\r\n"
#9  0x00007fe2a1fb81b8 in nsImapProtocol::CreateNewLineFromSocket (
    this=0x7fe2709c6000)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapProtocol.cpp:4718
        numBytesInLine = 0
        rv = 0
        kungFuGrip = {<nsCOMPtr_base> = {
            mRawPtr = 0x7fe275d70950}, <No data fields>}
        needMoreData = 0
        newLine = <value optimized out>
#10 0x00007fe2a1fc7b6e in nsImapServerResponseParser::GetNextLineForParser (
    this=0x7fe2709c6290, nextLine=0x7fe2709c62a0)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp
        rv = 1
#11 0x00007fe2a2085569 in nsIMAPGenericParser::AdvanceToNextLine (
    this=0x7fe2709c6290)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsIMAPGenericParser.cpp:181
        ok = <value optimized out>
#12 0x00007fe2a20855ff in nsIMAPGenericParser::AdvanceToNextToken (
    this=0x7fe2709c6290)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsIMAPGenericParser.cpp:152
No locals.
#13 0x00007fe2a1fc93d2 in nsImapServerResponseParser::ParseIMAPServerResponse (
    this=0x7fe2709c6290, aCurrentCommand=<value optimized out>,
    aIgnoreBadAndNOResponses=0, aGreetingWithCapability=0x0)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp
        placeInTokenString = 0x7fe274349271 "\n"
        tagToken = <value optimized out>
        commandToken = <value optimized out>
        inIdle = <value optimized out>
        sendingIdleDone = 0
        numberOfTaggedResponsesReceived = 0
        copyCurrentCommand = {<nsACString_internal> = {
            mData = 0x7fe274349268 "321", mLength = 10,
            mFlags = 5}, <No data fields>}
#14 0x00007fe2a1fb25f0 in nsImapProtocol::HandleIdleResponses (
Whiteboard: [has stack trace]
this stack trace just means that we have an imap thread waiting for data - it doesn't mean anything other than that. Other threads could be blocking the shutdown process - I've seen a strange behavior with the UI thread and PSM where the UI thread is blocked on shutdown trying to clean up a PSM thread.
Whiteboard: [has stack trace]
Assignee: nobody → dbienvenu
I just had a hang on shutdown like this happen to me today. Poking around, I discovered that the url was not in the imap log, which meant we had run the url after the shutdown process had turned off logging, and I discovered that mailnews had been shut down. Since we ran the url anyway, I believe this meant that the account manager had been destroyed and then recreated, with m_shuttingDown set to false. I remembered from previous experience that sometimes we reload the accounts at shutdown, which means that every once in a while, some piece of code tries to do stuff after shutdown.

So this fix does a couple things:

1. Make the variables that track whether we're shutting down truly global
2. Move the check for shutting down lower down in the IMAP call stack (but still on the UI thread) so that various edge cases don't avoid the check. One edge case is queued urls, though I know that's not what happened to me or others, because queuing urls shows up in the protocol log.
Attachment #529838 - Flags: review?(neil)
Attachment #529838 - Flags: review?(neil) → review+
fixed on trunk - http://hg.mozilla.org/comm-central/rev/a217f63f0cee

I'm going to mark this fixed - the symptom is (usually) one imap thread blocked waiting for input, no other threads blocked, and a hang on shutdown. Protz, I'm not sure if this is what you are seeing, but I would be very interested to hear if you see it with builds going forward. As I mentioned before, the other symptom is that the imap url that's blocked doesn't show up in an imap protocol log because it was started after logging is torn down.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.3a4
Sure. I haven't seen this happen frequently lately, but I'll make sure I try to get a relevant backtrace in case this happens again. Ehsan, you said your Thunderbird reliably had this issue, do you think you could try this out?
(In reply to comment #37)
> Sure. I haven't seen this happen frequently lately, but I'll make sure I try to
> get a relevant backtrace in case this happens again. Ehsan, you said your
> Thunderbird reliably had this issue, do you think you could try this out?

Sure.  Do I need to download a trunk build for this?  (From where?! :D)

Will that build blow away my profile?  Specifically, should I make a backup before running it?
I mean, I can probably run the trunk build regularly if I can get some of my extensions working on it.
Ehsan, tomorrow's miramar nightly builds will have this fix in it - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-miramar/

You can disable extension compatibility checking and most extensions will run. Binary extensions like Lightning or Enigmail aren't so lucky.

However, I thought your big issue was a hang after wakeup from sleep, not a hang on shutdown. This patch has no effect on the former, but there's work going on in Bug 508263 that may help with that. (Sorry, all these bugs are dirty, in that they conflate a bunch of issues)
Was this released in 5.0?  Because I'm still seeing a hang on quit... now with Lion.
(In reply to comment #42)
> Was this released in 5.0?  Because I'm still seeing a hang on quit... now
> with Lion.

Yes it was, but please file a new bug, especially as you've changed OS. It could be that you're seeing something different.
the hints that the patch is in v5 are comment 40 which cites Miramar, and Target Milestone=Thunderbird 3.3a4 - these are the precursor to v5.

If you still see a hang, please consult https://wiki.mozilla.org/Thunderbird:Testing:Shutdown_Hang
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: