Fairly often (roughly 1 time in 10), when I hit space or n and say yes to the prompt sheet to advance to the next folder with unread messages (or just click on a folder to switch to it), this results in a hang. The server is Exchange over IMAP. It has been hanging on central/central since sometime in January, I think. It's a bit hard to narrow down, as the reproducibility is a bit random. Actually, the STR could be something like: read stuff, idle for a minute, switch to a folder with no unread messages, hit space, and say yes to advance.
Regression range:
2012-01-04-03-00-26-comm-central good 0/4
http://hg.mozilla.org/mozilla-central/rev/200a8d1fb452
http://hg.mozilla.org/comm-central/rev/fd9f0ac2bcaf
2012-01-05-03-00-25-comm-central bad 2/2
http://hg.mozilla.org/mozilla-central/rev/4795500b7c1d
http://hg.mozilla.org/comm-central/rev/6ad18d15c741
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007ffff7bca36f in _L_lock_1145 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff7bca2ba in __pthread_mutex_lock (mutex=0x7ffff6d48288) at pthread_mutex_lock.c:101
#3 0x00007ffff62876f9 in PR_Lock () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#4 0x00007ffff6287d15 in PR_EnterMonitor () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#5 0x00007ffff6278501 in PR_CEnterMonitor () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#6 0x00007ffff23b7983 in nsImapProtocol::PseudoInterruptMsgLoad (this=0x7fffc95e9000, aImapFolder=0x7fffcbab3440, aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f) at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1279
#7 0x00007ffff2371c64 in nsImapIncomingServer::PseudoInterruptMsgLoad (this=0x7fffd6ac9be0, aImapFolder=0x7fffcbab3440, aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f) at comm-central/mailnews/imap/src/nsImapIncomingServer.cpp:2246
#8 0x00007ffff23eaac4 in nsImapService::GetMessageFromUrl (this=0x7fffd0c87e20, aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, aConvertDataToText=false, aURL=0x0) at comm-central/mailnews/imap/src/nsImapService.cpp:1084
#9 0x00007ffff23ea641 in nsImapService::FetchMessage (this=0x7fffd0c87e20, aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, messageIdentifierList=..., aConvertDataToText=false, aAdditionalHeader=..., aURL=0x0) at comm-central/mailnews/imap/src/nsImapService.cpp:1046
#10 0x00007ffff23e7728 in nsImapService::DisplayMessage (this=0x7fffd0c87e20, aMessageURI=0x7fffb6b84e88 "imap-message://domain%5Cuser@host.company.com/folder#86789", aDisplayConsumer=0x7fffd23b3908, aMsgWindow=0x7fffd6ad4f70, aUrlListener=0x0, aCharsetOverride=0x0, aURL=0x0) at comm-central/mailnews/imap/src/nsImapService.cpp:580
(Not sure why I seem to be getting the system NSPR. Anyhow, it hangs in m.o nightlies as well.)
Reverting this seems to help:
changeset: 9107:75840841cc21
user: David Bienvenu <firstname.lastname@example.org>
date: Wed Jan 04 08:40:48 2012 -0800
summary: fix deadlock in imap ssl calling isAlive, r=standard8, bug 711787
mailnews/imap/src/nsImapProtocol.cpp | 27 ++++++++++++++-------------
1 files changed, 14 insertions(+), 13 deletions(-)
Have you changed the number of connections to cache with that server from the default of 5? Next time this happens, we need more of the main thread's stack, plus the stacks of the other threads. In particular, there should be another thread doing something with the imap protocol object. We can't revert the other fix, since that hang was a lot worse. I've never seen this hang myself.
max_cached_connections is 4; timeout 29
Created attachment 603846
(gdb) thread apply all bt
This box has the default 5 cached connections, fwiw.
Is the following an effective recovery procedure when the problem occurs? Go to Work Offline mode, go back to Work Online mode, then open the folder again. Do you have automatic new mail checking enabled? Do you have use of the IDLE command enabled? Is the frequency of your problem reduced by disabling use of the IDLE command?
Ah, thx. OK, thread 30 is trying to retry a url, probably because the server or network dropped a connection (or perhaps a loadgroup was cancelled, which killed the connection):
#5 0x00007ffff24c7ae5 in (anonymous namespace)::DispatchSyncRunnable (r=0x7fffc511b780) at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:308
#6 0x00007ffff24cab48 in ImapServerSinkProxy::PrepareToRetryUrl (this=0x7fffc507d740, a1=0x7fffc51ec000, a2=0x7fffcb8febf0) at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:464
#7 0x00007ffff2463e6b in nsImapProtocol::RetryUrl (this=0x7fffcd9b1800) at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1876
#8 0x00007ffff2461e19 in nsImapProtocol::ImapThreadMainLoop (this=0x7fffcd9b1800) at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1361
#9 0x00007ffff2460b31 in nsImapProtocol::Run (this=0x7fffcd9b1800)
At just the wrong time, the UI thread is trying to find a connection it can use, which causes contention over a PR_CEnterMonitor(this) on the protocol object. I'll have to think about this. I think the imap thread shouldn't be holding onto the monitor when calling into the UI thread with the sink proxy runnable calls, which might not be too hard to fix.
I suspect we don't need to use the monitor at all in this method. All the member vars we're accessing should be safe to access from the imap thread. The server object protects accesses to connections with its own monitor.
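For readers following along, here is a minimal sketch of the contention described above. It is not the attached patch and does not use NSPR: it uses standard C++ threads, and the function name UiCanEnterMonitor, the timeouts, and the promise/future handshake are all made up for illustration. One thread plays the imap thread making a synchronous call into the UI thread, optionally while holding the monitor. The calling thread plays the UI thread and tries to enter the same monitor, with a timeout so the sketch reports the problem instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical illustration only; none of these names exist in comm-central.
// Returns true if the "UI thread" (the caller) can enter the monitor while
// the "imap thread" is blocked in its synchronous call to the UI thread.
bool UiCanEnterMonitor(bool holdMonitorDuringSyncCall) {
  std::timed_mutex monitor;        // stands in for the protocol object's monitor
  std::promise<void> imapBlocked;  // set once the imap thread is about to block
  std::promise<void> uiDone;       // set when the UI thread has finished its work
  std::future<void> imapBlockedFuture = imapBlocked.get_future();
  std::future<void> uiDoneFuture = uiDone.get_future();

  std::thread imapThread([&] {
    std::unique_lock<std::timed_mutex> lock(monitor, std::defer_lock);
    if (holdMonitorDuringSyncCall) {
      lock.lock();  // risky shape: monitor held across the synchronous call
    }
    imapBlocked.set_value();
    uiDoneFuture.wait();  // stands in for the synchronous dispatch to the UI thread
  });                     // the lock, if taken, is released when the lambda returns

  imapBlockedFuture.wait();

  // The UI thread now wants the same monitor. A plain lock() here would
  // deadlock in the risky case, so use a timed attempt (retried a few times,
  // since timed try-locks are allowed to fail spuriously).
  bool entered = false;
  for (int attempt = 0; attempt < 3 && !entered; ++attempt) {
    entered = monitor.try_lock_for(std::chrono::milliseconds(50));
  }
  if (entered) {
    monitor.unlock();
  }

  uiDone.set_value();  // let the imap thread finish
  imapThread.join();
  return entered;
}
```

Under these assumptions, UiCanEnterMonitor(true) returns false (the UI thread would have hung on an untimed lock), while UiCanEnterMonitor(false) returns true, which is the shape the proposed change aims for by not taking the monitor in that method at all.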
Created attachment 604568
possible fix
I'll request a try server build with this patch, but if you can do your own builds, here's the patch.
Comment on attachment 604568 (possible fix)
Yeah, this fixes it for me. I didn't get any other issues in the past day with it.
Comment on attachment 604568 (possible fix)
[Approval Request Comment]
User impact if declined: occasional hangs
Testing completed (on c-c, etc.): try server build fixed issue for reporter
Risk to taking this patch (and alternatives if risky): slight risk of race conditions, though hangs are much more likely
Comment on attachment 604568 (possible fix)
[Triage Comment]
This already landed in time for 13. The requests should have let it go into 12; somehow I missed that. So a=me for comm-beta.
Is it possible to apply this patch to 11.0?
Comment on attachment 604568 (possible fix)
[Triage Comment]
a=Standard8. As per the drivers meeting, we've decided to spin an 11.0.1 for this and another issue. I've landed it on comm-release already: http://hg.mozilla.org/releases/comm-release/rev/832c448e5d0a
I've marked the topic at https://getsatisfaction.com/mozilla_messaging/topics/problems_with_mozilla_11_0 to reflect the solution in 11.0.1.