Bug 733731 - hang when switching folders on IMAP server
Summary: hang when switching folders on IMAP server
Status: RESOLVED FIXED
Whiteboard: [gs][gssolved]
Keywords: hang, regression
Product: MailNews Core
Classification: Components
Component: Networking: IMAP
Version: Trunk
Platform: x86_64 Linux
Importance: -- critical with 2 votes
Target Milestone: Thunderbird 13.0
Assigned To: David :Bienvenu
:
:
Mentors:
https://getsatisfaction.com/mozilla_m...
Duplicates: 713624 728740 738562 738930 739251 739688 739781 739997 740298
Depends on:
Blocks: 711787 739997
 
Reported: 2012-03-07 05:18 PST by Tuukka Tolvanen (sp3000)
Modified: 2013-01-30 12:17 PST
CC List: 20 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
-
fixed
+
fixed


Attachments
(gdb) thread apply all bt (65.00 KB, text/plain)
2012-03-07 14:01 PST, Tuukka Tolvanen (sp3000)
no flags
possible fix (1.38 KB, patch)
2012-03-09 17:02 PST, David :Bienvenu
neil: review+
tuukka.tolvanen: feedback+
standard8: approval‑comm‑beta+
standard8: approval‑comm‑release+

Description Tuukka Tolvanen (sp3000) 2012-03-07 05:18:03 PST
Frequently enough (about 1 time in 10), when I hit Space or N and say yes to the prompt sheet to advance to the next folder with unread messages (or just click on a folder to switch to it), this results in a hang. The server is Exchange over IMAP.

It has been hanging on central/central since sometime in January, I think... It's a bit hard to narrow down because reproducibility is somewhat random.

Actually, the STR (steps to reproduce) could be something like: read stuff, idle for a minute, switch to a folder with no unread messages, then hit Space and say yes to advance.

Regression range:

2012-01-04-03-00-26-comm-central good 0/4
http://hg.mozilla.org/mozilla-central/rev/200a8d1fb452
http://hg.mozilla.org/comm-central/rev/fd9f0ac2bcaf

2012-01-05-03-00-25-comm-central bad 2/2
http://hg.mozilla.org/mozilla-central/rev/4795500b7c1d
http://hg.mozilla.org/comm-central/rev/6ad18d15c741
Comment 1 Tuukka Tolvanen (sp3000) 2012-03-07 05:42:11 PST
(gdb) bt
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007ffff7bca36f in _L_lock_1145 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff7bca2ba in __pthread_mutex_lock (mutex=0x7ffff6d48288)
    at pthread_mutex_lock.c:101
#3  0x00007ffff62876f9 in PR_Lock ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#4  0x00007ffff6287d15 in PR_EnterMonitor ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#5  0x00007ffff6278501 in PR_CEnterMonitor ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#6  0x00007ffff23b7983 in nsImapProtocol::PseudoInterruptMsgLoad (
    this=0x7fffc95e9000, aImapFolder=0x7fffcbab3440, 
    aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1279
#7  0x00007ffff2371c64 in nsImapIncomingServer::PseudoInterruptMsgLoad (
    this=0x7fffd6ac9be0, aImapFolder=0x7fffcbab3440, 
    aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f)
    at comm-central/mailnews/imap/src/nsImapIncomingServer.cpp:2246
#8  0x00007ffff23eaac4 in nsImapService::GetMessageFromUrl (
    this=0x7fffd0c87e20, aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, 
    aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, 
    aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, 
    aConvertDataToText=false, aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:1084
#9  0x00007ffff23ea641 in nsImapService::FetchMessage (this=0x7fffd0c87e20, 
    aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, 
    aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, 
    aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, 
    messageIdentifierList=..., aConvertDataToText=false, 
    aAdditionalHeader=..., aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:1046
#10 0x00007ffff23e7728 in nsImapService::DisplayMessage (this=0x7fffd0c87e20, 
    aMessageURI=0x7fffb6b84e88 "imap-message://domain%5Cuser@host.company.com/folder#86789", aDisplayConsumer=0x7fffd23b3908, aMsgWindow=0x7fffd6ad4f70, 
    aUrlListener=0x0, aCharsetOverride=0x0, aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:580

(Not sure why I seem to be getting the system NSPR. Anyhow, it hangs in mozilla.org nightlies as well.)
Comment 2 Tuukka Tolvanen (sp3000) 2012-03-07 05:57:12 PST
Reverting this changeset seems to help:

changeset:   9107:75840841cc21
user:        David Bienvenu <bienvenu@nventure.com>
date:        Wed Jan 04 08:40:48 2012 -0800
summary:     fix deadlock in imap ssl calling isAlive, r=standard8, bug 711787
 mailnews/imap/src/nsImapProtocol.cpp |  27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)
Comment 3 David :Bienvenu 2012-03-07 08:21:57 PST
Have you changed the number of connections to cache with that server from the default of 5?

Next time this happens, we need more of the main thread's stack and info from the other threads' stacks; in particular, there should be another thread doing something with the IMAP protocol object.

We can't revert the other fix since that hang was a lot worse. I've never seen this hang myself.
Comment 4 Tuukka Tolvanen (sp3000) 2012-03-07 11:33:06 PST
max_cached_connections is 4; timeout 29
Comment 5 Tuukka Tolvanen (sp3000) 2012-03-07 14:01:49 PST
Created attachment 603846
(gdb) thread apply all bt

This machine has the default of 5 cached connections, FWIW.
Comment 6 WADA 2012-03-08 23:13:09 PST
Is the following an effective recovery procedure when the problem occurs?
  Switch to Work Offline mode, switch back to Work Online mode, then open the folder again.

Do you have automatic new-mail checking enabled? Do you have IDLE command use enabled?
Does disabling IDLE command use reduce how often the problem occurs?
Comment 7 David :Bienvenu 2012-03-09 07:37:13 PST
Ah, thanks. OK, thread 30 is trying to retry a URL, probably because the server or network dropped a connection (or perhaps a loadgroup was cancelled, which killed the connection):

#5  0x00007ffff24c7ae5 in (anonymous namespace)::DispatchSyncRunnable (
    r=0x7fffc511b780)
    at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:308
#6  0x00007ffff24cab48 in ImapServerSinkProxy::PrepareToRetryUrl (
    this=0x7fffc507d740, a1=0x7fffc51ec000, a2=0x7fffcb8febf0)
    at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:464
#7  0x00007ffff2463e6b in nsImapProtocol::RetryUrl (this=0x7fffcd9b1800)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1876
#8  0x00007ffff2461e19 in nsImapProtocol::ImapThreadMainLoop (
    this=0x7fffcd9b1800)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1361
#9  0x00007ffff2460b31 in nsImapProtocol::Run (this=0x7fffcd9b1800)

And at just the wrong time, the UI thread is trying to find a connection it can use, which causes contention over a PR_CEnterMonitor(this) on the protocol object. I'll have to think about this. I think the IMAP thread shouldn't hold the monitor while calling into the UI thread through the synchronous sink proxy runnables; that might not be too hard to fix.
Comment 8 David :Bienvenu 2012-03-09 08:12:06 PST
I suspect we don't need to use the monitor at all in this method. All the member variables we're accessing should be safe to access from the IMAP thread. The server object protects access to connections with its own monitor.
Comment 9 David :Bienvenu 2012-03-09 17:02:13 PST
Created attachment 604568
possible fix

I'll request a try server build with this patch, but if you can do your own builds, here's the patch.
Comment 10 Tuukka Tolvanen (sp3000) 2012-03-13 05:20:35 PDT
Comment on attachment 604568
possible fix

Yeah, this fixes it for me; I haven't hit any other issues with it in the past day.
Comment 11 David :Bienvenu 2012-03-13 07:55:11 PDT
http://hg.mozilla.org/comm-central/rev/a9f0e769a175
Comment 12 David :Bienvenu 2012-03-13 07:57:30 PDT
Comment on attachment 604568
possible fix

[Approval Request Comment]
User impact if declined: occasional hangs
Testing completed (on c-c, etc.): try server build fixed issue for reporter
Risk to taking this patch (and alternatives if risky): slight risk of race conditions, though hangs are much more likely without it
Comment 13 Mark Banner (:standard8, afk until Dec) 2012-03-20 07:52:39 PDT
Comment on attachment 604568
possible fix

[Triage Comment]
This already landed in time for 13. The requests should have let it go into 12; somehow I missed that. So a=me for comm-beta.
Comment 14 Mark Banner (:standard8, afk until Dec) 2012-03-20 07:55:58 PDT
Checked in:

http://hg.mozilla.org/releases/comm-beta/rev/151697a4635b
Comment 15 Ludovic Hirlimann [:Usul] 2012-03-26 08:46:14 PDT
*** Bug 739251 has been marked as a duplicate of this bug. ***
Comment 16 Wayne Mery (:wsmwk, NI for questions) 2012-03-26 15:19:46 PDT
*** Bug 738562 has been marked as a duplicate of this bug. ***
Comment 17 Ludovic Hirlimann [:Usul] 2012-03-27 01:21:25 PDT
*** Bug 738930 has been marked as a duplicate of this bug. ***
Comment 18 Paul miranda 2012-03-27 06:52:14 PDT
Is it possible to apply this patch to 11.0?
Comment 19 Mark Banner (:standard8, afk until Dec) 2012-03-27 10:36:23 PDT
Comment on attachment 604568
possible fix

[Triage Comment]
a=Standard8. As per the drivers meeting, we've decided to spin an 11.0.1 for this and another issue.

I've landed it on comm-release already:

http://hg.mozilla.org/releases/comm-release/rev/832c448e5d0a
Comment 20 Wayne Mery (:wsmwk, NI for questions) 2012-03-27 12:14:22 PDT
*** Bug 739251 has been marked as a duplicate of this bug. ***
Comment 21 David :Bienvenu 2012-03-27 14:01:20 PDT
*** Bug 739781 has been marked as a duplicate of this bug. ***
Comment 22 Wayne Mery (:wsmwk, NI for questions) 2012-03-28 11:00:41 PDT
I've marked the topic in https://getsatisfaction.com/mozilla_messaging/topics/problems_with_mozilla_11_0 to reflect solution in 11.0.1
Comment 23 :aceman 2012-03-29 05:46:39 PDT
*** Bug 740298 has been marked as a duplicate of this bug. ***
Comment 24 John Center 2012-04-02 06:41:49 PDT
*** Bug 739997 has been marked as a duplicate of this bug. ***
Comment 25 David :Bienvenu 2012-04-02 07:05:10 PDT
*** Bug 728740 has been marked as a duplicate of this bug. ***
Comment 26 Wayne Mery (:wsmwk, NI for questions) 2012-04-02 20:14:19 PDT
*** Bug 739688 has been marked as a duplicate of this bug. ***
Comment 27 :Irving Reid (No longer working on Firefox) 2013-01-30 12:17:57 PST
*** Bug 713624 has been marked as a duplicate of this bug. ***
