Closed Bug 733731 Opened 12 years ago Closed 12 years ago

hang when switching folders on IMAP server

Categories

(MailNews Core :: Networking: IMAP, defect)

Hardware: x86_64
OS: Linux
Type: defect
Priority: Not set
Severity: critical

Tracking

(thunderbird11- fixed, thunderbird12+ fixed)

RESOLVED FIXED
Thunderbird 13.0

People

(Reporter: tuukka.tolvanen, Assigned: Bienvenu)

References

(Regression)

Details

(Keywords: hang, regression, Whiteboard: [gs][gssolved])

Attachments

(2 files)

Fairly often (about 1 time in 10), when I hit space or n and say yes to the prompt sheet to advance to the next folder with unread messages (or just click on a folder to switch to it), this results in a hang. The server is Exchange over IMAP.

It has been hanging on central/central since sometime in January, I think. It is a bit hard to narrow down because the reproducibility is somewhat random.

Actually, the STR (steps to reproduce) could be something like: read stuff, idle for a minute, switch to a folder with no unread messages, hit space and say yes to advance. Regression range:

2012-01-04-03-00-26-comm-central good 0/4
http://hg.mozilla.org/mozilla-central/rev/200a8d1fb452
http://hg.mozilla.org/comm-central/rev/fd9f0ac2bcaf

2012-01-05-03-00-25-comm-central bad 2/2
http://hg.mozilla.org/mozilla-central/rev/4795500b7c1d
http://hg.mozilla.org/comm-central/rev/6ad18d15c741
(gdb) bt
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007ffff7bca36f in _L_lock_1145 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff7bca2ba in __pthread_mutex_lock (mutex=0x7ffff6d48288)
    at pthread_mutex_lock.c:101
#3  0x00007ffff62876f9 in PR_Lock ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#4  0x00007ffff6287d15 in PR_EnterMonitor ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#5  0x00007ffff6278501 in PR_CEnterMonitor ()
   from /usr/lib/x86_64-linux-gnu/libnspr4.so
#6  0x00007ffff23b7983 in nsImapProtocol::PseudoInterruptMsgLoad (
    this=0x7fffc95e9000, aImapFolder=0x7fffcbab3440, 
    aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1279
#7  0x00007ffff2371c64 in nsImapIncomingServer::PseudoInterruptMsgLoad (
    this=0x7fffd6ac9be0, aImapFolder=0x7fffcbab3440, 
    aMsgWindow=0x7fffd6ad4f70, interrupted=0x7ffffffef70f)
    at comm-central/mailnews/imap/src/nsImapIncomingServer.cpp:2246
#8  0x00007ffff23eaac4 in nsImapService::GetMessageFromUrl (
    this=0x7fffd0c87e20, aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, 
    aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, 
    aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, 
    aConvertDataToText=false, aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:1084
#9  0x00007ffff23ea641 in nsImapService::FetchMessage (this=0x7fffd0c87e20, 
    aImapUrl=0x7fffb6ef8c00, aImapAction=268435480, 
    aImapMailFolder=0x7fffcbab3440, aImapMessage=0x7fffcbab35f8, 
    aMsgWindow=0x7fffd6ad4f70, aDisplayConsumer=0x7fffd23b3908, 
    messageIdentifierList=..., aConvertDataToText=false, 
    aAdditionalHeader=..., aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:1046
#10 0x00007ffff23e7728 in nsImapService::DisplayMessage (this=0x7fffd0c87e20, 
    aMessageURI=0x7fffb6b84e88 "imap-message://domain%5Cuser@host.company.com/folder#86789", 
    aDisplayConsumer=0x7fffd23b3908, aMsgWindow=0x7fffd6ad4f70, 
    aUrlListener=0x0, aCharsetOverride=0x0, aURL=0x0)
    at comm-central/mailnews/imap/src/nsImapService.cpp:580

(Not sure why I seem to be getting the system NSPR. Anyhow, it hangs in the mozilla.org nightlies as well.)
Reverting this changeset seems to help:

changeset:   9107:75840841cc21
user:        David Bienvenu <bienvenu@nventure.com>
date:        Wed Jan 04 08:40:48 2012 -0800
summary:     fix deadlock in imap ssl calling isAlive, r=standard8, bug 711787
 mailnews/imap/src/nsImapProtocol.cpp |  27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)
Blocks: 711787
Severity: normal → critical
Keywords: hang, regression
Have you changed the number of connections to cache with that server from the default of 5?

Next time this happens, we need more of the main thread's stack and info from the other threads' stacks. In particular, there should be another thread doing something with the imap protocol object.

We can't revert the other fix since that hang was a lot worse. I've never seen this hang myself.
max_cached_connections is 4; timeout 29
this box has the default 5 cached connections fwiw
Is the following an effective recovery procedure when the problem occurs?
  Go to Work Offline mode, go back to Work Online mode, then open the folder again.

Have you enabled automatic new mail checks? Have you enabled use of the IDLE command?
Is the problem less frequent when use of the IDLE command is disabled?
ah, thx. Ok, thread 30 is trying to retry a url, probably because the server or network dropped a connection (or perhaps a loadgroup was cancelled which killed the connection):

#5  0x00007ffff24c7ae5 in (anonymous namespace)::DispatchSyncRunnable (
    r=0x7fffc511b780)
    at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:308
#6  0x00007ffff24cab48 in ImapServerSinkProxy::PrepareToRetryUrl (
    this=0x7fffc507d740, a1=0x7fffc51ec000, a2=0x7fffcb8febf0)
    at comm-central/mailnews/imap/src/nsSyncRunnableHelpers.cpp:464
#7  0x00007ffff2463e6b in nsImapProtocol::RetryUrl (this=0x7fffcd9b1800)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1876
#8  0x00007ffff2461e19 in nsImapProtocol::ImapThreadMainLoop (
    this=0x7fffcd9b1800)
    at comm-central/mailnews/imap/src/nsImapProtocol.cpp:1361
#9  0x00007ffff2460b31 in nsImapProtocol::Run (this=0x7fffcd9b1800)

and at just the wrong time, the UI thread is trying to find a connection it can use, which causes contention over a PR_CEnterMonitor(this) on the protocol object. I'll have to think about this. I think the imap thread shouldn't be holding onto the monitor when calling into the UI thread with the sink proxy runnable calls, which might not be too hard to fix.
I suspect we don't need to use the monitor at all in this method. All the member vars we're accessing should be safe to access from the imap thread. The server object protects accesses to connections with its own monitor.
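To illustrate the lock cycle described above, here is a minimal, self-contained C++ sketch. It is not the Thunderbird code and not the attached patch: `Protocol`, `uiThreadNeedsMonitor`, and `retryUrl` are hypothetical names, `std::timed_mutex` stands in for the NSPR monitor entered via PR_CEnterMonitor(this), and `std::async` stands in for a synchronous proxy call onto the UI thread. For brevity the UI-side work itself takes the monitor, which folds "UI thread blocked on the monitor" and "UI thread cannot run the runnable" into one step. A timed lock is used only so the problematic pattern reports failure instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>

// Hypothetical stand-in for the protocol object and its monitor.
struct Protocol {
    std::timed_mutex monitor;
};

// Stand-in for the UI-thread side (e.g. looking for a usable connection),
// which needs the protocol monitor. Returns false if the monitor could not
// be acquired within the timeout; real code would simply keep blocking.
bool uiThreadNeedsMonitor(Protocol& p, std::chrono::milliseconds timeout) {
    std::unique_lock<std::timed_mutex> lock(p.monitor, timeout);
    return lock.owns_lock();
}

// Stand-in for the IMAP-thread side: it makes a synchronous call into the
// "UI thread" and blocks until that call finishes.
bool retryUrl(Protocol& p, bool holdMonitorDuringDispatch) {
    std::unique_lock<std::timed_mutex> lock(p.monitor);
    assert(lock.owns_lock());
    // ... work on state owned by the IMAP thread would go here ...
    if (!holdMonitorDuringDispatch) {
        lock.unlock();  // release the monitor before the synchronous call
    }
    std::future<bool> uiResult = std::async(std::launch::async, [&p] {
        return uiThreadNeedsMonitor(p, std::chrono::milliseconds(200));
    });
    return uiResult.get();  // block, as a synchronous dispatch would
}
```

With a `Protocol p;`, calling `retryUrl(p, true)` should return false (the UI side times out while the caller still holds the monitor), and `retryUrl(p, false)` should return true. Releasing the monitor before the synchronous call is one way to break the cycle; not taking the monitor in that method at all, as suggested above, is another.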
Attached patch: possible fix
I'll request a try server build w/ this patch, but if you can do your own builds, here's the patch.
Assignee: nobody → dbienvenu
Comment on attachment 604568
possible fix

yeah, this fixes it for me, didn't get any other issues in the past day with it
Attachment #604568 - Flags: feedback+
Attachment #604568 - Flags: review?(neil)
Attachment #604568 - Flags: review?(neil) → review+
http://hg.mozilla.org/comm-central/rev/a9f0e769a175
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 13.0
Comment on attachment 604568
possible fix

[Approval Request Comment]
User impact if declined: occasional hangs
Testing completed (on c-c, etc.): try server build fixed issue for reporter
Risk to taking this patch (and alternatives if risky): slight risk of race conditions though hangs are much more likely
Attachment #604568 - Flags: approval-comm-aurora?
Attachment #604568 - Flags: approval-comm-aurora? → approval-comm-aurora+
Comment on attachment 604568
possible fix

[Triage Comment]
This already landed in time for 13. The requests should have let it go into 12, but somehow I missed that. So a=me for comm-beta.
Attachment #604568 - Flags: approval-comm-aurora+ → approval-comm-beta+
Summary: hang switching folders → hang when switching folders on IMAP server
Is it possible to apply this patch to 11.0?
Comment on attachment 604568
possible fix

[Triage Comment]
a=Standard8. As per the drivers meeting, we've decided to spin an 11.0.1 for this and another issue.

I've landed it on comm-release already:

http://hg.mozilla.org/releases/comm-release/rev/832c448e5d0a
Attachment #604568 - Flags: approval-comm-release+
I've marked the topic in https://getsatisfaction.com/mozilla_messaging/topics/problems_with_mozilla_11_0 to reflect the fix in 11.0.1
Whiteboard: [gs][gssolved]
Blocks: 739997
No longer blocks: 711787
Regressed by: 711787