Closed Bug 476960 Opened 11 years ago Closed 11 years ago

IMAP Hang involving UI thread

Categories

(MailNews Core :: Networking: IMAP, defect, major)

x86
All
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 3.0b2

People

(Reporter: gkw, Assigned: Bienvenu)

References

Details

(Keywords: hang)

Attachments

(7 files, 2 obsolete files)

Attached file all-thread backtrace
I've finally trapped down this hard-to-diagnose IMAP hang. When I came out of standby, sometimes TB will have a spinning bob for about 30s before I can use TB. This time, I trapped it using gdb and the attached info is there.
Flags: blocking-thunderbird3?
taking - we shouldn't be issuing imap protocol and parsing responses on the UI thread - it's a recipe for disaster. And when the connection has timed out, we shouldn't be doing all that stuff anyway.
Assignee: nobody → bienvenu
Summary: Hang involving UI thread → IMAP Hang involving UI thread
Attached file second backtrace
when i pressed continue, there were another 3 traps by gdb. Here's the thread apply all backtrace of the first one of these 3.
Related to bug 434642?
bienvenu, do you need the IMAP logs?

One of them ends like this (during the hang):

-1333592064[16e62bd0]: ReadNextLine [stream=16e630cc nb=10 needmore=0]
-1333592064[16e62bd0]: 160bb000:imap.gmail.com:S-INBOX:CreateNewLineFromSocket: + idling
-1603684576[70a8d0]: WARNING: GetGeckoKeyCodeFromChar has failed.: file /Users/skywalker/comm-central/mozilla/widget/src/cocoa/nsChildView.mm, line 4320
-1603684576[70a8d0]: WARNING: Preventing load of VerifiedDownloadPlugin.plugin (see bug 436575): file /Users/skywalker/comm-central/mozilla/modules/plugin/base/src/nsPluginsDirDarwin.cpp, line 163
-1603684576[70a8d0]: 160bb000:imap.gmail.com:S-INBOX:SendData: DONE
-1603684576[70a8d0]: ReadNextLine [stream=16e630cc nb=0 needmore=1]
-1603684576[70a8d0]: 160bb000:imap.gmail.com:S-INBOX:CreateNewLineFromSocket: clearing IMAP_CONNECTION_IS_OPEN - rv = 804b0014
Attached patch proposed fix (obsolete) — Splinter Review
Gary, here's a patch to try - basically, this makes it so the UI thread no longer blocks when trying to shutdown imap connections. This may cause us not to gracefully log out of all cached imap connections at shutdown, depending on how fast the rest of shutdown goes - I need to think about that a little...but on the plus side, bonking the go offline button will no longer block the UI for several seconds while we close imap connections. And it should help with your problem.
OS=ALL, and not a regression, correct?

can this generalized as a potential cause of crashes which have TellThreadToDie in the stack, such as nsProxyEventObject::CallMethod == bug 470808? (topcrash for 3.0b1, but on trunk, only 2 in last 8 days)
I'd say OS=ALL, right. Probably not a regression, though it might be from 1.0 or something...it might be the cause of a crash, though I can't say for sure.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: --- → Thunderbird 3.0b3
(In reply to comment #5)
> Created an attachment (id=361385) [details]
> proposed fix
> 
> Gary, here's a patch to try - basically, this makes it so the UI thread no
> longer blocks when trying to shutdown imap connections. This may cause us not
> to gracefully log out of all cached imap connections at shutdown, depending on
> how fast the rest of shutdown goes - I need to think about that a little...but
> on the plus side, bonking the go offline button will no longer block the UI for
> several seconds while we close imap connections. And it should help with your
> problem.

This patch looks good -- I no longer get spinning bob hangs at school, now I get a bunch of "cannot connect to server" dialogs immediately when I just resume from standby, but that's to be expected since my wifi is not yet connected. Once everything connects, they're good to go.

Request review or something for checkin of patch?
Attachment #361385 - Flags: superreview?(neil)
Attachment #361385 - Flags: review?(bugzilla)
Attached file sample of another hang
bienvenu, I don't know if this hang is related, but it occurs with the patch.

STR:
- Send an email with 2 large attachments through Gmail SMTP.
- While it is sending, i.e. the dialog screen, go back to TB main window and press Get All Messages, i.e. cmd-shift-t. Shredder hangs, but gdb doesn't show anything so I took a sample through Activity Monitor.

I seem to recall a not-so-fast connection or one that is being throttled by the school during the sending of the email. Would this be related?
Comment on attachment 361385 [details] [diff] [review]
proposed fix

>+  /**
>+   * Tell thread to die - only call from IMAP thread
>+   *
>+   * @param aIsSafeToClose false if we're dropping a timed out connection.
>+   */
>+  void tellThreadToDie(in boolean aIsSafeToClose);
Are there supposed to be any public callers of this any more? I noticed a caller in nsImapIncomingServer::CloseConnectionForFolder which is invoked when a folder is renamed which looks as if it should switched. Additionally it might be better to rename the internal TellThreadToDie instead. This would mean that you don't have to change the interface ;-)

>+// this is to be called from the UI thread.
Worth an assertion?

> NS_IMETHODIMP
> nsImapProtocol::TellThreadToDie(PRBool isSafeToClose)
> {
>   nsresult rv = NS_OK;
>   // ** This routine is called from the ui thread and the imap protocol thread.
Fix the comment? (Also worth an assertion?) Also, I notice that all the callers pass PR_FALSE, except the new one that passes m_safeToCloseConnection; if you initialise that to 0 in the constructor then you can use it directly instead of passing it in explicitly.
Attached patch My idea didn't quite work... (obsolete) — Splinter Review
Spot the ugly static cast :-(
I'll add assertions and fix the comments, and fix CloseConnectionForFolder...
Flags: blocking-thunderbird3+ → blocking-thunderbird3?
OS: All → Mac OS X
Hardware: All → x86
Target Milestone: Thunderbird 3.0b3 → ---
[01:51]   bienvenu> nth10sd - the ui thread was doing things like this:
[01:51]   bienvenu> #14 0x13d847dc in nsImapServerResponseParser::ParseIMAPServerResponse (this=0x160bb160, aCurrentCommand=0xfb87ca8 "DONE\r\n", aIgnoreBadAndNOResponses=0, aGreetingWithCapability=0x0) at /Users/skywalker/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp:233
[01:51]   bienvenu> #15 0x13d5d609 in nsImapProtocol::ParseIMAPandCheckForNewMail (this=0x160bb000, commandString=0x0, aIgnoreBadAndNOResponses=0) at /Users/skywalker/comm-central/mailnews/imap/src/nsImapProtocol.cpp:1758
[01:51]   bienvenu> the ui thread isn't supposed to read or write to the network
[01:51]   bienvenu> and that can cause deadlock
[01:51]  * nth10sd reads
[01:53]    nth10sd> bienvenu: http://pastebin.mozilla.org/620853 has:
[01:53]    nth10sd> #19 0x1495940d in MimeMessage_parse_eof (obj=0x1020a3b0, abort_p=0) at /Users/skywalker/comm-central/mailnews/mime/src/mimemsg.cpp:584
[01:53]  * nth10sd wonders if there's a similarity in the "parse"s
[01:53]   bienvenu> no, mime is only supposed to be called on the ui thread, and it is, in this case.
[01:54]    nth10sd> oh


The key causes of this IMAP hang are the ParseIMAPServerResponse and ParseIMAPandCheckForNewMail functions. (Thanks bienvenu for patiently guiding me on IRC)
Attachment #361385 - Attachment is obsolete: true
Attachment #361551 - Flags: superreview?(neil)
Attachment #361551 - Flags: review?(bugzilla)
Attachment #361385 - Flags: superreview?(neil)
Attachment #361385 - Flags: review?(bugzilla)
Attachment #361551 - Flags: superreview?(neil) → superreview+
Attached patch Castless versionSplinter Review
The code move looks safe, because calling Cancel on the mock channel will cause DeathSignalReceived to return true eventually anyway, but I'm not sure...
Attachment #361496 - Attachment is obsolete: true
Comment on attachment 361741 [details] [diff] [review]
Castless version

Gary, could you try running with this version of the patch instead to see if it works as well? I like this version of the patch well enough, but I'm a little worried about removing the check for safeToCloseConnection after the sleep in ImapThreadMainLoop()...
Flags: blocking-thunderbird3? → blocking-thunderbird3+
(In reply to comment #14)
> Created an attachment (id=361551) [details]
> fix comments, add assertions, fix folder rename case

(In reply to comment #15)
> Created an attachment (id=361741) [details]
> Castless version

Bienvenu, so I'm confused, I've been testing your patch id 361551 for quite a few days and it seems to work properly. I haven't tested Neil's patch 361741.

Should I test them together, or are they mutually exclusive such that I should test Neil's patch stand-alone only?
They are alternatives, mutually exclusive, so you'd need to back out my patch and apply Neil's instead. I'm just trying to choose between them.
(In reply to comment #18)
> They are alternatives, mutually exclusive, so you'd need to back out my patch
> and apply Neil's instead. I'm just trying to choose between them.

Sure, I have to test over the next few days.

/me goes to prepare a new debug build..
(In reply to comment #15)
> Created an attachment (id=361741) [details]
> Castless version
> 
> The code move looks safe, because calling Cancel on the mock channel will cause
> DeathSignalReceived to return true eventually anyway, but I'm not sure...

Neil, I have a compile error with your patch on the Mac:

chmod +x libimport.dylib
/Users/skywalker/objdir-tb/mozilla/config/nsinstall -L /Users/skywalker/objdir-tb/mailnews/import/build -m 755 libimport.dylib ../../../mozilla/dist/bin/components
: ../../../mozilla/dist/bin/components/libimport.dylib
make[3]: *** [libs_tier_app] Error 2
make[2]: *** [tier_app] Error 2
make[1]: *** [default] Error 2
make: *** [build] Error 2

real    48m19.073s
user    57m32.440s
sys     10m13.814s

(or is it just me?)
(In reply to comment #20)
> Neil, I have a compile error with your patch on the Mac:

I'm quite sure this is the case -- I did a reverse of Neil's patch and applied bienvenu's patch, this time the compile succeeded.
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case

switching review to Neil in the hopes of landing this today...but if you don't want to review it, feel free to switch it back...
Attachment #361551 - Flags: review?(bugzilla) → review?(neil)
Attachment #361551 - Flags: review?(neil) → review?(bugzilla)
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case

oops, standard8 thinks he might be able to look at this today.
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case

>+  /////////////////////////////////////////////////////////////////////////
>+  // IsBusy returns true if the connection is currently processing a url
>+  // and false otherwise.
>+  /////////////////////////////////////////////////////////////////////////

nit: I know you're just re-indenting, but can you change this to /**\n * IsBusy...\n */ format so that it is picked up by doxygen properly (same to the rest of the comments you've re-indented). Note: I'm not fussed about full documentation at this stage, just properly formatting what's there.

>+  /**
>+   * Tell thread to die - only call from IMAP thread
>+   *
>+   * @param aIsSafeToClose false if we're dropping a timed out connection.
>+   */
>+  void tellThreadToDie(in boolean aIsSafeToClose);

This seems unnecessary as an extra function to the UI thread - do we really need it?
>+
>+  /**
>+   * signalThreadToDie - called from the UI thread
>+   *
>+   * @param aIsSafeToClose false if we're dropping a timed out connection.
>+   */
>+  void signalThreadToDie(in boolean aIsSafeToClose);

>diff --git a/mailnews/imap/src/nsImapProtocol.cpp b/mailnews/imap/src/nsImapProtocol.cpp

>+nsImapProtocol::SignalThreadToDie(PRBool aIsSafeToClose)
>+{
>+  NS_WARN_IF_FALSE(NS_IsMainThread(),
>+                   "TellThreadToDie() should only be called from  UI thread");
>+  nsAutoCMonitor mon(this);
>+
>+  nsCOMPtr<nsIMsgIncomingServer> me_server = do_QueryReferent(m_server);
>+  if (me_server)
>+  {
>+      nsresult rv;
>+      nsCOMPtr<nsIImapIncomingServer>
>+          aImapServer(do_QueryInterface(me_server, &rv));
>+      if (NS_SUCCEEDED(rv))
>+          aImapServer->RemoveConnection(this);
>+      m_server = nsnull;
>+      me_server = nsnull;

nit: indentation needs fixing.
Attached patch combo-fixSplinter Review
this is a combination of Neil and my fixes - I left the static cast in because I'm not convinced we don't need it yet, and I'd like to put this in b2.

I changed the uuid because I changed TellThreadToDie to tellThreadToDie...
Attachment #362768 - Flags: review?(bugzilla)
Comment on attachment 362768 [details] [diff] [review]
combo-fix

>+  /**
>+   * Tell thread to die - only call from IMAP thread
>+   *
>+   * @param aIsSafeToClose false if we're dropping a timed out connection.
>+   */
>+  void tellThreadToDie(in boolean aIsSafeToClose);

The documentation is meant to say UI thread now...

>+  NS_WARN_IF_FALSE(NS_IsMainThread(),
>+                   "TellThreadToDie(aIsSafeToClose) should only be called from  UI thread");

nit: only one space before UI (two places)

r=me with them fixed. Lets give this a go and see how it goes (I've been seeing it as well btw).
Attachment #362768 - Flags: review?(bugzilla) → review+
Comment on attachment 362768 [details] [diff] [review]
combo-fix

carrying forward Neil's sr
Attachment #362768 - Flags: superreview+
OK, Gary, I hope you can go back to release builds with b2 :-)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.0b2
Depends on: 479011
Since this patch landed, both SeaMonkey and Thunderbird3.0 fail mozilla/_tests/xpcshell-simple/test_imap/unit/test_bug460636.js with that message:
*** TEST-UNEXPECTED-FAIL | e:/buildbot/win32-comm-1.9.1-check/build/mozilla/testing/xpcshell/head.js | SaveMessageToDisk did not complete within 10 seconds (incorrect messageURI?). ABORTING.

I don't think Thunderbird can release a b2 or SeaMonkey an a3 with unit tests failing on Windows...
I'm looking into the test failure - I can tweak the test so that it doesn't fail - the thing the test is checking actually succeeds - it's the removal of the saved file that fails, which I'm trying to figure out.
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case

I believe this patch is now obsolete.
Attachment #361551 - Flags: review?(bugzilla)
Gary, can you v. this is fixed in b2?
OS: Mac OS X → All
Fixed as far as I can tell in my <24h testing of b2 in Mac.
Status: RESOLVED → VERIFIED
Duplicate of this bug: 434642
Duplicate of this bug: 470808
Duplicate of this bug: 436774
You need to log in before you can comment on or make changes to this bug.