Closed Bug 420744 Opened 16 years ago Closed 15 years ago

Shutdown fails with 100% cpu load when auto-complete over LDAP is used

Categories

(Thunderbird :: Message Compose Window, defect)

defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 3.0rc1

People

(Reporter: wsmwk, Assigned: Bienvenu)

References

Details

(Keywords: fixed-seamonkey2.0, hang, Whiteboard: [no l10n impact])

Attachments

(7 files)

hangs on trunk are far more frequent than bug 340657, so this might be a new problem, or simply something tickling 340657 to happen more often. frequency is more often than not. happening for several weeks(?). unable so far to determine cause. Presently on version 3.0a1pre (2008022803). prior to that 20080204.

I don't use ldaps.  will attach an imap trace to show whether imap is all closed.
bug 340657 is incorrect. Not sure what bug# I had in mind.

xref bug 283823 MailNews "hangs" when it should be closing: CPU goes to 18%
this may be a dupe of bug 382446
ldap directory defined, ldap port 389

1. start thunderbird
2. compose window
3. type address in compose window that results in some condition (1)
4. file|exit
5. click don't save

result: hang, high cpu

reproducible: sometimes
note: in my daily usage of 1+hours, 90+% of my shutdowns are hangs.
currently running version 3.0a1pre (2008032302)

(1) I'm not yet sure what that condition is - lookup failure?
Usually I seen this problems when TB unable to close connection because of network problems. So it may hang forever or even drain cpu cycles.
Depends on: 382446
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
I haven't seen my shutdown hangs since bug 382446 patch landed. Will leave open a little longer yet though.
David, I can still reproduce this pretty often (but not 100%):

1. My mozilla.com account is set to be the default account.
2. From my Gmail account, which is set to use my global LDAP autocomplete (ldap.mozilla.org), composing a new message using LDAP autocomplete to self, subject of "test", body "test", and sending.  Auto spell-check comes on, finds no corrections needed, and sends it.
3. If I do a Thunderbird | Quit Shredder, I get the "operation cannot be complete...secure connection" error, and Thunderbird hangs...

Build ID: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.0.2pre) Gecko/2008070503 Thunderbird/3.0a2pre

Would a Mac Sample help?  I've got one.
Attached file Giant log file
Sorry for the giant logfile -- I CTRL+C'd from Terminal (from which I had launched Thunderbird.app just after setting NSPR_LOG_MODULES=IMAP:5, etc), and the following lines were the last output before the hang:

387587584[1b2a02c0]: ImapThreadMainLoop leaving [this=17184c00]
40736768[1b264c00]: ImapThreadMainLoop leaving [this=268fe00]
41136640[162b65f0]: ImapThreadMainLoop leaving [this=273aa00]
40209408[1aed9130]: ImapThreadMainLoop leaving [this=2341200]
Stephen looks like your problem in bug 206018
(In reply to comment #5)
> I haven't seen my shutdown hangs since bug 382446 patch landed. Will leave open
> a little longer yet though.

encountered today - first time in 3 weeks I've seen this:
* XP
* 20080716 build
Wayne, debugging symbols are real helper here in case you know how to debug process.
yeah, i screwed up trying to symbols
Well make sure all connection are REALLY closed, netstat helps a lot too ;). 
I'm facing this issue with even today's latest build. I've a e-mail account with LDAP address book configured. I opened the compose window, typed in some addresses, closed the compose window, then closed the Shredder. It hangs. I force-quit it. And got the hang report details (Attached).
Let me know if you need any specific information from me. 
Vinay, do you see this using current trunk?
My problems are gone, I think.
I see this still, using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.1b3pre) Gecko/20090210 Shredder/3.0b2pre, when I've just finished sending an email and then I shut down.

In my case, I have "Clean up (Expunge) Inbox on exit" checked, and I'm running over SSL.
Sorry: I really think that it's more the "Empty Trash on exit" pref, which I've enabled, that's causing this behavior.
Flags: blocking-thunderbird3?
Bienvenu: is there anything bad in the logfile in https://bugzilla.mozilla.org/show_bug.cgi?id=420744#c7 that might explain this?

I see this after doing a File | Empty Trash, a message send, and then a File | Exit.
(In reply to comment #18)
> Sorry: I really think that it's more the "Empty Trash on exit" pref, which I've
> enabled, that's causing this behavior.

Stephen, can you reproduce this issue consistently? If yes, can you disable "Empty Thrash on exit" pref and then try reproducing it again?

I never had this option enabled and I've seen hang on exit sometime last week but not today. I'm on nightly builds so can't say for sure if the issue got fixed or if I've not hit the exact scenario for it recur.
Vinay: I disabled "Empty Trash on Exit" and it still happens:

http://screencast.com/t/RSASpJppq
Dtephen, Vinay: does this happen without LDAP enabled? Comment 0 explicitly states that this bug was raised as an issue without LDAP enabled, so if you have problems with LDAP then we need to get those covered in a different bug.
LDAP was in bug 382446, so I guess we should file a new bug, since it's been a while since it was resolved fixed?
Stephen, LDAP over SSL was particularly the issue, and might still be...
Sorry, but this just doesn't feel like a blocker yet. Odd though it is to put qawanted on a bug with two of our best QA people already on it, that's what it needs before it has a hope of being a blocker: at this point, it's really UNCO.

bienvenu: any thoughts about what they can be doing to give us an idea of what this is, or at least whether it's LDAP, IMAP, or something else entirely?
Flags: blocking-thunderbird3?
Keywords: qawanted
doing a netstat when shutdown hangs can be interesting - if there are open ssl ldap connections, that's a clue.
foo:~ stephend$ netstat -n | grep "443"
tcp4       0      0  192.168.0.198.59736    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59735    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59734    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59733    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59732    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59730    209.225.35.36.443      ESTABLISHED
tcp4       0      0  192.168.0.198.59727    74.125.19.17.443       ESTABLISHED
tcp4       0      0  192.168.0.198.59607    74.125.19.17.443       ESTABLISHED

David, that's dumped while Shredder hangs; after I force-quit Shredder, I do a netstat -n | grep "443" again, and this time all port-443 sessions are terminated.
https? Do you have any rss feeds that are read over https?
(In reply to comment #28)
> https? Do you have any rss feeds that are read over https?

I don't, no; does the logfile in comment 29 help at all?  Maybe not without hostnames, I guess.  If you need those, I can do a netstat -a next; let me know.
going to https://209.225.35.36 tells me that it wants a cert for itss.gsa.gov

hostnames might help figure out what host it's connecting to, yes.
(In reply to comment #31)
> going to https://209.225.35.36 tells me that it wants a cert for itss.gsa.gov
> 
> hostnames might help figure out what host it's connecting to, yes.

Hi David -

Turns out that netstat doesn't really help, and I really wanted:

foo:~ stephend$ lsof -i -P
COMMAND    PID     USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
Tunnelbli  115 stephend   10u  IPv4 0x6684410      0t0  UDP *:49155
Itiva\x20  116 stephend   13u  IPv4 0x6685860      0t0  UDP *:49153
Itiva\x20  116 stephend   18u  IPv4 0x6cd0228      0t0  TCP *:25111 (LISTEN)
Itiva\x20  116 stephend   19u  IPv4 0x6cd0920      0t0  TCP *:25112 (LISTEN)
ARDAgent   237 stephend   16u  IPv4 0x6e37d40      0t0  UDP *:3283
AppleVNCS  239 stephend   15u  IPv4 0x72a1c9c      0t0  TCP *:5900 (LISTEN)
thunderbi 5651 stephend   47u  IPv4 0x7705710      0t0  TCP 192.168.0.198:53578->pm-ns01.mozilla.org:636 (ESTABLISHED)

As you can see, there's still a TCP session to the LDAP server on port 636.
Depends on: 487965
I was able to reproduce with 20090408 build. 
(wonder if it's as easy to recreate before 20090331/bug 485995?)

1. open 3pane, open AB, open and compose message (but don't send)
2. empty trash
3. send message (which prompted for smtp password)
4. file exit on 3pane
AB window is left open
5. attempt to close AB window

windows dialog reports 
 End Program - AB
 program not responding
 End Now   //  Cancel

netstat reports open imap connections and no ldap afaict

Active Connections
  Proto  Local Address          Foreign Address        State
  TCP    fs0011110aa346:3485    localhost:3486         ESTABLISHED
  TCP    fs0011110aa346:3486    localhost:3485         ESTABLISHED
  TCP    fs0011110aa346:3489    localhost:3490         ESTABLISHED
  TCP    fs0011110aa346:3490    localhost:3489         ESTABLISHED
  TCP    fs0011110aa346:3491    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3495    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3496    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3503    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3631    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3632    rain.cc.lehigh.edu:993  CLOSE_WAIT
  TCP    fs0011110aa346:3671    rain.cc.lehigh.edu:993  CLOSE_WAIT


one second attempt at similar steps (minus AB being open) did not result in hang.
Flags: wanted-thunderbird3?
Keywords: qawanted
Not sure if there is anything else being sought in regards to this bug or not, but I've been having similar issues for a couple of months now on Mac OSX 10.5.6 with nightlies. Same symptoms as described above, same artifacts as well:

silent:~ jay$ netstat -n | grep 993
tcp4       0      0  10.181.70.70.52350     10.178.3.61.993        CLOSE_WAIT
tcp4       0      0  10.181.70.70.52349     10.178.3.61.993        CLOSE_WAIT
tcp4       0      0  10.181.70.70.52348     10.178.3.61.993        CLOSE_WAIT
tcp4       0      0  10.181.70.70.51229     10.178.3.61.993        CLOSE_WAIT
tcp4       0      0  10.181.70.70.51188     10.178.3.61.993        CLOSE_WAIT
For the record after I posted on 4/30 I have NEVER had another hang. This is with corporate Exchange IMAP.
(In reply to comment #37)
> For the record after I posted on 4/30 I have NEVER had another hang. This is
> with corporate Exchange IMAP.

If comment 36 was a nightly build prior to 2009-04-18, and you ran something higher than that since then, then likely fixed by bug 487965 comment 11.
I've been running nightlies again since Feb '09 through today. I saw the problem in comment 36 up until 4/30, and would see it maybe every other day for a month or two before that (call it all of March/April). I had (have) been waiting for it to happen again to do more testing/profiling on it, but haven't seen it since 4/30 for me.
This is still happening for me on the nightly -- since I am forced to force-quit, I can't submit a crash report. :(
Flags: blocking-thunderbird3?
This bug is rather ill-defined and will likely morph to being a meta. The real action has moved into other bugs. So you will be better off to visiting one of them. Or file a new one if none match your symptoms - the primary symptoms being amount of cpu and open tcp connections (how many and what protocol).

Currently open bugs:
bug 459265 Thunderbird.exe in Task Manager Processes after closing application (with Lightning 0.9) - hangs during shutdown
bug 495551 hang on shutdown, zero cpu, few (1-2) imap connections - from imap url initiated during shutdown (modal alert from imap thread / proxy call to UI thread)
bug 497059 Hang with unreliable wifi connection, either when re-connecting to email server, or on shutdown
bug 494014 shutdown hang, high cpu, no open imap connections 

query including FIXED bugs: https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&short_desc_type=anywordssubstr&short_desc=shut+quit+close&product=MailNews+Core&product=Thunderbird&keywords_type=allwords&keywords=hang&resolution=FIXED&resolution=---&chfieldto=Now
wsm: excellent point about this bug being insufficiently concrete.  I've removed the nominations from this one, and made some comments in the other bugs.
Flags: wanted-thunderbird3?
Flags: blocking-thunderbird3?
using Henrik's excellent summary

transplanting bienvenu's bug 502558 comment 9 ...

bug 382446 was the one I was thinking of - it was marked fixed because the main
connection leak was fixed, but afterwards, we discovered that there's still a
connection left open if you shut down immediately after sending a message. I
think this bug can stand for the remaining issue, because I can't find a bug
that describes the issue - davida ran into it, and we (Standard8 and I)
debugged it, but I don't know if the info ended up in a bug.
Summary: thunderbird hangs on shutdown → Shutdown fails with 100% cpu load when auto-complete over LDAP is used
So what info is needed from QA? I wonder that no-one can reproduce this hang on shutdown. It happens all the time for me.
Stephen - here's your hang on shutdown. Henrik, we know how to recreate it. But fixing it requires re-architecting the ldap threading code in scary ways. That discussion may have unfortunately taken place in IRC and not bugzilla. Perhaps there's some hack we can involving listening for quit-application requested and giving js gc more time to clean up. But the issue as I remember it was that closing the compose window doesn't clean up the autocomplete's ldap connection quickly enough before shutdown comes along.
We're going to try once more to see if we can fix this for 3.0
Assignee: nobody → bienvenu
Flags: blocking-thunderbird3+
Target Milestone: --- → Thunderbird 3.0rc1
(In reply to comment #47)
> We're going to try once more to see if we can fix this for 3.0

So I was trying to fix this in bug 343332 - as rather than having a big while loop all the events on that thread would be posted as required. However as bug 343332 comment 1 says at the end it still didn't quite work (shutting down immediately after an ldap search would hang) and I think it probably needed some kind of xpcom shutdown observer.
Whiteboard: [no l10n impact]
Here's my current thinking about this - instead of messing with the ldap connection code, I'd like to avoid the threading issues, and just add a service that keeps track of ldap autocomplete sessions, and is a shutdown observer. If it gets notified of shutdown and there are existing autocomplete sessions, it would pump events until the autocomplete sessions have been garbage collected, or a few seconds have passed (don't want to block shutdown forever). In theory, we could do this for ldap connections, but in actual fact, it's ldap autocomplete sessions that are causing the problem.
So far, this approach hasn't worked...perhaps pumping events doesn't make js garbage collection work...
Attached patch proposed fixSplinter Review
This basically makes us cleanup our ldap connections as early as possible in the process of shutting down a compose window.

If the user cancels the close of the compose window, we will have already cleaned up the ldap session, but we'll recreate it when the user tries to do autocomplete, because of the "gSetupLdapAutocomplete = false;" line I added when we cleanup.
Attachment #405055 - Flags: review?(bugzilla)
SM might be interested in a similar fix (even though it's a bit of a hack).
Whiteboard: [no l10n impact] → [no l10n impact][has patch for review standard8]
Attached patch SeaMonkey portSplinter Review
Sounds good to me, but without the trailing whitespace on line 35 ;-)
Attachment #405057 - Flags: review?
Neil, did you mean to ask review of someone in particular?
Comment on attachment 405057 [details] [diff] [review]
SeaMonkey port

Odd, I don't remember seeing the usual "match failed, loser" message...
Attachment #405057 - Flags: review? → review?(bugzilla)
Comment on attachment 405055 [details] [diff] [review]
proposed fix

Looks good. Definitely the problem I think we've been having. r=Standard8
Attachment #405055 - Flags: review?(bugzilla) → review+
Component: General → Message Compose Window
QA Contact: general → message-compose
fix checked in.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [no l10n impact][has patch for review standard8] → [no l10n impact]
Comment on attachment 405057 [details] [diff] [review]
SeaMonkey port

r+aSM2.0 = Standard8
Attachment #405057 - Flags: review?(bugzilla) → review+
Thanks a ton, David! Works perfectly with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5pre) Gecko/20091009 Shredder/3.0pre ID:20091009073703

Marking as verified fixed.
Status: RESOLVED → VERIFIED
Happy to report that this also fixed my problem (comment 46); thanks, David!
Pushed changeset to 115cfab35745 to comm-central for SeaMonkey port.
David, today I have seen this problem again with a Lanakai build from 20100513. The process didn't shutdown and I got the warning dialog that a secure communication channel could not be shutdown. I had to kill Thunderbird.
Henrik, did you do an ldap autocomplete right before shutting down? There's still a race condition in the ldap code that would involve quite a bit of rejiggering of the ldap code to fix. It would be good to know, in future, if it's an ldap or imap connection that's still open at shutdown time.
I'm sure it was an LDAP query but I can't reproduce it right now. Do we have a bug filed for the remaining issue? I don't wanna reopen that one.
(In reply to comment #64)
> I'm sure it was an LDAP query but I can't reproduce it right now. Do we have a
> bug filed for the remaining issue? I don't wanna reopen that one.

Its probably loosely covered by bug 343332 and maybe bug 452232. Certainly bug 343332 is where the work should probably be done.
yes, thx, bug 343332 is the one I was thinking of.
You need to log in before you can comment on or make changes to this bug.