Last Comment Bug 382446 - Alert on shutdown when using ldaps address book - "operation can not be completed because of an internal failure. A secure network communication has not been cleaned up correctly."
: Alert on shutdown when using ldaps address book - "operation can not be compl...
Status: RESOLVED FIXED
:
Product: MailNews Core
Classification: Components
Component: Address Book (show other bugs)
: Trunk
: All All
: -- major (vote)
: mozilla1.9.1a1
Assigned To: David :Bienvenu
:
Mentors:
: 375667 416332 529991 (view as bug list)
Depends on:
Blocks: 420744
  Show dependency treegraph
 
Reported: 2007-05-30 07:27 PDT by Marc Sherman
Modified: 2014-12-30 05:08 PST (History)
12 users (show)
davida: blocking‑thunderbird3.0a2+
davida: blocking‑thunderbird3+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
screenshot of the alert (7.59 KB, image/png)
2007-05-30 07:27 PDT, Marc Sherman
no flags Details
fix for connection leak in ldap autocomplete session (1.64 KB, patch)
2008-06-30 13:19 PDT, David :Bienvenu
standard8: review+
neil: superreview+
Details | Diff | Splinter Review

Description Marc Sherman 2007-05-30 07:27:05 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Build Identifier: version 2.0.0.0 (20070326)

When an ldaps address book is configured in thunderbird, I get an Alert message box when I shut down thunderbird. The alert reads:

The operation can not be completed because cf (sic) an internal failure. A secure network communication has not been cleaned up correctly.

The ldaps address book does appear to function correctly, and after dismissing the alert, thunderbird appears to shut down cleanly.

Reproducible: Sometimes

Steps to Reproduce:
[note, this happens about 80% of the time -- I occasionally am able to quit without seeing the alert, but it happens more often than not.]

1. Configure ldaps address book
2. look up an address in it
3. quit thunderbird
Actual Results:  
Alert pops up before quitting

Expected Results:  
No alert
Comment 1 Marc Sherman 2007-05-30 07:27:50 PDT
Created attachment 266614 [details]
screenshot of the alert
Comment 2 Marc Sherman 2007-05-30 07:29:04 PDT
Argh. I guess I was transcribing from a shrunken view of the screen shot -- it doesn't say "cf", it says "of" correctly. :)
Comment 3 David :Bienvenu 2007-05-30 07:49:04 PDT
this is because we're not cleaning up an ldap connection before shut down.

Does this only happen in a session where you add the ldaps address book, or does it happen in subsequent sessions where you just look up an address in the ldap address book? Are you looking up the address via the compose window, or the address book window?
Comment 4 Marc Sherman 2007-05-30 07:52:19 PDT
It happens in subsequent sessions. I just tried a session where I only used the compose window, and another one where I only used the address book window, and it happened in both sessions.
Comment 5 Wayne Mery (:wsmwk, NI for questions) 2008-04-09 09:59:14 PDT
Marc, ever not have the alert and shutdown still not complete?
Comment 6 Marc Sherman 2008-04-09 10:45:20 PDT
Sorry, I'm not using ldaps any more, so I can't confirm.
Comment 7 David Ascher (:davida) 2008-05-02 17:33:06 PDT
FWIW, I'm still seeing this with trunk nightlies.
Comment 8 Wayne Mery (:wsmwk, NI for questions) 2008-05-02 18:09:43 PDT
is it possible to have the shutdown hang caused by ldap, and not get the alert?  
ref my bug 420744.
Comment 9 David Ascher (:davida) 2008-05-02 20:11:27 PDT
Wayne, if you were asking me, I don't know.  I'm not able to reproduce it reliably yet.
Comment 10 Wayne Mery (:wsmwk, NI for questions) 2008-05-20 12:35:09 PDT
confirming 

related? to Bug 188554 – Profile switching fails after having used LDAP/SSL ?

I have a colleague how sees this error message on Mac - OS=ALL [not that david doesn't count :)] - "consistently, Tbird (I have version 2.0.0.12 (20080213)) crashed when I Disconnect from Lehigh's VPN. This was the case both in Mac OS X 10.4 Tiger and now 10.5 Leopard".  

And also can create on windows - use an sldap (SSL ldap) search, close vpn, then close thunderbird, hang. (no error message)

perhaps the same as my bug 420744, but I don't see an error message at shutdown. bit of a pain to have to constantly kill the task.
Comment 11 David Ascher (:davida) 2008-06-09 11:35:57 PDT
I see this all the time, marking wanted for tb3.0a2 and blocking for 3.
Comment 12 Serge Gautherie (:sgautherie) 2008-06-12 17:36:02 PDT
Depends on bug 438035 ?
Comment 13 Mark Banner (:standard8) 2008-06-13 00:14:37 PDT
(In reply to comment #12)
> Depends on bug 438035 ?
> 
Not really, I forgot to comment on this bug the David Ascher hasn't been using the sidebar (he didn't know about it), which therefore means we have two bugs not one, and additionally that bug won't stop this one being fixed and vice versa.
Comment 14 Mark Banner (:standard8) 2008-06-13 01:57:16 PDT
*** Bug 416332 has been marked as a duplicate of this bug. ***
Comment 15 Sander Goudswaard 2008-06-13 03:53:12 PDT
My bug was marked as a dup of this bug which is OK, however in my case I am sure that it only *seems* to exit after confirming the dialog. In fact it remains running and visible in XP's task manager, even though there is no active or visible window anymore.
Comment 16 David Ascher (:davida) 2008-06-13 09:36:36 PDT
FWIW, on OS X I get the alert dialog but I can never dismiss it.  I always have to kill the process using brutal means.
Comment 17 David Ascher (:davida) 2008-06-18 11:39:26 PDT
David, It'd be good to get this frequent hang fixed sooner rather than later.  I'm putting it on the 3a2 blocker list, but if you don't agree, push back.
Comment 18 David :Bienvenu 2008-06-18 13:15:38 PDT
I agree - I have to think how I can fix it without duplicating the work Mark is doing to make nsAbView more sane...
Comment 19 David :Bienvenu 2008-06-28 11:14:42 PDT
I forgot that davida's not using the sidebar so the abview stuff is not involved. 

When this bug happens, we end up creating two ldap connections, and only clean up one.  If you've only got one compose window open, we should only ever have one ldap autocomplete session, and thus one ldap connection open at a time.

I was able to reproduce this problem with a release build, but I have not been able to recreate it with a debug build, probably because of timing issues.
Comment 20 David :Bienvenu 2008-06-29 08:37:56 PDT
My plan for this was to have either the ldap service or the ldap servers keep track of the connections they've opened, let them listen for shutdown events, and close connections that haven't been closed yet. This general approach has worked for the mailnews protocols. But LDAP connections are created straight from the consumers, and the consumers are responsible for caching them and keeping track of them. This works fine as long as the consumers don't have any bugs or memory leaks or cycles.

Rather than change all the consumers, my thinking right now is just to make the ldap connections register with the ldap service when they're created, and unregister when they're closed.  The ldap service would be a shutdown observer, and close registered connections when it gets a shutdown notification.

The alternative is to make all the consumers create and free their connections using the existing ldap service mechanisms, and then extend the ldap service to be a shutdown observer, as above. That would be a bigger change, and involve using a lot of code in the ldap service that is untested, afaik. My understanding is that code was never quite finished, though it might just be that the consumers were never changed to use it.
Comment 21 David :Bienvenu 2008-06-30 09:40:19 PDT
I've spent a fair amount of time trying to reproduce a leak in ldap connections from the compose window, without any luck at all. I also spent a fair amount of time trying to figure out some minimally disruptive way to force an ldap connection cleanup on shutdown, without much luck.  LDAP connections really only want to get cleaned up by their destructor. I can expose the close() method but I worry that's going to exorcise unexpected code paths in LDAP.

One thing I noticed was that it can take 5 seconds or longer for autocomplete sessions to get garbage collected, which means ldap connections can linger quite a while after the compose window is closed.

Since I was having so much trouble reproducing leaks with non-SSL ldap, I thought maybe using SSL was causing a race condition so I found a publicly available ldap server that works over SSL. I couldn't reproduce any connections leak, but I quickly reproduced the alert on shutdown originally reported. The ssl connection was in close_wait state, which means we had called ldap_unbind, but the connection hadn't been completely torn down yet from nss's point of view. When I dismiss the alert, the app doesn't shut down, though the UI is all gone. I don't know if davida is seeing the same issue w/o the alert - it's possible that for some reason we were unable to put up the alert, or it could be a difference between windows and mac...

This all means that forcing the connection to close on shutdown is not sufficient to fix this issue. Closing the connection explicitly from js, before garbage collection runs might help simply by giving the connection more time to close, but it's a hack. 

It's interesting that SSL IMAP doesn't have this problem - I don't know if it's because IMAP connections get cleaned up explicitly earlier, or if there's something about the LDAP CDK that's different. There's a comment in the code that indicates that ldap_unbind might be synchronous...
Comment 22 David :Bienvenu 2008-06-30 10:01:09 PDT
This time I left just the compose window open, so that when I closed it, the app would shut down quickly. In that case, I got the alert from nss before the gc ran to clean up the ldap connection. Oddly, this time the app did shut down, after I dismissed the alert, the autocomplete session was gc'd, and the ldap connection closed down. Which leads me to think that one hack would be to force gc when the compose window closes, if possible.

Davida, do you see this problem mainly when you shutdown right after closing the compose window, or does it happen a lot when the compose window hasn't been open for a while?
Comment 23 David :Bienvenu 2008-06-30 11:32:51 PDT
Turns out that the compose window itself forces a garbage collection in its on close handler, http://mxr.mozilla.org/mozilla/source/mailnews/compose/src/nsMsgCompose.cpp#1371, so I don't think there's any benefit to adding a gc() call in the js (which threw an assertion anyway, for me, saying gc() was not defined). nsMsgCompose only does a gc if recyleIt is true, but it seems to be true for me even when I close the compose window and it's the last window (this is on windows).

I was able to reproduce an honest to goodness connection leak, however. I got nsLDAPAutoCompleteSession::Init to get called with mConnection already set. This guarantees a leak, because mConnection isn't a comptr - assigning over it just leaks the old connection (I suspected this was happening and added an assertion to detect it). So either we need to release the old connection before getting a new connection, or figure out why we're in Init with an existing connection. (the short answer is that the previous attempted connection is in the unbound state, either because we haven't starting binding, or it got unbound, I guess)
Comment 24 David :Bienvenu 2008-06-30 13:19:31 PDT
Created attachment 327461 [details] [diff] [review]
fix for connection leak in ldap autocomplete session

this fixes the bug that can be recreated this way:

configure an ldap server for autocomplete (preferably ssl)
startup
open a new compose window
type a couple letters in the addressing widget
pause long enough for autocomplete to start (2-3 seconds)
type a third character

the autocomplete that started after the first two characters will start binding but not finish. The third character will cause nsLDAPAutoCompleteSession::OnStopLookup to get called, which aborts the current binding. My fix is to also free the mConnection when that happens. I also added an assertion when we try to call InitConnection with an existing connection.  I'd be happy to add an NS_IF_RELEASE(mConnection) to the top of InitConnection as well, after the assertion, if people think that's worthwhile.
Comment 25 David :Bienvenu 2008-06-30 13:42:22 PDT
A couple notes - you need to cause the problem with the first couple characters you type in a compose window; otherwise, the ldap session is cached and lives until the compose window is closed. Also, I'd like people seeing this problem try a nightly build with the fix to see if it indeed fixes their issue, before investigating fixes for the shutdown before gc finishes issue I discovered earlier.
Comment 26 Marc Sherman 2008-07-01 06:54:49 PDT
Is there any way to take my email address off the cc list for this bug report? As stated earlier, I stopped using ldaps in thunderbird soon after I filed the bug.
Comment 27 David :Bienvenu 2008-07-01 07:51:05 PDT
optimistically marking fixed. Marc, I don't know how to stop getting notifications for bugs you yourself filed. We can open a new bug for remaining issues, if any.
Comment 28 Wayne Mery (:wsmwk, NI for questions) 2008-11-03 06:14:37 PST
*** Bug 375667 has been marked as a duplicate of this bug. ***
Comment 29 Mark Banner (:standard8) 2009-11-20 01:05:29 PST
*** Bug 529991 has been marked as a duplicate of this bug. ***
Comment 30 Alex 2010-10-19 01:47:44 PDT
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4

The behavoir mentioned in the first description occur again.

I use the address book with ldap-support and secure connection (ssl).

I got an alert if I do the following steps: 

1. start Thunberbird
2. open Address Book
3. select the ldap address book
4. search for "." (--> getting the whole entries of the address book correctly)
5. close Address Book
6. close Thunderbird --> Alert: "The operation can not be completed ..."
Comment 31 Alex 2010-10-19 23:13:08 PDT
... same problem with Thunderbird 3.1.5
Comment 32 chinny.tp 2014-12-30 04:37:23 PST
[Tracking Requested - why for this release]:

Note You need to log in before you can comment on or make changes to this bug.