Closed Bug 382446 Opened 17 years ago Closed 16 years ago

Alert on shutdown when using ldaps address book - "operation can not be completed because of an internal failure. A secure network communication has not been cleaned up correctly."

Categories

(MailNews Core :: Address Book, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1a1

People

(Reporter: msherman, Assigned: Bienvenu)

References

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Build Identifier: version 2.0.0.0 (20070326)

When an ldaps address book is configured in thunderbird, I get an Alert message box when I shut down thunderbird. The alert reads:

The operation can not be completed because cf (sic) an internal failure. A secure network communication has not been cleaned up correctly.

The ldaps address book does appear to function correctly, and after dismissing the alert, thunderbird appears to shut down cleanly.

Reproducible: Sometimes

Steps to Reproduce:
[note, this happens about 80% of the time -- I occasionally am able to quit without seeing the alert, but it happens more often than not.]

1. Configure ldaps address book
2. look up an address in it
3. quit thunderbird
Actual Results:  
Alert pops up before quitting

Expected Results:  
No alert
Attached image screenshot of the alert
Argh. I guess I was transcribing from a shrunken view of the screen shot -- it doesn't say "cf", it says "of" correctly. :)
this is because we're not cleaning up an ldap connection before shut down.

Does this only happen in a session where you add the ldaps address book, or does it happen in subsequent sessions where you just look up an address in the ldap address book? Are you looking up the address via the compose window, or the address book window?
It happens in subsequent sessions. I just tried a session where I only used the compose window, and another one where I only used the address book window, and it happened in both sessions.
Marc, ever not have the alert and shutdown still not complete?
Assignee: mscott → nobody
Component: Address Book → MailNews: Address Book
Product: Thunderbird → Core
QA Contact: address-book → addressbook
Summary: Alert on shutdown when using ldaps address book → Alert on shutdown when using ldaps address book - "operation can not be completed because of an internal failure. A secure network communication has not been cleaned up correctly."
Sorry, I'm not using ldaps any more, so I can't confirm.
FWIW, I'm still seeing this with trunk nightlies.
Flags: wanted-thunderbird3+
is it possible to have the shutdown hang caused by ldap, and not get the alert?  
ref my bug 420744.
Wayne, if you were asking me, I don't know.  I'm not able to reproduce it reliably yet.
confirming 

related? to Bug 188554 – Profile switching fails after having used LDAP/SSL ?

I have a colleague how sees this error message on Mac - OS=ALL [not that david doesn't count :)] - "consistently, Tbird (I have version 2.0.0.12 (20080213)) crashed when I Disconnect from Lehigh's VPN. This was the case both in Mac OS X 10.4 Tiger and now 10.5 Leopard".  

And also can create on windows - use an sldap (SSL ldap) search, close vpn, then close thunderbird, hang. (no error message)

perhaps the same as my bug 420744, but I don't see an error message at shutdown. bit of a pain to have to constantly kill the task.
Blocks: 420744
Severity: normal → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: wanted-thunderbird3.0a2?
Keywords: qawanted
OS: Windows XP → All
Hardware: PC → All
I see this all the time, marking wanted for tb3.0a2 and blocking for 3.
Flags: wanted-thunderbird3.0a2?
Flags: wanted-thunderbird3.0a2+
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3+
(In reply to comment #12)
> Depends on bug 438035 ?
> 
Not really, I forgot to comment on this bug the David Ascher hasn't been using the sidebar (he didn't know about it), which therefore means we have two bugs not one, and additionally that bug won't stop this one being fixed and vice versa.
Keywords: qawanted
My bug was marked as a dup of this bug which is OK, however in my case I am sure that it only *seems* to exit after confirming the dialog. In fact it remains running and visible in XP's task manager, even though there is no active or visible window anymore.
FWIW, on OS X I get the alert dialog but I can never dismiss it.  I always have to kill the process using brutal means.
David, It'd be good to get this frequent hang fixed sooner rather than later.  I'm putting it on the 3a2 blocker list, but if you don't agree, push back.
Assignee: nobody → bienvenu
Flags: wanted-thunderbird3.0a2+ → blocking-thunderbird3.0a2+
I agree - I have to think how I can fix it without duplicating the work Mark is doing to make nsAbView more sane...
Whiteboard: [to be looked at]
I forgot that davida's not using the sidebar so the abview stuff is not involved. 

When this bug happens, we end up creating two ldap connections, and only clean up one.  If you've only got one compose window open, we should only ever have one ldap autocomplete session, and thus one ldap connection open at a time.

I was able to reproduce this problem with a release build, but I have not been able to recreate it with a debug build, probably because of timing issues.
My plan for this was to have either the ldap service or the ldap servers keep track of the connections they've opened, let them listen for shutdown events, and close connections that haven't been closed yet. This general approach has worked for the mailnews protocols. But LDAP connections are created straight from the consumers, and the consumers are responsible for caching them and keeping track of them. This works fine as long as the consumers don't have any bugs or memory leaks or cycles.

Rather than change all the consumers, my thinking right now is just to make the ldap connections register with the ldap service when they're created, and unregister when they're closed.  The ldap service would be a shutdown observer, and close registered connections when it gets a shutdown notification.

The alternative is to make all the consumers create and free their connections using the existing ldap service mechanisms, and then extend the ldap service to be a shutdown observer, as above. That would be a bigger change, and involve using a lot of code in the ldap service that is untested, afaik. My understanding is that code was never quite finished, though it might just be that the consumers were never changed to use it.
I've spent a fair amount of time trying to reproduce a leak in ldap connections from the compose window, without any luck at all. I also spent a fair amount of time trying to figure out some minimally disruptive way to force an ldap connection cleanup on shutdown, without much luck.  LDAP connections really only want to get cleaned up by their destructor. I can expose the close() method but I worry that's going to exorcise unexpected code paths in LDAP.

One thing I noticed was that it can take 5 seconds or longer for autocomplete sessions to get garbage collected, which means ldap connections can linger quite a while after the compose window is closed.

Since I was having so much trouble reproducing leaks with non-SSL ldap, I thought maybe using SSL was causing a race condition so I found a publicly available ldap server that works over SSL. I couldn't reproduce any connections leak, but I quickly reproduced the alert on shutdown originally reported. The ssl connection was in close_wait state, which means we had called ldap_unbind, but the connection hadn't been completely torn down yet from nss's point of view. When I dismiss the alert, the app doesn't shut down, though the UI is all gone. I don't know if davida is seeing the same issue w/o the alert - it's possible that for some reason we were unable to put up the alert, or it could be a difference between windows and mac...

This all means that forcing the connection to close on shutdown is not sufficient to fix this issue. Closing the connection explicitly from js, before garbage collection runs might help simply by giving the connection more time to close, but it's a hack. 

It's interesting that SSL IMAP doesn't have this problem - I don't know if it's because IMAP connections get cleaned up explicitly earlier, or if there's something about the LDAP CDK that's different. There's a comment in the code that indicates that ldap_unbind might be synchronous...
Status: NEW → ASSIGNED
This time I left just the compose window open, so that when I closed it, the app would shut down quickly. In that case, I got the alert from nss before the gc ran to clean up the ldap connection. Oddly, this time the app did shut down, after I dismissed the alert, the autocomplete session was gc'd, and the ldap connection closed down. Which leads me to think that one hack would be to force gc when the compose window closes, if possible.

Davida, do you see this problem mainly when you shutdown right after closing the compose window, or does it happen a lot when the compose window hasn't been open for a while?
Turns out that the compose window itself forces a garbage collection in its on close handler, http://mxr.mozilla.org/mozilla/source/mailnews/compose/src/nsMsgCompose.cpp#1371, so I don't think there's any benefit to adding a gc() call in the js (which threw an assertion anyway, for me, saying gc() was not defined). nsMsgCompose only does a gc if recyleIt is true, but it seems to be true for me even when I close the compose window and it's the last window (this is on windows).

I was able to reproduce an honest to goodness connection leak, however. I got nsLDAPAutoCompleteSession::Init to get called with mConnection already set. This guarantees a leak, because mConnection isn't a comptr - assigning over it just leaks the old connection (I suspected this was happening and added an assertion to detect it). So either we need to release the old connection before getting a new connection, or figure out why we're in Init with an existing connection. (the short answer is that the previous attempted connection is in the unbound state, either because we haven't starting binding, or it got unbound, I guess)
this fixes the bug that can be recreated this way:

configure an ldap server for autocomplete (preferably ssl)
startup
open a new compose window
type a couple letters in the addressing widget
pause long enough for autocomplete to start (2-3 seconds)
type a third character

the autocomplete that started after the first two characters will start binding but not finish. The third character will cause nsLDAPAutoCompleteSession::OnStopLookup to get called, which aborts the current binding. My fix is to also free the mConnection when that happens. I also added an assertion when we try to call InitConnection with an existing connection.  I'd be happy to add an NS_IF_RELEASE(mConnection) to the top of InitConnection as well, after the assertion, if people think that's worthwhile.
Attachment #327461 - Flags: superreview?(neil)
Attachment #327461 - Flags: review?(bugzilla)
A couple notes - you need to cause the problem with the first couple characters you type in a compose window; otherwise, the ldap session is cached and lives until the compose window is closed. Also, I'd like people seeing this problem try a nightly build with the fix to see if it indeed fixes their issue, before investigating fixes for the shutdown before gc finishes issue I discovered earlier.
Attachment #327461 - Flags: superreview?(neil) → superreview+
Attachment #327461 - Flags: review?(bugzilla) → review+
Is there any way to take my email address off the cc list for this bug report? As stated earlier, I stopped using ldaps in thunderbird soon after I filed the bug.
optimistically marking fixed. Marc, I don't know how to stop getting notifications for bugs you yourself filed. We can open a new bug for remaining issues, if any.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [to be looked at]
Target Milestone: --- → mozilla1.9.1a1
Version: unspecified → Trunk
Product: Core → MailNews Core
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4

The behavoir mentioned in the first description occur again.

I use the address book with ldap-support and secure connection (ssl).

I got an alert if I do the following steps: 

1. start Thunberbird
2. open Address Book
3. select the ldap address book
4. search for "." (--> getting the whole entries of the address book correctly)
5. close Address Book
6. close Thunderbird --> Alert: "The operation can not be completed ..."
... same problem with Thunderbird 3.1.5
[Tracking Requested - why for this release]:
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: