Last Comment Bug 79509 - avoid stalling out if ldap C SDK hangs during connect() (LDAP server/service not available/unresponsive causes hang/freeze or causes mozilla to stop responding)
: avoid stalling out if ldap C SDK hangs during connect() (LDAP server/service ...
Status: NEW
problems found while testing
: perf
Product: Directory
Classification: Components
Component: LDAP XPCOM SDK (show other bugs)
: other
: All All
: P1 major with 5 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
: 50074 90563 90815 99423 101040 105112 126283 130967 306963 533656 (view as bug list)
Depends on: 102270
Blocks: 373167 82317 104166 122274
  Show dependency treegraph
 
Reported: 2001-05-08 13:00 PDT by Dan Mosedale (:dmose)
Modified: 2014-08-08 11:18 PDT (History)
33 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
C SDK patch to explicitly init a variable, v1 (528 bytes, patch)
2001-08-14 21:35 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
patch for C SDK, v2 (4.07 KB, patch)
2001-08-15 15:07 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
patch for XPCOM SDK, v1 (920 bytes, patch)
2001-08-15 16:38 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
LDAP XPCOM SDK patch to turn on async connections by default, v2 (920 bytes, patch)
2001-09-20 17:38 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
patch to C-SDK, v3 (5.19 KB, patch)
2001-09-22 20:34 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
threadsafety patch to nsLDAPAutoCompleteSession, v1 (1.09 KB, patch)
2001-09-24 13:57 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
patch to C-SDK, v4 (5.44 KB, patch)
2001-09-25 18:00 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
C SDK changes, v5 (6.73 KB, patch)
2001-09-27 17:39 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
C SDK changes, v6 (8.56 KB, patch)
2001-11-06 18:05 PST, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
async patch,. v7 (work-in-progress) (2.84 KB, patch)
2002-04-14 23:12 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
experimental libldap changes (3.76 KB, patch)
2002-04-18 15:05 PDT, Mark Smith (:mcs)
no flags Details | Diff | Splinter Review
async patc, v8 (work-in-progress) (6.58 KB, patch)
2002-04-18 17:02 PDT, Dan Mosedale (:dmose)
no flags Details | Diff | Splinter Review
dmose's async patch, merged with trunk (1.55 KB, patch)
2014-08-08 11:18 PDT, Jon Robison
no flags Details | Diff | Splinter Review

Description Dan Mosedale (:dmose) 2001-05-08 13:00:30 PDT
There is an "aync reconnect" option in the C SDK which can be set.  We should
use this.  Last time I checked, using it at least doesn't break anything on
Windows and Mac for the success case, but does on Linux.  So we'll probably need
to whack the C SDK a bit for this, and then test to make sure it things work in
both the success case and the case where there is actually a problem during
connect().
Comment 1 yulian chang 2001-05-10 15:47:42 PDT
reassign to Olga as QA contact
Comment 2 Jussi-Pekka Mantere 2001-07-23 14:54:04 PDT
Marking P2, 0.9.4.
Comment 3 Dan Mosedale (:dmose) 2001-08-14 21:34:47 PDT
The reason async connects were often causing crashes under linux is that in the
asynchronous case, wait4msg() in the LDAP C SDK depends on the compiler to
initialize automatics variables to 0.  This isn't guaranteed by the C language,
although it apparently is done by the Windows and Mac compilers.  I'm attaching
a one-liner patch for the C SDK problem.  mhein or mcs, can you review?

Another patch to turn async I/O on in the LDAP XPCOM SDK will be forthcoming...
Comment 4 Dan Mosedale (:dmose) 2001-08-14 21:35:44 PDT
Created attachment 45886 [details] [diff] [review]
C SDK patch to explicitly init a variable, v1
Comment 5 Mark Smith (:mcs) 2001-08-15 07:24:29 PDT
The patch looks fine, so you certainly have module owner approval.

But the code in result.c:wait2msg() that uses the potentially unitialized lr 
variable seems incomplete to me.  The code looks like this:

 else if (ld->ld_options & LDAP_BITOPT_ASYNC) {
    if ( lr
            && lc->lconn_status == LDAP_CONNST_CONNECTING
            && nsldapi_iostatus_is_write_ready( ld,
            lc->lconn_sb ) ) {
         rc = nsldapi_ber_flush( ld, lc->lconn_sb, lr->lr_ber 0, 1 );
         ...
    }
 }

The problem I see is that lr is set earlier in the function ONLY if you request 
a specific message id (i.e., when you call ldap_result() with positive integer 
as the second parameter).  But the code above is trying to flush any pending 
requests to a server for which the TCP connect just completed, so it seems like 
it should loop through all of the outstanding requests that are associated with 
that connection and call nsldapi_ber_flush().  Of course if you never initiate 
more than one (or any) LDAP operations before the connect occurs, you will never 
run into problems because of this.  Should we open a separate bug?
Comment 6 Dan Mosedale (:dmose) 2001-08-15 15:07:09 PDT
Created attachment 45976 [details] [diff] [review]
patch for C SDK, v2
Comment 7 Dan Mosedale (:dmose) 2001-08-15 16:35:03 PDT
OK, I think this is a reasonable patch.  It attempts to free all requests
associated with a connection when something goes wrong during the flush.  Also
worth nothing is that in my previous fix, this code path actually NEVER got
followed in the LDAP_MSG_ANY case, because lr was always 0.  So
LDAP_CONNST_CONNECTED was only ever set in request.c, never here in result.c.
Comment 8 Dan Mosedale (:dmose) 2001-08-15 16:38:08 PDT
Created attachment 45999 [details] [diff] [review]
patch for XPCOM SDK, v1
Comment 9 Dan Mosedale (:dmose) 2001-08-15 16:39:39 PDT
The above patch turns on async connections in the XPCOM SDK.  leif, can you  r=
that one?
Comment 10 Mark Smith (:mcs) 2001-08-15 19:30:41 PDT
With regard to the C SDK v2 patch:

The connection status should be set to "connected" regardless of whether the 
flushes succeed.  I think.

nsldapi_ber_flush() will return -2 if the I/O would block but the proposed 
flush_connection_requests() function treats that as a fatal error.  I think the 
there is a general problem in the C SDK LDAP_OPT_ASYNC code where a request that 
is partially flushed will never be completely flushed (I just do not see where 
that would happen, since nsldapi_ber_flush() is only called:

1) When a connection completes (in result.c; the subject of this patch)

2) When an LDAP operation is first issued (inside 
request.c:nsldapi_send_server_request())

3) When an unbind request is sent.

The logical place to do the necessary nsldapi_ber_flush() calls is inside 
result.c:wait4msg(), since that is where we poll() for the socket state.  It 
seems like the right strategy would be:

a) If a ber_flush() call returns "would block", call 
nsldapi_iostatus_interest_write() for the connection.

b) Check for "write ready" status inside wait4msg() regardless of whether the 
connection state is "connecting" or "connected."  If it is "connecting", do the 
things in the patch.  If it is "connected", just call 
flush_connection_requests(). 
Comment 11 grega 2001-09-17 11:41:32 PDT
really adding chofmann to the cc and marking nsbranch+
Comment 12 Dan Mosedale (:dmose) 2001-09-19 13:06:38 PDT
*** Bug 99423 has been marked as a duplicate of this bug. ***
Comment 13 Dan Mosedale (:dmose) 2001-09-20 17:38:18 PDT
Created attachment 50179 [details] [diff] [review]
LDAP XPCOM SDK patch to turn on async connections by default, v2
Comment 14 Dan Mosedale (:dmose) 2001-09-20 18:33:40 PDT
mcs: I don't see anything in the 4.0 SDK with a name resembling
nsldapi_iostatus_interest_write().  Is there another way this is done with the
4.0 code?
Comment 15 Mark Smith (:mcs) 2001-09-21 06:59:58 PDT
Sorry.  The function is called nsldapi_is_write_ready() in the 4.x SDK.
Comment 16 Dan Mosedale (:dmose) 2001-09-21 16:11:14 PDT
*** Bug 101040 has been marked as a duplicate of this bug. ***
Comment 17 Dan Mosedale (:dmose) 2001-09-22 20:34:33 PDT
Created attachment 50435 [details] [diff] [review]
patch to C-SDK, v3
Comment 18 Dan Mosedale (:dmose) 2001-09-22 20:47:40 PDT
mcs: OK, here's a new version of the patch that does what you suggest.  I assume
that when you said

> a) If a ber_flush() call returns "would block", call 
> nsldapi_iostatus_interest_write() for the connection.
>
> [...]
>
> Sorry.  The function is called nsldapi_is_write_ready() in the 4.x SDK.

that you really meant to say that I should call nsldapi_mark_select_write() each
time flushing would block, and the patch is written with that assumption.

There is an XXXdmose comment in the code, I assume I don't really need to call
nsldapi_mark_select_read() in the case where we're already connected, but I just
wanted to be sure.

Interestingly, in the case where the LDAP connect is working, I see lots of
"flush_connection_requests would block" messages in the debug log, though they
eventually go away.  The thing that's odd is that LDAP autocomplete worked just
fine even with the previous rev of this patch, which didn't have the flushing,
so given this debug message, I would have expected previous versions of the
patch not to work.  One conceivable explanation for this is that
nsldapi_ber_flush() will get called the next time a request gets sent out, but
in that case it would get called with a different BerElement arg. And the LDAP
autocomplete code only sends its requests to the C SDK one at a time anyway, so
I'm not really convinced that that's what's going on.

Thoughts appreciated....


Comment 19 Dan Mosedale (:dmose) 2001-09-24 13:57:31 PDT
Created attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1
Comment 20 Dan Mosedale (:dmose) 2001-09-24 13:58:34 PDT
Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

This patch changes the LDAP autocomplete session to use the threadsafe ISUPPORTS macros, as per the discussion in bug 101252.
Comment 21 Mark Smith (:mcs) 2001-09-24 14:03:14 PDT
Your assumption about nsldapi_mark_select_write() is correct.  Sorry.

I think you do need to call nsldapi_mark_select_read() in the connected case, 
because the code inside nsldapi_send_server_request() does not call same if it 
encounters a "would block" situation.

I am not sure why you see the "flush_connection_requests would block" messages 
and things still work.  Does not quite add up, as you say.  Unfortunately I 
don't have time today to puzzle it out (too many meetings).

Your patch looks fine except for one thing:  I think free_connection_requests() 
needs to save the lr->lr_next pointer into a local variable before it calls 
nsldapi_free_request() (otherwise it will access freed memory the next time 
around the loop when it dereferences lr to grab lr->lr_next, won't it?)  See the 
code in bind.c/ldap_ld_free() for an example.
Comment 22 Jaime Rodriguez, Jr. 2001-09-24 16:59:10 PDT
Pls provide an ETA for this one.
Comment 23 Dan Mosedale (:dmose) 2001-09-25 18:00:07 PDT
Created attachment 50807 [details] [diff] [review]
patch to C-SDK, v4
Comment 24 Mark Smith (:mcs) 2001-09-26 06:56:49 PDT
The latest C-SDK patch (v4) looks good.  r=mcs
Comment 25 Leif Hedstrom 2001-09-26 12:10:46 PDT
Comment on attachment 50179 [details] [diff] [review]
LDAP XPCOM SDK patch to turn on async connections by default, v2

r=leif
Comment 26 Leif Hedstrom 2001-09-26 12:12:35 PDT
Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

r=leif
Comment 27 yulian chang 2001-09-27 15:47:55 PDT
Reassign QA contact to yulian@netscape.com
Comment 28 Dan Mosedale (:dmose) 2001-09-27 17:39:31 PDT
Created attachment 51167 [details] [diff] [review]
C SDK changes, v5
Comment 29 Dan Mosedale (:dmose) 2001-09-27 17:45:32 PDT
OK, the latest C-SDK patch makes the following changes:

* turns on the async functionality for Windows

* adds an nsldapi_mark_select_write()call to nsldapi_new_connection().  This
causes the appropriate if() clause in wait4msg() to notice if an error happens
while in the connecting state (eg connection refused).  I don't think it should
case any other ill-effects, but I'd love confirmation of that.  Also, mcs, do I
need to clear out the read and write fdsets ever, or is that taken care of? 
Turns out the exception fdset isn't very interesting, because it's only used for
TCP out-of-band data and some pseudo-TTY junk.
Comment 30 Dan Mosedale (:dmose) 2001-09-27 18:36:06 PDT
Interestingly, the nsldapi_mark_select_write change seems to only fix the
"connection refused" not being noticed problem on Linux.  On windows, the
problem is still there.
Comment 31 Dan Mosedale (:dmose) 2001-09-27 21:45:33 PDT
Turns out that on Windows, unlike Unix, I _do_ need to use the exception fdset
to notice a connect error.  New patch in progress...
Comment 32 Jussi-Pekka Mantere 2001-10-01 14:28:48 PDT
Too risky for 0.9.5 at this time; pushing out to 0.9.6. Removing nsbranch+ 
nomination, setting back to nsenterprise.
Comment 33 Hong Kwon 2001-10-09 19:06:57 PDT
marking nsenterprise-; will be reevaluated for nsenterprise in future release.

Comment 34 Dan Mosedale (:dmose) 2001-10-22 12:08:16 PDT
*** Bug 105112 has been marked as a duplicate of this bug. ***
Comment 35 Dan Mosedale (:dmose) 2001-10-23 17:32:27 PDT
*** Bug 90563 has been marked as a duplicate of this bug. ***
Comment 36 Dan Mosedale (:dmose) 2001-11-06 18:05:12 PST
Created attachment 56833 [details] [diff] [review]
C SDK changes, v6
Comment 37 Dan Mosedale (:dmose) 2001-11-06 18:45:38 PST
While testing the previous version of the patch, I made
flush_connection_requests return -1 always in order to test that code path, and
I could make it race and crash about every 3rd or 4th time I used LDAP autocomplete.

What appeared to be going on was this: 

The LDAP connection thread would call ldap_result(), which would end up down in
wait4msg() which would see the error in flush_connection_requests, and then
would call nsldapi_free_connection(), which actually decrements the refcount on
the connection, and frees itif appropriate.  wait4msg never actually incremented
the refcount, so I believe the reason that it was non-zero in wait4msg is
because this happened to be the default connection for the session.
Anyway, the connection would get freed from wait4msg, and then when an operation
would be started from another thread, the other thread would increment the
refcount (through a free()d pointer!) attempt to use the connection, hit an
error, and then try and free the connection pointer again.

So I've attached a new version of the patch, which simply doesn't call
nsldapi_free_connection from inside wait4msg when there's an error.  See the
XXXdmose comments in the patch for some details about possible problems.

I'm not particularly convinced this patch is the right fix, but it's one
possibility.  I sort of suspect that a better fix would be to make wait4msg
increment the connection refcount before use.  

Also, should nsldapi_free_connection() check to see if the connection is the
default connection for a session, and if so, set the default connection to
either NULL or another connection?

The real problem I'm having here is that it's not entirely clear from reading
the code what the refcounting model is supposed to be: do only LDAPRequests hold
references to a connection (except for the "never go away property of the
default connection")?  Or can/should other things (eg ldap_result/wait4msg) get
references to a connection?

mcs/mhein: any light you can shed here would be greatly appreciated.
Comment 38 scottputterman 2001-11-15 00:11:49 PST
Moving to 0.9.7 since it looks like there's still some work left to do.
Comment 39 Mark Smith (:mcs) 2001-11-19 18:21:54 PST
Sorry for the delay in looking at the recent comments from Dan.

lconn_refcnt is supposed to be incremented when a new LDAP operation (request)
is sent on a connection and decremented when the LDAPResult message is received
for a request.  Functions like wait4msg() should not need to increment the
refcnt, at least in theory.  Obviously there is a bug somewhere.

With the previous patch, were there more calls to nsldapi_free_connection() than
to nsldapi_send_server_request()?

Or, is it possible that the call to flush_connection_requests() and related code
in wait4msg() was being executed when no outstanding requests were present?

Also, I can't remember why we didn't just call nsldapi_free_connection() once
for each request we free inside free_connection_requests().
Comment 40 Dan Mosedale (:dmose) 2001-11-28 18:24:22 PST
*** Bug 90815 has been marked as a duplicate of this bug. ***
Comment 41 Dan Mosedale (:dmose) 2002-01-08 16:14:41 PST
Mass change: pushing out 0.9.8 bugs, as I'm concentrating feature work now.
Comment 42 Dan Mosedale (:dmose) 2002-02-19 13:36:31 PST
*** Bug 126283 has been marked as a duplicate of this bug. ***
Comment 43 saari (gone) 2002-03-11 17:04:24 PST
ADT needs more info. as to what the impact of this bug will be
Comment 44 Dan Mosedale (:dmose) 2002-04-14 23:03:34 PDT
Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

Already checked in.
Comment 45 Dan Mosedale (:dmose) 2002-04-14 23:04:24 PDT
Comment on attachment 56833 [details] [diff] [review]
C SDK changes, v6

This patch was only relevant to the C-SDK v4.
Comment 46 Dan Mosedale (:dmose) 2002-04-14 23:10:47 PDT
ADT: the deal is that any LDAP operation (autocomplete, addressbook search,
replication) against a server which is unreachable (eg because a laptop has been
taken outside a firewall) will cause the entire app-suite to freeze for the
length of the TCP timeout (typically long enough that most users will give up
and kill the program).

I've got a patch which fixes this for cleartext connections, but breaks SSL
connections entirely.  I'll attach it as a work-in-progress; worst case it will
be possible to confine this fix to only be used on cleartext connections.  I'll
keep working on the SSL-piece.
Comment 47 Dan Mosedale (:dmose) 2002-04-14 23:12:38 PDT
Created attachment 79224 [details] [diff] [review]
async patch,. v7 (work-in-progress)
Comment 48 Vadim Berezniker 2002-04-17 19:50:40 PDT
*** Bug 130967 has been marked as a duplicate of this bug. ***
Comment 49 Vadim Berezniker 2002-04-17 19:57:30 PDT
clarifying summary ... it's kind of weird but hopefully it will make this bug
easier to find. 
Comment 50 Mark Smith (:mcs) 2002-04-18 14:59:40 PDT
I am starting to see why someone put the LDAP_OPT_ASYNC_CONNECT in
ldap-deprecated.h. But maybe we can fix it. I tried enabling that option with a
simple LDAP client (outside of the Mozilla client) that uses the LDAP C SDK's
libssldap layer. It failed with "timeout" errors. I debugged a bit and found a
few problems. The one I have not yet solved is that libldap needs to know if the
connection really completed or not. I will attach a patch that shows what I mean.

Also, note that you should be able to reduce the connect timeout when you are
NOT using the LDAP_OPT_ASYNC_CONNECT option by making a call like this:

int ms = 10000; /* 10 seconds, in milliseconds */
if ( ldap_set_option( 0, LDAP_X_OPT_CONNECT_TIMEOUT, *ms ) != 0 ) {
      /* error */
}
Comment 51 Mark Smith (:mcs) 2002-04-18 15:05:05 PDT
Created attachment 79877 [details] [diff] [review]
experimental libldap changes

3 changes:

1) Fix a silly bug in nsldapi_iostatus_poll() that causes it to mistakenly
generate timeout errors.

2) Add a hack that says "the connection did not yet complete." This is a real
hack, because we should not do this unconditionally. I need to figure out a way
to pass back the "not yet connected" status via libldap's (and libprldap's)
CONNECT callback function.

3) Set rc to -2 so we loop waiting for LDAP results after a connect completes.
I vaguely remember Dan fixing this before. Or at least talking about it. I
don't understand at all why we would set rc to LDAP_RES_BIND (I didn't write
this particular code ;-)

With these changes in place, my drop deal simple LDAP over SSL async test
program works.
Comment 52 Dan Mosedale (:dmose) 2002-04-18 17:02:10 PDT
Created attachment 79905 [details] [diff] [review]
async patc, v8 (work-in-progress)

Woot!  mcs' changes made async work with SSL also.  I've incorporated his
changes into V8 of the patch.  Additionally, I think I'm gonna have to
incorporate the (not-yet-entirely-complete) patches that I was working on for
the LDAP C SDK v4 (ie patch v6 in this bug) before checking in, or this change
will just expose other problems.  That's my next step.
Comment 53 Mark Smith (:mcs) 2002-04-19 07:15:55 PDT
Comment on attachment 79905 [details] [diff] [review]
async patc, v8 (work-in-progress)

Yes, you will need some of the changes from patch v6. Let me know if you need
help sorting out the unknown issues.

As you and I discussed via AIM yesterday, the changes you propose for
nsldapi_new_connection() to always set the state to LDAP_CONNST_CONNECTING
should be safe and are not as much of a hack as I originally thought. But it
would be a little better to put the code inside nsldapi_connect_to_host(); that
way, we can return -2 (connect not yet completed) for async connects that are
processed using the extended I/O callbacks only (because in the case where
there are no is no connect I/O callback, the return code is already set to -2
if appropriate).
Comment 54 Dan Mosedale (:dmose) 2002-04-22 16:19:28 PDT
*** Bug 50074 has been marked as a duplicate of this bug. ***
Comment 55 Dan Mosedale (:dmose) 2002-04-23 20:44:36 PDT
I spent yesterday and today wrapping my head around the various codepaths taken
when making LDAP connections.  It turns out that the code in the tree (and in
patch v8 here, I think) never actually returns -2 for EINPROGRESS, even though
there's some code to handle that case and it's documented that way.

The deal is that both the extended connect command and the nsldapi_try_each_host
return a socket file descriptor (which may or may not be used, depending on the
particular call-stack) or -1 on error.  So it's not possible to return -2,
because that would mean not returning the socket fd itself, which is required in
some cases (in the USE_NATIVE_OS_SOCKETS case, as well as the
nsldapi_ext_compat_connect / old IO callbacks case).

The reason that we get away with it in the current patch is that
LDAP_CONNST_CONNECTING is unconditionally set when we're using LDAP_BITOPT_ASYNC.

Presumably a better fix would be to somehow tweak the extended connect and
nsldapi_try_each_host APIs to pass back more information.  Another option might
be to say that LDAP_BITOPT_ASYNC isn't supported for those cases where the
socket fd is actually used, and then change those two functions to return -2 for
EINPROGRESS.

Thoughts?
Comment 56 Dan Mosedale (:dmose) 2002-04-23 20:51:10 PDT
Hmmm, there may still be a problem with the above explanation, because it
doesn't explain why patch v7 worked in the non-SSL case... if the above
explanation were exactly correct, there's no way LDAP_CONNST_CONNECTING could
ever be set.
Comment 57 Mark Smith (:mcs) 2002-04-24 11:30:21 PDT
Maybe the v7 patch worked by chance (if the connect happens to complete before
and writes or reads are attempted on the socket, the LDAP code will work fine).

But you are right about the -2... I probably broke that when I added the
extended I/O callbacks. I don't think we should bother changing any APIs right
now though. If we assume that the connect may be in progress whenever
LDAP_OPT_ASYNC is set, then I believe everything will work fine. We can clean up
the other things later.
Comment 58 (not reading, please use seth@sspitzer.org instead) 2002-05-08 13:08:44 PDT
I think gayatri, laurel and yulian were seeing this bug this morning, as
nsdirectory.netscape.com:389 was not responding.

D:\builds\trunk\mozilla>telnet nsdirectory.netscape.com 389
Connecting To nsdirectory.netscape.com...Could not open a connection to host on
port 389 : Connect failed

they were doing ldap autocomplete.  what they were seeing was the UI thread
hang, and then a while later (a long while, in some cases) the message about
"ldap connection failed" would appear in the address field.
Comment 59 Gayatri Rimola 2002-05-08 14:01:53 PDT
Well, I never saw the "ldap connection failed" message but maybe I didn't wait
long enough.  In task manager all the netscape proceses were showing "not
responding". However, it does sound like the same problem.
Comment 60 yulian chang 2002-05-08 14:52:31 PDT
Comment #34:
Bug 105112 reported that Linux hangs too long to get <LDAP server connection
failed> error back. 

On Win2K, it took about 1.5 minutes to get the error message.
Comment 61 Mark Smith (:mcs) 2002-05-08 14:59:48 PDT
It would be very easy to implement a consistent and short timeout for
autocomplete on all platforms.  See the second half of my comment # 50. IMHO a
timeout of 10 seconds seems more than long enough for autocomplete (of course it
should be configurable).
Comment 62 Dan Mosedale (:dmose) 2002-05-08 18:49:14 PDT
mcs: thanks for pointing that out; I had totally spaced that part of your
comment.  I've spun off bug 143172 for this in the hopes of getting it into
Mozilla 1.0 since it's such a small fix.
Comment 63 Michael Buckland 2002-05-22 14:11:44 PDT
Discussed at mail news bug meeting.  Decided to minus this bug.
Comment 64 Paul Prescod 2002-10-18 21:49:37 PDT
LDAP hangs for much longer than 10 seconds for me. I've waited HOURS on two
different occasions. Both times it was after typing REALLY LONG emails that I
eventually lost when I had to kill and restart Mozilla. :( I don't know how to
repro it though.
Comment 65 Mark Smith (:mcs) 2003-01-14 09:25:01 PST
I have been working on various async. I/O improvements inside the LDAP library
code. I just attached a patch (work on progress) to bug 140182 (attachment
111518 [details] [diff] [review]). I have not yet tried the new code inside the Mozilla client.
Comment 66 Mark Smith (:mcs) 2003-10-01 10:26:18 PDT
The fix for bug 140182 has been committed.  It includes the changes in this bug
that are under mozilla/directory/c-sdk (and many more fixes too).
Comment 67 Axxackall 2004-01-16 13:50:49 PST
This bug was a showstopper in the project of migrating Outlook users to Mozilla
Mail I was involved.
Comment 68 Axxackall 2004-01-16 15:22:50 PST
Is there any particular reason why voting on this bug is disabled?
Comment 69 Adam Guthrie 2005-09-03 09:27:51 PDT
*** Bug 306963 has been marked as a duplicate of this bug. ***
Comment 70 Mark Smith (:mcs) 2006-07-31 06:14:04 PDT
Does anyone have time to try the nsLDAPProtocolModule.cpp portion of dmose's last patch?  Does it solve this problem?
Comment 71 Dan Mosedale (:dmose) 2007-03-26 16:20:33 PDT
Assigning bugs that I'm not actively working on back to nobody; use SearchForThis as a search term if you want to delete all related bugmail at once.
Comment 72 Serge Gautherie (:sgautherie) 2008-06-20 10:37:53 PDT
Filter on "Nobody_NScomTLD_20080620"
Comment 73 Nikolay Shopik 2011-01-13 13:27:12 PST
*** Bug 533656 has been marked as a duplicate of this bug. ***
Comment 74 boris 2011-09-22 21:18:51 PDT
Anniversary!
This unresolved bug is now more than 10 years old and still causes problems especially in corporate environments.
We should light a candle...
... could someone please at least set the timeout to 300ms and display a warning: e.g. "LDAP timeaout"
Comment 75 David Rees 2013-05-19 14:02:37 PDT
I have found the LDAP Swapping extension useful for turning off LDAP when I am not on the corporate network - https://bugzilla.mozilla.org/show_bug.cgi?id=79509. Still would like to see this just quickly timeout though.
Comment 76 Jon Robison 2014-08-08 11:16:37 PDT
I applied the nsLDAPProtocolModule.cpp portion of dmose's last patch as Mark Smith suggests above, and it fixes the hanging issue in OSX 10.9. I've only put it through minimal testing, however. Optimally the LDAP query resultant names are appended to the pre-existing entries in the combobox when the query returns, though I haven't stuck around long enough for it to execute. Hopefully I can test that at a later time.

I'm a remote worker with big corporate ldap, and each name query was taking forever for me (10+ minutes each name lookup when composing). My current workaround is move abook.mab to abook.mab.bak, then ln -s ldap.mab to abook.mab. This makes everything work, however, I'd have to update my offline mab regularly for new hires/people that leave. Optimally, thunderbird would immediately grab from the offline mab for the combo box, then when the query returns both update the combo box and update the mab. I'd be elated just with updated the combo box.

Will add my modified patch, based on dmose's but cut out what seems to have already been patched in.
Comment 77 Jon Robison 2014-08-08 11:18:46 PDT
Created attachment 8470170 [details] [diff] [review]
dmose's async patch, merged with trunk

Note You need to log in before you can comment on or make changes to this bug.