Closed Bug 79509 Opened 23 years ago Closed 2 years ago

avoid stalling out if ldap C SDK hangs during connect() (LDAP server/service not available/unresponsive causes hang/freeze or causes mozilla to stop responding)

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: dmosedale, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: hang, perf, Whiteboard: problems found while testing)

Attachments

(2 files, 11 obsolete files)

C SDK patch to explicitly init a variable, v1 23 years ago Dan Mosedale (:dmosedale, :dmose) 528 bytes, patch		Details \| Diff \| Splinter Review
patch for C SDK, v2 23 years ago Dan Mosedale (:dmosedale, :dmose) 4.07 KB, patch		Details \| Diff \| Splinter Review
patch for XPCOM SDK, v1 23 years ago Dan Mosedale (:dmosedale, :dmose) 920 bytes, patch		Details \| Diff \| Splinter Review
LDAP XPCOM SDK patch to turn on async connections by default, v2 23 years ago Dan Mosedale (:dmosedale, :dmose) 920 bytes, patch		Details \| Diff \| Splinter Review
patch to C-SDK, v3 23 years ago Dan Mosedale (:dmosedale, :dmose) 5.19 KB, patch		Details \| Diff \| Splinter Review
threadsafety patch to nsLDAPAutoCompleteSession, v1 23 years ago Dan Mosedale (:dmosedale, :dmose) 1.09 KB, patch		Details \| Diff \| Splinter Review
patch to C-SDK, v4 23 years ago Dan Mosedale (:dmosedale, :dmose) 5.44 KB, patch		Details \| Diff \| Splinter Review
C SDK changes, v5 23 years ago Dan Mosedale (:dmosedale, :dmose) 6.73 KB, patch		Details \| Diff \| Splinter Review
C SDK changes, v6 23 years ago Dan Mosedale (:dmosedale, :dmose) 8.56 KB, patch		Details \| Diff \| Splinter Review
async patch,. v7 (work-in-progress) 22 years ago Dan Mosedale (:dmosedale, :dmose) 2.84 KB, patch		Details \| Diff \| Splinter Review
experimental libldap changes 22 years ago Mark Smith [:mcs] 3.76 KB, patch		Details \| Diff \| Splinter Review
async patc, v8 (work-in-progress) 22 years ago Dan Mosedale (:dmosedale, :dmose) 6.58 KB, patch		Details \| Diff \| Splinter Review
dmose's async patch, merged with trunk 10 years ago Jon Robison 1.55 KB, patch		Details \| Diff \| Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Description

•

23 years ago

There is an "aync reconnect" option in the C SDK which can be set.  We should
use this.  Last time I checked, using it at least doesn't break anything on
Windows and Mac for the success case, but does on Linux.  So we'll probably need
to whack the C SDK a bit for this, and then test to make sure it things work in
both the success case and the case where there is actually a problem during
connect().

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Blocks: 17880

Severity: normal → major

Status: NEW → ASSIGNED

Keywords: nsbeta1

Priority: -- → P2

Jussi-Pekka Mantere

Updated

•

23 years ago

Target Milestone: --- → mozilla0.9.2

yulian chang

Comment 1

•

23 years ago

reassign to Olga as QA contact

QA Contact: yulian → olgac

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Blocks: 82317

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

No longer blocks: 17880

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Target Milestone: mozilla0.9.2 → mozilla0.9.3

Leif Hedstrom

Updated

•

23 years ago

Priority: P2 → P3

Jussi-Pekka Mantere

Comment 2

•

23 years ago

Marking P2, 0.9.4.

Priority: P3 → P2

Target Milestone: mozilla0.9.3 → mozilla0.9.4

Hong Kwon

Updated

•

23 years ago

Keywords: nsenterprise

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 3

•

23 years ago

The reason async connects were often causing crashes under linux is that in the
asynchronous case, wait4msg() in the LDAP C SDK depends on the compiler to
initialize automatics variables to 0.  This isn't guaranteed by the C language,
although it apparently is done by the Windows and Mac compilers.  I'm attaching
a one-liner patch for the C SDK problem.  mhein or mcs, can you review?

Another patch to turn async I/O on in the LDAP XPCOM SDK will be forthcoming...

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 4

•

23 years ago

Attached patch C SDK patch to explicitly init a variable, v1 (obsolete) — Details — Splinter Review

Mark Smith [:mcs]

Comment 5

•

23 years ago

The patch looks fine, so you certainly have module owner approval.

But the code in result.c:wait2msg() that uses the potentially unitialized lr 
variable seems incomplete to me.  The code looks like this:

 else if (ld->ld_options & LDAP_BITOPT_ASYNC) {
    if ( lr
            && lc->lconn_status == LDAP_CONNST_CONNECTING
            && nsldapi_iostatus_is_write_ready( ld,
            lc->lconn_sb ) ) {
         rc = nsldapi_ber_flush( ld, lc->lconn_sb, lr->lr_ber 0, 1 );
         ...
    }
 }

The problem I see is that lr is set earlier in the function ONLY if you request 
a specific message id (i.e., when you call ldap_result() with positive integer 
as the second parameter).  But the code above is trying to flush any pending 
requests to a server for which the TCP connect just completed, so it seems like 
it should loop through all of the outstanding requests that are associated with 
that connection and call nsldapi_ber_flush().  Of course if you never initiate 
more than one (or any) LDAP operations before the connect occurs, you will never 
run into problems because of this.  Should we open a separate bug?

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 6

•

23 years ago

Attached patch patch for C SDK, v2 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 7

•

23 years ago

OK, I think this is a reasonable patch.  It attempts to free all requests
associated with a connection when something goes wrong during the flush.  Also
worth nothing is that in my previous fix, this code path actually NEVER got
followed in the LDAP_MSG_ANY case, because lr was always 0.  So
LDAP_CONNST_CONNECTED was only ever set in request.c, never here in result.c.

Whiteboard: have patches; need r=, sr=

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 8

•

23 years ago

Attached patch patch for XPCOM SDK, v1 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 9

•

23 years ago

The above patch turns on async connections in the XPCOM SDK.  leif, can you  r=
that one?

Mark Smith [:mcs]

Comment 10

•

23 years ago

With regard to the C SDK v2 patch:

The connection status should be set to "connected" regardless of whether the 
flushes succeed.  I think.

nsldapi_ber_flush() will return -2 if the I/O would block but the proposed 
flush_connection_requests() function treats that as a fatal error.  I think the 
there is a general problem in the C SDK LDAP_OPT_ASYNC code where a request that 
is partially flushed will never be completely flushed (I just do not see where 
that would happen, since nsldapi_ber_flush() is only called:

1) When a connection completes (in result.c; the subject of this patch)

2) When an LDAP operation is first issued (inside 
request.c:nsldapi_send_server_request())

3) When an unbind request is sent.

The logical place to do the necessary nsldapi_ber_flush() calls is inside 
result.c:wait4msg(), since that is where we poll() for the socket state.  It 
seems like the right strategy would be:

a) If a ber_flush() call returns "would block", call 
nsldapi_iostatus_interest_write() for the connection.

b) Check for "write ready" status inside wait4msg() regardless of whether the 
connection state is "connecting" or "connected."  If it is "connecting", do the 
things in the patch.  If it is "connected", just call 
flush_connection_requests().

Jussi-Pekka Mantere

Updated

•

23 years ago

Keywords: nsenterprise → nsenterprise+

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Whiteboard: have patches; need r=, sr= → incorporating r= feedback into patch

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Target Milestone: mozilla0.9.4 → mozilla0.9.5

grega

Comment 11

•

23 years ago

really adding chofmann to the cc and marking nsbranch+

Keywords: nsbranch+

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 12

•

23 years ago

*** Bug 99423 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #45886 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #45999 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 13

•

23 years ago

Attached patch LDAP XPCOM SDK patch to turn on async connections by default, v2 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 14

•

23 years ago

mcs: I don't see anything in the 4.0 SDK with a name resembling
nsldapi_iostatus_interest_write().  Is there another way this is done with the
4.0 code?

Mark Smith [:mcs]

Comment 15

•

23 years ago

Sorry.  The function is called nsldapi_is_write_ready() in the 4.x SDK.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 16

•

23 years ago

*** Bug 101040 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 17

•

23 years ago

Attached patch patch to C-SDK, v3 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #45976 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 18

•

23 years ago

mcs: OK, here's a new version of the patch that does what you suggest.  I assume
that when you said

> a) If a ber_flush() call returns "would block", call 
> nsldapi_iostatus_interest_write() for the connection.
>
> [...]
>
> Sorry.  The function is called nsldapi_is_write_ready() in the 4.x SDK.

that you really meant to say that I should call nsldapi_mark_select_write() each
time flushing would block, and the patch is written with that assumption.

There is an XXXdmose comment in the code, I assume I don't really need to call
nsldapi_mark_select_read() in the case where we're already connected, but I just
wanted to be sure.

Interestingly, in the case where the LDAP connect is working, I see lots of
"flush_connection_requests would block" messages in the debug log, though they
eventually go away.  The thing that's odd is that LDAP autocomplete worked just
fine even with the previous rev of this patch, which didn't have the flushing,
so given this debug message, I would have expected previous versions of the
patch not to work.  One conceivable explanation for this is that
nsldapi_ber_flush() will get called the next time a request gets sent out, but
in that case it would get called with a different BerElement arg. And the LDAP
autocomplete code only sends its requests to the C SDK one at a time anyway, so
I'm not really convinced that that's what's going on.

Thoughts appreciated....

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Depends on: 101252

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

No longer depends on: 101252

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 19

•

23 years ago

Attached patch threadsafety patch to nsLDAPAutoCompleteSession, v1 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 20

•

23 years ago

Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

This patch changes the LDAP autocomplete session to use the threadsafe ISUPPORTS macros, as per the discussion in bug 101252.

Mark Smith [:mcs]

Comment 21

•

23 years ago

Your assumption about nsldapi_mark_select_write() is correct.  Sorry.

I think you do need to call nsldapi_mark_select_read() in the connected case, 
because the code inside nsldapi_send_server_request() does not call same if it 
encounters a "would block" situation.

I am not sure why you see the "flush_connection_requests would block" messages 
and things still work.  Does not quite add up, as you say.  Unfortunately I 
don't have time today to puzzle it out (too many meetings).

Your patch looks fine except for one thing:  I think free_connection_requests() 
needs to save the lr->lr_next pointer into a local variable before it calls 
nsldapi_free_request() (otherwise it will access freed memory the next time 
around the loop when it dereferences lr to grab lr->lr_next, won't it?)  See the 
code in bind.c/ldap_ld_free() for an example.

Jaime Rodriguez, Jr.

Comment 22

•

23 years ago

Pls provide an ETA for this one.

Whiteboard: incorporating r= feedback into patch → incorporating r= feedback into patch,[ETA?]

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #50435 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 23

•

23 years ago

Attached patch patch to C-SDK, v4 (obsolete) — Details — Splinter Review

Mark Smith [:mcs]

Comment 24

•

23 years ago

The latest C-SDK patch (v4) looks good.  r=mcs

Leif Hedstrom

Comment 25

•

23 years ago

Comment on attachment 50179 [details] [diff] [review]
LDAP XPCOM SDK patch to turn on async connections by default, v2

r=leif

Leif Hedstrom

Comment 26

•

23 years ago

Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

r=leif

Jaime Rodriguez, Jr.

Updated

•

23 years ago

Whiteboard: incorporating r= feedback into patch,[ETA?] → incorporating r= feedback into patch,[ETA 09.26]

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Whiteboard: incorporating r= feedback into patch,[ETA 09.26] → problems found while testing; at-risk

yulian chang

Comment 27

•

23 years ago

Reassign QA contact to yulian@netscape.com

QA Contact: olgac → yulian

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #50807 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 28

•

23 years ago

Attached patch C SDK changes, v5 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 29

•

23 years ago

OK, the latest C-SDK patch makes the following changes:

* turns on the async functionality for Windows

* adds an nsldapi_mark_select_write()call to nsldapi_new_connection().  This
causes the appropriate if() clause in wait4msg() to notice if an error happens
while in the connecting state (eg connection refused).  I don't think it should
case any other ill-effects, but I'd love confirmation of that.  Also, mcs, do I
need to clear out the read and write fdsets ever, or is that taken care of? 
Turns out the exception fdset isn't very interesting, because it's only used for
TCP out-of-band data and some pseudo-TTY junk.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 30

•

23 years ago

Interestingly, the nsldapi_mark_select_write change seems to only fix the
"connection refused" not being noticed problem on Linux.  On windows, the
problem is still there.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 31

•

23 years ago

Turns out that on Windows, unlike Unix, I _do_ need to use the exception fdset
to notice a connect error.  New patch in progress...

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Depends on: 102270

Jussi-Pekka Mantere

Comment 32

•

23 years ago

Too risky for 0.9.5 at this time; pushing out to 0.9.6. Removing nsbranch+ 
nomination, setting back to nsenterprise.

Keywords: nsbranch+, nsenterprise+ → nsenterprise

Jussi-Pekka Mantere

Updated

•

23 years ago

Target Milestone: mozilla0.9.5 → mozilla0.9.6

Hong Kwon

Comment 33

•

23 years ago

marking nsenterprise-; will be reevaluated for nsenterprise in future release.

Keywords: nsenterprise-

Asa Dotzler [:asa]

Updated

•

23 years ago

Blocks: 104166

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 34

•

23 years ago

*** Bug 105112 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 35

•

23 years ago

*** Bug 90563 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Attachment #51167 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 36

•

23 years ago

Attached patch C SDK changes, v6 (obsolete) — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 37

•

23 years ago

While testing the previous version of the patch, I made
flush_connection_requests return -1 always in order to test that code path, and
I could make it race and crash about every 3rd or 4th time I used LDAP autocomplete.

What appeared to be going on was this: 

The LDAP connection thread would call ldap_result(), which would end up down in
wait4msg() which would see the error in flush_connection_requests, and then
would call nsldapi_free_connection(), which actually decrements the refcount on
the connection, and frees itif appropriate.  wait4msg never actually incremented
the refcount, so I believe the reason that it was non-zero in wait4msg is
because this happened to be the default connection for the session.
Anyway, the connection would get freed from wait4msg, and then when an operation
would be started from another thread, the other thread would increment the
refcount (through a free()d pointer!) attempt to use the connection, hit an
error, and then try and free the connection pointer again.

So I've attached a new version of the patch, which simply doesn't call
nsldapi_free_connection from inside wait4msg when there's an error.  See the
XXXdmose comments in the patch for some details about possible problems.

I'm not particularly convinced this patch is the right fix, but it's one
possibility.  I sort of suspect that a better fix would be to make wait4msg
increment the connection refcount before use.  

Also, should nsldapi_free_connection() check to see if the connection is the
default connection for a session, and if so, set the default connection to
either NULL or another connection?

The real problem I'm having here is that it's not entirely clear from reading
the code what the refcounting model is supposed to be: do only LDAPRequests hold
references to a connection (except for the "never go away property of the
default connection")?  Or can/should other things (eg ldap_result/wait4msg) get
references to a connection?

mcs/mhein: any light you can shed here would be greatly appreciated.

scottputterman

Comment 38

•

23 years ago

Moving to 0.9.7 since it looks like there's still some work left to do.

Target Milestone: mozilla0.9.6 → mozilla0.9.7

Mark Smith [:mcs]

Comment 39

•

23 years ago

Sorry for the delay in looking at the recent comments from Dan.

lconn_refcnt is supposed to be incremented when a new LDAP operation (request)
is sent on a connection and decremented when the LDAPResult message is received
for a request.  Functions like wait4msg() should not need to increment the
refcnt, at least in theory.  Obviously there is a bug somewhere.

With the previous patch, were there more calls to nsldapi_free_connection() than
to nsldapi_send_server_request()?

Or, is it possible that the call to flush_connection_requests() and related code
in wait4msg() was being executed when no outstanding requests were present?

Also, I can't remember why we didn't just call nsldapi_free_connection() once
for each request we free inside free_connection_requests().

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 40

•

23 years ago

*** Bug 90815 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Priority: P2 → P1

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Target Milestone: mozilla0.9.7 → mozilla0.9.8

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 41

•

23 years ago

Mass change: pushing out 0.9.8 bugs, as I'm concentrating feature work now.

Target Milestone: mozilla0.9.8 → mozilla0.9.9

scottputterman

Updated

•

23 years ago

Keywords: nsenterprise

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

23 years ago

Keywords: nsbeta1

scottputterman

Updated

•

23 years ago

Keywords: nsbeta1 → nsbeta1+

Target Milestone: mozilla0.9.9 → mozilla1.0

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 42

•

23 years ago

*** Bug 126283 has been marked as a duplicate of this bug. ***

saari (gone)

Comment 43

•

22 years ago

ADT needs more info. as to what the impact of this bug will be

Whiteboard: problems found while testing; at-risk → [ADT NEED INFO] problems found while testing; at-risk

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

22 years ago

Attachment #50179 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 44

•

22 years ago

Comment on attachment 50582 [details] [diff] [review]
threadsafety patch to nsLDAPAutoCompleteSession, v1

Already checked in.

Attachment #50582 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 45

•

22 years ago

Comment on attachment 56833 [details] [diff] [review]
C SDK changes, v6

This patch was only relevant to the C-SDK v4.

Attachment #56833 - Attachment is obsolete: true

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 46

•

22 years ago

ADT: the deal is that any LDAP operation (autocomplete, addressbook search,
replication) against a server which is unreachable (eg because a laptop has been
taken outside a firewall) will cause the entire app-suite to freeze for the
length of the TCP timeout (typically long enough that most users will give up
and kill the program).

I've got a patch which fixes this for cleartext connections, but breaks SSL
connections entirely.  I'll attach it as a work-in-progress; worst case it will
be possible to confine this fix to only be used on cleartext connections.  I'll
keep working on the SSL-piece.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 47

•

22 years ago

Attached patch async patch,. v7 (work-in-progress) (obsolete) — Details — Splinter Review

Vadim Berezniker

Comment 48

•

22 years ago

*** Bug 130967 has been marked as a duplicate of this bug. ***

Vadim Berezniker

Comment 49

•

22 years ago

clarifying summary ... it's kind of weird but hopefully it will make this bug
easier to find.

Summary: avoid stalling out if ldap C SDK hangs during connect() → avoid stalling out if ldap C SDK hangs during connect() (LDAP server/service not available/unresponsive causes hang/freeze or causes mozilla to stop responding)

Mark Smith [:mcs]

Comment 50

•

22 years ago

I am starting to see why someone put the LDAP_OPT_ASYNC_CONNECT in
ldap-deprecated.h. But maybe we can fix it. I tried enabling that option with a
simple LDAP client (outside of the Mozilla client) that uses the LDAP C SDK's
libssldap layer. It failed with "timeout" errors. I debugged a bit and found a
few problems. The one I have not yet solved is that libldap needs to know if the
connection really completed or not. I will attach a patch that shows what I mean.

Also, note that you should be able to reduce the connect timeout when you are
NOT using the LDAP_OPT_ASYNC_CONNECT option by making a call like this:

int ms = 10000; /* 10 seconds, in milliseconds */
if ( ldap_set_option( 0, LDAP_X_OPT_CONNECT_TIMEOUT, *ms ) != 0 ) {
      /* error */
}

Mark Smith [:mcs]

Comment 51

•

22 years ago

Attached patch experimental libldap changes (obsolete) — Details — Splinter Review

3 changes:

1) Fix a silly bug in nsldapi_iostatus_poll() that causes it to mistakenly
generate timeout errors.

2) Add a hack that says "the connection did not yet complete." This is a real
hack, because we should not do this unconditionally. I need to figure out a way
to pass back the "not yet connected" status via libldap's (and libprldap's)
CONNECT callback function.

3) Set rc to -2 so we loop waiting for LDAP results after a connect completes.
I vaguely remember Dan fixing this before. Or at least talking about it. I
don't understand at all why we would set rc to LDAP_RES_BIND (I didn't write
this particular code ;-)

With these changes in place, my drop deal simple LDAP over SSL async test
program works.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 52

•

22 years ago

Attached patch async patc, v8 (work-in-progress) — Details — Splinter Review

Woot!  mcs' changes made async work with SSL also.  I've incorporated his
changes into V8 of the patch.  Additionally, I think I'm gonna have to
incorporate the (not-yet-entirely-complete) patches that I was working on for
the LDAP C SDK v4 (ie patch v6 in this bug) before checking in, or this change
will just expose other problems.  That's my next step.

Attachment #79224 - Attachment is obsolete: true

Attachment #79877 - Attachment is obsolete: true

Mark Smith [:mcs]

Comment 53

•

22 years ago

Comment on attachment 79905 [details] [diff] [review]
async patc, v8 (work-in-progress)

Yes, you will need some of the changes from patch v6. Let me know if you need
help sorting out the unknown issues.

As you and I discussed via AIM yesterday, the changes you propose for
nsldapi_new_connection() to always set the state to LDAP_CONNST_CONNECTING
should be safe and are not as much of a hack as I originally thought. But it
would be a little better to put the code inside nsldapi_connect_to_host(); that
way, we can return -2 (connect not yet completed) for async connects that are
processed using the extended I/O callbacks only (because in the case where
there are no is no connect I/O callback, the return code is already set to -2
if appropriate).

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 54

•

22 years ago

*** Bug 50074 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 55

•

22 years ago

I spent yesterday and today wrapping my head around the various codepaths taken
when making LDAP connections.  It turns out that the code in the tree (and in
patch v8 here, I think) never actually returns -2 for EINPROGRESS, even though
there's some code to handle that case and it's documented that way.

The deal is that both the extended connect command and the nsldapi_try_each_host
return a socket file descriptor (which may or may not be used, depending on the
particular call-stack) or -1 on error.  So it's not possible to return -2,
because that would mean not returning the socket fd itself, which is required in
some cases (in the USE_NATIVE_OS_SOCKETS case, as well as the
nsldapi_ext_compat_connect / old IO callbacks case).

The reason that we get away with it in the current patch is that
LDAP_CONNST_CONNECTING is unconditionally set when we're using LDAP_BITOPT_ASYNC.

Presumably a better fix would be to somehow tweak the extended connect and
nsldapi_try_each_host APIs to pass back more information.  Another option might
be to say that LDAP_BITOPT_ASYNC isn't supported for those cases where the
socket fd is actually used, and then change those two functions to return -2 for
EINPROGRESS.

Thoughts?

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 56

•

22 years ago

Hmmm, there may still be a problem with the above explanation, because it
doesn't explain why patch v7 worked in the non-SSL case... if the above
explanation were exactly correct, there's no way LDAP_CONNST_CONNECTING could
ever be set.

Mark Smith [:mcs]

Comment 57

•

22 years ago

Maybe the v7 patch worked by chance (if the connect happens to complete before
and writes or reads are attempted on the socket, the LDAP code will work fine).

But you are right about the -2... I probably broke that when I added the
extended I/O callbacks. I don't think we should bother changing any APIs right
now though. If we assume that the connect may be in progress whenever
LDAP_OPT_ASYNC is set, then I believe everything will work fine. We can clean up
the other things later.

(not reading, please use seth@sspitzer.org instead)

Comment 58

•

22 years ago

I think gayatri, laurel and yulian were seeing this bug this morning, as
nsdirectory.netscape.com:389 was not responding.

D:\builds\trunk\mozilla>telnet nsdirectory.netscape.com 389
Connecting To nsdirectory.netscape.com...Could not open a connection to host on
port 389 : Connect failed

they were doing ldap autocomplete.  what they were seeing was the UI thread
hang, and then a while later (a long while, in some cases) the message about
"ldap connection failed" would appear in the address field.

Gayatri Rimola

Comment 59

•

22 years ago

Well, I never saw the "ldap connection failed" message but maybe I didn't wait
long enough.  In task manager all the netscape proceses were showing "not
responding". However, it does sound like the same problem.

yulian chang

Comment 60

•

22 years ago

Comment #34:
Bug 105112 reported that Linux hangs too long to get <LDAP server connection
failed> error back. 

On Win2K, it took about 1.5 minutes to get the error message.

Mark Smith [:mcs]

Comment 61

•

22 years ago

It would be very easy to implement a consistent and short timeout for
autocomplete on all platforms.  See the second half of my comment # 50. IMHO a
timeout of 10 seconds seems more than long enough for autocomplete (of course it
should be configurable).

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 62

•

22 years ago

mcs: thanks for pointing that out; I had totally spaced that part of your
comment.  I've spun off bug 143172 for this in the hopes of getting it into
Mozilla 1.0 since it's such a small fix.

Michael Buckland

Comment 63

•

22 years ago

Discussed at mail news bug meeting.  Decided to minus this bug.

Blocks: 122274

Keywords: nsbeta1+ → nsbeta1-

Target Milestone: mozilla1.0 → mozilla1.2

Paul Prescod

Comment 64

•

22 years ago

LDAP hangs for much longer than 10 seconds for me. I've waited HOURS on two
different occasions. Both times it was after typing REALLY LONG emails that I
eventually lost when I had to kill and restart Mozilla. :( I don't know how to
repro it though.

Mark Smith [:mcs]

Comment 65

•

22 years ago

I have been working on various async. I/O improvements inside the LDAP library
code. I just attached a patch (work on progress) to bug 140182 (attachment
111518 [details] [diff] [review]). I have not yet tried the new code inside the Mozilla client.

Mark Smith [:mcs]

Comment 66

•

21 years ago

The fix for bug 140182 has been committed.  It includes the changes in this bug
that are under mozilla/directory/c-sdk (and many more fixes too).

Axxackall

Comment 67

•

21 years ago

This bug was a showstopper in the project of migrating Outlook users to Mozilla
Mail I was involved.

Axxackall

Comment 68

•

21 years ago

Is there any particular reason why voting on this bug is disabled?

Adam Guthrie

Comment 69

•

19 years ago

*** Bug 306963 has been marked as a duplicate of this bug. ***

Mark Smith [:mcs]

Comment 70

•

18 years ago

Does anyone have time to try the nsLDAPProtocolModule.cpp portion of dmose's last patch?  Does it solve this problem?

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 71

•

17 years ago

Assigning bugs that I'm not actively working on back to nobody; use SearchForThis as a search term if you want to delete all related bugmail at once.

Assignee: dmose → nobody

Status: ASSIGNED → NEW

Serge Gautherie (:sgautherie)

Comment 72

•

16 years ago

Filter on "Nobody_NScomTLD_20080620"

Assignee: nobody → dmose

QA Contact: yulian → xpcom

Nikolay Shopik

Updated

•

14 years ago

Blocks: 373167

Wayne Mery (:wsmwk)

Updated

•

13 years ago

Assignee: dmose → nobody

Keywords: perf

Whiteboard: [ADT NEED INFO] problems found while testing; at-risk → problems found while testing

Target Milestone: mozilla1.2 → ---

boris

Comment 74

•

13 years ago

Anniversary!
This unresolved bug is now more than 10 years old and still causes problems especially in corporate environments.
We should light a candle...
... could someone please at least set the timeout to 300ms and display a warning: e.g. "LDAP timeaout"

David Rees

Comment 75

•

11 years ago

I have found the LDAP Swapping extension useful for turning off LDAP when I am not on the corporate network - https://bugzilla.mozilla.org/show_bug.cgi?id=79509. Still would like to see this just quickly timeout though.

Jon Robison

Comment 76

•

10 years ago

I applied the nsLDAPProtocolModule.cpp portion of dmose's last patch as Mark Smith suggests above, and it fixes the hanging issue in OSX 10.9. I've only put it through minimal testing, however. Optimally the LDAP query resultant names are appended to the pre-existing entries in the combobox when the query returns, though I haven't stuck around long enough for it to execute. Hopefully I can test that at a later time.

I'm a remote worker with big corporate ldap, and each name query was taking forever for me (10+ minutes each name lookup when composing). My current workaround is move abook.mab to abook.mab.bak, then ln -s ldap.mab to abook.mab. This makes everything work, however, I'd have to update my offline mab regularly for new hires/people that leave. Optimally, thunderbird would immediately grab from the offline mab for the combo box, then when the query returns both update the combo box and update the mab. I'd be elated just with updated the combo box.

Will add my modified patch, based on dmose's but cut out what seems to have already been patched in.

Jon Robison

Comment 77

•

10 years ago

Attached patch dmose's async patch, merged with trunk — Details — Splinter Review

stdedos+mozilla

Comment 78

•

5 years ago

The bug is open 18 years ago (!) and it is still annoying as of today. Could please someone please fix this?

2019-07-09 20:50:07.152000 UTC - [49228:Main Thread]: D/LDAP nsLDAPOperation::SimpleBind(): called; bindName = '';
2019-07-09 20:51:11.468000 UTC - [49228:Main Thread]: D/LDAP unbinding
2019-07-09 20:51:11.468000 UTC - [49228:Main Thread]: D/LDAP unbound

Move this to a secondary thread, make it async .. something!

It's even weirder to the user eye that it freezes, even if there is local cache / address book matches (and those matches are already visible).

Mark Smith [:mcs]

Comment 79

•

5 years ago

•

Edited

I updated the Product and Component so the patch from comment 77 may get some attention from Thunderbird engineers.

Component: LDAP XPCOM SDK → LDAP Integration

Product: Directory → MailNews Core

Version: other → unspecified

Wayne Mery (:wsmwk)

Updated

•

5 years ago

Severity: major → critical

Keywords: hang

Priority: P1 → --

Wayne Mery (:wsmwk)

Comment 80

•

3 years ago

To those of you who were able to reproduce this issue...

Can you create a new profile https://support.mozilla.org/en-US/kb/using-multiple-profiles to see if this reproduces with beta https://archive.mozilla.org/pub/thunderbird/releases/90.0b2/ ?

beta implements new ldap code https://www-stage.thunderbird.net/en-US/thunderbird/90.0beta/releasenotes/

Ping Chen [:rnons] (away)

Comment 81

•

2 years ago

LDAP c-sdk has been removed in bug 1729862.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.