Closed Bug 242789 Opened 20 years ago Closed 19 years ago

LDAP v3 - fallback to v2 if v3 connect doesn't work with >=1.7

Categories

(MailNews Core :: LDAP Integration, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.8beta2

People

(Reporter: mkralec, Assigned: dmosedale)

References

()

Details

(Keywords: verified1.7, Whiteboard: fixed-aviary1.0)

Attachments

(5 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040504
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040504

LDAP connection should autmatically fall back to version 2 if version 3 fails.

Reproducible: Always
Steps to Reproduce:
1.
2.
3.
block 1.7+ requested - reason: change in behaviour too heavy too handle for
enterprises with LDAP v2 Servers. support for manual switchback too hard to handle.
Flags: blocking1.7?
Dan, how hard do you think this would be? Is there a specific error code
returned when we try to use v3 on a server that only supports v2? Presumably, in
CheckLDAPOperationResult, we would detect this error, check if we were using v3,
and if so, switch the version to version 2 by calling
ldap_set_option(mConnectionHandle, VERSION2,&version);, and retry the current
operation (not sure what that would entail). 

Unfortunately, I don't have access to a server that only supports v2 - if anyone
knows of a publicly available server, that would helpful. 
Assignee: sspitzer → bienvenu
Status: UNCONFIRMED → NEW
Ever confirmed: true
It is also marketing. If OE does it - and the requested automatic fallback is an
advantage - Mozilla should at least also do it. Otherwise Mozilla falls behind.
As admin I wouldn't want to upgrade to >=1.7 if I had an LDAP v2 Server.
(In reply to comment #2)
> Unfortunately, I don't have access to a server that only supports v2 - if anyone
> knows of a publicly available server, that would helpful. 
posted question for public v2 server in de.comm.software.mozilla.mailnews
subject: LDAP - öffentlicher 'nur LDAP v2' Server gesucht
If Mozilla attempts a v3 bind to a v2-only server, it should receive an
LDAP_PROTOCOL_ERROR result code.  Sample code from the LDAP command line tools
can be found here:

http://lxr.mozilla.org/mozilla/source/directory/c-sdk/ldap/clients/tools/common.c#1090

(ldaptool_simple_bind_s() is a wrapper for one of the standard LDAP bind API calls).
So the original plan to deal with this is that we had hoped to have the
nsLDAPService code complete, and it would have managed all the connections for
the calling code.  It would have allowed connections to be shared among (eg) the
addressbook and autocomplete, and it would have managed all the
ldap_bind-related stuff.  Unfortunately, that code is only partly done, and Seth
has pointed out that it has some design issues too.

So when I was writing the original patch, I wanted to do fallback, but the
design I was thinking of was essentially pushing it out to all the callers and
forcing them to do the fallback, which would have been a pain.  Since I was
tight on time, I punted on that.

Your suggestion is a better plan, I think, though it'll require a slight
modification.  The general issue is that the params used in calling the C-SDK
operation functions are passed in on the stack and then discarded.  So by the
time you get an protocol error message, those params are no longer available. 
So for the general case, we'll need to store all those params in the operation
object.  However, there's a dirty trick we can use here.  Because up until now,
all the code used LDAPv2, which requires binding before doing anything else, all
the clients of the XPCOM SDK do a simpleBind() first.  LDAPv3 doesn't have that
restriction.  However, if we keep that restriction (for now) on the XPCOM SDK,
that means that CheckLDAPOperation could assume that, whenever it gets a
protocol version error, that the operation must have been a bind.  So the only
thing we'd have to store in the operation object would be the password.  Which
I'm not a huge of fan of, because the longer we leave the password around in
memory, the easier it becomes for a hacker to find it using "strings /dev/kmem".
 But, if people can read /dev/kmem, things are already pretty screwed, so I'm
willing to live with that, especially if we zero out the password string once
we're sure we're done with it.

So the short answer is that I don't _think_ this should be too hard, though the
code has to be careful that any necessary thread-related stuff (esp. proxying or
unproxying) is done.  CheckLDAPOperation happens on the LDAP connection thread,
and the first call to simpleBind will have happened from the UI thread.  I
_think_ it'll be ok to retry the simpleBind from the connection thread, but I'm
not 100% positive.
OS: Windows 2000 → All
Hardware: PC → All
As far as whether this should block 1.7, it's worth taking into account that it
may be possible to use mozptch (see <http://mozptch.mozdev.org/>) to help deploy
to large windows installations that are still using v2 servers.
Attached patch 1st pass at a fix (obsolete) — Splinter Review
this patch basically does what Dan and I described. I have no idea if it works,
since I don't have a v2-only server available. It lets the nsCString destructor
handle clearing out the password from memory, which may or may not be
sufficient (though I imagine if the memory was recycled, it would get filled
with a new string post haste. If anyone who has this trouble has a source tree
to apply and try this patch on, that would be great. If not, if you're running
windows, I can build you a new release ldap dll to try w/ the trunk. Best of
all, of course, would be access to such an ldap server...
> I have no idea if it works, since I don't have a v2-only server available. 

I can try it at my v2-only server.

> It lets the nsCString destructor handle clearing out the password 
> from memory, which may or may not be sufficient (though I imagine 
> if the memory was recycled, it would get filled with a new string
> post haste. 

I only test whether it works or it does not work. I can not understand above
description... :-) 

> If not, if you're running windows, I can build you a new release
> ldap dll to try w/ the trunk. 

I can not patch and build Mozilla. If you attach patched ldap dll, I will try it.
Attached file custom mozldap.dll with fix (obsolete) —
> custom mozldap.dll with fix

I tested this dll on "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a2)
Gecko/20040525". But the response was same message "LDAP initialization problem".

I attach ldapmsg.zip. This includes binary dump of ldap message. This data was
exported by Ethereal. 

msg1.bin...LDAP v3 request from mozilla to ldap server
msg2.bin...protocol error message from ldap server to mozilla 
msg3.bin...unbind message from mozilla to ldap server
thx for trying it. At this point, I really need an ldap server that only
supports v2 to run against - I'm a little surprised that we don't see another
bind attempt but maybe something's bad about the state...
I was able to install (and configure) eudora's directory server on my winxp box.

http://www.eudora.com/techsupport/worldmail/ldap.html

it is a ldap v2 server and things don't work unless I set

ldap_2.servers.localhost.protocolVersion to "2"
ok, coming up to speed.

with my ldap v2 dir server, I can get "LDAP initialization problem" in my
autocomplete text.

for quick search in the AB we just silently fail.
(In reply to comment #13)
> I really need an ldap server that only supports v2 to run against
> - I'm a little surprised that we don't see another
> bind attempt but maybe something's bad about the state...

The LDAP server that I use is OpenLDAP 1.2.9. I can not research about it since
I am not server administrator.
Ben, do we have an LDAP v2 server in the MF offices? If not, can you help us set
one up for testing this? 
Flags: blocking1.7? → blocking1.7+
asa / ben, david has a ldap v2 server he can test with.  

I can help verify his fix since I have one too.
Whiteboard: testing a patch
Attached patch proposed fix (obsolete) — Splinter Review
OK, this fix works with the test server I have...
Attachment #149080 - Attachment is obsolete: true
Attachment #149284 - Attachment is obsolete: true
Attachment #149443 - Flags: superreview?(mscott)
Attachment #149443 - Flags: review?(dmose)
Comment on attachment 149443 [details] [diff] [review]
proposed fix

this patch is against the trunk, and has some of the null pointer checks that
were checked into 1.7...
Attachment #149443 - Attachment is obsolete: true
Attachment #149443 - Flags: superreview?(mscott)
Attachment #149443 - Flags: review?(dmose)
Comment on attachment 149458 [details] [diff] [review]
whoops, wrong patch last time

david, I will try to r/sr/a later tonight.
Attachment #149458 - Flags: superreview?(mscott)
Attachment #149458 - Flags: review?(dmose)
the patch looks good, one nit:

1)

+    // Uf this is a second try at binding, remove the operation from pending ops

s/Uf/If

I was able to test this against my ldap v2 server:

without the patch:

20040528  05:02:13      slapd starting
20040528  05:02:38      conn=0 fd=0 connection from localhost (127.0.0.1)
20040528  05:02:38      conn=0 op=0 BIND dn="" method=128
20040528  05:02:38      unknown version 3
20040528  05:02:38      conn=0 op=0 RESULT err=2 tag=97 nentries=0
20040528  05:02:43      conn=0 op=-1 fd=1548 closed errno=0
20040528  05:02:43      conn=0 op=1 UNBIND

with the patch:

20040528  05:05:09      conn=1 fd=0 connection from localhost (127.0.0.1)
20040528  05:05:09      conn=1 op=0 BIND dn="" method=128
20040528  05:05:09      unknown version 3
20040528  05:05:09      conn=1 op=0 RESULT err=2 tag=97 nentries=0
20040528  05:05:09      conn=1 op=1 BIND dn="" method=128
20040528  05:05:09      conn=1 op=1 RESULT err=0 tag=97 nentries=0
20040528  05:05:09      conn=1 op=2 SRCH base="" scope=2 filter="(|(mail=*SS*)(c
n=*SS*)(givenname=*SS*)(sn=*SS*))"
20040528  05:05:09      conn=1 op=2 RESULT err=0 tag=101 nentries=3

I was also able to test the patch against a ldap v3 server, and it still worked.

sr/a=sspitzer once you get dmose's r=.
Whiteboard: testing a patch → mail sent to dmose asking for review help
Attached file dll with fix
I've checked this fix into the trunk - you can try this dll if you want to try
it before the next nightly trunk build comes out. If this goes well, Dan will
try to review it tomorrow for 1.7 checkin...
Comment on attachment 149458 [details] [diff] [review]
whoops, wrong patch last time

noting sr/a=sspitzer conditional on dmose's r= and no problems reported on the
trunk.
Attachment #149458 - Flags: superreview?(mscott)
Attachment #149458 - Flags: superreview+
Attachment #149458 - Flags: approval1.7+
This patch has threadsafety issues.  David and I are working on figuring out
whether there's an easy way to clean them up so that this can be checked into
the 1.7 branch.  More soon.
*** Bug 242767 has been marked as a duplicate of this bug. ***
well, potential thread-safety issues. We don't know if anything bad can actually
happen. We know that the only time anything potentially bad can happen with this
patch is when we fall back from v3 to v2, and the user hits backspace just as
the initial bind request failed on a v2 server and we're trying to rebind - so
the only folks even potentially affected are those for whom it would not have
worked at all before.  We also know from the other thread-safety issue (hitting
backspace during auto-complete), that the problems were relatively
infrequent...my guess is that what might happen is that we might not abandon the
bind and continue the search. However, I'm not convinced that couldn't happen
prior to this patch since I believe the user could hit backspace in between the
time the bind succeeds and the search operation is created and fired off (but I
need to check that code more closely.)

I don't think the ab search code ever abandons the bind operation. It only ever
abandons search operations, from what I can tell. The autocomplete code can
abandon a bind operation, however, and I don't think there's a race condition
there, in normal operations, since the relevant stuff happens on the ui thread.
fixed on the aviary 1.0 branch. Will adjust the patch as necessary if more
comments come up from the review / thread concerns.
Whiteboard: mail sent to dmose asking for review help → mail sent to dmose asking for review help,fixed-aviary-1.0
> We know that the only time anything potentially bad can
> happen with this patch is when we fall back from v3 to v2, and the user hits 
> backspace just as the initial bind request failed on a v2 server and we're 
> trying to rebind - so the only folks even potentially affected are those for 
> whom it would not have worked at all before.

Well, there may also be the case of a user aborting replication (see below). 
replication case.  I'm not sure I buy that argument: the old failure mode was
"refuse to connect to the LDAP server", and I'm not really sure what the new one
is.  It's probably just "potentially leak an object", but I'm not 100% sure of that.

> We also know from the other thread-safety issue (hitting backspace during
> auto-complete), that the problems were relatively infrequent...

I would say they were hard/impossible to reproduce on many people's setups (such
as your's and mine).  That doesn't seem to me to be the same thing as relatively
infrequent.  That backspace crasher was one of the topcrashers according to
Talkback, was it not?

> my guess is that what might happen is that we might not abandon the
> bind and continue the search. 

I'm kind of suspecting potential object leakage, though I'm not sure.

> However, I'm not convinced that couldn't happen prior to this patch since I 
> believe the user could hit backspace in between the time the bind succeeds
> and the search operation is created and fired off (but I need to check that
> code more closely.)

Not sure about this.

> I don't think the ab search code ever abandons the bind operation. It only 
> ever abandons search operations, from what I can tell. 

That appears to be correct.  However, after a quick skim, it appears to me that
the AB replication code could abandon a bind.  I haven't yet looked to see what
the consequences of this might be.

> The autocomplete code can abandon a bind operation, however, and I don't
> think there's a race condition there, in normal operations, since the
> relevant stuff happens on the ui thread.

Couldn't the second bind attempt from the LDAP connection thread could race with
an abandon from the UI thread triggered by the user hitting escape, for example?

I've spent a bunch of hours staring at this code, and it seems clear to me that
this patch would expose existing race conditions that had not previously been
exposed.  As you say, only for the (not too large) number of people who are
still using LDAPv2 servers, and it could be worked around for individual users
by forcibly setting the pref so this code doesn't get hit.  I think chances are
pretty high that you're right that when these race conditions get hit, the
results will be relatively benign.  However, the code is complex enough that I'd
have to sit down and diagram everything on a whiteboard in order to get any more
confident than "chances are pretty high".  I'll leave the risk assessment up to
drivers on whether the existing patch makes sense for 1.7.

The right fix, I think, is that (at least) all uses of mMsgId need to be locked.
 A more minimal fix would be just to lock the uses of mMsgId in AbandonExt,
SimpleBind, and any functions they call (eg RemovePendingOperation).  Either of
these fixes means that we'd have to convince ourselves that this wouldn't
introduce any deadlocks.


Comment on attachment 149458 [details] [diff] [review]
whoops, wrong patch last time

Please fix the stuff that doesn't match module style: lines < 80 columns,
tabstops are 4 spaces, K&R bracing, always check the return value of functions,
comment style.

>+++ nsLDAPConnection.cpp	27 May 2004 21:33:31 -0000
>@@ -720,6 +718,28 @@
>             switch (rv) {
> 
>             case NS_OK: 
>+              {
>+                PRInt32 errorCode;
>+                rawMsg->GetErrorCode(&errorCode);
>+                if (errorCode == LDAP_PROTOCOL_ERROR)
>+                {
>+                  // maybe a version error, e.g., using v3 on a v2 server.
>+                  // if we're using v3, try v2.
>+                  if (loop->mRawConn->mVersion == nsILDAPConnection::VERSION3)
>+                  {
>+                    nsCAutoString password;
>+                    loop->mRawConn->mVersion = nsILDAPConnection::VERSION2;
>+                    ldap_set_option(loop->mRawConn->mConnectionHandle, LDAP_OPT_PROTOCOL_VERSION, &loop->mRawConn->mVersion);
>+                    nsCOMPtr <nsILDAPOperation> operation = NS_STATIC_CAST(nsILDAPOperation *, NS_STATIC_CAST(nsISupports *, aData));
>+                    // we pass in an empty password to tell the operation that it
>+                    // should use the cached password.
>+                    operation->SimpleBind(password);

If this call fails, we need to invoke the callback.  Otherwise the caller (in
particular LDAP addrbook replication) is likely to just sit there until the
user gets tired of waiting and aborts.

>+                    operationFinished = PR_FALSE;
>+                    // we don't want to notify callers that we're done...
>+                    return PR_TRUE;
>+                  }
>+                }
>+              }
>                 break;
> 
>             case NS_ERROR_LDAP_DECODING_ERROR:
Attachment #149458 - Flags: review?(dmose) → review-
let's get this on the 1.7 branch...  trying to ship 1.7 soon.
fixed in 1.7 - I'll address Dan's comments on the trunk as soon as I can.
Keywords: fixed1.7
I can query LDAPv2 fine after delete "protocolVersion" property with 
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040602.
Didn't the changes to nsLDAPConnection::RemovePendingOperation introduce a
memory leak?  The Get() on mPendingOperations addrefs, and so does the
assignment into the nsCOMPtr, but you only have one release (the nsCOMPtr
destructor).  Note that the old code told the nsCOMPtr to NOT addref on
assignment....
Er, I meant the changes to nsLDAPConnection::InvokeMessageCallback
I think you're right...
Attached patch ref-counting fixSplinter Review
Attachment #150281 - Flags: superreview?(bzbarsky)
Attachment #150281 - Flags: review?(dmose)
Comment on attachment 150281 [details] [diff] [review]
ref-counting fix

r=dmose
Attachment #150281 - Flags: review?(dmose) → review+
Comment on attachment 150281 [details] [diff] [review]
ref-counting fix

sr=bzbarsky.

It may be worth converting mPendingOperations to be an nsInterfaceHashtable
long-term, so these issues don't arise...
Attachment #150281 - Flags: superreview?(bzbarsky) → superreview+
the ref-counting fix has been checked into the aviary 1.0 branch too
Whiteboard: mail sent to dmose asking for review help,fixed-aviary-1.0 → fixed-aviary-1.0
Whiteboard: fixed-aviary-1.0 → fixed-aviary1.0
Comment on attachment 150281 [details] [diff] [review]
ref-counting fix

simple fix, for a smallish but new leak.
Attachment #150281 - Flags: approval1.7?
Comment on attachment 150281 [details] [diff] [review]
ref-counting fix

a=sspitzer for 1.7, and I've sent email to asa so he knows about it.
Attachment #150281 - Flags: approval1.7? → approval1.7+
Another fixed1.7 bug with a misleading blocking1.7+ flag :-(
clearing blocking flag.
Flags: blocking1.7+
verified.  I've tested mozilla 1.7 release bits and this is working.
Keywords: fixed1.7verified1.7
Product: MailNews → Core
Dan, this attempts to address your comments re K&R, comment format,
indentation, and handling an error from SimpleBind...
Comment on attachment 168160 [details] [diff] [review]
address sr comments for trunk, formatting and error handling

I'd love to get this closed and off my list :-)
Attachment #168160 - Flags: review?(dmose)
I'm going to hand this back to Dan...
Assignee: bienvenu → dmose
Component: MailNews: LDAP Integration → Address Book
Flags: review?(dmose)
Flags: review-
Flags: review+
Flags: approval1.7+
Product: Core → Thunderbird
Target Milestone: --- → Thunderbird1.1
Component: Address Book → MailNews: LDAP Integration
Product: Thunderbird → Core
Target Milestone: Thunderbird1.1 → mozilla1.8beta2
Comment on attachment 168160 [details] [diff] [review]
address sr comments for trunk, formatting and error handling

sr=dmose; checked in.
Attachment #168160 - Flags: superreview+
Final patch bit checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
I filed bug 289021 for locking / threading cleanup for the LDAP XPCOM SDK.
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: