Closed
Bug 242789
Opened 20 years ago
Closed 19 years ago
LDAP v3 - fallback to v2 if v3 connect doesn't work with >=1.7
Categories
(MailNews Core :: LDAP Integration, defect)
MailNews Core
LDAP Integration
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.8beta2
People
(Reporter: mkralec, Assigned: dmosedale)
References
()
Details
(Keywords: verified1.7, Whiteboard: fixed-aviary1.0)
Attachments
(5 files, 3 obsolete files)
711 bytes,
application/x-zip-compressed
|
Details | |
5.22 KB,
patch
|
sspitzer
:
superreview+
|
Details | Diff | Splinter Review |
44.00 KB,
application/octet-stream
|
Details | |
771 bytes,
patch
|
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
3.88 KB,
patch
|
dmosedale
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040504 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040504 LDAP connection should autmatically fall back to version 2 if version 3 fails. Reproducible: Always Steps to Reproduce: 1. 2. 3.
block 1.7+ requested - reason: change in behaviour too heavy too handle for enterprises with LDAP v2 Servers. support for manual switchback too hard to handle.
Flags: blocking1.7?
Comment 2•20 years ago
|
||
Dan, how hard do you think this would be? Is there a specific error code returned when we try to use v3 on a server that only supports v2? Presumably, in CheckLDAPOperationResult, we would detect this error, check if we were using v3, and if so, switch the version to version 2 by calling ldap_set_option(mConnectionHandle, VERSION2,&version);, and retry the current operation (not sure what that would entail). Unfortunately, I don't have access to a server that only supports v2 - if anyone knows of a publicly available server, that would helpful.
Assignee: sspitzer → bienvenu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 3•20 years ago
|
||
It is also marketing. If OE does it - and the requested automatic fallback is an advantage - Mozilla should at least also do it. Otherwise Mozilla falls behind. As admin I wouldn't want to upgrade to >=1.7 if I had an LDAP v2 Server.
Comment 4•20 years ago
|
||
(In reply to comment #2) > Unfortunately, I don't have access to a server that only supports v2 - if anyone > knows of a publicly available server, that would helpful. posted question for public v2 server in de.comm.software.mozilla.mailnews subject: LDAP - öffentlicher 'nur LDAP v2' Server gesucht
Comment 5•20 years ago
|
||
If Mozilla attempts a v3 bind to a v2-only server, it should receive an LDAP_PROTOCOL_ERROR result code. Sample code from the LDAP command line tools can be found here: http://lxr.mozilla.org/mozilla/source/directory/c-sdk/ldap/clients/tools/common.c#1090 (ldaptool_simple_bind_s() is a wrapper for one of the standard LDAP bind API calls).
Assignee | ||
Comment 6•20 years ago
|
||
So the original plan to deal with this is that we had hoped to have the nsLDAPService code complete, and it would have managed all the connections for the calling code. It would have allowed connections to be shared among (eg) the addressbook and autocomplete, and it would have managed all the ldap_bind-related stuff. Unfortunately, that code is only partly done, and Seth has pointed out that it has some design issues too. So when I was writing the original patch, I wanted to do fallback, but the design I was thinking of was essentially pushing it out to all the callers and forcing them to do the fallback, which would have been a pain. Since I was tight on time, I punted on that. Your suggestion is a better plan, I think, though it'll require a slight modification. The general issue is that the params used in calling the C-SDK operation functions are passed in on the stack and then discarded. So by the time you get an protocol error message, those params are no longer available. So for the general case, we'll need to store all those params in the operation object. However, there's a dirty trick we can use here. Because up until now, all the code used LDAPv2, which requires binding before doing anything else, all the clients of the XPCOM SDK do a simpleBind() first. LDAPv3 doesn't have that restriction. However, if we keep that restriction (for now) on the XPCOM SDK, that means that CheckLDAPOperation could assume that, whenever it gets a protocol version error, that the operation must have been a bind. So the only thing we'd have to store in the operation object would be the password. Which I'm not a huge of fan of, because the longer we leave the password around in memory, the easier it becomes for a hacker to find it using "strings /dev/kmem". But, if people can read /dev/kmem, things are already pretty screwed, so I'm willing to live with that, especially if we zero out the password string once we're sure we're done with it. So the short answer is that I don't _think_ this should be too hard, though the code has to be careful that any necessary thread-related stuff (esp. proxying or unproxying) is done. CheckLDAPOperation happens on the LDAP connection thread, and the first call to simpleBind will have happened from the UI thread. I _think_ it'll be ok to retry the simpleBind from the connection thread, but I'm not 100% positive.
OS: Windows 2000 → All
Hardware: PC → All
Assignee | ||
Comment 7•20 years ago
|
||
As far as whether this should block 1.7, it's worth taking into account that it may be possible to use mozptch (see <http://mozptch.mozdev.org/>) to help deploy to large windows installations that are still using v2 servers.
Comment 8•20 years ago
|
||
this patch basically does what Dan and I described. I have no idea if it works, since I don't have a v2-only server available. It lets the nsCString destructor handle clearing out the password from memory, which may or may not be sufficient (though I imagine if the memory was recycled, it would get filled with a new string post haste. If anyone who has this trouble has a source tree to apply and try this patch on, that would be great. If not, if you're running windows, I can build you a new release ldap dll to try w/ the trunk. Best of all, of course, would be access to such an ldap server...
Comment 9•20 years ago
|
||
> I have no idea if it works, since I don't have a v2-only server available. I can try it at my v2-only server. > It lets the nsCString destructor handle clearing out the password > from memory, which may or may not be sufficient (though I imagine > if the memory was recycled, it would get filled with a new string > post haste. I only test whether it works or it does not work. I can not understand above description... :-) > If not, if you're running windows, I can build you a new release > ldap dll to try w/ the trunk. I can not patch and build Mozilla. If you attach patched ldap dll, I will try it.
Comment 10•20 years ago
|
||
Comment 11•20 years ago
|
||
> custom mozldap.dll with fix
I tested this dll on "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a2)
Gecko/20040525". But the response was same message "LDAP initialization problem".
I attach ldapmsg.zip. This includes binary dump of ldap message. This data was
exported by Ethereal.
msg1.bin...LDAP v3 request from mozilla to ldap server
msg2.bin...protocol error message from ldap server to mozilla
msg3.bin...unbind message from mozilla to ldap server
Comment 12•20 years ago
|
||
Comment 13•20 years ago
|
||
thx for trying it. At this point, I really need an ldap server that only supports v2 to run against - I'm a little surprised that we don't see another bind attempt but maybe something's bad about the state...
Comment 14•20 years ago
|
||
I was able to install (and configure) eudora's directory server on my winxp box. http://www.eudora.com/techsupport/worldmail/ldap.html it is a ldap v2 server and things don't work unless I set ldap_2.servers.localhost.protocolVersion to "2"
Comment 15•20 years ago
|
||
ok, coming up to speed. with my ldap v2 dir server, I can get "LDAP initialization problem" in my autocomplete text. for quick search in the AB we just silently fail.
Comment 16•20 years ago
|
||
(In reply to comment #13) > I really need an ldap server that only supports v2 to run against > - I'm a little surprised that we don't see another > bind attempt but maybe something's bad about the state... The LDAP server that I use is OpenLDAP 1.2.9. I can not research about it since I am not server administrator.
Comment 17•20 years ago
|
||
Ben, do we have an LDAP v2 server in the MF offices? If not, can you help us set one up for testing this?
Flags: blocking1.7? → blocking1.7+
Comment 18•20 years ago
|
||
asa / ben, david has a ldap v2 server he can test with. I can help verify his fix since I have one too.
Updated•20 years ago
|
Whiteboard: testing a patch
Comment 19•20 years ago
|
||
OK, this fix works with the test server I have...
Attachment #149080 -
Attachment is obsolete: true
Attachment #149284 -
Attachment is obsolete: true
Updated•20 years ago
|
Attachment #149443 -
Flags: superreview?(mscott)
Attachment #149443 -
Flags: review?(dmose)
Comment 20•20 years ago
|
||
Comment on attachment 149443 [details] [diff] [review] proposed fix this patch is against the trunk, and has some of the null pointer checks that were checked into 1.7...
Comment 21•20 years ago
|
||
Attachment #149443 -
Attachment is obsolete: true
Updated•20 years ago
|
Attachment #149443 -
Flags: superreview?(mscott)
Attachment #149443 -
Flags: review?(dmose)
Comment 22•20 years ago
|
||
Comment on attachment 149458 [details] [diff] [review] whoops, wrong patch last time david, I will try to r/sr/a later tonight.
Attachment #149458 -
Flags: superreview?(mscott)
Attachment #149458 -
Flags: review?(dmose)
Comment 23•20 years ago
|
||
the patch looks good, one nit: 1) + // Uf this is a second try at binding, remove the operation from pending ops s/Uf/If I was able to test this against my ldap v2 server: without the patch: 20040528 05:02:13 slapd starting 20040528 05:02:38 conn=0 fd=0 connection from localhost (127.0.0.1) 20040528 05:02:38 conn=0 op=0 BIND dn="" method=128 20040528 05:02:38 unknown version 3 20040528 05:02:38 conn=0 op=0 RESULT err=2 tag=97 nentries=0 20040528 05:02:43 conn=0 op=-1 fd=1548 closed errno=0 20040528 05:02:43 conn=0 op=1 UNBIND with the patch: 20040528 05:05:09 conn=1 fd=0 connection from localhost (127.0.0.1) 20040528 05:05:09 conn=1 op=0 BIND dn="" method=128 20040528 05:05:09 unknown version 3 20040528 05:05:09 conn=1 op=0 RESULT err=2 tag=97 nentries=0 20040528 05:05:09 conn=1 op=1 BIND dn="" method=128 20040528 05:05:09 conn=1 op=1 RESULT err=0 tag=97 nentries=0 20040528 05:05:09 conn=1 op=2 SRCH base="" scope=2 filter="(|(mail=*SS*)(c n=*SS*)(givenname=*SS*)(sn=*SS*))" 20040528 05:05:09 conn=1 op=2 RESULT err=0 tag=101 nentries=3 I was also able to test the patch against a ldap v3 server, and it still worked. sr/a=sspitzer once you get dmose's r=.
Updated•20 years ago
|
Whiteboard: testing a patch → mail sent to dmose asking for review help
Comment 24•20 years ago
|
||
I've checked this fix into the trunk - you can try this dll if you want to try it before the next nightly trunk build comes out. If this goes well, Dan will try to review it tomorrow for 1.7 checkin...
Comment 25•20 years ago
|
||
Comment on attachment 149458 [details] [diff] [review] whoops, wrong patch last time noting sr/a=sspitzer conditional on dmose's r= and no problems reported on the trunk.
Attachment #149458 -
Flags: superreview?(mscott)
Attachment #149458 -
Flags: superreview+
Attachment #149458 -
Flags: approval1.7+
Assignee | ||
Comment 26•20 years ago
|
||
This patch has threadsafety issues. David and I are working on figuring out whether there's an easy way to clean them up so that this can be checked into the 1.7 branch. More soon.
Comment 27•20 years ago
|
||
*** Bug 242767 has been marked as a duplicate of this bug. ***
Comment 28•20 years ago
|
||
well, potential thread-safety issues. We don't know if anything bad can actually happen. We know that the only time anything potentially bad can happen with this patch is when we fall back from v3 to v2, and the user hits backspace just as the initial bind request failed on a v2 server and we're trying to rebind - so the only folks even potentially affected are those for whom it would not have worked at all before. We also know from the other thread-safety issue (hitting backspace during auto-complete), that the problems were relatively infrequent...my guess is that what might happen is that we might not abandon the bind and continue the search. However, I'm not convinced that couldn't happen prior to this patch since I believe the user could hit backspace in between the time the bind succeeds and the search operation is created and fired off (but I need to check that code more closely.)
Comment 29•20 years ago
|
||
I don't think the ab search code ever abandons the bind operation. It only ever abandons search operations, from what I can tell. The autocomplete code can abandon a bind operation, however, and I don't think there's a race condition there, in normal operations, since the relevant stuff happens on the ui thread.
Comment 30•20 years ago
|
||
fixed on the aviary 1.0 branch. Will adjust the patch as necessary if more comments come up from the review / thread concerns.
Whiteboard: mail sent to dmose asking for review help → mail sent to dmose asking for review help,fixed-aviary-1.0
Assignee | ||
Comment 31•20 years ago
|
||
> We know that the only time anything potentially bad can > happen with this patch is when we fall back from v3 to v2, and the user hits > backspace just as the initial bind request failed on a v2 server and we're > trying to rebind - so the only folks even potentially affected are those for > whom it would not have worked at all before. Well, there may also be the case of a user aborting replication (see below). replication case. I'm not sure I buy that argument: the old failure mode was "refuse to connect to the LDAP server", and I'm not really sure what the new one is. It's probably just "potentially leak an object", but I'm not 100% sure of that. > We also know from the other thread-safety issue (hitting backspace during > auto-complete), that the problems were relatively infrequent... I would say they were hard/impossible to reproduce on many people's setups (such as your's and mine). That doesn't seem to me to be the same thing as relatively infrequent. That backspace crasher was one of the topcrashers according to Talkback, was it not? > my guess is that what might happen is that we might not abandon the > bind and continue the search. I'm kind of suspecting potential object leakage, though I'm not sure. > However, I'm not convinced that couldn't happen prior to this patch since I > believe the user could hit backspace in between the time the bind succeeds > and the search operation is created and fired off (but I need to check that > code more closely.) Not sure about this. > I don't think the ab search code ever abandons the bind operation. It only > ever abandons search operations, from what I can tell. That appears to be correct. However, after a quick skim, it appears to me that the AB replication code could abandon a bind. I haven't yet looked to see what the consequences of this might be. > The autocomplete code can abandon a bind operation, however, and I don't > think there's a race condition there, in normal operations, since the > relevant stuff happens on the ui thread. Couldn't the second bind attempt from the LDAP connection thread could race with an abandon from the UI thread triggered by the user hitting escape, for example? I've spent a bunch of hours staring at this code, and it seems clear to me that this patch would expose existing race conditions that had not previously been exposed. As you say, only for the (not too large) number of people who are still using LDAPv2 servers, and it could be worked around for individual users by forcibly setting the pref so this code doesn't get hit. I think chances are pretty high that you're right that when these race conditions get hit, the results will be relatively benign. However, the code is complex enough that I'd have to sit down and diagram everything on a whiteboard in order to get any more confident than "chances are pretty high". I'll leave the risk assessment up to drivers on whether the existing patch makes sense for 1.7. The right fix, I think, is that (at least) all uses of mMsgId need to be locked. A more minimal fix would be just to lock the uses of mMsgId in AbandonExt, SimpleBind, and any functions they call (eg RemovePendingOperation). Either of these fixes means that we'd have to convince ourselves that this wouldn't introduce any deadlocks.
Assignee | ||
Comment 32•20 years ago
|
||
Comment on attachment 149458 [details] [diff] [review] whoops, wrong patch last time Please fix the stuff that doesn't match module style: lines < 80 columns, tabstops are 4 spaces, K&R bracing, always check the return value of functions, comment style. >+++ nsLDAPConnection.cpp 27 May 2004 21:33:31 -0000 >@@ -720,6 +718,28 @@ > switch (rv) { > > case NS_OK: >+ { >+ PRInt32 errorCode; >+ rawMsg->GetErrorCode(&errorCode); >+ if (errorCode == LDAP_PROTOCOL_ERROR) >+ { >+ // maybe a version error, e.g., using v3 on a v2 server. >+ // if we're using v3, try v2. >+ if (loop->mRawConn->mVersion == nsILDAPConnection::VERSION3) >+ { >+ nsCAutoString password; >+ loop->mRawConn->mVersion = nsILDAPConnection::VERSION2; >+ ldap_set_option(loop->mRawConn->mConnectionHandle, LDAP_OPT_PROTOCOL_VERSION, &loop->mRawConn->mVersion); >+ nsCOMPtr <nsILDAPOperation> operation = NS_STATIC_CAST(nsILDAPOperation *, NS_STATIC_CAST(nsISupports *, aData)); >+ // we pass in an empty password to tell the operation that it >+ // should use the cached password. >+ operation->SimpleBind(password); If this call fails, we need to invoke the callback. Otherwise the caller (in particular LDAP addrbook replication) is likely to just sit there until the user gets tired of waiting and aborts. >+ operationFinished = PR_FALSE; >+ // we don't want to notify callers that we're done... >+ return PR_TRUE; >+ } >+ } >+ } > break; > > case NS_ERROR_LDAP_DECODING_ERROR:
Attachment #149458 -
Flags: review?(dmose) → review-
Comment 33•20 years ago
|
||
let's get this on the 1.7 branch... trying to ship 1.7 soon.
Comment 34•20 years ago
|
||
fixed in 1.7 - I'll address Dan's comments on the trunk as soon as I can.
Keywords: fixed1.7
Comment 35•20 years ago
|
||
I can query LDAPv2 fine after delete "protocolVersion" property with Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040602.
Comment 36•20 years ago
|
||
Didn't the changes to nsLDAPConnection::RemovePendingOperation introduce a memory leak? The Get() on mPendingOperations addrefs, and so does the assignment into the nsCOMPtr, but you only have one release (the nsCOMPtr destructor). Note that the old code told the nsCOMPtr to NOT addref on assignment....
Comment 37•20 years ago
|
||
Er, I meant the changes to nsLDAPConnection::InvokeMessageCallback
Comment 38•20 years ago
|
||
I think you're right...
Comment 39•20 years ago
|
||
Updated•20 years ago
|
Attachment #150281 -
Flags: superreview?(bzbarsky)
Attachment #150281 -
Flags: review?(dmose)
Assignee | ||
Comment 40•20 years ago
|
||
Comment on attachment 150281 [details] [diff] [review] ref-counting fix r=dmose
Attachment #150281 -
Flags: review?(dmose) → review+
Comment 41•20 years ago
|
||
Comment on attachment 150281 [details] [diff] [review] ref-counting fix sr=bzbarsky. It may be worth converting mPendingOperations to be an nsInterfaceHashtable long-term, so these issues don't arise...
Updated•20 years ago
|
Attachment #150281 -
Flags: superreview?(bzbarsky) → superreview+
Comment 42•20 years ago
|
||
the ref-counting fix has been checked into the aviary 1.0 branch too
Whiteboard: mail sent to dmose asking for review help,fixed-aviary-1.0 → fixed-aviary-1.0
Updated•20 years ago
|
Whiteboard: fixed-aviary-1.0 → fixed-aviary1.0
Comment 43•20 years ago
|
||
Comment on attachment 150281 [details] [diff] [review] ref-counting fix simple fix, for a smallish but new leak.
Attachment #150281 -
Flags: approval1.7?
Comment 44•20 years ago
|
||
Comment on attachment 150281 [details] [diff] [review] ref-counting fix a=sspitzer for 1.7, and I've sent email to asa so he knows about it.
Attachment #150281 -
Flags: approval1.7? → approval1.7+
Comment 45•20 years ago
|
||
Another fixed1.7 bug with a misleading blocking1.7+ flag :-(
Comment 47•20 years ago
|
||
verified. I've tested mozilla 1.7 release bits and this is working.
Keywords: fixed1.7 → verified1.7
Updated•20 years ago
|
Product: MailNews → Core
Comment 48•20 years ago
|
||
Dan, this attempts to address your comments re K&R, comment format, indentation, and handling an error from SimpleBind...
Comment 49•20 years ago
|
||
Comment on attachment 168160 [details] [diff] [review] address sr comments for trunk, formatting and error handling I'd love to get this closed and off my list :-)
Attachment #168160 -
Flags: review?(dmose)
Assignee | ||
Updated•19 years ago
|
Component: MailNews: LDAP Integration → Address Book
Flags: review?(dmose)
Flags: review-
Flags: review+
Flags: approval1.7+
Product: Core → Thunderbird
Target Milestone: --- → Thunderbird1.1
Assignee | ||
Updated•19 years ago
|
Component: Address Book → MailNews: LDAP Integration
Product: Thunderbird → Core
Target Milestone: Thunderbird1.1 → mozilla1.8beta2
Assignee | ||
Comment 51•19 years ago
|
||
Comment on attachment 168160 [details] [diff] [review] address sr comments for trunk, formatting and error handling sr=dmose; checked in.
Attachment #168160 -
Flags: superreview+
Assignee | ||
Comment 52•19 years ago
|
||
Final patch bit checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 53•19 years ago
|
||
I filed bug 289021 for locking / threading cleanup for the LDAP XPCOM SDK.
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•