Closed Bug 355409 Opened 18 years ago Closed 18 years ago

unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works

Categories

(Thunderbird :: Address Book, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: moco, Assigned: Bienvenu)

References

Details

(Keywords: fixed1.8.1.2, regression)

Attachments

(1 file, 1 obsolete file)

unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works I was following the instructions from aravind (see https://intranet.mozilla.org/Email_%28Zimbra%29#Accessing_via_webmail) and I was unable to use autocomplete or addressbook search against the ldap server. From my environment: NSPR_LOG_MODULES=ldap:5,ldapautocomplete:5 NSPR_LOG_FILE=C:\home\mozilla-nspr-ldap.log Here's what I get in that log file: 3904[340bce8]: nsLDAPConnection::Run() entered 0[2748e0]: nsLDAPOperation::SimpleBind(): called; bindName = 'mail=sspitzer@mozilla.com,o=com,dc=mozilla'; 0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*s*)(cn=*s*)(givenName=*s*)(sn=*s*))', aAttrCounts = 64, aSizeLimit = 100 0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*seth*)(cn=*seth*)(givenName=*seth*)(sn=*seth*))', aAttrCounts = 64, aSizeLimit = 100 0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*seth*)(cn=*seth*)(givenName=*seth*)(sn=*seth*))', aAttrCounts = 64, aSizeLimit = 100 Note, no search results ever returned. I switched to tbird 1.5.0.7 and it worked (same profile). note, I saw this with mac os x and windows xp. I'm using version 2 beta 1 (20061004) on mac os x.
(In reply to comment #0) > unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works Ok, this is strange. 1.5.0.7 and 2.0 *should* be using the same c-sdk code. That doesn't necessarily mean it should work, but I'd have thought it would count for a lot. There are however, two checkins to the 2.0 branch that aren't in the 1.5.0.x branch: 2006-02-15 01:17 nsLDAPProtocolHandler.cpp Bug 320498 Fix NewChannel impls to handle null in argument 2005-09-15 08:48 nsLDAPBERElement.cpp Fix LDAP BER Element encoding issue (bug 308511) So it could be one of those. Although its unlikely to be the second one as I'd have thought that would affect non-ssl connections as well. I haven't been able to set up an ssl server to be able to verify this. Could you try a current trunk build? The c-sdk is very different, so I don't know if it'll reveal anything. Another thought would be if you could do a debug build of 2.0 yourself can you look for any assertions happening or other logged failures? If you can't David might be able to. I saw a comment in another bug about saying folks had had problems with ssl but I've not seen anything in the newsgroups to verify it so I was thinking it was probably just one or two people with incorrect settings. I'm setting ? for blocking thunderbird 2, it could be a big regression for some people if ssl isn't working, and we need to find out why.
Flags: blocking-thunderbird2?
Keywords: regression
it may be related, but on the official tbird 2.0 branch nightly build, when I quit I get the prompt described in bug #188554.
> So it could be one of those. Note that the LDAP protocol handler and channel are not build by default (only with --enable-ldap-experimental).
Hi Mark. >I saw a comment in another bug about saying folks had had problems with ssl but > I've not seen anything in the newsgroups to verify it so I was thinking it was > probably just one or two people with incorrect settings. I am one of those people that suffer from LDAP with SSL not working in the 2.0 nightlies (but have no problems with 1.5.0.x) Not sure if you refer to this message: http://forums.mozillazine.org/viewtopic.php?p=2422989#2422989 The problem is reproducable with a new profile. Not sure if there is another way to make sure that my settings are correct. Please advice. Latest build I am having problems with is from November 4th, located here: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2006-11-04-03-mozilla1.8/thunderbird-2.0b1pre.en-US.win32.installer.exe
Hi Christos, (In reply to comment #4) > I am one of those people that suffer from LDAP with SSL not working in the 2.0 > nightlies (but have no problems with 1.5.0.x) Not sure if you refer to this > message: http://forums.mozillazine.org/viewtopic.php?p=2422989#2422989 Any chance you could try setting up the ldap directory for autocomplete and then doing a compose mail, typing in several chars and seeing if any error messages come up?
(In reply to comment #5) > Any chance you could try setting up the ldap directory for autocomplete and > then doing a compose mail, typing in several chars and seeing if any error > messages come up? I am pretty sure that the autocomplete is on (because it does give me partial matches in 1.5.0.x). However, just to be on the safe side, can we make sure we are talking about the same option (is this under Options->Composition)? I assume you want me to report the testing with the debug flags turned on.
(In reply to comment #6) > I am pretty sure that the autocomplete is on (because it does give me partial > matches in 1.5.0.x). However, just to be on the safe side, can we make sure we > are talking about the same option (is this under Options->Composition)? Yep that's the one. > I assume you want me to report the testing with the debug flags turned on. Not in this case - autocomplete should actually report an error on the display if there is one (might not be very useful, but you never know...)
Unfortunately nothing useful came up. I start typing one of the names (that should be appearing on the LDAP address book) and after a short pause, the black font for the name turns red (I assume this means no matches...). Doing the same in 1.5.0.x would give me a list of matches (and a drop-down menu). Not sure what else information you guys are interested in. Just let me know. I would love to help you track this down.
(In reply to comment #8) > Not sure what else information you guys are interested in. Just let me know. I > would love to help you track this down. Any chance you could test a trunk nightly? (http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/) Its a bit of a stab in the dark, but it may be useful to find out. Additionally, and I think this is highly unlikely, you could try changing the pref ldap_2.servers.default.attrmap.DisplayName from "cn,commonname" to "displayName,cn,commonname". I think this is unlikely as if this was the cause, I wouldn't expect it to work in 1.5.0.7 either.
(In reply to comment #9) > > Additionally, and I think this is highly unlikely, you could try changing the > pref ldap_2.servers.default.attrmap.DisplayName from "cn,commonname" to > "displayName,cn,commonname". I think this is unlikely as if this was the cause, > I wouldn't expect it to work in 1.5.0.7 either. So, first I followed your suggestion and modify the ldap_2.servers.default.attrmap.DisplayName preference. I then tried with 1.5.0.8, with a 2.0 beta build and finally with a trunk nightly. Results: (1) 1.5.0.8 --> LDAP works fine (2) version 2 beta 1 (20061118) --> LDAP does not work, font turns red (3) version 3 alpha 1 (20061119) --> LDAP does not work, pop-up error: <LDAP server connection failed> Well, at least with the nightly I am getting some feedback...
This should really block thunderbird2. We have thousands of people using this feature at UPenn.
I agree.
Flags: blocking-thunderbird2? → blocking-thunderbird2+
I'll look into this tomorrow...
Assignee: mscott → bienvenu
we are connecting to the imap server, but we're never getting to the nsLDAPBERElement.cpp code that's different in 2.0. We're not getting any errors setting up LDAP either, e.g., the call to nsLDAPInstallSSL
I wonder if the ipv6 changes broke ssl ldap...I'll try reverting those first.
on the trunk (and presumably 2.0 branch), the call to nsldapi_send_ber_message fails, when we're trying to do a simple bind, with a -1 error code. 1.5 shipped with a much older version of the c-sdk - I wonder if there's some change we need to make in directory/xpcom in our use of the c-sdk to get SSL to work.
(In reply to comment #16) > on the trunk (and presumably 2.0 branch), the call to nsldapi_send_ber_message > fails, when we're trying to do a simple bind, with a -1 error code. 1.5 shipped > with a much older version of the c-sdk - I wonder if there's some change we > need to make in directory/xpcom in our use of the c-sdk to get SSL to work. > I thought that 1.5 is shipping with the same c-sdk as 2.0. Trunk is much later though as you say. I also thought that the regression was found before the ipv6 code went in, but I may be wrong about that.
Other ideas: - Some change in default SSL settings or certificates that causes the SSL handshake to fail (maybe an NSS change?) - Check the LDAP server access and error logs to see if anything interesting is logged. - Check to see what NSS or OS errors are set inside nsldapi_send_ber_message().
I don't think it's the ipv6 change, since reverting that locally had no effect - 1.5 ldap code does look a lot more like 2.0 - I'll try debugging the 2.0 code; I tend to think it might be an SSL change. I wonder if this is related to the TLS problems we were having with IMAP, which were also new in 2.0, iirc.
prldap_write is failing in the call to PR_Send, with a -1 error code. http://lxr.mozilla.org/mozilla1.8/source/directory/c-sdk/ldap/libraries/libprldap/ldappr-io.c#210
and we ultimately end up here: static PRInt32 PR_CALLBACK PSMSend(PRFileDesc *fd, const void *buf, PRInt32 amount, PRIntn flags, PRIntervalTime timeout) { nsNSSShutDownPreventionLock locker; if (!fd || !fd->lower) { PR_SetError(PR_BAD_DESCRIPTOR_ERROR, 0); return -1; } PR_SetError(PR_UNKNOWN_ERROR, 0); return -1; } which must be the wrong place, because it always returns an error.
turning on SSLTRACE doesn't give me much: prldap_try_one_address(): Trying 63.245.208.172:636... prldap_timeout2it: 10000ms prldap_try_one_address(): Connected. 3724: SSL: grow buffer from 0 to 18432 3724: SSL: grow buffer from 0 to 18432 prldap_timeout2it: 10000ms We must not have negotiated to the extent that the ssl io methods get defined. I'll try turning on SSLDEBUG to see if that tells me more.
my guess is that bug 111384 might be involved - it looks to me like it changes the the way the socket methods get initialized. It's a giant patch, though, so I could be wrong. I'm not sure why only ldap would be affected, but here's the way we set up SSL: http://lxr.mozilla.org/mozilla/source/directory/xpcom/base/src/nsLDAPConnection.cpp#980
It looks to me like the problem is that PSMSend doesn't do anything, and it appears that LDAP is the only client of PR_Send, outside of nspr, so that PR_Send doesn't work over SSL. Kaie or wtc, is that possible? If so, suggestions? Thx!
PR_Send should work over SSL. It's strange that PSMSend isn't implemented.
As an experiment, I tried changing the ldap code to use PR_Write, but that failed because the ssl code decided the socket was busy - but at least it tried :-) I naively thought PR_Write might work since it's also blocking, according to the comments in nsprpub.h
(In reply to comment #25) > PR_Send should work over SSL. It's strange that PSMSend isn't > implemented. We have been using PR_Send() for years in the libprldap code (LDAP + NSPR glue, which is also used when running over SSL). I think something else can be used as long as a timeout can be specified. Strange that something change to break this... I wonder if some initialization is just not happening that needs to be.
Here is the story on send/receive functions from my point of view. When I began to work on Mozilla, I learned that SSL networking in PSM is implemented using two additional IO layers: - the SSL implementation layer in NSS - an additional layer implemented in PSM The PSM layer implemented read/write only. It never implemented send/receive. This is the revision of PSM's IO file that defines PSM's layer functions, just before the change to using the SSL thread in bug 111384: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp&rev=1.102&root=/cvsroot#1152 As PSM does not implement handlers for send/receive, I assume the default handler will pass the call through to the next layer. PSM's functions are implemented to keep track of the SSL handshake state. Calls to send/receive always had the potential to confuse this state tracking. When I worked on bug 111384, I had to make sure that all I/O activity is correctly transported between the threads. I investigated whether send/receive is required. I searched our codebase for calls to send/receive. I could find only one place, and that was a receive with always one byte and PR_MSG_PEEK. But based on the above and my search results, I concluded that send/receive can't really work with SSL sockets and that it is not used. Because of that I add the new send/receive functions at the PSM layer that explicitly return with a failure. I do not understand why I didn't find the call to PR_Send in the LDAP code when I looked for such calls. I'm sorry, this was my mistake. In order to fix this, we need to implement PSM's send and receive functions.
(In reply to comment #26) > As an experiment, I tried changing the ldap code to use PR_Write, but that > failed because the ssl code decided the socket was busy - but at least it tried > :-) I naively thought PR_Write might work since it's also blocking, according > to the comments in nsprpub.h The property of being blocking or non-blocking is a mode of the socket, not of the method used on it. PR_Write is blocking, or not, depending on the mode of the socket. If you got back PR_WOULD_BLOCK_ERROR, NSPR's equivalent of EWOULDBLOCK, then you had a non-blocking socket. (In reply to comment #28) > As PSM does not implement handlers for send/receive, I assume the default > handler will pass the call through to the next layer. correct. > PSM's functions are implemented to keep track of the SSL handshake state. > Calls to send/receive always had the potential to confuse this state tracking. Why is the use of send/recv any different than using write/read? In fact, in the SSL layer, write and read are merely implemented as calls to send and receive with fixed values for timeout time and "flags" parameters. > But based on the above and my search results, I concluded that send/receive > can't really work with SSL sockets Please explain that. SSL and NSPR fully support it. I don't know why PSM would think that send/recv are any different than write/read, other than the obvious fixed settings for flags and timeouts.
(In reply to comment #29) > > But based on the above and my search results, I concluded that send/receive > > can't really work with SSL sockets > > Please explain that. SSL and NSPR fully support it. I don't know why PSM > would think that send/recv are any different than write/read, other than > the obvious fixed settings for flags and timeouts. I did not complain about NSS. I said, because PSM wants to excecute additional logic on read/write, and because on send/receive we bypassed PSM's additional logic, I concluded that this might result in unexpected behaviour (from PSM's point of view), if the application used send/receive. But no need to explore my earlier thinking further. I agree that the older code obviously worked in many scenarios, that did not involve PSM's additional logic like TLS intolerance detection. And I agree we need to fix it. > > PSM's functions are implemented to keep track of the SSL handshake state. > > Calls to send/receive always had the potential to confuse this state tracking. > > Why is the use of send/recv any different than using write/read? I do not know why the original authors of the PSM I/O layer had decided to implement handlers for read/write, but not to implement handlers for send/receive. Let's go ahead and fix that. > In fact, in the SSL layer, write and read are merely implemented as calls to > send and receive with fixed values for timeout time and "flags" parameters. That's great, so the fix in PSM will probably be simple and straightforward. I'll have a look.
thx for the info, Nelson, and thx for looking at this, Kaie! I'd be happy to test any fix.
We'd also probably try to respin our beta1 2.0 build if we got a fix for this in the next day or two, assuming a fix mainly just affected callers of PR_Send, of which LDAP is the only one :-)
I have a patch that appears to fix it. It restores the previous behaviour on send/receive and makes it work for LDAP. Thanks to Nelson and Wan-Teh for proposing this simple approach. This fix is based on the fact, that only LDAP uses send/receive, and that LDAP never uses the PR_Read or PR_Write functions. If any code tried to mix send/receive with read/write calls on the same socket, this could result in surprising results, because of the threading and buffering used by the read/write calls. At this time I seem unable to provide a fix that would be really safe. The SSL threading implementation is full based on the non-blocking I/O approach that all the rest of Mozilla appears to be using. When Nelson and Wan-Teh first came up with the proposal to bypass PSM's SSL threaded layer and just go to NSS directly, I immediately rejected that proposal. I have to apologize for rejecting what seems to work now. I had assumed that when we do a blocking I/O call on the main thread, any attempts by PSM's callbacks to bring up bad cert UI, or to call back into Necko for OCSP would immediately result in a deadlock. I don't understand yet why, with this simple patch, I do not see deadlocks on bad certs or certs that attempt to do OCSP - triggered from blocking I/O calls by the LDAP code on the main thread. But anyway, despite the fact that I should learn more details about Mozilla's threading and networking internals, it's good that we appear to have a working fix.
Attached patch Patch v1Splinter Review
this patch fixes it for me. I believe the ldap code makes its calls into the ldap c-sdk from a per-connection thread, not the ui thread, which may explain why the expected problems didn't appear.
Comment on attachment 247867 [details] [diff] [review] Patch v1 Kai, in PSMRecv, you may want to only allow 'flags' to be 0 or PR_MSG_PEEK.
Comment on attachment 247867 [details] [diff] [review] Patch v1 Wan-Teh, as you already know what this is about, could you please review this patch that will restore the previous behaviour and pass send/receive directly to the NSS SSL layer?
Attachment #247867 - Flags: review?(wtchang)
Comment on attachment 247867 [details] [diff] [review] Patch v1 r=wtc. I want to suggest an improvement. You can either check in this patch now and work on a supplemental patch later, or prepare a new version of the patch. For a non-blocking socket, PR_Read and PR_Recv(flags=0) are the same, and PR_Write and PR_Send(flags=0) are the same. (Note: there is no flag defined for PR_Send.) Therefore, you should only call fd->lower->methods->recv or fd->lower->methods->send if the socket is *blocking*. If the socket is non-blocking, you should call nsSSLThread::requestRead or nsSSLThread::requestWrite Call fd->lower->methods->getsocketoption to determine if the socket is non-blocking. The name of that socket option is PR_SockOpt_Nonblocking.
Attachment #247867 - Flags: review?(wtchang) → review+
Kai, if you implement the improvement I suggested, make sure you test it by changing Mozilla (Necko?) to call PR_Recv and PR_Send instead of PR_Read and PR_Write.
Comment on attachment 247867 [details] [diff] [review] Patch v1 I checked this patch in. I'll work on Wan-Teh's proposal in a follow-up patch.
Attachment #247867 - Flags: approval1.8.1.2?
Attachment #247867 - Flags: approval1.8.1.1?
marking fixed on trunk
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Attached patch Supplemental Patch v2 (obsolete) — Splinter Review
Attachment #248259 - Flags: review?
Sorry for the noise; I assume I should be able to pick up tomorrow's nightly from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/ and confirm that the patch fixes the problem? Also, please let me know if there is a way to be notified of "to be tested" 2.0 builds with this fix - I would be happy to provide feedback. Cheers, C.
Attachment #248259 - Attachment is obsolete: true
Attachment #248259 - Flags: review?
Wan-Teh's proposal can not be implemented as straightforward as he assumed, so I filed bug 363455 to work on that in its own context.
Blocks: 363455
(In reply to comment #43) > I assume I should be able to pick up tomorrow's nightly > from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/ and confirm > that the patch fixes the problem? Yes, look at directory latest-trunk. > Also, please let me know if there is a way to be notified of "to be tested" 2.0 > builds with this fix - I would be happy to provide feedback. As you are already cc'ed to this bug, watch out for a change that says fixed-1.8.1.x (x = 1 or 2). After that you can test a nightly build from directory latest-mozilla1.8
Comment on attachment 247867 [details] [diff] [review] Patch v1 1811 is gone, approved for 1.8.1.2, a=dveditz
Attachment #247867 - Flags: approval1.8.1.2?
Attachment #247867 - Flags: approval1.8.1.2+
Attachment #247867 - Flags: approval1.8.1.1?
Blocks: 354525
I am happy to report that the problem is fixed in the latest trunk: version 3 alpha 1 (20061214) Is there a plan to include in 2.0 as well? Many thanks!
Comment on attachment 247867 [details] [diff] [review] Patch v1 Darin, could you review this patch for PSM's NSPR I/O layer? One way to review this patch is to see that it changes the behavior from failing with the PR_UNKNOWN_ERROR error to forwarding the method call to the layer below. So it "should not" be worse than the original code.
Attachment #247867 - Flags: superreview?(darin.moz)
Attachment #247867 - Flags: superreview?(darin.moz) → superreview+
I checked in patch v1 on the MOZILLA_1_8_BRANCH for 1.8.1.2. Checking in nsNSSIOLayer.cpp; /cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp,v <-- nsNSSIOLayer. cpp new revision: 1.97.2.15; previous revision: 1.97.2.14 done
Keywords: fixed1.8.1.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: