unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works

RESOLVED FIXED

Status

RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: moco, Assigned: Bienvenu)

Tracking

({fixed1.8.1.2, regression})

x86
All
fixed1.8.1.2, regression
Dependency tree / graph
Bug Flags:
blocking-thunderbird2 +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works

I was following the instructions from aravind (see 
https://intranet.mozilla.org/Email_%28Zimbra%29#Accessing_via_webmail) and I was unable to use autocomplete or addressbook search against the ldap server.

From my environment:

NSPR_LOG_MODULES=ldap:5,ldapautocomplete:5
NSPR_LOG_FILE=C:\home\mozilla-nspr-ldap.log

Here's what I get in that log file:

3904[340bce8]: nsLDAPConnection::Run() entered
0[2748e0]: nsLDAPOperation::SimpleBind(): called; bindName = 'mail=sspitzer@mozilla.com,o=com,dc=mozilla';
0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*s*)(cn=*s*)(givenName=*s*)(sn=*s*))', aAttrCounts = 64, aSizeLimit = 100
0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*seth*)(cn=*seth*)(givenName=*seth*)(sn=*seth*))', aAttrCounts = 64, aSizeLimit = 100
0[2748e0]: nsLDAPOperation::SearchExt(): called with aBaseDn = 'dc=mozilla'; aFilter = '(|(mail=*seth*)(cn=*seth*)(givenName=*seth*)(sn=*seth*))', aAttrCounts = 64, aSizeLimit = 100

Note, no search results ever returned.

I switched to tbird 1.5.0.7 and it worked (same profile).

note, I saw this with mac os x and windows xp.

I'm using version 2 beta 1 (20061004) on mac os x.
(In reply to comment #0)
> unable to search LDAP server (over SSL) with tbird 2.0 build, but 1.5.0.7 works

Ok, this is strange. 1.5.0.7 and 2.0 *should* be using the same c-sdk code. That doesn't necessarily mean it should work, but I'd have thought it would count for a lot.

There are however, two checkins to the 2.0 branch that aren't in the 1.5.0.x branch:

2006-02-15 01:17 nsLDAPProtocolHandler.cpp Bug 320498 Fix NewChannel impls to handle null in argument
2005-09-15 08:48 nsLDAPBERElement.cpp Fix LDAP BER Element encoding issue (bug 308511)

So it could be one of those. Although its unlikely to be the second one as I'd have thought that would affect non-ssl connections as well.

I haven't been able to set up an ssl server to be able to verify this.

Could you try a current trunk build? The c-sdk is very different, so I don't know if it'll reveal anything.

Another thought would be if you could do a debug build of 2.0 yourself can you look for any assertions happening or other logged failures? If you can't David might be able to.

I saw a comment in another bug about saying folks had had problems with ssl but I've not seen anything in the newsgroups to verify it so I was thinking it was probably just one or two people with incorrect settings.

I'm setting ? for blocking thunderbird 2, it could be a big regression for some people if ssl isn't working, and we need to find out why.
Flags: blocking-thunderbird2?
Keywords: regression
it may be related, but on the official tbird 2.0 branch nightly build, when I quit I get the prompt described in bug #188554.
> So it could be one of those.

Note that the LDAP protocol handler and channel are not build by default (only with --enable-ldap-experimental).
Hi Mark.
 
>I saw a comment in another bug about saying folks had had problems with ssl but
> I've not seen anything in the newsgroups to verify it so I was thinking it was
> probably just one or two people with incorrect settings.

I am one of those people that suffer from LDAP with SSL not working in the 2.0 nightlies (but have no problems with 1.5.0.x) Not sure if you refer to this message: http://forums.mozillazine.org/viewtopic.php?p=2422989#2422989

The problem is reproducable with a new profile. Not sure if there is another way to make sure that my settings are correct. Please advice.

Latest build I am having problems with is from November 4th, located here: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2006-11-04-03-mozilla1.8/thunderbird-2.0b1pre.en-US.win32.installer.exe
Hi Christos,

(In reply to comment #4)
> I am one of those people that suffer from LDAP with SSL not working in the 2.0
> nightlies (but have no problems with 1.5.0.x) Not sure if you refer to this
> message: http://forums.mozillazine.org/viewtopic.php?p=2422989#2422989

Any chance you could try setting up the ldap directory for autocomplete and then doing a compose mail, typing in several chars and seeing if any error messages come up?
(In reply to comment #5)
> Any chance you could try setting up the ldap directory for autocomplete and
> then doing a compose mail, typing in several chars and seeing if any error
> messages come up?

I am pretty sure that the autocomplete is on (because it does give me partial matches in 1.5.0.x). However, just to be on the safe side, can we make sure we are talking about the same option (is this under Options->Composition)?

I assume you want me to report the testing with the debug flags turned on.
(In reply to comment #6)

> I am pretty sure that the autocomplete is on (because it does give me partial
> matches in 1.5.0.x). However, just to be on the safe side, can we make sure we
> are talking about the same option (is this under Options->Composition)?
Yep that's the one.

> I assume you want me to report the testing with the debug flags turned on.

Not in this case - autocomplete should actually report an error on the display if there is one (might not be very useful, but you never know...)
Unfortunately nothing useful came up. I start typing one of the names (that should be appearing on the LDAP address book) and after a short pause, the black font for the name turns red (I assume this means no matches...). Doing the same in 1.5.0.x would give me a list of matches (and a drop-down menu).

Not sure what else information you guys are interested in. Just let me know. I would love to help you track this down.
(In reply to comment #8)
> Not sure what else information you guys are interested in. Just let me know. I
> would love to help you track this down.

Any chance you could test a trunk nightly? (http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/) Its a bit of a stab in the dark, but it may be useful to find out.

Additionally, and I think this is highly unlikely, you could try changing the pref ldap_2.servers.default.attrmap.DisplayName from "cn,commonname" to "displayName,cn,commonname". I think this is unlikely as if this was the cause, I wouldn't expect it to work in 1.5.0.7 either.
(In reply to comment #9)
> 
> Additionally, and I think this is highly unlikely, you could try changing the
> pref ldap_2.servers.default.attrmap.DisplayName from "cn,commonname" to
> "displayName,cn,commonname". I think this is unlikely as if this was the cause,
> I wouldn't expect it to work in 1.5.0.7 either.

So, first I followed your suggestion and modify the  ldap_2.servers.default.attrmap.DisplayName preference. I then tried with 1.5.0.8, with a 2.0 beta build and finally with a trunk nightly. 

Results:
(1) 1.5.0.8 --> LDAP works fine
(2) version 2 beta 1 (20061118) --> LDAP does not work, font turns red
(3) version 3 alpha 1 (20061119) --> LDAP does not work, pop-up error: <LDAP server connection failed>

Well, at least with the nightly I am getting some feedback...

Comment 11

12 years ago
This should really block thunderbird2.  We have thousands of people using this feature at UPenn.

Comment 12

12 years ago
I agree.
Flags: blocking-thunderbird2? → blocking-thunderbird2+
(Assignee)

Comment 13

12 years ago
I'll look into this tomorrow...
Assignee: mscott → bienvenu
(Assignee)

Comment 14

12 years ago
we are connecting to the imap server, but we're never getting to the nsLDAPBERElement.cpp code that's different in 2.0.  We're not getting any errors setting up LDAP either, e.g., the call to nsLDAPInstallSSL
(Assignee)

Comment 15

12 years ago
I wonder if the ipv6 changes broke ssl ldap...I'll try reverting those first.
(Assignee)

Comment 16

12 years ago
on the trunk (and presumably 2.0 branch), the call to nsldapi_send_ber_message fails, when we're trying to do a simple bind, with a -1 error code. 1.5 shipped with a much older version of the c-sdk - I wonder if there's some change we need to make in directory/xpcom in our use of the c-sdk to get SSL to work.
(In reply to comment #16)
> on the trunk (and presumably 2.0 branch), the call to nsldapi_send_ber_message
> fails, when we're trying to do a simple bind, with a -1 error code. 1.5 shipped
> with a much older version of the c-sdk - I wonder if there's some change we
> need to make in directory/xpcom in our use of the c-sdk to get SSL to work.
> 
I thought that 1.5 is shipping with the same c-sdk as 2.0. Trunk is much later though as you say.

I also thought that the regression was found before the ipv6 code went in, but I may be wrong about that.

Comment 18

12 years ago
Other ideas:
 - Some change in default SSL settings or certificates that causes the
   SSL handshake to fail (maybe an NSS change?)

 - Check the LDAP server access and error logs to see if anything
   interesting is logged.

 - Check to see what NSS or OS errors are set inside nsldapi_send_ber_message().
(Assignee)

Comment 19

12 years ago
I don't think it's the ipv6 change, since reverting that locally had no effect - 1.5 ldap code does look a lot more like 2.0 - I'll try debugging the 2.0 code; 

I tend to think it might be an SSL change. I wonder if this is related to the TLS problems we were having with IMAP, which were also new in 2.0, iirc.
(Assignee)

Comment 20

12 years ago
prldap_write is failing in the call to PR_Send, with a -1 error code.

http://lxr.mozilla.org/mozilla1.8/source/directory/c-sdk/ldap/libraries/libprldap/ldappr-io.c#210

(Assignee)

Comment 21

12 years ago
and we ultimately end up here:

static PRInt32 PR_CALLBACK PSMSend(PRFileDesc *fd, const void *buf, PRInt32 amount,
    PRIntn flags, PRIntervalTime timeout)
{
  nsNSSShutDownPreventionLock locker;
  if (!fd || !fd->lower) {
    PR_SetError(PR_BAD_DESCRIPTOR_ERROR, 0);
    return -1;
  }

  PR_SetError(PR_UNKNOWN_ERROR, 0);
  return -1;
}

which must be the wrong place, because it always returns an error.
(Assignee)

Comment 22

12 years ago
turning on SSLTRACE doesn't give me much:

prldap_try_one_address(): Trying 63.245.208.172:636...
prldap_timeout2it: 10000ms
prldap_try_one_address(): Connected.
3724: SSL: grow buffer from 0 to 18432
3724: SSL: grow buffer from 0 to 18432
prldap_timeout2it: 10000ms

We must not have negotiated to the extent that the ssl io methods get defined. I'll try turning on SSLDEBUG to see if that tells me more.
(Assignee)

Comment 23

12 years ago
my guess is that bug 111384 might be involved - it looks to me like it changes the the way the socket methods get initialized. It's a giant patch, though, so I could be wrong. I'm not sure why only ldap would be affected, but here's the way we set up SSL:

http://lxr.mozilla.org/mozilla/source/directory/xpcom/base/src/nsLDAPConnection.cpp#980 
(Assignee)

Comment 24

12 years ago
It looks to me like the problem is that PSMSend doesn't do anything, and it appears that LDAP is the only client of PR_Send, outside of nspr, so that PR_Send doesn't work over SSL. Kaie or wtc, is that possible? If so, suggestions? Thx!

Comment 25

12 years ago
PR_Send should work over SSL.  It's strange that PSMSend isn't
implemented.
(Assignee)

Comment 26

12 years ago
As an experiment, I tried changing the ldap code to use PR_Write, but that failed because the ssl code decided the socket was busy - but at least it tried :-) I naively thought PR_Write might work since it's also blocking, according to the comments in nsprpub.h

Comment 27

12 years ago
(In reply to comment #25)
> PR_Send should work over SSL.  It's strange that PSMSend isn't
> implemented.

We have been using PR_Send() for years in the libprldap code (LDAP + NSPR glue, which is also used when running over SSL).  I think something else can be used as long as a timeout can be specified.  Strange that something change to break this... I wonder if some initialization is just not happening that needs to be.

Here is the story on send/receive functions from my point of view.

When I began to work on Mozilla, I learned that SSL networking in PSM is implemented using two additional IO layers:
- the SSL implementation layer in NSS
- an additional layer implemented in PSM

The PSM layer implemented read/write only.
It never implemented send/receive.
This is the revision of PSM's IO file that defines PSM's layer functions, just before the change to using the SSL thread in bug 111384:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp&rev=1.102&root=/cvsroot#1152

As PSM does not implement handlers for send/receive, I assume the default handler will pass the call through to the next layer.

PSM's functions are implemented to keep track of the SSL handshake state.
Calls to send/receive always had the potential to confuse this state tracking.

When I worked on bug 111384, I had to make sure that all I/O activity is correctly transported between the threads.

I investigated whether send/receive is required. I searched our codebase for calls to send/receive. I could find only one place, and that was a receive with always one byte and PR_MSG_PEEK.

But based on the above and my search results, I concluded that send/receive can't really work with SSL sockets and that it is not used. Because of that I add the new send/receive functions at the PSM layer that explicitly return with a failure.

I do not understand why I didn't find the call to PR_Send in the LDAP code when I looked for such calls. I'm sorry, this was my mistake.

In order to fix this, we need to implement PSM's send and receive functions.
(In reply to comment #26)
> As an experiment, I tried changing the ldap code to use PR_Write, but that
> failed because the ssl code decided the socket was busy - but at least it tried
> :-) I naively thought PR_Write might work since it's also blocking, according
> to the comments in nsprpub.h

The property of being blocking or non-blocking is a mode of the socket,
not of the method used on it.  PR_Write is blocking, or not, depending on 
the mode of the socket.  If you got back PR_WOULD_BLOCK_ERROR, NSPR's 
equivalent of EWOULDBLOCK, then you had a non-blocking socket.  


(In reply to comment #28)
> As PSM does not implement handlers for send/receive, I assume the default
> handler will pass the call through to the next layer.

correct.

> PSM's functions are implemented to keep track of the SSL handshake state.
> Calls to send/receive always had the potential to confuse this state tracking.

Why is the use of send/recv any different than using write/read?
In fact, in the SSL layer, write and read are merely implemented as calls to 
send and receive with fixed values for timeout time and "flags" parameters.

> But based on the above and my search results, I concluded that send/receive
> can't really work with SSL sockets 

Please explain that.  SSL and NSPR fully support it.  I don't know why PSM
would think that send/recv are any different than write/read, other than
the obvious fixed settings for flags and timeouts.
(In reply to comment #29)
> > But based on the above and my search results, I concluded that send/receive
> > can't really work with SSL sockets 
> 
> Please explain that.  SSL and NSPR fully support it.  I don't know why PSM
> would think that send/recv are any different than write/read, other than
> the obvious fixed settings for flags and timeouts.

I did not complain about NSS.

I said, because PSM wants to excecute additional logic on read/write, and because on send/receive we bypassed PSM's additional logic, I concluded that this might result in unexpected behaviour (from PSM's point of view), if the application used send/receive.

But no need to explore my earlier thinking further. I agree that the older code obviously worked in many scenarios, that did not involve PSM's additional logic like TLS intolerance detection. And I agree we need to fix it.


> > PSM's functions are implemented to keep track of the SSL handshake state.
> > Calls to send/receive always had the potential to confuse this state tracking.
> 
> Why is the use of send/recv any different than using write/read?

I do not know why the original authors of the PSM I/O layer had decided to implement handlers for read/write, but not to implement handlers for send/receive. Let's go ahead and fix that.


> In fact, in the SSL layer, write and read are merely implemented as calls to 
> send and receive with fixed values for timeout time and "flags" parameters.

That's great, so the fix in PSM will probably be simple and straightforward. I'll have a look.
(Assignee)

Comment 31

12 years ago
thx for the info, Nelson, and thx for looking at this, Kaie! I'd be happy to test any fix.
(Assignee)

Comment 32

12 years ago
We'd also probably try to respin our beta1 2.0 build if we got a fix for this in the next day or two, assuming a fix mainly just affected callers of PR_Send, of which LDAP is the only one :-)
I have a patch that appears to fix it. It restores the previous behaviour on send/receive and makes it work for LDAP. Thanks to Nelson and Wan-Teh for proposing this simple approach.

This fix is based on the fact, that only LDAP uses send/receive, and that LDAP never uses the PR_Read or PR_Write functions.

If any code tried to mix send/receive with read/write calls on the same socket, this could result in surprising results, because of the threading and buffering used by the read/write calls.

At this time I seem unable to provide a fix that would be really safe. The SSL threading implementation is full based on the non-blocking I/O approach that all the rest of Mozilla appears to be using.

When Nelson and Wan-Teh first came up with the proposal to bypass PSM's SSL threaded layer and just go to NSS directly, I immediately rejected that proposal. I have to apologize for rejecting what seems to work now. I had assumed that when we do a blocking I/O call on the main thread, any attempts by PSM's callbacks to bring up bad cert UI, or to call back into Necko for OCSP would immediately result in a deadlock.

I don't understand yet why, with this simple patch, I do not see deadlocks on bad certs or certs that attempt to do OCSP - triggered from blocking I/O calls by the LDAP code on the main thread.

But anyway, despite the fact that I should learn more details about Mozilla's threading and networking internals, it's good that we appear to have a working fix.
(Assignee)

Comment 35

12 years ago
this patch fixes it for me.

I believe the ldap code makes its calls into the ldap c-sdk from a per-connection thread, not the ui thread, which may explain why the expected problems didn't appear.

Comment 36

12 years ago
Comment on attachment 247867 [details] [diff] [review]
Patch v1

Kai, in PSMRecv, you may want to only allow 'flags' to be 0 or PR_MSG_PEEK.
Comment on attachment 247867 [details] [diff] [review]
Patch v1

Wan-Teh, as you already know what this is about, could you please review this patch that will restore the previous behaviour and pass send/receive directly to the NSS SSL layer?
Attachment #247867 - Flags: review?(wtchang)

Comment 38

12 years ago
Comment on attachment 247867 [details] [diff] [review]
Patch v1

r=wtc.  I want to suggest an improvement.  You can either check in this patch now
and work on a supplemental patch later, or prepare a new version of the patch.

For a non-blocking socket, PR_Read and PR_Recv(flags=0) are the same,
and PR_Write and PR_Send(flags=0) are the same.  (Note: there is no
flag defined for PR_Send.)

Therefore, you should only call

    fd->lower->methods->recv or fd->lower->methods->send

if the socket is *blocking*.  If the socket is non-blocking, you should
call

    nsSSLThread::requestRead or nsSSLThread::requestWrite

Call fd->lower->methods->getsocketoption to determine if the socket is
non-blocking.  The name of that socket option is PR_SockOpt_Nonblocking.
Attachment #247867 - Flags: review?(wtchang) → review+

Comment 39

12 years ago
Kai, if you implement the improvement I suggested, make sure you
test it by changing Mozilla (Necko?) to call PR_Recv and PR_Send
instead of PR_Read and PR_Write.
Comment on attachment 247867 [details] [diff] [review]
Patch v1

I checked this patch in. I'll work on Wan-Teh's proposal in a follow-up patch.
Attachment #247867 - Flags: approval1.8.1.2?
Attachment #247867 - Flags: approval1.8.1.1?
marking fixed on trunk
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
Created attachment 248259 [details] [diff] [review]
Supplemental Patch v2
Attachment #248259 - Flags: review?
Sorry for the noise; I assume I should be able to pick up tomorrow's nightly from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/ and confirm that the patch fixes the problem?

Also, please let me know if there is a way to be notified of "to be tested" 2.0 builds with this fix - I would be happy to provide feedback.

Cheers, C.
Attachment #248259 - Attachment is obsolete: true
Attachment #248259 - Flags: review?
Wan-Teh's proposal can not be implemented as straightforward as he assumed, so I filed bug 363455 to work on that in its own context.
Blocks: 363455
(In reply to comment #43)
> I assume I should be able to pick up tomorrow's nightly
> from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/ and confirm
> that the patch fixes the problem?

Yes, look at directory latest-trunk.


> Also, please let me know if there is a way to be notified of "to be tested" 2.0
> builds with this fix - I would be happy to provide feedback.

As you are already cc'ed to this bug, watch out for a change that says fixed-1.8.1.x (x = 1 or 2). After that you can test a nightly build from directory latest-mozilla1.8

Comment on attachment 247867 [details] [diff] [review]
Patch v1

1811 is gone, approved for 1.8.1.2, a=dveditz
Attachment #247867 - Flags: approval1.8.1.2?
Attachment #247867 - Flags: approval1.8.1.2+
Attachment #247867 - Flags: approval1.8.1.1?
I am happy to report that the problem is fixed in the latest trunk: version 3 alpha 1 (20061214)

Is there a plan to include in 2.0 as well?

Many thanks!

Comment 48

12 years ago
Comment on attachment 247867 [details] [diff] [review]
Patch v1

Darin, could you review this patch for PSM's NSPR I/O layer?

One way to review this patch is to see that it changes the
behavior from
    failing with the PR_UNKNOWN_ERROR error
to
    forwarding the method call to the layer below.
So it "should not" be worse than the original code.
Attachment #247867 - Flags: superreview?(darin.moz)

Updated

12 years ago
Attachment #247867 - Flags: superreview?(darin.moz) → superreview+

Comment 49

12 years ago
I checked in patch v1 on the MOZILLA_1_8_BRANCH for 1.8.1.2.

Checking in nsNSSIOLayer.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp,v  <--  nsNSSIOLayer.
cpp
new revision: 1.97.2.15; previous revision: 1.97.2.14
done
Keywords: fixed1.8.1.2
Duplicate of this bug: 347361
You need to log in before you can comment on or make changes to this bug.