Open Bug 292127 Opened 19 years ago Updated 3 years ago

Thunderbird startup crash on *IX systems that have PAM set up to get account info from LDAP

Categories

(Directory :: LDAP C SDK, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: jmarco, Unassigned)

Details

(Keywords: crash, Whiteboard: [gs][startupcrash])

Attachments

(5 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3
Build Identifier: version 1.0.2 (20050317)

This could either be a problem with the PADL software nss_ldap plugin for name
service switch, or it could be a problem with something in thunderbird doing
user name lookups, or some series of bad interactions between a likely bug in
nss_ldap and something that thunderbird is doing.

If you come across this bug with the same problem and just want to get things
working, I found that running 'nscd' causes the problem to go away, probably
because mozilla is no longer directly going into OpenLDAP or nss_ldap libraries.

Anyhow, with a pristine, fresh untar/install of thunderbird 1.0.2 (and earlier
versions as well), I get a Segfault on startup.  It only happens when launching
from a User whose identity is defined on the LDAP service instead of local
files.  I can start thunderbird just fine as root or other local users.  If I try
to use my own user ID, which comes from the LDAP server, thunderbird segfaults
on startup.



Reproducible: Always

Steps to Reproduce:
1.  Set up or get use of an LDAP based user account environment,
    including a testing box set up to use it.
2.  Set up the OpenLDAP client on the client machine.
3.  Set up pam_ldap and nss_ldap to allow logins by LDAP users.
3.1 DO NOT RUN 'nscd' (nscd appears to work-around this bug)
4.  Do a pristine default untar install of thunderbird.
5.  Try to run thunderbird as an LDAP-based user.  Won't work.
6.  Try to run as root.  Should work if root isn't LDAP based.
7.  Try to run as temporary local-only user not in LDAP.  Should work.

Actual Results:  
With LDAP based users, I got the segfault on startup.
For non-LDAP users, no segfault, thunderbird worked fine.
When running 'nscd', thunderbird worked fine in all cases.

Expected Results:  
Should have started thunderbird in all cases without segfaulting.

Thunderbird Version: 1.0.2 (20050317)
OpenLDAP Version:    2.1.30 (Gentoo port 2.1.30-r4)
Linux Distro:        Gentoo 2005.0
Kernel Version:      2.6.11 (2.6.11-gentoo-r6)
Kernel Patches:      Evms 2.5.2 recommended patches
                     UML SKAS patchset
Libc Version:        2.3.4 + NPTL
nss_ldap version:    2.2.6
pam_ldap version:    1.7.1
Here's the output from one of the segfaults:
jmarco[~] $ /usr/thunderbird/thunderbird
/usr/thunderbird/run-mozilla.sh: line 451:  2193 Segmentation fault      "$prog" ${1+"$@"}
I did an 'strace -f /usr/thunderbird/thunderbird' and attached the results.
Enabled coredumps and did a gdb on the resulting corefile.
I could do more if you'd like.
I've also submitted this issue as bug#203 for nss_ldap on the padl.com bugzilla
system.
Severity: normal → critical
Keywords: crash
The same problem happens to me, but in my case running "nscd" does not make it go away.
Here's the related bug for PADL nss_ldap:
http://bugzilla.padl.com/show_bug.cgi?id=203
I did more investigation on the problem at their request and
found that there seems to be a conflict between the libldap50.so
included with the binary version of Mozilla, and whatever the
default libldapxxx.so that is installed on the user's distribution.
This occurs with nss_ldap because it causes libc to drag in the
default ldap library via NSS for anything that does user name
translation.  This seems to react poorly with the Mozilla libldap50
library.  Included is the text snippet from my most recent comment
on the PADL bug, and the stack trace from that bug.

From PADL Bug:
Problem is not in nss_ldap.  It's a Thunderbird bug

I brought up thunderbird under gdb on my desktop with LDAP user env and not
nscd.  Thunderbird died a messy death in strtok() with corrupted stack, so no
trace.  No problem.  Turns out this is luckily the first call to strtok(), so
was able to 'break strtok' and get a trace.  

It turns out that thunderbird has its own version of libldap.so called
libldap50.so with a version of ldap_str2charray that conflicts with that in
/usr/lib/libldap-2.2.so.7.  The version in Thunderbird's libldap50 is called
unexpectedly and it looks like this is causing libldap-2.2 to poop its pants.

As an experiment, I moved /usr/thunderbird/libldap50.so aside and symlinked it
to the linux /usr/lib/libldap-2.2.so.7, and sure enough Thunderbird worked
perfectly.

Therefore, this is a problem with the binary release of Thunderbird not
handling conflicts in system LDAP libraries.  Nothing wrong with nss_ldap from
the looks of it.
If you read my previous comment, you'll see that a possible workaround for this bug right now is to go into /usr/thunderbird and:
    mv libldap50.so moved-libldap50.so
    ln -s /usr/lib/libldap-2.2.so.7 libldap50.so
Of course, change the locations of Thunderbird and your currently
installed /usr/lib/libldapXXX as required for your distro/version.
Then, restart Thunderbird.
the other implementations of this function do a dupe before using strtok.  reporter, if someone here posted a patch, could you test it?  i have absolutely no interest in setting up your configuration, something which has no use for me, but i might be willing to post patches for you to test and provide feedback.  note: i'm not a mozilla ldap dev, i'm just someone who flags crash bugs.
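For illustration, the "dupe before strtok" pattern mentioned above can be sketched roughly as follows. This is not the code from either libldap: `str2charray_dup` and `charray_free` are hypothetical names, and the sketch uses `strtok_r` rather than `strtok`. The point is only that the tokenizer writes into a private copy, so passing a read-only string (such as a literal compiled into another library) is harmless.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of heap-allocated strings. */
void charray_free(char **a)
{
    if (a == NULL)
        return;
    for (char **p = a; *p != NULL; p++)
        free(*p);
    free(a);
}

/* Split str on any character in brkstr.  Returns a NULL-terminated array of
 * heap-allocated tokens, or NULL on allocation failure.  The input string is
 * never modified: tokenizing happens on a private copy. */
char **str2charray_dup(const char *str, const char *brkstr)
{
    assert(str != NULL && brkstr != NULL);

    /* Upper bound on slots: one token per separator, plus one more token,
     * plus the terminating NULL. */
    size_t slots = 2;
    for (const char *p = str; *p != '\0'; p++)
        if (strchr(brkstr, *p) != NULL)
            slots++;

    char **res = calloc(slots, sizeof *res);
    char *copy = strdup(str);            /* the "dupe": strtok_r writes here */
    if (res == NULL || copy == NULL) {
        free(res);
        free(copy);
        return NULL;
    }

    size_t n = 0;
    char *save = NULL;
    for (char *tok = strtok_r(copy, brkstr, &save); tok != NULL;
         tok = strtok_r(NULL, brkstr, &save)) {
        res[n] = strdup(tok);
        if (res[n] == NULL) {
            charray_free(res);
            free(copy);
            return NULL;
        }
        n++;
    }

    free(copy);
    return res;                          /* res[n] is NULL thanks to calloc */
}
```

Called as `str2charray_dup("ldap://localhost/", ", ")`, this yields a one-element array and leaves the literal untouched, whereas tokenizing the literal in place writes to read-only memory.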
Assignee: mscott → mcs
Status: UNCONFIRMED → NEW
Component: General → LDAP C SDK
Ever confirmed: true
Product: Thunderbird → Directory
Version: unspecified → other
Changing one function in the Mozilla libldap will probably not solve the entire problem here.  Why not?  Because there are undoubtedly dozens of small differences in behavior between the OpenLDAP libldap and the Mozilla libldap.  I am not yet sure how to solve this problem in a way that is bulletproof.  I added Rich Megginson to the CC in case he has any ideas/experience in dealing with this kind of conflict on Linux.
Yes, there are many large and small and incompatible differences between the OpenLDAP API and the Mozilla API.  We had the same problem with newer binary versions of Apache on linux because they are linked directly with OpenLDAP, and we have some modules that depend on the Mozilla API.  We solved that problem by using LD_PRELOAD to make sure the Mozilla API is loaded first.  However, in this case, you may need to do the reverse and do a LD_PRELOAD to make sure the OpenLDAP API is loaded first.  While that might solve the first problem, it will probably break other LDAP features of thunderbird like type down addressing, etc.  So I'm not really sure how you can force PAM/NSS to use exclusively OpenLDAP calls while forcing the rest of Thunderbird to use exclusively Mozilla calls.

What we really need is a unified API between OpenLDAP and Mozilla.  There are several impediments to this happening:
1) OpenLDAP uses OpenSSL for crypto, while Mozilla uses NSS.  My preference would be to have the ability for OpenLDAP to use NSS for crypto, especially if running in a Mozilla client app.
2) Each API has extensions lacking in the other.
3) The command line tools are incompatible.
4) No one in either of the communities has either the time or the inclination to do the work.
I would be willing to test an updated libldap50 library if supplied as a binary, but I don't have the spare time to build from source.

It's been a while since I've looked at this kind of stuff.  From the glibc source code, it appears that the NSS code opens its database modules using
dlopen(libnames[x], RTLD_LAZY).  The problem is that Thunderbird is compile-time linked with libldap50.so, and so brings in its own version of any number of identically named but incompatible functions.  By the time NSS does its dlopen() it's too late.  Some of its internal function calls are going to resolve to already-bound functions from libldap50 and blow up.

One way to work around this issue would be to implement a thin LDAP glue library that only contains functions called by Thunderbird.  The glue library would internally dlopen("libldap50.so", RTLD_LAZY|RTLD_LOCAL) so as to not globally export loaded symbols for binding by other libraries.  The glue versions of the API calls would dlsym() for the real versions and pass through.
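A minimal sketch of that glue idea, assuming a single wrapped entry point. The names `glue_open` and `glue_ldap_init` are hypothetical. `ldap_init` is used as the example because both the Mozilla and OpenLDAP APIs provide it. Whether this fully isolates the two libraries in practice depends on how each one was linked and on load order, so treat it as an outline rather than a tested fix.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef struct ldap LDAP;                 /* opaque handle, as in the LDAP C API */
typedef LDAP *(*ldap_init_fn)(const char *defhost, int defport);

static void *glue_handle;                 /* private handle to the real library */
static ldap_init_fn real_ldap_init;       /* resolved address of ldap_init */

/* Load the real library privately.  RTLD_LOCAL keeps its symbols out of the
 * global namespace, so objects loaded later (for example NSS modules) do not
 * bind to them.  Returns 0 on success, -1 if the library or symbol is missing. */
int glue_open(const char *soname)
{
    assert(soname != NULL);
    if (glue_handle != NULL)
        return 0;                         /* already loaded */

    void *h = dlopen(soname, RTLD_LAZY | RTLD_LOCAL);
    if (h == NULL)
        return -1;

    /* POSIX idiom for storing dlsym()'s void * in a function pointer. */
    *(void **)(&real_ldap_init) = dlsym(h, "ldap_init");
    if (real_ldap_init == NULL) {
        dlclose(h);
        return -1;
    }
    glue_handle = h;
    return 0;
}

/* Pass-through: the application calls this instead of ldap_init(). */
LDAP *glue_ldap_init(const char *defhost, int defport)
{
    if (real_ldap_init == NULL)
        return NULL;                      /* glue_open() has not succeeded */
    return real_ldap_init(defhost, defport);
}
```

The application would call `glue_open("libldap50.so")` once at startup and link only against the glue object, not against libldap50.so itself. On older glibc versions this needs `-ldl` at link time.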

My workaround of replacing libldap50.so with OpenLDAP "works" for me, since I don't use any of the LDAP related stuff in Thunderbird.  It just keeps getpwuid() type lookups from blowing up.  I'd not be surprised to find that some of the LDAP related functionality is actually broken.
Confirmed bug on my setup - Changing Shared lib to OpenLDAP does resolve issue with startup, but does kill addressbook ldap usage. 
*** Bug 333571 has been marked as a duplicate of this bug. ***
The same problem exists for thunderbird 2.
The workaround to create a symlink to the local libldap-2.2.so.7 still fixes the issue.
(In reply to comment #10)
> What we really need is a unified API between OpenLDAP and Mozilla.

Yes. More to the point, we need a *good* LDAP API. Interested developers are invited to add comments here
http://scratchpad.wikia.com/wiki/LDAP_C_API

> There are
> several impediments to this happening:
> 1) OpenLDAP uses OpenSSL for crypto, while Mozilla uses NSS.  My preference
> would be to have the ability for OpenLDAP to use NSS for crypto, especially if
> running in a Mozilla client app.

That probably makes sense from a Mozilla perspective, but I'm not sure it's worth the overhead of carrying NSPR around everywhere. Also some interesting commentary here:

http://markmail.org/message/z3sf37vnryypdko4#query:openssl%20vs%20nss+page:2+mid:xvw5nybqrhkw6w7n+state:results

> 2) Each API has extensions lacking in the other.

Not relevant, since Mozilla's use of LDAP is quite plain-jane.

> 3) The command line tools are incompatible.

I don't see how associated tools are relevant to the Thunderbird/Mozilla apps.

> 4) No one in either of the communities has either the time or the inclination
> to do the work.

Well, out of boredom, I spent 2 hours this afternoon patching my Mozilla build tree to use OpenLDAP. I think the difficulties have been overstated, because it's working fine on my OpenSUSE laptop.

Note that I haven't looked at the necessary autoconf changes, just edited my build tree after configure was already run. As such, edit config/autoconf.mk:

#LDAP_CFLAGS    = -I${DIST}/public/ldap
#LDAP_LIBS  = -L${DIST}/bin -L${DIST}/lib -lldap60 -lprldap60 -lldif60
LDAP_CFLAGS = -I/usr/local/include -DLDAP_DEPRECATED
LDAP_LIBS=  -L/usr/local/lib -lldap_r -llber

and use the attached patch. A more thorough adaptation would go through and eliminate the use of LDAPv2/deprecated APIs but this was quick and dirty...
Attached patch Quick'n'dirty patch (obsolete)
Works with all ldap URLs that OpenLDAP supports (cldap, ldap, ldapi, ldaps); someone should add an option for choosing StartTLS...
Oh, you also need to turn off the MOZ_PSM stuff in directory/xpcom/base/src/Makefile:

#ifdef MOZ_PSM
#DEFINES        += -DMOZ_PSM
#CPPSRCS        += \
#       nsLDAPSecurityGlue.cpp \
#       $(NULL)
#endif

This leaves you with a Mozilla build that uses OpenLDAP's SSL support, whatever it may be linked to (OpenSSL or GnuTLS, currently). It's worth noting that OpenSSL is already loaded in the process under Linux, due to various other system libraries included in the build, so this isn't really making any situation worse. Since OpenSSL has been a standard system library on Linux for so long and pretty much everything uses it, it would make more sense to replace NSS with OpenSSL here.
It should be noted that NSS is being considered for inclusion in the LSB,
and OpenSSL is not, due in part to commitment to ABI compatibility in NSS.
Attached patch Cleaned up patch (obsolete)
This patch is properly ifdef'd so it won't break the existing MozLDAP functionality...
Attachment #333135 - Attachment is obsolete: true
Attached patch OpenLDAP+PSM support (obsolete)
This patch also supports PSM with OpenLDAP, using new callback hooks that were just added to OpenLDAP's CVS HEAD. (Those hooks probably will be released in OpenLDAP 2.4.12; 2.4.11 is current.)
Attachment #333197 - Attachment is obsolete: true
The PSM support just mimics the existing MozLDAP behavior. It's worth noting that the existing behavior will typically break when chasing referrals: The hostname that's passed in persists until the LDAP* handle is closed and is used for all connection attempts. If a referral is received which points to ldaps:// on a different host, the hostname will not match and the connection should fail. If the referral points to the same host (as is common on MSAD) then it will probably succeed.

To fix this problem the Connect callback should record a bit more info, to answer two questions:
  1) whether it successfully connected once before - that will allow distinguishing referral chasing from the first successful connection.
  2) whether the IP address of the current connection attempt matches the previous successful attempt - that will distinguish referrals to the same host from referrals to a different host.

Then when it's determined that this connect attempt is chasing a secure referral on a different server, it can just use the name provided in the callback argument list.
This whole referral issue probably belongs in a separate bug report, but I'm commenting here because the details only surfaced while investigating this report.

Another obvious problem with the current PSM support: if the initial connection is plaintext but a referral to an ldaps:// URL is received and chased, the subsequent connection will not have the PSM layer installed. The fix for this is to always install the callback, and just have it pass-thru without pushing the PSM layer if the current connection didn't request ldaps://.
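
The bookkeeping described in the last few paragraphs can be sketched as below. This is an illustration only, not the attached patch: `struct connect_state`, `connect_state_record` and `tls_host_for_attempt` are hypothetical names, and the IP address is kept as text purely to keep the example short. The callback is assumed to be installed on every session; it returns NULL (pass through, no PSM layer) when the current attempt did not request ldaps://.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-session state kept by the connect callback. */
struct connect_state {
    int  connected_before;   /* set once a connect has succeeded */
    char last_ip[46];        /* textual IP of the last successful connect */
};

/* Record a successful connection so later attempts can be compared with it. */
void connect_state_record(struct connect_state *st, const char *ip)
{
    assert(st != NULL && ip != NULL);
    st->connected_before = 1;
    strncpy(st->last_ip, ip, sizeof st->last_ip - 1);
    st->last_ip[sizeof st->last_ip - 1] = '\0';
}

/* Decide which host name the TLS layer should verify for this attempt.
 * session_host  : name given when the session was created
 * callback_host : name passed in the callback's argument list
 * Returns NULL for a plaintext attempt, meaning no TLS layer is pushed. */
const char *tls_host_for_attempt(const struct connect_state *st,
                                 const char *session_host,
                                 const char *callback_host,
                                 const char *ip, int secure)
{
    assert(st != NULL && session_host != NULL);
    assert(callback_host != NULL && ip != NULL);

    if (!secure)
        return NULL;                     /* pass through untouched */
    if (st->connected_before && strcmp(st->last_ip, ip) != 0)
        return callback_host;            /* secure referral to another host */
    return session_host;                 /* first connect, or same host */
}
```

With this split, a plaintext first connection followed by an ldaps:// referral still gets a TLS layer, because the decision is made per attempt rather than once per session.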


Attached patch Fix referral issues (obsolete)
Also noticed, in the current code there's a potential memory leak in nsLDAPSSLInstall if prldap_set_sessioninfo fails; it will leak the dup'd hostname because it calls the wrong free function before returning.

(nsLDAPSecurityGlue.cpp:369 should be calling nsLDAPSSLFreeSessionClosure()...)

The socketClosure stuff doesn't seem to accomplish anything. It should probably be ripped out; there's no special handling needed for closure of individual sockets. It's only needed for closing the session handle.

The attached patch fixes these two issues in the existing code. It also fixes the referral issues I mentioned before, for both MozLDAP and OpenLDAP.
Attachment #333905 - Attachment is obsolete: true
(Mark said "be my guest" ...)
Status: NEW → ASSIGNED
Assignee: mcs → hyc
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
(In reply to comment #15)
> What do Mozilla LDAP people think about using the same approach as is done for
> cairo:
> http://lxr.mozilla.org/seamonkey/source/gfx/cairo/cairo/src/filterpublic.awk
> http://lxr.mozilla.org/seamonkey/source/gfx/cairo/cairo/src/cairo-rename.h
> 
It seems this would make the app more dependent on having these specific libraries bundled with the app. It would be nice to be able to use the library already present on a system, instead.

An alternative approach, along similar lines, would be to avoid direct references to these library functions in any particular code. Instead, use dlopen (or its analogue) to find any suitable version of the desired library, and use dlsym to build up a table of function pointers for all of the needed entry points. Then wrap macros around all of the invocations in the main source, to always invoke these functions through your table of pointers.
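
A rough sketch of such a table, assuming only two entry points are needed. The struct, the `LDAP_INIT`/`LDAP_UNBIND` macros and `ldap_fn_load` are hypothetical names. `ldap_init` and `ldap_unbind` are used because both APIs provide them. Since the two APIs differ in many other places, a real table could only cover calls whose signatures and semantics actually match.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef struct ldap LDAP;                /* opaque handle */

/* Table of the entry points the application needs. */
struct ldap_fn_table {
    void *handle;
    LDAP *(*init)(const char *defhost, int defport);
    int   (*unbind)(LDAP *ld);
};

static struct ldap_fn_table ldap_fn;

/* Call sites use these macros instead of naming the library functions. */
#define LDAP_INIT(host, port) (ldap_fn.init((host), (port)))
#define LDAP_UNBIND(ld)       (ldap_fn.unbind((ld)))

/* Try each candidate library name in turn and fill the table from the first
 * one that provides every needed symbol.  Returns 0 on success, -1 if no
 * candidate is usable (the table is then left unchanged). */
int ldap_fn_load(const char *const *candidates)
{
    assert(candidates != NULL);
    for (; *candidates != NULL; candidates++) {
        void *h = dlopen(*candidates, RTLD_LAZY | RTLD_LOCAL);
        if (h == NULL)
            continue;                    /* not installed, try the next name */

        struct ldap_fn_table t = { h, NULL, NULL };
        *(void **)(&t.init)   = dlsym(h, "ldap_init");
        *(void **)(&t.unbind) = dlsym(h, "ldap_unbind");
        if (t.init != NULL && t.unbind != NULL) {
            ldap_fn = t;
            return 0;
        }
        dlclose(h);                      /* incomplete library, keep looking */
    }
    return -1;
}
```

A caller might pass a list such as `{ "libldap60.so", "libldap-2.4.so.2", NULL }`, using the library names that appear elsewhere in this report, and refuse to enable LDAP features when the function returns -1.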

On a separate note, in my current patches I left nsLDAPService::CreateFilter unimplemented because a quick grep thru the source tree didn't turn up anyone using this function. But now I see that the AddressBook actually does try to use it for autocomplete, so I guess we'll have to provide an OpenLDAP version of ldap_create_filter() before this patch can be considered complete.
NSPR provides an analogue of dlopen that works on all Mozilla/Firefox/TBird
platforms and is present in every FF browser and TB mail client (SM too).
See documentation here
http://mxr.mozilla.org/nspr/source/nsprpub/pr/include/prlink.h#94
http://mxr.mozilla.org/nspr/source/nsprpub/pr/include/prlink.h#181
Another approach that sometimes works is to link these libraries with -Bsymbolic, to restrict them to resolving their symbol references to within their own shared objects. Unfortunately, it also requires whoever built the conflicting library to use the same option. I.e., it's not sufficient to link Mozilla's libldap with this flag; the platform's libldap must be linked this way as well. (The symbol conflict confusion is bi-directional; linking only one of the conflicting libraries this way eliminates the conflict in only one direction.) It also doesn't help when the shared library has other external dependencies (e.g. OpenLDAP's libldap depends on liblber).

Had to mention this because the dlopen approach is still vulnerable to the problem of the dlopen'd libldap referencing the wrong liblber if another one was implicitly loaded into the process by some other library dependency.
I note that in Mozilla's libldap/getfilter.c, which provides ldap_create_filter(), the header comment says "getfilter.c -- optional add-on to libldap". It's not a part of the libldap API spec, and it's totally self-contained - it has no dependencies on anything else in libldap. IMO it doesn't really belong in there, someone just tossed it in there for lack of a more obvious place. So for this patch, I've copied the necessary bits out of getfilter.c and pasted them in here where they're actually used.
Attachment #334053 - Attachment is obsolete: true
Attachment #334117 - Flags: review?(dmose)
Just for your information:
- The bug still exists in OpenSuse 11.0 x86_64
   kernel 2.6.25.18-0.2-default
   MozillaThunderbird-2.0.0.17-3.1
   nscd-2.8-14.1

   nscd crashes as soon as thunderbird is launched

# ps -ef |grep nscd
root      4905     1  0 09:56 ?        00:00:00 /usr/sbin/nscd
root      4915  4844  0 09:56 pts/2    00:00:00 grep nscd
# logout
begou@thor: thunderbird
Registering Enigmail account manager extension.
Enigmail account manager extension registered.
/usr/bin/thunderbird: line 134:  4918 Erreur de segmentation  $MOZ_PROGRAM $@
begou@thor: ps -ef |grep nscd
begou     4927  4818  0 09:57 pts/2    00:00:00 grep nscd


Using /usr/lib64/libldap-2.4.so.2 instead of /usr/lib64/thunderbird/libldap50.so seems to provide a good work-around.
Given that we have a patch, maybe this should block Thunderbird 3.  It would be really nice to have an idea of how prevalent this is...
Flags: blocking1.9.1?
Though, despite having a patch, it seems like there's still some discussion to be had on whether it uses the optimal approach, or if one of the other approaches suggested here would make more sense.
Whatever the effect of this specific patch is, I'd like to voice my opinion that, unless Thunderbird gains useful LDAP support for reading and writing address books, there is no way to place Thunderbird onto the corporate desktop, although there are other limiting factors around as well.
Removing the flag that I mistakenly set: this isn't part of Gecko, so it can't block a Gecko release.  I'd love to get this for Thunderbird 3, but it feels like there's still a non-trivial amount of work to do here.  Not adding [tb3needs], because if this were the last bug standing, I don't think we would hold the release for it.  Sorry I haven't been able to get back to this yet, Howard.  :-(
Flags: blocking1.9.1?
Summary: Thunderbird crashes on startup for LDAP based users. → Thunderbird startup crash on *IX systems that have PAM set up to get account info from LDAP
I was just bitten by it on Tbird 3 (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1pre) Gecko/20090607 Shredder/3.0b3pre). Oddly, it hangs early in startup when connecting over remote X (ssh-tunnel) but not when invoked locally.
QA Contact: csdk
Given that we have a patch, should really try to drive this in for tb3. 
I'm not sure it's a problem that the code is in m-c, if it's all NPOTB for firefox anyway.
Whiteboard: [has patch, need r dmose]
NPOTB?   Let me guess:  Not part of the browser ?
Not part of the build.
Blocks: 433530
Comment on attachment 334117
Add ldap_create_filter

is this patch still wanted/needed?
Unless something has changed, I suspect it's still necessary.  Comment 37 still applies, though.
I heard there are distributions which patched glibc's name service switch components to avoid these crashes.
One comment about that is
https://bugzilla.novell.com/show_bug.cgi?id=503151#c5

I don't know more details though.
(In reply to comment #44)
> (From update of attachment 334117 [details] [diff] [review])
> is this patch still wanted/needed?

Independent of the OpenLDAP functionality, the bugs / memory leaks in the current code are still issues.
Here's a workaround:
Add yourself as a local user in passwd & shadow as well as ldap.
@Bruce Edge:
it is sufficient to have the users in passwd - adding them to shadow is not needed.
Even with the workarounds (adding the user to the local passwd and running nscd), only TB itself runs, not Lightning. I can run TB+Lightning with a local-only user, but not with an LDAP user, on the same machine with the same pam_ldap config.
Forget my comments. My ldap user has its homedir mounted on a NFS volume mounted with 'noexec' flag. Removing this makes it work.
Wayne Mery (vn) <vseerror@Lehigh.EDU> wrote:
> I wrote:
> > unless Thunderbird gains useful LDAP support for reading and writing
> >  address books, there is no way to place Thunderbird onto the corporate
> > desktop, although there are other limiting factors around as well.
> a_geek, are you in a corporate environment? 

First off, please quote properly, and keep it here.

To answer the question: A large part of my work is as a consultant to corporations with typically several hundred users, and my mind set is tweaked towards the requirements of such organisations. But I have a hard time seeing TB even in the SME area, as they also at least want shared address books and calendaring throughout the company, and will not accept an out-of-band management requirement for their address books (this is a large part of what LDAP access is about).
I would prefer to discuss this out of band, because it's not related to the bug, but my interest in contacting you was to determine your level of interest, which of the ldap bugs you think are most important, and to what extent you would be able to help moving some of them forward (testing, etc)?

And, as implied by my earlier question, there could be more progress if we sought additional users/enterprises who were interested in helping sort through the issues.
Hi, I run a corporate network with approx. 90 users at 3 different sites, plus roaming users. Our only mail client is TB with Lightning, using IMAP mailboxes and SOGo as calendar server. I am interested in any progress/enhancement in either of those two. I can help with testing.
Hi, today I realized that this bug is affecting us as well. We have just decided to move from Evolution to TB. If nscd is not installed on the system, TB does not even start: it segfaults and crashes. If nscd is present, TB starts, but you can't configure accounts or access some menus such as the add-ons menu. I've tried adding the users to /etc/passwd, but it still crashes.

Is it there any workaround for this??

Ubuntu: 10.10
TB: 3.1.7
nscd 2.12.1-0ubuntu10.2
libldap 2.4-2
libnss-ldap 264-2ubuntu2

The last lines of the strace output are:

read(53, "#\n# LDAP Defaults\n#\n\n# See ldap."..., 4096) = 198
read(53, "", 4096)                      = 0
close(53)                               = 0
munmap(0xb7704000, 4096)                = 0
geteuid32()                             = 25004
getuid32()                              = 25004
open("/home/user/ldaprc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/user/.ldaprc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("ldaprc", O_RDONLY|O_LARGEFILE)    = -1 ENOENT (No such file or directory)
stat64("/etc/ldap.conf", {st_mode=S_IFREG|0644, st_size=712, ...}) = 0
geteuid32()                             = 25004
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
unlink("/home/user/.thunderbird/3nkktvbg.default/lock") = 0
rt_sigaction(SIGSEGV, {SIG_DFL, [], 0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
tgkill(18919, 18943, SIGSEGV)           = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
You really need a stack trace for the crash to make progress, I think.
(In reply to comment #58)
> You really need a stack trace for the crash to make progress, I think.

Oh, sorry, I see Howard has a patch in progress.
Comment on attachment 334117
Add ldap_create_filter

switching review to standard8
Attachment #334117 - Flags: review?(dmose) → review?(bugzilla)
Mike, since this might be a serious issue on Ubuntu, you might want to look into driving this patch forward, if it's still applicable.
(In reply to comment #57)
> Is it there any workaround for this??

Carlos, 

reportedly replacing libnss-ldap by libnss-ldapd is a workaround for this. See also the Ubuntu report on this bug,

https://bugs.launchpad.net/ubuntu/+source/thunderbird/+bug/507089
btw. 
as far as I have tested - Mandriva 2010.2 is not affected anymore (thunderbird 3.1.7)
Bug in Mandriva 2010.2 (Thunderbird 3.1.7) seems fixed - maybe an internal
patch - or bug was based on a library which was replaced.
Could not reproduce in Debian Squeeze using Thunderbird 3.1.7.
This also works fine on Ubuntu 8.04 and TB 3.1.7
Ro added the following comment to Launchpad bug report 507089:

As this bug was not present in Ubuntu 9.10 Karmic Koala (thunderbird 2.0.0.24+build1+nobinonly-0ubuntu0.9.10.3), it must have been introduced sometime in between karmic and lucid.

-- 
http://launchpad.net/bugs/507089
I've reproduced this bug in both Ubuntu Maverick and Natty, using Thunderbird 3.1.7.

I'll dig a bit deeper, and keep you all posted.
Comment on attachment 334117
Add ldap_create_filter

I'm not convinced by this solution.

If I understand it correctly, then this is trying to make our API the same as OpenLDAP's version. So depending on the set-up of the (Linux) system, we could be using either the OpenLDAP library, or our own. We don't know what is in OpenLDAP's library, nor will we have done extensive testing with it. If we get crashes or strange results, we may not even realise that we're using OpenLDAP's library. This would make support very difficult. I think this is what Mark was saying in comment 9.

Given that we ship this library in Thunderbird, intending that Thunderbird is going to use this library, then maybe we should consider re-naming the library when we ship it within Thunderbird. This idea is from a similar approach Firefox took with SQLite in bug 513747.

So for instance, we could ship libmozldap60.so etc where we build LDAP as part of Thunderbird. Hence, changing the name should resolve the conflicts we're seeing, and ensure that Thunderbird runs with what we intended.

The LDAP c-sdk could still default to libldap60.so, and if building with the system LDAP c-sdk, then we could still use libldap60.so. If Linux distributions want to use the system LDAP for shipping Thunderbird, then I would expect them to verify/handle bugs with LDAP, especially if it isn't the LDAP c-sdk that we're shipping with Thunderbird.

Obviously we may still want to move the two sets of LDAP APIs closer together, but I'm not convinced doing it as a result of this bug is the right thing to do. For example, it really does feel like ldap_create_filter should be in the c-sdk, and therefore maybe it needs adding to OpenLDAP's version, not removing from ours.

If I've misunderstood things, then please correct me.
Attachment #334117 - Flags: review?(bugzilla) → feedback-
Whiteboard: [has patch, need r dmose]
What is the status on this bug after 7 years?

From what I understand (correct me if I am wrong) the solution is to install either nscd or libnss-ldapd. While both of these seem to work, they are not acceptable solutions because it affects the rest of the system.
And why should thunderbird even care what the controlling backend auth module is in the first place?
(In reply to comment #70)
> What is the status on this bug after 7 years?
> 
> From what I understand (correct me if I am wrong) the solution is to install
> either nscd or libnss-ldapd. While both of these seem to work, they are not
> acceptable solutions because it affects the rest of the system.
> And why should thunderbird even care what the controlling backend auth module
> is in the first place?

You're right that Thunderbird or some other app *shouldn't* ever need to care about this, but the fact is that the old nss-ldap design causes these types of problems, and libnss-ldapd corrects the design flaw.
airtonix added the following comment to Launchpad bug report 507089:

still a problem in 12.04 amd64 desktop and the default thunderbird provided.

-- 
http://launchpad.net/bugs/507089
While that change makes sense in general I'm wondering what it's supposed to fix? I'm pretty sure that the filename is not the issue.
Attachment #628378 - Flags: feedback? → feedback?(mbanner)
Comment on attachment 628378
rename libldap60.so to libmozldap60.so

From discussions about this sort of thing previously (which admittedly were a while ago), I believe that changing the library name wouldn't actually resolve all the problems.

Additionally, I don't think it is really right to change the library name unless the developers of the Mozilla LDAP c-sdk really want to, as it would impact on all the users of it, and potentially the use of libraries on existing systems.

I think that we should really go for changing the LDAP c-sdk that we use, and possibly replacing it with OpenLDAP as Howard was intending (or something else). To this effect I've put a proposal to tb-planning about this change:

http://groups.google.com/group/tb-planning/browse_thread/thread/342164ae0db9b21a
(https://wiki.mozilla.org/Thunderbird/tb-planning)
Attachment #628378 - Flags: feedback?(mbanner) → feedback-
Comment on attachment 334117
Add ldap_create_filter

I'm rescinding my previous feedback- on this. Per previous comment on this bug, discussions have moved on, and we're considering moving away from the LDAP c-sdk, so this patch may therefore be heading in the right direction. Obviously, it would need to be updated and re-tested etc, but see the tb-planning discussion first.
Attachment #334117 - Flags: feedback- → feedback+
The problem is still present in Thunderbird 15! I get a backtrace like the one in https://bugzilla.mozilla.org/show_bug.cgi?id=433530:

(gdb) bt
#0  strtok_r () at ../sysdeps/x86_64/strtok.S:190
#1  0x00007ffff6ad3b3a in ldap_str2charray (str=0x7fffe3781ced "ldap://localhost/", brkstr=0x7fffe3781a4b ", ")
    at /usr/src/debug/mail-client/thunderbird-15.0.1/comm-release/ldap/sdks/c-sdk/ldap/libraries/libldap/charray.c:218
#2  0x00007fffe376c216 in ldap_url_parselist_int (ludlist=0x7fffe398be80, url=<optimized out>, sep=<optimized out>, flags=11) at url.c:1293
#3  0x00007fffe376da8b in ldap_int_initialize_global_options (gopts=0x7fffe398bdc0, dbglvl=<optimized out>) at init.c:537
#4  0x00007fffe376dc0d in ldap_int_initialize (gopts=0x7fffe398bdc0, dbglvl=<optimized out>) at init.c:653
#5  0x00007fffe3753309 in ldap_create (ldp=0x7fffffff9cb8) at open.c:108

By looking at
(gdb) info sharedlibrary
0x00007ffff6ad2040  0x00007ffff6af6558  Yes         /usr/lib64/thunderbird/libldap60.so
0x00007fffe3752fd0  0x00007fffe377e0a8  Yes         /usr/lib64/libldap-2.4.so.2

you can see that the OpenLDAP routine is jumping into a Mozilla routine (frame #2 lies in libldap-2.4.so.2, but the ldap_str2charray it calls in frame #1 lies in libldap60.so), causing a segfault by applying strtok to "ldap://localhost/", which is a built-in string in the OpenLDAP library. A solution would be nice, because currently I can't use Thunderbird at all.
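To make that reading of the two gdb listings explicit, here is a minimal sketch that attributes each frame address to a library by its address range. The numbers are copied from the output above; the helper names (`owning_library`, `attribute_frames`) are made up for illustration, and the ranges are assumed to be the ones gdb printed as From/To.

```python
# Address ranges copied from the `info sharedlibrary` output above.
LIBRARIES = [
    (0x00007FFFF6AD2040, 0x00007FFFF6AF6558, "/usr/lib64/thunderbird/libldap60.so"),
    (0x00007FFFE3752FD0, 0x00007FFFE377E0A8, "/usr/lib64/libldap-2.4.so.2"),
]

# (frame number, function, address) copied from the backtrace above.
# Frame #0 is left out because gdb printed no address for it.
FRAMES = [
    (1, "ldap_str2charray", 0x00007FFFF6AD3B3A),
    (2, "ldap_url_parselist_int", 0x00007FFFE376C216),
    (3, "ldap_int_initialize_global_options", 0x00007FFFE376DA8B),
    (4, "ldap_int_initialize", 0x00007FFFE376DC0D),
    (5, "ldap_create", 0x00007FFFE3753309),
]


def owning_library(address):
    """Return the path of the library whose range contains address, or None."""
    for start, end, path in LIBRARIES:
        if start <= address <= end:
            return path
    return None


def attribute_frames():
    """Pair every frame with the library its address falls into."""
    return [(number, name, owning_library(pc)) for number, name, pc in FRAMES]


if __name__ == "__main__":
    for number, name, path in attribute_frames():
        print(f"#{number} {name}: {path}")
```

Under these assumptions only frame #1 falls inside libldap60.so, while frames #2 to #5 fall inside libldap-2.4.so.2, which matches the symbol-clash explanation.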
The problem is also in Thunderbird 16. It's a clash of symbols from libldap-2.4.so and libldap60.so.

(gdb) bt
#0  0x00007fffe708a100 in ldap_str2charray () from /usr/lib64/libldap-2.4.so.2
#1  0x00007fffe70816c6 in ldap_url_parselist_int () from /usr/lib64/libldap-2.4.so.2
#2  0x00007fffe7082f1b in ldap_int_initialize_global_options () from /usr/lib64/libldap-2.4.so.2
#3  0x00007fffe7083016 in ldap_int_initialize () from /usr/lib64/libldap-2.4.so.2
#4  0x00007fffe706a6ab in ldap_create () from /usr/lib64/libldap-2.4.so.2
#5  0x00007fffe706aa81 in ldap_initialize () from /usr/lib64/libldap-2.4.so.2
#6  0x00007fffe72a79c0 in do_init () from /lib64/libnss_ldap.so.2
#7  0x00007fffe72a9d1c in _nss_ldap_search_s () from /lib64/libnss_ldap.so.2
#8  0x00007fffe72ab580 in _nss_ldap_getbyname () from /lib64/libnss_ldap.so.2
#9  0x00007fffe72abd07 in _nss_ldap_getpwnam_r () from /lib64/libnss_ldap.so.2
#10 0x00007ffff70c5685 in getpwnam_r () from /lib64/libc.so.6

Removing or renaming libldap60.so caused errors when loading the libraries, so this does not seem to be a solution:
  XPCOMGlueLoad error for file /usr/lib64/thunderbird/libxpcom.so:
  libxul.so: cannot open shared object file: No such file or directory
  Couldn't load XPCOM.


We brute-forced renaming the symbol via
   sed -e 's:ldap_str2charray:ldap_str2xharray:' /usr/lib64/thunderbird/libldap60.so
in order to make it work.
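The replacement name has exactly as many characters as the original, so the patched file keeps its size and every other byte stays at the same offset. A minimal sketch of such a same-length rename on an in-memory copy follows; `rename_symbol` and the sample bytes are hypothetical illustrations, not part of any Thunderbird or LDAP tooling, and the sample is not a real ELF image.

```python
def rename_symbol(image: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of old with new, refusing length changes.

    A different length would shift all following bytes and invalidate
    the offsets stored elsewhere in a binary file.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    return image.replace(old, new)


if __name__ == "__main__":
    # Hypothetical stand-in for a string table: NUL-separated symbol names.
    fake_table = b"\x00ldap_create\x00ldap_str2charray\x00ldap_init\x00"
    patched = rename_symbol(fake_table, b"ldap_str2charray", b"ldap_str2xharray")
    print(len(fake_table) == len(patched))
    print(patched)
```

This is a crude workaround rather than a fix: it edits a shipped binary and would have to be repeated after every package update.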
Blocks: 756782
After upgrading to Thunderbird 22, the error is still reproducible, but the signature has changed from:
arena_dalloc | ldap_x_free | ldap_set_lderrno
to
arena_dalloc | ld-2.15.so@0x214e4

- is this the same error or some other problem?
Howard is no longer working on this

(In reply to Murz from comment #83)
> After upgrading to Thunderbird 22, the error is still reproducible, but the
> signature has changed from:
> arena_dalloc | ldap_x_free | ldap_set_lderrno
> to
> arena_dalloc | ld-2.15.so@0x214e4
> 
> - is this the same error or some other problem?

Seems likely to be the same error.
https://crash-stats.mozilla.com/query/?product=Thunderbird&version=ALL%3AALL&range_value=4&range_unit=weeks&date=08%2F06%2F2013+17%3A00%3A00&query_search=signature&query_type=contains&query=arena_dalloc+|+ld&reason=&release_channels=&build_id=&process_type=any&hang_type=any
arena_dalloc | ldap_x_free | ldap_set_lderrno
arena_dalloc | ldap_ld_free | libnss_ldap-2.13.so@0x3955
arena_dalloc | ldap_set_lderrno
arena_dalloc | ld-2.15.so@0x214e4
arena_dalloc | ld-2.15.so@0xe774
Assignee: hyc → nobody
Status: ASSIGNED → NEW
Whiteboard: [gs] → [gs][startupcrash]
The bug is present in Thunderbird 24.2.0 running on Kubuntu 12.04.4. Running nscd appears to work around the issue, but I haven't tested it thoroughly for side effects.

I find it somewhat ironic that a nearly nine year old bug of this magnitude has status: NEW.

Software versions (all from Ubuntu repos):
$ aptitude show thunderbird | grep Version
Version: 1:24.2.0+build1-0ubuntu0.12.04.1
$ aptitude show libldap-2.4-2 | grep Version
Version: 2.4.28-1.1ubuntu4.4
$ uname -a
Linux tiny 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
(In reply to Maciej Puzio from comment #85)
> I find it somewhat ironic that a nearly nine year old bug of this magnitude
> has status: NEW.

Actually, a better label would be CONFIRMED rather than NEW. That's what NEW really means, it does not refer to the bug's age.
(In reply to Tony Mechelynck [:tonymec] from comment #86)
> Actually, a better label would be CONFIRMED rather than NEW. That's what NEW
> really means, it does not refer to the bug's age.

I am very well aware of that; my point was to draw attention to unacceptable quality control and a record-breaking bug-fix cycle. Anyway, my further testing revealed several more issues with libldap, libpam-ldap and libnss-ldap, and I decided that this software as a whole does not meet my quality requirements. Instead I am deploying sssd as the LDAP client for PAM and NSS, and this is my recommendation for readers of this page.
nslcd/nss-pam-ldapd would be the best choice. The code is quite mature, since the basic LDAP functionality is ported from the old PADL code and is well proven. It's also quite compact: it does just LDAP and nothing else. SSSD is unproven and quite overloaded feature-wise. For security/authentication software, complexity is the enemy of reliability. I shouldn't have to roll out that lecture again...
Maciej Puzio and Howard Chu - thanks for the info, moving to ldapd or sssd solves this problem.
Chiming in with the info that I first encountered this bug in Mint 13 (Ubuntu Precise), and it still applies in Mint 17 (Ubuntu Trusty). And while I can understand all the issues involved with deciding the "right way to go", I am somewhat miffed to find that a decade-old bug still expresses itself as a SIGSEGV. Expecting the user to strace / google / eventually find this bug entry if he's lucky? Is it really that difficult to check for the condition and at least give a meaningful message (perhaps including a workaround recommendation) before exiting gracefully?
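As a rough illustration of what "checking for the condition" could look like, the sketch below scans the text of an nsswitch.conf file for databases that list the `ldap` service. It only detects one precondition from this report (NSS configured to use LDAP); it does not check whether nscd is running or whether the crash would actually occur, and `databases_using_ldap` is a hypothetical helper, not existing Thunderbird code.

```python
import re


def databases_using_ldap(nsswitch_text):
    """Return the NSS databases whose source list names the ldap service."""
    using = []
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        # Remove action items such as [NOTFOUND=return] before splitting.
        services = re.sub(r"\[[^\]]*\]", " ", sources).split()
        if "ldap" in services:
            using.append(database.strip())
    return using


if __name__ == "__main__":
    # Hypothetical sample contents; a real check would read /etc/nsswitch.conf.
    sample = "passwd: files ldap\ngroup: files ldap\nshadow: files\n"
    hits = databases_using_ldap(sample)
    if "passwd" in hits:
        print("User lookups go through LDAP; see the nscd workaround in this bug.")
```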
(In reply to Martin Baute from comment #90)
> Chiming in with the info that I first encountered this bug in Mint 13
> (Ubuntu Precise), and it still applies in Mint 17 (Ubuntu Trusty). And while
> I can understand all the issues involved with deciding the "right way to
> go", I am somewhat miffed to find that a decade-old bug still expresses
> itself as a SIGSEGV. Expecting the user to strace / google / eventually find
> this bug entry if he's lucky? Is it really that difficult to check for the
> condition and at least give a meaningful message (perhaps including a
> workaround recommendation) before exiting gracefully?

It is a constant of Electronic Data Processing that no program is bug-free before it is obsolete. Even once a bug is identified, fixing it is not always easy. Complaining that "after so many years, no fix has been found" doesn't push the bug any nearer to being fixed, while it adds to the lot of useless rubbish (please excuse my language) that developers must wade through in order to find what the problem really is.

Another constant of EDP is that there are never enough coding hands to do all that needs doing, even when, as at Mozilla, a lot of volunteers selflessly donate part of their time to help the people whose paid job it is to try and fix these bugs. Any help is always welcome, and the code is anyone's to look into.

Do you know how to fix the bug? Good! Write a patch, ASSIGN the bug to yourself, find an appropriate reviewer by browsing https://wiki.mozilla.org/Modules and off you go. Once you get a positive review, set the checkin-needed flag, and someone will push your patch into the permanent source.

You mean you don't know how to fix the bug? Ah, too bad. Neither do I. So let us wait patiently, even years if that's what it takes, until someone comes around who does, and in the meantime let's have a look at the "rules of the house", https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
(In reply to Tony Mechelynck [:tonymec] from comment #91)
> ...lots of the usual deleted...

So your answer to a bug that's been confirmed, and after nine years still expresses itself as SIGSEGV, is basically, "go fix it yourself"?

You think *that* is a useful contribution to this bug report?

Sometimes I'm really ashamed of my peers in the trade. And no, I won't wade through Thunderbird sources, because I've got other projects. I am a Thunderbird *user*, not a *maintainer*, so...

...go fix it yourself.
Confirming this bug for 31.1.1 Linux (Xubuntu 14.04): User accounts through ldap authentication make Thunderbird crash when trying to print. Installing nscd makes that go away.
Confirming bug for current release (45.4.0) for Ubuntu 16.04 (64bit)
https://launchpad.net/ubuntu/+source/thunderbird/1:45.4.0+build1-0ubuntu0.16.04.1

Thunderbird crashes for ldap accounts when:
1. creating new TB user profile
2. invoking print dialog in TB

Workaround: sudo apt install nscd
Confirming bug for current release (52.7.0) for Ubuntu 16.04 LTS (64 bit). Crash at startup for any user authenticated by LDAP.

Also confirming the workaround from comment #94, installing nscd solved the problem.

Can you reproduce this when using version 78?

Flags: needinfo?(servizio.antifumo)
Flags: needinfo?(lars.behrens)
Flags: needinfo?(daniel)

I can answer for 68.10.0: the problem no longer occurs (but our setup has changed since then and now additionally uses Kerberos).

Flags: needinfo?(lars.behrens)

thanks lars

Flags: needinfo?(servizio.antifumo)
Flags: needinfo?(daniel)

Just to note - in OpenLDAP 2.5, which is currently being released, we've added symbol versioning to libldap and liblber, so mixing of libraries should no longer be a problem.
