Closed Bug 451476 Opened 11 years ago Closed 11 years ago

NSPR shared libraries should use direct bindings on Solaris

Categories

(NSPR :: NSPR, defect, P2)

4.7.1
x86
SunOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: julien.pierre)

Details

Attachments

(2 files)

We ran into a problem recently where accidental interposition of a symbol by an application's shared library caused a crash during NSPR initialization.

The crash stack looked like this :

(dbx) where
current thread: t@1
 [1] __lwp_kill(0x1, 0x6), at 0xfe0b54d7
 [2] _thr_kill(0x1, 0x6), at 0xfe0b2c86
 [3] raise(0x6), at 0xfe0613f3
 [4] abort(0xfe3f9b9c, 0xfe3b2cd9, 0x8046338, 0xfe3d2624, 0xfe3e8470, 0xfe3e8478), at 0xfe041709
=>[5] PR_Assert(s = 0xfe3e8470 "0 == rv", file = 0xfe3e8478 "../../../../pr/src/pthreads/ptsynch.c", ln = 178), line 551 in "prlog.c"
 [6] PR_NewLock(), line 178 in "ptsynch.c"
 [7] session_thread_init(0xfee7f720, 0xfe3c5139, 0x0, 0x8046374, 0xfece4b02, 0xfeedebf0), at 0xfece4ad3
 [8] PR_CallOnce(once = 0xfeedebf0, func = 0xfece4ac8 = &`libns-httpd40.so`session.cpp`#__1cTsession_thread_init6F_nIPRStatus__        [non -g, demangles to: session_thread_init]()), line 808 in "prinit.c"  [9] session_alloc_thread_slot(0xfece912c), at 0xfece4b02
 [10] __STATIC_CONSTRUCTOR(0xfeffb818, 0xb4, 0xfeff04f0, 0x80463c0, 0xfefd58e1, 0xfeffb28c), at 0xfece92e4
 [11] __cplus_fini_at_exit(0xfeffb28c, 0xfeff04f0, 0xfeffb818, 0xfde90e5c, 0xfdeb03dc, 0xfee3f1d4), at 0xfee3f299
 [12] call_init(0xfde90e58, 0x3), at 0xfefd58e1
 [13] is_dep_init(0xfeff04f0, 0xfefa0f00), at 0xfefd5695
 [14] elf_bndr(0xfefa0f00, 0x1c8, 0xfe3b2534), at 0xfefdfc22
 [15] elf_rtbndr(0x1c8, 0xfe3b2534, 0x80464a0, 0xfe3e5398, 0xfe3f9b9c, 0xfe3b2459), at 0xfefcba64
 [16] 0xfefa0f00(0x8082350, 0x0), at 0xfefa0f00
 [17] PR_NewLogModule(name = 0xfe3e74e0 "clock"), line 366 in "prlog.c"
 [18] _PR_InitStuff(), line 181 in "prinit.c"
 [19] _PR_ImplicitInitialization(), line 251 in "prinit.c"
 [20] PR_NewLock(), line 172 in "ptsynch.c"
 [21] __STATIC_CONSTRUCTOR(0xfeffb818, 0x44, 0xfe7d0930, 0x8046594, 0xfefd58e1, 0xfeffb28c), at 0xfdee5b1c
 [22] __cplus_fini_at_exit(0xfeffb28c, 0xfeffdcc8, 0xfeffb818, 0xfdeb094c, 0xfe000bd4, 0xfdee834c), at 0xfdee8407
 [23] call_init(0xfdeb08d8, 0x1), at 0xfefd58e1
 [24] setup(0x80467e0, 0x8046944, 0x0, 0x8047fc4, 0x1000, 0xfefca0ad, 0xfeffbaec, 0xfefc4000, 0xfefc4000, 0xffffffff, 0x8050034, 0x8047fca, 0x80467d8, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x2), at 0xfefd4f9c
 [25] _setup(0x8046794, 0xfeffbaec, 0x3, 0x80467d8, 0x4, 0x80467e0), at 0xfefe0c66
 [26] _rt_boot(0x0, 0x80469e0, 0x80469ed, 0x8046b28, 0x8046b3e, 0x8046b49), at 0xfefcb94c 

The intercepted symbol was strncasecmp, which is used in PR_NewLogModule.
This triggered the loading of a C++ shared library that implemented strncasecmp. The init function for that library called PR_NewLock, even though NSPR had not completed its initialization. This ended up causing a failure in pthread_mutex_init and an assertion in NSPR.

The only way to solve this problem is to prevent interposition of this symbol in order to avoid causing the other shared library to be loaded.

The Solaris linker provides -Bdirect and -z direct options to accomplish this. Both of them solve the problem. I'm not sure which one is preferrable to use.
We may also want to do this on other Unix platforms as well if such a linker option is available.
The problem is that that C++ shared library implements
strncasecmp.  We can change NSPR to not call strncasecmp.

If -Bdirect works like -Bsymbolic, it'll prevent us from
overriding some standard library function calls easily,
and we should not impose that on all users of NSPR.
Wan-Teh,

-Bdirect is not the same as -Bsymbolic.
The way I understand things, it's still possible to interpose if one builds an interposer library with -z interpose. It just prevents additional interposition.

    -B direct | nodirect

         Options governing direct binding. -B direct  establishes
         direct binding information by recording the relationship
         between each symbol reference and  the  dependency  that
         provides  the  definition.  In  addition, direct binding
         information is established between each symbol reference
         and  an  associated  definition  within the object being
         created. The runtime linker  uses  this  information  to
         search  directly  for  a symbol in the associated object
         rather than to carry out a default symbol search.

         Direct binding information can only  be  established  to
         dependencies  specified  with  the  link-edit. Thus, you
         should use the -z defs  option.  Objects  that  wish  to
         interpose  on  symbols  in  a direct binding environment
         should identify themselves as interposers  with  the  -z
         interpose  option. The use of -B direct enables -z lazy-
         load for all dependencies.

         -B nodirect prevents any direct binding  to  the  inter-
         faces  offered  by  the object being created. The object
         being created can continue to directly bind to  external
         interfaces by specifying the -z direct option.
I meant "accidental" interposition.
Here is the doc for -z interpose :

     -z interpose

         Marks the object as an interposer. When direct  bindings
         are in effect, the runtime linker searchs for symbols in
         any interposers before  the  object  associated  to  the
         direct binding. See -B direct.

I did not try to build an interposer library.
Re: strncasecmp

I believe that NSPR uses strcasecmp instead of strncasecmp.
Could you please doublecheck?

In any case, we can modify NSPR so that it doesn't call
any non-standard C library function such as strcasecmp.
That'll significantly reduce the likelihood of accidental
interposition because there is little reason for a normal
library (as opposed to a special-purpose library such as
jemalloc or SmartHeap) to implement a universally available
libc function such as strlen or strcmp.

Re: -Bdirect

If the Solaris system libraries such as libpthread.so,
libthread.so, librt.so, libsocket.so, libnsl.so are
built with -Bdirect, then I'm fine with building NSPR
with -Bdirect.

Is libmtmalloc.so built with -z interpose?

Wan-Teh,

You are right, it is strcasecmp.

Re: mtmalloc, I don't know if it's built with -z interpose. ldd / nm / file don't show that.
prlog.c uses strcasecmp but not strncasecmp, so this macro
definition can be removed.
Attachment #335134 - Flags: review?(julien.pierre.boogz)
Comment on attachment 335134 [details] [diff] [review]
Remove an unused strncasecmp macro definition. (checked in)

This change is fine, although it is not directly related to this Solaris problem .
Attachment #335134 - Flags: review?(julien.pierre.boogz) → review+
Comment on attachment 335134 [details] [diff] [review]
Remove an unused strncasecmp macro definition. (checked in)

I checked in this patch on the NSPR trunk (NSPR 4.7.3).

Checking in prlog.c;
/cvsroot/mozilla/nsprpub/pr/src/io/prlog.c,v  <--  prlog.c
new revision: 3.43; previous revision: 3.42
done
Attachment #335134 - Attachment description: Remove an unused strncasecmp macro definition. → Remove an unused strncasecmp macro definition. (checked in)
I verified that this still allows LD_PRELOAD of /usr/lib/libmtmalloc.so to work . The malloc function from libmtmalloc.so.1 is called in dbx, not the one from libc.so.1 .
Attachment #336128 - Flags: review?(wtc)
Comment on attachment 336128 [details] [diff] [review]
Use -Bdirect when linking NSPR shared libraries on Solaris

r=wtc.

If GNU ld doesn't support the -Bdirect flag, you can
test the GCC_USE_GNU_LD variable.
Attachment #336128 - Flags: review?(wtc) → review+
NSPR 4.7.2 is almost done.  Please set target milestone
to 4.7.3.
Thanks for the review, Wan-Teh. Should I wait until NSPR 4.7.2 has an RTM tag before I check in to the trunk ?
Target Milestone: 4.7.2 → 4.7.3
You can check it in now.
I was hoping we could get it into 4.7.2 when we release 3.11.10 soon (maybe
next week).
Thanks, Wan-Teh. I checked this in to the trunk.

Checking in configure;
/cvsroot/mozilla/nsprpub/configure,v  <--  configure
new revision: 1.237; previous revision: 1.236
done
Checking in configure.in;
/cvsroot/mozilla/nsprpub/configure.in,v  <--  configure.in
new revision: 1.241; previous revision: 1.240
done

I'm also in favor of having this go into 4.7.2 if it's not too late.
Priority: -- → P2
Summary: NSPR shared libraries should use direct bindings → NSPR shared libraries should use direct bindings on Solaris
Version: other → 4.7.1
OK.  I put this in NSPR 4.7.2 Beta 3.  I just imported
NSPR_4_7_2_BETA3 into mozilla-central for testing on the
Mozilla trunk.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: 4.7.3 → 4.7.2
Stack for pipeping2:

core 'core' of 24992:	./pipeping2
-----------------  lwp# 1 / thread# 1  --------------------
 ff352a48 ForkAndExec (119d0, 21c78, 0, 236d8, 0, 0) + c0
 ff3530e8 _MD_CreateUnixProcess (119d0, 21c78, 0, 236d8, 5, 4) + 80
 ff32d470 PR_CreateProcess (119d0, 21c78, 0, 236d8, 31960, 0) + 30
 00011394 main     (1, ffbef9b4, ffbef9bc, 21c00, 0, 0) + 364
 00010c00 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
 ff29edc4 _signotifywait (ff1cc000, 0, 0, 0, 0, 0) + 8
 ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-----------------  lwp# 3  --------------------------------
 ff29c968 _door_return (3, ff1cd658, ff1cd670, 3, ff1cc000, 1) + 10
 ff1aa358 _lwp_start (ff145d98, 0, 6000, ffbef2e4, 0, 0) + 18
 ff1b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
--------------------------  thread# 3  --------------------
 ff1ad9b8 _reap_wait (ff1d0980, 1e924, 0, ff1cc000, 0, 0) + 38
 ff1ad710 _reaper  (ff1cce00, ff1d2708, ff1d0980, ff1ccdd8, 1, fe400000) + 38
 ff1bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
--------------------------  thread# 4  --------------------
 ff1bafd4 _restorefsr (249c0, 0, 0, 0, 0, 0) + 8
System resources available:
>	limit
cputime 	unlimited
filesize 	unlimited
datasize 	unlimited
stacksize 	32768 kbytes
coredumpsize 	unlimited
vmemoryuse 	unlimited
descriptors 	1024 

Originally, stacksize was 8192 kb. I tried to increase it to 32,768 kb but it didn't change anything.
Stack for sockping:

core 'core' of 24998:	./sockping
-----------------  lwp# 1 / thread# 1  --------------------
 fdb52a48 ForkAndExec (11834, 21ad8, 0, 23538, 0, 0) + c0
 fdb530e8 _MD_CreateUnixProcess (11834, 21ad8, 0, 23538, 2, 4) + 80
 fdb2d470 PR_CreateProcess (11834, 21ad8, 0, 23538, 31960, 0) + 30
 0001126c main     (1, ffbef9b4, ffbef9bc, 21800, 0, 0) + 234
 00010c08 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
 fda9edc4 _signotifywait (fd9cc000, 0, 0, 0, 0, 0) + 8
 fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
-----------------  lwp# 3  --------------------------------
 fda9c968 _door_return (3, fd9cd658, fd9cd670, 3, fd9cc000, 1) + 10
 fd9aa358 _lwp_start (fd945d98, 0, 6000, ffbef2e4, 0, 0) + 18
 fd9b1c2c thr_yield (0, 0, 0, 0, 0, 0) + 8c
--------------------------  thread# 3  --------------------
 fd9ad9b8 _reap_wait (fd9d0980, 1e924, 0, fd9cc000, 0, 0) + 38
 fd9ad710 _reaper  (fd9cce00, fd9d2708, fd9d0980, fd9ccdd8, 1, fe400000) + 38
 fd9bb01c _thread_start (0, 0, 0, 0, 0, 0) + 40
--------------------------  thread# 4  --------------------
 fd9bafd4 _restorefsr (24820, 0, 0, 0, 0, 0) + 8
I am sorry, comments 18, 19 and 20 where meant for another bug. Please disregard these comments.
Well, after investigation, it looks like the 2 stacks above are related to this bug...
I made a build for NSPR_4_7_2_RTM without the -Bdirect and these 2 tests don't crash anymore.
Christophe, is there another bug open for these test failures ?
You need to log in before you can comment on or make changes to this bug.