Closed Bug 285330 Opened 20 years ago Closed 20 years ago

64KB stack limit for PR_CreateThread threads crashes luna (DNS thread)

Categories

Component: Core :: Networking
Type: defect
Platform: x86 Linux
Priority: Not set
Severity: normal

Tracking

Status: VERIFIED INVALID

People

(Reporter: dbaron, Assigned: darin.moz)

Details

The checkin of bug 274450 yesterday caused the luna tinderbox to go orange
because it applied a 64K default stack size limit to Linux.  On luna (Red Hat
8), gethostbyname_r seems to need about that much stack all by itself:

wtchang wrote:
> 2. The app may crash if the thread stack size
> is too small.
> 
> Before, NSPR didn't set the thread stack size.
> So we use the pthread library's default thread
> stack size.
> 
> Now, NSPR sets the default thread stack size to 64KB.
> It is possible that this is too small for Linux.
> 
> Note that an app may specify a different thread
> stack size when they call PR_CreateThread (the
> last argument). I will look into this by examining
> the PR_CreateThread calls made by Mozilla.
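Before the change, threads got whatever default the pthread library chose. Here is a minimal sketch of how to inspect that default on a POSIX system: a freshly initialized `pthread_attr_t` reports the library's default stack size through `pthread_attr_getstacksize`. The helper names and the `STACK_64K` constant are hypothetical, not part of NSPR. The value reported depends on the platform; on current glibc it is typically derived from `ulimit -s`.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical constant: the 64 KB default discussed in this bug. */
#define STACK_64K ((size_t)64 * 1024)

/* Hypothetical helper: returns the pthread library's default thread stack
 * size, i.e. what a thread gets when nobody calls pthread_attr_setstacksize.
 * Returns 0 if the attribute object could not be created or queried. */
static size_t default_pthread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Hypothetical helper: prints how the library default compares with 64 KB. */
static void report_default_stack(void)
{
    size_t size = default_pthread_stack_size();
    printf("pthread default stack: %zu bytes (%s 64 KB)\n",
           size, size > STACK_64K ? "larger than" : "not larger than");
}
```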

The crash seems to be on the DNS thread:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 40966 (LWP 11141)]
0x40f0fee9 in _nss_dns_gethostbyname2_r () from /lib/libnss_dns.so.2
(gdb) bt
#0  0x40f0fee9 in _nss_dns_gethostbyname2_r () from /lib/libnss_dns.so.2
#1  0x40f111e3 in _nss_dns_gethostbyname_r () from /lib/libnss_dns.so.2
#2  0x420ee0c8 in gethostbyname_r@@GLIBC_2.1.2 () from /lib/i686/libc.so.6
#3  0x401756e3 in PR_GetHostByName () from ./libnspr4.so
#4  0x40176510 in pr_GetAddrInfoByNameFB () from ./libnspr4.so
#5  0x401765c8 in PR_GetAddrInfoByName () from ./libnspr4.so
#6  0x4086cf10 in nsHostResolver::ThreadFunc(void*) ()
   from /builds/tinderbox/SeaMonkey/Linux_2.4.18-14_Depend/mozilla/dist/bin/components/libnecko.so
#7  0x40180d35 in _pt_root () from ./libnspr4.so
#8  0x401b0941 in pthread_start_thread () from /lib/i686/libpthread.so.0
#9  0x401b0a45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0

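The top frame on the stack appears to be quite deep: its esp and ebp below are 0x10478 (66,680) bytes apart, so _nss_dns_gethostbyname2_r alone seems to use more than 64K. Following the chain of saved frame pointers out to the outermost frame: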

(gdb) inf reg
...
esp            0x412b203c       0x412b203c
ebp            0x412c24b4       0x412c24b4
...
(gdb) p *(void**)0x412c24b4
$1 = (void *) 0x412c24e4
(gdb) p *(void**)$
$2 = (void *) 0x412c2534
(gdb) p *(void**)$
$3 = (void *) 0x412c29a4
(gdb) p *(void**)$
$4 = (void *) 0x412c29d4
(gdb) p *(void**)$
$5 = (void *) 0x412c2a34
(gdb) p *(void**)$
$6 = (void *) 0x412c2a64
(gdb) p *(void**)$
$7 = (void *) 0x412c2a94
(gdb) p *(void**)$
$8 = (void *) 0x412c2b94
(gdb) p *(void**)$
$9 = (void *) 0x412c2bd4
(gdb) p *(void**)$
$10 = (void *) 0x0

giving a total stack size, from esp to the outermost frame pointer, larger than 64K (65,536 bytes):

dbaron@ridley Linux (0) ~ $ guile
guile> (- #x412c2bd4 #x412b203c)
68504
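The arithmetic in the gdb and guile sessions can be reproduced as a small C sketch. It hard-codes the esp value and the saved frame-pointer (ebp) chain printed above. Subtracting esp from the first ebp gives the space used by the top frame alone; subtracting it from the last (outermost) ebp gives the total for the whole chain. This assumes every frame in the chain was built with a frame pointer, as the session suggests, and the total slightly undercounts because it ignores anything above the outermost frame pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the gdb session: esp of the faulting frame and the
 * saved frame pointers from the faulting frame outward (the next saved
 * ebp after the last entry was 0x0). */
static const uint32_t top_esp = 0x412b203c;
static const uint32_t ebp_chain[] = {
    0x412c24b4, 0x412c24e4, 0x412c2534, 0x412c29a4, 0x412c29d4,
    0x412c2a34, 0x412c2a64, 0x412c2a94, 0x412c2b94, 0x412c2bd4,
};
#define CHAIN_LEN (sizeof ebp_chain / sizeof ebp_chain[0])

/* The x86 stack grows downward, so older frames sit at higher addresses. */
static uint32_t top_frame_bytes(void)
{
    return ebp_chain[0] - top_esp;
}

static uint32_t total_stack_bytes(void)
{
    return ebp_chain[CHAIN_LEN - 1] - top_esp;
}

/* Prints the top frame, the gap between consecutive frame pointers,
 * and the overall total. */
static void print_frame_sizes(void)
{
    printf("top frame: %u bytes\n", (unsigned)top_frame_bytes());
    for (size_t i = 1; i < CHAIN_LEN; i++)
        printf("ebp[%zu] - ebp[%zu]: %u bytes\n", i, i - 1,
               (unsigned)(ebp_chain[i] - ebp_chain[i - 1]));
    printf("total: %u bytes (64K = 65536)\n", (unsigned)total_stack_bytes());
}
```

With these values the top frame accounts for 66,680 bytes and the whole chain for 68,504 bytes, matching the guile result. The top frame alone already exceeds 65,536 bytes.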



Currently luna is not orange because I changed (locally) the last parameter of
the PR_CreateThread call in nsHostResolver.cpp, the stack size, to 262144 (256K).


On Tuesday 2005-03-08 11:59 -0800, Wan-Teh Chang wrote:
> There are several other Linux tinderboxes.  Do you
> know why we are only crashing on luna?

Maybe because it's the only one with an implementation of
gethostbyname_r that uses 64K of stack space?  Then again, I think we do
have a few other machines running Red Hat 8.


bz also wrote:
> Wan-Teh Chang wrote:
> >Note that an app may specify a different thread
> >stack size when they call PR_CreateThread (the
> >last argument). I will look into this by examining
> >the PR_CreateThread calls made by Mozilla.
> 
> At least the following use the default stack size:
> 
> DNS thread, disk cache thread, all I/O threads, profile migration, various
> NSS code.
> 
> ccing darin, since necko is a heavy thread user and may need to address this
> issue...
> 
> Past that, I really think we shouldn't be applying stack size limits past
> those set by ulimit.  Why are we?

I think that NSPR should not call pthread_attr_setstacksize when the default
stack size has been requested.  Comments?

NSPR's backward compatibility guarantee means
any such bug is NSPR's fault.  NSPR-based apps
should not need to be modified when upgrading
to a new NSPR release.  So I'm marking this bug
INVALID and will reopen NSPR bug 274450.

Darin, your suggestion is exactly what I have
in mind.  Let's continue the discussion in
bug 274450.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID

V/invalid, per wtc.
Status: RESOLVED → VERIFIED