Closed Bug 423663 Opened 16 years ago Closed 16 years ago

crash on startup with high stack limits (ulimit -s or /etc/security/limits.conf)

Categories

(NSPR :: NSPR, defect)

4.7.3
x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED INVALID

People

(Reporter: didier.rebeix, Assigned: wtc)

Details

(Keywords: crash)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080129 Iceweasel/2.0.0.12 (Debian-2.0.0.12-0etch1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080129 Iceweasel/2.0.0.12 (Debian-2.0.0.12-0etch1)

firefox segfaults on startup when the maximum stack size system limits are set high enough. 

For example, when my /etc/security/limits.conf contains the following lines, firefox segfaults on startup:

*       hard    stack   1000000
*       soft    stack   1000000

Looking at the source code it seems to be related to GC_my_stack_limits() in gc/boehm/solaris_threads.c after the getrlimit() call.

I was able to reproduce the problem with the debian iceweasel 2.0.0.12 package, and with the 2.0.0.12 binaries from mozilla.org installed on RedHat AS4 update 4 x86_64.

Reproducible: Always

Steps to Reproduce:
1. cat >> /etc/security/limits.conf <<EOF
*       hard    stack   1000000
*       soft    stack   1000000
EOF
2. re-log into the system to apply new limits to your shell
3. verify the limits with ulimit -a:
...
stack size              (kbytes, -s) 1000000
...
4. launch firefox from the same shell (firefox should crash)

Actual Results:  
firefox gets killed by signal 11 (segmentation fault).
core usually dumped.

Expected Results:  
firefox should handle this condition and start gracefully.
um, we don't use gc/, we don't use boehm/, and you're linux, not solaris_.

please get a stack trace: either build with --enable-debugger-info-modules --disable-strip, or ask your distro to install debugging symbols. stack traces come from gdb (not strace!); if you need help with gdb, just google.

finally: 2.x is dead, please try with a minefield nightly (which includes breakpad, which almost certainly won't work, but...).
Keywords: crash
Product: Firefox → Core
QA Contact: general → general
Whiteboard: DUPEME
Version: unspecified → 1.8 Branch
There is an easier way to reproduce this: you don't need to edit limits.conf. All you have to do is use the ulimit command to change the stack limit for the current process and then start firefox, for example:
  ulimit -S -s 500000
  firefox

I have reproduced it on three different SuSE Linux systems, all running Firefox 2.0.0.12. You need to experiment with different values of the stack limit because the behaviour seems to vary from machine to machine. On two of my machines 200000 made it crash, but another machine seemed OK with 200000 but crashed with 500000.
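Since the threshold varies from machine to machine, it may help to try each value in a subshell so that the changed limit does not stick to the login shell. A minimal sketch, assuming a shell such as bash or dash whose `ulimit` accepts `-S -s`; the helper name `run_with_stack_kb` is hypothetical:

```shell
# Hypothetical helper: run a command with a given soft stack limit (in KB).
# The subshell keeps the changed limit from leaking into the calling shell.
run_with_stack_kb() {
    kb=$1
    shift
    ( ulimit -S -s "$kb" && "$@" )
}

# Example: report the soft limit that a child process would see.
run_with_stack_kb 8192 sh -c 'ulimit -S -s'
```

Replacing the example command with `firefox` and stepping through values such as 200000 and 500000 would repeat the experiment described above.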
thanks for volunteering. please provide a stack trace from:

./run-mozilla.sh -g -d gdb ./firefox-bin

from a trunk build of your own making once you make one.
Assignee: nobody → bobv
I appreciate that the dynamics of problem reporting for free software projects are different from commercial ones, but I think that if an end user has gone to the trouble of reporting a bug you should be thanking them for helping you improve your product rather than requesting they do a lot more work. If a problem is hard to reproduce then you may have to ask them to supply more information, but once the problem has been reproduced then I would expect a developer to take over.

Has someone tried to reproduce this and failed? I have reproduced it on SuSE 10.2, and SuSE 10.3, and Didier tried Debian and Redhat, so it must be pretty widespread.
http://www.ussg.iu.edu/hypermail/linux/kernel/0505.1/1331.html

searching for an explanation is such a waste of my time. what you're doing is basically asking the system to do something stupid and then complaining when bad things happen.

http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_attr_setstacksize.html

if i understand things correctly for linux, each thread is given this reservation. Linux doesn't dynamically grow a thread's stack; its size is fixed when the thread is created, and mozilla tends to create a certain number of threads (more if you use dbus/gnomevfs and friends).

It might be possible for nspr to play games with pthread_attr_setstacksize (because we do specify stack sizes for our threads), however I doubt it'd do anything useful.

Basically this is an instance of "doctor, it hurts when i do this" "doctor: then don't do that".
Assignee: bobv → wtc
Component: General → NSPR
Product: Core → NSPR
QA Contact: general → nspr
Version: 1.8 Branch → 4.7.3
I see no evidence that this is an NSPR bug.  Who knows where the crash
occurs?  Maybe in pthread_create, maybe in the caller of PR_CreateThread,
maybe even some code that calls pthread_create without using NSPR.
I predict this bug will languish while it remains filed against NSPR 
without evidence that it is actually an NSPR bug.  

NSPR's PR_CreateThread function takes a stack size argument which, 
if non-zero, is passed to pthread_attr_setstacksize, otherwise the 
stack is allocated with the OS's default size (whatever that is).

ALL calls to PR_CreateThread in mozilla pass a zero value for the 
stack size.  If those callers would prefer another value, they 
can do so without any change to NSPR.  
Thanks for the bug report.

If the maximum stack size is too small, a thread will
crash when it makes several levels of function calls
and overflows its stack.

If the maximum stack size is too large, an application
may not be able to create all the threads it needs.

The ulimit -S -s 500000 command in comment 2 sets
the soft limit of maximum stack size to 500000K ~=
0.5 GB, so just four threads will consume 2GB memory.
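
The arithmetic can be checked in the shell. This is a rough sketch that assumes every thread reserves the full soft limit for its stack (address space reserved, not necessarily memory actually touched); the helper name `stack_reservation_mb` is hypothetical:

```shell
# Hypothetical helper: total stack reservation in MB, assuming each
# thread reserves the full soft stack limit (given in KB).
stack_reservation_mb() {
    limit_kb=$1
    threads=$2
    echo $(( limit_kb * threads / 1024 ))
}

stack_reservation_mb 500000 4   # prints 1953, i.e. about 2 GB
```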

As an NSPR bug, this bug is invalid because
PR_CreateThread() fails with
PR_INSUFFICIENT_RESOURCES_ERROR
(OS error ENOMEM) correctly when it cannot
create more threads.

As a Firefox bug, we could make Firefox handle
PR_CreateThread failure gracefully.  But this
is not worth spending more time on (WONTFIX)
in my opinion. The expected result in comment 0
"firefox should handle this condition and start
gracefully" cannot be accomplished because
Firefox won't work when it cannot create all
the threads it needs.  The best we can do is
to exit gracefully.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
This thread has been very enlightening. I didn't realise that under Linux setting the stack size limit sets a default size as well as a maximum size (rather counter-intuitive in my opinion). It makes life tricky because some applications will fail if the limit is set too low while others fail if it is set too high.

But there is a very simple fix: change  run-mozilla.sh to include 
  ulimit -S -s 8192
If this value turns out to be a poor choice then it will be discovered quickly because every installation will be running with the same value. As it stands the behaviour is dependent on the whim of the administrator who (a) has to guess a value appropriate to the mix of applications on the machine and (b) may be ignorant of the dangers of setting high limits.
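
A minimal sketch of what such a clamp in a wrapper script could look like, assuming a shell such as bash or dash whose `ulimit` accepts `-S -s`. The helper name `choose_stack_kb` is hypothetical, and 8192 is only the value suggested above, not a tested recommendation. It only ever lowers the soft limit, so a smaller value chosen by the administrator is left alone:

```shell
# Hypothetical helper: pick the soft stack limit (in KB) to launch with.
# Prints the wanted value if the current limit is "unlimited" or larger,
# otherwise prints the current limit unchanged.
choose_stack_kb() {
    current=$1
    wanted=$2
    if [ "$current" = "unlimited" ] || [ "$current" -gt "$wanted" ]; then
        echo "$wanted"
    else
        echo "$current"
    fi
}

# Apply it to this shell before starting the application.
ulimit -S -s "$(choose_stack_kb "$(ulimit -S -s)" 8192)"
```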
sadly OS vendors tend to write their own wrapper scripts. indeed we could set values if we could find them. we could even hard code values in our calls to PR_CreateThread. But I'm not sure we have any reasonable values to provide.

fwiw, I'm not actually opposed to setting a value in run-mozilla.sh; feel free to file a bug against Toolkit:*Startup asking for one.

However, I'd hope that system administrators are able to use Google and find out about these "features". It didn't take me long (sure I understood the general problem in advance, but you knew what change caused it).

As for the linux behavior, yeah, it's not exactly the most intuitive, i had a good laugh when I confirmed my thoughts about it.

wtc: the reason I said that we passed stack size is that until the thread manager landing:
http://bonsai.mozilla.org/cvslog.cgi?file=/mozilla/xpcom/threads/nsThread.cpp&mark=1.61
we allowed people to pass one (And I did when I wrote threaded code!)
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsThread.cpp&branch=&root=/cvsroot&subdir=/mozilla/xpcom/threads&command=DIFF_FRAMESET&rev1=1.60&rev2=1.61

So, I blame darin :). However from a quick scan, no normal code actually did that, which is presumably why darin removed the feature.
The administrator should not increase the maximum stack size system limits
to the high values mentioned in this bug (1GB, 0.5GB, and 0.2 GB).  Applications
should not need to defend against such administrator mistakes.  Many other
programs will crash or fail to run on these misconfigured systems.
I agree :)
Status: RESOLVED → VERIFIED
Whiteboard: DUPEME