Closed Bug 192414 Opened 22 years ago Closed 21 years ago

Parser recursion does not check stack overflow [@ GetChar ]

Categories

(Core :: JavaScript Engine, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla1.5alpha

People

(Reporter: user, Assigned: brendan)

References

Details

(Keywords: crash, js1.5, testcase, Whiteboard: [ QA note: verify interactively - see Comment #26 ])

Crash Data

Attachments

(7 files, 22 obsolete files)

481 bytes, application/x-javascript
Details
1.10 KB, text/plain
Details
512 bytes, text/html
Details
1.39 KB, patch
Details | Diff | Splinter Review
25.57 KB, patch
Details | Diff | Splinter Review
17.11 KB, patch
shaver
: review+
Details | Diff | Splinter Review
7.75 KB, patch
shaver
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Build Identifier: Linux i686 build of js shell from CVS on 2003-02-08

Currently SM parser does not check for recursion level which can lead to
segmentation fault on too deeply nested expressions which can be used by bad
scripts to crash Mozilla for example.

Reproducible: Always

Steps to Reproduce:
To test, run the following attached script in js shell which constructs a string
like (1&1(1&...)) with 10000 ( and then passes it to eval. It may be necessary
to limit application stack size to something around 1MB or increase the value of
N in the test script
Actual Results:  
~/w/js/x> ulimit -s 1024
~/w/js/x> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js rec_test.js 
GOT STR
Segmentation fault


Expected Results:  
~/w/js/x> ulimit -s 1024
~/w/js/x> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js rec_test.js 
GOT STR
1
Attached file Test case
confirming crash using build 20030130 (CVS) on Linux.
Severity: normal → critical
Keywords: crash, testcase
Summary: Parser recursion does not check stack overflow → Parser recursion does not check stack overflow [@ GetChar ]
Confirming and cc'ing Brendan -
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: js1.5
Testcase added to JS testsuite:

      mozilla/js/tests/js1_5/Regress/regress-192414.js

This crashes my WinNT box under the SpiderMonkey debug and opt shells.
It also crashes in my Rhino interpreted and compiled shells -
The regular expression parser also does not check for stack overflow during
recursion. The following modification of the original testcase gives stack
overflow during regular expression construction:

function repeat_str(str, repeat_count) {
	if (repeat_count == 0) { return ""; }
	if (repeat_count == 1) { return str; }
	--repeat_count;
	var array = new Array(repeat_count);
	while (repeat_count != 0) {
		array[--repeat_count] = str;
	}
	return str.concat.apply(str, array);
}


var N = 1000000;

var reg = new RegExp(repeat_str("(1", N) + repeat_str(")", N));


Should this be reported as a separated bug?
I'll fix this for 1.4.

/be
Assignee: khanson → brendan
Priority: -- → P3
Target Milestone: --- → mozilla1.4alpha
I have filed bug 192626 on the RegExp parser issue. Also note
bug 192465, filed for the same issue regarding Object.toSource().

Question: is it for reasons of security that these bugs should
be fixed? One could argue that the tests that expose them are
artificial and "won't" occur in real-life. cc'ing Mitch on that -

Will fixing the bugs adversely affect JS performance?
This is not a critical severity bug.  You have to contrive a testcase to make
the stack overflow.  I'm calling it minor, although I don't object to trivial ;-).

/be
Severity: critical → minor
The following patch tries to apply a generic solution for unchecked recursion.
It checks amount of the stack space consumed between recursive function
invocation and from the moment of JSContext initialization. If the amount of
consumed stack exceeds a limit, recursion is aborted. In this way a single
check is enough to protected from all forms of recursion and it would not miss
a stack overflow from combined recursion of interpreter and parser. 

The stack consumption is calculated by taking a pointer difference between a
variable that exists during js_NewContext call and a variable at the recursion
point. This is can be quite wrong as people can try to reuse the context or
call js_NewContext from a legitimate function that consumed a lot of stack
space already, so a stack pointer value would be quite off from the real stack
usage.
It even may have nothing to do with the current stack if JS_Context was created
on another thread, but I believe it is not allowed by SM, right?

Another weak point of the patch is that it does not take into account the
amount of stack space already consumed at the moment of the engine invocation.
So if JS engine is called from, say, another scripting language under deep
recursion, it may still miss stack overflow. This can be addressed by
decreasing the stack limit in a OS specific way, see for example, 
http://gcc.gnu.org/ml/gcc/2003-02/msg01342.html . And, of cause, the stack size
limit itself without the adjustment according to the currently consumed space
should be set to OS-specific value instead of the arbitrary 1MB.
Attachment #115178 - Attachment is obsolete: true
Style nits:

- The JSSize and JSPtrdiff types came from a fork of NSPR2.0 files that fur did
in 1998 or so to make js/src stand alone.  They are not used in the engine as
far as I can tell; I prefer size_t and ptrdiff_t (use standard C, it's a new
millennium ;-).

- The js_CheckStackSize macro is named as if it were a function, but in fact
it's an "unsafe" macro that takes the address of its second parameter.  I'd
prefer it if it were JS_CHECK_STACK_SIZE, and it might be even better to make
the callers take the address of the the dummy stack variable.

Has this patch introduced any measurable (even on slow machines) performance
changes?

I'm not sure if this helps the hard cases, even if we try to tune the 1U << 20
(could use JS_BIT there -- we used to work on Win16, and 1U was of type unsigned
or uint16 there) value.  We might end up chasing a set of magic numbers for
different OSes and compilers.  Perhaps it's time to make the engine, or at least
the parser, much more iterative, instead?  Then we could limit heap memory usage
arbitrarily but generously, and not have to worry about OS-dependent tuning.

/be
This patch iteration renames js_CheckStackSize into JS_CHECK_STACK_SIZE and
uses size_t and ptrdiff_t in place of JSSize and JSPtrdiff. Strictly speaking
in the 21st century it is better to use intptr_t for an integer type to hold a
value of a pointer since ptrdiff_t is guarantied to hold only pointer
difference, not the pointer itself, but 20th century compatibility holds that
back. Perhaps there is a reason for JSPtrdiff after all?

I kept passing the variable name to the macro JS_CHECK_STACK_SIZE, not its
address since it prevents passing non-lvalue accidentally, so compiler will
detect a usage errors like JS_CHECK_STACK_SIZE(cx, NULL).

I believe the performance impact of the patch is unmeasurable since it
introduces a couple of checks doing integer compare / subtractions on struct
fields in functions that would allocate memory or make other function calls.
Attachment #115188 - Attachment is obsolete: true
Right, ptrdiff_t on Win16 was only 16 bits -- although that was not kosher by
the C standard, because it couldn't express the difference between a[0] and
a[32768] for a big array a that fit in a 64k segment.

Rather than intptr_t, if you want a type the JS engine knows is big enough to
hold a pointer, you can use jsword (jsuword if unsigned).

/be

As regarding making the engine iterative: to avoid recursion completely, one
would need to merger all code paths that leads to recursion to a single C
function. It should be possible to get that for interpreter, but even that would
not help cases like calling external functions that calls the engine
recursively, like recursive document.write. 

With something like the patch above it is possible to make engine immune against
deliberate stack overflow attacks which AFAIK can be explored to execute
arbitrary code. And stack usage constants tuning can be left to an application
as with the patch applications needs to provide only single number of stack size
limit.
Attachment #115201 - Attachment is obsolete: true
Comment on attachment 115201 [details] [diff] [review]
Patch update: using JSUword to hold pointer values / stack size limitation constants

Looks good.  One tiny nit, really a personal problem of mine: notice that
JSUword and JSWord are never used in the source; the all-lowercase jsuword and
jsword are used instead. That's the convention ("tradition") mentioned for
scalar typedefs in README.html.  It would warm my heard if you upheld it. 
Maybe we should remove the cloned-from-NSPR2.0-PRUword/PRWord names....

Phil, can you try this patch on several platforms?

We should also have a JS_SetStackSizeLimit(cx) API as Igor suggested, I think. 
Igor, are you willing to add that too?	Thanks for all your work here.

/be
("warm my heart")  Wow, racing with the patch manager.  I'll gladly r= a patch
that adds JS_SetStackSizeLimit, so embeddings can tune to a non-default limit.  

One other thought: should the stack size limit be per-runtime, or per-context,
or both (with a per-runtime default value)?

/be
I think a stack size limit should be strictly per-context settings since the
limit should take into account currently consumed stack and this is per-thread
data. 

To simplify user's life a separated JS_SetOsDefaultStackSizeLimit can be
provided that would try to do its best to guess the stack limit from the
OS-specific API. On MAC as Kenton Hanson reported
http://bugzilla.mozilla.org/show_bug.cgi?id=96128#c5 it can call StackSpace, on
Linux getrusage() may help and on BSD it can try pthread_attr_getstac, but with
the default value of 1MB at least some protection would be available.
Attachment #115203 - Attachment is obsolete: true
I have built the optimized JS shell on WinNT4.0 and Linux RH7.2,
both with and without the latest patch applied. I'm having trouble
with my new Mac OSX environment, and have asked Patrick for help.


WinNT: no difference in the test results, including elapsed time to run
Linux: I get three new test failures with the patch applied:

    Testcase js1_5/Regress/regress-152646.js failed
    Expected exit code 0, got 3
    Testcase terminated with signal 0
    Complete testcase output was:
    InternalError: too much recursion

    Testcase js1_5/Regress/regress-192414.js failed
    Expected exit code 0, got 3
    Testcase terminated with signal 0
    Complete testcase output was:
    ./js1_5/Regress/regress-192414.js:64: InternalError: too much recursion

    Testcase js1_5/Regress/regress-192465.js failed
    Expected exit code 0, got 3
    Testcase terminated with signal 0
    Complete testcase output was:
    InternalError: too much recursion


It sounds from the above discussion that this might not be surprising.
That is, would success on these tests depend on further OS-tuning?
Regarding the test faulures on Linux: all this tests tests against too deep
recursion, and in all the cases stack size exceeds 1MB. Just try to to set the
stack limit to a value less then 1MB and I believe you will get segmentation
fault in all 3 cases:

~/w/js/mozilla/js/src> ulimit -s 1000;
~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js -f ~/w/js/TESTS/js1_5/shell.js -f
~/w/js/TESTS/js1_5/Regress/regress-152646.js
Segmentation fault

I also curious, did Window NT pass the tests? And if does, what happpens if
js1_5/Regress/regress-192414.js you will change N from 10*1000 to, say, 100*1000 ?
With this patch iteration manually running Regress/regress-152646.js gives:

~/w/js/mozilla/js/src> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js -f
~/w/js/TESTS/js1_5/shell.js -f ~/w/js/TESTS/js1_5/Regress/regress-152646.js 
/home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118: InternalError: too
much recursion:
/home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118:
(((((((((((((((((((((((((((((((((((((((((((((((((((
/home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118:
...........................................^

which indicates that just after 51 left '(' parser consumend more then 1MB of
stack space. Perhaps more iterative parser is a good idea after all?
Attachment #115207 - Attachment is obsolete: true
As the test cases js1_5/Regress/regress-192414.js and
js1_5/Regress/regress-192465.js test that engine does not crash on too deep
recursion, the patch adds try/catch around recursive code to catch a possible
exception about exceeding stack size limit and make tests pass in this case.
> Test cases update: use try/catch arround code that can throw recursion error.

I have updated the tests with Igor's patch: js1_5/Regress/regress-192414.js
                                            js1_5/Regress/regress-192465.js

Now they will guard against crashes only, and treat "too much recursion"
exceptions as a pass.

Igor had another good idea: I need to run these stack overflow tests
interactively to see the correct results on Linux. The three tests in
Comment #22 were all "passing" according to the Perl test driver, but
in reality were segfaulting (before Igor's patch was applied). Apparently
there is some lack of communication between JS and Perl on my Linux box
when this type of segfault occurs; thus my Comment #22 was misleading.

I'll have to run the JS shell manually, as in:
(path to)/js.exe -f js1_5/shell.js -f js1_5/Regress/regress-192414.js


Here are my corrected results, using this approach on regress-152646.js
plus the two revised testcases. Note these have the following structure.
There is no try...catch in regress-152646.js, because it is testing a
direct compilation of the code (i.e. not inside an eval string):

regress-152646.js:   var x = HUGE_EXPRESSION;
regress-192414.js:   try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! }
regress-192465.js:   try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! }


RESULTS:

LINUX  ulimit -s 8192 (default) --------------------------------------------
                          TRUNK              LATEST PATCH
regress-152646.js        test passes    InternalError: too much recursion
regress-192414.js        segfault       test passes ("too much" is caught)
regress-192465.js        segfault       test passes ("too much" is caught)


LINUX  ulimit -s 1000 ------------------------------------------------------
                          TRUNK              LATEST PATCH
regress-152646.js        segfault             segfault
regress-192414.js        segfault             segfault
regress-192465.js        segfault             segfault


WINNT4.0  ------------------------------------------------------------------
                          TRUNK              LATEST PATCH
regress-152646.js        segfault             segfault
regress-192414.js        segfault             segfault
regress-192465.js        segfault             segfault
Whiteboard: [ QA note: verify interactively - see Comment #26 ]
INTERPRETATION:

LINUX
We see that with the latest patch, regress-152646.js becomes a problem.
Should I modify this test to have the same structure as the other two?

i.e. not:     var x = HUGE_EXPRESSION;
but rather:   try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! }

Or is it important to FAIL this test if it can't compile directly?
The expression is of the form ((((0)))), but with hundreds of parens.
See bug 152646 for a reference.


WINNT
Here the patch seems to have no effect. I segfault on all three tests,
with or without the patch applied. Why?
Attachment #115231 - Attachment is obsolete: true
Phil, could you try the patch on WinNT with stack limit set to, say, 100K or
less instead of 1MB (see js_NewContext , line 154 in jscntxt.c).

Regarding the failure of regress-152646.js on Linux. Currently tests contains
2048 '(' and on my Linux box depending on optimization flags for gcc 3.2 it
needs between 1.5MB (-Os flag) and 2.5MB (-O3) of stack space to complete. Thus
if you cut the number of '(' to 1024 the test most likely will pass. Still the
test shows that making parser more iterative will reduce stack consumption
significantly, as spending 1KB-2KB of stack space is IMO hurt processor caches
and performance. 

The main culprit of the high stack usage is a recursion between OrExpr and
UnaryExpr, where there is at least 10 intermediate function calls with only 1 or
2 really adding nodes to the parse in typical situations, while each adds at
least 64 bytes of stack space due stack alignment.
Igor, recursive descent parsers do use a lot of stack indirectly recursing to
parse nested parentheses.  That's a design characteristic.  It's not necessarily
a bad thing -- it depends on how common deeply-nested parentheses arise in the
real world.

I'm looking into a patch to switch to operator precedence parsing.

/be
> Phil, could you try the patch on WinNT with stack limit set to, say,
> 100K or less instead of 1MB (see js_NewContext , line 154 in jscntxt.c).

That worked!!! On WinNT I changed line 154 from:

              js_SetStackSizeLimit(cx, JS_BIT(20));
to            js_SetStackSizeLimit(cx, JS_BIT(17));


Then the three tests behaved the same as they do on my Linux box!
(after the latest patch is applied):


WINNT4.0  --------- js_SetStackSizeLimit(cx,JS_BIT(17)) --------------------
                          TRUNK              LATEST PATCH
regress-152646.js        segfault       InternalError: too much recursion
regress-192414.js        segfault       test passes ("too much" is caught)
regress-192465.js        segfault       test passes ("too much" is caught)
Given Phil comments above, what is the best way to proceed now? The patch I
believe does allow to set for an application stack limit and prevent possible
stack overflow attacks that potentially can be explored to run arbitrary code
especially in multi-threaded application (AFAIK this is extremely hard, but with
the full source in hands it is doable).

Ideally a separated function can be provided to get at least an estimation of
stack size limit, but for now setting hard coded default limit may not be that bad.
Attachment #115482 - Flags: review?(brendan)
Attachment #115482 - Attachment is obsolete: true
Attachment #115482 - Flags: review?(brendan)
Attachment #115657 - Flags: review?(brendan)
There's no rush to get a patch in, but I'd be happy with Igor's patch landing in
those files that still need it, once I've gotten an operator-precedence patch to
the parser landed.  Should be done in the next week or two.

/be
Blocks: 192465
Attachment #115657 - Attachment is obsolete: true
I ran the latest patch against the JS testsuite on WinNT and Linux.
It caused no test regressions in the debug or optimized JS shell.
Status: NEW → ASSIGNED
*** Bug 152646 has been marked as a duplicate of this bug. ***
From Comment #27:

> Should I modify regress-152646.js to have the same structure ...?
>
> i.e. not:     var x = HUGE_EXPRESSION;
> but rather:   try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! }


I have done this: testcase js1_5/Regress/regress-152646.js now
has the same try...catch structure as the other two tests above.

That way, possible exceptions about exceeding stack size limit
will make the test pass, as with the other two tests.
Attachment #116080 - Attachment is obsolete: true
Comment on attachment 120400 [details] [diff] [review]
Patch update to reflect recent changes.

Should I split the patch into API part and pieces regarding recursion checks in
parser, toSource and regexps?
Attachment #120400 - Flags: review?(brendan)
Target Milestone: mozilla1.4alpha → mozilla1.4beta
Target Milestone: mozilla1.4beta → mozilla1.5alpha
Attachment #120400 - Attachment is obsolete: true
Attachment #123489 - Flags: review?(brendan)
Attachment #115657 - Flags: review?(brendan)
Attachment #120400 - Flags: review?(brendan)
Attachment #123489 - Attachment is obsolete: true
Attachment #123489 - Flags: review?(brendan)
Attachment #126853 - Flags: review?(brendan)
Attachment #126853 - Attachment is obsolete: true
Attachment #126853 - Flags: review?(brendan)
Attachment #128505 - Flags: review?(brendan)
Attached patch my version of Igor's patch (obsolete) — Splinter Review
I assert that stacks grow downward (no one I've spoken with knows of a
counterexample), reducing the cost of JS_CHECK_STACK_SIZE.  Further, by making
js_SetStackSizeLimit precompute a limit pointer, JS_CHECK_STACK_SIZE avoids a
subtraction and the need for cx->stackBase.

An mostly-aesthetic tweak: use int for stackDummy local type.

Control flow to label out: in js_Interpret was already a bit dodgy in the first
goto out case, as cx->interpLevel had not yet been bumped.  I fixed that and
moved the recursion check up, unifying "then" clauses for the two over-recursed
tests.

/be
Same comments for this one.

/be
Attachment #131928 - Attachment is obsolete: true
Attachment #131929 - Flags: review?(user)
Shaver informs me that HP-UX on HP Precision Architecture stacks grow upward. 
Who thought of that?  Whatta revoltin' development!

/be
Next is a diff -w version.

/be
Attachment #131929 - Attachment is obsolete: true
Attached patch diff -w version of last patch (obsolete) — Splinter Review
I could see an autoconf test, but the standalone js make goop wouldn't benefit,
and I'm lazy.  If there really are more architecture/compiler/OS combinations
out there beyond HPUX-for-HPPA that grow stacks upwards, I'll bestir myself. 
But for now (1.6a) this is what I'd like to check in.

/be
Comment on attachment 131936 [details] [diff] [review]
diff -w version of last patch

shaver's around and I shouldn't ask Igor to review, as he wrote much of this
patch and was the driving force.  But he's welcome to comment too, of course.

/be
Attachment #131936 - Flags: review?(shaver)
Patch in haste, obsolete at leisure...

/be
Attachment #131935 - Attachment is obsolete: true
Attached patch diff -w version of last patch (obsolete) — Splinter Review
This is more like it.  I think JS_SetStackSizeLimit(cx, 0) is a clearer trope
for disabling than JS_SetStackSizeLimit(cx, (jsuword)-1).

/be
Attachment #131936 - Attachment is obsolete: true
Attachment #131939 - Flags: review?(shaver)
Attachment #131936 - Flags: review?(shaver)
Comment on attachment 131939 [details] [diff] [review]
diff -w version of last patch

HPUX!==parisc
11 can run on itanium
Attachment #131939 - Flags: review?(shaver) → review-
And you do appear to want parisc:
 >One of the problems with the PA linux implementation is the upward
 >growing stack.  On most architectures, the virtual process space is
 >divided into
timeless: what's the right macro to test, can you find out?  Maybe we should
lift some autoconf love from the Linux kernel sources.

/be
I used JS_SetStackSizeLimit(cx, (jsuword)-1) in the initial patch to disable
stack checking since there is no any special cases associated with it.
(jsuword)-1 == MAX_JSUWORD == MAX_EVER_POSSIBLE_STACK_SIZE and there is no need
to treat case of 0 specially. It would also prevent error when one would use,
say, a - b to set stack size and due to a bug it happens that a == b.

As regarding the optimization to assume that stack grow on a particular
architecture only in one direction, I remember some suggestions to change the
direction even for x86 as it may make buffer overflow exploits harder.
well um, here's what i've seen tested:
_PA_RISC1_0 _PA_RISC1_1 hppa __hppa__ __hppa
and then there's some hint that there's a paric2.0 in existence
so three might be another define ...

igor: yeah but it sounded like people claimed it wouldn't really help
but at least people discussed it, so i suppose it's theoretically
possible that it could happen.
Igor: using (jsuword)-1 directly as a value for cx->stackLimit does not work for
the revised JS_CHECK_STACK_SIZE macro, so there's no advantage to having the
user pass in the "disabled" value.  The "disabled" cx->stackLimit value depends
on stack growth direction, and the ?: expression needed to select the right
value is small, and executed once per new-context or set-stack-size-limit API
call.  Therefore the API "disabled" value can be whatever we like, and 0 is
easier to express, prettier, and better connotes "off" (false, null).

Let's not worry about theoretical upward stack growth on x86 until the assertion
botches and we know it's a real problem.

/be
Attachment #131938 - Attachment is obsolete: true
Attachment #131939 - Attachment is obsolete: true
Attachment #132023 - Flags: review?(shaver)
We really don't support cross-compiling SpiderMonkey well at all, do we?

Anyway, this addresses all concerns about hardwiring stack growth direction. 
Hope everyone is ok with it.  I'll look for r=shaver before landing.

/be
Comment on attachment 132023 [details] [diff] [review]
diff -w version of last patch

Looks good.  There are autoconf tests we can steal too, when the time comes for
someone to revive rginda's (?) work.
Attachment #132023 - Flags: review?(shaver) → review+
> We really don't support cross-compiling SpiderMonkey well at all, do we?

well, i'm cross-compiling it for target arm at the moment ;-)
Darin, how's it going?  What have you had to hack?

/be
no trouble with SpiderMonkey yet... xptcall on the other hand :-/
Fixed.

/be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This patch is causing us to crash on OS/2 build.  In js_SetStackSizeLimit,
stackDummy has a low value, and then 1MB is subtracted, resulting in a negative
value.  cx->stackLimit is a unsigned long, so in the assignment, a negative
value becomes a large positive value, so all JS_CHECK_STACK_SIZE calls fail on
OS/2, eventually leading to a crash.

From reading the comments, it looks like I need to put in an OS/2 specific call
to find the end of the stack and work with that.  However, the call is rather
expensive, and should just be called once per thread.  Does the javascript
engine run on multiple threads, or only on thread 1?
Javier, do stacks grow downward on OS2?

If so, we could clamp at 0.  But if stacks grow upward, then you must not be
running jscpucfg to generate jsautocfg.h.  What's the story?

Also, what's the eventual crash?  You shouldn't crash just because the JS engine
cleanly fails due to (wrongully declared, but never mind) thread stack overflow
detection.

/be
Attached patch proposed OS2 fix (obsolete) — Splinter Review
Unless OS2 builds are using the wrong stack direction, this should be the right
fix.  It's what I should have done anyway, for robustness.

/be
Comment on attachment 132161 [details] [diff] [review]
proposed OS2 fix

Good testing results from Javier needed too, of course.

/be
Attachment #132161 - Flags: review?(shaver)
Comment on attachment 132161 [details] [diff] [review]
proposed OS2 fix

Yeah, nice robustness fix.  Javier's testing will tell the tale, in the end.
Attachment #132161 - Flags: review?(shaver) → review+
> Javier, do stacks grow downward on OS2?

Yes.

> If so, we could clamp at 0.  But if stacks grow upward, then you must not be
> running jscpucfg to generate jsautocfg.h.  What's the story?

Everything is being generated right.  It's just that the 1MB default limit
doesn't work for OS/2.  I've implemented an OS/2 specific section is
js_SetStackSizeLimit that finds the actual end of the stack and uses that.  With
this code, no assertions and no crash.

> Also, what's the eventual crash?  You shouldn't crash just because the JS engine
> cleanly fails due to (wrongully declared, but never mind) thread stack overflow
> detection.

I'm having problems with my debugger, so I can't tell you the exact line that is
causing the crash, but this is what the console output looks like right before
the crash:

JavaScript error: 
chrome://communicator/content/profile/profileSelection.js line 1: too much recursion

WARNING: waaah!, file
D:/builds/trunk/mozilla/content/xul/document/src/nsXULPrototypeDocument.cpp,
line 864
JavaScript error: 
chrome://communicator/content/profile/profileManager.js line 1: too much recursion

WARNING: Failed to get a global Object Owner, file
D:/builds/trunk/mozilla/dom/src/base/nsJSEnvironment.cpp, line 156
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file
D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file
D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file
D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file
D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file
D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79

> Unless OS2 builds are using the wrong stack direction, this should be the right
> fix.  It's what I should have done anyway, for robustness.

Yes, this new patch does indeed work for me.  However, as mentioned in some of
the comments, wouldn't it be best to use OS functions to find the actual end of
the stack, and if these functions don't exist, just fall back on the default?
Javier: The OS-specific functions can be costly, as you advised.  We don't want
endless ifdefs in jscntxt.c to fine-tune a sanity limit that we can set loosely.
 So, with the latest patch, do you crash on stack overflow?

If OS2 stacks grow downward and start from an address less than 0x100000, how
low can they go?  Can they go to 0, or must they stop one page up from null?

/be
The lower 64K are reserved, so it wouldn't go all the way to 0.  Instead of
doing a simple 1MB check, why not use the stack size?  Windows and OS/2 specify
the executable stack size in rules.mk (don't know about other platforms). 
Shouldn't we be using those, rather than arbitrarily choosing 1MB?
Javier: where in rules.mk?  We can easily override JS_DEFAULT_STACK_SIZE_LIMIT,
that's what it's there fore.  Give me a pointer, I'll patch.

/be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
On the later crash due to the spurious stack overflow error, it sounds like the
XUL content sink code is not robust enough.  I'll let someone else file a bug on
that (Javier, you're "it" if you care ;-).

/be
So this is what I had in mind.	I thought that Win32 also set their stack size
to 0x100000, but I can't find it.  Maybe they use the default 1MB now.	What do
you think?  I cannot actually build this until tomorrow, but it shouldn't crash
OS/2.
0x100000 is 1MB.  You said in comment 66 that OS2's default stack size was such
that when js_SetStackSizeLimit subtracted 1MB from &stackDummy, the result was a
negative signed word, so a very large unsigned word value.  So attachment 132172 [details] [diff] [review]
won't actually improve things, without my attachment 132161 [details] [diff] [review] to clamp at 0.

For many platforms, this latest patch will force a reduction in Mozilla's stack
size from what it defaults to presently.  On Linux, for example, it depends on
the default for ulimit -s, which is unlimited on my RH8 laptop.

Remember that  js/src (SpiderMonkey) is built standalone and used around the
world in many embeddings, with no hint of Mozilla's autoconf stuff, so no MOZ_*
macros.

I'd rather we not force the stack size limit in the build system to a certain
value, just for this bug.  How do we know we're not crimping Gecko's recursive
style?  Do that in a separate bug.  Here, we should do the minimum to get OS2
back up.  Is attachment 132161 [details] [diff] [review] enough?

/be
Status: REOPENED → ASSIGNED
> 0x100000 is 1MB. 

Quite right.  All these numbers are starting to run together in my head!

> Here, we should do the minimum to get OS2 back up.
> Is attachment 132161 [details] [diff] [review] enough?

Yes, that patch is enough to bring Mozilla back up, and to be used for normal
browsing.

But it won't fix what this bug is intended to fix.  I finally built the
standalone interpreter and ran the testcases.  Sure enough, they all crash with
BADSTACK on OS/2.  The only way I found to prevent these crashes (and to get the
stack overflow error messages) without using some arbitrary value was to query
the OS to get the end of the stack, and then set cx->stackLimit = StackEnd + 5120.

Setting the stackLimit to a predefined value for all platforms won't fix this
bug, due to the reasons pointed out in comment #11.  And as Phil saw in comment
#31, he needed to lower JS_DEFAULT_STACK_SIZE_LIMIT to less than window's stack
size in order to get the testcases to run.
I do not think that calling os-specific functions to get stack size limit would
have any performance impact if done once per context creation (unless they are
extremely badly implemented...). The attachment 132092 [details] [diff] [review] from bug 96128 contains a
possible implementation of such OS-specific call for Mac, perhaps something
similar can be done for OS2?
I checked in attachment 132161 [details] [diff] [review].

We can keep this bug open and take a long march through the OSes, adding system
calls to get stack sizes, subtracting "fudge terms" to account for stack space
already consumed.  But I'd prefer separate bugs for better assignment, since it
will take several people, with their own machines, to get the job done right.

Also, the reason I'm trying to minimize invasive, extensive, fudge-term-filled
changes for this bug: not all JS embeddings need to worry about crashing on a
runaway recursion (e.g.).  C and C++ so crash, why not JS for a "single-system
image" where all executed scripts come from the same trust domain in which the
interpreter runs?

If you take that view far enough, it says that the embedding application is
responsible for calling JS_SetStackSizeLimit with the right (OS-dependent,
existing stack-depth-dependent) parameter.  After all, only the embedding knows
the distance in stack space from main()'s stack to the frame of the function in
that embedding that calls JS_SetStackSizeLimit.

I like this approach much better.  It means the sequel bugs to this one would
want to patch dom/src/base/nsJSEnvironment.cpp, not jscntxt.c.  Cc'ing jst for
DOM advice and wtc for any help NSPR can provide on stack sizing.

/be
One final thought: the JS_SetStackSizeLimit API might rather be a
JS_SetStackLimit API that takes a jsuword *address*, not a size.  Then callers
would not need to subtract the currently-consumed stack space from the
OS-derived stack size limit, only to have js_SetStackSizeLimit add back a
nearly-equal (but slightly larger) currently-consumed address (&stackDummy, in
js_SetStackSizeLimit).

Comments?

/be
So do you mean that we would change JS_SetStackLimit API to take an address, and
then call it (with the OS specific value) after ::JS_NewContext() in
dom/src/base/nsJSEnvironment.cpp?
Javier: yes, exactly.

The current state of affairs has created an API incompatibility, in fact: if a
multi-threaded embedding pools contexts among threads, when it reassigns an idle
context to a new thread via JS_SetContextThread, the cx->stackLimit (default or
set explicitly) value will be wrong.  JS_SetContextThread can't recompute the
value very well, because again it doesn't know the new thread's stack depth.

I'll provide a patch shortly that minimizes the JS code appropriately.

/be
And leave it to embedders to pass the proper stack limit address for the thread
to which the context is assigned, via JS_SetThreadStackLimit.

/be
Attachment #128505 - Attachment is obsolete: true
Attachment #132161 - Attachment is obsolete: true
Attachment #132172 - Attachment is obsolete: true
Attachment #132216 - Flags: review?(shaver)
Comment on attachment 132216 [details] [diff] [review]
axe JS_DEFAULT_STACK_SIZE_LIMIT and unneeded subroutine

Yeah, this is something that embedders can do a much better job with.
Attachment #132216 - Flags: review?(shaver) → review+
Fixed, for real.  I filed bug 220408 on the DOM core as the sequel bug.

/be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Attachment #128505 - Flags: review?(brendan)
Attachment #131929 - Flags: review?(user)
*** Bug 152646 has been marked as a duplicate of this bug. ***
Reopening bug; timeless has observed that the following two testcases
are crashing in both the debug and optimized JS shell. I see the same:

               js1_5/Regress/regress-192414.js  
               js1_5/Regress/regress-152646.js  

These are, respectively, the regression test for this bug and the one
for bug 152646, which was duped against this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I get a crash for js1_5/Regress/regress-192414.js , but not for
js1_5/Regress/regress-152646.js on my Linux box.

js1_5/Regress/regress-152646.js produces a SyntaxError, because the testcase has
an error. The parentheses are not balanced! If I balance them, I crash starting
with a nesting level of about 6000.
Zack: thanks once again. Testcase corrected:

[//d/JS_TRUNK/mozilla/js/tests/js1_5/Regress] cvs diff -rHEAD regress-152646.js
RCS file: /cvsroot/mozilla/js/tests/js1_5/Regress/regress-152646.js,v
retrieving revision 1.5
diff -r1.5 regress-152646.js
74a75
> sRight += sRight;


[//d/JS_TRUNK/mozilla/js/tests/js1_5/Regress] cvs ci regress-152646.js
Checking in regress-152646.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-152646.js,v  <--  regress-152646.js
new revision: 1.6; previous revision: 1.5
done
Zack: crashes with what stack?  Is it over 1MB deep?  What does ulimit -s say
when you type it into a shell that you're running the tests from?

/be
Phil, what stacks do you see?  What does ulimit -s say if you're on Linux?  It's
hard to believe 1MB of stack is not provided by modern OSes.  Perhaps something
else is going wrong.

/be
I'm seeing regress-192414.js crash on a Linux build from today as well.  ulimit
-s shows a max stack size of 8 MB.  The new version of regress-152646.js runs
successfully.  Unfortunately I just have an opt build on this machine so I can't
get a trace for the crash.
On my WinNT4.0 box, and on my Win2K box, both testcases still crash
due to stack overflow, even after the correction to regress-152646.js.
On Linux, regress-152646.js does not crash, but regress-192414.js does:

[/d/JS_TRUNK/mozilla/js/src/Linux_All_DBG.OBJ] ulimit -s
8192


(gdb) run -f ../../tests/js1_5/shell.js
          -f ../../tests/js1_5/Regress/regress-192414.js

Starting program: /d/JS_TRUNK/mozilla/js/src/Linux_All_DBG.OBJ/./js
-f ../../tests/js1_5/shell.js -f ../../tests/js1_5/Regress/regress-192414.js

Program received signal SIGSEGV, Segmentation fault.
0x0806e981 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c10) at
jsemit.c:1982
1982        switch (pn->pn_type) {

(gdb) bt
#0  0x0806e981 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c10) at
jsemit.c:1982
#1  0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211fe0) at
jsemit.c:3940
#2  0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c40) at
jsemit.c:3632
#3  0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211f80) at
jsemit.c:3940
#4  0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314ca0) at
jsemit.c:3632
#5  0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211f20) at
jsemit.c:3940
#6  0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314cd0) at
jsemit.c:3632
#7  0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211e90) at
jsemit.c:3940
#8  0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d00) at
jsemit.c:3632
#9  0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211e30) at
jsemit.c:3940
#10 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d30) at
jsemit.c:3632
#11 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211dd0) at
jsemit.c:3940
#12 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d60) at
jsemit.c:3632
#13 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211d70) at
jsemit.c:3940
#14 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d90) at
jsemit.c:3632
#15 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211d10) at
jsemit.c:3940

                            etc.
                            etc.
I have 8 MB stack size too. To run the testcase successfully, I need to increase
it to about 27 MB.
For Windows

regress-152646.js top of crash stack is:

>	js3250.dll!AddExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0,
JSTreeContext * tc=0x0012d918)  Line 2450 + 0xc	C
 	js3250.dll!ShiftExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0,
JSTreeContext * tc=0x0012d918)  Line 2435 + 0x11	C
 	js3250.dll!RelExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0,
JSTreeContext * tc=0x0012d918)  Line 2403 + 0x11	C
 	js3250.dll!EqExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0,
JSTreeContext * tc=0x0012d918)  Line 2378 + 0x11	C
 	js3250.dll!BitAndExpr(JSContext * cx=0x00c51028, JSTokenStream *
ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918)  Line 2366 + 0x11	C

regress-192414.js top of crash stack is:

>	js3250.dll!GetChar(JSTokenStream * ts=0x00c31228)  Line 438 + 0xc	C
 	js3250.dll!js_GetToken(JSContext * cx=0x00c51028, JSTokenStream *
ts=0x00c31228)  Line 761 + 0x9	C
 	js3250.dll!UnaryExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228,
JSTreeContext * tc=0x0012d918)  Line 2551 + 0xd	C
 	js3250.dll!MulExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228,
JSTreeContext * tc=0x0012d918)  Line 2468 + 0x11	C
 	js3250.dll!AddExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228,
JSTreeContext * tc=0x0012d918)  Line 2450 + 0x11	C
The tests fail because the committed code gives an embedding just an option to
limit stack size. By default this limit is set ti infinity and it is the
embedding that should specify it based probably on the information supplied by
OS etc.

I filed the bug 225061 with a patch to add -S command line option to js shell to
specify the stack size for the shell explicitly, with that patch the recursion
is correctly detected, see test run there.

Thanks, Igor.  When bug 225061 is fixed (very soon), this bug should be closed
and verified.

/be
Depends on: 225061
I declare victory!

/be
Status: REOPENED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Flags: testcase+
The tests/js1_5/Regress/regress-152646.js still causes a bus error, when testing against the latest and greatest js-1.60.

The crash is in js_EmitTree(), stacked upon itself 1565 times.
regress-192414.js|regress-152646.js|regress-192465.js verified fixed 1.8.1|1.9.0 linux/mac*/windows.

I also don't see the issues on 1.8.0 with these tests at least on linux.

verified fixed 1.9.0 linux/mac*/windows.

Status: RESOLVED → VERIFIED
Where are the 1.8.0 and 1.9.0? All I can see in http://ftp.mozilla.org/pub/mozilla.org/js/ is 1.6...
1.8.0, 1.8.1, 1.9.0 refer to the cvs branch not the release version of spidermonkey. I am going to release a 1.6.1 version of spidermonkey this week.
Terrific! Would you, please, also release the set of tests (in the same or in separate tarball), which are all supposed to succeed?

There are many ways in which spidermonkey can be miscompiled, so I would like our (FreeBSD's) port of the software to start running the self-tests after building again.

In the past we were forced to turn the tests off due to some of the tests failing spuriously and breaking the package-building...

The latest /released/ tarball of tests is from 2002, which, I guess, is very outdated by now...

Here, for example, is the log of test failures in 1.6:

http://aldan.algebra.com/~mi/tmp/results-2007-09-14-202814-smdebug.html

Yours,

       -mi
Crash Signature: [@ GetChar ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: