Closed Bug 192414 Opened 22 years ago Closed 21 years ago

Parser recursion does not check stack overflow [@ GetChar ]

Categories

(Core :: JavaScript Engine, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla1.5alpha

People

(Reporter: user, Assigned: brendan)

References

Details

(Keywords: crash, js1.5, testcase, Whiteboard: [ QA note: verify interactively - see Comment #26 ])

Crash Data

Attachments

(7 files, 22 obsolete files)

481 bytes, application/x-javascript
Details
1.10 KB, text/plain
Details
512 bytes, text/html
Details
1.39 KB, patch
Details | Diff | Splinter Review
25.57 KB, patch
Details | Diff | Splinter Review
17.11 KB, patch
shaver
: review+
Details | Diff | Splinter Review
7.75 KB, patch
shaver
: review+
Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003 Build Identifier: Linux i686 build of js shell from CVS on 2003-02-08 Currently SM parser does not check for recursion level which can lead to segmentation fault on too deeply nested expressions which can be used by bad scripts to crash Mozilla for example. Reproducible: Always Steps to Reproduce: To test, run the following attached script in js shell which constructs a string like (1&1(1&...)) with 10000 ( and then passes it to eval. It may be necessary to limit application stack size to something around 1MB or increase the value of N in the test script Actual Results: ~/w/js/x> ulimit -s 1024 ~/w/js/x> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js rec_test.js GOT STR Segmentation fault Expected Results: ~/w/js/x> ulimit -s 1024 ~/w/js/x> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js rec_test.js GOT STR 1
Attached file Test case
confirming crash using build 20030130 (CVS) on Linux.
Severity: normal → critical
Keywords: crash, testcase
Summary: Parser recursion does not check stack overflow → Parser recursion does not check stack overflow [@ GetChar ]
Confirming and cc'ing Brendan -
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: js1.5
Testcase added to JS testsuite: mozilla/js/tests/js1_5/Regress/regress-192414.js This crashes my WinNT box under the SpiderMonkey debug and opt shells. It also crashes in my Rhino interpreted and compiled shells -
The regular expression parser also does not check for stack overflow during recursion. The following modification of the original testcase gives stack overflow during regular expression construction: function repeat_str(str, repeat_count) { if (repeat_count == 0) { return ""; } if (repeat_count == 1) { return str; } --repeat_count; var array = new Array(repeat_count); while (repeat_count != 0) { array[--repeat_count] = str; } return str.concat.apply(str, array); } var N = 1000000; var reg = new RegExp(repeat_str("(1", N) + repeat_str(")", N)); Should this be reported as a separated bug?
I'll fix this for 1.4. /be
Assignee: khanson → brendan
Priority: -- → P3
Target Milestone: --- → mozilla1.4alpha
I have filed bug 192626 on the RegExp parser issue. Also note bug 192465, filed for the same issue regarding Object.toSource(). Question: is it for reasons of security that these bugs should be fixed? One could argue that the tests that expose them are artificial and "won't" occur in real-life. cc'ing Mitch on that - Will fixing the bugs adversely affect JS performance?
This is not a critical severity bug. You have to contrive a testcase to make the stack overflow. I'm calling it minor, although I don't object to trivial ;-). /be
Severity: critical → minor
The following patch tries to apply a generic solution for unchecked recursion. It checks amount of the stack space consumed between recursive function invocation and from the moment of JSContext initialization. If the amount of consumed stack exceeds a limit, recursion is aborted. In this way a single check is enough to protected from all forms of recursion and it would not miss a stack overflow from combined recursion of interpreter and parser. The stack consumption is calculated by taking a pointer difference between a variable that exists during js_NewContext call and a variable at the recursion point. This is can be quite wrong as people can try to reuse the context or call js_NewContext from a legitimate function that consumed a lot of stack space already, so a stack pointer value would be quite off from the real stack usage. It even may have nothing to do with the current stack if JS_Context was created on another thread, but I believe it is not allowed by SM, right? Another weak point of the patch is that it does not take into account the amount of stack space already consumed at the moment of the engine invocation. So if JS engine is called from, say, another scripting language under deep recursion, it may still miss stack overflow. This can be addressed by decreasing the stack limit in a OS specific way, see for example, http://gcc.gnu.org/ml/gcc/2003-02/msg01342.html . And, of cause, the stack size limit itself without the adjustment according to the currently consumed space should be set to OS-specific value instead of the arbitrary 1MB.
Attachment #115178 - Attachment is obsolete: true
Style nits: - The JSSize and JSPtrdiff types came from a fork of NSPR2.0 files that fur did in 1998 or so to make js/src stand alone. They are not used in the engine as far as I can tell; I prefer size_t and ptrdiff_t (use standard C, it's a new millennium ;-). - The js_CheckStackSize macro is named as if it were a function, but in fact it's an "unsafe" macro that takes the address of its second parameter. I'd prefer it if it were JS_CHECK_STACK_SIZE, and it might be even better to make the callers take the address of the the dummy stack variable. Has this patch introduced any measurable (even on slow machines) performance changes? I'm not sure if this helps the hard cases, even if we try to tune the 1U << 20 (could use JS_BIT there -- we used to work on Win16, and 1U was of type unsigned or uint16 there) value. We might end up chasing a set of magic numbers for different OSes and compilers. Perhaps it's time to make the engine, or at least the parser, much more iterative, instead? Then we could limit heap memory usage arbitrarily but generously, and not have to worry about OS-dependent tuning. /be
This patch iteration renames js_CheckStackSize into JS_CHECK_STACK_SIZE and uses size_t and ptrdiff_t in place of JSSize and JSPtrdiff. Strictly speaking in the 21st century it is better to use intptr_t for an integer type to hold a value of a pointer since ptrdiff_t is guarantied to hold only pointer difference, not the pointer itself, but 20th century compatibility holds that back. Perhaps there is a reason for JSPtrdiff after all? I kept passing the variable name to the macro JS_CHECK_STACK_SIZE, not its address since it prevents passing non-lvalue accidentally, so compiler will detect a usage errors like JS_CHECK_STACK_SIZE(cx, NULL). I believe the performance impact of the patch is unmeasurable since it introduces a couple of checks doing integer compare / subtractions on struct fields in functions that would allocate memory or make other function calls.
Attachment #115188 - Attachment is obsolete: true
Right, ptrdiff_t on Win16 was only 16 bits -- although that was not kosher by the C standard, because it couldn't express the difference between a[0] and a[32768] for a big array a that fit in a 64k segment. Rather than intptr_t, if you want a type the JS engine knows is big enough to hold a pointer, you can use jsword (jsuword if unsigned). /be
As regarding making the engine iterative: to avoid recursion completely, one would need to merger all code paths that leads to recursion to a single C function. It should be possible to get that for interpreter, but even that would not help cases like calling external functions that calls the engine recursively, like recursive document.write. With something like the patch above it is possible to make engine immune against deliberate stack overflow attacks which AFAIK can be explored to execute arbitrary code. And stack usage constants tuning can be left to an application as with the patch applications needs to provide only single number of stack size limit.
Attachment #115201 - Attachment is obsolete: true
Comment on attachment 115201 [details] [diff] [review] Patch update: using JSUword to hold pointer values / stack size limitation constants Looks good. One tiny nit, really a personal problem of mine: notice that JSUword and JSWord are never used in the source; the all-lowercase jsuword and jsword are used instead. That's the convention ("tradition") mentioned for scalar typedefs in README.html. It would warm my heard if you upheld it. Maybe we should remove the cloned-from-NSPR2.0-PRUword/PRWord names.... Phil, can you try this patch on several platforms? We should also have a JS_SetStackSizeLimit(cx) API as Igor suggested, I think. Igor, are you willing to add that too? Thanks for all your work here. /be
("warm my heart") Wow, racing with the patch manager. I'll gladly r= a patch that adds JS_SetStackSizeLimit, so embeddings can tune to a non-default limit. One other thought: should the stack size limit be per-runtime, or per-context, or both (with a per-runtime default value)? /be
I think a stack size limit should be strictly per-context settings since the limit should take into account currently consumed stack and this is per-thread data. To simplify user's life a separated JS_SetOsDefaultStackSizeLimit can be provided that would try to do its best to guess the stack limit from the OS-specific API. On MAC as Kenton Hanson reported http://bugzilla.mozilla.org/show_bug.cgi?id=96128#c5 it can call StackSpace, on Linux getrusage() may help and on BSD it can try pthread_attr_getstac, but with the default value of 1MB at least some protection would be available.
Attachment #115203 - Attachment is obsolete: true
I have built the optimized JS shell on WinNT4.0 and Linux RH7.2, both with and without the latest patch applied. I'm having trouble with my new Mac OSX environment, and have asked Patrick for help. WinNT: no difference in the test results, including elapsed time to run Linux: I get three new test failures with the patch applied: Testcase js1_5/Regress/regress-152646.js failed Expected exit code 0, got 3 Testcase terminated with signal 0 Complete testcase output was: InternalError: too much recursion Testcase js1_5/Regress/regress-192414.js failed Expected exit code 0, got 3 Testcase terminated with signal 0 Complete testcase output was: ./js1_5/Regress/regress-192414.js:64: InternalError: too much recursion Testcase js1_5/Regress/regress-192465.js failed Expected exit code 0, got 3 Testcase terminated with signal 0 Complete testcase output was: InternalError: too much recursion It sounds from the above discussion that this might not be surprising. That is, would success on these tests depend on further OS-tuning?
Regarding the test faulures on Linux: all this tests tests against too deep recursion, and in all the cases stack size exceeds 1MB. Just try to to set the stack limit to a value less then 1MB and I believe you will get segmentation fault in all 3 cases: ~/w/js/mozilla/js/src> ulimit -s 1000; ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js -f ~/w/js/TESTS/js1_5/shell.js -f ~/w/js/TESTS/js1_5/Regress/regress-152646.js Segmentation fault I also curious, did Window NT pass the tests? And if does, what happpens if js1_5/Regress/regress-192414.js you will change N from 10*1000 to, say, 100*1000 ?
With this patch iteration manually running Regress/regress-152646.js gives: ~/w/js/mozilla/js/src> ~/w/js/mozilla/js/src/Linux_All_DBG.OBJ/js -f ~/w/js/TESTS/js1_5/shell.js -f ~/w/js/TESTS/js1_5/Regress/regress-152646.js /home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118: InternalError: too much recursion: /home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118: ((((((((((((((((((((((((((((((((((((((((((((((((((( /home/igor/w/js/TESTS/js1_5/Regress/regress-152646.js:118: ...........................................^ which indicates that just after 51 left '(' parser consumend more then 1MB of stack space. Perhaps more iterative parser is a good idea after all?
Attachment #115207 - Attachment is obsolete: true
As the test cases js1_5/Regress/regress-192414.js and js1_5/Regress/regress-192465.js test that engine does not crash on too deep recursion, the patch adds try/catch around recursive code to catch a possible exception about exceeding stack size limit and make tests pass in this case.
> Test cases update: use try/catch arround code that can throw recursion error. I have updated the tests with Igor's patch: js1_5/Regress/regress-192414.js js1_5/Regress/regress-192465.js Now they will guard against crashes only, and treat "too much recursion" exceptions as a pass. Igor had another good idea: I need to run these stack overflow tests interactively to see the correct results on Linux. The three tests in Comment #22 were all "passing" according to the Perl test driver, but in reality were segfaulting (before Igor's patch was applied). Apparently there is some lack of communication between JS and Perl on my Linux box when this type of segfault occurs; thus my Comment #22 was misleading. I'll have to run the JS shell manually, as in: (path to)/js.exe -f js1_5/shell.js -f js1_5/Regress/regress-192414.js Here are my corrected results, using this approach on regress-152646.js plus the two revised testcases. Note these have the following structure. There is no try...catch in regress-152646.js, because it is testing a direct compilation of the code (i.e. not inside an eval string): regress-152646.js: var x = HUGE_EXPRESSION; regress-192414.js: try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! } regress-192465.js: try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! } RESULTS: LINUX ulimit -s 8192 (default) -------------------------------------------- TRUNK LATEST PATCH regress-152646.js test passes InternalError: too much recursion regress-192414.js segfault test passes ("too much" is caught) regress-192465.js segfault test passes ("too much" is caught) LINUX ulimit -s 1000 ------------------------------------------------------ TRUNK LATEST PATCH regress-152646.js segfault segfault regress-192414.js segfault segfault regress-192465.js segfault segfault WINNT4.0 ------------------------------------------------------------------ TRUNK LATEST PATCH regress-152646.js segfault segfault regress-192414.js segfault segfault regress-192465.js segfault segfault
Whiteboard: [ QA note: verify interactively - see Comment #26 ]
INTERPRETATION: LINUX We see that with the latest patch, regress-152646.js becomes a problem. Should I modify this test to have the same structure as the other two? i.e. not: var x = HUGE_EXPRESSION; but rather: try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! } Or is it important to FAIL this test if it can't compile directly? The expression is of the form ((((0)))), but with hundreds of parens. See bug 152646 for a reference. WINNT Here the patch seems to have no effect. I segfault on all three tests, with or without the patch applied. Why?
Attachment #115231 - Attachment is obsolete: true
Phil, could you try the patch on WinNT with stack limit set to, say, 100K or less instead of 1MB (see js_NewContext , line 154 in jscntxt.c). Regarding the failure of regress-152646.js on Linux. Currently tests contains 2048 '(' and on my Linux box depending on optimization flags for gcc 3.2 it needs between 1.5MB (-Os flag) and 2.5MB (-O3) of stack space to complete. Thus if you cut the number of '(' to 1024 the test most likely will pass. Still the test shows that making parser more iterative will reduce stack consumption significantly, as spending 1KB-2KB of stack space is IMO hurt processor caches and performance. The main culprit of the high stack usage is a recursion between OrExpr and UnaryExpr, where there is at least 10 intermediate function calls with only 1 or 2 really adding nodes to the parse in typical situations, while each adds at least 64 bytes of stack space due stack alignment.
Igor, recursive descent parsers do use a lot of stack indirectly recursing to parse nested parentheses. That's a design characteristic. It's not necessarily a bad thing -- it depends on how common deeply-nested parentheses arise in the real world. I'm looking into a patch to switch to operator precedence parsing. /be
> Phil, could you try the patch on WinNT with stack limit set to, say, > 100K or less instead of 1MB (see js_NewContext , line 154 in jscntxt.c). That worked!!! On WinNT I changed line 154 from: js_SetStackSizeLimit(cx, JS_BIT(20)); to js_SetStackSizeLimit(cx, JS_BIT(17)); Then the three tests behaved the same as they do on my Linux box! (after the latest patch is applied): WINNT4.0 --------- js_SetStackSizeLimit(cx,JS_BIT(17)) -------------------- TRUNK LATEST PATCH regress-152646.js segfault InternalError: too much recursion regress-192414.js segfault test passes ("too much" is caught) regress-192465.js segfault test passes ("too much" is caught)
Given Phil comments above, what is the best way to proceed now? The patch I believe does allow to set for an application stack limit and prevent possible stack overflow attacks that potentially can be explored to run arbitrary code especially in multi-threaded application (AFAIK this is extremely hard, but with the full source in hands it is doable). Ideally a separated function can be provided to get at least an estimation of stack size limit, but for now setting hard coded default limit may not be that bad.
Attachment #115482 - Flags: review?(brendan)
Attachment #115482 - Attachment is obsolete: true
Attachment #115482 - Flags: review?(brendan)
Attachment #115657 - Flags: review?(brendan)
There's no rush to get a patch in, but I'd be happy with Igor's patch landing in those files that still need it, once I've gotten an operator-precedence patch to the parser landed. Should be done in the next week or two. /be
Blocks: 192465
Attachment #115657 - Attachment is obsolete: true
I ran the latest patch against the JS testsuite on WinNT and Linux. It caused no test regressions in the debug or optimized JS shell.
Status: NEW → ASSIGNED
*** Bug 152646 has been marked as a duplicate of this bug. ***
From Comment #27: > Should I modify regress-152646.js to have the same structure ...? > > i.e. not: var x = HUGE_EXPRESSION; > but rather: try {eval('var x = HUGE_EXPRESSION;')} catch(e) { PASS! } I have done this: testcase js1_5/Regress/regress-152646.js now has the same try...catch structure as the other two tests above. That way, possible exceptions about exceeding stack size limit will make the test pass, as with the other two tests.
Attachment #116080 - Attachment is obsolete: true
Comment on attachment 120400 [details] [diff] [review] Patch update to reflect recent changes. Should I split the patch into API part and pieces regarding recursion checks in parser, toSource and regexps?
Attachment #120400 - Flags: review?(brendan)
Target Milestone: mozilla1.4alpha → mozilla1.4beta
Target Milestone: mozilla1.4beta → mozilla1.5alpha
Attachment #120400 - Attachment is obsolete: true
Attachment #123489 - Flags: review?(brendan)
Attachment #115657 - Flags: review?(brendan)
Attachment #120400 - Flags: review?(brendan)
Attachment #123489 - Attachment is obsolete: true
Attachment #123489 - Flags: review?(brendan)
Attachment #126853 - Flags: review?(brendan)
Attachment #126853 - Attachment is obsolete: true
Attachment #126853 - Flags: review?(brendan)
Attachment #128505 - Flags: review?(brendan)
Attached patch my version of Igor's patch (obsolete) — Splinter Review
I assert that stacks grow downward (no one I've spoken with knows of a counterexample), reducing the cost of JS_CHECK_STACK_SIZE. Further, by making js_SetStackSizeLimit precompute a limit pointer, JS_CHECK_STACK_SIZE avoids a subtraction and the need for cx->stackBase. An mostly-aesthetic tweak: use int for stackDummy local type. Control flow to label out: in js_Interpret was already a bit dodgy in the first goto out case, as cx->interpLevel had not yet been bumped. I fixed that and moved the recursion check up, unifying "then" clauses for the two over-recursed tests. /be
Same comments for this one. /be
Attachment #131928 - Attachment is obsolete: true
Attachment #131929 - Flags: review?(user)
Shaver informs me that HP-UX on HP Precision Architecture stacks grow upward. Who thought of that? Whatta revoltin' development! /be
Next is a diff -w version. /be
Attachment #131929 - Attachment is obsolete: true
Attached patch diff -w version of last patch (obsolete) — Splinter Review
I could see an autoconf test, but the standalone js make goop wouldn't benefit, and I'm lazy. If there really are more architecture/compiler/OS combinations out there beyond HPUX-for-HPPA that grow stacks upwards, I'll bestir myself. But for now (1.6a) this is what I'd like to check in. /be
Comment on attachment 131936 [details] [diff] [review] diff -w version of last patch shaver's around and I shouldn't ask Igor to review, as he wrote much of this patch and was the driving force. But he's welcome to comment too, of course. /be
Attachment #131936 - Flags: review?(shaver)
Patch in haste, obsolete at leisure... /be
Attachment #131935 - Attachment is obsolete: true
Attached patch diff -w version of last patch (obsolete) — Splinter Review
This is more like it. I think JS_SetStackSizeLimit(cx, 0) is a clearer trope for disabling than JS_SetStackSizeLimit(cx, (jsuword)-1). /be
Attachment #131936 - Attachment is obsolete: true
Attachment #131939 - Flags: review?(shaver)
Attachment #131936 - Flags: review?(shaver)
Comment on attachment 131939 [details] [diff] [review] diff -w version of last patch HPUX!==parisc 11 can run on itanium
Attachment #131939 - Flags: review?(shaver) → review-
And you do appear to want parisc: >One of the problems with the PA linux implementation is the upward >growing stack. On most architectures, the virtual process space is >divided into
timeless: what's the right macro to test, can you find out? Maybe we should lift some autoconf love from the Linux kernel sources. /be
I used JS_SetStackSizeLimit(cx, (jsuword)-1) in the initial patch to disable stack checking since there is no any special cases associated with it. (jsuword)-1 == MAX_JSUWORD == MAX_EVER_POSSIBLE_STACK_SIZE and there is no need to treat case of 0 specially. It would also prevent error when one would use, say, a - b to set stack size and due to a bug it happens that a == b. As regarding the optimization to assume that stack grow on a particular architecture only in one direction, I remember some suggestions to change the direction even for x86 as it may make buffer overflow exploits harder.
well um, here's what i've seen tested: _PA_RISC1_0 _PA_RISC1_1 hppa __hppa__ __hppa and then there's some hint that there's a paric2.0 in existence so three might be another define ... igor: yeah but it sounded like people claimed it wouldn't really help but at least people discussed it, so i suppose it's theoretically possible that it could happen.
Igor: using (jsuword)-1 directly as a value for cx->stackLimit does not work for the revised JS_CHECK_STACK_SIZE macro, so there's no advantage to having the user pass in the "disabled" value. The "disabled" cx->stackLimit value depends on stack growth direction, and the ?: expression needed to select the right value is small, and executed once per new-context or set-stack-size-limit API call. Therefore the API "disabled" value can be whatever we like, and 0 is easier to express, prettier, and better connotes "off" (false, null). Let's not worry about theoretical upward stack growth on x86 until the assertion botches and we know it's a real problem. /be
Attachment #131938 - Attachment is obsolete: true
Attachment #131939 - Attachment is obsolete: true
Attachment #132023 - Flags: review?(shaver)
We really don't support cross-compiling SpiderMonkey well at all, do we? Anyway, this addresses all concerns about hardwiring stack growth direction. Hope everyone is ok with it. I'll look for r=shaver before landing. /be
Comment on attachment 132023 [details] [diff] [review] diff -w version of last patch Looks good. There are autoconf tests we can steal too, when the time comes for someone to revive rginda's (?) work.
Attachment #132023 - Flags: review?(shaver) → review+
> We really don't support cross-compiling SpiderMonkey well at all, do we? well, i'm cross-compiling it for target arm at the moment ;-)
Darin, how's it going? What have you had to hack? /be
no trouble with SpiderMonkey yet... xptcall on the other hand :-/
Fixed. /be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This patch is causing us to crash on OS/2 build. In js_SetStackSizeLimit, stackDummy has a low value, and then 1MB is subtracted, resulting in a negative value. cx->stackLimit is a unsigned long, so in the assignment, a negative value becomes a large positive value, so all JS_CHECK_STACK_SIZE calls fail on OS/2, eventually leading to a crash. From reading the comments, it looks like I need to put in an OS/2 specific call to find the end of the stack and work with that. However, the call is rather expensive, and should just be called once per thread. Does the javascript engine run on multiple threads, or only on thread 1?
Javier, do stacks grow downward on OS2? If so, we could clamp at 0. But if stacks grow upward, then you must not be running jscpucfg to generate jsautocfg.h. What's the story? Also, what's the eventual crash? You shouldn't crash just because the JS engine cleanly fails due to (wrongully declared, but never mind) thread stack overflow detection. /be
Attached patch proposed OS2 fix (obsolete) — Splinter Review
Unless OS2 builds are using the wrong stack direction, this should be the right fix. It's what I should have done anyway, for robustness. /be
Comment on attachment 132161 [details] [diff] [review] proposed OS2 fix Good testing results from Javier needed too, of course. /be
Attachment #132161 - Flags: review?(shaver)
Comment on attachment 132161 [details] [diff] [review] proposed OS2 fix Yeah, nice robustness fix. Javier's testing will tell the tale, in the end.
Attachment #132161 - Flags: review?(shaver) → review+
> Javier, do stacks grow downward on OS2? Yes. > If so, we could clamp at 0. But if stacks grow upward, then you must not be > running jscpucfg to generate jsautocfg.h. What's the story? Everything is being generated right. It's just that the 1MB default limit doesn't work for OS/2. I've implemented an OS/2 specific section is js_SetStackSizeLimit that finds the actual end of the stack and uses that. With this code, no assertions and no crash. > Also, what's the eventual crash? You shouldn't crash just because the JS engine > cleanly fails due to (wrongully declared, but never mind) thread stack overflow > detection. I'm having problems with my debugger, so I can't tell you the exact line that is causing the crash, but this is what the console output looks like right before the crash: JavaScript error: chrome://communicator/content/profile/profileSelection.js line 1: too much recursion WARNING: waaah!, file D:/builds/trunk/mozilla/content/xul/document/src/nsXULPrototypeDocument.cpp, line 864 JavaScript error: chrome://communicator/content/profile/profileManager.js line 1: too much recursion WARNING: Failed to get a global Object Owner, file D:/builds/trunk/mozilla/dom/src/base/nsJSEnvironment.cpp, line 156 WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79 WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79 WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79 WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79 WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file D:/builds/trunk/mozilla/content/xbl/src/nsXBLProtoImpl.cpp, line 79 > Unless OS2 builds are using the wrong stack direction, this should be the right > fix. It's what I should have done anyway, for robustness. Yes, this new patch does indeed work for me. However, as mentioned in some of the comments, wouldn't it be best to use OS functions to find the actual end of the stack, and if these functions don't exist, just fall back on the default?
Javier: The OS-specific functions can be costly, as you advised. We don't want endless ifdefs in jscntxt.c to fine-tune a sanity limit that we can set loosely. So, with the latest patch, do you crash on stack overflow? If OS2 stacks grow downward and start from an address less than 0x100000, how low can they go? Can they go to 0, or must they stop one page up from null? /be
The lower 64K are reserved, so it wouldn't go all the way to 0. Instead of doing a simple 1MB check, why not use the stack size? Windows and OS/2 specify the executable stack size in rules.mk (don't know about other platforms). Shouldn't we be using those, rather than arbitrarily choosing 1MB?
Javier: where in rules.mk? We can easily override JS_DEFAULT_STACK_SIZE_LIMIT, that's what it's there fore. Give me a pointer, I'll patch. /be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
On the later crash due to the spurious stack overflow error, it sounds like the XUL content sink code is not robust enough. I'll let someone else file a bug on that (Javier, you're "it" if you care ;-). /be
So this is what I had in mind. I thought that Win32 also set their stack size to 0x100000, but I can't find it. Maybe they use the default 1MB now. What do you think? I cannot actually build this until tomorrow, but it shouldn't crash OS/2.
0x100000 is 1MB. You said in comment 66 that OS2's default stack size was such that when js_SetStackSizeLimit subtracted 1MB from &stackDummy, the result was a negative signed word, so a very large unsigned word value. So attachment 132172 [details] [diff] [review] won't actually improve things, without my attachment 132161 [details] [diff] [review] to clamp at 0. For many platforms, this latest patch will force a reduction in Mozilla's stack size from what it defaults to presently. On Linux, for example, it depends on the default for ulimit -s, which is unlimited on my RH8 laptop. Remember that js/src (SpiderMonkey) is built standalone and used around the world in many embeddings, with no hint of Mozilla's autoconf stuff, so no MOZ_* macros. I'd rather we not force the stack size limit in the build system to a certain value, just for this bug. How do we know we're not crimping Gecko's recursive style? Do that in a separate bug. Here, we should do the minimum to get OS2 back up. Is attachment 132161 [details] [diff] [review] enough? /be
Status: REOPENED → ASSIGNED
> 0x100000 is 1MB. Quite right. All these numbers are starting to run together in my head! > Here, we should do the minimum to get OS2 back up. > Is attachment 132161 [details] [diff] [review] enough? Yes, that patch is enough to bring Mozilla back up, and to be used for normal browsing. But it won't fix what this bug is intended to fix. I finally built the standalone interpreter and ran the testcases. Sure enough, they all crash with BADSTACK on OS/2. The only way I found to prevent these crashes (and to get the stack overflow error messages) without using some arbitrary value was to query the OS to get the end of the stack, and then set cx->stackLimit = StackEnd + 5120. Setting the stackLimit to a predefined value for all platforms won't fix this bug, due to the reasons pointed out in comment #11. And as Phil saw in comment #31, he needed to lower JS_DEFAULT_STACK_SIZE_LIMIT to less than window's stack size in order to get the testcases to run.
I do not think that calling os-specific functions to get stack size limit would have any performance impact if done once per context creation (unless they are extremely badly implemented...). The attachment 132092 [details] [diff] [review] from bug 96128 contains a possible implementation of such OS-specific call for Mac, perhaps something similar can be done for OS2?
I checked in attachment 132161 [details] [diff] [review]. We can keep this bug open and take a long march through the OSes, adding system calls to get stack sizes, subtracting "fudge terms" to account for stack space already consumed. But I'd prefer separate bugs for better assignment, since it will take several people, with their own machines, to get the job done right. Also, the reason I'm trying to minimize invasive, extensive, fudge-term-filled changes for this bug: not all JS embeddings need to worry about crashing on a runaway recursion (e.g.). C and C++ so crash, why not JS for a "single-system image" where all executed scripts come from the same trust domain in which the interpreter runs? If you take that view far enough, it says that the embedding application is responsible for calling JS_SetStackSizeLimit with the right (OS-dependent, existing stack-depth-dependent) parameter. After all, only the embedding knows the distance in stack space from main()'s stack to the frame of the function in that embedding that calls JS_SetStackSizeLimit. I like this approach much better. It means the sequel bugs to this one would want to patch dom/src/base/nsJSEnvironment.cpp, not jscntxt.c. Cc'ing jst for DOM advice and wtc for any help NSPR can provide on stack sizing. /be
One final thought: the JS_SetStackSizeLimit API might rather be a JS_SetStackLimit API that takes a jsuword *address*, not a size. Then callers would not need to subtract the currently-consumed stack space from the OS-derived stack size limit, only to have js_SetStackSizeLimit add back a nearly-equal (but slightly larger) currently-consumed address (&stackDummy, in js_SetStackSizeLimit). Comments? /be
So do you mean that we would change JS_SetStackLimit API to take an address, and then call it (with the OS specific value) after ::JS_NewContext() in dom/src/base/nsJSEnvironment.cpp?
Javier: yes, exactly. The current state of affairs has created an API incompatibility, in fact: if a multi-threaded embedding pools contexts among threads, when it reassigns an idle context to a new thread via JS_SetContextThread, the cx->stackLimit (default or set explicitly) value will be wrong. JS_SetContextThread can't recompute the value very well, because again it doesn't know the new thread's stack depth. I'll provide a patch shortly that minimizes the JS code appropriately. /be
And leave it to embedders to pass the proper stack limit address for the thread to which the context is assigned, via JS_SetThreadStackLimit. /be
Attachment #128505 - Attachment is obsolete: true
Attachment #132161 - Attachment is obsolete: true
Attachment #132172 - Attachment is obsolete: true
Attachment #132216 - Flags: review?(shaver)
Comment on attachment 132216 [details] [diff] [review] axe JS_DEFAULT_STACK_SIZE_LIMIT and unneeded subroutine Yeah, this is something that embedders can do a much better job with.
Attachment #132216 - Flags: review?(shaver) → review+
Fixed, for real. I filed bug 220408 on the DOM core as the sequel bug. /be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Attachment #128505 - Flags: review?(brendan)
Attachment #131929 - Flags: review?(user)
*** Bug 152646 has been marked as a duplicate of this bug. ***
Reopening bug; timeless has observed that the following two testcases are crashing in both the debug and optimized JS shell. I see the same: js1_5/Regress/regress-192414.js js1_5/Regress/regress-152646.js These are, respectively, the regression test for this bug and the one for bug 152646, which was duped against this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I get a crash for js1_5/Regress/regress-192414.js , but not for js1_5/Regress/regress-152646.js on my Linux box. js1_5/Regress/regress-152646.js produces a SyntaxError, because the testcase has an error. The parentheses are not balanced! If I balance them, I crash starting with a nesting level of about 6000.
Zack: thanks once again. Testcase corrected: [//d/JS_TRUNK/mozilla/js/tests/js1_5/Regress] cvs diff -rHEAD regress-152646.js RCS file: /cvsroot/mozilla/js/tests/js1_5/Regress/regress-152646.js,v retrieving revision 1.5 diff -r1.5 regress-152646.js 74a75 > sRight += sRight; [//d/JS_TRUNK/mozilla/js/tests/js1_5/Regress] cvs ci regress-152646.js Checking in regress-152646.js; /cvsroot/mozilla/js/tests/js1_5/Regress/regress-152646.js,v <-- regress-152646.js new revision: 1.6; previous revision: 1.5 done
Zack: crashes with what stack? Is it over 1MB deep? What does ulimit -s say when you type it into a shell that you're running the tests from? /be
Phil, what stacks do you see? What does ulimit -s say if you're on Linux? It's hard to believe 1MB of stack is not provided by modern OSes. Perhaps something else is going wrong. /be
I'm seeing regress-192414.js crash on a Linux build from today as well. ulimit -s shows a max stack size of 8 MB. The new version of regress-152646.js runs successfully. Unfortunately I just have an opt build on this machine so I can't get a trace for the crash.
On my WinNT4.0 box, and on my Win2K box, both testcases still crash due to stack overflow, even after the correction to regress-152646.js. On Linux, regress-152646.js does not crash, but regress-192414.js does: [/d/JS_TRUNK/mozilla/js/src/Linux_All_DBG.OBJ] ulimit -s 8192 (gdb) run -f ../../tests/js1_5/shell.js -f ../../tests/js1_5/Regress/regress-192414.js Starting program: /d/JS_TRUNK/mozilla/js/src/Linux_All_DBG.OBJ/./js -f ../../tests/js1_5/shell.js -f ../../tests/js1_5/Regress/regress-192414.js Program received signal SIGSEGV, Segmentation fault. 0x0806e981 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c10) at jsemit.c:1982 1982 switch (pn->pn_type) { (gdb) bt #0 0x0806e981 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c10) at jsemit.c:1982 #1 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211fe0) at jsemit.c:3940 #2 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314c40) at jsemit.c:3632 #3 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211f80) at jsemit.c:3940 #4 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314ca0) at jsemit.c:3632 #5 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211f20) at jsemit.c:3940 #6 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314cd0) at jsemit.c:3632 #7 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211e90) at jsemit.c:3940 #8 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d00) at jsemit.c:3632 #9 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211e30) at jsemit.c:3940 #10 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d30) at jsemit.c:3632 #11 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211dd0) at jsemit.c:3940 #12 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d60) at jsemit.c:3632 #13 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211d70) at jsemit.c:3940 #14 0x08072e28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8314d90) at jsemit.c:3632 #15 0x08073a28 in js_EmitTree (cx=0x80e6108, cg=0xbfffe0a0, pn=0x8211d10) at jsemit.c:3940 etc. etc.
I have 8 MB stack size too. To run the testcase successfully, I need to increase it to about 27 MB.
For Windows regress-152646.js top of crash stack is: > js3250.dll!AddExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918) Line 2450 + 0xc C js3250.dll!ShiftExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918) Line 2435 + 0x11 C js3250.dll!RelExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918) Line 2403 + 0x11 C js3250.dll!EqExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918) Line 2378 + 0x11 C js3250.dll!BitAndExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c2b8a0, JSTreeContext * tc=0x0012d918) Line 2366 + 0x11 C regress-192414.js top of crash stack is: > js3250.dll!GetChar(JSTokenStream * ts=0x00c31228) Line 438 + 0xc C js3250.dll!js_GetToken(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228) Line 761 + 0x9 C js3250.dll!UnaryExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228, JSTreeContext * tc=0x0012d918) Line 2551 + 0xd C js3250.dll!MulExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228, JSTreeContext * tc=0x0012d918) Line 2468 + 0x11 C js3250.dll!AddExpr(JSContext * cx=0x00c51028, JSTokenStream * ts=0x00c31228, JSTreeContext * tc=0x0012d918) Line 2450 + 0x11 C
The tests fail because the committed code gives an embedding just an option to limit stack size. By default this limit is set ti infinity and it is the embedding that should specify it based probably on the information supplied by OS etc. I filed the bug 225061 with a patch to add -S command line option to js shell to specify the stack size for the shell explicitly, with that patch the recursion is correctly detected, see test run there.
Thanks, Igor. When bug 225061 is fixed (very soon), this bug should be closed and verified. /be
Depends on: 225061
I declare victory! /be
Status: REOPENED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Flags: testcase+
The tests/js1_5/Regress/regress-152646.js still causes a bus error, when testing against the latest and greatest js-1.60. The crash is in js_EmitTree(), stacked upon itself 1565 times.
regress-192414.js|regress-152646.js|regress-192465.js verified fixed 1.8.1|1.9.0 linux/mac*/windows. I also don't see the issues on 1.8.0 with these tests at least on linux. verified fixed 1.9.0 linux/mac*/windows.
Status: RESOLVED → VERIFIED
Where are the 1.8.0 and 1.9.0? All I can see in http://ftp.mozilla.org/pub/mozilla.org/js/ is 1.6...
1.8.0, 1.8.1, 1.9.0 refer to the cvs branch not the release version of spidermonkey. I am going to release a 1.6.1 version of spidermonkey this week.
Terrific! Would you, please, also release the set of tests (in the same or in separate tarball), which are all supposed to succeed? There are many ways in which spidermonkey can be miscompiled, so I would like our (FreeBSD's) port of the software to start running the self-tests after building again. In the past we were forced to turn the tests off due to some of the tests failing spuriously and breaking the package-building... The latest /released/ tarball of tests is from 2002, which, I guess, is very outdated by now... Here, for example, is the log of test failures in 1.6: http://aldan.algebra.com/~mi/tmp/results-2007-09-14-202814-smdebug.html Yours, -mi
Crash Signature: [@ GetChar ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: