Closed Bug 324278 Opened 19 years ago Closed 18 years ago

GC without recursion

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Keywords: verified1.8.1)

Attachments

(4 files, 23 obsolete files)

46.57 KB, patch
Details | Diff | Splinter Review
721 bytes, patch
brendan
: review+
Details | Diff | Splinter Review
3.70 KB, text/html
Details
4.05 KB, text/html
Details
This is a spin-off of bug 280844 to implement GC truly without recursion via updating and fixing attachment 177334 [details] [diff] [review]. Here is a summary:

The current implementation of GC despite efforts to implement 
Deutsch-Schorr-Waite algorithm still can go into an uncontrolled recursion when a GC thing of one type refers to a thing of another type. For example, the following example for js shell triggers a stack overflow when GC mark phase recurse  between ordinary and XML objects  even with separated Deutsch-Schorr-Waite available for both types (see bug 280844 comment 14 for a similar example):

var N = 1000 * 1000;

function prepare_list(N)
{
    var cursor = null;
    for (var i = 0; i != N; ++i) {
        var ns = new Namespace("protocol:address");  
	ns.property = cursor;
        var xml = <xml/>;
        xml.addNamespace(ns);
	cursor = {xml: xml};
    }
    return cursor;
}

var top = prepare_list(N);

print("prepared for "+N);
gc();

var count = 0;
while (top) {
	top = top.xml.inScopeNamespaces()[1].property;
	++count;
}

var expected = N;
var actual = count;
print(expected == actual);
gc();

To see the problem, restrict the system stack to 512K and run the shell with -S switch like in js -S 500000 test.js.

To address this the idea is to replace Deutsch-Schorr-Waite by the following algorithm initially implemented in attachment 177334 [details] [diff] [review] for bug 280844. When C stack is low, GC does not recurse into marking children of a GC thing after marking the thing itself. Rather in tags the thing to indicate that its children were not yet marked. Then after the initial mark phase GC loops over tagged things and marks their children (potentially introducing more tagged things) until the set of tagged things is empty.

To reduce the memory overhead as much as possible this not-yet-scanned tagging effectively uses a linked list of 2-level digital trees of bits realized in the following way.

First we replace "split" and "flagp" from JSGCPageInfo by single "offset" from the start of things+flags area. This frees one word per GC page in return of slightly longer calculations of the address for thing's flags in js_GetGCThingFlags. We use this word as a bitset to tag things with not yet marked children from the corresponding page. 

Then we add 2 extra words to the GC arena header. The first is a bitset indicating pages with tagged things. The second is used to link arenas with tagged pages. Now to tag a particular thing we set a bit in the bitset from its JSGCPageInfo. If the bitset was empty, we also set a bit corresponding to the page in the arena header. And if that was empty too, we add the arena to the global tagged list using the link field.

To mark children of tagged things we loop until the list of tagged arenas is empty. During each iteration we pick a page with tagged things and then pick a tagged thing using bitsets from arena's header and JSGCPageInfo. Before marking thing's children, we untag the thing. If that clears the bitset in JSGCPageInfo, we remove the page from the bitset in arena's header. And if that becomes empty, we remove the arena from the tagged list. After that we do the marking.

The only complication is that in the 32 bit word each GC page can contain 1K/sizeof(JSGCThing) == 128 things and each bit in page's bitset covers a chunk of 4 things. Thus during the untagging loop we only know that we have to mark children of some of these 4 things that belong to the chunk without knowing exactly which ones. But since the tagged things should be already marked, we addresses this via scanning for unmarked children for all already marked things in the chunk. Clearly in the worst case this can increase the total marking time by factor of 4, but since the tagging happens only during low stack condition, this is an acceptable price to pay in my opinion.

In the 64 bit world the problem does not exist as 1K/sizeof(JSGCThing) == 64 and we have exactly one bit per thing.
This is the first part of the updated implementation to use only one word from JSGCPageInfo word for flag pointer calculations. It also reorganizes tail-recursion-elimination to get stricter JS_ASSERT coverage and do few renames to make the following patch smaller.
Attached patch Implementation: unscan list (obsolete) — Splinter Review
This is the second part of the update with the real implementation.
Attachment #209227 - Flags: review?(brendan)
Blocks: 323252
Here is non-e4x test case that explores recursion via function_object->function->script->shared_regexp:

// Number to push native stack size beyond 10MB if GC recurses generating
// segfault on Fedora Core / Ubuntu Linuxes where the stack size by default
// is 10MB/8MB.
var N = 100*1000;

function build(N) {
	// Explore the fact that regexp literals are shared between
	// function invocations. Thus we build the following chain:
	// top: functin->regexp->function->regexp....->null
        // to check how GC would deal with this chain.
	
	var top = null;
	for (var i = 0; i != N; ++i) {
		var f = Function('some_arg'+i, ' return /test/;');
		var re = f();
		re.previous = top;
		top = f;
	}
	return top;
}

function check(top, N) {
	for (var i = 0; i != N; ++i) {
		var re = top();
		top = re.previous;
	}
	if (top !== null) 
		throw "Bad top";
	
}

if (typeof gc != "function") {
	gc = function() {
		for (var i = 0; i != 50*1000; ++i) {
			var tmp = new Object();
		}
	}
}

if (typeof print != "function") 
	print = alert;

var top = build(N);
print("BUILT");
gc();
check(top, N);
print("CHECKED");
top = null;
gc();

Without the patches running the script via js -S 20000 generates segfault during the first gc(), with the patches applied the result is:

~/w/mozilla/js/src> js -S 20000 ~/s/x.js
BUILT
before 3827367, after 3602727, break 0bc97000
CHECKED
before 3602727, after 2727, break 0bc97000
Status: NEW → ASSIGNED
Attachment #209228 - Attachment is obsolete: true
Attached patch Implementation: unscan list v2 (obsolete) — Splinter Review
Here is implementation that fixed a bug in macro GET_GAP_AND_CHUNK_SPAN. Now I can browse with JS compiled with JS_GC_ASSUME_LOW_C_STACK defined so GC always goes through the new code.
Attachment #210864 - Attachment is obsolete: true
I'm gonna review this bug's patch first thing tomorrow -- sorry for the delays.  No need for updated patch but feel free to attach one if you can.

/be
Attachment #209227 - Attachment is obsolete: true
Attachment #212712 - Flags: review?(brendan)
Attachment #209227 - Flags: review?(brendan)
The patch contains the following code:

    /*
     * Mark children of things that caused too deep recursion during above
     * marking phase.
     */
    ScanDelayedChildren(cx);

    if (rt->gcCallback)
        (void) rt->gcCallback(cx, JSGC_MARK_END);

But this wrong as rt->gcCallback can cause more marking so     ScanDelayedChildren should come after the callback call. But this crashes the browser. Apparently browser's callback assumes that no more GC marking can happen after it returns. With the patch as is at least the browser crashes after many ours of use with JS_GC_ASSUME_LOW_C_STACK defined to always use recless code.
Attachment #210870 - Attachment is obsolete: true
This version allows for JSGC_MARK_END callback to continue to assume that it can start finalization after marking its own things. For that the patch adds JSContext.insideGCMarkCallback that js_MarkGCThing checks to ensure that the unscan list is empty when it returns back to callback.
Attachment #212713 - Attachment is obsolete: true
To minimize the patch impact I will try to use different approach to find room for unscan bits without touching page info. Currently the code uses 2 flags, GCF_FINAL and GCF_MARK to denote 3 states of GC thing. That is free, "allocated" and "market". Thus the idea is to introduce the forth, "market but unscanned" and use it to manage the unscan list together with some extra fields to locate unscanned things efficiently.
Attached patch Tail recursion cleanup (obsolete) — Splinter Review
This is a small patch to cleanup handling of tail recursion in MarkGCThink and simplify the real implementation.
Attachment #212712 - Attachment is obsolete: true
Attachment #213437 - Attachment is obsolete: true
Attachment #213461 - Flags: review?(brendan)
Attachment #212712 - Flags: review?(brendan)
Attached patch Tail recursion cleanup v2 (obsolete) — Splinter Review
Less code duplication
Attachment #213461 - Attachment is obsolete: true
Attachment #213464 - Flags: review?(brendan)
Attachment #213461 - Flags: review?(brendan)
Igor, which obsoleted "Implementation" patch for the whole fix should I review?  Or if possible, can you attach a new, complete patch?  The small incremental patches are good too, I do that sometimes to help reviewers see deltas (since bugzilla's interdiff is lame too often).  But I need a complete, non-obsolete patch to read, test, review, and stamp.

/be
Brendan wrote in comment #13:
> Igor, which obsoleted "Implementation" patch for the whole fix should I review?

The patch is not ready. When recusrion-less GC is used exclusively the browser asserts after some time (minutes-hours). It can be caused by a bug in the patch or changes in GC behavior that broke xpconnect assumptions. My current plan is to get an alternative implementation and see how it would behave during next week or so. 

As regarding the small patch I asked to review it is simply assert tightening in MARK_GC_THING plus easier to follow the tail recusrion eleimination. It makes the following changes slightly smaller but is also useful on its own. I file a separated bug for it.
Comment on attachment 213464 [details] [diff] [review]
Tail recursion cleanup v2

This is filed as separated bug 328896.
Attachment #213464 - Attachment is obsolete: true
Attachment #213464 - Flags: review?(brendan)
Depends on: 328896
Ok, let me know if you need help debugging -- I can try your latest patch once the tail call one goes in and help test it.

/be
This patch merges the code from attachment 212712 [details] [diff] [review] and attachment 213437 [details] [diff] [review] and fixes a bug in ScanDelayedChildren that would try to mark things beyond JSGCArenaList.lastLimit if it happens inside the chunk of 4 things.

Even if it would be the last bug I still try to implement the alternative approach from comment 10 as it can lead to a less intrusive patch.
Things still crashes:

#4  0xb7eef29a in nsProfileLock::FatalSignalHandler (signo=11)
    at nsProfileLock.cpp:210
#5  <signal handler called>
#6  js_GetGCThingFlags (thing=0x210314)
    at /home/igor/w/mozilla/js/src/jsgc.c:291
#7  0xb7e65432 in UnmarkedGCThingFlags (thing=0x210314)
    at /home/igor/w/mozilla/js/src/jsgc.c:1107
#8  0xb7e65ebb in js_MarkGCThing (cx=0x8d67ea8, thing=0x210314, arg=0x0)
    at /home/igor/w/mozilla/js/src/jsgc.c:1594
#9  0xb7e3bf47 in JS_MarkGCThing (cx=0x8d67ea8, thing=0x210314, 
    name=0xb7ec3166 "object", arg=0x0)
    at /home/igor/w/mozilla/js/src/jsapi.c:1831
#10 0xb7eb2441 in js_MarkXMLNamespace (cx=0x8d67ea8, ns=0xbeaac80, arg=0x0)
    at /home/igor/w/mozilla/js/src/jsxml.c:310
#11 0xb7e65718 in MarkGCThingChildren (cx=0x8d67ea8, thing=0xbeaac80, 
    flagp=0xbeac0b8 "U") at /home/igor/w/mozilla/js/src/jsgc.c:1344
#12 0xb7e65db8 in ScanDelayedChildren (cx=0x8d67ea8)
    at /home/igor/w/mozilla/js/src/jsgc.c:1555
#13 0xb7e66a17 in js_GC (cx=0x8d67ea8, gcflags=0)
    at /home/igor/w/mozilla/js/src/jsgc.c:2000

Segfaultis caused by JSXMLNamespace containing already freed objects.
Attached patch Implementation v3 (obsolete) — Splinter Review
The previous fix for the bug did it wrong, hopefully this time I get it right .
Attachment #214535 - Attachment is obsolete: true
Attached patch Implementation v4 (obsolete) — Splinter Review
Tighter asserts and removal of debug fprintf.
Attachment #214581 - Attachment is obsolete: true
Attached patch Implementation v5 (obsolete) — Splinter Review
Fixing English in comments per feedback from timeless@gmail.com.
Attachment #214615 - Attachment is obsolete: true
Attached patch Implementation v6 (obsolete) — Splinter Review
More comment fixes
Attachment #214647 - Attachment is obsolete: true
Attached patch Implementation v7 (obsolete) — Splinter Review
It turned out that the code to implement comment 10 lead to a rather complex mess when mapping flag pointer into thing pointer during delayed scanning. So I would not pursue that farther. 

On the other hand the idea of marking things with not yet GC-scanned children via GCF_MARK | GCF_FINAL allows to avoid rescanning things twice. And this is what this version of the patch implements.

That is, when bits from JSGCPageInfo covers more then one thing, this version of the patch scan children only of things marked with GCF_MARK | GCF_FINAL. Thus scanning performed only once for any thing.
Attachment #214660 - Attachment is obsolete: true
Comment on attachment 214686 [details] [diff] [review]
Implementation v7

Now it seems to work
Attachment #214686 - Flags: review?(brendan)
Attached patch Implementation v8 (obsolete) — Splinter Review
Improving comments as suggested by timeless@gmal.com
Attachment #214686 - Attachment is obsolete: true
Attachment #214700 - Flags: review?(brendan)
Attachment #214686 - Flags: review?(brendan)
Attached patch Implementation v8 (for real) (obsolete) — Splinter Review
I added the wrong patch
Attachment #214700 - Attachment is obsolete: true
Attachment #214701 - Flags: review?(brendan)
Attachment #214700 - Flags: review?(brendan)
Comment on attachment 214701 [details] [diff] [review]
Implementation v8 (for real)

Reviewing, still in the middle of it.

/be
Attached patch Implementation v9 (obsolete) — Splinter Review
Here is yet another tightening of asserts and better naming. 

This version of the patch takes advantage of the fact that now GC things not yet scanned for GC objects are marked with GCF_MARK | GCF_FINAL. It allows to count number of such things accurately so DEBUG-only JSRuntime.gcArenaUnscannedBagSize now really means number of things waiting for the children scan.

In addition I use the word "bag", not "list" to denote the whole structure to support non-recursive GC which is effectively a stack of 2-level digital trees.

There are no other changes besides these debug-tightening and renames as the last version of the patch survived 3 days of browsing with JS compiled with DEBUG and JS_GC_ASSUME_LOW_C_STACK defined.
Attachment #214701 - Attachment is obsolete: true
Attachment #214883 - Flags: review?(brendan)
Attachment #214701 - Flags: review?(brendan)
Attached patch Implementation v10 (obsolete) — Splinter Review
Here is another set of renames and assert-only changes. This is mostly to use consistently "unscanned", not "unscan" to refer to not yet GC scanned things or their sets and to use JSRuntime.gcUnscannedBagSize, not JSRuntime.gcArenaUnscannedBagSize as the bag size is unrelated to GC arenas.
Attachment #214883 - Attachment is obsolete: true
Attachment #215004 - Flags: review?(brendan)
Attachment #214883 - Flags: review?(brendan)
Attached patch Implementation v11 (obsolete) — Splinter Review
Yet another assert-only change. This time it is the removal of too tight assert at the end of AddThingToUnscannedList with comments:

     * If there is only one arena, the size of unscanned bag can not exceed
     * the maximum number of things per page in the arena times the number of
     * unscanned pages.

But this is wrong as AddThingToUnscannedList can be called from ScanDelayedChildren when the latter scans the last so far unscanned chunk. Then JSGCArena->unscannedPages would be cleared yet gcUnscannedBagSize would be positive. 

In principle the assert can be accounted for that, but then it would be rather complex debug-only code that brings no extra safety as the rest of the asserts already take care about inconsistent gcUnscannedBagSize. So I just removed it and for similar reason simplified the assert about the type of things that can be put into the unscanned bag.
Attachment #215004 - Attachment is obsolete: true
Attachment #215020 - Flags: review?(brendan)
Attachment #215004 - Flags: review?(brendan)
Attached patch Implementation v12 (obsolete) — Splinter Review
Here comes more asserts kills. 

The transition to use GCF_MARK|GCF_FINAL changes some assumption about iterations between JSRuntime.gcUnscannedBagSize and bit set structures for the fast unscanned bag lookup that were previously recorded as asserts and now are invalid.

In particular, this version removes the checks that JSGCPageInfo.unscannedThingChunks and JSGCArena.unscannedPages must be zero on AddThingToUnscannedBag entry if JSRuntime.gcUnscannedBagSize is 0. Although nice as a consistency check that helped to catch bugs in the earlier versions of the code, it no longer holds when thingsPerUnscannedChunk != 0.

For example, if the last remaining so far chunk contains only one GC thing to scan for children, then ScanDelayedChildren would first clear JSGCPageInfo.unscannedThingChunks and JSGCArena.unscannedPages. Then inside the chunk scanning loop the code would decrease gcUnscannedBagSize setting it to 0 and then call MARK_GC_THING_CHILDREN on the thing. But the latter can call AddThingToUnscannedBag for things on the same chunk. At this point JSGCPageInfo.unscannedThingChunks, JSGCArena.unscannedPages and gcUnscannedBagSize becomes non-zero once again.

When MARK_GC_THING_CHILDREN returns, the scan loop continues and comes to see just added to the bag thing. For it the loop calls MARK_GC_THING_CHILDREN decrease gcUnscannedBagSize so gcUnscannedBagSize becomes 0 once again. Now if MARK_GC_THING_CHILDREN calls AddThingToUnscannedBag, then it would see non-empty JSGCPageInfo.unscannedThingChunks and JSGCArena.unscannedPages with no things in the bug. This is perfectly OK, but previously the assert would reject that.
Attachment #215020 - Attachment is obsolete: true
Attachment #215043 - Flags: review?(brendan)
Attachment #215020 - Flags: review?(brendan)
Comment on attachment 215043 [details] [diff] [review]
Implementation v12

Not sure the big comment rewrite was all necessary.

Cite followup bug to address wasted flag bytes in a FIXME: ... comment.

Nit: use |page_address - page_offset| or (page_address - page_offset), not $...$, in that big comment.

JSGCPageInfo.offset is short and vague compared to .unscannedThingChunks (which is a bitmap, but its name doesn't say that).  Suggest more balanced name pair: {offsetInArena,unscannedBitmap}.  Your call, I'm not married to these suggested names.

Nit: s/pageInfo/pi/g in PAGE_TO_ARENA and elsewhere for naming consistency.

Nit: offset declared after a in NewGCArena to match first-set order.

js_GetGCThingFlags nit: might balance names, offsetInArena and thingIndex to match pageAddress.

s/TOO_DEEP_RECURSION/OVERDEEP_RECURSION/g -- or RECURSION_TOO_DEEP, even better.  English nit, really; no big deal.

Is branch-less FLOOR_LOG2 variant actually faster than branching one?  IIRC x86 has small penalty for mispredict where instruction prefetch buffer contains the target, as it almost always will here.

In GET_GAP_AND_CHUNK_SPAN, is strength-reduction by hand necessary, or would gcc and msvc do the right thing with constant divisor?

Assuming Firefox with this patch is still testing well for you, r=me.  Thanks for wading into this, and keep going!  I'm sorry my review took so long.

/be
Attachment #215043 - Flags: review?(brendan) → review+
(In reply to comment #32)
> Is branch-less FLOOR_LOG2 variant actually faster than branching one?  IIRC x86
> has small penalty for mispredict where instruction prefetch buffer contains the
> target, as it almost always will here.

Honestly I have no idea if this is faster but for sure it is longer. I just coded it this way as an exersize. Moreover, the thread http://www.ussg.iu.edu/hypermail/linux/kernel/0304.3/1492.html
implies that jumpless vesion indeed not the fastest while been the biggest. So I will define js_FloorLog2wImpl as JS_FloorLog2 for 32-bit case and code 64-bit version similar to JS_FloorLog2.

> 
> In GET_GAP_AND_CHUNK_SPAN, is strength-reduction by hand necessary, or would
> gcc and msvc do the right thing with constant divisor?

In GET_GAP_AND_CHUNK_SPAN thingSize is not constant! It denotes the size of things to allocate for a particular arena list. In this way for XML-things thingsPerUnscannedChunk would be one leading to much faster scanning phase. Still I could be wrong assuming that integer devision is slow. 

But then I do not want to touch GET_GAP_AND_CHUNK_SPAN as it took me a few attempts to get it right.
Attached patch Implementation v13 (obsolete) — Splinter Review
Here is an update to reflect nits from comment 32.
Attachment #215043 - Attachment is obsolete: true
Attachment #215820 - Flags: review?(brendan)
Comment on attachment 215820 [details] [diff] [review]
Implementation v13

>+ * When we allocate things with size equal to sizeof(JSGCThing), the overhead
>+ * of this scheme for 32 bit platforms is (8+8*(8+1))/(8+9K) or 0.87%
>+ * (assuming 4 bytes for each JSGCArena header, and 8 bytes for each
>+ * JSGCThing and JSGCPageInfo). When thing_size > 8, the scheme wastes the
>+ * flag byte for each extra 8 bytes beyond sizeof(JSGCThing) in thing_size
>+ * and the overhead is close to 1/8 or 12.5%.

Could you add a FIXME: bug NNNNNN here?  IIRC you have the followup bug on file already.  Thanks.

>+ * we can replace "jsuword offsetInArena" in JSGCPageInfo by pair of
>+ * "uint8 card_mark;" uint16 offsetInArena;" as the offset can not exceed

Use | instead of " quoting for code fragments?

>+++ src/jsgc.h
>@@ -81,6 +81,9 @@ JS_BEGIN_EXTERN_C
> extern uint8 *
> js_GetGCThingFlags(void *thing);
> 
>+extern JSRuntime *
>+js_GetGCThingRuntime(void *thing);
>+

Remove this unused declaration.

>@@ -234,11 +241,11 @@ typedef struct JSGCStats {
>     uint32  maxdepth;   /* maximum mark tail recursion depth */
>     uint32  cdepth;     /* mark recursion depth of C functions */
>     uint32  maxcdepth;  /* maximum mark recursion depth of C functions */
>-    uint32  dswmark;    /* mark C stack overflows => Deutsch-Schorr-Waite */
>-    uint32  dswdepth;   /* DSW mark depth */
>-    uint32  maxdswdepth;/* maximum DSW mark depth */
>-    uint32  dswup;      /* DSW moves up the mark spanning tree */
>-    uint32  dswupstep;  /* steps in obj->slots to find DSW-reversed pointer */
>+    uint32  unscannedAdds;      /* mark C stack overflows or number of times
>+                                   things were put to unscanned bag */

Suggest keeping haxor-ish all-lowercase JSGCStats.foopy naming convention, so unscannedadds.  But agree that is ugly and overlong.  Suggest something that connotes overflow or over-recursion.  How about overmark?  Or unscanadd?  Or just unscanned.

Nit: "put in" rather than "put to".

>+#ifdef DEBUG
>+    uint32  maxunscanned;       /* maximum size of unscanned bag */
>+#endif

Nicer name, so perhaps unscanned for previous member is best.

>+++ src/jsscope.h
>@@ -203,7 +203,6 @@ struct JSScope {
>     JSObject        *object;            /* object that owns this scope */
>     uint8           flags;              /* flags, see below */
>     int8            hashShift;          /* multiplicative hash shift */
>-    uint16          dswIndex;           /* Deutsch-Schorr-Waite scaled index */

uint16 spare; here to remind next hacker where to claim space, if possible.

r=me again!

/be
Attachment #215820 - Flags: review?(brendan) → review+
Attached patch Implementation v14 (obsolete) — Splinter Review
Here is a version that kills nits from comment 35.
Attachment #215820 - Attachment is obsolete: true
Patch to commit: there was a CVS update that toched unrelated to the patch line is jsgc.c
Attachment #215829 - Attachment is obsolete: true
I committed the patch from attachment 215895 [details] [diff] [review].
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Blocks: 331598
Checking in e4x/Regress/regress-324278.js;
/cvsroot/mozilla/js/tests/e4x/Regress/regress-324278.js,v  <--  regress-324278.js
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/js/tests/js1_5/Regress/regress-324278.js,v
done
Checking in js1_5/Regress/regress-324278.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-324278.js,v  <--  regress-324278.js
initial revision: 1.1
done

still crashes in windows trunk browser and shell
Status: RESOLVED → REOPENED
Flags: in-testsuite+
Resolution: FIXED → ---
(In reply to comment #39)
> /cvsroot/mozilla/js/tests/e4x/Regress/regress-324278.js,v  <-- 
...
> still crashes in windows trunk browser and shell

This is because jsDriver does not pass -S <some-reasobale-value> to the shell so the engine assumes that the stack size is unlimitted. I suggest to change jsDriver.pl to always pass -S 500000 to the shell invocation to match the reality of real world hardware with just limitted amount of stack ;).
I sugest to change js.c as patching it is simpler than altering jsDriver.pl and can potentially expose more runaway recursion bugs.
Attachment #219515 - Flags: review?(brendan)
actually I ran it using js, but it appears the order of -S matters:


$ /work/mozilla/builds/ff/trunk/mozilla/js/src/WINNT5.1_DBG.OBJ/js -x -S 528000 -f e4x/shell.js -f e4x/Regress/regress-324278.js 
BUGNUMBER: 324278
STATUS: GC without recursion
N = 1000000
preparing...
prepared for 1000000
before 27271328, after 27271328, break 00000000
before 48366448, after 48366448, break 00000000

$ /work/mozilla/builds/ff/trunk/mozilla/js/src/WINNT5.1_DBG.OBJ/js -S 528000 -f js1_5/shell.js -f js1_5/Regress/regress-324278.js          
BUGNUMBER: 324278

STATUS: GC without recursion

STATUS: BUILT

before 3868208, after 3868208, break 00000000
STATUS: CHECKED

before 3868208, after 9232, break 00000000

$ /work/mozilla/builds/ff/trunk/mozilla/js/src/WINNT5.1_DBG.OBJ/js -x -f e4x/shell.js -f e4x/Regress/regress-324278.js -S 528000

$ echo $?
5

$ /work/mozilla/builds/ff/trunk/mozilla/js/src/WINNT5.1_DBG.OBJ/js -f js1_5/shell.js -f js1_5/Regress/regress-324278.js -S 528000

$ echo $?
5

I am pretty sure the e4x test crashed for me earlier, but I haven't been able to reproduce it this time.

When I run the js1_5 test in a fresh browser it crashes, but I don't get a stack. Running it a window that I had previously used for running tests did give the following stack.

		cx	0x03b4e188	JSContext *
		script	0x01b5c509	JSScript *
+		rv	0x0012f24c	unsigned int *
+		result	0x00af2f20	nsIPrincipal *
-		nsJSPrin	0x00a364f4 {nsIPrincipalPtr=??? }	nsJSPrincipals *
-		JSPrincipals	{codebase=??? getPrincipalArray=??? globalPrivilegesEnabled=??? ...}	JSPrincipals
		codebase	CXX0030: Error: expression cannot be evaluated	
		getPrincipalArray	CXX0030: Error: expression cannot be evaluated	
		globalPrivilegesEnabled	CXX0030: Error: expression cannot be evaluated	
		refcount	CXX0030: Error: expression cannot be evaluated	
		destroy	CXX0030: Error: expression cannot be evaluated	
		subsume	CXX0030: Error: expression cannot be evaluated	
		nsIPrincipalPtr	CXX0030: Error: expression cannot be evaluated	
-		jsp	0x00a364f4 {codebase=??? getPrincipalArray=??? globalPrivilegesEnabled=??? ...}	JSPrincipals *
		codebase	CXX0030: Error: expression cannot be evaluated	
		getPrincipalArray	CXX0030: Error: expression cannot be evaluated	
		globalPrivilegesEnabled	CXX0030: Error: expression cannot be evaluated	
		refcount	CXX0030: Error: expression cannot be evaluated	
		destroy	CXX0030: Error: expression cannot be evaluated	
		subsume	CXX0030: Error: expression cannot be evaluated	


>	caps.dll!nsScriptSecurityManager::GetScriptPrincipal(JSContext * cx=0x03b4e188, JSScript * script=0x01b5c509, unsigned int * rv=0x0012f24c)  Line 1940 + 0x3 bytes	C++
 	caps.dll!nsScriptSecurityManager::GetFramePrincipal(JSContext * cx=0x03b4e188, JSStackFrame * fp=0x0012ec14, unsigned int * rv=0x0012f24c)  Line 2015 + 0x11 bytes	C++
 	caps.dll!nsScriptSecurityManager::GetPrincipalAndFrame(JSContext * cx=0x03b4e188, JSStackFrame * * frameResult=0x0012f224, unsigned int * rv=0x0012f24c)  Line 2049 + 0x11 bytes	C++
 	caps.dll!nsScriptSecurityManager::GetSubjectPrincipal(JSContext * cx=0x03b4e188, unsigned int * rv=0x0012f24c)  Line 2091 + 0x11 bytes	C++
 	caps.dll!nsScriptSecurityManager::doGetSubjectPrincipal(unsigned int * rv=0x0012f24c)  Line 1699 + 0xd bytes	C++
 	caps.dll!nsScriptSecurityManager::GetSubjectPrincipal(nsIPrincipal * * aSubjectPrincipal=0x0012f278)  Line 1683 + 0xc bytes	C++
 	caps.dll!nsScriptSecurityManager::SubjectPrincipalIsSystem(int * aIsSystem=0x0012f28c)  Line 1720 + 0x26 bytes	C++
 	gklayout.dll!nsContentUtils::IsCallerChrome()  Line 1006 + 0x17 bytes	C++
 	gklayout.dll!PresShell::HandleEventInternal(nsEvent * aEvent=0x0012f58c, nsIView * aView=0x04b32920, nsEventStatus * aStatus=0x0012f3a0)  Line 6113 + 0x5 bytes	C++
 	gklayout.dll!PresShell::HandleEvent(nsIView * aView=0x04b32920, nsGUIEvent * aEvent=0x0012f58c, nsEventStatus * aEventStatus=0x0012f3a0)  Line 5921 + 0x17 bytes	C++
 	gklayout.dll!nsViewManager::HandleEvent(nsView * aView=0x04b32920, nsPoint aPoint={...}, nsGUIEvent * aEvent=0x0012f58c, int aCaptured=0x00000000)  Line 1712	C++
 	gklayout.dll!nsViewManager::DispatchEvent(nsGUIEvent * aEvent=0x0012f58c, nsEventStatus * aStatus=0x0012f4c8)  Line 1665 + 0x22 bytes	C++
 	gklayout.dll!HandleEvent(nsGUIEvent * aEvent=0x0012f58c)  Line 174	C++
 	gkwidget.dll!nsWindow::DispatchEvent(nsGUIEvent * event=0x0012f58c, nsEventStatus & aStatus=nsEventStatus_eIgnore)  Line 1103 + 0xc bytes	C++
 	gkwidget.dll!nsWindow::DispatchWindowEvent(nsGUIEvent * event=0x0012f58c)  Line 1124	C++
 	gkwidget.dll!nsWindow::DispatchFocus(unsigned int aEventType=0x0000006c, int isMozWindowTakingFocus=0x00000000)  Line 6103 + 0x11 bytes	C++
 	gkwidget.dll!nsWindow::ProcessMessage(unsigned int msg=0x00000008, unsigned int wParam=0x00000000, long lParam=0x00000000, long * aRetValue=0x0012f9d4)  Line 4724 + 0x19 bytes	C++
 	gkwidget.dll!nsWindow::WindowProc(HWND__ * hWnd=0x002002de, unsigned int msg=0x00000008, unsigned int wParam=0x00000000, long lParam=0x00000000)  Line 1292 + 0x1d bytes	C++
 	user32.dll!77d48734() 	

Patching js.c to use a default stack is fine with me. The automated tests use a value of 528000 when calling jsDriver.pl.
Comment on attachment 219515 [details] [diff] [review]
js shell: assume by default 5e5 stack limit

IIRC the DOM uses 512*1024 as its default hread stack limit.  Might want to match that, or might not -- catch other bugs with a slightly different limit?  Anyway, r=me.

/be
Comment on attachment 219515 [details] [diff] [review]
js shell: assume by default 5e5 stack limit

IIRC the DOM uses 512*1024 as its default hread stack limit.  Might want to match that, or might not -- catch other bugs with a slightly different limit?  Anyway, r=me.

/be
Attachment #219515 - Flags: review?(brendan) → review+
I committed the last attachment to set the default stack quota to 5e5.  
(In reply to comment #42)
> actually I ran it using js, but it appears the order of -S matters:

Yes, -S affects  only the scripts that comes after it.
> 
> When I run the js1_5 test in a fresh browser it crashes, but I don't get a
> stack. Running it a window that I had previously used for running tests did
> give the following stack.

What happens if N in the test  is set to 10000 or 100000?
(In reply to comment #46)

> 
> What happens if N in the test  is set to 10000 or 100000?

100*1000->Crash no stack, 10*1000->Crash no stack, 1*1000->No Crash

(In reply to comment #47)
> (In reply to comment #46)
> 
> > 
> > What happens if N in the test  is set to 10000 or 100000?
> 
> 100*1000->Crash no stack, 10*1000->Crash no stack, 1*1000->No Crash
> 

This attachment contains standalone HTML version of the js1_5 test. It does not crash against CVS head build even with N=1e5. After few warning dialogs about too long running script it prints the success status. So is it possible to run this in the environment were js1_5 crashes?
(In reply to comment #48)

It does not crash, but if you replace the function gc in your example with the version from the test library, it will at least assert in a debug build.

-		table	0x010a2094 {ops=0x004fca3c data=0x00000000 hashShift=0x0016 ...}	JSDHashTable *
+		ops	0x004fca3c stub_ops {allocTable=0x00421712 freeTable=0x00421582 getKey=0x00421cf3 ...}	const JSDHashTableOps *
		data	0x00000000	void *
		hashShift	0x0016	short
		maxAlphaFrac	0xc0 'À'	unsigned char
		minAlphaFrac	0x40 '@'	unsigned char
		entrySize	0x0000000c	unsigned long
		entryCount	0x000002f0	unsigned long
		removedCount	0x00000003	unsigned long
		generation	0x00000006	unsigned long
+		entryStore	0x03b01850 ""	char *
		key	0x03da22f4	const void *
		op	JS_DHASH_ADD	JSDHashOperator
		deltaLog2	0x30030600	int
+		entry	0x0000000b {keyHash=??? }	JSDHashEntryHdr *
		size	0x00af2f58	unsigned long
		keyHash	0x0012e7f4	unsigned long

 	ntdll.dll!_DbgBreakPoint@0() 	
 	js3250.dll!JS_Assert(const char * s=0x00517dd8, const char * file=0x00517d9c, int ln=0x00000218)  Line 62	C
>	js3250.dll!JS_DHashTableOperate(JSDHashTable * table=0x010a2094, const void * key=0x03da22f4, JSDHashOperator op=JS_DHASH_ADD)  Line 536 + 0x44 bytes	C
 	js3250.dll!js_AddRootRT(JSRuntime * rt=0x010a2018, void * rp=0x03da22f4, const char * name=0x011fb744)  Line 624 + 0x10 bytes	C
 	js3250.dll!js_AddRoot(JSContext * cx=0x03b45dc8, void * rp=0x03da22f4, const char * name=0x011fb744)  Line 589 + 0x14 bytes	C
 	js3250.dll!JS_AddNamedRoot(JSContext * cx=0x03b45dc8, void * rp=0x03da22f4, const char * name=0x011fb744)  Line 1677 + 0x11 bytes	C
 	xpc3250.dll!nsXPCWrappedJS::AddRef()  Line 144 + 0x1e bytes	C++
 	xpc3250.dll!nsXPCWrappedJSClass::DelegatedQueryInterface(nsXPCWrappedJS * self=0x03da22d8, const nsID & aIID={...}, void * * aInstancePtr=0x0012ea10)  Line 582	C++
 	xpc3250.dll!nsXPCWrappedJS::QueryInterface(const nsID & aIID={...}, void * * aInstancePtr=0x0012ea10)  Line 105	C++
 	xpcom_core.dll!nsWeakReference::QueryReferent(const nsID & aIID={...}, void * * aInstancePtr=0x0012ea10)  Line 151 + 0x24 bytes	C++
 	gklayout.dll!nsMarkedJSFunctionHolder_base::Get(const nsID & aIID={...})  Line 292 + 0x16 bytes	C++
 	gklayout.dll!nsMarkedJSFunctionHolder<nsIDOMEventListener>::Get()  Line 144 + 0x12 bytes	C++
 	gklayout.dll!nsEventListenerManager::HandleEvent(nsPresContext * aPresContext=0x03297528, nsEvent * aEvent=0x0012ef40, nsIDOMEvent * * aDOMEvent=0x0012eb28, nsISupports * aCurrentTarget=0x03913cd0, unsigned int aFlags=0x00000002, nsEventStatus * aEventStatus=0x0012eb2c)  Line 1722 + 0xc bytes	C++
 	gklayout.dll!nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=0x00000002)  Line 335	C++
 	gklayout.dll!nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=0x00000006, nsDispatchingCallback * aCallback=0x00000000)  Line 478	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 401 + 0x12 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventTargetChainItem::CreateChainAndHandleEvent(nsEventChainPreVisitor & aVisitor={...}, nsDispatchingCallback * aCallback=0x00000000)  Line 392 + 0x10 bytes	C++
 	gklayout.dll!nsEventDispatcher::Dispatch(nsISupports * aTarget=0x03a5df20, nsPresContext * aPresContext=0x03297528, nsEvent * aEvent=0x0012ef40, nsIDOMEvent * aDOMEvent=0x00000000, nsEventStatus * aEventStatus=0x0012ef3c, nsDispatchingCallback * aCallback=0x00000000, int aTargetIsChromeHandler=0x00000000)  Line 575 + 0x10 bytes	C++
 	gklayout.dll!nsEventStateManager::DispatchMouseEvent(nsGUIEvent * aEvent=0x0012f60c, unsigned int aMessage=0x0000014c, nsIContent * aTargetContent=0x03a5df20, nsIContent * aRelatedContent=0x03915818)  Line 2590 + 0x21 bytes	C++
 	gklayout.dll!nsEventStateManager::NotifyMouseOut(nsGUIEvent * aEvent=0x0012f60c, nsIContent * aMovingInto=0x03915818)  Line 2661	C++
 	gklayout.dll!nsEventStateManager::NotifyMouseOver(nsGUIEvent * aEvent=0x0012f60c, nsIContent * aContent=0x03915818)  Line 2711	C++
 	gklayout.dll!nsEventStateManager::GenerateMouseEnterExit(nsGUIEvent * aEvent=0x0012f60c)  Line 2750	C++
 	gklayout.dll!nsEventStateManager::PreHandleEvent(nsPresContext * aPresContext=0x03297528, nsEvent * aEvent=0x0012f60c, nsIFrame * aTargetFrame=0x03b3dc5c, nsEventStatus * aStatus=0x0012f3d0, nsIView * aView=0x032945b0)  Line 513	C++
 	gklayout.dll!PresShell::HandleEventInternal(nsEvent * aEvent=0x0012f60c, nsIView * aView=0x032945b0, nsEventStatus * aStatus=0x0012f3d0)  Line 6140 + 0x36 bytes	C++
 	gklayout.dll!PresShell::HandlePositionedEvent(nsIView * aView=0x032945b0, nsIFrame * aTargetFrame=0x03b3dc5c, nsGUIEvent * aEvent=0x0012f60c, nsEventStatus * aEventStatus=0x0012f3d0)  Line 6025 + 0x14 bytes	C++
 	gklayout.dll!PresShell::HandleEvent(nsIView * aView=0x032945b0, nsGUIEvent * aEvent=0x0012f60c, nsEventStatus * aEventStatus=0x0012f3d0)  Line 5853 + 0x1b bytes	C++
 	gklayout.dll!nsViewManager::HandleEvent(nsView * aView=0x032945b0, nsPoint aPoint={...}, nsGUIEvent * aEvent=0x0012f60c, int aCaptured=0x00000000)  Line 1712	C++
 	gklayout.dll!nsViewManager::DispatchEvent(nsGUIEvent * aEvent=0x0012f60c, nsEventStatus * aStatus=0x0012f4f8)  Line 1665 + 0x22 bytes	C++
 	gklayout.dll!HandleEvent(nsGUIEvent * aEvent=0x0012f60c)  Line 174	C++
 	gkwidget.dll!nsWindow::DispatchEvent(nsGUIEvent * event=0x0012f60c, nsEventStatus & aStatus=nsEventStatus_eIgnore)  Line 1103 + 0xc bytes	C++
 	gkwidget.dll!nsWindow::DispatchWindowEvent(nsGUIEvent * event=0x0012f60c)  Line 1124	C++
 	gkwidget.dll!nsWindow::DispatchMouseEvent(unsigned int aEventType=0x0000012c, unsigned int wParam=0x00000000, long lParam=0x031902e3)  Line 5983 + 0x1a bytes	C++
 	gkwidget.dll!ChildWindow::DispatchMouseEvent(unsigned int aEventType=0x0000012c, unsigned int wParam=0x00000000, long lParam=0x031902e3)  Line 6166	C++
 	gkwidget.dll!nsWindow::ProcessMessage(unsigned int msg=0x00000200, unsigned int wParam=0x00000000, long lParam=0x031902e3, long * aRetValue=0x0012fac0)  Line 4457 + 0x20 bytes	C++
 	gkwidget.dll!nsWindow::WindowProc(HWND__ * hWnd=0x002d02e0, unsigned int msg=0x00000200, unsigned int wParam=0x00000000, long lParam=0x031902e3)  Line 1292 + 0x1d bytes	C++
 	user32.dll!77d48734() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for user32.dll]	
 	user32.dll!77d48816() 	
 	user32.dll!77d489cd() 	
 	user32.dll!77d49402() 	
 	user32.dll!77d48a10() 	
 	gkwidget.dll!nsAppShell::Run()  Line 135	C++
 	tkitcmps.dll!nsAppStartup::Run()  Line 161 + 0x1c bytes	C++
 	xul.dll!XRE_main(int argc=0x00000003, char * * argv=0x00af7f10, const nsXREAppData * aAppData=0x004036b0)  Line 2364 + 0x25 bytes	C++
 	firefox.exe!main(int argc=0x00000003, char * * argv=0x00af7f10)  Line 61 + 0x13 bytes	C++
 	firefox.exe!__tmainCRTStartup()  Line 586 + 0x19 bytes	C
 	firefox.exe!mainCRTStartup()  Line 403	C
 	kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes
not seeing the crashes in 20060425 and 20060429 trunk builds. resetting as FIXED.
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
verified fixed 1.9 20060520.
Status: RESOLVED → VERIFIED
I am crashing on all platforms for trunk builds in browser tests of e4x/GC/regress-324278.js. This may be related to the crashes I am seeing in bug 324117 comment 18, but I can't get visual studio debugger to fire up so I can get a stack.
note to self: if venkman is not installed or is disabled this does not crash on
trunk, however if venkman is available its jsd GC will cause a crash due to a
stack overflow. See bug 342737.
note to self: the automated test run found crashes in macppc trunk for opt and debug 20060707 builds during the browser test for e4x/GC/regress-324278.js but I could not reproduce the crashes with a nightly build from 20060707 nor the cvs debug build.
I last saw this fail on the 1.8 branch on 2006-07-06. I think this was fixed during the js1.7 landing. Marking fixed1.8.1. Brendan, if this is incorrect please remove. I'll verify it on the next pass.
Keywords: fixed1.8.1
verified no crashes 1.8.1, 1.9 windows/mac(ppc|tel)/linux 20060728 but they do timeout. 1.8.0.5 definitely still shows the crash.
Blocks: 323153
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: