Closed Bug 473769 Opened 16 years ago Closed 15 years ago

stack frame size limited to 1K due to allocation structures

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: edwsmith, Assigned: stejohns)

References

()

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)

Attachments

(11 obsolete files)

class Assembler uses the same constant for AR.entry[] as for Assembler._resvTable[].  This means we can have at most 256 reservations, or 256 words on the stack, whichever limit is hit first.  The reservation index in LIns is the "real" limit, and should only limit the # of reservations, not the size of the stack frame.

If you have reservations whose size is >4 bytes (for doubles, or LIR_alloc areas), it's easy to run out of stack space before running out of live reservation structs.

An extreme example is on PPC or PPC64 where sizeof(jmp_buf) is 768 or 920, respectively, and Tamarin methods with try/catch blocks need to allocate a jmp_buf on the stack.  Those methods almost always fall back to interpretation because of these resource limits.
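As back-of-the-envelope arithmetic (a sketch with made-up helper names, not Tamarin code), the slot budget works out like this:

```cpp
#include <cstdint>

// Illustrative constants from the description above: 256 four-byte AR
// entries give a 1K stack frame budget.
const uint32_t kSlotBytes = 4;
const uint32_t kMaxSlots  = 256;   // 256 slots * 4 bytes = 1K

// Number of AR slots an allocation of `bytes` consumes (rounded up).
uint32_t slotsFor(uint32_t bytes) {
    return (bytes + kSlotBytes - 1) / kSlotBytes;
}

// A PPC64 jmp_buf alone (920 bytes -> 230 slots) nearly exhausts the
// 256-slot budget, so normal spill slots on top push past the limit.
bool fitsInFrame(uint32_t totalBytes) {
    return slotsFor(totalBytes) <= kMaxSlots;
}
```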
TraceMonkey's nanojit has partly fixed this issue by widening LIR instructions to 4 words each, and moving Reservation into the instruction.  See bug 490947.

Merging this change eliminates the 8-bit restriction on the # of Reservations, and completely eliminates Assembler._resvTable.  We can then make AR.entry[] arbitrarily larger.  Reservation.arIndex is still limited to 16 bits, allowing at most 256K per stack frame, up from 1K.
The only reason we used an 8bit index for the reservation table was to keep instruction size small.  
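The 16-bit limit above works out as follows (a sketch with hypothetical helpers, not nanojit's actual API):

```cpp
#include <cstdint>

const uint32_t kSlotBytes   = 4;
const uint32_t kArIndexBits = 16;   // width of Reservation.arIndex

// With a 16-bit arIndex, the largest addressable frame is
// 2^16 slots * 4 bytes = 256K, up from the old 1K.
uint32_t maxFrameBytes() {
    return (1u << kArIndexBits) * kSlotBytes;
}

// arIndex is the scaled displacement from the frame pointer; on a
// downward-growing stack, index i maps to byte offset -4*i.
int32_t dispFor(uint32_t arIndex) {
    return -int32_t(arIndex * kSlotBytes);
}
```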

It would be interesting to collect some data to see how much (if anything) we are saving by compressing the instruction into a single word (i.e. as we do currently).

If there are some savings (i.e. tramp usage is low) then maybe a 64-bit form might be a good middle ground.  A full 16 bytes, while attractive in many ways (speed/simplicity), does represent a full 4x increase in size.
Nick is working on a variable-sized LIR encoding. There is actually no hard reason to make LIns fixed-size. He reported a 2x+ code size reduction from the 16-byte variant. We should cross-link the bugs ... and we'll really get back to merging soon :)
I gave up on the 64-bit LIns middle ground after experimenting in TT.  It's slightly bigger, with slightly fewer tramps, compared to 32-bit LIns, but you still have the headaches and cost of tramps.  A variable-sized LIns that uses full pointers, or offsets from the start of a movable-but-contiguous buffer, is a better middle ground because it's only a modest increase in size (<4x, probably 2-3x in practice), but eliminates the tramp complexity.
This would be a good funnel to facilitate merging us back together. Maybe you guys can talk to nick and make sure that his design meets your requirements as well, and then we port and merge back and forth. We are mostly done with 3.5, so we should be able to take disruptive changes onto trunk in two weeks or less.
I started a bug for the variable-width LIR in TM: #492866.
FWIW, bug 492866 is pretty well advanced now.
The last piece of work to close out this bug is to update how we manage the stack area (struct AR).  Currently there are NJ_MAX_STACK_ENTRY (=256) entries we can manage, 4 bytes per entry.

Easy fix: increase NJ_MAX_STACK_ENTRY, or even make it variable.  This is okay when we have mostly scalar entries because each entry points to a different LIns.

However, when we have large LIR_allocs in the code stream, we waste space in AR.entry[].  A LIR_alloc of (say) 128 bytes consumes 128/4 = 32 entries in this array, all of which point to the same LIR_alloc.  The same thing happens to a smaller degree when there are lots of 8-byte (double or 64-bit pointer) entries: many runs of two identical LIns* in AR.entry[].

AR.entry is indexed by Reservation->arIndex, which is the scaled displacement from the frame pointer.  It's used to free AR entries and in disp() to calculate the frame pointer offset.

Harder: to compact these runs of the same LIns, we would need to change the data structure, and replace these direct indexing actions with something else.  I've spent very little time thinking about it, but it's basically a small memory allocation problem: minimize the overall frame size, track/re-use free regions, and minimize fragmentation.

note: Reservation::arIndex is 16 bits, so we still have an upper limit of 64k*4=256K stack frame size.
For the short-term we can probably get by with tweaking the constant, but if we need/want larger stacks then we'll probably require the *harder* option listed in comment #8.
Priority: -- → P3
Target Milestone: --- → flash10.1
FWIW I'm overflowing the stack regularly with large blocks with lirasm --random (see bug 519873).  E.g. on x86 it overflows if I do a block much bigger than 10,000 insns.

Increasing NJ_MAX_STACK_ENTRY from 256 to 4096 or 16384 would be really easy.  We probably don't want to go all the way to 65536, it would be nice to keep a couple of bits spare for other uses.
The cases where tamarin overflows the structure in practice are caused by LIR_allocs with large sizes.  Even with short blocks and very few live ranges, these hit the limits.  On ppc64, for example, sizeof(jmp_buf) is huge. I agree, 2^14 is probably an upper bound for the current approach.
Related to comment 8 and comment 9:  the algorithm used by arReserve() to find free slots is really awful, particularly for the more-than-8-bytes case.
Assignee: edwsmith → nobody
Crossref to Adobe Watson bug # 2498076
Severity: normal → major
Priority: P3 → P2
Assignee: nobody → stejohns
Status: NEW → ASSIGNED
bumping NJ_MAX_STACK_ENTRY to (say) 16384 does in fact address the issues at hand, but I'm unsure of the downside to simply increasing it... I'm presuming the main objection is that we have to allocate (and traverse) a much larger AR.entry array? If so, any reason this couldn't be a growable array?
The only obvious downside is that increasing NJ_MAX_STACK_ENTRY will increase sizeof(Assembler).  I'd say let's do it and see what the impact is.

Yes, AR could be made growable; however, nanojit::Allocator doesn't lend itself to realloc-style growing (old short blocks aren't reused until the whole allocator is freed).  If we don't mind using the memory all the time (in tamarin, Allocator is short-lived) then fixed size is better.

possible problems:
  - >16K alloc every time we jit, regardless of method size (yawn)
  - longer compile time for big methods because of linear scans in arReserve() (previously those methods would have failed)

the real fix of the AR data structures is probably not all that hard, but it's also not needed if we don't mind using a modest amount more memory.  If we assume LIR size and code size are correlated with stack frame size, then the growth in stack frame size is probably overshadowed by an even bigger growth in code size.
OK then, a related question: why is NJ_MAX_STACK_ENTRY specific to each architecture? (i.e., if there is no per-architecture risk then we should move it into core code; otherwise, we need to identify said risk...)

I'm guessing the explanation of the above is that we want to be able to constrain the maximum AR size for low-stack devices... in which case rejiggering the data structures to be smaller and more efficient won't really help the underlying concern.

(In reply to comment #12)
> Related to comment 8 and comment 9:  the algorithm used by arReserve() to find
> free slots is really awful, particularly for the more-than-8-bytes case.

Ow, it certainly is. But before optimizing, does profiling suggest this is a meaningful hotspot? (or just an ugly one that needs to be rewritten for the sake of elegance?)
We currently use a linear search when looking for space in AR for each definition, so compilation time is essentially quadratic in the number of definitions. Ed, the large-code-implies-large-frame argument is not always accurate, I believe. insAlloc() grabs pretty large chunks at once, so a small fragment with a single alloc can make all subsequent definitions go really slow due to the linear search. I think we should rewrite the underlying code a bit instead of just bumping the limit (on an unrelated note, the code is also completely broken for anything but downwards stack growth; I tried to use the other direction for the ESP-relative stack addressing patch).
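The cost Andreas describes can be modeled in a few lines (a toy model, not the real arReserve()): one large alloc occupies a prefix of the table, and every later single-slot reservation rescans that prefix, so total work grows quadratically with the number of definitions.

```cpp
#include <vector>
#include <cstddef>

// Count entry comparisons made by a naive first-fit scan when a big
// alloc of `allocSlots` slots precedes `numDefs` single-slot defs.
size_t comparisonsForLinearScan(size_t allocSlots, size_t numDefs) {
    std::vector<bool> used(allocSlots + numDefs, false);
    for (size_t i = 0; i < allocSlots; i++) used[i] = true;  // the big alloc

    size_t comparisons = 0;
    for (size_t d = 0; d < numDefs; d++) {
        for (size_t i = 0; i < used.size(); i++) {  // rescan from slot 0
            comparisons++;
            if (!used[i]) { used[i] = true; break; }
        }
    }
    // Total = numDefs*allocSlots + numDefs*(numDefs+1)/2
    return comparisons;
}
```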
Increases NJ_MAX_STACK_ENTRY to 16384 for x86, x64, and ppc systems.
Attachment #417407 - Flags: review?(edwsmith)
Attachment #417407 - Flags: review?(nnethercote)
Yes, NJ_MAX_STACK_ENTRY was made per-arch so that we could tune it as needed.  

Despite its inelegance I've never seen arReserve show up as hot in profiling, but I'm guessing that as we ratchet up the entry size at some point it will start to show up.

re: Andreas and upward growing stack... it used to work at some point.  We should probably fix it or remove the stack_direction() macros as the implication of having these is that it works.
(In reply to comment #18)
> We currently use a linear search when looking for space in AR for each
> definition, so thats essentially quadratic with compilation time.

true, but does it matter in actual cases? ("class AllEvil extends PrematureOptimization" and all that)

> I think we should rewrite the underlying code a 
> bit instead of just bumping the limit

These are orthogonal needs. The problem on my immediate plate is that with NJ_MAX_STACK_ENTRY=256, methods that require large amounts of temporary stack space will fail to JIT, thus falling back to the interpreter and running many times slower. The underlying code could use a rewrite, but I don't see how that would allow us to allocate more stack space than the constant allows for.
Steven, I am not opposed to bumping the number. Not at all. I am just saying the code underlying it needs to be rewritten eventually. The code already pops up in lirasm profiles as is, without even bumping the pointer.
(In reply to comment #22)
> Steven, I am not opposed to bumping the number. 

Nor am I opposed to rewriting the code :-)
it might be easy to tweak the linear scan code in arReserve() to skip N entries when the entry is a LIR_alloc; right now it strides 2 at a time and does not skip ahead.  It's a low-risk change but I don't know how much it will help.
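The skip-ahead tweak, in miniature (hypothetical types; the real table stores repeated LIns* rather than run lengths): when the scan hits an entry belonging to a multi-slot alloc, jump past the whole run instead of stepping a fixed stride.

```cpp
#include <cstdint>

struct Entry { uint32_t slots; };   // 0 = free, else run length in slots

// Find the first free slot, skipping whole allocations in one step.
int firstFree(const Entry* entries, uint32_t n) {
    uint32_t i = 0;
    while (i < n) {
        if (entries[i].slots == 0)
            return int(i);
        i += entries[i].slots;      // skip the entire LIR_alloc run
    }
    return -1;                      // table full
}
```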
Attachment #417407 - Flags: review?(edwsmith) → review+
arReserve() accounts for something like 10% of time in lirasm --random, when the block size is big.  This is largely because at startup lirasm --random allocates a 256 entry buffer from which it does all its loads and stores, so every call to arReserve() has to scan over that buffer.  So improving arReserve() would speed up lirasm --random if nothing else.  We should file a separate bug for that and assign it to Gal as he keeps complaining about it :)

Steven, 256 to 16384 is a big jump, could you do something smaller like 4096?  Why did you choose 16384?

I timed TM on SunSpider with the change and got repeatable slowdowns of 5--10ms.  I tried 4096 and got smaller (but still repeatable) slowdowns.  It may be the additional zeroing required in Assembler::reset() but I'm not sure.

Also, if this change is done on ARM it will cause problems, because the maximum offset in a lot of cases (esp. STR) is -4096..4096.
Bring it on :) I am happy to fix that one. Ed's suggestion sounds like a good start.
(In reply to comment #25)
> arReserve() accounts for something like 10% of time in lirasm --random, when

true, but on the flipside, this is about as synthetic a benchmark as we can get. Not arguing at all that it shouldn't be improved, but what are the numbers for more realistic benches, I wonder?
 
> Steven, 256 to 16384 is a big jump, could you do something smaller like 4096? 
> Why did you choose 16384?

Mainly due to earlier comments that suggested we might as well crank it that far. 4096 would probably solve most of the immediate problems I'm trying to deal with, though a few outliers might still be problematic. 

> I timed TM on SunSpider with the change and got repeatable slowdowns of
> 5--10ms.  I tried 4096 and got smaller (but still repeatable) slowdowns.  It
> may be the additional zeroing required in Assembler::reset() but I'm not sure.

Side note: given the larger size, perhaps VMPI_memset(0) would be more appropriate for clearing the entry table in reset().
 
> Also, if this change is done on ARM it will cause because the maximum offset 

Yep, ARM *definitely* can't go that high.
Revised version of the previous patch; in addition to expanding the size, it revises AR to keep track of the active range at any time, so we don't have to search the entire range of entries (so in theory, functions don't pay a perf cost for the larger size of entries unless they actually use it).

Note that this doesn't attempt to address the fundamentally stupid algorithm being used.

Note also that I haven't done before-and-after timing for this patch, so the speedup (or lack thereof) is purely theoretical... posting for feedback on the approach first.
Attachment #417407 - Attachment is obsolete: true
Attachment #417844 - Flags: review?(nnethercote)
Attachment #417407 - Flags: review?(nnethercote)

(In reply to comment #29)
> Created an attachment (id=417844) [details]
> Increase NJ_MAX_STACK_ENTRY on desktop systems, revise search algo
> 
> Revised version of the previous patch; in addition to expanding the size, it
> revises AR to keep track of the active range at any time, so we don't have to
> search the entire range of entries (so in theory, functions don't pay a perf
> cost for the larger size of entries unless they actually use it).

Do we save searching the entries, or clearing the entries, or both?  I'm more interested in the latter, as it is probably the source of the slow-down in the case where the extra stack space isn't being used.

I think I see what you're trying to do, but the code changes are much more complicated than I would have expected.  Are you trying to do too much in a single patch?  Seems like you just want an extra 'limit' field in AR, and when searching you'd search from 0 to 'limit', and when clearing you'd just reset 'limit' to 0 (or maybe 1).

> Note also that I haven't done before-and-after timing for this patch, so the
> speedup (or lack thereof) is purely theoretical... posting for feedback on the
> approach first.

Numbers would make it more convincing :)
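The 'limit' field idea suggested above, in miniature (hypothetical names, not the actual patch): track the highest entry handed out, search only up to it, and clear only that region on reset.

```cpp
#include <cstdint>
#include <cstring>

const uint32_t kMaxEntries = 16384;

struct MiniAR {
    const void* entries[kMaxEntries];
    uint32_t    limit;                 // highest index handed out so far

    MiniAR() : limit(0) { std::memset(entries, 0, sizeof(entries)); }

    // Scan only the active region; entries above `limit` are known free.
    uint32_t reserve(const void* ins) {
        for (uint32_t i = 1; i <= limit; i++) {
            if (entries[i] == 0) { entries[i] = ins; return i; }
        }
        if (limit + 1 >= kMaxEntries) return 0;   // frame full
        entries[++limit] = ins;
        return limit;
    }

    // Clearing touches only the used prefix, not all 16384 entries.
    void clear() {
        std::memset(entries, 0, (limit + 1) * sizeof(entries[0]));
        limit = 0;
    }
};
```

This is why unused extra capacity shouldn't cost anything per compile: both the scan and the clear are bounded by the frame's actual usage, not by NJ_MAX_STACK_ENTRY.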
(In reply to comment #30)

> Do we save searching the entries, or clearing the entries, or both?  I'm more
> interested in the latter, as it is probably the source of the slow-down in the
> case where the extra stack space isn't being used.

both.
 
> I think I see what you're trying to do, but the code changes are much more
> complicated than I would have expected.  Are you trying to do too much in a
> single patch? 

Probably, tried to encapsulate everything too.

> Seems like you just want an extra 'limit' field in AR, and when
> searching you'd search from 0 to 'limit', and when clearing you'd just reset
> 'limit' to 0 (or maybe 1).

okey dokey
(In reply to comment #30)

> Numbers would make it more convincing :)

absolutely. any particular benches you'd expect to be more relevant than others?
(In reply to comment #32)
> (In reply to comment #30)
> 
> > Numbers would make it more convincing :)
> 
> absolutely. any particular benches you'd expect to be more relevant than
> others?

TraceMonkey on SunSpider is my main concern :)  I can test that when the time comes.

You could just add some counters into arReserve that count how many entries get scanned/cleared.  So even if times aren't affected on the usual benchmarks you know that how much less scanning/clearing you're doing.
(In reply to comment #30)

> I think I see what you're trying to do, but the code changes are much more
> complicated than I would have expected.  Are you trying to do too much in a
> single patch?  Seems like you just want an extra 'limit' field in AR, and when
> searching you'd search from 0 to 'limit', and when clearing you'd just reset
> 'limit' to 0 (or maybe 1).

one thing to note is that the current "tos" is misleading... it's commented as "current top of stack entry", but it's used as "high water mark" in the various backends.
Attached patch Encapsulate AR structure (obsolete) — Splinter Review
Trying to split the previous patch into smaller bits... this doesn't change any algorithms or constants, merely encapsulates AR access, such that the external world no longer gets random access, but rather just reserves, frees, or iterates. (Quite possibly treading on the too-much-cleverness line, as I'm not sure that AR warrants full encapsulation, but it arguably will make it much easier to safely review the changes for limiting scanning, and almost certainly will make bug 534797 easier to do...)
Attachment #417844 - Attachment is obsolete: true
Attachment #418009 - Flags: review?(nnethercote)
Attachment #417844 - Flags: review?(nnethercote)
Additive to the previous patch: enlarge the maximum size for x86/x64/ppc, and enhance the algorithm to only scan the areas in use.

Passes lirasm --random 100000 on MacTel32, MacTel64.

With both patches applied, SunSpider reports 1.010x as fast for MacTel32 and 1.009x as fast for MacTel64 (both nondebug builds, of course).
Attachment #418036 - Flags: review?(nnethercote)
Attachment #418009 - Flags: review?(edwsmith)
Attachment #418036 - Flags: review?(edwsmith)
Same as previous patch, but with less crashy version of AR::Iter::next method
Attachment #418036 - Attachment is obsolete: true
Attachment #418068 - Flags: review?(nnethercote)
Attachment #418036 - Flags: review?(nnethercote)
Attachment #418036 - Flags: review?(edwsmith)
Attachment #418068 - Flags: review?(edwsmith)
Comment on attachment 418009 [details] [diff] [review]
Encapsulate AR structure

I like the encapsulation of AR, it hides some of the gnarly details.


>+    bool AR::Iter::next(LIns*& ins, uint32_t& size, int32_t& offset) 
>+    { 
>+        while (_i <= _ar._highWaterMark)
>+        {
>+            if ((ins = _ar._entries[++_i]) != NULL)
>+            {   
>+                size = sizeFor(ins);
>+                _i += size - 1;
>+                offset = _i;
>+                return true;
>+            }
>+        }
>+        ins = NULL;
>+        offset = 0;
>+        return false;
>+    }

I like the iterator.  But I found the argument names 'size' and 'offset'
non-obvious, particularly in terms of their units.  Some suggestions: 'size'
could be 'nStackSlots', 'sizeFor' could be 'nStackSlotsFor'.  And we have
arIndex() elsewhere, so 'arIndex' for 'offset'?


>+    void AR::checkForResourceConsistency(const RegAlloc& regs) const
>+    {
>+        for (uint32_t i = 1; i <= _highWaterMark; ++i)
>+        {
>+            LIns* ins = _entries[i];
>+            if (!ins)
>+                continue;
>+            Register r = ins->getReg();
>+            uint32_t arIndex = ins->getArIndex();
>+            if (arIndex != 0) {

I think this condition must always be true (it's one of the invariants) and
so should be changed to an assertion.


>+                if (ins->isop(LIR_alloc)) {
>+                    int j=i+1;
>+                    for (int n = i + (ins->size()>>2); j < n; j++) {
>+                        NanoAssert(_entries[j]==ins); // @todo, should this be _entries[n]?

No, _entries[j] is correct, but the loop initialisation is very confusing --
it initialises the loop variable, j, outside the loop, but initialises the
loop limit, n, inside the loop.  Swap those two and it'll be clearer.


>+                    }
>+                    NanoAssert(arIndex == (uint32_t)j-1);
>+                    i = j-1;
>+                }
>+                else if (ins->isQuad()) {
>+                    NanoAssert(_entries[i - stack_direction(1)]==ins);
>+                    i += 1; // skip high word

Having stack_direction() here but not in the LIR_alloc case is odd.
stack_direction is probably broken, as you've said elsewhere.  Maybe leave
it as is and file a separate bug to fix it.


> 
>@@ -468,8 +492,9 @@
>         if (!ins->isUsed())
>             ins->markAsUsed();
>         if (!ins->getArIndex()) {
>-            ins->setArIndex(arReserve(ins));
>-            NanoAssert(ins->getArIndex() <= _activation.tos);
>+            uint32_t const idx = arReserve(ins);
>+            ins->setArIndex(idx);
>+            NanoAssert(_activation.isValidEntry(ins->getArIndex(), ins) == (idx != 0));
>         }
>         return disp(ins);
>     }

As above, 'arIndex' is probably a better name than 'idx'.

Also, I find the final assertion too clever for its own good.  I find this
clearer:

  uint32_t const arIndex = arReserve(ins);
  if (arIndex != 0) {
      ins->setArIndex(arIndex);
      NanoAssert(_activation.isValidEntry(ins->getArIndex(), ins));
  }


>+        
>+        LIns* ins = 0;
>+        uint32_t size = 0;
>+        int32_t offset = 0;

>+        for (AR::Iter iter(_activation); iter.next(ins, size, offset); )
>+        {
>+            const char* n = _thisfrag->lirbuf->names->formatRef(ins);
>+            if (size > 2) {
>+                VMPI_sprintf(s," %d-%d(%s)", 4*offset, 4*(offset+size-1), n);
>+            }
>+            else if (size == 2) {
>+                VMPI_sprintf(s," %d+(%s)", 4*offset, n);
>+            }
>+            else {
>+                VMPI_sprintf(s," %d(%s)", 4*offset, n);
>             }
>             s += VMPI_strlen(s);
>         }

Change the for-loop to a while-loop?  I don't much like for loops that don't
follow the standard pattern.

I'd suggest folding the size>2 and size==2 cases together.  And again,
'offset' -> 'arIndex'.


>@@ -1641,29 +1646,28 @@
>     }
>+    inline bool AR::isEmptyRange(uint32_t count, uint32_t start) const

Would isEmptyRange(start, count) be more intuitive?  It is to me.


>-        if (i >= (int32_t)ar.tos) {
>-            ar.tos = i+1;
>+        if (i >= _highWaterMark) {
>+            _highWaterMark = i+1;
>         }
>-        if (tos+size >= NJ_MAX_STACK_ENTRY) {
>-            setError(StackFull);
>+        if (_highWaterMark+size >= NJ_MAX_STACK_ENTRY) {
>+            return 0;
>         }

Careful!  ar.tos and tos were distinct values.  You've conflated them into a
single value, _highWaterMark.  You'll need to add another variable
oldHighWaterMark.


>+    void Assembler::arFree(LIns* l)
>+    {
>+        uint32_t arIndex = l->getArIndex();
>+        if (arIndex) {
>+            NanoAssert(_activation.isValidEntry(arIndex, l));
>+            _activation.freeEntryAt(arIndex);        // free any stack stack space associated with entry
>+        }
>+    }

I don't like functions called "doSomething" that only do something under
certain conditions.  "arMaybeFree" or "arFreeIfUsed" or something like that
would be better.


>+    inline /*static*/ uint32_t AR::sizeFor(LIns* l)
>+    {
>+        return l->isop(LIR_alloc) ? (l->size()>>2) : (l->isQuad() ? 2 : 1);
>+    }

I've been moving things towards consistently using 'ins' for LIR instructions.  'l' is a poor name for a variable because it looks too similar to '1'.


>@@ -258,7 +296,7 @@
>             NIns*       genEpilogue();
> 
>             uint32_t    arReserve(LIns* l);
>+            void        arFree(LIns* l);

'l' -> 'ins', again.
I think you've attached the wrong patch for "expand maximum size".  Can you re-attach?
For what it's worth, MMgc assumes that the stack (seen as a sequence of frames) grows from higher addresses to lower addresses, and will need fixing if that turns out not to be true.  Stack overflow checking in the VM is the same.
Attachment #418009 - Flags: review?(edwsmith)
Comment on attachment 418068 [details] [diff] [review]
Expand maximum size & optimize algo, v2

oops, this is the mops range check filter patch (right patch, wrong bug, or vice versa)
Attachment #418068 - Flags: review?(edwsmith) → review-
Not to invalidate the work that Steven is doing to clean-up the existing code, but if bug 534797 is going to address the arReserve algo issue (assuming it is being actively worked on), is it worth overhauling the code to this extent at this point?  

Or does bumping the limit to 4K (which I assume solves the immediate problem) cause too much of a slowdown?
Steven's patch is a quick easy win, and when it's ready to land it won't hurt compilation time because it will be pay-as-you-go (only used entries get initialized or scanned).

Andreas's patch will be pay-less-as-you-go :-).

in my opinion, a lesson from the last two years of nanojit work is to clean up early and often. True, we risk some churn if both patches land in quick succession, but it's worth it.
(In reply to comment #38)
 
> I like the iterator.  But I found the argument names 'size' and 'offset'
> non-obvious, particular in terms of their units.  Some suggestions: 'size'
> could be 'nStackSlots', 'sizeFor' could be 'nStackSlotsFor'.  And we have
> arIndex() elsewhere, so 'arIndex' for 'offset'?

Done.
 
> I think this condition must always be true (it's one of the invariants) and
> so should be changed to an assertion.

Done.
 

> No, _entries[j] is correct, but the loop initialisation is very confusing --
> it initialises the loop variable, j, outside the loop, but initialises the
> loop limit, n, inside the loop.  Swap those two and it'll be clearer.

Done.
 

> stack_direction is probably broken, as you've said elsewhere.  Maybe leave
> it as is and final a separate bug to fix it.

https://bugzilla.mozilla.org/show_bug.cgi?id=535044
 

> Also, I find the final assertion too clever for it's own good.  I find this
> clearer:

I disagree with you on this one: (1) the assertion is well-defined and unambiguous, (2) your suggested replacement isn't equivalent, as it only checks for validity if arReserve succeeded (vs. also checking if it failed)

> Change the for-loop to a while-loop?  I don't much like for loops that don't
> follow the standard pattern.

Eh, I'm not religious either way; this is a pattern we use in several other places in Tamarin, though afaik it hasn't shown up in NJ. IMHO this usage has a (slight) advantage over a while loop in that it keeps the lifetime scope of the iterator to the loop itself, whereas a while loop would require extra {} goodness to do the same... so unless this is a major stylistic objection I'm inclined to leave it.

> I'd suggest folding the size>2 and size==2 cases together.  And again,
> 'offset' -> 'arIndex'.

Done

> Would isEmptyRange(start, count) be more intuitive?  It is to me.

Ditto, this was just an effort to minimize code churn by keeping the previous pattern... but let's change it.

> Careful!  ar.tos and tos were distinct values.  You've conflated them into a
> single value, _highWaterMark.  You'll need to add another variable
> oldHighWaterMark.

this is true, but looking at it now, I wonder if the old code was wrong... since stack growth is downward, is "oldHighWaterMark+nStackSlots" meaningful?
 
> I don't like functions called "doSomething" that only do something under
> certain conditions.  "arMaybeFree" or "arFreeIfUsed" or something like that
> would be better.

maybe, but scope creep alert is in effect, since this patch is preserving existing name & semantics of arFree.
 
> I've been moving things towards consistently using 'ins' for LIR instructions. 
> 'l' is a poor name for a variable because it looks too similar to '1'.

Agree 100%; kept "l" solely to reduce change churn. I'll change.

(In reply to comment #39)
> I think you've attached the wrong patch for "expand maximum size".  Can you
> re-attach?

oops.

(In reply to comment #43)
> Andreas's patch will be pay-less-as-you-go :-).

and also, Andreas's patch doesn't exist yet, so there's nothing for me to conflict with. :-)
Attached patch Encapsulate AR structure, v2 (obsolete) — Splinter Review
Tweaked version with Nick's suggestions incorporated
Attachment #418009 - Attachment is obsolete: true
Attachment #418219 - Flags: review?(nnethercote)
Attachment #418009 - Flags: review?(nnethercote)
Attachment #418219 - Flags: review?(edwsmith)
Correct version of the previous patch.
Attachment #418068 - Attachment is obsolete: true
Attachment #418223 - Flags: review?(nnethercote)
Attachment #418068 - Flags: review?(nnethercote)
Attachment #418223 - Flags: review?(edwsmith)
Comment on attachment 418219 [details] [diff] [review]
Encapsulate AR structure, v2

(In reply to comment #44)
> 
> > Careful!  ar.tos and tos were distinct values.  You've conflated them into a
> > single value, _highWaterMark.  You'll need to add another variable
> > oldHighWaterMark.
> 
> this is true, but looking at it now, I wonder if the old code was wrong...
> since stack growth is downward, is "oldHighWaterMark+nStackSlots" meaningful?

Hmm, not sure.  The current code certainly isn't obvious... how about you add a boolean with a name like 'success', initialise it to false, and then set it true if a slot is found (ie. before each of the 'break' statements).  Then replace the oldHighWaterMark test with 'if (!success) return 0;'.  I'd find that much easier to understand.


> > I don't like functions called "doSomething" that only do something under
> > certain conditions.  "arMaybeFree" or "arFreeIfUsed" or something like that
> > would be better.
> 
> maybe, but scope creep alert is in effect, since this patch is preserving
> existing name & semantics of arFree.

No it isn't!   arFree() has completely changed, it doesn't even have the same signature any more.  Please change.


BTW, although I suggested 'nStackSlots' I see that "Entry" is used a lot, so maybe 'nEntries' would be better, if you like.


Thanks for the other changes.  r=me with the first two changes above made, no need for a re-review.
Attachment #418219 - Flags: review?(nnethercote) → review+
Attachment #418223 - Flags: review?(nnethercote) → review-
Comment on attachment 418223 [details] [diff] [review]
Expand maximum size & optimize algo, v2a

I like the patch's aim, but I don't like some of the details.


>@@ -1656,61 +1690,75 @@
>     uint32_t AR::reserveEntry(LIns* ins)
>     {
>         uint32_t const nStackSlots = nStackSlotsFor(ins);
>-        uint32_t start = 1;
>-        uint32_t i = 0;
> 
>         if (nStackSlots == 1) {
>             // easy most common case -- find a hole, or make the frame bigger
>+            for (uint32_t i=1; i <= _curActive; i++) {
>                 if (_entries[i] == 0) {
>                     // found a hole
>                     _entries[i] = ins;
>-                    break;
>+                    return i;
>                 }
>             }
>+
>+            uint32_t const spaceLeft = NJ_MAX_STACK_ENTRY - _curActive - 1;
>+            if (spaceLeft >= 1)
>+            {
>+                 NanoAssert(_entries[_curActive+1] == BAD_ENTRY);
>+                _entries[++_curActive] = ins;
>+                if (_highWaterMark < _curActive) 
>+                    _highWaterMark = _curActive;
>+                return _curActive;
>+             }

I don't like that this is now in two steps, and the second is just repeating the first.  IMO this loop is the one place where we should use NJ_MAX_STACK_ENTRY instead of _curActive/_highWaterMark as the loop limit.  Then the duplication could be removed and it would be more like the old version.


>+        uint32_t        _curActive;                     /* index of highest currently-in-use entry */
>         uint32_t        _highWaterMark;                 /* index of highest entry used since last clear() */
>         LIns*           _entries[ NJ_MAX_STACK_ENTRY ]; /* maps to 4B contiguous locations relative to the frame pointer.

It took me a while to work out the difference between _curActive and _highWaterMark.  AIUI they will be equal unless the highest entries are freed, in which case _curActive will go down but _highWaterMark will be the same?

I'd prefer to remove _curActive and just use _highWaterMark throughout.  You may end up iterating over slightly more entries, but I suspect the number will be very small (ie. the average difference between _curActive and _highWaterMark will be *much* less than the difference between _highWaterMark and NJ_MAX_STACK_ENTRY), and the code will be much simpler.


You've chosen 16384 for the stack size, I'll ask again:  would 4096 suffice for your current purposes?  If so, it doesn't seem sensible to make it four times bigger than needed.


(BTW, I think it's generally preferred to have one patch per bug.  Doesn't matter for now but something for the future.)
Attachment #418223 - Flags: review?(edwsmith)
Attachment #418219 - Flags: review?(edwsmith) → review+
(In reply to comment #48)
> (BTW, I think it's generally preferred to have one patch per bug.  Doesn't
> matter for now but something for the future.)

Perhaps, but IMHO it's better to have multiple patches if it means easier reviewability. Ease-of-review trumps number-of-patches.
(In reply to comment #47)

> The current code certainly isn't obvious... how about you add a
> boolean with a name like 'success', initialise it to false, and then set it
> true if a slot is found (ie. before each of the 'break' statements).  Then
> replace the oldHighWaterMark test with 'if (!success) return 0;'.  I'd find
> that much easier to understand.

Done.
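A hypothetical illustration of the 'success' flag cleanup, with a stand-in search loop (the real code has several such loops, one per break):

```cpp
#include <cstdint>

// Instead of inferring failure from an unchanged high-water mark, record
// the outcome explicitly in a boolean.
uint32_t reserveSketch(void* entries[], uint32_t n,
                       uint32_t& highWaterMark, void* ins) {
    bool success = false;
    uint32_t slot = 0;
    for (uint32_t i = 1; i < n; i++) {
        if (entries[i] == nullptr) {
            entries[i] = ins;
            slot = i;
            success = true;   // set before each 'break'
            break;
        }
    }
    if (!success)
        return 0;             // replaces the oldHighWaterMark test
    if (highWaterMark < slot)
        highWaterMark = slot;
    return slot;
}

bool demo() {
    void* e[4] = {nullptr, nullptr, nullptr, nullptr};
    uint32_t hwm = 0;
    void* ins = &hwm;   // dummy instruction pointer
    if (reserveSketch(e, 4, hwm, ins) != 1) return false;
    if (reserveSketch(e, 4, hwm, ins) != 2) return false;
    if (reserveSketch(e, 4, hwm, ins) != 3) return false;
    if (reserveSketch(e, 4, hwm, ins) != 0) return false;  // full: fails
    return hwm == 3;
}
```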
 
> No it isn't!   arFree() has completely changed, it doesn't even have the same
> signature any more.  Please change.

Doh! You're right. Busted.

> BTW, although I suggested 'nStackSlots' I see that "Entry" is used a lot, so
> maybe 'nEntries' would be better, if you like.

'stackslots' is more descriptive IMHO.
Attached patch Encapsulate AR structure, v3 (obsolete) — Splinter Review
Re-requesting review because the minor cleanup revealed a subtle bug: reserveEntry was updating _highWaterMark as though it were a count rather than an index. Fixed the logic in arReserve that did this, then changed highWaterMark()->stackSpaceNeeded() to return a count rather than an index.
Attachment #418219 - Attachment is obsolete: true
Attachment #418412 - Flags: review?(nnethercote)
Attachment #418412 - Attachment is patch: true
Attachment #418412 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #48)
> I'd prefer to remove _curActive and just use _highWaterMark throughout.

Hmm... I think you are probably correct. Let me re-work it (and re-run the perf numbers).

> You've chosen 16384 for the stack size, I'll ask again:  would 4096 suffice for
> your current purposes?  If so, it doesn't seem sensible to make it four times
> bigger than needed.

The underlying issue I'm trying to deal with is that existing SWF compilers tend to generate large vector initializers in a naive way, which requires a lot of stack space. It's pretty easy to run out of space with the current limit of 256 (in which case we don't jit the function and interpret it instead, which can be dramatically slower). Of course, no matter what we raise the constant to, it's still gonna be possible to make a testcase that won't be handled, but the most reasonable short-term fix (IMHO) is to choose the largest size that's likely to be safe & sensible for a given architecture and make that the max.

Earlier discussion in this bug had suggested 16384 as a maximum, so that's what I went with. I could be persuaded to go with 4096, but I'm a little concerned we'll find some pathological case that isn't handled by it.

Edwin/Rick, you want to weigh in here with an opinion?

(Obviously the longterm fix is to correct the SWF compilers to generate more sensible code for initializers, but we're still stuck with supporting existing content without dramatically slowing down.)
> (Obviously the longterm fix is to correct the SWF compilers to generate more
> sensible code for initializers

Has a defect report been filed on this?  I will file and/or fix it.
(In reply to comment #53)
> Has a defect report been filed on this?  I will file and/or fix it.

Not that I know of. Make it so!
(In reply to comment #48)
> (From update of attachment 418223 [details] [diff] [review])
> I don't like that this is now in two steps, and the second is just repeating
> the first.  IMO this loop is the one place where we should use
> NJ_MAX_STACK_ENTRY instead of _curActive/_highWaterMark as the loop limit. 
> Then the duplication could be removed and it would be more like the old
> version.

Not sure I agree here... the idea of structuring it this way is that we don't have to zero the whole array every time we call reset, since everything > highWaterMark is defined to be available. We could go back to zeroing the whole array, in which case the approach you mention would work just fine, but we'd be paying a larger zeroing price on every reset for the soon-to-be-larger array.
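The two-step scheme and its interaction with clear() can be sketched like this (names approximate the patch; the point is that clear() only touches the prefix up to the high-water mark, so slots above it are free but never zeroed):

```cpp
#include <cstdint>

enum { MAX_ENTRY = 4096 };

struct AR {
    uint32_t highWaterMark;
    void*    entries[MAX_ENTRY];

    AR() : highWaterMark(0) { entries[0] = nullptr; }

    void clear() {
        // Only the prefix that was ever used needs zeroing; with a
        // 4K-16K array this is the cost the two-step reserve avoids.
        for (uint32_t i = 0; i <= highWaterMark; i++)
            entries[i] = nullptr;
        highWaterMark = 0;
    }

    uint32_t reserve(void* ins) {
        // step 1: search only the in-use prefix for a hole
        for (uint32_t i = 1; i <= highWaterMark; i++)
            if (entries[i] == nullptr) { entries[i] = ins; return i; }
        // step 2: grow past the high-water mark; slots there are free
        // by definition, so they need never have been zeroed
        if (highWaterMark + 1 < MAX_ENTRY) {
            entries[++highWaterMark] = ins;
            return highWaterMark;
        }
        return 0;
    }
};

bool demo() {
    AR ar;
    if (ar.reserve(&ar) != 1) return false;   // step 2: grow
    if (ar.reserve(&ar) != 2) return false;
    ar.entries[2] = nullptr;                  // free the top slot
    if (ar.reserve(&ar) != 2) return false;   // step 1: hole found
    ar.clear();
    if (ar.highWaterMark != 0) return false;
    return ar.reserve(&ar) == 1;              // starts over after clear
}
```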
Revised version with some (but not all) of Nick's suggestions. _curActive is gone, but I'm sticking with my two-step-reserve strategy for now (see previous comment for explanation).
Attachment #418223 - Attachment is obsolete: true
Attachment #418444 - Flags: review?(nnethercote)
Attachment #418444 - Flags: review?(edwsmith)
Attachment #418444 - Attachment is patch: true
Attachment #418444 - Attachment mime type: application/octet-stream → text/plain
Attachment #418412 - Flags: review?(nnethercote) → review+
Attachment #418444 - Flags: review?(nnethercote) → review+
Comment on attachment 418412 [details] [diff] [review]
Encapsulate AR structure, v3

http://hg.mozilla.org/projects/nanojit-central/rev/a2bdfa990384
Attachment #418412 - Attachment is obsolete: true
Comment on attachment 418444 [details] [diff] [review]
Expand maximum size & optimize algo, v3

http://hg.mozilla.org/projects/nanojit-central/rev/0a12bb3f8436

(with verbal r+ from edwin)
Attachment #418444 - Attachment is obsolete: true
Attachment #418444 - Flags: review?(edwsmith)
http://hg.mozilla.org/tamarin-redux/rev/d1bb609bb3fe
Whiteboard: fixed-in-tamarin
Attached patch Speed up AR::validate (obsolete) — Splinter Review
Nick observes that AR::validate is unduly slow in debug builds. This attempts to mitigate that by doing a full validate only every 100th call and a lighter check the rest of the time... and only calling it in checkForResourceConsistency.
Attachment #418770 - Flags: review?(nnethercote)
Comment on attachment 418770 [details] [diff] [review]
Speed up AR::validate

>+    void AR::validate()
>+    {
>+        if (++_validateCounter >= VALIDATE_FULL_RATE)
>+        {
>+            validateFull();
>+            _validateCounter= 0;
>+        }
>+        else
>+        {
>+            validateQuick();
>+        }
>+    }

I would make _validateCounter a static local variable in this method and just do a full check every Nth time regardless of calls to AR::clear().  It avoids spreading the complexity of the every-Nth-time checking all around.
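A sketch of that suggestion, with stand-ins for the two check routines (counters added here only so the rate-limiting is observable):

```cpp
#include <cstdint>

static int fullChecks = 0, quickChecks = 0;
static void validateFull()  { ++fullChecks;  }   // stand-in for the real check
static void validateQuick() { ++quickChecks; }   // stand-in for the real check

static const uint32_t VALIDATE_FULL_RATE = 100;

void validate() {
    // Static local: survives across calls, so no member variable and no
    // interaction with AR::clear() -- every Nth call gets the full check.
    static uint32_t counter = 0;
    if (++counter >= VALIDATE_FULL_RATE) {
        validateFull();
        counter = 0;
    } else {
        validateQuick();
    }
}

bool demo() {
    for (int i = 0; i < 250; i++) validate();
    // full check fires on calls 100 and 200; the other 248 are quick
    return fullChecks == 2 && quickChecks == 248;
}
```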


>diff -r b60bbeac9080 nanojit/Assembler.h
>--- a/nanojit/Assembler.h	Mon Dec 21 16:14:30 2009 -0800
>+++ b/nanojit/Assembler.h	Mon Dec 21 18:54:16 2009 -0800
>@@ -104,9 +104,13 @@
>     {
>     private:
>         uint32_t        _highWaterMark;                 /* index of highest entry used since last clear() */
>+        #ifdef _DEBUG
>+        uint32_t        _validateCounter;               /* only do full validation when this reaches VALIDATE_FULL_RATE */
>+        #endif

I think this would be better written using the debug_only() macro.  But if _validateCounter is made local it won't be relevant.


>         LIns*           _entries[ NJ_MAX_STACK_ENTRY ]; /* maps to 4B contiguous locations relative to the frame pointer.
>                                                             NB: _entries[0] is always unused */
>         #ifdef _DEBUG
>+        enum { VALIDATE_FULL_RATE = 100 };              /* do full validation once per this-many calls to validate */
>         static LIns* const BAD_ENTRY;
>         #endif

Why is this an enum?  Although, again, if _validateCounter is made local it won't be relevant.
Attached patch Speed up AR::validate, v2 (obsolete) — Splinter Review
Changes suggested by Nick. If you like it, feel free to go ahead and push.
Attachment #418944 - Flags: review?(nnethercote)
Comment on attachment 418944 [details] [diff] [review]
Speed up AR::validate, v2

Looks good.

Semi-related nit: I noticed in the AR::freeEntryAt() patch that you're using both NULL and 0 for _entries[0].  You might like to fix that up too...
Attachment #418944 - Flags: review?(nnethercote) → review+
BTW, would you mind pushing?  Thanks.
Attachment #418770 - Attachment is obsolete: true
Attachment #418770 - Flags: review?(nnethercote)
Comment on attachment 418944 [details] [diff] [review]
Speed up AR::validate, v2

nj-c: http://hg.mozilla.org/projects/nanojit-central/rev/0ff411e99654
Attachment #418944 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/f3f879abe16b
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
QA Contact: nanojit → dschaffe
QA Contact: dschaffe → brbaker
Verifying the bug fix via bug 493949, which is a duplicate of this bug but had known testcases that would fail to jit. All of those testcases are now jitting in tamarin. Marking verified now that the source has been submitted to all necessary repos.
Status: RESOLVED → VERIFIED
Flags: in-testsuite+