Closed
Bug 556175
Opened 15 years ago
Closed 15 years ago
Raise NJ_MAX_STACK_ENTRY on MIPS and ARM, or using a single constant everywhere
Categories
(Core Graveyard :: Nanojit, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Future
People
(Reporter: edwsmith, Unassigned)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)
Attachments
(1 file)
1.10 KB,
patch
|
jbramley
:
review+
|
Details | Diff | Splinter Review |
ARM and MIPS set NJ_MAX_STACK_ENTRY to 256, while other platforms are 4096 except SPARC, which is 8192. The actual stack size used in a give function depends on the code anyway, and dynamic behavior ultimately determines stack depth. A too-low limit saves a very modest amount of memory and time during compile time, and can cause the JIT to fail to compile certian functions.
If there is consensus, I'd also propose we just define one constant for nanojit as a whole (8192) and do away with this per-platform tweak.
If we have evidence that the small constants save compile time materially, lets list that evidence here and then get it into code comments.
Reporter | ||
Updated•15 years ago
|
Target Milestone: --- → Future
Reporter | ||
Comment 1•15 years ago
|
||
(In reply to comment #0)
> If there is consensus, I'd also propose we just define one constant for nanojit
> as a whole (8192) and do away with this per-platform tweak.
IIRC, 4096 represented a size/speed compromise so if we keep a per-platform constant, values other than 4096 should probably be backed with some kind of evidence.
Comment 2•15 years ago
|
||
The size/speed compromise will be different on each platform. I don't know why ARM and MIPS get 256, but it would be interesting to see the effect that increasing it to 4096 has. If it doesn't harm performance significantly, I don't see why it can't be the default for all platforms.
Reporter | ||
Comment 3•15 years ago
|
||
Attachment #437599 -
Flags: review?(chris)
Reporter | ||
Updated•15 years ago
|
Attachment #437599 -
Flags: review?(Jacob.Bramley)
Comment 4•15 years ago
|
||
Comment on attachment 437599 [details] [diff] [review]
Raises ARM and MIPS NJ_MAX_STACK_ENTRY to 4096
Looks right to me. Out of interest, did you get a chance to benchmark it?
Attachment #437599 -
Flags: review?(Jacob.Bramley) → review+
Reporter | ||
Comment 5•15 years ago
|
||
Comment on attachment 437599 [details] [diff] [review]
Raises ARM and MIPS NJ_MAX_STACK_ENTRY to 4096
I haven't benchmarked it yet, but what I expect is:
- no decrease in compiler speed; any slowdown would be due to scanning more stack entries during arm code generation
- possible speedup in execution speed, *if* the higher limit allows more code to be jit-compiled than before.
Reporter | ||
Comment 6•15 years ago
|
||
I inspected the Assembler and confirmed that in non-debug builds, we don't do anything that is O(NJ_MAX_STACK_ENTRY); loops are all bounded by actual stack usage, not the fixed size of the _entries[] array. Testing with -Dverifyonly (jit stress test) showed no compile-speed regressions on ARM.
Reporter | ||
Comment 7•15 years ago
|
||
Whiteboard: fixed-in-nanojit
Reporter | ||
Comment 8•15 years ago
|
||
Turns out, asm_spill() in the ARM backend does not support higher than 12-bit displacements for addressing relative to FP. So, NJ_MAX_STACK_ENTRY is limited to 1024 for the time being.
A quick look at MIPS shows it supports 16-bit displacements for general load/store, and also handles larger displacements in asm_spill, so it should be fine at 4096.
Comment 9•15 years ago
|
||
(In reply to comment #8)
> Turns out, asm_spill() in the ARM backend does not support higher than 12-bit
> displacements for addressing relative to FP. So, NJ_MAX_STACK_ENTRY is limited
> to 1024 for the time being.
Unless FP is set well away from the stack entries we're interested in, you can address all 4KB using those 12 bits. The sign bit is separate, so you have a full ±4095 offset range. It's more limited for VFP but the asm_spill implementation does a separate ADD if necessary. You can't address FP±4096 as that requires 13 bits, but FP±4095 (or below) should be fine.
Also, if you think it'll noticeably improve performance, it wouldn't be too hard to modify asm_spill to accept larger offsets. Do you know how MIPS fares with a larger stack area?
Comment 10•15 years ago
|
||
> Unless FP is set well away
By "well away" I mean "just below the bottom (or above the top) of the stack" :-)
Reporter | ||
Comment 11•15 years ago
|
||
(In reply to comment #9)
> (In reply to comment #8)
> > Turns out, asm_spill() in the ARM backend does not support higher than 12-bit
> > displacements for addressing relative to FP. So, NJ_MAX_STACK_ENTRY is limited
> > to 1024 for the time being.
>
> Unless FP is set well away from the stack entries we're interested in, you can
> address all 4KB using those 12 bits. The sign bit is separate, so you have a
> full ±4095 offset range. It's more limited for VFP but the asm_spill
> implementation does a separate ADD if necessary. You can't address FP±4096 as
> that requires 13 bits, but FP±4095 (or below) should be fine.
Agreed. if NJ_MAX_STACK_ENTRY is 1024, it represents 1024 4-byte words, or 4096 bytes. FP should be exactly adjacent to the 4kbyte area; arDisp() calculates offsets from FP as (-4*arIndex), where arIndex is in the range (0, NJ_MAX_STACK_ENTRY-1).
> Also, if you think it'll noticeably improve performance, it wouldn't be too
> hard to modify asm_spill to accept larger offsets. Do you know how MIPS fares
> with a larger stack area?
Agreed, I'd like to fix asm_spill. Better performance with a higher stack limit only occurs if a method requires a larger stack frame, and gets JIT'd instead of being interpreted due to hitting the stack size limit.
My reading of the MIPS code is that it supports 16-bit (signed) general load/store offsets, and MIPS asm_spill() uses a general load/store helper function that supports larger offsets as well by emitting an int literal and add.
Comment 12•15 years ago
|
||
Oh, sorry. I forgot that MJ_MAX_STACK_ENTRY measures words, not bytes!
I assume that we can just copy the VFP mechanism for ARM's integer code too. There, we call asm_add_imm before doing the load. I don't know how it will interact with the load-store-multiple code in the integer case, but it shouldn't be difficult to do. (Let me know if you want me to do the work; I'm happy to do it.)
Reporter | ||
Comment 13•15 years ago
|
||
(In reply to comment #12)
> Oh, sorry. I forgot that MJ_MAX_STACK_ENTRY measures words, not bytes!
>
> I assume that we can just copy the VFP mechanism for ARM's integer code too.
> There, we call asm_add_imm before doing the load. I don't know how it will
> interact with the load-store-multiple code in the integer case, but it
> shouldn't be difficult to do. (Let me know if you want me to do the work; I'm
> happy to do it.)
Sure, if you want to do it, that would be great. I will submit the patch lowering back to 1024, and your patch can re-raise it. I had the impression that the VFP code only added another 1 bit of range, but I didn't look very closely.
Reporter | ||
Comment 14•15 years ago
|
||
Reporter | ||
Updated•15 years ago
|
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tamarin
![]() |
||
Comment 15•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/c999f4c3c5c6
http://hg.mozilla.org/tracemonkey/rev/533443341d58
Whiteboard: fixed-in-nanojit, fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
Reporter | ||
Comment 16•15 years ago
|
||
Comment on attachment 437599 [details] [diff] [review]
Raises ARM and MIPS NJ_MAX_STACK_ENTRY to 4096
cancelling review, silence = consent.
Attachment #437599 -
Flags: review?(chris)
Comment 17•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•