Closed
Bug 465582
Opened 17 years ago
Closed 16 years ago
add LIR_jtbl for switch-style jump tables
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
flash10.1
People
(Reporter: edwsmith, Assigned: edwsmith)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)
Attachments
(1 file, 9 obsolete files)
patch, 570 bytes — dvander: review+
CodegenLIR in Tamarin implements switch as a linear set of conditional branches. We need a table jump instead.
oprnd 1 = index (0..N)
oprnd 2 = ref to LIR_skip whose payload is N LIns* entries pointing to N LIR_labels.
restrictions:
* N is limited to something smaller than the LIR page size; anything larger must be broken up, either with another jump table (two levels) or with a few conditional branches to choose the right jump table.
* if any of the labels are backward edges, then the whole table jump must be treated conservatively as a backward edge, since the register state is unknown at at least one target.
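The difference between the two lowerings can be sketched in C++ (an illustrative model only; the handler functions and names are hypothetical, not nanojit code):

```cpp
#include <cassert>

// Hypothetical case handlers standing in for the switch targets.
static int case0() { return 100; }
static int case1() { return 200; }
static int case2() { return 300; }

// What CodegenLIR emits today: a linear chain of conditional branches,
// O(N) comparisons in the worst case.
int dispatchLinear(int index) {
    if (index == 0) return case0();
    if (index == 1) return case1();
    if (index == 2) return case2();
    return -1;
}

// What LIR_jtbl models: one indexed load from a table of targets,
// O(1) regardless of N. The index must already be in range, since
// the proposed opcode performs no bounds check itself.
int dispatchTable(int index) {
    static int (*const table[])() = { case0, case1, case2 };
    assert(index >= 0 && index < 3);  // caller's responsibility
    return table[index]();
}
```

The table form trades a possibly mispredicted indirect branch for the elimination of the compare cascade.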
Comment 1 (Assignee) • 17 years ago
Attachment #348815 -
Flags: review?(rreitmai)
Comment 2 • 17 years ago
Comment on attachment 348815 [details] [diff] [review]
prototype for review (x86 only so far)
So we're building into LIR (at least CodegenLIR) the concept of
a jmp table having to be less than 256 entries. Is this too architecture
specific?
Regarding the LIR logic for handling the table branch, I wonder
if we can encode the cascade of jumps to be binary rather than
linear (assuming I'm correct that for large values of the input
parameter we're doing multiple jt's until we reach the 'upper'
part of the table).
Maybe breaking the table into 256 entries and then using
successive jtbl's to ultimately land on the final target would
be one approach. Of course the overhead of multiple indirect
branches might be high, but I can't imagine it being worse than a
long cascade of branches.
And finally a few assorted comments:
- underrunProtect should +4 to account for the aligned ptr
- not necessary, but it would be more consistent with the rest of the code if the indirect jump instruction encoder were a macro
- LirBufWriter insTable(): should this be protected from a large nlabels parameter?
Attachment #348815 -
Flags: review?(rreitmai) → review+
Comment 3 (Assignee) • 17 years ago
(In reply to comment #2)
> (From update of attachment 348815 [details] [diff] [review])
> So we're building into LIR (at least CodegenLIR) the concept of
> a jmp table having to be less than 256 entries. Is this too architecture
> specific?
We ought to compute something based on the LIR page size. The table limit must also be a power of two. There are no CPU-specific limits in play here.
> Regarding the LIR logic for handling the table branch, I wonder
> if we can encode the cascade of jumps to be binary rather than
> linear (assuming i'm correct in that for large values of the input
> parameter we're doing multiple jt's until we reach the 'upper'
> part of the table).
>
> Maybe breaking the table into 256 entries and then using
> successive jtbl's to ultimately land on the final target would
> be one approach. Of course the overhead of multiple indirect
> branches might be high, but i can't imagine worse than a long
> cascade of branches.
Either would reduce the cascade cost. I'll see what I can do in short order. I'm not sure how much time I want to spend on it, but better now than later, probably.
> And finally a few assorted comments:
>
> - underrunProtect should +4 to account for aligned ptr
fixed
> - not necc but would be more consistent with rest of code if indirect jump
> instruction encoder was a macro
of course
> - LirBufWriter insTable() should this be protected from a large nlabels
> parameter?
It looks like skip is already protected, and I don't think we need a redundant check. If skip could fail in any way other than an assert, we should at least check for failure, but it doesn't look like it can.
Comment 4 • 17 years ago
Attachment #351138 -
Flags: review?(edwsmith)
Updated • 17 years ago
Attachment #351138 -
Attachment is patch: true
Attachment #351138 -
Attachment mime type: application/octet-stream → text/plain
Comment 5 • 17 years ago
I made a patch that extends the current implementation with arbitrarily long jump tables. Instead of improving the LIR_skip operation, my patch allows LIR_jtbl instructions to be chained together, each with its own finite label list.
Something like this:
nj_op_switch_imm1 = call nj_op_switch_imm ( jitFrame )
jtbl nj_op_switch_imm1 -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ?
Operand 1 of the first LIR_jtbl is non-NULL, while the remaining LIR_jtbl instructions must have NULL for operand 1.
This is the generated code:
0x8322eed call nj_op_switch_imm
0x8322ef2 add esp,16
jtbl nj_op_switch_imm1 -> label21 label1 label2 label3 label4
0x8322efc [0x8322efc]:
0x8322efc 0: [&label21]
0x8322f00 1: [&label1]
0x8322f04 2: [&label2]
0x8322f08 3: [&label3]
0x8322f0c 4: [&label4]
jtbl <join> -> label5 label6 label7 label8 label21
0x8322f10 0: [&label5]
0x8322f14 1: [&label6]
0x8322f18 2: [&label7]
0x8322f1c 3: [&label8]
0x8322f20 4: [&label9]
[...]
My working directory is a little different compared to the tamarin/nanojit repository, but I am sure you will understand my changes. Fortunately, only small changes are required.
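The chaining scheme described above can be modeled as index arithmetic over fixed-size chunks (an illustrative sketch, not the actual patch; the chunk size of 5 merely matches the example dump, and targets are shown as ints rather than LIns* labels):

```cpp
#include <cstddef>
#include <vector>

// Each LIR_jtbl holds at most CHUNK label slots; a logically unlimited
// table is a chain of such chunks joined end to end.
constexpr std::size_t CHUNK = 5;

// Resolve a global case index into (chunk, slot) and fetch the target.
int lookupChained(const std::vector<std::vector<int>>& chunks,
                  std::size_t index) {
    return chunks[index / CHUNK][index % CHUNK];
}
```

This only behaves like one big table if the emitted chunks end up contiguous in memory, which is the assumption questioned in the following comments.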
Comment 6 (Assignee) • 17 years ago
How are really long jump tables handled in the assembler, where pages are allocated in 4K chunks? Is it up to the CPU-specific assembler to figure out how to divide up the table?
Comment 7 • 17 years ago
(In reply to comment #6)
> how are really long jump tables handled in the assembler, where pages are
> allocated in 4K chunks? is it up to the cpu-specific assembler to figure out
> how to divide up the table?
Does that mean the pages are not concatenated after code generation, and the result is a set of small blocks with several unconditional jumps instead of one big memory chunk? If that is true, my approach is bad. I thought the jump-table pages were concatenated during a final phase, but if there is no final phase at all...
Comment 8 • 17 years ago
The code cache is a collection of 4k pages with jumps connecting them if code flows across two pages. I am planning on adding support for larger pages but conceptually the approach will remain the same.
Comment 9 (Assignee) • 17 years ago
(In reply to comment #7)
> Does it mean, the pages are not concatenated after the code generation, and the
> result is a set of small blocks with several unconditional jumps instead of one
> big memory chunk? If that is true, my approach is bad. I thought the jump table
> pages are concatenated during the final phase, but if there is no final phase
> at all...
Correct, there is no final compaction or code motion phase. If we allow unlimited size LIR_jtbl, then it just moves the problem to the (several) backends to handle individually.
If we restrict LIR_jtbl, backends can support it directly, assuming the size of the jump table is the same in native code as in LIR (true), and that the native page size is the same as LIR's (likely but not required).
It would be nice if a backend could look at a LIR_jtbl, discover that it is just part of a jump-table hierarchy, and turn the whole collection into a single larger jump table as an optimization. Any ideas on how to encode jtbl so that's possible?
Updated (Assignee) • 17 years ago
Attachment #351138 -
Flags: review?(edwsmith)
Comment 10 • 16 years ago
(In reply to comment #0)
> CodegenLIR in tamarin implements switch as a linear set of conditional
> branches. We need a table jump, instead.
I would like to suggest that jump tables are not necessarily the ideal representation for a switch. Experiments indicate that dispatch tables incur inordinate overheads due to unpredictable branches [1]. Up to about 4 cases, the current linear sequence of conditionals is optimal; binary-tree-based dispatch is optimal from 4 to 10 cases; dispatch tables should only be used for more than 10 cases.
[1] http://www.sagecertification.org/events/javavm02/full_papers/zendra/zendra_html/index.html
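The binary-tree dispatch suggested here can be sketched as nested range tests (hypothetical handlers and return values; the 4-to-10-case threshold is the cited paper's claim, not measured here):

```cpp
// Binary-tree dispatch over 8 dense case values 0..7: the branch depth
// is O(log N) instead of the O(N) worst case of a linear chain.
int dispatchBinaryTree(int v) {
    if (v < 4) {
        if (v < 2) return (v == 0) ? 100 : 101;
        else       return (v == 2) ? 102 : 103;
    } else {
        if (v < 6) return (v == 4) ? 104 : 105;
        else       return (v == 6) ? 106 : 107;
    }
}
```

Unlike a table jump, every branch here is a direct conditional, which modern predictors handle far better than a single data-dependent indirect jump.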
Comment 11 • 16 years ago
My nanojit based inline threading work (Bug 506182) needs LIR_jtbl (or LIR_ji)
in order to implement switch, generators, and reentering generated code after
falling off trace.
Comment 12 (Assignee) • 16 years ago
It occurs to me that the table-size limitations mentioned previously are all but gone in nanojit now. Fields in LIR instructions are generally word sized: a length field can be big, and a table pointer can point directly to a long jump table allocated with nanojit::Allocator (the same one used for LirBuffer, with the same lifetime), without any LIR_skip pain.
The backend generating a jump table can likewise put the table in nanojit::Allocator memory, arbitrarily large -- it's data, after all, and doesn't have to be allocated through CodeAlloc.
Updated • 16 years ago
Flags: flashplayer-qrb?
Updated (Assignee) • 16 years ago
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Target Milestone: --- → flash10.1
Updated (Assignee) • 16 years ago
Assignee: nobody → edwsmith
Status: NEW → ASSIGNED
Comment 13 (Assignee) • 16 years ago
new opcode: LIR_jtbl. jtbl takes a uint32_t index and a table of label references (LIns**), representing a jump to one of the labels.
The index must be in range (range checking is not included in the opcode).
The table must be allocated in memory at least as long-lived as the LIR.
In the x86 backend, this is implemented with a simple jmp [R*4+ADDR] where ADDR is the address of the table. I added a new dataAllocator (Allocator&) parameter to Assembler, which is used for allocating data along with code (data & code have same lifetime). The x86 backend allocates the final table of addresses from this allocator and embeds the pointer to the table in code.
Opinion wanted: in principle we could avoid the extra dataAllocator parameter by changing the semantics of LIR_jtbl's table.
+ if the table were first used for LIns* addresses (of LIR_label instructions) and then overwritten with native code addresses, we'd have one less allocation and no need for the dataAllocator parameter to Assembler.
+ since the host vm allocates this table and knows its address, the native addresses inside can be patched later, without much extra hassle (assuming of course that such patching wouldn't alter dataflow assumptions, live ranges, etc)
- but, it seems wonky to overload the table... as currently implemented, LIR data and native data are disjoint.
- overloading the table would seriously hose LIR inlining
Later, we probably want to use dataAllocator for constant pools on CPUs that like to have constant pools (i.e. everyone except x86) and can afford to dedicate a register to point to one.
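The effective-address computation behind the `jmp [R*4+ADDR]` form can be modeled like this (a simulation of the addressing mode only, not backend code; the names and the use of 32-bit addresses are illustrative):

```cpp
#include <cstdint>

// The table lives in data memory allocated alongside the code (the
// dataAllocator role); each entry holds a 4-byte native code address.
// The indirect jump computes ADDR + index*4 and loads the target from it.
uint32_t resolveJumpTarget(const uint32_t* table, uint32_t index) {
    // table[index] is exactly *(ADDR + index*4) for 4-byte entries.
    return table[index];
}
```

Keeping the table in data memory means CodeAlloc never sees it, so table size is bounded only by the allocator, not by code-page geometry.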
Attachment #348815 -
Attachment is obsolete: true
Attachment #351138 -
Attachment is obsolete: true
Attachment #408880 -
Flags: review?(nnethercote)
Comment 14 (Assignee) • 16 years ago
Caveat: the jump instruction itself is open-coded, but I'll macroize it like the other instructions.
Updated • 16 years ago
Flags: flashplayer-qrb? → flashplayer-qrb+
Comment 15 (Assignee) • 16 years ago
Attachment #408880 -
Attachment is obsolete: true
Attachment #408880 -
Flags: review?(nnethercote)
Comment 16 (Assignee) • 16 years ago
Comment 17 (Assignee) • 16 years ago
Comment 18 (Assignee) • 16 years ago
Attachment #409098 -
Attachment is obsolete: true
Attachment #409125 -
Flags: review?(rreitmai)
Updated (Assignee) • 16 years ago
Attachment #409099 -
Flags: review?(rreitmai)
Updated (Assignee) • 16 years ago
Attachment #409120 -
Flags: review?(Jacob.Bramley)
Comment 19 • 16 years ago
Comment on attachment 409125 [details] [diff] [review]
(updated) adds LIR_jtbl for x86 and x64
I'm in two minds about storing the jump table separately from the LIR. On one hand, it's nice to not have another variable-width LIns (as calls are, and skips were). On the other hand, it has potential for memory errors, either dangling pointers or leaks. I guess Valgrind can catch such errors, but it'd be nicer if they weren't possible.
As for the dataAllocator parameter: please keep it! Getting rid of it sounds like a dirty hack that will cause no end of problems in the future.
>diff -r 71c36aa36b66 nanojit/Assembler.cpp
>--- a/nanojit/Assembler.cpp Thu Oct 29 08:19:44 2009 -0400
>+++ b/nanojit/Assembler.cpp Thu Oct 29 13:58:04 2009 -0400
>+ Register Assembler::registerAllocTmp(RegisterMask allow)
>+ {
>+ Register tmp = registerAlloc(allow);
>+ _allocator.addFree(tmp);
>+ return tmp;
> }
I introduced a function called registerAllocTmp() in bug 513863, which is different to this one. Can you just undo your registerAllocTmp() changes and let bug 513863 deal with it?
>+ Allocator& alloc; // for use while assembling
>+ CodeAlloc& _codeAlloc; // for code we generate
>+ Allocator& _dataAlloc; // for data used by generated code
The comments for the latter two are good, but the one for 'alloc' isn't very illuminating. Can you clarify further?
>+ // - Jtbl encodes a jump table with a variable # of targets as follows
>+ //
>+ // [ table <-- pointer to LIns*[] for each target
>+ // size
>+ // opcode + resv ] <-- LIns* ins
>+ //
This comment isn't necessary... LInsJtbl isn't particularly different to the others (unlike skips and calls, which are different and so do merit their own comment). And it doesn't match the code anyway (field order is wrong, oprnd_1 is missing, the sense of the "<--" arrow is different :).
> [...]
>+ // used for LIR_jtbl
>+ class LInsJtbl
>+ {
>+ private:
>+ friend class LIns;
>+
>+ uint32_t size; // number of entries in table
>+ LIns** table; // pointer to table with size entries
>+ LIns* oprnd_1; // expression used to index the table
>+
>+ LIns ins;
>+
>+ public:
>+ LIns* getLIns() { return &ins; }
>+ };
Can you mention in the class comment that 'oprnd_1' isn't range-checked against 'size'?
And also something about the lifetime of the table, ie. that it must match the LIR's lifetime.
Comment 20 (Assignee) • 16 years ago
(In reply to comment #19)
> (From update of attachment 409125 [details] [diff] [review])
> I'm in two minds about storing the jump table separately from the LIR. One one
> hand, it's nice to not have another variable-width LIns (as calls are, and
> skips were). On the other hand, it has potential for memory errors, either
> dangling pointers or leaks. I guess Valgrind can catch such errors, but it'd
> be nicer if they weren't possible.
The paranoia is justified. Since LirBuffer always gets its memory from a particular (VM-provided) nanojit::Allocator, how about I rearrange the API to always use that one? That way it will be forced to have the same lifetime as the jtbl instruction using it.
Maybe: drop the table parameter to insJtbl and have it allocate the table internally from the same allocator that allocates the LIR itself; hosts can then call jtbl_ins->getTable() and fill it in as entries are patched.
> As for the dataAllocator parameter: please keep it! Getting rid of it sounds
> like a dirty hack that will cause no end of problems in the future.
coolio
> I introduced a function called registerAllocTmp() in bug 513863, which is
> different to this one. Can you just undo your registerAllocTmp() changes and
> let bug 513863 deal with it?
no problem, dependency added.
> >+ Allocator& alloc; // for use while assembling
> >+ CodeAlloc& _codeAlloc; // for code we generate
> >+ Allocator& _dataAlloc; // for data used by generated code
>
> The comments for the latter two are good, but the one for 'alloc' isn't very
> illuminating. Can you clarify further?
yep
> This comment isn't necessary... LInsJtbl isn't particularly different to the
> others (unlike skips and calls, which are different and so do merit their own
> comment). And it doesn't match the code anyway (field order is wrong, oprnd_1
> is missing, the sense of the "<--" arrow is different :).
doh, yeah i had to rearrange to put oprnd1 in the right place. will drop the comment.
> Can you mention in the class comment that 'oprnd_1' isn't range-checked against
> 'size'?
> And also something about the lifetime of the table, ie. that it must match the
> LIR's lifetime.
sure.
Comment 21 (Assignee) • 16 years ago
One snag: Neither unionRegisterState() nor intersectRegisterState() can deal with the situation when N forward edges from a LIR_jtbl can't be reconciled. Once the code at two labels has been generated and both want different instructions in the same register, there's nothing to be done except somehow insert code along the jtbl -> label edge that fixes registers.
When the jtbl is replaced by a series of jt's, registers can be shuffled in between each jt.
brainstorming:
a) spill all registers at all labels (lame!)
b) same as (a) but only in functions using jtbl (almost as lame)
c) only spill registers at labels jumped to by jtbl (prepass required)
d) compare register states at forward labels;
- if we can reconcile, done.
- if not, choose a minimal set to handle with compare+branch,
then use jtbl to target the rest.
e) (d) could be generalized to partition the targets into groups, each of which have compatible register assignments. (how?)
f) if we'd do a prepass anyway, how about a register-allocator prepass?
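Option (d) boils down to a pairwise compatibility check over the register states at the jtbl's forward targets; a sketch of that check (RegState and the value-id encoding are assumptions for illustration, not nanojit types):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Maps a physical register number to the id of the value it holds.
using RegState = std::map<int, int>;

// Two states conflict only if the same register holds different values;
// registers present in only one state can be reconciled by a fill.
static bool compatible(const RegState& a, const RegState& b) {
    for (const auto& [reg, val] : a) {
        auto it = b.find(reg);
        if (it != b.end() && it->second != val) return false;
    }
    return true;
}

// If every pair of targets reconciles, the jtbl can jump straight to the
// labels; otherwise some targets need a compare+branch fallback.
bool allTargetsReconcile(const std::vector<RegState>& targets) {
    for (std::size_t i = 0; i < targets.size(); ++i)
        for (std::size_t j = i + 1; j < targets.size(); ++j)
            if (!compatible(targets[i], targets[j])) return false;
    return true;
}
```

Partitioning for option (e) could reuse the same predicate, greedily grouping each target into the first group whose members it is compatible with.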
Comment 22 (Assignee) • 16 years ago
(this is a complete patch)
changes since previous patch:
- LIR_jtbl's jump label table is allocated in LirBufWriter from LirBuffer's allocator so we can't accidentally use the wrong allocator
- no direct access to the table. LIns.getTarget(i) and setTarget(i, LIns*) are used instead.
- workaround for the register allocator problem: CodegenLIR (our frontend) puts a LIR_regfence at each label that is the target of a forward jump from a LIR_jtbl; that way we cannot reach the LIR_jtbl in a state the allocator cannot resolve.
I'm open to suggestions for how to safeguard LIR_jtbl with an assert; perhaps we have to scan all the targets with registers already assigned to make sure none have incompatible assignments.
Attachment #409125 -
Attachment is obsolete: true
Attachment #409686 -
Flags: review?(nnethercote)
Attachment #409125 -
Flags: review?(rreitmai)
Comment 23 (Assignee) • 16 years ago
This update adds a sanity check in jtbl to make sure we aren't in an impossible register allocator situation, along with better comments. I can't think of much else to add to this patch; it should be ready for review/landing now.
Attachment #409686 -
Attachment is obsolete: true
Attachment #409920 -
Flags: review?(nnethercote)
Attachment #409686 -
Flags: review?(nnethercote)
Comment 24 • 16 years ago
Comment on attachment 409099 [details] [diff] [review]
PPC and PPC64 support for LIR_jtbl
looks fine.
Aside...I know that R0 and R2 are scratch but I can't find the code that ensures they are not being used. GpRegs seems to include all regs on PPC.
Attachment #409099 -
Flags: review?(rreitmai) → review+
Comment 25 (Assignee) • 16 years ago
regAlloc will never allocate a register that isn't being managed, so it's harmless (if perhaps suboptimal) to give it a mask containing non-managed registers.
Comment 26 • 16 years ago
Comment on attachment 409920 [details] [diff] [review]
(update 3) adds LIR_jtbl for x86 and x64
>@@ -614,7 +627,7 @@
> // Note, this assumes that loads will never fault and hence cannot
> // affect the control flow.
> bool isStmt() {
>- return isGuard() || isBranch() ||
>+ return isGuard() || isBranch() || isop(LIR_jtbl) ||
> (isCall() && !isCse()) ||
> isStore() ||
> isop(LIR_label) || isop(LIR_live) || isop(LIR_flive) ||
I think isBranch() should be changed to succeed on LIR_jtbl.
>@@ -807,6 +820,23 @@
> LIns* getLIns() { return &ins; };
> };
>
>+ // used for LIR_jtbl. oprnd_1 must be a uint32_t index in
>+ // the range 0 <= index < size; no range check is performed.
Missing capital letter at start of sentence. Ed, this seems to be your "thing" but I find it makes comments much harder to read.
>+ class LInsJtbl
>+ {
>+ private:
>+ friend class LIns;
>+
>+ uint32_t size; // number of entries in table
>+ LIns** table; // pointer to table[size]
>+ LIns* oprnd_1; // uint32_t index expression
>+
>+ LIns ins;
>+
>+ public:
>+ LIns* getLIns() { return &ins; }
>+ };
Add a comment about the lifetime of 'table', please.
r=me with those changes, but beware that I didn't look over the Tamarin-specific code in any detail.
Updated • 16 years ago
Attachment #409920 -
Flags: review?(nnethercote) → review+
Comment 27 (Assignee) • 16 years ago
Comment 28 (Assignee) • 16 years ago
Attachment #409099 -
Attachment is obsolete: true
Attachment #409120 -
Attachment is obsolete: true
Attachment #409920 -
Attachment is obsolete: true
Attachment #411956 -
Flags: review?
Attachment #409120 -
Flags: review?(Jacob.Bramley)
Updated (Assignee) • 16 years ago
Attachment #411956 -
Flags: review? → review?(dvander)
Updated (Assignee) • 16 years ago
Whiteboard: fixed-in-nanojit fixed-in-tamarin
Updated • 16 years ago
Attachment #411956 -
Flags: review?(dvander) → review+
Updated (Assignee) • 16 years ago
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 29 • 16 years ago
Whiteboard: fixed-in-nanojit fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
Comment 30 • 16 years ago
Comment 29 had the NJ-specific part. The following is the TM-specific part:
http://hg.mozilla.org/tracemonkey/rev/30f8f6dcf808