Closed
Bug 465582
Opened 17 years ago
Closed 16 years ago
add LIR_jtbl for switch-style jump tables
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
flash10.1
People
(Reporter: edwsmith, Assigned: edwsmith)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey)
Attachments
(1 file, 9 obsolete files)
patch, 570 bytes — dvander: review+
CodegenLIR in Tamarin implements switch as a linear set of conditional branches. We need a table jump instead.
oprnd 1 = index (0..N)
oprnd 2 = ref to LIR_skip whose payload is N LIns* entries pointing to N LIR_labels.
restrictions:
* N is limited to something smaller than the LIR page size; anything larger must be broken up, either with another jump table (two levels) or with a few conditional branches to choose the right jump table.
* if any of the labels are backward edges, then the whole table jump must be treated conservatively as a backward edge, since the register state is unknown at at least one target.
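The difference between the two lowerings can be sketched in C++ (an illustrative model only; the handler functions and names are hypothetical, not nanojit code):

```cpp
#include <cassert>

// Hypothetical case handlers standing in for the switch targets.
static int case0() { return 100; }
static int case1() { return 200; }
static int case2() { return 300; }

// What CodegenLIR emits today: a linear chain of conditional branches,
// O(N) comparisons in the worst case.
int dispatchLinear(int index) {
    if (index == 0) return case0();
    if (index == 1) return case1();
    if (index == 2) return case2();
    return -1;
}

// What LIR_jtbl models: one indexed load from a table of targets,
// O(1) regardless of N. The index must already be in range, since
// the proposed opcode performs no bounds check itself.
int dispatchTable(int index) {
    static int (*const table[])() = { case0, case1, case2 };
    assert(index >= 0 && index < 3);  // caller's responsibility
    return table[index]();
}
```

The table form trades a possibly mispredicted indirect branch for the elimination of the compare cascade.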
Comment 1 (Assignee) • 17 years ago
Attachment #348815 -
Flags: review?(rreitmai)
Comment 2 • 17 years ago
Comment on attachment 348815 [details] [diff] [review]
prototype for review (x86 only so far)
So we're building into LIR (at least CodegenLIR) the concept of
a jmp table having to be less than 256 entries. Is this too architecture
specific?
Regarding the LIR logic for handling the table branch, I wonder
if we can encode the cascade of jumps to be binary rather than
linear (assuming I'm correct that for large values of the input
parameter we're doing multiple jt's until we reach the 'upper'
part of the table).
Maybe breaking the table into 256 entries and then using
successive jtbl's to ultimately land on the final target would
be one approach. Of course the overhead of multiple indirect
branches might be high, but I can't imagine it being worse than a
long cascade of branches.
And finally a few assorted comments:
- underrunProtect should +4 to account for the aligned ptr
- not necessary, but it would be more consistent with the rest of the code if the indirect jump instruction encoder were a macro
- LirBufWriter insTable(): should this be protected from a large nlabels parameter?
Attachment #348815 -
Flags: review?(rreitmai) → review+
Comment 3 (Assignee) • 17 years ago
(In reply to comment #2)
> (From update of attachment 348815 [details] [diff] [review])
> So we're building into LIR (at least CodegenLIR) the concept of
> a jmp table having to be less than 256 entries. Is this too architecture
> specific?
We ought to compute something based on the LIR page size. The table limit must also be a power of two. There are no CPU-specific limits in play here.
> Regarding the LIR logic for handling the table branch, I wonder
> if we can encode the cascade of jumps to be binary rather than
> linear (assuming i'm correct in that for large values of the input
> parameter we're doing multiple jt's until we reach the 'upper'
> part of the table).
>
> Maybe breaking the table into 256 entries and then using
> successive jtbl's to ultimately land on the final target would
> be one approach. Of course the overhead of multiple indirect
> branches might be high, but i can't imagine worse than a long
> cascade of branches.
Either would reduce the cascade cost. I'll see what I can do in short order. I'm not sure how much time I want to spend on it, but better now than later, probably.
> And finally a few assorted comments:
>
> - underrunProtect should +4 to account for aligned ptr
fixed
> - not necc but would be more consistent with rest of code if indirect jump
> instruction encoder was a macro
of course
> - LirBufWriter insTable() should this be protected from a large nlabels
> parameter?
It looks like skip is already protected, and I don't think we need a redundant check. If skip could fail in any way other than an assert, we should at least check for failure, but it doesn't look like it can.
Comment 4 • 17 years ago
Attachment #351138 -
Flags: review?(edwsmith)
Updated • 17 years ago
Attachment #351138 -
Attachment is patch: true
Attachment #351138 -
Attachment mime type: application/octet-stream → text/plain
Comment 5 • 17 years ago
I made a patch that extends the current implementation with arbitrarily long jump tables. Instead of improving the LIR_skip operation, my patch allows LIR_jtbl instructions to be chained together, each with its own finite label list.
Something like this:
nj_op_switch_imm1 = call nj_op_switch_imm ( jitFrame )
jtbl nj_op_switch_imm1 -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ? ? ?
jtbl <join> -> ? ? ?
Operand 1 of the first LIR_jtbl is non-NULL, while the remaining LIR_jtbl instructions must have NULL for operand 1.
This is the generated code:
0x8322eed call nj_op_switch_imm
0x8322ef2 add esp,16
jtbl nj_op_switch_imm1 -> label21 label1 label2 label3 label4
0x8322efc [0x8322efc]:
0x8322efc 0: [&label21]
0x8322f00 1: [&label1]
0x8322f04 2: [&label2]
0x8322f08 3: [&label3]
0x8322f0c 4: [&label4]
jtbl <join> -> label5 label6 label7 label8 label21
0x8322f10 0: [&label5]
0x8322f14 1: [&label6]
0x8322f18 2: [&label7]
0x8322f1c 3: [&label8]
0x8322f20 4: [&label9]
[...]
My working directory is a little different compared to the tamarin/nanojit repository, but I am sure you will understand my changes. Fortunately, only small changes are required.
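The chaining scheme described above can be modeled as index arithmetic over fixed-size chunks (an illustrative sketch, not the actual patch; the chunk size of 5 merely matches the example dump, and targets are shown as ints rather than LIns* labels):

```cpp
#include <cstddef>
#include <vector>

// Each LIR_jtbl holds at most CHUNK label slots; a logically unlimited
// table is a chain of such chunks joined end to end.
constexpr std::size_t CHUNK = 5;

// Resolve a global case index into (chunk, slot) and fetch the target.
int lookupChained(const std::vector<std::vector<int>>& chunks,
                  std::size_t index) {
    return chunks[index / CHUNK][index % CHUNK];
}
```

This only behaves like one big table if the emitted chunks end up contiguous in memory, which is the assumption questioned in the following comments.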
Comment 6 (Assignee) • 17 years ago
How are really long jump tables handled in the assembler, where pages are allocated in 4K chunks? Is it up to the CPU-specific assembler to figure out how to divide up the table?
Comment 7 • 17 years ago
(In reply to comment #6)
> how are really long jump tables handled in the assembler, where pages are
> allocated in 4K chunks? is it up to the cpu-specific assembler to figure out
> how to divide up the table?
Does that mean the pages are not concatenated after code generation, and the result is a set of small blocks with several unconditional jumps instead of one big memory chunk? If that is true, my approach is bad. I thought the jump-table pages were concatenated during a final phase, but if there is no final phase at all...
Comment 8 • 17 years ago
The code cache is a collection of 4k pages with jumps connecting them if code flows across two pages. I am planning on adding support for larger pages but conceptually the approach will remain the same.
Comment 9 (Assignee) • 17 years ago
(In reply to comment #7)
> Does it mean, the pages are not concatenated after the code generation, and the
> result is a set of small blocks with several unconditional jumps instead of one
> big memory chunk? If that is true, my approach is bad. I thought the jump table
> pages are concatenated during the final phase, but if there is no final phase
> at all...
Correct, there is no final compaction or code motion phase. If we allow unlimited size LIR_jtbl, then it just moves the problem to the (several) backends to handle individually.
If we restrict LIR_jtbl, backends can support it directly, assuming the size of the jump table is the same in native code as in LIR (true), and that the native page size is the same as LIR's (likely but not required).
It would be nice if a backend could look at a LIR_jtbl, discover that it is just part of a jump-table hierarchy, and turn the whole collection into a single larger jump table as an optimization. Any ideas on how to encode jtbl so that's possible?
Updated (Assignee) • 17 years ago
Attachment #351138 -
Flags: review?(edwsmith)
Comment 10 • 16 years ago
(In reply to comment #0)
> CodegenLIR in tamarin implements switch as a linear set of conditional
> branches. We need a table jump, instead.
I would like to suggest that jump tables are not necessarily the ideal representation for a switch. Experiments indicate that dispatch tables incur inordinate overheads due to unpredictable branches [1]. Up to about 4 cases, the current linear sequence of conditionals is optimal; binary-tree-based dispatch is optimal from 4 to 10 cases; dispatch tables should only be used for more than 10 cases.
[1] http://www.sagecertification.org/events/javavm02/full_papers/zendra/zendra_html/index.html
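The binary-tree dispatch suggested here can be sketched as nested range tests (hypothetical handlers and return values; the 4-to-10-case threshold is the cited paper's claim, not measured here):

```cpp
// Binary-tree dispatch over 8 dense case values 0..7: the branch depth
// is O(log N) instead of the O(N) worst case of a linear chain.
int dispatchBinaryTree(int v) {
    if (v < 4) {
        if (v < 2) return (v == 0) ? 100 : 101;
        else       return (v == 2) ? 102 : 103;
    } else {
        if (v < 6) return (v == 4) ? 104 : 105;
        else       return (v == 6) ? 106 : 107;
    }
}
```

Unlike a table jump, every branch here is a direct conditional, which modern predictors handle far better than a single data-dependent indirect jump.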
Comment 11 • 16 years ago
My nanojit based inline threading work (Bug 506182) needs LIR_jtbl (or LIR_ji)
in order to implement switch, generators, and reentering generated code after
falling off trace.
Comment 12 (Assignee) • 16 years ago
It occurs to me that the table-size limitations mentioned previously are all but gone in nanojit now. Fields in LIR instructions are generally word sized: a length field can be big, and a table pointer can point directly to a long jump table allocated with nanojit::Allocator (the same one used for LirBuffer, with the same lifetime), without any LIR_skip pain.
The backend generating a jump table can likewise put the table in nanojit::Allocator memory, arbitrarily large -- it's data, after all, and doesn't have to be allocated through CodeAlloc.
Updated • 16 years ago
Flags: flashplayer-qrb?
Updated (Assignee) • 16 years ago
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Target Milestone: --- → flash10.1
Updated (Assignee) • 16 years ago
Assignee: nobody → edwsmith
Status: NEW → ASSIGNED
Comment 13 (Assignee) • 16 years ago
new opcode: LIR_jtbl. jtbl takes a uint32_t index and a table of label references (LIns**), representing a jump to one of the labels.
The index must be in range (range checking is not included in the opcode).
The table must be allocated in memory at least as long-lived as the LIR.
In the x86 backend, this is implemented with a simple jmp [R*4+ADDR] where ADDR is the address of the table. I added a new dataAllocator (Allocator&) parameter to Assembler, which is used for allocating data along with code (data & code have same lifetime). The x86 backend allocates the final table of addresses from this allocator and embeds the pointer to the table in code.
Opinion wanted: in principle we could avoid the extra dataAllocator parameter by changing the semantics of LIR_jtbl's table.
+ if the table were first used for LIns* addresses (of LIR_label instructions) and then overwritten with native code addresses, we'd have one less allocation and no need for the dataAllocator parameter to Assembler.
+ since the host vm allocates this table and knows its address, the native addresses inside can be patched later, without much extra hassle (assuming of course that such patching wouldn't alter dataflow assumptions, live ranges, etc)
- but, it seems wonky to overload the table... as currently implemented, LIR data and native data are disjoint.
- overloading the table would seriously hose LIR inlining
Later, we probably want to use dataAllocator for constant pools on CPUs that like to have constant pools (i.e. everyone except x86) and can afford to dedicate a register to point to one.
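The effective-address computation behind the `jmp [R*4+ADDR]` form can be modeled like this (a simulation of the addressing mode only, not backend code; the names and the use of 32-bit addresses are illustrative):

```cpp
#include <cstdint>

// The table lives in data memory allocated alongside the code (the
// dataAllocator role); each entry holds a 4-byte native code address.
// The indirect jump computes ADDR + index*4 and loads the target from it.
uint32_t resolveJumpTarget(const uint32_t* table, uint32_t index) {
    // table[index] is exactly *(ADDR + index*4) for 4-byte entries.
    return table[index];
}
```

Keeping the table in data memory means CodeAlloc never sees it, so table size is bounded only by the allocator, not by code-page geometry.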
Attachment #348815 -
Attachment is obsolete: true
Attachment #351138 -
Attachment is obsolete: true
Attachment #408880 -
Flags: review?(nnethercote)
Comment 14 (Assignee) • 16 years ago
Caveat: the jump instruction itself is open-coded, but I'll macroize it like the other instructions.
Updated • 16 years ago
Flags: flashplayer-qrb? → flashplayer-qrb+
Comment 15 (Assignee) • 16 years ago
Attachment #408880 -
Attachment is obsolete: true
Attachment #408880 -
Flags: review?(nnethercote)
Comment 16 (Assignee) • 16 years ago
Comment 17 (Assignee) • 16 years ago
Comment 18 (Assignee) • 16 years ago
Attachment #409098 -
Attachment is obsolete: true
Attachment #409125 -
Flags: review?(rreitmai)
Updated (Assignee) • 16 years ago
Attachment #409099 -
Flags: review?(rreitmai)
Updated (Assignee) • 16 years ago
Attachment #409120 -
Flags: review?(Jacob.Bramley)
Comment 19 • 16 years ago
Comment on attachment 409125 [details] [diff] [review]
(updated) adds LIR_jtbl for x86 and x64
I'm in two minds about storing the jump table separately from the LIR. On one hand, it's nice to not have another variable-width LIns (as calls are, and skips were). On the other hand, it has potential for memory errors, either dangling pointers or leaks. I guess Valgrind can catch such errors, but it'd be nicer if they weren't possible.
As for the dataAllocator parameter: please keep it! Getting rid of it sounds like a dirty hack that will cause no end of problems in the future.
>diff -r 71c36aa36b66 nanojit/Assembler.cpp
>--- a/nanojit/Assembler.cpp Thu Oct 29 08:19:44 2009 -0400
>+++ b/nanojit/Assembler.cpp Thu Oct 29 13:58:04 2009 -0400
>+ Register Assembler::registerAllocTmp(RegisterMask allow)
>+ {
>+ Register tmp = registerAlloc(allow);
>+ _allocator.addFree(tmp);
>+ return tmp;
> }
I introduced a function called registerAllocTmp() in bug 513863, which is different to this one. Can you just undo your registerAllocTmp() changes and let bug 513863 deal with it?
>+ Allocator& alloc; // for use while assembling
>+ CodeAlloc& _codeAlloc; // for code we generate
>+ Allocator& _dataAlloc; // for data used by generated code
The comments for the latter two are good, but the one for 'alloc' isn't very illuminating. Can you clarify further?
>+ // - Jtbl encodes a jump table with a variable # of targets as follows
>+ //
>+ // [ table <-- pointer to LIns*[] for each target
>+ // size
>+ // opcode + resv ] <-- LIns* ins
>+ //
This comment isn't necessary... LInsJtbl isn't particularly different to the others (unlike skips and calls, which are different and so do merit their own comment). And it doesn't match the code anyway (field order is wrong, oprnd_1 is missing, the sense of the "<--" arrow is different :).
> [...]
>+ // used for LIR_jtbl
>+ class LInsJtbl
>+ {
>+ private:
>+ friend class LIns;
>+
>+ uint32_t size; // number of entries in table
>+ LIns** table; // pointer to table with size entries
>+ LIns* oprnd_1; // expression used to index the table
>+
>+ LIns ins;
>+
>+ public:
>+ LIns* getLIns() { return &ins; }
>+ };
Can you mention in the class comment that 'oprnd_1' isn't range-checked against 'size'?
And also something about the lifetime of the table, ie. that it must match the LIR's lifetime.
Comment 20 (Assignee) • 16 years ago
(In reply to comment #19)
> (From update of attachment 409125 [details] [diff] [review])
> I'm in two minds about storing the jump table separately from the LIR. One one
> hand, it's nice to not have another variable-width LIns (as calls are, and
> skips were). On the other hand, it has potential for memory errors, either
> dangling pointers or leaks. I guess Valgrind can catch such errors, but it'd
> be nicer if they weren't possible.
The paranoia is justified. Since LirBuffer always gets its memory from a particular (VM-provided) nanojit::Allocator, how about I rearrange the API to always use that one? That way it will be forced to have the same lifetime as the jtbl instruction using it.
Maybe: drop the table parameter to insJtbl and have it allocate the table internally from the same allocator that allocates the LIR itself; hosts can then call jtbl_ins->getTable() and fill it in as entries are patched.
> As for the dataAllocator parameter: please keep it! Getting rid of it sounds
> like a dirty hack that will cause no end of problems in the future.
coolio
> I introduced a function called registerAllocTmp() in bug 513863, which is
> different to this one. Can you just undo your registerAllocTmp() changes and
> let bug 513863 deal with it?
no problem, dependency added.
> >+ Allocator& alloc; // for use while assembling
> >+ CodeAlloc& _codeAlloc; // for code we generate
> >+ Allocator& _dataAlloc; // for data used by generated code
>
> The comments for the latter two are good, but the one for 'alloc' isn't very
> illuminating. Can you clarify further?
yep
> This comment isn't necessary... LInsJtbl isn't particularly different to the
> others (unlike skips and calls, which are different and so do merit their own
> comment). And it doesn't match the code anyway (field order is wrong, oprnd_1
> is missing, the sense of the "<--" arrow is different :).
doh, yeah i had to rearrange to put oprnd1 in the right place. will drop the comment.
> Can you mention in the class comment that 'oprnd_1' isn't range-checked against
> 'size'?
> And also something about the lifetime of the table, ie. that it must match the
> LIR's lifetime.
sure.
Comment 21 (Assignee) • 16 years ago
One snag: Neither unionRegisterState() nor intersectRegisterState() can deal with the situation when N forward edges from a LIR_jtbl can't be reconciled. Once the code at two labels has been generated and both want different instructions in the same register, there's nothing to be done except somehow insert code along the jtbl -> label edge that fixes registers.
When the jtbl is replaced by a series of jt's, registers can be shuffled in between each jt.
brainstorming:
a) spill all registers at all labels (lame!)
b) same as (a) but only in functions using jtbl (almost as lame)
c) only spill registers at labels jumped to by jtbl (prepass required)
d) compare register states at forward labels;
- if we can reconcile, done.
- if not, choose a minimal set to handle with compare+branch,
then use jtbl to target the rest.
e) (d) could be generalized to partition the targets into groups, each of which have compatible register assignments. (how?)
f) if we'd do a prepass anyway, how about a register-allocator prepass?
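Option (d) boils down to a pairwise compatibility check over the register states at the jtbl's forward targets; a sketch of that check (RegState and the value-id encoding are assumptions for illustration, not nanojit types):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Maps a physical register number to the id of the value it holds.
using RegState = std::map<int, int>;

// Two states conflict only if the same register holds different values;
// registers present in only one state can be reconciled by a fill.
static bool compatible(const RegState& a, const RegState& b) {
    for (const auto& [reg, val] : a) {
        auto it = b.find(reg);
        if (it != b.end() && it->second != val) return false;
    }
    return true;
}

// If every pair of targets reconciles, the jtbl can jump straight to the
// labels; otherwise some targets need a compare+branch fallback.
bool allTargetsReconcile(const std::vector<RegState>& targets) {
    for (std::size_t i = 0; i < targets.size(); ++i)
        for (std::size_t j = i + 1; j < targets.size(); ++j)
            if (!compatible(targets[i], targets[j])) return false;
    return true;
}
```

Partitioning for option (e) could reuse the same predicate, greedily grouping each target into the first group whose members it is compatible with.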
Comment 22 (Assignee) • 16 years ago
(this is a complete patch)
changes since previous patch:
- LIR_jtbl's jump label table is allocated in LirBufWriter from LirBuffer's allocator so we can't accidentally use the wrong allocator
- no direct access to the table. LIns.getTarget(i) and setTarget(i, LIns*) are used instead.
- workaround for the register allocator problem: CodegenLIR (our frontend) puts a LIR_regfence at each label that is the target of a forward jump from a LIR_jtbl; that way we cannot reach the LIR_jtbl in a state the allocator cannot resolve.
I'm open to suggestions for how to safeguard LIR_jtbl with an assert; perhaps we have to scan all the targets with registers already assigned to make sure none have incompatible assignments.
Attachment #409125 -
Attachment is obsolete: true
Attachment #409686 -
Flags: review?(nnethercote)
Attachment #409125 -
Flags: review?(rreitmai)
Comment 23 (Assignee) • 16 years ago
This update adds a sanity check in jtbl to make sure we aren't in an impossible register allocator situation, along with better comments. I can't think of much else to add to this patch; it should be ready for review/landing now.
Attachment #409686 -
Attachment is obsolete: true
Attachment #409920 -
Flags: review?(nnethercote)
Attachment #409686 -
Flags: review?(nnethercote)
Comment 24 • 16 years ago
Comment on attachment 409099 [details] [diff] [review]
PPC and PPC64 support for LIR_jtbl
looks fine.
Aside...I know that R0 and R2 are scratch but I can't find the code that ensures they are not being used. GpRegs seems to include all regs on PPC.
Attachment #409099 -
Flags: review?(rreitmai) → review+
Comment 25 (Assignee) • 16 years ago
regAlloc will never allocate a register that isn't being managed, so it's harmless (if perhaps suboptimal) to give it a mask containing non-managed registers.
Comment 26 • 16 years ago
Comment on attachment 409920 [details] [diff] [review]
(update 3) adds LIR_jtbl for x86 and x64
>@@ -614,7 +627,7 @@
> // Note, this assumes that loads will never fault and hence cannot
> // affect the control flow.
> bool isStmt() {
>- return isGuard() || isBranch() ||
>+ return isGuard() || isBranch() || isop(LIR_jtbl) ||
> (isCall() && !isCse()) ||
> isStore() ||
> isop(LIR_label) || isop(LIR_live) || isop(LIR_flive) ||
I think isBranch() should be changed to succeed on LIR_jtbl.
>@@ -807,6 +820,23 @@
> LIns* getLIns() { return &ins; };
> };
>
>+ // used for LIR_jtbl. oprnd_1 must be a uint32_t index in
>+ // the range 0 <= index < size; no range check is performed.
Missing capital letter at start of sentence. Ed, this seems to be your "thing" but I find it makes comments much harder to read.
>+ class LInsJtbl
>+ {
>+ private:
>+ friend class LIns;
>+
>+ uint32_t size; // number of entries in table
>+ LIns** table; // pointer to table[size]
>+ LIns* oprnd_1; // uint32_t index expression
>+
>+ LIns ins;
>+
>+ public:
>+ LIns* getLIns() { return &ins; }
>+ };
Add a comment about the lifetime of 'table', please.
r=me with those changes, but beware that I didn't look over the Tamarin-specific code in any detail.
Updated • 16 years ago
Attachment #409920 -
Flags: review?(nnethercote) → review+
Comment 27 (Assignee) • 16 years ago
Comment 28 (Assignee) • 16 years ago
Attachment #409099 -
Attachment is obsolete: true
Attachment #409120 -
Attachment is obsolete: true
Attachment #409920 -
Attachment is obsolete: true
Attachment #411956 -
Flags: review?
Attachment #409120 -
Flags: review?(Jacob.Bramley)
Updated (Assignee) • 16 years ago
Attachment #411956 -
Flags: review? → review?(dvander)
Updated (Assignee) • 16 years ago
Whiteboard: fixed-in-nanojit fixed-in-tamarin
Updated • 16 years ago
Attachment #411956 -
Flags: review?(dvander) → review+
Updated (Assignee) • 16 years ago
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 29 • 16 years ago
Whiteboard: fixed-in-nanojit fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
Comment 30 • 16 years ago
Comment 29 had the NJ-specific part. The following is the TM-specific part:
http://hg.mozilla.org/tracemonkey/rev/30f8f6dcf808