Closed Bug 475115 Opened 15 years ago Closed 15 years ago

TM: Generate a jump table for the side exits for JSOP_TABLESWITCH

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: dmandelin)

References

Details

(Keywords: fixed1.9.1)

Attachments

(2 files, 5 obsolete files)

See the attachment for an example of the kind of program we want to speed up and proof that it needs this kind of speeding up. This should help for other VMs implemented in JS.

Currently, the problem with these VMs is that with 256 or so opcodes, and MAX_BRANCHES=16, most opcodes will never get native code, so TM exits back to the interpreter most times through the loop, and calls js_MonitorLoopEdge, an expensive function, for each iteration. Also, for the traces we do compile, we exit quickly, so we end up calling js_ExecuteTree a lot relative to the number of traces executed.

If we set MAX_BRANCHES=64, then we get a lot of native code, but we still don't run fast. The problem is that for a dense switch, the code ends up going through 0-64 side exits every time through the switch statement, thus adding an average of 32 guards to the cost of every iteration through the VM loop. 

Our solution is to generate an indirect jump guard instead of 64 conditional jump guards. Here is an outline of the implementation, from Andreas:

- During LIR generation, when he hit a tableswitch for the "first" time (see below), generate a guard using LIR_xi, a new LIR instruction. (xi is for "exit indirect"--we can fight over names once it works). The side exit will return from the native trace to the inside of js_ExecuteTree.

- During LIR compilation, compile LIR_xi to a table-based jump using the standard compiler technique. With what I have described so far, the jump will go to the trace continuation point for the value observed during trace recording, and the side exit for all other table entries.

- Background: When we exit a trace, we try to find a compatible trace to call to, and if there isn't one, start recording a new trace at the current point. If that trace is successfully compiled, we patch the side exit to jump to this trace.

- When we exit from a trace via LIR_xi, we will restart the tableswitch instruction as usual and try to record a new trace. But in this condition, we will generate a LIR_xip instruction (another new one) for the tableswitch. LIR_xip doesn't correspond to any executable code. Rather, it is a directive to the compiler to patch a jump table instead of patching the side exit.

All together, the effect is to create a mostly empty jump table the first time we see a tableswitch, and then create the entries on demand. Later, we'll probably need tuning, e.g., to create the table only if the tableswitch has as least 3 elements.
I got mostly finished using the design above, but then I realized that the design in comment 0 has 2 problems that mean it probably can't work. First, the jump table goes directly from one trace to the next trace, without executing any exit block code, which is mandatory before leaving a trace. Second, when we compile LIR_xip, we are supposed to patch a table to jump to the head of the current trace, but of course we don't know the head of the current trace until we are done compiling it.

New design:

- The trace will always end at a tableswitch that is compiled using a jump table. All cases can eventually be traces linked off guards at the end of that trace.

- During trace recording, when we hit a tableswitch for the first [*] time, we generate two guards. These guards implement exit points for every case of the switch, using efficient jump table code, and initially jump to the standard epilog.
  - The first guard tests whether the value is an int or in range, i.e., for the default case. This is a standard xt/xf guard. I will call this guard/side exit GR.
  - The second guard always exits via a jump table. This is a new guard type, xi. For an xi guard, nFragExit must generate a jump table with all entries initially pointing to the standard epilog code. (c.f. a standard exit block, which ends with a JMP to the standard epilog code.) 
  [*] Hitting a tableswitch for the first time: We can tell that we are hitting it after these exits if the anchor is non-null, has exit type SWITCH_EXIT, and the fragment ip is equal to the tableswitch pc. Otherwise we are at a new tableswitch.

- The LIR_xi generates code that computes (expr-low) into a specific register (say, EBX), where expr is the switch expression, and low is the least case value.

- The guard record of a LIR_xi contains a pointer to a SwitchInfo struct with additional data (in LIR) that will be needed during native code generation: 
  - count: the number of cases (i.e., size of the jump table)
  - table: the address of the jump table (filled in by nFragExit)

- As above, nFragExit will generate a jump table when the guard instruction is a LIR_xi.

- When we start trace recording after leaving one of these exits, the current instruction will be the tableswitch again. This time, we don't generate any LIR. All we have to do is record enough information in the fragment or trace recorder to patch the exit correctly. Currently, this is done simply by patching the jump given in the guard record. When we record a tableswitch case, we will know that we are doing so because of the anchor and anchor exit type. The only extra piece of information we need is the value of the switch instruction on this trace (and whether it is a default case or not). Logically, that seems to go alongside 'anchor' in the trace recorder. Then we just modify the code that patches the anchor exit to jump to this trace so that it can handle jump tables.

The only thing I don't like about this so far is that there is an unnecessary jump from the main trace to the exit that contains the jump table. It would be nicer to keep the jump table inline on the main trace code area. I don't think fixing that is all that difficult: it should work to simply place the exit block code in the main code area.
Attached patch Preview patch (obsolete) — Splinter Review
This patch speeds up the toy VM and the jsMSX web demo. It crashes on JSSpeccy. I think the problem is that the jump table is too big. I'm going to check out bug 465582 for a solution that problem.
Attached patch Preview patch 2 (obsolete) — Splinter Review
This one works on both the msx and speccy pages and visibly speeds up both. With this patch I think we're faster than non-tracing on all VM-like workloads I've tried so far. 

Next steps would be to clean up for checkin or investigate perf more deeply. I guess it depends if we want to put it in b3. It's a tough call--I'd really like a beta test vector for a change like this, but I don't want to rush it in or further delay b3. I'll start with cleanup until someone tells me otherwise.
Attachment #359150 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
Attachment #359431 - Attachment is obsolete: true
Attachment #359441 - Flags: review?(danderson)
Attachment #359441 - Flags: review?(danderson) → review+
Attachment #359441 - Flags: review?(gal)
Attached patch Patch, improving tuning (obsolete) — Splinter Review
The original patch regressed string-unpack-code. This one has better tuning, so it has no SS regression but still brings jsMSX to parity with non-tracing. I think it could benefit somewhat from further tuning but I'd rather get it pushed now so the testing can begin.
Attachment #359441 - Attachment is obsolete: true
Attachment #360337 - Flags: review?(gal)
Attachment #359441 - Flags: review?(gal)
Fixes issues mentioned in IRC.
Attachment #360337 - Attachment is obsolete: true
Attachment #360381 - Flags: review?(gal)
Attachment #360337 - Flags: review?(gal)
Attached patch Latest patchSplinter Review
Changes: (1) macro-ize special LEA, (b) cross-platform jump table generator, (c) moved SwitchInfo * from GuardRecord to SideExit & made it typed (instead of void*).
Attachment #360381 - Attachment is obsolete: true
Attachment #360632 - Flags: review?(gal)
Attachment #360381 - Flags: review?(gal)
Comment on attachment 360632 [details] [diff] [review]
Latest patch

Looks good. The NANOJIT_IA32 is just a hack until ARM and AMD64 support is in place I assume? The MAX_UNDERRUN_PROTECT of 3200 worries me a bit but if I remember correctly thats used in an assert only.
Attachment #360632 - Flags: review?(gal)
Attachment #360632 - Flags: review?(edwsmith)
Attachment #360632 - Flags: review+
(In reply to comment #8)
> (From update of attachment 360632 [details] [diff] [review])
> Looks good. The NANOJIT_IA32 is just a hack until ARM and AMD64 support is in
> place I assume? The MAX_UNDERRUN_PROTECT of 3200 worries me a bit but if I
> remember correctly thats used in an assert only.

Correct on both counts.
Attachment #360632 - Flags: review?(edwsmith) → review+
Comment on attachment 360632 [details] [diff] [review]
Latest patch

Bug 465582 is an unlanded patch to add LIR_jtbl, for table-switch jumps at the LIR level.  So, two small optional suggestions:

1. xtbl instead of xi?  (cosmetic, but after all it is table based, not a true indirection to a computed address)

2. the backend half of jtbl will probably involve similar if not the same cogen as xi/xtbl (jump to an address in a table, which presumably gets patched to a trace at some point).  so any factoring that allows doing this once, would be good.  obviously if it doesn't happen for xi/xtbl then it will later for jtbl anyway, so it can be deferred too.

for what its worth, the jtbl patch handles arbitrarily large jump tables by requiring the frontend to break up tables into pieces that fit in the page size.  this is the main reason i haven't landed the patch yet.  a) its hacky, b) selecting the sub-table ought to be done with a binary search or a two-level switch, instead of cascading if's as it does currently.
(In reply to comment #10)

I will change xi to xtbl. On point 2, I think the functions I created are generic-y enough that they can be adapted to jtbl. Selecting fields to pass information through (like what item to patch) was the hardest part of coding this, so I wouldn't be surprised if that aspect needs extension or revision, but I sure couldn't predict how it will work now.
Pushed to TM as 99f3744acfef.
http://hg.mozilla.org/mozilla-central/rev/99f3744acfef
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: