Closed Bug 1547752 Opened 6 years ago Closed 6 years ago

Cranelift: Support jump tables

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: bbouvier, Assigned: lth)

References

Details

Attachments

(1 file, 2 obsolete files)

Currently, Cranelift will emit ifs for jump tables in Spidermonkey. This is not ideal for performance or code size. However, Cranelift already knows how to emit jump tables; the code doesn't know how to pass readonly data from Cranelift to Spidermonkey yet.

Type: defect → task
Type: task → enhancement

Test case that crashes with illegal instruction:

var bin = wasmTextToBinary(
    `(module
       (func (export "f") (param $n i32) (result i32)
         (block $A
           (block $B
             (block $C
               (block $D
                 (block $E
                   (block $F (br_table $A $B $C $D $E $F (local.get $n))
                     (return (i32.const 12)))
                   (return (i32.const 10)))
                 (return (i32.const 8)))
               (return (i32.const 6)))
             (return (i32.const 4)))
           (return (i32.const 2)))
         (return (i32.const 0))))`);

var mod = new WebAssembly.Module(bin);
var ins = new WebAssembly.Instance(mod);
for ( var i=0; i < 10; i++ )
  print(ins.exports.f(i));

Here's the disassembly from lldb:

    0x1d09cf064020: 41 56                 pushq  %r14
    0x1d09cf064022: 55                    pushq  %rbp
    0x1d09cf064023: 48 8b ec              movq   %rsp, %rbp
    0x1d09cf064026: 48 83 ec 08           subq   $0x8, %rsp
    0x1d09cf06402a: 83 ff 05              cmpl   $0x5, %edi
    0x1d09cf06402d: 73 10                 jae    0x1d09cf06403f      ;; skip to default if out of range
    0x1d09cf06402f: 48 8d 05 33 00 00 00  leaq   0x33(%rip), %rax    ;; table addr 0x...69 in rax
    0x1d09cf064036: 48 63 0c b8           movslq (%rax,%rdi,4), %rcx ;; table element to rcx, sign extended
    0x1d09cf06403a: 48 01 c8              addq   %rcx, %rax          ;; add table element to table address
    0x1d09cf06403d: ff e0                 jmpq   *%rax               ;; and jump to it
    0x1d09cf06403f: b8 0a 00 00 00        movl   $0xa, %eax          ;; default case
    0x1d09cf064044: eb 21                 jmp    0x1d09cf064067
    0x1d09cf064046: b8 08 00 00 00        movl   $0x8, %eax
    0x1d09cf06404b: eb 1a                 jmp    0x1d09cf064067
    0x1d09cf06404d: b8 06 00 00 00        movl   $0x6, %eax
    0x1d09cf064052: eb 13                 jmp    0x1d09cf064067
    0x1d09cf064054: b8 04 00 00 00        movl   $0x4, %eax
    0x1d09cf064059: eb 0c                 jmp    0x1d09cf064067
    0x1d09cf06405b: b8 02 00 00 00        movl   $0x2, %eax
    0x1d09cf064060: eb 05                 jmp    0x1d09cf064067
    0x1d09cf064062: b8 00 00 00 00        movl   $0x0, %eax
    0x1d09cf064067: 89 c0                 movl   %eax, %eax          ;; return sequence starts here
    0x1d09cf064069:                       0xfffffff9                 ;; oops, table
    0x1d09cf06406d:                       0xfffffff2                  
    0x1d09cf064071:                       0xffffffeb
    0x1d09cf064075:                       0xffffffe4
    0x1d09cf064079:                       0xffffffdd
    0x1d09cf06407d: 4c 8b 74 24 10        movq   0x10(%rsp), %r14    ;; return sequence continues here
    0x1d09cf064082: 4d 8b 3e              movq   (%r14), %r15
    0x1d09cf064085: 48 83 c4 08           addq   $0x8, %rsp
    0x1d09cf064089: 5d                    popq   %rbp
    0x1d09cf06408a: 41 5e                 popq   %r14
    0x1d09cf06408c: c3                    retq   

The problem is that the jump table is not at the end of the function where it should be, but at the end of the function body before the epilogue. If it hadn't been for that, this would have been fine.

Presumably this is because the pasteup of the code is not aware of the epilogue, which is added after the fact.

(What's below may not be important)

When emit_inst() for the indirect jump encodes the load from the table, it gets the offset wrong in jt_disp4(). Backtrace:

frame #0: js`cranelift_codegen::isa::x86::binemit::jt_disp4::h0ce918d698c8dcf4(jt=(__0 = 0),...) at binemit.rs:341:4
frame #1: js`cranelift_codegen::isa::x86::binemit::emit_inst::h5f31c66d95a9a4c9(...) at binemit-x86.rs:3622:16
frame #2: js`core::ops::function::Fn::call::hd4e75944fb12c8a7(...) at function.rs:69:4
frame #3: js`cranelift_codegen::binemit::emit_function::hfa4c505b876bb8b4(...) at mod.rs:126:12
frame #4: js`_$LT$cranelift_codegen..isa..x86..Isa$u20$as$u20$cranelift_codegen..isa..TargetIsa$GT$\ 
             ::emit_function_to_memory::h615630a7623257c6(...) at mod.rs:132:8

That's correct. Some ways to fix this:

  • stop using the MemoryCodeSink in Spidermonkey and have our own CodeSink impl, which will emit code and keep the readonly data (jump table entries) separately; then the C++ code can request the metadata and put it after the Spidermonkey epilogue.
  • split the CodeSink trait into a CodeSink and ReadonlyDataSink; table entries would get emitted by the readonly data sink instead of the code sink.

relax_branches() in cranelift-codegen/src/binemit/relax-branches.rs computes the final offset of the jump tables and needs to be aware of the prologue, or we need to insert branches around the tables if there are any. Technically relaxation is also a little dodgy because it assumes the code starts at zero, which it does not (because of the prologue that's added later), but I don't see a reason why that's a big deal right now.

Arguably the branch tables should also be laid out on an aligned address so somebody should add some nops when necessary, but it may not matter very much.

I see -- binemit::emit_function emits the code followed by the rodata. But it doesn't have any hooks for prologue or epilogue. (Clearly if it did have such hooks they would have to be coordinated with relaxation, and maybe with other things.)

CodeSink::begin_rodata() could perform the necessary fill for alignment, but only if it could take into account the sizes of prologue (now) and epilogue (when the epilogue comes before the jump tables).

The prologue and epilogue are added in C++ code, by GenerateCraneliftCode() in WasmCraneliftCompile.cpp, after cranlift is done with everything and we have a blob of raw machine code and rodata. This extra code includes stack frame setup and teardown.

Benjamin's solutions in comment 2 amount to moving more logic up into the C++ layer, letting it coordinate the emission of tables after the epilogue has been handled; it would have to offset the table entries additionally but this would not necessarily be bad, and there would be the option of creating padding for alignment too. It's also a little nasty since the tables will have been offset for the cranelift code size, so they're in an inbetween state.

The other solution appears to be to expose hooks that cranelift can call to make space for / generate raw bytes for prologue and epilogue; this way we do more in cranelift. I don't know how much work this is. One nice thing about it is that cranelift-generated code is always finished, it has everything needed.

Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Priority: -- → P3

(In reply to Benjamin Bouvier [:bbouvier] from comment #2)

That's correct. Some ways to fix this:

  • stop using the MemoryCodeSink in Spidermonkey and have our own CodeSink impl, which will emit code and keep the readonly data (jump table entries) separately; then the C++ code can request the metadata and put it after the Spidermonkey epilogue.
  • split the CodeSink trait into a CodeSink and ReadonlyDataSink; table entries would get emitted by the readonly data sink instead of the code sink.

Having looked at this some, the first option looks more appealing - it is a more local fix. The CodeSink needs a couple more methods and we need to duplicate the implementation as well as emit_to_memory() but that's not much of a problem, or doesn't seem to be one so far.

Having pondered it further, the problem is really this instruction in the disas above: leaq 0x33(%rip), %rax. When this is emitted it must know the offset from the instruction to the jump table. This offset is computed during relaxation. So I think what we really want here is a way for relaxation to know about the gap between code and its rodata, and perhaps know about the sizes of prologues and epilogues. This does not have to be super hard, it could be an additional parameter to compile() and would mirror the CodeSink's begin_rodata method (for example).

Having pondered it further, the problem is really this instruction in the disas above: leaq 0x33(%rip), %rax.

Good point; it could just use the largest branch to be cautious, that we'd patch later, but that's more fragile than the right thing you're doing.

Turns out that knowing the sizes of prologues and epilogues when we need to know those sizes (at relaxation time) is not so easy. The prologues and epilogues are generated into the masm before and after we copy in the CL code. They can vary a fair bit. They carry metadata (for traps at least). I'm not convinced that they can be generated early (so that we can know their sizes) and then copied around at will when we need them later, during pasteup.

Really, the cleanest and most forward-looking solution here is some kind of relocation so that we can treat the rodata as movable. We need to emit a relocation at the branch instruction so that we can later patch in the correct pc-relative offset to the table, and then each table must carry some kind of relocation information so that it can be fixed up relative to the location in memory where it is placed. The overhead of this should be manageable. However it does require that we distinguish between code and rodata in the CodeSink so that we can handle them separately.

I think I have found a simple way to do this. The MemoryCodeSink emits simple relocs for the jump table references; we track the offset within the final CL output where the code ends and the rodata begins; then during pasteup on the C++ side we split the CL output apart, patch up the references in the code part, and fixup the offsets in the rodata part. This requires only a small amount of code on both sides of the fence.

patch up the references in the code part

Can we do this? It seems that we can end up in a situation where:

  • a branch emitted in the code is just below a branch range limit
  • we insert the custom epilogue
  • now the range between branch and jump table is slightly off the branch range

(Unless we conservatively assume the biggest branch range, as suggested before)

The addend for computing the branch table base must be four bytes, but it already is that (see disassembly above). And that should be the only constraint in practical terms - the branch table has four-byte entries.

Spidermonkey side.

Cranelift side.

This patch uses the machinery submitted in
https://github.com/CraneStation/cranelift/pull/774 to split the jump
table (and any other read-only data) apart from the compiled code, so
as to make room for the epilogue and proper alignment. The only
difficult part about this is to update the instructions referencing
the tables, and the table entries, when the tables are moved.

Nice. Did you get some performance numbers that showed improvements in our set of tracked benchmarks? Did any issue related to https://github.com/CraneStation/cranelift/issues/732 show up in our test suite?

Re performance, I think the mac laptop that I use for benchmarking is a little noisy, but it looks like ~5% win on binarytrees and otherwise nothing interesting for the tracked benchmarks. (Of the untracked ones there was a consistent > 20% gain on primes; the most likely explanation for this is that the implementation of sqrt, which is fairly hot in this program, uses a switch.)

No issues re that cranelift ticket.

Attachment #9066414 - Attachment is obsolete: true
Attachment #9066413 - Attachment is obsolete: true
Attachment #9067303 - Attachment description: Bug 1547752 - Support jump tables in cranelift. r?bbouvier → Bug 1547752 - Support jump tables in cranelift. r=bbouvier
Depends on: 1556641
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: