JM: generate inline code for JSOP_TABLESWITCH

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
8 years ago

People

(Reporter: jandem, Assigned: jandem)

Tracking

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(blocking2.0 -, status2.0 wanted)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

9 years ago
JSNES has *many* calls to stubs::TableSwitch. I will try to write a hackish patch to generate inline code for this and measure if it's worthwhile. 

This won't help SS/V8, as only deltablue has about 600 calls to stubs::TableSwitch.
(Assignee)

Updated

9 years ago
Blocks: WebJSPerf
(Assignee)

Updated

9 years ago
Assignee: general → jandemooij
Status: NEW → ASSIGNED
Hmm... I thought I implemented this in the last JM and sstangl had ported it. sstangl, do you have a patch lying around?
The only tricky part is going to be allocating the big table buffer and relocating the addresses inside it. Might need to extend the Assembler.
I think I just used DataLabel32s and patched them up in the previous JM iteration.
(Assignee)

Comment 4

9 years ago
Posted patch Patch v1 (obsolete) — Splinter Review
Consider this microbenchmark:
----------
function f(a) {
    var t0 = new Date;
    
    for(var i=0; i<10000000; i++) {
        var b;
        switch(a) {
            case 1: b = 2; break;
            case 2: b = 3; break;
            case 3:
            case 4: b = 5; break;
            case 5: b = 6; break;
            case 6: b = 7; break;
            case 7: b = 8; break;
            case 8: b = 9; break;
            case 9: b = 10; break;
        }
    }
    print(new Date - t0);
}
f(7);
----------
We get the following times, after subtracting the empty loop times:
- JM before: 161 ms.
- JM after: 36 ms.
- TM: 70 ms.
- V8: 60 ms.

(note that V8 does not generate a jump table; changing f(7) to f(0) makes V8 faster, likely O(n))

Generated asm looks like this:

0xf7fc42ea:	cmpl   $0xffff0001,0x74(%ebx)
0xf7fc42f1:	jne    0xf7fc46d2
0xf7fc42f7:	mov    $0xf7fc4a33,%ecx
0xf7fc42fc:	sub    $0x1,%edx
0xf7fc42ff:	cmp    $0x9,%edx
0xf7fc4302:	jae    0xf7fc43a3
0xf7fc4308:	jmp    *(%ecx,%edx,4)

Running JSNES for a few seconds results in millions of stubcalls to JSOP_TABLESWITCH. This patch eliminates them, but performance win seems minimal (hard to see though, FPS fluctuates a lot).

So, I'm not sure if this is worthwhile. Maybe after FF4 or for some unknown benchmark?
(Assignee)

Comment 5

9 years ago
Posted patch Patch v2 (obsolete) — Splinter Review
Simplify the code a bit. This is not commit-ready, it has some small hacks and a compile warning (about function-pointer to void* cast). But good enough to measure performance. Also, feel free to close this as WONTFIX.
Attachment #470767 - Attachment is obsolete: true
It's a good idea: I think we will want to take this, after we have landed JM and sorted all that out.
I'm curious why we're so much slower right now - if other engines don't fast-path either.
(Assignee)

Comment 8

9 years ago
(In reply to comment #7)
> I'm curious why we're so much slower right now - if other engines don't
> fast-path either.

V8 does not generate a jump table, but it does generate a comparison for each case label (hence the O(n) behavior). TM also generates a jump table. JSC seems to do a stub call to a jumptable-stub, but I don't have a standalone JSC shell on Linux to test its performance.
(Assignee)

Updated

9 years ago
Blocks: 608733
(Assignee)

Comment 9

9 years ago
Posted patch Patch (obsolete) — Splinter Review
Updated + cleaned up a bit. This works on x86 and x64; for ARM it still does a stub call because we need to implement masm.jump(BaseIndex) for ARM. There are already tests in jit-test/test/basic/test{Table}Switch*.js for JSOP_TABLESWITCH; so I just added some tests that were missing there.
Attachment #470770 - Attachment is obsolete: true
Attachment #487633 - Flags: review?(dvander)
Comment on attachment 487633 [details] [diff] [review]
Patch

Sorry for the delay here... nice patch!
 
>+    for (size_t i = 0; i < jumpTableOffsets.length(); i++) {
>+        ptrdiff_t offset = jumpTableOffsets[i];

uint32 here since the offset is absolute.

>+    for (uint32 i = 0; i < numJumps; i++) {
>+        uint32 target = GET_JUMP_OFFSET(pc);
>+        if (!target)
>+            target = defaultTarget;
>+        ptrdiff_t offset = (originalPC + target) - script->code;

also here, and 

>+    js::Vector<ptrdiff_t, 16> jumpTableOffsets;

here.

Would also be good to have a copy of the switch (3.14) test cases where the input is not a constant, that is, there's no test that takes the OOL stubs::TableSwitch path. There should also be a test with negative switch indexes. |low| and |high| can be negative. stubs::TableSwitch and the code here (which store as uint32) work because of overflow which is a little subtle.

I would either change jsop_tableswitch to use signed ranges, or put a big comment explaining how it works.

r=me with those addressed
Attachment #487633 - Flags: review?(dvander) → review+
(Assignee)

Comment 11

9 years ago
Posted patch Patch v2 (obsolete) — Splinter Review
New version: s/ptrdiff_t/uint32, added another #define so we don't break compilation on ARM, added tests, changed uint32 to jsint (like tracer and interpreter)
Attachment #487633 - Attachment is obsolete: true
Attachment #492716 - Flags: review?(dvander)
Attachment #492716 - Flags: review?(dvander) → review+
(Assignee)

Comment 12

9 years ago
Anybody willing to land this? It's a measurable speedup for typical interpreter/emulator-like switch-statements, and switch-statements usually don't trace well.
blocking2.0: --- → ?

Comment 13

9 years ago
We'll land this after b8 is done. Thanks for your work.

Comment 14

9 years ago
So wait, are switches faster than this.OPCODES[op](this) ? Because my own emulator (GameBoy Color) relies on this. Does firefox optimize for an array of anonymous functions? It was hard for me to tell for some reason. Does firefox's performance vary majorly based on the case order presented in a switch (might be why I'm confused).
(Assignee)

Comment 15

9 years ago
Posted patch Patch v3 (obsolete) — Splinter Review
Rebased + make jsop_tableswitch a no-op when there are no cases (tracer does this too). The previous version handled this correctly, but this may prevent problems in the future (JumpTable.offsetIndex pointing to an invalid index is not very intuitive). Sorry for the review noise...
Attachment #492716 - Attachment is obsolete: true
Attachment #495479 - Flags: review?(dvander)
(Assignee)

Comment 16

9 years ago
(In reply to comment #14)
> So wait, are switches faster than this.OPCODES[op](this) ? Because my own
> emulator (GameBoy Color) relies on this. 

With this patch, I think (need to measure to be sure) a table switch(*) is faster than an array, because we don't have to 1) check that this.OPCODES is an array 2) call a separate function.

> Does firefox optimize for an array of anonymous functions? 

I don't think so, but I might be wrong.

> Does firefox's
> performance vary majorly based on the case order presented in a switch (might
> be why I'm confused).

About the case order, it depends on the switch statement. For a table switch the case order should not matter. But for other switch statements it does matter because we compare the value to each case label in the same order.

* A table switch is a switch with "constant integer cases"; this patch optimizes this to a jump table (when highest value - lowest value <= 256 to prevent excessive memory use)
Comment on attachment 495479 [details] [diff] [review]
Patch v3

>+     */
>+    if (numJumps == 0) {
>+        frame.pop();
>+        return;
>+    }
>+    FrameEntry *fe = frame.peek(-1);

r=me with a newline after the } here, and a test case that asserts the default label in an empty switch gets executed.

>+        return;
>+    }
>+    RegisterID dataReg;

>+    } else {
>+        dataReg = frame.copyDataIntoReg(fe);
>+    }
>+    RegisterID reg = frame.allocReg();

While you're back in the area, could use newlines after } in these places too :)
Attachment #495479 - Flags: review?(dvander) → review+

Updated

9 years ago
blocking2.0: ? → -
status2.0: --- → wanted
(Assignee)

Comment 18

9 years ago
Posted patch Patch v4Splinter Review
Rebased + addressed review comments. Had to use jumpMap now that nmap is gone.
Attachment #495479 - Attachment is obsolete: true
Attachment #497813 - Flags: review?(dvander)
Attachment #497813 - Flags: review?(dvander) → review+
(Assignee)

Comment 19

8 years ago
Comment on attachment 497813 [details] [diff] [review]
Patch v4

Asking approval2.0. Would be nice to have this in beta 9.
Attachment #497813 - Flags: approval2.0?
http://hg.mozilla.org/tracemonkey/rev/673ae0e2f656
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: fixed-in-tracemonkey
Comment on attachment 497813 [details] [diff] [review]
Patch v4

Pretty sure approval wasn't the problem, since nothing else landing in TM has approval.
Attachment #497813 - Flags: approval2.0?
http://hg.mozilla.org/mozilla-central/rev/673ae0e2f656
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.