Closed Bug 677636 Opened 13 years ago Closed 13 years ago

IonMonkey: possible tableswitch improvements

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: h4writer, Assigned: h4writer)

Details

Attachments

(4 files)

Tableswitch can get implemented in different ways. There are three major ways.

1)
> code:
>   leal       0(%esi,%edi,3), %esi
>   jmp        ((%esi))
>   jmp        case_1
>   hlt        ; 3 hlt instructions to align to 8 bytes
>   jmp        case_2
>   hlt        ; 3 hlt instructions to align to 8 bytes
>   jmp        case_3
>   hlt        ; 3 hlt instructions to align to 8 bytes

- jmptable size = 8 bytes * #cases

1)
> code:
>   ...
>   leal       0(%esi,%edi,3), %esi
>   jmp        ((%esi))
>   jmp        case_1
>   hlt        ; 3 hlt instructions to align to 8 bytes
>   jmp        case_2
>   hlt        ; 3 hlt instructions to align to 8 bytes
>   jmp        case_3
>   hlt        ; 3 hlt instructions to align to 8 bytes
>   ...

- jmptable size = 8 bytes * #cases

2)
> code:
>   ...
>   jmp        ((0(%esi,%edi,2)))
>   pointer to case_1
>   pointer to case_2
>   pointer to case_3
>   ...

- jmptable size = 4 bytes * #cases

3)

> code:
>   ...
>   jmp        ((0(data,%edi,2))
>   ...
> data:
>   pointer to case_1
>   pointer to case_2
>   pointer to case_3
>   ...

- jmptable size = 4 bytes * #cases

In IonMonkey (1) is implemented. It has the locality that the jump table is in the code stream. But it is twice the size of (2) and (3). 
In JeagerMonkey (3) was implemented.
I'll report the speed difference between the different solutions and based upon those figures decide which one is the best to get impemented.
Attached patch method 2Splinter Review
Quick and hackish way for method 2.
Attached file testcase.js
--ion version 1 => 0,168s
--ion version 2 => 0,140s
So version 2 is faster. But marginally (this is for visiting the tableswitch 100000000 times). For now I'm gonna let this rest. There are more important things to do, like implementing other opcodes. But this is on file, for when we need some extra performance in a tableswitch.

I was curious how this compares to the already existing monkeys. So I did the benchmark on those too.
-j => 0,267s
-m => 0,766s
-m -j => 0,244s
-m -j -p => 0,781s
interpreter => 4,384s

Not bad! ;D
Is that the first actual IonMonkey benchmark? Nice results!
(In reply to David Anderson [:dvander] from comment #3)
> Is that the first actual IonMonkey benchmark? Nice results!

I think so. Haven't seen another one yet ;)
Attached patch method 3Splinter Review
Method 3 is the same as the one that is implemented in jeagermonkey. So This way we can see the real difference between the methods, without looking at the differences of the compiler. (e.g. ionmonkey doesn't check for overflow in addition yet)

Some measurements:

method 1: 200ms
method 2: 170ms
method 3: 170ms

Now I expected method 2-3 to be the same speed. (Because in that testcase the generated code is almost identical) But normally it should differ if the content in the case statements are longer. So I added 100x NOP after each case and measured again (and also some other small changes, that's why it's 3x slower):

method 1: 677ms
method 2: 665ms
method 3: 665ms

So strangely I see no difference between method 2 and 3.
This implements method 3 (cleanly). On irc we (me and Dvander) decided this is the easiest way, without speed regression.
Assignee: general → hv1989
Attachment #555317 - Flags: review?(dvander)
Comment on attachment 555317 [details] [diff] [review]
cleaned up version of method 3 for checkin

Review of attachment 555317 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
@@ +61,5 @@
> +    void copy(IonCode *code, uint8 *buffer) const {
> +        void **jumpData = (void **)buffer;
> +
> +        // For every case write the pointer to the start in the table
> +        for (uint j=0; j<lswitch->mir()->numCases(); j++) { 

Spacing nit: uint32 j = 0; j < lswitch...

@@ +513,5 @@
>      masm.j(AssemblerX86Shared::AboveOrEqual, defaultcase->label());
> +
> +    // Create a JumpTable that during linking will get written.
> +    DeferredJumpTable *d = new DeferredJumpTable(ins);
> +    if (!masm.addDeferredData(d, (1<<ScalePointer)*cases))

Give this some breathing room: (1 << ScalePointer) * cases
Attachment #555317 - Flags: review?(dvander) → review+
http://hg.mozilla.org/projects/ionmonkey/rev/911966c4bee6
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: