Closed Bug 670493 Opened 13 years ago Closed 13 years ago

JM: for loop with "i--" as loop condition is 2x slower than same loop with "i-- > 0" as loop condition

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: bzbarsky, Assigned: jandem)

References

()

Details

(Keywords: perf, Whiteboard: js-triage-done)

Attachments

(2 files, 1 obsolete file)

Attached file Testcase
See attached testcase.  I see f2() taking 2x longer than f1(), for both JM and JM+TI.

Running under -j doesn't seem to show this 2x slowdown.
Whiteboard: js-triage-needed
f2 generates the opcodes:
00038:   5  zero
00039:   5  gt
00040:   5  ifne 32 (-8)

where f2 generates the opcodes:
00038:  13  ifne 32 (-6)

Then gt; ifne gets fused into a single operation that we generate code for, 
whereas the lone ifne code mostly only works if the argument is a boolean.  In this case, It is an Int32, so we fall back to the slow C++ code.
The relevant code is in booleanJumpScript:
>    if (type.isSet()) {
>        jmpNotBool.setJump(masm.testBoolean(Assembler::NotEqual, type.reg()));
>    } else {
Looks like a job for a peephole optimizer

It'd be interesting to see what other engines do with this.
Blocks: WebJSPerf
Sorry if I was not clear.  It looks like the root cause of the problem is that we don't have a fast path for doing integer checks.  I suspect it would be about +10 lines to fix this (and we'd add in about 2 instructions), but I have bigger fish to fry for the time being.  

sstangl had the idea to put asserts in our ool slow paths to make sure that we were not falling back to C++ for integer/double operations which could be trivially jitted.
booleanJumpScript should handle known int32 values (my first JM patch, hah), so on the TI branch we should be able to handle this inline. Looking at booleanJumpScript there seems to be more low-hanging fruit (regalloc for instance).
> so on the TI branch we should be able to handle this inline

As in, you don't yet but you could?

The timings I see on TI branch are:

-m:
  419
  905
-j:
  176
  178
-m -n:
  133
  291

Note that having an integer as test is the common case for loops, so I think we should seriously look into making it fast.
(In reply to Boris Zbarsky (:bz) from comment #5)
> > so on the TI branch we should be able to handle this inline
> 
> As in, you don't yet but you could?

We do, but somehow it's less efficient than the code generated for i-- > 0.

The fix for this will also help bug 679710. With Ion coming up I don't want to spend too much time fixing JM perf issues, but this one seems to be quite common.
Blocks: 679710
Status: NEW → ASSIGNED
Assignee: general → jandemooij
Attached patch WIP (obsolete) — Splinter Review
This refactors booleanJumpScript and gets rid of some stack stores and jumps in the generated code. And it removes some code: 1 files changed, 43 insertions(+), 55 deletions(-)

For the attached testcase:

old: 132 310
new: 133 148

More than 2x faster, but still 15 ms slower than version 1. The reason is that we are able to hoist the overflow check for i-- > 0 but not for the i--. The generated code for the two versions is very similar. The only difference is test/jne vs. cmp/jg and the overflow check:
--
   0x8a1710:	mov    %edi,%esi
   0x8a1712:	add    $0xffffffff,%esi
>> 0x8a1715:	jo     0x8a1bc7
   0x8a171b:	mov    %esi,0x48(%ebp)
   0x8a171e:	mov    %edi,%ebx
   0x8a1720:	mov    %esi,%edi
   0x8a1722:	test   %ebx,%ebx
   0x8a1724:	jne    0x8a16ec
   0x8a172a:	jmp    0x8a173c
--
I will add the TI optimizations in bug 679710. This patch seems to be a nice win so I would like to get it in first
Whiteboard: js-triage-needed → js-triage-done
(In reply to Jan de Mooij from comment #7)
> ... The reason is
> that we are able to hoist the overflow check for i-- > 0 but not for the
> i--.
As someone following along at home, why is that? i-- can overflow if i == 0 and i is unsigned, but both loops test the value before the decrement, right? Is this a peculiarity of JS, or am I missing something simple?
Seems like if we know that i > 0 then i-- can't overflow (we work only with signed ints for purposes of working with ints).

But if all we know is |i != 0|, then i-- could overflow on a signed int i, right?

The above is total guess; jandem is the one who knows what's going on here.  ;)
Hmm, I suppose it would overflow from the maximum negative value to.. something else in that case (not sure what exactly, bit rusty on integer representations). I guess it makes sense that the overflow check has to be made in that case - though I don't know what happens on overflow :) Does TI have type barriers for unsigned values, or is signed assumed throughout the codebase? (not to start some lengthy discussion on the JS engine, I think I understand the behavior difference now)
We can only remove the overflow check if i is always >= 0 within the loop body, else we could get the overflow on INT_MIN and need to start using the equivalent double (all numbers in JS are really doubles, we go through this integer song and dance for perf).  With some analysis we should be able to determine that if i >= 0 when we enter the loop, either when it initially starts or via an interpreter transition, that it will stay >= 0.  Then we just have to do that one test when entering the loop, and don't need to test each iteration.  LoopState does similar analysis to this for other loop forms, but needs to be more robust for this case.
> not sure what exactly

In practice on most hardware using 2s-complement, INT_MIN - 1 = INT_MAX.  But JS engine behavior should be that INT_MIN-1 = double(INT_MIN) - 1.
Thanks, that just about answers all my questions!
Attached patch PatchSplinter Review
Can you especially check whether this comment is still correct: "Note: this cannot overwrite slots holding loop invariants."?
Attachment #557919 - Attachment is obsolete: true
Attachment #558066 - Flags: review?(bhackett1024)
Comment on attachment 558066 [details] [diff] [review]
Patch

Nice.
Attachment #558066 - Flags: review?(bhackett1024) → review+
Adding checkin-needed. The tracemonkey repo required level 2 commit access but for inbound I need level 3.

This was green on try (except some seemingly unrelated Android orange).
Keywords: checkin-needed, perf
http://hg.mozilla.org/integration/mozilla-inbound/rev/6e67cafd75c8
Keywords: checkin-needed
Whiteboard: js-triage-done → js-triage-done [inbound]
http://hg.mozilla.org/mozilla-central/rev/6e67cafd75c8
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: js-triage-done [inbound] → js-triage-done
Target Milestone: --- → mozilla9
Depends on: 684789
Depends on: 684797
Depends on: 686919
Depends on: 716900
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: