JM: for loop with "i--" as loop condition is 2x slower than same loop with "i-- > 0" as loop condition

RESOLVED FIXED in mozilla9

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: bz, Assigned: jandem)

Tracking

(Blocks: 2 bugs, {perf})

Trunk
mozilla9
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: js-triage-done, URL)

Attachments

(2 attachments, 1 obsolete attachment)

Created attachment 545046 [details]
Testcase

See attached testcase.  I see f2() taking 2x longer than f1(), for both JM and JM+TI.

Running under -j doesn't seem to show this 2x slowdown.
Whiteboard: js-triage-needed
f2 generates the opcodes:
00038:   5  zero
00039:   5  gt
00040:   5  ifne 32 (-8)

where f2 generates the opcodes:
00038:  13  ifne 32 (-6)

Then gt; ifne gets fused into a single operation that we generate code for, 
whereas the lone ifne code mostly only works if the argument is a boolean.  In this case, It is an Int32, so we fall back to the slow C++ code.
The relevant code is in booleanJumpScript:
>    if (type.isSet()) {
>        jmpNotBool.setJump(masm.testBoolean(Assembler::NotEqual, type.reg()));
>    } else {
Looks like a job for a peephole optimizer

It'd be interesting to see what other engines do with this.
Blocks: 579390
Sorry if I was not clear.  It looks like the root cause of the problem is that we don't have a fast path for doing integer checks.  I suspect it would be about +10 lines to fix this (and we'd add in about 2 instructions), but I have bigger fish to fry for the time being.  

sstangl had the idea to put asserts in our ool slow paths to make sure that we were not falling back to C++ for integer/double operations which could be trivially jitted.
(Assignee)

Comment 4

6 years ago
booleanJumpScript should handle known int32 values (my first JM patch, hah), so on the TI branch we should be able to handle this inline. Looking at booleanJumpScript there seems to be more low-hanging fruit (regalloc for instance).
> so on the TI branch we should be able to handle this inline

As in, you don't yet but you could?

The timings I see on TI branch are:

-m:
  419
  905
-j:
  176
  178
-m -n:
  133
  291

Note that having an integer as test is the common case for loops, so I think we should seriously look into making it fast.
(Assignee)

Comment 6

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #5)
> > so on the TI branch we should be able to handle this inline
> 
> As in, you don't yet but you could?

We do, but somehow it's less efficient than the code generated for i-- > 0.

The fix for this will also help bug 679710. With Ion coming up I don't want to spend too much time fixing JM perf issues, but this one seems to be quite common.
Blocks: 679710
Status: NEW → ASSIGNED
(Assignee)

Updated

6 years ago
Assignee: general → jandemooij
(Assignee)

Comment 7

6 years ago
Created attachment 557919 [details] [diff] [review]
WIP

This refactors booleanJumpScript and gets rid of some stack stores and jumps in the generated code. And it removes some code: 1 files changed, 43 insertions(+), 55 deletions(-)

For the attached testcase:

old: 132 310
new: 133 148

More than 2x faster, but still 15 ms slower than version 1. The reason is that we are able to hoist the overflow check for i-- > 0 but not for the i--. The generated code for the two versions is very similar. The only difference is test/jne vs. cmp/jg and the overflow check:
--
   0x8a1710:	mov    %edi,%esi
   0x8a1712:	add    $0xffffffff,%esi
>> 0x8a1715:	jo     0x8a1bc7
   0x8a171b:	mov    %esi,0x48(%ebp)
   0x8a171e:	mov    %edi,%ebx
   0x8a1720:	mov    %esi,%edi
   0x8a1722:	test   %ebx,%ebx
   0x8a1724:	jne    0x8a16ec
   0x8a172a:	jmp    0x8a173c
--
I will add the TI optimizations in bug 679710. This patch seems to be a nice win so I would like to get it in first
(Assignee)

Updated

6 years ago
Whiteboard: js-triage-needed → js-triage-done
(In reply to Jan de Mooij from comment #7)
> ... The reason is
> that we are able to hoist the overflow check for i-- > 0 but not for the
> i--.
As someone following along at home, why is that? i-- can overflow if i == 0 and i is unsigned, but both loops test the value before the decrement, right? Is this a peculiarity of JS, or am I missing something simple?
Seems like if we know that i > 0 then i-- can't overflow (we work only with signed ints for purposes of working with ints).

But if all we know is |i != 0|, then i-- could overflow on a signed int i, right?

The above is total guess; jandem is the one who knows what's going on here.  ;)
Hmm, I suppose it would overflow from the maximum negative value to.. something else in that case (not sure what exactly, bit rusty on integer representations). I guess it makes sense that the overflow check has to be made in that case - though I don't know what happens on overflow :) Does TI have type barriers for unsigned values, or is signed assumed throughout the codebase? (not to start some lengthy discussion on the JS engine, I think I understand the behavior difference now)
We can only remove the overflow check if i is always >= 0 within the loop body, else we could get the overflow on INT_MIN and need to start using the equivalent double (all numbers in JS are really doubles, we go through this integer song and dance for perf).  With some analysis we should be able to determine that if i >= 0 when we enter the loop, either when it initially starts or via an interpreter transition, that it will stay >= 0.  Then we just have to do that one test when entering the loop, and don't need to test each iteration.  LoopState does similar analysis to this for other loop forms, but needs to be more robust for this case.
> not sure what exactly

In practice on most hardware using 2s-complement, INT_MIN - 1 = INT_MAX.  But JS engine behavior should be that INT_MIN-1 = double(INT_MIN) - 1.
Thanks, that just about answers all my questions!
(Assignee)

Comment 14

6 years ago
Created attachment 558066 [details] [diff] [review]
Patch

Can you especially check whether this comment is still correct: "Note: this cannot overwrite slots holding loop invariants."?
Attachment #557919 - Attachment is obsolete: true
Attachment #558066 - Flags: review?(bhackett1024)
Comment on attachment 558066 [details] [diff] [review]
Patch

Nice.
Attachment #558066 - Flags: review?(bhackett1024) → review+
(Assignee)

Comment 16

6 years ago
Adding checkin-needed. The tracemonkey repo required level 2 commit access but for inbound I need level 3.

This was green on try (except some seemingly unrelated Android orange).
Keywords: checkin-needed, perf
http://hg.mozilla.org/integration/mozilla-inbound/rev/6e67cafd75c8
Keywords: checkin-needed
Whiteboard: js-triage-done → js-triage-done [inbound]
http://hg.mozilla.org/mozilla-central/rev/6e67cafd75c8
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: js-triage-done [inbound] → js-triage-done
Target Milestone: --- → mozilla9

Updated

6 years ago
Depends on: 684789
(Assignee)

Updated

6 years ago
Depends on: 684797
(Assignee)

Updated

6 years ago
Depends on: 686919

Updated

6 years ago
Depends on: 716900
Depends on: 707141
You need to log in before you can comment on or make changes to this bug.