Closed Bug 738873 Opened 12 years ago Closed 12 years ago

IonMonkey: Optimize truncates to have fewer bailouts

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mjrosenb, Unassigned)

References

Details

Attachments

(2 files, 2 obsolete files)

This optimization looks for (a+b)|0 (or any other operation that has an implicit truncateToInt32 in it, paired with an operation that has a non-int32 check, like add, sub, and eventually, div), and omits the overflow / remainder check and bailout.
Attachment #608939 - Flags: review?(sstangl)
Could you make this patch on top of bug 736135 ? That patch will land soon and change some things...

Secondly the patch in bug 736135 also introduces canOverflow boolean on MDiv (like MMul has already a long time). I would suggest you reuse that boolean for MDiv. And MSub/MAdd could use that same name (instead of elideOverflowCheck) ?
results of patch:

~/bench; js.ion.opt --ion -n --ion-trunc=off ./micro.js 
13193
+2012-03-24 01:37 0 !218 (0)
mrosenberg@planetaryac:$
~/bench; js.ion.opt --ion -n --ion-trunc=on ./micro.js 
11394

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.004x as fast    381.6ms +/- 0.2%   380.3ms +/- 0.1%     significant

=============================================================================

  3d:                  1.005x as fast     60.8ms +/- 0.3%    60.5ms +/- 0.1%     significant
    cube:              1.004x as fast     28.4ms +/- 0.3%    28.3ms +/- 0.2%     significant
    morph:             1.015x as fast      8.7ms +/- 0.8%     8.5ms +/- 0.2%     significant
    raytrace:          -                  23.8ms +/- 0.3%    23.7ms +/- 0.2% 

  access:              -                  47.3ms +/- 0.4%    47.2ms +/- 0.2% 
    binary-trees:      -                  26.5ms +/- 0.5%    26.5ms +/- 0.4% 
    fannkuch:          -                  11.7ms +/- 0.4%    11.7ms +/- 0.2% 
    nbody:             ??                  5.5ms +/- 0.5%     5.5ms +/- 0.2%     not conclusive: might be *1.005x as slow*
    nsieve:            -                   3.6ms +/- 0.6%     3.6ms +/- 0.2% 

  bitops:              1.016x as fast     15.7ms +/- 0.2%    15.4ms +/- 0.1%     significant
    3bit-bits-in-byte: ??                  1.2ms +/- 1.2%     1.2ms +/- 0.4%     not conclusive: might be *1.002x as slow*
    bits-in-byte:      1.019x as fast      5.5ms +/- 0.2%     5.4ms +/- 0.1%     significant
    bitwise-and:       -                   4.3ms +/- 0.1%     4.2ms +/- 0.1% 
    nsieve-bits:       1.030x as fast      4.7ms +/- 0.6%     4.6ms +/- 0.1%     significant

  controlflow:         ??                  3.2ms +/- 0.7%     3.2ms +/- 0.2%     not conclusive: might be *1.003x as slow*
    recursive:         ??                  3.2ms +/- 0.7%     3.2ms +/- 0.2%     not conclusive: might be *1.003x as slow*

  crypto:              -                  35.0ms +/- 0.2%    35.0ms +/- 0.1% 
    aes:               *1.006x as slow*   19.5ms +/- 0.3%    19.6ms +/- 0.2%     significant
    md5:               -                   9.3ms +/- 0.4%     9.2ms +/- 0.2% 
    sha1:              1.010x as fast      6.2ms +/- 0.6%     6.2ms +/- 0.2%     significant

  date:                -                  87.2ms +/- 0.4%    86.9ms +/- 0.4% 
    format-tofte:      -                  47.9ms +/- 0.4%    47.9ms +/- 0.6% 
    format-xparb:      1.006x as fast     39.2ms +/- 0.4%    39.0ms +/- 0.4%     significant

  math:                *1.005x as slow*   21.5ms +/- 0.3%    21.6ms +/- 0.1%     significant
    cordic:            *1.012x as slow*    4.1ms +/- 0.7%     4.2ms +/- 0.3%     significant
    partial-sums:      ??                 13.2ms +/- 0.4%    13.2ms +/- 0.1%     not conclusive: might be *1.003x as slow*
    spectral-norm:     *1.006x as slow*    4.2ms +/- 0.2%     4.2ms +/- 0.1%     significant

  regexp:              1.014x as fast     14.1ms +/- 0.2%    13.9ms +/- 0.1%     significant
    dna:               1.014x as fast     14.1ms +/- 0.2%    13.9ms +/- 0.1%     significant

  string:              1.004x as fast     96.9ms +/- 0.3%    96.5ms +/- 0.2%     significant
    base64:            *1.006x as slow*    6.0ms +/- 0.4%     6.0ms +/- 0.3%     significant
    fasta:             ??                  8.8ms +/- 0.3%     8.9ms +/- 0.1%     not conclusive: might be *1.003x as slow*
    tagcloud:          1.006x as fast     35.3ms +/- 0.5%    35.1ms +/- 0.2%     significant
    unpack-code:       1.007x as fast     38.5ms +/- 0.4%    38.3ms +/- 0.2%     significant
    validate-input:    ??                  8.2ms +/- 0.5%     8.3ms +/- 0.2%     not conclusive: might be *1.005x as slow*


~/bench/v8/v7; js.ion.opt --ion -n --ion-trunc=off ./run-crypto.js
Crypto: 9865
Crypto: 9942
Crypto: 10385
Crypto: 10074
Crypto: 9708
Crypto: 9978
Crypto: 10033
Crypto: 10457
Crypto: 10364
----
Score (version 7): 10087

~/bench/v8/v7; js.ion.opt --ion -n --ion-trunc=off ./run.js 
Richards: 2067
DeltaBlue: 1053
Crypto: 10041
RayTrace: 385
EarleyBoyer: 523
RegExp: 466
Splay: 3423
NavierStokes: 658
----
Score (version 7): 1211
~/bench/v8/v7; js.ion.opt --ion -n --ion-trunc=on ./run.js 
Richards: 2090
DeltaBlue: 1080
Crypto: 10419
RayTrace: 386
EarleyBoyer: 522
RegExp: 464
Splay: 3414
NavierStokes: 682
----
Score (version 7): 1227

.3
+2012-03-23 21:22 0 !179 (0)
mrosenberg@planetaryac:$
~/bench/v8/v7; js.ion.opt --ion -n --ion-trunc=on ./run-crypto.js
Crypto: 10522
Crypto: 10161
Crypto: 10649
Crypto: 10570
Crypto: 10683
Crypto: 10842
Crypto: 10699
Crypto: 10644
Crypto: 10450
----
Score (version 7): 10579



TEST                         COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:                 -                 5121.9ms +/- 0.1%   5113.5ms +/- 0.3% 

=============================================================================

  ai:                        -                  180.1ms +/- 0.2%    180.2ms +/- 0.3% 
    astar:                   -                  180.1ms +/- 0.2%    180.2ms +/- 0.3% 

  audio:                     -                 1562.4ms +/- 0.3%   1559.2ms +/- 0.4% 
    beat-detection:          -                  509.5ms +/- 0.8%    505.0ms +/- 0.9% 
    dft:                     ??                 663.1ms +/- 0.3%    663.9ms +/- 0.4%     not conclusive: might be *1.001x as slow*
    fft:                     -                  174.0ms +/- 0.3%    174.2ms +/- 0.4% 
    oscillator:              ??                 215.7ms +/- 0.3%    216.1ms +/- 0.3%     not conclusive: might be *1.002x as slow*

  imaging:                   ??                 792.4ms +/- 0.2%    793.8ms +/- 0.3%     not conclusive: might be *1.002x as slow*
    gaussian-blur:           ??                 294.2ms +/- 0.2%    294.8ms +/- 0.3%     not conclusive: might be *1.002x as slow*
    darkroom:                ??                 335.7ms +/- 0.3%    336.7ms +/- 0.6%     not conclusive: might be *1.003x as slow*
    desaturate:              -                  162.4ms +/- 0.7%    162.3ms +/- 0.7% 

  json:                      ??                 143.6ms +/- 0.3%    144.0ms +/- 0.3%     not conclusive: might be *1.003x as slow*
    parse-financial:         ??                  91.2ms +/- 0.4%     91.5ms +/- 0.4%     not conclusive: might be *1.003x as slow*
    stringify-tinderbox:     ??                  52.4ms +/- 0.3%     52.5ms +/- 0.4%     not conclusive: might be *1.002x as slow*

  stanford:                  -                 2443.4ms +/- 0.2%   2436.2ms +/- 0.3% 
    crypto-aes:              -                 1386.6ms +/- 0.3%   1387.0ms +/- 0.3% 
    crypto-ccm:              ??                 344.9ms +/- 0.3%    345.6ms +/- 0.4%     not conclusive: might be *1.002x as slow*
    crypto-pbkdf2:           1.009x as fast     567.3ms +/- 0.5%    562.2ms +/- 0.5%     significant
    crypto-sha256-iterative: 1.022x as fast     144.5ms +/- 0.3%    141.5ms +/- 0.3%     significant
(In reply to Hannes Verschore from comment #1)
> Could you make this patch on top of bug 736135 ? That patch will land soon
> and change some things...
> 
> Secondly the patch in bug 736135 also introduces canOverflow boolean on MDiv
> (like MMul has already a long time). I would suggest you reuse that boolean
> for MDiv. And MSub/MAdd could use that same name (instead of
> elideOverflowCheck) ?

Sure thing. I'd decided to change the name of the pass mid-writing, and decided to just leave it as a split name for testing.
other thing to note: I'm going to strip out the debug fprintf's as well.
Merged this patch with all of the current "range analysis" stuff. I also split the old range analysis into a forwards and backwards pass, while adding the new code specifically into the backwards pass.
Attachment #608939 - Attachment is obsolete: true
Attachment #608939 - Flags: review?(sstangl)
Attachment #613813 - Flags: review?(sstangl)
Can I raise some concerns. I noticed you moved the range analysis pass before the GVN pass. This actually worsen the effect of the 'NegativeZeroCheck' removal. This is because GVN can also remove the negative zero check and range analysis uses this information! Theoretical the pass would be best at the end, like it was.

Another question you've added foldIfNegOne, but I nowhere found any calls to that function in your diff?
(In reply to Hannes Verschore from comment #6)
> Can I raise some concerns. I noticed you moved the range analysis pass
> before the GVN pass. This actually worsen the effect of the
> 'NegativeZeroCheck' removal. This is because GVN can also remove the
> negative zero check and range analysis uses this information! Theoretical
> the pass would be best at the end, like it was.
> 
I'd asked around if there was any reason that we did negative-zero before gvn, and didn't hear anything definite, so I moved it earlier.  If you have any test cases that used to get optimized that no longer do, I'll fix them.  Also is there some time we can chat for 15-20 minutes about this rather than through bugzilla/email?

> Another question you've added foldIfNegOne, but I nowhere found any calls to
> that function in your diff?

oops, that was supposed to be called from the binary folding function, guess I never actually added it in. I guess there aren't many cases that people execute x & -1 in our benches :)
(In reply to Marty Rosenberg [:mjrosenb] from comment #7)
> I'd asked around if there was any reason that we did negative-zero before
> gvn, and didn't hear anything definite, so I moved it earlier.  If you have
> any test cases that used to get optimized that no longer do, I'll fix them. 
> Also is there some time we can chat for 15-20 minutes about this rather than
> through bugzilla/email?

Small example:
(x*6)+(y*z)

- Without extra information range analysis can remove one check of both checks if both have "negative zero checks"
- GVN removes "negative zero check" on (x*6)
- With this information, range analysis can remove the other "negative zero check" too.

=> GVN followed by RangeAnalysis: removes definitely both checks
=> RangeAnalysis followed by GVN: removes one or both checks depending on order of MUL's

I'll try to be online from 6am PST (now) to 9am PST and from 1pm PST onwards
Comment on attachment 613813 [details] [diff] [review]
/home/mrosenberg/patches/elideOverflow-r3.patch

Review of attachment 613813 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/Makefile.in
@@ +319,5 @@
>  		TypePolicy.cpp \
>  		ValueNumbering.cpp \
>  		VMFunctions.cpp \
>  		AliasAnalysis.cpp \
> +		OverflowEliminator.cpp \

This is legacy stuff, right? It doesn't look like there is any such file in this cset.

::: js/src/ion/Ion.cpp
@@ +681,5 @@
>          IonSpewPass("Alias analysis");
>          AssertGraphCoherency(graph);
>      }
>  
> +    if (js_IonOptions.rangeAnalysis) {

It would be useful to explain the logic for requiring this pass to occur before GVN in a comment; it is not immediately obvious.

::: js/src/ion/Lowering.cpp
@@ +541,5 @@
>          JS_ASSERT(lhs->type() == MIRType_Int32);
>          ReorderCommutative(&lhs, &rhs);
>          LAddI *lir = new LAddI;
> +        if (!ins->implicitTruncate()) {
> +            if (!assignSnapshot(lir))

if (!ins->implicitTruncate() && !assignSnapshot(lir))
    return false;

The else block with the manual DEBUG spew to stderr should be removed.

@@ +572,5 @@
>      if (ins->specialization() == MIRType_Int32) {
>          JS_ASSERT(lhs->type() == MIRType_Int32);
>          LSubI *lir = new LSubI;
> +        if (!ins->implicitTruncate()) {
> +            if (!assignSnapshot(lir))

Same comments as with visitAdd().

::: js/src/ion/MIR.cpp
@@ +672,1 @@
>      // This is only meaningfull when doing integer division

preexisting nit: "meaningful". Period at end. s/when doing/for/.

@@ +704,4 @@
>      if (canBeNegativeZero_)
>          canBeNegativeZero_ = NeedNegativeZeroCheck(this);
> +    if (js::ion::RangeAnalysis::allUsesTruncate(this))
> +        implicitTruncate_ = true;

setTruncates().

@@ +709,4 @@
>  }
>  
> +bool
> +MDiv::updateForReplacement(MDefinition *ins_, bool merging)

Non-nit: The existence of the boolean parameter is troublesome. Booleans are horribly ugly and make maintenance irritating. Is there a better way to structure this? Could we get by without information about GVN phase leaking? Perhaps have a different function for each phase? updateForReplacement() is intended for instructions that are not equivalent, per comment in MIR.h.

@@ +715,5 @@
> +        return true;
> +    JS_ASSERT(ins_->isDiv());
> +    MDiv *ins = ins_->toDiv();
> +    // Since these properties are based solely on the inputs, GVN equivalence
> +    // implies that they are also equivalent.

updateForReplacement() is commented that it is intended specifically for instructions that are not equivalent (MIR.h:409).

@@ +720,5 @@
> +    JS_ASSERT(canBeNegativeOverflow() == ins->canBeNegativeOverflow());
> +    JS_ASSERT(canBeDivideByZero() == ins->canBeDivideByZero());
> +
> +    canBeNegativeZero_ = canBeNegativeZero() || ins->canBeNegativeZero();
> +    implicitTruncate_ = implicitTruncate() && ins->implicitTruncate();

These should use setters. These comments apply to all the other updateForReplacement() functions.

@@ +785,5 @@
> +void
> +MAdd::analyzeRangeBackward()
> +{
> +    if (js::ion::RangeAnalysis::allUsesTruncate(this))
> +        implicitTruncate_ = true;

setTruncates();

::: js/src/ion/MIR.h
@@ +531,5 @@
>      }
>      MResumePoint *resumePoint() const {
>          return resumePoint_;
>      }
> +

Don't need extra whitespace here.

@@ +1838,5 @@
>      MDefinition *foldIfEqual() {
>          return MConstant::New(Int32Value(0));
>      }
> +    MDefinition *foldIfNegOne(size_t operand) {
> +        // Technically bitNot.

This patch introduces a number of functions named foldIfNegOne(), but they appear to be unused. They should be removed.

@@ +2052,5 @@
>          return new MAdd(left, right);
>      }
> +    void analyzeRangeBackward();
> +
> +    bool implicitTruncate() const {

This name seems strange, because it contains a verb but does not perform that action. How about simply "isTruncating()"?

@@ +2055,5 @@
> +
> +    bool implicitTruncate() const {
> +        return implicitTruncate_;
> +    }
> +    void setImplicitTruncate(bool val) {

Can get rid of the boolean parameter and just always set it to true.
These methods should be put on MBinaryArithInstruction to avoid duplication.

::: js/src/ion/RangeAnalysis.cpp
@@ +24,5 @@
>  RangeAnalysis::analyze()
>  {
>      for (ReversePostorderIterator block(graph.rpoBegin()); block != graph.rpoEnd(); block++) {
>          for (MDefinitionIterator iter(*block); iter; iter++) {
> +            (*iter)->analyzeRangeForward();

iter->analyzeRangeForward();

@@ +28,5 @@
> +            (*iter)->analyzeRangeForward();
> +        }
> +    }
> +
> +    for (PostorderIterator block(graph.poBegin()); block != graph.poEnd(); block++) {

Comment on the necessity of the backwards phase: only used to find and group truncating definitions.

::: js/src/ion/RangeAnalysis.h
@@ -14,5 @@
>  
>  class RangeAnalysis
>  {
>      MIRGraph &graph;
> -

This whitespace should be kept.

@@ +18,4 @@
>    public:
>      RangeAnalysis(MIRGraph &graph);
>      bool analyze();
> +    static bool allUsesTruncate(MInstruction *m) {

This function should be defined in RangeAnalysis.cpp.
Could we call it "AllUsesTruncate()"? Capitalized since static.

@@ +20,5 @@
>      bool analyze();
> +    static bool allUsesTruncate(MInstruction *m) {
> +        for (MUseIterator use = m->usesBegin(); use != m->usesEnd(); use++) {
> +            if (!use->node()->isDefinition())
> +                return false;

if (use->node()->isResumePoint())
    return false;

@@ +33,5 @@
> +            if(def->isBitXor())
> +                continue;
> +            if(def->isLsh())
> +                continue;
> +            if(def->isRsh())

nit: need whitespace between "if" and "("

::: js/src/ion/ValueNumbering.cpp
@@ +388,5 @@
>              MDefinition *dom = findDominatingDef(defs, ins, index);
>              if (!dom)
>                  return false; // Insertion failed.
>  
> +            if (dom == ins || !dom->updateForReplacement(ins, true)) {

The boolean argument should go away; see comments in MIR.cpp.

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +525,5 @@
>      masm.setupAlignedABICall(2);
>      masm.passABIArg(lhs);
>      masm.passABIArg(rhs);
>      masm.callWithABI(JS_FUNC_TO_DATA_PTR(void *, __aeabi_idivmod));
>      // idivmod returns the qoutient in r0, and the remainder in r1.

preexisting nit: "quotient"

@@ +528,5 @@
>      masm.callWithABI(JS_FUNC_TO_DATA_PTR(void *, __aeabi_idivmod));
>      // idivmod returns the qoutient in r0, and the remainder in r1.
> +    if (!mir->implicitTruncate()) {
> +        masm.ma_cmp(r1, Imm32(0));
> +        if (!bailoutIf(Assembler::NonZero, ins->snapshot()))

Since the above operation is a comparison, it looks more idiomatic here to use Assembler::Equal, even though it aliases to the same thing as NonZero.
Attachment #613813 - Flags: review?(sstangl) → review+
nits addressed, both sstangl's and h4writer's.  lots of fiddly code movement, so asking for a quick follow-up review.
Attachment #613813 - Attachment is obsolete: true
Attachment #614959 - Flags: review?(sstangl)
Comment on attachment 614959 [details] [diff] [review]
/home/mrosenberg/patches/elideOverflow-r4.patch

Review of attachment 614959 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Ion.cpp
@@ +685,5 @@
> +    if (js_IonOptions.rangeAnalysis) {
> +        RangeAnalysis rangeAnalysis(graph);
> +        if (!rangeAnalysis.analyzeEarly())
> +            return false;
> +        IonSpewPass("Range Analysis");

This should be "Range Analysis Early" or something to differentiate it from the other pass.

::: js/src/ion/MIR.h
@@ +409,5 @@
>  
>      // Mark this instruction as having replaced all uses of ins, as during GVN,
>      // returning false if the replacement should not be performed. For use when
>      // GVN eliminates instructions which are not equivalent to one another.
> +

Don't need this added whitespace.

@@ +1800,5 @@
>      }
> +
> +    MDefinition *foldIfNegOne(size_t operand) {
> +        return getOperand(1-operand);
> +    }

Still don't need these functions!

@@ +2099,4 @@
>      double getIdentity() {
>          return 0;
>      }
> +

Still don't need this extra whitespace.

@@ +2194,5 @@
>      bool canBeDivideByZero() {
>          return canBeDivideByZero_;
>      }
> +    bool updateForReplacement(MDefinition *ins);
> +

Don't need extra whitespace.

::: js/src/ion/RangeAnalysis.cpp
@@ +25,3 @@
>  {
>      for (ReversePostorderIterator block(graph.rpoBegin()); block != graph.rpoEnd(); block++) {
>          for (MDefinitionIterator iter(*block); iter; iter++) {

nit: Braces unnecessary on the innermost loop. Likewise for the other iterators in this file.

@@ +25,4 @@
>  {
>      for (ReversePostorderIterator block(graph.rpoBegin()); block != graph.rpoEnd(); block++) {
>          for (MDefinitionIterator iter(*block); iter; iter++) {
> +            (*iter)->analyzeRangeForward();

I think this can just be "iter->". Likewise for the other iterators in this file.

@@ +43,5 @@
> +{
> +
> +    for (PostorderIterator block(graph.poBegin()); block != graph.poEnd(); block++) {
> +        for (MInstructionReverseIterator riter(block->rbegin()); riter != block->rend(); riter++) {
> +            (*riter)->analyzeTruncBackward();

Is truncating "Truncate" saving anything? "analyzeTruncateBackward" is probably OK.

@@ +54,5 @@
> +bool
> +RangeAnalysis::AllUsesTruncate(MInstruction *m)
> +{
> +    for (MUseIterator use = m->usesBegin(); use != m->usesEnd(); use++) {
> +        if (!use->node()->isDefinition())

if (use->node()->isResumePoint())

::: js/src/ion/RangeAnalysis.h
@@ +18,5 @@
>  
>    public:
>      RangeAnalysis(MIRGraph &graph);
> +    bool analyzeEarly();
> +    bool analyzeLate();

I like this. It's cute.
Attachment #614959 - Flags: review?(sstangl) → review+
Looks like this has caused a large regression on the kraken-gaussian-blur test (371ms => 102447.8ms) going on what AWFY is indicating.
Yeah, this was pointed out last night.  Testing on linux reveals no clues, I suspect I need to try this on an OS box.
I can reproduce this locally by running the entire kraken-1.1 testsuite.
It also reproduces by just running the test individually:

> time ~/dev/ionmonkey/js/src/opt32/js --ion -n -f imaging-gaussian-blur-data.js -f imaging-gaussian-blur.js
> real	0m49.261s
looks like when I updated my configure scripts to run on my desktop, they broke my laptop, such that I was building a 64 bit shell in my 32 bit directory.  Now that *that* has been fixed, investigation will begin (and I suspect hilarity will ensue)
Depends on: 746335
landed: http://hg.mozilla.org/projects/ionmonkey/rev/a5788d299d6a
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.