Last Comment Bug 707844 - IonMonkey: try switching bailouts from three instructions to one or two
: IonMonkey: try switching bailouts from three instructions to one or two
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: ARM Linux
: -- normal (vote)
: ---
Assigned To: general
:
Mentors:
Depends on:
Blocks: IonSpeed
  Show dependency treegraph
 
Reported: 2011-12-05 17:50 PST by Marty Rosenberg [:mjrosenb]
Modified: 2012-01-30 13:28 PST (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
shrink bailouts to 1 instruction (52.57 KB, patch)
2012-01-26 22:24 PST, Marty Rosenberg [:mjrosenb]
Jacob.Bramley: review+
Details | Diff | Splinter Review

Description Marty Rosenberg [:mjrosenb] 2011-12-05 17:50:02 PST
currently for nearly all bailouts, we generate:
movwcc r12, bottom
movtcc r12, top
bxcc r12
which means we have 3 instructions that don't do anything but take up space in the instruction stream and pipeline
It would be much smaller (and probably faster) to either
ldrcc r12, [pc, #pool_offset]
bxcc r12
or even 
ldrc pc, [pc, #pool_offset]
Comment 1 Marty Rosenberg [:mjrosenb] 2012-01-26 22:24:06 PST
Created attachment 592058 [details] [diff] [review]
shrink bailouts to 1 instruction

This code has a bunch of extra changes in it, since I found a whole slew of bugs related to the multi-pool code.
Comment 2 Jacob Bramley [:jbramley] 2012-01-27 05:10:58 PST
Comment on attachment 592058 [details] [diff] [review]
shrink bailouts to 1 instruction

Review of attachment 592058 [details] [diff] [review]:
-----------------------------------------------------------------

Some tests have been modified. Are these left-over debug changes?

Other than that, there are only a few minor glitches. r+ with those addressed.

::: js/src/ion/arm/Assembler-arm.cpp
@@ +486,5 @@
>          JS_ASSERT(branch->checkDest(temp));
>          uint32 *dest = (uint32*) (targ_bot.decode() | (targ_top.decode() << 16));
>          return dest;
> +    } else if (jump->is<InstLDR>()) {
> +

Should there be a TODO in here?

@@ +1160,5 @@
>  Assembler::as_genmul(Register dhi, Register dlo, Register rm, Register rn,
>            MULOp op, SetCond_ sc, Condition c)
>  {
>  
> +    writeInst(RN(dhi) | maybeRD(dlo) | RM(rm) | rn.code() | op | sc | c | mull_tag);

What's mull_tag for? Doesn't it belong in the MULOp values?

@@ +1238,5 @@
>          poolVDTR  = 3
>      };
> +  private:
> +    uint32  index : 17;
> +    uint32 cond : 4;

Inconsistent spacing.

@@ +1276,5 @@
> +    int32 getIndex() {
> +        return index;
> +    }
> +    void setIndex(uint32 index_) {
> +        JS_ASSERT(ONES == 0xf && loadType != poolBOGUS);

You should perhaps assert the ONES value on init, in case an index is provided there and not set again.

@@ +1346,3 @@
>  {
>      PoolHintPun php;
> +    php.phd.init(0, c, PoolHintData::poolDTR, dest);

Ah, much cleaner :-)

@@ +1872,5 @@
> +void
> +Assembler::flushBuffer()
> +{
> +    m_buffer.flushPool();
> +}

I think there should be spaces around functions.

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +851,5 @@
>  
>  RelocBranchStyle
>  b_type()
>  {
> +    return LDR;

Is this left in for experimentation? It produces lots of dead code in ma_b (and that is the only place that b_type is referenced.)

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +410,5 @@
> +    // callee token
> +    // frame descriptor
> +    // padding              <- sp now
> +    // return address
> +    masm.ma_dtr(IsLoad, sp, Imm32(4), r4, PreIndex);  // r3 <- sizeDescriptor with FrameType.

I think the comment should say r4. Also, I think 'frame descriptor' should be replaced with 'sizeDescriptor' in the layout comment.

@@ +413,5 @@
> +    // return address
> +    masm.ma_dtr(IsLoad, sp, Imm32(4), r4, PreIndex);  // r3 <- sizeDescriptor with FrameType.
> +    // Remove the rectifier frame.
> +    // callee token
> +    // frame descriptor     <- sp now; r3 <- frame descriptor

r4

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +835,5 @@
>          return (ptrdiff_t)x;
>      }
> +    void flushBuffer() {}
> +    void finish() {
> +    }

There are two different empty-function conventions in use there.

Also, it might be a good idea to have a comment in there, to explain that they are deliberately empty.

::: js/src/ion/shared/IonAssemblerBufferWithConstantPools.h
@@ +335,2 @@
>                      // align the pool.
> +                    poolDest = (uint8*) (curPool->align(poolDest));

The (uint8*) cast shouldn't be necessary here.

@@ +343,2 @@
>                      // align the pool.
> +                    poolDest = (uint8*) (curPool->align(poolDest));

Ditto.

@@ +617,5 @@
>  
>          // We have a perforation.  Time to cut the instruction stream, patch in the pool
>          // and possibly re-arrange the pool to accomodate its new location.
>          int poolOffset = perforation.getOffset();
> +        int magicAlign = numDumps == 0 ? 0 : poolInfo[numDumps-1].finalPos - poolInfo[numDumps-1].offset;

Perhaps magicAlign should be calculated by a helper function of some kind.
Comment 3 Marty Rosenberg [:mjrosenb] 2012-01-27 10:16:22 PST
Effect of changing bailouts from MOVW/MOVT/BLX to LDR pc [pc, IMM]

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.182x as fast    12383.2ms +/- 0.7%   10475.5ms +/- 0.3%     significant

=============================================================================

  3d:                  1.078x as fast     2034.9ms +/- 4.2%    1886.9ms +/- 1.3%     significant
    cube:              1.087x as fast      659.8ms +/- 5.9%     607.3ms +/- 1.8%     significant
    morph:             -                   744.7ms +/- 8.3%     692.3ms +/- 0.3% 
    raytrace:          1.073x as fast      630.4ms +/- 3.3%     587.3ms +/- 3.2%     significant

  access:              1.086x as fast     2653.0ms +/- 0.1%    2442.1ms +/- 0.2%     significant
    binary-trees:      1.024x as fast      412.4ms +/- 0.3%     402.7ms +/- 0.8%     significant
    fannkuch:          1.126x as fast     1146.5ms +/- 0.1%    1018.1ms +/- 0.2%     significant
    nbody:             1.038x as fast      747.9ms +/- 0.2%     720.2ms +/- 0.1%     significant
    nsieve:            1.150x as fast      346.2ms +/- 0.2%     301.0ms +/- 0.1%     significant

  bitops:              3.77x as fast      1661.8ms +/- 0.0%     441.3ms +/- 0.1%     significant
    3bit-bits-in-byte: 10.3x as fast       139.6ms +/- 0.1%      13.5ms +/- 0.1%     significant
    bits-in-byte:      19.2x as fast       383.9ms +/- 0.0%      20.0ms +/- 1.5%     significant
    bitwise-and:       47.2x as fast       686.2ms +/- 0.0%      14.5ms +/- 0.2%     significant
    nsieve-bits:       1.150x as fast      452.1ms +/- 0.1%     393.3ms +/- 0.1%     significant

  controlflow:         12.8x as fast       275.2ms +/- 0.1%      21.5ms +/- 0.2%     significant
    recursive:         12.8x as fast       275.2ms +/- 0.1%      21.5ms +/- 0.2%     significant

  crypto:              1.095x as fast      731.8ms +/- 0.2%     668.2ms +/- 0.1%     significant
    aes:               1.176x as fast      358.5ms +/- 0.1%     304.8ms +/- 0.1%     significant
    md5:               1.015x as fast      118.2ms +/- 0.4%     116.5ms +/- 0.4%     significant
    sha1:              1.033x as fast      255.1ms +/- 0.6%     246.9ms +/- 0.4%     significant

  date:                1.037x as fast      717.2ms +/- 0.8%     691.9ms +/- 0.6%     significant
    format-tofte:      1.045x as fast      402.9ms +/- 0.5%     385.5ms +/- 0.3%     significant
    format-xparb:      1.026x as fast      314.3ms +/- 1.7%     306.4ms +/- 1.3%     significant

  math:                1.023x as fast     2688.7ms +/- 0.4%    2627.3ms +/- 0.5%     significant
    cordic:            1.024x as fast     1026.6ms +/- 0.3%    1002.5ms +/- 0.4%     significant
    partial-sums:      1.003x as fast      728.3ms +/- 0.2%     725.8ms +/- 0.3%     significant
    spectral-norm:     1.039x as fast      933.7ms +/- 0.9%     899.0ms +/- 0.8%     significant

  regexp:              -                    86.7ms +/- 0.2%      86.4ms +/- 0.2% 
    dna:               -                    86.7ms +/- 0.2%      86.4ms +/- 0.2% 

  string:              *1.050x as slow*   1533.8ms +/- 0.2%    1609.9ms +/- 0.3%     significant
    base64:            1.032x as fast      234.2ms +/- 0.3%     226.8ms +/- 0.3%     significant
    fasta:             1.051x as fast      512.0ms +/- 0.6%     487.3ms +/- 0.6%     significant
    tagcloud:          *1.37x as slow*     308.7ms +/- 0.5%     423.4ms +/- 0.9%     significant
    unpack-code:       1.050x as fast      215.8ms +/- 0.4%     205.5ms +/- 0.4%     significant
    validate-input:    *1.014x as slow*    263.1ms +/- 0.8%     266.9ms +/- 0.2%     significant
Comment 4 Jacob Bramley [:jbramley] 2012-01-30 00:41:57 PST
(In reply to Marty Rosenberg [:Marty] from comment #3)
> Effect of changing bailouts from MOVW/MOVT/BLX to LDR pc [pc, IMM]

Well, I'd say that's worthwhile :-)

Which processor did you try?
Comment 5 Paul Wright 2012-01-30 05:17:38 PST
FWIW, this improvement doesn't show up on AWFY, yet.

Note You need to log in before you can comment on or make changes to this bug.