Closed Bug 781510 Opened 12 years ago Closed 12 years ago

IonMonkey: (ARM) Coalesce cache flushes

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mjrosenb, Unassigned)

Details

(Whiteboard: [ion:p2])

Attachments

(1 file, 1 obsolete file)

Currently in IonMonkey, every time we overwrite a single instruction, or do an executable copy, we eagerly flush the cache.  Unfortunately, prior to ARMv8, cache flushes require a call into the kernel, and like on most architectures, calling into the kernel is not a fast operation.  To remedy this, I've created a datastructure that will record all would-be calls to cacheflush, and keeps track of the range flushed, and then flushes the entire range before returning to jitted code.
The initial results look *promising*
kraken:
=============================================================================

** TOTAL **:                 1.006x as fast    22249.9ms +/- 0.3%   22125.3ms +/- 0.2%     significant

=============================================================================

  ai:                        -                  2270.7ms +/- 0.7%    2254.2ms +/- 0.6% 
    astar:                   -                  2270.7ms +/- 0.7%    2254.2ms +/- 0.6% 

  audio:                     -                  6967.1ms +/- 0.4%    6936.5ms +/- 0.3% 
    beat-detection:          -                  1666.5ms +/- 0.6%    1661.5ms +/- 0.6% 
    dft:                     -                  3055.6ms +/- 0.6%    3047.0ms +/- 0.4% 
    fft:                     -                  1110.7ms +/- 0.5%    1108.3ms +/- 0.8% 
    oscillator:              1.013x as fast     1134.3ms +/- 0.8%    1119.7ms +/- 0.1%     significant
    
  imaging:                   1.006x as fast     6670.2ms +/- 0.5%    6631.5ms +/- 0.2%     significant
    gaussian-blur:           1.012x as fast     1812.0ms +/- 1.0%    1790.4ms +/- 0.1%     significant
    darkroom:                1.006x as fast     1607.2ms +/- 0.2%    1597.8ms +/- 0.2%     significant
    desaturate:              -                  3250.9ms +/- 0.4%    3243.3ms +/- 0.3% 

  json:                      1.006x as fast     1096.3ms +/- 0.2%    1089.6ms +/- 0.2%     significant
    parse-financial:         1.002x as fast      734.3ms +/- 0.1%     732.8ms +/- 0.1%     significant
    stringify-tinderbox:     1.014x as fast      362.0ms +/- 0.5%     356.9ms +/- 0.4%     significant

  stanford:                  1.006x as fast     5245.7ms +/- 0.4%    5213.5ms +/- 0.2%     significant
    crypto-aes:              1.009x as fast     1235.4ms +/- 0.8%    1224.4ms +/- 0.4%     significant
    crypto-ccm:              1.010x as fast     1166.1ms +/- 0.2%    1154.3ms +/- 0.2%     significant
    crypto-pbkdf2:           -                  2258.8ms +/- 0.5%    2254.4ms +/- 0.2% 
    crypto-sha256-iterative: 1.009x as fast      585.4ms +/- 0.3%     580.4ms +/- 0.1%     significant

=============================================================================

** TOTAL **:           1.024x as fast    1988.4ms +/- 1.1%   1941.7ms +/- 1.0%     significant

=============================================================================

  3d:                  1.115x as fast     444.1ms +/- 3.1%    398.4ms +/- 2.8%     significant
    cube:              -                  178.1ms +/- 2.7%    176.2ms +/- 4.6% 
    morph:             1.118x as fast      56.3ms +/- 4.5%     50.4ms +/- 2.7%     significant
    raytrace:          1.22x as fast      209.7ms +/- 5.1%    171.8ms +/- 3.8%     significant

  access:              -                  158.2ms +/- 2.5%    150.2ms +/- 6.1% 
    binary-trees:      1.093x as fast      32.3ms +/- 4.6%     29.6ms +/- 4.6%     significant
    fannkuch:          -                   70.7ms +/- 3.2%     67.6ms +/- 12.9% 
    nbody:             1.041x as fast      29.2ms +/- 2.1%     28.0ms +/- 1.9%     significant
    nsieve:            1.042x as fast      26.0ms +/- 1.5%     25.0ms +/- 0.8%     significant

  bitops:              -                   80.3ms +/- 0.6%     79.9ms +/- 2.4% 
    3bit-bits-in-byte: 1.037x as fast       7.2ms +/- 1.5%      6.9ms +/- 0.6%     significant
    bits-in-byte:      ??                  24.4ms +/- 1.4%     24.6ms +/- 2.2%     not conclusive: might be *1.007x as slow*
    bitwise-and:       -                   20.1ms +/- 0.6%     20.1ms +/- 2.7% 
    nsieve-bits:       -                   28.7ms +/- 0.5%     28.4ms +/- 3.7% 

  controlflow:         1.051x as fast      18.5ms +/- 0.1%     17.6ms +/- 2.7%     significant
    recursive:         1.051x as fast      18.5ms +/- 0.1%     17.6ms +/- 2.7%     significant

  crypto:              ??                 219.7ms +/- 1.0%    222.7ms +/- 3.4%     not conclusive: might be *1.014x as slow*
    aes:               ??                 104.0ms +/- 1.5%    106.9ms +/- 5.9%     not conclusive: might be *1.028x as slow*
    md5:               ??                  73.3ms +/- 1.1%     75.2ms +/- 3.4%     not conclusive: might be *1.026x as slow*
    sha1:              1.046x as fast      42.5ms +/- 1.0%     40.6ms +/- 0.7%     significant

  date:                ??                 321.6ms +/- 5.4%    326.3ms +/- 3.6%     not conclusive: might be *1.015x as slow*
    format-tofte:      ??                 163.4ms +/- 2.8%    164.4ms +/- 3.8%     not conclusive: might be *1.006x as slow*
    format-xparb:      ??                 158.2ms +/- 9.4%    161.9ms +/- 6.1%     not conclusive: might be *1.024x as slow*

  math:                1.012x as fast     142.7ms +/- 0.8%    140.9ms +/- 0.7%     significant
    cordic:            1.026x as fast      30.8ms +/- 0.3%     30.0ms +/- 0.7%     significant
    partial-sums:      -                   88.5ms +/- 1.0%     88.2ms +/- 1.0% 
    spectral-norm:     1.029x as fast      23.5ms +/- 1.3%     22.8ms +/- 0.4%     significant

  regexp:              ??                  88.0ms +/- 0.9%     88.7ms +/- 2.9%     not conclusive: might be *1.008x as slow*
    dna:               ??                  88.0ms +/- 0.9%     88.7ms +/- 2.9%     not conclusive: might be *1.008x as slow*

  string:              ??                 515.3ms +/- 1.0%    517.0ms +/- 1.9%     not conclusive: might be *1.003x as slow*
    base64:            1.031x as fast      41.5ms +/- 1.3%     40.2ms +/- 0.4%     significant
    fasta:             1.065x as fast      62.7ms +/- 4.6%     58.9ms +/- 1.3%     significant
    tagcloud:          ??                 176.7ms +/- 0.9%    180.8ms +/- 5.2%     not conclusive: might be *1.024x as slow*
    unpack-code:       *1.026x as slow*   170.0ms +/- 1.2%    174.4ms +/- 1.8%     significant
    validate-input:    1.029x as fast      64.5ms +/- 0.6%     62.6ms +/- 0.7%     significant

unfortunately, v8 has shown signs that this code is not entirely correct, so I will be looking into that, also this patch is wrong, since I explicitly rely on global state.  Some of that state will need to be moved into an IonContext.
Now implemented to store everything in a context, didn't hurt compile/run times at all.  
I also added some spew, and discovered that a few assumptions that I made about the locality of flushes are wrong on v8, which may be hurting performance.
Attachment #650539 - Attachment is obsolete: true
Attachment #651058 - Flags: review?(dvander)
Comment on attachment 651058 [details] [diff] [review]
/home/mrosenberg/patches/flushLessFrequently-r1.patch

Review of attachment 651058 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Ion.cpp
@@ +667,5 @@
>              Assembler::ToggleToCmp(loc);
>          else
>              Assembler::ToggleToJmp(loc);
>      }
> +

accidental newline?

@@ +844,5 @@
>      if (!GenerateCode(builder, graph))
>          return false;
>  
>      IonSpewEndFunction();
> +    //fprintf(stderr, "a\n");

Don't forget to remove this.

@@ -1028,5 @@
>      if (!CheckFrame(fp)) {
>          ForbidCompilation(script);
>          return Method_CantCompile;
>      }
> -

accidental deletion?

@@ -1080,5 @@
>      if (!CheckFrame(fp)) {
>          ForbidCompilation(script);
>          return Method_CantCompile;
>      }
> -

ditto

@@ -1107,5 @@
>      JS_CHECK_RECURSION(cx, return IonExec_Error);
>      JS_ASSERT(ion::IsEnabled(cx));
>      JS_ASSERT(CheckFrame(fp));
>      JS_ASSERT(!fp->script()->ion->bailoutExpected());
> -

ditto

@@ +1186,5 @@
>  IonExecStatus
>  ion::Cannon(JSContext *cx, StackFrame *fp)
>  {
>      JSScript *script = fp->script();
> +    //fprintf(stderr, "CANNON %s:%d\n", script->filename, script->lineno);

Don't forget to remove this.

@@ +1210,5 @@
>                   TraceLogging::ION_CANNON_STOP,
>                   script);
>      }
>  #endif
> +    //fprintf(stderr, "UN CANNON\n");

ditto

@@ +1218,5 @@
>  
>  IonExecStatus
>  ion::SideCannon(JSContext *cx, StackFrame *fp, jsbytecode *pc)
>  {
> +    //fprintf(stderr, "SIDECANNON\n");

ditto

@@ +1366,5 @@
>  
>  void
>  ion::InvalidateAll(FreeOp *fop, JSCompartment *c)
>  {
> +

Accidental newline?

@@ +1516,5 @@
> +    if (comp == NULL) {
> +        if (CurrentIonContext() != NULL)
> +            comp = GetIonContext()->compartment->ionCompartment();
> +    }
> +    // If a compartment isn't available, then be a nop, nobody will ever see this flusher

Is it okay that some auto cache flushers have no effect?

::: js/src/ion/IonCaches.cpp
@@ +384,5 @@
>  
>  bool
>  IonCacheSetProperty::attachNativeExisting(JSContext *cx, JSObject *obj, const Shape *shape)
>  {
> +

Accidental newline?

::: js/src/ion/IonCaches.h
@@ -88,5 @@
>          BindName,
>          Name,
>          NameTypeOf
>      };
> -

nit: accidental removal?

::: js/src/ion/IonCode.h
@@ +380,5 @@
>  };
>  
>  struct VMFunction;
> +class IonCompartment;
> +struct AutoFlushCache {

style nits:
 newlines above "struct", "private", "public:
 append _ to private variables

Is |used| already implied by whether start/stop are set?

@@ +391,5 @@
> +  public:
> +    void update(uintptr_t p, size_t len);
> +    static void updateTop(uintptr_t p, size_t len);
> +    ~AutoFlushCache();
> +    AutoFlushCache(char * nonce, IonCompartment *comp = NULL);

nit: const char *nonce

@@ +392,5 @@
> +    void update(uintptr_t p, size_t len);
> +    static void updateTop(uintptr_t p, size_t len);
> +    ~AutoFlushCache();
> +    AutoFlushCache(char * nonce, IonCompartment *comp = NULL);
> +    static void checkFlush() { /*JS_ASSERT(count > 0);*/ }

Is this function needed?

::: js/src/ion/IonCompartment.h
@@ +23,5 @@
>                               CalleeToken calleeToken, Value *vp);
>  
>  class IonActivation;
>  
> +

nit: accidental newline?

@@ +63,5 @@
>      IonCode *generateBailoutTable(JSContext *cx, uint32 frameClass);
>      IonCode *generateBailoutHandler(JSContext *cx);
>      IonCode *generateInvalidator(JSContext *cx);
>      IonCode *generatePreBarrier(JSContext *cx);
> +    js::ion::AutoFlushCache *flusher_;

nit: move this to where the other private variables are

@@ +146,5 @@
> +    }
> +    void setFlusher(AutoFlushCache *fl) {
> +        if (flusher_ == NULL || fl == NULL) {
> +            flusher_ = fl;
> +        }

nit: no braces needed here, use !flusher_ || !fl

::: js/src/ion/VMFunctions.cpp
@@ +95,5 @@
>  
>  bool
>  ReportOverRecursed(JSContext *cx)
>  {
> +    fprintf(stderr, "OVERRECURSED\n");

Don't forget to remove this.

::: js/src/ion/arm/Assembler-arm.cpp
@@ +2449,5 @@
> +{
> +    uintptr_t newStop = newStart + len;
> +    used = true;
> +    static int count = 0;
> +    if (start == NULL) {

nit: if (!start)

@@ +2470,5 @@
> +}
> +
> +AutoFlushCache::~AutoFlushCache()
> +{
> +    if (myCompartment == NULL)

nit: if (!myCompartment)

@@ +2475,5 @@
> +        return;
> +    
> +    IonSpewCont(IonSpew_CacheFlush, ">", name);
> +    if (myCompartment->flusher() == this) {
> +

nit: excess newline

@@ +2481,5 @@
> +        myCompartment->setFlusher(NULL);
> +    }
> +    if (!used) {
> +        return;
> +    }

nit: newlines before or after this if; no braces needed

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +71,5 @@
>   */
>  IonCode *
>  IonCompartment::generateEnterJIT(JSContext *cx)
>  {
> +    AutoFlushCache afc("GenerateEnterJIT", cx->compartment->ionCompartment());

Why does this need an explicit flush, but the other trampolines don't?

::: js/src/ion/shared/Assembler-x86-shared.cpp
@@ +121,5 @@
> +}
> +
> +AutoFlushCache::~AutoFlushCache()
> +{
> +    if (myCompartment == NULL)

nit: if (!myCompartment)

@@ +126,5 @@
> +        return;
> +
> +    if (myCompartment->flusher() == this) {
> +        myCompartment->setFlusher(NULL);
> +    }

nit: no braces needed here.

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +1138,5 @@
>          uint8_t *ptr = (uint8_t *)inst.raw();
>          JS_ASSERT(*ptr == 0xE9);
>          *ptr = 0x3D;
>      }
> +    struct PtrRange {

Is this struct used anywhere?

::: js/src/jsgc.cpp
@@ +5445,5 @@
>      JS_ASSERT(!rt->profilingScripts);
>  
>      ReleaseScriptCounts(rt->defaultFreeOp());
>  }
> +js::ion::IonContext *CurrentIonContext();

What was this added for?

::: nsprpub/configure
@@ +6808,5 @@
>  s%\[%\\&%g
>  s%\]%\\&%g
>  s%\$%$$%g
>  EOF
> +DEFS=`sed -f conftest.defs confdefs.h | tr '\012' ' '`

Don't forget to revert this change.
Attachment #651058 - Flags: review?(dvander) → review+
Marty says this landed.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: