JM: (ARM) Mandreel is incredibly slow on ARM

RESOLVED FIXED in mozilla21

Status

()

defect
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: mjrosenb, Unassigned)

Tracking

unspecified
mozilla21
ARM
Linux
Points:
---

Firefox Tracking Flags

(blocking-b2g:-, b2g18 affected, b2g18-v1.0.0 affected, b2g18-v1.0.1 affected)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Here is the ratio of x64 scores / arm scores for most octane tests:
Richards: 9.61
DeltaBlue: 10.24
Crypto: 10.50
RayTrace: 12.40
EarleyBoyer: 11.42
RegExp: 19.11
Splay: 10.31
PdfJS: 15.63
Mandreel: 86.55
Gameboy: 17.84
CodeLoad: 11.16
Box2D: 15.92

For the most part, the average is a bit over 10, but mandreel is about 7-8 times above that. (regex doesn't look too good either, but still nowhere near mandreel).
    38.24%  js.ion.opt  perf-3476.map        [.] 0xb678975c
    18.82%  js.ion.opt  js                   [.] js::mjit::stubs::GetElem(js::VMFrame&)
     5.89%  js.ion.opt  js                   [.] JaegerStubVeneer
     5.13%  js.ion.opt  js                   [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
     3.46%  js.ion.opt  js                   [.] JSObject::getElement(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, unsigned int, JS::MutableHandle<JS::Value>)
     2.12%  js.ion.opt  js                   [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, unsigned int, JS::MutableHandle<JS::Value>)
whereas on x86, it looks like:
 72.98%  js.ion.opt  perf-5139.map      [.] 0x00007f39620880b2
  2.55%  js.ion.opt  js                 [.] js::Interpret(JSContext*, js::StackFrame*, js::InterpMode)
  2.21%  js.ion.opt  js                 [.] js::analyze::ScriptAnalysis::analyzeBytecode(JSContext*)
  1.29%  js.ion.opt  js                 [.] js::types::TypeMonitorResult(JSContext*, JSScript*, unsigned char*, JS::Value const&)
  1.02%  js.ion.opt  libm-2.17.so       [.] __ieee754_pow_sse2
mostly working patch.  It looks like the main issue is that both x86 and arm spend most of their time in JM, but x86 has typed arrays, and ARM needs to make stub calls for everything!  evidently, we turned off typed arrays on ARM a while back.  I fixed two bugs in the typed arrays implementation, but it looks like there is one last bug lurking around here...
another update, I now believe that some of the failures caused by this patch are due to a bad compiler.  Dropping from gcc-4.7 to gcc-4.4 made the crashes go away (however, the mandreel test fails a self check still).
There is still some commented out printf crud lying around, as well as a counter that I added.  Other than that, this patch works when using the native compiler on my panda board.  It fails a couple of jit tests when using a cross compiled shell.  My initial suspicion was that it was a compiler bug, but I'm looking into it more thoroughly now.
Attachment #709705 - Attachment is obsolete: true
Attachment #710539 - Flags: review?(Jacob.Bramley)
oh yes, speed numbers:

#with the patch applied
mjrosenb@panda:~/src/central/central-837347/js/objs/arm-loc/octane$  ../shell/js ./run-mandreel.js
Mandreel: 878
----
Score (version 8): 878

#before applying any patches
mjrosenb@panda:~/src/central/central-837347/js/objs/arm-loc/octane$ js.ion.opt ./run-mandreel.js
Mandreel: 150
----
Score (version 8): 150
Comment on attachment 710539 [details] [diff] [review]
/home/mjrosenb/patches/jm_TA-r1.patch

Review of attachment 710539 [details] [diff] [review]:
-----------------------------------------------------------------

There's a lot of debug code to be removed, and some trace changes that you said you wanted to split into another patch, but other than that I think I just have one or two minor comments.

::: js/src/TraceLogging.cpp
@@ +16,2 @@
>  
> +}

Is this supposed to be here?

@@ +71,5 @@
> +    gettimeofday(&tv, NULL);
> +    uint64_t usec = tv.tv_usec;
> +    uint64_t sec = tv.tv_sec;
> +    return usec + 1000000 * sec;
> +}

Is this supposed to be here?

::: js/src/assembler/assembler/ARMAssembler.cpp
@@ +515,5 @@
>  {
>      // This variant accesses memory at base+offset+(index*scale). VLDR and VSTR
>      // don't have such an addressing mode, so this access will require some
>      // arithmetic instructions.
> +    //bkpt(1235);

Is this supposed to be here?

@@ +576,5 @@
>  
>  void ARMAssembler::baseIndexFloatTransfer(bool isLoad, bool isDouble, FPRegisterID srcDst, RegisterID base, RegisterID index, int scale, int32_t offset)
>  {
>      ARMWord op2;
> +    //bkpt(3);

Is this supposed to be here?

::: js/src/assembler/assembler/ARMAssembler.h
@@ +1518,5 @@
>          void fmem_imm_off(bool isLoad, bool isDouble, bool isUp, int dest, int rn, ARMWord offset, Condition cc = AL)
>          {
> +            static int cnt = 0;
> +            //bkpt(cnt++);
> +            //fprintf(stderr, "%s register %u\n", isLoad ? "Loading" : "Storing", dest);

Is this supposed to be here?

::: js/src/assembler/assembler/AbstractMacroAssembler.h
@@ +89,5 @@
>          }
>  
>          RegisterID base;
>          int32_t offset;
> +        void rend() {fprintf(stderr, "[r%d, #%d]\n", base, offset);}

Is this supposed to be here?

@@ +98,5 @@
>              : base(base)
>              , offset(offset)
>          {
>          }
>          

There's some trailing whitespace here which you should remove whilst you're editing code nearby.

@@ +101,5 @@
>          }
>          
>          RegisterID base;
>          intptr_t offset;
> +        void rend() {fprintf(stderr, "[r%d, #%d]\n", base, offset);}

Is this supposed to be here?

@@ +133,5 @@
>          }
>  
>          RegisterID base;
>          int32_t offset;
> +        void rend() {fprintf(stderr, "[r%d, #%d]\n", base, offset);}

Is this supposed to be here?

@@ +152,5 @@
>          RegisterID base;
>          RegisterID index;
>          Scale scale;
>          int32_t offset;
> +        void rend() {fprintf(stderr, "[r%d, r%d << %d]\n", base, index, scale);}

Is this supposed to be here?

@@ +166,5 @@
>          {
>          }
>  
>          void* m_ptr;
> +        void rend() {fprintf(stderr, "[%p]\n", m_ptr);}

Is this supposed to be here?

::: js/src/assembler/assembler/MacroAssemblerARM.h
@@ +514,5 @@
>      }
>  
>      void store8(RegisterID src, BaseIndex address)
>      {
> +        //fprintf(stderr, "base is: %d\n", address.base);

Is this supposed to be here?

@@ +1157,5 @@
>      }
>  
>      void loadFloat(ImplicitAddress address, FPRegisterID dest)
>      {
> +        ASSERT((address.offset & 0x3) == 0);

Is that not asserted in floatTransfer? The range is limited too, so shouldn't we assert both the range and the alignment in the same place?

@@ +1162,2 @@
>          // as long as this is a sane mapping, (*2) should just work
> +        m_assembler.floatTransfer(true, (FPRegisterID)(dest*2), address.base, address.offset);

It was clearer when you modified dest outside the call (though this clearly caused a bug in the vcvt call). Why not make a dest_s? Also, doesn't floatShadow() so what you want here?

::: js/src/jsinferinlines.h
@@ +1458,5 @@
> +    static int count = 0;
> +    static int* inv = 0;
> +    if (++count == 2083)
> +        *inv = count;
> +#endif

Is this supposed to be here?

::: js/src/methodjit/BaseAssembler.h
@@ +1122,5 @@
>          switch (atype) {
>            case js::TypedArray::TYPE_INT8:
>            case js::TypedArray::TYPE_UINT8:
>            case js::TypedArray::TYPE_UINT8_CLAMPED:
> +            //fprintf(stderr, "storeToTypedIntArray: "); address.rend();

Is this supposed to be here?

@@ +1150,5 @@
>  
>      template <typename T>
>      void storeToTypedArray(int atype, ValueRemat vr, T address)
>      {
> +        //fprintf(stderr, "storeToTypedArray: "); address.rend();

Is this supposed to be here?

@@ +1166,5 @@
>      }
>  
>      void storeToTypedArray(int atype, RegisterID objReg, Int32Key key, ValueRemat vr)
>      {
> +        //fprintf(stderr, "Array is located at r%d + r%d\n", objReg, key);

Is this supposed to be here?

@@ +1185,5 @@
>                  scale = Assembler::TimesEight;
>                  break;
>              }
>              BaseIndex addr(objReg, key.reg(), scale);
> +            //fprintf(stderr, "toplevel: ");        addr.rend();

Is this supposed to be here?

::: js/src/methodjit/MethodJIT.cpp
@@ +1111,5 @@
>  #if JS_TRACE_LOGGING
>      AutoTraceLog logger(TraceLogging::defaultLogger(),
>                          TraceLogging::JM_START,
>                          TraceLogging::JM_STOP,
> +                        fp->script());

As discussed, this looks Ok, but you wanted to move the trace changes to another patch.

@@ +1124,5 @@
>  #if JS_TRACE_LOGGING
>      AutoTraceLog logger(TraceLogging::defaultLogger(),
>                          TraceLogging::JM_SAFEPOINT_START,
>                          TraceLogging::JM_SAFEPOINT_STOP,
> +                        cx->fp()->script());

As discussed, this looks Ok, but you wanted to move the trace changes to another patch.

@@ +1129,2 @@
>  #endif
> +    //fprintf(stderr, "BAM\n");

Is this supposed to be here?
Attachment #710539 - Flags: review?(Jacob.Bramley) → review+
Comment on attachment 710539 [details] [diff] [review]
/home/mjrosenb/patches/jm_TA-r1.patch

Review of attachment 710539 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Ion.h
@@ +195,5 @@
>          edgeCaseAnalysis(true),
>          rangeAnalysis(true),
>          uce(true),
>          parallelCompilation(false),
> +        usesBeforeCompile(1024),

was this change intended?
There is a lot of printfs and dead debug code in this patch. Please clean it up before landing. Thanks!
This looks critical for emscripten codebases as well.
https://hg.mozilla.org/mozilla-central/rev/e4448a24543b
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
I don't see a speedup on mandreel on arm on awfy after this landed, and in an emscripten codebase we also did not see a speedup with this change on arm either? Are those not testing what this bug changed?
(In reply to Alon Zakai (:azakai) from comment #12)
> I don't see a speedup on mandreel on arm on awfy after this landed, and in
> an emscripten codebase we also did not see a speedup with this change on arm
> either? Are those not testing what this bug changed?

there has been a hiccup on awfy, and it hasn't updated since the 12th.  I'm out of town, and sadly, can not fix the problem from here.
Oh ok, thanks, I saw the x86 one was updating and mistook that to mean all machines were ok.
Some of our gaming partners are using mandreel.
blocking-b2g: --- → tef?
IM isn't enabled on b2g18, right?
Based on comment 16 this looks to be unaffected for b2g18 - please renom if that status changes.
When I filed this bug, I thought it was a problem with IM, but after looking into it, I discovered that it was actually a problem with code that I'd written in July, 2011 for Jaeger Monkey.  Guess I never updated the name of the bug (and didn't actually realize that I could do that)
Summary: IonMonkey: (ARM) Mandreel is incredibly slow on ARM → JM: (ARM) Mandreel is incredibly slow on ARM
Ah, sorry about that.
blocking-b2g: - → tef?
blocking-b2g: tef? → leo?
So we don't want to support our partner games on b2g18?
This is being discussed in email currently as part of a general ionmonkey b2g-turn-on.  This patch in isolation won't help partner games; there's a whole package of patches that need to land, some of which have yet to be written.  We're waiting until we have everything working well on trunk, and then deciding how/when to uplift.
(In reply to Gregor Wagner [:gwagner] from comment #23)
> So we don't want to support our partner games on b2g18?

The tracking request only has meaning in the presence of a concrete blocking decision. I think that's why Lukas removed the request.

Product team, can you advise as to whether this is required for 1.1?
Flags: needinfo?(ffos-product)
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #24)
> This is being discussed in email currently as part of a general ionmonkey
> b2g-turn-on.  This patch in isolation won't help partner games; there's a
> whole package of patches that need to land, some of which have yet to be
> written.  We're waiting until we have everything working well on trunk, and
> then deciding how/when to uplift.

When everything is working well, let's revisit the blocking/tracking nomination, but keeping - for now.
Vlad, can you follow up on this as part of the package of patches?
Flags: needinfo?(ffos-product) → needinfo?(vladimir)
blocking-b2g: leo? → -
Old for now; we're tracking Octane etc. separately as part of general benchmarking.
You need to log in before you can comment on or make changes to this bug.