Last Comment Bug 740167 - IonMonkey: 15x slower on reduced kraken audio-beat-detection benchmark
: IonMonkey: 15x slower on reduced kraken audio-beat-detection benchmark
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: Hannes Verschore [:h4writer]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-28 14:58 PDT by Hannes Verschore [:h4writer]
Modified: 2012-04-06 05:52 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null (6.53 KB, patch)
2012-04-03 08:24 PDT, Hannes Verschore [:h4writer]
no flags Details | Diff | Splinter Review
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null (7.10 KB, patch)
2012-04-04 21:05 PDT, Hannes Verschore [:h4writer]
dvander: review+
Details | Diff | Splinter Review

Description Hannes Verschore [:h4writer] 2012-03-28 14:58:57 PDT
In order to find where we still can improve IonMonkey I reduced kraken audio-beat-detection. This gives me a rather small testcase pinpointing possible improvements IonMonkey still lacks. IonMonkey is 15x slower on following benchmark (to methodJIT):

BeatDetektor = function() {
    this.a_freq_range = new Array(128);
    this.ma_freq_range = new Array(128);
};
BeatDetektor.prototype.process = function() {
    this.last_update = 1;
    var range = 0;
    for (x = 0; x < 512000; x += 1) {
        this.a_freq_range[range] = 0;
        this.ma_freq_range[range] -= (this.ma_freq_range[range] - this.a_freq_range[range]) * this.last_update * 12.0;
    }
}
var bd = new BeatDetektor();
bd.process();

The optimization we lack is hoisting property get/sets. In JaegerMonkey we hoist the getproperty to the start of the loop and save it in a temporary value. The set property is hoisted till after the loop. That way we don't need to do the slow set/get property during the loop. (Except if we call another function or something or bailout or ...)
Comment 1 Hannes Verschore [:h4writer] 2012-03-28 14:59:37 PDT
Ah forgot to mention. The logic of jaegermonkey is in methodjit/LoopState.cpp
Comment 2 Hannes Verschore [:h4writer] 2012-03-28 16:19:59 PDT
Ok, I was a bit fast with my explanation and it wasn't right.
The following shows the problem better:

var ma_freq_range = new Array(128);
//ma_freq_range[0] = 0
function process() {
    for (var x = 0; x < 512000; x += 1) {
        ma_freq_range[0] = ma_freq_range[0]*12;
    }
}
process();

If you uncomment the second line it will be as fast as JM. So because ma_freq_range isn't initialized the body of the loop isn't as optimized as it could:

loadelement37-vn32loadelement elements29-vn29 constant0-vn7
mul38-vn34       mul loadelement37-vn32 constant0-vn33
storeelement39-vn39storeelement elements29-vn29 constant0-vn7 mul38-vn34
add40-vn41       add phi33-vn17 constant0-vn40
goto98-vn98      goto

VS

slots33-vn21     slots constant32-vn20
loadslot34-vn22  loadslot slots33-vn21
elements35-vn29  elements loadslot34-vn22
initializedlength36-vn30initializedlength elements35-vn29
loadelementhole37-vn31loadelementhole elements35-vn29 constant0-vn7 initializedlength36-vn30
mul41-vn34       mul loadelementhole37-vn31 box39-vn33
elements45-vn36  elements loadslot34-vn22
storeelementhole37-vn37storeelementhole loadslot34-vn22 elements45-vn36 constant0-vn7 mul41-vn34
add48-vn39       add phi28-vn17 constant0-vn38
goto90-vn90      goto

Not sure what the best solution is.
Comment 3 David Anderson [:dvander] 2012-03-28 19:33:14 PDT
How does Crankshaft do? JM has a pretty specific optimization for this case, I think it updates cached pointers to invalidated slots vectors. I can't think of any easy way to do this for IonMonkey yet.
Comment 4 Hannes Verschore [:h4writer] 2012-03-29 13:13:51 PDT
V8:  46ms
JM:  22ms
IM: 287ms
IM patched: 10ms

So I've found the problem. TI returns "void/int/double" for the mul as lhs operand (getelem) and "int32" for rhs. This means lhs is translated into a MIRType_Value and we do the VM-call "Multiply". The performance difference here is the overhead of the call!
In the best case MBinaryArithInstruction should be able to distinguish between MIRType_Value containing only elements from {MIRType_Undefined, MIRType_Null, MIRType_Int32, MIRType_Double, MIRType_Boolean} or also containing other.
- Maybe introducing MIRType_Number_Value?
- Or make it possible to view what the types exactly are in MIRType_Value?

(To test speed difference I just adjusted TypeInferenceOracle::getMIRType to check the baseFlags for TYPE_FLAG_UNDEFINED | TYPE_FLAG_INT32 | TYPE_FLAG_DOUBLE to return MIRType_Double)

Note: looks like by reducing the testcase it lost the real speed difference between JM and IM in the kraken audio-beat-detection benchmark. My quick fix doesn't introduce a speed difference on the full audio-beat-detection benchmark. So when I find some time I will reduce it again and see if I find something else.
Comment 5 Hannes Verschore [:h4writer] 2012-04-03 08:24:38 PDT
Created attachment 611826 [details] [diff] [review]
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null

Looks like my suggestion in previous comment is already possible, as MCompare does that. So this patch adjusts the infer function of MBinaryArithInstruction to get TypeOracle::BinaryTypes instead of TypeOracle::Binary. Using BinaryTypes it is possible to determine what types are used in MIRType_Value.

V8: 53ms
JM+TI: 13ms
IM: 13ms
Comment 6 David Anderson [:dvander] 2012-04-04 20:01:47 PDT
Comment on attachment 611826 [details] [diff] [review]
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null

Review of attachment 611826 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/MIR.cpp
@@ +805,5 @@
> +    if ((lhs >= MIRType_String && 
> +         (b.lhsTypes->baseFlags() == 0 || b.lhsTypes->baseFlags() >= types::TYPE_FLAG_STRING)) ||
> +        (rhs >= MIRType_String && 
> +         (b.rhsTypes->baseFlags() == 0 || b.rhsTypes->baseFlags() >= types::TYPE_FLAG_STRING))) {
> +        if (rval != MIRType_Double) {

These tests are kind of complicated - it would be good to use local variables to shorten it a little. It looks like the string check is duplicated, and it looks like (Object + Object => Double) might accidentally get a double specialization.
Comment 7 Hannes Verschore [:h4writer] 2012-04-04 21:05:11 PDT
Created attachment 612445 [details] [diff] [review]
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null

Adjusted the check to use local variables and it now contains the correct logic.
Comment 8 David Anderson [:dvander] 2012-04-05 16:50:33 PDT
Comment on attachment 612445 [details] [diff] [review]
Specialize binary arith to double if result is double and lhs/rhs only contains double/int/undefined/null

Review of attachment 612445 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with first two comments addressed.

::: js/src/ion/MIR.cpp
@@ +802,5 @@
> +
> +    // Test if lhs contain only undefined/null/int32/double types
> +    types::TypeFlags lhsFlags = b.lhsTypes->baseFlags();
> +    bool lhsIsNumber = lhsFlags != 0 && lhsFlags < types::TYPE_FLAG_STRING;
> +    lhsIsNumber = lhsIsNumber && b.lhsTypes->getObjectCount() == 0;

This still seems fairly complicated. Instead, I'd add a function to TypeSet to answer the question you're trying to ask, also to avoid inlining TI internals here.

@@ +809,5 @@
> +    bool rhsIsNumber = true;
> +    if (b.rhsTypes) {
> +        types::TypeFlags rhsFlags = b.rhsTypes->baseFlags();
> +        rhsIsNumber = rhsFlags != 0 && rhsFlags < types::TYPE_FLAG_STRING;
> +        rhsIsNumber = rhsIsNumber && b.rhsTypes->getObjectCount() == 0;

Same here.

@@ +820,5 @@
>      }
>  
> +    // Don't specialize values (having only undefined/null/int32/double) when result isn't Double
> +    if (lhs == MIRType_Value || rhs == MIRType_Value) {
> +        if (rval != MIRType_Double) {

At this point, we're guaranteed that lhs and rhs don't contain Object or String, right?

@@ +842,1 @@
>          specialization_ = MIRType_None;

Right. Important not to specialize to Double here since that would conflict with TI.
Comment 9 Hannes Verschore [:h4writer] 2012-04-06 05:52:01 PDT
(In reply to David Anderson [:dvander] from comment #8)

> This still seems fairly complicated. Instead, I'd add a function to TypeSet
> to answer the question you're trying to ask, also to avoid inlining TI
> internals here.

Done. It's called "knownNonStringPrimitive". This matched the other functions (knownNonEmpty/KnownSubset) in TypeSet best.

> @@ +820,5 @@
> >      }
> >  
> > +    // Don't specialize values (having only undefined/null/int32/double) when result isn't Double
> > +    if (lhs == MIRType_Value || rhs == MIRType_Value) {
> > +        if (rval != MIRType_Double) {
> 
> At this point, we're guaranteed that lhs and rhs don't contain Object or
> String, right?

Yes, the first check removes any type containing Object/String. (before the check listed here). 
The check listed here removes any not specialized input that can coerce to double, but TI report non-double return value.
Maybe the comment made you doubt? I've adjusted it to be more clear.

http://hg.mozilla.org/projects/ionmonkey/rev/ad6e00aaa6cd

Note You need to log in before you can comment on or make changes to this bug.