Closed Bug 621961 Opened 9 years ago Closed 9 years ago

TI+JM: inline some native functions


(Core :: JavaScript Engine, defect)

Not set





(Reporter: jandem, Assigned: jandem)


(Blocks 1 open bug)


(Whiteboard: fixed-in-jaegermonkey)


(1 file, 7 obsolete files)

Kraken's imaging-gaussian-blur has *many* calls to Math.abs. For these calls TI knows it calls the builtin Math.abs and the argument is an integer. We can probably inline this in the methodjit like the tracer does. Maybe we can also call traceable natives?

This is related to bug 579574. The difference is that the patch there does this in the MIC and needs more guards, memory stores and type checks. IIUC, with type inference we can do this directly in JM and use the FrameState machinery, eg. for Math.abs(x) we can keep x in registers. Combined with bug 612204 this could make these functions very fast. 

Brian, does this makes some sense or am I missing something? I can work on a prototype for Math.abs and see how much it helps Kraken/V8.
Assignee: general → jandemooij
This is definitely the right idea.  From numbers in bug 579574 and my own experiments, most of the gain in optimizing these builtins is avoiding spill code --- Math.abs(int) should not spill anything, and just test on the register containing x.

It would probably be easiest to look at doing bug 612204 after putting together a prototype.  Then the JSOP_CALL could pop a constant-valued frame entry and specializing the call path would fall naturally from there.  That would require being able to identify singleton TypeObjects and go to the corresponding JSObject (probably just a 'JSObject *singleton' in TypeObject; eventually we will I think want to avoid constructing singleton TypeObjects and manipulate the JSObjects directly in type sets, but that's longer term).
Attached patch WIP v1 (obsolete) — Splinter Review
First stab. This inlines Math.abs(int) in the methodjit. Also triggers a recompile for Math.abs(MIN_INT) which results in a double. This also does constant folding for e.g. Math.abs(-10) -> 10. Easy to do; maybe not very relevant for abs, but it could help Math.sqrt, Math.PI etc.

Some numbers:
function f() {
    var abs = Math.abs;
    var t0 = new Date;
    var x = 0;
    for(var i=0; i<10000000; i++) {
        x = abs(i);
    print(new Date - t0);
V8: 116 ms
TM: 37 ms
JM (tip): 231 ms
JM (infer old): 218 ms
JM (infer new): 21 ms

Generated code looks like this:
movl %esi, %edi
cmpl $0x0, %edi
jge <end>
negl %edi
cmpl $0x0, %edi
jl <stub path>
GCC does this without jumps, but we need to check for MIN_INT anyway and GCC requires more registers. So I don't think this is too bad.

Next, I will do some cleanup, extend this to JSOP_CALLPROP and add other builtins like floor.
I had to change Math.abs to a local abs variable to test Kraken's imaging-gaussian-blur (until we optimize JSOP_CALLPROP; but it will be as fast). Also verified the output is still correct.

TM: 875 ms
JM+TI: 1243 ms
V8: 2610 ms
JM: 4719 ms

So, with this patch we may beat Crankshaft on another test. Can't wait to see TM+TI, should be impressive :)
Nice!  One issue here is that if MathAbsMinIntHelper is called and
triggers an overflow, you can't count on the recompiled caller still
knowing the called function is Math.abs or that the argument is an
int.  In such cases the recompiled code will not have a call to
MathAbsMinIntHelper, the recompiler will not be able to patch the
stack and will crash safely.

The compiler can test the 'recompiling' flag to see whether it might
need to emit extra OOL call paths for rejoining from recompiled calls.
 Since most inlinable natives will have cases where they need to stub
out and can trigger recompilation, we don't want to end up with many
different stubs for the compiler to call when recompiling.  Hopefully
what will work is to have all inline natives use the same stub,
probably ic::NativeCall/NativeNew (looking, I think there's a bug
where we don't call these when recompiling to an uncached call), and
have the Compiler make sure to emit a call to that function whenever
Thanks for your feedback. I tested the recompiling path and it seemed to work, but that was probably luck. I will try to understand the recompiler better and fix it the Right Way. (fortunately this should not affect performance numbers in comment 2 and 3).

We also have to think about Math.abs(float,int) etc. Limiting the number of stubs will be good and make it easier to compile more builtins. (At some point we should probably add some instrumentation to find out which functions and argument types will benefit the most from inlining.)
Cool stuff, will do |String()| and |Number()|.
Attached patch WIP v2 (obsolete) — Splinter Review
Refactoring, added a new file FastBuiltins.cpp for inlined natives, added tests. Static overflow (Math.abs(-2147483648)) is now handled correctly. JSOP_CALLPROP is now optimized like JSOP_CALLLOCAL, only difference is that we still do the "getgname Math" but I think we can optimize that too. Slow path calls stubs::SlowCall instead of a special Math.abs helper, like Brian suggested.
Attachment #500404 - Attachment is obsolete: true
Attached patch WIP v3 (obsolete) — Splinter Review
Added more Math functions. This patch inlines:

Math.abs(int) : int
Math.abs(double) : double
Math.sqrt(double) : double
Math.sqrt(int) : double
Math.floor(double) : int
Math.round(double) : int

Math.round helps Kraken audio-oscillator:

V8: 326 ms
TM: 595 ms
JM+TI old: 673 ms
JM+TI new: 344 ms

Smaller wins on v8-raytrace (Math.sqrt) and some other tests.

Next are Math.{sin,cos,log,..}. These are more complex (and need the math cache), I think I will try to call the traceable natives or call js_math_sin etc directly.

Maybe also Math.{pow,ceil,max,min}, don't know how common these are.
Attachment #500612 - Attachment is obsolete: true
Attached patch WIP v4 (obsolete) — Splinter Review
Inline charAt and charCodeAt with integer argument and non-rope string. For the typical micro-benchmark (doing c = s.charCodeAt(5); 1000000 times) we have:

TM: 6 ms
JM (FF4): 23 ms
JM+TI (new): 3 ms
d8: 10 ms

This helps SS string-base64 (11 ms to 7 ms) and some other SS tests.
Attachment #501026 - Attachment is obsolete: true
Duplicate of this bug: 579574
Attached patch WIP v5 (obsolete) — Splinter Review
Compile Math.pow(x, 0.5) and Math.pow(x, -0.5) as sqrt(x) and 1/sqrt(x).

Still seeing about a 4-5% win on SS, and ~40% on Kraken (hard to measure because some tests fail on the TI branch)
Attachment #506908 - Attachment is obsolete: true
Blocks: 638794
Attached patch WIP v6 (obsolete) — Splinter Review
Rebased and fixed some issues. JM branch can run Kraken now, for -m -n:

** TOTAL **: 1.51x as fast     14975.2ms +/- 0.2%   9902.2ms +/-
oscillator:              2.10x as fast       622.4ms +/- 0.3%    295.8ms
gaussian-blur:           4.13x as fast      6349.4ms +/- 0.1%   1536.6ms
Attachment #510628 - Attachment is obsolete: true
Nice! Are these numbers comparing vs. JM tip or TM tip?
Comparing JM tip vs. JM tip + patch. TM tip (-m only) runs Kraken in 12247 ms and gaussian-blur in 4968 ms, so some tests are slower with TI enabled (probably bug 619592 and others). Others like audio-fft and some crypto tests are already ~2x faster, pretty cool (TM tip -m vs JM tip -m -n)
(In reply to comment #11)
> Compile Math.pow(x, 0.5) and Math.pow(x, -0.5) as sqrt(x) and 1/sqrt(x).

Note that 1 / Math.pow(-0, 0.5) !== 1 / Math.sqrt(-0).
OS: Mac OS X → Windows XP
OS: Windows XP → All
Hardware: x86 → All
(In reply to comment #15)
> Note that 1 / Math.pow(-0, 0.5) !== 1 / Math.sqrt(-0).

Yes for pow(x, 0.5) we take a slow path for x === -Infinity and otherwise add 0.0 to convert -0 to +0. These edge cases can be very tricky.
Attached patch Patch v7 (obsolete) — Splinter Review
Rebased, added tests, refactoring. If there's anything you would have done differently, let me know and I'll happily change it. This passes jit-tests and jstests with -m -n -a.
Attachment #518509 - Attachment is obsolete: true
Attachment #520671 - Flags: review?(bhackett1024)
Summary: TypeInference: JM: optimize calls to simple builtins → TI+JM: inline some native functions
Comment on attachment 520671 [details] [diff] [review]
Patch v7

High level:

This patch looks great!  However, there are some correctness problems with the interface with the type information, which I've described below (and will clean up and add to the code as comments).  I'm happy to take the patch as is and fix these myself if you want, let me know your preference.  Would be good to get this landed quickly.

>     /* Whether any objects this represents have an equality hook. */
>     bool hasSpecialEquality;
>+    JSObject *singleton;

A comment here would be good.

Also, the singleton should be marked by TypeObject::trace.  Normally type objects get marked when one of the objects using them as the type gets marked (which could only be the singleton), but that is not always the case --- if a GC occurs during an inference operation or during compilation then all type objects in the compartment are marked, which could leave a dangling pointer here.

>-            jsop_getglobal(GET_SLOTNO(PC));
>-            if (op == JSOP_CALLGLOBAL)
>-                frame.push(UndefinedValue());
>+            if (JSObject *singleton = pushedSingleton(0))
>+                frame.push(ObjectValue(*singleton));
>+            else
>+                jsop_getglobal(GET_SLOTNO(PC));
>+            frame.push(UndefinedValue());

> bool
> mjit::Compiler::jsop_callprop(JSAtom *atom)
> {
>     FrameEntry *top = frame.peek(-1);
>+    JSObject *singleton = pushedSingleton(0);
>+    if (singleton && singleton->isFunction()) {
>+        // THIS
>+        frame.dup();
>+        // THIS THIS
>+        frame.push(ObjectValue(*singleton));
>+        // THIS THIS FUN
>+        frame.shift(-2);
>+        // FUN THIS
>+        return true;
>+    }

>+    /* Optimize singletons like Math for JSOP_CALLPROP. */
>+    JSObject *obj = pushedSingleton(0);
>+        frame.push(ObjectValue(*obj));
>+        return;
>+    }

It's important for property and global variable accesses that we don't optimize away any type monitoring for undefined values.  The type set for a property only describes the possible types for the property * if it exists *, it does not guarantee that the property is actually on the object.  There is an exception to this: global variables defined with 'var x' do not have undefined added to their type set even though they are properties of the global object.

So these ops can only be nop'ed if we can also determine that an undefined value cannot be pushed.  That can be done by establishing a new invariant:

For a TypeObject objtype with singleton JSObject obj and property id, if obj has a non-undefined value for id then any lookup on id in the future cannot produce undefined unless 'undefined' is also added to the type set for id in objtype.

Adding undefined in this way would trigger recompilation of the script and cause the pushedSingleton calls to return NULL.  We already almost have this invariant, we just need to make sure that deleting a property from an object marks that property as undefined in the type (we currently don't do anything with 'delete' in inference), must call cx->addTypeProperty in obj->deleteProperty.

Then, after calling pushedSingleton, CALLGLOBAL and CALLGNAME should check the global object (these opcodes only appear in compileAndGo code) to make sure the property is there.  The optimization for CALLPROP should only apply if we can test a JSObject which is definitely either the operand or is on the operand's prototype chain.  For the operand should be a constant FrameEntry (may need to make sure for the NAME accesses on Math, whose IC and monitoring checks can't be optimized but whose pushed value can still be marked as a singleton).  For ('foo').charAt(0), if the operand is definitely a primitive string and the code is compileAndGo, you can lookup the String.prototype for the global and check there.

The above invariant might need some tuning for handling properties with custom getter hooks, etc. This stuff gets pretty hairy pretty fast.


>+JSObject *
>+mjit::Compiler::pushedSingleton(unsigned pushed)
>+    if (!cx->typeInferenceEnabled())
>+        return NULL;
>+    types::TypeSet *types = script->types->pushed(PC - script->code, pushed);
>+    if (types && types->getKnownTypeTag(cx, script) == JSVAL_TYPE_OBJECT &&
>+        types->objectCount == 1) {
>+        types::TypeObject *obj = (types::TypeObject *)types->objectSet;
>+        return obj->singleton;
>+    }
>+    return NULL;

It's also important to make sure that all information the Compiler requests from type sets is duly recorded as constraints on those type sets, so that if the information changes in the future a recompilation will be triggered.  The 'getKnownTypeTag' test here adds such constraints, but the 'types->objectCount == 1' doesn't.  If a different object is added to the type set, recompilation is not guaranteed.  The existing recompilation constraints don't really capture this use case, I think this needs a 'JSObject *getKnownObject()' that works similarly to 'knownNonEmpty'.
Attachment #520671 - Flags: review?(bhackett1024) → review+
Attached patch Patch v8Splinter Review
Add comments and trace singleton. Brian, as discussed on IRC, I'd appreciate it if you could fix the type inference property part.
Attachment #520671 - Attachment is obsolete: true

I added a stopgap fix for opcodes which don't expect to pop a constant object (just stuff the object into a register), so we can use singleton info wherever we want and be more robust.
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-jaegermonkey
You need to log in before you can comment on or make changes to this bug.