Performance issue with functions created within loops

NEW
Unassigned

Status

()

Core
JavaScript Engine
8 years ago
4 years ago

People

(Reporter: humph, Unassigned)

Tracking

Trunk
x86
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
This may be known, but in case not...we noticed a performance issue today while fixing some code in Processing.js.  Functions declared within loops perform very differently under V8 and SM.  After reducing the code we were working on, we see the following (times are what I saw on my machine):

Test 1: http://bocoup.com/core/code/js-tests/f1lt3r-loop-test-1.html

    for( var i = 0; i < times; i ++ ){
      function strokeCol(){
        context.strokeColor = '#fff';
      }
      strokeCol();
    }    

V8 - 0.012
SM - 0.91

Test 2: http://bocoup.com/core/code/js-tests/f1lt3r-loop-test-2.html

    function strokeCol(){
      context.strokeColor = '#fff';
    }
    
    for( var i = 0; i < times; i ++ ){
      strokeCol();
    }    

V8 - 0.013
SM - 0.007

Test 3: http://bocoup.com/core/code/js-tests/f1lt3r-loop-test-3.html

    for( var i = 0; i < times; i ++ ){
      var strokeCol = function(){
        context.strokeColor = '#fff';
      }
      strokeCol();
    } 

V8 - 0.237
SM - 0.494

Whether the function is named or anonymous, whether it is attached to a var or called directly, etc. have very different effects on the two, and not always in good ways for Firefox.
For test1:

Abort recording of tree /Users/bzbarsky/test.js:5@54 at /Users/bzbarsky/test.js:6@55: deffun.

so that loop just doesn't trace.  Over here it runs at basically interp-speed.

For test3, we stay on trace the whole time according to TMFLAGS (as in, one trace entry and one trace exit)... but the Shark profile shows 38% in js_Interpret, 30% under js_CloneFunctionObject, 15% js_FindPropertyHelper.  I have no idea what's going on there.
(In reply to comment #1)
> For test3, we stay on trace the whole time according to TMFLAGS (as in, one
> trace entry and one trace exit)... but the Shark profile shows 38% in
> js_Interpret, 30% under js_CloneFunctionObject, 15% js_FindPropertyHelper.  I
> have no idea what's going on there.

One question is whether the closure is a regular closure or a null closure. It looks to me like it should be a null closure (i.e., non-escaping, upvar access can be implemented by static link technique (with display optimization in SM's case)) but there may be some extra constraint I don't know about.

Both of those call js_CloneFunctionObject. That function seems pretty much to just call js_NewObject. But it suggests that we can somehow make creating function objects faster.

js_FindPropertyHelper must be for the .strokeColor access. Maybe with all the cloning we never hit the prop cache?
Something I forgot to say in comment 2:

I guess the delayed cloning with read barrier should prevent all that cloning, so maybe something is tripping up the delayed cloning mechanism.
Ok, it looks like for test3 the issue is that the startShark calls do something weird with the shapes.  If I look at TMFLAGS=abort,stats,tracer for a build with those calls around the loop, I get:

No global slotlist for global shape 251, flushing cache.
Flushing cache.
No global slotlist for global shape 256, flushing cache.
Flushing cache.
etc

and the stats are:
recorder: started(1), aborted(0), completed(1), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), merged loop exits(0), unstableInnerCalls(0), blacklisted(1)
monitor: exits(0), timeouts(0), type mismatch(0), triggered(0), global mismatch(20), flushed(20)

So that profile is once again pure-interp.  If I take out the shark calls, there are only two calls to js_CloneFunctionObject.
If I move the loop into a function (whether the connect/disconnect to shark happens in that function or around the call to it), then the weird shape stuff goes away.  At that point, 53% of the total time is spent under js_NewNullClosure.  13% of total is in the function itself.  20% is in/under js_NewFinalizableGCThing (15% of total in, the rest in NewGCArena, lock, unlock).  18% is in atomic increment.

Another 26% of the time is the vm_fault user trap.

This is all on Mac, of course...  The numbers might look different elsewhere.

If I remove supervisor callstacks, 81% of the total time is under js_NewNullClosure.  The rest seems to be executing jitted code.
Depends on: 540992
Note about test 1:

In ECMA-262, the function statement gets hoisted to the top of the script. V8 follows this, so for V8 test 1 is the same as test 2. SM has an extension where function statements are not hoisted in some contexts (like in the top-level script?), so in SM test 1 is more like test 3 (although 2x slower, for some reason).
Probably 2x slower because it doesn't trace while test 3 does, right?
(In reply to comment #6)
> In ECMA-262, the function statement gets hoisted to the top of the script. V8
> follows this,

So do IE and Opera. Not sure about WebKit/JSC.

> so for V8 test 1 is the same as test 2. SM has an extension where
> function statements are not hoisted in some contexts (like in the top-level
> script?),

Parenthetical is where we do hoist: functions at top level of another function or program, i.e. not function expressions and not function statements that are subordinate to a block or other statement. Function definitions must be hoisted per ES1-3 and ES5, of course.

IE hoisting the extended syntax (function sub-statement) is annoying if you are trying to do "conditional compilation":

if (DEBUG) {
  function f() ...
} else {
  function f() ...
}

and even before IE extended ES3 this way, SpiderMonkey (in Netscape 4) had the function-in-substatement extension for just this use-case.

By covenant in Ecma TC39, we will in a future edition standardize function in sub-statement to bind a block-local (let-style) function name, hoisted to top of block. The block must be explicit: no if (DEBUG) function f()... (without braces). So be warned, don't count on either what we do now, or what IE et al. do, for this function-in-substatement extensions.

/be

Comment 9

5 years ago
Test 1
Nightly 27 - 0.247
Chrome 30 - 0.012
IE 10 - 0.018

Test 2
Nightly 27 - 0.003
Chrome 30 - 0.012
IE 10 - 0.015

Test 3
Nightly 27 - 0.029
Chrome 30 - 0.034
IE 10 - 0.117
(Assignee)

Updated

4 years ago
Assignee: general → nobody
You need to log in before you can comment on or make changes to this bug.