Closed Bug 643629 Opened 13 years ago Closed 13 years ago

Firefox slower than Chrome at variable decls

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

RESOLVED INVALID

People

(Reporter: humph, Unassigned)

Details

(Keywords: perf)

Attachments

(2 files)

Rick found some interesting numbers when testing perf for multiple vars vs one var; the interesting thing, for us, is how much slower we are than Chrome in *all* cases:

http://jsperf.com/a-whole-lotta-var

I'm not sure how to characterize this, so my summary may be off.  It seems like we shouldn't be this much slower than Chrome at such a simple function.  Even worse is the regression from 3.6 to 4 and trunk.

Jeff suggests:

18:12 < Waldo> could be name analysis time
18:12 < Waldo> but pull out the profiler to see for sure :-)
Attached file Shell testcase
A few notes:

1)  It's only slow if the methodjit is involved (independent of whether profiling is).  If I disable methodjit, and hence force tracejit, we're within a factor of 2 of my Chrome 11 dev build on the jsperf testcase.  Hence the regression from 3.6: we no longer trace this on trunk.

2)  If I compare v8 shell vs JM shell on the attached testcase, JM comes out only 2x slower than v8 shell.  TM comes out 5x faster than v8 shell.  I see the same times in the Chrome 11 browser and Firefox on the browser testcase I'm about to attach.  I have no idea why jsperf is coming up with numbers for Chrome that are about 12x faster than the ones I get in a simple testcase without all their jazz.  Note that for both JM and TM my numbers track the jsperf numbers pretty closely; they just disagree for Chrome.

3)  The test is fairly bogus in that it's not measuring "variable decls".  If I comment all the variable decls out entirely the TM times don't change at all, and neither do the v8 times.  Which makes sense: it's all obvious dead code, so is eliminated.  So all that's being measured here in TM and V8 is the speed at which the JS engine can count from 0 to some big number and call functions (I get a 30% speedup in TM and v8 by commenting out the function calls entirely).  JM does get a bit faster if I comment out the vars (by 20% or so).  That means that 80% of the time is still the counting and function call itself.
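The comparison described in item 3 can be sketched roughly as follows. This is only an illustration, not the attached testcase: the names `emptyBody`, `withVars`, `timeLoop` and `compareVariants` and the iteration count are assumptions made up for the example. It times a bare countdown loop, a loop calling an empty function, and a loop calling a function full of unused vars, so the three costs can be looked at separately.

```javascript
// Hypothetical harness; all names and the iteration count are illustrative.

// A function with no local declarations at all.
function emptyBody() {
  return true;
}

// A function whose locals are never read (dead code, as in the jsperf test).
function withVars() {
  var a = 'a', b = 'b', c = 'c', d = 'd';
  return true;
}

// Time `count` iterations. If f is null, only the countdown loop runs.
// Note: f is called through a parameter, which an engine may optimize
// differently from a direct call to a named function.
function timeLoop(f, count) {
  var start = Date.now();
  if (f) {
    while (count--) {
      f();
    }
  } else {
    while (count--) {}
  }
  return Date.now() - start; // elapsed milliseconds
}

// Run all three variants with the same iteration count.
function compareVariants(count) {
  return {
    loopOnly: timeLoop(null, count),
    callOnly: timeLoop(emptyBody, count),
    callWithVars: timeLoop(withVars, count)
  };
}

var results = compareVariants(1000000);
```

If `callOnly` and `callWithVars` come out about the same, the declarations are probably being eliminated and the test is mostly measuring the loop and the call. The absolute numbers depend heavily on the engine and on which JIT ends up handling the loop.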
Attached file HTML testcase
By the way, I tried changing the code to something more like:

  function benchmark(f, count) {
      var start = new Date;
      while (count--) {
          f();
      }
      var end = new Date;

      print(end - start);
  }

  benchmark(lotta, 40000000);
  benchmark(onlyone, 40000000);

and that makes Chrome a bit faster (as expected; their function-local accesses are a lot faster than their global var accesses), but still 3x slower than TM.  So I still don't understand where jsperf is getting its numbers from.

For the rest, it's not worth profiling here until we see the times JM+TI has, imo.  I fully expect that to be quite different from just JM.
ccing jdd too, in case he cares enough to dig into why jsperf is producing numbers that are off by this much.
If jsperf is off, it would be good to know that, given how often it's used as a baseline for various js perf tests.
The function jsPerf compiles can be viewed via:

javascript:void document.body.appendChild(document.createTextNode(ui.benchmarks[0].fn.compiled));

For me it produces:

function (m1300767194359, n1300767194359){
  var r1300767194359,
      i1300767194359 = m1300767194359.count,
      f1300767194359 = m1300767194359.fn,
      s1300767194359 = n1300767194359.nanoTime();

  while(i1300767194359--){
    lotta();
  }
  r1300767194359 = (n1300767194359.nanoTime() - s1300767194359) / 1e9;

  return {
    time: r1300767194359,
    uid: "1300767194359"
  };
}
The test is bogus, but the good news is that jsPerf at least tells users that both tests are statistically indistinguishable from each other for the specific browser:

http://dl.dropbox.com/u/513327/d954abf5f0b98bae2909bbd8d7a6e77d.png
I made the test closer to Boris' test and the results were closer between browsers:

http://jsperf.com/a-whole-lotta-var/2
http://dl.dropbox.com/u/513327/4138eaf1fcaf869396c78cf01c3694eb.png

It compiles to:

function (m1300768630640, n1300768630640) {

  function lotta() {
    var a = 'a';
    var b = 'b';
    var c = 'c';
    var d = 'd';
    var e = 'e';
    var f = 'f';
    var g = 'g';
    var h = 'h';
    var i = 'i';
    var j = 'j';
    var k = 'k';
    var l = 'l';
    var m = 'm';
    var n = 'n';
    var o = 'o';
    var p = 'p';
    var q = 'q';
    var r = 'r';
    var s = 's';
    var t = 't';
    var u = 'u';
    var v = 'v';
    var w = 'w';
    var x = 'x';
    var y = 'y';
    var z = 'z';
    return true;
   }

   function onlyone() {
    var a = 'a',
        b = 'b',
        c = 'c',
        d = 'd',
        e = 'e',
        f = 'f',
        g = 'g',
        h = 'h',
        i = 'i',
        j = 'j',
        k = 'k',
        l = 'l',
        m = 'm',
        n = 'n',
        o = 'o',
        p = 'p',
        q = 'q',
        r = 'r',
        s = 's',
        t = 't',
        u = 'u',
        v = 'v',
        w = 'w',
        x = 'x',
        y = 'y',
        z = 'z';

    return true;
   }

   var r1300768630640,
       i1300768630640 = m1300768630640.count,
       f1300768630640 = m1300768630640.fn,
       s1300768630640 = n1300768630640.nanoTime();

   while(i1300768630640--){
     lotta();
   }
   r1300768630640 = (n1300768630640.nanoTime() - s1300768630640) / 1e9;

   return {
     time: r1300768630640,
     uid: "1300768630640"
   };
 }
JDD, the issue is people comparing across browsers....  ;)

Thanks for the hint in comment 6.  Using that code (or rather the somewhat different code Chrome 11 actually gets) I can reproduce the speed that jsperf is measuring.  Let me figure out what's going on there.
Aha.  The issue is that in V8 this code:

    var g = lotta;
    while (c--) {
      g();
    }

runs way slower than this code:

    while (c--) {
      lotta();
    }

Presumably in the latter case it inlines and dead-code-eliminates the whole thing or something, so you're left with just the cost of decrementing c.  In fact, the latter loop runs at the same speed as if the lotta() call is removed altogether.

So we're back to the test measuring silly things; the entirety of the test is dead code, and v8 eliminates it completely.  TM almost does; it still has the function identity guard for the callee that will almost certainly go away at some point.
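One way to make a test like this harder to eliminate is to have the loop consume a value that depends on the locals. The sketch below is an assumption-laden illustration, not something from the jsperf page: `lottaLive` and `benchmarkLive` are made-up names, and a sufficiently clever engine could still constant-fold the result, so this does not guarantee that the declarations are what gets measured.

```javascript
// Hypothetical variant of the test function: the return value depends on
// the locals, so they are no longer trivially dead.
function lottaLive() {
  var a = 'a';
  var b = 'b';
  var c = 'c';
  return a.length + b.length + c.length;
}

// Accumulate the results so the calls have an observable effect,
// and return the sum together with the elapsed time in milliseconds.
function benchmarkLive(count) {
  var sum = 0;
  var start = Date.now();
  while (count--) {
    sum += lottaLive();
  }
  var elapsed = Date.now() - start;
  return { elapsed: elapsed, sum: sum };
}

var live = benchmarkLive(1000);
```

Returning `sum` from the harness matters as much as computing it: if nothing ever reads the accumulated value, the whole loop is dead code again.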
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Though

    var g = lotta;
    while (c--) {
      g();
    }

still has some v8 optimization going on:
http://dl.dropbox.com/u/513327/37a23c064cf1adc90deed7e1504b8b89.png
Oh, sure.  That's comparing to JM; see comment 1 item 1.  Try disabling methodjit in preferences.  ;)
With methodjit disabled it's crazy fast:
http://dl.dropbox.com/u/513327/8847fb53d85a711e6d9304394fc738dd.png

So why the change in FF4 to use methodjit?
methodjit is faster at some things; tracejit is faster at others.  There's a heuristic profiler that picks which one to use for a given loop.  Its decision depends on the exact code run in the loop, the iteration count of the loop the first several times it runs, etc, etc.  In particular, it weighs the number of "good" operations, the ones the tracer handles much better than the methodjit (e.g. function calls), against the number of total operations.

In this case, each of the assignments in the function counts towards the number of total operations, and there is only one good operation: the function call.  That causes the profiler to decide to not tracejit the loop.
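As a toy model of that decision (emphatically not SpiderMonkey's actual profiler code): the function name `shouldTrace`, the op labels, and the 0.25 threshold below are all invented for illustration, and the real heuristic also looks at things like iteration counts that are ignored here.

```javascript
// Hypothetical model only. `ops` is a list of operation labels seen in one
// loop iteration; 'call' stands in for the "good" operations.
// `minGoodRatio` is an assumed threshold, not a real engine constant.
function shouldTrace(ops, minGoodRatio) {
  var good = 0;
  for (var i = 0; i < ops.length; i++) {
    if (ops[i] === 'call') {
      good++;
    }
  }
  // Trace only if good operations make up a large enough share of the total.
  return ops.length > 0 && good / ops.length >= minGoodRatio;
}

// One call plus 26 assignments, like the loop calling lotta().
var lottaOps = ['call'];
for (var n = 0; n < 26; n++) {
  lottaOps.push('assign');
}

var traceLotta = shouldTrace(lottaOps, 0.25);    // 1 good op out of 27
var traceBareCall = shouldTrace(['call'], 0.25); // 1 good op out of 1
```

Under these assumed numbers the loop with all the assignments falls below the threshold and is not traced, while a loop consisting of just the call would be.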

Proper tuning of the profiler is an ongoing thing, but tuning it based on microbenchmarks like this is silly.  Given any heuristic, you can always find a case where it guesses "wrong".  The right thing to do is to tune for large workloads we care about, and not worry too much about pointless microbenchmarks, imo.
I agree with you on the topic of micro-benchmarks. Thanks for explaining the details.