Closed Bug 643615 Opened 11 years ago Closed 7 years ago

Analyze v8-deltablue

Categories

(Core :: JavaScript Engine, defect)

RESOLVED INCOMPLETE

People

(Reporter: dmandelin, Assigned: dmandelin)

References

(Blocks 1 open bug)

Attachments

(4 files)

On 32-bit Linux, v8/nocs is 1.7x faster.
Attached file JS profiler output
Initial observations:

- Most of the time is in the methodjit and unknown stubs. One of them is clearly for |new Array|, which I'm sure we need to speed up. Others look like Array.push and possibly some IC stubs, which might not be a big deal.

- Otherwise, the time is distributed through different JS functions, but 60% of the time is in the top 10 functions. 

- 3% of time is in GC.

Looks like the next thing to do is go through the top functions individually.
Top JS function:

    frac      cum   script
0.109870 0.109870   deltablue.js             776
                     0.097054 0.883   RUN_MJITCODE               776
                     0.007924 0.072   RUN_MJITCODE               778
                     0.004839 0.044   RUN_MJITCODE               777
                     0.000053 0.000   STUB_UNKNOWN               776

Plan.prototype.execute = function () {
  for (var i = 0; i < this.size(); i++) {
    var c = this.constraintAt(i);
    c.execute();
  }
}

The profiler points at |this.size()|, which calls through here:

Plan.prototype.size = function () {
    // v is an |OrderedCollection|
    return this.v.size();
}

OrderedCollection.prototype.size = function() {
    // elms is from |new Array|
    return this.elms.length;
}

On the whole test, we take about 1700 ms. According to the profiler, we spend 11% of our time in this hot line of code, or 187 ms. We run that line 12M times, so that's 15.6 ns/iteration.

In a reduced microbenchmark that I used to investigate the slowdown, we run 10M iterations in 163 ms, or 16.3 ns/iteration. So the reduced version seems like a valid model. The reduced version just has an empty loop body in |execute|, thus demonstrating that |this.size()| does in fact take almost all the time in this function.
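For readers without the attachment, here is a rough sketch of what such a reduced test could look like. It is a reconstruction from the description above, not the attached file: the |add| helper, |runReduced|, and the plan size and repetition counts are all assumptions chosen to give roughly 10M evaluations of |this.size()|.

```javascript
// Hypothetical reconstruction of the reduced test; not the attached file.
function OrderedCollection() {
  this.elms = new Array();
}
// Helper assumed here only so the sketch can populate the collection.
OrderedCollection.prototype.add = function (elm) {
  this.elms.push(elm);
};
OrderedCollection.prototype.size = function () {
  return this.elms.length;
};

function Plan() {
  this.v = new OrderedCollection();
}
Plan.prototype.size = function () {
  return this.v.size();
};
// Same loop shape as deltablue's Plan.prototype.execute, with the body removed.
// Returns the final counter so callers can count loop iterations.
Plan.prototype.execute = function () {
  for (var i = 0; i < this.size(); i++) {
  }
  return i;
};

// planSize and reps are arbitrary knobs, not values from the bug.
function runReduced(planSize, reps) {
  var plan = new Plan();
  for (var k = 0; k < planSize; k++) {
    plan.v.add(k);
  }
  var iterations = 0;
  var start = Date.now();
  for (var r = 0; r < reps; r++) {
    iterations += plan.execute();
  }
  var ms = Date.now() - start;
  return {
    iterations: iterations,
    ms: ms,
    nsPerIteration: ms * 1e6 / iterations
  };
}

var result = runReduced(1000, 10000); // about 10M loop tests
console.log(result.iterations + " iterations, " + result.ms + " ms");
```

Timings from a sketch like this depend heavily on the engine and build, so the numbers it prints are not comparable to the ones quoted in this bug.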

By comparing the reduced version to v8nocs, I found that our slowdown (50ms on the reduced version: 163 ms vs 113 ms) is from these sources:

1. Inlining. If I inline |this.size()| all the way down, we gain 20 ms. So that's 40% of the difference here.
2. |this|. We seem to lose 6 ms from computing |this|.
3. |.v.elms|. We lose 4 ms here. That's pretty small. I think it's probably due to our object layouts and their worse cache behavior.
4. |.length|. We lose 20 ms here. It seems we are running a slow path for array length, or else our IC stub is much slower.

This explains about 10% of our slowdown so far.
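To make item 1 concrete, this is a sketch of what "inlining |this.size()| all the way down" means at the source level. The constructor arguments and the names |executeCalls|, |executeInlined|, and |timeIt| are made up for illustration and do not come from deltablue.js or the attachments.

```javascript
// Illustrative only: constructor signatures and method names are assumptions.
function OrderedCollection(n) {
  this.elms = new Array();
  for (var k = 0; k < n; k++) {
    this.elms.push(k);
  }
}
OrderedCollection.prototype.size = function () {
  return this.elms.length;
};

function Plan(n) {
  this.v = new OrderedCollection(n);
}
Plan.prototype.size = function () {
  return this.v.size();
};

// Original shape: each loop test makes two method calls.
Plan.prototype.executeCalls = function () {
  for (var i = 0; i < this.size(); i++) {
  }
  return i;
};

// Hand-inlined shape: each loop test is three property loads and no calls.
Plan.prototype.executeInlined = function () {
  for (var i = 0; i < this.v.elms.length; i++) {
  }
  return i;
};

// Times `reps` runs of the named method and returns elapsed milliseconds.
function timeIt(plan, methodName, reps) {
  var start = Date.now();
  for (var r = 0; r < reps; r++) {
    plan[methodName]();
  }
  return Date.now() - start;
}

var plan = new Plan(1000);
console.log("calls:   " + timeIt(plan, "executeCalls", 10000) + " ms");
console.log("inlined: " + timeIt(plan, "executeInlined", 10000) + " ms");
```

The inlined loop still pays for |this|, |.v.elms|, and |.length| on every test, which is why items 2 through 4 are measured separately.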
Bill pointed out that as far as we know v8/nocs doesn't do inlining. And in my tests inlining at the source level does help v8/nocs run faster. So it seems that the difference here is actually that they are faster at method calls, as shown by this benchmark. They are not faster at a plain call to the global function |g|, but they are faster at calling |a.g|.
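A comparison along these lines can be sketched as follows. This is not the attached benchmark: only the names |g| and |a.g| come from the comment above, and the function bodies, the timing helpers, and the iteration count are assumptions.

```javascript
// Sketch only; the attached benchmark may differ.
// In a JS shell, top-level g is a global function. In a module-based runner
// it is module-scoped instead, which may change what is being measured.
function g() {
  return 1;
}
var a = {
  g: function () {
    return 1;
  }
};

// Plain call to the function g.
function timeGlobalCall(n) {
  var sum = 0;
  var start = Date.now();
  for (var i = 0; i < n; i++) {
    sum += g();
  }
  return { sum: sum, ms: Date.now() - start };
}

// Method call: the callee is looked up as a property of a before each call.
function timeMethodCall(n) {
  var sum = 0;
  var start = Date.now();
  for (var i = 0; i < n; i++) {
    sum += a.g();
  }
  return { sum: sum, ms: Date.now() - start };
}

var N = 10000000; // arbitrary iteration count
console.log("g():   " + timeGlobalCall(N).ms + " ms");
console.log("a.g(): " + timeMethodCall(N).ms + " ms");
```

The sums are accumulated and returned so the calls are harder to optimize away entirely; a modern optimizing JIT may still inline both forms.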
About 10% of our time is in Array.push and Array.pop. v8/nocs is much faster (3x) on this microbenchmark.
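As a rough idea of what a push/pop microbenchmark could look like (a sketch, not the attached file; |pushPop|, the stack depth, and the repetition count are assumptions):

```javascript
// Sketch only; the depth and reps values are arbitrary.
// Pushes `depth` integers onto an array, pops them all off again, and
// repeats `reps` times. Returns a checksum so the work is observable.
function pushPop(depth, reps) {
  var a = new Array();
  var checksum = 0;
  for (var r = 0; r < reps; r++) {
    for (var i = 0; i < depth; i++) {
      a.push(i);
    }
    for (var j = 0; j < depth; j++) {
      checksum += a.pop();
    }
  }
  return { checksum: checksum, remaining: a.length };
}

var start = Date.now();
var out = pushPop(100, 100000); // 10M pushes and 10M pops
console.log("push/pop: " + (Date.now() - start) + " ms, checksum " + out.checksum);
```

Keeping the array small means the loop mostly measures call overhead for push and pop rather than array growth.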
On this benchmark, v8/nocs is 2x faster for the not-equal case and 4x faster for the equal case.
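The comment does not say which operand types the equality benchmark compares. The sketch below assumes object references compared with |==|; the function |countEqual| and the iteration count are made up for illustration.

```javascript
// Sketch only. Assumption: operands are object references compared with ==.
// Counts how many of n comparisons of x against y report equality.
function countEqual(x, y, n) {
  var hits = 0;
  for (var i = 0; i < n; i++) {
    if (x == y) {
      hits++;
    }
  }
  return hits;
}

var p = {};
var q = {};
var N = 10000000; // arbitrary iteration count

var t0 = Date.now();
var equalHits = countEqual(p, p, N); // equal case: same object
var t1 = Date.now();
var notEqualHits = countEqual(p, q, N); // not-equal case: distinct objects
var t2 = Date.now();

console.log("equal:     " + (t1 - t0) + " ms");
console.log("not equal: " + (t2 - t1) + " ms");
```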
Summarizing, here are the optimizations we need here:

1. Faster method calls. This is probably about 2/3 of what's slowing us down here. See comment 3. Inlining would also fix this.

2. Faster array and object allocation.

3. Faster Array.push/pop

4. Faster equality testing

5. Faster array.length
(In reply to comment #3)
> They are not faster at a plain call to the global
> function |g|, but they are faster at calling |a.g|.

Since JSOP_CALL should do roughly the same thing for g() vs. a.g(), does that mean that it's JSOP_CALLPROP that needs loving?
(In reply to comment #7)
> (In reply to comment #3)
> > They are not faster at a plain call to the global
> > function |g|, but they are faster at calling |a.g|.
> 
> Since JSOP_CALL should do roughly the same thing for g() vs. a.g(), does that
> mean that it's JSOP_CALLPROP that needs loving?

That would be my guess, but I don't really know.
This JM bug has been superseded by IonMonkey bug 768739.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE