Closed Bug 1101386 Opened 9 years ago Closed 7 years ago

Missing bytecode cache causes SpiderMonkey to be 6x slower than JSC at 2nd (or consecutive) execution of TypeScript-compiled code

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 900784

People

(Reporter: till, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [performance] [shumway])

Attachments

(5 files)

The TypeScript compiler takes syntax for defining classes and modules that is very similar to ES6's and turns it into ES5. The result is very closure-heavy code. Our performance when initially parsing and executing this code is comparable to V8's. JSC, however, is substantially faster: about 6x in my testing.

TypeScript-compiled code employs a pattern of creating nested IIFEs to express modules. Members of those modules are created as local vars in those IIFEs, with exports stored as members on an object passed into the IIFEs.

Additionally, TypeScript employs a custom __extends function to express class inheritance.

A longer example of real-world JS output from the Shumway codebase is attached, but here's some simple TypeScript:

module Foo {
  export class Bar {
    baz() {}
  }
}

module Qux.Quibble {
  import Bar = Foo.Bar;
  export class Quant extends Bar {
    private bar: Bar;
    constructor() {
      this.bar = new Bar();
    }
  }
}

And here's the resulting JS:

var __extends = this.__extends || function (d, b) {
    for (var p in b) if (b.hasOwnProperty(p)) d[p] = b[p];
    function __() { this.constructor = d; }
    __.prototype = b.prototype;
    d.prototype = new __();
};
var Foo;
(function (Foo) {
    var Bar = (function () {
        function Bar() {
        }
        Bar.prototype.baz = function () {
        };
        return Bar;
    })();
    Foo.Bar = Bar;
})(Foo || (Foo = {}));

var Qux;
(function (Qux) {
    (function (Quibble) {
        var Bar = Foo.Bar;
        var Quant = (function (_super) {
            __extends(Quant, _super);
            function Quant() {
                this.bar = new Bar();
            }
            return Quant;
        })(Foo.Bar);
        Quibble.Quant = Quant;
    })(Qux.Quibble || (Qux.Quibble = {}));
    var Quibble = Qux.Quibble;
})(Qux || (Qux = {}));

This just loads the previously-attached JS file via XHR and then evals it. The printed number is the time in ms that evaluation took.

On my machine, both current Nightly and Chrome Canary take 11.5ms to 12.5ms, whereas Safari takes between 1.8ms and 2.3ms.
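
A minimal sketch of this kind of measurement, not the attached benchmark itself. The helper name `timeEval` and the inline `sampleSource` string are hypothetical stand-ins (the real test fetched the attached file via XHR), and a global `performance.now()` is assumed to be available:

```javascript
// Hypothetical helper: evaluates the same source text `runs` times and
// records how long each evaluation took, in milliseconds.
function timeEval(source, runs) {
  const geval = eval; // indirect eval, so the code runs in the global scope
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    geval(source);
    timings.push(performance.now() - start);
  }
  return timings;
}

// Stand-in for the attached file; a small piece of TypeScript-style output.
const sampleSource = `
var Foo;
(function (Foo) {
    var Bar = (function () {
        function Bar() {}
        Bar.prototype.baz = function () {};
        return Bar;
    })();
    Foo.Bar = Bar;
})(Foo || (Foo = {}));
`;

const evalTimings = timeEval(sampleSource, 5);
console.log(evalTimings.map(t => t.toFixed(3) + "ms").join(", "));
```

Comparing the first entry with the later ones gives a rough idea of how much an engine gains from having seen the same source before.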
Attached is a high-resolution Instruments run of a shell version of the simple benchmark.

Of the time we spend under js::DirectEval, 86% is under CompileScript, the rest under ExecuteKernel. Note that even just the execution part takes roughly as long as JSC's total time.

(All following numbers are percentages of the time under CompileScript.)

The time under CompileScript is distributed like this:
57.2% under frontend::Parser
32.5% under frontend::EmitTree
 5.3% under frontend::NameFunctions
 3.1% under frontend::ParseNodeAllocator::freeTree
 1.7% under js::AtomizeString (Note that that's almost 10% of JSC's total time)
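
To relate these figures back to the time under js::DirectEval, each one can be scaled by the 86% that CompileScript accounts for. A small sketch of that arithmetic, using only the numbers above (the variable names are made up for illustration):

```javascript
// Share of js::DirectEval time spent under CompileScript (from the profile).
const compileShareOfDirectEval = 0.86;

// Percentages of the time under CompileScript, as listed above.
const underCompileScript = {
  "frontend::Parser": 57.2,
  "frontend::EmitTree": 32.5,
  "frontend::NameFunctions": 5.3,
  "frontend::ParseNodeAllocator::freeTree": 3.1,
  "js::AtomizeString": 1.7,
};

// The same entries expressed as percentages of the time under js::DirectEval.
const underDirectEval = Object.fromEntries(
  Object.entries(underCompileScript).map(
    ([name, pct]) => [name, pct * compileShareOfDirectEval]
  )
);

// How much of CompileScript the listed entries account for in total.
const accountedFor = Object.values(underCompileScript).reduce((a, b) => a + b, 0);

for (const [name, pct] of Object.entries(underDirectEval)) {
  console.log(pct.toFixed(1).padStart(5) + "% of DirectEval under " + name);
}
console.log("listed entries cover " + accountedFor.toFixed(1) + "% of CompileScript");
```

By this reckoning the parser alone accounts for roughly 49% of the time under js::DirectEval.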
Note that the timing only improves by about 10% or so for consecutive runs, so it looks like our compilation cache doesn't really help with this.

Brian, is that something we could improve on? TypeScript-compiled code and, more importantly, the code pattern employed by the TypeScript compiler, are fairly popular and used by large applications.
Flags: needinfo?(bhackett1024)
This should run in the js shell. On my machine this takes roughly 150ms to load.
Some further testing clearly shows that the difference comes down to code caching: the first load isn't any faster in Safari than in Firefox or Chrome.

I just tested with bug 988353 applied but sadly, that doesn't help meaningfully.
Depends on: 988353
(In reply to Till Schneidereit [:till] from comment #5)
> Some further testing clearly shows that the difference comes down to code
> caching: the first load isn't any faster in Safari than in Firefox or Chrome.

Do you have a benchmark or a methodology you can describe that shows this?
(In reply to Brian Hackett (:bhackett) from comment #6)
> (In reply to Till Schneidereit [:till] from comment #5)
> > Some further testing clearly shows that the difference comes down to code
> > caching: the first load isn't any faster in Safari than in Firefox or Chrome.
> 
> Do you have a benchmark or a methodology you can describe that shows this?

Yes, I instrumented jsbench to use an evalWithCache-like primitive for some of the benchmarks [1]. The instrumentation was done as follows:
 - I replaced geval with a call to evaluate, using one object to save the pre-compiled code.
 - I changed the first call to the jr function to execute the benchmark (and save the bytecode).
 - I added an init function to reset the state of all manipulated objects.

This uses the bytecode cache as an intermediate representation, which ideally can be saved to disk and loaded before the sources are loaded. The idea behind this bytecode cache is that the bytecode plus the lazy scripts used during start-up are smaller than the source, and thus potentially faster to load. The source could then be attached later, once we go looking for a lazy script.
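
As an illustration of the general compile-once, reload-from-cache pattern, here is a sketch that uses Node.js's `vm.Script` code cache. This is an assumption made for the sake of a runnable example: it exercises V8's cache, not SpiderMonkey's, and it is not the jsbench instrumentation described above. The `cachedBundleSource` string is a made-up stand-in for a real bundle.

```javascript
const vm = require("vm");

// Made-up stand-in for a large script bundle.
const cachedBundleSource =
  "(function () { function add(a, b) { return a + b; } return add(2, 3); })()";

// First load: compile from source, then serialize the compiled form.
const firstLoad = new vm.Script(cachedBundleSource, { filename: "bundle.js" });
const codeCache = firstLoad.createCachedData(); // a Buffer that could be written to disk
const firstResult = firstLoad.runInThisContext();

// Second load: hand the serialized data back so the engine can skip most of
// the parsing and compilation work. The source text is still passed in.
const secondLoad = new vm.Script(cachedBundleSource, {
  filename: "bundle.js",
  cachedData: codeCache,
});
const secondResult = secondLoad.runInThisContext();

// cachedDataRejected reports whether the engine refused the supplied cache.
console.log("cache bytes:", codeCache.length);
console.log("cache rejected:", secondLoad.cachedDataRejected);
console.log("results:", firstResult, secondResult);
```

Note that in this API the source is still supplied on the second load, whereas the idea described above is to defer attaching the source until it is actually needed.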

Without any instrumentation:
    urem: [34.0, 33.0, 32.0, 32.0, 32.0, 34.0, 32.0, 34.0, 32.0, 32.0, 32.0, 33.0, 33.0, 33.0, 32.0, 34.0, 32.0, 32.0, 32.0, 32.0]

With the instrumentation (running the benchmark twice) and the cache being disabled:
    urem: [31.0, 30.0, 29.0, 33.0, 32.0, 32.0, 30.0, 30.0, 30.0, 33.0, 29.0, 29.0, 29.0, 34.0, 30.0, 31.0, 31.0, 31.0, 31.0, 29.0]

With the bytecode cache enabled:
    urem: [6.0, 7.0, 6.0, 7.0, 7.0, 7.0, 7.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 7.0, 6.0, 7.0, 6.0, 7.0, 6.0] 

[1] http://people.mozilla.org/~npierron/jsbench.tgz
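
To put a single number on the three urem series above, one can average each of them and compare. A short sketch of that calculation; the values are copied from the runs above, and the object keys are labels invented for this example:

```javascript
// The three urem series reported above, 20 samples each.
const uremRuns = {
  uninstrumented: [34, 33, 32, 32, 32, 34, 32, 34, 32, 32,
                   32, 33, 33, 33, 32, 34, 32, 32, 32, 32],
  cacheDisabled:  [31, 30, 29, 33, 32, 32, 30, 30, 30, 33,
                   29, 29, 29, 34, 30, 31, 31, 31, 31, 29],
  cacheEnabled:   [6, 7, 6, 7, 7, 7, 7, 6, 7, 7,
                   7, 7, 7, 8, 7, 6, 7, 6, 7, 6],
};

// Arithmetic mean of a list of numbers.
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

const uremMeans = Object.fromEntries(
  Object.entries(uremRuns).map(([label, xs]) => [label, mean(xs)])
);

// Ratio between the instrumented runs with the cache disabled and enabled.
const cacheSpeedup = uremMeans.cacheDisabled / uremMeans.cacheEnabled;

console.log(uremMeans);
console.log("speedup with cache: " + cacheSpeedup.toFixed(2) + "x");
```

The means come out at 32.6, 30.7 and 6.75, so enabling the bytecode cache is roughly a 4.5x improvement on this benchmark.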
Attached file shumway-bundle.zip
(In reply to Brian Hackett (:bhackett) from comment #6)
> (In reply to Till Schneidereit [:till] from comment #5)
> > Some further testing clearly shows that the difference comes down to code
> > caching: the first load isn't any faster in Safari than in Firefox or Chrome.
> 
> Do you have a benchmark or a methodology you can describe that shows this?

As an approximation, this should do:

1. Download the attached zip and extract it into a local folder
2. Start a web server in that folder (e.g. with `python -m SimpleHTTPServer 9000 .`)
3. Visit this URL (adapt as needed if the server is running on a different port): http://localhost:9000/examples/inspector/inspector-bundles.html?rfile=../as3_tiger/tiger.swf
4. Navigate to about:blank (this is to make sure Safari's startup doesn't interfere with the timings)
5. Close and reopen Safari
6. Navigate back in the tab's history
7. Open the console and note the various timings logged
8. Close the console and reload the tab
9. Reopen the console and note the various timings again

The second set of timings should be substantially lower. The most important one is "Load Player Bundle", which on my machine reproducibly goes from ~190ms to ~35ms. Doing the same in Firefox or Chrome gives minimal improvements at best.
Depends on: 1113378
No longer blocks: shumway-fb2
No longer blocks: shumway-jw2
We're a bit less concerned with startup performance now that the focus is on ads: spinning up a new Shumway instance won't block the browser, because it'll happen in the js-plugins process, so the effect boils down to delaying the display of ads and potentially causing jank in them.
Blocks: shumway-later
No longer blocks: shumway-m4
Summary: SpiderMonkey is 6x slower than JSC at initial execution of TypeScript-compiled code → Missing bytecode cache causes SpiderMonkey to be 6x slower than JSC at 2nd (or consecutive) execution of TypeScript-compiled code
I think there's a general issue here of what sort of caching we should do when the same scripts are loaded in multiple pages (or reloaded pages) or after a cold start.  But it's not clear to me what that caching should look like and I'm not going to have time to work on this for a while.
Flags: needinfo?(bhackett1024)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE