Closed Bug 1113378 Opened 10 years ago Closed 9 years ago

Always Fully Parse IIFEs

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38

People

(Reporter: sstangl, Assigned: sstangl)

References

Details

(Keywords: perf)

Attachments

(2 files, 1 obsolete file)

Shumway is written in Typescript, which compiles to IIFE-heavy JS. For each IIFE we perform a syntax-only parse to generate a LazyScript, then later perform a full parse when the expression is invoked. On a version of shumway.player.js modified to just time code load, we take about 3x as long as d8, with all the time spent in the frontend.

I modified Parser<FullParseHandler>::functionArgsAndBody() to never attempt a SyntaxParse. On shumway.player.js, this moves us to the following code load times:

> Before SM: 143ms
> Before d8:  42ms
> After  SM:  22ms

This might be quick enough that a JSC-style cache isn't as much of a win.

v8 has simple heuristic logic in their AST to handle IIFEs while still allowing for lazy scripts: they check for expressions of the form "(function() { ...}();" and "var x = function() { ... }();", and set an is_parenthesized() flag. If they then encounter as is_parenthesized() function, they always perform a full parse.

We should detect IIFEs and use the FullParseHandler.
Attached patch WIP Patch (obsolete) — Splinter Review
This patch implements the logic described in Comment 0, but in an aesthetically displeasing way, and in a way that doesn't apply to arrow functions.

It moves the shumway.player.js load time from 143ms to 48ms, roughly on par with d8.

The disparity between the twice-as-fast results in Comment 0 and these results are due to the lazy generation of self-hosted code, which doesn't fit the IIFE heuristic.
Blocks: shumway-fb2
Keywords: perf
First of all: awesome!

Two questions, though: the most important one is about the numbers. Specifically, why does syntax parsing followed by eager parsing for everything but the innermost functions (as a slightly simplified scenario) take 7x as long as eagerly parsing everything? That doesn't seem right to me. on IRC, jandem speculated that we might have some bad interactions with source compression. Could it be that we do the syntax parse, then throw away the uncompressed source, only to have to decompress it again for every IIFE we run? That'd be pretty horrible.

The second question is about your reference to self-hosting: I don't understand what you mean, exactly. All self-hosted code is eagerly parsed during startup in all cases, because we then freeze the self-hosting compartment and share it among all runtimes in a process. But even if that weren't the case: if the self-hosted code doesn't fit the IIFE heuristics and we don't eagerly parse it because of that, and then no IIFEs are in fact run in that code, why would that cost us 26ms?
Flags: needinfo?(sstangl)
(In reply to Till Schneidereit [:till] from comment #2)
> Specifically, why does syntax parsing followed by eager parsing for
> everything but the innermost functions (as a slightly simplified scenario)
> take 7x as long as eagerly parsing everything?

Honestly, I'm not sure. Without the patch in this file, we syntax-parse and later full-parse 2409 functions (at least, as seen by createScriptForLazilyInterpretedFunction()). There are 2408 cache hits, and only 1 miss.

> The second question is about your reference to self-hosting: I don't
> understand what you mean, exactly.

Yeah, that comment was incorrect. The actual reason for the discrepancy is that with the patch, we still call CompileLazyFunction() 480 times. Figuring out why lazy compilation is such a hit to performance should let us bridge back to the original 22ms result.
Flags: needinfo?(sstangl)
Comment on attachment 8539548 [details] [diff] [review]
WIP Patch

Review of attachment 8539548 [details] [diff] [review]:
-----------------------------------------------------------------

It turns out that this patch actually isn't such a bad way to implement the detection. Detecting TOK_FUNCTION in primaryExpr() would be nicer in theory, but it requires making a new function that duplicates non-trivial code.

I stared at differential profiles for a long time, and I'm still not seeing what the source of slowdown is.
Attachment #8539548 - Flags: review?(jwalden+bmo)
(In reply to Sean Stangl [:sstangl] from comment #0)
> Shumway is written in Typescript, which compiles to IIFE-heavy JS.

Note that TypeScript 1.4 will have an ES6 compilation target option
https://github.com/Microsoft/TypeScript/wiki/Roadmap#14 (and 1.4 will be the next release which should be rather sooner than later [1]).

I haven't looked into the details, but given Firefox is planning on supporting ES6 classes natively, it's possible that soon enough, Shumway will be able to compile its TS to something with much less IIFEs.


[1] https://github.com/Microsoft/TypeScript/milestones/TypeScript%201.4
Although this patch completely solves player load time in the shell, it looks like the plugin load situation is more complicated, and there's much more to be done. Loading fbplayer.swf in the Shumway Inspectorwith Big Buck Bunny from localhost, I get the following timings:

Without patch:

> Load Player Dependencies: 143.37ms

With patch:

> Load Player Dependencies: 119.68ms

Similarly, without the inspector, bare fbplayer.swf's load time moves from 82ms to 52ms, not the same reduction we see in the shell.
The actual performance of the Facebook Video Player on the Facebook site apparently mimics the latter, going 82ms -> 52ms.

Although the times are low, there is a good three seconds between the appearance of the Shumway logo and the render of the first image frame.
Comment on attachment 8539548 [details] [diff] [review]
WIP Patch

Review of attachment 8539548 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great in general.

I seem to remember seeing at one point that in addition to this form:

(function() {

});

for top-level code, people also did this:

new function() {

}();

The latter case, |new function|, *almost always* invokes (well, |new|s) the following function, even more reliably, I'd say, than parentheses do.  So we should tag that case identically.  New patch (pun intended) that fixes |new| and makes the enum change?

::: js/src/frontend/Parser.cpp
@@ +2148,5 @@
>          return null();
>  
> +    /* Mark functions of the form "(function" for eager compilation. */
> +    if (isInParens)
> +        pn = handler.setInParens(pn);

Could we add a handler method setPresumeInvoked that delegates to setInParens internally, for a little more readability?  And sibiling pn->shouldPresumeInvoked() (name not super-happy, suggestions encouraged) that the code in the next hunk uses, that defers to the in-parens test internally too?  Lets us get rid of the comment on this, and not have any weird presume-invoked/in-parens impedance mismatch to reading?

@@ +2314,5 @@
>      // Try a syntax parse for this inner function.
>      do {
> +        // If the function is immediately preceded by a left parenthesis,
> +        // assume that the function is an IIFE, likely to be immediately
> +        // executed. As a heuristic optimization, perform a full parse.

Considering the enum change above, how about:

If we're presuming this function is an expression that is immediately invoked, don't attempt to syntax-parse: proceed immediately to full parse to eliminate overhead a lazy parse.  We might be wrong, but immediately-invoked function expressions (IIFEs) are common enough it pays off for lots of code.

::: js/src/frontend/Parser.h
@@ +536,4 @@
>       * suffix) and a never-inlined version (with an 'n' suffix).
>       */
>      Node functionStmt();
> +    Node functionExpr(bool isInParens = false);

Random boolean arguments aren't nearly as nice as enums, as the various comments you have by booleans show.  Let's make it an enum, I guess in js::frontend sibling to Parser.

enum InvokePrediction { PresumeUninvoked = false, PresumeInvoked = true };

(false/true so they can be cleanly if-tested for a |InvokePrediction invoked| argument.)

There's room for better suggestions for names here if you see anything better, for sure.
Attachment #8539548 - Flags: review?(jwalden+bmo) → feedback+
Thanks for the review! This detects the "new function" pattern as an IIFE, and addresses the other comments.
Attachment #8539548 - Attachment is obsolete: true
Attachment #8548478 - Flags: review?(jwalden+bmo)
Comment on attachment 8548478 [details] [diff] [review]
Predict IIFEs from "(function" and "new function"

Review of attachment 8548478 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +7537,4 @@
>                  handler.setOp(lhs, JSOP_SPREADNEW);
>          }
>      } else {
> +        lhs = primaryExpr(tt, invoked);

Doesn't line 7510 in my tree -- this one:

        Node ctorExpr = memberExpr(tt, false);

also need PredictInvoked to trigger IIFE status for new function and all?  Pretty sure that's where.
Attachment #8548478 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae6a30f691ee

Added PredictInvoked to that line. In my testing it wasn't required for handling "new function", but it looks right anyway.
Backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/0dfa3b77405b

Apparently certain generators with a specific yield expression require syntax-only parsing for some reason that completely eludes me.
Generators apparently sometimes require the quirks of syntax-only parsing. Specifically, the expression "(function*(){(1, yield 1, 2)})" mis-parses if we run it in full-parse mode.

There is little benefit to marking generators as IIFEs. Rather than actually find out what's going wrong, this follow-up patch glosses over generator-specific parsing issues by just disallowing them from the IIFE detection.
Attachment #8549196 - Flags: review?(jwalden+bmo)
So comment 12 and comment 13 mildly confused me slightly.  The issue -- which we have had before, and will have again, until we syntax-parse everything -- is that we have a syntax error that's *only* reported in full parsing.  This error is pre-existing:

  var f = (function*(){(1, yield 1, 2)}); // no error
  f(); // error: full-parsing actually finds the problem

The changes here just made this more visible, by moving the error to when it should *actually* happen, breaking a test that depended upon the delay.  :-\

ES6, unlike SpiderMonkey, doesn't require yield expressions to be parenthesized, so making ^ a syntax error is actually flat-out wrong.  Given this was purely a syntax error, I don't see why we couldn't eliminate the must-be-parenthesized requirement to fix this problem.  The only reason was unintuitive precedence, but ES6 seems to have decided that just doesn't matter.  I'll file a bug on it.
Comment on attachment 8549196 [details] [diff] [review]
0001-IIFE-prediction-should-ignore-generators.patch

Review of attachment 8549196 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +2309,4 @@
>          // parse to avoid the overhead of a lazy syntax-only parse. Although
>          // the prediction may be incorrect, IIFEs are common enough that it
>          // pays off for lots of code.
> +        if (pn->isLikelyIIFE() && !funbox->isGenerator())

Hmm, I guess generators as IIFEs is kind of unlikely, so this makes sense regardless of the bugs you stumbled across.  Good enough.
Attachment #8549196 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/mozilla-central/rev/5834d0b43de6
https://hg.mozilla.org/mozilla-central/rev/369167a98ec3
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
No longer blocks: shumway-fb2
So reading through the comments, the syntaxes that may or may not be identified are: 

(function() { … }()); // aka Crockford
var x = function() { … }();
new function() { … }();

There is also: 

(function() { … })(); // aka dogballs
+function() { … }(); // basically any operater making this an expression and not a declaration

… and a few others, see e.g.:
- http://jsperf.com/immediate-anonymous-function-invocation
- http://jsperf.com/iife-antecedents-in-js


While supporting all variations is not feasible, at least a few more "simple" ones like {+,-,!,~,void} should be supported since they are used in the wild (I saw some of them when working with a jQuery + Bootstrap project recently). 

Can you confirm which cases the patch covers?
Flags: needinfo?(sstangl)
(In reply to Florian Bender from comment #18)
> So reading through the comments, the syntaxes that may or may not be
> identified are: 
> 
> (function() { … }()); // aka Crockford

Covered.  Opening parenthesis followed by |function| triggers full parsing.

> var x = function() { … }();

Not covered.  There's no way a priori to distinguish this from |var x = function() { ... };|

> new function() { … }();

Covered.  |function(| and |new function| are the *only* bits of syntax that trigger the full-parse heuristic.

> There is also: 
> 
> (function() { … })(); // aka dogballs

Covered, because it starts with |(function|.  (Not exactly complaining, or blaming you, per se -- but I did not know that name, and I didn't need to.  Stay classy, Interwebs.)

> +function() { … }(); // basically any operater making this an expression and
> not a declaration

Not covered.

> … and a few others, see e.g.:
> - http://jsperf.com/immediate-anonymous-function-invocation
> - http://jsperf.com/iife-antecedents-in-js

I skimmed the cases.  It appears most aren't handled.  I'm not sure how many actually should be.

> While supporting all variations is not feasible, at least a few more
> "simple" ones like {+,-,!,~,void} should be supported since they are used in
> the wild (I saw some of them when working with a jQuery + Bootstrap project
> recently).

Seems possible, at least for the unary operators cases.  Wouldn't require much more than modifying Parser<ParseHandler>::unaryExpr to pass in PredictInvoked, and Parser<ParseHandler>::unaryOpExpr to pass down an invoked parameter.  File a new bug on it?
Flags: needinfo?(sstangl)
Sure; the heuristic is simple: when the token immediately preceding the token "function" is "(", we fully parse the function, expecting it to be invoked immediately.

So of your examples, as of this patch, we detect the following ones as IIFEs:

> (function() { … }())
> (function() { … })()

In terms of engine operations, this is the best possible way to express an anonymous, immediately-invoked function. That said, are any of the other forms popular in the wild? (Could you link to some examples in popular libraries?) If programmers are frequently using a form that we could optimize but are not, we should fix that.
Sure, will file a new bug with examples in the next days. (-> ni? me)

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #19)
> Covered, because it starts with |(function|.  (Not exactly complaining, or
> blaming you, per se -- but I did not know that name, and I didn't need to. 
> Stay classy, Interwebs.)

Oh, that's a Crockford-ism, at least he coined that term once – then went on and proposed his solution.
Flags: needinfo?(fb+mozdev)
No longer blocks: shumway-jw2

M ni? is superseded by events.

Flags: needinfo?(fb+mozdev)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: