Closed Bug 584436 Opened 14 years ago Closed 14 years ago

confusing/irrelevant SSA details in jsparse.FunctionDefinition

Categories

(Other Applications Graveyard :: Narcissus, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dherman, Assigned: shu)

Details

Attachments

(1 file, 1 obsolete file)

There's some code from Shu's SSA-based parsing code that leaked into jsparse.js in the patch for bug 579230. My comments inline:

>+/*
>+ * FunctionDefinition :: (tokenizer, compiler context, boolean,
>+ *                        DECLARED_FORM or EXPRESSED_FORM or STATEMENT_FORM)
>+ *                    -> node
>+ */
> function FunctionDefinition(t, x, requireName, functionForm) {
>-    var f = new Node(t);
>-    if (f.type != FUNCTION)
>-        f.type = (f.value == "get") ? GETTER : SETTER;
>+    var b = x.builder;
>+    var f = b.FUNCTION$build(t);
>     if (t.match(IDENTIFIER))
>-        f.name = t.token.value;
>+        b.FUNCTION$setName(f, t.token.value);
>     else if (requireName)
>         throw t.newSyntaxError("Missing function identifier");
> 
>     t.mustMatch(LEFT_PAREN);
>-    f.params = [];
>-    var tt;
>-    while ((tt = t.get()) != RIGHT_PAREN) {
>-        if (tt != IDENTIFIER)
>-            throw t.newSyntaxError("Missing formal parameter");
>-        f.params.push(t.token.value);
>-        if (t.peek() != RIGHT_PAREN)
>-            t.mustMatch(COMMA);
>+    if (!t.match(RIGHT_PAREN)) {
>+        do {
>+            switch (t.get()) {
>+              case LEFT_BRACKET:
>+              case LEFT_CURLY:
>+                // Destructured formal parameters.
>+                t.unget();
>+                b.FUNCTION$addParam(f, DestructuringExpression(t, x));
>+                break;
>+              case IDENTIFIER:
>+                b.FUNCTION$addParam(f, t.token.value);
>+                break;
>+              default:
>+                throw t.newSyntaxError("Missing formal parameter");
>+                break;
>+            }
>+        } while (t.match(COMMA));
>+        t.mustMatch(RIGHT_PAREN);
>     }
> 
>-    if (t.match(LEFT_CURLY)) {
>-        var x2 = new CompilerContext(true);
>-        f.body = Script(t, x2);
>+    // Do we have an expression closure or a normal body?
>+    var tt = t.get();
>+    if (tt != LEFT_CURLY)
>+        t.unget();
>+
>+    var x2 = new CompilerContext(true, b);
>+    var rp = t.save();
>+    if (x.inFunction) {
>+        /*
>+         * Inner functions don't reset block numbering. They also need to
>+         * remember which block they were parsed in for hoisting (see comment
>+         * below).
>+         */

This comment is pretty confusing. Block numbering isn't described anywhere; it needs a comment somewhere in the codebase explaining its purpose.

>+        x2.blockId = x.blockId;
>+    }
>+
>+    if (tt != LEFT_CURLY) {
>+        b.FUNCTION$setBody(f, AssignExpression(t, x));
>+        if (x.isGenerator)
>+            throw t.newSyntaxError("Generator returns a value");
>+    } else {
>+        b.FUNCTION$hoistVars(x2.blockId);
>+        b.FUNCTION$setBody(f, Script(t, x2));
>+    }
>+
>+    /*
>+     * To linearize hoisting with nested blocks needing hoists, if a toplevel
>+     * function has any hoists we reparse the entire thing. Each toplevel
>+     * function is parsed at most twice.
>+     *
>+     * Pass 1: If there needs to be hoisting at any child block or inner
>+     * function, the entire function gets reparsed.
>+     *
>+     * Pass 2: It's possible that hoisting has changed the upvars of
>+     * functions. That is, consider:
>+     *
>+     * function f() {
>+     *   x = 0;
>+     *   g();
>+     *   x; // x's forward pointer should be invalidated!
>+     *   function g() {
>+     *     x = 'g';
>+     *   }
>+     *   var x;
>+     * }
>+     *
>+     * So, a function needs to remember in which block it is parsed under
>+     * (since the function body is _not_ hoisted, only the declaration) and
>+     * upon hoisting, needs to recalculate all its upvars up front.
>+     */

Okay, now this is just completely incomprehensible. Presumably it only has to do with the SSA algorithm? At the very least, the approach to hoisting needs some explanation somewhere.

>+    if (x2.needsHoisting) {
>+        // Order is important here! funDecls must come _after_ varDecls!
>+        b.setHoists(f.body.id, x2.varDecls.concat(x2.funDecls));
>+
>+        if (x.inFunction) {
>+            // Propagate up to the parent function if we're an inner function.
>+            x.needsHoisting = true;
>+        } else {
>+            // Only re-parse toplevel functions.
>+            var x3 = x2;

This is a dead variable.

>+            x2 = new CompilerContext(true, b);
>+            t.rewind(rp);
>+            // Set a flag in case the builder wants to have different behavior
>+            // on the second pass.
>+            b.secondPass = true;
>+            b.FUNCTION$hoistVars(f.body.id, true);
>+            b.FUNCTION$setBody(f, Script(t, x2));
>+            b.secondPass = false;
>+        }
>+    }
>+
>+    if (tt == LEFT_CURLY)
>         t.mustMatch(RIGHT_CURLY);
>-    } else { /* Expression closures (1.8) */
>-        f.body = Expression(t, x, COMMA);
>-    }
>+
>     f.end = t.token.end;
>-
>     f.functionForm = functionForm;
>     if (functionForm == DECLARED_FORM)
>         x.funDecls.push(f);
>+    b.FUNCTION$finish(f, x);
>     return f;
> }

Dave
Re: Block numbering and hoisting comments

The SSA builder (and I imagine other analyses during parse time implemented via a custom builder) need to know when hoisting is needed. It has proven hard to implement that detail purely in the builder. So I opted into having the logic in the parser instead and have it call back into the builder.

I see two options:

1. Properly document and explain what these two things are doing.
2. Remove them from the base parser, but then the SSA patch will also need to patch jsparse.

What do you think, Dave?

Re: Dead variable

Will fix.
Assignee: nobody → shu
Status: NEW → ASSIGNED
> It has proven hard to
> implement that detail purely in the builder. So I opted into having the logic
> in the parser instead and have it call back into the builder.

Sure, don't worry. It doesn't have to be uber-abstract. I guess it's just that the comments in jsparse are missing context. It's especially stark given that you haven't merged the SSABuilder into the tree yet, so there was nowhere I could go to try to make sense of the comment.

> 1. Properly document and explain what these two things are doing.

That would be fine. I'll be happy to review. Keep in mind I don't know how your SSA algorithm works (yet, anyway) so if you could try to make it make sense to a reader who doesn't know about the algorithm that'd be ideal.

Dave
Any reason not to have an ad-hoc callback or similar so the SSA analysis can contain the code currently inlined? Builders and other patterns are great but there is always a lower-level pattern, still abstracting harder than "if" ;-).

/be
Attached patch Essays (obsolete) — Splinter Review
Have fun reading the essays!
Attachment #462883 - Flags: review?(dherman)
I don't think I understand what is meant by "ad-hoc callback". How does the parser know when to call those callbacks if they're ad-hoc?
Right now you have some SSA-builder code directly inline in FunctionDefinition. That code is conditioned by the mighty "if" statement, testing x2.needsHoisting.

Instead of inline if-then logic, why not have a closure-based factoring? If this is hard, no worries. It just seems worth doing since you got almost to the finish line using the builder pattern.

/be
Ah I see. It'd be hard to factor out all the hoisting-related logic (that's just the main part, others are peppered throughout other parts of the parser).

Going to put it on the backburner for now, but will come back to it eventually to see how to make hoisting logic better.
Comment on attachment 462883 [details] [diff] [review]
Essays

Yikes -- I was just looking for comprehensibility, not essays. :) I think this might have gone too far in the other direction. You just want to help the reader understand, and you don't want to demand more of their time than necessary to get the point across.

>+        /*
>+         * Blocks are uniquely numbered by a blockId within a function that is
>+         * at the top level of the program. blockId starts from 0. Consider
>+         * the following program:

No examples needed here. Suggestion (take it or leave it, up to you):

"Blocks are uniquely numbered within the subtree rooted at their outermost ancestor function. In other words, each top-level function of a program starts its blockId counter at 0."

>+         * This is done to aid hoisting for parse-time analyses done in custom
>+         * builders.
>+         *
>+         * For more details in its interaction with hoisting, see comments in
>+         * FunctionDefinition.
>+         */

That looks fine.

>+         * Hoisting is hairy for binding analysis done at parse time. The
>+         * taxonomy of hoists is as follows:

"Hoisting makes parse-time binding analysis tricky. A taxonomy of hoists:"

>+         * 1. vars hoist to the top of their function. The following program
>+         *    prints 'global', because the 'var x' is hoisted to the top of
>+         *    the function.

"vars hoist to the top of their function:"

>+         *    var x = 'global';
>+         *    function f() {
>+         *      x = 'f';
>+         *      if (false) {
>+         *        var x;
>+         *      }
>+         *    }
>+         *    f();
>+         *    print(x);

"print(x); // => 'global'"

>+         * 2. lets hoist to the top of their block. The following program
>+         *    prints 'undefined' because the 'let x' is hoisted to the top of
>+         *    block 1, which then binds the x in block 2.

"lets hoist to the top of their block:"

>+         *    function f() { // id: 0
>+         *      x = 'f';
>+         *      { // id: 1
>+         *        { // id: 2
>+         *          print(x);

"print(x); // => 'undefined'

>+         *        }
>+         *        let x;
>+         *      }
>+         *      var x;
>+         *    }
>+         *    f();
>+         *
>+         * 3. inner functions hoist to the top of their parent function if and
>+         *    only if they are not within a block. The following program
>+         *    results in an error

This isn't quite right, since in SpiderMonkey block-nested inner functions are legal (and have screwy semantics). Also, it's not just about whether they're in a nested block, e.g.:

    function outer() {
        if (...)
            function inner() { ... }
        ...
    }

I think you want to say something like: "Inner functions at function top-level hoist to the beginning" and then have a #4 describing nested functions that are not at function top-level. For that, you can just say they aren't currently supported. File a bug on that and reference it in the comment.


>+         * If the builder used is doing parse-time analyses, hoisting can make
>+         * the analysis result unsound.

Personal pet peeve: "make the analysis result unsound" doesn't give the reader any insight as to what about it is unsound. Much better to say something like:

"If the builder is doing binding analysis, hoisting may invalidate earlier conclusions it makes about variable scope."

>+         * If this is the case, the builder can opt to set the needsHoisting

Axe "If this is the case," -- better to just get straight to the point.

>+         * flag in a CompilerContext (in the case of var and function
>+         * hoisting) or in a node of type BLOCK (in the case of let hoisting).
>+         * This signals for the parser to reparse sections of code.

Looks fine.

>+         * Eager reparsing can blow up asymptotically. Revisiting example 2,
>+         * note that both block 0 (the function, basically) and block 2 need
>+         * hoisting. Block 0 needs to hoist 'var x', and block 2 needs to
>+         * hoist 'let x'. If eagerly reparsed, we would first reparse block 2,
>+         * but upon seeing 'var x' at the end of block 0, it needs to reparse
>+         * the entirety of block 0, which _includes_ block 2. In the end,
>+         * block 2 gets parse 4 times!
>+         *
>+         * This blows up exponentially. For a block nested n deep where every
>+         * block 0 to n needs hoisting, the inner block gets reparsed 2^n
>+         * times!

Kill both these paragraphs. We're all adults here, we know what makes exponential blow-up. Just replace the opening clause of the next paragraph:

>+         * To linearize hoisting with nested blocks then, we make a compromise

with "To avoid exponential blow-up,"

>+         * such that if a function at the program toplevel any hoists in its
>+         * child blocks or inner functions, we reparse the entire toplevel
>+         * function. Each toplevel function is parsed at most twice.

Ungrammatical: the subject "a function" has no verb. I think you're trying to say something like: "if a forward-reference is detected anywhere within a function." If I've understood correctly, I think that would be clearer.

>+         * As noted before, it's possible that hoisting makes the previous
>+         * analysis result unsound. During reparsing the builder now knows
>+         * ahead of time all its vars, lets, and inner functions declarations
>+         * before parsing them.

Kill this whole paragraph. It's about the specific builder algorithm; not relevant to this code.

>+         * The list of declarations is tied to block ids to aid talking about
>+         * declarations of blocks that have not yet been fully parsed.
>+         *
>+         * Blocks are already uniquely numbered; see the comment in
>+         * Statements.

Looks fine.

Dave
Attached patch CommentsSplinter Review
> Yikes -- I was just looking for comprehensibility, not essays. :) I think this
> might have gone too far in the other direction. You just want to help the
> reader understand, and you don't want to demand more of their time than
> necessary to get the point across.

Binary searching my way to success!

> I think you want to say something like: "Inner functions at function top-level
> hoist to the beginning" and then have a #4 describing nested functions that are
> not at function top-level. For that, you can just say they aren't currently
> supported. File a bug on that and reference it in the comment.

I'm not sure what the unsupported here is for -- the case you're talking about is just mutation, I don't think there's any hoisting going on to be concerned about.

Otherwise good edits; they've been incorporated.
Attachment #462883 - Attachment is obsolete: true
Attachment #463249 - Flags: review?(dherman)
Attachment #462883 - Flags: review?(dherman)
Comment on attachment 463249 [details] [diff] [review]
Comments

>+         * 3. inner functions at function top-level hoist to the top of the
>+         *    function.

Nit: "...to the beginning of the function" to avoid confusion with preceding "top-level."

Looks great. r=me

Dave
Attachment #463249 - Flags: review?(dherman) → review+
http://hg.mozilla.org/tracemonkey/rev/76d8fffccca9
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: Other Applications → Other Applications Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: