Last Comment Bug 699227 - Make ParseNode::getKind() and Token::type different types
: Make ParseNode::getKind() and Token::type different types
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla11
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-02 14:17 PDT by Jeff Walden [:Waldo] (remove +bmo to email)
Modified: 2011-11-09 05:24 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch (269.79 KB, patch)
2011-11-02 14:17 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
jorendorff: review+
Details | Diff | Splinter Review
Bundle of patches, including the one here, for use in building the patch (32.49 KB, application/octet-stream)
2011-11-02 14:21 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
no flags Details

Description Jeff Walden [:Waldo] (remove +bmo to email) 2011-11-02 14:17:09 PDT
Created attachment 571455 [details] [diff] [review]
Patch

Parse nodes and tokens are currently tagged with the same enum type.  Sometimes this is convenient.  However, this trick makes parsing and tokenizing much harder to understand.  Parse nodes would be much easier to understand if the two were distinct types.

I mostly avoided other cleanups while writing this patch.  Sometimes I found I could understand the code I was non-mechanically changing better if I rearranged things somewhat, converting C-style at-start declarations to C++-style at-first-assignment declarations, so I made those changes to help me make the changes I actually wanted.  Sometimes I renamed a variable in the process, but usually not.  Other than that there's very little here that wasn't just converting the type.  I did change PNK_RESERVED to PNK_CATCHLIST, because that's the only place/way it's used, and it might make the changes easier to understand.

Once this is in, further changes will follow to better name the kinds.  For example, PNK_FOR for for-loops has a kid that's PNK_FORHEAD or PNK_IN -- but PNK_IN is also used for |x in y|.  Splitting these two uses into two different kinds will be much clearer.  PNK_DEFAULT is used for |default xml namespace| and for default in switches; those should be split too (and default: made unary rather than binary).  PNK_PLUS and PNK_MINUS are used both unary and infix, when having PNK_ADD and PNK_POS split up would be better.  You get the idea -- doing this will let us make our parse node kinds clearer, without imposing overhead on token kinds too.

This patch bitrots fairly easily, so quicker review would be great.  Regardless of review timing, however, I won't be landing it until after the next merge on Tuesday.  (It may pass all our existing tests, but I wouldn't be surprised if fuzzers managed to find a few parsing bugs caused by it.)
Comment 1 Jeff Walden [:Waldo] (remove +bmo to email) 2011-11-02 14:21:18 PDT
Created attachment 571457 [details]
Bundle of patches, including the one here, for use in building the patch

The patch here is atop a bunch of other changes, so if you want to test it, you want to use this bundle to do so, not the patch itself.  (I should note that this patch passed tests before I rebased it just before posting it here.  I haven't verified that no tests broke during the rebase, although of course I would prior to landing it.)
Comment 2 Jason Orendorff [:jorendorff] 2011-11-03 04:50:59 PDT
(In reply to Jeff Walden (remove +bmo to email) from comment #0)
> Parse nodes and tokens are currently tagged with the same enum type. 
> Sometimes this is convenient.  However, this trick makes parsing and
> tokenizing much harder to understand.  Parse nodes would be much easier to
> understand if the two were distinct types.

I agree.

> Once this is in, further changes will follow to better name the kinds.  For
> example, PNK_FOR for for-loops has a kid that's PNK_FORHEAD or PNK_IN -- but
> PNK_IN is also used for |x in y|.  Splitting these two uses into two
> different kinds will be much clearer.

I agree again. It will also be easier to grep the code for references to either kind of IN node.

Reviewing now.
Comment 3 Jason Orendorff [:jorendorff] 2011-11-03 05:24:51 PDT
I've gotten through everything except Parser.{h,cpp}, so here's the first half of a review:

In BytecodeEmitter.cpp, TOK_ still appears a few times in comments where
it should be PNK_.

In BytecodeCompiler.cpp, frontend::CompileFunctionBody:
>     tokenStream.mungeCurrentToken(TOK_NAME);
>-    ParseNode *fn = FunctionNode::create(&funbce);
>+    ParseNode *fn = FunctionNode::create(PNK_NAME, &funbce);

This mungeCurrentToken call isn't necessary anymore (but watch out, the
other one later in this function is still needed).

The structure of this function creeps me out. It should return false
early on error. Oh well.

In ParseNode.h, comment on enum ParseNodeKind:
>+ * Descriptions of the meanings of the various parse node kinds occur after the
>+ * ParseNodeKind definition.

It wasn't clear to me what this meant. You could say, "A long comment
after this enum block describes the parse node kinds in detail."

>+    /* Relational operators (< <= > >=). */
>+    PNK_LT,
>+    PNK_RELOP_START = PNK_LT,
>+    PNK_LE,
>+    PNK_GT,
>+    PNK_GE,
>+    PNK_RELOP_LAST = PNK_GE,

PNK_RELOP_START and PNK_RELOP_LAST are unused.

PNK_SHIFTOP_START and PNK_SHIFTOP_LAST are unused.

The UNARYOP and EQUALITY ranges are unused except for
ParseNode::isUnaryOp and ParseNode::isEquality, which are themselves
unused.

In ParseNode.h, struct ParseNode:
>+    bool isDeclaration() const {
>+        ParseNodeKind kind = getKind();
>+#if JS_HAS_BLOCK_SCOPE
>+        if (kind == PNK_LET)
>+            return true;
>+#endif
>+        return kind == PNK_VAR;
>+    }

This method name makes me nervous. It is false for function
declarations, for example. Also I feel like it's confusable with
isDefn(), but that might just be me.

The two places where this is used, you could just say
"pn->isKind(PNK_VAR) || pn->isKind(PNK_LET)".

In TokenStream.h:
>-inline bool
>-TreeTypeIsXML(TokenKind tt)
>-{
>-    return tt == TOK_XMLCOMMENT || tt == TOK_XMLCDATA || tt == TOK_XMLPI ||
>-           tt == TOK_XMLELEM || tt == TOK_XMLLIST;
>-}
>-
> inline bool
> TokenKindIsDecl(TokenKind tt)
> {
> #if JS_HAS_BLOCK_SCOPE
>     return tt == TOK_VAR || tt == TOK_LET;
> #else
>     return tt == TOK_VAR;
> #endif
> }
> 
>+inline bool
>+TreeTypeIsXML(TokenKind tt)
>+{
>+    return tt == TOK_XMLCOMMENT || tt == TOK_XMLCDATA || tt == TOK_XMLPI ||
>+           tt == TOK_XMLELEM || tt == TOK_XMLLIST;
>+}
>+

The function being moved here, TreeTypeIsXML, is now unused.

TokenKindIsUnaryOp and TokenStream::isCurrentTokenUnaryOp are now unused.

Many TokenKinds are now obsolete, including TOK_ARRAYCOMP, TOK_ARRAYPUSH,
TOK_FORHEAD, TOK_SEQ, TOK_ARGSBODY, TOK_UPVARS.

TOK_LEXICALSCOPE still has one use, in Parser::forStatement, but I think
that can be thrown away too, following Luke's changes in bug 696813.

>    TOK_ASSIGNMENT_LAST = TOK_MODASSIGN,
>    
>
>    TOK_LIMIT                      /* domain size */

Looks like enum TokenKind has some extra whitespace near the end.
Comment 4 Jason Orendorff [:jorendorff] 2011-11-03 08:01:24 PDT
> In BytecodeCompiler.cpp, frontend::CompileFunctionBody:
> >     tokenStream.mungeCurrentToken(TOK_NAME);
> >-    ParseNode *fn = FunctionNode::create(&funbce);
> >+    ParseNode *fn = FunctionNode::create(PNK_NAME, &funbce);
> 
> This mungeCurrentToken call isn't necessary anymore (but watch out, the
> other one later in this function is still needed).

Actually I don't understand why TOK_NAME ever made sense here. Please change it to TOK_FUNCTION, in a separate changeset for better bisecting.
Comment 5 Jason Orendorff [:jorendorff] 2011-11-03 09:02:23 PDT
Comment on attachment 571455 [details] [diff] [review]
Patch

In Parser.cpp, js::DefineArg:
>     /*
>      * Make an argument definition node, distinguished by being in tc->decls
>      * but having TOK_NAME type and JSOP_NOP op. Insert it in a TOK_ARGSBODY
>      * list node returned via pn->pn_body.
>      */
>-    ParseNode *argpn = NameNode::create(atom, tc);
>+    ParseNode *argpn = NameNode::create(PNK_NAME, atom, tc);

In the comment, s/TOK_NAME type/PNK_NAME kind/
and s/TOK_ARGSBODY/PNK_ARGSBODY/.

In Parser::functionDef:
>     /* Make a TOK_FUNCTION node. */
>     tokenStream.mungeCurrentToken(TOK_FUNCTION, JSOP_NOP);
>-    ParseNode *pn = FunctionNode::create(tc);
>+    ParseNode *pn = FunctionNode::create(PNK_FUNCTION, tc);

This mungeCurrentToken call isn't necessary anymore.

In Parser::forStatement, the monster if-condition:
>-                    (pn1->pn_head->isKind(TOK_ASSIGN) &&
>-                     (!pn1->pn_head->pn_left->isKind(TOK_RB) ||
>+                    (pn1->pn_head->isAssignment() &&
>+                     (!pn1->pn_head->pn_left->isKind(PNK_RB) ||

I think PNK_ASSIGN is correct here. isAssignment() is misleading;
augmented assignment isn't possible here, right?

In Parser::variables:
> {
>     /*
>      * The three options here are:
>      * - TOK_LET: We are parsing a let declaration.
>      * - TOK_LP: We are parsing the head of a let block.
>      * - Otherwise, we're parsing var declarations.
>      */
>     TokenKind tt = tokenStream.currentToken().type;
>     bool let = (tt == TOK_LET || tt == TOK_LP);
>     JS_ASSERT(let || tt == TOK_VAR);
[...]
>         while (scopeStmt && !(scopeStmt->flags & SIF_SCOPE)) {
>             JS_ASSERT(!STMT_MAYBE_SCOPE(scopeStmt));
>             scopeStmt = scopeStmt->downScope;
>         }
>         JS_ASSERT(scopeStmt);
>     }
> 
>     BindData data;
>     data.op = let ? JSOP_NOP : tokenStream.currentToken().t_op;
>-    pn = ListNode::create(tc);
>+    ParseNode *pn = ListNode::create(tt == TOK_LET ? PNK_LET : tt == TOK_VAR ? PNK_VAR : PNK_LP, tc);

I think it would be clearer if you moved the assertion from the top to
be next to this line, changed the comment to say "TOK_VAR:" instead of
"Otherwise," or both.

In Parser::assignExpr:
>+    ParseNode *lhs;
>+    ParseNodeKind kind;
>+    {
>+        ParseNode *maybeLHS = condExpr1();
>+        if (!maybeLHS)
>+            return NULL;
>+
>+        switch (tokenStream.currentToken().type) {
>+          case TOK_ASSIGN:       kind = PNK_ASSIGN;       break;
>+          case TOK_ADDASSIGN:    kind = PNK_ADDASSIGN;    break;
>+          case TOK_SUBASSIGN:    kind = PNK_SUBASSIGN;    break;
>+          case TOK_BITORASSIGN:  kind = PNK_BITORASSIGN;  break;
>+          case TOK_BITXORASSIGN: kind = PNK_BITXORASSIGN; break;
>+          case TOK_BITANDASSIGN: kind = PNK_BITANDASSIGN; break;
>+          case TOK_LSHASSIGN:    kind = PNK_LSHASSIGN;    break;
>+          case TOK_RSHASSIGN:    kind = PNK_RSHASSIGN;    break;
>+          case TOK_URSHASSIGN:   kind = PNK_URSHASSIGN;   break;
>+          case TOK_MULASSIGN:    kind = PNK_MULASSIGN;    break;
>+          case TOK_DIVASSIGN:    kind = PNK_DIVASSIGN;    break;
>+          case TOK_MODASSIGN:    kind = PNK_MODASSIGN;    break;
>+          default:
>+            JS_ASSERT(!tokenStream.isCurrentTokenAssignment());
>+            tokenStream.ungetToken();
>+            return maybeLHS;
>+        }
>+
>+        lhs = maybeLHS;
>     }

I don't think the extra maybeLHS variable or the extra C++ block help
here. (Renaming pn to lhs helps though.) Flatten it?

>+ParseNode *
>+Parser::unaryOpExpr(ParseNodeKind kind)
>+{
>+    ParseNode *unary = UnaryNode::create(kind, tc);
>+    if (!unary)
>+        return NULL;
>+    unary->setOp(tokenStream.currentToken().t_op);
>+
>+    ParseNode *kid = unaryExpr();
>+    if (!kid)
>+        return NULL;
>+    unary->pn_pos.end = kid->pn_pos.end;
>+    unary->pn_kid = kid;
>+    return unary;
>+}

You can write it like this:

ParseNode *
Parser::unaryOpExpr(ParseNodeKind kind, JSOp op)
{
    TokenPtr begin = tokenStream.currentToken().t_pos.begin;
    ParseNode *kid = unaryExpr();
    if (!kid)
        return NULL;
    return new_<UnaryNode>(kind, op, TokenPos::make(begin, kid->pn_pos.end), kid);
}

Of course that requires passing in the op, but that's easy enough.
Just a suggestion.

In Parser.cpp, comment on Parser::xmlNameExpr():
>  * If PN_LIST or PN_NULLARY, pn_type will be TOK_XMLNAME for the case where
>  * XMLTagContent: XMLNameExpr.  If pn_type is not TOK_XMLNAME but pn_arity is
>  * PN_LIST, pn_type will be tagtype.  If PN_UNARY, pn_type will be TOK_LC and
>  * we parsed exactly one expression.

This comment needs to be updated (pn_type, TOK_, tagtype).

The comment on Parser::xmlTagContent has similar issues.

In Parser::xmlNameExpr():
>            JS_ASSERT(tt == TOK_XMLNAME);
>+            pn2 = atomNode(PNK_XMLNAME, tokenStream.currentToken().t_op);

The op is always JSOP_STRING (TOK_XMLNAME tokens are only created in one
place, in TokenStream.cpp, and there the op is hard-coded).

It's OK to do this in a separate changeset from the rest, for easy
bisecting, but definitely change this to hardcode JSOP_STRING here too,
rather than rely on t_op. I think we want to remove t_op.

In Parser::xmlTagContent():
>         tt = tokenStream.getToken();
>         if (tt == TOK_XMLATTR) {
>-            pn2 = xmlAtomNode();
>+            pn2 = atomNode(PNK_XMLATTR, tokenStream.currentToken().t_op);

Again, the op is always JSOP_STRING. Same deal.

In Parser::xmlElementContent(ParseNode *pn):
>             /* Non-zero-length XML text scanned. */
>-            ParseNode *pn2 = xmlAtomNode();
>+            ParseNode *pn2 = atomNode(tt == TOK_XMLSPACE ? PNK_XMLSPACE : PNK_XMLTEXT,
>+                                      tokenStream.currentToken().t_op);

This case is less obvious. I think t_op is always JSOP_STRING here. It
could perhaps be uninitialized, in which case surely it doesn't matter
what we use. So I vote for hard-coding JSOP_STRING here too.

Nicely done. r=me with the comments addressed.
Comment 6 Jeff Walden [:Waldo] (remove +bmo to email) 2011-11-08 10:42:09 PST
Good comments, basically responded to them all, although I ended up folding in a couple of the requests after determining that they made sense enough that keeping them split out would just be a hassle.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cce5d3a79723

Also, you anticipate me in stating that we want to remove t_op!  I hope to do this very soon, so that Token will be typed by only one field and not kind-of two.
Comment 7 Nicholas Nethercote [:njn] 2011-11-09 04:59:09 PST
> Also, you anticipate me in stating that we want to remove t_op!  I hope to
> do this very soon, so that Token will be typed by only one field and not
> kind-of two.

Huzzah!
Comment 8 Marco Bonardo [::mak] (Away 6-20 Aug) 2011-11-09 05:24:22 PST
https://hg.mozilla.org/mozilla-central/rev/cce5d3a79723

Note You need to log in before you can comment on or make changes to this bug.