699227 - Make ParseNode::getKind() and Token::type different types

Assignee

Description

•

14 years ago

Attached patch Patch — Details — Splinter Review

Parse nodes and tokens are currently tagged with the same enum type. Sometimes this is convenient. However, this trick makes parsing and tokenizing much harder to understand. Parse nodes would be much easier to understand if the two were distinct types. I mostly avoided other cleanups while writing this patch. Sometimes I found I could understand the code I was non-mechanically changing better if I rearranged things somewhat, converting C-style at-start declarations to C++-style at-first-assignment declarations, so I made those changes to help me make the changes I actually wanted. Sometimes I renamed a variable in the process, but usually not. Other than that there's very little here that wasn't just converting the type. I did change PNK_RESERVED to PNK_CATCHLIST, because that's the only place/way it's used, and it might make the changes easier to understand. Once this is in, further changes will follow to better name the kinds. For example, PNK_FOR for for-loops has a kid that's PNK_FORHEAD or PNK_IN -- but PNK_IN is also used for |x in y|. Splitting these two uses into two different kinds will be much clearer. PNK_DEFAULT is used for |default xml namespace| and for default in switches; those should be split too (and default: made unary rather than binary). PNK_PLUS and PNK_MINUS are used both unary and infix, when having PNK_ADD and PNK_POS split up would be better. You get the idea -- doing this will let us make our parse node kinds clearer, without imposing overhead on token kinds too. This patch bitrots fairly easily, so quicker review would be great. Regardless of review timing, however, I won't be landing it until after the next merge on Tuesday. (It may pass all our existing tests, but I wouldn't be surprised if fuzzers managed to find a few parsing bugs caused by it.)

Attachment #571455 - Flags: review?(jorendorff)

Jeff Walden [:Waldo]

Assignee

Comment 1

•

14 years ago

Attached file Bundle of patches, including the one here, for use in building the patch — Details

The patch here is atop a bunch of other changes, so if you want to test it, you want to use this bundle to do so, not the patch itself. (I should note that this patch passed tests before I rebased it just before posting it here. I haven't verified that no tests broke during the rebase, although of course I would prior to landing it.)

Jeff Walden [:Waldo]

Assignee

Updated

•

14 years ago

Attachment #571457 - Attachment is patch: false

Attachment #571457 - Attachment mime type: text/plain → application/octet-stream

Jason Orendorff [:jorendorff]

Comment 2

•

14 years ago

(In reply to Jeff Walden (remove +bmo to email) from comment #0) > Parse nodes and tokens are currently tagged with the same enum type. > Sometimes this is convenient. However, this trick makes parsing and > tokenizing much harder to understand. Parse nodes would be much easier to > understand if the two were distinct types. I agree. > Once this is in, further changes will follow to better name the kinds. For > example, PNK_FOR for for-loops has a kid that's PNK_FORHEAD or PNK_IN -- but > PNK_IN is also used for |x in y|. Splitting these two uses into two > different kinds will be much clearer. I agree again. It will also be easier to grep the code for references to either kind of IN node. Reviewing now.

Jason Orendorff [:jorendorff]

Comment 3

•

14 years ago

I've gotten through everything except Parser.{h,cpp}, so here's the first half of a review: In BytecodeEmitter.cpp, TOK_ still appears a few times in comments where it should be PNK_. In BytecodeCompiler.cpp, frontend::CompileFunctionBody: > tokenStream.mungeCurrentToken(TOK_NAME); >- ParseNode *fn = FunctionNode::create(&funbce); >+ ParseNode *fn = FunctionNode::create(PNK_NAME, &funbce); This mungeCurrentToken call isn't necessary anymore (but watch out, the other one later in this function is still needed). The structure of this function creeps me out. It should return false early on error. Oh well. In ParseNode.h, comment on enum ParseNodeKind: >+ * Descriptions of the meanings of the various parse node kinds occur after the >+ * ParseNodeKind definition. It wasn't clear to me what this meant. You could say, "A long comment after this enum block describes the parse node kinds in detail." >+ /* Relational operators (< <= > >=). */ >+ PNK_LT, >+ PNK_RELOP_START = PNK_LT, >+ PNK_LE, >+ PNK_GT, >+ PNK_GE, >+ PNK_RELOP_LAST = PNK_GE, PNK_RELOP_START and PNK_RELOP_LAST are unused. PNK_SHIFTOP_START and PNK_SHIFTOP_LAST are unused. The UNARYOP and EQUALITY ranges are unused except for ParseNode::isUnaryOp and ParseNode::isEquality, which are themselves unused. In ParseNode.h, struct ParseNode: >+ bool isDeclaration() const { >+ ParseNodeKind kind = getKind(); >+#if JS_HAS_BLOCK_SCOPE >+ if (kind == PNK_LET) >+ return true; >+#endif >+ return kind == PNK_VAR; >+ } This method name makes me nervous. It is false for function declarations, for example. Also I feel like it's confusable with isDefn(), but that might just be me. The two places where this is used, you could just say "pn->isKind(PNK_VAR) || pn->isKind(PNK_LET)". In TokenStream.h: >-inline bool >-TreeTypeIsXML(TokenKind tt) >-{ >- return tt == TOK_XMLCOMMENT || tt == TOK_XMLCDATA || tt == TOK_XMLPI || >- tt == TOK_XMLELEM || tt == TOK_XMLLIST; >-} >- > inline bool > TokenKindIsDecl(TokenKind tt) > { > #if JS_HAS_BLOCK_SCOPE > return tt == TOK_VAR || tt == TOK_LET; > #else > return tt == TOK_VAR; > #endif > } > >+inline bool >+TreeTypeIsXML(TokenKind tt) >+{ >+ return tt == TOK_XMLCOMMENT || tt == TOK_XMLCDATA || tt == TOK_XMLPI || >+ tt == TOK_XMLELEM || tt == TOK_XMLLIST; >+} >+ The function being moved here, TreeTypeIsXML, is now unused. TokenKindIsUnaryOp and TokenStream::isCurrentTokenUnaryOp are now unused. Many TokenKinds are now obsolete, including TOK_ARRAYCOMP, TOK_ARRAYPUSH, TOK_FORHEAD, TOK_SEQ, TOK_ARGSBODY, TOK_UPVARS. TOK_LEXICALSCOPE still has one use, in Parser::forStatement, but I think that can be thrown away too, following Luke's changes in bug 696813. > TOK_ASSIGNMENT_LAST = TOK_MODASSIGN, > > > TOK_LIMIT /* domain size */ Looks like enum TokenKind has some extra whitespace near the end.

Jason Orendorff [:jorendorff]

Comment 4

•

14 years ago

> In BytecodeCompiler.cpp, frontend::CompileFunctionBody: > > tokenStream.mungeCurrentToken(TOK_NAME); > >- ParseNode *fn = FunctionNode::create(&funbce); > >+ ParseNode *fn = FunctionNode::create(PNK_NAME, &funbce); > > This mungeCurrentToken call isn't necessary anymore (but watch out, the > other one later in this function is still needed). Actually I don't understand why TOK_NAME ever made sense here. Please change it to TOK_FUNCTION, in a separate changeset for better bisecting.

Jason Orendorff [:jorendorff]

Comment 5

•

14 years ago

Comment on attachment 571455 [details] [diff] [review] Patch In Parser.cpp, js::DefineArg: > /* > * Make an argument definition node, distinguished by being in tc->decls > * but having TOK_NAME type and JSOP_NOP op. Insert it in a TOK_ARGSBODY > * list node returned via pn->pn_body. > */ >- ParseNode *argpn = NameNode::create(atom, tc); >+ ParseNode *argpn = NameNode::create(PNK_NAME, atom, tc); In the comment, s/TOK_NAME type/PNK_NAME kind/ and s/TOK_ARGSBODY/PNK_ARGSBODY/. In Parser::functionDef: > /* Make a TOK_FUNCTION node. */ > tokenStream.mungeCurrentToken(TOK_FUNCTION, JSOP_NOP); >- ParseNode *pn = FunctionNode::create(tc); >+ ParseNode *pn = FunctionNode::create(PNK_FUNCTION, tc); This mungeCurrentToken call isn't necessary anymore. In Parser::forStatement, the monster if-condition: >- (pn1->pn_head->isKind(TOK_ASSIGN) && >- (!pn1->pn_head->pn_left->isKind(TOK_RB) || >+ (pn1->pn_head->isAssignment() && >+ (!pn1->pn_head->pn_left->isKind(PNK_RB) || I think PNK_ASSIGN is correct here. isAssignment() is misleading; augmented assignment isn't possible here, right? In Parser::variables: > { > /* > * The three options here are: > * - TOK_LET: We are parsing a let declaration. > * - TOK_LP: We are parsing the head of a let block. > * - Otherwise, we're parsing var declarations. > */ > TokenKind tt = tokenStream.currentToken().type; > bool let = (tt == TOK_LET || tt == TOK_LP); > JS_ASSERT(let || tt == TOK_VAR); [...] > while (scopeStmt && !(scopeStmt->flags & SIF_SCOPE)) { > JS_ASSERT(!STMT_MAYBE_SCOPE(scopeStmt)); > scopeStmt = scopeStmt->downScope; > } > JS_ASSERT(scopeStmt); > } > > BindData data; > data.op = let ? JSOP_NOP : tokenStream.currentToken().t_op; >- pn = ListNode::create(tc); >+ ParseNode *pn = ListNode::create(tt == TOK_LET ? PNK_LET : tt == TOK_VAR ? PNK_VAR : PNK_LP, tc); I think it would be clearer if you moved the assertion from the top to be next to this line, changed the comment to say "TOK_VAR:" instead of "Otherwise," or both. In Parser::assignExpr: >+ ParseNode *lhs; >+ ParseNodeKind kind; >+ { >+ ParseNode *maybeLHS = condExpr1(); >+ if (!maybeLHS) >+ return NULL; >+ >+ switch (tokenStream.currentToken().type) { >+ case TOK_ASSIGN: kind = PNK_ASSIGN; break; >+ case TOK_ADDASSIGN: kind = PNK_ADDASSIGN; break; >+ case TOK_SUBASSIGN: kind = PNK_SUBASSIGN; break; >+ case TOK_BITORASSIGN: kind = PNK_BITORASSIGN; break; >+ case TOK_BITXORASSIGN: kind = PNK_BITXORASSIGN; break; >+ case TOK_BITANDASSIGN: kind = PNK_BITANDASSIGN; break; >+ case TOK_LSHASSIGN: kind = PNK_LSHASSIGN; break; >+ case TOK_RSHASSIGN: kind = PNK_RSHASSIGN; break; >+ case TOK_URSHASSIGN: kind = PNK_URSHASSIGN; break; >+ case TOK_MULASSIGN: kind = PNK_MULASSIGN; break; >+ case TOK_DIVASSIGN: kind = PNK_DIVASSIGN; break; >+ case TOK_MODASSIGN: kind = PNK_MODASSIGN; break; >+ default: >+ JS_ASSERT(!tokenStream.isCurrentTokenAssignment()); >+ tokenStream.ungetToken(); >+ return maybeLHS; >+ } >+ >+ lhs = maybeLHS; > } I don't think the extra maybeLHS variable or the extra C++ block help here. (Renaming pn to lhs helps though.) Flatten it? >+ParseNode * >+Parser::unaryOpExpr(ParseNodeKind kind) >+{ >+ ParseNode *unary = UnaryNode::create(kind, tc); >+ if (!unary) >+ return NULL; >+ unary->setOp(tokenStream.currentToken().t_op); >+ >+ ParseNode *kid = unaryExpr(); >+ if (!kid) >+ return NULL; >+ unary->pn_pos.end = kid->pn_pos.end; >+ unary->pn_kid = kid; >+ return unary; >+} You can write it like this: ParseNode * Parser::unaryOpExpr(ParseNodeKind kind, JSOp op) { TokenPtr begin = tokenStream.currentToken().t_pos.begin; ParseNode *kid = unaryExpr(); if (!kid) return NULL; return new_<UnaryNode>(kind, op, TokenPos::make(begin, kid->pn_pos.end), kid); } Of course that requires passing in the op, but that's easy enough. Just a suggestion. In Parser.cpp, comment on Parser::xmlNameExpr(): > * If PN_LIST or PN_NULLARY, pn_type will be TOK_XMLNAME for the case where > * XMLTagContent: XMLNameExpr. If pn_type is not TOK_XMLNAME but pn_arity is > * PN_LIST, pn_type will be tagtype. If PN_UNARY, pn_type will be TOK_LC and > * we parsed exactly one expression. This comment needs to be updated (pn_type, TOK_, tagtype). The comment on Parser::xmlTagContent has similar issues. In Parser::xmlNameExpr(): > JS_ASSERT(tt == TOK_XMLNAME); >+ pn2 = atomNode(PNK_XMLNAME, tokenStream.currentToken().t_op); The op is always JSOP_STRING (TOK_XMLNAME tokens are only created in one place, in TokenStream.cpp, and there the op is hard-coded). It's OK to do this in a separate changeset from the rest, for easy bisecting, but definitely change this to hardcode JSOP_STRING here too, rather than rely on t_op. I think we want to remove t_op. In Parser::xmlTagContent(): > tt = tokenStream.getToken(); > if (tt == TOK_XMLATTR) { >- pn2 = xmlAtomNode(); >+ pn2 = atomNode(PNK_XMLATTR, tokenStream.currentToken().t_op); Again, the op is always JSOP_STRING. Same deal. In Parser::xmlElementContent(ParseNode *pn): > /* Non-zero-length XML text scanned. */ >- ParseNode *pn2 = xmlAtomNode(); >+ ParseNode *pn2 = atomNode(tt == TOK_XMLSPACE ? PNK_XMLSPACE : PNK_XMLTEXT, >+ tokenStream.currentToken().t_op); This case is less obvious. I think t_op is always JSOP_STRING here. It could perhaps be uninitialized, in which case surely it doesn't matter what we use. So I vote for hard-coding JSOP_STRING here too. Nicely done. r=me with the comments addressed.

Attachment #571455 - Flags: review?(jorendorff) → review+

Jeff Walden [:Waldo]

Assignee

Comment 6

•

14 years ago

Good comments, basically responded to them all, although I ended up folding in a couple of the requests after determining that they made sense enough that keeping them split out would just be a hassle. https://hg.mozilla.org/integration/mozilla-inbound/rev/cce5d3a79723 Also, you anticipate me in stating that we want to remove t_op! I hope to do this very soon, so that Token will be typed by only one field and not kind-of two.

Target Milestone: --- → mozilla11

Nicholas Nethercote [inactive]

Comment 7

•

14 years ago

> Also, you anticipate me in stating that we want to remove t_op! I hope to > do this very soon, so that Token will be typed by only one field and not > kind-of two. Huzzah!

Marco Bonardo [:mak]

Comment 8

•

14 years ago

https://hg.mozilla.org/mozilla-central/rev/cce5d3a79723

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Patch 14 years ago Jeff Walden [:Waldo] 269.79 KB, patch	jorendorff : review+	Details \| Diff \| Splinter Review
Bundle of patches, including the one here, for use in building the patch 14 years ago Jeff Walden [:Waldo] 32.49 KB, application/octet-stream		Details

Bugzilla

Make ParseNode::getKind() and Token::type different types

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Attachment

General

Description

File Name

Content Type