Created attachment 569530 [details] [diff] [review]
Patch

TOK_UNARYOP is another token-kind category for unary operations: ~, !, typeof, and void. These can be split into separate kinds in the tokenizer, and when we need to know whether we have a TOK_UNARYOP-alike, we can just use the in-range trick.

Unlike for TOK_EQOP, there are further complications, because TokenKind is also used for ParseNode::getKind().

In the parser, TOK_UNARYOP is used to munge certain forms of XML name: * (anyname), @foo (at), and x::y (dblcolon). Basically, a unary node is inserted as the parent of the actual node, with op JSOP_XMLNAME and type TOK_UNARYOP. (JSOP_XMLNAME is sometimes later rewritten, depending on context, into JSOP_SETXMLNAME, JSOP_CALLXMLNAME, or JSOP_BINDXMLNAME.) In this patch I've changed the type of the inserted unary node to be the type of the thing parsed: TOK_ANYNAME, TOK_AT, or TOK_DBLCOLON. (JSOP_XMLNAME is still used.) This means that when parse nodes of these types are interpreted, in addition to having their usual arities (binary, nullary, or whatever), they might also be this inserted unary node. So this patch must handle TOK_ANYNAME, TOK_AT, and TOK_DBLCOLON in the parser wherever TOK_UNARYOP would have been handled.

Also in the parser, TOK_UNARYOP represents unary + and -. In the token stream these are scanned as TOK_PLUS with JSOP_PLUS, and likewise for minus, but the parser rewrites them into TOK_UNARYOP parse nodes. This presents an issue: to use TOK_PLUS instead of TOK_UNARYOP, TOK_PLUS must represent both (+a) and (a + b). (Likewise for TOK_MINUS.) For now, when TOK_PLUS or TOK_MINUS is used to discriminate a parse node, I've usually addressed this temporary dual nature by adding |if (unary) goto unary_handling;|. That's not pretty, but it will go away when I expand the parse node kind set so that (a + b) and (+a) are no longer lumped together.

Last, there's a complication for Reflect.parse.
In the reification of MemberExpressions, there's a field named "computed", which for non-XML code distinguishes x[p] from x.p. The relevant code is near |bool computed| in ASTSerializer::expression. For XML, there's special-case code which... well, dherman and I aren't sure exactly what it's supposed to do. For x.* the parser uses TOK_ANYNAME, and for x[*] it uses TOK_UNARYOP, which causes Reflect.parse to see x.* as non-computed and x[*] as computed. Maybe that's wrong; the ASTSerializer special case seems to have been targeted at it. Always using TOK_ANYNAME, as this patch does, causes us to see both x[*] and x.* as non-computed. And we can't distinguish them with TOK_RB or TOK_DOT, because the parser rewrites x.* to use TOK_RB and JSOP_GETELEM.

Since these are all nitpicky E4X edge cases, in a feature not yet highly used, I've just changed reflect-parse.js to expect both x[*] and x.* to be non-computed. It should be possible to fix this by expanding the parse node kinds so these two are no longer conflated, just as for + and -. This too will go away after the parse node kind set is expanded.

This is a mixed bag: several steps forward and a few steps back. It's a partial improvement, not a complete one. Bigger readability gains will come when parse node kinds are made distinct from token kinds. That's next on my list, so we won't have to tolerate these oddities for long. Alternately, I can keep any patch here queued up until that followup work is complete, then land it all at once to avoid regressions. I'm happy doing whatever others want here.
Comment on attachment 569530 [details] [diff] [review]
Patch

Dave, only one question on this, as far as this review request goes: is it okay to temporarily change reflect-parse.js to expect different computed-ness for x.* and x[*] (and x.y::z and x.@foo and x.@[foo])? I will be coming back here shortly to fix them up in whatever way we decide we actually want (since, as far as I could tell from IRC discussion, we hadn't decided), so this is only temporary.

Alternatively, I can hold off pushing this patch until I have patches that restore better behavior, then push both at once so there's no regression. But given that we don't even know what semantics we want here, and the existing semantics may well have been buggy, spending much effort to avoid a regression doesn't seem worth the trouble to me.
Comment on attachment 569530 [details] [diff] [review]
Patch

Review of attachment 569530 [details] [diff] [review]:
-----------------------------------------------------------------

That's fine. I will need to take a little time to figure out what the behavior should be. If you figure something out in the meantime, let me know. :)

Dave
https://hg.mozilla.org/integration/mozilla-inbound/rev/944c81533751
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d6a47da6c5a
https://hg.mozilla.org/integration/mozilla-inbound/rev/22b33eb2969d
https://hg.mozilla.org/integration/mozilla-inbound/rev/e80f536a7069
https://hg.mozilla.org/integration/mozilla-inbound/rev/51039a8be72c
https://hg.mozilla.org/integration/mozilla-inbound/rev/5271cc9673eb
https://hg.mozilla.org/mozilla-central/rev/944c81533751

The other changesets are marked as related to bug 697795, so I'll mark them there.
Two regressions and counting. What was the win in code size or runtime from this patch?

/be
(In reply to Brendan Eich [:brendan] from comment #5)

There is another category of win: complexity. As I slog through bug 692274 (on hiatus now to review ObjShrink), I am seeing how valuable these logical reductions are to the frontend, and specifically appreciating the work over the last year by waldo, jorendorff, cdleary, etc. I hope it continues!
(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Brendan Eich [:brendan] from comment #5)
> There is another category of win: complexity. As I slog through bug 692274
> (on hiatus now to review ObjShrink) I am seeing how valuable these logical
> reductions are to the frontend and specifically appreciating the work over
> the last year by waldo, jorendorff, cdleary, etc. I hope it continues!

Need a particular argument. The complexity of more token kinds trades off against the complexity of sub-dispatch in certain cases on pn_op. It's not a clear win IMHO, but I'm happy if there is a more objective time and/or space win. Is there?

/be
(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument. The complexity of more token kinds trades off
> against the complexity of sub-dispatch in certain cases on pn_op.

Why can't the primary win be in terms of comprehensibility of the AST to developers? We need the frontend to be more easily understood (and the code to be more easily modified) if we're going to compete with other engines on the ES6 feature front. I'm fairly certain the indirect-branch predictability overheads will be negligible; if you have doubts, we could spend time trying to measure with parser microbenchmarks, but I'm not sure there's good reason to do so.
(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument.

Particularly: it would be a big improvement for parse nodes to have a single primary discriminator instead of a mix of token kind, op, arity, and context. Removing dependence on pn_op (to eventually remove pn_op) by adding token kinds is a step in this direction.
Indirect branches? No way those matter. A simple before/after size on the .o would be enough, or a parsemark before/after. /be
For what it's worth, I made a special effort to delay landing this change until just after an aurora merge, in case it happened to cause regressions requiring branch gymnastics to address. I'm happy to see that hedge paid off (tho of course I'd rather it hadn't :-) ).

Just-post-merge landings for tricky stuff are a good idea. Not to throw stones or impugn the judgment of anyone who might have done otherwise in the past, or whose reasoned judgment counsels against doing otherwise in the future, but I think we would get good mileage out of using this tactic more often.
@Waldo FYI the instructions for running parsemark are on the wiki: https://developer.mozilla.org/En/SpiderMonkey/Running_Parsemark
I found that parsemark.py's results were way too noisy to detect anything useful. I imported parsemark into the Sunspider setup and ran it that way, with more success.
(In reply to Nicholas Nethercote [:njn] from comment #13)

Could you add that procedure description to the wiki page? Thanks, Nick!