Last Comment Bug 697297 - Remove TOK_UNARYOP, split it up into multiple token kinds
: Remove TOK_UNARYOP, split it up into multiple token kinds
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla11
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 701222 701224 701247
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-25 15:52 PDT by Jeff Walden [:Waldo] (remove +bmo to email)
Modified: 2011-11-15 12:16 PST (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch (24.01 KB, patch)
2011-10-25 15:52 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
cdleary: review+
dherman: review+
Details | Diff | Splinter Review

Description Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-25 15:52:16 PDT
Created attachment 569530 [details] [diff] [review]
Patch

TOK_UNARYOP is another token-kind-category for unary operations: ~, !, typeof, void.  Those can be split up in the tokenizer, and when we need to know if we have a TOK_UNARYOP-alike, we can just use the in-range trick.

Unlike for TOK_EQOP, there are further complications, because TokenKind is also used for ParseNode::getKind().

In the parser, TOK_UNARYOP is used to munge certain forms of XML name: * (anyname), @foo (at), and x::y (dblcolon).  Basically a unary node is inserted as a parent of the actual node, with op JSOP_XMLNAME and type TOK_UNARYOP.  (JSOP_XMLNAME is sometimes later rewritten, depending on context, into JSOP_SETXMLNAME, JSOP_CALLXMLNAME, or JSOP_BINDXMLNAME.)  In this patch I've changed the type of the inserted unary to be the type of the thing parsed -- TOK_ANYNAME, TOK_AT, and TOK_DBLCOLON.  (JSOP_XMLNAME is still used.)  This means that for all these types, when parse nodes are interpreted, in addition to being binary nodes, or unary nodes, or whatever, they might also be a unary node.  So this patch, in the parser, must handle TOK_ANYNAME, TOK_AT, and TOK_DBLCOLON whenever TOK_UNARYOP would have been handled.

Also in the parser, TOK_UNARYOP represents TOK_PLUS and TOK_MINUS.  In the token stream these are scanned as TOK_PLUS with JSOP_PLUS, and likewise for minus.  But the parser rewrites them into TOK_UNARYOP parse nodes.  This presents an issue: to use TOK_PLUS instead of TOK_UNARYOP, TOK_PLUS must represent both (+a) and (a + b).  (Likewise for TOK_MINUS.)  For now, I've addressed the temporary dual nature of TOK_PLUS and TOK_MINUS when used to discriminate a parse node usually by adding |if (unary) goto unary_handling;|.  This is not pretty, but it will go away when I expand the parse node kind set to not lump (a + b) and (+a) together.

Last, there's a complication for Reflect.parse.  In the reification of MemberExpressions, there's a field named "computed", which for non-XML sort of distinguishes x[p] from x.p.  The relevant code is by |bool computed| in ASTSerializer::expression.  For XML, there's special-case code which...well, dherman and I aren't sure exactly what it's supposed to do.  For x.* the parser uses TOK_ANYNAME, and for x[*] it uses TOK_UNARYOP.  This causes Reflect.parse to see x.* as non-computed and x[*] as computed.  Maybe that's wrong, because the ASTSerializer special-case seems to have been targeted at it.  Always using TOK_ANYNAME, as in this patch, causes us to see x[*] and x.* as non-computed.  And we can't distinguish with TOK_RB or TOK_DOT because the parser rewrites x.* to use TOK_RB and JSOP_GETELEM.  Since this all is nitpicky E4X edge cases, in a feature not yet highly used, I've just changed reflect-parse.js to expect x[*] and x.* to both be non-computed.  It should be possible to fix this somehow by expanding parse node kinds to not conflate these two, just as for + and -.  This too will go away after parse node kind set expansion.

This is a mixed bag that's several steps forward and a few steps back.  It's a partial improvement, but not a complete improvement.  Bigger readability gains will come when parse node kinds are made distinct from token kinds.  That's next on my list, so the length of time we tolerate some of these oddities will be short.  Alternately, I can keep any patch here queued up until that followup work is complete, then land it all at once to avoid regressions.  I'm happy doing whatever others want here.
Comment 1 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-25 15:57:22 PDT
Comment on attachment 569530 [details] [diff] [review]
Patch

Dave, one question only on this as far as this review request goes:

Is it okay to temporarily change reflect-parse.js to expect different computed-ness for x.* and x[*] (and x.y::z and x.@foo and x.@[foo])?  I will be coming back here shortly to fix them up, in whatever way we decide we actually want (since we hadn't decided, as far as I could tell, in IRC discussion), so this is only temporary.

Alternatively, I can hold off pushing this patch until I have patches for changes to revert to better behavior, then I can push both patches at once to not have a regression.  But given we don't even know what semantics we want here, and existing semantics might well have been buggy, spending much effort to avoid regression seems not worth the trouble to me.
Comment 2 Dave Herman [:dherman] 2011-10-25 16:02:21 PDT
Comment on attachment 569530 [details] [diff] [review]
Patch

Review of attachment 569530 [details] [diff] [review]:
-----------------------------------------------------------------

That's fine. I will need to take a little time to figure out what the behavior should be. If you figure something out in the meantime, let me know. :)

Dave
Comment 4 Marco Bonardo [::mak] 2011-11-09 05:23:24 PST
https://hg.mozilla.org/mozilla-central/rev/944c81533751

the other changesets are marked as related to bug 697795, so I'll mark them there.
Comment 5 Brendan Eich [:brendan] 2011-11-09 16:16:02 PST
Two regressions and counting. What was the win in code size or runtime win from this patch?

/be
Comment 6 Luke Wagner [:luke] 2011-11-09 16:59:22 PST
(In reply to Brendan Eich [:brendan] from comment #5)
There is another category of win: complexity.  As I slog through bug 692274 (on hiatus now to review ObjShrink) I am seeing how valuable these logical reductions are to the frontend and specifically appreciating the work over the last year by waldo, jorendorff, cdleary, etc.  I hope it continues!
Comment 7 Brendan Eich [:brendan] 2011-11-10 13:38:38 PST
(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Brendan Eich [:brendan] from comment #5)
> There is another category of win: complexity.  As I slog through bug 692274
> (on hiatus now to review ObjShrink) I am seeing how valuable these logical
> reductions are to the frontend and specifically appreciating the work over
> the last year by waldo, jorendorff, cdleary, etc.  I hope it continues!

Need a particular argument. The complexity of more token kinds trades off against the complexity of sub-dispatch in certain cases on pn_op. It's not a clear win IMHO, but I'm happy if there is a more objective time and/or space win. Is there?

/be
Comment 8 Chris Leary [:cdleary] (not checking bugmail) 2011-11-10 19:43:57 PST
(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument. The complexity of more token kinds trades off
> against the complexity of sub-dispatch in certain cases on pn_op.

Why can't the primary win be in terms of comprehensibility of the AST to developers? We need the frontend to be more easily understood (and the code to be more easily modified) if we're going to compete with other engines on the ES6 feature front.

I'm fairly certain the indirect branch predictability overheads will be negligible -- if you have doubts we could spend time trying to measure with parser microbenchmarks, but I'm not sure there's good reason to do so.
Comment 9 Luke Wagner [:luke] 2011-11-10 21:42:31 PST
(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument.

Particularly: it would be a big improvement for parse nodes to have a single primary discriminator instead of mix of token kind, op, arity and context.  Removing dependence on pn_op (to eventually remove pn_op) by adding token kinds is a step in this direction.
Comment 10 Brendan Eich [:brendan] 2011-11-14 13:13:34 PST
Indirect branches? No way those matter. A simple before/after size on the .o would be enough, or a parsemark before/after.

/be
Comment 11 Jeff Walden [:Waldo] (remove +bmo to email) 2011-11-14 14:50:18 PST
For what it's worth, I made special effort to delay landing this change until just after an aurora merge, in case it happened to cause regressions requiring branch gymnastics to address.  I'm happy to see that hedge paid off (tho of course I'd rather it hadn't :-) ).

Just-post-merge landings for tricky stuff are a good idea.  Not to throw stones or impugn the judgment of anyone who might have done otherwise in the past, or whose reasoned judgment counsels against doing otherwise in the future, but I think we would get good mileage out of using this tactic more often.
Comment 12 Chris Leary [:cdleary] (not checking bugmail) 2011-11-14 22:51:21 PST
@Waldo FYI the instructions for running parsemark are on the wiki:

https://developer.mozilla.org/En/SpiderMonkey/Running_Parsemark
Comment 13 Nicholas Nethercote [:njn] 2011-11-15 05:08:43 PST
I found that parsemark.py's results were way too noisy to detect anything useful.  I imported parsemark into the Sunspider setup and ran it that way, with more success.
Comment 14 Chris Leary [:cdleary] (not checking bugmail) 2011-11-15 12:16:52 PST
(In reply to Nicholas Nethercote [:njn] from comment #13)

Could you add that procedure description to the wiki page? Thanks Nick!

Note You need to log in before you can comment on or make changes to this bug.