697297 - Remove TOK_UNARYOP, split it up into multiple token kinds

Assignee

Description

13 years ago

Attached patch: Patch

TOK_UNARYOP is another token-kind category, covering the unary operations ~, !, typeof, and void. Those can be split up in the tokenizer, and when we need to know if we have a TOK_UNARYOP-alike, we can just use the in-range trick (check whether the kind falls within a contiguous range of token kinds).
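The in-range trick mentioned above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: the enum is a simplified subset, and the names TOK_UNARYOP_START, TOK_UNARYOP_LAST, and TokenKindIsUnaryOp are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical, simplified token kinds. The real enum has many more entries
// and may name or order them differently.
enum TokenKind {
    TOK_PLUS,
    TOK_MINUS,
    // The unary operators are kept contiguous so one range check covers them.
    TOK_UNARYOP_START,
    TOK_TYPEOF = TOK_UNARYOP_START,
    TOK_VOID,
    TOK_NOT,
    TOK_BITNOT,
    TOK_UNARYOP_LAST = TOK_BITNOT,
    TOK_LIMIT
};

// Assumed helper name: true if tt is one of the split-up unary operator kinds.
inline bool TokenKindIsUnaryOp(TokenKind tt) {
    return TOK_UNARYOP_START <= tt && tt <= TOK_UNARYOP_LAST;
}
```

The check only stays correct as long as nothing unrelated is added between the START and LAST markers, so the ordering of the enum becomes part of the contract.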

Unlike for TOK_EQOP, there are further complications, because TokenKind is also used for ParseNode::getKind().

In the parser, TOK_UNARYOP is used to munge certain forms of XML name: * (anyname), @foo (at), and x::y (dblcolon). Basically a unary node is inserted as a parent of the actual node, with op JSOP_XMLNAME and type TOK_UNARYOP. (JSOP_XMLNAME is sometimes later rewritten, depending on context, into JSOP_SETXMLNAME, JSOP_CALLXMLNAME, or JSOP_BINDXMLNAME.) In this patch I've changed the type of the inserted unary to be the type of the thing parsed -- TOK_ANYNAME, TOK_AT, and TOK_DBLCOLON. (JSOP_XMLNAME is still used.) This means that wherever parse nodes of these types are interpreted, a node that would otherwise be binary, unary, or whatever else might instead be this inserted unary node. So this patch, in the parser, must handle TOK_ANYNAME, TOK_AT, and TOK_DBLCOLON whenever TOK_UNARYOP would have been handled.
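As a rough sketch of the wrapping described above: the inserted unary parent takes the kind of the node it wraps instead of TOK_UNARYOP, while still carrying JSOP_XMLNAME. The ParseNode layout and the helpers WrapXMLName and IsXMLNameWrapper are hypothetical and much simpler than the real parser structures.

```cpp
#include <cassert>

// Hypothetical subsets of the real enums.
enum TokenKind { TOK_NAME, TOK_ANYNAME, TOK_AT, TOK_DBLCOLON };
enum JSOp { JSOP_NOP, JSOP_XMLNAME };

// Hypothetical, stripped-down parse node.
struct ParseNode {
    TokenKind kind;
    JSOp op;
    ParseNode *kid;   // non-null only for the inserted unary wrapper
};

// Wrap `inner` in a unary parent that reuses inner's kind
// (TOK_ANYNAME, TOK_AT, or TOK_DBLCOLON) rather than TOK_UNARYOP.
inline ParseNode WrapXMLName(ParseNode *inner) {
    return ParseNode{inner->kind, JSOP_XMLNAME, inner};
}

// Because the wrapper and the wrapped node now share a kind, a consumer
// needs something besides the kind to tell them apart. This sketch uses
// op and kid, and ignores any later rewriting of JSOP_XMLNAME.
inline bool IsXMLNameWrapper(const ParseNode &pn) {
    return pn.op == JSOP_XMLNAME && pn.kid != nullptr;
}
```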

Also in the parser, TOK_UNARYOP represents TOK_PLUS and TOK_MINUS. In the token stream these are scanned as TOK_PLUS with JSOP_PLUS, and likewise for minus. But the parser rewrites them into TOK_UNARYOP parse nodes. This presents an issue: to use TOK_PLUS instead of TOK_UNARYOP, TOK_PLUS must represent both (+a) and (a + b). (Likewise for TOK_MINUS.) For now, wherever TOK_PLUS or TOK_MINUS is used to discriminate a parse node, I've usually handled their temporary dual nature by adding |if (unary) goto unary_handling;|. This is not pretty, but it will go away when I expand the parse node kind set to not lump (a + b) and (+a) together.
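The |if (unary) goto unary_handling;| workaround can be illustrated with a small sketch. Everything here (the Arity enum, the ParseNode fields, and the Classify function) is hypothetical and exists only to show the shape of the dispatch.

```cpp
#include <cassert>
#include <string>

// Hypothetical subsets of the real enums.
enum TokenKind { TOK_PLUS, TOK_MINUS, TOK_NOT };
enum Arity { ARITY_UNARY, ARITY_BINARY };

// Hypothetical, stripped-down parse node.
struct ParseNode {
    TokenKind kind;
    Arity arity;
};

// Returns a label naming the path a consumer would take for this node.
inline std::string Classify(const ParseNode &pn) {
    bool unary = pn.arity == ARITY_UNARY;
    switch (pn.kind) {
      case TOK_PLUS:
      case TOK_MINUS:
        // TOK_PLUS and TOK_MINUS cover both (+a) and (a + b), so the kind
        // alone is not enough to pick a path.
        if (unary)
            goto unary_handling;
        return "binary";
      case TOK_NOT:
      unary_handling:
        return "unary";
    }
    return "unknown";
}
```

Once unary and binary plus get distinct parse node kinds, each would get its own case label and the goto would no longer be needed.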

Last, there's a complication for Reflect.parse. In the reification of MemberExpressions, there's a field named "computed", which for non-XML code roughly distinguishes x[p] from x.p. The relevant code is by |bool computed| in ASTSerializer::expression. For XML, there's special-case code which...well, dherman and I aren't sure exactly what it's supposed to do. For x.* the parser uses TOK_ANYNAME, and for x[*] it uses TOK_UNARYOP. This causes Reflect.parse to see x.* as non-computed and x[*] as computed. Maybe that's wrong, because the ASTSerializer special-case seems to have been targeted at it. Always using TOK_ANYNAME, as in this patch, causes us to see both x[*] and x.* as non-computed. And we can't distinguish with TOK_RB or TOK_DOT because the parser rewrites x.* to use TOK_RB and JSOP_GETELEM. Since these are all nitpicky E4X edge cases, in a feature not yet highly used, I've just changed reflect-parse.js to expect x[*] and x.* to both be non-computed. It should be possible to fix this by expanding parse node kinds to not conflate these two, just as for + and -. This too will go away after parse node kind set expansion.

This is a mixed bag that's several steps forward and a few steps back: an improvement, but only a partial one. Bigger readability gains will come when parse node kinds are made distinct from token kinds. That's next on my list, so the length of time we tolerate some of these oddities will be short. Alternatively, I can keep any patch here queued up until that followup work is complete, then land it all at once to avoid regressions. I'm happy doing whatever others want here.

Attachment #569530 - Flags: review?(cdleary)

Jeff Walden [:Waldo]

Assignee

Comment 1

13 years ago

Comment on attachment 569530
Patch

Dave, one question only on this as far as this review request goes:

Is it okay to temporarily change reflect-parse.js to expect different computed-ness for x.* and x[*] (and x.y::z and x.@foo and x.@[foo])?  I will be coming back here shortly to fix them up, in whatever way we decide we actually want (since we hadn't decided, as far as I could tell, in IRC discussion), so this is only temporary.

Alternatively, I can hold off pushing this patch until I have patches for changes to revert to better behavior, then I can push both patches at once to not have a regression.  But given we don't even know what semantics we want here, and existing semantics might well have been buggy, spending much effort to avoid regression seems not worth the trouble to me.

Attachment #569530 - Flags: review?(dherman)

Dave Herman [:dherman]

Comment 2

13 years ago

Comment on attachment 569530
Patch

Review of attachment 569530:
-----------------------------------------------------------------

That's fine. I will need to take a little time to figure out what the behavior should be. If you figure something out in the meantime, let me know. :)

Dave

Attachment #569530 - Flags: review?(dherman) → review+

Chris Leary [:cdleary] (not checking bugmail)

Updated

13 years ago

Attachment #569530 - Flags: review?(cdleary) → review+

Jeff Walden [:Waldo]

Assignee

Comment 3

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/944c81533751
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d6a47da6c5a
https://hg.mozilla.org/integration/mozilla-inbound/rev/22b33eb2969d
https://hg.mozilla.org/integration/mozilla-inbound/rev/e80f536a7069
https://hg.mozilla.org/integration/mozilla-inbound/rev/51039a8be72c
https://hg.mozilla.org/integration/mozilla-inbound/rev/5271cc9673eb

Target Milestone: --- → mozilla11

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 4

13 years ago

https://hg.mozilla.org/mozilla-central/rev/944c81533751

The other changesets are marked as related to bug 697795, so I'll mark them there.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

13 years ago

Depends on: 701222

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

13 years ago

Depends on: 701224

Brendan Eich [:brendan]

Comment 5

13 years ago

Two regressions and counting. What was the win in code size or runtime from this patch?

/be

Luke Wagner [:luke]

Comment 6

13 years ago

(In reply to Brendan Eich [:brendan] from comment #5)
There is another category of win: complexity.  As I slog through bug 692274 (on hiatus now to review ObjShrink) I am seeing how valuable these logical reductions are to the frontend and specifically appreciating the work over the last year by waldo, jorendorff, cdleary, etc.  I hope it continues!

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

13 years ago

Depends on: 701247

Brendan Eich [:brendan]

Comment 7

13 years ago

(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Brendan Eich [:brendan] from comment #5)
> There is another category of win: complexity.  As I slog through bug 692274
> (on hiatus now to review ObjShrink) I am seeing how valuable these logical
> reductions are to the frontend and specifically appreciating the work over
> the last year by waldo, jorendorff, cdleary, etc.  I hope it continues!

Need a particular argument. The complexity of more token kinds trades off against the complexity of sub-dispatch in certain cases on pn_op. It's not a clear win IMHO, but I'm happy if there is a more objective time and/or space win. Is there?

/be

Chris Leary [:cdleary] (not checking bugmail)

Comment 8

13 years ago

(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument. The complexity of more token kinds trades off
> against the complexity of sub-dispatch in certain cases on pn_op.

Why can't the primary win be in terms of comprehensibility of the AST to developers? We need the frontend to be more easily understood (and the code to be more easily modified) if we're going to compete with other engines on the ES6 feature front.

I'm fairly certain the indirect branch predictability overheads will be negligible -- if you have doubts we could spend time trying to measure with parser microbenchmarks, but I'm not sure there's good reason to do so.

Luke Wagner [:luke]

Comment 9

13 years ago

(In reply to Brendan Eich [:brendan] from comment #7)
> Need a particular argument.

Particularly: it would be a big improvement for parse nodes to have a single primary discriminator instead of a mix of token kind, op, arity and context.  Removing dependence on pn_op (to eventually remove pn_op) by adding token kinds is a step in this direction.

Brendan Eich [:brendan]

Comment 10

13 years ago

Indirect branches? No way those matter. A simple before/after size on the .o would be enough, or a parsemark before/after.

/be

Jeff Walden [:Waldo]

Assignee

Comment 11

13 years ago

For what it's worth, I made special effort to delay landing this change until just after an aurora merge, in case it happened to cause regressions requiring branch gymnastics to address.  I'm happy to see that hedge paid off (tho of course I'd rather it hadn't :-) ).

Just-post-merge landings for tricky stuff are a good idea.  Not to throw stones or impugn the judgment of anyone who might have done otherwise in the past, or whose reasoned judgment counsels against doing otherwise in the future, but I think we would get good mileage out of using this tactic more often.

Chris Leary [:cdleary] (not checking bugmail)

Comment 12

13 years ago

@Waldo FYI the instructions for running parsemark are on the wiki:

https://developer.mozilla.org/En/SpiderMonkey/Running_Parsemark

Nicholas Nethercote [inactive]

Comment 13

13 years ago

I found that parsemark.py's results were way too noisy to detect anything useful.  I imported parsemark into the Sunspider setup and ran it that way, with more success.

Chris Leary [:cdleary] (not checking bugmail)

Comment 14

13 years ago

(In reply to Nicholas Nethercote [:njn] from comment #13)

Could you add that procedure description to the wiki page? Thanks Nick!