Closed Bug 1288460 Opened 4 years ago Closed 4 years ago

Allow escape sequences in keyword-like but non-reserved words

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox50 --- affected
firefox51 --- fixed

People

(Reporter: anba, Assigned: Waldo)

References

Details

Attachments

(4 files, 4 obsolete files)

10.97 KB, patch
arai
: review+
Details | Diff | Splinter Review
37.52 KB, patch
arai
: review+
Details | Diff | Splinter Review
19.74 KB, patch
arai
: review+
Details | Diff | Splinter Review
5.52 KB, patch
arai
: review+
Details | Diff | Splinter Review
Test case:
---
with({}) l\u0065t: ;
with({}) st\u0061tic: ;
---

Expected: No SyntaxError
Actual: Throws SyntaxError "keywords must be written literally, without embedded escapes"

Note: Neither "let" nor "static" are reserved words, so it's allowed to use escape sequences in this context.

11.6.2.1 Keywords [https://tc39.github.io/ecma262/#sec-keywords]
12.1.1 Static Semantics: Early Errors [https://tc39.github.io/ecma262/#sec-identifiers-static-semantics-early-errors]
I worked a bit on patchwork for this.  I eventually reached a point where it became clear to me that attempting to treat 'let' as a keyword, with its own TokenKind TOK_LET, was a losing proposition, so I removed it.  That led me smack into bug 1288459 when sussing out the particular quirks in doing so.  So, similar to that bug, I think this depends on the frontend rewrite, as far as 'let' is concerned.

As to 'static', I see that the current spec says 'static' is a contextual keyword that's treated as reserved in other contexts only in strict mode code.  I suspect I'll want to make that also just a name as a fix here, but I'm not certain.

In any event, the former change spiralled into bug 1288459, and the combined work for both spans enough this is gonna have to wait for frontend rewriting to land.
Assignee: nobody → jwalden+bmo
Depends on: 1263355
Attached patch Most of a patch (obsolete) — Splinter Review
+68,-70 lines overall to remove the custom TokenKind.  Not bad.

This might need some extra work related to Identifier parsing, most particularly in import/export parsing.  Not sure.  Also it still needs tests, and exercise of the places maybe needing work still.
...most of a patch for TOK_LET *only*, that is.  Removing TOK_STATIC will be a separate patch, and likely a much simpler one given that |static| is a contextual keyword in many fewer places.
So this is (I argue) a very good patch that makes a bunch of parsing of identifiers much nicer.  Unfortunately, a try-push

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fb6a13559db

reveals that (at a minimum) it falls afoul of our failing to parse the argument-list-string given to the Function constructor as strict mode code, when the body-string provided to the same is strict mode code.  That breaks ecma_5/misc/future-reserved-words.js that I modified a bit in the previous patch, on this basic testcase (and likely a few variants, but I didn't investigate):

  Function("let", "'use strict';"); // should throw SyntaxError, probably did with TOK_LET, wouldn't now

evilpie said he was rebasing a patch to use proper Parser argument-parsing code, so hopefully this unfortunate state of affairs is a temporary one.  Feel free to look at the patch, but there's not much point in reviewing if it's not landable yet.  :-\
This is approximately landable in its own right, without the identifier-parsing rewrite bits.  (Although it would have to beware bug 1298779.)  But I don't know that I'm inclined to push on it, without that other patch landing first.  I don't have much confidence that we're processing |tokenStream.currentName()| correctly in all the places it's used, without that patch landing first.  And I'd sort of rather not spend a bunch of time enumerating all the ways the current code might be busted, when it's easier to just make the presentation more obviously correct and not have to suss out every edge-case misbehavior.
Attachment #8785687 - Attachment is obsolete: true
And the similar patch for |static|.

I should note that the previous patch is only a few lines away from being a net code *reduction*, and it probably would be if I weren't adding nice comments in places.  Having TOK_LET and TOK_STATIC is almost actively harmful to sanity/readability here, IMO.

This too is almost landable, like the |let| patch, modulo bug 1298779.
Setting a dependency on Function argument parsing, for now.  If that gets too long in the tooth I can always do the let/static bits of this patch and punt identifier-handling generally to a new bug, but again, prefer not to.
Depends on: 755821
Attachment #8786408 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8082a95eccd3
Modify ecma_5/misc/future-reserved-words.js to test using assertEq instead of reportCompare, for easier debugging.  r=arai
I think I misread the tea leaves suggesting this had to have proper parsing of arguments in the Function constructor a little, so I think this is fixable without further ado.  Patches shortly.
Keywords: leave-open
Attachment #8786411 - Attachment is obsolete: true
The CLOBBER change is unfortunate, but there's not much else to do.  The real sadness will be various people's shell builds I'll probably break til they manually clobber, because apparently |make -s -C dbg| doesn't know about CLOBBER.
Attachment #8788656 - Flags: review?(arai.unmht)
Attachment #8786412 - Attachment is obsolete: true
Attachment #8786415 - Attachment is obsolete: true
Comment on attachment 8788652 [details] [diff] [review]
Parse names in a ton of places using Parser::* functions directly named after spec productions rather than open-coding them

Review of attachment 8788652 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed.

::: js/src/asmjs/AsmJS.cpp
@@ +6941,5 @@
>  
>      TokenKind tk;
>      if (!tokenStream.getToken(&tk, TokenStream::Operand))
>          return false;
> +    if (tk == TOK_NAME || tk == TOK_YIELD) {

we could now flip this condition and move `name` declaration after `if` block.

::: js/src/frontend/Parser.cpp
@@ +8226,5 @@
> +            versionNumber() >= JSVERSION_1_7)
> +        {
> +            report(ParseError, false, null(), JSMSG_RESERVED_ID, "yield");
> +            return nullptr;
> +        }

It's better merging labelOrIdentifierReference and bindingIdentifier with extra parameter for allowing label,
to reduce the duplication (other parts also adds more duplication)

@@ +8246,5 @@
> +        MOZ_ASSERT(ident != context->names().yield,
> +                   "tokenizer should have treated 'yield' as TOK_YIELD");
> +
> +        if (pc->sc()->strict()) {
> +            const char* name = ident == context->names().arguments

Can this be `badName` or `errorName` or something that says this should be non-null only if `ident` is bad?

::: js/src/frontend/Parser.h
@@ +1223,5 @@
> +    PropertyName* importedBinding() {
> +        return bindingIdentifier(YieldIsName);
> +    }
> +
> +    Node identifierReference(Handle<PropertyName*> name);

It's so confusing to name both those 2 methods "identifierReference",
as they do different thing, and sometimes appear continuously.

It would be nice to append "...NameFromCurrentToken" (or simply "...Name") to methods that return PropertyName*.

::: js/src/tests/ecma_6/Function/rest-parameter-names.js
@@ +3,5 @@
> +
> +//-----------------------------------------------------------------------------
> +var BUGNUMBER = 1288460;
> +var summary =
> +  "Rest parameters to functions can be named |yield| or |eval| or |let| in "

this file tests |yield| only.
can you add others as well?
Attachment #8788652 - Flags: review?(arai.unmht) → review+
Comment on attachment 8788656 [details] [diff] [review]
Allow escape sequences in the keyword-like but non-reserved 'let' Identifier (in non-strict code)

Review of attachment 8788656 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +8208,5 @@
>          MOZ_ASSERT(ident != context->names().yield,
>                     "tokenizer should have treated 'yield' as TOK_YIELD");
> +
> +        if (pc->sc()->strict()) {
> +            const char* name = ident == context->names().let

here too, would be nice to use more descriptive name than `name`.
Attachment #8788656 - Flags: review?(arai.unmht) → review+
Attachment #8788659 - Flags: review?(arai.unmht) → review+
(In reply to Tooru Fujisawa [:arai] from comment #14)
> ::: js/src/frontend/Parser.cpp
> @@ +8226,5 @@
> > +            versionNumber() >= JSVERSION_1_7)
> > +        {
> > +            report(ParseError, false, null(), JSMSG_RESERVED_ID, "yield");
> > +            return nullptr;
> > +        }
> 
> It's better merging labelOrIdentifierReference and bindingIdentifier with
> extra parameter for allowing label,
> to reduce the duplication (other parts also adds more duplication)

I don't think I agree.  IdentifierReference and BindingIdentifier serve different purposes.  The difference in actual semantics of the two terms is subtle.  I think we're better served by separate versions of the functions.  Passing around an extra argument to indicate whether to apply additional semantics unduly muddies the waters.  I'd like to leave this as-is.

> Can this be `badName` or `errorName` or something that says this should be
> non-null only if `ident` is bad?

I'm fine renaming, more or less, but it seems to me the code itself clearly demonstrates that this is non-null only if the identifier is forbidden.

> > +    Node identifierReference(Handle<PropertyName*> name);
> 
> It's so confusing to name both those 2 methods "identifierReference",
> as they do different thing, and sometimes appear continuously.

Yeah, it's not great.  That they differ in return type and in argument types, however, seemed enough to force people to use the right one for the job.  Can't mistakenly use one when you meant the other, because it won't give you what you want.  (And there are super-few places that have to call one followed by the other.)

> It would be nice to append "...NameFromCurrentToken" (or simply "...Name")
> to methods that return PropertyName*.

"from current token" is the default behavior of basically all Parser functions -- off the top of my head, orExpr1/condExpr1 and so on, named with the "1" suffix, are the exceptions that act upon the next token.

Appending simply "Name" isn't a good solution in this particular case, because IdentifierName is *another* spec production.  Having identifierReferenceName would be horrendously confusing.  :-)  Identifier standing alone would also be bad for much the same reason.

I'm on board with having a different/better name for this, but I don't think either of us have one, yet.
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c978746cc2cb
Parse names in a ton of places using Parser::* functions directly named after spec productions rather than open-coding them.  This centralizes many of the static-semantics rules checks in a very few places.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/869a5b6dae78
Allow escape sequences in the keyword-like but non-reserved 'let' Identifier (in non-strict code).  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/240e3c1ab622
Allow escape sequences in the keyword-like but non-reserved 'static' Identifier (in non-strict code).  r=arai
Discussion on IRC of comment 17's concerns led us to run with the names we already have, at least for now.
Status: NEW → ASSIGNED
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/c978746cc2cb
https://hg.mozilla.org/mozilla-central/rev/869a5b6dae78
https://hg.mozilla.org/mozilla-central/rev/240e3c1ab622
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.