Closed Bug 1317375 Opened 7 years ago Closed 7 years ago

Implement "Template Literals Revision / Lifting Template Literal Restriction" ECMAScript proposal

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: anba, Assigned: bakkot)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(1 file, 4 obsolete files)

"Template Literals Revision / Lifting Template Literal Restriction" [1] is currently a stage 3 proposal [2].

test262 already tests the new semantics [3], we're currently failing language/expressions/tagged-template/invalid-escape-sequences.js because of this.


[1] https://github.com/tc39/proposal-template-literal-revision
[2] https://github.com/tc39/proposals#active-proposals
[3] https://github.com/tc39/test262/pull/764
I'd like to work on this. I expect to be able to get a patch ready within a week or two.
Attached patch Implement the feature, add tests (obsolete) — Splinter Review
To implement this in the most straightforward way, this adds a new node type, PNK_UNDEFINED. It is precisely analogous to PNK_NULL except that it represents undefined instead of null.

Error messages for invalid escapes in untagged templates are currently imprecise, with imprecise locations (just the token, instead of the precise location), but that's true in master as well.
Attachment #8826423 - Flags: review?(arai.unmht)
Comment on attachment 8826423 [details] [diff] [review]
Implement the feature, add tests

Review of attachment 8826423 [details] [diff] [review]:
-----------------------------------------------------------------

Great! Thank you for your patch :D

I saw the IRC conversation about PNK_UNDEFINED name.  I'd vote for using different name than PNK_UNDEFINED,
maybe PNK_TEMPLATE_UNDEFINED or something, to make it clear that it doesn't correspond to "undefined" expression but more specific thing.
Especially in BytecodeEmitter.cpp and FoldConstants.cpp, it's confusing to see PNK_UNDEFINED gets handled like a constant.

If we go with PNK_UNDEFINED/UndefinedLiteral it should be better adding comment to those definitions to say it's not "undefined" expression, but represents raw(?) undefined, or maybe a part of template literal.

::: js/src/frontend/NameFunctions.cpp
@@ +316,4 @@
>              return false;
>  
>          // Next is the callsite object node.  This node only contains
> +        // internal strings or undefined and an array -- no user-controlled expressions.

Please wrap comment line into 80 columns.
https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style#Other_whitespace

::: js/src/frontend/ParseNode.cpp
@@ +686,4 @@
>        case PNK_TRUE:  fprintf(stderr, "#true");  break;
>        case PNK_FALSE: fprintf(stderr, "#false"); break;
>        case PNK_NULL:  fprintf(stderr, "#null");  break;
> +      case PNK_UNDEFINED:  fprintf(stderr, "#undefined");  break;

Please use single space after ":" and ";".

::: js/src/frontend/ParseNode.h
@@ +1144,4 @@
>      explicit NullLiteral(const TokenPos& pos) : ParseNode(PNK_NULL, JSOP_NULL, PN_NULLARY, pos) { }
>  };
>  
> +// This is only used internally, currently just for tagged templates.

It should be better explaining what this class represents, in addition to above comment.

::: js/src/frontend/Parser.cpp
@@ +8704,4 @@
>  }
>  
>  template <typename ParseHandler>
> +template <bool tagged>

Please use enum instead of bool, to make it clear what it means on callsite.
for example:
  enum TemplateKind {
    Untagged,
    Tagged
  };
and
  noSubstitutionTemplate<Untagged>();
  noSubstitutionTemplate<Tagged>();
in callsites:

::: js/src/frontend/TokenStream.cpp
@@ +1943,5 @@
>                // Unicode character specification.
>                case 'u': {
> +                uint32_t code = 0;
> +
> +                int32_t lookahead_c;

To follow other places, please name it `c2` or maybe `next` ?
Anyway it should use camelCase instead of underscore.

@@ +1949,2 @@
>                      return false;
> +                if (lookahead_c == '{') { // \u{deadbeef}

This comment is misleading since it's overflowing from unicode::NonBMPMax (0x10FFFF), and in that case we stop handling it even in template literal before reaching "}".

Also, please move the comment tothe next line if you add some comment.
(I feel it's fine to just remove it tho)

@@ +1957,5 @@
> +                    do {
> +                        int32_t c = getCharIgnoreEOL();
> +                        if (c == EOF) {
> +                            if (parsingTemplate) {
> +                                // Invalid escapes in templates are reported by the parser

Please append period.
Same for other places.

@@ +1983,5 @@
> +                        }
> +
> +                        if (!JS7_ISHEX(c)) {
> +                            if (parsingTemplate) {
> +                                ungetCharIgnoreEOL(c); // In case the next character is '\'' or '`', put it back so we read it on the next pass.

Please put the comment to its own line, and also wrap it to 80 columns.

Also, the comment doesn't match to the code.
It puts back the character even if it's not  '\'' or '`'.

@@ +1998,5 @@
> +                        code = (code << 4) | JS7_UNHEX(c);
> +                        if (code > unicode::NonBMPMax) {
> +                            if (parsingTemplate) {
> +                                // Invalid escapes in templates are reported by the parser
> +                                flags.sawInvalidEscape = true;

This looses the detail of the error, the following cases are reported as JSMSG_TEMPLSTR_INVALID_ESC for template:
  * JSMSG_MALFORMED_ESCAPE + "Unicode"
  * JSMSG_MALFORMED_ESCAPE + "hexadecimal"
  * JSMSG_UNICODE_OVERFLOW + "escape sequence"
  * JSMSG_DEPRECATED_OCTAL

Can we store that information by dedicated enum, or JSMSG_* + extra one parameter, instead of true/false, and report corresponding one in Parser<ParseHandler>::noSubstitutionTemplate ?
Parser::PossibleError would be a good example of handling error lazily.

@@ +2037,5 @@
>                  } else {
> +                    if (parsingTemplate) {
> +                        // Invalid escapes in templates are reported by the parser
> +                        flags.sawInvalidEscape = true;
> +                        continue; // Don't bother appending to tokenbuf; it won't be read.

Please put the comment to its own line.

Also, this patch adds these 3 lines (|flags.sawInvalidEscape = true;|, |continue|, and comments) to several places.
Can we add yet another method like matchBracedUnicode that handles all braced/non-braced unicode and hexadecimal, with extra in/out parameter, and merge these 3 lines into single place?
Attachment #8826423 - Flags: review?(arai.unmht) → feedback+
Attached patch Implement the feature, add tests (obsolete) — Splinter Review
Addressed comments.

I couldn't find a good way to split out a method like matchBracedUnicode which didn't end up adding just as many duplicate lines, so instead I just moved the comments to the top of the escape sequence parsing code. Now that this uses 'reportInvalidTemplateEscape' instead of 'flags.sawInvalidEscape' it is somewhat more clear anyway.
Attachment #8826423 - Attachment is obsolete: true
Attachment #8827349 - Flags: review?(arai.unmht)
Comment on attachment 8827349 [details] [diff] [review]
Implement the feature, add tests

Review of attachment 8827349 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks again :D

::: js/src/frontend/Parser.cpp
@@ +8704,4 @@
>  }
>  
>  template <typename ParseHandler>
> +template <typename Parser<ParseHandler>::TemplateKind templateKind>

sorry, now I feel this method should just be splitted into 2 methods for tagged and untagged cases.
since tagged case and untagged case do very different things, sharing only 1 line,
also callsites are totally different for each case.

so, instead of adding template and branches,
how about splitting this into noSubstitutionTaggedTemplate and noSubstitutionUntaggedTemplate? (or similar name)

::: js/src/frontend/TokenStream.cpp
@@ +1988,5 @@
> +                            if (parsingTemplate) {
> +                                ungetCharIgnoreEOL(c);
> +                                // We put the character back so that we read
> +                                // it on the next pass, which matters if it
> +                                // was '`' or '\'.

Nice comment :)
it would be even better if the comment is above `ungetCharIgnoreEOL(c);`

::: js/src/frontend/TokenStream.h
@@ +377,5 @@
>  
> +    InvalidTemplateEscapeType getInvalidTemplateEscapeType() const {
> +        return invalidTemplateEscapeType;
> +    }
> +    void reportInvalidTemplateEscape(InvalidTemplateEscapeType type) {

This method doesn't report but just sets pending error, so "report" doesn't sound describing correctly.
To follow PossibleError class, it would be nice to name this setPendingInvalidTemplateEscapeError or something.

Also, regarding "report*" method, it sounds nice to have a method that reports pending error like PossibleError::checkForError, instead of returning raw InvalidTemplateEscapeType data and report it on Parser, since same error is handled inside TokenStream for string case.
(and maybe we could merge those string and template cases into one hopefully?)

In that case, TokenStream should expose only the following functions to public:
  * hasPendingInvalidTemplateEscapeError
  * clearInvalidTemplateEscapeType
  * reportPendingInvalidTemplateEscapeError
    or checkForInvalidTemplateEscapeError (if following PossibleError)
and privately:
  * setPendingInvalidTemplateEscapeError
Attachment #8827349 - Flags: review?(arai.unmht) → feedback+
Attached patch Implement the feature, add tests (obsolete) — Splinter Review
Addressed comments, round 2.

TokenStream now exposes public
* `hasInvalidTemplateEscape`
* `clearInvalidTemplateEscape` and
* `checkForInvalidTemplateEscapeError`

as well as private
* `setInvalidTemplateEscape` and
* `reportInvalidEscapeError` (which is also used when tokenizing strings).

(The set/check/clear methods don't say "error" because it isn't an error in tagged templates.)

This also has the effect that errors for invalid escape sequences in templates and strings (except the error for octal literals in a string in strict mode) have more precise location information.

Previously:

> js>   ' \u{}'
> typein:2:2 SyntaxError: malformed Unicode character escape sequence:
> typein:2:2   ' \u{}'
> typein:2:2 ..^

Now: 

> js>   ' \u{}'
> typein:2:4 SyntaxError: malformed Unicode character escape sequence:
> typein:2:4   ' \u{}'
> typein:2:4 ....^
Attachment #8827594 - Flags: review?(arai.unmht)
Aargh, I forgot to obsolete the old patch, and I can't figure out how to do it without submitting another attachment. Pretend it's obsolete, please.
Attached patch Implement the feature, add tests (obsolete) — Splinter Review
Nevermind, there was a formatting issue anyway.

Sorry for the noise. This patch should be ready for review :)
Attachment #8827349 - Attachment is obsolete: true
Attachment #8827594 - Attachment is obsolete: true
Attachment #8827594 - Flags: review?(arai.unmht)
Attachment #8827600 - Flags: review?(arai.unmht)
Comment on attachment 8827600 [details] [diff] [review]
Implement the feature, add tests

Review of attachment 8827600 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you again!
Almost there :)

(In reply to bakkot from comment #6)
> This also has the effect that errors for invalid escape sequences in
> templates and strings (except the error for octal literals in a string in
> strict mode) have more precise location information.

Excellent :D


(In reply to bakkot from comment #7)
> Aargh, I forgot to obsolete the old patch, and I can't figure out how to do
> it without submitting another attachment.

You can obsolete patch from "Details" link, "edit details" button (link) in the page, "obsolete" checkbox.

::: js/src/frontend/TokenStream.h
@@ +386,5 @@
> +    // otherwise return true.
> +    bool checkForInvalidTemplateEscapeError() {
> +        if (invalidTemplateEscapeType == InvalidEscapeType::None) {
> +            return true;
> +        } else {

This then-branch always returns, so `else` isn't necessary.
Please remove `else` and block.
Also, if then-branch (and all else-branches if any) is single line, braces should be omitted.

@@ +475,5 @@
> +            case InvalidEscapeType::Octal:
> +                errorAt(offset, JSMSG_DEPRECATED_OCTAL);
> +                return;
> +            default:
> +                MOZ_ASSERT_UNREACHABLE("unexpected InvalidEscapeType");

it would be better enumerating all enum values instead of using "default:".
when you add new enum value, switch without "default:" will hit compile-error if you forgot to handle the new enum value.
Attachment #8827600 - Flags: review?(arai.unmht) → feedback+
Addressed comments.

Also no longer includes changes to the test framework, since those landed in
https://bugzilla.mozilla.org/show_bug.cgi?id=1332052
Attachment #8827600 - Attachment is obsolete: true
Attachment #8828445 - Flags: review?(arai.unmht)
Comment on attachment 8828445 [details] [diff] [review]
Implement the feature, add tests

Review of attachment 8828445 [details] [diff] [review]:
-----------------------------------------------------------------

Great!
Attachment #8828445 - Flags: review?(arai.unmht) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb868860dfc35876d2d9c421c037c75a4fb9b3d2
Bug 1317375 - Implement "Template Literals Revision / Lifting Template Literal Restriction" ECMAScript proposal r=arai
https://hg.mozilla.org/mozilla-central/rev/bb868860dfc3
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
:hooray:

Thanks for your patience and help getting this in!
congrats :)
Keywords: dev-doc-needed
Assignee: nobody → bakkot
You need to log in before you can comment on or make changes to this bug.