Closed Bug 1195578 Opened 5 years ago Closed 5 years ago

Assertion failure: next.type != TOK_DIV && next.type != TOK_REGEXP (next token requires contextual specifier to be parsed unambiguously), at frontend/TokenStream.h


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
firefox43 --- fixed


(Reporter: gkw, Assigned: arai)



(Keywords: assertion, regression, testcase, Whiteboard: [jsbugmon:update])


(3 files, 1 obsolete file)

a => {}

asserts js debug shell on m-c changeset 9673c75864be with --fuzzing-safe --no-threads --no-ion at Assertion failure: next.type != TOK_DIV && next.type != TOK_REGEXP (next token requires contextual specifier to be parsed unambiguously), at frontend/TokenStream.h.

Configure options:

CC="clang -Qunused-arguments" CXX="clang++ -Qunused-arguments" AR=ar AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=x86_64-apple-darwin12.5.0 --enable-debug --enable-nspr-build --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

python -u ~/funfuzz/js/ -b "--enable-debug --enable-more-deterministic --enable-nspr-build" -r 9673c75864be

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
user:        Tooru Fujisawa
date:        Fri Aug 07 04:11:59 2015 +0900
summary:     Bug 1089045 - Part 1: Supply consistent modifiers to TokenStream. r=Waldo

Tooru, is bug 1089045 a likely regressor?
Flags: needinfo?(arai.unmht)
Attached file stack
(lldb) bt 5
* thread #1: tid = 0x94a9b, 0x000000010004c9d8 js-dbg-64-dm-nsprBuild-darwin-9673c75864be`js::frontend::TokenStream::addModifierException(this=<unavailable>, modifierException=<unavailable>) + 552 at TokenStream.h:478, queue = '', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010004c9d8 js-dbg-64-dm-nsprBuild-darwin-9673c75864be`js::frontend::TokenStream::addModifierException(this=<unavailable>, modifierException=<unavailable>) + 552 at TokenStream.h:478
    frame #1: 0x0000000100044b41 js-dbg-64-dm-nsprBuild-darwin-9673c75864be`js::frontend::Parser<js::frontend::FullParseHandler>::expressionStatement(js::frontend::YieldHandling, js::frontend::Parser<js::frontend::FullParseHandler>::InvokedPrediction) [inlined] js::frontend::MatchOrInsertSemicolon(ts=<unavailable>, modifier=None) + 225 at Parser.cpp:1366
    frame #2: 0x0000000100044ac5 js-dbg-64-dm-nsprBuild-darwin-9673c75864be`js::frontend::Parser<js::frontend::FullParseHandler>::expressionStatement(this=0x00007fff5fbfe658, yieldHandling=<unavailable>, invoked=<unavailable>) + 101 at Parser.cpp:4736
    frame #3: 0x000000010004396b js-dbg-64-dm-nsprBuild-darwin-9673c75864be`js::frontend::Parser<js::frontend::FullParseHandler>::statement(this=<unavailable>, yieldHandling=<unavailable>, canHaveDirectives=<unavailable>) + 1387 at Parser.cpp:6275
    frame #4: 0x00000001001adca1 js-dbg-64-dm-nsprBuild-darwin-9673c75864be`BytecodeCompiler::compileScript(this=0x00007fff5fbfdfd8, scopeChain=<unavailable>, evalCaller=<unavailable>) + 897 at BytecodeCompiler.cpp:592
yeah, that patch should be the one which throws the assertion failure, but not sure if it's the patch's bug or existing bug.

Here is the result in Firefox 40 console (without the patch):

  > eval("a => {}\n/x/;")
  SyntaxError: expected expression, got '/'
  > eval("a => 1\n/x/;")
  SyntaxError: expected expression, got ';'

I think something goes wrong there.  I'll look into it by tonight.
oh, I was wrong. the 2nd example in comment #2 is not related to this.
token after arrow function with block body need Operand modifier.
propagating that information up to expressionStatement and any other callers will make the code complicated, so we could peek that token while parsing arrow function's body, and use modifier exception for it.
try with WIP patch is running
Flags: needinfo?(arai.unmht)
Added peekToken with Operand modifier after arrow function with block body, to get regexp after that correctly, and store it into lookahead buffer.  it might be tricky, but it seems to be a fix with the simplest change here (so, if there's any better solution, we should use that instead).

The issue is that we might need nice name for NoneIsOperandArrowBlock (= NoneIsOperandYieldEOL), the situation is that AssignmentExpression ends there not-because of the actual next token (presence of "}" for arrow function, presence of EOL for yield), and ASI may happen before the next token, then TOK_REGEXP is allowed there but TOK_DIV is not.  or should we handle those exception separately?
or perhaps, we should handle this case with completely different way?

Waldo, how do you think about them?
Assignee: nobody → arai.unmht
Attachment #8649084 - Flags: feedback?(jwalden+bmo)
Comment on attachment 8649084 [details] [diff] [review]
Get a token next to an arrow function with block body with Operand modifier.

Review of attachment 8649084 [details] [diff] [review]:

This is total bikeshed, but what would you think about renaming "Operand" to "Expression", and maybe "None" to "Operator"?  I read "operand" as too much like "operator", and I end up confusing myself regularly reading this stuff.  Separate bug if you agree.

::: js/src/frontend/Parser.cpp
@@ +6760,5 @@
>              return null();
>          }
> +        Node pn = functionDef(inHandling, yieldHandling, nullptr, Arrow, NotGenerator);
> +        if (!pn)

Instead of |pn| use |arrowFunc| or something, please.

@@ +6765,5 @@
> +            return null();
> +
> +        if (isBlock) {
> +            // Next token should be gotten with Operand, before any other
> +            // getToken call.

I think we want to spell out both possibilities in this comment for clarity:

// This arrow function could be a non-trailing member of a comma
// expression or a semicolon terminating a full expression.  If so,
// the next token is that comma/semicolon, gotten with None:
//   a => {}, b; // as if (a => {}), b;
//   a => {};
// But if this arrow function ends a statement, ASI permits the
// next token to start an expression statement.  In that case the
//  next token must be gotten as Operand:
//   a => {} // complete expression statement
//   /x/g;   // regular expression as a statement, *not* division
// Getting the second case right requires the first token-peek
// after the arrow function use Operand, and that peek must occur
// before Parser::expr() looks for a comma.  Do so here, then
// immediately add the modifier exception needed for the first
// case.
// Note that the second case occurs *only* if the arrow function
// has block body.  An arrow function not ending in such, ends in
// another AssignmentExpression that we can inductively assume was
// peeked consistently.

::: js/src/frontend/TokenStream.h
@@ +130,5 @@
>          // followed by EOL, the next token is already gotten with Operand, and
>          // we expect Operand in next statement, but MatchOrInsertSemicolon
>          // after expression statement expects operator (None).
>          NoneIsOperandYieldEOL,
> +        NoneIsOperandArrowBlock = NoneIsOperandYieldEOL,

So really, on closer inspection, I should have seen this bug when the other (for yield-EOL) was filed, 'cause they're two peas in a pod.

What we have are *the* two cases where the token following an AssignmentExpression might have to tokenize as *operand* (and *not* part of that AssignmentExpression, or an explicit semicolon delimiting it), not as *operator*.

Generally we ask if the next thing is a non-operand because we might have just parsed one component of a comma expression, or because the AssignmentExpression ends with a binary expression that might be continued by an operator.

But these two cases differ.

*If* the AssignmentExpression is the end of the Expression (that is, not followed by a comma), *and* there's no expression-ending token (semicolon, closing brace, EOF) that follows (same line or later), *and* we have a linebreak after the AssignmentExpression: that linebreak triggers ASI that forces the next token to be treated as operand.

These are the only two edge cases of this sort.  The cases in AssignmentExpression[In, Yield] are:

== Conditional expressions ==

ConditionalExpression[?In, ?Yield]

This ends in an expression that might use an operator (and would be terminated by lack of one).  Getting that maybe-operator will use None, consistent with a successive comma/semicolon.  *Or*, the ?: expression ends in an AssignmentExpression to be treated inductively.  No special exception needed.

== Yield expressions ==

[+Yield] YieldExpression[?In], i.e.:
  yield <no linebreak> AssignExpression
    yield* <no linebreak> AssignExpression

Pure |yield| being a full Expression means we have trickiness after it.

Looking *only* on the same line, hitting * or a non-expression-ending character means we have another AssignmentExpression -- inductive cases where we'll keep using Operand, so no exceptions needed.

If we instead get a comma/semicolon/closing-{brace,paren,bracket}/colon (this last for A?B:C where B is yield) on the same line, we'll unget and getToken(None) for a comma/semicolon.  So we need an exception to treat None as Operand.

If we get EOF on the same line, or we hit EOL without any token, the |yield| and the AssignmentExpression are complete.  Any next token must be a comma *or* a possibly-ASI-inserted semicolon, gotten with getToken(None).  So we need to treat None as Operand here as well.

The NoneIsOperand/NoneIsOperandYieldEOL split, seen from this vantage point, is weird.  Both cases we need to be able to treat None as Operand.  We can assert in both cases, on the ultimate get, that the token isn't TOK_DIV (because an Operand must be TOK_REGEXP in such cases).  So why have both?

The NoneIsOperand case goes beyond asserting next.type != TOK_DIV, to also assert next.type != TOK_REGEXP.  In general that's just weird: if we're getting an "Operand" (Expression), of course we can have a regular expression literal.  As it happens there are exactly *two* NoneIsOperand cases: both for after |yield|, when we *know* the next token is TOK_SEMI/TOK_RC/TOK_RB/TOK_RP/TOK_COLON/TOK_COMMA.  The exclusion of TOK_REGEXP is correct but dumb -- we know we're limited to a very small set of possibilities not including it.

Given that, I don't think the YieldEOL (and now ArrowBlock) extras are useful/valuable.  We should just have NoneIsOperand and be done with it.  I'll post a patch to get rid of *YieldEOL; we should have just removed the not-TOK_REGEXP in the other bug, I think.

== Arrow functions ==

ArrowFunction[?In, ?Yield], i.e.:
  <args> => AssignmentExpression
  <args> => { ... }

The first case tokenizes inductively, so no new exceptions needed.

The second case would hit the getToken(None) detecting a subsequent comma/semicolon.  If there *is* no comma/semicolon/EOF/closing-brace, then we're in an ASI spot *iff* there's an EOL in play, and the next thing would have to be processed as Operand.  Unlike for |yield|, tho, the natural next step is a getToken(None) -- *not* a getToken(Operand).  But we can't *start* asking for None, because then we'd get a cached TOK_DIV and not the proper TOK_REGEXP.

So I think you're right, sadly -- we need a totally ad-hoc peekToken(Operand) after block-body arrow functions, in concert with an immediate exception.

== Assignments ==

LeftHandSideExpression[?Yield] = AssignmentExpression[?In, ?Yield]
LeftHandSideExpression[?Yield] AssignmentOperator AssignmentExpression[?In, ?Yield]

These end in AssignmentExpression and so are handled inductively without special exceptions needed.

So: everything is handled easily/inductively *except* for |yield| and arrow functions with block body.  We've handled the first.  Your changes handle the second.  So once we finish this, there truly are no other similar possibilities lurking as unfixt bugs.
Attachment #8649084 - Flags: feedback?(jwalden+bmo) → feedback+
Comment on attachment 8656280 [details] [diff] [review]
Consolidate NoneIsOperand and NoneIsOperandYieldEOL modifier exceptions into simply NoneIsOperand

Review of attachment 8656280 [details] [diff] [review]:

Thank you for your feedback and this patch :)
I agree that we can remove the TOK_REGEXP check there.

I'll post my patch rebased on this shortly.

About Modifier's name, "Expression" sounds like it contains leading unary "Operator", so that won't describe well.  I think we should use the name that describes what actually happens, so, the difference between RegExp vs Div.  In ES6 spec, following 4 symbols are used [1]:

 * InputElementDiv
 * InputElementTemplateTail
 * InputElementRegExp
 * InputElementRegExpOrTemplateTail

I guess we could just use InputElementDiv (= None) and InputElementRegExp (= Operand).  InputElementTemplateTail is a bit different because we tokenize leading '}' separatedly, but we could keep using TemplateTail, as it's already there.  InputElementRegExpOrTemplateTail is not used as InputElementTemplateTail is different and RegExp cannot be there.  Not sure whether it's better to leave KeywordIsName there or move it to another place.  anyway, I'll also post that patch shortly in separated bug.

Attachment #8656280 - Flags: review?(arai.unmht) → review+
Changed to just use NoneIsOperand, and added comments for NoneIsOperand definition, also added test for semicolon and colon.
Attachment #8649084 - Attachment is obsolete: true
Attachment #8656435 - Flags: review?(jwalden+bmo)
See Also: → 1201421
Keywords: leave-open
Comment on attachment 8656435 [details] [diff] [review]
Part 1: Get a token next to an arrow function with block body with Operand modifier.

Review of attachment 8656435 [details] [diff] [review]:

::: js/src/frontend/Parser.cpp
@@ +6987,5 @@
>              return null();
>          }
> +        Node allowFunc = functionDef(inHandling, yieldHandling, nullptr, Arrow, NotGenerator);
> +        if (!allowFunc)

Attachment #8656435 - Flags: review?(jwalden+bmo) → review+
Backed out in for somehow causing SM(p) on Windows to OOM in oomInFormatStackDump.js
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision cffdd225055e).
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:update]
You need to log in before you can comment on or make changes to this bug.