Closed Bug 1319416 Opened 4 years ago Closed 3 years ago

Assertion failure: next.type != TOK_DIV && next.type != TOK_REGEXP (next token requires contextual specifier to be parsed unambiguously), at js/src/frontend/TokenStream.h:476


(Core :: JavaScript Engine, defect, P1)




Tracking Status
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- fixed


(Reporter: decoder, Assigned: Waldo)



(4 keywords, Whiteboard: [jsbugmon:update])


(3 files, 3 obsolete files)

The following testcase crashes on mozilla-central revision b7f895c1dc2e (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize, run with --fuzzing-safe):

if (testcase() !== true) {}
function testcase() {
    var each = (...of) => () => {}
    /\W+/u.testcase() = 10;


 received signal SIGSEGV, Segmentation fault.
js::frontend::TokenStream::addModifierException (this=this@entry=0x7fffffffc660, modifierException=modifierException@entry=js::frontend::Token::OperandIsNone) at js/src/frontend/TokenStream.h:475
#0  js::frontend::TokenStream::addModifierException (this=this@entry=0x7fffffffc660, modifierException=modifierException@entry=js::frontend::Token::OperandIsNone) at js/src/frontend/TokenStream.h:475
#1  0x00000000004a3d54 in js::frontend::MatchOrInsertSemicolonHelper (ts=..., modifier=js::frontend::Token::None) at js/src/frontend/Parser.cpp:2631
#2  0x00000000004e3ebc in js::frontend::MatchOrInsertSemicolonAfterExpression (ts=...) at js/src/frontend/Parser.cpp:2638
#3  js::frontend::Parser<js::frontend::FullParseHandler>::variableStatement (this=this@entry=0x7fffffffc630, yieldHandling=yieldHandling@entry=js::frontend::YieldIsName) at js/src/frontend/Parser.cpp:6756
#4  0x00000000004e6493 in js::frontend::Parser<js::frontend::FullParseHandler>::statementListItem (this=this@entry=0x7fffffffc630, yieldHandling=yieldHandling@entry=js::frontend::YieldIsName, canHaveDirectives=<optimized out>) at js/src/frontend/Parser.cpp:6977
#5  0x00000000004e66b7 in js::frontend::Parser<js::frontend::FullParseHandler>::statementList (this=0x7fffffffc630, yieldHandling=js::frontend::YieldIsName) at js/src/frontend/Parser.cpp:3815
#6  0x00000000004e7c6f in js::frontend::Parser<js::frontend::FullParseHandler>::functionBody (this=this@entry=0x7fffffffc630, inHandling=inHandling@entry=js::frontend::InAllowed, yieldHandling=js::frontend::YieldIsName, kind=kind@entry=js::frontend::Statement, type=type@entry=js::frontend::Parser<js::frontend::FullParseHandler>::StatementListBody) at js/src/frontend/Parser.cpp:2442
#7  0x00000000004e8372 in js::frontend::Parser<js::frontend::FullParseHandler>::functionFormalParametersAndBody (this=this@entry=0x7fffffffc630, inHandling=inHandling@entry=js::frontend::InAllowed, yieldHandling=<optimized out>, pn=0x7ffff69f4020, kind=js::frontend::Statement) at js/src/frontend/Parser.cpp:3445
#8  0x00000000004ae067 in js::frontend::Parser<js::frontend::FullParseHandler>::standaloneLazyFunction (this=this@entry=0x7fffffffc630, fun=fun@entry=..., strict=<optimized out>, generatorKind=js::NotGenerator, asyncKind=js::SyncFunction) at js/src/frontend/Parser.cpp:3361
#9  0x0000000000b18f88 in js::frontend::CompileLazyFunction (cx=0x7ffff695f000, lazy=lazy@entry=..., chars=0x7ffff699e26a u"() {\n    var each = (...of) => () => {}\n    /\\W+/u.testcase() = 10;\n}\n", length=length@entry=69) at js/src/frontend/BytecodeCompiler.cpp:651
#10 0x000000000096e796 in JSFunction::createScriptForLazilyInterpretedFunction (cx=cx@entry=0x7ffff695f000, fun=fun@entry=...) at js/src/jsfun.cpp:1492
#11 0x0000000000466bfa in JSFunction::getOrCreateScript (this=<optimized out>, cx=0x7ffff695f000) at js/src/jsfun.h:408
#12 0x0000000000b64d71 in Interpret (cx=0x7ffff695f000, state=...) at js/src/vm/Interpreter.cpp:2934
#22 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:7940
rax	0x201c7c0	33671104
rbx	0x2	2
rcx	0x1182680	18359936
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7fffffffbc00	140737488337920
rsp	0x7fffffffbbf0	140737488337904
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff7fe4740	140737354024768
r10	0x58	88
r11	0x7ffff6b9f750	140737332770640
r12	0x7fffffffc660	140737488340576
r13	0x7fffffffc660	140737488340576
r14	0x0	0
r15	0x1	1
rip	0x4af659 <js::frontend::TokenStream::addModifierException(js::frontend::Token::ModifierException)+713>
=> 0x4af659 <js::frontend::TokenStream::addModifierException(js::frontend::Token::ModifierException)+713>:	movl   $0x0,0x0
   0x4af664 <js::frontend::TokenStream::addModifierException(js::frontend::Token::ModifierException)+724>:	ud2
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
user:        Jan de Mooij
date:        Tue Sep 13 14:14:32 2016 +0200
summary:     Bug 1297706 - Syntax parse arrow functions. r=shu

This iteration took 0.849 seconds to run.
Hmm.  I'd have thought bug 1302994 addressed the only instance of this.  :-(  I wonder if it's time to go back to the drawing board and rigorize/systematize how we use modifier exceptions, the way a mega-patch I had was attempting to do (but then I dropped it because it almost certainly would have been un-backportable).
Jeff, did you get a chance to look into this any more? Any thoughts on how to progress?
Flags: needinfo?(jwalden+bmo)
I'm attempting to resurrect my mega-patch now to see if that more-principled approach will fix this.  Hopefully that'll be reducible into something backportable, but we'll see.
I'm not 100% sure what the problem is with this bug.  But I *think* the problem is about 90% certain to be that the peeking hack after block-bodied arrow functions is just wrong.

Currently, after a block body, we have code-smelly handling because our arrow function parsing treats block-bodied stuff differently.  That works for an ArrowFunction as an entire expression.  But it fails if the ArrowFunction appears at the end of *another* ArrowFunction.  I'm not wholly certain why.  And unless there's some greater danger to this bug than just a possible misparse, I'm not inclined to investigate it.

But in the spec, in reality, it's *always* the case that after an arrow function -- any arrow function -- an expression/Operand is expected, and no modifier should be necessary.  There's no difference between arrow functions -- within ArrowFunction the production -- in this regard.  So why in the world is a peek of a certain kind, and/or a modifier exception, required after one with a block body?

The answer is, it really shouldn't be.  Everything beneath Parser::orExpr() should be getting the next token as Operand, because '/' begins an expression in that context (only without error if the '/' is on a fresh line).  And everything that might call Parser::orExpr(), should be parsing with the None modifier, because '/' is never an Operand there, only an operator.  And if we recognize that, conceptually, only a *single* OperandIsNone exception should ever appear in the parser, I think we can treat all arrow functions identically.

This patch implements this behavior.  3 files changed, 138 insertions(+), 194 deletions(-), not including the test.  Passes all jstests/jit-tests in the shell, so it's probably reviewable now (even if a try-run is more than called for, and may well suss out other places where modifiers require changes.)
Attachment #8825599 - Flags: review?(arai.unmht)
Assignee: nobody → jwalden+bmo
The semicolon-insert logic should also have been recognized as code-smelly before now, I think.  ASI means the place where the implicit semicolon is inserted, *always* is the possible beginning of an ExpressionStatement, so interpreting '/' as regular expression should always have been the only sensible behavior.  With the previous patch (which left matchOrInsertSemicolonAfter* as being identical, to avoid this mess of extra change in a concise-ish patch), we can have only a single function to do this.
Attachment #8825600 - Flags: review?(arai.unmht)
Blocks: 1330296
Comment on attachment 8825599 [details] [diff] [review]
Substantially rejigger token lookahead and modifier exceptions to function more like the spec

Review of attachment 8825599 [details] [diff] [review]:

Great :)
This makes the purpose of modifier/exception more clear,
and it would be the time to revisit bug 1201421 to name them properly, to follow the actual use case.
Attachment #8825599 - Flags: review?(arai.unmht) → review+
Attachment #8825600 - Flags: review?(arai.unmht) → review+
Pushed by
Substantially rejigger token lookahead and modifier exceptions to function more like the spec.  r=arai
Consolidate all semicolon-matching into a single function, now that there's no quasi-bogus need for multiple variants.  r=arai
Bah.  Any context that takes expr() (so that the first peek after an ArrowFunction is Operand, when ordinarily it's None with a modifier exception added automagically) is going to be a pain to get modifiers correct.  So it goes.  :-(
Attachment #8826396 - Flags: review?(arai.unmht)
Comment on attachment 8825600 [details] [diff] [review]
Consolidate all semicolon-matching into a single function, now that there's no quasi-bogus need for multiple variants

Review of attachment 8825600 [details] [diff] [review]:

::: js/src/frontend/Parser.cpp
@@ +5919,3 @@
>              return null();
>      } else {
> +        if (!matchOrInsertSemicolon())

overlooked this.
if (exprNode) has the same then- and else- branches.
Comment on attachment 8826396 [details] [diff] [review]
Substantially rejiggered, take two

Review of attachment 8826396 [details] [diff] [review]:

looks good except one place.
>         // Parser::declaration consumed everything up to the closing ')'.  That
>         // token follows an {Assignment,}Expression, so the next token must be
>         // consumed as if an operator continued the expression, i.e. as None.

this could (maybe should?) be changed to Operand now, since it's after assignExpr()/expr()

also, maybe we could add another variant of parser test like js/src/jit-test/lib/syntax.js to test modifier after certain expression in certain context.
currently js/src/jit-test/lib/syntax.js append the codelet to the pre-defined code.
we could extend it to be able to embed the codelet to anywhere in the template, something like following
    () => {}
    for (; @; ) break;
  evaluated code:
    for (; () => {}; ) break;

of course in other bug.
Attachment #8826396 - Flags: review?(arai.unmht) → review+
Minor update: try-servering indicates a number of additional bugs in this that require fixing, possibly (probably?) enough for a second go-around.  Still on that, amidst stabbing at request queue that briefly passed 50 recently.  :-\
Flags: needinfo?(jwalden+bmo)
Priority: -- → P1
(response to IRC)
I agree that it's nice to handle KeywordIsName (or KeywordIsKeyword, maybe?) in separate step
See Also: → 1336783
Jeff, are we going to be able to do anything here for 53 (Aurora at present)?
Flags: needinfo?(jwalden+bmo)
Almost certainly no.  The partial fix-work I have makes pervasive change to the parser and is certainly not backportable.  (Unless some actual security issue pops up from this, but I don't think one would.)
Flags: needinfo?(jwalden+bmo)
53->wontfix given comment 16. Thanks for the clarification, Jeff.
Jeff, is this landable for 55?  I assume that even if this did land for 55, we're not going to want to uplift this to 54?
Flags: needinfo?(jwalden+bmo)
It isn't landable.  At this point, it's in line behind the generally-more-important work of single-byte parsing, bug 1351107.  I'd really really like to finish this up quickly after that, but there are some scheduling issues that will make that tricky.
Flags: needinfo?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #19)
> It isn't landable.  At this point, it's in line behind the
> generally-more-important work of single-byte parsing, bug 1351107.  I'd
> really really like to finish this up quickly after that, but there are some
> scheduling issues that will make that tricky.

Hm, OK.  Gonna say wontfix for 54, but we'll leave this open for 55.
Duplicate of this bug: 1376890
Marking affected for 56 since the duplicate seems to be.
This is an assigned P1 bug without activity in two weeks. 

If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword.

Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.
Keywords: stale-bug
Given comment #19, wontfixing for 55 and 56. Given that it's a fuzz bug, marking fix-optional for 57.
I picked up the partial patch I had from 5+ months ago, un-depended it on bug 1351107, and *think* I have a complete patch, maybe.  Passes jit-tests, jstests running now (no failures at the 33% mark!), and then we'll see what all the JS in the tree that runs in every other test suite turns up -- but I think what I have is super-close to completion.  Will post patch once jstests run, possibly flagging for review at that point to let that go in parallel with a full try-run and fixing of anything that turns up.

As noted in earlier comments, the patch here is totally unsuitable for backporting anywhere -- far too big and finicky.
Getting there getting there...

Not sure yet whether all those failures are unrelated to the patchwork, but this is definitely close.  And unlike previous times here, I doubt there remain fundamental issues to resolve.  Should have this finished tomorrow.  
Having only two functions in the entire Parser that consume a bunch of tokens *plus one extra* is bizarrely different, for no good reason.

And it turns out, if you're writing a parser that requests the next token and wants to specify which of InputElement{Div,RegExp} is desired for tokenizing, it makes a ton of sense for the end of parsing a binary expression, to add an exception so that InputElementDiv can now be requested as InputElementRegExp without conflict.

This patch makes both these functions consume exactly what you'd think, and no more.  The next token may be *examined* -- although once this bug is fixed, that's not necessarily the case! -- but if so, it's put back and a modifier exception is added to allow examination using the other production.

This patch shouldn't change any behavior, but it at least slims down the overall aggregated patch for this bug.  And given we're on the third attempt at review here :-\ any slimming that can be done (sadly, all too little) is desirable.
Attachment #8825599 - Attachment is obsolete: true
Attachment #8825600 - Attachment is obsolete: true
Attachment #8826396 - Attachment is obsolete: true
Attachment #8923558 - Flags: review?(arai.unmht)
At one time I think I thought this was necessary to the overall patch, and that I'd specifically build upon these changes.  I'm not sure I do any more.  But in any event, having multiple variables (not using the inscrutable "tt" name) for different contexts here is much more readable.  The compiler/optimizer should be able to combine the storage for the two pretty easily, given their lifetimes don't overlap.
Attachment #8923562 - Flags: review?(arai.unmht)
Okay, this actually makes the fix.  There are several different moving parts in this, all mostly intertwined and incapable of separation.

First, matchOrInsertSemicolonHelper *always* looks for Operand now.  This just makes sense -- any place where ASI is permissible, means you have the start of a new Statement after, and that could be an ExpressionStatement requiring Operand.  It's nonsense for None ever to be appropriate, because that would imply a division at the start of an expression.  However, as with the previous patches here, having just one single matchOrInsertSemicolon function (instead of a helper called by two actual functions with different modifiers) is left to a subsequent patch.

Second, modifiers for distinguishing division from regexp are *mostly* handled by a single exception added at the end of Parser::orExpr, after the end of a binary expression has been reached.  This just makes sense -- the end of a binary expression can't have division after it, because then that would require extending the binary expression.  So every caller of Parser::orExpr should be followed by looking at the next token as Operand (and the same for callers of those functions, when the call is in terminal position).  It's not clear to me whether messing this up will result in broken behavior (other than asserting); exactly how things go awry when these assertions fail, is very unclear.  (And not worth trying to figure out.)

Third, the one place where this distinguishing happens other than that spot, is the new modifier exception added in the not-an-assignment/arrow-function default case in Parser::assignExpr.  That exception is really only necessary for when Parser::assignExpr reads up a Parser::condExpr and another token, then it turns out the Parser::condExpr was the entirety of the Parser::assignExpr -- which is pretty much only ConditionalExpression that ends in an ArrowFunction (with block body) AssignmentExpression, e.g. |1 ? 2 : () => {}|.  See the numerous tests in the new test file that use exactly that sort of pattern to trigger that exact scenario.

Fourth, the modifiers passed in a ton of places need to be changed, so that the next modifier after Parser::orExpr (or Parser::condExpr that calls it, or Parser::assignExpr that calls that) is always Operand.  (Or is handled by some exception somehow.)  The changes are not always entirely obvious for this -- I could probably write a paragraph-long comment by each of these to demonstrate exactly why each particular modifier is needed.  :-(  But it would be tedious and take a lot of time, so I'm deferring that unless it's deemed necessary.  Hopefully blame in the future is enough for people to understand how this stuff should be used.

Fresh tryserver is here and looks as green as it ever does these days:

Excluding the new tests, this is a wonderful 3 files changed, 130 insertions(+), 197 deletions(-).  And getting rid of the comments containing the phrases "ASI's infinite majesty" (so much snark, RIP :-( ), "first token-peek", and "If you think this all sounds pretty code-smelly" is easily the highlight of recent hacking for me.
Attachment #8923652 - Flags: review?(arai.unmht)
Comment on attachment 8923558 [details] [diff] [review]
Convert Parser::{cond,or}Expr1 to plain old Parser::{cond,or}Expr

Review of attachment 8923558 [details] [diff] [review]:

Nice cleanup :D
I'll review remaining patch tonight or tomorrow.
Attachment #8923558 - Flags: review?(arai.unmht) → review+
Comment on attachment 8923562 [details] [diff] [review]
Use better names (and multiple variables) to replace a single |TokenKind tt|

Review of attachment 8923562 [details] [diff] [review]:

Yes, it's clearer :)
Attachment #8923562 - Flags: review?(arai.unmht) → review+
Pushed by
Convert Parser::{cond,or}Expr1 to plain old Parser::{cond,or}Expr.  r=arai
Use better names (and multiple variables) to replace a single |TokenKind tt|.  r=arai
Keywords: leave-open
Comment on attachment 8923652 [details] [diff] [review]
Actually fix this bug

Review of attachment 8923652 [details] [diff] [review]:

r+ with one question in arrayBindingPattern answered.

::: js/src/frontend/Parser.cpp
@@ +4629,4 @@
>           }
>           TokenKind tt;
> +         if (!tokenStream.getToken(&tt))

in this case, it's using TokenStream::None because there can't be RegExp ?

@@ +8185,4 @@
>          tokenStream.consumeKnownToken(TOK_ASYNC, TokenStream::Operand);
>          TokenKind tokenAfterAsync;
> +        if (!tokenStream.getToken(&tokenAfterAsync, TokenStream::None))

this change has no effect.
Attachment #8923652 - Flags: review?(arai.unmht) → review+
Comment on attachment 8923652 [details] [diff] [review]
Actually fix this bug

Review of attachment 8923652 [details] [diff] [review]:

::: js/src/frontend/Parser.cpp
@@ +4629,4 @@
>           }
>           TokenKind tt;
> +         if (!tokenStream.getToken(&tt))

I *think* this was for identifier-parsing consistency with Parser::objectBindingPattern.  That code must use None for parsing names because Parser::propertyName gets the token as None, because property names are gotten in a spot where the closing '}' in a binding pattern could be gotten (if there were a trailing comma).  And that's gotten that way because, similar to object *literals*, the '}' is gotten as None because it could appear after the AssignmentExpression that could be the value of a property in said literal:

  var obj = { x: 3 / 5 };

The brace is first gotten as None by binary-operator parsing, so the closing brace in general is so gotten.  (Of course if the expression were an ArrowFunction with block body, that brace would be gotten only by object-literal parsing -- but that's the exceptional case this entire bug is concerned about.)

Now as I look at this code now, maybe you *could* keep this as Operand safely, if you kept a few other things that way.  Maybe.  But I'm not at all confident of that, and indeed I'd bet it's the other way around, given pretty much every change I made in this patch was motivated by existing tests/code or the new ones in the patch.

@@ +8185,4 @@
>          tokenStream.consumeKnownToken(TOK_ASYNC, TokenStream::Operand);
>          TokenKind tokenAfterAsync;
> +        if (!tokenStream.getToken(&tokenAfterAsync, TokenStream::None))

Yeah, I guess so.  Maybe in that followup we can make specifying the modifer non-optional, or built into the function call itself, or *something* so it's more explicit and you don't just use one way unthinkingly and inevitably get it wrong.
Pushed by
Substantially rejigger token lookahead and modifier exceptions to function more like the spec.  r=arai
Consolidate all semicolon-matching into a single function, now that there's no quasi-bogus need for multiple variants.  r=arai
OS: Linux → All
Hardware: x86_64 → All
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Depends on: 1414805
You need to log in before you can comment on or make changes to this bug.