Last Comment Bug 588061 - bogus initial tokenizer position
: bogus initial tokenizer position
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla8
Assigned To: Brendan Eich [:brendan]
:
Mentors:
Depends on:
Blocks: 568142
  Show dependency treegraph
 
Reported: 2010-08-17 08:55 PDT by Dave Herman [:dherman]
Modified: 2011-08-02 03:24 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
proposed fix (5.39 KB, patch)
2011-07-14 16:53 PDT, Brendan Eich [:brendan]
no flags Details | Diff | Review
alterna-fix, better (4.35 KB, patch)
2011-07-14 17:50 PDT, Brendan Eich [:brendan]
cdleary: review+
Details | Diff | Review

Description Dave Herman [:dherman] 2010-08-17 08:55:00 PDT
The TokenStream zeroes out its tokens buffer and initially points to the first token. So if the parser creates a node before advancing the token stream, the node gets the initial position 0:0, which is an invalid position.

This comes up if you try to use js::Parser::statements, because it calls ListNode::create before touching the token stream. I don't think this is currently used anywhere except JS_BufferIsCompilableUnit, which ignores token positions anyway. But it affects the upcoming parser API (see bug 533874) by producing an incorrect start position for the root node.

Dave
Comment 1 Dave Herman [:dherman] 2011-07-11 12:11:28 PDT
cdleary's suggestion: js::Parser::parse() should not be calling js::Parser::statements(), and the latter should be private since it depends on being called only in a context where the token stream is advanced past the first token. js::Parser::parse() should create a ListNode properly up front.

Dave
Comment 2 Brendan Eich [:brendan] 2011-07-14 16:53:32 PDT
Created attachment 546042 [details] [diff] [review]
proposed fix
Comment 3 Brendan Eich [:brendan] 2011-07-14 16:56:05 PDT
js::Parser::statements is private. We could have the TokenStream constructor always get a token and then unget. That seems like the best way to roll but it might cost noticeably on empty function bodies or programs. Comments?

/be
Comment 4 Brendan Eich [:brendan] 2011-07-14 17:50:22 PDT
Created attachment 546060 [details] [diff] [review]
alterna-fix, better
Comment 5 Chris Leary [:cdleary] (not checking bugmail) 2011-07-19 11:23:05 PDT
Comment on attachment 546060 [details] [diff] [review]
alterna-fix, better

Review of attachment 546060 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscan.cpp
@@ +234,5 @@
> +     * future) can create parse nodes with good source coordinates before they
> +     * explicitly get any tokens.
> +     *
> +     * FIXME: reopen bug 556486 and prime the pump here, leaving cursor at the
> +     * next token for the parser to consider.

I don't think this is a legitimate FIXME.
Comment 6 Brendan Eich [:brendan] 2011-07-20 14:48:44 PDT
(In reply to comment #5)
> Comment on attachment 546060 [details] [diff] [review] [review]
> alterna-fix, better
> 
> Review of attachment 546060 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsscan.cpp
> @@ +234,5 @@
> > +     * future) can create parse nodes with good source coordinates before they
> > +     * explicitly get any tokens.
> > +     *
> > +     * FIXME: reopen bug 556486 and prime the pump here, leaving cursor at the
> > +     * next token for the parser to consider.
> 
> I don't think this is a legitimate FIXME.

Right, cuz that bug is resolved WONTFIX -- is there a better FIXME comment, then? Should that bug be reopened based on stuff njn learned since?

/be
Comment 7 Chris Leary [:cdleary] (not checking bugmail) 2011-07-20 14:55:32 PDT
(In reply to comment #6)
> Right, cuz that bug is resolved WONTFIX -- is there a better FIXME comment,
> then? Should that bug be reopened based on stuff njn learned since?

Yeah, I think if njn could describe the new course of action (and how it differs from the potential of the failed approach) in a new bug, that'd be for the best. FIXMEs + open and actionable bugs FTW.
Comment 8 Nicholas Nethercote [:njn] 2011-07-20 16:24:24 PDT
Um, I don't claim any special knowledge here.  Could TokenStream be initialized so that the first token has a valid source location?  But that seems hacky.

The premature ListNode creation seems to be the root of the problem.  What happens if there is no more input -- you've created a ListNode unnecessarily.  (Does the ListNode leak in that case, BTW?)  Could js::Parser::statements avoid creating the ListNode until a token has been parsed?  Alternatively, could it call peekToken() before creating the ListNode, and bail early if peekToken() fails?
Comment 9 Brendan Eich [:brendan] 2011-07-21 10:18:25 PDT
(In reply to comment #8)
> Um, I don't claim any special knowledge here.  Could TokenStream be
> initialized so that the first token has a valid source location?  But that
> seems hacky.

The idea of pump-priming is non-hacky. The lexer always gets the next token and the parser consults it as if by peekToken(). That's all that is being proposed here. Is that a non-starter?

> The premature ListNode creation seems to be the root of the problem.

No, the parse tree always makes a list node even if it has no kids. This is a trivial memory cost for an uncommon empty-block/body/program source form, in order to avoid a non-uniform basis case.

> What
> happens if there is no more input -- you've created a ListNode
> unnecessarily.  (Does the ListNode leak in that case, BTW?)

Nothing leaks, parse nodes are arena-allocated with recycling, and freed en masse.

> Could
> js::Parser::statements avoid creating the ListNode until a token has been
> parsed?  Alternatively, could it call peekToken() before creating the
> ListNode, and bail early if peekToken() fails?

This was the less obviously good fix, which we discussed here and on IRC, since statements is called recursively and also from compileScript where the token stream has a valid next-token. Adding an extra peek just spends some cycles to prime the token position for the called-from-API-entry-point case that are a waste in all other cases.

Hence the desire to take the first-token cost only where needed: up front in TokenStream::init.

/be
/be
Comment 10 Brendan Eich [:brendan] 2011-07-21 10:23:09 PDT
Nick the reason you were summoned here was your bug 556486 comment 6.

/be
Comment 11 Nicholas Nethercote [:njn] 2011-07-21 17:28:53 PDT
(In reply to comment #10)
> Nick the reason you were summoned here was your bug 556486 comment 6.

Now I'm more confused.  That bug was resolved WONTFIX, so why is it relevant?  Maybe don't bother answering that, it sounds like you understand this much better than I do and have it under control.
Comment 12 Brendan Eich [:brendan] 2011-07-22 12:14:24 PDT
Sorry for the confusing bug comment citing. The upshot: cdleary and I are still considering a pump-priming model. Your work in bug 637549 reduced the need for such a change, but it still might be the fastest and smallest code footprint way to parse. Not sure.

Anyway, we can revisit at leisure. I'm removing the FIXME.

/be
Comment 13 Brendan Eich [:brendan] 2011-07-22 12:24:40 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/32ab46a3803b

/be
Comment 14 Matt Brubeck (:mbrubeck) 2011-07-22 14:15:47 PDT
Backed out on inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/47b309008c4e

This caused failures in jsreftest on Linux debug builds:
tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311365688.1311367284.20312.gz

REFTEST TEST-START | file:///home/cltbld/talos-slave/test/build/jsreftest/tests/jsreftest.html?test=ecma_5/misc/future-reserved-words.js
++DOMWINDOW == 22 (0x32921c8) [serial = 2968] [outer = 0x1d762a0]
497869: Implement FutureReservedWords per-spec
Assertion failure: tp->begin.lineno == tp->end.lineno, at /builds/slave/m-in-lnx64-dbg/build/js/src/jsscan.cpp:525
WARNING: An event was posted to a thread that will never run it (rejected): file /builds/slave/m-in-lnx64-dbg/build/xpcom/threads/nsThread.cpp, line 374
WARNING: leaking reference to nsTimerImpl: file /builds/slave/m-in-lnx64-dbg/build/xpcom/threads/nsTimerImpl.cpp, line 494
TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/jsreftest/tests/jsreftest.html?test=ecma_5/misc/future-reserved-words.js | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:07:16.379330
INFO | automation.py | Reading PID log: /tmp/tmpvP1wfgpidlog
PROCESS-CRASH | file:///home/cltbld/talos-slave/test/build/jsreftest/tests/jsreftest.html?test=ecma_5/misc/future-reserved-words.js | application crashed (minidump found)
Crash dump filename: /tmp/tmpyRWUUj/minidumps/3c81c007-fc10-2978-1c76f6fd-26f3e91d.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGABRT
Crash address: 0x1f400000843

Thread 0 (crashed)
 0  libpthread-2.11.so + 0xee6b
    rbx = 0x0000020d   r12 = 0x00000001   r13 = 0x8b69b6b8   r14 = 0xffffffff
    r15 = 0x8b62ca00   rip = 0xd360ee6b   rsp = 0x126cc838   rbp = 0x126cc870
    Found by: given as instruction pointer in context
 1  libxul.so!JS_Assert [jsutil.cpp:32ab46a3803b : 89 + 0x9]
    rip = 0xbe812cfc   rsp = 0x126cc840
    Found by: stack scanning
 2  libxul.so!js_ReportValueErrorFlags [jscntxt.cpp:32ab46a3803b : 1115 + 0x6]
    rip = 0xbe6d537c   rsp = 0x126cc870
    Found by: stack scanning
 3  0x7fff126cc94f
    rbx = 0x00000002   rip = 0x126cc950   rsp = 0x126cc878   rbp = 0xbe6d537c
    Found by: call frame info
 4  libxul.so!js::TokenStream::reportCompileErrorNumberVA [jsscan.cpp:32ab46a3803b : 525 + 0x29]
    rip = 0xbe7d6d2e   rsp = 0x126cc880
    Found by: stack scanning
Comment 15 Brendan Eich [:brendan] 2011-07-22 18:15:25 PDT
Rats, ran the tests on my opt shell build instead of dbg.

http://hg.mozilla.org/integration/mozilla-inbound/rev/99f8219d08d3

/be
Comment 16 Chris Leary [:cdleary] (not checking bugmail) 2011-07-22 19:44:41 PDT
(In reply to comment #15)

Just a heads up -- the procedure for inbound is like tracemonkey: mark as [inbound] in the whiteboard and the inbound-merger marks as RESOLVED/FIXED.
Comment 17 Marco Bonardo [::mak] 2011-07-25 05:56:55 PDT
http://hg.mozilla.org/mozilla-central/rev/32ab46a3803b
Comment 18 Marco Bonardo [::mak] 2011-07-25 06:02:48 PDT
this was backed out by mbrubeck and then pushed again.
This is the final changeset, sorry for bugspam:
http://hg.mozilla.org/mozilla-central/rev/99f8219d08d3
Comment 19 Brendan Eich [:brendan] 2011-07-26 08:45:35 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/735e1dc41b44

Forgot the test when transplanting from my tm tree to mozilla-inbound, d'oh.

/be
Comment 20 Brendan Eich [:brendan] 2011-07-26 08:47:04 PDT
Ay caramba. Comment 19 was for bug 671947.

/be
Comment 21 Bob Clary [:bc:] 2011-07-27 09:22:02 PDT
brendan: since evalcx is shell only, shouldn't you just skip the test cross-global-implicit-this.js in the browser rather than mark it as a fails?
Comment 22 Brendan Eich [:brendan] 2011-08-01 14:47:27 PDT
(In reply to comment #21)
> brendan: since evalcx is shell only, shouldn't you just skip the test
> cross-global-implicit-this.js in the browser rather than mark it as a fails?

I am a jstests.list copy-paste monkey. Fixed:

http://hg.mozilla.org/integration/mozilla-inbound/rev/14e1153b706c

/be
Comment 23 Marco Bonardo [::mak] 2011-08-02 03:24:27 PDT
merged http://hg.mozilla.org/mozilla-central/rev/14e1153b706c

Note You need to log in before you can comment on or make changes to this bug.