Last Comment Bug 846933 - Comment class TokenStream
: Comment class TokenStream
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Other Branch
: All All
: -- normal (vote)
: mozilla22
Assigned To: Philipp Matthias Schäfer [:fiveop]
: general
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-03-01 14:23 PST by Jason Orendorff [:jorendorff]
Modified: 2013-03-26 07:22 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP1: explanation of the (caveats of the) token scanning interface (2.70 KB, patch)
2013-03-09 11:24 PST, Philipp Matthias Schäfer [:fiveop]
no flags Details | Diff | Review
second draft (3.12 KB, patch)
2013-03-11 13:50 PDT, Philipp Matthias Schäfer [:fiveop]
n.nethercote: feedback+
Details | Diff | Review
third draft (3.50 KB, patch)
2013-03-19 13:19 PDT, Philipp Matthias Schäfer [:fiveop]
n.nethercote: review+
Details | Diff | Review
(hopefully) final patch (3.49 KB, patch)
2013-03-25 13:43 PDT, Philipp Matthias Schäfer [:fiveop]
no flags Details | Diff | Review

Description Jason Orendorff [:jorendorff] 2013-03-01 14:23:50 PST
It has a one-line comment on each data member, so it could be worse. But not much.
Comment 1 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-03 02:08:14 PST
Is it notably worse than every other class in SpiderMonkey?
Comment 2 Jeff Walden [:Waldo] (remove +bmo to email) 2013-03-03 12:00:30 PST
The soft bigotry of low expectations.  :-)
Comment 3 Jason Orendorff [:jorendorff] 2013-03-04 08:59:24 PST
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Is it notably worse than every other class in SpiderMonkey?

Someone's interested in working on it!
Comment 4 Philipp Matthias Schäfer [:fiveop] 2013-03-09 11:24:18 PST
Created attachment 723104 [details] [diff] [review]
WIP1: explanation of the (caveats of the) token scanning interface

So far I have only wrote a big comment above TokenStreams definition that tries to explain when it is safe to call which of the member functions related to the token scanning process.

The important word is try. I am not sure if I covered everything.

I stopped at this point, because first I would ask to what extend I may make changes to the code itself. I would mainly like to reorganise the order of the class members (functions and variables) so that their grouping makes more sense, but also like to make slight functionality changes. For example: As far as I can tell, though of course I'm not entirly sure, the restriction on peekToken (only with lookahead 0 or 1 instead of 0, 1, 2, and 3) is artificial (even performance wise). Without this restriction though, you could explain much more clearly how the buffer works in relation to the functions querying (information about) tokens.

I'm asking, since I don't want to put work into something no one wants to review or someone thinks is a waste of time anyway.
Comment 5 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-09 12:30:15 PST
Comment on attachment 723104 [details] [diff] [review]
WIP1: explanation of the (caveats of the) token scanning interface

Thanks for doing this.

When you attach a patch, please check the "patch" box.  When you do that, Bugzilla offers to show it in more readable forms.

Please use eight lines of context in your diffs.  Put this in your .hgrc file:

  [defaults]
  diff=-p -U 8

Note that bug 747831 is going to change TokenPos a bit -- |begin| and |end| will just be indexes into the buffer, instead of a lineno/column-index pair.
Comment 6 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-09 12:42:17 PST
Comment on attachment 723104 [details] [diff] [review]
WIP1: explanation of the (caveats of the) token scanning interface

This looks pretty good.  I have a few minor comments below.  I'm happy to do a proper review after you've fixed them;  just request ":njn" for review.


>+// In the first case it does not contain any meaningful data, in the second
>+// case the fields .pos.begin.lineno and .pos.begin.index have been filled with
>+// proper data.

An interesting case:  while working on bug 747831 I discovered that we erroneously access .pos.end before it has been set in any of the cases in getTokenInternal() -- .pos.end is set at the end of that function, but we call reportError/reportWarning and then end up in reportCompileErrorNumberVA, which can read .pos.end.  I'm fixing that in bug 747831.


>+// The following table lists in which situations it is secure to call each
>+// listed function. No checks are made by the functions in non-debug builds.
>+//
>+// Function Name     | Precondition (Changes to |lookahead|)

I change this to "Precondition;  changes to |lookahead|" or "Precondition / changes to |lookahead|".  The parentheses read like they're additional information about the precondition.

>+// ------------------+---------------------------------------------------------
>+// getToken          | none (|lookahead > 0| -> |lookahead--|)
>+// peekToken         | 0 <= |lookahead| <= 1 (none)
>+// peekTokenSameLine | see peekToken plus current token is not poisoned, see
>+//                   | above (none)
>+// matchToken        | none (none)

matchToken calls getToken, so it can change |lookahead|.

>+// consumeKnownToken | none (none)
>+// ungetToken        | 0 <= |lookahead| < 3 and current token is not poisoned,
>+//                   | see above (|lookahead++|)
>+// currentToken      | current token is not poisoned, see above (none)
>+// <any functiont    |
>+// that queries      | see currentToken (none)
>+// information about |
>+// the current token>|
>+//
>+// The behavior of the token scanning process (see getTokenInternal()) can be
>+// modified by calling any but the last three of the above member functions with
>+// an optional argument of type TokenStreamFlags. The two flags that do influence
>+// the scanning process are TSF_OPERAND and TSF_KEYWORD_IS_NAME. However, they
>+// will be ignored unless |lookahead == 0| holds.

Yeah... that's pretty gross :/
Comment 7 Philipp Matthias Schäfer [:fiveop] 2013-03-11 13:50:56 PDT
Created attachment 723628 [details] [diff] [review]
second draft

Thank you.

I agree, ';' (or '/') make it clearer. And I missed the change of matchToken.

What about major reordering of the member functions and variables (i.e. documentation via clear presentation)?
Comment 8 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-12 10:51:57 PDT
Comment on attachment 723628 [details] [diff] [review]
second draft

AFAICT this is your second patch for Mozilla?  It's great to see newcomers, so well done for getting to this point.

The patch looks pretty good.  I've done a pretty nitpicky review and I have lots of minor suggestions (which shouldn't be hard to address) so I'd like to see one more iteration.  So I'm giving you f+ instead of r+.

Thanks!


>diff -r d2ae88e9925b js/src/frontend/TokenStream.h
>--- a/js/src/frontend/TokenStream.h	Mon Mar 11 15:31:14 2013 -0400
>+++ b/js/src/frontend/TokenStream.h	Mon Mar 11 21:45:18 2013 +0100

How did you generate this patch?  It doesn't look like a standard hg patch as used for Mozilla code.  As a result, the "review" link in Bugzilla doesn't show it properly.


>+// TokenStream is the lexical scanner for Javascript source text.
>+//
>+// It takes a buffer jschars and linearly scans it into |Token|s.

s/buffer/buffer of/.


>+// At any moment the current token is the one right before the one that would
>+// be returned by a call to getToken().

It feels odd to describe the present relative to the future.  Perhaps something like this would be better:  "The current token is the one pointed to by |cursor|.  getToken() advances the cursor by one."


>+   If a TokenStream was just created or
>+// its seek() member function was just called, the current token is poisoned:
>+// In the first case it does not contain any meaningful data, in the second
>+// case the fields .pos.begin.lineno and .pos.begin.index have been filled with
>+// proper data.

In bug 747831 I changed .pos.begin and .pos.end into single uint32_t values.  So whichever patch lands first, the other will need to be updated accordingly.  Also, I added a Valgrind annotation (via MOZ_MAKE_MEM_UNDEFINED) to poison .pos.end, which might be worth mentioning.


>+// The circular buffer allows to go back up to three tokens from the last

s/allows to go back/lets us go back/.  Although the circular buffer has four elements, |maxLookahead| constrains us to back at most two -- see the comment on |nTokens|.  So please change "three" to "two".


>+// scanned token. Internally, the relative number of backward steps that were
>+// taken (via ungetToken()) after the last token was scanned is stored in
>+// |lookahead|.

Put a blank line here.  (Well, a blank line beginning with "//".)


>+// The following table lists in which situations it is secure to call each
>+// listed function. No checks are made by the functions in non-debug builds.

Perhaps:  "Poisoning and checks only occur in debug builds."


>+// Function Name     | Precondition; changes to |lookahead|
>+// ------------------+---------------------------------------------------------
>+// getToken          | none; |lookahead > 0| -> |lookahead--|
>+// peekToken         | 0 <= |lookahead| <= 1; none
>+// peekTokenSameLine | see peekToken plus current token is not poisoned, see
>+//                   | above; none

For the precondition, I'd say "0 <= |lookahead| <= 1, and current token is not poisoned".  No need for the "see above".


>+// matchToken        | none; if successfull see getToken(), otherwise
>+//                   | |lookahead| remains unchanged

s/successfull/successful/.  But I'd change the "changes to |lookahead| to "|lookahead > 0| -> |lookahead--| if the match succeeds".


>+// consumeKnownToken | none; none
>+// ungetToken        | 0 <= |lookahead| < 3 and current token is not poisoned,

You used "<=" above but "<" here;  please use "<=" consistently for easier reading.

Hmm... ungetToken has this assertion:

    JS_ASSERT(lookahead < ntokensMask);

but it should probably be this instead (like in peekToken()):

    JS_ASSERT(lookahead < maxLookahead);

Can you try that change?  If it passes the jit-tests it should be fine.  And then you should change the condition to "0 <= |lookahead| <= 1".


>+//                   | see above; |lookahead++|

No need for the ", see above".


>+// currentToken      | current token is not poisoned, see above; none

No need for the ", see above".


>+// <any functiont    |

s/functiont/function/.


>+// that queries      | see currentToken; none
>+// information about |
>+// the current token>|

Perhaps naming an example of such a function would help, e.g. isCurrentTokenEquality().


>+// The behavior of the token scanning process (see getTokenInternal()) can be
>+// modified by calling any but the last three of the above member functions with
>+// an optional argument of type TokenStreamFlags. 

I'd change "any but the last three" to "the first four".


>+   The two flags that do influence

Remove the "do".


>+// the scanning process are TSF_OPERAND and TSF_KEYWORD_IS_NAME. However, they
>+// will be ignored unless |lookahead == 0| holds.

This is kind of alarming when you first realize it :)  Please add this:  "Due to constraints of the grammar, this turns out to rarely be a problem in practice.  See the mozilla.dev.tech.js-engine.internals thread entitled 'Bug in the scanner?' for more details (https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.tech.js-engine.internals/2JLH5jRcr7E)."


>+// The methods seek() and tell() allow to rescan from a previous visited
>+// location of the buffer. However, tell() may only be called if |lookahead == 0|
>+// holds.

Maybe mention seek() and tell() in the table above, rather than here?
Comment 9 Philipp Matthias Schäfer [:fiveop] 2013-03-12 12:31:53 PDT
>Although the circular buffer has four elements, |maxLookahead| constrains us to
>back at most two -- see the comment on |nTokens|.  So please change "three" to
>"two".

>Hmm... ungetToken has this assertion:
>
>    JS_ASSERT(lookahead < ntokensMask);
>
>but it should probably be this instead (like in peekToken()):
>
>    JS_ASSERT(lookahead < maxLookahead);
>
>Can you try that change?  If it passes the jit-tests it should be fine.  And
>then you should change the condition to "0 <= |lookahead| <= 1".

This is one thing, that was quite confusing. Consider the following scenario. Let R be the last parsed token (by getTokenInternal), C be the current token, and T any other token. Currently peekToken() is allowed if R == C or if the buffer looks like [C, R, T, T] (up to circularity). But your JS_ASSERT(lookahead < maxLookahead) in ungetToken would allow the buffer to get in a state equivalent to [C, T, R, T].

If you want to allow a |lookahead| of up to 2. It should probably be JS_ASSERT(lookahead <= maxLookahead) in peekToken() and a JS_ASSERT(lookahead < maxLookahead) in ungetToken(). The latter is very clearly what we want. If |lookahead| is not at max value yet, you may call ungetToken() (and hence increase |lookahead|). But then the JS_ASSERT in peekToken() is really superfluous, because the only Function increasing |lookahead| is ungetToken() and there we already have the appropriate assert. (But we can leave the assert in peekToken() anyway).

jorendorff has already changed peekToken as suggested in his patch for bug 846406 .
Comment 10 Philipp Matthias Schäfer [:fiveop] 2013-03-19 13:19:23 PDT
Created attachment 726866 [details] [diff] [review]
third draft

I copied a .hgrc example from the wiki and the patch looks (to me) like others that have been attached in bugzilla now.

I applied most of your changes. Since TokenPos and the behaviour of tell and seek have changed considerable in a patch for bug 846406, we no longer need to talk about poisoned current tokens. When the TokenStream is created, token[cursor] is filled with random data (at least from a TokenStream user point of view). As soon as one of the 'reading' function members is called, the TokenStream will always have a valid current token.

I made the change in ungetToken() and ran the jittests, which passed.

> Perhaps naming an example of such a function would help, e.g. isCurrentTokenEquality().

Since those functions will always return valid data (after the first (indirect) getToken call after TokenStream construction), I removed them from the list. Only functions that potentially change |cursor| or |lookahead| are listed now.

This is also why I do not want tell() and seek() in that list, they do not control 'micromovement' on the token stream, but rather store and restore its state.
Comment 11 :Ms2ger 2013-03-19 13:34:11 PDT
Comment on attachment 726866 [details] [diff] [review]
third draft

Review of attachment 726866 [details] [diff] [review]:
-----------------------------------------------------------------

Lovely patch; I wonder if a few blank lines could do the comment some good, perhaps?
Comment 12 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-19 13:44:16 PDT
Bug 747831 just landed on mozilla-inbound, which changes TokenStream some more... would you mind rebasing your patch in light of those changes?  Sorry for the hassle.  TokenStream normally goes long periods of time without change, but recently it's seen a lot of activity.
Comment 13 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-19 13:49:02 PDT
> Bug 747831 just landed on mozilla-inbound

And was just backed out due to Windows bustage :(  Hopefully I'll reland later today.
Comment 14 Jason Orendorff [:jorendorff] 2013-03-19 17:21:13 PDT
> >+// At any moment the current token is the one right before the one that would
> >+// be returned by a call to getToken().
>
> It feels odd to describe the present relative to the future.  Perhaps something
> like this would be better:  "The current token is the one pointed to by |cursor|.
> getToken() advances the cursor by one."

I prefer what fiveop wrote. "getToken() advances the cursor by one" doesn't tell which token is returned (the one cursor points to before or after advancing).

(Incidentally I say TokenSource just fundamentally has some screws loose in the API that cause a lot of confusion, and a good mass-renaming might help.)
Comment 15 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-24 15:42:24 PDT
Comment on attachment 726866 [details] [diff] [review]
third draft

Review of attachment 726866 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  Just a few minor suggestions.  Sorry I took so long to re-review.

Do you have commit permissions?  If not, I'm happy to land it once you attach the final patch.  Thanks again.

::: js/src/frontend/TokenStream.h
@@ +425,5 @@
> +// scanned token. Internally, the relative number of backward steps that were
> +// taken (via ungetToken()) after the last token was scanned is stored in
> +// |lookahead|.
> +//
> +// The following table lists in which situations it is secure to call each

"safe" sounds better than "secure", to my ears.

@@ +430,5 @@
> +// listed function. No checks are made by the functions in non-debug builds.
> +//
> +// Function Name     | Precondition; changes to |lookahead|
> +// ------------------+---------------------------------------------------------
> +// getToken          | none; |lookahead > 0| -> |lookahead--|

I think "if |lookahead > 0| then |lookahead--|" would be clearer for the change.

@@ +431,5 @@
> +//
> +// Function Name     | Precondition; changes to |lookahead|
> +// ------------------+---------------------------------------------------------
> +// getToken          | none; |lookahead > 0| -> |lookahead--|
> +// peekToken         | 0 <= |lookahead| <= |maxLookahead|; none

This assertion is one that must hold at all times, and it's not actually checked in peekToken().  So I think the precondition for peekToken is "none", like getToken().

@@ +433,5 @@
> +// ------------------+---------------------------------------------------------
> +// getToken          | none; |lookahead > 0| -> |lookahead--|
> +// peekToken         | 0 <= |lookahead| <= |maxLookahead|; none
> +// peekTokenSameLine | see peekToken; none
> +// matchToken        | none; |lookahead > 0| -> |lookahead--| if the match

I think "if |lookahead > 0 and the match succeeds| then |lookahead--|" would be clearer.
Comment 16 Philipp Matthias Schäfer [:fiveop] 2013-03-25 13:43:53 PDT
Created attachment 729167 [details] [diff] [review]
(hopefully) final patch

I don't have commit rights.
Comment 17 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-25 17:48:26 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8c8f4386fe0

Thanks, Philipp!  If you're looking for another bug to work on, bug 798914 might be a good one.  It'll take you outside the JS engine, but should still be pretty easy.
Comment 18 Ryan VanderMeulen [:RyanVM] 2013-03-26 07:22:14 PDT
https://hg.mozilla.org/mozilla-central/rev/f8c8f4386fe0

Note You need to log in before you can comment on or make changes to this bug.