846933 - Comment class TokenStream

(In reply to Nicholas Nethercote [:njn] from comment #1) > Is it notably worse than every other class in SpiderMonkey? Someone's interested in working on it!

Philipp Matthias Schäfer [:fiveop]

Assignee

Comment 4

•

13 years ago

Attached patch WIP1: explanation of the (caveats of the) token scanning interface (obsolete) — Details — Splinter Review

So far I have only wrote a big comment above TokenStreams definition that tries to explain when it is safe to call which of the member functions related to the token scanning process. The important word is try. I am not sure if I covered everything. I stopped at this point, because first I would ask to what extend I may make changes to the code itself. I would mainly like to reorganise the order of the class members (functions and variables) so that their grouping makes more sense, but also like to make slight functionality changes. For example: As far as I can tell, though of course I'm not entirly sure, the restriction on peekToken (only with lookahead 0 or 1 instead of 0, 1, 2, and 3) is artificial (even performance wise). Without this restriction though, you could explain much more clearly how the buffer works in relation to the functions querying (information about) tokens. I'm asking, since I don't want to put work into something no one wants to review or someone thinks is a waste of time anyway.

Nicholas Nethercote [inactive]

Comment 5

•

13 years ago

Comment on attachment 723104 [details] [diff] [review] WIP1: explanation of the (caveats of the) token scanning interface Thanks for doing this. When you attach a patch, please check the "patch" box. When you do that, Bugzilla offers to show it in more readable forms. Please use eight lines of context in your diffs. Put this in your .hgrc file: [defaults] diff=-p -U 8 Note that bug 747831 is going to change TokenPos a bit -- |begin| and |end| will just be indexes into the buffer, instead of a lineno/column-index pair.

Attachment #723104 - Attachment is patch: true

Nicholas Nethercote [inactive]

Comment 6

•

13 years ago

Comment on attachment 723104 [details] [diff] [review] WIP1: explanation of the (caveats of the) token scanning interface This looks pretty good. I have a few minor comments below. I'm happy to do a proper review after you've fixed them; just request ":njn" for review. >+// In the first case it does not contain any meaningful data, in the second >+// case the fields .pos.begin.lineno and .pos.begin.index have been filled with >+// proper data. An interesting case: while working on bug 747831 I discovered that we erroneously access .pos.end before it has been set in any of the cases in getTokenInternal() -- .pos.end is set at the end of that function, but we call reportError/reportWarning and then end up in reportCompileErrorNumberVA, which can read .pos.end. I'm fixing that in bug 747831. >+// The following table lists in which situations it is secure to call each >+// listed function. No checks are made by the functions in non-debug builds. >+// >+// Function Name | Precondition (Changes to |lookahead|) I change this to "Precondition; changes to |lookahead|" or "Precondition / changes to |lookahead|". The parentheses read like they're additional information about the precondition. >+// ------------------+--------------------------------------------------------- >+// getToken | none (|lookahead > 0| -> |lookahead--|) >+// peekToken | 0 <= |lookahead| <= 1 (none) >+// peekTokenSameLine | see peekToken plus current token is not poisoned, see >+// | above (none) >+// matchToken | none (none) matchToken calls getToken, so it can change |lookahead|. >+// consumeKnownToken | none (none) >+// ungetToken | 0 <= |lookahead| < 3 and current token is not poisoned, >+// | see above (|lookahead++|) >+// currentToken | current token is not poisoned, see above (none) >+// <any functiont | >+// that queries | see currentToken (none) >+// information about | >+// the current token>| >+// >+// The behavior of the token scanning process (see getTokenInternal()) can be >+// modified by calling any but the last three of the above member functions with >+// an optional argument of type TokenStreamFlags. The two flags that do influence >+// the scanning process are TSF_OPERAND and TSF_KEYWORD_IS_NAME. However, they >+// will be ignored unless |lookahead == 0| holds. Yeah... that's pretty gross :/

Philipp Matthias Schäfer [:fiveop]

Assignee

Comment 7

•

13 years ago

Attached patch second draft (obsolete) — Details — Splinter Review

Thank you. I agree, ';' (or '/') make it clearer. And I missed the change of matchToken. What about major reordering of the member functions and variables (i.e. documentation via clear presentation)?

Attachment #723104 - Attachment is obsolete: true

Attachment #723628 - Flags: review?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 8

•

13 years ago

Comment on attachment 723628 [details] [diff] [review] second draft AFAICT this is your second patch for Mozilla? It's great to see newcomers, so well done for getting to this point. The patch looks pretty good. I've done a pretty nitpicky review and I have lots of minor suggestions (which shouldn't be hard to address) so I'd like to see one more iteration. So I'm giving you f+ instead of r+. Thanks! >diff -r d2ae88e9925b js/src/frontend/TokenStream.h >--- a/js/src/frontend/TokenStream.h Mon Mar 11 15:31:14 2013 -0400 >+++ b/js/src/frontend/TokenStream.h Mon Mar 11 21:45:18 2013 +0100 How did you generate this patch? It doesn't look like a standard hg patch as used for Mozilla code. As a result, the "review" link in Bugzilla doesn't show it properly. >+// TokenStream is the lexical scanner for Javascript source text. >+// >+// It takes a buffer jschars and linearly scans it into |Token|s. s/buffer/buffer of/. >+// At any moment the current token is the one right before the one that would >+// be returned by a call to getToken(). It feels odd to describe the present relative to the future. Perhaps something like this would be better: "The current token is the one pointed to by |cursor|. getToken() advances the cursor by one." >+ If a TokenStream was just created or >+// its seek() member function was just called, the current token is poisoned: >+// In the first case it does not contain any meaningful data, in the second >+// case the fields .pos.begin.lineno and .pos.begin.index have been filled with >+// proper data. In bug 747831 I changed .pos.begin and .pos.end into single uint32_t values. So whichever patch lands first, the other will need to be updated accordingly. Also, I added a Valgrind annotation (via MOZ_MAKE_MEM_UNDEFINED) to poison .pos.end, which might be worth mentioning. >+// The circular buffer allows to go back up to three tokens from the last s/allows to go back/lets us go back/. Although the circular buffer has four elements, |maxLookahead| constrains us to back at most two -- see the comment on |nTokens|. So please change "three" to "two". >+// scanned token. Internally, the relative number of backward steps that were >+// taken (via ungetToken()) after the last token was scanned is stored in >+// |lookahead|. Put a blank line here. (Well, a blank line beginning with "//".) >+// The following table lists in which situations it is secure to call each >+// listed function. No checks are made by the functions in non-debug builds. Perhaps: "Poisoning and checks only occur in debug builds." >+// Function Name | Precondition; changes to |lookahead| >+// ------------------+--------------------------------------------------------- >+// getToken | none; |lookahead > 0| -> |lookahead--| >+// peekToken | 0 <= |lookahead| <= 1; none >+// peekTokenSameLine | see peekToken plus current token is not poisoned, see >+// | above; none For the precondition, I'd say "0 <= |lookahead| <= 1, and current token is not poisoned". No need for the "see above". >+// matchToken | none; if successfull see getToken(), otherwise >+// | |lookahead| remains unchanged s/successfull/successful/. But I'd change the "changes to |lookahead| to "|lookahead > 0| -> |lookahead--| if the match succeeds". >+// consumeKnownToken | none; none >+// ungetToken | 0 <= |lookahead| < 3 and current token is not poisoned, You used "<=" above but "<" here; please use "<=" consistently for easier reading. Hmm... ungetToken has this assertion: JS_ASSERT(lookahead < ntokensMask); but it should probably be this instead (like in peekToken()): JS_ASSERT(lookahead < maxLookahead); Can you try that change? If it passes the jit-tests it should be fine. And then you should change the condition to "0 <= |lookahead| <= 1". >+// | see above; |lookahead++| No need for the ", see above". >+// currentToken | current token is not poisoned, see above; none No need for the ", see above". >+// <any functiont | s/functiont/function/. >+// that queries | see currentToken; none >+// information about | >+// the current token>| Perhaps naming an example of such a function would help, e.g. isCurrentTokenEquality(). >+// The behavior of the token scanning process (see getTokenInternal()) can be >+// modified by calling any but the last three of the above member functions with >+// an optional argument of type TokenStreamFlags. I'd change "any but the last three" to "the first four". >+ The two flags that do influence Remove the "do". >+// the scanning process are TSF_OPERAND and TSF_KEYWORD_IS_NAME. However, they >+// will be ignored unless |lookahead == 0| holds. This is kind of alarming when you first realize it :) Please add this: "Due to constraints of the grammar, this turns out to rarely be a problem in practice. See the mozilla.dev.tech.js-engine.internals thread entitled 'Bug in the scanner?' for more details (https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.tech.js-engine.internals/2JLH5jRcr7E)." >+// The methods seek() and tell() allow to rescan from a previous visited >+// location of the buffer. However, tell() may only be called if |lookahead == 0| >+// holds. Maybe mention seek() and tell() in the table above, rather than here?

Attachment #723628 - Flags: review?(n.nethercote) → feedback+

Philipp Matthias Schäfer [:fiveop]

Assignee

Comment 9

•

13 years ago

>Although the circular buffer has four elements, |maxLookahead| constrains us to >back at most two -- see the comment on |nTokens|. So please change "three" to >"two". >Hmm... ungetToken has this assertion: > > JS_ASSERT(lookahead < ntokensMask); > >but it should probably be this instead (like in peekToken()): > > JS_ASSERT(lookahead < maxLookahead); > >Can you try that change? If it passes the jit-tests it should be fine. And >then you should change the condition to "0 <= |lookahead| <= 1". This is one thing, that was quite confusing. Consider the following scenario. Let R be the last parsed token (by getTokenInternal), C be the current token, and T any other token. Currently peekToken() is allowed if R == C or if the buffer looks like [C, R, T, T] (up to circularity). But your JS_ASSERT(lookahead < maxLookahead) in ungetToken would allow the buffer to get in a state equivalent to [C, T, R, T]. If you want to allow a |lookahead| of up to 2. It should probably be JS_ASSERT(lookahead <= maxLookahead) in peekToken() and a JS_ASSERT(lookahead < maxLookahead) in ungetToken(). The latter is very clearly what we want. If |lookahead| is not at max value yet, you may call ungetToken() (and hence increase |lookahead|). But then the JS_ASSERT in peekToken() is really superfluous, because the only Function increasing |lookahead| is ungetToken() and there we already have the appropriate assert. (But we can leave the assert in peekToken() anyway). jorendorff has already changed peekToken as suggested in his patch for bug 846406 .

Philipp Matthias Schäfer [:fiveop]

Assignee

Comment 10

•

13 years ago

Attached patch third draft (obsolete) — Details — Splinter Review

I copied a .hgrc example from the wiki and the patch looks (to me) like others that have been attached in bugzilla now. I applied most of your changes. Since TokenPos and the behaviour of tell and seek have changed considerable in a patch for bug 846406, we no longer need to talk about poisoned current tokens. When the TokenStream is created, token[cursor] is filled with random data (at least from a TokenStream user point of view). As soon as one of the 'reading' function members is called, the TokenStream will always have a valid current token. I made the change in ungetToken() and ran the jittests, which passed. > Perhaps naming an example of such a function would help, e.g. isCurrentTokenEquality(). Since those functions will always return valid data (after the first (indirect) getToken call after TokenStream construction), I removed them from the list. Only functions that potentially change |cursor| or |lookahead| are listed now. This is also why I do not want tell() and seek() in that list, they do not control 'micromovement' on the token stream, but rather store and restore its state.

Attachment #723628 - Attachment is obsolete: true

Attachment #726866 - Flags: review?(n.nethercote)

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 11

•

13 years ago

Comment on attachment 726866 [details] [diff] [review] third draft Review of attachment 726866 [details] [diff] [review]: ----------------------------------------------------------------- Lovely patch; I wonder if a few blank lines could do the comment some good, perhaps?

Nicholas Nethercote [inactive]

Comment 12

•

13 years ago

Bug 747831 just landed on mozilla-inbound, which changes TokenStream some more... would you mind rebasing your patch in light of those changes? Sorry for the hassle. TokenStream normally goes long periods of time without change, but recently it's seen a lot of activity.

Nicholas Nethercote [inactive]

Comment 13

•

13 years ago

> Bug 747831 just landed on mozilla-inbound And was just backed out due to Windows bustage :( Hopefully I'll reland later today.

Jason Orendorff [:jorendorff]

Reporter

Comment 14

•

13 years ago

> >+// At any moment the current token is the one right before the one that would > >+// be returned by a call to getToken(). > > It feels odd to describe the present relative to the future. Perhaps something > like this would be better: "The current token is the one pointed to by |cursor|. > getToken() advances the cursor by one." I prefer what fiveop wrote. "getToken() advances the cursor by one" doesn't tell which token is returned (the one cursor points to before or after advancing). (Incidentally I say TokenSource just fundamentally has some screws loose in the API that cause a lot of confusion, and a good mass-renaming might help.)

Nicholas Nethercote [inactive]

Comment 15

•

13 years ago

Attachment #726866 - Flags: review?(n.nethercote) → review+

Philipp Matthias Schäfer [:fiveop]

Assignee

Comment 16

•

13 years ago

Attached patch (hopefully) final patch — Details — Splinter Review

I don't have commit rights.

Attachment #726866 - Attachment is obsolete: true

Attachment #729167 - Flags: checkin?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 17

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f8c8f4386fe0 Thanks, Philipp! If you're looking for another bug to work on, bug 798914 might be a good one. It'll take you outside the JS engine, but should still be pretty easy.

Ryan VanderMeulen (PTO, back July 27)

Updated

•

13 years ago

Attachment #729167 - Flags: checkin?(n.nethercote)

Ryan VanderMeulen (PTO, back July 27)

Comment 18

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/f8c8f4386fe0

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla22

WIP1: explanation of the (caveats of the) token scanning interface 13 years ago Philipp Matthias Schäfer [:fiveop] 2.70 KB, patch		Details \| Diff \| Splinter Review
second draft 13 years ago Philipp Matthias Schäfer [:fiveop] 3.12 KB, patch	n.nethercote : feedback+	Details \| Diff \| Splinter Review
third draft 13 years ago Philipp Matthias Schäfer [:fiveop] 3.50 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
(hopefully) final patch 13 years ago Philipp Matthias Schäfer [:fiveop] 3.49 KB, patch		Details \| Diff \| Splinter Review