Last Comment Bug 672760 - The tokenizer should not pass EOF directly into the unicode support code
: The tokenizer should not pass EOF directly into the unicode support code
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
-- normal (vote)
: mozilla8
Assigned To: Tom Schuster [:evilpie]
: Jason Orendorff [:jorendorff]
Depends on:
Blocks: 652771
  Show dependency treegraph
Reported: 2011-07-20 04:49 PDT by Tom Schuster [:evilpie]
Modified: 2011-07-27 12:35 PDT (History)
4 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

v1 (5.37 KB, patch)
2011-07-23 13:01 PDT, Tom Schuster [:evilpie]
jwalden+bmo: review+
Details | Diff | Splinter Review

Description User image Tom Schuster [:evilpie] 2011-07-20 04:49:55 PDT

Comment 1 User image Tom Schuster [:evilpie] 2011-07-22 16:08:17 PDT
Okay i am trying to do this, but i am not sure what the best way is here and how this would affect performance. I think we could it likes this, and hope that compilers will be able to optimize it away. 

>if (jschar(EOF) != 0xffff) {
>  if (c == EOF) break;
Comment 2 User image Jeff Walden [:Waldo] (remove +bmo to email) 2011-07-22 16:38:37 PDT
I looked at the scanner cases that do this.  Most are in cold code paths.  (E4X, identifier names with embedded Unicode escapes, &c.)  And for the remainder -- the first loop over characters in an identifier, the character immediately following a Dec or HexOct number -- I think we can pretty easily handle the cost of one extra (fast) comparison in what are tight loops already, especially compared to the cost of computing the actual number for those numeric literals (using jsdtoa, or even a loop repeatedly adding and multiplying doubles) or the cost of atomizing the identifier.  This feels like a premature optimization to be worrying about.
Comment 3 User image Tom Schuster [:evilpie] 2011-07-23 13:01:40 PDT
Created attachment 547951 [details] [diff] [review]

Applies on top of the update patch.
Comment 4 User image Jeff Walden [:Waldo] (remove +bmo to email) 2011-07-26 14:01:49 PDT
Comment on attachment 547951 [details] [diff] [review]

Review of attachment 547951 [details] [diff] [review]:

r+ with comments addressed (which I guess I'll do now, since as far as I know you're still in the wrong-level-to-commit-to-inbound conundrum).  Since the patch for bug 652771 depends on this one, I'll push this patch first, then that one.  But first: a try run of both of them.  :-)

::: js/src/jsscan.cpp
@@ +1151,5 @@
>                      return true;
>                  }
>                  line = temp;
>              }
> +            while (c != '\n' && && c != EOF && IsSpaceOrBOM2(c))

Um...that doesn't look like it compiles.  :-)  Or if it does, it's not what you wanted to do.

@@ +1350,5 @@
>       * Chars not in the range 0..127 are rare.  Getting them out of the way
>       * early allows subsequent checking to be faster.
>       */
> +    if (JS_UNLIKELY(c >= 128)) {
> +        if (c != EOF && IsSpaceOrBOM2(c)) {

I don't believe this one is necessary.  hasRawChars() called above this is true, and thus getRawChar() will always have a character to return.  Maybe add a JS_ASSERT(c != EOF) above the getRawChar() call to make that clearer?

@@ +1363,5 @@
>          tp = newToken(-1);
>          /* '$' and '_' don't pass IsLetter, but they're < 128 so never appear here. */
>          JS_STATIC_ASSERT('$' < 128 && '_' < 128);
> +        if (c != EOF && IsLetter(c)) {

Nor is this one necessary, either.
Comment 5 User image Jeff Walden [:Waldo] (remove +bmo to email) 2011-07-27 00:14:05 PDT
Comment on attachment 547951 [details] [diff] [review]

Review of attachment 547951 [details] [diff] [review]:

::: js/src/jsscan.cpp
@@ +1418,5 @@
>        identifier:
>          for (;;) {
>              c = getCharIgnoreEOL();
> +            if (c != EOF && !IsIdentifierPart(c)) {

Try server quickly flagged this as the wrong way to do it -- should have been |if (c == EOF) break;|, instead of looping forever.  But that was the only change that had to be made here, before pushing.
Comment 6 User image Jeff Walden [:Waldo] (remove +bmo to email) 2011-07-27 00:15:00 PDT
Batched up bug 652771 and this bug into a single push, since they're kind of interdependent:
Comment 7 User image Tom Schuster [:evilpie] 2011-07-27 12:35:05 PDT

Note You need to log in before you can comment on or make changes to this bug.