Closed Bug 672760 Opened 13 years ago Closed 13 years ago

The tokenizer should not pass EOF directly into the unicode support code

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: evilpie, Assigned: evilpie)

References

Details

(Whiteboard: [mentor=jorendorff])

Attachments

(1 file)

      No description provided.
Depends on: 652771
Whiteboard: [mentor=jorendorff]
Okay i am trying to do this, but i am not sure what the best way is here and how this would affect performance. I think we could it likes this, and hope that compilers will be able to optimize it away. 

>if (jschar(EOF) != 0xffff) {
>  if (c == EOF) break;
>}
Blocks: 652771
No longer depends on: 652771
I looked at the scanner cases that do this.  Most are in cold code paths.  (E4X, identifier names with embedded Unicode escapes, &c.)  And for the remainder -- the first loop over characters in an identifier, the character immediately following a Dec or HexOct number -- I think we can pretty easily handle the cost of one extra (fast) comparison in what are tight loops already, especially compared to the cost of computing the actual number for those numeric literals (using jsdtoa, or even a loop repeatedly adding and multiplying doubles) or the cost of atomizing the identifier.  This feels like a premature optimization to be worrying about.
Attached patch v1Splinter Review
Applies on top of the update patch.
Assignee: general → evilpies
Status: NEW → ASSIGNED
Attachment #547951 - Flags: review?(jwalden+bmo)
Comment on attachment 547951 [details] [diff] [review]
v1

Review of attachment 547951 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed (which I guess I'll do now, since as far as I know you're still in the wrong-level-to-commit-to-inbound conundrum).  Since the patch for bug 652771 depends on this one, I'll push this patch first, then that one.  But first: a try run of both of them.  :-)

::: js/src/jsscan.cpp
@@ +1151,5 @@
>                      return true;
>                  }
>                  line = temp;
>              }
> +            while (c != '\n' && && c != EOF && IsSpaceOrBOM2(c))

Um...that doesn't look like it compiles.  :-)  Or if it does, it's not what you wanted to do.

@@ +1350,5 @@
>       * Chars not in the range 0..127 are rare.  Getting them out of the way
>       * early allows subsequent checking to be faster.
>       */
> +    if (JS_UNLIKELY(c >= 128)) {
> +        if (c != EOF && IsSpaceOrBOM2(c)) {

I don't believe this one is necessary.  hasRawChars() called above this is true, and thus getRawChar() will always have a character to return.  Maybe add a JS_ASSERT(c != EOF) above the getRawChar() call to make that clearer?

@@ +1363,5 @@
>          tp = newToken(-1);
>          
>          /* '$' and '_' don't pass IsLetter, but they're < 128 so never appear here. */
>          JS_STATIC_ASSERT('$' < 128 && '_' < 128);
> +        if (c != EOF && IsLetter(c)) {

Nor is this one necessary, either.
Attachment #547951 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 547951 [details] [diff] [review]
v1

Review of attachment 547951 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscan.cpp
@@ +1418,5 @@
>  
>        identifier:
>          for (;;) {
>              c = getCharIgnoreEOL();
> +            if (c != EOF && !IsIdentifierPart(c)) {

Try server quickly flagged this as the wrong way to do it -- should have been |if (c == EOF) break;|, instead of looping forever.  But that was the only change that had to be made here, before pushing.
Batched up bug 652771 and this bug into a single push, since they're kind of interdependent:

http://hg.mozilla.org/integration/mozilla-inbound/rev/54b8ca3b0c7a
Whiteboard: [mentor=jorendorff] → [mentor=jorendorff][inbound]
Target Milestone: --- → mozilla8
http://hg.mozilla.org/mozilla-central/rev/54b8ca3b0c7a
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [mentor=jorendorff][inbound] → [mentor=jorendorff]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: