Okay i am trying to do this, but i am not sure what the best way is here and how this would affect performance. I think we could it likes this, and hope that compilers will be able to optimize it away. 

>if (jschar(EOF) != 0xffff) {
>  if (c == EOF) break;
>}

Tom S [:evilpie]

Assignee

Updated

•

13 years ago

Blocks: 652771

No longer depends on: 652771

Jeff Walden [:Waldo]

Comment 2

•

13 years ago

I looked at the scanner cases that do this.  Most are in cold code paths.  (E4X, identifier names with embedded Unicode escapes, &c.)  And for the remainder -- the first loop over characters in an identifier, the character immediately following a Dec or HexOct number -- I think we can pretty easily handle the cost of one extra (fast) comparison in what are tight loops already, especially compared to the cost of computing the actual number for those numeric literals (using jsdtoa, or even a loop repeatedly adding and multiplying doubles) or the cost of atomizing the identifier.  This feels like a premature optimization to be worrying about.

Tom S [:evilpie]

Assignee

Comment 3

•

13 years ago

Attached patch v1 — Details — Splinter Review

Applies on top of the update patch.

Assignee: general → evilpies

Status: NEW → ASSIGNED

Tom S [:evilpie]

Assignee

Updated

•

13 years ago

Attachment #547951 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 4

•

13 years ago

Comment on attachment 547951 [details] [diff] [review]
v1

Review of attachment 547951 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed (which I guess I'll do now, since as far as I know you're still in the wrong-level-to-commit-to-inbound conundrum).  Since the patch for bug 652771 depends on this one, I'll push this patch first, then that one.  But first: a try run of both of them.  :-)

::: js/src/jsscan.cpp
@@ +1151,5 @@
>                      return true;
>                  }
>                  line = temp;
>              }
> +            while (c != '\n' && && c != EOF && IsSpaceOrBOM2(c))

Um...that doesn't look like it compiles.  :-)  Or if it does, it's not what you wanted to do.

@@ +1350,5 @@
>       * Chars not in the range 0..127 are rare.  Getting them out of the way
>       * early allows subsequent checking to be faster.
>       */
> +    if (JS_UNLIKELY(c >= 128)) {
> +        if (c != EOF && IsSpaceOrBOM2(c)) {

I don't believe this one is necessary.  hasRawChars() called above this is true, and thus getRawChar() will always have a character to return.  Maybe add a JS_ASSERT(c != EOF) above the getRawChar() call to make that clearer?

@@ +1363,5 @@
>          tp = newToken(-1);
>          
>          /* '$' and '_' don't pass IsLetter, but they're < 128 so never appear here. */
>          JS_STATIC_ASSERT('$' < 128 && '_' < 128);
> +        if (c != EOF && IsLetter(c)) {

Nor is this one necessary, either.

Attachment #547951 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 5

•

13 years ago

Comment on attachment 547951 [details] [diff] [review]
v1

Review of attachment 547951 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscan.cpp
@@ +1418,5 @@
>  
>        identifier:
>          for (;;) {
>              c = getCharIgnoreEOL();
> +            if (c != EOF && !IsIdentifierPart(c)) {

Try server quickly flagged this as the wrong way to do it -- should have been |if (c == EOF) break;|, instead of looping forever.  But that was the only change that had to be made here, before pushing.

Jeff Walden [:Waldo]

Comment 6

•

13 years ago

Batched up bug 652771 and this bug into a single push, since they're kind of interdependent:

http://hg.mozilla.org/integration/mozilla-inbound/rev/54b8ca3b0c7a

Whiteboard: [mentor=jorendorff] → [mentor=jorendorff][inbound]

Target Milestone: --- → mozilla8

Tom S [:evilpie]

Assignee

Comment 7

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/54b8ca3b0c7a

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Tom S [:evilpie]

Assignee

Updated

•

13 years ago

Whiteboard: [mentor=jorendorff][inbound] → [mentor=jorendorff]

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

The tokenizer should not pass EOF directly into the unicode support code

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: evilpie, Assigned: evilpie)

References

Details

(Whiteboard: [mentor=jorendorff])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Comment 1

Updated

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Attachment

General

Description

File Name

Content Type