Closed Bug 898912 Opened 11 years ago Closed 11 years ago

Speed up the JS scanner yet again

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla25

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: [js:t])

Attachments

(11 files)

4.87 KB, patch
till
: review+
Details | Diff | Splinter Review
8.30 KB, patch
till
: review+
Details | Diff | Splinter Review
6.56 KB, patch
till
: review+
Details | Diff | Splinter Review
12.66 KB, patch
till
: review+
Details | Diff | Splinter Review
4.79 KB, patch
till
: review+
Details | Diff | Splinter Review
2.71 KB, patch
till
: review+
Details | Diff | Splinter Review
2.14 KB, patch
till
: review+
Details | Diff | Splinter Review
4.75 KB, patch
till
: review+
Details | Diff | Splinter Review
1.16 KB, patch
till
: review+
Details | Diff | Splinter Review
2.32 KB, patch
till
: review+
Details | Diff | Splinter Review
1.15 KB, patch
till
: review+
Details | Diff | Splinter Review
I have some ideas with which I hope to prove the title of bug 645598 wrong.
I tried a bunch of ideas but it turns out there was less fat there than I
thought.  I got a very small improvement, but these patches end up being useful
for cleaning up the code as much as the better performance.

Till, I'll ask you for review because (a) you've been poking around near this
code, (b) there isn't anybody else who knows it particularly well, and (c) it's
quite self-contained and so not too hard to understand.

Here are the instruction count changes from each patch for Parsemark and Unreal
(the asm.js demo).
                        
                        Parsemark           Unreal
original                2070.190            14876.821 
colon-is-one-char       2070.390 (1.000x)   14863.328 (1.001x)
clever-one-char         2059.733 (1.005x)   14731.248 (1.010x)
handle-EOL-separately   2059.987 (1.005x)   14735.960 (1.010x)
reorder                 2061.895 (1.004x)   14692.327 (1.013x)
GetDecimalInteger       2059.692 (1.005x)   14621.951 (1.017x)
betterIdent             2057.208 (1.006x)   14551.286 (1.022x)
rm-dead-isCurrent       (no change)         (no change)
rm-IN_HTML_COMMENT      2055.132 (1.007x)   14527.213 (1.024x)
fix-dp-comment          (no change)         (no change)
There's a nice optimization in the scanner, which is that it quickly recognizes
any chars that are known to be single-char tokens.  For example, if we see ';'
we know the token will be TOK_SEMI;  in contrast, if we see '*' it could be
TOK_STAR or TOK_MULASSIGN.

When we supported e4x, '::' was an operator.  But now that e4x is gone, ':' can
be added to the list of unambiguous single-char tokens.  It's a tiny win, but
worth doing mostly because it's one less case to worry about.

I also added ' ' to the "Space:" entry in the firstCharKinds comment, because
it was missing.
Attachment #783584 - Flags: review?(till)
Currently for single-char tokens such as ';', we do two lookups:  one into
firstCharKinds[], where we see it's OneChar, and then one into oneCharTokens[],
where we convert the jschar to a TokenKind.

This patch modifies the encoding of the OneChar kind in FirstCharKinds so that
these tables are effectively merged.  This avoids one array lookup on the hot
single-char token path.
Attachment #783587 - Flags: review?(till)
Currently Space and EOL are handled together.  This patch splits them apart,
which is basically perf-neutral but it will allow me to reorder the various
cases to reflect real-life frequencies in a subsequent patch.

Note that I had to unhoist the |newToken(-1)| call into every non-EOL case, so
I made it JS_ALWAYS_INLINE just to make sure it's inlined in every case.
Attachment #783588 - Flags: review?(till)
This patch removes the Plus, Equals and Dot kinds, because they're all rare
enough that handling them separately provides a vanishingly small improvement.
The patch also reorders the cases in getTokenInternal() to better match the
order seen in real code.

It's not obvious, but 90% of the changes here are just moving code around.
Attachment #783590 - Flags: review?(till)
If an identifer contains unicode escapes (e.g. \u1234) it must be handled
differently than an identifier which doesn't.  The old code had three separate
|hadUnicode| checks.  This patch rearranges things so there is only one, and
it's a tiny bit faster as well.
Attachment #783592 - Flags: review?(till)
These functions are dead.
Attachment #783594 - Flags: review?(till)
TokenStream's TSF_IN_HTML_COMMENT flag doesn't have any actual effect.  The
only place where we look at it is below the |skipline| label, and there the
only difference between the TSF_IN_HTML_COMMENT and !TSF_IN_HTML_COMMENT cases
is that the former clears TSF_IN_HTML_COMMENT!

I wonder how long this flag has been present without effect.
Attachment #783596 - Flags: review?(till)
If you look at the (c1kind == Dec) case in getTokenInternal(), you can see that
the setting of |decimalPoint| is *not* dependent on the presence of an
exponent.
Attachment #783598 - Flags: review?(till)
Comment on attachment 783584 [details] [diff] [review]
(part 1) - Treat ':' as a |OneChar| token now that e4x is gone and '::' is no longer supported.

Review of attachment 783584 [details] [diff] [review]:
-----------------------------------------------------------------

Including ' ' in Space seems like an exceedingly good idea.

::: js/src/frontend/TokenStream.cpp
@@ +307,5 @@
>       * The few token kinds satisfying these properties cover roughly 35--45%
>       * of the tokens seen in practice.
>       *
> +     * Nb: the following tables could be static, but initializing them this way
> +     * is much easier.  Don't worry, the time to initialize them for each

Nit: one space for new sentence.

@@ +308,5 @@
>       * of the tokens seen in practice.
>       *
> +     * Nb: the following tables could be static, but initializing them this way
> +     * is much easier.  Don't worry, the time to initialize them for each
> +     * TokenStream is trivial.  See bug 639420.

and here
Attachment #783584 - Flags: review?(till) → review+
Comment on attachment 783587 [details] [diff] [review]
(part 2) - Merge oneCharTokens[] into firstCharKinds[] to avoid a lookup in the |OneChar| token case.

Review of attachment 783587 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice!

::: js/src/frontend/TokenStream.cpp
@@ +973,5 @@
>  }
>  
>  enum FirstCharKind {
> +    // A jschar has the 'OneChar' kind if it, by itself, constitutes a valid
> +    // token that cannot also be a prefix of a longer token.  E.g. ';' has the

Nit: one space for new sentence. I'll stop mentioning these: they irritate me less and less, and I don't see too much value in creating review churn over them.

@@ +1045,3 @@
>  
> +// Ensure the table's entries are big enough.
> +JS_STATIC_ASSERT(LastCharKind < (1 << (sizeof(firstCharKinds[0]) * 8)));

Can we use static_assert now (https://groups.google.com/forum/#!topic/mozilla.dev.platform/AEppKj6K9vQ) or doesn't that apply to SpiderMonkey yet?

@@ +1128,5 @@
>      /*
>       * Look for an unambiguous single-char token.
>       */
> +    if (c1kind < OneChar_Max) {
> +        tt = (TokenKind)c1kind;

Change this to a C++-style cast, maybe?
Attachment #783587 - Flags: review?(till) → review+
Comment on attachment 783588 [details] [diff] [review]
(part 3) - Separate handling of EOLs and whitespace.

Review of attachment 783588 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.cpp
@@ +1119,1 @@
>          tt = (TokenKind)c1kind;

Mmh, looking through the file, the C-style casts should probably be updated in one go, and not in a piecemeal fashion.
Attachment #783588 - Flags: review?(till) → review+
Comment on attachment 783590 [details] [diff] [review]
(part 4) - Remove unnecessary FirstCharKinds and reorder FirstCharKind handling.

Review of attachment 783590 [details] [diff] [review]:
-----------------------------------------------------------------

> It's not obvious, but 90% of the changes here are just moving code around.

My kingdom for better diff tools!

I tried identifying the moves and looking more closely at the real changes, but I'm not 100% sure I separated them cleanly. Anyway, the result does look correct to me.

::: js/src/frontend/TokenStream.cpp
@@ +1461,5 @@
> +            hasExp = false;
> +            goto decimal_dot;
> +        }
> +        if (c == '.') {
> +            qc = getCharIgnoreEOL();

Can't you use matchChar, here? Or don't you want to, because it's not used elsewhere in this `case`?
Attachment #783590 - Flags: review?(till) → review+
Comment on attachment 783591 [details] [diff] [review]
(part 5) - Add GetDecimalInteger(), a version of GetPrefixInteger() specialized for decimal integers.

Review of attachment 783591 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #783591 - Flags: review?(till) → review+
Comment on attachment 783592 [details] [diff] [review]
(part 6) - Clean up the handling of identifiers.

Review of attachment 783592 [details] [diff] [review]:
-----------------------------------------------------------------

nice
Attachment #783592 - Flags: review?(till) → review+
Comment on attachment 783594 [details] [diff] [review]
(part 7) - Remove dead isCurrentToken* functions.

Review of attachment 783594 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #783594 - Flags: review?(till) → review+
Comment on attachment 783596 [details] [diff] [review]
(part 8) - Remove the no-effect TSF_IN_HTML_COMMENT flag.

Review of attachment 783596 [details] [diff] [review]:
-----------------------------------------------------------------

http://www.youtube.com/watch?v=3ojhZaP5_JI
Attachment #783596 - Flags: review?(till) → review+
Comment on attachment 783598 [details] [diff] [review]
(part 9) - Fix comment about Token.u.decimalPoint.

Review of attachment 783598 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

Very nice bunch of cleanups, this patch series!
Attachment #783598 - Flags: review?(till) → review+
> Can we use static_assert now
> (https://groups.google.com/forum/#!topic/mozilla.dev.platform/AEppKj6K9vQ)
> or doesn't that apply to SpiderMonkey yet?

Good question.  Ehsan changed all the MOZ_STATIC_ASSERT instances, but not the JS_STATIC_ASSERT ones.  I'll change this one, and I just filed bug 900275 for removing the others.

> Mmh, looking through the file, the C-style casts should probably be updated
> in one go, and not in a piecemeal fashion.

I'll do a follow-up patch.

> > +        if (c == '.') {
> > +            qc = getCharIgnoreEOL();
> 
> Can't you use matchChar, here? Or don't you want to, because it's not used
> elsewhere in this `case`?

matchChar() would work.  This patch is just moving code around... I will do a follow-up if I can be bothered :)
Comment on attachment 784105 [details] [diff] [review]
(part 11) - Use matchChar() to look for TOK_TRIPLEDOT.

Review of attachment 784105 [details] [diff] [review]:
-----------------------------------------------------------------

Most certainly.
Attachment #784105 - Flags: review?(till) → review+
Comment on attachment 784104 [details] [diff] [review]
(part 10) - Use C++-style casts in TokenStream.cpp.

Review of attachment 784104 [details] [diff] [review]:
-----------------------------------------------------------------

I hadn't realized that there were so few of them. Great.
Attachment #784104 - Flags: review?(till) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: