Closed
Bug 898912
Opened 11 years ago
Closed 11 years ago
Speed up the JS scanner yet again
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
(Whiteboard: [js:t])
Attachments
(11 files)
4.87 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
8.30 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
6.56 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
12.66 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
4.79 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
2.71 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
2.14 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
4.75 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
1.16 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
2.32 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
1.15 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
I have some ideas with which I hope to prove the title of bug 645598 wrong.
Assignee | ||
Comment 1•11 years ago
|
||
I tried a bunch of ideas but it turns out there was less fat there than I thought. I got a very small improvement, but these patches end up being useful for cleaning up the code as much as the better performance. Till, I'll ask you for review because (a) you've been poking around near this code, (b) there isn't anybody else who knows it particularly well, and (c) it's quite self-contained and so not too hard to understand. Here are the instruction count changes from each patch for Parsemark and Unreal (the asm.js demo). Parsemark Unreal original 2070.190 14876.821 colon-is-one-char 2070.390 (1.000x) 14863.328 (1.001x) clever-one-char 2059.733 (1.005x) 14731.248 (1.010x) handle-EOL-separately 2059.987 (1.005x) 14735.960 (1.010x) reorder 2061.895 (1.004x) 14692.327 (1.013x) GetDecimalInteger 2059.692 (1.005x) 14621.951 (1.017x) betterIdent 2057.208 (1.006x) 14551.286 (1.022x) rm-dead-isCurrent (no change) (no change) rm-IN_HTML_COMMENT 2055.132 (1.007x) 14527.213 (1.024x) fix-dp-comment (no change) (no change)
Assignee | ||
Comment 2•11 years ago
|
||
There's a nice optimization in the scanner, which is that it quickly recognizes any chars that are known to be single-char tokens. For example, if we see ';' we know the token will be TOK_SEMI; in contrast, if we see '*' it could be TOK_STAR or TOK_MULASSIGN. When we supported e4x, '::' was an operator. But now that e4x is gone, ':' can be added to the list of unambiguous single-char tokens. It's a tiny win, but worth doing mostly because it's one less case to worry about. I also added ' ' to the "Space:" entry in the firstCharKinds comment, because it was missing.
Attachment #783584 -
Flags: review?(till)
Assignee | ||
Comment 3•11 years ago
|
||
Currently for single-char tokens such as ';', we do two lookups: one into firstCharKinds[], where we see it's OneChar, and then one into oneCharTokens[], where we convert the jschar to a TokenKind. This patch modifies the encoding of the OneChar kind in FirstCharKinds so that these tables are effectively merged. This avoids one array lookup on the hot single-char token path.
Attachment #783587 -
Flags: review?(till)
Assignee | ||
Comment 4•11 years ago
|
||
Currently Space and EOL are handled together. This patch splits them apart, which is basically perf-neutral but it will allow me to reorder the various cases to reflect real-life frequencies in a subsequent patch. Note that I had to unhoist the |newToken(-1)| call into every non-EOL case, so I made it JS_ALWAYS_INLINE just to make sure it's inlined in every case.
Attachment #783588 -
Flags: review?(till)
Assignee | ||
Comment 5•11 years ago
|
||
This patch removes the Plus, Equals and Dot kinds, because they're all rare enough that handling them separately provides a vanishingly small improvement. The patch also reorders the cases in getTokenInternal() to better match the order seen in real code. It's not obvious, but 90% of the changes here are just moving code around.
Attachment #783590 -
Flags: review?(till)
Assignee | ||
Comment 6•11 years ago
|
||
Attachment #783591 -
Flags: review?(till)
Assignee | ||
Comment 7•11 years ago
|
||
If an identifer contains unicode escapes (e.g. \u1234) it must be handled differently than an identifier which doesn't. The old code had three separate |hadUnicode| checks. This patch rearranges things so there is only one, and it's a tiny bit faster as well.
Attachment #783592 -
Flags: review?(till)
Assignee | ||
Comment 8•11 years ago
|
||
These functions are dead.
Attachment #783594 -
Flags: review?(till)
Assignee | ||
Comment 9•11 years ago
|
||
TokenStream's TSF_IN_HTML_COMMENT flag doesn't have any actual effect. The only place where we look at it is below the |skipline| label, and there the only difference between the TSF_IN_HTML_COMMENT and !TSF_IN_HTML_COMMENT cases is that the former clears TSF_IN_HTML_COMMENT! I wonder how long this flag has been present without effect.
Attachment #783596 -
Flags: review?(till)
Assignee | ||
Comment 10•11 years ago
|
||
If you look at the (c1kind == Dec) case in getTokenInternal(), you can see that the setting of |decimalPoint| is *not* dependent on the presence of an exponent.
Attachment #783598 -
Flags: review?(till)
Comment 11•11 years ago
|
||
Comment on attachment 783584 [details] [diff] [review] (part 1) - Treat ':' as a |OneChar| token now that e4x is gone and '::' is no longer supported. Review of attachment 783584 [details] [diff] [review]: ----------------------------------------------------------------- Including ' ' in Space seems like an exceedingly good idea. ::: js/src/frontend/TokenStream.cpp @@ +307,5 @@ > * The few token kinds satisfying these properties cover roughly 35--45% > * of the tokens seen in practice. > * > + * Nb: the following tables could be static, but initializing them this way > + * is much easier. Don't worry, the time to initialize them for each Nit: one space for new sentence. @@ +308,5 @@ > * of the tokens seen in practice. > * > + * Nb: the following tables could be static, but initializing them this way > + * is much easier. Don't worry, the time to initialize them for each > + * TokenStream is trivial. See bug 639420. and here
Attachment #783584 -
Flags: review?(till) → review+
Comment 12•11 years ago
|
||
Comment on attachment 783587 [details] [diff] [review] (part 2) - Merge oneCharTokens[] into firstCharKinds[] to avoid a lookup in the |OneChar| token case. Review of attachment 783587 [details] [diff] [review]: ----------------------------------------------------------------- Very nice! ::: js/src/frontend/TokenStream.cpp @@ +973,5 @@ > } > > enum FirstCharKind { > + // A jschar has the 'OneChar' kind if it, by itself, constitutes a valid > + // token that cannot also be a prefix of a longer token. E.g. ';' has the Nit: one space for new sentence. I'll stop mentioning these: they irritate me less and less, and I don't see too much value in creating review churn over them. @@ +1045,3 @@ > > +// Ensure the table's entries are big enough. > +JS_STATIC_ASSERT(LastCharKind < (1 << (sizeof(firstCharKinds[0]) * 8))); Can we use static_assert now (https://groups.google.com/forum/#!topic/mozilla.dev.platform/AEppKj6K9vQ) or doesn't that apply to SpiderMonkey yet? @@ +1128,5 @@ > /* > * Look for an unambiguous single-char token. > */ > + if (c1kind < OneChar_Max) { > + tt = (TokenKind)c1kind; Change this to a C++-style cast, maybe?
Attachment #783587 -
Flags: review?(till) → review+
Comment 13•11 years ago
|
||
Comment on attachment 783588 [details] [diff] [review] (part 3) - Separate handling of EOLs and whitespace. Review of attachment 783588 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.cpp @@ +1119,1 @@ > tt = (TokenKind)c1kind; Mmh, looking through the file, the C-style casts should probably be updated in one go, and not in a piecemeal fashion.
Attachment #783588 -
Flags: review?(till) → review+
Comment 14•11 years ago
|
||
Comment on attachment 783590 [details] [diff] [review] (part 4) - Remove unnecessary FirstCharKinds and reorder FirstCharKind handling. Review of attachment 783590 [details] [diff] [review]: ----------------------------------------------------------------- > It's not obvious, but 90% of the changes here are just moving code around. My kingdom for better diff tools! I tried identifying the moves and looking more closely at the real changes, but I'm not 100% sure I separated them cleanly. Anyway, the result does look correct to me. ::: js/src/frontend/TokenStream.cpp @@ +1461,5 @@ > + hasExp = false; > + goto decimal_dot; > + } > + if (c == '.') { > + qc = getCharIgnoreEOL(); Can't you use matchChar, here? Or don't you want to, because it's not used elsewhere in this `case`?
Attachment #783590 -
Flags: review?(till) → review+
Comment 15•11 years ago
|
||
Comment on attachment 783591 [details] [diff] [review] (part 5) - Add GetDecimalInteger(), a version of GetPrefixInteger() specialized for decimal integers. Review of attachment 783591 [details] [diff] [review]: ----------------------------------------------------------------- r=me
Attachment #783591 -
Flags: review?(till) → review+
Comment 16•11 years ago
|
||
Comment on attachment 783592 [details] [diff] [review] (part 6) - Clean up the handling of identifiers. Review of attachment 783592 [details] [diff] [review]: ----------------------------------------------------------------- nice
Attachment #783592 -
Flags: review?(till) → review+
Comment 17•11 years ago
|
||
Comment on attachment 783594 [details] [diff] [review] (part 7) - Remove dead isCurrentToken* functions. Review of attachment 783594 [details] [diff] [review]: ----------------------------------------------------------------- r=me
Attachment #783594 -
Flags: review?(till) → review+
Comment 18•11 years ago
|
||
Comment on attachment 783596 [details] [diff] [review] (part 8) - Remove the no-effect TSF_IN_HTML_COMMENT flag. Review of attachment 783596 [details] [diff] [review]: ----------------------------------------------------------------- http://www.youtube.com/watch?v=3ojhZaP5_JI
Attachment #783596 -
Flags: review?(till) → review+
Comment 19•11 years ago
|
||
Comment on attachment 783598 [details] [diff] [review] (part 9) - Fix comment about Token.u.decimalPoint. Review of attachment 783598 [details] [diff] [review]: ----------------------------------------------------------------- r=me Very nice bunch of cleanups, this patch series!
Attachment #783598 -
Flags: review?(till) → review+
Assignee | ||
Comment 20•11 years ago
|
||
> Can we use static_assert now > (https://groups.google.com/forum/#!topic/mozilla.dev.platform/AEppKj6K9vQ) > or doesn't that apply to SpiderMonkey yet? Good question. Ehsan changed all the MOZ_STATIC_ASSERT instances, but not the JS_STATIC_ASSERT ones. I'll change this one, and I just filed bug 900275 for removing the others. > Mmh, looking through the file, the C-style casts should probably be updated > in one go, and not in a piecemeal fashion. I'll do a follow-up patch. > > + if (c == '.') { > > + qc = getCharIgnoreEOL(); > > Can't you use matchChar, here? Or don't you want to, because it's not used > elsewhere in this `case`? matchChar() would work. This patch is just moving code around... I will do a follow-up if I can be bothered :)
Assignee | ||
Comment 21•11 years ago
|
||
Attachment #784104 -
Flags: review?(till)
Assignee | ||
Comment 22•11 years ago
|
||
Attachment #784105 -
Flags: review?(till)
Comment 23•11 years ago
|
||
Comment on attachment 784105 [details] [diff] [review] (part 11) - Use matchChar() to look for TOK_TRIPLEDOT. Review of attachment 784105 [details] [diff] [review]: ----------------------------------------------------------------- Most certainly.
Attachment #784105 -
Flags: review?(till) → review+
Comment 24•11 years ago
|
||
Comment on attachment 784104 [details] [diff] [review] (part 10) - Use C++-style casts in TokenStream.cpp. Review of attachment 784104 [details] [diff] [review]: ----------------------------------------------------------------- I hadn't realized that there were so few of them. Great.
Attachment #784104 -
Flags: review?(till) → review+
Assignee | ||
Comment 25•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/199bb6112200 https://hg.mozilla.org/integration/mozilla-inbound/rev/6671f7a67f19 https://hg.mozilla.org/integration/mozilla-inbound/rev/c996032668bf https://hg.mozilla.org/integration/mozilla-inbound/rev/0a4f1b961a98 https://hg.mozilla.org/integration/mozilla-inbound/rev/a418ca312cbd https://hg.mozilla.org/integration/mozilla-inbound/rev/a09b9ba8dc03 https://hg.mozilla.org/integration/mozilla-inbound/rev/06176970ae87 https://hg.mozilla.org/integration/mozilla-inbound/rev/7e2904bafdc4 https://hg.mozilla.org/integration/mozilla-inbound/rev/7a02c72ed5ed https://hg.mozilla.org/integration/mozilla-inbound/rev/8b9f64d7a47e
Assignee | ||
Comment 26•11 years ago
|
||
Whoops, I missed part 11: https://hg.mozilla.org/integration/mozilla-inbound/rev/95cf8e61d70d
Comment 27•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/199bb6112200 https://hg.mozilla.org/mozilla-central/rev/6671f7a67f19 https://hg.mozilla.org/mozilla-central/rev/c996032668bf https://hg.mozilla.org/mozilla-central/rev/0a4f1b961a98 https://hg.mozilla.org/mozilla-central/rev/a418ca312cbd https://hg.mozilla.org/mozilla-central/rev/a09b9ba8dc03 https://hg.mozilla.org/mozilla-central/rev/06176970ae87 https://hg.mozilla.org/mozilla-central/rev/7e2904bafdc4 https://hg.mozilla.org/mozilla-central/rev/7a02c72ed5ed https://hg.mozilla.org/mozilla-central/rev/8b9f64d7a47e https://hg.mozilla.org/mozilla-central/rev/95cf8e61d70d
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
You need to log in
before you can comment on or make changes to this bug.
Description
•