Last Comment Bug 637549 - Speed up expression parsing
: Speed up expression parsing
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal with 1 vote (vote)
: ---
Assigned To: Nicholas Nethercote [:njn] (on vacation until July 11)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-02-28 18:35 PST by Nicholas Nethercote [:njn] (on vacation until July 11)
Modified: 2011-03-31 14:55 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 1 (against TM 62858:f85299200bd3) (10.99 KB, patch)
2011-02-28 19:34 PST, Nicholas Nethercote [:njn] (on vacation until July 11)
jimb: review+
brendan: review+
Details | Diff | Review
patch 2 (15.21 KB, patch)
2011-02-28 19:54 PST, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review
patch 2, v2 (13.86 KB, patch)
2011-03-17 20:38 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
jimb: review+
brendan: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] (on vacation until July 11) 2011-02-28 18:35:18 PST
I have two patches that speed up expression parsing.  Coming shortly.
Comment 1 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-02-28 19:34:44 PST
Created attachment 515815 [details] [diff] [review]
patch 1 (against TM 62858:f85299200bd3)

If you're parsing an expression that doesn't involve any operators, eg. |3| or |foo|, a whole heap of checks for operators fail in the parsers condExpr, orExpr, ..., addExpr, mulExpr.  More specifically, in every one of those we call matchToken() once or twice to see if a particular operator is present.  Most of the time, no matching operator is present.  And a failing matchToken() involves a getToken() followed by an ungetToken().  So we end up ungetting and regetting the same token about 15 times in a row.  Even though ungetting and regetting tokens is cheap, it adds up.

This patch tweaks these parsers so that when they return, TokenStream's state is always one token past the end of the expression, instead of zero tokens past the end.  This means the next token is always at hand, so we can avoid all the ungetting and regetting.  (cdleary, I think you tried doing something like this at one point?  I couldn't find the bug, though I think it had a title like "pump prime the parser"?)

For benchmarking, I used parsemark with an additional test (kraken-imaging) that contains the huge array-of-numbers from Kraken's imaging tests.  Instruction count results with the patch:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    15.111    14.244 (1.061x) |    163875    164163 (0.998x) | 280slides-comb
|    61.342    56.730 (1.081x) |    835251    836595 (0.998x) | ball-pool-comb
|    31.240    28.649 (1.090x) |    395039    395979 (0.998x) | dojo
|    57.514    53.144 (1.082x) |    764475    765647 (0.998x) | effectgames-ga
|    14.342    13.174 (1.089x) |    177572    178058 (0.997x) | ga
|   324.741   291.769 (1.113x) |     5.069     5.080 (0.998x) | gmail-combined
|    94.907    86.478 (1.097x) |     1.313     1.315 (0.998x) | gravity-combin
|    21.909    20.206 (1.084x) |    273781    274451 (0.998x) | jquery-min
|    39.960    36.004 (1.110x) |    520153    521695 (0.997x) | jsgb-combined
|   779.284   645.875 (1.207x) |    10.870    10.870 (------) | kraken-imaging
|    58.048    53.602 (1.083x) |    788597    790169 (0.998x) | mochikit
|   273.672   248.183 (1.103x) |     3.818     3.829 (0.997x) | pipio-combined
|    12.362    11.688 (1.058x) |    127551    127865 (0.998x) | processing-twi
|    41.228    39.026 (1.056x) |    336458    336918 (0.999x) | sunspider-comb
|    18.487    16.969 (1.089x) |    234882    235466 (0.998x) | tetris-combine
|    70.310    64.348 (1.093x) |    968758    971506 (0.997x) | twitter-combin
|    74.674    68.090 (1.097x) |    945588    947736 (0.998x) | v8-combined
|     9.567     8.987 (1.065x) |    102786    103026 (0.998x) | yui-min
|   434.166   397.597 (1.092x) |     6.017     6.026 (0.998x) | zimbra-combine
-------
|  2432.874  2154.772 (1.129x) |    33.724    33.772 (0.999x) | all

I imported the parsemark+kraken-imaging files into the SunSpider harness and saw a 1.10x speedup, which matches the instruction counts closely.  I also tried measuring with test/parsemark.py, but I've never managed to see anything close to deterministic results with it on previous occasions and the same was true this time.

I also measured some function counts.  The patch reduces the number of unget/reget operations by roughly 3x.
Comment 2 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-02-28 19:54:05 PST
Created attachment 515818 [details] [diff] [review]
patch 2

This patch is less elegant than the previous one.  It just forces inlining of all the expression parsers in the hot cases (ie. when the operator matching fails).  This requires having two versions of each expression parser, one that's always inlined and one that's never inlined.  I reordered the functions so that functions are always defined before their call points, to assist the compiler with the inlining.

I see a 1.063x speedup and the following instruction count improvements (both of those are relative to the first patch, ie. with both patches applied I see a 1.17x speedup).

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    14.244    13.883 (1.026x) |    164163    160059 (1.026x) | 280slides-comb
|    56.729    54.753 (1.036x) |    836595    808549 (1.035x) | ball-pool-comb
|    28.649    27.575 (1.039x) |    395979    384547 (1.030x) | dojo
|    53.144    51.386 (1.034x) |    765647    747007 (1.025x) | effectgames-ga
|    13.174    12.686 (1.038x) |    178058    171950 (1.036x) | ga
|   291.769   277.652 (1.051x) |     5.080     4.897 (1.037x) | gmail-combined
|    86.478    82.920 (1.043x) |     1.315     1.271 (1.035x) | gravity-combin
|    20.206    19.494 (1.037x) |    274451    265883 (1.032x) | jquery-min
|    36.004    34.462 (1.045x) |    521695    509671 (1.024x) | jsgb-combined
|   645.874   595.846 (1.084x) |    10.870    10.870 (------) | kraken-imaging
|    53.603    51.736 (1.036x) |    790169    767213 (1.030x) | mochikit
|   248.183   237.449 (1.045x) |     3.829     3.690 (1.038x) | pipio-combined
|    11.688    11.407 (1.025x) |    127865    124451 (1.027x) | processing-twi
|    39.026    38.162 (1.023x) |    336918    331574 (1.016x) | sunspider-comb
|    16.969    16.328 (1.039x) |    235466    227870 (1.033x) | tetris-combine
|    64.348    61.822 (1.041x) |    971506    935974 (1.038x) | twitter-combin
|    68.090    65.435 (1.041x) |    947736    925136 (1.024x) | v8-combined
|     8.987     8.747 (1.027x) |    103026    100566 (1.024x) | yui-min
|   397.597   382.151 (1.040x) |     6.026     5.838 (1.032x) | zimbra-combine
-------
|  2154.772  2043.902 (1.054x) |    33.772    33.029 (1.023x) | all

I see a ~12KB increase in the binary size on a 32-bit linux build and a ~20KB increase on a 32-bit mac build.
Comment 3 Jim Blandy :jimb 2011-02-28 21:33:12 PST
Wow, those are great results. I will try to look at this tomorrow.
Comment 4 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-02-28 21:43:23 PST
Wait until you see what I have in store for the scanner! :)

No rush on the review, this obviously is post-4.0.
Comment 5 Chris Leary [:cdleary] (not checking bugmail) 2011-03-01 09:04:23 PST
(In reply to comment #1)
> This patch tweaks these parsers so that when they return, TokenStream's state
> is always one token past the end of the expression, instead of zero tokens past
> the end.  This means the next token is always at hand, so we can avoid all the
> ungetting and regetting.  (cdleary, I think you tried doing something like this
> at one point?  I couldn't find the bug, though I think it had a title like
> "pump prime the parser"?)

Bug 556486, my first patch. I saw no speedup on parsemark.
Comment 6 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-16 20:07:20 PDT
Review ping!

With bug 639420 having landed, the combined instruction count reduction from these two patches has improved from 1.17x to 1.248x.
Comment 7 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-17 20:38:54 PDT
Created attachment 520122 [details] [diff] [review]
patch 2, v2

Somehow I managed to accidentally combine patch 1 into patch 2 v1.  Here is the proper patch.
Comment 8 Brendan Eich [:brendan] 2011-03-18 18:45:58 PDT
Comment on attachment 515815 [details] [diff] [review]
patch 1 (against TM 62858:f85299200bd3)

>@@ -6598,137 +6597,142 @@ Parser::condExpr()
>         JSParseNode *pn3 = assignExpr();
>         if (!pn3)
>             return NULL;
>         pn->pn_pos.begin = pn1->pn_pos.begin;
>         pn->pn_pos.end = pn3->pn_pos.end;
>         pn->pn_kid1 = pn1;
>         pn->pn_kid2 = pn2;
>         pn->pn_kid3 = pn3;
>+        tokenStream.getToken();     /* need to read one token past the end */

This can fail due to invalid char/token, right?

r=me FWIW, :jimb r+ awaited -- good thinking!

/be
Comment 9 Brendan Eich [:brendan] 2011-03-18 18:51:24 PDT
Comment on attachment 520122 [details] [diff] [review]
patch 2, v2

..12345678901234567890123456789012345678901234567890123456789012345678901234567890
>+#define BEGIN_EXPR_PARSER(name) \
>+    JS_ALWAYS_INLINE JSParseNode * \
>+    Parser::name##i() \
>+    {
>+
>+#define END_EXPR_PARSER(name) \
>+    } \
>+    JS_NEVER_INLINE JSParseNode * \
>+    Parser::name##n() { \
>+        return name##i(); \
>+    }

Please put \ in colument 79 per prevailing style. If you can squish macro bodies to fit without too much unreadability, do it.

>+BEGIN_EXPR_PARSER(mulExpr1)
>+    TokenKind tt;
>+    JSParseNode *pn = unaryExpr();

Nit: blank line here.

>+    /*
>+     * Note: unlike addExpr1() et al, we use getToken() here instead of
>+     * isCurrentTokenType() because unaryExpr() doesn't leave the TokenStream
>+     * state one past the end of the unary expression.
>+     */
>+    while (pn && (tt = tokenStream.getToken(), (tt == TOK_STAR || tt == TOK_DIVOP))) {

Nit: we often nest assignment in the first conjunct or disjunct, so long as that assignment dominates all later uses in the condition. So:

..    while (pn && ((tt = tokenStream.getToken()) == TOK_STAR || tt == TOK_DIVOP))) {

Avoids a repeated tt and the comma operator.

Crazy-good thinking here. What is the rational for inlining only below andExpr, though? Sorry if I missed it. r=me FWIW.

/be
Comment 10 Jim Blandy :jimb 2011-03-19 02:46:19 PDT
I'll take a look at this on Monday. Sorry I didn't get to it this week.
Comment 11 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-20 17:51:43 PDT
(In reply to comment #9)
> 
> Crazy-good thinking here. What is the rational for inlining only below andExpr,
> though? Sorry if I missed it.

I don't understand the question... all the functions between condExpr1 and mulExpr1 are always inlined (excluding the off-the-hot-path versions with a name that ends with 'n'.
Comment 12 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-21 17:21:49 PDT
(In reply to comment #8)
> Comment on attachment 515815 [details] [diff] [review]
> patch 1 (against TM 62858:f85299200bd3)
> 
> >@@ -6598,137 +6597,142 @@ Parser::condExpr()
> >         JSParseNode *pn3 = assignExpr();
> >         if (!pn3)
> >             return NULL;
> >         pn->pn_pos.begin = pn1->pn_pos.begin;
> >         pn->pn_pos.end = pn3->pn_pos.end;
> >         pn->pn_kid1 = pn1;
> >         pn->pn_kid2 = pn2;
> >         pn->pn_kid3 = pn3;
> >+        tokenStream.getToken();     /* need to read one token past the end */
> 
> This can fail due to invalid char/token, right?

It can "fail" only in the sense that the current token can be set to a TOK_ERROR token.  AFAICT that's not a problem.
Comment 13 Jim Blandy :jimb 2011-03-23 17:06:53 PDT
Comment on attachment 520122 [details] [diff] [review]
patch 2, v2

We should have a follow-up patch that makes this more friendly to editors that try to understand indentation; Emacs is going to make this code pretty hard to work with.
Comment 14 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-23 18:06:14 PDT
After discussion with jimb on IRC I added some curly braces to the 2nd patch to make Emacs happy.

http://hg.mozilla.org/tracemonkey/rev/36a1aba3cd60
http://hg.mozilla.org/tracemonkey/rev/0ffe53f7606a

Note You need to log in before you can comment on or make changes to this bug.