Bug 849367 (Closed) - Opened 11 years ago, Closed 11 years ago

Speed up JS parsing some more

Categories

Product/Component: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla22

People

Reporter: n.nethercote
Assigned: n.nethercote

Details

Whiteboard: [js:t][gdc2013]

Attachments

3 files, 6 obsolete files:

1.24 KB, patch, jorendorff: review+
3.53 KB, patch, jorendorff: review+
20.40 KB, patch, n.nethercote: review+
I have some patches forthcoming (early next week) to speed up SpiderMonkey's parser a bit.  They take some of the ideas from bug 637549 further.
I have three patches coming shortly.  Here are their effects on instruction
counts (which I can measure precisely in the JS shell with Cachegrind).

                            parsemark           huge asm.js file
                            ---------           ----------------
original                    2367.206 (------)   27937.373 (------)
faster-matchChar            2350.976 (1.007x)   27462.620 (1.017x)
faster-op-checking          2341.680 (1.011x)   27162.125 (1.029x)
skip-binops                 2252.980 (1.051x)   24500.506 (1.140x)

(All of the instruction count ratios are versus "original".)

The instruction count numbers if anything underplay the improvement -- I saw a 1.053x (time) speedup for parsemark and a 1.199x speed-up for the asm.js file.
This patch takes advantage of the fact that matchChar() is never asked to
match an EOL char, so it doesn't need to worry about updating the line number.
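A minimal sketch of the idea (toy code, not the actual TokenStream): the
generic getChar() path has to watch for EOL characters so the line number
stays correct, but a matchChar() that is never asked to match '\n' can compare
against the buffer directly and skip that bookkeeping.

    struct ToyStream {
        const char *ptr;        // current position in the source buffer
        unsigned lineno;

        // Generic path: must handle EOL to keep lineno up to date.
        char getChar() {
            char c = *ptr++;
            if (c == '\n')
                lineno++;
            return c;
        }

        // Fast path: the caller guarantees |expect| is never an EOL char,
        // so no line-number update is needed and a mismatch is a cheap no-op.
        bool matchChar(char expect) {
            if (*ptr != expect)
                return false;
            ptr++;
            return true;
        }
    };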
Attachment #723814 - Flags: review?(jorendorff)
This patch:

- Speeds up the check for multiplicative tokens.

- Neatens and speeds up the check for relational (including IN and INSTANCEOF)
  tokens.

- Reorders various chunks of expr-parsing related code, so they match the order
  used in Parser.cpp (i.e. star/div/mod at the top).

- Reorders the TOK_* constants for the same reason, and to facilitate
  additional range checks (e.g. all the binops are now together, which I'll use
  in the next patch; see the sketch after this list).

- Renames some TOK_*_START constants as TOK_*_FIRST, to match TOK_*_LAST.
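A minimal sketch of the range-check trick (toy token kinds, not the real
ones): with all binary-operator kinds declared consecutively, classifying a
token takes two comparisons instead of a switch over every operator.

    enum ToyTokenKind {
        TOK_NAME,
        TOK_NUMBER,
        // ... all the binary operators, kept contiguous ...
        TOK_MUL,                        // first binop
        TOK_DIV,
        TOK_MOD,
        TOK_ADD,
        TOK_SUB,                        // last binop
        TOK_BINOP_FIRST = TOK_MUL,
        TOK_BINOP_LAST = TOK_SUB,
        TOK_LIMIT
    };

    static inline bool
    TokenKindIsBinaryOp(ToyTokenKind tok)
    {
        return TOK_BINOP_FIRST <= tok && tok <= TOK_BINOP_LAST;
    }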
Attachment #723815 - Flags: review?(jorendorff)
This is the big improvement.  Binary expression parsing is really repetitive.
We get a unary expression and then check if the next token is star/div/mod.
Then we check for plus/minus.  Then we check for... and so on for every binary
operator.  This isn't entirely obvious because each operator group is looked
for in a different function:  mulExpr1, addExpr1, etc.

This patch manually inlines all the binary expression parsers into condExpr,
and then adds a check at the front -- if the next token after the unaryExpr
isn't a binary operator, it skips over all the binary operator parsing stuff.
This is a sizeable win:  4% on Parsemark, and more than 10% on big asm.js
files.  It also allows the removal of the ugly inlined/not-inlined split of the
binop expression parsers.
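A minimal sketch of that front check (toy code reusing the token kinds from
the earlier sketch; the real change is in Parser.cpp): one range check after
the unary expression decides whether any of the binary-operator machinery
runs at all.

    typedef int Node;                              // stand-in for the real Node

    ToyTokenKind peekToken();                      // stub: next token, unconsumed
    Node unaryExpr();                              // stub: parse a unary expression
    Node binExprTail(Node lhs, ToyTokenKind tok);  // stub: the inlined binop parsing

    Node
    condExpr()
    {
        Node pn = unaryExpr();
        ToyTokenKind tok = peekToken();

        // Fast path: most expressions are not followed by a binary operator,
        // so the whole precedence cascade is skipped.
        if (!TokenKindIsBinaryOp(tok))
            return pn;

        // Slow path: the parsing that used to be spread across mulExpr1,
        // addExpr1, etc., reached only when a binop is actually present.
        return binExprTail(pn, tok);
    }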
Attachment #723817 - Flags: review?(jorendorff)
This is a good idea. Thanks. How much do you win with *just* the hack where you skip over all binary expression parsing if there's not a binary expression coming, no manual inlining?

Also, not relevant to this particular bug: I meant to ask about the Expr1 methods. What if we just changed the interface of TokenStream so that the normal thing to do is look at the next token, rather than consume it and then look at it, and then put it back? It seems like that would be effectively the same thing as Expr1 but throughout the parser (and not quite as mind-bending).
> How much do you win with *just* the hack where
> you skip over all binary expression parsing if there's not a binary
> expression coming, no manual inlining?

I don't think it's possible.  You have to call unaryExpr(), and then do the binary operator check.  The current binary expression parsing functions aren't structured appropriately for that.

If you're concerned about the code duplication, I could factor out stuff like this:

> while (pn && tokenStream.isCurrentTokenRelational(oldParsingForInit)) {
>     ParseNodeKind kind = RelationalTokenToParseNodeKind(tokenStream.currentToken());
>     JSOp op = tokenStream.currentToken().t_op;
>     pn = handler.newBinaryOrAppend(kind, pn, shiftExpr1(), op);
> }   

into inline functions.  Or maybe it's possible to restructure these functions so this does work out.  I'll give it a go.


> Also, not relevant to this particular bug: I meant to ask about the Expr1
> methods. What if we just changed the interface of TokenStream so that the
> normal thing to do is look at the next token, rather than consume it and
> then look at it, and then put it back? It seems like that would be
> effectively the same thing as Expr1 but throughout the parser (and not quite
> as mind-bending).

I did the Expr1 stuff just for the binary expressions in bug 637549 because it was easy and gave a big speed-up -- i.e. it was the 80/20 approach.  Extending that approach past those parsing functions looks substantially harder.

It might be worth trying again, though.  I've done measurements that show that 2/3 of the calls to getToken() end up re-getting a previously gotten/ungotten token.  In other words, for the average token we do this:  get, unget, reget, unget, reget.  cdleary explored this whole idea in bug 556486 and decided it wasn't worthwhile -- it made things more complicated without notable speed-ups.  But it was his first bug, so maybe he overlooked something.
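A toy sketch of the two interfaces (hypothetical code, not the real
TokenStream): with get/unget, a parser routine consumes a token and often
pushes it back for a sibling routine to re-get; with a peek-style interface,
the common "look before you consume" case touches a one-token lookahead
buffer instead.

    struct ToyTokenStream {
        int buffered;           // one-token pushback/lookahead buffer
        bool haveLookahead;

        int scanNextToken();    // stub: lex the next token from the source

        // Current style: get, and unget when it wasn't ours to consume.
        int getToken() {
            if (haveLookahead) {            // the "reget" case measured above
                haveLookahead = false;
                return buffered;
            }
            return scanNextToken();
        }
        void ungetToken(int tok) {
            buffered = tok;
            haveLookahead = true;
        }

        // Proposed style: inspect without consuming.
        int peekToken() {
            if (!haveLookahead) {
                buffered = scanNextToken();
                haveLookahead = true;
            }
            return buffered;
        }
        void consumeToken() { haveLookahead = false; }
    };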
New version which avoids the duplication by introducing a bunch of "Tail1"
functions.  Performance is negligibly changed compared to the last version.

The Expr1-everywhere stuff is fodder for a follow-up.
Attachment #724730 - Flags: review?(jorendorff)
Attachment #723817 - Attachment is obsolete: true
Attachment #723817 - Flags: review?(jorendorff)
This changes memberExpr() so it expects the first token to have already been
gotten (which makes it like primaryExpr()).  This avoids an ungetToken() call
in unaryExpr().

When parsing gmail-combines.js from Parsemark, this affects the getToken()
calls in the following way:

BEFORE
1007695 counts:
( 1) 634023 (62.9%, 62.9%): reget
( 2) 373672 (37.1%,100.0%): get

AFTER
894565 counts:
( 1) 520893 (58.2%, 58.2%): reget
( 2) 373672 (41.8%,100.0%): get

And it saves about 1% on big asm.js files.
Attachment #724751 - Flags: review?(jorendorff)
Attachment #723814 - Flags: review?(jorendorff) → review+
Comment on attachment 723815 [details] [diff] [review]
(part 2) - Speed up and clean up binary operator matching.

Review of attachment 723815 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +4609,5 @@
> +        JSOp op = tokenStream.currentToken().t_op;
> +        pn = handler.newBinaryOrAppend(kind, pn, shiftExpr1n(), op);
> +    }
> +
> +    pc->parsingForInit |= oldParsingForInit;

I just noticed -- this should just be "=", not "|=". Not relevant to this change, so a separate changeset would be fine.
Attachment #723815 - Flags: review?(jorendorff) → review+
njn, would you be willing to try out an alternative to part 3?

This patch seems to pass tests; what needs to be done is:
- replace the switch statements with array lookups somehow
- tune the Vector's inline-storage size (defaults to 0; maybe 1 or 2 is ideal)
- see if it's fast

I think it'll be fast, and it's less code.
I'm fine with that alternative if it is as fast.  Are you planning to make those changes you mentioned?
I was hoping you would do them! Pretty please?
Here's a version of the patch without the switch statements.

This is OK for testing. I think it'll be fast; you can tweak the Vector's builtin capacity. Or try using a flat array or two. It turns out there are only 10 levels of precedence; the vector can't grow larger than that.
Attachment #726446 - Attachment is obsolete: true
> Here's a version of the patch without the switch statements.

Ok, thanks.  I'll try it out...  does your patch replace my patches 2 and 3?
By the way: I'm never sure how many comments to put in code like that. If the reader knows shift-reduce parsers, they won't need much help; if not, it may take rather a lot of comments to make it clear.

In any case, the reason I compare precedence using >= rather than > is that it just so happens all the binary operators in question are left-associative.

Since we parse left-to-right, and the operators all group left-to-right, the correct thing is to combine like terms as soon as we read them:

    a + b +        ==>    (a + b) +
    a + b + c +    ==>    ((a + b) + c) +
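A compact sketch of the loop in question (hypothetical names; the real code
is orExpr1 in Parser.cpp): on seeing a new operator, the parser first reduces
everything on the stack whose precedence is >= the new operator's, which is
exactly what produces the ((a + b) + c) grouping above.

    enum { PRECEDENCE_CLASSES = 10 };   // JS has only 10 binop precedence levels

    struct StackEntry {
        int node;       // the pending left-hand side
        int prec;       // precedence of the operator waiting for its rhs
    };

    int parseUnary();               // stub: parse and return a unary expression
    int nextOpPrec();               // stub: consume the next binop and return its
                                    //       precedence; 0 if no binop follows
    int combine(int lhs, int rhs);  // stub: build a binary parse node

    int
    orExpr1()
    {
        StackEntry stack[PRECEDENCE_CLASSES];
        unsigned depth = 0;
        int pn = parseUnary();
        for (;;) {
            int prec = nextOpPrec();
            // Reduce while the stacked operator binds at least as tightly.
            // The >= (rather than >) is the left-associativity: equal
            // precedence combines immediately, grouping left-to-right.
            while (depth > 0 && stack[depth - 1].prec >= prec) {
                depth--;
                pn = combine(stack[depth].node, pn);
            }
            if (prec == 0)
                return pn;              // stack is empty; pn is the whole expr
            stack[depth].node = pn;     // shift: stash lhs and its operator
            stack[depth].prec = prec;
            depth++;
            pn = parseUnary();
        }
    }

The stack precedences are strictly increasing from bottom to top (everything
of equal or higher precedence is reduced before a push), so the depth can
never exceed the number of precedence classes.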
I got a compile error on your patch:

/home/njn/moz/mi3slim/js/src/frontend/Parser.cpp:4728:18: error: no type named 'pair' in namespace 'std'
    typedef std::pair<Node, ParseNodeKind> StackEntry;

so I replaced it with a struct.  Using std::pair seems like a bad idea in general, anyway.
 
Here are the instruction counts for the two approaches:

                parsemark           asm.js
  njn           2319.064            25345.839
  jorendorff    2348.234 (1.012x)   26139.527 (1.031x)

So the shift-reduce parser is a bit worse.  jorendorff, what now?
Part 1:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b147c2b08372
Whiteboard: [js:t] → [js:t][leave open]
(In reply to Nicholas Nethercote [:njn] from comment #16)
>                 parsemark           asm.js
>   njn           2319.064            25345.839
>   jorendorff    2348.234 (1.012x)   26139.527 (1.031x)
> 
> So the shift-reduce parser is a bit worse.  jorendorff, what now?

Heh! I'm trying *not* to identify so intensely with my patch (which I'm admittedly fond of) that it gets to be counterproductive, so can we call them A and B instead of "njn" and "jorendorff"?

Although I should be thrilled to be considered only π% worse than njn... :)

Let's decide on software engineering characteristics. A is less rocket science but B is less code. Which one do you want to maintain?
This seems to win another 1-2 percent (I don't know how to run parsemark under cachegrind, so my numbers may be bogus), which should put us in the neighborhood of patch A. Maybe.
Attachment #726544 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #16)
> /home/njn/moz/mi3slim/js/src/frontend/Parser.cpp:4728:18: error: no type
> named 'pair' in namespace 'std'
>     typedef std::pair<Node, ParseNodeKind> StackEntry;

Sorry about the compile error, btw. I'm not sure why it worked locally; lucky system headers I guess.

> Using std::pair seems like a bad idea in general, anyway.

I only used it because I was prototyping. But it turns out I sort of don't know how to make this code much better (...apart from not using std::pair).
Whiteboard: [js:t][leave open] → [js:t][leave open][gdc2013]
Updated instruction counts.  Note that the ratios are now relative to "trunk".

                     parsemark           asm.js
  trunk              2561.651            28040.042
  tails              2462.493 (1.040x)   25333.385 (1.107x)
  shift-reduce-v3    2471.351 (1.037x)   25351.412 (1.106x)

That's close enough that I'm happy.  Even better, I did actual timings and shift-reduce-v3 was 1.006x faster than tails on parsemark, and there was no difference on asm.js.  Nice job!

jorendorff, is shift-reduce-v3 ready for review?  I can do it if so.
OK, here it is. I just added comments and cleaned some stuff up.
Attachment #727092 - Attachment is obsolete: true
Attachment #727905 - Flags: review?(n.nethercote)
Comment on attachment 727905 [details] [diff] [review]
Replace the binary-expression part of the parser with a shift-reduce parser (v4)

Review of attachment 727905 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +4671,2 @@
>  {
> +    return tok == TOK_IN ? !parsingForInit : TOK_BINOP_FIRST <= tok && tok <= TOK_BINOP_LAST;

Can you copy TokenKindIsBinaryOp from my "(part 2)" patch and use it here?

@@ +4677,3 @@
>  {
> +    JS_ASSERT(tok >= TOK_BINOP_FIRST);
> +    JS_ASSERT(tok <= TOK_BINOP_LAST);

Use |JS_ASSERT(TokenKindIsBinaryOp(tok));| instead.

@@ +4724,5 @@
>  template <typename ParseHandler>
>  JS_ALWAYS_INLINE typename ParseHandler::Node
>  Parser<ParseHandler>::orExpr1()
>  {
> +    // Shift-reduce parser for the right-associative binary operator part of

Left-associative?

@@ +4726,5 @@
>  Parser<ParseHandler>::orExpr1()
>  {
> +    // Shift-reduce parser for the right-associative binary operator part of
> +    // the JS syntax.
> +    Node stack[PRECEDENCE_CLASSES];

Is |nodeStack| a better name?

@@ +4749,5 @@
> +        if (IsBinaryOpToken(tok, oldParsingForInit)) {
> +            pnk = BinaryOpTokenKindToParseNodeKind(tok);
> +        } else {
> +            tok = TOK_EOF;
> +            pnk = PNK_LIMIT;

I think you can skip |pnk| and immediately compute |Precedence(pnk)|.  And then below you'd change |pnk == PNK_LIMIT| to check for precedence of zero.

The benefit of this is that you wouldn't have to check for PNK_LIMIT in Precedence(), which would make it slightly faster.

@@ +4757,5 @@
> +        // stack, reduce. This combines nodes on the stack until we form the
> +        // actual lhs of pnk.
> +        //
> +        // The >= in this condition works because all the operators in question
> +        // are right-associative; if any were not, the case where two operators

Left-associative?

@@ +4779,5 @@
> +        JS_ASSERT(depth <= PRECEDENCE_CLASSES);
> +    }
> +
> +    JS_ASSERT(depth == 0);
> +    pc->parsingForInit = oldParsingForInit;

Hmm.  In my "tails" version, this gets reset in the middle of the operator stuff, not at the end.  But I didn't really understand it in the first place, so if you're confident I trust you.
Attachment #727905 - Flags: review?(n.nethercote) → review+
Attachment #723815 - Attachment is obsolete: true
Attachment #724730 - Attachment is obsolete: true
Attachment #724730 - Flags: review?(jorendorff)
> > +    // Shift-reduce parser for the right-associative binary operator part of
> 
> Left-associative?

Nothing I do is ever good enough for you.

> @@ +4749,5 @@
> > +        if (IsBinaryOpToken(tok, oldParsingForInit)) {
> > +            pnk = BinaryOpTokenKindToParseNodeKind(tok);
> > +        } else {
> > +            tok = TOK_EOF;
> > +            pnk = PNK_LIMIT;
> 
> I think you can skip |pnk| and immediately compute |Precedence(pnk)|.  And
> then below you'd change |pnk == PNK_LIMIT| to check for precedence of zero.
> 
> The benefit of this is that you wouldn't have to check for PNK_LIMIT in
> Precedence(), which would make it slightly faster.

pnk has to be pushed to the stack, though, because later it will be popped and passed to newBinaryOrAppend.

> Hmm.  In my "tails" version, this gets reset in the middle of the operator
> stuff, not at the end.  But I didn't really understand it in the first
> place, so if you're confident I trust you.

There's no change in behavior because both patches always
  (a) clear the flag before calling out to unaryExpr; and
  (b) restore it before returning successfully.

The restriction implemented using this flag is fairly goofy: When parsing a for loop head, as long as you can't yet tell whether it's a for-;; or a for-in loop, "in" is not a binary operator. Maybe you've seen the NoIn grammar productions in the ES language spec. This is what they're doing.
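A toy illustration of that check (hypothetical shape; the real one is
IsBinaryOpToken in Parser.cpp): the flag simply switches off "in" as a binary
operator while the for-head init is being parsed.

    enum ToyTok { TOK_PLUS, TOK_IN, TOK_NAME /* ... */ };

    bool parsingForInit = false;    // set while parsing a for(;;) init clause

    bool
    IsBinaryOpToken(ToyTok tok)
    {
        // "in" is a binary operator everywhere except directly inside a
        // for-head init, where it could still turn out to be for-in syntax.
        if (tok == TOK_IN)
            return !parsingForInit;
        return tok == TOK_PLUS /* || ...the other binops... */;
    }

So in |for (x in obj)| the "in" is the for-in keyword, but in
|for (i = (a in b); ...)| the parenthesized "in" is parsed as the operator
again, because the parenthesized expression is no longer directly in the init
position.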
Comment on attachment 724751 [details] [diff] [review]
(part 4) - Avoid an ungetToken() in unaryExpr().

Review of attachment 724751 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, r=me if it's really measurably faster. Is it?
Attachment #724751 - Flags: review?(jorendorff) → review+
> Looks good, r=me if it's really measurably faster. Is it?

0.5% faster (time, not instruction counts) on Parsemark, 3% faster on asm.js.


https://hg.mozilla.org/integration/mozilla-inbound/rev/b5dbd3b196d9
Whiteboard: [js:t][leave open][gdc2013] → [js:t][gdc2013]
FYI
- part 2: 35.5% hit on 64bit ss-unpack-code
- part 3: 10% improvement on 64bit ss-unpack-code
(see awfy 64bit)

I.e. still an overall performance hit on SS. Was this intended?
https://hg.mozilla.org/mozilla-central/rev/ff5c998babfd
https://hg.mozilla.org/mozilla-central/rev/b5dbd3b196d9
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
> - part 2: 35.5% hit on 64bit ss-unpack-code
> - part 3: 10% improvement on 64bit ss-unpack-code
> (see awfy 64bit)
> 
> I.e still an overall hit in performance on SS. Was this intended?

That's surprising.  I didn't test on SS but parsing is a pretty tiny part of SS, and for Parsemark and asm.js part 2 was a clear win of at least 5%.

Looking at AWFY, I see a 35.5% regression for the purple line for ss-unpack-code on the 64-bit Mac Pro, revision 1cb7fcc85525.  I then see a 10% improvement for 52ae75e1909c, and another 17.1% improvement for e6d92226bac2, bringing it back to where it started.  But those revisions don't match the revisions for part 2 and part 3 in this bug.  So I think there isn't a problem here.
Huh. On my machine, ff5c998babfd makes ss-unpack-code about 1% slower. It's in the noise but I'm guessing it's significant.

Really weird.
But 1% on one SS test isn't a big deal.  And the 35% regression was clearly temporary weirdness of some kind, and also the revisions didn't match these patches.
Ah cool. It indeed disappeared. Looking at the bug that fixed the regression, the regression was probably not related to this. Sorry for the false alarm.