Bug 964253 — Opened 11 years ago, Closed 11 years ago

Untangle TokenStream::getTokenInternal

Categories: Core :: JavaScript Engine, defect
Status: RESOLVED WONTFIX
People: Reporter: jandem; Assigned: jandem
Attachments: 8 files, 1 obsolete file

I was experimenting with some TokenStream::getTokenInternal optimizations, but the control flow is really complicated due to backwards gotos into the middle of other cases, etc. So I thought it'd be good to refactor the code a bit first.
Moves the code for scanning string literals into its own method, getStringToken.
Attachment #8365939 - Flags: review?(n.nethercote)
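For readers unfamiliar with the shape of this refactoring: the idea is to pull the string-scanning loop out of the big getTokenInternal() switch into its own method. The sketch below is a hypothetical, much-simplified stand-in (MiniLexer and its members are illustrative names, not SpiderMonkey code), showing the factored-out method's structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified sketch of factoring string-literal scanning out of
// a monolithic tokenizer switch into its own method, in the spirit of the
// patch's getStringToken(). Names and error handling are illustrative only.
struct MiniLexer {
    std::string src;
    std::size_t pos = 0;

    // Returns the next source character, or -1 at end of input.
    int getChar() { return pos < src.size() ? (unsigned char)src[pos++] : -1; }

    // Scans a string literal body after the opening quote has been consumed.
    // Returns false on an unterminated literal.
    bool getStringToken(char quote, std::string* out) {
        out->clear();
        for (;;) {
            int c = getChar();
            if (c == -1)
                return false;           // unterminated literal
            if (c == quote)
                return true;            // closing quote: token complete
            if (c == '\\') {            // minimal escape handling for the sketch
                int e = getChar();
                if (e == -1)
                    return false;
                c = (e == 'n') ? '\n' : e;
            }
            out->push_back(char(c));
        }
    }
};
```

With this shape, the caller in the tokenizer's switch reduces to a single call per quote character, which is what makes the surrounding control flow easier to follow.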
This one was a bit nasty due to |goto decimal_dot;| and could use a careful review.
Attachment #8365940 - Flags: review?(n.nethercote)
Attachment #8365941 - Flags: review?(n.nethercote)
Attachment #8365942 - Flags: review?(n.nethercote)
Attachment #8365943 - Flags: review?(n.nethercote)
Factor out the regexp literal scanning. Also some minor "badchar" cleanup.
Attachment #8365945 - Flags: review?(n.nethercote)
Attached patch Part 7 - Miscellaneous changes (obsolete) — Splinter Review
Some minor changes. I saw TokenStream::getChar() show up in Instruments, so I added some JS_ALWAYS_INLINEs.
Attachment #8365949 - Flags: review?(n.nethercote)
njn, I flagged you as reviewer since you also touched getTokenInternal; feel free to reassign.

I tested Citadel's UDKGame-Browser-Shipping.js (23 MB) with the run() call commented out, with/without asm.js (without asm.js to ignore compilation time). Perf seems to be either the same or maybe a tiny bit faster with these patches.

Same for Parsemark and Octane-Codeload. Unfortunately all these benchmarks are pretty noisy :(
Oops, part 7 didn't apply cleanly; this one does.
Attachment #8365949 - Attachment is obsolete: true
Attachment #8365949 - Flags: review?(n.nethercote)
Attachment #8365951 - Flags: review?(n.nethercote)
Attached patch Combined patch — Splinter Review
Should apply to inbound revision b2b1b90f5082.

It'd be great if somebody could verify these patches don't affect performance. As I said, though, things look OK on OS X.
I have a script that runs the JS shell on parsemark (bug 839450), under Cachegrind. Here are the results for the combined patch:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    58.700    58.819 (0.998x) |     6.003     6.003 (------) | 280slides-comb
|    96.493    96.765 (0.997x) |    20.021    20.021 (------) | ball-pool-comb
|    85.716    85.950 (0.997x) |    11.192    11.192 (------) | dojo
|    97.969    98.333 (0.996x) |    23.664    23.664 (------) | effectgames-ga
|    48.730    48.864 (0.997x) |     3.743     3.743 (------) | ga
|   338.461   339.860 (0.996x) |    92.427    92.427 (------) | gmail-combined
|   126.102   126.586 (0.996x) |    31.001    31.001 (------) | gravity-combin
|    56.304    56.441 (0.998x) |     7.190     7.190 (------) | jquery-min
|    83.715    83.761 (0.999x) |    15.170    15.170 (------) | jsgb-combined
|    97.343    97.659 (0.997x) |    21.122    21.122 (------) | mochikit
|   410.809   412.300 (0.996x) |   100.694   100.694 (------) | pipio-combined
|    65.201    65.232 (------) |     4.650     4.650 (------) | processing-twi
|    78.898    79.969 (0.987x) |     1.389     1.389 (------) | sunspider-comb
|    52.990    53.126 (0.997x) |     5.480     5.480 (------) | tetris-combine
|   110.808   110.795 (------) |    29.972    29.972 (------) | twitter-combin
|   132.599   131.901 (1.005x) |    35.653    35.653 (------) | v8-combined
|    45.329    45.441 (0.998x) |     2.821     2.821 (------) | yui-min
|   767.127   769.733 (0.997x) |   176.807   176.807 (------) | zimbra-combine
-------
|  2753.304  2761.546 (0.997x) |   589.009   589.009 (------) | all

So overall it's a slight regression. I'll try the patches individually as well.

In general, patches 1, 3, 5, 6 and 7 look fine. I'm a bit uncertain about 2 and 4; both of them cause us to inline a big function in three places, which isn't ideal... what are the optimizations you have in mind? I'll be pleased if you can get speed-ups here, because I've previously removed every last ounce of fat I could find in this function :)
More instruction counts from Cachegrind.

        clang               gcc         
orig    2753.298 (1.000x)   2608.813
part1   2753.216 (1.000x)   2610.726 (0.999x)
part2   2754.158 (1.000x)   2612.656 (0.999x)
part3   2754.482 (1.000x)   2616.232 (0.997x)
part4   2776.504 (0.992x)   2626.606 (0.993x)
part5   2776.429 (0.992x)   2640.561 (0.992x)
part6   2779.920 (0.990x)   2626.653 (0.993x)
part7   2761.552 (0.997x)   2617.431 (0.997x)
only-
 part7  2758.066 (0.998x)   2616.173 (0.997x)

So, a bit here, a bit there. Even part 7 is a net regression when applied without the others.

As for parts 1 to 6... does the current structure actively block the optimizations you were planning?
Nicholas, thanks a lot for the numbers! Using Cachegrind is a good idea; I should probably set that up as well... These 0.003x regressions are pretty much impossible to measure without it.

Also interesting that gcc regresses slightly more than clang, that's what I feared.

0.003x is tiny, but it's a regression, so for now I think I'll just drop these patches.
Looking at h4writer's splendid tracelogging numbers[1], we spend roughly two thirds of our time on SunSpider in parsing. Does that mean that SunSpider truly is as fast as it will ever get, or are there other parts of the parsing phase that can still be sped up?

If the former, how does IE manage to be 30% faster, if their numbers[2] are to be believed[3]?


[1]: https://tl-beta.paas.allizom.org/
[2]: http://ie.microsoft.com/testDrive/Benchmarks/SunSpider/Default.html
[3]: I couldn't find any independent results corroborating them, which might be a good sign as it might indicate that fewer people do SunSpider comparisons
h4writer cleared up the mystery here: the tracelogging results are a combination of running all sunspider sub-tests individually, starting a new shell each time. That means that all parsing times include parsing the self-hosting code, making parsing heavily over-represented.

As a short-term solution to that, and to get better numbers here, I'll change things so that tracelogging only starts after self-hosting has been initialized.
Just ran SunSpider fully, getting the following results:

{
    "engineOverview": {
        "parsing script": 0.0790143,
        "Interpreter": 0.0903848,
        "GC": 0.0241764,
        "Baseline": 0.37226,
        "IonMonkey": 0.358923,
        "Lazy parsing": 0.0197462,
        "Ion Compilation": 0.00268987,
        "Yarr execution": 0.0528035
    }
}

So 80%+ is spent in execution ;). Only 9% in parsing.
Parsing certainly is important -- think of all those enormous scripts that websites use. But the lexer and parser have both enjoyed (suffered?) several rounds of optimization over the past couple of years, with the help of Parsemark (bug 839450). I don't think there's much fat to be removed, especially in the lexer. But I'd love to be proven wrong, which is why I am interested to hear Jan's ideas.
Attachment #8365939 - Flags: review?(n.nethercote)
Attachment #8365940 - Flags: review?(n.nethercote)
Attachment #8365941 - Flags: review?(n.nethercote)
Attachment #8365942 - Flags: review?(n.nethercote)
Attachment #8365943 - Flags: review?(n.nethercote)
Attachment #8365945 - Flags: review?(n.nethercote)
Attachment #8365951 - Flags: review?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #17)
> I don't think there's much fat to be removed,
> especially in the lexer. But I'd love to be proven wrong, which is why I am
> interested to hear Jan's ideas.

I was looking at an Octane-codeload profile and noticed quite a lot of time spent atomizing identifier strings. We could try to do something similar to what JSC does: add a PropertyName *nameCache_[128] to TokenStream. Then, when you have a name with char[0] < 128 and length > 1 (length == 1 is very hot and should probably be inlined in getTokenInternal), you look up the cached PropertyName*, check whether the length + chars match the current one, and avoid the whole AtomizeChars path (including the atom table lock!). I hacked this up last week and, combined with the length == 1 fast path, it avoided the AtomizeChars call for 75% of all identifiers IIRC.

Also, integer literals are way more common than double literals (imagine all those x|0 expressions, etc.), so it would be nice if we could avoid doubles completely. Again, not sure if this wins much, but maybe worth measuring.
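The nameCache_ idea described above can be sketched as a small direct-mapped cache consulted before the atomization path. This is a hypothetical model only: AtomCache, atomize, and the std::unordered_set standing in for the atom table are illustrative, not SpiderMonkey APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_set>

// Hypothetical sketch of the identifier cache: a 128-entry table indexed by
// the first character, checked before falling back to the (locked, in the real
// engine) atom-table lookup.
struct AtomCache {
    static const std::size_t kSize = 128;
    const std::string* entries[kSize] = {nullptr};
    std::unordered_set<std::string> atoms;  // stand-in for the atom table

    const std::string* atomize(const char* chars, std::size_t len) {
        unsigned char c0 = (unsigned char)chars[0];
        if (len > 1 && c0 < kSize) {
            const std::string* hit = entries[c0];
            if (hit && hit->size() == len && memcmp(hit->data(), chars, len) == 0)
                return hit;  // cache hit: skip the atom-table path entirely
        }
        // Slow path: consult the atom table (pointers to set elements stay
        // valid across rehashing, so caching them is safe here).
        const std::string* atom = &*atoms.insert(std::string(chars, len)).first;
        if (len > 1 && c0 < kSize)
            entries[c0] = atom;  // remember the most recent name per first char
        return atom;
    }
};
```

Since identifiers repeat heavily in real code (loop variables, property names), even this one-entry-per-character scheme gets a high hit rate, which is consistent with the ~75% figure quoted above.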
Anyway, this one can be closed as WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
The PropertyName idea sounds good.

> Also, integer literals are way more common than double literals (imagine all
> these x|0 expressions etc), it would be nice if we could avoid doubles
> completely.

We already do that: we use GetDecimalInteger() to compute the value of integer literals and js_strtod() to compute the value of double literals (bug 639420, patch 3 introduced this). It was a ~10% win on integer-heavy inputs such as some of the Kraken ones.
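The integer fast path described here can be sketched as follows. This is a simplified model of the idea, not the real GetDecimalInteger (which, among other things, has to guard against values too large for exact accumulation): for a run of decimal digits with no '.' or exponent, accumulate the value directly instead of going through a general strtod-style conversion.

```cpp
#include <cassert>

// Simplified sketch of a GetDecimalInteger-style fast path: the value of a
// pure run of decimal digits is accumulated directly, avoiding the much more
// expensive general double-parsing routine used for literals like 1.5e3.
static double getDecimalInteger(const char* start, const char* end) {
    double d = 0;
    for (const char* p = start; p < end; p++)
        d = d * 10 + (*p - '0');  // exact for values within double's 2^53 range
    return d;
}
```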

Or were you thinking of something else?