Bug 964253 (Closed)
Untangle TokenStream::getTokenInternal
Opened 11 years ago · Closed 11 years ago
Categories: Core :: JavaScript Engine, defect
Status: RESOLVED WONTFIX
Reporter: jandem; Assigned: jandem
Attachments (8 files, 1 obsolete file):
10.31 KB, patch
8.56 KB, patch
7.66 KB, patch
7.34 KB, patch
4.18 KB, patch
6.77 KB, patch
3.77 KB, patch
35.68 KB, patch
I was experimenting with some TokenStream::getTokenInternal optimizations, but the control flow is really complicated due to backwards gotos into the middle of other cases, etc. So I thought it'd be good to refactor the code a bit first.
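As a rough illustration of the refactoring direction (not SpiderMonkey's actual TokenStream; the Scanner type and token names below are invented for this sketch), the classic `goto decimal_dot;` pattern, where the `'.'` case jumps backwards into the middle of number scanning, can be replaced by a shared helper that both entry points call:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical mini-scanner: instead of a `goto decimal_dot;` jumping
// into the middle of the number case, both entry points (a digit and a
// leading '.') call one helper, scanFraction().
enum class Tok { Number, Dot, Eof };

struct Scanner {
    std::string src;
    size_t pos = 0;

    char peek() const { return pos < src.size() ? src[pos] : '\0'; }
    char peekAt(size_t off) const {
        return pos + off < src.size() ? src[pos + off] : '\0';
    }

    // Shared tail of number scanning: consume the '.' and any digits after it.
    void scanFraction() {
        ++pos;                                   // consume '.'
        while (std::isdigit((unsigned char)peek())) ++pos;
    }

    Tok getToken() {
        char c = peek();
        if (c == '\0') return Tok::Eof;
        if (std::isdigit((unsigned char)c)) {    // e.g. "5" or "5.5"
            while (std::isdigit((unsigned char)peek())) ++pos;
            if (peek() == '.') scanFraction();
            return Tok::Number;
        }
        if (c == '.') {
            if (std::isdigit((unsigned char)peekAt(1))) {  // e.g. ".5"
                scanFraction();
                return Tok::Number;
            }
            ++pos;
            return Tok::Dot;
        }
        ++pos;                                   // skip anything else
        return getToken();
    }
};
```

The point is that neither the digit case nor the `'.'` case needs to know the other's control flow; the shared tail lives in one named method instead of behind a label.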
Assignee
Comment 1 • 11 years ago
Moves the code for scanning string literals into its own method, getStringToken.
Attachment #8365939 - Flags: review?(n.nethercote)
Assignee
Comment 2 • 11 years ago
This one was a bit nasty due to |goto decimal_dot;| and could use a careful review.
Attachment #8365940 - Flags: review?(n.nethercote)
Assignee
Comment 3 • 11 years ago
Attachment #8365941 - Flags: review?(n.nethercote)
Assignee
Comment 4 • 11 years ago
Attachment #8365942 - Flags: review?(n.nethercote)
Assignee
Comment 5 • 11 years ago

Assignee
Updated • 11 years ago
Attachment #8365943 - Flags: review?(n.nethercote)
Assignee
Comment 6 • 11 years ago
Factor out the regexp literal scanning. Also some minor "badchar" cleanup.
Attachment #8365945 - Flags: review?(n.nethercote)
Assignee
Comment 7 • 11 years ago
Some minor changes. I saw TokenStream::getChar() show up in Instruments, so I added some JS_ALWAYS_INLINEs.
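For readers unfamiliar with the idiom: the point of such annotations is to force the compiler to inline a tiny, per-character hot accessor rather than leave the decision to its heuristics. A minimal sketch of the idea (the macro spelling here is an assumption for illustration, not SpiderMonkey's actual JS_ALWAYS_INLINE definition, and CharBuffer is an invented type):

```cpp
#include <cassert>

// Force-inline a hot accessor. On GCC/Clang this maps to the
// always_inline attribute; elsewhere we fall back to plain `inline`.
#if defined(__GNUC__) || defined(__clang__)
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define ALWAYS_INLINE inline
#endif

struct CharBuffer {
    const char* ptr;
    const char* end;

    // Hot path: called once per input character, so call overhead matters.
    ALWAYS_INLINE int getChar() {
        return ptr < end ? *ptr++ : -1;  // -1 signals end of input
    }
};
```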
Attachment #8365949 - Flags: review?(n.nethercote)
Assignee
Comment 8 • 11 years ago
njn, I flagged you as reviewer since you also touched getTokenInternal; feel free to reassign. I tested Citadel's UDKGame-Browser-Shipping.js (23 MB) with the run() call commented out, with/without asm.js (without asm.js to ignore compilation time). Perf seems to be either the same or maybe a tiny bit faster with these patches. Same for Parsemark and Octane-Codeload. Unfortunately all these benchmarks are pretty noisy :(
Assignee
Comment 9 • 11 years ago
Oops, part 7 didn't apply cleanly; this one does.
Attachment #8365949 - Attachment is obsolete: true
Attachment #8365949 - Flags: review?(n.nethercote)
Attachment #8365951 - Flags: review?(n.nethercote)
Assignee
Comment 10 • 11 years ago
Should apply to inbound revision b2b1b90f5082. It'd be great if somebody could verify these patches don't affect performance. As I said, on OS X things look ok though.
Comment 11 • 11 years ago
I have a script that runs the JS shell on parsemark (bug 839450), under Cachegrind. Here are the results for the combined patch:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|   58.700    58.819 (0.998x) |    6.003    6.003 (------) | 280slides-comb
|   96.493    96.765 (0.997x) |   20.021   20.021 (------) | ball-pool-comb
|   85.716    85.950 (0.997x) |   11.192   11.192 (------) | dojo
|   97.969    98.333 (0.996x) |   23.664   23.664 (------) | effectgames-ga
|   48.730    48.864 (0.997x) |    3.743    3.743 (------) | ga
|  338.461   339.860 (0.996x) |   92.427   92.427 (------) | gmail-combined
|  126.102   126.586 (0.996x) |   31.001   31.001 (------) | gravity-combin
|   56.304    56.441 (0.998x) |    7.190    7.190 (------) | jquery-min
|   83.715    83.761 (0.999x) |   15.170   15.170 (------) | jsgb-combined
|   97.343    97.659 (0.997x) |   21.122   21.122 (------) | mochikit
|  410.809   412.300 (0.996x) |  100.694  100.694 (------) | pipio-combined
|   65.201    65.232 (------) |    4.650    4.650 (------) | processing-twi
|   78.898    79.969 (0.987x) |    1.389    1.389 (------) | sunspider-comb
|   52.990    53.126 (0.997x) |    5.480    5.480 (------) | tetris-combine
|  110.808   110.795 (------) |   29.972   29.972 (------) | twitter-combin
|  132.599   131.901 (1.005x) |   35.653   35.653 (------) | v8-combined
|   45.329    45.441 (0.998x) |    2.821    2.821 (------) | yui-min
|  767.127   769.733 (0.997x) |  176.807  176.807 (------) | zimbra-combine
---------------------------------------------------------------
| 2753.304  2761.546 (0.997x) |  589.009  589.009 (------) | all

So overall it's a slight regression. I'll try the patches individually as well.

In general, patches 1, 3, 5, 6 and 7 look fine. I'm a bit uncertain about 2 and 4; both of them cause us to inline a big function in three places, which isn't ideal... what were the optimizations you had in mind? I'll be pleased if you can get speed-ups here, because I've previously removed every last ounce of fat I could find in this function :)
Comment 12 • 11 years ago
More instruction counts from Cachegrind.

            clang               gcc
orig        2753.298 (1.000x)   2608.813
part1       2753.216 (1.000x)   2610.726 (0.999x)
part2       2754.158 (1.000x)   2612.656 (0.999x)
part3       2754.482 (1.000x)   2616.232 (0.997x)
part4       2776.504 (0.992x)   2626.606 (0.993x)
part5       2776.429 (0.992x)   2640.561 (0.992x)
part6       2779.920 (0.990x)   2626.653 (0.993x)
part7       2761.552 (0.997x)   2617.431 (0.997x)
only-part7  2758.066 (0.998x)   2616.173 (0.997x)

So, a bit here, a bit there. Even part 7 is a net regression when applied without the others. As for parts 1 to 6... does the current structure actively block the optimizations you were planning?
Assignee
Comment 13 • 11 years ago
Nicholas, thanks a lot for the numbers! Using Cachegrind is a good idea; I should probably set that up as well. These 0.003x regressions are pretty much impossible to measure without it. It's also interesting that gcc regresses slightly more than clang; that's what I feared. 0.003x is tiny, but it's still a regression, so for now I think I'll just drop these patches.
Comment 14 • 11 years ago
Looking at h4writer's splendid tracelogging numbers[1], we spend roughly two thirds of our time on SunSpider in parsing. Does that mean that SunSpider truly is as fast as it will ever get, or are there other parts of the parsing phase that can still be sped up? If the former, how does IE manage to be 30% faster, if their numbers[2] are to be believed[3]?

[1]: https://tl-beta.paas.allizom.org/
[2]: http://ie.microsoft.com/testDrive/Benchmarks/SunSpider/Default.html
[3]: I couldn't find any independent results corroborating them, which might be a good sign, as it might indicate that fewer people do SunSpider comparisons.
Comment 15 • 11 years ago
h4writer cleared up the mystery here: the tracelogging results are a combination of running all sunspider sub-tests individually, starting a new shell each time. That means that all parsing times include parsing the self-hosting code, making parsing heavily over-represented. As a short-term solution to that, and to get better numbers here, I'll change things so that tracelogging only starts after self-hosting has been initialized.
Comment 16 • 11 years ago
Just ran SunSpider fully, getting the following results:

{
  "engineOverview": {
    "parsing script": 0.0790143,
    "Interpreter": 0.0903848,
    "GC": 0.0241764,
    "Baseline": 0.37226,
    "IonMonkey": 0.358923,
    "Lazy parsing": 0.0197462,
    "Ion Compilation": 0.00268987,
    "Yarr execution": 0.0528035
  }
}

So 80%+ is spent in execution ;). Only 9% in parsing.
Comment 17 • 11 years ago
Parsing certainly is important -- think of all those enormous scripts that websites use. But the lexer and parser have both enjoyed (suffered?) several rounds of optimization over the past couple of years, with the help of Parsemark (bug 839450). I don't think there's much fat to be removed, especially in the lexer. But I'd love to be proven wrong, which is why I am interested to hear Jan's ideas.
Updated • 11 years ago
Attachment #8365939 - Flags: review?(n.nethercote)
Attachment #8365940 - Flags: review?(n.nethercote)
Attachment #8365941 - Flags: review?(n.nethercote)
Attachment #8365942 - Flags: review?(n.nethercote)
Attachment #8365943 - Flags: review?(n.nethercote)
Attachment #8365945 - Flags: review?(n.nethercote)
Attachment #8365951 - Flags: review?(n.nethercote)
Assignee
Comment 18 • 11 years ago
(In reply to Nicholas Nethercote [:njn] from comment #17)
> I don't think there's much fat to be removed, especially in the lexer. But
> I'd love to be proven wrong, which is why I am interested to hear Jan's
> ideas.

I was looking at an Octane-codeload profile and noticed quite a lot of time atomizing identifier strings. We could try to do something similar to what JSC does: add a PropertyName *nameCache_[128] to TokenStream; then, when you have a name with char[0] < 128 and length > 1 (length == 1 is very hot and should probably be inlined in getTokenInternal), you look up the cached PropertyName*, check if the length + chars match the current one, and avoid the whole AtomizeChars path (including the atom table lock!). I hacked this up last week, and combined with the length == 1 fast path it avoided the AtomizeChars call for 75% of all identifiers, IIRC.

Also, integer literals are way more common than double literals (imagine all these x|0 expressions etc.); it would be nice if we could avoid doubles completely. Again, not sure if this wins much, but worth measuring maybe.
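A minimal standalone sketch of the nameCache_ idea described above, assuming std::string stands in for PropertyName and an unordered_set stands in for the (hypothetically lock-protected) atom table; the names and counters are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Direct-mapped cache indexed by the identifier's first character.
// A hit skips the slow atomization path entirely.
struct AtomTable {
    std::unordered_set<std::string> atoms;  // stands in for the real atom table
    size_t slowLookups = 0;                 // counts slow-path calls, for testing

    const std::string* cache_[128] = {};    // one slot per first char < 128

    // Slow path: full atom-table lookup/insert.
    const std::string* atomize(const std::string& name) {
        ++slowLookups;
        return &*atoms.insert(name).first;  // set nodes have stable addresses
    }

    const std::string* atomizeCached(const std::string& name) {
        unsigned char c0 = name[0];
        if (name.size() > 1 && c0 < 128) {
            const std::string* hit = cache_[c0];
            if (hit && *hit == name)        // length + chars match: cache hit
                return hit;
            const std::string* atom = atomize(name);
            cache_[c0] = atom;              // remember for next time
            return atom;
        }
        return atomize(name);               // length-1 or non-ASCII-leading names
    }
};
```

The single-slot-per-character design trades hit rate for a lookup that is just one compare; repeated identifiers (the common case in real code) hit the cache on every occurrence after the first.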
Assignee
Comment 19 • 11 years ago
Anyway, this one can be closed as WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Comment 20 • 11 years ago
The PropertyName idea sounds good.

> Also, integer literals are way more common than double literals (imagine all
> these x|0 expressions etc), it would be nice if we could avoid doubles
> completely.

We already do that: we use GetDecimalInteger() to compute the value of integer literals and js_strtod() to compute the value of double literals (bug 639420, patch 3 introduced this). It was a ~10% win on integer-heavy inputs such as some of the Kraken ones. Or were you thinking of something else?
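For readers unfamiliar with the split being described: an integer literal's value can be computed by plain digit accumulation, reserving a full strtod-style parse for literals that actually contain a '.' or an exponent. A standalone sketch (not the real GetDecimalInteger(); exactness holds only for values up to 2^53, where doubles represent every integer):

```cpp
#include <cassert>

// Evaluate a run of decimal digits [start, end) by accumulation.
// Each step is exact as long as the value stays below 2^53.
double decimalInteger(const char* start, const char* end) {
    double d = 0.0;
    for (const char* p = start; p < end; ++p)
        d = d * 10 + (*p - '0');
    return d;
}
```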