Experiment with removing char16_t JS parser
Categories
(Core :: JavaScript Engine, task, P2)
People
(Reporter: tcampbell, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
We still have two copies of the parser compiled. This can hurt the CPU instruction cache (icache) since the parser is quite large. Ideally we would remove the char16_t parser and allow the UTF-8 parser to receive unmatched char16_t surrogates (à la WTF-8 encoding).
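For context, WTF-8 generalizes UTF-8 by allowing unpaired surrogate code points (U+D800–U+DFFF) to be encoded with the ordinary three-byte pattern that strict UTF-8 rejects. A minimal sketch of that encoding (a hypothetical helper for illustration, not SpiderMonkey code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a single code point as generalized UTF-8 ("WTF-8"): lone
// surrogates U+D800..U+DFFF, which strict UTF-8 forbids, are simply
// encoded with the normal three-byte pattern (ED A0 80 .. ED BF BF).
static void appendWtf8(std::vector<uint8_t>& out, uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(uint8_t(cp));
    } else if (cp < 0x800) {
        out.push_back(uint8_t(0xC0 | (cp >> 6)));
        out.push_back(uint8_t(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // Surrogates land in this branch.
        out.push_back(uint8_t(0xE0 | (cp >> 12)));
        out.push_back(uint8_t(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(uint8_t(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(uint8_t(0xF0 | (cp >> 18)));
        out.push_back(uint8_t(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(uint8_t(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(uint8_t(0x80 | (cp & 0x3F)));
    }
}
```

The only difference from a plain UTF-8 encoder is the *absence* of a surrogate rejection check, which matches the idea of letting the UTF-8 parser accept unmatched surrogates.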
Reporter
Comment 1•3 years ago
In practice, whether we use UTF-8 vs char16_t is determined by call site.
- char16_t
  - eval
  - js:// urls
  - inline scripts
  - event handlers
- utf-8
  - out-of-line scripts
The bulk of JS bytes parsed goes through the UTF-8 parser, but more than half of parse invocations enter through the char16_t entry points, so icache pollution remains a potential problem.
The UTF-8 parser seems to be functional if I simply disable the check for invalid surrogates and eagerly turn on UTF-8 parsing for evals.
The next step is to convert the remaining scripts (which are typically much smaller) to WTF-8 just before the parse, and then remove the actual char16_t parser from the build.
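The conversion step described above could look roughly like the sketch below (an illustrative stand-in, not the actual patch): walk the char16_t units, combine valid surrogate pairs into supplementary code points, and let lone surrogates fall through so the WTF-8 encoding preserves them losslessly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Convert a char16_t buffer to WTF-8: valid surrogate pairs become
// four-byte sequences, lone surrogates become three-byte sequences
// (the generalization that distinguishes WTF-8 from UTF-8).
static std::vector<uint8_t> toWtf8(const std::u16string& in) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size(); i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into a supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        }
        // Lone surrogates fall through and are encoded like BMP code points.
        if (cp < 0x80) {
            out.push_back(uint8_t(cp));
        } else if (cp < 0x800) {
            out.push_back(uint8_t(0xC0 | (cp >> 6)));
            out.push_back(uint8_t(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(uint8_t(0xE0 | (cp >> 12)));
            out.push_back(uint8_t(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(uint8_t(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(uint8_t(0xF0 | (cp >> 18)));
            out.push_back(uint8_t(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(uint8_t(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(uint8_t(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For well-formed input this produces ordinary UTF-8, so scripts converted this way can feed straight into a UTF-8 parser; only scripts containing unmatched surrogates produce the WTF-8-only byte sequences.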
Reporter
Comment 2•3 years ago
The basic prototype seems to be working, and I was able to do some initial testing. Unsurprisingly, I was not able to see obvious wins in Speedometer, but across many retries the highest-confidence results are almost all improvements.
In this incarnation of the prototype, I see about a 150 kB reduction in Firefox installer size. I was seeing 400 kB in SpiderMonkey shell builds, and I'm not sure whether there is more that can be removed.
The main blocker to doing this for real is that I don't yet handle the unmatched char16_t surrogate cases precisely, and that requires a more consistent approach to allowing WTF-8 strings in Gecko.
For now, there are a number of pieces of the prototype that can land today and will move more cases onto the UTF-8 parser.
Comment 3•4 months ago
(In reply to Ted Campbell [:tcampbell] from comment #0)
> We still have two copies of the parser compiled. This can hurt the CPU instruction cache (icache) since the parser is quite large. Ideally we would remove the char16_t parser and allow the UTF-8 parser to receive unmatched char16_t surrogates (à la WTF-8 encoding).
From having investigated this issue at the time of jsparagus / SmooshMonkey, I can tell that even a single instance of the JS parser does not fit in the instruction cache (L1i). So the point is valid, but not solvable just by having a single instance of our parser.
However, the problem with having multiple instances of the parser, for different token sizes, shows up in the L2/L3 caches and in the download size of the binary.