Open Bug 1798958 Opened 2 years ago Updated 6 months ago

Experiment with removing char16_t JS parser

Categories

(Core :: JavaScript Engine, task, P2)

task

People

(Reporter: tcampbell, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

We still have two copies of the parser compiled. Since the parser is quite large, this can hurt CPU icache behavior. Ideally we would remove the char16_t parser and allow the UTF-8 parser to accept unmatched char16_t surrogates (à la the WTF-8 encoding).

In practice, whether we use the UTF-8 or the char16_t parser is determined by the call site:

  • char16_t
    • eval
    • javascript: URLs
    • inline scripts
    • event handlers
  • UTF-8
    • out-of-line scripts

Most JS bytes are parsed by the UTF-8 parser, but more than half of parse invocations go through char16_t entry points. As a result, icache pollution remains a potential problem.

The UTF-8 parser seems to work if I simply disable the check for invalid surrogates and eagerly turn on UTF-8 parsing for evals.
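Disabling the surrogate check amounts to accepting the three-byte encodings of U+D800..U+DFFF that strict UTF-8 rejects. Below is a minimal sketch of a validator with such a switch. `IsValidUtf8` and its `allowLoneSurrogates` flag are hypothetical names, not SpiderMonkey APIs. Following the WTF-8 definition, the permissive mode still rejects a lead surrogate immediately followed by a trail surrogate, because a real pair must be encoded as a single four-byte sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical validator: strict UTF-8 when allowLoneSurrogates is false,
// WTF-8 (lone surrogates allowed, paired ones not) when it is true.
bool IsValidUtf8(std::string_view s, bool allowLoneSurrogates) {
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  const size_t n = s.size();
  bool prevWasLeadSurrogate = false;
  size_t i = 0;
  while (i < n) {
    const unsigned char b0 = p[i];
    uint32_t cp;
    size_t len;
    if (b0 < 0x80) { cp = b0; len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else return false;  // stray continuation byte or invalid lead byte
    if (i + len > n) return false;  // truncated sequence
    for (size_t k = 1; k < len; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (p[i + k] & 0x3F);
    }
    // Reject overlong encodings and code points above U+10FFFF.
    static constexpr uint32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len] || cp > 0x10FFFF) return false;
    const bool isLead = cp >= 0xD800 && cp <= 0xDBFF;
    const bool isTrail = cp >= 0xDC00 && cp <= 0xDFFF;
    if (isLead || isTrail) {
      if (!allowLoneSurrogates) return false;  // strict UTF-8
      // WTF-8: lead+trail must have been encoded as one 4-byte sequence.
      if (isTrail && prevWasLeadSurrogate) return false;
    }
    prevWasLeadSurrogate = isLead;
    i += len;
  }
  return true;
}
```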

The next step is to convert the remaining scripts (which are typically much smaller) to WTF-8 just before the parse, and then remove the actual char16_t parser from the build.
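The conversion step can be sketched as a UTF-16 to WTF-8 encoder. This is an illustration under assumptions, not the prototype's code: `ToWtf8` is a hypothetical name. Well-formed surrogate pairs become one four-byte sequence, as in UTF-8, while an unpaired surrogate is encoded as its own three-byte sequence instead of being replaced with U+FFFD, so no information is lost.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical encoder: UTF-16 code units (possibly ill-formed) -> WTF-8.
std::string ToWtf8(std::u16string_view src) {
  std::string out;
  out.reserve(src.size() * 3);  // worst case: 3 bytes per code unit
  for (size_t i = 0; i < src.size(); ++i) {
    char32_t cp = src[i];
    // Combine a well-formed lead+trail pair into one supplementary code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size() &&
        src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
      cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
      ++i;
    }
    // Lone surrogates fall through to the 3-byte branch unchanged.
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

A char16_t entry point such as eval could then hand `ToWtf8(source)` to the single UTF-8 parser instead of instantiating a char16_t one. The extra copy is assumed to be cheap because those scripts tend to be small, but that is an assumption rather than a measurement.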

Severity: -- → N/A
Priority: -- → P2
Depends on: 1803495
Blocks: 1801192

The basic prototype seems to be working, and I was able to do some initial testing. Unsurprisingly, I did not see obvious wins in Speedometer, but after many retries the highest-confidence results are almost all improvements.

https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=9f1e3d9b2d3648dd8994b6a74881e8b7d84ab42a&originalSignature=3445603&newSignature=3445603&framework=13&originalRevision=b8cb2e1be27c3ee45b10871b6f6d0aa8e2e3b69c&page=1

In this incarnation of the prototype, I see about a 150 kB reduction in Firefox installer size. I was seeing 400 kB in SpiderMonkey shell builds, and I'm not sure whether there is more that can be removed.

The main blocker to doing this for real is that I don't yet precisely handle the unmatched char16_t surrogate cases, and doing so requires a more consistent approach to allowing WTF-8 strings in Gecko.
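Handling unmatched surrogates precisely implies the conversion must be lossless: anything that later needs the original char16_t text must get back exactly the same code units. A minimal sketch of the inverse direction follows, assuming the input has already been validated as WTF-8 (so it does no error checking). `FromWtf8` is a hypothetical name. A three-byte sequence in U+D800..U+DFFF decodes back to that lone surrogate, and a four-byte sequence decodes to a surrogate pair.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical decoder: already-validated WTF-8 -> UTF-16 code units.
std::u16string FromWtf8(std::string_view src) {
  std::u16string out;
  out.reserve(src.size());  // never more code units than input bytes
  size_t i = 0;
  while (i < src.size()) {
    const unsigned char b0 = static_cast<unsigned char>(src[i]);
    char32_t cp;
    size_t len;
    if (b0 < 0x80) { cp = b0; len = 1; }
    else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
    else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
    else { cp = b0 & 0x07; len = 4; }
    for (size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(src[i + k]) & 0x3F);
    }
    if (cp >= 0x10000) {
      // Supplementary code point: split back into a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      // BMP code point, including a lone surrogate, maps to one unit.
      out.push_back(static_cast<char16_t>(cp));
    }
    i += len;
  }
  return out;
}
```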

For now, a number of pieces of the prototype can be landed today, and they will move more cases to the UTF-8 parser.

Depends on: 1806169
Whiteboard: [sp3]
Assignee: tcampbell → nobody