JSON.stringify is 2x slower than Safari on sp3
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox117 | --- | fixed |
People
(Reporter: mstange, Assigned: sfink, NeedInfo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Attachments
(3 files)
On the attached microbenchmark, I get the following scores:
Firefox Nightly: 473ms
Chrome Canary: 352ms
Safari Technology Preview: 197ms
Matching Safari's performance of JSON.stringify on this shape of JSON would improve our score on each of these three sp3 benchmarks by 4%: TodoMVC-JavaScript-ES5, TodoMVC-JavaScript-ES6, TodoMVC-JavaScript-ES6-Webpack
Updated•1 year ago
|
Reporter | ||
Comment 1•1 year ago
|
||
Here's another version of the benchmark which is closer to what happens in the actual benchmark, because it embeds a string from input.value
in the JSON. In Firefox, this causes us to take the utf-16 JSON serialization path.
Firefox Nightly: 492ms
Chrome Canary: 350ms
Safari Technology Preview: 194ms
Reporter | ||
Comment 2•1 year ago
|
||
Firefox Profile: https://share.firefox.dev/45Rblsp
Safari Profile: https://share.firefox.dev/3oLhZQp
JavaScriptCore has a FastStringifier:
// ------------------------------ FastStringifier --------------------------------
// FastStringifier does a no-side-effects stringify of the most common types of
// objects and arrays. It bails out if the serialization is any longer than a
// fixed buffer and handles only the simplest cases, including only 8-bit character
// strings. Instead of explicit checks to prevent excessive recursion and cycles,
// it counts on hitting the buffer size limit to catch those things. If it fails,
// since there is no side effect, the full general purpose Stringifier can be used
// and the only cost of the fast stringifying attempt is the time wasted.
class FastStringifier {
They also seem to have a more efficient way to enumerate properties:
if (UNLIKELY(!structure.canPerformFastPropertyEnumeration())) {
recordFastPropertyEnumerationFailure(object);
return;
}
structure.forEachProperty(m_vm, [&](const auto& entry) -> bool {
// ...
});
In Firefox, we seem to do property enumeration and property lookup separately.
Comment 3•1 year ago
|
||
(In reply to Markus Stange [:mstange] from comment #2)
They also seem to have a more efficient way to enumerate properties:
There might be some overlap with bug 1836679. I want to look into that more after the mprotect work, and can also look into this one.
Updated•1 year ago
|
Assignee | ||
Comment 4•1 year ago
|
||
I'm prototyping this one. Preliminary result is that the fast path (which should cover the examples in this bug) is slightly over twice the speed of the slow path. Results are only preliminary because I'm still outputting the named properties in reverse order, I'm not doing the length check so it would actually infinite loop if you had a cycle, and I imagine there are some additional checks that I'm not doing yet.
Without fast path: 13800.40ms for 2236400000 chars, 6.170808003725295 ns/char.
With fast path: 6283.81ms for 2236400000 chars, 2.8097893884283893 ns/char.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 5•1 year ago
|
||
Assignee | ||
Comment 6•1 year ago
|
||
Had to take some time off, but this is getting closer to being real and I've pushed a timing run.
Patch contains some weird test code that I'll need to do in a different way. Passes all jit-tests and jstests, which confuses me because I've had to fix some things that I would totally expect to be tested by test262, but I'm not seeing any JSON tests there (other than for JSON module imports). Still matching up the code to the spec sections, and I need to remove some code duplication between the fast and slow paths.
Assignee | ||
Comment 7•1 year ago
|
||
I seem to have everything working now, though adding in the additional checks seems to have slowed the fast path down substantially:
Fast: 5098.89ms for 1118200000 chars, 4.56 ns/char.
Slow: 7779.06ms for 1118200000 chars, 6.96 ns/char.
I pushed to try for speedometer, and got around 5% improvement on 5 different Speedometer3 subtests. Actual differences: -2.75% -4.34% -5.20% -6.36% -6.88%
When running the JS shell locally under samply, I noticed the following not so desirable overhead:
- 15% of the overall time is spent poisoning memory freed by jemalloc when reallocing for ExtractWellSized
- 14% of the overall time is in
GetOwnPropertyPure
, some of which is redundant since we're iterating over the own properties - 23% in Quote(). Quote() is expensive for both fast and slow paths. It's a little tempting to cache a "this string does not need quoting (for JSON)" bit, though most of the allocation growing time is in Quote so it may not have that much of an effect.
- 3.6% is spent looking for an interfering
.toJSON
property, some of which could be saved with a guard bit of some sort.
Assignee | ||
Comment 8•1 year ago
|
||
Getting closer to something reviewable, and I realized that I've only been looking at Linux numbers. Windows numbers are... different.
I have two pushes where Windows shows similar improvements to Linux, plus some massive 15-20% regressions on a handful of subtests. And my latest push which is mostly up to date (though I've added interrupt checks that this doesn't include) shows across-the-board improvements that I don't trust, and no regressions.
Assignee | ||
Comment 9•1 year ago
•
|
||
Information on the hit rates.
For awsy (tp-6), the overall hit rate is 72.4%. Most of the misses are unavoidable. Of the inputs where it is theoretically possible to use the fast path, the hit rate is 99.0%. Well, unless you magically analyzed the replacer function for side effects or something, or observed everything you cared about while it was running. If you count the replacer function argument as a miss, then the hit rate is 91.8%.
Of all calls that took the slow path:
- 20.8% had a replacer function and so were ineligible
- 76.4% were primitives
- 0.1% were ineligible for other reasons
- 2.1% found an object with a callable toJSON function
- 0.7% bailed for other reasons (sparse indexes was the most common)
For just speedometer3 (plus some startup noise):
- overall hit rate is 84.6%
- hit rate only counting non-primitives that aren't going to run any code is 98.7%
- hit rate for all non-primitives is 97.1%
Of all calls that took the slow path:
- 8.4% had a replacer argument
- 83.5% were primitives
- 0.7% found something with a callable toJSON property
- 7.4% found an INELIGIBLE_OBJECT, which could be a variety of things. Typed arrays, BigInts, non-native objects, etc.
That last category is the only category that could be reasonable to reduce right now, but the effort doesn't seem worth it just for bumping the hit rate by a percentage point or so.
A magical analysis that could tell whether a given function could possibly have any side effects on the data in question could boost the hit rate more, but that would only make sense if we were doing it for something else and could just reuse it here.
Comment 10•1 year ago
|
||
Comment 11•1 year ago
|
||
Backed out for leak failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/ec054fe362b6c5391b791840686832dea90688ba
Log link: https://treeherder.mozilla.org/logviewer?job_id=423600717&repo=autoland&lineNumber=3598
Comment 12•1 year ago
|
||
Comment 13•1 year ago
|
||
bugherder |
Comment 14•1 year ago
|
||
Improvements:
34% on Jetstream2-json-stringify-inspector-Average
31% on Jetstream2-json-stringify-inspector-First
28% on Jetstream2-json-stringify-inspector-Geometric
20%-30% on Jetstream2-json-stringify-inspector-Worst
6% on Speedometer2-Vanilla-ES2015-Babel-Webpack-TodoMVC/CompletingAllItems/Sync and other smaller improvements in related sub-tests
5.6% on Speedometer2-Vanilla-ES2015-TodoMVC/CompletingAllItems and other smaller improvements in related sub-tests
6.6% on Speedometer2-VanillaJS-TodoMVC/CompletingAllItems/Sync and other improvements in related sub-tests
Description
•