Closed Bug 1837410 Opened 1 year ago Closed 10 months ago

JSON.stringify is 2x slower than Safari on sp3

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
117 Branch
Tracking Status
firefox117 --- fixed

People

(Reporter: mstange, Assigned: sfink, NeedInfo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

Attachments

(3 files)

On the attached microbenchmark, I get the following scores:

Firefox Nightly: 473ms
Chrome Canary: 352ms
Safari Technology Preview: 197ms

Matching Safari's performance of JSON.stringify on this shape of JSON would improve our score on each of these three sp3 benchmarks by 4%: TodoMVC-JavaScript-ES5, TodoMVC-JavaScript-ES6, TodoMVC-JavaScript-ES6-Webpack

Here's another version of the benchmark which is closer to what happens in the actual benchmark, because it embeds a string from input.value in the JSON. In Firefox, this causes us to take the utf-16 JSON serialization path.

Firefox Nightly: 492ms
Chrome Canary: 350ms
Safari Technology Preview: 194ms

Firefox Profile: https://share.firefox.dev/45Rblsp
Safari Profile: https://share.firefox.dev/3oLhZQp

JavaScriptCore has a FastStringifier:

https://searchfox.org/wubkat/rev/1ef008a7bed116ef9afa0741d99ceb2bd3d3ce05/Source/JavaScriptCore/runtime/JSONObject.cpp#659-669

// ------------------------------ FastStringifier --------------------------------

// FastStringifier does a no-side-effects stringify of the most common types of
// objects and arrays. It bails out if the serialization is any longer than a
// fixed buffer and handles only the simplest cases, including only 8-bit character
// strings. Instead of explicit checks to prevent excessive recursion and cycles,
// it counts on hitting the buffer size limit to catch those things. If it fails,
// since there is no side effect, the full general purpose Stringifier can be used
// and the only cost of the fast stringifying attempt is the time wasted.

class FastStringifier {

They also seem to have a more efficient way to enumerate properties:

https://searchfox.org/wubkat/rev/1ef008a7bed116ef9afa0741d99ceb2bd3d3ce05/Source/JavaScriptCore/runtime/JSONObject.cpp#1071-1077

if (UNLIKELY(!structure.canPerformFastPropertyEnumeration())) {
    recordFastPropertyEnumerationFailure(object);
    return;
}
structure.forEachProperty(m_vm, [&](const auto& entry) -> bool {
    // ...
});

In Firefox, we seem to do property enumeration and property lookup separately.

(In reply to Markus Stange [:mstange] from comment #2)

They also seem to have a more efficient way to enumerate properties:

There might be some overlap with bug 1836679. I want to look into that more after the mprotect work, and can also look into this one.

Severity: -- → S3
Priority: -- → P2

I'm prototyping this one. Preliminary result is that the fast path (which should cover the examples in this bug) is slightly over twice the speed of the slow path. Results are only preliminary because I'm still outputting the named properties in reverse order, I'm not doing the length check so it would actually infinite loop if you had a cycle, and I imagine there are some additional checks that I'm not doing yet.

Without fast path: 13800.40ms for 2236400000 chars, 6.170808003725295 ns/char.
With fast path: 6283.81ms for 2236400000 chars, 2.8097893884283893 ns/char.

Assignee: nobody → sphink
Status: NEW → ASSIGNED

Had to take some time off, but this is getting closer to being real and I've pushed a timing run.

Patch contains some weird test code that I'll need to do in a different way. Passes all jit-tests and jstests, which confuses me because I've had to fix some things that I would totally expect to be tested by test262, but I'm not seeing any JSON tests there (other than for JSON module imports). Still matching up the code to the spec sections, and I need to remove some code duplication between the fast and slow paths.

I seem to have everything working now, though adding in the additional checks seems to have slowed the fast path down substantially:

Fast: 5098.89ms for 1118200000 chars, 4.56 ns/char.
Slow: 7779.06ms for 1118200000 chars, 6.96 ns/char.

I pushed to try for speedometer, and got around 5% improvement on 5 different Speedometer3 subtests. Actual differences: -2.75% -4.34% -5.20% -6.36% -6.88%

When running the JS shell locally under samply, I noticed the following not so desirable overhead:

  • 15% of the overall time is spent poisoning memory freed by jemalloc when reallocing for ExtractWellSized
  • 14% of the overall time is in GetOwnPropertyPure, some of which is redundant since we're iterating over the own properties
  • 23% in Quote(). Quote() is expensive for both fast and slow paths. It's a little tempting to cache a "this string does not need quoting (for JSON)" bit, though most of the allocation growing time is in Quote so it may not have that much of an effect.
  • 3.6% is spent looking for an interfering .toJSON property, some of which could be saved with a guard bit of some sort.

Getting closer to something reviewable, and I realized that I've only been looking at Linux numbers. Windows numbers are... different.

I have two pushes where Windows shows similar improvements to Linux, plus some massive 15-20% regressions on a handful of subtests. And my latest push which is mostly up to date (though I've added interrupt checks that this doesn't include) shows across-the-board improvements that I don't trust, and no regressions.

Information on the hit rates.

For awsy (tp-6), the overall hit rate is 72.4%. Most of the misses are unavoidable. Of the inputs where it is theoretically possible to use the fast path, the hit rate is 99.0%. Well, unless you magically analyzed the replacer function for side effects or something, or observed everything you cared about while it was running. If you count the replacer function argument as a miss, then the hit rate is 91.8%.

Of all calls that took the slow path:

  • 20.8% had a replacer function and so were ineligible
  • 76.4% were primitives
  • 0.1% were ineligible for other reasons
  • 2.1% found an object with a callable toJSON function
  • 0.7% bailed for other reasons (sparse indexes was the most common)

For just speedometer3 (plus some startup noise):

  • overall hit rate is 84.6%
  • hit rate only counting non-primitives that aren't going to run any code is 98.7%
  • hit rate for all non-primitives is 97.1%

Of all calls that took the slow path:

  • 8.4% had a replacer argument
  • 83.5% were primitives
  • 0.7% found something with a callable toJSON property
  • 7.4% found an INELIGIBLE_OBJECT, which could be a variety of things. Typed arrays, BigInts, non-native objects, etc.

That last category is the only category that could be reasonable to reduce right now, but the effort doesn't seem worth it just for bumping the hit rate by a percentage point or so.

A magical analysis that could tell whether a given function could possibly have any side effects on the data in question could boost the hit rate more, but that would only make sense if we were doing it for something else and could just reuse it here.

Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/65e729aa1e06
accelerated path for JSON.stringify r=jandem
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a954796eff2d
accelerated path for JSON.stringify r=jandem
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch
Regressions: 1846246
Regressions: 1845671
Regressions: 1847369
See Also: → 1852595
No longer regressions: 1885819
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: