toBase64 is very slow
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
()
| Performance Impact | medium |
People
(Reporter: denispal, Unassigned)
References
(Blocks 1 open bug)
Details
Go to https://simdutf.github.io/browserbase64/ and run the benchmark.
Firefox:
~3100.43 MiB/s
Chrome:
~21479.84 MiB/s
Safari:
~19730.71 MiB/s
We were even worse before, but bug 1994067 and bug 1995626 improved performance by about 10x. However, we are still very far behind.
| Reporter | ||
Updated•3 months ago
|
Comment 1•3 months ago
•
|
||
Profile: https://share.firefox.dev/4ovF8yJ (from Nightly)
30% of the time is in poisoning during free. But why is JSStringBuilder::finishString doing a reallocation to begin with? Did we predict the wrong size when we reserved the string length?
Comment 2•3 months ago
|
||
The big difference between us and V8/JSC here is that they're using SIMD and we're not. IIUC they're both using the simdutf library.
toBase64 computes a precise result size. However, StringBuilder is backed by mozilla::Vector, and Vector::reserve calls growStorageBy, which over-allocates. Maybe we need Vector::reservePrecise?
Comment 3•3 months ago
|
||
(In reply to Markus Stange [:mstange] from comment #1)
why is
JSStringBuilder::finishStringdoing a reallocation to begin with? Did we predict the wrong size when we reserved the string length?
finishString appends a null terminator (/0).
https://github.com/mozilla-firefox/firefox/blob/main/js/src/util/StringBuilder.cpp#L158
Comment 4•3 months ago
|
||
That said, there are comments that suggest the reserve() function has logic that will automatically reserve header bytes in certain scenarios. So it may not be that.
Comment 5•3 months ago
|
||
Note that the profile shows that the realloc is happening inside ExtractWellSized. If we were reallocating when appending the null terminator, it would show up in finishStringInternal->StringBuilder::append.
The header bytes are different, though. Those exist so that we can turn this into a mozilla::StringBuffer. If we add reservePrecise, we might need to bump the allocation by 1 to account for the null terminator.
Comment 6•3 months ago
|
||
Okay, this is from my limited understanding of the codebase, but unless the input string qualifies as "static string", a JS "Inline String", or being "too short for a buffer", then finishString (finishStringInternal) will always:
- append a
\0to the original - move the data with potential to reallocate if its wasting memory (via
ExtractWellSized) - reset the original string to an empty string (TODO: check if
appendNallocates if its buffer is taken from it...ExtractWellSizedis weird). - allocate a
mozilla::StringBuffer, and copy all the data over
PS: I'm currently in compiling Firefox for the first time limbo, so while that works I'm only able to dig around for clues. I don't want to be too noisy, but I figured I'd share my findings.
Comment 7•3 months ago
|
||
(In reply to Iain Ireland [:iain] from comment #5)
Note that the profile shows that the realloc is happening inside ExtractWellSized. If we were reallocating when appending the null terminator, it would show up in finishStringInternal->StringBuilder::append.
Yeah, then that realloc appears to be an ExtractWellSized feature that shrinks strings that take up too much memory.
The header bytes are different, though. Those exist so that we can turn this into a mozilla::StringBuffer.
Gotcha. Strings with a size header. Okay.
If we add
reservePrecise, we might need to bump the allocation by 1 to account for the null terminator.
Will reservePrecise do anything that JSStringBuilder::finishString will consider usable as is? i.e. make a "Static String". Like if we're hitting ExtractWellSized then it already failed the 3 possible early outs.
Comment 8•3 months ago
|
||
Is there any obvious reason rewriting the uint8array_toBase64 code to use a mozilla::StringBuffer instead of a JSStringBuilder would be a problem?
Ultimately the function needs a mozilla::StringBuffer at the end anyway.
Comment 9•3 months ago
|
||
(In reply to Mike Kasprzak from comment #8)
Ultimately the function needs a
mozilla::StringBufferat the end anyway.
Nevermind, it needs a JSString. The mozilla::StringBuffer is just something it uses temporarily.
Comment 10•3 months ago
|
||
(In reply to Mike Kasprzak from comment #6)
- allocate a
mozilla::StringBuffer, and copy all the data over
I was incorrect about this. With the exception of the potential reallocation in ExtractWellSized' (if the allocated memory is significantly larger than needed), finishStringdoesn't allocate, but instead just coaxes theJSStringBuilderinto a JSString with a validmozilla::StringBuffer` hidden inside.
So yeah, I think I'm on the same page now (sorry about the delay, new codebase). As suggested, adding a reservePrecise, and calling it with an extra byte for the null terminator should avoid the ExtractWellSized reallocation. Not sure how far that will get this, but it's a start.
Comment 11•3 months ago
|
||
reservePrecise Improvement is roughly 2x.
## Chromium 142
Uint8Array.toBase64() 0.00273 ms → 5722.63 MiB/s
## FF 145
Uint8Array.toBase64() 0.0761 ms → 205.45 MiB/s
## FF Nightly Artifact
Uint8Array.toBase64() 0.0273 ms → 571.48 MiB/s
Uint8Array.toBase64() 0.0369 ms → 423.56 MiB/s
Uint8Array.toBase64() 0.0358 ms → 435.95 MiB/s
## FF Nightly Build
Uint8Array.toBase64() 0.0326 ms → 479.61 MiB/s
Uint8Array.toBase64() 0.0374 ms → 418.21 MiB/s
## FF Nightly Build with base64 tweak
Uint8Array.toBase64() 0.0216 ms → 724.65 MiB/s
Uint8Array.toBase64() 0.0185 ms → 844.92 MiB/s
Uint8Array.toBase64() 0.0177 ms → 881.62 MiB/s
Hacky implementation:
https://github.com/mikekasprzak/firefox/tree/toBase64-reservePrecise
Updated•3 months ago
|
Updated•3 months ago
|
Description
•