Open Bug 2003299 Opened 3 months ago Updated 3 months ago

toBase64 is very slow

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

Performance Impact medium

People

(Reporter: denispal, Unassigned)

References

(Blocks 1 open bug)

Details

Go to https://simdutf.github.io/browserbase64/ and run the benchmark.

Firefox:
~3100.43 MiB/s

Chrome:
~21479.84 MiB/s

Safari:
~19730.71 MiB/s

We were even worse before, but bug 1994067 and bug 1995626 improved performance by about 10x. However, we are still very far behind.

Performance Impact: --- → medium
See Also: → 2003305

Profile: https://share.firefox.dev/4ovF8yJ (from Nightly)

30% of the time is in poisoning during free. But why is JSStringBuilder::finishString doing a reallocation to begin with? Did we predict the wrong size when we reserved the string length?

The big difference between us and V8/JSC here is that they're using SIMD and we're not. IIUC they're both using the simdutf library.

toBase64 computes a precise result size. However, StringBuilder is backed by mozilla::Vector, and Vector::reserve calls growStorageBy, which over-allocates. Maybe we need Vector::reservePrecise?

(In reply to Markus Stange [:mstange] from comment #1)

why is JSStringBuilder::finishString doing a reallocation to begin with? Did we predict the wrong size when we reserved the string length?

finishString appends a null terminator (/0).

https://github.com/mozilla-firefox/firefox/blob/main/js/src/util/StringBuilder.cpp#L158

That said, there are comments that suggest the reserve() function has logic that will automatically reserve header bytes in certain scenarios. So it may not be that.

Note that the profile shows that the realloc is happening inside ExtractWellSized. If we were reallocating when appending the null terminator, it would show up in finishStringInternal->StringBuilder::append.

The header bytes are different, though. Those exist so that we can turn this into a mozilla::StringBuffer. If we add reservePrecise, we might need to bump the allocation by 1 to account for the null terminator.

Okay, this is from my limited understanding of the codebase, but unless the input string qualifies as "static string", a JS "Inline String", or being "too short for a buffer", then finishString (finishStringInternal) will always:

  • append a \0 to the original
  • move the data with potential to reallocate if its wasting memory (via ExtractWellSized)
  • reset the original string to an empty string (TODO: check if appendN allocates if its buffer is taken from it... ExtractWellSized is weird).
  • allocate a mozilla::StringBuffer, and copy all the data over

PS: I'm currently in compiling Firefox for the first time limbo, so while that works I'm only able to dig around for clues. I don't want to be too noisy, but I figured I'd share my findings.

(In reply to Iain Ireland [:iain] from comment #5)

Note that the profile shows that the realloc is happening inside ExtractWellSized. If we were reallocating when appending the null terminator, it would show up in finishStringInternal->StringBuilder::append.

Yeah, then that realloc appears to be an ExtractWellSized feature that shrinks strings that take up too much memory.

The header bytes are different, though. Those exist so that we can turn this into a mozilla::StringBuffer.

Gotcha. Strings with a size header. Okay.

If we add reservePrecise, we might need to bump the allocation by 1 to account for the null terminator.

Will reservePrecise do anything that JSStringBuilder::finishString will consider usable as is? i.e. make a "Static String". Like if we're hitting ExtractWellSized then it already failed the 3 possible early outs.

Is there any obvious reason rewriting the uint8array_toBase64 code to use a mozilla::StringBuffer instead of a JSStringBuilder would be a problem?

Ultimately the function needs a mozilla::StringBuffer at the end anyway.

(In reply to Mike Kasprzak from comment #8)

Ultimately the function needs a mozilla::StringBuffer at the end anyway.

Nevermind, it needs a JSString. The mozilla::StringBuffer is just something it uses temporarily.

(In reply to Mike Kasprzak from comment #6)

  • allocate a mozilla::StringBuffer, and copy all the data over

I was incorrect about this. With the exception of the potential reallocation in ExtractWellSized' (if the allocated memory is significantly larger than needed), finishStringdoesn't allocate, but instead just coaxes theJSStringBuilderinto a JSString with a validmozilla::StringBuffer` hidden inside.

So yeah, I think I'm on the same page now (sorry about the delay, new codebase). As suggested, adding a reservePrecise, and calling it with an extra byte for the null terminator should avoid the ExtractWellSized reallocation. Not sure how far that will get this, but it's a start.

reservePrecise Improvement is roughly 2x.

## Chromium 142
Uint8Array.toBase64()            0.00273 ms → 5722.63 MiB/s

## FF 145
Uint8Array.toBase64()            0.0761 ms → 205.45 MiB/s

## FF Nightly Artifact
Uint8Array.toBase64()            0.0273 ms → 571.48 MiB/s
Uint8Array.toBase64()            0.0369 ms → 423.56 MiB/s
Uint8Array.toBase64()            0.0358 ms → 435.95 MiB/s

## FF Nightly Build
Uint8Array.toBase64()            0.0326 ms → 479.61 MiB/s
Uint8Array.toBase64()            0.0374 ms → 418.21 MiB/s

## FF Nightly Build with base64 tweak
Uint8Array.toBase64()            0.0216 ms → 724.65 MiB/s
Uint8Array.toBase64()            0.0185 ms → 844.92 MiB/s
Uint8Array.toBase64()            0.0177 ms → 881.62 MiB/s

Hacky implementation:

https://github.com/mikekasprzak/firefox/tree/toBase64-reservePrecise

Depends on: 2003461
Blocks: sm-js-perf
Severity: -- → N/A
Priority: -- → P3
Severity: N/A → S4
You need to log in before you can comment on or make changes to this bug.