Closed Bug 1995626 Opened 5 months ago Closed 4 months ago

from Base64/Hex are slower than chrome

Categories

(Core :: JavaScript Engine, defect, P3)

Firefox 144
defect

RESOLVED FIXED
146 Branch
Tracking Status
firefox146 --- fixed

People

(Reporter: mgaudet, Assigned: anba)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

From Comment 18 on Bug 1994067:

I see:

  • 1.4x improvement in fromHex, but it's still ~2x slower than JS
  • 1.3x improvement in fromBase64, but it's still ~1.7x slower than JS in SpiderMonkey (or 2x slower than JS in Firefox)
  • 4.3x improvement in toHex, that is now faster than JS (but ~2.7x slower than v8)
  • 3.7x improvement in toBase64, that is now faster than JS in SpiderMonkey and in Firefox (but still ~8x slower than v8)

I think the original issue still stands for fromHex and fromBase64 and this should be reopened
As in: JS is still faster for those two

See that bug for test cases as well.

Severity: -- → S3
Priority: -- → P3

Note: further improvements here are likely to involve using SIMD instructions. Anba made all the easy perf fixes in the previous bug.

It looks like JSC and V8 are both using the simdutf library in (at least) the fromBase64 case.

The issue inherited from Bug 1994067 is that the fromHex/fromBase64 implementations in Firefox are 2x slower than a JS-based implementation run in Firefox

Even disregarding Chrome/WebKit

Use a table lookup to replace HexDigitToNibbleOrInvalid. The decode table has
256 entries so that Latin-1 characters can be decoded branch-free. The table
element type is int8_t, to keep the table size small. The elements are later
loaded as int32_t for faster error detection.

Generated code for decoding four characters, extracted from a standalone C++
implementation. It should be similar enough to the code generated for FromHex:

;; Load four characters
movzx   eax, byte ptr [rdi]
movzx   ecx, byte ptr [rdi + 1]
movzx   edx, byte ptr [rdi + 2]
movzx   esi, byte ptr [rdi + 3]

;; Decode table
lea     rdi, [rip + Hex::Table]

;; Decode c2, sign-extend int8 to int32
movsx   edx, byte ptr [rdx + rdi]
shl     edx, 12

;; Decode c3, ...
movsx   esi, byte ptr [rsi + rdi]
shl     esi, 8
or      esi, edx

;; Decode c0, ...
movsx   edx, byte ptr [rax + rdi]
shl     edx, 4
or      edx, esi

;; Decode c1, ...
movsx   eax, byte ptr [rcx + rdi]
or      eax, edx

;; Check SF set by previous or-instruction
js      .invalid_char
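
For readers who prefer source over assembly, here is a minimal C++ sketch of the table-lookup approach described above. It is not the SpiderMonkey patch: the names HexTable, kHexTable, DecodeFourHexChars and FromHexSketch are hypothetical, and the input is assumed to be Latin-1. Invalid characters map to -1, so after sign-extension any invalid digit sets the top bit of the combined word and a single check covers all four characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 256-entry decode table: hex digits map to 0..15, everything
// else (including all non-ASCII Latin-1 characters) maps to -1.
struct HexTable {
  int8_t values[256];
  constexpr HexTable() : values() {
    for (int i = 0; i < 256; i++) values[i] = -1;
    for (int i = 0; i < 10; i++) values['0' + i] = int8_t(i);
    for (int i = 0; i < 6; i++) {
      values['a' + i] = int8_t(10 + i);
      values['A' + i] = int8_t(10 + i);
    }
  }
};
constexpr HexTable kHexTable{};

// Sign-extend a table entry to 32 bits, then reinterpret as unsigned so that
// the shifts below are well-defined. An invalid entry becomes 0xFFFFFFFF.
inline uint32_t Lookup(uint8_t c) {
  return uint32_t(int32_t(kHexTable.values[c]));
}

// Decode four characters into two bytes with one error check at the end.
// The word layout mirrors the assembly: c0/c1 form the low byte, c2/c3 the
// next byte.
inline bool DecodeFourHexChars(const uint8_t* chars, uint8_t* out) {
  uint32_t word = (Lookup(chars[2]) << 12) | (Lookup(chars[3]) << 8) |
                  (Lookup(chars[0]) << 4) | Lookup(chars[1]);
  if (word & 0x80000000u) return false;  // some digit was invalid
  out[0] = uint8_t(word);
  out[1] = uint8_t(word >> 8);
  return true;
}

// Decode a whole string. Returns false on odd length or invalid characters.
bool FromHexSketch(const std::string& input, std::vector<uint8_t>& out) {
  size_t length = input.size();
  if (length % 2 != 0) return false;
  const uint8_t* chars = reinterpret_cast<const uint8_t*>(input.data());
  out.clear();
  out.reserve(length / 2);

  size_t i = 0;
  for (; i + 4 <= length; i += 4) {
    uint8_t bytes[2];
    if (!DecodeFourHexChars(chars + i, bytes)) return false;
    out.push_back(bytes[0]);
    out.push_back(bytes[1]);
  }
  if (i < length) {  // exactly two characters remain
    assert(length - i == 2);
    uint32_t word = (Lookup(chars[i]) << 4) | Lookup(chars[i + 1]);
    if (word & 0x80000000u) return false;
    out.push_back(uint8_t(word));
  }
  return true;
}
```

Calling FromHexSketch("deadBEEF", out) fills out with the four bytes de ad be ef, while an input such as "12g4" is rejected by the single top-bit check.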
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED

Two changes:

  1. Extend the decode table to 256 elements and change the element type to
    int8_t. This matches the changes from part 1.
  2. Add a separate loop to process full chunks. This saves additional branches,
    because we no longer have to check whether the output is full for each character read.
    Also try to read four consecutive characters if possible and treat whitespace
    characters in a slow path.
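
The two changes above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code from the patch: Base64Table, kBase64Table and FromBase64Sketch are hypothetical names, only the standard alphabet is handled, padding is treated loosely, and the function simply fails when the output buffer is too small. The fast loop checks input and output space once per four-character chunk, and anything that is not a plain base64 digit (whitespace, padding, invalid input) falls through to the per-character slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 256-entry table: base64 digits map to 0..63, all else to -1.
struct Base64Table {
  int8_t values[256];
  constexpr Base64Table() : values() {
    for (int i = 0; i < 256; i++) values[i] = -1;
    for (int i = 0; i < 26; i++) {
      values['A' + i] = int8_t(i);
      values['a' + i] = int8_t(26 + i);
    }
    for (int i = 0; i < 10; i++) values['0' + i] = int8_t(52 + i);
    values['+'] = 62;
    values['/'] = 63;
  }
};
constexpr Base64Table kBase64Table{};

inline uint32_t Sextet(uint8_t c) {
  // Sign-extended, so an invalid character yields 0xFFFFFFFF.
  return uint32_t(int32_t(kBase64Table.values[c]));
}

inline bool IsAsciiWhitespace(uint8_t c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

bool FromBase64Sketch(const std::string& input, uint8_t* out, size_t capacity,
                      size_t* written) {
  const uint8_t* chars = reinterpret_cast<const uint8_t*>(input.data());
  size_t length = input.size();
  size_t index = 0;
  size_t outIndex = 0;

  // Fast path: full chunks only, one bounds check and one error check each.
  while (length - index >= 4 && capacity - outIndex >= 3) {
    uint32_t word = (Sextet(chars[index]) << 18) |
                    (Sextet(chars[index + 1]) << 12) |
                    (Sextet(chars[index + 2]) << 6) | Sextet(chars[index + 3]);
    if (word & 0x80000000u) break;  // let the slow path sort it out
    out[outIndex++] = uint8_t(word >> 16);
    out[outIndex++] = uint8_t(word >> 8);
    out[outIndex++] = uint8_t(word);
    index += 4;
  }

  // Slow path: one character at a time, skipping whitespace.
  uint32_t chunk = 0;
  int chunkLength = 0;
  bool sawPadding = false;
  for (; index < length; index++) {
    uint8_t c = chars[index];
    if (IsAsciiWhitespace(c)) continue;
    if (c == '=') { sawPadding = true; continue; }
    int8_t value = kBase64Table.values[c];
    if (value < 0 || sawPadding) return false;
    chunk = (chunk << 6) | uint32_t(value);
    if (++chunkLength == 4) {
      if (capacity - outIndex < 3) return false;
      out[outIndex++] = uint8_t(chunk >> 16);
      out[outIndex++] = uint8_t(chunk >> 8);
      out[outIndex++] = uint8_t(chunk);
      chunk = 0;
      chunkLength = 0;
    }
  }

  // Final partial chunk: 2 digits give 1 byte, 3 digits give 2 bytes.
  if (chunkLength == 1 || (sawPadding && chunkLength == 0)) return false;
  size_t tail = chunkLength == 0 ? 0 : size_t(chunkLength - 1);
  if (capacity - outIndex < tail) return false;
  if (chunkLength == 2) {
    out[outIndex++] = uint8_t(chunk >> 4);
  } else if (chunkLength == 3) {
    out[outIndex++] = uint8_t(chunk >> 10);
    out[outIndex++] = uint8_t(chunk >> 2);
  }
  *written = outIndex;
  return true;
}
```

In this sketch an input without whitespace stays in the fast loop until the padded tail, so the per-character branches only run for the last few characters.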

These two patches should help to make the fromHex and fromBase64 cases noticeably faster.

I've filed bug 1996197 for another possible fromBase64 optimisation.

Pushed by amarc@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/12d2f3d8575d https://hg.mozilla.org/integration/autoland/rev/d33331bedf01 Revert "Bug 1995626 - Part 2: Add separate loop to decode full chunks in FromBase64. r=iain" for causing SM bustages @ TypedArrayObject.cpp

Backed out for causing sm bustages @ TypedArrayObject.cpp

Flags: needinfo?(andrebargull)
Flags: needinfo?(andrebargull)

Will test after this gets into a nightly build

Tested on https://archive.mozilla.org/pub/firefox/integration/autoland/2025/10/25/860fd0ffd63968ac63e185431cd3612c8b2cf125/

Yes, that fixed it, thanks!

I see ~4.8x improvement on both fromHex and fromBase64
v8 is still faster, but now only about ~1.7-2x

Native is now faster than JS for all four methods, so I think this can be closed

(while this is fixed, numbers from the build above are likely lower than what'll actually be in Nightly, as integration builds are apparently less optimized)

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 146 Branch

Thanks for verifying that the changes improved performance!

QA Whiteboard: [qa-triage-done-c147/b146]
See Also: → 2003299
See Also: → 2003305