from Base64/Hex are slower than chrome
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
| Tracking | Status | |
|---|---|---|
| firefox146 | --- | fixed |
People
(Reporter: mgaudet, Assigned: anba)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
From Comment 18 on Bug 1994067:
I see:
- 1.4x improvement in fromHex, but it's still ~2x slower than JS
- 1.3x improvement in fromBase64, but it's still ~1.7x slower than JS in SpiderMonkey (or 2x slower than JS in Firefox)
- 4.3x improvement in toHex, that is now faster than JS (but ~2.7x slower than v8)
- 3.7x improvement in toBase64, that is now faster than JS in SpiderMonkey and in Firefox (but still ~8x slower than v8)
I think the original issue still stands for fromHex and fromBase64 and this should be reopened
As in: JS is still faster for those two
See that bug for test cases as well.
Comment 1•5 months ago
Note: further improvements here are likely to involve using SIMD instructions. Anba made all the easy perf fixes in the previous bug.
It looks like JSC and V8 are both using the simdutf library in (at least) the fromBase64 case.
Comment 2•4 months ago
The issue inherited from Bug 1994067 is that the native fromHex/fromBase64 implementations in Firefox are 2x slower than a JS-based implementation run in Firefox,
even disregarding Chrome/WebKit.
| Assignee | ||
Comment 3•4 months ago
Use a table lookup to replace HexDigitToNibbleOrInvalid. The decode table has
256 entries so that Latin-1 characters can be decoded branch-free. The table
element type is int8_t, to keep the table size small. The elements are later
loaded as int32_t for faster error detection.
Generated code for decoding four characters. It was extracted from a standalone
C++ implementation, but it should be similar enough to the code generated for FromHex:
;; Load four characters
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rdi + 1]
movzx edx, byte ptr [rdi + 2]
movzx esi, byte ptr [rdi + 3]
;; Decode table
lea rdi, [rip + Hex::Table]
;; Decode c2, sign-extend int8 to int32
movsx edx, byte ptr [rdx + rdi]
shl edx, 12
;; Decode c3, ...
movsx esi, byte ptr [rsi + rdi]
shl esi, 8
or esi, edx
;; Decode c0, ...
movsx edx, byte ptr [rax + rdi]
shl edx, 4
or edx, esi
;; Decode c1, ...
movsx eax, byte ptr [rcx + rdi]
or eax, edx
;; Check SF set by previous or-instruction
js .invalid_char
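A minimal C++ sketch of the table-lookup approach described above, in the spirit of the standalone implementation the assembly was extracted from. This is not the actual SpiderMonkey patch: the names `HexTable`, `kHexTable`, `Widen`, `DecodeHex4` and `DecodeHex` are hypothetical and chosen for illustration only. Each `int8_t` entry is sign-extended to 32 bits, so an invalid character (-1) sets the top bit of the combined word and a single check covers all four characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-entry decode table: hex digits map to 0..15, everything
// else to -1, so any Latin-1 code unit can index it without a range check.
struct HexTable {
  int8_t v[256];
  constexpr HexTable() : v() {
    for (int i = 0; i < 256; ++i) v[i] = -1;
    for (int c = '0'; c <= '9'; ++c) v[c] = static_cast<int8_t>(c - '0');
    for (int c = 'a'; c <= 'f'; ++c) v[c] = static_cast<int8_t>(c - 'a' + 10);
    for (int c = 'A'; c <= 'F'; ++c) v[c] = static_cast<int8_t>(c - 'A' + 10);
  }
};
constexpr HexTable kHexTable{};

// Load one table entry and sign-extend it to 32 bits (the movsx in the
// listing). The result is kept as uint32_t so the shifts are well defined.
static inline uint32_t Widen(uint8_t c) {
  return static_cast<uint32_t>(static_cast<int32_t>(kHexTable.v[c]));
}

// Decode four hex characters into two bytes with a single validity check.
static inline bool DecodeHex4(const uint8_t* src, uint8_t* dst) {
  uint32_t word = (Widen(src[0]) << 4) | Widen(src[1]) |
                  (Widen(src[2]) << 12) | (Widen(src[3]) << 8);
  // Any -1 entry leaves bit 31 set; valid input never exceeds 0xFFFF.
  if (word >> 31) return false;
  dst[0] = static_cast<uint8_t>(word);       // c0 and c1
  dst[1] = static_cast<uint8_t>(word >> 8);  // c2 and c3
  return true;
}

// Decode `len` hex characters into `len / 2` bytes. Returns false on odd
// length or on any non-hex character.
bool DecodeHex(const uint8_t* src, size_t len, uint8_t* dst) {
  assert(src && dst);
  if (len % 2 != 0) return false;
  size_t i = 0;
  for (; i + 4 <= len; i += 4, dst += 2) {
    if (!DecodeHex4(src + i, dst)) return false;
  }
  if (i < len) {  // one trailing pair of characters
    uint32_t word = (Widen(src[i]) << 4) | Widen(src[i + 1]);
    if (word >> 31) return false;
    *dst = static_cast<uint8_t>(word);
  }
  return true;
}
```

Calling `DecodeHex` on the eight characters `deADbeef` with a four-byte output buffer would fill it with `0xDE 0xAD 0xBE 0xEF`, while an input such as `00g0` is rejected by the sign-bit check.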
| Assignee | ||
Comment 4•4 months ago
Two changes:
- Extend the decode table to 256 elements and change the element type to
int8_t. This matches the changes from part 1.
- Add a separate loop to process full chunks. This saves additional branches,
because we no longer have to check if the output is full for each character read.
Also try to read four consecutive characters if possible and treat whitespace
characters in a slow path.
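The full-chunk loop described above can be sketched roughly as follows. This is an illustrative C++ sketch under stated assumptions, not the code that landed: `Base64Table`, `kBase64Table`, `ChunkResult` and `DecodeBase64FullChunks` are hypothetical names, only the standard alphabet is handled, and padding, whitespace and the final partial chunk are assumed to be left to a separate slow path that the caller runs from the returned position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-entry table: base64 alphabet characters map to 0..63,
// everything else (including whitespace and '=') maps to -1.
struct Base64Table {
  int8_t v[256];
  constexpr Base64Table() : v() {
    for (int i = 0; i < 256; ++i) v[i] = -1;
    for (int i = 0; i < 26; ++i) v['A' + i] = static_cast<int8_t>(i);
    for (int i = 0; i < 26; ++i) v['a' + i] = static_cast<int8_t>(26 + i);
    for (int i = 0; i < 10; ++i) v['0' + i] = static_cast<int8_t>(52 + i);
    v['+'] = 62;
    v['/'] = 63;
  }
};
constexpr Base64Table kBase64Table{};

struct ChunkResult {
  size_t read;     // input characters consumed by the fast loop
  size_t written;  // output bytes produced by the fast loop
};

// Fast path: decode complete 4-character chunks into 3 bytes each. The
// output capacity is checked once per chunk, not once per character.
// Stops at the first chunk containing any non-alphabet character.
ChunkResult DecodeBase64FullChunks(const uint8_t* src, size_t len,
                                   uint8_t* dst, size_t cap) {
  assert(src && dst);
  size_t r = 0, w = 0;
  while (len - r >= 4 && cap - w >= 3) {
    // Sign-extend each int8_t entry; a -1 entry sets bit 31 of the word.
    uint32_t a = static_cast<uint32_t>(int32_t{kBase64Table.v[src[r]]});
    uint32_t b = static_cast<uint32_t>(int32_t{kBase64Table.v[src[r + 1]]});
    uint32_t c = static_cast<uint32_t>(int32_t{kBase64Table.v[src[r + 2]]});
    uint32_t d = static_cast<uint32_t>(int32_t{kBase64Table.v[src[r + 3]]});
    uint32_t word = (a << 18) | (b << 12) | (c << 6) | d;
    if (word >> 31) break;  // whitespace, padding or invalid: slow path
    dst[w] = static_cast<uint8_t>(word >> 16);
    dst[w + 1] = static_cast<uint8_t>(word >> 8);
    dst[w + 2] = static_cast<uint8_t>(word);
    r += 4;
    w += 3;
  }
  return {r, w};
}
```

In this sketch the caller would resume at `src + read` with a per-character loop that skips whitespace, handles padding and reports errors, so the common whitespace-free input never pays for those checks.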
| Assignee | ||
Comment 5•4 months ago
These two patches should help to make the fromHex and fromBase64 cases noticeably faster.
| Assignee | ||
Comment 6•4 months ago
I've filed bug 1996197 for another possible fromBase64 optimisation.
Backed out for causing sm bustages @ TypedArrayObject.cpp
Comment 10•4 months ago
Comment 12•4 months ago
Yes, that fixed it, thanks!
I see a ~4.8x improvement on both fromHex and fromBase64.
v8 is still faster, but now only by ~1.7-2x.
Native is now faster than JS for all four methods, so I think this can be closed.
Comment 13•4 months ago
(While this is fixed, numbers from the build above are likely lower than what'll actually be in Nightly, as the integration build is apparently less optimized.)
Comment 14•4 months ago
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/f621bb9d2ef2
https://hg.mozilla.org/mozilla-central/rev/2666f184df80
| Assignee | ||
Comment 15•4 months ago
Thanks for verifying that the changes improved performance!