Implement JIT Support for Float16Array
Categories
(Core :: JavaScript Engine, enhancement, P3)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox130 | --- | fixed |
People
(Reporter: dminor, Assigned: anba)
Attachments
(14 files)
Follow-up from https://bugzilla.mozilla.org/show_bug.cgi?id=1833647#c12 to implement JIT support for Float16Array.
Comment 1 (Reporter) • 1 year ago
There's some discussion of optimizations here: https://github.com/tc39/proposal-float16array/issues/12
Comment 2 (Assignee) • 1 year ago
Add more conversion methods from upstream. Later patches will call the new
methods.
Add if-constexpr to ElementSpecific::valueToNative to avoid compiler errors,
because both js::float16::operator=(double) and js::float16::operator=(float)
are applicable assignment operators when assigning from int64_t, which makes
the assignment ambiguous.
Comment 3 (Assignee) • 1 year ago
Support the vcvtph2ps and vcvtps2ph instructions from the F16C instruction set.
F16C requires AVX to be enabled, per "Intel developer manual, Vol 1, §14.4.1
Detection of F16C Instructions".
Depends on D215762
Comment 4 (Assignee) • 1 year ago
Upstream already has this fixed.
Depends on D215763
Comment 5 (Assignee) • 1 year ago
Support float in addition to double in storeCallFloatResult.
Depends on D215764
Comment 6 (Assignee) • 1 year ago
Hardware support for float16 conversions is limited:
- ARM supports float32<>float16 conversions with Neon. This is not implemented yet.
- ARM64 supports float32<>float16 and float64<>float16 conversions.
- x86/x64 supports float32<>float16 conversions when F16C instructions are supported.
We use the following approach for this initial implementation:
- Use supported conversions when available, otherwise fall back to an ABI call.
- Represent float16 as float32 throughout the JIT (so no MIRType::Float16 yet),
  because:
  - float32 is supported for all targets, so we reduce cross-target differences
    when choosing the data type which is universally supported,
  - float32<>float16 conversions are natively supported for the main target platforms,
  - actual float16 math operations have even more limited hardware support, so we need
    to convert float16 to either float32 or float64 anyway at some point.
- And this also enables using the existing optimisations for MIRType::Float32.
The next part will start using the conversion methods from this patch.
Note 1: float64->float16 conversion can't be emulated through a float64->float32->float16
conversion sequence, because the sequence float64->float32 and float32->float16 can
round differently than the direct float64->float16 conversion.
Note 2: float f(int32_t) in "ABIFunctionType.yaml" requires an explicit General -> Float32
entry for the ARM simulator, just adding Int32 -> Float32 led to an error.
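The double-rounding problem from Note 1 can be reproduced outside the engine. The following is a minimal sketch, not SpiderMonkey code: it assumes Python's `struct` formats `"e"` (binary16) and `"f"` (binary32) round to nearest with ties to even, and the helper names `to_f32` and `to_f16` are made up for this illustration. The input sits just above the midpoint between two adjacent float16 values; rounding to float32 first lands exactly on the midpoint, and the tie then resolves in the other direction.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest float32 (illustrative helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x: float) -> float:
    """Round a double to the nearest float16 (illustrative helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Just above the midpoint between the float16 neighbours 1.0 and 1.0 + 2**-10.
x = 1.0 + 2.0**-11 + 2.0**-30

# Direct conversion sees that x is above the midpoint and rounds up.
direct = to_f16(x)

# float64 -> float32 drops the 2**-30 term, leaving exactly the midpoint,
# which float32 -> float16 then rounds to the even neighbour 1.0.
two_step = to_f16(to_f32(x))

print(direct, two_step)
```

The two results differ by one float16 ulp, which is why a direct float64->float16 path (native or via an ABI call) is needed.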
Depends on D215765
Comment 7 (Assignee) • 1 year ago
Inline Math.f16round similar to how Math.fround is inlined:
- CacheIRCompiler either calls the conversion methods from part 5
  or calls into the VM.
- Warp transpiles to MToFloat16, which has a similar implementation
  as MToFloat32.
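For reference, the value that an inlined Math.f16round has to produce can be modelled in a few lines. This is a hedged sketch of the semantics only, not of the CacheIR or Warp code paths; the function name `f16round` mirrors the JavaScript builtin, and the overflow handling is an assumption needed because Python's `struct` raises instead of returning an infinity.

```python
import math
import struct


def f16round(x: float) -> float:
    """Round a double to the nearest float16 value, returned as a double."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite inputs that round past the float16 range;
        # IEEE 754 round-to-nearest yields an infinity of the same sign.
        return math.copysign(math.inf, x)


print(f16round(0.1), f16round(65504.0), f16round(65520.0))
```

Values at or above 65520 round to infinity because 65520 is the midpoint between the largest finite float16 (65504) and the next power of two.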
Depends on D215766
Comment 8 (Assignee) • 1 year ago
Extend MacroAssembler::loadFromTypedArray to support loading from Float16Array.
This requires passing an additional temp-register and LiveRegisterSet when the
target doesn't natively support float32<>float16 conversions.
Codegen for LoadUnboxedScalar on x86/x64 looks like:
movzwl 0x0(%rdx,%rbx,2), %esi
vmovd %esi, %xmm0
vpmovzxwq %xmm0, %xmm0
vcvtph2ps %xmm0, %xmm0
vucomiss %xmm0, %xmm0
jnp .Lfrom120
movss .Lfrom128(%rip), %xmm0
And on ARM64:
ldr h0, [x2, x3, lsl #1]
fcvt s0, h0
fcmp s0, s0
b.vc -> 1015f
ldr s0, pc+24 (addr 0x70c2b0a96224) ; .const nan
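Both listings follow the same shape: load a 16-bit element, widen it to float32, compare the result with itself, and substitute a constant NaN when the comparison is unordered. A rough model of that sequence is sketched below. The function name and the canonical NaN bit pattern `0x7FC00000` are assumptions made for this illustration and are not taken from the patch.

```python
import struct

# Assumed canonical float32 NaN: quiet bit set, zero payload.
CANONICAL_NAN_F32_BITS = 0x7FC00000


def load_float16_as_f32_bits(buffer: bytes, index: int) -> int:
    """Load element `index` of a little-endian float16 buffer as float32 bits."""
    # Halfword load plus widening (movzwl + vcvtph2ps, or ldr h0 + fcvt).
    (value,) = struct.unpack_from("<e", buffer, index * 2)
    # Self-comparison is unordered only for NaN (vucomiss / fcmp).
    if value != value:
        return CANONICAL_NAN_F32_BITS
    return struct.unpack("<I", struct.pack("<f", value))[0]


# 1.0, -2.0, a NaN carrying a payload, and -Infinity as raw float16 bits.
data = struct.pack("<4H", 0x3C00, 0xC000, 0x7E01, 0xFC00)
loaded = [load_float16_as_f32_bits(data, i) for i in range(4)]
print([hex(bits) for bits in loaded])
```

Ordinary values and infinities pass through unchanged; only the NaN element is replaced by the constant.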
Depends on D215767
Comment 9 (Assignee) • 1 year ago
Extend the existing DataView code to also support Float16, using similar
changes as the previous part.
Depends on D215768
Comment 10 (Assignee) • 1 year ago
Slightly larger changes when compared to the previous two parts, because
MacroAssembler::storeToTypedFloatArray had to be changed to support
conversions instead of performing conversion in its caller:
- CacheIRCompiler::emitStoreTypedArrayElement used ScratchFloat32Scope to
  convert double -> float32, but using the same approach won't work for float16,
  because ScratchFloat32Scope is also needed in MacroAssembler::storeFloat16
  to convert float32 -> float16.
- Therefore move the conversion double -> float32 into StoreToTypedFloatArray
- And the conversions double -> float16 into MacroAssembler::storeFloat16.
Codegen for StoreUnboxedScalar on x64 looks like:
vcvtps2ph $0x4, %xmm0, %xmm15
vmovd %xmm15, %r11d
movw %r11w, 0x0(%rdx,%rbx,2)
And on ARM64:
fcvt h31, s0
str h31, [x2, x4, lsl #1]
Depends on D215769
Comment 11 (Assignee) • 1 year ago
Depends on D215770
Comment 12 (Assignee) • 1 year ago
Transpiler and type policies add the following instructions when reading and then
storing a value from a Float16Array:
value = MLoadUnboxedScalar(f16array)
guarded_value = MToDouble(value) <-- Inserted by WarpCacheIRTranspiler
typed_value = MToFloat16(guarded_value) <-- Inserted by StoreUnboxedScalarPolicy
MStoreUnboxedScalar(f16array, typed_value)
Neither MToDouble nor MToFloat16 is needed, so let MToFloat16::foldsTo remove them.
This extra folding is needed because we don't yet have a MIRType::Float16 which we
can handle in MToFloat16::foldsTo.
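The fold is sound because widening a float16 value to double and rounding it back to float16 returns the original value. Float16 has only 65536 bit patterns, so this can be checked exhaustively. The sketch below does so with Python's `struct` module; the helper names are made up for this illustration, and NaN patterns are skipped because their payloads are not guaranteed to survive the round trip in Python.

```python
import struct


def f16_bits_to_double(bits: int) -> float:
    """Reinterpret 16 bits as float16 and widen to a double."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def double_to_f16_bits(value: float) -> int:
    """Round a double to float16 and return its raw 16 bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def is_nan_pattern(bits: int) -> bool:
    """True when the exponent is all ones and the mantissa is non-zero."""
    return (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0


# Every non-NaN float16 value, including zeros, subnormals and infinities.
mismatches = [
    bits
    for bits in range(0x10000)
    if not is_nan_pattern(bits)
    and double_to_f16_bits(f16_bits_to_double(bits)) != bits
]
print(len(mismatches))
```

An empty mismatch list means the MToDouble/MToFloat16 pair has no effect on a value that already came from a Float16Array.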
The WarpCacheIRTranspiler change is an optimisation to avoid generating the following
instructions during transpiling and applying the type policy:
value = MLoadUnboxedScalar(f16array)
double_value = MToDouble(value) <-- Inserted by js::jit::AlwaysBoxAt
boxed_value = MBox(double_value) <-- Inserted by BoxPolicy
unboxed_value = MUnbox(boxed_value, Double) <-- Inserted by WarpCacheIRTranspiler
GVN will remove the MBox->MUnbox sequence, but it seems preferable to avoid generating it
in the first place.
Depends on D215771
Comment 13 (Assignee) • 1 year ago
Remove the TODO note about adding Float16Array JIT support by renaming
OutOfLineLoadTypedArrayOutOfBounds to OutOfLineAsmJSLoadHeapOutOfBounds,
which makes it clearer that Float16 support isn't needed here.
Depends on D215772
Comment 14 (Assignee) • 1 year ago
Before this change:
[Codegen] vucomiss %xmm0, %xmm0
[Codegen] jnp .Lfrom214
[Codegen] movss .Lfrom222(%rip), %xmm0
After this change:
[Codegen] vucomiss %xmm0, %xmm0
[Codegen] jnp .Lfrom214
[Codegen] movss .Lfrom222(%rip), %xmm0
Note how the label identifiers are now properly aligned.
Depends on D215773
Comment 15 (Assignee) • 1 year ago
ToFloat32(ToDouble(float32)) is exactly equal to float32, so MToDouble can
produce Float32 when its input can produce Float32. This change is necessary to
enable Float32 optimizations for various instructions, for example MSqrt.
Without this change Float32 optimizations are always disabled, which makes it
hard to verify that Float16 operations correctly handle Float32 inputs and
outputs.
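The identity ToFloat32(ToDouble(float32)) == float32 holds because every float32 value is exactly representable as a double. A sampled check is sketched below. The helper names, the stride, and the list of edge cases are arbitrary choices for this illustration, and NaN patterns are skipped because Python does not guarantee that their payloads survive the round trip.

```python
import struct


def f32_bits_to_double(bits: int) -> float:
    """Reinterpret 32 bits as float32 and widen to a double."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def double_to_f32_bits(value: float) -> int:
    """Round a double to float32 and return its raw 32 bits."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def is_nan_pattern(bits: int) -> bool:
    """True when the exponent is all ones and the mantissa is non-zero."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


# Zeros, smallest/largest subnormal, smallest normal, largest finite, infinities.
EDGE_CASES = [
    0x00000000, 0x80000000, 0x00000001, 0x007FFFFF,
    0x00800000, 0x7F7FFFFF, 0x7F800000, 0xFF800000,
]
SAMPLE_STRIDE = 65521  # arbitrary prime stride, about 65k samples

samples = EDGE_CASES + list(range(0, 2**32, SAMPLE_STRIDE))
mismatches = [
    bits
    for bits in samples
    if not is_nan_pattern(bits)
    and double_to_f32_bits(f32_bits_to_double(bits)) != bits
]
print(len(mismatches))
```

Sampling does not prove the property, but it matches the exact-representability argument and covers the boundary values.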
Depends on D215774
Comment 16 • 1 year ago
Comment 17 (bugherder) • 1 year ago
https://hg.mozilla.org/mozilla-central/rev/410329e58599
https://hg.mozilla.org/mozilla-central/rev/9d01c98d3d64
https://hg.mozilla.org/mozilla-central/rev/ade51dbcc573
https://hg.mozilla.org/mozilla-central/rev/3f2cb72c6348
https://hg.mozilla.org/mozilla-central/rev/c6ec3155a5f8
https://hg.mozilla.org/mozilla-central/rev/f8cdec89d3cc
https://hg.mozilla.org/mozilla-central/rev/d5ea08f74244
https://hg.mozilla.org/mozilla-central/rev/816e1e8497d5
https://hg.mozilla.org/mozilla-central/rev/1543e18cfd43
https://hg.mozilla.org/mozilla-central/rev/b39efb191c8e
https://hg.mozilla.org/mozilla-central/rev/10fb376db8cf
https://hg.mozilla.org/mozilla-central/rev/1006c87386c1
https://hg.mozilla.org/mozilla-central/rev/a67dc538eaee
https://hg.mozilla.org/mozilla-central/rev/f0bc3536dce7
https://hg.mozilla.org/mozilla-central/rev/e0c206023ab0