Implement Float16Array
Categories
(Core :: JavaScript Engine, enhancement, P3)
Tracking
Tracking | Status
---|---
firefox127 | fixed
People
(Reporter: dminor, Assigned: dminor)
References
(Blocks 3 open bugs)
Details
(Keywords: dev-doc-complete)
Attachments
(8 files, 1 obsolete file)
48 bytes, text/x-phabricator-request | Details | Review (×8)
This is at stage 3 and ready for implementation. Some thoughts:
- We define typed arrays here and have precedent for defining custom types on top of an underlying primitive type in uint8_clamped.
- It looks like we could do something similar for f16.
- Conversion from f32 to f16 is a matter of bit-twiddling; it looks like there is hardware support for conversions / storage on x86 and ARM, so this is likely optimizable.
- From what I can tell from the uint8_clamped implementation, this is sufficient; we'll convert back to a Number for operations like filter or fill.
The changes required to support this are self-contained. I think this would be a good project for someone new interested in contributing to SpiderMonkey, it's not trivial, but not a huge project either.
I haven't hacked SpiderMonkey before but would like to take a crack at this :)
Assignee
Comment 2 • 1 year ago
Excellent! Let me know if you need any help getting started. Here's some info on getting a build going: https://firefox-source-docs.mozilla.org/js/build.html
Thank you. From the code you linked I assume the first steps are:
- Define a js::float16 type similar to js::uint8_clamped.
- Represent float16 as a 16-bit unsigned integer (uint16_t). Use bit-twiddling to extract the appropriate bits from a float32, most likely optimized with hardware-supported instructions when available.
AFAIU the benefits come from the smaller representation? I.e. to do computation with float16 one must convert to float32 (bit twiddling or hardware instructions), do the computation, convert back to float16. Hardware support for computation in float16 is only available in recent Intel processors from 2020 onwards and is called BFloat16. Is this correct?
The thing I'm most uncertain about is what the interface of the js::float16 type should look like? js::uint8_clamped has conversions from various primitive types. I assume we need more than that? I'm trying to answer these questions myself by looking through uses of js::uint8_clamped on Searchfox.
Assignee
Comment 4 • 1 year ago
Yes, that sounds right. Don't worry about the hardware optimizations as part of this bug, it's not worth the extra complexity right now. If that changes, we'll file a follow up bug to handle the optimizations.
Yes, you're right, the advantage is the smaller representation, e.g. for passing buffers to GPUs that do have native support for float16 arithmetic when doing graphics or neural networks.
That's a good question about whether we need the various primitive type conversions; I'm not sure offhand. I'd suggest starting with the float32 conversion for now and worrying about the other types later. I'll look into it too.
Updated • 1 year ago
After looking at the code some more, I'm not sure what ExternalT in MACRO(ExternalT, NativeT, Name) means. For all the other definitions it's pretty clear what it's supposed to be without knowing the meaning of it.
I'm at the point where the compiler is complaining that it doesn't know how to sort Float16 typed arrays. Makes sense. I implemented TypedArrayStdSort by calling std::sort and converting the uint16_t half-precision floats into single-precision floats. Seems reasonable. However, due to sizeof(js::float16) == 2, it wants to use Radix sort (1), which in turn calls Counting sort. Now that seems less reasonable for floating point data.
(1): https://searchfox.org/mozilla-central/source/js/src/vm/TypedArrayObject.cpp#2763
Comment 7 • 1 year ago
(In reply to B. Dahse from comment #3)
> I.e. to do computation with float16 one must convert to float32 (bit twiddling or hardware instructions), do the computation, convert back to float16. Hardware support for computation in float16 is only available in recent Intel processors from 2020 onwards and is called BFloat16. Is this correct?
BFloat16 is a different 16-bit float type. The Float16Array proposal uses instead the IEEE-754/IEC 60559 binary16, see also https://en.wikipedia.org/wiki/Half-precision_floating-point_format.
(In reply to B. Dahse from comment #5)
> After looking at the code some more, I'm not sure what ExternalT in MACRO(ExternalT, NativeT, Name) means. For all the other definitions it's pretty clear what it's supposed to be without knowing the meaning of it.
TypedArray contents are also exposed via the public JS SpiderMonkey API. The public API uses the ExternalT type. I think it should be possible to use ExternalT = uint16_t for now.
(In reply to B. Dahse from comment #6)
> I'm at the point where the compiler is complaining that it doesn't know how to sort Float16 typed arrays. Makes sense. I implemented TypedArrayStdSort by calling std::sort and converting the uint16_t half-precision floats into single-precision floats. Seems reasonable.
It should be possible to specialise UnsignedSortValue similar to the specialisations for float/double, and then enable this function for float16 in addition to float/double. That way it's not necessary to convert to float.
> However, due to sizeof(js::float16) == 2, it wants to use Radix sort (1), which in turn calls Counting sort. Now that seems less reasonable for floating point data.
Just change the if constexpr condition so float16 types don't call counting sort. We can worry about any possible performance improvements later.
(In reply to Dan Minor [:dminor] from comment #0)
> - Conversion from f32 to f16 is a matter of bit-twiddling; it looks like there is hardware support for conversions / storage on x86 and ARM, so this is likely optimizable.
The conversion functions from Stack Overflow don't correctly handle all inputs, for example denormals. https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/truncsfhf2.c (Apache 2 licensed) seems better when comparing the results against the results from vcvtps2ph on x86-64.
It's easy to create a test program to ensure the conversion works correctly:
#include <stdint.h>
#include <cstring>
#include <cstdio>
union Float16 {
private:
uint16_t value_ = 0;
public:
Float16() = default;
Float16(uint16_t value) : value_(value) {}
bool operator==(const Float16& other) const { return value_ == other.value_; }
bool operator!=(const Float16& other) const { return value_ != other.value_; }
auto value() const { return value_; }
};
static auto ConvertFloat16(float flt) {
// https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
_Float16 f16 = flt;
uint16_t fltInt16;
std::memcpy(&fltInt16, &f16, sizeof(_Float16));
return Float16(fltInt16);
}
static auto MyConvertFloat16(float flt) {
// TODO: Implement conversion here.
return ConvertFloat16(flt);
}
// Compile with:
// (1) -O3 -std=c++17 -mfpmath=sse -msse2
// (2) -O3 -std=c++17 -mfpmath=sse -msse2 -march=x86-64-v3
//
// (1) tests the intrinsic compiler functions.
// (2) tests the hardware optimised instructions.
int main() {
for (uint64_t j = 0; j <= uint64_t(0xffff'ffff); ++j) {
uint32_t i = uint32_t(j);
float flt;
std::memcpy(&flt, &i, sizeof(float));
auto x = ConvertFloat16(flt);
auto y = MyConvertFloat16(flt);
if (x != y) {
std::printf("%08x: x(%04x) != y(%04x)\n", i, x.value(), y.value());
break;
}
}
}
Thank you for the detailed comment. It sounds like I'm on the right track. The specialization of UnsignedSortValue and enabling the floating point implementation for float16 was my first idea, but I ran into difficulties, so I decided to just code a float16-specific version to get something compiling.
I took your advice about disabling radix/counting sort for float16 so I can compile SpiderMonkey once again. Yay!
Now unfortunately some 500 of the JIT tests are broken. Most fail with the assertion "MOZ_CRASH(unexpected JSProtoKey)", which I have not had time to look into just yet. There are also quite a few "TypeError: can't convert BigInt to number" failures. I suspect it has to do with where I defined the new Float16 scalar. I put it after Big(U)int64, so maybe there is some type confusion going on. Any advice is welcome :)
Oh, the JSProtoKey thing was easy enough: just another case in the switch.
Comment 10 • 1 year ago
Also, I'm a bit unsure how to handle float16 in codegen. Since we represent it as a uint16_t at runtime, I guess I should handle it the same way in JIT? Because we don't want to reinterpret the bits as a float for example.
Comment 11 • 1 year ago
Seems my intuition about the order of declarations was right. After moving things around, I'm now down to a handful of WebAssembly test failures:
FAILURES:
wasm/memory64/memory-copy.js
--wasm-compiler=optimizing wasm/memory64/memory-copy.js
--wasm-compiler=baseline wasm/memory64/memory-copy.js
wasm/memory64/memory-fill.js
--test-wasm-await-tier2 wasm/memory64/memory-copy.js
--wasm-compiler=optimizing wasm/memory64/memory-fill.js
TIMEOUTS:
--wasm-compiler=optimizing wasm/atomicity.js
--wasm-compiler=baseline wasm/atomicity.js
--disable-wasm-huge-memory wasm/atomicity.js
Comment 12 • 1 year ago
(In reply to B. Dahse from comment #10)
> Also, I'm a bit unsure how to handle float16 in codegen. Since we represent it as a uint16_t at runtime, I guess I should handle it the same way in JIT?
Code needs to be added to convert the bits representing the Float16 number to a double, similar to the existing MacroAssembler::convertFloat32ToDouble function.
But this can be added in a follow-up bug. Instead, I'd suggest disabling JIT support for Float16 TypedArrays by adding the following check to GetPropIRGenerator::tryAttachTypedArrayElement and SetPropIRGenerator::tryAttachSetTypedArrayElement:
// JIT support for Float16 isn't yet implemented.
if (tarr->type() == Scalar::Float16) {
return AttachDecision::NoAction;
}
Comment 13 • 1 year ago
I see. I will review my diff and send it in because I don't think there is much more that needs to be done in this bug in that case.
Assignee
Comment 14 • 1 year ago
(In reply to B. Dahse from comment #13)
> I see. I will review my diff and send it in because I don't think there is much more that needs to be done in this bug in that case.
Great! Thank you for working on this :)
Assignee
Comment 15 • 1 year ago
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1835034 as the follow-up from comment 12; please mention that bug number when you add the check to disable JIT support as Anba suggested.
Comment 16 • 1 year ago
Assignee
Comment 17 • 1 year ago
Hi, thanks for your work on this so far :) I was wondering if you were planning to continue? It's ok if not, the work you've done is useful, and we'll look for someone else to finish it off.
Comment 18 • 1 year ago
I haven't forgotten about this, and I think I've actually more or less solved it, but I currently have a broken build (ugh, C++ templates...). Now that I know there's still interest, I will try to get it rebased and rig it to use the Mozilla floating-point traits templates instead of my own to get it to build again. If I'm blocked on anything, it's because I'm not familiar with the appropriate Phabricator workflow for what I'm trying to do.
Assignee
Comment 19 • 1 year ago
Great :) I'm glad you're still interested in this. Let us know if you have any questions.
Assignee
Comment 20 • 9 months ago
Hi! I wanted to check in again to see if you've had any luck with this. We'd like to get this landed in 2024, so please let me know if there's any way I can help out. If you don't have the time for it right now, I'd be happy to pick up whatever you have in progress and get it finished off. Thanks again for your work :)
Assignee
Comment 21 • 9 months ago
I'm going to take this one over.
Comment 22 • 9 months ago
Yes, that's fine; I'm starting a new job, so I don't have time to push this one over the finish line. This is the code I wrote for doing branchless conversion from double/float to float16, tested on all 32-bit inputs. It's based on a snippet from Stack Overflow with NaN normalization added.
template<typename F>
float16& operator=(F x) {
using enc = mozilla::FloatingPoint<float16>;
using flt = mozilla::FloatingPoint<F>;
using raw_type = typename flt::Bits;
static constexpr auto sig_diff = flt::kSignificandWidth - enc::kSignificandWidth;
static constexpr auto bit_diff = CHAR_BIT * (sizeof(raw_type) - sizeof(enc::Bits));
static constexpr auto bias_mul = raw_type(enc::kExponentBias) << flt::kSignificandWidth;
static constexpr auto flt_inf = flt::kExponentBits;
static constexpr auto enc_inf = enc::kExponentBits;
static constexpr auto qnan = enc_inf | enc_inf >> 1;
auto bits = mozilla::BitwiseCast<raw_type>(x);
auto sign = bits & flt::kSignBit; // save sign
bits ^= sign; // clear sign
auto is_nan = flt_inf < bits; // compare before rounding!!
auto payload = bits >> sig_diff & 0x1ff;
static constexpr auto min_norm = raw_type(flt::kExponentBias - enc::kExponentBias + 1) << flt::kSignificandWidth;
static constexpr auto sub_rnd = enc::kExponentBias < sig_diff
? raw_type(1) << (flt::kSignificandWidth - 1 + enc::kExponentBias - sig_diff)
: raw_type(enc::kExponentBias - sig_diff) << flt::kSignificandWidth;
static constexpr auto sub_mul = raw_type(flt::kExponentBias + sig_diff) << flt::kSignificandWidth;
bool is_sub = bits < min_norm;
auto norm = mozilla::BitwiseCast<F>(bits);
auto subn = norm;
subn *= mozilla::BitwiseCast<F>(sub_rnd); // round subnormals
subn *= mozilla::BitwiseCast<F>(sub_mul); // correct subnormal exp
norm *= mozilla::BitwiseCast<F>(bias_mul); // fix exp bias
bits = mozilla::BitwiseCast<raw_type>(norm);
bits += (bits >> sig_diff) & 1; // add tie breaking bias
bits += (raw_type(1) << (sig_diff - 1)) - 1; // round up to half
//if( is_sub ) bits = std::bit_cast<raw_type>(subn);
bits ^= -is_sub & (mozilla::BitwiseCast<raw_type>(subn) ^ bits);
bits >>= sig_diff; // truncate
//if( enc::inf < bits ) bits = enc::inf; // fix overflow
bits ^= -( enc_inf < bits ) & ( enc_inf ^ bits );
//if( is_nan ) bits = enc::qnan | payload;
bits ^= -is_nan & ((qnan | payload) ^ bits);
bits |= sign >> bit_diff; // restore sign
val = bits;
return *this;
}
Assignee
Comment 23 • 7 months ago

Assignee
Comment 24 • 7 months ago
This adds a Float16 type with conversion code for double derived from the half
library. The half library (https://sourceforge.net/projects/half/) is MIT
licensed.
Optimized code, based upon lookup tables, exists for single precision, but it's
not obvious to me whether it's correct to first round to single precision and
then to half precision when converting from double. The fact that the half
library does not implement their conversion this way leads me to believe that
this is not correct.
The V8 implementation is using this library: https://github.com/Maratyszcza/FP16,
which only supports single precision conversions.
Depends on D203414
Assignee
Comment 25 • 7 months ago
The tests are based upon the existing fround.js testcase, with a polyfill and
additional tests from https://github.com/petamoriken/float16/, which is MIT
licensed.
Depends on D203415
Assignee
Comment 26 • 7 months ago
This adds the Float16Array implementation. This code is derived from an
earlier patch written by Benjamin Dahse.
Depends on D203416
Assignee
Comment 27 • 7 months ago
Depends on D203417
Assignee
Comment 28 • 7 months ago
Depends on D203418
Comment 29 • 7 months ago
> it's not obvious to me whether it's correct to first round to single precision and then to half precision when converting from double
Indeed it is not correct. Consider the float64 1.00048828125000022204, which is the next float64 after 1.00048828125, which is exactly halfway between 1.0 and the next float16 after 1.0 i.e. 1.0009765625.
Float64ToFloat32(1.00048828125000022204) is 1.00048828125, so Float32ToFloat16(Float64ToFloat32(1.00048828125000022204)) is 1.0 (under ties-to-even).
Float64ToFloat16(1.00048828125000022204) is of course 1.0009765625, since the original value was specifically chosen to be > halfway between 1.0 and 1.0009765625.
Assignee
Comment 30 • 7 months ago
The Float16Array tests in test262 check for Float16Array on the fly instead of using a feature flag. This adds support for checking the list of include files for a specific include file, and for adding the appropriate command-line option to the existing shell option code if it is present.
Depends on D203419
Assignee
Comment 31 • 7 months ago
Depends on D205260
Comment 32 • 6 months ago
Incidentally you may wish to follow / comment on https://github.com/tc39/proposal-float16array/issues/12.
Comment 33 • 6 months ago
While other browsers also do not yet support Float16Array, the lack of support is the reason for our interop2024 failures in a number of indexedDB structured-clone tests and in idb-binary-key-roundtrip.htm, so I'm marking this bug on wpt.fyi's dashboard for those tests.
Comment 34 • 6 months ago

Comment 35 • 6 months ago
Backed out for causing crashes with js::GlobalObject::skipDeselectedConstructor
- Backout link
- Push with failures
- Failure Log
- Failure line: PROCESS-CRASH | MOZ_CRASH(unexpected JSProtoKey) [@ js::GlobalObject::skipDeselectedConstructor] | Profiling run
Assignee
Comment 36 • 5 months ago
I think I have this fixed by only adding Float16Array to JSProto for Nightly builds: https://treeherder.mozilla.org/jobs?repo=try&revision=ea84ec64c025afafba0df634cec40ac6a19c0321
Comment 37 • 5 months ago

Comment 38 • 5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/efbb8096ba45
https://hg.mozilla.org/mozilla-central/rev/c8f44ec2be16
https://hg.mozilla.org/mozilla-central/rev/3fbfc13c9452
https://hg.mozilla.org/mozilla-central/rev/a4640f7b9be7
https://hg.mozilla.org/mozilla-central/rev/a399a0da4980
https://hg.mozilla.org/mozilla-central/rev/7e6742efa89b
https://hg.mozilla.org/mozilla-central/rev/18435f4cca59
https://hg.mozilla.org/mozilla-central/rev/2768653c3e4a
Comment 39 • 5 months ago
MDN docs work for this can be tracked in https://github.com/mdn/content/issues/33561
A question: this is gated behind a preference, but some of it seems to be conditioned on Nightly, such as https://hg.mozilla.org/integration/autoland/rev/3fbfc13c9452#l1.30
Does this mean that this feature only works on Nightly AND with the preference set?
Assignee
Comment 40 • 5 months ago
Yes, for now, this requires a Nightly build and the preference set. My current plan is to ship this in Firefox 130, at which point the pref will default to true, and the Nightly requirement will go away.