Closed Bug 1494346 Opened 6 years ago Closed 5 years ago

Consider using embedded BigInt library instead of libgmp

Categories: Core :: JavaScript Engine, enhancement, P3

Status: RESOLVED FIXED
Tracking Status: firefox64 --- affected

Reporter: wingo; Assignee: Unassigned

Hi,

I was talking with :terpri earlier today, working out what's the shortest path to getting BigInt shipping in Firefox.  In bug 1366287, :lth says that using a library is a good idea and I agree (https://bugzilla.mozilla.org/show_bug.cgi?id=1366287#c1).  However I think we should reconsider the choice of library.

libgmp has the big advantage of being a highly optimized codebase that has received a lot of work regarding asymptotic complexity.  This is great.  However, this may not be the most important use case for bigints in JavaScript, where I expect most bigints will be small; mostly in the 64-bit range and a bit wider.  There, some other concerns come into play:

  * gmp is easiest to use with malloc/free allocators; gmp memory isn't on the GC heap.  This means allocations are expensive, can't be moved, and it can perturb the heuristics for when to run GC.
  * gmp offers no advantages for 64-bit integers (I expect); there's just not that much to do
  * having to link to a shared library is a bit irritating; see bug 1491098.

I think we should reconsider the decision to use libgmp, and potentially just slot in a simple header-and-.cpp C++ replacement.

The only downside AFAICS would be that SpiderMonkey wouldn't be able to claim better perf than other engines, assuming libgmp has better perf for big multiplications, because we'd probably just use V8's src/objects/bigint.cc or something like that.
For at least the initial implementation, this is worth considering for concern #3 alone -- it may be less effort to switch to a simpler bigint library than to get gmp properly vendored and working on all platforms :) I feel less certain about the long-term implications, but it's also something that should be easy to experiment with. I guess the main questions would be:

1. which bignum library would replace gmp? V8's is a good candidate but would need changes for the different JS APIs; there are also bignum libraries from Lisp implementations, crypto libraries, etc. that might work
2. how should BigInt be adapted to the new API? E.g. libgmp's interface could be emulated (like mini-gmp), requiring no changes to BigInt.cpp, but maybe it would be better to factor out the library-dependent code
3. would GC optimizations be significantly easier with a different library? That's the biggest long-term issue with gmp, imho; giving SpiderMonkey full control over gmp memory management will probably require switching to the low-level mpn API, or extending gmp, or maybe both...

It might also be interesting to support both gmp and an alternative library for a while, and decide where to invest effort based on their relative performance.
I have been looking into this and the first issue is going to be memory management.  Both the V8 and JSC implementations of bigints allocate digit storage inline with the value.  (I would normally say "inline with the object", but as you know bigints are values, not objects.)  In SpiderMonkey it seems that's not really possible.  The GC has a number of fixed sizes it can use to allocate objects; truly variable-sized allocations are off-heap, made by malloc or mmap or similar allocators.

(If I make mistakes, corrections are very welcome!)

Anyway this means that adapting the V8/JSC implementations isn't quite as straightforward as it could be.  When you allocate digits and value inline, you have less flexibility: you can't generally change the size of the digit storage.  But with a pointer out, you can.

:terpri suggested taking a look at the scheme48 implementation of bignums:

  http://www.s48.org/cgi-bin/hgwebdir.cgi/s48-stable/file/tip/c/bignum.h
  http://www.s48.org/cgi-bin/hgwebdir.cgi/s48-stable/file/tip/c/bignumint.h
  http://www.s48.org/cgi-bin/hgwebdir.cgi/s48-stable/file/tip/c/bignum.c

I think this makes sense.  It will take some massaging to get into nice C++ shape but I think it's a reasonable incremental change.

Unfortunately we are still left with the disadvantages of having memory off-GC-heap: finalizers are needed, not compactable, weird heuristics.  Oh well, it seems we can't help that for now.
(In reply to Andy Wingo [:wingo] from comment #2)
> Unfortunately we are still left with the disadvantages of having memory
> off-GC-heap: finalizers are needed, not compactable, weird heuristics.  Oh
> well, it seems we can't help that for now.

It works for string data for us now, it can work for bigints for sure to start.  IMO.  Probably some sort of hybrid where the JS::BigInt allocation can store a uint64_t (or int64_t?) inline, with bigger stuff relegated to the heap.

Given that libgmp doesn't support fallible allocation and would have to be dynamically linked, keeping everything in a separate library with the corresponding call overhead, I'm kinda happy to see something else used.  Tho the question of *which* something else vexes.
Lars, what do you think?

It's important to reach a decision soon. Mike Shal is starting to work on libgmp build system integration in bug 1491098.

My current thinking:

-   Can we live with O(n*m) elementary-school multiplication? I think so.

-   Is there really a significant steady-state maintenance cost either for libgmp or for Scheme48's bignums? I think probably not. Arithmetic seldom changes.

-   Will either choice be fast enough in the 64-bit common case? I expect not--that either way, we'll have to special-case 64-bit BigInts, both how JS::Value stores them and how Ion optimizes them.

-   Given that, can we then live with the performance ceiling implied by dynamic-linking libgmp (which we'd have to do, as LGPL rules out static linking for us)? I think so, but I'd still rather avoid it.

I am undecided ==> ni?lth.
Flags: needinfo?(lhansen)
-   Does it help reduce schedule risk to vendor in some C files rather than link libgmp? I actually think it's kind of a wash. Mike is already working on the build system parts; and taking on new C++ work here introduces some schedule risk too.
(In reply to Jason Orendorff [:jorendorff] from comment #4)

> It's important to reach a decision soon. Mike Shal is starting to work on
> libgmp build system integration in bug 1491098.

If he's doing that for us I think he should stop.  libgmp seems to have ruled itself out by its requirement for infallible allocation; it's much too easy to DoS the browser with some runaway bignum computation, and we need OOM handling.

> My current thinking:
> 
> -   Can we live with O(n*m) elementary-school multiplication? I think so.

In the short term, definitely.

My guess is that in the long run this will be a competitive disadvantage, since, once bignums are in, all sorts of interesting uses for them will be found.  With wasm, mathematical solvers can move to the web fairly easily with call-outs to JS for faster-than-C-compiled-to-wasm bignum arithmetic, maybe (speculation; depends on all sorts of assumptions).  etc.

There have been a couple of discussions above re "dominant use cases".  From JS, 64-bit will probably dominate and will probably have to be specialized regardless of the library we choose.  Wasm's use will be different; the pure 64-bit use case is entirely served by wasm's 64-bit support; bignums will be used only for larger values in specialized domains.

> -   Is there really a significant steady-state maintenance cost either for
> libgmp or for Scheme48's bignums? I think probably not. Arithmetic seldom
> changes.

I agree.

> -   Will either choice be fast enough in the 64-bit common case? I expect
> not--that either way, we'll have to special-case 64-bit BigInts, both how
> JS::Value stores them and how Ion optimizes them.

I agree.

> -   Given that, can we then live with the performance ceiling implied by
> dynamic-linking libgmp (which we'd have to do, as LGPL rules out static
> linking for us)? I think so [...].

I'm inclined to agree to this too.

> -   Does it help reduce schedule risk to vendor in some C files rather than
> link libgmp? I actually think it's kind of a wash. 

I probably agree, as the costs we have with vendored C/C++ (irregexp, dtoa) are pretty clear to us - we effectively fork the code and updates become increasingly difficult, so it's not free.

Looking briefly at the S48 code, there are real questions about how well it will interact with our GC - whether it needs a lot of rooting work, and so on.  S48 does not have an incremental GC, I'm not even sure it has generational.  MIT Scheme, whence the bignum code appears to originate, also was non-incremental and (I believe, at least at the time) non-generational.  There will be some hidden assumptions.

The high bit is to use *a* library and not rewrite the code.  Robin and Andy have both looked at this in more detail than I have.  But at this point, given that we're not currently doing the work to specialize 64-bit in JS for the necessary perf, any library that is behind a reasonable API is fine; when we're ready to do the specialization work we can decide to change libraries.  Thus fretting about the library probably comes down mostly to risk: how hard is it to get it correct (fallible allocation of malloc'ed data vs proper rooting of heap-allocated structure), on time, and portable to all the platforms we care about.

For anything near reasonable performance long-term we'd need to have bignums allocated in the nursery (whether with malloc'd memory hanging off the object or memory inline); as Waldo says, we do nursery strings now, so why not.  But I think this argues in favor of a library that can take a custom allocator?

> I am undecided ==> ni?lth.

lth is undecided too but has rambled his way to some opinions:

The way I see it:

- if we take the s48 code we fork it, change it as we need, and never look back (which is ok, because s48 is probably "done")
- any of the others we try not to fork more than absolutely necessary (and gmp we can't fork very much?)
- the dynamic linking overhead for gmp is not a huge concern imo, but its use of infallible allocation is
- out-of-line storage for gmp is a minor concern at this time and a custom allocator can keep this reasonably cheap even in the long term
- in-line storage for v8, jsc sounds doable cf what we do for strings but is probably Real Work

The gmp manual is pretty explicit that allocation must be infallible (https://gmplib.org/manual/Custom-Allocation.html).  This seems like a huge red flag.  IMO it rules out gmp as a viable candidate.
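To spell out the mismatch: gmp's replaceable allocation hooks return raw pointers with no error channel, and the linked manual section says they are not allowed to fail (they should terminate the program instead). What the engine needs is the opposite shape. A hypothetical sketch (names invented for illustration, not the SpiderMonkey API):

```cpp
#include <cstdint>
#include <cstdlib>

// Fallible digit allocation: may return nullptr, letting the caller report
// an OOM to script and unwind, instead of aborting the whole process.
uint64_t* allocateDigits(size_t length) {
    if (length == 0 || length > SIZE_MAX / sizeof(uint64_t))
        return nullptr;  // zero-length needs no storage; guard overflow too
    return static_cast<uint64_t*>(calloc(length, sizeof(uint64_t)));
}

// A gmp-style hook, by contrast, has no failure channel at all:
//   void* (*alloc_func_ptr)(size_t);   // must not return on failure,
//                                      // per the gmp manual
```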

The JSC and V8 libraries (which I have not looked at at all) give us parity on performance and move us in the direction where we want to be going, namely, we want to nursery-allocate bignums with inline storage, as we handle strings.  If we can avoid forking these too much we can benefit from future updates.  I would expect they are written with awareness of incremental & generational gc, which seems like a big deal.

The s48 code is simple, and we could try to adapt it to spidermonkey and get it past rooting and hazards analysis, and see where we're at.  But apart from simplicity and the benefit of owning the code (both of which reduce shipping risk in the short term), I suspect this is not the code we want to be using.  So sure, if we can make it simpler for ourselves by shipping v1 with this code we should, but I'm skeptical that we want to be shipping v2 with this code.  (Incidentally this echoes Robin's argument about swappable libraries behind an API.)
Flags: needinfo?(lhansen)
Good morning!

Thanks for taking a look, Jason and Lars.  I hadn't considered the infallible-allocation component.

I had a deeper look at the scheme48 code, and came up with the following.  Firstly, the object layout:

    static constexpr size_t InlineDigitsLength = 1;

    static constexpr uintptr_t SignBit = JS_BIT(js::gc::Cell::ReservedBits);
    static constexpr uintptr_t LengthShift = js::gc::Cell::ReservedBits+1;

    static constexpr uintptr_t MaxDigitLength = UINTPTR_MAX >> LengthShift;

    // The low js::gc::Cell::ReservedBits are reserved.
    size_t length_sign_and_reserved_bits_;
    uintptr_t digits_;

    size_t length() { return length_sign_and_reserved_bits_ >> LengthShift; }
    bool isZero() { return length() == 0; }
    bool isNegative() { return length_sign_and_reserved_bits_ & SignBit; }

    typedef mozilla::RangedPtr<uintptr_t> Digits;
    Digits digits() {
        if (length() <= InlineDigitsLength)
            return Digits(&digits_, length());
        return Digits(reinterpret_cast<uintptr_t*>(digits_), length());
    }

What do you think?

The upshot is that we can do inline allocations of 64-bit bigints in two words, on 64-bit platforms.  The first word has the GC bits, the length, and a sign flag.  The second has either a pointer to a uintptr_t[] array, or a uintptr_t.  I am not sure what to do to get 64-bit inline allocation on 32-bit platforms but that is a detail.
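To make the header-word packing concrete, here is a standalone model of the first word's layout: reserved GC bits at the bottom, then the sign bit, then the length. ReservedBits is fixed at 3 purely for illustration; the real count comes from js::gc::Cell:

```cpp
#include <cstddef>
#include <cstdint>

// Header word layout, bottom to top: [ reserved GC bits | sign | length ].
constexpr size_t ReservedBits = 3;  // illustrative; engine-defined in reality
constexpr uintptr_t SignBit = uintptr_t(1) << ReservedBits;
constexpr size_t LengthShift = ReservedBits + 1;

uintptr_t packHeader(size_t length, bool negative) {
    return (uintptr_t(length) << LengthShift) | (negative ? SignBit : 0);
}

size_t unpackLength(uintptr_t header) { return header >> LengthShift; }
bool unpackNegative(uintptr_t header) { return header & SignBit; }
```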

As far as the code goes, I ported over a few things: addition, subtraction, multiplication, and creating a bigint from a double.  I took the "fork and don't look back" approach -- i.e. some copying with shims to define e.g. BIGNUM_DIGIT_RADIX and such so that I wouldn't have to change other things, and some adapting external names and passing along a JSContext and dealing with rooting and allowing for allocation failures and so on.

If we assume that the scheme48 code is correct to begin with, a port like this has a small amount of typo-induced risk, but I think it can be OK.  Thoughts welcome.  The port is only 50% done at this point.

From a rooting perspective -- it turns out that the scheme48 code will always allocate bigints all-at-once, i.e., with sufficient space for their digits.  It will usually do this at the beginning of the function, and might "trim" the allocation at the end if needed.  It's obvious from inspecting the code that there will be no allocations before the return, so I would not be concerned from a rooting POV.  (There are a couple exceptions where we'd need to root values.)  Arguments become HandleBigInt obviously.

I think with inline 64-bit allocation we might be able to kick the need for the Real Work of generalized inline allocation down the road.

From a "does scheme48 have the code we want" perspective -- there I do not know.  I have had a look at all three but not with the eye of an experienced bigint implementor.

To my mind the only other alternative is taking JSC's library.  I wouldn't take V8's; it's a bit gnarly.  But both JSC's and V8's bottom out to a representation similar to what we're looking at; if we see that some operation is much faster in one of them, we could consider cherry-picking the function.

Finally from a schedule POV, I feel like I'm really close -- like 2 days of work away.  But I am away next week.  I think we would see patches around middle of week 43.

WDYT Jason?  Input from others very welcome as well.
Flags: needinfo?(jorendorff)
Thank you so much, Andy. I think this is going to leave us in a much better place than we were in with libgmp.

> I took the "fork and don't look back" approach

Go for it.

I have told the build team that we should hold off on the libgmp linkage work for now. (Mike has work-in-progress patches but it wasn't terribly far along yet; see bug 1491098.)

> From a "does scheme48 have the code we want" perspective -- there I do not know.
> I have had a look at all three but not with the eye of an experienced bigint
> implementor.

:-\ Does it know how to compare a bigint with a double? This is, of course, an advantage of using JSC's stuff.

Which, for the record: https://trac.webkit.org/browser/webkit/trunk/Source/JavaScriptCore/runtime/JSBigInt.cpp
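For reference, the heart of a bigint-vs-double comparison is a bit-length/exponent check, with digit comparison only on ties. The sketch below is not JSC's code: it handles only non-negative values, and single-digit ties; it returns 2 for cases it deliberately omits:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Compare a non-negative bigint (little-endian 64-bit digits) with a
// non-negative double: -1, 0, or 1 like strcmp; 2 for unhandled cases.
int compareBigIntDouble(const std::vector<uint64_t>& digits, double d) {
    if (std::isnan(d)) return 2;                     // unordered
    if (std::isinf(d)) return -1;                    // any bigint < +inf

    size_t len = digits.size();
    while (len > 0 && digits[len - 1] == 0) len--;   // drop leading zeros
    if (len == 0) return d > 0 ? -1 : 0;             // bigint is zero
    if (d < 1) return 1;                             // 0 <= d < 1 <= bigint

    // Bit length of the bigint.
    int bits = 64 * static_cast<int>(len - 1);
    for (uint64_t top = digits[len - 1]; top; top >>= 1) bits++;

    // frexp: d = f * 2^e with f in [0.5, 1), so e is d's bit length.
    int e = 0;
    std::frexp(d, &e);
    if (bits != e) return bits < e ? -1 : 1;

    // Equal bit lengths. With one digit, e <= 64 so d < 2^64: truncation
    // is safe, and any fractional tail only matters when digits tie.
    if (len == 1) {
        uint64_t truncated = static_cast<uint64_t>(d);
        if (digits[0] != truncated) return digits[0] < truncated ? -1 : 1;
        return d > static_cast<double>(truncated) ? -1 : 0;
    }
    return 2;  // multi-digit tie-break omitted in this sketch
}
```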
Flags: needinfo?(jorendorff)
Is it worthwhile to look into existing Rust crates for BigInt support, in light of bug 1469027 already doing most of the hard work of using Rust in SpiderMonkey? I don't know anything about their technical merit, but there are a few published crates that look reasonably feature-complete:
https://crates.io/crates/num-bigint
https://crates.io/crates/ramp (seems to require Nightly Rust, might not be usable for us)
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #9)

> Is it worthwhile to look into existing Rust crates for BigInt support, in
> light of bug 1469027 already doing most of the hard work of using Rust in
> SpiderMonkey?

I don't think that integration work incorporated anything to deal with GC (heap allocation, rooting), so that's a concern.

Clearly allocation can be handled by callbacks to C++.  However, while the S48 code is structured so that rooting is a non-issue (judging from what Andy wrote earlier), we'd need the Rust code to have a similar property where there's at most a single heap allocation per operation, or we'd need to have rooting integration of some sort.
Gotcha. That's a sensible point. Presumably someone will cross the bridge of integrating Rust code with the SM GC, but there's no reason to block this work on that point unless there's another compelling reason.
Good morning!  Picking up this work again, I realized that the JSC/V8/Dart/Go representation of digits is more efficient than the Scheme48/Factor code.  For some reason, the Scheme48 code avoids using the top 2 bits in a digit (uintptr_t).  I think it's to have a bit of headroom, but it's entirely unnecessary; JSC/V8/Dart/Go can use all bits in a digit.  Before, I was confused and thought that Scheme48 did in fact use all the bits in a digit.

So, I will switch to JSC's distillation of the V8 lineage of bignums.  Besides having a couple more optimizations (e.g. the ability to use double-width multiplies) and being a bit cleaner than the Scheme48 code, it reduces perf risk, as in theory we will have the same fundamental perf as our friendly competition.
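The double-width multiply mentioned above: with full 64-bit digits, one multiply-accumulate step needs a 128-bit intermediate, which GCC and Clang expose as the __uint128_t extension (a compiler extension, not standard C++; MSVC would need _umul128 instead). A minimal sketch of one such step:

```cpp
#include <cstdint>

// One digit-by-digit step of a 64-bit-digit multiply: take the full
// 128-bit product plus carry-in, then split into low digit and carry-out.
uint64_t digitMulAdd(uint64_t a, uint64_t b, uint64_t carryIn,
                     uint64_t* carryOut) {
    __uint128_t p = static_cast<__uint128_t>(a) * b + carryIn;
    *carryOut = static_cast<uint64_t>(p >> 64);
    return static_cast<uint64_t>(p);
}
```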

Apologies for the noise in going back and forth here!
Just a status update here, got the port all done, just haranguing the build system into getting it working.  Will upload a patch on Monday.
Depends on: 1502797
Priority: -- → P3

Fixed via bug 1502797. Thanks to :Waldo and :terpri for their help getting that done.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED