Bug 1481998 (Closed) — Opened 6 years ago, Closed 6 years ago

Make mozilla::Hash{Map,Set}'s entry storage allocation lazy

Categories: Core :: MFBT, enhancement
Priority: Not set   Severity: normal
Status: RESOLVED FIXED in mozilla63
Tracking Status: firefox63 --- fixed
People: (Reporter: n.nethercote, Assigned: n.nethercote)
References: (Blocks 1 open bug)
Whiteboard: [overhead:45k]
Attachments: (3 files, 2 obsolete files)

Bug 1050035 did the same for PLDHashTable/nsTHashtable.

The main motivation is to make the API nicer by avoiding the need for init() calls. It can also save memory, because some tables are never inserted into at all.
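For illustration, here is a minimal before/after sketch of the API change (hypothetical key/value types and error handling, not code from the patch):

  #include "mozilla/HashTable.h"

  // Before: entry storage was allocated eagerly by a fallible init() call
  // that every user had to remember to make and to check.
  mozilla::HashMap<uint32_t, uint32_t> oldMap;
  if (!oldMap.init()) {
    return false;
  }

  // After: no init(); entry storage is allocated lazily on the first add,
  // and allocation failure is reported by the add itself.
  mozilla::HashMap<uint32_t, uint32_t> newMap;
  if (!newMap.put(1, 2)) {
    return false;
  }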
GCHashMap will shortly lose its initialized() function, so we need another way
to indicate whether this table has been created. This patch changes it to use a
pointer instead.
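Roughly, the pattern is the following (a hedged sketch; Key, Value, and hasTable() are placeholders, not names from the patch):

  // Before: the table was held by value and asked about its own state.
  //   GCHashMap<Key, Value> mTable;
  //   bool hasTable() const { return mTable.initialized(); }

  // After: hold the table behind a pointer; null means "not created yet".
  mozilla::UniquePtr<GCHashMap<Key, Value>> mTable;
  bool hasTable() const { return !!mTable; }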
Attachment #8998709 - Flags: review?(luke)
Attached patch Remove putNewInfallible() (obsolete) — Splinter Review
putNewInfallible() is not compatible with lazy entry storage, because lazy
allocation means we can no longer guarantee that storage for a known number of
entries has been allocated ahead of time.

Fortunately it's only used in a small number of places.
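The typical rewrite at each call site looks like this (sketch; map, key, and value are placeholders):

  // Before: infallible, relying on entry storage having been allocated
  // (and sized) up front.
  //   map.putNewInfallible(key, value);

  // After: fallible putNew(); with lazy storage the allocation can happen
  // here, so the call can fail and must be checked.
  if (!map.putNew(key, value)) {
    return false;
  }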
Attachment #8998710 - Flags: review?(luke)
The two functions are almost identical. This change minimizes duplication.
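Something along these lines (a sketch of the shape of the change, not the exact patch):

  // lookup() and readonlyThreadsafeLookup() had nearly identical bodies;
  // afterwards one simply forwards to the other (debug-only reentrancy
  // checking elided here).
  Ptr lookup(const Lookup& aLookup) const {
    return readonlyThreadsafeLookup(aLookup);
  }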
Attachment #8998711 - Flags: review?(luke)
luke: please review the changes to mfbt/HashTable.h and
js/src/jsapi-tests/testHashTable.cpp.

sfink: please review the rest. Apologies for the tedium.
Attachment #8998712 - Flags: review?(sphink)
Attachment #8998712 - Flags: review?(luke)
Attachment #8998709 - Flags: review?(luke) → review+
Comment on attachment 8998710 [details] [diff] [review]
Remove putNewInfallible()

Review of attachment 8998710 [details] [diff] [review]:
-----------------------------------------------------------------

I guess in theory a reserve() could be added (such that puts could be infallible), but given the low usage, that doesn't seem worth it.

::: js/public/UbiNodeDominatorTree.h
@@ +365,5 @@
>              return false;
> +        for (uint32_t i = 0; i < length; i++) {
> +            if (!map.putNew(postOrder[i], i)) {
> +                return false;
> +            }

nit: no {} around single-line then branch
Attachment #8998710 - Flags: review?(luke) → review+
Attachment #8998711 - Flags: review?(luke) → review+
I'd like some help understanding why the design of clear()/clearAndPrepareForLength() is the right one.  My concern is that clear() does not free memory, both in the C++ standard library and traditionally for Hash{Map,Set}; it just calls destructors and resets the count(), but leaves the memory allocation unchanged.

The methods I would've intuitively expected are:
 clear(): calls destructors
 compact(): reallocs (if non-empty and underloaded) or frees (if empty) the table
and then one of:
 reserve(len): eagerly allocates enough memory for len
 prepareForLength(len): request the next allocation prepare for len

The equivalent of clearAndPrepareForLength(len) could then be achieved by: clear(); compact(); prepareForLength(len).

That leaves the question of whether reserve() or prepareForLength() is better.  The former could justify keeping putNewInfallible(), which *in theory* could be a valuable perf win, although in practice it appears not to be.  Is the benefit of the latter that it might avoid allocation altogether if the add() never ends up happening?  Personally, I find the meaning of the former simpler and more consistent with other containers.
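For instance, the reserve() flavor would let call sites like the UbiNodeDominatorTree loop quoted above stay infallible (sketch; assumes a reserve() with this signature exists):

  // With an eager reserve(), the subsequent puts cannot fail, so
  // putNewInfallible() remains justifiable in a hot loop.
  if (!map.reserve(length)) {
    return false;
  }
  for (uint32_t i = 0; i < length; i++) {
    map.putNewInfallible(postOrder[i], i);  // cannot OOM: storage reserved
  }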
(ni so I can get a ping when you answer)
Flags: needinfo?(n.nethercote)
Comment on attachment 8998712 [details] [diff] [review]
Make mozilla::Hash{Map,Set}'s entry storage allocation lazy

Review of attachment 8998712 [details] [diff] [review]:
-----------------------------------------------------------------

Whee, that was fun. But it's an impressive amount of code removal. I like it.

And in another bug I was thinking of asking you if you could arrange for init() to be removed if you use InfallibleAllocPolicy. This is much better!

::: js/src/gc/WeakMap.h
@@ +176,5 @@
>  
>      void sweep() override;
>  
> +    void clearOut() override {
> +        Base::clear();

Why the clearOut() name? Could this just be `using Base::clear;`? (There's a `using Base::remove` in this class already.)

::: js/src/vm/ObjectGroup.cpp
@@ +1745,2 @@
>          arrayObjectTable->clear();
> +    if (plainObjectTable) {

I sort of wonder whether it's necessary to make all these table pointers be lazy now that the tables themselves are lazy, but I guess that'd be a separate change if it's even a good idea.

::: mfbt/HashTable.h
@@ +406,5 @@
>  
>    // -- Clearing -------------------------------------------------------------
>  
> +  // Remove all elements and free the entry storage, leaving it in an empty
> +  // state read to be used again. Afterwards, when the first element is added

*ready

@@ +706,5 @@
>  
>    // -- Clearing -------------------------------------------------------------
>  
> +  // Remove all elements and free the entry storage, leaving it in an empty
> +  // state read to be used again. Afterwards, when the first element is added

*ready

@@ +1538,5 @@
> +                    sMaxInit,
> +                  "multiplication in numerator below could overflow");
> +    static_assert(sMaxInit * sAlphaDenominator <=
> +                    UINT32_MAX - sMaxAlphaNumerator,
> +                  "numerator calculation below could potentially overflow");

I think this could be loosened a little bit if you used newCapacity = 1 + ((aLen - 1) * sAlphaDenominator) / sMaxAlphaNumerator. But... why bother?

Oh. I just noticed this is just code motion. Yeah, don't change anything.

@@ +1849,5 @@
> +    // Compress if a quarter or more of all entries are removed. Note that this
> +    // always succeeds if capacity() == 0 (i.e. entry storage has not been
> +    // allocated), which is what we want, because it means changeTableSize will
> +    // allocate the requested capacity rather than doubling it.
> +    bool shouldCompressTable = mRemovedCount >= (capacity() >> 2);

Huh. It seems like this should be using the min-alpha stuff above (in some constexpr way that doesn't introduce any multiplications or divisions). Or at least, I *thought* that was what min-alpha was meant to represent. static_assert(sMinAlphaNumerator == 1 && sAlphaDenominator == 1 << 2), or something? (Or file a follow-up if you don't think it belongs here.)
Attachment #8998712 - Flags: review?(sphink) → review+
Comment on attachment 8998710 [details] [diff] [review]
Remove putNewInfallible()

The putNewInfallible() removal won't be necessary once the other changes are done.
Attachment #8998710 - Attachment is obsolete: true
Flags: needinfo?(n.nethercote)
> It seems like this should be using the min-alpha stuff above

The "underloaded" check and the "rehash due to many removed entries" check both use 25% as their threshold, but there is no fundamental reason for that, which is why they don't share code. That could be clarified, but yes it would be better as a follow-up.
Luke, I changed clear() et al. per our discussion this morning. It required
some slightly more extensive changes within HashTable.h. I also added
clearAndCompact(), which is more or less equivalent to the old finish() method,
because it's needed in quite a few places.
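In sketch form, the resulting surface (per the discussion above; exact semantics are in the patch):

  table.clear();            // remove all elements; keep the entry storage
  table.compact();          // free (if empty) or shrink (if underloaded) it
  table.clearAndCompact();  // both combined: roughly the old finish()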
Attachment #8999103 - Flags: review?(luke)
Attachment #8998712 - Attachment is obsolete: true
Attachment #8998712 - Flags: review?(luke)
Comment on attachment 8999103 [details] [diff] [review]
Make mozilla::Hash{Map,Set}'s entry storage allocation lazy

Review of attachment 8999103 [details] [diff] [review]:
-----------------------------------------------------------------

Great work, this simplifies a lot of code.

::: mfbt/HashTable.h
@@ +1595,5 @@
> +    // Reject all lengths whose initial computed capacity would exceed
> +    // sMaxCapacity. Round that maximum aLen down to the nearest power of two
> +    // for speedier code.
> +    if (MOZ_UNLIKELY(aLen > sMaxInit)) {
> +      MOZ_CRASH("initial length is too large");

Thinking about code bloat for something inlined in a bajillion places, I wonder if there would be non-negligible size impact for each MOZ_CRASH().

@@ +2049,5 @@
> +
> +  MOZ_MUST_USE bool reserve(uint32_t aLen)
> +  {
> +    if (aLen == 0) {
> +        return true;

nit: indent
Attachment #8999103 - Flags: review?(luke) → review+
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6bb61989d9ae
Change sJSObjWrappers initialization. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/223884f0ad76
Define lookup() in terms of readonlyThreadsafeLookup(). r=luke
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ad30dc53e38e
Make mozilla::Hash{Map,Set}'s entry storage allocation lazy. r=luke,sfink
> > +    if (MOZ_UNLIKELY(aLen > sMaxInit)) {
> > +      MOZ_CRASH("initial length is too large");
> 
> Thinking about code bloat for something inlined in a bajillion places, I
> wonder if there would be non-negligible size impact for each MOZ_CRASH().

I tried replacing it with a (debug-only) assertion, but it only reduced the size of libxul by 800 bytes on Linux64, so I kept the MOZ_CRASH.
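The two variants compared were, roughly:

  // Shipped: release-mode crash on an absurd initial length.
  if (MOZ_UNLIKELY(aLen > sMaxInit)) {
    MOZ_CRASH("initial length is too large");
  }

  // Tried: debug-only assertion; it saved only ~800 bytes of libxul on
  // Linux64, so the MOZ_CRASH was kept.
  MOZ_ASSERT(aLen <= sMaxInit, "initial length is too large");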
This improved our base content JS numbers by 45K \o/
Whiteboard: [overhead:45k]
Yay! I was doing this mostly for the API improvement, but the memory reduction is great news.
Depends on: 1482881
Depends on: 1483182