Bug 1496402 (Closed) - Opened 6 years ago, Closed 6 years ago

ConstraintTypeSet::trace reallocates its content with an oomUnsafe LifoAlloc on minor GCs.

Categories

(Core :: JavaScript Engine, enhancement)
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla64
Tracking Status: firefox64 --- fixed

People

(Reporter: nbp, Assigned: nbp)

Attachments

(1 file)

When doing a minor GC, we have to trace all ObjectKey* pointers, which changes their values. TypeHashSet uses these pointer values to compute the hashes. As the pointers get modified by the tracing function, their hashes are no longer valid and the entries have to be moved accordingly.
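For intuition, here is a minimal sketch of such a pointer-keyed hash (the bit mixing below is an assumption for illustration, not the actual TypeHashSet code):

    // Sketch: the bucket index is derived from the pointer's bits, so when
    // a minor GC moves the referent and updates the pointer, every entry's
    // stored position in the table goes stale.
    static inline unsigned HashKey(ObjectKey* key) {
        uintptr_t bits = reinterpret_cast<uintptr_t>(key);
        return unsigned((bits >> 4) ^ (bits >> 16));  // assumed mixing
    }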

At the moment, this is done by iterating over all ObjectKey* elements, tracing them, and inserting them back into a new TypeHashSet allocated in a LifoAlloc.  This implies that every time ConstraintTypeSet::trace is called, memory consumption increases substantially.

There are two approaches to this problem, either:
 1. We use a LifoAllocScope to create the new TypeHashSet, copy it back over the original TypeHashSet, and reclaim the memory when going out of the LifoAllocScope.
 2. We rewrite this mechanism to rehash the content in-place.

The first approach is the simplest, but it might trip on the edge case of being at the end of a LifoAlloc chunk, which would involve much more logic than a simple bump allocation.
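A rough sketch of approach 1, assuming LifoAllocScope's usual RAII behavior and a hypothetical allocation helper:

    {
        // Everything allocated in `alloc` past this point is reclaimed when
        // the scope unwinds (sketch only, not the shipped code).
        LifoAllocScope scope(&alloc);
        ObjectKey** traced = newTypeHashSet(alloc, capacity);  // hypothetical helper
        // ... trace each key and insert it into `traced`, then copy the
        // result back over the original set before the scope exits ...
    }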

The second approach is more complex, but gives us the opportunity to remove the oomUnsafe case, which appeared while testing Bug 1489572.
This patch implements the second strategy: it does not rely on any additional memory and instead reuses the TypeHashSet memory.

TypeHashSet has four different storage representations based on the number of
elements present in the structure (see the lookup sketch after this list):
  1. Empty list -- values == nullptr && count == 0.
  2. 1 element  -- values is the element itself (therefore casting is required).
  3. 2 .. SET_ARRAY_SIZE    -- values is an unordered array of elements.
  4. count > SET_ARRAY_SIZE -- values is an array of 2^(log_2(count) + 1) elements where elements are stored based on their hashes.
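To make the dispatch concrete, here is a sketch of a lookup over these four representations. It reuses the HashKey sketch above, assumes SET_ARRAY_SIZE is 8, and is otherwise placeholder code, not the actual TypeHashSet implementation:

    static const unsigned SET_ARRAY_SIZE = 8;  // assumed value

    // Sketch of a lookup across the four storage representations.
    template <class U>
    static U* Lookup(U** values, unsigned count, unsigned capacity, U* key) {
        if (count == 0) {                         // 1. empty set
            return nullptr;
        }
        if (count == 1) {                         // 2. the field holds the element itself
            return reinterpret_cast<U*>(values) == key ? key : nullptr;
        }
        if (count <= SET_ARRAY_SIZE) {            // 3. unordered array: linear scan
            for (unsigned i = 0; i < count; i++) {
                if (values[i] == key) {
                    return key;
                }
            }
            return nullptr;
        }
        unsigned pos = HashKey(key) & (capacity - 1);  // 4. open-addressed table
        while (values[pos] != nullptr) {
            if (values[pos] == key) {
                return key;
            }
            pos = (pos + 1) & (capacity - 1);     // linear probing
        }
        return nullptr;
    }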

The most complex case is the last one, as we need to move each element based on
the new pointer values obtained after tracing. This is achieved by following the
same algorithm as InsertTry, with a different rule for what counts as an empty cell.

The code and its comments should be clear enough, feel free to suggest better
comments or ask for details.

When benchmarking this patch on top of the Bug 1489572 patch series, it seems to
improve the Octane score by 2.33% and the ares6 geo-mean by 2.46%.
Attachment #9014392 - Flags: review?(sphink)
Comment on attachment 9014392 [details] [diff] [review]
ConstraintTypeSet::trace: Rehash HashTypeSet in-place after tracing.

Review of attachment 9014392 [details] [diff] [review]:
-----------------------------------------------------------------

Great patch! And thank you for killing off an unhandled OOM. They're killing us right now.

::: js/src/vm/TypeInference-inl.h
@@ +1096,5 @@
>  
>          return nullptr;
>      }
> +
> +    template <class T, class U, class KEY, class Fun>

Why 'class' instead of 'typename'? 'class' is probably fine for T, U, and KEY, but it really seems like Fun should be typename.

Also, is there a reason for the different capitalization schemes of KEY and Fun?

@@ +1098,5 @@
>      }
> +
> +    template <class T, class U, class KEY, class Fun>
> +    static void
> +    Map(U**& values, unsigned count, Fun f)

I find the name "Map" confusing. I guess you meant it in the functional sense, where you map a function over an iterator or something? The problem is that it looks like the name of a data structure here.

Rebuild(), maybe? Hm, that's kind of specific to the hashtable case. Update()? MapTypes()? MapEntries()? UpdateEntries()?

@@ +1109,5 @@
> +
> +        // When we have a single element it is stored in-place of the function
> +        // array pointer.
> +        if (count == 1) {
> +            values = (U**) f((U*) values);

I think I'd prefer reinterpret_cast to highlight some of the weirdness going on here.
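That is, something like:

    values = reinterpret_cast<U**>(f(reinterpret_cast<U*>(values)));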

@@ +1115,5 @@
> +        }
> +
> +        // When we have less than SET_ARRAY_SIZE, the values is an unorderred
> +        // array.
> +        if (count <= SET_ARRAY_SIZE) {

The code and comment don't match. "When we have SET_ARRAY_SIZE or fewer elements,..."?

@@ +1122,5 @@
> +            }
> +            return;
> +        }
> +
> +        // This code apply the function f and rellocate the values based on the

*applies *relocates (more s/rellocate/relocate/ below.)

@@ +1153,5 @@
> +        for (unsigned i = 0; i < count; i++) {
> +            U* elem = values[i];
> +            if ((reinterpret_cast<uintptr_t>(elem) & 1) == 0) {
> +                // If this is a newly set pointer, this implies that one of the
> +                // previous object got moved to this position.

s/object got/objects was/

@@ +1169,5 @@
> +                values[pos] = elem;
> +                // The replaced element is either a nullptr, which stops this
> +                // loop, or an element to be inserted, which would be inserted
> +                // by this loop.
> +                elem = next_elem;

Nice! But I think you can simplify this to

  for (i=0..count) {
    U* elem = values[i];
    while (elem & 1) {
      elem = elem ^ 1;
      unsigned pos = HashKey(...);
      while (values[pos] != nullptr && (values[pos] & 1) == 0) {
        pos = (pos + 1) & (capacity - 1);
      }
      swap(elem, values[pos]);
    }
  }

It would be a lot easier to read with static functions that test, set, and clear tag bits (and do the casting), but that's probably too much trouble for this.
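For illustration, such helpers might look like this (a sketch, not part of the patch):

    // The low bit of an entry marks it as "traced but not yet relocated".
    template <class U>
    static bool IsTagged(U* elem) {
        return (reinterpret_cast<uintptr_t>(elem) & 1) != 0;
    }
    template <class U>
    static U* Tag(U* elem) {
        return reinterpret_cast<U*>(reinterpret_cast<uintptr_t>(elem) | 1);
    }
    template <class U>
    static U* Untag(U* elem) {
        return reinterpret_cast<U*>(reinterpret_cast<uintptr_t>(elem) & ~uintptr_t(1));
    }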

This also isn't the greatest hashing scheme, since it's susceptible to long runs of full buckets and the distribution of the bucket chain length is very uneven. It's potentially slower, but I'm sort of curious whether this would result in overall faster lookups:

    unsigned mask = capacity - 1;
    for (i=0..count) {
        U* elem = values[i];
        if ((elem & 1) == 0) {
            continue;
        }
        while (elem) {
            elem = elem & ~1;
            unsigned pos = HashKey(...) & mask;
            unsigned distance = 0;
            while (values[pos] != nullptr) {
                if ((values[pos] & 1) == 0) {
                    unsigned existing = ((HashKey(values[pos]) & mask) + capacity - distance) & mask;
                    if (existing < distance) {
                        break; // Steal slot from existing resident.
                    }
                }
                pos = (pos + 1) & mask;
                distance++;
            }
            swap(elem, values[pos]);
        }
    }

I'm sure I messed up something in there, so please consider the idea not the code. It's just an attempt to use Robin Hood hashing: when inserting and getting a collision, give the bucket to the item that is farthest from its initial probe point. It's quite a bit slower to do the table rebuild, but should even out the chain lengths. (Even better would be to give the bucket to the item that gets looked up the most, but that would be expensive to track.)

Sorry, that wasn't really a review comment. I'm fine with the way you had it, it just got me curious as to whether we could do better.

::: js/src/vm/TypeInference.cpp
@@ +4550,5 @@
>          TraceObjectKey(trc, &key);
> +        return key;
> +    };
> +    TypeHashSet::Map<ObjectKey*, ObjectKey, ObjectKey, decltype(traceFun)>(
> +        objectSet, objectCount, traceFun);

I *think* this should all just work if you do

    TypeHashSet::Map<ObjectKey*>(objectSet, baseObjectCount(), [&](ObjectKey* key) -> ObjectKey* {
        TraceObjectKey(trc, &key);
        return key;
    });

or at most you could do TypeHashSet::Map<ObjectKey*, ObjectKey, ObjectKey>(...) and let the compiler infer just the lambda type. (That might not have worked on MSVC not long ago, but I'm pretty sure it works today.)
Attachment #9014392 - Flags: review?(sphink) → review+
(In reply to Steve Fink [:sfink] [:s:] from comment #2)
> Nice! But I think you can simplify this to
> 
>   for (i=0..count) {
>     U* elem = values[i];
>     while (elem & 1) {
>       elem = elem ^ 1;
>       unsigned pos = HashKey(...);
>       while (values[pos] != nullptr && (values[pos] & 1) == 0) {
>         pos = (pos + 1) & (capacity - 1);
>       }
>       swap(elem, values[pos]);
>     }
>   }

In this case, removing the line `values[i] = nullptr` before starting the inner loop will cause an infinite loop in the case where pos == i.
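In the same sketch style, a terminating version would be:

  for (i=0..count) {
    U* elem = values[i];
    if ((elem & 1) == 0) {
      continue;               // already relocated by an earlier iteration
    }
    values[i] = nullptr;      // free this slot; without it, pos == i never terminates
    while (elem) {
      elem = elem ^ 1;        // clear the tag bit
      unsigned pos = HashKey(...);
      while (values[pos] != nullptr && (values[pos] & 1) == 0) {
        pos = (pos + 1) & (capacity - 1);
      }
      swap(elem, values[pos]);
    }
  }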

> It would be a lot easier to read with static functions that test, set, and
> clear tag bits (and do the casting), but that's probably too much trouble
> for this.
> 
> This also isn't the greatest hashing scheme, since it's susceptible to long
> runs of full buckets and the distribution of the bucket chain length is very
> uneven. It's potentially slower, but I'm sort of curious whether this would
> result in overall faster lookups:
> 
>     unsigned mask = capacity - 1;
>     for (i=0..count) {
>         U* elem = values[i];
>         if ((elem & 1) == 0) {
>             continue;
>         }
>         while (elem) {
>             elem = elem & ~1;
>             unsigned pos = HashKey(...) & mask;
>             unsigned distance = 0;
>             while (values[pos] != nullptr) {
>                 if ((values[pos] & 1) == 0) {
>                     unsigned existing = ((HashKey(values[pos]) & mask) +
> capacity - distance) & mask;

nit?
  unsigned existing = (pos + capacity - (HashKey(values[pos]) & mask)) & mask;

>                     if (existing < distance) {
>                         break; // Steal slot from existing resident.
>                     }
>                 }
>                 pos = (pos + 1) & mask;
>                 distance++;
>             }
>             swap(elem, values[pos]);
>         }
>     }
> 
> I'm sure I messed up something in there, so please consider the idea not the
> code. It's just an attempt to use Robin Hood hashing: when inserting and
> getting a collision, give the bucket to the item that is farthest from its
> initial probe point. It's quite a bit slower to do the table rebuild, but
> should even out the chain lengths. (Even better would be to give the bucket
> to the item that gets looked up the most, but that would be expensive to
> track.)

If I understand correctly, you are trying to minimize the distance in case of collisions.

At the moment, the array in which the elements are stored is at least twice as large as the number of elements, which minimizes the length of probe chains if the hash function is random enough.
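A sketch of the sizing rule from comment 0, assuming the logarithm is rounded up, which is what guarantees the "at least twice" property:

    #include "mozilla/MathAlgorithms.h"

    // capacity = 2^(ceil(log2(count)) + 1), so the table is always at least
    // twice as large as the number of elements (load factor <= 50%).
    static inline unsigned Capacity(unsigned count) {
        return 1u << (mozilla::CeilingLog2(count) + 1);
    }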
QA Contact: sdetar
(In reply to Steve Fink [:sfink] [:s:] from comment #2)
> > +
> > +    template <class T, class U, class KEY, class Fun>
> 
> Why 'class' instead of 'typename'? 'class' is probably fine for T, U, and
> KEY, but it really seems like Fun should be typename.

C++ does not distinguish `typename` and `class` in template parameter declarations. Lambda functions are actually C++ classes holding references to the stack or copies of the captured values. Anyhow, I will use typename for `Fun`.
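For instance, a capturing lambda is roughly equivalent to a hand-written class:

    int x = 1;
    auto f = [x](int y) { return x + y; };

    // ... is morally the same as:
    struct Closure {
        int x;
        int operator()(int y) const { return x + y; }
    };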

> Also, is there a reason for the different capitalization schemes of KEY and
> Fun?

No, there is none. The existing code uses an all-caps KEY typename; for MapEntries I changed it to Key.

> > +    Map(U**& values, unsigned count, Fun f)
> 
> I find the name "Map" confusing. I guess you meant it in the functional
> sense, where you map a function over an iterator or something? The problem
> is that it looks like the name of a data structure here.

Renamed to MapEntries.

> > +            values = (U**) f((U*) values);
> 
> I think I'd prefer reinterpret_cast to highlight some of the weirdness going
> on here.

Done.

> @@ +1115,5 @@
> > +        }
> > +
> > +        // When we have less than SET_ARRAY_SIZE, the values is an unorderred
> > +        // array.
> > +        if (count <= SET_ARRAY_SIZE) {
> 
> The code and comment don't match. "When we have SET_ARRAY_SIZE or fewer
> elements,..."?

Fixed. This is one of the differences between French and English. :/

> It would be a lot easier to read with static functions that test, set, and
> clear tag bits (and do the casting), but that's probably too much trouble
> for this.

I added two lambdas within MapEntries, lowBit and toggleLowBit, and changed the code to use these two functions.
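Those two lambdas presumably look something like this (a sketch; the exact signatures in the patch may differ):

    auto lowBit = [](U* elem) -> bool {
        return (reinterpret_cast<uintptr_t>(elem) & 1) != 0;
    };
    auto toggleLowBit = [](U* elem) -> U* {
        return reinterpret_cast<U*>(reinterpret_cast<uintptr_t>(elem) ^ 1);
    };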

> ::: js/src/vm/TypeInference.cpp
> @@ +4550,5 @@
> >          TraceObjectKey(trc, &key);
> > +        return key;
> > +    };
> > +    TypeHashSet::Map<ObjectKey*, ObjectKey, ObjectKey, decltype(traceFun)>(
> > +        objectSet, objectCount, traceFun);
> 
> or at most you could do TypeHashSet::Map<ObjectKey*, ObjectKey,
> ObjectKey>(...) and let the compiler infer just the lambda type. (That might
> not have worked on MSVC not long ago, but I'm pretty sure it works today.)

Done. The `Key` template parameter had to be specified manually.
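So the landed call presumably reads roughly as:

    TypeHashSet::MapEntries<ObjectKey*, ObjectKey, ObjectKey>(
        objectSet, baseObjectCount(), [&](ObjectKey* key) -> ObjectKey* {
            TraceObjectKey(trc, &key);
            return key;
        });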
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0dcf343133e9
ConstraintTypeSet::trace: Rehash HashTypeSet in-place after tracing. r=sfink
https://hg.mozilla.org/mozilla-central/rev/0dcf343133e9
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Regressions: 1543159