Closed Bug 1233857 Opened 9 years ago Closed 8 years ago

Remove the nursery performance cliff caused by putting large arrays in the WholeObjectBuffer

Status: RESOLVED FIXED

Milestone: mozilla47

Tracking Flags:

            Tracking   Status
firefox46   ---        affected
firefox47   ---        fixed

People

(Reporter: terrence, Assigned: fitzgen, Mentored)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 6 obsolete files)

WIP WIP WIP (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 24.28 KB, patch)
Octane without this patch vs w/ this patch (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 269.34 KB, image/png)
Teach the JIT how to put individual elements' edges in the store buffer (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 25.90 KB, patch, jandem: review+)
Teach the JIT how to put individual elements' edges in the store buffer; r=jandem (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 27.33 KB, patch, jandem: review+)
Follow up: Add a new GC zeal mode for the elements edges barrier (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 3.76 KB, patch)
Follow up: Add a new GC zeal mode for the elements edges barrier (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 4.42 KB, patch, terrence: review+)
Teach the JIT how to put individual elements' edges in the store buffer (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 27.32 KB, patch, fitzgen: review+)
Follow up: Add a new GC zeal mode for the elements edges barrier (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 4.52 KB, patch, fitzgen: review+)
Follow up: Add a new GC zeal mode for the elements edges barrier (8 years ago, Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8], 4.41 KB, patch, fitzgen: review+)

Terrence Cole [:terrence]

Reporter

Description

9 years ago

Problem
=======
In IonMonkey, post-write barriers for Arrays have a very clever (and surprisingly simple) implementation. Instead of storing a reference to each and every slot that was written, we just store the source object and then at GC time trace every element in the Array to discover the actual set of cross-generation edges.

This sounds, on paper, like it would be sub-optimal compared to recording the actual set of written slots; however, for most programs, some subtle aspects of our implementation make the whole-object approach faster in practice. First, most Objects, including Arrays, are fairly small and are normally written to in one block of writes. Thus, the number of non-cross-generation edges we visit for most objects is actually quite low in practice. Second, we have a one-element cache in front of the HashTable backing the StoreBuffer. Thus, subsequent writes to the same object are a no-op, whereas recording the actual slots would probably result in a hash on each write.
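To make the second point concrete, here is a minimal sketch of a whole-object buffer with a one-element cache in front of a hash set. It is not the SpiderMonkey implementation: `Object`, `WholeObjectBufferSketch`, `put`, and the `hashedInserts` counter are all hypothetical names invented for illustration, and the real buffer may handle its cached entry differently.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for a GC-managed object; not a SpiderMonkey type.
struct Object { int id; };

// Sketch only: a remembered set of whole objects with a one-element cache.
class WholeObjectBufferSketch {
    const Object* last_ = nullptr;           // one-element cache
    std::unordered_set<const Object*> set_;  // backing hash set
    std::size_t hashedInserts_ = 0;          // how often we actually hashed

public:
    void put(const Object* obj) {
        if (obj == last_)
            return;  // repeated write to the same object: no hashing at all
        last_ = obj;
        ++hashedInserts_;
        set_.insert(obj);
    }

    std::size_t size() const { return set_.size(); }
    std::size_t hashedInserts() const { return hashedInserts_; }
};
```

With this shape, a loop that fills one array pays for a single hash-set insertion no matter how many elements it writes, which is the behavior the paragraph above relies on.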

Unfortunately, there is one serious downside to this approach: the massive performance cliff that occurs if one writes a cross-generation edge into a large array that is mostly filled with same-generation edges. In this case, we spend an inordinate amount of time visiting edges that are not actually part of the remembered set.

Since we've inlined the entire tenuring path, the cost is only tens of ms per million edges, but that can still be extremely bad.
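The cliff can be illustrated with a toy model that counts how many edges each strategy has to visit. This is a sketch under assumptions, not a measurement of the real GC: `Cell`, `TraceStats`, `traceWholeObject`, and `traceRecordedIndices` are hypothetical names, and the model ignores everything except the visit count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GC thing that lives in the nursery or not.
struct Cell { bool inNursery; };

struct TraceStats {
    std::size_t visited = 0;          // edges we had to look at
    std::size_t crossGeneration = 0;  // edges that actually point into the nursery
};

// Whole-object strategy: only the array was remembered, so visit every element.
TraceStats traceWholeObject(const std::vector<const Cell*>& elements) {
    TraceStats stats;
    for (const Cell* cell : elements) {
        ++stats.visited;
        if (cell && cell->inNursery)
            ++stats.crossGeneration;
    }
    return stats;
}

// Exact strategy: only the written indices were remembered, so visit just those.
TraceStats traceRecordedIndices(const std::vector<const Cell*>& elements,
                                const std::vector<std::size_t>& written) {
    TraceStats stats;
    for (std::size_t index : written) {
        if (index >= elements.size())
            continue;  // the array shrank after the write; nothing to trace
        ++stats.visited;
        const Cell* cell = elements[index];
        if (cell && cell->inNursery)
            ++stats.crossGeneration;
    }
    return stats;
}
```

For a million-element array holding a single nursery pointer, the first function visits a million edges to find one, while the second visits exactly one.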

Solution
========

The typical solution to this problem is known as "card marking". Because we do not control our element memory, however, this approach will not work for us. We've discussed writing an elements allocator to make this possible, but there is actually another way that is much simpler: specialize the jit to use a more exact barrier when writing to a large array.

Since C++ code is already so slow, we do not bother using the whole-object buffer there; it already uses an exact method. In C++, elements are represented via the HeapSlot class [1]: a wrapper around Value. As can be seen at [2], we store an exact reference to the slot when storing to an object. Given that the elements vector may be relocated via realloc, we cannot store its raw address; instead we store the source object + offset + whether it is slots or elements being stored. The GC then looks up the correct address again at the time the GC happens [3].
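The object + offset + kind idea can be sketched as follows. This is an illustration, not the code at [2] or [3]: `ObjectSketch`, `EdgeKind`, `SlotEdgeSketch`, and `resolve` are hypothetical names, `Value` is reduced to an integer, and `std::vector` stands in for the realloc-managed storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Value = std::uint64_t;  // stand-in for the engine's Value type

// Hypothetical object with two relocatable storage vectors.
struct ObjectSketch {
    std::vector<Value> slots;
    std::vector<Value> elements;
};

enum class EdgeKind { Slot, Element };

// Sketch of an exact store buffer entry: it never holds a raw slot address,
// because the storage may have moved by the time the GC runs.
struct SlotEdgeSketch {
    ObjectSketch* object;
    EdgeKind kind;
    std::size_t offset;  // index into the chosen vector

    // Look the address up again at "GC time"; returns nullptr if the
    // vector shrank and the offset is no longer valid.
    Value* resolve() const {
        std::vector<Value>& storage =
            kind == EdgeKind::Slot ? object->slots : object->elements;
        return offset < storage.size() ? &storage[offset] : nullptr;
    }
};
```

Because the edge is resolved late, growing the elements vector between the write and the collection is harmless: the entry still finds the same logical element at its new address.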

The purpose of this bug is to teach IonMonkey how to insert into this store buffer instead of the other one when writing to a large array. Subtly, but importantly, we also need to ensure that we recompile any code that uses the small Array store buffer to use the large array store buffer when the target array gets large. There is already a sophisticated constraint engine in place to make exactly this sort of thing possible: Jan or Brian will be able to show us how this works once we get something working.

The relevant method to look at in IonMonkey is jsop_setelem [4]. The store buffer nodes are added in the implementations of setelem for concrete types that may have cross-generation edges (i.e. not TypedArrays and the like), here [5]. The source object is already in a register and the vector type is obviously "elements", so the only thing we need to do in the new case is compute the offset.
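As a rough picture of the intended behavior, the sketch below picks between the two buffers based on array size. It is only a model: the real decision would be made by the JIT when it compiles the store (and revisited via recompilation when the array grows), whereas this code checks at run time. `ArraySketch`, `ElementEdge`, `StoreBuffersSketch`, `postWriteElementBarrier`, and the value of `kLargeArrayThreshold` are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical array object; not a SpiderMonkey type.
struct ArraySketch { std::vector<int> elements; };

// Exact entry: source object plus element index (the kind is implicitly "elements").
struct ElementEdge {
    const ArraySketch* object;
    std::size_t index;
};

struct StoreBuffersSketch {
    std::unordered_set<const ArraySketch*> wholeObjects;  // cheap, imprecise
    std::vector<ElementEdge> elementEdges;                // exact
};

// Assumed cutoff; the bug does not specify a value.
constexpr std::size_t kLargeArrayThreshold = 4096;

// Post-write barrier for `arr[index] = <nursery thing>`.
void postWriteElementBarrier(StoreBuffersSketch& sb, const ArraySketch& arr,
                             std::size_t index) {
    if (arr.elements.size() > kLargeArrayThreshold) {
        // Large array: remember just the written element.
        sb.elementEdges.push_back({&arr, index});
    } else {
        // Small array: remember the whole object and trace it all at GC time.
        sb.wholeObjects.insert(&arr);
    }
}
```

Small arrays keep the cheap whole-object path, while large arrays pay for one exact entry per write instead of a full scan at collection time.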

[1] https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Barrier.h?from=HeapSlot#639
[2] https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Barrier.h?from=HeapSlot#693
[3] https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Marking.cpp#1993
[4] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/IonBuilder.cpp#9620
[5] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/IonBuilder.cpp#9963,9995