Allow storage of a unique id for a cell independent of address

RESOLVED FIXED in Firefox 43

Status: RESOLVED FIXED (defect)
People: (Reporter: terrence, Assigned: terrence)
Tracking: Trunk → mozilla44
Firefox Tracking Flags: (firefox43 fixed, firefox44 fixed)

Attachments: (4 attachments, 1 obsolete attachment)

This is the infrastructure code from the POC in bug 1194544, split out, refined, and purified. I have not yet rebuilt the rekey removal on top of the new API bits, but I did include a test that covers most of the edge cases I ran into in writing the POC, so I don't anticipate any problems.
Attachment #8650594 - Flags: review?(jcoppeard)
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Comment on attachment 8650594 [details] [diff] [review]
allow_storage_of_unique_ids_for_cells-v0.diff

Review of attachment 8650594 [details] [diff] [review]:
-----------------------------------------------------------------

This all looks really nice.  I'd like to talk about the locking before r+ing this though.

I think the main reason we need locking is access by the background thread.  Is this the only problem?

Another approach to this might be to have the background thread zero the id only and sweep these tables on the main thread at some point when the background thread is not sweeping (e.g. at the start of a GC).  If we did that maybe we wouldn't need to lock at all.

I'm also wondering why we store the map in the chunk rather than the zone.  Is this to reduce lock contention?  If we stored the map on the zone I'm wondering if there is a splay operation like rekey that would allow us to update the entry without allocating when we tenure or relocate a cell.

::: js/src/ds/SpinLock.h
@@ +24,5 @@
> +    void lock() {
> +        do {
> +            while (locked_)
> +                ; // Spin until the lock seems free.
> +        } while (!locked_.compareExchange(false, true)); // Atomically take the lock.

I'm a bit hesitant about this.  Shouldn't this be implemented by the platform?  I'm sure there are all sorts of gotchas when implementing your own locks.

::: js/src/gc/Heap.h
@@ +1570,5 @@
> +        map->remove(this);
> +}
> +
> +UniqueIdMap*
> +ChunkTrailer::createUniqueIdsMap(const AutoSpinLock&)

This could be named to indicate that it doesn't create if the map already exists.

@@ +1583,5 @@
> +ChunkTrailer::dropUniqueIdsMap(const AutoSpinLock&)
> +{
> +    MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> +    if (uniqueIds_) {
> +        // printf("DROPPING MAP FOR: %p\n", Chunk::fromPointer(this));

Debugging printf can be removed.

::: js/src/jsapi-tests/testGCUniqueId.cpp
@@ +6,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +BEGIN_TEST(testGCUID)
> +{
> +    uint64_t uid, tmp;

Maybe zero initialize uid since we're testing its value later.

@@ +11,5 @@
> +
> +    // Ensure the heap is as minimal as it can get.
> +    JS_GC(rt);
> +    JS_GC(rt);
> +    JS_GC(rt);

The multiple GCs feel a bit magical.  Maybe factor out and comment why this is necessary?  (I'm not sure myself why it's necessary)

::: js/src/jsgc.cpp
@@ +503,5 @@
>              firstThingOrSuccessorOfLastMarkedThing = thing + thingSize;
>              nmarked++;
>          } else {
>              t->finalize(fop);
> +            t->Cell::finalize(fop);

Can we do this conditionally for kinds that support use as hash table keys?

@@ +2026,5 @@
>      memcpy(dst, src, thingSize);
>  
> +    // Move any uid attached to the object.
> +    if (!dst->adoptUniqueId(src))
> +        CrashAtUnhandlableOOM("failed to insert existing external hash code");

This adds an unhandlable OOM to the compacting path :-(
Attachment #8650594 - Flags: review?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #1)
> Comment on attachment 8650594 [details] [diff] [review]
> allow_storage_of_unique_ids_for_cells-v0.diff
> 
> Review of attachment 8650594 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This all looks really nice.  I'd like to talk about the locking before r+ing
> this though.
> 
> I think the main reason we need locking is access by the background thread. 
> Is this the only problem?

Yes, that is the only thing that needs locking.

> Another approach to this might be to have the background thread zero the id
> only and sweep these tables on the main thread at some point when the
> background thread is not sweeping (e.g. at the start of a GC).  If we did
> that maybe we wouldn't need to lock at all.

Zero how? The mutator cannot allocate new objects adjacent to the background sweeping because the arenas have been moved off the allocation list, but it can still use live objects that are adjacent on those arenas. As this is a splay tree, looking up an entry modifies the tree, so even concurrent lookups will corrupt the tree.

Beyond that, doing the main removal on-thread would mean that we are suddenly doing work proportional to the number of dead objects, which seems sub-optimal. We could probably make this incremental somehow, but I wanted to minimize complications, at least at first.

> I'm also wondering why we store the map in the chunk rather than the zone. 

The main reason is that I didn't really think to put it there. :-/

However, putting it on the chunk is one fewer indirection to find the map, and it takes zero additional room, as it fits in the padding. On the other hand, more on that further down.

> Is this to reduce lock contention?

No, I made it a SpinLock specifically because I don't anticipate contention. It is only used by two threads, and then only when background sweeping. The SpinLock is really nice here because the uncontended overhead is a single read followed by a CAS. 

>  If we stored the map on the zone I'm
> wondering if there is a splay operation like rekey that would allow us to
> update the entry without allocating when we tenure or relocate a cell.

Oh, good idea! That is probably possible. I had not worried too much about that because the existing SplayTree is very good about re-using dead nodes and I don't expect the overall fill rate to be that variable.

> ::: js/src/ds/SpinLock.h
> @@ +24,5 @@
> > +    void lock() {
> > +        do {
> > +            while (locked_)
> > +                ; // Spin until the lock seems free.
> > +        } while (!locked_.compareExchange(false, true)); // Atomically take the lock.
> 
> I'm a bit hesitant about this.  Shouldn't this be implemented by the
> platform?  I'm sure there are all sorts of gotchas when implementing your
> own locks.

Well, ideally, the platform would have a lockless SplayTree (or even an RBTree) and we could just use that. Unfortunately, I'm stuck trailblazing here. I'm thinking that if this implementation is buggy, TSAN will show it up so we can fix it.

> ::: js/src/gc/Heap.h
> @@ +1570,5 @@
> > +        map->remove(this);
> > +}
> > +
> > +UniqueIdMap*
> > +ChunkTrailer::createUniqueIdsMap(const AutoSpinLock&)
> 
> This could be named to indicate that it doesn't create if the map already
> exists.

Done.

> @@ +1583,5 @@
> > +ChunkTrailer::dropUniqueIdsMap(const AutoSpinLock&)
> > +{
> > +    MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> > +    if (uniqueIds_) {
> > +        // printf("DROPPING MAP FOR: %p\n", Chunk::fromPointer(this));
> 
> Debugging printf can be removed.

D'oh! Thought I had moved all of those to a separate patch already.

> ::: js/src/jsapi-tests/testGCUniqueId.cpp
> @@ +6,5 @@
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +BEGIN_TEST(testGCUID)
> > +{
> > +    uint64_t uid, tmp;
> 
> Maybe zero initialize uid since we're testing its value later.

Good thought! Done.

> @@ +11,5 @@
> > +
> > +    // Ensure the heap is as minimal as it can get.
> > +    JS_GC(rt);
> > +    JS_GC(rt);
> > +    JS_GC(rt);
> 
> The multiple GCs feel a bit magical.  Maybe factor out and comment why this
> is necessary?  (I'm not sure myself why it's necessary)

The second one is to force us to wait on background sweeping. The third one is tradition / superstition. I've factored it into MinimizeHeap and removed the third GC.

> ::: js/src/jsgc.cpp
> @@ +503,5 @@
> >              firstThingOrSuccessorOfLastMarkedThing = thing + thingSize;
> >              nmarked++;
> >          } else {
> >              t->finalize(fop);
> > +            t->Cell::finalize(fop);
> 
> Can we do this conditionally for kinds that support use as hash table keys?

Yeah, we might as well. I was thinking we'd want it to be future-proof, but the errors are obvious enough if we do start moving other kinds. For now I've moved this call to JSObject::finalize.

> @@ +2026,5 @@
> >      memcpy(dst, src, thingSize);
> >  
> > +    // Move any uid attached to the object.
> > +    if (!dst->adoptUniqueId(src))
> > +        CrashAtUnhandlableOOM("failed to insert existing external hash code");
> 
> This adds an unhandlable OOM to the compacting path :-(

Yes. :-(

The underlying alloc is in the LifoAlloc on a 16 * sizeof(UniqueIdMap::Pair) block. I guess this would be a serious advantage of putting them all on the Zone and adding the ability to re-sort a modified SplayTree entry in-place. Let me experiment a bit and see how it looks.
(In reply to Terrence Cole [:terrence] from comment #2)

> Zero how? The mutator cannot allocate new objects adjacent to the background
> sweeping because the arenas have been moved off the allocation list, but it
> can still use live objects that are adjacent on those arenas. As this is a
> splay tree, looking up an entry modifies the tree, so even concurrent
> lookups will corrupt the tree.

Ah right, I forgot that reading will modify the tree.  Please disregard this :)

> Well, ideally, the platform would have a lockless SplayTree (or even an
> RBTree) and we could just use that. Unfortunately, I'm stuck trailblazing
> here. I'm thinking that if this implementation is buggy, TSAN will show it
> up so we can fix it.

Ok, let's go for it then.

> > The multiple GCs feel a bit magical.  Maybe factor out and comment why this
> > is necessary?  (I'm not sure myself why it's necessary)
> 
> The second one is to force us to wait on background sweeping. The third one
> is tradition / superstition. I've factored it into MinimizeHeap and removed
> the third GC.

I think we can use AutoFinishGC to wait for background sweep to finish.

> The underlying alloc is in the LifoAlloc on a 16 * sizeof(UniqueIdMap::Pair)
> block. I guess this would be a serious advantage of putting them all on the
> Zone and adding the ability to re-sort a modified SplayTree entry in-place.
> Let me experiment a bit and see how it looks.

Great!
(In reply to Jon Coppeard (:jonco) from comment #3)
> (In reply to Terrence Cole [:terrence] from comment #2)
> 
> > Well, ideally, the platform would have a lockless SplayTree (or even an
> > RBTree) and we could just use that. Unfortunately, I'm stuck trailblazing
> > here. I'm thinking that if this implementation is buggy, TSAN will show it
> > up so we can fix it.
> 
> Ok, let's go for it then.

\o/

> > > The multiple GCs feel a bit magical.  Maybe factor out and comment why this
> > > is necessary?  (I'm not sure myself why it's necessary)
> > 
> > The second one is to force us to wait on background sweeping. The third one
> > is tradition / superstition. I've factored it into MinimizeHeap and removed
> > the third GC.
> 
> I think we can use AutoFinishGC to wait for background sweep to finish.

Whoa!

> > The underlying alloc is in the LifoAlloc on a 16 * sizeof(UniqueIdMap::Pair)
> > block. I guess this would be a serious advantage of putting them all on the
> > Zone and adding the ability to re-sort a modified SplayTree entry in-place.
> > Let me experiment a bit and see how it looks.
> 
> Great!

Actually, we don't even need a new method! Removal caches the removed node, so the next insertion is guaranteed to succeed. This all happens under a lock, so we can just assert success. Moreover, since we never change zones, we only need to take one lock in adopt, not three. Doubly moreover, we can be roughly certain that every zone will have at least one hashed object in it, so we can just have a UniqueIdMap on the zone and remove lazy initialization altogether.
Oh, one major downside: Nursery::sweep. Any uid that gets added to a Nursery thing needs to get removed during minor GC and I'm not sure how that can happen efficiently. I guess the ObjectHasher could check for nursery things and update a side, side table, but that's kinda gross.
(In reply to Terrence Cole [:terrence] from comment #5)
Ah damn.  Having a special table for nursery things is not too weird though - we already treat nursery objects differently in many ways.

Something else occurred to me - maybe we can set a flag on ObjectGroup if any objects in that group have unique IDs assigned, and use this to avoid the lookup on finalization if possible.

Anyway, r+.
Attachment #8650594 - Flags: review+
(In reply to Jon Coppeard (:jonco) from comment #6)
> (In reply to Terrence Cole [:terrence] from comment #5)
> Ah damn.  Having a special table for nursery things is not too weird though
> - we already treat nursery objects differently in many ways.
> 
> Something else occurred to me - maybe we can set a flag on ObjectGroup if
> any objects in that group have unique IDs assigned, and use this to
> avoid the lookup on finalization if possible.

Oooh, good idea. I'll try that out.

> Anyway, r+.

Heh. This required a pretty much total rewrite, so I'm totally giving you a second pass at it once I get things compiling again.
Alright, it took a while, but this has been pretty well battle-hardened.

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f167beeceb2

I took care of the key explosion from self-hosting by making that debug-only in bug 1199822. We'll have about:memory entries and awsy for it, so we can continue monitoring as we get further with the conversion. I should probably add telemetry as well.
Attachment #8650594 - Attachment is obsolete: true
Attachment #8655522 - Flags: review?(jcoppeard)
Now that the infrastructure is in place, this adds the next level up: using it to provide hash codes. The one complexity here is that we now expect to be able to update key values without needing to rehash, so we need to provide mutable key methods for Sets and Maps, clearly labeled.
Attachment #8655526 - Flags: review?(jcoppeard)
Comment on attachment 8655522 [details] [diff] [review]
allow_storage_of_unique_ids_for_cells-v1.diff

Review of attachment 8655522 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: js/src/jsgc.cpp
@@ +1100,5 @@
>      stats(rt),
>      marker(rt),
>      usage(nullptr),
>      maxMallocBytes(0),
> +    nextCellUniqueId_((1 << 3) + 1),  // Ensure we are disjoint from null tagged pointers.

We should probably have a constant for the highest tagged pointer somewhere.
Attachment #8655522 - Flags: review?(jcoppeard) → review+
Comment on attachment 8655526 [details] [diff] [review]
add_MovableCellHasher-v0.diff

Review of attachment 8655526 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/HashTable.h
@@ -652,5 @@
>      template <class, class, class> friend class detail::HashTable;
>      template <class> friend class detail::HashTableEntry;
>      template <class, class, class, class> friend class HashMap;
>  
> -    Key & mutableKey() { return key_; }

I'm assuming that this gets used in a later patch.

@@ +739,5 @@
>          mozilla::Swap(mem, other->mem);
>      }
>  
>      T& get() { MOZ_ASSERT(isLive()); return *mem.addr(); }
> +    NonConstT& getNonConst() { MOZ_ASSERT(isLive()); return *mem.addr(); }

Should this be called getMutable() like the other methods?

::: js/src/gc/Barrier.cpp
@@ +149,5 @@
> +    uint64_t uidK, uidL;
> +    okK = zone->getUniqueId(k, &uidK);
> +    MOZ_ASSERT(okK);
> +    okL = zone->getUniqueId(l, &uidL);
> +    MOZ_ASSERT(okL);

Can we use MOZ_ALWAYS_TRUE() here?
Attachment #8655526 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/mozilla-central/rev/c9e469c6b915
https://hg.mozilla.org/mozilla-central/rev/7f6585a46cd0
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Depends on: 1201470
Backing out to address the octane regression noted in bug 1201470.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
As we discussed at the GC meeting. This appears to remove the perf issue, at least with no tables using uids yet.
Attachment #8665042 - Flags: review?(jcoppeard)
Note, this is the diff *on top* of the existing uid patch.
Comment on attachment 8665042 [details] [diff] [review]
rewrite_uid_on_ht_with_zone_sweeping-v0.diff

Review of attachment 8665042 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, this looks good.  Simpler too.

::: js/src/jsgc.cpp
@@ +3584,5 @@
> +    for (UniqueIdMap::Enum e(uniqueIds_); !e.empty(); e.popFront()) {
> +        if (DispatchTraceKindTyped(IsAboutToBeFinalizedFunctor(), e.front().key()->getTraceKind(),
> +                                   &e.front().mutableKey()))
> +        {
> +            e.removeFront();

This doesn't look like it handles moving.  I think we need to do the usual IsAboutToBeFinalized() / compare pointer / rekeyFront() dance here.

@@ +5056,5 @@
> +
> +        {
> +            gcstats::AutoPhase ap(stats, gcstats::PHASE_SWEEP_BREAKPOINT);
> +            for (GCZoneGroupIter zone(rt); !zone.done(); zone.next())
> +                zone->sweepUniqueIds(&fop);

We also need to call this in GCRuntime::sweepZoneAfterCompacting().
Attachment #8665042 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #18)
> Comment on attachment 8665042 [details] [diff] [review]
> rewrite_uid_on_ht_with_zone_sweeping-v0.diff
> 
> Review of attachment 8665042 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Ok, this looks good.  Simpler too.
> 
> ::: js/src/jsgc.cpp
> @@ +3584,5 @@
> > +    for (UniqueIdMap::Enum e(uniqueIds_); !e.empty(); e.popFront()) {
> > +        if (DispatchTraceKindTyped(IsAboutToBeFinalizedFunctor(), e.front().key()->getTraceKind(),
> > +                                   &e.front().mutableKey()))
> > +        {
> > +            e.removeFront();
> 
> This doesn't look like it handles moving.  I think we need to do the usual
> IsAboutToBeFinalized() / compare pointer / rekeyFront() dance here.

GGC and CGC both relocate manually when an object gets moved. In the GGC case, this is important so that we don't have to traverse every zone's entire table on every minor GC. For compacting I guess it's less important, but it does let us get away with ignoring moving here. I will add an assertion.

> @@ +5056,5 @@
> > +
> > +        {
> > +            gcstats::AutoPhase ap(stats, gcstats::PHASE_SWEEP_BREAKPOINT);
> > +            for (GCZoneGroupIter zone(rt); !zone.done(); zone.next())
> > +                zone->sweepUniqueIds(&fop);
> 
> We also need to call this in GCRuntime::sweepZoneAfterCompacting().

Ditto here. Compacting should do that itself. I should have added a checkTablesAfterMovingGC entry for the new table -- I'll do that before checking in.
(In reply to Terrence Cole [:terrence] from comment #19)

*re-reads previous patches*

Ah, right, it happens in RelocateCell().  It might be simpler just to do it in the sweep method, as that's how we deal with this for everything else, but it doesn't matter I guess.
The regressions are back (but smaller than before; no regression on DeltaBlue).
https://hg.mozilla.org/mozilla-central/rev/cca86cd156cf
https://hg.mozilla.org/mozilla-central/rev/74608aa063b9
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
This seems to be an 8-10% regression on EarleyBoyer across browser and shell. Will back out and investigate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Now that the uids are attached to the zone and the sweeping is no longer done in the background, we no longer need the lock at all. This appears to get us back the missing performance.

I also missed MergeCompartments on the first round. I've added an assertion that the UID table is empty when we merge. If it becomes non-empty, we can evaluate what to do at that time.
Attachment #8666969 - Flags: review?(jcoppeard)
Attachment #8666969 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/mozilla-central/rev/00cd37ae27b7
https://hg.mozilla.org/mozilla-central/rev/ac6fadde6934
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: mozilla43 → mozilla44