Closed Bug 1124473: Describe our existing GC scheduling heuristics

Categories: Core :: JavaScript: GC (defect, severity normal)
Status: RESOLVED FIXED
Target Milestone: mozilla38
People: (Reporter: terrence, Assigned: terrence)
References: (Blocks 1 open bug)
Attachments: (1 file, 1 obsolete file)

Attached patch describe_heuristics-v0.diff (obsolete) — Splinter Review
After a few days of reading and contemplation, I feel like I have a passable grasp of our GC triggers and it's time to solicit some feedback.
Attachment #8552751 - Flags: review?(sphink)
Attachment #8552751 - Flags: feedback?(wmccloskey)
Comment on attachment 8552751 [details] [diff] [review]
describe_heuristics-v0.diff

Review of attachment 8552751 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see another pass. I'm still confused about enough of the specifics that I can't quite grasp the big picture.

One question this does raise in my mind is when mallocBytes gets reset. If it is never reset, it seems like you'd have occasional TOO_MUCH_MALLOC GCs interspersed with normal behavior. Or if it is reset during GC, then it's only well correlated with t[sweeping] if most of the new stuff is dead and little of the old stuff is dead. (Which is a common GC assumption, so it's ok, though perhaps less so in a SET_NEW_DOCUMENT or other GC.) Anyway, a minor point compared to all the other questions this stuff would raise if I were to understand it. :-)

::: js/src/gc/GCRuntime.h
@@ +220,5 @@
> + *     memory it could surrender to the embedding.
> + *
> + *   * Memory layout: if we fill the process with GC allocations, a request for
> + *     a large block of contiguous memory may fail because no contiguous block
> + *     is free, despite having enough memory available to service the request.

s/Memory layout/Memory fragmentation/

@@ +233,5 @@
> + *     GC-internal allocations and variable sized allocations in the normal C
> + *     system heap. Tracking of the sizes in these heaps is done manually with
> + *     separate code. The actual references to the external malloc heap are
> + *     dispersed throughout the system and are not under direct control of the
> + *     GC.

Huh what?

Are you talking about externally malloced memory that the GC never sees at all or malloced memory hanging off of GC things? If it is the latter, then are you saying that external references might exist?

Given the "Other memory" section below, it seems you are referring to the 2nd one. In which case it seems like you should say that there's going to be one pointer from a GC thing, but zero to many external references that we do not control.

@@ +241,5 @@
> + *   * Most applications allocate heavily at startup, then enter a processing
> + *     stage where memory utilization remains roughly fixed with a slower
> + *     allocation rate. This is not true in general, however, so while we may
> + *     optimize for this pattern, we must be able to handle arbitrary
> + *     allocation requests.

"This is not true in general" -> "This is not always the case"

"arbitrary allocation requests" -> "arbitrary allocation patterns"

@@ +247,5 @@
> + * Other factors:
> + *
> + *   * Other memory: This is memory allocated outside the purview of the GC.
> + *     Data mapped by the system for code libraries, data allocated by those
> + *     libraries, data in the JSRuntime that is used to managed the engine,

s/managed/manage/

@@ +250,5 @@
> + *     Data mapped by the system for code libraries, data allocated by those
> + *     libraries, data in the JSRuntime that is used to managed the engine,
> + *     memory used by entirely separate programs that uses space we could
> + *     otherwise use for allocation, etc.  While we don't have to manage it, we
> + *     do have to take it into account when scheduling since it affects when we

by "entirely separate programs", you're referring to other processes?

s/uses/use/

Doesn't this category also include memory used by the embedding for its own purposes with no GC thing pointing to it?

@@ +255,5 @@
> + *     will OOM.
> + *
> + *   * Physical Reality: All target platforms have limits on the number of bits
> + *     that they are physically able to store. This, unfortunately, comes into
> + *     play in many aspects of our scheduling.

This seems a little vague. Are you referring to virtual memory limits, physical memory limits, or both? (Because physical memory has softer limits when you start swapping, but a little swapping of unused memory is ok, etc.)

@@ +271,5 @@
> + *
> + *   1) Do GC now because the embedding knows something useful about the
> + *      zone's memory retention state. These are gcreasons like LOAD_END,
> + *      PAGE_HIDE, SET_NEW_DOCUMENT, DOM_UTILS. Most of these are there because
> + *      gecko knows that a signficant fraction of the scheduled zone's memory

Maybe: "Mostly, Gecko uses these to indicate that a significant fraction..."? Or name this category "Reclaimable memory"?

@@ +276,5 @@
> + *      is probably reclaimable.
> + *
> + *   2) Do some GC work now because the embedding knows now is a good time to
> + *      do a long, unblockable operation. These are INTER_SLICE_GC and
> + *      REFRESH_FRAME.

Do these have a max duration associated with them? There's quite a difference between that and "go ahead and do things until you're all done."

@@ +291,5 @@
> + *   5) Do a GC because we are shutting down: e.g. SHUTDOWN_CC or DESTROY_*.
> + *
> + *   6) Do a GC because a compartment got accessed between GC slices when we
> + *      would have otherwise discarded it. We have to do a second GC to clean
> + *      it up.

s/got accessed/was accessed/

@@ +312,5 @@
> + *      configured maximum (JSGC_MAX_BYTES). We trigger this GC by returning
> + *      nullptr from the allocation and letting the maybeGC we run in that case
> + *      catch the case above. In this case we still return nullptr from the
> + *      allocation and make the script raise an OOM error, rather than
> + *      continuing.

Too many "cases" flying around here. We trigger this by returning nullptr and then calling maybeGC? Which is then guaranteed to fail the size > trigger check above, since trigger <= max? And in what case do we return nullptr to raise an OOM error? When the maybeGC fails, or ?

Also, OOM is a little jargon-y in a detailed comment like this that starts with a basic intro. I'd spell out OOM (Out Of Memory) or something.

@@ +316,5 @@
> + *      continuing.
> + *
> + *      Note that this differs from a LAST_DITCH GC where we actually run out
> + *      of memory when trying to allocate: LAST_DITCH may still return an
> + *      allocation at the end and allow the script to continue.

"...where we actually run out of memory (i.e., a call to a system allocator fails) when trying..."

And this seems mysterious. I think it should say that LAST_DITCH may be able to allocate because it runs a synchronous last ditch GC to try to free up enough space to satisfy the allocation.

@@ +326,5 @@
> + *      extremely common in benchmarks. Note that this uses a wildly different
> + *      mechanism from the above in that it sets the interrupt flag and does
> + *      the GC at the next loop head, before the next alloc, or maybeGC. The
> + *      reason for this is that this check is made after the allocation and we
> + *      cannot GC with an uninitialized thing in the heap.

I don't follow this. ALLOC_TRIGGER happens when we hit the threshold trigger (usually set to 80% of the max) during an allocation? Or when we hit the max? Why does this happen when we hit the limit, whatever it is, between turns of the event loop? Is it because we'll have a maybeGC that checks a more conservative trigger whenever we return to the event loop?

It uses the interrupt flag in order to ensure that it will happen before the next alloc? That makes sense. But where does maybeGC come in? How could that sneak in between? If it calls some JSAPI thing that doesn't allocate but does call maybeGC? Or are you saying that setting the interrupt flag will result in it happening before any of these happen: the next loop head, the next alloc, or the next maybeGC call?

@@ +330,5 @@
> + *      cannot GC with an uninitialized thing in the heap.
> + *
> + *  12) Do an incremental, zonal GC with reason TOO_MUCH_MALLOC when we malloc
> + *      more than JSGC_MAX_MALLOC_BYTES in a zone between the last GC and
> + *      present.

When? What notices this?

@@ +347,5 @@
> + *   B) Retained size: this is the amount of memory that the mutator can
> + *      currently reach. Said another way, it is the size of the heap
> + *      immediately after a GC (modulo background sweeping). This size is very
> + *      costly to know exactly and also extremely hard to estimate with any
> + *      fidelity.

It's also different for minor GCs and major GCs, since minor GCs overestimate liveness. But maybe that doesn't matter.

@@ +378,5 @@
> + *
> + *   This presents some obvious implications for Mark-and-Sweep collectors.
> + *   Namely:
> + *       -> t[marking] ~= size[retained]
> + *       -> t[sweeping] ~= size[allocated] - size[retained]

If you're saying "is proportional to", it should really be ∝ but that's probably not going to work in our code base. ~= seems to mean "is approximately equal to". In statistics, ~ means "has the same distribution as", which I guess isn't what you want. It also sometimes means "is the same magnitude as", which would work when the units match, but not here.

Never mind, it's fine.
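
For the record, the intended relations in proper notation would presumably be:

    t[marking]  ∝ size[retained]
    t[sweeping] ∝ size[allocated] - size[retained]

which together give t[marking] + t[sweeping] ∝ size[allocated] -- one reason an accumulator over allocated bytes is a reasonable proxy for total (non-incremental) GC time.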

@@ +381,5 @@
> + *       -> t[marking] ~= size[retained]
> + *       -> t[sweeping] ~= size[allocated] - size[retained]
> + *
> + *   In a non-incremental collector, this puts the burden of getting low slice
> + *   times on GCing before either of the above quantities. With a bit of

Whoa. This puts the what on what? Is the end of the sentence missing? "either of the above quantities... gets too large" maybe? I'm also not confident of what you mean by "non-incremental". You mean non-incremental foreground marking and sweeping?

@@ +383,5 @@
> + *
> + *   In a non-incremental collector, this puts the burden of getting low slice
> + *   times on GCing before either of the above quantities. With a bit of
> + *   thought and the above insights, the original impetus for the majority of
> + *   our GC triggers becomes fairly obvious.

If you put the word "obvious" in this comment, I will slap you with a wet fish.

It also seems that you intended this to be something of a header for the following discussion, which I did not pick up on originally.

@@ +389,5 @@
> + *   MAYBEGC
> + *   -------
> + *     Occurs when we return to the event loop and find our heap is getting
> + *     largish, but before t[marking] OR t[sweeping] can be too large for a
> + *     responsive GC. This is intended to be the common case in normal web

Aha! So maybeGC is magically called at the end of each turn of the event loop? That would have been useful to know up above.

@@ +392,5 @@
> + *     largish, but before t[marking] OR t[sweeping] can be too large for a
> + *     responsive GC. This is intended to be the common case in normal web
> + *     applications: e.g. we just finished an event handler and the few objects
> + *     we allocated when computing the new whatzitz have pushed us slightly
> + *     over the limit. After this a GC we rescale the new trigger to 150%

s/a //

@@ +402,5 @@
> + *     the current size[retained] so that we'll do one longer GC at the end of
> + *     the mutator startup rather than constant smaller GCs.
> + *
> + *         Assumptions:
> + *           -> t[marking] dominates a GC.

How so? You just said that MAYBEGC occurs if t[sweeping] is going to be too large too. And doesn't this look at the allocation accumulator, which should be proportional to t[marking] + t[sweeping]? Or are you making use of background sweeping here?

@@ +414,5 @@
> + *
> + *   ALLOC_TRIGGER (non-incremental)
> + *   -------------------------------
> + *     If we do not return to the event loop before getting all the way to our
> + *     gc trigger bytes then MAYBEGC will never fire and we may end up OOMing.

"...will never fire. To avoid OOMing, we succeed..."

@@ +421,5 @@
> + *     to raise and OOM exception for the script.
> + *
> + *         Assumptions:
> + *           -> Common web scripts will return to the event loop before using
> + *              JSGC_MAX_BYTES worth of GC memory.

Well, sort of. It's more that we'll make it back to the event loop before using MAX_BYTES - threshold bytes *more* memory. Or something? Not sure what value is compared against here. (The script may start running at zero or close to the threshold.)

@@ +428,5 @@
> + *   ---------------------------
> + *     In practice the above trigger is rough: if a website is just on the
> + *     cusp, sometimes it will trigger a non-incremental GC moments before
> + *     returning to the event loop, where it could have done an incremental GC.
> + *     Thus we recently added an incremental version of the above.

And how is the incremental version different? It uses a lower threshold?

@@ +435,5 @@
> + *   ---------------
> + *     Does a GC before size[allocated] - size[retained] gets too large for
> + *     sweeping to be fast in the case that we have significantly more malloc
> + *     allocation than GC allocation. This is meant to compliment MAYBEGC
> + *     triggers.

*complement

@@ +441,5 @@
> + *         Assumptions:
> + *           -> EITHER size[allocated_by_malloc] ~= size[allocated_by_GC]
> + *                OR   time[sweeping] ~= size[allocated_by_malloc]
> + *           -> size[retained] @ t0 ~= size[retained] @ t1
> + *              e.g. That the mutator is in stead-state operation.

i.e. that the mutator is in steady state operation

@@ +1134,5 @@
>      CallbackVector<JSWeakPointerCallback> updateWeakPointerCallbacks;
>  
>      /*
>       * Malloc counter to measure memory pressure for GC scheduling. It runs
> +     * from maxMallocBytes down to zero.

In which case it should be "mallocBytesRemainingBeforeGC", but never mind.
Attachment #8552751 - Flags: review?(sphink)
(In reply to Steve Fink [:sfink, :s:] from comment #1)
> Comment on attachment 8552751 [details] [diff] [review]
> describe_heuristics-v0.diff
> 
> Review of attachment 8552751 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd like to see another pass. I'm still confused about enough of the
> specifics that I can't quite grasp the big picture.
> 
> One question this does raise in my mind is when mallocBytes gets reset. If
> it is never reset, 

Thanks for pointing out my assumptions about reading level! It gets reset at every GC -- I've added a couple of sentences to TOO_MUCH_MALLOC to explain this and to point out that the reset makes it useless for other heuristics where it could potentially be useful.
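
For concreteness, the accounting behaves roughly like this (a simplified sketch; the names are illustrative, not the actual GCRuntime fields):

    #include <cstddef>

    // Stand-in for the runtime's trigger entry point (stub for this sketch).
    void TriggerTooMuchMallocGC() { /* request a TOO_MUCH_MALLOC GC */ }

    // The malloc-pressure counter counts *down* from maxMallocBytes to zero
    // and is reset at every GC, so it only ever measures malloc traffic since
    // the last collection -- which is also why it can't serve heuristics that
    // need a longer-lived signal.
    struct MallocCounter {
        std::ptrdiff_t bytesRemaining;
        std::size_t maxMallocBytes;  // JSGC_MAX_MALLOC_BYTES

        void noteMalloc(std::size_t nbytes) {
            bytesRemaining -= std::ptrdiff_t(nbytes);
            if (bytesRemaining <= 0)
                TriggerTooMuchMallocGC();
        }

        void resetAfterGC() {  // runs at the end of every GC
            bytesRemaining = std::ptrdiff_t(maxMallocBytes);
        }
    };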

> it seems like you'd have occasional TOO_MUCH_MALLOC GCs
> interspersed with normal behavior. Or if it is reset during GC, then it's
> only well correlated with t[sweeping] if most of the new stuff is dead and
> little of the old stuff is dead.

That's a great question! My assumption there was that either (1) most malloc memory is actually going to come from slot/element realloc, in which case it's mostly tied to long-lived objects and thus to t[marking]; or (2) the malloc memory is mostly from random non-object allocations, and since non-object allocations do not seem to have the same high infant mortality (borne out by Jon's tracing work from last year), they also fall into the long-lived bucket and thus t[marking]. I guess the exception here is strings, where t[sweeping] might dominate in many non-Octane workloads.

> (Which is a common GC assumption, so it's
> ok, though perhaps less so in a SET_NEW_DOCUMENT or other GC.) Anyway, a
> minor point compared to all the other questions this stuff would raise if I
> were to understand it. :-)
> 
> @@ +233,5 @@
> > + *     GC-internal allocations and variable sized allocations in the normal C
> > + *     system heap. Tracking of the sizes in these heaps is done manually with
> > + *     separate code. The actual references to the external malloc heap are
> > + *     dispersed throughout the system and are not under direct control of the
> > + *     GC.
> 
> Huh what?
> 
> Are you talking about externally malloced memory that the GC never sees at
> all or malloced memory hanging off of GC things? If it is the latter, then
> are you saying that external references might exist?
> 
> Given the "Other memory" section below, it seems you are referring to the
> 2nd one. In which case it seems like you should say that there's going to be
> one pointer from a GC thing, but zero to many external references that we do
> not control.

I rewrote this section entirely, given the confusion.

> @@ +250,5 @@
> > + *     Data mapped by the system for code libraries, data allocated by those
> > + *     libraries, data in the JSRuntime that is used to managed the engine,
> > + *     memory used by entirely separate programs that uses space we could
> > + *     otherwise use for allocation, etc.  While we don't have to manage it, we
> > + *     do have to take it into account when scheduling since it affects when we
> 
> by "entirely separate programs", you're referring to other processes?

Yes, clarified with your wording.

> Doesn't this category also include memory used by the embedding for its own
> purposes with no GC thing pointing to it?

Yes, added that to the list explicitly.

> @@ +255,5 @@
> > + *     will OOM.
> > + *
> > + *   * Physical Reality: All target platforms have limits on the number of bits
> > + *     that they are physically able to store. This, unfortunately, comes into
> > + *     play in many aspects of our scheduling.
> 
> This seems a little vague. Are you referring to virtual memory limits,
> physical memory limits, or both? (Because physical memory has softer limits
> when you start swapping, but a little swapping of unused memory is ok, etc.)

I've rewritten this and added a "Platform Factors" bulletpoint to address the virtual/physical question. 

> @@ +271,5 @@
> > + *
> > + *   1) Do GC now because the embedding knows something useful about the
> > + *      zone's memory retention state. These are gcreasons like LOAD_END,
> > + *      PAGE_HIDE, SET_NEW_DOCUMENT, DOM_UTILS. Most of these are there because
> > + *      gecko knows that a signficant fraction of the scheduled zone's memory
> 
> Maybe: "Mostly, Gecko uses these to indicate that a significant
> fraction..."? Or name this category "Reclaimable memory"?

Switched it to "Embedding reasons" and moved DEBUG_GC down to Correctness. I guess DEBUG_GC doesn't have a good place in this list. Maybe we shouldn't even mention it?

> @@ +276,5 @@
> > + *      is probably reclaimable.
> > + *
> > + *   2) Do some GC work now because the embedding knows now is a good time to
> > + *      do a long, unblockable operation. These are INTER_SLICE_GC and
> > + *      REFRESH_FRAME.
> 
> Do these have a max duration associated with them? There's quite a
> difference between that and "go ahead and do things until you're all done."

Thanks. Clarified the wording by explicitly mentioning these are of fixed duration.

> @@ +312,5 @@
> > + *      configured maximum (JSGC_MAX_BYTES). We trigger this GC by returning
> > + *      nullptr from the allocation and letting the maybeGC we run in that case
> > + *      catch the case above. In this case we still return nullptr from the
> > + *      allocation and make the script raise an OOM error, rather than
> > + *      continuing.
> 
> Too many "cases" flying around here. We trigger this by returning nullptr
> and then calling maybeGC? Which is then guaranteed to fail the size >
> trigger check above, since trigger <= max? And in what case do we return
> nullptr to raise an OOM error? When the maybeGC fails, or ?

I made your ? into . and pasted this more or less verbatim as a replacement.
 
> Also, OOM is a little jargon-y in a detailed comment like this that starts
> with a basic intro. I'd spell out OOM (Out Of Memory) or something.

I added an explanation in the second paragraph, which I hope will serve for all uses of OOM in this comment. Also, this comment is in the middle of the gc source directory of a virtual machine, so I think assuming the reader has at least a little GC jargon (or just google fu) is probably fair.

> @@ +316,5 @@
> > + *      continuing.
> > + *
> > + *      Note that this differs from a LAST_DITCH GC where we actually run out
> > + *      of memory when trying to allocate: LAST_DITCH may still return an
> > + *      allocation at the end and allow the script to continue.
> 
> "...where we actually run out of memory (i.e., a call to a system allocator
> fails) when trying..."

Added.

> And this seems mysterious. I think it should say that LAST_DITCH may be able
> to allocate because it runs a synchronous last ditch GC to try to free up
> enough space to satisfy the allocation.

I've tried to clarify this.

> @@ +326,5 @@
> > + *      extremely common in benchmarks. Note that this uses a wildly different
> > + *      mechanism from the above in that it sets the interrupt flag and does
> > + *      the GC at the next loop head, before the next alloc, or maybeGC. The
> > + *      reason for this is that this check is made after the allocation and we
> > + *      cannot GC with an uninitialized thing in the heap.
> 
> I don't follow this. ALLOC_TRIGGER happens when we hit the threshold trigger
> (usually set to 80% of the max) during an allocation? Or when we hit the
> max? Why does this happen when we hit the limit, whatever it is, between
> turns of the event loop? Is it because we'll have a maybeGC that checks a
> more conservative trigger whenever we return to the event loop?

I guess you missed this bit of section 9. See below.

> It uses the interrupt flag in order to ensure that it will happen before the
> next alloc? That makes sense. But where does maybeGC come in? How could that
> sneak in between? If it calls some JSAPI thing that doesn't allocate but
> does call maybeGC? Or are you saying that setting the interrupt flag will
> result in it happening before any of these happen: the next loop head, the
> next alloc, or the next maybeGC call?

Maybe. I'm not actually too clear on this now. I thought maybeGC was only called at the event loop head and from the interrupt callback, but I found yesterday that we call it from the allocator as well. Maybe we can do something better here in the next pass.

> @@ +330,5 @@
> > + *      cannot GC with an uninitialized thing in the heap.
> > + *
> > + *  12) Do an incremental, zonal GC with reason TOO_MUCH_MALLOC when we malloc
> > + *      more than JSGC_MAX_MALLOC_BYTES in a zone between the last GC and
> > + *      present.
> 
> When? What notices this?

I guess the discussion of the exact scheduling should be left entirely to the TOO_MUCH_MALLOC section below.

> @@ +347,5 @@
> > + *   B) Retained size: this is the amount of memory that the mutator can
> > + *      currently reach. Said another way, it is the size of the heap
> > + *      immediately after a GC (modulo background sweeping). This size is very
> > + *      costly to know exactly and also extremely hard to estimate with any
> > + *      fidelity.
> 
> It's also different for minor GCs and major GCs, since minor GCs
> overestimate liveness. But maybe that doesn't matter.

I tried to leave discussion of minor GC entirely out of this comment and simply gloss over any details that might warrant a divergence in that direction. In general, minor GC is going to have a much lower impact on the issues people are probably having if they're reading this comment, so I think it's fair to just skip it for simplicity.

> @@ +378,5 @@
> > + *
> > + *   This presents some obvious implications for Mark-and-Sweep collectors.
> > + *   Namely:
> > + *       -> t[marking] ~= size[retained]
> > + *       -> t[sweeping] ~= size[allocated] - size[retained]
> 
> If you're saying "is proportional to", it should really be ∝ but that's
> probably not going to work in our code base. ~= seems to mean "is
> approximately equal to". In statistics, ~ means "has the same distribution
> as", which I guess isn't what you want. It also sometimes means "is the same
> magnitude as", which would work when the units match, but not here.
> 
> Never mind, it's fine.

Yeah, that's the conclusion I reached too. ;-)  As long as you understood the meaning, I'm good with it.

> @@ +381,5 @@
> > + *       -> t[marking] ~= size[retained]
> > + *       -> t[sweeping] ~= size[allocated] - size[retained]
> > + *
> > + *   In a non-incremental collector, this puts the burden of getting low slice
> > + *   times on GCing before either of the above quantities. With a bit of
> 
> Whoa. This puts the what on what? Is the end of the sentence missing?

D'oh!

> "either of the above quantities... gets too large" maybe?

Yes, exactly.

> I'm also not
> confident of what you mean by "non-incremental". You mean non-incremental
> foreground marking and sweeping?

Yes. I've also tried to leave discussion of background sweeping out entirely, to reduce the size and complexity of a comment that's already 250+ lines. I'm not really sure how much, if at all, the presence of background sweeping should even affect our GC scheduling. I guess we /could/ track how many things are background finalizable and adjust accordingly, but we certainly don't at the moment. Anyway, it didn't seem too terribly important to add that complexity yet.

> @@ +383,5 @@
> > + *
> > + *   In a non-incremental collector, this puts the burden of getting low slice
> > + *   times on GCing before either of the above quantities. With a bit of
> > + *   thought and the above insights, the original impetus for the majority of
> > + *   our GC triggers becomes fairly obvious.
> 
> If you put the word "obvious" in this comment, I will slap you with a wet
> fish.

:-D  That was a meta-circular joke about the absence of this comment. I've removed it now that it's done its work.

> It also seems that you intended this to be something of a header for the
> following discussion, which I did not pick up on originally.

Yes, I've indented an extra space.

> @@ +389,5 @@
> > + *   MAYBEGC
> > + *   -------
> > + *     Occurs when we return to the event loop and find our heap is getting
> > + *     largish, but before t[marking] OR t[sweeping] can be too large for a
> > + *     responsive GC. This is intended to be the common case in normal web
> 
> Aha! So maybeGC is magically called at the end of each turn of the event
> loop? That would have been useful to know up above.

Well, I had "in a call to maybeGC (e.g. when we return to the event loop)" in comment 9, so.... I've made that fact not in an aside in hopes that it will not get skipped and expanded the comment with a duplicate of the "trigger < max" to give that fact more weight too.

> @@ +402,5 @@
> > + *     the current size[retained] so that we'll do one longer GC at the end of
> > + *     the mutator startup rather than constant smaller GCs.
> > + *
> > + *         Assumptions:
> > + *           -> t[marking] dominates a GC.
> 
> How so? You just said that MAYBEGC occurs if t[sweeping] is going to be too
> large too. And doesn't this look at the allocation accumulator, which should
> be proportional to t[marking] + t[sweeping]? Or are you making use of
> background sweeping here?

Actually, I was thinking that the trigger for MAYBEGC only considers size[retained], not size[allocated]. Thus, if we have a workload with a very low size[retained] and a high size[allocated], then we will get very frequent GCs. However, thinking more, these GCs would actually be very effective and low-latency, so I guess maybe things aren't going off the rails at all if t[sweeping] is dominating. I'll just remove that bullet.
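
In code terms, the trigger rescaling described in the comment amounts to something like this (illustrative sketch; 1.5 and 3.0 are the 150%/300% growth rates mentioned in the patch text):

    #include <cstddef>

    // Recompute the MAYBEGC soft trigger after a collection: grow from the
    // observed retained size by a mode-dependent factor.
    std::size_t ComputeTriggerBytes(std::size_t retainedBytes,
                                    bool highFrequencyMode) {
        double growth = highFrequencyMode ? 3.0   // heavy-allocation mode
                                          : 1.5;  // steady-state workload
        return std::size_t(double(retainedBytes) * growth);
    }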

> @@ +421,5 @@
> > + *     to raise and OOM exception for the script.
> > + *
> > + *         Assumptions:
> > + *           -> Common web scripts will return to the event loop before using
> > + *              JSGC_MAX_BYTES worth of GC memory.
> 
> Well, sort of. It's more that we'll make it back to the event loop before
> using MAX_BYTES - threshold bytes *more* memory. Or something? Not sure what
> value is compared against here. (The script may start running at zero or
> close to the threshold.)

Good point! This is |isCloseToAllocTrigger|. I guess the assumption is really that most scripts will use < 10% of gcTriggerBytes so that a typical run will not flow from <trigger to >trigger in a single turn.
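
i.e., roughly (sketch of the check; 0.9 is the hard-coded factor that comes up again below):

    #include <cstddef>

    // Sketch of |isCloseToAllocTrigger|: take the slow path once the heap
    // is within 10% of the trigger threshold.
    bool IsCloseToAllocTrigger(std::size_t heapBytes, std::size_t triggerBytes) {
        return double(heapBytes) >= 0.9 * double(triggerBytes);
    }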
 
> @@ +428,5 @@
> > + *   ---------------------------
> > + *     In practice the above trigger is rough: if a website is just on the
> > + *     cusp, sometimes it will trigger a non-incremental GC moments before
> > + *     returning to the event loop, where it could have done an incremental GC.
> > + *     Thus we recently added an incremental version of the above.
> 
> And how is the incremental version different? It uses a lower threshold?

Added some more explanation.

> @@ +1134,5 @@
> >      CallbackVector<JSWeakPointerCallback> updateWeakPointerCallbacks;
> >  
> >      /*
> >       * Malloc counter to measure memory pressure for GC scheduling. It runs
> > +     * from maxMallocBytes down to zero.
> 
> In which case it should be "mallocBytesRemainingBeforeGC", but never mind.

Yeah, wow, wouldn't it be nice if the people that implemented all of this gave reasonable names to everything! I just "love" how our primary intended GC entry point reason is called MAYBEGC.
Feedback addressed as per the comment above.
Attachment #8552751 - Attachment is obsolete: true
Attachment #8552751 - Flags: feedback?(wmccloskey)
Attachment #8553312 - Flags: review?(sphink)
Comment on attachment 8553312 [details] [diff] [review]
describe_heuristics-v1.diff

Review of attachment 8553312 [details] [diff] [review]:
-----------------------------------------------------------------

This is great! I'm still chewing on parts of it, but it's definitely landable and a huge help. Thank you!!

::: js/src/gc/GCRuntime.h
@@ +208,5 @@
> + *
> + * Cost factors:
> + *
> + *   * GC too soon and we'll revisit an object graph almost identical to one we
> + *     just visited; since we are unlikely to find new garbage, the traveral

I think "the one" might sound a little better, but maybe you disagree.

*traversal

@@ +217,5 @@
> + *   * GC too late and we'll run out of memory to allocate (e.g. Out-Of-Memory,
> + *     hereafter simply abbreviated to OOM). If this happens inside
> + *     SpiderMonkey we may be able to recover, but most embedder allocations
> + *     will simply crash on OOM, even if the GC has plenty of free memory it
> + *     could surrender to the embedding.

s/to the embedding//

@@ +226,5 @@
> + *     service the request.
> + *
> + *   * Management overhead: if our GC heap becomes large, even if the
> + *     allocations are mostly unused, we create extra overhead when managing
> + *     the GC's structures.

I would swap the last 2 clauses.

@@ +240,5 @@
> + *     the SpiderMonkey GC allows GC things to hold pointers to C heap memory.
> + *     It is the responsibility of the thing to free this memory with a custom
> + *     finalizer (with the sole exception of NativeObject which knows about
> + *     slots and elements for performance reasons). C heap memory has different
> + *     performance and overhead tradeoffs than GC internal memory that need to

s/that/, which/ (on 1st reading, I thought the precedence was higher, with "that" bound to "GC internal memory".) Or otherwise restructure for clarity.

@@ +272,5 @@
> + *   * Platform Factors: Each OS makes use of wildly different memory
> + *     management techniques. These differences result in different performance
> + *     tradeoffs, different fragmentation patterns, and different hard limits
> + *     on the amount of physical and/or virtual memory that we can be using in
> + *     practice before the system startings OOMing.

s/startings/starts/

or maybe just "...that we can use before OOMing."

@@ +285,5 @@
> + *  that generally coincide with one or more of the above factors.
> + *
> + *  Embedding reasons:
> + *
> + *   1) Do GC now because the embedding knows something useful about the

"Do a GC now" seems to better match your later pattern.

@@ +297,5 @@
> + *      These are INTER_SLICE_GC and REFRESH_FRAME.
> + *
> + *  Correctness reasons:
> + *
> + *   3) Do GC now because correctness depends on some GC property. For example,

Do a GC

@@ +299,5 @@
> + *  Correctness reasons:
> + *
> + *   3) Do GC now because correctness depends on some GC property. For example,
> + *      CC_WAITING is where the embedding depends on the mark bits being
> + *      correct. Also, EVICT_NURSERY where we need to work on the tenured heap.

"depends" makes it sound a little too seat-of-the-pants to me. How about "CC_WAITING is where the embedding requires the mark bits to be set correctly"?

@@ +305,5 @@
> + *   4) Do a GC because we are shutting down: e.g. SHUTDOWN_CC or DESTROY_*.
> + *
> + *   5) Do a GC because a compartment was accessed between GC slices when we
> + *      would have otherwise discarded it. We have to do a second GC to clean
> + *      it up.

There isn't a specific gcreason:: enum value for this?

@@ +308,5 @@
> + *      would have otherwise discarded it. We have to do a second GC to clean
> + *      it up.
> + *
> + *   6) Do extra GC work sometimes in debug builds under DEBUG_GC to help debug
> + *      problems with the GC or allocator itself.

I think I'd vote for dropping DEBUG_GC. Reading it now, it almost implies that this will happen during normal operation or something. (Some crazy self-healing scheme or something.)

@@ +326,5 @@
> + *      MaybeGC gets called at the top of the main event loop. Most heap size
> + *      limiting GC's are meant to be triggered from this callback, while JS is
> + *      not running, because there will be fewer stack roots. For this reason,
> + *      the GC's "trigger" bytes is less than the GC's "max" bytes as used by
> + *      the trigger below.

How about: "Normally, it is expected that this callback will keep the heap size limited. It is relatively inexpensive, because it is invoked with no JS running and thus few stack roots to scan." Or something. The phrase "heap size limiting GC's" seemed a little awkward and seemed to imply that only these GCs limit the heap size.

@@ +335,5 @@
> + *      nullptr and then calling maybeGC at the top level of the allocator.
> + *      This is then guaranteed to fail the "size greater than trigger" check
> + *      above, since trigger is always less than max. After performing the GC,
> + *      the allocator unconditionally returns nullptr to force an OOM exception
> + *      is raised by the script.

s/is/to be/

Ooh, really? So if we exceed max bytes, we are guaranteed to always get an OOM exception regardless of how much space is freed by the GC? That seems good, come to think of it.
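
The control flow being described is roughly this (sketch; Heap and its methods are illustrative stand-ins, not the real allocator interface):

    #include <cstddef>

    // Hypothetical Heap interface:
    //   usedBytes(), maxBytes() -- current and maximum (JSGC_MAX_BYTES) sizes
    //   maybeGC()               -- the trigger check described above
    //   bumpAlloc(n)            -- the normal allocation path
    void* AllocateTenured(Heap& heap, std::size_t nbytes) {
        if (heap.usedBytes() + nbytes > heap.maxBytes()) {
            // The "size > trigger" check inside maybeGC is guaranteed to
            // pass, since trigger is always less than max.
            heap.maybeGC();
            return nullptr;  // still raise OOM, however much was freed
        }
        return heap.bumpAlloc(nbytes);
    }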

@@ +342,5 @@
> + *      of memory (i.e., a call to a system allocator fails) when trying to
> + *      allocate. Unlike above, LAST_DITCH GC only happens when we are really
> + *      out of memory, not just when we cross an arbitrary trigger; despite
> + *      this, it may still return an allocation at the end and allow the script
> + *      to continue, given the LAST_DITCH GC was able to free up enough memory.

s/given/if/

@@ +347,5 @@
> + *
> + *  11) Do a GC under reason ALLOC_TRIGGER when we are over the GC heap trigger
> + *      limit, but in the allocator rather than in a random call to maybeGC.
> + *      This would occur if we did not return to the event loop while
> + *      allocating and hit maybeGC, which checks for ~.8 threshold; this is

I'm not sure what "which checks for ~.8 threshold" means -- you mean it checks against a threshold of about 80% of the JSGC_MAX_BYTES?

Also, I'm having trouble parsing the part just before. Is the "and hit maybeGC" bound to "while allocating" or "not return to the event loop"? As in, is this "This would occur if we allocated too much before returning to the event loop and calling maybeGC", or "This would occur when the allocator calls maybeGC because we've allocated too much before returning to the event loop"?

@@ +348,5 @@
> + *  11) Do a GC under reason ALLOC_TRIGGER when we are over the GC heap trigger
> + *      limit, but in the allocator rather than in a random call to maybeGC.
> + *      This would occur if we did not return to the event loop while
> + *      allocating and hit maybeGC, which checks for ~.8 threshold; this is
> + *      extremely common in benchmarks. Note that this uses a wildly different

Can we lie and claim "...in benchmarks and long-running Worker computations"? As in, I don't actually know if it is true or not, but that's the rationalization for obeying the benchmarks here.

@@ +366,5 @@
> + *  the mutator. It sees only:
> + *
> + *   A) Allocated size: this is the amount of memory currently requested by the
> + *      mutator. This quantity is monotonically increasing: e.g. the allocation
> + *      rate is always >= 0. It is also easy for the system to track.

I think you mean "i.e." instead of "e.g.", or you could drop it completely.

@@ +406,5 @@
> + *       -> t[marking] ~= size[retained]
> + *       -> t[sweeping] ~= size[allocated] - size[retained]
> + *
> + *   In a non-incremental collector, this puts the burden of getting low slice
> + *   times on GCing before either of the above quantities grows large. The

"In a non-incremental collector, low slice times depend on GCing before either of the above quantities grows too large." Or "in order to achieve low slice times, we must GC before...". Or something.

@@ +413,5 @@
> + *   sweeping, this design is no longer entirely relevant.
> + *
> + *    MAYBEGC
> + *    -------
> + *      Occurs when we return to the event loop and find our heap is getting

I would indent MAYBEGC another space or two, and not indent "Occurs" any more than that.

@@ +414,5 @@
> + *
> + *    MAYBEGC
> + *    -------
> + *      Occurs when we return to the event loop and find our heap is getting
> + *      largish, but before t[marking] OR t[sweeping] can be too large for a

s/can be/is/

@@ +418,5 @@
> + *      largish, but before t[marking] OR t[sweeping] can be too large for a
> + *      responsive GC. This is intended to be the common case in normal web
> + *      applications: e.g. we just finished an event handler and the few
> + *      objects we allocated when computing the new whatzitz have pushed us
> + *      slightly over the limit. After this GC we rescale the new trigger to

which trigger? The JSGC_MAX_BYTES trigger or the maybeGC threshold trigger? (I think the latter.)

@@ +425,5 @@
> + *
> + *      As a concession to mutators that allocate heavily during their startup
> + *      phase, we have a highFrequencyGCMode that ups the growth rate to 300%
> + *      of the current size[retained] so that we'll do one longer GC at the end
> + *      of the mutator startup rather than constant smaller GCs.

Hm. [thinking aloud] 300% is relative to something else, so this is still dependent on the initial setting of the maybeGC threshold. Well, if you accept a small-heap GC or two early on in startup, then it would be fine to set the initial threshold lower. (And in fact, the name "high frequency GC mode" implies that that is what happens. That name, in combination with the assertion that we GC only once (at the end of startup), don't seem to match.)

Do we have an end-of-startup trigger?

@@ +434,5 @@
> + *                      computers are reasonable on a wide range of hardware
> + *                      and current generation websites.
> + *                 OR   End users will tweak these values in about:config to
> + *                      get better GC behavior on sites they visit on thier own
> + *                      hardware.

These assumptions are really only required if we don't want *any* GCs until the end of startup. If a few cheap GCs won't hurt, then the defaults don't matter so much.

@@ +442,5 @@
> + *      If we do not return to the event loop before getting all the way to our
> + *      gc trigger bytes then MAYBEGC will never fire. To avoid OOMing, we
> + *      succeed the current allocation and set the script interrupt so that we
> + *      will (hopefully) do a GC before we overflow our max and have to raise
> + *      and OOM exception for the script.

s/and/an/

@@ +446,5 @@
> + *      and OOM exception for the script.
> + *
> + *          Assumptions:
> + *            -> Common web scripts will return to the event loop before using
> + *               10% of the current gcTriggerBytes worth of GC memory.

10% is a tunable constant?

Where does the zone's delay bytes setting come into play?

@@ +457,5 @@
> + *      GC. Thus, we recently added an incremental version of the above with a
> + *      slightly lower threshold, so that we have a soft limit here. If IGC can
> + *      collect faster than the allocator generates garbage, even if the
> + *      allocator does not return to the event loop freqently, we should not
> + *      have to fall back to a non-incremental GC.

Er, how does this work? If you trigger an incremental slice, you won't free up any memory unless it happens to be the final slice, no? So I don't see what using a *slightly* lower threshold would accomplish.

@@ +474,5 @@
> + *          Assumptions:
> + *            -> EITHER size[allocated_by_malloc] ~= size[allocated_by_GC]
> + *                 OR   time[sweeping] ~= size[allocated_by_malloc]
> + *            -> size[retained] @ t0 ~= size[retained] @ t1
> + *               i.e. That the mutator is in stead-state operation.

s/stead/steady/

I haven't thought too hard about these assumptions; brain is melting.

@@ +481,5 @@
> + *    -------------
> + *     Does a GC because we are out of memory.
> + *
> + *         Assumptions:
> + *           -> size[retained] < size[available_memory]

Yay, finally one that unambiguously makes sense! :-)
Attachment #8553312 - Flags: review?(sphink) → review+
> @@ +413,5 @@
> > + *   sweeping, this design is no longer entirely relevant.
> > + *
> > + *    MAYBEGC
> > + *    -------
> > + *      Occurs when we return to the event loop and find our heap is getting
> 
> I would indent MAYBEGC another space or two, and not indent "Occurs" any
> more than that.

Sorry, that's ambiguous. I meant that you don't need to indent the paragraphs at all. They can be at the same level as the MAYBEGC heading.
(In reply to Steve Fink [:sfink, :s:] from comment #4)
> Comment on attachment 8553312 [details] [diff] [review]
> describe_heuristics-v1.diff
> 
> Review of attachment 8553312 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is great! I'm still chewing on parts of it, but it's definitely
> landable and a huge help. Thank you!!
> 
> @@ +335,5 @@
> > + *      nullptr and then calling maybeGC at the top level of the allocator.
> > + *      This is then guaranteed to fail the "size greater than trigger" check
> > + *      above, since trigger is always less than max. After performing the GC,
> > + *      the allocator unconditionally returns nullptr to force an OOM exception
> > + *      is raised by the script.
> 
> s/is/to be/
> 
> Ooh, really? So if we exceed max bytes, we are guaranteed to always get an
> OOM exception regardless of how much space is freed by the GC? That seems
> good, come to think of it.

Yes. However, that limit is really quite arbitrary and low, and not hitting it depends on your application generating garbage in such a manner that one of our other triggers fires first. :-/

> @@ +347,5 @@
> > + *
> > + *  11) Do a GC under reason ALLOC_TRIGGER when we are over the GC heap trigger
> > + *      limit, but in the allocator rather than in a random call to maybeGC.
> > + *      This would occur if we did not return to the event loop while
> > + *      allocating and hit maybeGC, which checks for ~.8 threshold; this is
> 
> I'm not sure what "which checks for ~.8 threshold" means -- you mean it
> checks against a threshold of about 80% of the JSGC_MAX_BYTES?

Yeah, not really relevant in this paragraph anyway: removed.

> Also, I'm having trouble parsing the part just before. Is the "and hit
> maybeGC" bound to "while allocating" or "not return to the event loop"? As
> in, is this "This would occur if we allocated too much before returning to
> the event loop and calling maybeGC", or "This would occur when the allocator
> calls maybeGC because we've allocated too much before returning to the event
> loop"?

Updated with your suggested wording.

> @@ +406,5 @@
> > + *       -> t[marking] ~= size[retained]
> > + *       -> t[sweeping] ~= size[allocated] - size[retained]
> > + *
> > + *   In a non-incremental collector, this puts the burden of getting low slice
> > + *   times on GCing before either of the above quantities grows large. The
> 
> "In a non-incremental collector, low slice times depend on GCing before
> either of the above quantities grows too large." Or "in order to achieve low
> slice times, we must GC before...". Or something.

Wow, I shouldn't have used non-incremental followed by "slice". That's just wrong. I've rewritten that paragraph as:

 *   In a non-incremental collector, maintaining low latency and high
 *   responsiveness requires that total GC times be as low as possible. Thus,
 *   in order to stay responsive when we did not have a fully incremental
 *   collector, our GC triggers were focused on minimizing collection time.
 *   Furthermore, since size[retained] is not under control of the GC, all the
 *   GC could do to control collection times was reduce sweep times by
 *   minimizing size[allocated], per the equation above.
 *
 *   The result of the above is GC triggers that focus on size[allocated] to
 *   the exclusion of other important factors and default heuristics that are
 *   not optimal for a fully incremental collector. On the other hand, this is
 *   not all bad: minimizing size[allocated] also minimizes the chance of OOM
 *   and sweeping remains one of the hardest areas to further incrementalize.

> @@ +425,5 @@
> > + *
> > + *      As a concession to mutators that allocate heavily during their startup
> > + *      phase, we have a highFrequencyGCMode that ups the growth rate to 300%
> > + *      of the current size[retained] so that we'll do one longer GC at the end
> > + *      of the mutator startup rather than constant smaller GCs.
> 
> Hm. [thinking aloud] 300% is relative to something else, so this is still
> dependent on the initial setting of the maybeGC threshold. Well, if you
> accept a small-heap GC or two early on in startup, then it would be fine to
> set the initial threshold lower. (And in fact, the name "high frequency GC
> mode" implies that that is what happens. That name, in combination with the
> assertion that we GC only once (at the end of startup), don't seem to match.)

Eh? I was trying to express the intention of the trigger: it's certainly not an assertion about the actual behavior, real or intended. I've adjusted the wording to make this clearer.

> Do we have an end-of-startup trigger?

Content appears to do a PokeGC(LOAD_END) at the end of Document::LoadComplete, so I think yes, although we do nothing special for it.

> @@ +434,5 @@
> > + *                      computers are reasonable on a wide range of hardware
> > + *                      and current generation websites.
> > + *                 OR   End users will tweak these values in about:config to
> > + *                      get better GC behavior on sites they visit on thier own
> > + *                      hardware.
> 
> These assumptions are really only required if we don't want *any* GCs until
> the end of startup. If a few cheap GCs won't hurt, then the defaults don't
> matter so much.

I don't follow. The MAYBEGC trigger is running all the time, so I don't see how this relates to startup. Ignoring startup, "doesn't hurt" is going to be relative to the desires of the application and its allocation patterns, so with IGC, doing fewer GCs is generally going to be better right up to the point where we OOM.

> @@ +446,5 @@
> > + *      and OOM exception for the script.
> > + *
> > + *          Assumptions:
> > + *            -> Common web scripts will return to the event loop before using
> > + *               10% of the current gcTriggerBytes worth of GC memory.
> 
> 10% is a tunable constant?

No, it's hard coded as 0.9 in the source. And in the comment too now. Bwahaha.
 
> Where does the zone's delay bytes setting come into play?

When over the trigger we don't want to do another slice on every allocation.
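
Roughly (sketch; the Zone accessors are illustrative names, not the real fields):

    // Per-zone delay-bytes throttle: once over the trigger, don't request
    // another incremental slice until a further gcDelayBytes worth of
    // allocation has accumulated.
    bool ShouldRequestSlice(Zone& zone) {
        if (zone.usedBytes() < zone.triggerBytes())
            return false;  // not over the trigger yet
        if (zone.bytesSinceLastSliceRequest() < zone.gcDelayBytes())
            return false;  // still inside the delay window
        zone.resetBytesSinceLastSliceRequest();
        return true;
    }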

> @@ +457,5 @@
> > + *      GC. Thus, we recently added an incremental version of the above with a
> > + *      slightly lower threshold, so that we have a soft limit here. If IGC can
> > + *      collect faster than the allocator generates garbage, even if the
> > + *      allocator does not return to the event loop freqently, we should not
> > + *      have to fall back to a non-incremental GC.
> 
> Er, how does this work? If you trigger an incremental slice, you won't free
> up any memory unless it happens to be the final slice, no? So I don't see
> what using a *slightly* lower threshold would accomplish.

I guess I'll change that to substantially.
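
Concretely, the layered thresholds look something like this (sketch; the 0.85 fraction is illustrative, only the ordering matters):

    #include <cstddef>

    enum class GCAction { None, IncrementalSlice, NonIncrementalGC };

    // A lower threshold starts incremental slices early; only if allocation
    // outruns the incremental GC and reaches the full trigger do we fall
    // back to a non-incremental collection.
    GCAction CheckAllocTriggers(std::size_t usedBytes, std::size_t triggerBytes) {
        if (usedBytes >= triggerBytes)
            return GCAction::NonIncrementalGC;  // hard limit reached
        if (double(usedBytes) >= 0.85 * double(triggerBytes))
            return GCAction::IncrementalSlice;  // soft limit: start IGC
        return GCAction::None;
    }
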
https://hg.mozilla.org/integration/mozilla-inbound/rev/d04e39f96610

We can make any further changes in-situ.
https://hg.mozilla.org/mozilla-central/rev/d04e39f96610
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38