Closed Bug 1033442 Opened 10 years ago Closed 8 years ago

Fix malloc GC triggers

Categories

(Core :: JavaScript: GC, defect)

defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 1 open bug)

Details

Attachments

(11 files)

15.47 KB, patch
17.30 KB, patch
jonco
: review+
terrence
: checkin+
43.13 KB, patch
sfink
: review+
terrence
: checkin+
14.89 KB, patch
jonco
: review+
terrence
: checkin+
21.90 KB, patch
jonco
: review+
terrence
: checkin+
28.75 KB, patch
jonco
: review+
terrence
: checkin+
45.15 KB, patch
jonco
: review+
terrence
: checkin+
28.84 KB, patch
jonco
: review+
terrence
: checkin+
7.67 KB, patch
jonco
: review+
terrence
: checkin+
5.78 KB, patch
terrence
: review-
16.72 KB, patch
The malloc triggers are not aware of free calls, since we may not have a size available at the free site. We need to add a free API that takes a size and use it wherever possible.
As a start, I wrote this.
 - mallocCount counts up from zero, not down from maxMallocCount
 - updateMallocCounter() becomes incMallocCounter() and decMallocCounter()
 - adds MallocProvider::free_(), which takes an optional size
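The direction described above can be sketched in standalone form. This is illustrative only: the class name, the trigger policy, and the method shapes are made up for the example, not the actual SpiderMonkey API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch: the counter counts up from zero (not down from a
// max), and a size-aware free_() lets us credit freed bytes back.
class MallocCounter {
    size_t mallocBytes_;   // counts up from zero
    size_t triggerBytes_;  // request a GC once we pass this
  public:
    explicit MallocCounter(size_t triggerBytes)
      : mallocBytes_(0), triggerBytes_(triggerBytes) {}

    void incMallocCounter(size_t nbytes) { mallocBytes_ += nbytes; }
    void decMallocCounter(size_t nbytes) {
        // Clamp at zero rather than underflowing the unsigned counter.
        mallocBytes_ -= (nbytes < mallocBytes_) ? nbytes : mallocBytes_;
    }

    // free_ takes an optional size: when the caller knows how big the
    // allocation was we can subtract it; otherwise the counter is left
    // alone and only a GC will reset it.
    void free_(void *p, size_t nbytes = 0) {
        if (nbytes)
            decMallocCounter(nbytes);
        free(p);
    }

    size_t bytes() const { return mallocBytes_; }
    bool shouldTriggerGC() const { return mallocBytes_ >= triggerBytes_; }
};
```

Without the size-aware free_(), the counter could only ever grow between GCs, which is exactly the problem this bug is about.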
This creates a new HeapUsage structure and uses it in both GCRuntime and Zone. This is a fairly limited patch in that it only handles gcBytes, but one step at a time. Features of this patch:
  (1) The GCRuntime's gcBytes now gets updated automatically when we poke the Zone's gcBytes.
  (2) The naming is now forced to be consistent between GCRuntime and Zone.
  (3) The new naming is |foo.usage.gcBytes()|: I think this is clearer than |foo.bytes|, |foo.gcBytes|, or even |foo.allocatedBytes()|.
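The parent-chaining idea in point (1) can be shown with a small sketch. The arena size and class shape here are simplified stand-ins for the real gc/Heap.h definitions, not the patch's code.

```cpp
#include <cassert>
#include <cstddef>

static const size_t ArenaSize = 4096;  // illustrative; the real value lives in gc/Heap.h

// Poking the Zone's usage automatically updates the parent (GCRuntime) usage.
class HeapUsage {
    HeapUsage *parent_;  // the runtime-wide usage, or null at the top
    size_t gcBytes_;
  public:
    explicit HeapUsage(HeapUsage *parent) : parent_(parent), gcBytes_(0) {}
    size_t gcBytes() const { return gcBytes_; }
    void addGCArena() {
        gcBytes_ += ArenaSize;
        if (parent_)
            parent_->addGCArena();
    }
    void removeGCArena() {
        gcBytes_ -= ArenaSize;
        if (parent_)
            parent_->removeGCArena();
    }
};
```

With this shape there is no way for the runtime and zone counters to drift apart, which is what point (2)'s forced consistency buys.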
Attachment #8460409 - Flags: review?(jcoppeard)
Comment on attachment 8460409 [details] [diff] [review]
automate_tracking_of_gcBytes-v0.diff

Review of attachment 8460409 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Heap.h
@@ +899,5 @@
> + */
> +class HeapUsage
> +{
> +    /*
> +     * A heap usage that contains our own heap usage, or null if this is the

er... s/own/parent's/ or something?

@@ +931,5 @@
> +        if (parent_)
> +            parent_->removeGCArena();
> +    }
> +
> +    /* Pair to adoptArenas. Adopts the attendent usage statistics. */

*attendant

::: js/src/jsgc.cpp
@@ +949,5 @@
>          removeFromAvailableList();
>  
> +    zone->usage.addGCArena();
> +
> +    if (zone->usage.gcBytes() >= zone->gcTriggerBytes) {

Perhaps for a later patch, but it'd be nice to move this into HeapUsage to centralize as much of the tuning logic as possible. Or maybe HeapUsage should only measure how much is used, and a GCHeuristics class or something should encapsulate the decisions of when to GC? ("GCTiming"? "GCScheduler"?)

@@ +2197,5 @@
>      Zone *zone = cx->allocator()->zone_;
>  
>      bool runGC = cx->allowGC() && allowGC &&
>                   cx->asJSContext()->runtime()->gc.incrementalState != NO_INCREMENTAL &&
> +                 zone->usage.gcBytes() > zone->gcTriggerBytes;

Hm. Some part of this too.

@@ +2452,5 @@
>      }
>  
>      double factor = highFrequencyGC ? 0.85 : 0.9;
> +    if (zone->usage.gcBytes() > 1024 * 1024 &&
> +        zone->usage.gcBytes() >= factor * zone->gcTriggerBytes &&

and this, especially the magic numbers

@@ +4989,5 @@
>      }
>  
>      bool reset = false;
>      for (ZonesIter zone(rt, WithAtoms); !zone.done(); zone.next()) {
> +        if (zone->usage.gcBytes() >= zone->gcTriggerBytes) {

and this
Comment on attachment 8460409 [details] [diff] [review]
automate_tracking_of_gcBytes-v0.diff

Review of attachment 8460409 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8460409 - Flags: review?(jcoppeard) → review+
(In reply to Steve Fink [:sfink], PTO Jul 28-Aug 6 from comment #4)
> Comment on attachment 8460409 [details] [diff] [review]
> automate_tracking_of_gcBytes-v0.diff
> ::: js/src/jsgc.cpp
> @@ +949,5 @@
> >          removeFromAvailableList();
> >  
> > +    zone->usage.addGCArena();
> > +
> > +    if (zone->usage.gcBytes() >= zone->gcTriggerBytes) {
> 
> Perhaps for a later patch, but it'd be nice to move this into HeapUsage to
> centralize as much of the tuning logic as possible. Or maybe HeapUsage
> should only measure how much is used, and a GCHeuristics class or something
> should encapsulate the decisions of when to GC? ("GCTiming"? "GCScheduler"?)

Yup, that's for patch 3, the one after the next. The immediate next step is to fully encapsulate the computation of the trigger values and the parameters that feed into them. The existing setup is actually pretty weird: it's all done by a function called "setLastGCBytes". Rather than storing the lastBytes it's passed, as one would expect, it actually reads all sorts of random-looking stuff off the GCRuntime into a terrifying-looking computation that is only lightly related to lastBytes. The surprising thing that I had not quite internalized is that the only gc-heap-size-related triggers are on the zone; the computation of them is just done on the runtime because it needs access to lots of runtime values. This is, of course, all totally crazy.

Ideally it should look something like this:
zoneThreshold = zoneThreshold.updateAfterGC(GCSchedulingTunables, GCSchedulingState)

I've more or less got this done in the above form -- still needs a bit more polish, but I hope to upload it today.
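The intended dataflow above can be sketched as follows. All of the field names, defaults, and the factor-selection policy here are invented for illustration; the point is only that the threshold is recomputed purely from explicit inputs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the tunables/state structs described above.
struct GCSchedulingTunables {
    double lowFrequencyHeapGrowth = 1.5;
    double highFrequencyHeapGrowth = 3.0;
};
struct GCSchedulingState {
    bool inHighFrequencyGCMode = false;
};

class ZoneHeapThreshold {
    size_t gcTriggerBytes_ = 0;
  public:
    size_t gcTriggerBytes() const { return gcTriggerBytes_; }

    // Recompute the trigger from the last GC's heap size plus explicit
    // tunables and state -- nothing is read off the runtime implicitly.
    void updateAfterGC(size_t lastBytes, const GCSchedulingTunables &tunables,
                       const GCSchedulingState &state) {
        double factor = state.inHighFrequencyGCMode
                        ? tunables.highFrequencyHeapGrowth
                        : tunables.lowFrequencyHeapGrowth;
        gcTriggerBytes_ = size_t(double(lastBytes) * factor);
    }
};
```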
Keywords: leave-open
Comment on attachment 8461190 [details] [diff] [review]
make_trigger_dataflow_clearer-v0.diff

Review of attachment 8461190 [details] [diff] [review]:
-----------------------------------------------------------------

I tried very hard to keep this minimal and to 100% preserve the existing semantics -- if some aspects seem disjoint, that is probably why. I'm going to go through and try to comment on the things that were weirdest to me, in particular things that are the way they are for an unclear reason, or that I plan to fix in a later patch. As always, suggestions for better names are appreciated.

::: js/src/gc/GCRuntime.h
@@ +177,5 @@
> +        highFrequencyHeapGrowthMin_(1.5),
> +        lowFrequencyHeapGrowth_(1.5),
> +        isDynamicMarkSliceEnabled_(false),
> +        minEmptyChunkCount_(1),
> +        maxEmptyChunkCount_(30)

I guess encoding the default values here is not actually all that terrible. Particularly as anyone overriding these is doing so from JS, so exposing the defaults wouldn't actually be helpful. Additionally, I'm not sure I want to name our random constants until /after/ we've made the algorithms comprehensible and have a good idea what the constants even mean.

@@ +208,5 @@
> +     * important factor is in how it controls the "HeapGrowthFactor". The
> +     * growth factor is a measure of how large (as a percentage of the last GC)
> +     * the heap is allowed to grow before we try to schedule another GC.
> +     */
> +    bool inHighFrequencyGCMode_;

I think there will be more here once we integrate the malloc state tracking.

@@ +222,5 @@
> +                                        const GCSchedulingTunables &tunables) {
> +        inHighFrequencyGCMode_ =
> +            tunables.isDynamicHeapGrowthEnabled() && lastGCTime &&
> +            lastGCTime + tunables.highFrequencyTimeThreshold() > currentTime;
> +    }

For now this is mostly in class form so that this computation has a place to live.
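As a standalone restatement of the quoted check (parameter values in the test are made up; the logic mirrors the hunk above):

```cpp
#include <cassert>
#include <cstdint>

// We are in high-frequency mode if dynamic growth is enabled, we have
// GC'd at least once (lastGCTime != 0), and the last GC was within the
// time threshold of now. Times are in microseconds.
static bool computeHighFrequencyMode(uint64_t lastGCTime, uint64_t currentTime,
                                     bool dynamicHeapGrowthEnabled,
                                     uint64_t highFrequencyTimeThresholdUsec) {
    return dynamicHeapGrowthEnabled && lastGCTime &&
           lastGCTime + highFrequencyTimeThresholdUsec > currentTime;
}
```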

@@ +547,5 @@
>      HeapUsage usage;
>  
> +    /* GC scheduling state and parameters. */
> +    GCSchedulingTunables  tunables;
> +    GCSchedulingState     schedulingState;

The actual values are still private or behind accessor methods, so making these public doesn't really change anything in practice.

@@ +727,4 @@
>       *
> +     * At this point, if zeal_ is one of the types that trigger periodic
> +     * collection, then nextScheduled is reset to the value of zealFrequency.
> +     * Otherwise, no additional GCs take place.

Just re-formatting    to  remove the inexplicable   spaces.

::: js/src/gc/Zone.cpp
@@ +45,5 @@
>      /* Ensure that there are no vtables to mess us up here. */
>      JS_ASSERT(reinterpret_cast<JS::shadow::Zone *>(this) ==
>                static_cast<JS::shadow::Zone *>(this));
>  
> +    threshold.updateAfterGC(8192, GC_NORMAL, rt->gc.tunables, rt->gc.schedulingState);

I know this isn't "after GC", but updateAfterGCOrForInitialization wasn't as pithy.

::: js/src/gc/Zone.h
@@ +43,5 @@
>  };
>  
> +namespace gc {
> +
> +class ZoneHeapThreshold

I should probably add a comment to this class.

Something like:
// This class encapsulates all of the data that determines when we need to do a zone GC.

This would, of course, be a lie right now. Specifically, this only contains information about the /GC/ heap triggers. I'm ignoring malloc triggers for now in the (probably doomed) hope that I won't bitrot Jon's patches too horribly with this refactor.

@@ +62,5 @@
> +    size_t gcTriggerBytes() const { return gcTriggerBytes_; }
> +
> +    void updateAfterGC(size_t lastBytes, JSGCInvocationKind gckind,
> +                       const GCSchedulingTunables &tunables, const GCSchedulingState &state);
> +    void updateForRemovedArena(const GCSchedulingTunables &tunables);

These two entry points are the only places that did (and now can) update our gc triggers.

@@ +70,5 @@
> +                                                         const GCSchedulingTunables &tunables,
> +                                                         const GCSchedulingState &state);
> +    static size_t computeZoneTriggerBytes(double growthFactor, size_t lastBytes,
> +                                          JSGCInvocationKind gckind,
> +                                          const GCSchedulingTunables &tunables);

The actual computations we do to figure out the triggers are now static and parameterized on exactly the values used.

@@ +127,5 @@
>                public js::MallocProvider<JS::Zone>
>  {
>      explicit Zone(JSRuntime *rt);
>      ~Zone();
> +    bool init(bool isSystem);

This is simple cleanup. Apparently we were just initializing isSystem and the gc triggers manually after construction before. I'd have put this on the constructor, but MallocProvider complained for some reason.

::: js/src/jsgc.cpp
@@ +954,2 @@
>          AutoUnlockGC unlock(rt);
>          TriggerZoneGC(zone, JS::gcreason::ALLOC_TRIGGER);

Well, that certainly seems reasonable! I think it's wrong though, keep reading.

@@ +1405,5 @@
>          sliceBudget = SliceBudget::TimeBudget(value);
>          break;
>        case JSGC_MARK_STACK_LIMIT:
>          setMarkStackLimit(value);
>          break;

One step at a time. I guess we'll probably want this to pass-through at some point.

@@ -1469,5 @@
> -        if (minEmptyChunkCount > maxEmptyChunkCount)
> -            minEmptyChunkCount = maxEmptyChunkCount;
> -        break;
> -      default:
> -        JS_ASSERT(key == JSGC_MODE);

This seemed needlessly optimistic to me, so now we crash instead of randomizing the value of |mode|.

@@ +1770,5 @@
> +    //                     (last - low)
> +    // max - (max - min) * ------------
> +    //                     (high - low)
> +    double factor = maxRatio - ((maxRatio - minRatio) * ((lastBytes - lowLimit) /
> +                                                         (highLimit - lowLimit)));

I guess this may be clear enough now that we don't even need the ascii-art representation.
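As a self-contained version of this interpolation (using the 300%/150% default ratios mentioned elsewhere in this review purely as test values):

```cpp
#include <cassert>

// Growth factor is maxRatio below lowLimit, minRatio above highLimit,
// and linearly interpolated in between based on lastBytes.
static double computeGrowthFactor(double lastBytes, double lowLimit, double highLimit,
                                  double minRatio, double maxRatio) {
    if (lastBytes <= lowLimit)
        return maxRatio;
    if (lastBytes >= highLimit)
        return minRatio;
    // max - (max - min) * (last - low) / (high - low)
    return maxRatio - (maxRatio - minRatio) * ((lastBytes - lowLimit) /
                                               (highLimit - lowLimit));
}
```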

@@ +1789,2 @@
>      double trigger = double(base) * growthFactor;
> +    return size_t(Min(double(tunables.gcMaxBytes()), trigger));

This computation seems incoherent to me.

First, the gc heap numbers are not currently ever going to be any different between shrinking and non, so that check seems totally random. The alloc base is 30MiB by default, so our typical gcTriggerBytes for small compartments is going to be 45MiB. Unless we just did a shrinking GC, in which case it's going to be size * 1.5 -- e.g. tiny.

Also, clipping the trigger to gcMaxBytes makes sense only in the shallowest sense that it is the ceiling. In practice (except for ggc, which is new), gcMaxBytes is a global hard limit on allocations, so a trigger >= gcMaxBytes will never hit in practice anyway, even with the Min().

There is some additional scaling that goes on with some of the gc trigger checks too, so maybe this all makes sense in practice, but it's really not clear from just reading the trigger assignments.
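To make the numbers in this comment concrete, here is a simplified version of the trigger computation under discussion (it drops the gckind/shrinking check, and the 30MiB alloc base and 1.5x growth in the test are the defaults quoted above, not authoritative values):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Trigger is growthFactor times the larger of lastBytes and the alloc
// threshold, clipped to the gcMaxBytes ceiling.
static size_t computeTriggerBytes(double growthFactor, size_t lastBytes,
                                  size_t allocThresholdBytes, size_t gcMaxBytes) {
    size_t base = std::max(lastBytes, allocThresholdBytes);
    double trigger = double(base) * growthFactor;
    return size_t(std::min(double(gcMaxBytes), trigger));
}
```

This makes the oddity visible: for any zone smaller than the alloc base, the trigger is a constant 1.5x the base regardless of actual usage, and the gcMaxBytes clip only matters for zones already near the hard limit.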

@@ +1811,3 @@
>          return;
> +
> +    gcTriggerBytes_ -= amount;

I guess we're guarding against rounding errors by stopping at the base threshold.

@@ +2216,5 @@
>      Zone *zone = cx->allocator()->zone_;
>  
>      bool runGC = cx->allowGC() && allowGC &&
>                   cx->asJSContext()->runtime()->gc.incrementalState != NO_INCREMENTAL &&
> +                 zone->usage.gcBytes() > zone->threshold.gcTriggerBytes();

Okay, this also seems pretty reasonable. In addition to the trigger, we're also checking if we're in between incremental slices. I guess the point is that we're going to be doing another slice soon, so just wait for that. I guess we probably want to do this in allocArena too. Still wrong though, keep reading.

@@ +2476,2 @@
>          incrementalState == NO_INCREMENTAL &&
>          !isBackgroundSweeping())

Similar to before, this checks the trigger and the incrementalState... and two other things.

Checking if we're background sweeping keeps us from blocking on the background sweep thread, so probably needs to go other places too. That's not the important bit though.

The important change is that we scale the trigger before comparing. If you read lower, this turns out to be /really/ important. The thing is, if we actually surge past our gc trigger, we stop everything and do a non-incremental GC. Thus, we really, really want to trigger before we get to that point. The fact that neither of the above does this and we don't have massive problems with non-incremental GC suggests strongly that we almost never trigger a GC because of GC memory usage -- it's probably almost all related to malloc usage.

@@ +5012,2 @@
>              *budget = SliceBudget::Unlimited;
>              stats.nonincremental("allocation trigger");

Ah ha! If we actually use our GC trigger, panic!

The reason for this is that, since we clip the trigger to gcMaxBytes, we might be at or over our hard memory cap right now. If we do try to do an incremental GC here, there is a good chance we will run out of memory at a bad time. In theory, we can handle OOM gracefully. In practice, Bill found out when deploying IGC that we generally fall over hard if we don't do a non-incremental GC here.

I think we'll eventually want to give ZoneHeapThreshold soft, hard, and panic triggers. I'd do this next, but I expect none of this is terribly relevant at the moment, given the other apparent bugs in these triggers.

Instead, I'm going to do a bit more cleanup, then hook up the probes.
Attachment #8461190 - Flags: review?(sphink)
Comment on attachment 8461190 [details] [diff] [review]
make_trigger_dataflow_clearer-v0.diff

Review of attachment 8461190 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +5012,2 @@
>              *budget = SliceBudget::Unlimited;
>              stats.nonincremental("allocation trigger");

Ohh, after a couple hours of reflection, I think I'm actually totally wrong about this all being wrong. MaybeGC is the only trigger from outside the engine, so it's the only one that wants to do incremental slices. The other triggers /want/ to do non-incremental GC -- it seems that not calling GC before we've blown the trigger is one way to force this, even if we enter through MaybeGC from a callback or something. At least, I think that's right. Hopefully, it will be clearer tomorrow morning.

In any case, this patch should not be changing the semantics, so the overall correctness of the algorithm is orthogonal for now.
Comment on attachment 8461190 [details] [diff] [review]
make_trigger_dataflow_clearer-v0.diff

Review of attachment 8461190 [details] [diff] [review]:
-----------------------------------------------------------------

None of my questions or confusion mean that you shouldn't land this patch, I think.

::: js/src/gc/GCRuntime.h
@@ +115,5 @@
> +class GCSchedulingTunables
> +{
> +    /*
> +     * Soft limit on the number of bytes we are allowed to allocate
> +     * into the GC. Attempts to allocate over this limit will....

"into the GC"? I guess you mean "in the GC heap"?

Is the .... going to be filled in with a later patch once you figure out exactly what allocating over this limit will do? I'd be fine with that. But maybe you know already and can put it here.

@@ +121,5 @@
> +    size_t                gcMaxBytes_;
> +
> +    /*
> +     * The base value used to compute zone->trigger.gcBytes(). When
> +     * trigger.gcBytes() surpasses usage.gcBytes() for a zone, the zone may be

That sounds backwards. Is something still counting down?

@@ +130,5 @@
> +    /*
> +     * Totally disables |highFrequencyGC|, the HeapGrowthFactor, and other
> +     * tunables that make GC non-deterministic.
> +     */
> +    bool isDynamicHeapGrowthEnabled_;

Is it the naming convention to call this isDynamicHeapGrowthEnabled_? I would have expected dynamicHeapGrowthEnabled_. A (hypothetical) accessor might be IsDynamicHeapGrowthEnabled(). But I could be wrong.

@@ +136,5 @@
> +    /*
> +     * We enter high-frequency mode if we GC a twice within this many
> +     * microseconds. This value is stored directly in microseconds.
> +     */
> +    uint64_t highFrequencyTimeThreshold_;

This is fine, but how would highFrequencyThresholdUsec_ sound?

(I'm not sure that I like it, actually.)

@@ +177,5 @@
> +        highFrequencyHeapGrowthMin_(1.5),
> +        lowFrequencyHeapGrowth_(1.5),
> +        isDynamicMarkSliceEnabled_(false),
> +        minEmptyChunkCount_(1),
> +        maxEmptyChunkCount_(30)

If we were to use different defaults for eg b2g vs desktop, would we do that here or rely on calling code to do the right thing? (The latter seems reasonable.)

@@ +222,5 @@
> +                                        const GCSchedulingTunables &tunables) {
> +        inHighFrequencyGCMode_ =
> +            tunables.isDynamicHeapGrowthEnabled() && lastGCTime &&
> +            lastGCTime + tunables.highFrequencyTimeThreshold() > currentTime;
> +    }

Is that the original name? How about just updateGCMode? Or at least lose the "toggle" part; this is setting the inHighFreqMode thing, which could be described as a toggle but isn't really toggling anything. I guess maybeSetHighFrequencyGCMode isn't inclusive enough.

Oh, never mind.

I really like having the tunables separated out into one struct here.

::: js/src/gc/Zone.h
@@ +43,5 @@
>  };
>  
> +namespace gc {
> +
> +class ZoneHeapThreshold

So for now, make it

// This class encapsulates the data that determines when we need to do a zone GC.

Now the lie is more vague. :-) I'll ignore the grammatical issue of "...data...determines" ('data' is plural, but meh.)

::: js/src/jsgc.cpp
@@ +703,5 @@
>       * other chunks in the list. This way, if the GC runs several times
>       * without emptying the list, the older chunks will stay at the tail
>       * and are more likely to reach the max age.
>       */
> +    JS_ASSERT(tunables.maxEmptyChunkCount() >= tunables.minEmptyChunkCount());

Seems like the assertion should be within tunables, but maybe these are set during separate tunables.setParameter calls so finding the right spot for an assertion would be a pain. Maybe make a tunables.validate() and call it here?

@@ +1421,5 @@
> +    }
> +}
> +
> +void
> +GCSchedulingTunables::setParameter(JSGCParamKey key, uint32_t value)

Yay! Separation of concerns is good!

@@ +1750,5 @@
> +    // the heap grow to 150%. For high frequency GCs we let the heap grow
> +    // depending on the heap size:
> +    //   lastBytes < highFrequencyLowLimit: 300%
> +    //   lastBytes > highFrequencyHighLimit: 150%
> +    //   otherwise: linear interpolation between 150% and 300% based on lastBytes

Can you say "between 300% and 150%" here?

@@ +1757,5 @@
> +
> +    if (lastBytes >= tunables.highFrequencyHighLimitBytes())
> +        return tunables.highFrequencyHeapGrowthMin();
> +
> +    // Use shorter names to make the operation comprehensible.

Yes! When I read through this code, I used R0, R1, M0, M1 and it was all immediately clear. Long names are good, but not when lots of math is involved.

@@ +1770,5 @@
> +    //                     (last - low)
> +    // max - (max - min) * ------------
> +    //                     (high - low)
> +    double factor = maxRatio - ((maxRatio - minRatio) * ((lastBytes - lowLimit) /
> +                                                         (highLimit - lowLimit)));

Yeah, your indentation is just as good as the ASCII art version. I think you can ditch that part of the comment.

@@ +1789,2 @@
>      double trigger = double(base) * growthFactor;
> +    return size_t(Min(double(tunables.gcMaxBytes()), trigger));

I confess that I follow neither the meaning behind the code nor your comment here about it. I think I'll wait for a later patch to worry about it.

@@ +1811,1 @@
>          return;

I would expect this to set gcTriggerBytes_ to threshold * factor instead of just returning. Not that it's a big thing.

Basically,

  min = threshold * factor;
  triggerBytes = (triggerBytes - amount < min) ? min : triggerBytes - amount;

@@ +1811,3 @@
>          return;
> +
> +    gcTriggerBytes_ -= amount;

I would think unsigned underflow, not rounding errors, but yeah.
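A runnable version of the clamping suggested above (names are illustrative): instead of returning early, drop the trigger toward the floor but never below it, which also sidesteps the unsigned underflow.

```cpp
#include <cassert>
#include <cstddef>

// Lower the trigger by the removed arena's bytes, clamped at a floor of
// allocThreshold * growthFactor. The explicit underflow check makes the
// unsigned subtraction safe.
static void updateForRemovedArena(size_t amount, size_t allocThreshold,
                                  double growthFactor, size_t &gcTriggerBytes) {
    size_t minTrigger = size_t(double(allocThreshold) * growthFactor);
    if (gcTriggerBytes < amount || gcTriggerBytes - amount < minTrigger)
        gcTriggerBytes = minTrigger;
    else
        gcTriggerBytes -= amount;
}
```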

@@ +2476,2 @@
>          incrementalState == NO_INCREMENTAL &&
>          !isBackgroundSweeping())

The fact that "neither of the above does this"? What are the "above"? I don't understand your comment.

Nor do I understand the code. If we're in danger of hitting our trigger threshold, we do GCSlice(rt, GC_NORMAL, JS::gcreason::MAYBEGC). I guess that's an immediate nonincremental GC for the zone? Why isn't this factored into the gcTriggerBytes threshold? Only because it needs to change more dynamically, with the mode (hi freq or no)?

Or is this just triggering another incremental slice? If so, why would that ever free any memory? /me is confused. Maybe this means to start up the series of incremental slices, in the hopes of completing them "soon" (as in, before the true trigger is reached)?

@@ +5012,2 @@
>              *budget = SliceBudget::Unlimited;
>              stats.nonincremental("allocation trigger");

Wait, what? I thought this was just a plain trigger check. You hit the amount of memory at which you decide you really need a GC before allocating anything else, so you do it. It can't be incremental, because that wouldn't free anything up for a while. You hope that normally you won't hit this case because at 85%/90% you would have triggered a preemptive GC (though I'm still confused whether that's just starting up an incremental series or doing a full nonincremental GC.)

What does clipping to gcMaxBytes have to do with it?

Anyway, I think I'd better wait until the morning too.
Attachment #8461190 - Flags: review?(sphink) → review+
Comment on attachment 8461190 [details] [diff] [review]
make_trigger_dataflow_clearer-v0.diff

And backed out. Apparently the build fixes I made locally got lost somewhere.
https://hg.mozilla.org/integration/mozilla-inbound/rev/469244786622
Attachment #8461190 - Flags: checkin+
https://hg.mozilla.org/integration/mozilla-inbound/rev/17d70e1ff57b
Backed out for a baffling valgrind failure:


TEST-UNEXPECTED-FAIL | valgrind-test | Conditional jump or move depends on uninitialised value(s) at js::gc::GCRuntime::maybeGC / js::MaybeGC / JS_MaybeGC / mozilla::dom::workers::WorkerPrivate::RunCurrentSyncLoop

==25378== Conditional jump or move depends on uninitialised value(s)
==25378==    at 0xA27656D: js::gc::GCRuntime::maybeGC(JS::Zone*) (jsgc.cpp:2471)
==25378==    by 0xA2765D1: js::MaybeGC(JSContext*) (jsgc.cpp:2431)
==25378==    by 0xA2765DC: JS_MaybeGC(JSContext*) (jsapi.cpp:1894)
==25378==    by 0x90BD92F: mozilla::dom::workers::WorkerPrivate::RunCurrentSyncLoop() (WorkerPrivate.cpp:4841)
==25378==    by 0x90A86BB: (anonymous namespace)::LoadAllScripts(JSContext*, mozilla::dom::workers::WorkerPrivate*, nsTArray<(anonymous namespace)::ScriptLoadInfo>&, bool) (WorkerPrivate.h:1266)
==25378==    by 0x90A88AE: mozilla::dom::workers::scriptloader::Load(JSContext*, mozilla::dom::workers::WorkerPrivate*, mozilla::dom::Sequence<nsString> const&, mozilla::ErrorResult&) (ScriptLoader.cpp:948)
==25378==    by 0x90C40A8: mozilla::dom::workers::WorkerGlobalScope::ImportScripts(JSContext*, mozilla::dom::Sequence<nsString> const&, mozilla::ErrorResult&) (WorkerScope.cpp:169)
==25378==    by 0x8E98F8E: mozilla::dom::WorkerGlobalScopeBinding_workers::importScripts(JSContext*, JS::Handle<JSObject*>, mozilla::dom::workers::WorkerGlobalScope*, JSJitMethodCallArgs const&) (WorkerGlobalScopeBinding.cpp:376)
==25378==    by 0x8E38B9C: mozilla::dom::WorkerGlobalScopeBinding_workers::genericMethod(JSContext*, unsigned int, JS::Value*) (WorkerGlobalScopeBinding.cpp:936)
==25378==    by 0xA343837: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:231)
==25378==    by 0xA340070: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2558)
==25378==    by 0xA343030: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:413)
==25378==    by 0xA343247: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) (Interpreter.cpp:621)
==25378==    by 0xA3433A4: js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) (Interpreter.cpp:658)
==25378==    by 0xA27B1DB: Evaluate(JSContext*, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&, JS::Value*) (jsapi.cpp:4802)
==25378==    by 0xA27B571: JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&) (jsapi.cpp:4893)
==25378==    by 0x90A80B4: (anonymous namespace)::ScriptExecutorRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) (ScriptLoader.cpp:751)
==25378==    by 0x90C3363: mozilla::dom::workers::WorkerRunnable::Run() (WorkerRunnable.cpp:312)
==25378==    by 0x80F959B: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:770)
==25378==    by 0x8117026: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:265)
==25378==    by 0x90BD927: mozilla::dom::workers::WorkerPrivate::RunCurrentSyncLoop() (WorkerPrivate.cpp:4838)
==25378==    by 0x90A86BB: (anonymous namespace)::LoadAllScripts(JSContext*, mozilla::dom::workers::WorkerPrivate*, nsTArray<(anonymous namespace)::ScriptLoadInfo>&, bool) (WorkerPrivate.h:1266)
==25378==    by 0x90A88AE: mozilla::dom::workers::scriptloader::Load(JSContext*, mozilla::dom::workers::WorkerPrivate*, mozilla::dom::Sequence<nsString> const&, mozilla::ErrorResult&) (ScriptLoader.cpp:948)
==25378==    by 0x90C40A8: mozilla::dom::workers::WorkerGlobalScope::ImportScripts(JSContext*, mozilla::dom::Sequence<nsString> const&, mozilla::ErrorResult&) (WorkerScope.cpp:169)
==25378==    by 0x8E98F8E: mozilla::dom::WorkerGlobalScopeBinding_workers::importScripts(JSContext*, JS::Handle<JSObject*>, mozilla::dom::workers::WorkerGlobalScope*, JSJitMethodCallArgs const&) (WorkerGlobalScopeBinding.cpp:376)
==25378==    by 0x8E38B9C: mozilla::dom::WorkerGlobalScopeBinding_workers::genericMethod(JSContext*, unsigned int, JS::Value*) (WorkerGlobalScopeBinding.cpp:936)
==25378==    by 0xA343837: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (jscntxtinlines.h:231)
==25378==    by 0xA340070: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2558)
==25378==    by 0xA343030: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:413)
==25378==    by 0xA343247: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) (Interpreter.cpp:621)
==25378==    by 0xA3433A4: js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) (Interpreter.cpp:658)
==25378==    by 0xA27B1DB: Evaluate(JSContext*, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&, JS::Value*) (jsapi.cpp:4802)
==25378==    by 0xA27B571: JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&) (jsapi.cpp:4893)
==25378==    by 0x90A80B4: (anonymous namespace)::ScriptExecutorRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) (ScriptLoader.cpp:751)
==25378==    by 0x90C3363: mozilla::dom::workers::WorkerRunnable::Run() (WorkerRunnable.cpp:312)
==25378==    by 0x80F959B: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:770)
==25378==  Uninitialised value was created by a heap allocation
==25378==    at 0x4C28A49: malloc (vg_replace_malloc.c:270)
==25378==    by 0xA232089: JS_NewRuntime(unsigned int, unsigned int, JSRuntime*) (Utility.h:99)
==25378==    by 0x80C0D13: mozilla::CycleCollectedJSRuntime::CycleCollectedJSRuntime(JSRuntime*, unsigned int) (CycleCollectedJSRuntime.cpp:480)
==25378==    by 0x90A57DB: (anonymous namespace)::WorkerThreadPrimaryRunnable::Run() (RuntimeService.cpp:866)
==25378==    by 0x80F959B: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:770)
==25378==    by 0x8117026: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:265)
==25378==    by 0x8346F9F: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (MessagePump.cpp:326)
==25378==    by 0x8335CBB: MessageLoop::RunInternal() (message_loop.cc:229)
==25378==    by 0x8335CC6: MessageLoop::RunHandler() (message_loop.cc:222)
==25378==    by 0x8335F4B: MessageLoop::Run() (message_loop.cc:196)
==25378==    by 0x80FD608: nsThread::ThreadFunc(void*) (nsThread.cpp:347)
==25378==    by 0x6871E9D: _pt_root (ptthread.c:212)
==25378==    by 0x4E377F0: start_thread (in /lib64/libpthread-2.12.so)
==25378==    by 0x5CDD92C: clone (in /lib64/libc-2.12.so)
==25378==
Looks like it's probably decommitThreshold. Let's go ahead and initialize it to something sane for now and get it into the tunables struct ASAP.

https://tbpl.mozilla.org/?tree=Try&rev=76d9965b390b
Also got this on B2G emulator ICS, M(12):
    Assertion failure: gcTriggerBytes_ >= amount, at ../../../gecko/js/src/jsgc.cpp:1798

This appears to be because I changed |amount| from size_t to double. The bounds assertion should be unnecessary since we are comparing against a similarly scaled double, but no sense worrying about it too much right now. For the moment, I've changed it back to size_t.

https://tbpl.mozilla.org/?tree=Try&rev=a717ef1dfac8
The above try runs are green, so I think I've solved the issues on the platforms I skipped in my initial try run.

https://hg.mozilla.org/integration/mozilla-inbound/rev/34b05fea7384
I think this is equivalent to what we're doing right now. In any case, now that I've looked at more of the allocation stuff, I'm pretty convinced that we want to get rid of cx->mallocStuff and force allocation through either the zone or runtime for clarity. This patch does so for the slot allocations, or at least as many of them as I remembered.
Attachment #8467849 - Flags: review?(jcoppeard)
This removes the non-pod variant of calloc_ and replaces it with pod_calloc<T>(numElems). Unfortunately, most of our use of calloc is for structs that have dynamic extension data, so this is frequently pod_calloc<uint8_t>(numBytes). At least this pattern is now greppable, so if we do add an explicit API for these, it will be easy to find.

There are also a few cases where pod_ simplifies the code, so that's good. Also, we're now able to remove calloc entirely from ContextAllocPolicy, a step in the right direction.

https://tbpl.mozilla.org/?tree=Try&rev=8947b5d6841c
Attachment #8467859 - Flags: review?(jcoppeard)
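As a sketch, the pod_calloc<T>(numElems) pattern described above amounts to something like this standalone version (simplified, not the actual MallocProvider code, which also notifies the allocating zone or runtime):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-in for pod_calloc<T>(numElems): a zero-initialized POD
// allocation. The common "struct with dynamic extension data" case becomes
// pod_calloc<uint8_t>(numBytes), which is at least greppable.
template <typename T>
T* pod_calloc(size_t numElems) {
    // calloc checks numElems * sizeof(T) for overflow, one benefit of
    // funneling the byte-count math through a single helper.
    return static_cast<T*>(calloc(numElems, sizeof(T)));
}
```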
Comment on attachment 8467849 [details] [diff] [review]
explicit_zone_slot_alloc-v0.diff

Review of attachment 8467849 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8467849 - Flags: review?(jcoppeard) → review+
Comment on attachment 8467859 [details] [diff] [review]
remove_nonpod_calloc-v0.diff

Review of attachment 8467859 [details] [diff] [review]:
-----------------------------------------------------------------

It's almost like we want an allocation function for template type T plus n bytes additional data at the end...

::: js/src/jsalloc.h
@@ +69,2 @@
>          if (MOZ_UNLIKELY(!p))
> +            p = (T *)onOutOfMemory(reinterpret_cast<void *>(1), numElems * sizeof(T));

Oh, it seems this was wrong before, then.

I'm not sure why we don't just pass a flag to onOutOfMemory to indicate the allocation type rather than this (void *)1 business.
Attachment #8467859 - Flags: review?(jcoppeard) → review+
Yes, I'd like to fix both of those issues, but figured follow-ups would be better.
Attachment #8467849 - Flags: checkin+
Attachment #8461190 - Flags: checkin+
https://tbpl.mozilla.org/?tree=Try&rev=73e2a3c6e57e

Exactly the same thing for realloc.
Attachment #8469384 - Flags: review?(jcoppeard)
Copy-pasto on some code that's not built by default.
https://tbpl.mozilla.org/?tree=Try&rev=28af422230ac
Comment on attachment 8469384 [details] [diff] [review]
remove_nonpod_realloc-v0.diff

Review of attachment 8469384 [details] [diff] [review]:
-----------------------------------------------------------------

Good stuff!

::: js/src/vm/MallocProvider.h
@@ +93,5 @@
>      }
>  
>      template <class T>
>      T *
> +    pod_calloc(size_t numElems/*, JSCompartment *comp = nullptr, JSContext *cx = nullptr*/) {

You probably meant to remove these parameters rather than commenting them out.

@@ +116,2 @@
>                            JSCompartment *comp = nullptr,
> +                          JSContext *cx = nullptr*/)

Ditto above.

@@ +123,5 @@
>      T *pod_realloc(T *prior, size_t oldSize, size_t newSize) {
> +        Client *client = static_cast<Client *>(this);
> +        T *p = js_pod_realloc(prior, oldSize, newSize);
> +        if (MOZ_LIKELY(p)) {
> +            // For compatibility we do not account for realloc that decreases

I've never understood what this compatibility is about.  Hopefully someday we can get rid of this too.
Attachment #8469384 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #30)
> I've never understood what this compatibility is about.  Hopefully someday
> we can get rid of this too.

I do know that jemalloc avoids performing an actual realloc if the size class didn't change, see [1] and [2]. That doesn't mean it will *never* realloc if the new size is smaller, though.

[1] http://dxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c#4732
[2] http://dxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c#5160
(In reply to Jon Coppeard (:jonco) from comment #30)
> @@ +123,5 @@
> >      T *pod_realloc(T *prior, size_t oldSize, size_t newSize) {
> > +        Client *client = static_cast<Client *>(this);
> > +        T *p = js_pod_realloc(prior, oldSize, newSize);
> > +        if (MOZ_LIKELY(p)) {
> > +            // For compatibility we do not account for realloc that decreases
> 
> I've never understood what this compatibility is about.  Hopefully someday
> we can get rid of this too.

It's the same thing you ran into with hooking up free to decrement the malloc counter -- we don't alloc from a consistent pool, so decrementing here could drive some counter into the red. At least that's what I recall happening when I tried making this decrement at one point.
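A minimal sketch of the accounting rule in question, with invented names: only reallocs that grow are counted, since allocations do not come from one consistent pool and subtracting on shrink could drive the counter into the red.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Invented stand-in for the zone/runtime malloc counter.
static size_t mallocCounter = 0;

template <typename T>
T* pod_realloc(T* prior, size_t oldSize, size_t newSize) {
    T* p = static_cast<T*>(realloc(prior, newSize * sizeof(T)));
    // For compatibility, only growth is accounted: allocations do not come
    // from one consistent pool, so decrementing here on shrink could drive
    // the counter into the red.
    if (p && newSize > oldSize)
        mallocCounter += (newSize - oldSize) * sizeof(T);
    return p;
}
```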
A new try run with both patches, since yesterday's got caught in the outage.
https://tbpl.mozilla.org/?tree=Try&rev=a68c61d1a52e
And for malloc_.

Remaining things to do:
* Remove MallocProvider from JSContext. Right now most allocations go to cx, which dispatches to runtime(). This puts the allocation on the runtime, which triggers an all-zones GC instead of per-zone GCs.

* Re-orient allocation tracking so that zone informs runtime, not the tangled mess we currently have.

* Wrap the concept of pod_malloc + extra bytes for custom variable-sized stuff at the end.

* Free all things with the same provider that allocated them.

* Retune our GC approach based on malloc triggers that are relatively accurate.
Attachment #8470283 - Flags: review?(jcoppeard)
This broke b2g ics builds, I pushed a small fix for it: https://hg.mozilla.org/integration/mozilla-inbound/rev/6ee4b6cc240c
Comment on attachment 8470283 [details] [diff] [review]
remove_nonpod_malloc-v0.diff

Review of attachment 8470283 [details] [diff] [review]:
-----------------------------------------------------------------

Great, nice to see that the multiply gets now checked for overflow everywhere with this.

::: js/src/vm/MallocProvider.h
@@ +65,5 @@
>      }
>  
>      template <class T>
>      T *pod_malloc(size_t numElems) {
> +        Client *client = static_cast<Client *>(this);

We could provide a private method client() to do this cast as this happens in several places.
Attachment #8470283 - Flags: review?(jcoppeard) → review+
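The suggested client() helper would look roughly like this in a standalone CRTP sketch (FakeZone and the method names are illustrative, not the actual MallocProvider code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// CRTP sketch of the suggested client() helper replacing the repeated
// static_cast<Client *>(this) in each allocator method.
template <class Client>
struct MallocProviderSketch {
    template <typename T>
    T* pod_malloc(size_t numElems) {
        T* p = static_cast<T*>(malloc(numElems * sizeof(T)));
        if (p)
            client()->updateMallocCounter(numElems * sizeof(T));
        return p;
    }
  private:
    Client* client() { return static_cast<Client*>(this); }
};

// Illustrative client: a fake zone that just tracks a byte counter.
struct FakeZone : MallocProviderSketch<FakeZone> {
    size_t counter = 0;
    void updateMallocCounter(size_t nbytes) { counter += nbytes; }
};
```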
The first patch in the series appears to not have the android issues, so let's push that first and take a thoughtful read of the realloc changes while waiting for m-i.

https://tbpl.mozilla.org/?tree=Try&rev=2b4e2e785fc9

https://hg.mozilla.org/integration/mozilla-inbound/rev/a8138c05044d
I didn't see anything obvious in realloc, so I moved malloc up in the queue.

Green try run, modulo some dumb bustage on arm:
https://tbpl.mozilla.org/?tree=Try&rev=a4033b515b52
And arm after fixing the bustage:
https://tbpl.mozilla.org/?tree=Try&rev=98cd58dbe104

None of the Android4 failures and generally green otherwise, so I think this is ready: 
https://hg.mozilla.org/integration/mozilla-inbound/rev/9acca266d2c8
This caused a 22% improvement on octane-mandreel-latency. (1.5% overall on Octane). Good job!
http://arewefastyet.com/#machine=11&view=single&suite=octane&subtest=MandreelLatency&start=1408116991&end=1408166822
(In reply to Hannes Verschore [:h4writer] from comment #44)
> This caused a 22% improvement on octane-mandreel-latency. (1.5% overal on
> octane). Good job!
> http://arewefastyet.com/
> #machine=11&view=single&suite=octane&subtest=MandreelLatency&start=1408116991
> &end=1408166822

Any idea why the Windows machine wasn't affected?
Well, after rewriting everything under pod_malloc, the crashes appear to be gone.
https://tbpl.mozilla.org/?tree=Try&rev=f348fb409b72

So, https://hg.mozilla.org/integration/mozilla-inbound/rev/f9ec09ff142c
(In reply to Hannes Verschore [:h4writer] from comment #44)
> This caused a 22% improvement on octane-mandreel-latency. (1.5% overal on
> octane). Good job!
> http://arewefastyet.com/
> #machine=11&view=single&suite=octane&subtest=MandreelLatency&start=1408116991
> &end=1408166822

This moves many allocations from the runtime to the zone so we can trigger more per-zone GCs. Presumably this patch happened to rejigger the trigger loading such that we GC fewer zones on this platform.

(In reply to Guilherme Lima from comment #45)
> (In reply to Hannes Verschore [:h4writer] from comment #44)
> > This caused a 22% improvement on octane-mandreel-latency. (1.5% overal on
> > octane). Good job!
> > http://arewefastyet.com/
> > #machine=11&view=single&suite=octane&subtest=MandreelLatency&start=1408116991
> > &end=1408166822
> 
> Any ideia why the Windows machine wasn't affected?

The octane Latency benchmarks are total nonsense. In this case I'd guess we're getting different GCs because we're running under the browser, which creates tons of non-test-related objects.
(1) This adds allocators to Cell (really BarrieredCell because we need access to zone()) and uses them instead of the cx or zone, where possible.
(2) This adds alloc_with_extra<T,U>(count) methods and uses them where possible.

This is part one of what will hopefully be an ongoing series -- I wanted to get something checked in though, before the patch starts collapsing under its own gravity. I am surprised at how much less direct allocation there is in SM than I had assumed, but even so, this is still an enormous task.
Attachment #8475234 - Flags: review?(jcoppeard)
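A hypothetical standalone sketch of the alloc_with_extra<T, U>(count) idea: one T followed by count trailing Us in a single zeroed block, the pattern that pod_calloc<uint8_t>(numBytes) was previously approximating. A real version would check the arithmetic for overflow and mind U's alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical shape of alloc_with_extra<T, U>(count): allocate one T plus
// count trailing Us in one zeroed block. Real code would also check
// sizeof(T) + count * sizeof(U) for overflow and U's alignment after T.
template <typename T, typename U>
T* alloc_with_extra(size_t count) {
    return static_cast<T*>(calloc(1, sizeof(T) + count * sizeof(U)));
}
```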
Attachment #8467859 - Flags: checkin+
Attachment #8469384 - Flags: checkin+
Attachment #8470283 - Flags: checkin+
Comment on attachment 8475234 [details] [diff] [review]
add_pergcthing_allocators-v0.diff

Review of attachment 8475234 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8475234 - Flags: review?(jcoppeard) → review+
A few more minor things I found while poking at cx->*alloc.
Attachment #8475537 - Flags: review?(jcoppeard)
Comment on attachment 8475537 [details] [diff] [review]
some_minor_cleanups-v0.diff

Review of attachment 8475537 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgcinlines.h
@@ +618,5 @@
>      if (nDynamicSlots) {
> +        if (cx->isExclusiveContext())
> +            slots = cx->asExclusiveContext()->zone()->pod_malloc<HeapSlot>(nDynamicSlots);
> +        else
> +            slots = js_pod_malloc<HeapSlot>(nDynamicSlots);

This seems like it will be a problem when we come to free to know which counter to update.

Maybe in the future we can make Allocator a MallocProvider that updates the Zone and use cx->allocator() here.
Attachment #8475537 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #53)
> Comment on attachment 8475537 [details] [diff] [review]
> some_minor_cleanups-v0.diff
> 
> Review of attachment 8475537 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsgcinlines.h
> @@ +618,5 @@
> >      if (nDynamicSlots) {
> > +        if (cx->isExclusiveContext())
> > +            slots = cx->asExclusiveContext()->zone()->pod_malloc<HeapSlot>(nDynamicSlots);
> > +        else
> > +            slots = js_pod_malloc<HeapSlot>(nDynamicSlots);
> 
> This seems like it will be a problem when we come to free to know which
> counter to update.
> 
> Maybe in the future we can make Allocator a MallocProvider that updates the
> Zone and use cx->allocator() here.

Hurm, good point. This is the PJS case, so would be parallel access to the zone, which I think is why it's not easily available. The counter is atomic though, so it should be fine to do regardless. I'll poke about and see if there is an "unsafe" way to get the zone for this.
Attachment #8475234 - Flags: checkin+
It occurred to me today that we don't need to forward the allocations to the zone, just the accounting. This lets us save some typing while re-using our standard MallocProvider pattern.

Also, JS_DECLARE_NEW_METHODS uses the template parameter T internally; before I realized we could just subclass, I changed the template parameter name to Outer. This is a better match for the other places where we use this pattern, so I've kept the name.
Attachment #8476308 - Flags: review?(jcoppeard)
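A simplified sketch of "forward the accounting, not the allocations": the CRTP base (parameterized on Outer, mirroring the renamed template parameter) allocates directly and only reports bytes to the Outer's zone. ZoneSketch and all names here are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Invented stand-in for a Zone that only receives the accounting.
struct ZoneSketch {
    size_t mallocBytes = 0;
    void accountFor(size_t nbytes) { mallocBytes += nbytes; }
};

// CRTP base: allocation happens here, and only the byte accounting is
// forwarded to the Outer's zone.
template <class Outer>
struct CellAllocPolicy {
    template <typename T>
    T* pod_malloc(size_t numElems) {
        size_t nbytes = numElems * sizeof(T);
        T* p = static_cast<T*>(malloc(nbytes));
        if (p)
            static_cast<Outer*>(this)->zone()->accountFor(nbytes);
        return p;
    }
};

struct CellSketch : CellAllocPolicy<CellSketch> {
    explicit CellSketch(ZoneSketch* z) : zone_(z) {}
    ZoneSketch* zone() { return zone_; }
  private:
    ZoneSketch* zone_;
};
```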
Adding delete_ and pod_free to MallocProvider is trivial. The real challenge is making the UniquePointer returned by make_unique use them. In the end, this turned out to be both surprisingly hard and pretty easy.

It will be interesting to see what MSVC thinks of this:
https://tbpl.mozilla.org/?tree=Try&rev=ffc0e2bc7f5f
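Roughly, hooking a unique pointer up to a provider's delete_ comes down to a stateful deleter that carries the provider, so the free decrements the same counter the allocation incremented. This standalone sketch uses std::unique_ptr and invented names (ProviderSketch, make_unique_with) rather than SpiderMonkey's actual UniquePtr machinery:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Invented provider with paired new_/delete_, so frees hit the same
// counter that the allocation incremented.
struct ProviderSketch {
    size_t liveBytes = 0;
    template <typename T, typename... Args>
    T* new_(Args&&... args) {
        liveBytes += sizeof(T);
        return new T(std::forward<Args>(args)...);
    }
    template <typename T>
    void delete_(T* p) {
        liveBytes -= sizeof(T);
        delete p;
    }
};

// Stateful deleter carrying the provider, so unique-pointer destruction
// routes through delete_.
template <typename T>
struct ProviderDeleter {
    ProviderSketch* provider;
    void operator()(T* p) const { provider->delete_(p); }
};

template <typename T, typename... Args>
std::unique_ptr<T, ProviderDeleter<T>>
make_unique_with(ProviderSketch& provider, Args&&... args) {
    return std::unique_ptr<T, ProviderDeleter<T>>(
        provider.new_<T>(std::forward<Args>(args)...),
        ProviderDeleter<T>{&provider});
}
```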
Comment on attachment 8476308 [details] [diff] [review]
simplify_cell_alloc_forwarding-v0.diff

Review of attachment 8476308 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, looks good.
Attachment #8476308 - Flags: review?(jcoppeard) → review+
Comment on attachment 8476308 [details] [diff] [review]
simplify_cell_alloc_forwarding-v0.diff

Review of attachment 8476308 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is breaking on windows -- hitting all of the static_asserts we have on gcthing size and alignment. MSVC seems to think that BarrieredCell is non-zero sized for some reason. Clang and gcc seem to do the right thing, so this is probably a compiler issue. I'm going to experiment with some different approaches to forwarding.
Attachment #8476308 - Flags: review+ → review-
Depends on: 1057140
Depends on: 1057393
One of these csets seems to have caused a 5.7% regression on Octane-zlib.
See the depending bug ;) (bug 1057393). There is already a fix that is reviewed.
Depends on: 981198
Depends on: 1081260
Depends on: 1208665
Can this be closed fixed?
AFAICT all patches were checked in, and all blockers are closed
Flags: needinfo?(terrence)
Well, the stuff that's checked in was all prerequisite work for the stated purpose of the bug. It turns out that this is just really hard to do completely, and it's not clear that the partial solution of only tracking element/slot realloc would actually be worthwhile given the obvious fragmentation problem. I'm fine with closing this particular instance: we can always open a new bug if we think of a better way to approach this.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(terrence)
Resolution: --- → WORKSFORME
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open