Last Comment Bug 679026 - Add gcMallocBytes per compartment
: Add gcMallocBytes per compartment
Status: RESOLVED FIXED
[good first bug][mentor=:gwagner][sna...
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: mozilla12
Assigned To: Till Schneidereit [:till]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-15 10:36 PDT by Gregor Wagner [:gwagner]
Modified: 2012-01-31 07:02 PST (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Implementation of per-compartment gcMallocBytes counters - WIP (12.56 KB, patch)
2012-01-18 21:25 PST, Till Schneidereit [:till]
no flags Details | Diff | Review
GC timings without patch and with patch with two different values for gcMaxAllocBytes (4.87 KB, text/plain)
2012-01-18 21:30 PST, Till Schneidereit [:till]
no flags Details
Implementation of per-compartment gcMallocBytes counters - WIP version 2 (10.58 KB, patch)
2012-01-19 11:06 PST, Till Schneidereit [:till]
no flags Details | Diff | Review
GC stats for a v8 run without patch (7.07 KB, text/plain)
2012-01-19 11:24 PST, Till Schneidereit [:till]
no flags Details
GC stats for a v8 run with patch (7.49 KB, text/plain)
2012-01-19 11:25 PST, Till Schneidereit [:till]
no flags Details
Implementation of per-compartment gcMallocBytes counters - WIP version 3 (10.48 KB, patch)
2012-01-20 06:20 PST, Till Schneidereit [:till]
no flags Details | Diff | Review
Implementation of per-compartment gcMallocBytes counters - WIP version 4 (10.02 KB, patch)
2012-01-27 15:05 PST, Till Schneidereit [:till]
no flags Details | Diff | Review
Implementation of per-compartment gcMallocBytes counters - WIP version 5 (10.01 KB, patch)
2012-01-28 03:40 PST, Till Schneidereit [:till]
anygregor: review+
Details | Diff | Review

Description Gregor Wagner [:gwagner] 2011-08-15 10:36:54 PDT
Currently we only monitor non-JS-heap allocations per runtime. This causes a few runtime-wide GCs during the V8 benchmarks. This should also make sense with real web workload. 
We can easily avoid this behavior by adding a per-compartment counter.
Comment 1 Josh Matthews [:jdm] 2011-09-16 11:19:47 PDT
Would some JS hacker mind being a mentor for this bug, and adding a couple MXR pointers to relevant places that would need changing? This sounds like a really great task for a new contributor, but I'm worried it'll be lost in the "good first bug" list.
Comment 2 Gregor Wagner [:gwagner] 2011-09-16 14:14:04 PDT
Basically just look for places where gcMallocBytes is used. Whenever we adjust the runtime wide value, we have to adjust the value in the corresponding compartment as well.
Comment 3 Till Schneidereit [:till] 2012-01-16 17:41:52 PST
I'd be interested in working on this.

So far, I have found out that gcMallocBytes is only updated and evaluated in updateMallocCounter, so I guess the compartment update should happen at all call sites of that method, too.

What I don't yet know is which is the "corresponding compartment". Is it the current context's GC heap compartment set through JSContext::setCompartment? If so, that should be easy to update as the context is available at all of the call sites of updateMallocCounter.

Also, I suppose it makes sense to work on this on top of the patches in bug 716142?
Comment 4 Till Schneidereit [:till] 2012-01-16 17:58:17 PST
Another question is if another configurable GC parameter should be added, comparable to JSGC_MAX_MALLOC_BYTES and the associated pref. Or is the idea to completely replace the per-runtime monitoring?
Comment 5 Gregor Wagner [:gwagner] 2012-01-17 10:36:35 PST
(In reply to Till Schneidereit from comment #3)
> I'd be interested in working on this.

Glad to hear :)

> 
> So far, I have found out that gcMallocBytes is only updated and evaluated in
> updateMallocCounter, so I guess the compartment update should happen at all
> call sites of that method, too.

Yes, update the runtime-counter and the compartment-counter.

> 
> What I don't yet know is which is the "corresponding compartment". Is it the
> current context's GC heap compartment set through JSContext::setCompartment?
> If so, that should be easy to update as the context is available at all of
> the call sites of updateMallocCounter.

Yes it should be the compartment of the current context.

> 
> Also, I suppose it makes sense to work on this on top of the patches in bug
> 716142?

Yes, here we want to avoid a full-GC caused by too much dynamic allocation.



(In reply to Till Schneidereit from comment #4)
> Another question is if another configurable GC parameter should be added,
> comparable to JSGC_MAX_MALLOC_BYTES and the associated pref. Or is the idea
> to completely replace the per-runtime monitoring?

The easiest is to add another counter for the compartment and set the watermark at 80% of the runtime wide trigger. We don't want to replace the runtime monitoring.
Comment 6 Till Schneidereit [:till] 2012-01-17 11:16:12 PST
Cool, thanks for the info. I'll get back with a patch or ping you on IRC if I have further questions.
Comment 7 Till Schneidereit [:till] 2012-01-18 21:25:08 PST
Created attachment 589775 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP

I think the attached WIP patch should be somewhat close to final. Apart from general feedback, there are a couple of things I'd like to get feedback and help on, though:

- Is the way I'm currently initializing the compartment's gcMaxMallocBytes value ok?

- Is there any chance the runtime's value for gcMaxMallocBytes changes later on and would require iterating over the compartments to change their values accordingly?

- Does JSCompartment::gcMallocBytes need to be volatile and is the lock in JSCompartment::onTooMuchMalloc required? I suppose so, but am not sure.

Most importantly, right now the patch causes a clear regression in the runtime of V8 - at least in the shell. That seems to be entirely due to an increase of GCs: Without the patch, V8 has 9 global GCs, one of which is triggered by the runtime's gcMalloc counter. With the patch, there are 10 GCs, two of which are compartmental and triggered by the compartment's gcMalloc counter.

That extra GC seems to be caused by the lower value of gcMaxMallocBytes for compartments. If I set it to the same value as for the runtime, the extra GC disappears and I get a clear speedup: total time drops from ~1351ms to ~1339ms.

Is that just an artifact of the shell setup or something that should be prevented by tuning the multiplier for gcMaxMallocBytes accordingly (or both)?
Comment 8 Till Schneidereit [:till] 2012-01-18 21:30:02 PST
Created attachment 589776 [details]
GC timings without patch and with patch with two different values for gcMaxAllocBytes
Comment 9 Gregor Wagner [:gwagner] 2012-01-18 23:32:30 PST
I haven't looked at your code right now but your GC timings might be wrong. What machine are you using and how do you invoke the benchmarks? I usually see 35GCs and a total score of 7600 for the v8 benchmarks in a threadsafe shell.
Comment 10 Gregor Wagner [:gwagner] 2012-01-19 00:12:27 PST
Comment on attachment 589775 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP

> 
>+void
>+JSRuntime::updateMallocCounter(size_t nbytes, JSContext *cx)
>+{
>+    /* We tolerate any thread races when updating gcMallocBytes. */
>+    ptrdiff_t newCount = gcMallocBytes - ptrdiff_t(nbytes);
>+    gcMallocBytes = newCount;
>+    if (JS_UNLIKELY(newCount <= 0))
>+        onTooMuchMalloc();
>+    else if (cx && cx->compartment)
>+        cx->compartment->updateMallocCounter(nbytes);
>+}

This is a VERY hot function and could cause regressions if we can't 
inline it.

> MathCache *
> JSCompartment::allocMathCache(JSContext *cx)
> {
>     JS_ASSERT(!mathCache);
>     mathCache = cx->new_<MathCache>();
>     if (!mathCache)
>         js_ReportOutOfMemory(cx);
>     return mathCache;
>diff --git a/js/src/jscompartment.h b/js/src/jscompartment.h
>--- a/js/src/jscompartment.h
>+++ b/js/src/jscompartment.h
>@@ -182,16 +182,18 @@ struct JS_FRIEND_API(JSCompartment) {
>         if (gcIncrementalTracer)
>             return gcIncrementalTracer;
>         return createBarrierTracer();
>     }
> 
>     size_t                       gcBytes;
>     size_t                       gcTriggerBytes;
>     size_t                       gcLastBytes;
>+    size_t                       gcMaxBytes;

This shouldn't be needed.

>+    size_t                       gcMaxMallocBytes;
> 
>     bool                         hold;
>     bool                         isSystemCompartment;
> 
>     /*
>      * Pool for analysis and intermediate type information in this compartment.
>      * Cleared on every GC, unless the GC happens during analysis (indicated
>      * by activeAnalysis, which is implied by activeInference).
>@@ -273,16 +275,22 @@ struct JS_FRIEND_API(JSCompartment) {
> 
>     /* Cache to speed up object creation. */
>     js::NewObjectCache           newObjectCache;
> 
>   private:
>     enum { DebugFromC = 1, DebugFromJS = 2 };
> 
>     uintN                        debugModeBits;  // see debugMode() below
>+    
>+    /*
>+     * Malloc counter to measure memory pressure for GC scheduling. It runs
>+     * from gcMaxMallocBytes down to zero.
>+     */
>+    volatile ptrdiff_t           gcMallocBytes;
> 
>   public:
>     js::NativeIterCache          nativeIterCache;
> 
>     typedef js::Maybe<js::ToSourceCache> LazyToSourceCache;
>     LazyToSourceCache            toSourceCache;
> 
>     js::ScriptFilenameTable      scriptFilenameTable;
>@@ -306,18 +314,28 @@ struct JS_FRIEND_API(JSCompartment) {
>     bool wrap(JSContext *cx, js::AutoIdVector &props);
> 
>     void markTypes(JSTracer *trc);
>     void sweep(JSContext *cx, bool releaseTypes);
>     void purge(JSContext *cx);
> 
>     void setGCLastBytes(size_t lastBytes, JSGCInvocationKind gckind);
>     void reduceGCTriggerBytes(size_t amount);
>+    
>+    void resetGCMallocBytes();
>+    void setGCMaxMallocBytes(size_t value);
>+    void updateMallocCounter(size_t nbytes);
>+    
>+    /*
>+     * The function must be called outside the GC lock.
>+     */
>+    JS_FRIEND_API(void) onTooMuchMalloc();
> 
>     js::DtoaCache dtoaCache;
>+    
> 
>   private:
>     js::MathCache                *mathCache;
> 
>     js::MathCache *allocMathCache(JSContext *cx);
> 
>     /*
>      * Weak reference to each global in this compartment that is a debuggee.
>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>--- a/js/src/jsgc.cpp
>+++ b/js/src/jsgc.cpp
>@@ -2163,17 +2163,20 @@ TriggerGC(JSRuntime *rt, gcstats::Reason
>     if (rt->gcRunning || rt->gcIsNeeded)
>         return;
> 
>     /*
>      * Trigger the GC when it is safe to call an operation callback on any
>      * thread.
>      */
>     rt->gcIsNeeded = true;
>-    rt->gcTriggerCompartment = NULL;
>+    if (rt->gcTriggerCompartment) {
>+        rt->gcTriggerCompartment->resetGCMallocBytes();
>+        rt->gcTriggerCompartment = NULL;
>+    }

You don't want to reset the mallocBytes during a GC trigger event.
This should be done at the end of the GC.
Furthermore, at the end of a full GC you have to iterate
all compartments and reset the value.


Nice work! As you already noticed we deal here with sensitive code
where some small changes can lead do regression. 
Your way is to call comp->updateMallocCounter from rt->updateMallocCounter.
If we can't do this because of inline issues you might want try a sequential call.
Comment 11 Till Schneidereit [:till] 2012-01-19 05:34:53 PST
Thanks for the feedback - I think I understand all the requested changes and will implement them.

As for the benchmark: Maybe I should have mentioned that I used the SunSpider harness for the tests ...

I'll attach timings from the official Google harness with my next patch.
Comment 12 Till Schneidereit [:till] 2012-01-19 11:06:33 PST
Created attachment 589915 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 2

New version, with comments taken into account except for moving JSRuntime::updateMallocCounter back into the header file. As mentioned on IRC, that's somewhat difficult to accomplish, so I'd like to do a try run to find out if the current solution regresses anything.
Comment 13 Till Schneidereit [:till] 2012-01-19 11:24:52 PST
Created attachment 589925 [details]
GC stats for a v8 run without patch
Comment 14 Till Schneidereit [:till] 2012-01-19 11:25:27 PST
Created attachment 589926 [details]
GC stats for a v8 run with patch
Comment 15 Mozilla RelEng Bot 2012-01-19 18:30:43 PST
Try run for 4f30324589c5 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=4f30324589c5
Results (out of 217 total builds):
    success: 189
    warnings: 24
    failure: 4
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/sfink@mozilla.com-4f30324589c5
Comment 16 Till Schneidereit [:till] 2012-01-20 06:20:40 PST
Created attachment 590181 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 3

So if I understand these try results correctly it doesn't look like there are any regressions in JS benchmarks:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=29e23805bfab&newRev=4f30324589c5&tests=dromaeo_basics,dromaeo_css,dromaeo_dom,dromaeo_jslib,dromaeo_sunspider,dromaeo_v8,tsspider,tsspider_nochrome,tsspider_paint,tsspider_nochrome_paint,v8&submit=true

The errors in the Windows builds should be resolved with the attached patch.

I unfortunately don't know enough about what to expect in terms of try results to be able to interpret the test failures and the numbers other than the JS benchmarks in the compare results. Are these errors and regressions in the numbers to be expected or are they something I should worry about even though they seem unrelated?
Comment 17 Till Schneidereit [:till] 2012-01-27 15:05:05 PST
Created attachment 592294 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 4

I've updated the patch to apply to mozilla-central tweaked a few small things. 

Based on my testing, this looks like it doesn't regress anything and turns a lot of global GCs into compartmental ones.
Comment 18 Gregor Wagner [:gwagner] 2012-01-27 20:45:19 PST
Comment on attachment 592294 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 4

># HG changeset patch
># Parent 7ab255f53568efa4dc84a34a1d2f7a18453ab315
># User Till Schneidereit <tschneidereit@gmail.com>
>Bug 679026 - Add gcMallocBytes per compartment
>
>diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
>--- a/js/src/jsapi.cpp
>+++ b/js/src/jsapi.cpp
>@@ -2211,17 +2211,17 @@ JS_PUBLIC_API(void)
> JS_free(JSContext *cx, void *p)
> {
>     return cx->free_(p);
> }
> 
> JS_PUBLIC_API(void)
> JS_updateMallocCounter(JSContext *cx, size_t nbytes)
> {
>-    return cx->runtime->updateMallocCounter(nbytes);
>+    return cx->runtime->updateMallocCounter(nbytes, cx);

Please always move non-optional cx to the front in the arguments list.
Comment 19 Mozilla RelEng Bot 2012-01-27 23:00:16 PST
Try run for 209f199ed597 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=209f199ed597
Results (out of 271 total builds):
    success: 248
    warnings: 22
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/anygregor@gmail.com-209f199ed597
 Timed out after 06 hours without completing.
Comment 20 Till Schneidereit [:till] 2012-01-28 03:16:28 PST
(In reply to Gregor Wagner [:gwagner] from comment #18)
> Please always move non-optional cx to the front in the arguments list.

Will do.

The try results look good to me and almost all of the changes in benchmark results seem within range for the usual variation:

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=2eaaea255729,1950052411f1,3f4b31771b17,e55e534783b5&newRev=209f199ed597&tests=dromaeo_basics,dromaeo_css,dromaeo_dom,dromaeo_jslib,dromaeo_sunspider,dromaeo_v8,tsspider,tsspider_nochrome,tsspider_paint,tsspider_nochrome_paint,v8,ts&submit=true

The one exception is ts Android XUL which shows a huge regression. Seeing as ts Android Native is behaving normally, I hope that's not a real result but some problem with the setup?
Comment 21 Till Schneidereit [:till] 2012-01-28 03:40:30 PST
Created attachment 592384 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 5

Made cx non-optional in JSRuntime::updateMallocCounter and moved it to the front
Comment 22 Gregor Wagner [:gwagner] 2012-01-30 10:28:50 PST
Comment on attachment 592384 [details] [diff] [review]
Implementation of per-compartment gcMallocBytes counters - WIP version 5

Thanks!
Comment 23 Till Schneidereit [:till] 2012-01-30 10:39:41 PST
Thanks for the review!
Comment 25 Ed Morley [:emorley] 2012-01-31 07:02:23 PST
https://hg.mozilla.org/mozilla-central/rev/15f4101321a3

Note You need to log in before you can comment on or make changes to this bug.