Bug 408113 (Closed) — Opened 12 years ago, closed 12 years ago

We reallocate context->stackPool arenas way too often

Categories

(Core :: JavaScript Engine, defect, P3)


Tracking


RESOLVED FIXED
mozilla1.9beta3

People

(Reporter: pavlov, Assigned: crowderbt)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 4 obsolete files)

Many callers use js_AllocStack/js_FreeStack and rarely need more than 8k (the current arena size); the exception is js_Interpret.

The pattern I see a lot is:

start with an empty context->stackPool
js_AllocStack (malloc 8k)
....
js_FreeStack (free 8k)

While we want to use the arena in case we need to grow bigger than 8k, it would be great to hold on to the first arena in the pool so we aren't reallocating it all the time.
Here is a simple patch that pins the first arena for the lifetime of the context.  Brendan and crowder had some ideas for improvement -- handing off to them.
With this patch, on loading some pretty simple pages, we go from lots of callers doing 8k allocations down to one call site (js_Interpret), and from about 3000 allocations down to 17.
Keywords: footprint, perf
OS: Mac OS X → All
Hardware: PC → All
It'd be hard to argue that this isn't a tidy little perf win, and relatively easy.  Need to make it opt-in for embedders and use the "pinned" space for something, rather than burning it (even at four bytes).
Status: NEW → ASSIGNED
Flags: blocking1.9?
Priority: -- → P3
Target Milestone: --- → mozilla1.9 M11
I think we should reduce the 8K to 4K too, at the same time. Don't really want to hoard too much. And check my 4K intuition -- 1K might be even better. How does the stackPool grow? Smaller arena size means more overhead, and sometimes splitting arguments across arenas => copying args. Larger means more wasted space during use -- and with this patch, more wasted space while the context is idle.

If you have 300 tabs open, you have at least 600 JSContexts. 600*8K is 4.8M. We need to be careful here.

Another thought: make the option be to hoard for a time, using time-based caching somehow to shake loose that initial four-byte allocation. Could even store a time stamp there (make it 8 bytes if necessary).

/be
Flags: blocking1.9? → blocking1.9+
Time-based is looking better to me. We could even avoid JSOPTION_HOARD_STACK. Just make the stackPool initial allocation when first using the pool (when it is empty and we're allocating from it), put a timestamp in the jsval-sized slot allocated to keep the first arena around.

Then (since we lack a JS_HeartBeat() call that embeddings could make from their event loops when idling) perhaps just use JS_GC, which iterates over all the contexts in a runtime, to release the initial arena if the pool is empty again. Could leave the pool alone if not enough time has passed.

Simple, no option, minimal change. Try a patch, I'll review quickly.

/be
No need for change from 8K stack arena size in this bug, if comment 5 pans out.

Stuart proposed in person making all large advisory arena sizes that are powers of two be gross sizes, not net -- net means the arena overhead pushes us into the next size class used by malloc, so 1K net becomes 1040 gross and could take 2K!

This idea is good and it needs a bug of its own. Pav, did you file it already? This applies to JS arenas for sure, not sure about PLArena uses in the rest of the codebase.

/be
Just filed 408921 for JSArenaPools.  Not sure about PLArenas -- we should probably do the same thing to them, but I haven't looked at their code yet.
Blocks: 115973
Brendan:  Can you please restate the comments you made on IRC, here?  I'm not sure they agree with your comment 5 and comment 6 here, or else I am misreading.
We could clarify things in email better. I think comments 5&6 are consistent with this (from IRC query today):

<brendan_work> no, want to re-hoard when starting to use stackPool
<brendan_work> so if empty and entering interpreter from API, allocate a timestamp (uint64) set from JS_Now()
<brendan_work> in GC, for each cx if stack arena contains only the timestamp, release it
<brendan_work> (freeing the first non-header arena)
<brendan_work> that way we don't pre-hoard from JS_NewContext till first interp (which might be a while in some embeddings)
<crowder> Not actually using the timestamp for anything, though
<crowder> Can basically just use stuarts wasted 4-bytes and reclaim in GC if that's the only thing used in the pool?  But only pin the pool when it's actually being used (in js_AllocateStack()?) instead of in JS_NewContext
<crowder> Have I got that right?
<brendan_work> yeah, but timestamp is good for DEBUG-only code that computes how long we hold onto the first 8K
<brendan_work> (don't bother tuning the 8K down in this bug -- we can leave that for later/never)
<brendan_work> want to quantify time here

"first 8K means the first JSArena that is not the header (pool->first), at any given time -- not the first ever allocated for the pool.

/be
I'm not sure this is truly superior to Stuart's patch.  Instrumenting this in an Fx build, we end up releasing and then immediately recreating the pools pinned by this patch (in other words, the GC doesn't really reduce footprint, since we immediately re-allocate the stackPool).  It might be worth just doing Stuart's thing and playing with smaller arena sizes, if we're concerned about overall impact.

That said, in the surfing I did with this patch and some instrumenting enabled, I only ended up doing 2-3 thousand allocations of this type over a couple minutes of surfing.  Stuart:  can you test this with the same procedure you got your original results from?
Attachment #292859 - Attachment is obsolete: true
Attachment #295399 - Flags: review?(brendan)
Duplicate of this bug: 115973
Blocks: 7251
(In reply to comment #10)
> I'm not sure this is truly superior to stuart's patch.

No. See comment 4, second paragraph!

IRC exchange should produce a new patch. In the meantime, note comment nit: blank line before any comment that consumes one or more lines, unless the preceding line ends in an opening brace.

/be
Crowder you on this?
He's waiting on me to test.  Will try to do so in the next day or two.
Stuart:  I actually need to do one more rev based on comment #12 and IRC discussion with brendan.  Should be attached here later today.
Currently defaulting to letting stackPools live 30 seconds (approximately 3 GC cycles in Fx).  This can be altered by using:

    JS_SetGCParameter(rt, JSGC_STACKPOOL_LIFESPAN, 45000000);

...for example.  I can either add this or tune the default in spidermonkey, if we feel it should be longer.  30 seconds seems like plenty to me.
Attachment #295399 - Attachment is obsolete: true
Attachment #295399 - Flags: review?(brendan)
Attachment #296201 - Flags: review?(brendan)
Review-ping, Brendan?  Stuart, have you tried this?
Comment on attachment 296201 [details] [diff] [review]
Adds a runtime parameter to control stackPool lifespans

> JS_PUBLIC_API(void)
> JS_GC(JSContext *cx)
> {
>+    JSArena *spCurrent = cx->stackPool.current;

Nit, big-time: Canonical short name (local var case) is a for JSArena *, or arena if longer name (longer live range, big function) -- spCurrent doesn't do much with all the letters it uses and its camelCaps shape is odd.

Non-nit: this code all belongs in js_GC for every context (the acx iterator var) in the runtime, not just for whatever context happens to host GC. Otherwise 300 tab users will find 299 tabs hoarding 8K (or 16K, if two cx's per tab) each.

>+
>     /* Don't nuke active arenas if executing or compiling. */
>-    if (cx->stackPool.current == &cx->stackPool.first)
>-        JS_FinishArenaPool(&cx->stackPool);
>+    if ((spCurrent == cx->stackPool.first.next) &&
>+        (spCurrent->avail == spCurrent->base + sizeof(int64))) {

Nit: don't overparenthesize == etc. against && and similar.

> typedef enum JSGCParamKey {
>     JSGC_MAX_BYTES        = 0,  /* maximum nominal heap before last ditch GC */
>-    JSGC_MAX_MALLOC_BYTES = 1   /* # of JS_malloc bytes before last ditch GC */
>+    JSGC_MAX_MALLOC_BYTES = 1,  /* # of JS_malloc bytes before last ditch GC */
>+    JSGC_STACKPOOL_LIFESPAN = 2 /* Hoard stackPools for this long, in ms */

Nit: "hoard" in right-side comment -- sentence frag, no capitalization or period at end.

Nit: feel free to column-align the = again.

Non-nit: ms is wrong, the rest of the patch treats lifespan as us (microseconds). Are usec the right units? Could argue based on consistency with SpiderMonkey's C API function JS_Now (which is based on PR_Now and the rest of NSPR's time-from-epoch-start APIs), but it seems a bit much to use 1000x the JS ms (millisecond) Date getTime, etc. resolution. Thoughts?

>+    uint32              gcStackPoolLifespan;

This is cool, don't want bigger, but with usec you allow only ~4000 seconds lifespan :-P. Not a big deal, but slightly favors msec again (but I'm open to argument based on JS_Now precedent).

>+    rt->gcStackPoolLifespan = 30000000;

30 * PRMJ_USEC_PER_SEC might be better (if you ignore that awful PRMJ -- PR for Portable Runtime, MJ for "Mocha/Java"! Don't ask...).

> js_AllocRawStack(JSContext *cx, uintN nslots, void **markp)
> {
>     jsval *sp;
> 
>+    if (cx->stackPool.first.next == 0) {
>+        int64 now = JS_Now();
>+        int64 *timestamp;
>+
>+        JS_ARENA_ALLOCATE(timestamp, &cx->stackPool, sizeof(*timestamp));
>+        if (!timestamp) {
>+            js_ReportOutOfScriptQuota(cx);
>+            return NULL;
>+        }
>+        memcpy(timestamp, &now, sizeof(now));

Use int64 assignment here, not memcpy.

>+    }

Common static subroutine needed here *if* this repetition:

>@@ -3674,16 +3688,29 @@ interrupt:
> 
>                     /* Allocate the inline frame with its vars and operands. */
>                     newsp = (jsval *) avail;
>                     nbytes = nslots * sizeof(jsval);
>                     avail += nbytes;
>                     if (avail <= a->limit) {
>                         a->avail = avail;
>                     } else {
>+                        if (cx->stackPool.first.next == 0) {

Nit: !ptr for null test, not ptr == 0.

>+                            int64 now = JS_Now();
>+                            int64 *timestamp;
>+
>+                            JS_ARENA_ALLOCATE(timestamp, &cx->stackPool,
>+                                              sizeof(*timestamp));
>+                            if (!timestamp) {
>+                                js_ReportOutOfScriptQuota(cx);
>+                                goto bad_inline_call;
>+                            }
>+                            memcpy(timestamp, &now, sizeof(now));
>+                        }
>+

were truly needed -- but it's not! You can't have an empty stackPool at this point (JSOP_CALL "inline call" case), because we allocate stack on entry to js_Interpret that's not freed till we leave js_Interpret. So just js_AllocRawStack should be the place. You could JS_ASSERT(cx->stackPool.first.next) here of course.

/be
Attached patch with brendan's feedback (obsolete) — Splinter Review
I moved the expiration logic into js_TraceContext, since it didn't seem to fit very well with either of the other two spots in js_GC where cx are iterated.  Also, I have changed my logic to properly reference ms, not us, thus making the comment correct and allowing for a much longer lifespan.  I opted not to use the PRMJ_ macros for time, as there is no PRMJ_MSEC_IN_SEC (might be nice), and I wasn't inclined to add a dependency from jsgc.c to prmjtime.h where none existed before -just- for this.

Lining up the assignments in the JSGCParamKey enum looks weird.  I wanted to add a space after the ",", but that pushes the comments into the 80th column, so I left out the space (I considered rewording, but that seemed worse).  I can correct this on check-in, if you prefer the space.
Attachment #296201 - Attachment is obsolete: true
Attachment #299240 - Flags: review?(brendan)
Attachment #296201 - Flags: review?(brendan)
Comment on attachment 299240 [details] [diff] [review]
with brendan's feedback

> typedef enum JSGCParamKey {
>-    JSGC_MAX_BYTES        = 0,  /* maximum nominal heap before last ditch GC */
>-    JSGC_MAX_MALLOC_BYTES = 1   /* # of JS_malloc bytes before last ditch GC */
>+    JSGC_MAX_BYTES          = 0,/* maximum nominal heap before last ditch GC */
>+    JSGC_MAX_MALLOC_BYTES   = 1,/* # of JS_malloc bytes before last ditch GC */
>+    JSGC_STACKPOOL_LIFESPAN = 2 /* hoard stackPools for this long, in ms */

Alternative style-wise is comment on line preceding each enumerator, single-line comment, with blank line before unless prev char is {. Seems worth it here so you can say what each param does, since they are not necessarily related and may need more comment-space to say what each does.

In this light, you want to say what the default is (30 sec) here in jsapi.h so readers can grok.

> js_TraceContext(JSTracer *trc, JSContext *acx)
> {
>+    JSArena *a = acx->stackPool.current;

Generally still frowned upon to hide initializers in a group of decls (C code style rule -- in C++ you'd move the decl with init down to just before first use).

>     JSStackFrame *fp, *nextChain;
>     JSStackHeader *sh;
>     JSTempValueRooter *tvr;
> 
>     /*
>+     * Release stackPool here, if it has been in existence for longer
>+     * than the limit specified by gcStackPoolLifespan

Period at end of sentence in comment.

>+     */
>+    if (a == acx->stackPool.first.next &&
>+        a->avail == a->base + sizeof(int64)) {
>+        int64 *timestamp = (int64 *) acx->stackPool.first.next->base;
>+        if (JS_Now() - *timestamp > acx->runtime->gcStackPoolLifespan * 1000) {
>+            JS_FinishArenaPool(&acx->stackPool);
>+        }

Nits: moving timestamp up to top-level decls, in order of first use (def), seems more readable. Here the code crunches together vertically and the decl boilerplate gets in the way a bit. Also not sure a pointer is best, given no mutation. Could use int64 timestamp; ... timestamp = *(int64 *) acx->... -- but if only one use, then macro may not be worth it (more below).

> JS_FRIEND_API(jsval *)
> js_AllocRawStack(JSContext *cx, uintN nslots, void **markp)
> {
>     jsval *sp;
> 
>+    if (!cx->stackPool.first.next) {
>+        int64 now = JS_Now();
>+        int64 *timestamp;
>+
>+        JS_ARENA_ALLOCATE(timestamp, &cx->stackPool, sizeof(*timestamp));

Note better readability cuz you couldn't initialize the timestamp decl ;-). Need timestamp to be a pointer here for mutation, so pointer earlier is ok to match.

Almost there, I'll turn around an updated patch quickly. Let's get this in. Thanks,

/be
Only notable change other than addressing your comments is that I use "a->base" in js_TraceContext instead of "acx->stackPool.first.next->base" since the conditional enforces that these expressions are equivalent.  I was tempted to use *(int64 *)a->base instead of having the timestamp variable at all, but I think this code is more readable.
Attachment #299240 - Attachment is obsolete: true
Attachment #300086 - Flags: review?(brendan)
Attachment #299240 - Flags: review?(brendan)
Comment on attachment 300086 [details] [diff] [review]
stackpool lifespanning, v5

> typedef enum JSGCParamKey {
>-    JSGC_MAX_BYTES        = 0,  /* maximum nominal heap before last ditch GC */
>-    JSGC_MAX_MALLOC_BYTES = 1   /* # of JS_malloc bytes before last ditch GC */
>+    /* maximum nominal heap before last ditch GC */
>+    JSGC_MAX_BYTES          = 0,
>+
>+    /* # of JS_malloc bytes before last ditch GC */
>+    JSGC_MAX_MALLOC_BYTES   = 1,
>+
>+    /* hoard stackPools for this long, in ms, default is 30 seconds */

These are sentences now, not fragments, so capitalize and full stop are good style. Could write Number instead of #.

>+        if (JS_Now() - *timestamp > acx->runtime->gcStackPoolLifespan * 1000) {
>+            JS_FinishArenaPool(&acx->stackPool);
>+        }

Don't overbrace.

>+    if (!cx->stackPool.first.next) {
>+        int64 now = JS_Now();
>+        int64 *timestamp;
>+
>+        JS_ARENA_ALLOCATE(timestamp, &cx->stackPool, sizeof(*timestamp));
>+        if (!timestamp) {
>+            js_ReportOutOfScriptQuota(cx);
>+            return NULL;
>+        }
>+
>+        *timestamp = now;

Eliminate single-use (and initialized early) now, just store JS_Now() in *timestamp.

r=me with these,

/be
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.396; previous revision: 3.395
done
Checking in jsapi.h;
/cvsroot/mozilla/js/src/jsapi.h,v  <--  jsapi.h
new revision: 3.180; previous revision: 3.179
done
Checking in jscntxt.h;
/cvsroot/mozilla/js/src/jscntxt.h,v  <--  jscntxt.h
new revision: 3.186; previous revision: 3.185
done
Checking in jsgc.c;
/cvsroot/mozilla/js/src/jsgc.c,v  <--  jsgc.c
new revision: 3.264; previous revision: 3.263
done
Checking in jsinterp.c;
/cvsroot/mozilla/js/src/jsinterp.c,v  <--  jsinterp.c
new revision: 3.409; previous revision: 3.408
done
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
So what's the measured effect of the final patch?
Unfortunately, it may be hard to tell against Ts/Txul etc. measurements, since so much other stuff is landing tonight.
Well, Pavlov said he saw the number of allocations go from 3000 to 17, do we still get such a nice win?
Comment on attachment 300086 [details] [diff] [review]
stackpool lifespanning, v5

I didn't see the patch that landed, but I think it had everything wanted. Still best to follow r/a? and get + so beltzner doesn't look you up. ;-)

/be
Attachment #300086 - Flags: review?(brendan)
Attachment #300086 - Flags: review+
Attachment #300086 - Flags: approval1.9+
Oh, sorry.  I took "r=me with these" to mean we were good to go, and didn't think I needed a+ on the patch, as it is a blocker with the right TM?
No problem, Igor and I usually attach too many final patches (some of them actually final ;-).

/be
Depends on: 415142
Brian's patch may have caused bug #415142 (build bustage) on FreeBSD (at least on version 6.x), so I filed a new bug and added a dependency.
Depends on: 415207
Flags: in-testsuite-
Flags: in-litmus-