Closed Bug 412866 Opened 12 years ago Closed 12 years ago

Reuse regexp arena

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: pavlov, Assigned: moz)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(2 files, 7 obsolete files)

Currently in js_ExecuteRegExp it creates a new arena pool of allocations 12k in size.  When running the sunspider test, I see a total of 583074 allocations coming from that arena.  I tried increasing the 12k to sizes up to 64k and it made no difference on the size, so it looks like we're probably calling js_ExecuteRegExp a _lot_ of times.  It might be worth while to keep the arena around as part of the context or to maybe reuse the now pinnable temp arena (although if you did that you'd have to increase the size of the temp arena...)?

Thoughts?


              libmemory.dylib`zone_malloc+0x11
              libSystem.B.dylib`malloc_zone_malloc+0x51
              libSystem.B.dylib`malloc+0x37
              libmozjs.dylib`JS_ArenaAllocate+0x85
              libmozjs.dylib`js_ExecuteRegExp+0x111
              libmozjs.dylib`match_or_replace+0x23e
              libmozjs.dylib`str_replace+0x1bc
              libmozjs.dylib`js_Interpret+0x67ef
              libmozjs.dylib`js_Execute+0x292
              libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x82

           value  ------------- Distribution ------------- count    
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 583074   
           16384 |                                         0        

(the 8192 value there covers 8192-16383 byte allocations)
Attached patch hack (obsolete) — Splinter Review
this isn't probably what you want, but this will get rid of all the allocations.  you could probably do something like this along with a patch like the one in 408113 to get something that works well.
It is interesting to note that the hack is necessary because of a difference between the JSArenaPool code and the PLArenaPool code on which it is based.  The latter had a "really deallocate" flag set to false in its PL_FreeArenaPool code.  In JS-version, the flag is missing, and the code always deallocates the arenas, instead of saving them on a free list.  This is why there are so many calls to malloc - the arena pool has to malloc and free its arenas all the time.

I think that the non-hack version of this code to call JS_FreeArenaPool, which is intended to be different from JS_FinishArenaPool.  It's not, though.  The non-hack fix would include fixing JS_FreeArenaPool to operate more like PL_FreeArenaPool.

The "Free" versions of functions, like PL_FreeArenaPool, are intended to signal a deallocation of all of the little allocations that came from the pool.  The "Finish" versions of the functions, like PL_FinishArenaPool, are like destructors for the arena pool, meaning that the pool will not be reused.

I'm the original author of prarena.c based on the lcc guys' paper, in 1995. This became plarena.c in NSPR "2" (what we now call just NSPR), and jsarena.c some time in the late 90s.

Old icky C code, I know.

We do not want to hoard arenas on a truly-global (requiring a separate lock, which is a hotspot) freelist. So we're not going to make a change at the jsarena level to reuse regexp arenas. 

jsregexp.c's use of an arena-pool in REGlobalData is suspect on its face. Two of the three JS_ARENA_ALLOCATE initial allocations may grow (JS_ARENA_GROW), but note how the two that may grow are the first and second, which run into the second and third and so must grow by allocating more space and copying! This is all misbegotten, IIRC -- rogerl (jsregexp.c rewrite author) was tuning against a slow malloc, possibly with a jsarena.c that *did* have a private freelist, and he found arenas to be faster.

I'd much rather see a patch to base this code directly on malloc/realloc/free, and benchmark that in time as well as space, with time-based caching to hold onto the space between regexp executions (using new JSContext members). See bug 408113 for cheap time-based memcaching ideas.

/be
(In reply to comment #3)
> I'm the original author of prarena.c based on the lcc guys' paper, in 1995.

That is available online, for those interested:
http://storage.webhop.net/documents/fastalloc.pdf

(In reply to comment #3)
> We do not want to hoard arenas on a truly-global (requiring a separate lock,
> which is a hotspot) freelist. So we're not going to make a change at the
> jsarena level to reuse regexp arenas. 

Actually, I don't think *anyone* should want to have a global freelist for arenas.  The freelists should be per-arena-pool, and not just for regexp arenas.  This is because the size of the arenas are a function of the pool to which they *originally* belonged.  When they go on a global freelist, they can be taken off by an arena pool for which they are unsuitably sized.  This could be a cause of unnecessary fragmentation.  Also, it requires searching a (possibly very long) freelist to find a suitably sized arena, which is slow and will pollute caches and TLBs (causing further slowdown).  The global freelist must go, and not just for js* code.

In any case, I would help with changes if you like, Brendan.  Please let me know.
Agreed on global freelist (IIRC the lcc guys did that, but for a compiler using mid-90s intermediate forms and optimizations, probably that was ok) being wrong in general. Freelist per pool is better for sure if one has to provide a freelist to make the Free vs. Finish distinction meaningful again.

But I don't believe we want that in SpiderMonkey's jsarena.c, certainly not for the jsregexp.c use-case. That use-case (which probably should be examined closely when there's more time, with an eye on avoiding two dynamically allocated stacks) is peculiar enough, and has growth patterns that look likely enough to collide in a single pool, that it would be better to investigate direct malloc/realloc usage.

There's no unwind-protect problem where we need to release to a given mark, if I remember the code correctly. We can simply malloc, realloc, and free at the end -- or put the allocations in a single-entry time-based cache, if they are not too big.

/be

/be
What is the status of this bug?

Shall I take it and continue the investigation?

I think that jsarena code would benefit from caching at least 1 arena, to avoid malloc/free thrashing.  I would prove that with stats, first, before implementing any changes.  Maybe a new bug for that investigation?  (Or, do it as part of bug 421435...)
(In reply to comment #7)
> What is the status of this bug?
> 
> Shall I take it and continue the investigation?
> 

Sure
Assignee: general → moz
Priority: -- → P4
The regexp pool is the one that is thrashing (see bug 423815 and bug 421435).  A slightly modified version of Stuart's patch reduces the number of calls to malloc during a SunSpider run by 700% or so.

I will post details soon.

A fix for this is necessary, otherwise SunSpider run times will be overly sensitive to changes on JS_InitArenaPool and JS_ArenaAllocate.  That results in me being unable to checkin changes to those functions because the changes slow SunSpider, even though they shouldn't.  :(
OS: Windows XP → All
Priority: P4 → P1
Hardware: PC → All
Attached patch v3 - 3% on SunSpider opt (obsolete) — Splinter Review
This patch, the main idea of which comes from Stuart Parmenter's patch, drastically reduces the number of calls to malloc.  The problem was that the regexp pool was thrashing (created and destroyed, over and over).

This patch produces a reproducible 3% improvement on SunSpider run from an optimized shell build.

I hope some JS gurus will look at the patch and tell me whether the regexp pool is now in the right "scope".  Also: if the regexp pool and the temp pool have the same lifetimes, then why are there two pools?
Attachment #297669 - Attachment is obsolete: true
Attachment #310891 - Flags: review?
Here's the sunspider-compare output for the before-and-after this patch.
Here are some MemStats output to show what the most recent patch does:

Before the patch:
numMallocs:         145663
numReallocs:        54
numFrees:           145663
hwmReal:            4147712

After the patch:
numMallocs:         21361
numReallocs:        41
numFrees:           21361
hwmReal:            4147200

So, the real memory use stayed the same at the high water mark; the number of reallocations decreased (significant because a realloc is indicative of a very large memory allocation growing, typically) from 54 to 41; and the number of calls to malloc was reduced by 681%.  (!)

Here is the histogram of memory allocations for 1 run of SunSpider before the patch:

   (1040,1536) 11169
   (1047,1536) 10228
   (2071,2560) 7
   (2075,2560) 8
   (4119,4608) 4
   (4123,4608) 5
   (8211,8704) 995
   (8215,8704) 2
   (8219,8704) 4
   (12307,12800) 123266
   (16023,16384) 1
   (16407,20480) 2
   (16411,20480) 4
    .... and more ....

Here is the histogram after the patch:
   (1040,1536) 11169
   (2071,2560) 7
   (4119,4608) 4
   (8175,8192) 9190
   (8211,8704) 995
   (8215,8704) 2
   (8219,8704) 4
   (12267,12288) 2
   (16023,16384) 1
   (16407,20480) 2
   (16411,20480) 4
   .... more, but same histogram as above ....

The 123266 allocations of sizes ~12K are reduced to 2.  Sweet.
Status: NEW → ASSIGNED
Comment on attachment 310891 [details] [diff] [review]
v3 - 3% on SunSpider opt

>+    /*
>+     * Pin a small allocation in regexpPool to keep at least the first arena in
>+     * the pool.  This will avoid unnecessary calls to malloc whenever someone
>+     * uses js_AllocStack and js_FreeStack.
>+     */
>+    {
>+        void *dummy;
>+        JS_ARENA_ALLOCATE(dummy, &cx->regexpPool, sizeof(void*));
>+    }

The regexpPool isn't really used by js_AllocStack or js_FreeStack, is it?  I suspect that comment was insufficiently edited after its transplant. :)
In-browser, this patch shows a 1% overall improvement, mostly concentrated in string-unpack-code. regexp-dna also shows a small improvement.
(In reply to comment #13)
> The regexpPool isn't really used by js_AllocStack or js_FreeStack, is it?  I
> suspect that comment was insufficiently edited after its transplant. :)

Yes.  Fixing in next patch....
Attached patch v4 - small fix to v3 (obsolete) — Splinter Review
Attachment #310891 - Attachment is obsolete: true
Attachment #310891 - Flags: review?
(In reply to comment #14)
> In-browser, this patch shows a 1% overall improvement, mostly concentrated in
> string-unpack-code. regexp-dna also shows a small improvement.

Thanks for testing!  What platform was this?

(In reply to comment #17)
> (In reply to comment #14)
> > In-browser, this patch shows a 1% overall improvement, mostly concentrated in
> > string-unpack-code. regexp-dna also shows a small improvement.
> 
> Thanks for testing!  What platform was this?
> 

Mac 0S 10.5 Intel.
Comment on attachment 310909 [details] [diff] [review]
v4 - small fix to v3

Unsolicited review, but this looks winning in its essence and time is short.

Please split the jsarena.[ch] patch out to a separate bug. Also, style and substance nits:

>+# define JS_ARENA_MEM_FREE(p) free(p)
>+# define JS_ARENA_MEM_MALLOC(rp, sz) (rp = malloc(sz))
>+# define JS_ARENA_MEM_REALLOC(rp, p, newSz) (rp = realloc((p), (newSz)))

Indent the macro definitions to start in the same column for less ragged-middle readability stress.

For C++ compilation support the last two need casts, so (substance here) you'll need to pass the pointer type in as a macro param.

>+    /*
>+     * To avoid multiple allocations in InitMatch() (in jsregexp.c), the arena
>+     * size parameter should be at least as big as:
>+     * INITIAL_BACKTRACK
>+     * + (sizeof(REProgState) * INITIAL_STATESTACK)
>+     * + (offsetof(REMatchState, parens) + avgParanSize * sizeof(RECapture))

Indenting the last three lines (the terms of the + operators) two spaces in the comment would make those lines more readable IMO.

>+     */
>+    JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp", 12 * 1024 - 40, 4,

Shouldn't the magic numbers be manifest constants, or combined into one manifest?

The alignment grain of 4 should probably be sizeof(void *).

>                        &cx->scriptStackQuota);
> 
>+    /*
>+     * Make a small allocation in regexpPool to keep at least the first arena in
>+     * the pool.  This avoids unnecessary arena allocations.
>+     */
>+    JS_ARENA_ALLOCATE(unused, &cx->regexpPool, sizeof(void*));

Nit: sizeof(void *) -- one space before * and other such declarator modes, and after the type name.

>-    JS_ARENA_GROW_CAST(gData->stateStack, REProgState *, &gData->pool, sz, sz);
>+    JS_ARENA_GROW_CAST(gData->stateStack, REProgState *, &gData->cx->regexpPool, sz, sz);
..12345678901234567890123456789012345678901234567890123456789012345678901234567890

Stop shy of column 80, so best to wrap before &gData->cx->regexpPool.

/be
(In reply to comment #19)
> Unsolicited review, but this looks winning in its essence and time is short.

Brendan, thanks for the review!

> Please split the jsarena.[ch] patch out to a separate bug.

I have opened bug 424287 for the logging stuff.  I'll get to that soon, because I'm getting tired of putting it in and then taking it out before every patch submission.
 
> Shouldn't the magic numbers be manifest constants, or combined into one
> manifest?

I will fix these constants in bug 421435, after bug 420476 is in, which will allow me to lose the stupid -40 hackery.

I have addressed your other concerns in my upcoming patch.
Attached patch v5 - Post review from Brendan (obsolete) — Splinter Review
Attachment #310909 - Attachment is obsolete: true
Attachment #310919 - Flags: review?(brendan)
Comment on attachment 310919 [details] [diff] [review]
v5 - Post review from Brendan


>+    JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp",
>+                       12 * 1024 - 40,  /* FIXME: bug421435 */
>+                       sizeof(void *), &cx->scriptStackQuota);
>+
>+    /*
>+     * Make a small allocation in regexpPool to keep at least the first arena in
>+     * the pool.  This avoids unnecessary arena allocations.
>+     */
>+    JS_ARENA_ALLOCATE(unused, &cx->regexpPool, sizeof(void *));

It seems to me we should do what bug 408113 did for cx->stackPool, and not preallocate and hoard significant memory per context as this patch does, instead delaying the 12K malloc to first regexp execution with a timestamp kept live at the front of the first real arena in the pool, and freeing that arena from the GC when its time-based cache lifetime has expired.

Sorry I didn't mention this last time.

/be
(In reply to comment #22)
> It seems to me we should do what bug 408113 did for cx->stackPool, and not
> preallocate and hoard significant memory per context as this patch does,
> instead delaying the 12K malloc to first regexp execution with a timestamp kept
> live at the front of the first real arena in the pool, and freeing that arena
> from the GC when its time-based cache lifetime has expired.

In that bug, we added a timestamp to the arena's base (in js_AllocRawStack), a garbage collector tunable parameter (stackpool lifespan), and a check of the timestamp against the lifespan in js_TraceContext.  This might help free one arena per context.  It was kind of a time-based virtual-memory-recovery algorithm.

But, there was already a time-based virtual-memory-recovery algorithm available: the OS's.  As we know, the OS will steal physical pages that have not been touched in a while.  If Firefox's stack pool arena goes unused for a time, and the OS needs memory, then it will take the physical pages from the stack pool's arena.  Firefox's virtual memory size (and it's layout) will remain unchanged, but it's resident set size (the physical pages that it's actually using) will decrease.  Then, when the stack pool needs memory again, it's there, transparently.  Nice.  Total code required for this solution: 0.

My understanding is that there are only 1 or 2 contexts per browser tab.  Please correct my understanding, if appropriate.

So, I pose these questions, for Brendan and others.  They are not rhetorical:  Are we trying to do too much, in this bug, or can we check it in as-is?  Is the new proposed functionality (like in bug 408113) worth the effort?  Can we check that in with another bug?  If we do need functionality like with the stack pool in bug 408113, then do we want to check the regexpPool's timestamp in the same place as we check the stack pool's timestamp?

(BTW, jemalloc (Mozilla's new memory subsystem) is only going to release memory back to the OS if it has a "chunk's" worth of it.  That means that you need to have a 1MB contiguous region of memory before jemalloc releases it to the OS.  I'm not sure if "releasing memory back to the OS" is motivating the change, but if it is, then know that the only other consumer of the released stack pool arena's memory is Firefox.)
Since Spidermonkey can be embedded on architectures without virtual memory, we must attempt to play nice on those architectures.  You need to match my timestamp code for this pool, as well.
Also:  users still complain about VM usage, even when it doesn't impact them.  For a user with dozens and dozens of tabs open, at a cost of 12KB * 2 per-tab, this will be a noticeable footprint degradation (even if, in reality, they aren't suffering as a result).
(In reply to comment #24)
> Since Spidermonkey can be embedded on architectures without virtual memory, we
> must attempt to play nice on those architectures.  You need to match my
> timestamp code for this pool, as well.

Ah, this makes a big difference.  In that case, my comments about jemalloc don't apply, either.  I'll add code for timestamp which closely follows what the stack pool code does.
(In reply to comment #25)
> Also:  users still complain about VM usage, even when it doesn't impact them. 
> For a user with dozens and dozens of tabs open, at a cost of 12KB * 2 per-tab,
> this will be a noticeable footprint degradation (even if, in reality, they
> aren't suffering as a result).

Yes, the footprint will be degraded.  But, it will not likely be improved by freeing those arenas, once degraded.

My comment about chunks and jemalloc was meant to address this issue.  (This applies only to the Firefox case on a "typical" computer/OS - one with a VM subsystem.)  malloc (in jemalloc, specifically) gulps memory away from the OS in big chunks (1 or 2 MB, say).  To have a noticeable effect on VM size (not RSS size), one must return an entire chunk back to the OS, which means that one must free 1MB worth of contiguous memory.  This is not likely to happen even if one freed 300 tabs' regexp arenas (3.6MB), which are not likely contiguous.

However, it does make sense to free the arenas for other reasons.  I'm on board with the change, especially since Spidermonkey has to work on embedded devices, etc.

This is how I think a "bug 408113"-like change should be made for regexpPool.  Let me know if I am on the wrong track.

This patch achieves the same effect as the prior "v5" patch, in terms of reducing calls to malloc.

Unfortunately, it slows SunSpider by 1.9% compared to that patch.  This is due to the extra code path in js_ExecuteRegExp.  The extra code is merely the check to see if the timestamp needs to be added to the base of the pool's first arena.

I suspect that the equivalent check in js_AllocRawStack in jsinterp.c gave an equivalent slowdown when that change was made in 408113.

Ideas, anyone, about how to proceed?

I would like put the check for regexpPool's emptiness outside of js_ExecuteRegExp, but where?
Attachment #310919 - Attachment is obsolete: true
Attachment #311478 - Flags: review?(brendan)
Attachment #310919 - Flags: review?(brendan)
Attachment #311478 - Attachment description: v6 - bug 408133-like code for regexpPool (slows SunSpider) → v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)
(In reply to comment #28) 
> I suspect that the equivalent check in js_AllocRawStack in jsinterp.c gave an
> equivalent slowdown when that change was made in 408113.

My suspicion is not confirmed.  I checked.  There was no speedup when I undid the change from bug 408113.
(In reply to comment #28) 
> Unfortunately, it slows SunSpider by 1.9% compared to that patch.

Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build on my Mac.  That is comparing to a recent build without any patch.
(In reply to comment #30)
> (In reply to comment #28) 
> > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
> 
> Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> on my Mac.  That is comparing to a recent build without any patch.
> 

Was your first test in a debug build? 
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #28) 
> > > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
> > 
> > Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> > on my Mac.  That is comparing to a recent build without any patch. 
> Was your first test in a debug build? 
No, definitely not.
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #30)
> > > (In reply to comment #28) 
> > > > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
> > > 
> > > Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> > > on my Mac.  That is comparing to a recent build without any patch. 
> > Was your first test in a debug build? 
> No, definitely not.
> 

Perhaps you can answer the source of my question - you reported a slowdown first, then a speedup.  What change accounted for this difference?
He's saying the version without the timestamping is faster than the version with, but both are faster trunk.
(In reply to comment #34)
> He's saying the version without the timestamping is faster than the version
> with, but both are faster trunk.

Yes, thanks for clarifying, Brian.  The version without timestamping ("v5" patch) gave 3% speed increase on SunSpider.  The version with timestamping ("v6" patch) gave 1.2% speed increase.

I was (and still am) unclear about why the magnitude of the speed increase in the timestamping version is so much less than that for the non-timestamping version.
Comment on attachment 311478 [details] [diff] [review]
v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)

>Index: jsapi.c
>===================================================================
>RCS file: /cvsroot/mozilla/js/src/jsapi.c,v
>retrieving revision 3.434
>diff -u -8 -p -r3.434 jsapi.c
>--- jsapi.c	19 Mar 2008 02:47:40 -0000	3.434
>+++ jsapi.c	24 Mar 2008 23:54:30 -0000
>@@ -2424,19 +2424,19 @@ JS_IsGCMarkingTracer(JSTracer *trc)
>     return IS_GC_MARKING_TRACER(trc);
> }
> 
> JS_PUBLIC_API(void)
> JS_GC(JSContext *cx)
> {
>     /* Don't nuke active arenas if executing or compiling. */
>     if (cx->stackPool.current == &cx->stackPool.first)
>-        JS_FinishArenaPool(&cx->stackPool);
>+        JS_FreeArenaPool(&cx->stackPool);
>     if (cx->tempPool.current == &cx->tempPool.first)
>-        JS_FinishArenaPool(&cx->tempPool);
>+        JS_FreeArenaPool(&cx->tempPool);

Is this really part of the patch for this bug? It seems unrelated and unmotivated.

Otherwise looks good at a glance, would be good if via debugger inspection or logging you proved that the regexpPool is surviving for the desired interval during sunspider testing, which would make the slowdown even more mysterious -- or if you find a bug (such as the lifetime not being configured from 0, or some other bug), it would explain the slowdown.

Passing addl. review to crowder since I'm skimming.

/be
Attachment #311478 - Flags: review?(crowder)
Attachment #311478 - Flags: review?(brendan)
Attachment #311478 - Flags: review+
(In reply to comment #36)
> >     if (cx->stackPool.current == &cx->stackPool.first)
> >-        JS_FinishArenaPool(&cx->stackPool);
> >+        JS_FreeArenaPool(&cx->stackPool);
> >     if (cx->tempPool.current == &cx->tempPool.first)
> >-        JS_FinishArenaPool(&cx->tempPool);
> >+        JS_FreeArenaPool(&cx->tempPool);
> 
> Is this really part of the patch for this bug? It seems unrelated and
> unmotivated.
It is related - my logging code discriminates between "Finish"ing an arena pool, and "Free"ing an arena pool.  The former is a destructor function, and should only be called once per arena pool, while the latter does not destroy the arena pool but does free all of its arenas.

When JS_ARENAMETER is not defined, and my logging code is not enabled either, then both functions operate in the same way.

So, my change is a correction which helps with my own logging, but I could easily move the change to another bug.
Comment on attachment 311478 [details] [diff] [review]
v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)

>-        JS_FinishArenaPool(&cx->stackPool);
>+        JS_FreeArenaPool(&cx->stackPool);
>     if (cx->tempPool.current == &cx->tempPool.first)
>-        JS_FinishArenaPool(&cx->tempPool);
>+        JS_FreeArenaPool(&cx->tempPool);

I agree with Brendan, I don't see the purpose of this change.

>+    JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp",
>+                       12 * 1024 - 40,  /* FIXME: bug421435 */
>+                       sizeof(void *), &cx->scriptStackQuota);

Need a space after "bug" in "bug421435"

>             age = JS_Now() - *(int64 *) a->base;
>             if (age > (int64) acx->runtime->gcStackPoolLifespan * 1000)
>-                JS_FinishArenaPool(&acx->stackPool);
>+                JS_FreeArenaPool(&acx->stackPool);

Again, the Finish -> Free change.

>+        a = acx->regexpPool.current;
>+        if (a == acx->regexpPool.first.next &&
>+            a->avail == a->base + sizeof(int64)) {
>+            age = JS_Now() - *(int64 *) a->base;
>+            if (age > (int64) acx->runtime->gcStackPoolLifespan * 1000)
>+                JS_FreeArenaPool(&acx->regexpPool);

gcStackPoolLifespan should get itself a new name, since it is being used for more.  Too late to fix the GCPARAM name (?), I guess -- wish I'd had more vision there -- if it's not too late, we should do it.  Suggest gcArenaPoolsLifespan or something.

Please combined these pool timestamp patterns as a macro or function, instead of cut-and-pasting.
Attachment #311478 - Flags: review?(crowder) → review-
Sorry, missed comment #37 while reviewing.  Let's do move those changes to another bug, I'll review quickly.
(In reply to comment #36) 
> Otherwise looks good at a glance, would be good if via debugger inspection or
> logging you proved that the regexpPool is surviving for the desired interval
> during sunspider testing, which would make the slowdown even more mysterious --
> or if you find a bug (such as the lifetime not being configured from 0, or some
> other bug), it would explain the slowdown.

I have MemStats output from a SunSpider run of an optimized shell build to
which the patch in question was applied.  The interesting stats are below, with
a partial histogram (of the "interesting" allocation sizes):

With v6:
numMallocs:         21360
numReallocs:        41
totalHist:
   (1040,1536) 11169
   (2071,2560) 7
   (4119,4608) 4
   (8175,8192) 9190
   (8211,8704) 995
   (8215,8704) 2
   (8219,8704) 4
   (12267,12288) 1
   (16023,16384) 1
   (16407,20480) 2

Trunk:
numMallocs:         145663
numReallocs:        54
totalHist:
   (1040,1536) 11169
   (1047,1536) 10228
   (2071,2560) 7
   (2075,2560) 8
   (4119,4608) 4
   (4123,4608) 5
   (8211,8704) 995
   (8215,8704) 2
   (8219,8704) 4
   (12307,12800) 123266
   (16023,16384) 1
   (16407,20480) 2

So, you can see that the number of calls to malloc has dropped by about 120K. 
Also, the number of allocations of size 12K (the size of the regexpPool's
arenas) has dropped by that same amount.

So, the patch is definitely avoiding the calls to malloc - the purpose of the
patch.

(Keep in mind that the function js_ExecuteRegExp is still being called that
120K times.  So, the new check that is in the v6 patch (to see whether the pool
is empty) happens 120K times during the SunSpider run.  I wouldn't have thought
that that was enough to cause the speed discrepancy, but it's not totally
crazy, I guess.)
This patch changes 'gcStackPoolLifespan' to 'gcEmptyArenaPoolLifespan'.  The FREE_OLD_ARENAS macro factors some common code out of 'js_TraceContext', but it may not be well-positioned.  Please let me know.
Attachment #311478 - Attachment is obsolete: true
Attachment #311674 - Flags: review?(crowder)
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder

>     if (IS_GC_MARKING_TRACER(trc)) {
>+
>+#define FREE_OLD_ARENAS(pool)                                                 \

Maybe "MAYBE_FREE_TIMESTAMPED_POOL(pool)" or something?  Just a thought, this is fine.

>         /*
>-         * Release stackPool here, if it has been in existence for longer than
>-         * the limit specified by gcStackPoolLifespan.
>+         * Release the stackPool's arenas, if the stackPool has existed for
>+         * longer than the limit specified by gcEmptyArenaPoolLifespan.
>          */

Maybe this should just be a comment about the macro earlier up, and then ditch this and the next comment.  Again, just a thought.

Otherwise, looks good.  I will r+ a new patch quickly, if you choose to do one.  If you'd like to address these nits, I'll r+ quickly.
Attachment #311674 - Flags: review?(crowder) → review+
Attachment #311674 - Flags: superreview?(brendan)
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder

Brian Crowder agreed (in an IM) that no superreview was needed.
Attachment #311674 - Flags: superreview?(brendan)
I'm okay to check this in, now.
Keywords: checkin-needed
(In reply to comment #44)
> I'm okay to check this in, now.

Since it isn't a blocker, it needs approval before it can be checked-in.
Keywords: checkin-needed
Attachment #311674 - Flags: approval1.9?
The patch fixes an extreme overuse of malloc/free calls, and unnecessary ones at that.  It provides a performance improvement.  The fix is well tested (by me, on my Mac).  The bug blocks many others.  For those reasons, it should be a blocker.
Flags: blocking1.9?
There is no way that we would hold the release of Firefox 3 for this bug -- not a blocker.  That patch might get approval anyway, though we're still chasing crash regressions in JS from other recent patches (on branch and trunk), so I'm personally feeling very conservative about taking anything in JS that doesn't resolve a know crash or blocker.

Maybe others will feel differently!
Flags: blocking1.9? → blocking1.9-
(In reply to comment #46)
> The patch fixes an extreme overuse of malloc/free calls, and unnecessary ones
> at that.  It provides a performance improvement.  The fix is well tested (by
> me, on my Mac).  The bug blocks many others.  For those reasons, it should be a
> blocker.
> 

What is the performance win?  Have js tesuite + mochi tests been run?
For sunspider discussion, see comment #35
(In reply to comment #49)
> For sunspider discussion, see comment #35
> 

If you can confirm that this passes test suites we will approve..
I ran
   jsDriver.pl -e smdebug -L slow-n.tests -L spidermonkey-n.tests
in mozilla/js/tests and got 167 failures.  That's the same amount that I get on an unmodified build for that test set, so I consider that a pass.

Please let me know what you mean by "test suites" and I will run them.
(In reply to comment #51)

> Please let me know what you mean by "test suites" and I will run them.
> 

Mochitests are generally good for testing out how JS engine changes affect the browser:

http://developer.mozilla.org/en/docs/Mochitest




Robin:  Do you have mochitest results yet?
(In reply to comment #53)
> Robin:  Do you have mochitest results yet?

Yes.  I get:

Passed: 48057
Failed: 493
Todo: 1101 

on both an unmodified build and one modified with the "v7" patch.  Since both builds give the same results, I assume that the patch is good.

Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder

a1.9=beltzner
Attachment #311674 - Flags: approval1.9? → approval1.9+
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder

I'm actually retracting my review on this for now, based on the memory results from schreps last run.
Attachment #311674 - Flags: review+ → review-
What results would those be?
checkin-not-needed-yet
Keywords: checkin-needed
(In reply to comment #57)
> What results would those be?
> 

The long-running pageset results showed a 2x increase in final memory usage. 
Brian did those builds have everything else otherwise the same - e.g. jemalloc?
schrep:  I did not provide a custom .mozconfig to my TryServer build, so it should be comparable to a nightly or tinderbox build.
Whoops, I just realized why!

+
+    JS_INIT_ARENA_POOL(&cx->tempPool, "temp",
+                       8 * 1024 - 40,  /* FIXME: bug 421435 */
+                       sizeof(jsdouble), &cx->scriptStackQuota);
+

This change cannot accompany this bug, growing the tempPool by 8x is unacceptable without an equivalent timestamp/release mechanism.  Sorry I missed this in review.  :(

This completely explains the terrible footprint performance (assuming we're opening lots of tabs), since this would cause every cx (of which there are two per window) to need ~7K more than before.
(In reply to comment #59)
> (In reply to comment #57)
> > What results would those be?
> The long-running pageset results showed a 2x increase in final memory usage. 
> Brian did those builds have everything else otherwise the same - e.g. jemalloc?

Can someone point me to a description of this test, and maybe to instructions on how to repro?
 

(In reply to comment #61)
> Whoops, I just realized why! 
> +
> +    JS_INIT_ARENA_POOL(&cx->tempPool, "temp",
> +                       8 * 1024 - 40,  /* FIXME: bug 421435 */
> +                       sizeof(jsdouble), &cx->scriptStackQuota);
> +
> This change cannot accompany this bug, growing the tempPool by 8x is
> unacceptable without an equivalent timestamp/release mechanism.  Sorry I missed
> this in review.  :(
> 
> This completely explains the terrible footprint performance (assuming we're
> opening lots of tabs), since this would cause every cx (of which there are two
> per window) to need ~7K more than before.

I ran a test in which I opened a bunch of tabs and then closed them one by one.  The high water mark for (JSArena) memory use happened when there were many tempPools in existence; most of them were empty, however (0 arenas).  One of them had 3.1MB of arenas.  This high water mark is not going to be reduced by changing the tempPool size - the high water mark will get worse, in fact.

So, I suspect that the number being measured in the test referred to in comment #59 is different than the high water mark.  I would like to understand what is being measured, there.  Please let me know.

Also, is it possible that there is some confusion between this bug and bug 423815?  I recall some discussion about TryServer builds for the latter, but never before for this bug.

Attached patch v7b - 1KB tempPool (obsolete) — Splinter Review
Here's an updated patch which uses a 1KB tempPool (the old tempPool size) if someone wants to test Brian's assertion in comment #61.
The updated patch (attachment 313512 [details] [diff] [review]) does not degrade optimized SunSpider (in shell) relative to the prior patch (attachment 311674 [details] [diff] [review]), for me.
(In reply to comment #59)
> The long-running pageset results showed a 2x increase in final memory usage. 

I have used http://random.pavlov.net/membuster/index.html ("membuster") to examine memory use with and without the patch on debug builds.  On my Mac, the final memory usage is the same or similar.

How are you measuring final memory use?
Attachment #313512 - Flags: review?(crowder)
schrep:  Can you comment on comment #66 here?  I have to agree with Robin on this: I'm not sure how the results you're seeing can be possible.  I did not turn off jemalloc in any of the tryserver builds I gave you.
Comment on attachment 313512 [details] [diff] [review]
v7b - 1KB tempPool

I'd like to try this again after 1.9 is out.
Attachment #313512 - Flags: review?(crowder) → review+
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder

This can wait for post-1.9
Attachment #311674 - Flags: approval1.9+
Flags: wanted1.9.1?
OK, I've unbitrotted the v7b patch using the 1KB temp pool. This patch should apply cleanly to mozilla-central. In my own testing, I saw a 3-4% speedup on my Core2Duo E8400 system running Vista x64 Edition on a clean profile with JIT enabled.

    TEST     |   BEFORE  |   AFTER   |   SPEEDUP
-------------|-----------|-----------|-------------
   Dromaeo   | 4542.60ms | 4392.80ms | 3.3% faster
  SunSpider  | 1128.80ms | 1082.60ms | 4.1% faster

Detailed comparison links:
http://v2.dromaeo.com/?id=38452,38449 (Before,After)

SunSpider Before:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B39,39,39,38,39%5D,%223d-morph%22:%5B19,20,20,21,20%5D,%223d-raytrace%22:%5B48,48,46,56,47%5D,%22access-binary-trees%22:%5B36,34,34,50,34%5D,%22access-fannkuch%22:%5B49,74,49,48,54%5D,%22access-nbody%22:%5B22,21,21,21,21%5D,%22access-nsieve%22:%5B9,9,9,9,9%5D,%22bitops-3bit-bits-in-byte%22:%5B2,1,1,1,2%5D,%22bitops-bits-in-byte%22:%5B8,7,8,8,7%5D,%22bitops-bitwise-and%22:%5B16,16,16,16,16%5D,%22bitops-nsieve-bits%22:%5B17,16,16,17,17%5D,%22controlflow-recursive%22:%5B31,32,31,31,31%5D,%22crypto-aes%22:%5B25,25,26,26,27%5D,%22crypto-md5%22:%5B15,16,16,16,16%5D,%22crypto-sha1%22:%5B7,7,6,6,7%5D,%22date-format-tofte%22:%5B99,100,99,101,100%5D,%22date-format-xparb%22:%5B71,68,68,70,69%5D,%22math-cordic%22:%5B20,19,19,20,19%5D,%22math-partial-sums%22:%5B13,13,12,13,13%5D,%22math-spectral-norm%22:%5B5,6,6,6,5%5D,%22regexp-dna%22:%5B186,186,188,185,182%5D,%22string-base64%22:%5B13,13,13,13,13%5D,%22string-fasta%22:%5B61,63,61,62,64%5D,%22string-tagcloud%22:%5B84,83,81,82,89%5D,%22string-unpack-code%22:%5B182,186,182,182,167%5D,%22string-validate-input%22:%5B46,46,46,46,48%5D%7D
SunSpider After:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B39,38,38,38,38%5D,%223d-morph%22:%5B19,20,21,20,20%5D,%223d-raytrace%22:%5B47,109,45,51,109%5D,%22access-binary-trees%22:%5B35,34,34,47,38%5D,%22access-fannkuch%22:%5B55,53,52,55,51%5D,%22access-nbody%22:%5B21,21,20,20,21%5D,%22access-nsieve%22:%5B8,8,9,9,9%5D,%22bitops-3bit-bits-in-byte%22:%5B1,2,1,1,2%5D,%22bitops-bits-in-byte%22:%5B7,8,8,7,8%5D,%22bitops-bitwise-and%22:%5B5,11,10,16,16%5D,%22bitops-nsieve-bits%22:%5B17,17,17,16,17%5D,%22controlflow-recursive%22:%5B33,33,32,32,33%5D,%22crypto-aes%22:%5B24,24,25,26,24%5D,%22crypto-md5%22:%5B16,16,16,15,16%5D,%22crypto-sha1%22:%5B7,7,7,7,7%5D,%22date-format-tofte%22:%5B98,98,98,98,99%5D,%22date-format-xparb%22:%5B66,69,69,68,67%5D,%22math-cordic%22:%5B20,20,20,21,19%5D,%22math-partial-sums%22:%5B13,13,13,13,13%5D,%22math-spectral-norm%22:%5B5,5,5,5,5%5D,%22regexp-dna%22:%5B167,177,179,187,185%5D,%22string-base64%22:%5B13,13,13,13,12%5D,%22string-fasta%22:%5B64,62,60,62,63%5D,%22string-tagcloud%22:%5B73,74,75,75,78%5D,%22string-unpack-code%22:%5B139,139,141,141,139%5D,%22string-validate-input%22:%5B43,43,42,41,41%5D%7D

TryServer Builds:
https://build.mozilla.org/tryserver-builds/2008-09-10_11:46-bcrowder@mozilla.com-try-2d02343ac72/
Attachment #311674 - Attachment is obsolete: true
Attachment #313512 - Attachment is obsolete: true
Attachment #338014 - Flags: review?(crowder)
Attachment #338014 - Attachment is patch: true
Attachment #338014 - Attachment mime type: application/octet-stream → text/plain
Attachment #338014 - Flags: review?(crowder) → review+
Comment on attachment 338014 [details] [diff] [review]
v8 - unbitrotted v7b patch updated to mozilla-central

You had me at hello (aka v7).
Brian, I assume you'll take care of this from here?
Keywords: checkin-needed
Ryan, thanks for resuscitating this bug.  It blocks many others, so getting it in (successfully) will open an opportunity for further improvements.  Good job.
v8 landed as changeset a917af161e35
http://hg.mozilla.org/mozilla-central/rev/a917af161e35

I'm not seeing any obvious perf or allocation wins on the tinderboxen, but those numbers are all very noisy, so it's hard to make any real conclusions. On the other hand, it didn't seem to break anything, so I'm going to mark this FIXED.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
It's okay not to see tinderbox performance improvements, given how those are measured.

The primary purpose of this bug to to remove the many-too-many creations of the regexp arena.  Removing these will permit other arena-related improvements to work as they should.

It so happened that on my machine, there was a 1.2% on an optimized shell build run of SunSpider (for patch v7b).  As long as there is no performance degradation due to patch v8, then we should be happy.

Note that this bug blocked many others!
Duplicate of this bug: 423815
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.