Closed
Bug 412866
Opened 17 years ago
Closed 16 years ago
Reuse regexp arena
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: pavlov, Assigned: moz)
References
Details
(Keywords: memory-footprint, perf)
Attachments
(2 files, 7 obsolete files)
3.64 KB,
text/plain
|
Details | |
13.62 KB,
patch
|
crowderbt
:
review+
|
Details | Diff | Splinter Review |
Currently in js_ExecuteRegExp it creates a new arena pool of allocations 12k in size. When running the sunspider test, I see a total of 583074 allocations coming from that arena. I tried increasing the 12k to sizes up to 64k and it made no difference on the size, so it looks like we're probably calling js_ExecuteRegExp a _lot_ of times. It might be worth while to keep the arena around as part of the context or to maybe reuse the now pinnable temp arena (although if you did that you'd have to increase the size of the temp arena...)?
Thoughts?
libmemory.dylib`zone_malloc+0x11
libSystem.B.dylib`malloc_zone_malloc+0x51
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_ArenaAllocate+0x85
libmozjs.dylib`js_ExecuteRegExp+0x111
libmozjs.dylib`match_or_replace+0x23e
libmozjs.dylib`str_replace+0x1bc
libmozjs.dylib`js_Interpret+0x67ef
libmozjs.dylib`js_Execute+0x292
libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x82
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 583074
16384 | 0
(the 8192 value there covers 8192-16383 byte allocations)
Reporter | ||
Comment 1•17 years ago
|
||
this isn't probably what you want, but this will get rid of all the allocations. you could probably do something like this along with a patch like the one in 408113 to get something that works well.
Assignee | ||
Comment 2•17 years ago
|
||
It is interesting to note that the hack is necessary because of a difference between the JSArenaPool code and the PLArenaPool code on which it is based. The latter had a "really deallocate" flag set to false in its PL_FreeArenaPool code. In JS-version, the flag is missing, and the code always deallocates the arenas, instead of saving them on a free list. This is why there are so many calls to malloc - the arena pool has to malloc and free its arenas all the time.
I think that the non-hack version of this code to call JS_FreeArenaPool, which is intended to be different from JS_FinishArenaPool. It's not, though. The non-hack fix would include fixing JS_FreeArenaPool to operate more like PL_FreeArenaPool.
The "Free" versions of functions, like PL_FreeArenaPool, are intended to signal a deallocation of all of the little allocations that came from the pool. The "Finish" versions of the functions, like PL_FinishArenaPool, are like destructors for the arena pool, meaning that the pool will not be reused.
Comment 3•17 years ago
|
||
I'm the original author of prarena.c based on the lcc guys' paper, in 1995. This became plarena.c in NSPR "2" (what we now call just NSPR), and jsarena.c some time in the late 90s.
Old icky C code, I know.
We do not want to hoard arenas on a truly-global (requiring a separate lock, which is a hotspot) freelist. So we're not going to make a change at the jsarena level to reuse regexp arenas.
jsregexp.c's use of an arena-pool in REGlobalData is suspect on its face. Two of the three JS_ARENA_ALLOCATE initial allocations may grow (JS_ARENA_GROW), but note how the two that may grow are the first and second, which run into the second and third and so must grow by allocating more space and copying! This is all misbegotten, IIRC -- rogerl (jsregexp.c rewrite author) was tuning against a slow malloc, possibly with a jsarena.c that *did* have a private freelist, and he found arenas to be faster.
I'd much rather see a patch to base this code directly on malloc/realloc/free, and benchmark that in time as well as space, with time-based caching to hold onto the space between regexp executions (using new JSContext members). See bug 408113 for cheap time-based memcaching ideas.
/be
Assignee | ||
Comment 4•17 years ago
|
||
(In reply to comment #3)
> I'm the original author of prarena.c based on the lcc guys' paper, in 1995.
That is available online, for those interested:
http://storage.webhop.net/documents/fastalloc.pdf
Assignee | ||
Comment 5•17 years ago
|
||
(In reply to comment #3)
> We do not want to hoard arenas on a truly-global (requiring a separate lock,
> which is a hotspot) freelist. So we're not going to make a change at the
> jsarena level to reuse regexp arenas.
Actually, I don't think *anyone* should want to have a global freelist for arenas. The freelists should be per-arena-pool, and not just for regexp arenas. This is because the size of the arenas are a function of the pool to which they *originally* belonged. When they go on a global freelist, they can be taken off by an arena pool for which they are unsuitably sized. This could be a cause of unnecessary fragmentation. Also, it requires searching a (possibly very long) freelist to find a suitably sized arena, which is slow and will pollute caches and TLBs (causing further slowdown). The global freelist must go, and not just for js* code.
In any case, I would help with changes if you like, Brendan. Please let me know.
Comment 6•17 years ago
|
||
Agreed on global freelist (IIRC the lcc guys did that, but for a compiler using mid-90s intermediate forms and optimizations, probably that was ok) being wrong in general. Freelist per pool is better for sure if one has to provide a freelist to make the Free vs. Finish distinction meaningful again.
But I don't believe we want that in SpiderMonkey's jsarena.c, certainly not for the jsregexp.c use-case. That use-case (which probably should be examined closely when there's more time, with an eye on avoiding two dynamically allocated stacks) is peculiar enough, and has growth patterns that look likely enough to collide in a single pool, that it would be better to investigate direct malloc/realloc usage.
There's no unwind-protect problem where we need to release to a given mark, if I remember the code correctly. We can simply malloc, realloc, and free at the end -- or put the allocations in a single-entry time-based cache, if they are not too big.
/be
/be
Assignee | ||
Comment 7•17 years ago
|
||
What is the status of this bug?
Shall I take it and continue the investigation?
I think that jsarena code would benefit from caching at least 1 arena, to avoid malloc/free thrashing. I would prove that with stats, first, before implementing any changes. Maybe a new bug for that investigation? (Or, do it as part of bug 421435...)
Comment 8•17 years ago
|
||
(In reply to comment #7)
> What is the status of this bug?
>
> Shall I take it and continue the investigation?
>
Sure
Assignee | ||
Updated•17 years ago
|
Assignee: general → moz
Priority: -- → P4
Assignee | ||
Comment 9•17 years ago
|
||
The regexp pool is the one that is thrashing (see bug 423815 and bug 421435). A slightly modified version of Stuart's patch reduces the number of calls to malloc during a SunSpider run by 700% or so.
I will post details soon.
A fix for this is necessary, otherwise SunSpider run times will be overly sensitive to changes on JS_InitArenaPool and JS_ArenaAllocate. That results in me being unable to checkin changes to those functions because the changes slow SunSpider, even though they shouldn't. :(
Assignee | ||
Comment 10•17 years ago
|
||
This patch, the main idea of which comes from Stuart Parmenter's patch, drastically reduces the number of calls to malloc. The problem was that the regexp pool was thrashing (created and destroyed, over and over).
This patch produces a reproducible 3% improvement on SunSpider run from an optimized shell build.
I hope some JS gurus will look at the patch and tell me whether the regexp pool is now in the right "scope". Also: if the regexp pool and the temp pool have the same lifetimes, then why are there two pools?
Attachment #297669 -
Attachment is obsolete: true
Attachment #310891 -
Flags: review?
Assignee | ||
Comment 11•17 years ago
|
||
Here's the sunspider-compare output for the before-and-after this patch.
Assignee | ||
Comment 12•17 years ago
|
||
Here are some MemStats output to show what the most recent patch does:
Before the patch:
numMallocs: 145663
numReallocs: 54
numFrees: 145663
hwmReal: 4147712
After the patch:
numMallocs: 21361
numReallocs: 41
numFrees: 21361
hwmReal: 4147200
So, the real memory use stayed the same at the high water mark; the number of reallocations decreased (significant because a realloc is indicative of a very large memory allocation growing, typically) from 54 to 41; and the number of calls to malloc was reduced by 681%. (!)
Here is the histogram of memory allocations for 1 run of SunSpider before the patch:
(1040,1536) 11169
(1047,1536) 10228
(2071,2560) 7
(2075,2560) 8
(4119,4608) 4
(4123,4608) 5
(8211,8704) 995
(8215,8704) 2
(8219,8704) 4
(12307,12800) 123266
(16023,16384) 1
(16407,20480) 2
(16411,20480) 4
.... and more ....
Here is the histogram after the patch:
(1040,1536) 11169
(2071,2560) 7
(4119,4608) 4
(8175,8192) 9190
(8211,8704) 995
(8215,8704) 2
(8219,8704) 4
(12267,12288) 2
(16023,16384) 1
(16407,20480) 2
(16411,20480) 4
.... more, but same histogram as above ....
The 123266 allocations of sizes ~12K are reduced to 2. Sweet.
Status: NEW → ASSIGNED
Comment 13•17 years ago
|
||
Comment on attachment 310891 [details] [diff] [review]
v3 - 3% on SunSpider opt
>+ /*
>+ * Pin a small allocation in regexpPool to keep at least the first arena in
>+ * the pool. This will avoid unnecessary calls to malloc whenever someone
>+ * uses js_AllocStack and js_FreeStack.
>+ */
>+ {
>+ void *dummy;
>+ JS_ARENA_ALLOCATE(dummy, &cx->regexpPool, sizeof(void*));
>+ }
The regexpPool isn't really used by js_AllocStack or js_FreeStack, is it? I suspect that comment was insufficiently edited after its transplant. :)
Comment 14•17 years ago
|
||
In-browser, this patch shows a 1% overall improvement, mostly concentrated in string-unpack-code. regexp-dna also shows a small improvement.
Assignee | ||
Comment 15•17 years ago
|
||
(In reply to comment #13)
> The regexpPool isn't really used by js_AllocStack or js_FreeStack, is it? I
> suspect that comment was insufficiently edited after its transplant. :)
Yes. Fixing in next patch....
Assignee | ||
Comment 16•17 years ago
|
||
Attachment #310891 -
Attachment is obsolete: true
Attachment #310891 -
Flags: review?
Assignee | ||
Comment 17•17 years ago
|
||
(In reply to comment #14)
> In-browser, this patch shows a 1% overall improvement, mostly concentrated in
> string-unpack-code. regexp-dna also shows a small improvement.
Thanks for testing! What platform was this?
Comment 18•17 years ago
|
||
(In reply to comment #17)
> (In reply to comment #14)
> > In-browser, this patch shows a 1% overall improvement, mostly concentrated in
> > string-unpack-code. regexp-dna also shows a small improvement.
>
> Thanks for testing! What platform was this?
>
Mac 0S 10.5 Intel.
Comment 19•17 years ago
|
||
Comment on attachment 310909 [details] [diff] [review]
v4 - small fix to v3
Unsolicited review, but this looks winning in its essence and time is short.
Please split the jsarena.[ch] patch out to a separate bug. Also, style and substance nits:
>+# define JS_ARENA_MEM_FREE(p) free(p)
>+# define JS_ARENA_MEM_MALLOC(rp, sz) (rp = malloc(sz))
>+# define JS_ARENA_MEM_REALLOC(rp, p, newSz) (rp = realloc((p), (newSz)))
Indent the macro definitions to start in the same column for less ragged-middle readability stress.
For C++ compilation support the last two need casts, so (substance here) you'll need to pass the pointer type in as a macro param.
>+ /*
>+ * To avoid multiple allocations in InitMatch() (in jsregexp.c), the arena
>+ * size parameter should be at least as big as:
>+ * INITIAL_BACKTRACK
>+ * + (sizeof(REProgState) * INITIAL_STATESTACK)
>+ * + (offsetof(REMatchState, parens) + avgParanSize * sizeof(RECapture))
Indenting the last three lines (the terms of the + operators) two spaces in the comment would make those lines more readable IMO.
>+ */
>+ JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp", 12 * 1024 - 40, 4,
Shouldn't the magic numbers be manifest constants, or combined into one manifest?
The alignment grain of 4 should probably be sizeof(void *).
> &cx->scriptStackQuota);
>
>+ /*
>+ * Make a small allocation in regexpPool to keep at least the first arena in
>+ * the pool. This avoids unnecessary arena allocations.
>+ */
>+ JS_ARENA_ALLOCATE(unused, &cx->regexpPool, sizeof(void*));
Nit: sizeof(void *) -- one space before * and other such declarator modes, and after the type name.
>- JS_ARENA_GROW_CAST(gData->stateStack, REProgState *, &gData->pool, sz, sz);
>+ JS_ARENA_GROW_CAST(gData->stateStack, REProgState *, &gData->cx->regexpPool, sz, sz);
..12345678901234567890123456789012345678901234567890123456789012345678901234567890
Stop shy of column 80, so best to wrap before &gData->cx->regexpPool.
/be
Assignee | ||
Comment 20•17 years ago
|
||
(In reply to comment #19)
> Unsolicited review, but this looks winning in its essence and time is short.
Brendan, thanks for the review!
> Please split the jsarena.[ch] patch out to a separate bug.
I have opened bug 424287 for the logging stuff. I'll get to that soon, because I'm getting tired of putting it in and then taking it out before every patch submission.
> Shouldn't the magic numbers be manifest constants, or combined into one
> manifest?
I will fix these constants in bug 421435, after bug 420476 is in, which will allow me to lose the stupid -40 hackery.
I have addressed your other concerns in my upcoming patch.
Assignee | ||
Comment 21•17 years ago
|
||
Attachment #310909 -
Attachment is obsolete: true
Attachment #310919 -
Flags: review?(brendan)
Comment 22•17 years ago
|
||
Comment on attachment 310919 [details] [diff] [review]
v5 - Post review from Brendan
>+ JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp",
>+ 12 * 1024 - 40, /* FIXME: bug421435 */
>+ sizeof(void *), &cx->scriptStackQuota);
>+
>+ /*
>+ * Make a small allocation in regexpPool to keep at least the first arena in
>+ * the pool. This avoids unnecessary arena allocations.
>+ */
>+ JS_ARENA_ALLOCATE(unused, &cx->regexpPool, sizeof(void *));
It seems to me we should do what bug 408113 did for cx->stackPool, and not preallocate and hoard significant memory per context as this patch does, instead delaying the 12K malloc to first regexp execution with a timestamp kept live at the front of the first real arena in the pool, and freeing that arena from the GC when its time-based cache lifetime has expired.
Sorry I didn't mention this last time.
/be
Assignee | ||
Comment 23•17 years ago
|
||
(In reply to comment #22)
> It seems to me we should do what bug 408113 did for cx->stackPool, and not
> preallocate and hoard significant memory per context as this patch does,
> instead delaying the 12K malloc to first regexp execution with a timestamp kept
> live at the front of the first real arena in the pool, and freeing that arena
> from the GC when its time-based cache lifetime has expired.
In that bug, we added a timestamp to the arena's base (in js_AllocRawStack), a garbage collector tunable parameter (stackpool lifespan), and a check of the timestamp against the lifespan in js_TraceContext. This might help free one arena per context. It was kind of a time-based virtual-memory-recovery algorithm.
But, there was already a time-based virtual-memory-recovery algorithm available: the OS's. As we know, the OS will steal physical pages that have not been touched in a while. If Firefox's stack pool arena goes unused for a time, and the OS needs memory, then it will take the physical pages from the stack pool's arena. Firefox's virtual memory size (and it's layout) will remain unchanged, but it's resident set size (the physical pages that it's actually using) will decrease. Then, when the stack pool needs memory again, it's there, transparently. Nice. Total code required for this solution: 0.
My understanding is that there are only 1 or 2 contexts per browser tab. Please correct my understanding, if appropriate.
So, I pose these questions, for Brendan and others. They are not rhetorical: Are we trying to do too much, in this bug, or can we check it in as-is? Is the new proposed functionality (like in bug 408113) worth the effort? Can we check that in with another bug? If we do need functionality like with the stack pool in bug 408113, then do we want to check the regexpPool's timestamp in the same place as we check the stack pool's timestamp?
(BTW, jemalloc (Mozilla's new memory subsystem) is only going to release memory back to the OS if it has a "chunk's" worth of it. That means that you need to have a 1MB contiguous region of memory before jemalloc releases it to the OS. I'm not sure if "releasing memory back to the OS" is motivating the change, but if it is, then know that the only other consumer of the released stack pool arena's memory is Firefox.)
Comment 24•17 years ago
|
||
Since Spidermonkey can be embedded on architectures without virtual memory, we must attempt to play nice on those architectures. You need to match my timestamp code for this pool, as well.
Comment 25•17 years ago
|
||
Also: users still complain about VM usage, even when it doesn't impact them. For a user with dozens and dozens of tabs open, at a cost of 12KB * 2 per-tab, this will be a noticeable footprint degradation (even if, in reality, they aren't suffering as a result).
Assignee | ||
Comment 26•17 years ago
|
||
(In reply to comment #24)
> Since Spidermonkey can be embedded on architectures without virtual memory, we
> must attempt to play nice on those architectures. You need to match my
> timestamp code for this pool, as well.
Ah, this makes a big difference. In that case, my comments about jemalloc don't apply, either. I'll add code for timestamp which closely follows what the stack pool code does.
Assignee | ||
Comment 27•17 years ago
|
||
(In reply to comment #25)
> Also: users still complain about VM usage, even when it doesn't impact them.
> For a user with dozens and dozens of tabs open, at a cost of 12KB * 2 per-tab,
> this will be a noticeable footprint degradation (even if, in reality, they
> aren't suffering as a result).
Yes, the footprint will be degraded. But, it will not likely be improved by freeing those arenas, once degraded.
My comment about chunks and jemalloc was meant to address this issue. (This applies only to the Firefox case on a "typical" computer/OS - one with a VM subsystem.) malloc (in jemalloc, specifically) gulps memory away from the OS in big chunks (1 or 2 MB, say). To have a noticeable effect on VM size (not RSS size), one must return an entire chunk back to the OS, which means that one must free 1MB worth of contiguous memory. This is not likely to happen even if one freed 300 tabs' regexp arenas (3.6MB), which are not likely contiguous.
However, it does make sense to free the arenas for other reasons. I'm on board with the change, especially since Spidermonkey has to work on embedded devices, etc.
Assignee | ||
Comment 28•17 years ago
|
||
This is how I think a "bug 408113"-like change should be made for regexpPool. Let me know if I am on the wrong track.
This patch achieves the same effect as the prior "v5" patch, in terms of reducing calls to malloc.
Unfortunately, it slows SunSpider by 1.9% compared to that patch. This is due to the extra code path in js_ExecuteRegExp. The extra code is merely the check to see if the timestamp needs to be added to the base of the pool's first arena.
I suspect that the equivalent check in js_AllocRawStack in jsinterp.c gave an equivalent slowdown when that change was made in 408113.
Ideas, anyone, about how to proceed?
I would like put the check for regexpPool's emptiness outside of js_ExecuteRegExp, but where?
Attachment #310919 -
Attachment is obsolete: true
Attachment #311478 -
Flags: review?(brendan)
Attachment #310919 -
Flags: review?(brendan)
Assignee | ||
Updated•17 years ago
|
Attachment #311478 -
Attachment description: v6 - bug 408133-like code for regexpPool (slows SunSpider) → v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)
Assignee | ||
Comment 29•17 years ago
|
||
(In reply to comment #28)
> I suspect that the equivalent check in js_AllocRawStack in jsinterp.c gave an
> equivalent slowdown when that change was made in 408113.
My suspicion is not confirmed. I checked. There was no speedup when I undid the change from bug 408113.
Assignee | ||
Comment 30•17 years ago
|
||
(In reply to comment #28)
> Unfortunately, it slows SunSpider by 1.9% compared to that patch.
Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build on my Mac. That is comparing to a recent build without any patch.
Comment 31•17 years ago
|
||
(In reply to comment #30)
> (In reply to comment #28)
> > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
>
> Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> on my Mac. That is comparing to a recent build without any patch.
>
Was your first test in a debug build?
Assignee | ||
Comment 32•17 years ago
|
||
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #28)
> > > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
> >
> > Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> > on my Mac. That is comparing to a recent build without any patch.
> Was your first test in a debug build?
No, definitely not.
Comment 33•17 years ago
|
||
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #30)
> > > (In reply to comment #28)
> > > > Unfortunately, it slows SunSpider by 1.9% compared to that patch.
> > >
> > > Nonetheless, v6 gives a 1.2% speedup on SunSpider in an optimized shell build
> > > on my Mac. That is comparing to a recent build without any patch.
> > Was your first test in a debug build?
> No, definitely not.
>
Perhaps you can answer the source of my question - you reported a slowdown first, then a speedup. What change accounted for this difference?
Comment 34•17 years ago
|
||
He's saying the version without the timestamping is faster than the version with, but both are faster trunk.
Assignee | ||
Comment 35•17 years ago
|
||
(In reply to comment #34)
> He's saying the version without the timestamping is faster than the version
> with, but both are faster trunk.
Yes, thanks for clarifying, Brian. The version without timestamping ("v5" patch) gave 3% speed increase on SunSpider. The version with timestamping ("v6" patch) gave 1.2% speed increase.
I was (and still am) unclear about why the magnitude of the speed increase in the timestamping version is so much less than that for the non-timestamping version.
Comment 36•17 years ago
|
||
Comment on attachment 311478 [details] [diff] [review]
v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)
>Index: jsapi.c
>===================================================================
>RCS file: /cvsroot/mozilla/js/src/jsapi.c,v
>retrieving revision 3.434
>diff -u -8 -p -r3.434 jsapi.c
>--- jsapi.c 19 Mar 2008 02:47:40 -0000 3.434
>+++ jsapi.c 24 Mar 2008 23:54:30 -0000
>@@ -2424,19 +2424,19 @@ JS_IsGCMarkingTracer(JSTracer *trc)
> return IS_GC_MARKING_TRACER(trc);
> }
>
> JS_PUBLIC_API(void)
> JS_GC(JSContext *cx)
> {
> /* Don't nuke active arenas if executing or compiling. */
> if (cx->stackPool.current == &cx->stackPool.first)
>- JS_FinishArenaPool(&cx->stackPool);
>+ JS_FreeArenaPool(&cx->stackPool);
> if (cx->tempPool.current == &cx->tempPool.first)
>- JS_FinishArenaPool(&cx->tempPool);
>+ JS_FreeArenaPool(&cx->tempPool);
Is this really part of the patch for this bug? It seems unrelated and unmotivated.
Otherwise looks good at a glance, would be good if via debugger inspection or logging you proved that the regexpPool is surviving for the desired interval during sunspider testing, which would make the slowdown even more mysterious -- or if you find a bug (such as the lifetime not being configured from 0, or some other bug), it would explain the slowdown.
Passing addl. review to crowder since I'm skimming.
/be
Attachment #311478 -
Flags: review?(crowder)
Attachment #311478 -
Flags: review?(brendan)
Attachment #311478 -
Flags: review+
Assignee | ||
Comment 37•17 years ago
|
||
(In reply to comment #36)
> > if (cx->stackPool.current == &cx->stackPool.first)
> >- JS_FinishArenaPool(&cx->stackPool);
> >+ JS_FreeArenaPool(&cx->stackPool);
> > if (cx->tempPool.current == &cx->tempPool.first)
> >- JS_FinishArenaPool(&cx->tempPool);
> >+ JS_FreeArenaPool(&cx->tempPool);
>
> Is this really part of the patch for this bug? It seems unrelated and
> unmotivated.
It is related - my logging code discriminates between "Finish"ing an arena pool, and "Free"ing an arena pool. The former is a destructor function, and should only be called once per arena pool, while the latter does not destroy the arena pool but does free all of its arenas.
When JS_ARENAMETER is not defined, and my logging code is not enabled either, then both functions operate in the same way.
So, my change is a correction which helps with my own logging, but I could easily move the change to another bug.
Comment 38•17 years ago
|
||
Comment on attachment 311478 [details] [diff] [review]
v6 - bug 408133-like code for regexpPool (slows SunSpider relative to v5)
>- JS_FinishArenaPool(&cx->stackPool);
>+ JS_FreeArenaPool(&cx->stackPool);
> if (cx->tempPool.current == &cx->tempPool.first)
>- JS_FinishArenaPool(&cx->tempPool);
>+ JS_FreeArenaPool(&cx->tempPool);
I agree with Brendan, I don't see the purpose of this change.
>+ JS_INIT_ARENA_POOL(&cx->regexpPool, "regexp",
>+ 12 * 1024 - 40, /* FIXME: bug421435 */
>+ sizeof(void *), &cx->scriptStackQuota);
Need a space after "bug" in "bug421435"
> age = JS_Now() - *(int64 *) a->base;
> if (age > (int64) acx->runtime->gcStackPoolLifespan * 1000)
>- JS_FinishArenaPool(&acx->stackPool);
>+ JS_FreeArenaPool(&acx->stackPool);
Again, the Finish -> Free change.
>+ a = acx->regexpPool.current;
>+ if (a == acx->regexpPool.first.next &&
>+ a->avail == a->base + sizeof(int64)) {
>+ age = JS_Now() - *(int64 *) a->base;
>+ if (age > (int64) acx->runtime->gcStackPoolLifespan * 1000)
>+ JS_FreeArenaPool(&acx->regexpPool);
gcStackPoolLifespan should get itself a new name, since it is being used for more. Too late to fix the GCPARAM name (?), I guess -- wish I'd had more vision there -- if it's not too late, we should do it. Suggest gcArenaPoolsLifespan or something.
Please combined these pool timestamp patterns as a macro or function, instead of cut-and-pasting.
Attachment #311478 -
Flags: review?(crowder) → review-
Comment 39•17 years ago
|
||
Sorry, missed comment #37 while reviewing. Let's do move those changes to another bug, I'll review quickly.
Assignee | ||
Comment 40•17 years ago
|
||
(In reply to comment #36)
> Otherwise looks good at a glance, would be good if via debugger inspection or
> logging you proved that the regexpPool is surviving for the desired interval
> during sunspider testing, which would make the slowdown even more mysterious --
> or if you find a bug (such as the lifetime not being configured from 0, or some
> other bug), it would explain the slowdown.
I have MemStats output from a SunSpider run of an optimized shell build to
which the patch in question was applied. The interesting stats are below, with
a partial histogram (of the "interesting" allocation sizes):
With v6:
numMallocs: 21360
numReallocs: 41
totalHist:
(1040,1536) 11169
(2071,2560) 7
(4119,4608) 4
(8175,8192) 9190
(8211,8704) 995
(8215,8704) 2
(8219,8704) 4
(12267,12288) 1
(16023,16384) 1
(16407,20480) 2
Trunk:
numMallocs: 145663
numReallocs: 54
totalHist:
(1040,1536) 11169
(1047,1536) 10228
(2071,2560) 7
(2075,2560) 8
(4119,4608) 4
(4123,4608) 5
(8211,8704) 995
(8215,8704) 2
(8219,8704) 4
(12307,12800) 123266
(16023,16384) 1
(16407,20480) 2
So, you can see that the number of calls to malloc has dropped by about 120K.
Also, the number of allocations of size 12K (the size of the regexpPool's
arenas) has dropped by that same amount.
So, the patch is definitely avoiding the calls to malloc - the purpose of the
patch.
(Keep in mind that the function js_ExecuteRegExp is still being called that
120K times. So, the new check that is in the v6 patch (to see whether the pool
is empty) happens 120K times during the SunSpider run. I wouldn't have thought
that that was enough to cause the speed discrepancy, but it's not totally
crazy, I guess.)
Assignee | ||
Comment 41•17 years ago
|
||
This patch changes 'gcStackPoolLifespan' to 'gcEmptyArenaPoolLifespan'. The FREE_OLD_ARENAS macro factors some common code out of 'js_TraceContext', but it may not be well-positioned. Please let me know.
Attachment #311478 -
Attachment is obsolete: true
Attachment #311674 -
Flags: review?(crowder)
Comment 42•17 years ago
|
||
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder
> if (IS_GC_MARKING_TRACER(trc)) {
>+
>+#define FREE_OLD_ARENAS(pool) \
Maybe "MAYBE_FREE_TIMESTAMPED_POOL(pool)" or something? Just a thought, this is fine.
> /*
>- * Release stackPool here, if it has been in existence for longer than
>- * the limit specified by gcStackPoolLifespan.
>+ * Release the stackPool's arenas, if the stackPool has existed for
>+ * longer than the limit specified by gcEmptyArenaPoolLifespan.
> */
Maybe this should just be a comment about the macro earlier up, and then ditch this and the next comment. Again, just a thought.
Otherwise, looks good. I will r+ a new patch quickly, if you choose to do one. If you'd like to address these nits, I'll r+ quickly.
Attachment #311674 -
Flags: review?(crowder) → review+
Assignee | ||
Updated•17 years ago
|
Attachment #311674 -
Flags: superreview?(brendan)
Assignee | ||
Comment 43•17 years ago
|
||
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder
Brian Crowder agreed (in an IM) that no superreview was needed.
Attachment #311674 -
Flags: superreview?(brendan)
Comment 45•17 years ago
|
||
(In reply to comment #44)
> I'm okay to check this in, now.
Since it isn't a blocker, it needs approval before it can be checked-in.
Keywords: checkin-needed
Updated•17 years ago
|
Attachment #311674 -
Flags: approval1.9?
Assignee | ||
Comment 46•17 years ago
|
||
The patch fixes an extreme overuse of malloc/free calls, and unnecessary ones at that. It provides a performance improvement. The fix is well tested (by me, on my Mac). The bug blocks many others. For those reasons, it should be a blocker.
Flags: blocking1.9?
Comment 47•17 years ago
|
||
There is no way that we would hold the release of Firefox 3 for this bug -- not a blocker. That patch might get approval anyway, though we're still chasing crash regressions in JS from other recent patches (on branch and trunk), so I'm personally feeling very conservative about taking anything in JS that doesn't resolve a know crash or blocker.
Maybe others will feel differently!
Flags: blocking1.9? → blocking1.9-
Comment 48•17 years ago
|
||
(In reply to comment #46)
> The patch fixes an extreme overuse of malloc/free calls, and unnecessary ones
> at that. It provides a performance improvement. The fix is well tested (by
> me, on my Mac). The bug blocks many others. For those reasons, it should be a
> blocker.
>
What is the performance win? Have js tesuite + mochi tests been run?
Comment 49•17 years ago
|
||
For sunspider discussion, see comment #35
Comment 50•17 years ago
|
||
(In reply to comment #49)
> For sunspider discussion, see comment #35
>
If you can confirm that this passes test suites we will approve..
Assignee | ||
Comment 51•17 years ago
|
||
I ran
jsDriver.pl -e smdebug -L slow-n.tests -L spidermonkey-n.tests
in mozilla/js/tests and got 167 failures. That's the same amount that I get on an unmodified build for that test set, so I consider that a pass.
Please let me know what you mean by "test suites" and I will run them.
Comment 52•17 years ago
|
||
(In reply to comment #51)
> Please let me know what you mean by "test suites" and I will run them.
>
Mochitests are generally good for testing out how JS engine changes affect the browser:
http://developer.mozilla.org/en/docs/Mochitest
Comment 53•17 years ago
|
||
Robin: Do you have mochitest results yet?
Assignee | ||
Comment 54•17 years ago
|
||
(In reply to comment #53)
> Robin: Do you have mochitest results yet?
Yes. I get:
Passed: 48057
Failed: 493
Todo: 1101
on both an unmodified build and one modified with the "v7" patch. Since both builds give the same results, I assume that the patch is good.
Comment 55•17 years ago
|
||
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder
a1.9=beltzner
Attachment #311674 -
Flags: approval1.9? → approval1.9+
Comment 56•17 years ago
|
||
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder
I'm actually retracting my review on this for now, based on the memory results from schreps last run.
Attachment #311674 -
Flags: review+ → review-
Updated•17 years ago
|
Keywords: checkin-needed
Comment 57•17 years ago
|
||
What results would those be?
Comment 59•17 years ago
|
||
(In reply to comment #57)
> What results would those be?
>
The long-running pageset results showed a 2x increase in final memory usage.
Brian did those builds have everything else otherwise the same - e.g. jemalloc?
Comment 60•17 years ago
|
||
schrep: I did not provide a custom .mozconfig to my TryServer build, so it should be comparable to a nightly or tinderbox build.
Comment 61•17 years ago
|
||
Whoops, I just realized why!
+
+ JS_INIT_ARENA_POOL(&cx->tempPool, "temp",
+ 8 * 1024 - 40, /* FIXME: bug 421435 */
+ sizeof(jsdouble), &cx->scriptStackQuota);
+
This change cannot accompany this bug, growing the tempPool by 8x is unacceptable without an equivalent timestamp/release mechanism. Sorry I missed this in review. :(
This completely explains the terrible footprint performance (assuming we're opening lots of tabs), since this would cause every cx (of which there are two per window) to need ~7K more than before.
Assignee | ||
Comment 62•17 years ago
|
||
(In reply to comment #59)
> (In reply to comment #57)
> > What results would those be?
> The long-running pageset results showed a 2x increase in final memory usage.
> Brian did those builds have everything else otherwise the same - e.g. jemalloc?
Can someone point me to a description of this test, and maybe to instructions on how to repro?
Assignee | ||
Comment 63•17 years ago
|
||
(In reply to comment #61)
> Whoops, I just realized why!
> +
> + JS_INIT_ARENA_POOL(&cx->tempPool, "temp",
> + 8 * 1024 - 40, /* FIXME: bug 421435 */
> + sizeof(jsdouble), &cx->scriptStackQuota);
> +
> This change cannot accompany this bug, growing the tempPool by 8x is
> unacceptable without an equivalent timestamp/release mechanism. Sorry I missed
> this in review. :(
>
> This completely explains the terrible footprint performance (assuming we're
> opening lots of tabs), since this would cause every cx (of which there are two
> per window) to need ~7K more than before.
I ran a test in which I opened a bunch of tabs and then closed them one by one. The high water mark for (JSArena) memory use happened when there were many tempPools in existence; most of them were empty, however (0 arenas). One of them had 3.1MB of arenas. This high water mark is not going to be reduced by changing the tempPool size - the high water mark will get worse, in fact.
So, I suspect that the number being measured in the test referred to in comment #59 is different than the high water mark. I would like to understand what is being measured, there. Please let me know.
Also, is it possible that there is some confusion between this bug and bug 423815? I recall some discussion about TryServer builds for the latter, but never before for this bug.
Assignee | ||
Comment 64•17 years ago
|
||
Here's an updated patch which uses a 1KB tempPool (the old tempPool size) if someone wants to test Brian's assertion in comment #61.
Assignee | ||
Comment 65•17 years ago
|
||
The updated patch (attachment 313512 [details] [diff] [review]) does not degrade optimized SunSpider (in shell) relative to the prior patch (attachment 311674 [details] [diff] [review]), for me.
Assignee | ||
Comment 66•17 years ago
|
||
(In reply to comment #59)
> The long-running pageset results showed a 2x increase in final memory usage.
I have used http://random.pavlov.net/membuster/index.html ("membuster") to examine memory use with and without the patch on debug builds. On my Mac, the final memory usage is the same or similar.
How are you measuring final memory use?
Assignee | ||
Updated•17 years ago
|
Attachment #313512 -
Flags: review?(crowder)
Comment 67•17 years ago
|
||
schrep: Can you comment on comment #66 here? I have to agree with Robin on this: I'm not sure how the results you're seeing can be possible. I did not turn off jemalloc in any of the tryserver builds I gave you.
Comment 68•17 years ago
|
||
Comment on attachment 313512 [details] [diff] [review]
v7b - 1KB tempPool
I'd like to try this again after 1.9 is out.
Attachment #313512 -
Flags: review?(crowder) → review+
Comment 69•17 years ago
|
||
Comment on attachment 311674 [details] [diff] [review]
v7 - post review from Brian Crowder
This can wait for post-1.9
Attachment #311674 -
Flags: approval1.9+
Updated•17 years ago
|
Flags: wanted1.9.1?
Comment 70•16 years ago
|
||
OK, I've unbitrotted the v7b patch using the 1KB temp pool. This patch should apply cleanly to mozilla-central. In my own testing, I saw a 3-4% speedup on my Core2Duo E8400 system running Vista x64 Edition on a clean profile with JIT enabled.
TEST | BEFORE | AFTER | SPEEDUP
-------------|-----------|-----------|-------------
Dromaeo | 4542.60ms | 4392.80ms | 3.3% faster
SunSpider | 1128.80ms | 1082.60ms | 4.1% faster
Detailed comparison links:
http://v2.dromaeo.com/?id=38452,38449 (Before,After)
SunSpider Before:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B39,39,39,38,39%5D,%223d-morph%22:%5B19,20,20,21,20%5D,%223d-raytrace%22:%5B48,48,46,56,47%5D,%22access-binary-trees%22:%5B36,34,34,50,34%5D,%22access-fannkuch%22:%5B49,74,49,48,54%5D,%22access-nbody%22:%5B22,21,21,21,21%5D,%22access-nsieve%22:%5B9,9,9,9,9%5D,%22bitops-3bit-bits-in-byte%22:%5B2,1,1,1,2%5D,%22bitops-bits-in-byte%22:%5B8,7,8,8,7%5D,%22bitops-bitwise-and%22:%5B16,16,16,16,16%5D,%22bitops-nsieve-bits%22:%5B17,16,16,17,17%5D,%22controlflow-recursive%22:%5B31,32,31,31,31%5D,%22crypto-aes%22:%5B25,25,26,26,27%5D,%22crypto-md5%22:%5B15,16,16,16,16%5D,%22crypto-sha1%22:%5B7,7,6,6,7%5D,%22date-format-tofte%22:%5B99,100,99,101,100%5D,%22date-format-xparb%22:%5B71,68,68,70,69%5D,%22math-cordic%22:%5B20,19,19,20,19%5D,%22math-partial-sums%22:%5B13,13,12,13,13%5D,%22math-spectral-norm%22:%5B5,6,6,6,5%5D,%22regexp-dna%22:%5B186,186,188,185,182%5D,%22string-base64%22:%5B13,13,13,13,13%5D,%22string-fasta%22:%5B61,63,61,62,64%5D,%22string-tagcloud%22:%5B84,83,81,82,89%5D,%22string-unpack-code%22:%5B182,186,182,182,167%5D,%22string-validate-input%22:%5B46,46,46,46,48%5D%7D
SunSpider After:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B39,38,38,38,38%5D,%223d-morph%22:%5B19,20,21,20,20%5D,%223d-raytrace%22:%5B47,109,45,51,109%5D,%22access-binary-trees%22:%5B35,34,34,47,38%5D,%22access-fannkuch%22:%5B55,53,52,55,51%5D,%22access-nbody%22:%5B21,21,20,20,21%5D,%22access-nsieve%22:%5B8,8,9,9,9%5D,%22bitops-3bit-bits-in-byte%22:%5B1,2,1,1,2%5D,%22bitops-bits-in-byte%22:%5B7,8,8,7,8%5D,%22bitops-bitwise-and%22:%5B5,11,10,16,16%5D,%22bitops-nsieve-bits%22:%5B17,17,17,16,17%5D,%22controlflow-recursive%22:%5B33,33,32,32,33%5D,%22crypto-aes%22:%5B24,24,25,26,24%5D,%22crypto-md5%22:%5B16,16,16,15,16%5D,%22crypto-sha1%22:%5B7,7,7,7,7%5D,%22date-format-tofte%22:%5B98,98,98,98,99%5D,%22date-format-xparb%22:%5B66,69,69,68,67%5D,%22math-cordic%22:%5B20,20,20,21,19%5D,%22math-partial-sums%22:%5B13,13,13,13,13%5D,%22math-spectral-norm%22:%5B5,5,5,5,5%5D,%22regexp-dna%22:%5B167,177,179,187,185%5D,%22string-base64%22:%5B13,13,13,13,12%5D,%22string-fasta%22:%5B64,62,60,62,63%5D,%22string-tagcloud%22:%5B73,74,75,75,78%5D,%22string-unpack-code%22:%5B139,139,141,141,139%5D,%22string-validate-input%22:%5B43,43,42,41,41%5D%7D
TryServer Builds:
https://build.mozilla.org/tryserver-builds/2008-09-10_11:46-bcrowder@mozilla.com-try-2d02343ac72/
Attachment #311674 -
Attachment is obsolete: true
Attachment #313512 -
Attachment is obsolete: true
Attachment #338014 -
Flags: review?(crowder)
Updated•16 years ago
|
Attachment #338014 -
Attachment is patch: true
Attachment #338014 -
Attachment mime type: application/octet-stream → text/plain
Updated•16 years ago
|
Attachment #338014 -
Flags: review?(crowder) → review+
Comment 71•16 years ago
|
||
Comment on attachment 338014 [details] [diff] [review]
v8 - unbitrotted v7b patch updated to mozilla-central
You had me at hello (aka v7).
Comment 72•16 years ago
|
||
Brian, I assume you'll take care of this from here?
Keywords: checkin-needed
Assignee | ||
Comment 73•16 years ago
|
||
Ryan, thanks for resuscitating this bug. It blocks many others, so getting it in (successfully) will open an opportunity for further improvements. Good job.
Comment 74•16 years ago
|
||
v8 landed as changeset a917af161e35
http://hg.mozilla.org/mozilla-central/rev/a917af161e35
I'm not seeing any obvious perf or allocation wins on the tinderboxen, but those numbers are all very noisy, so it's hard to make any real conclusions. On the other hand, it didn't seem to break anything, so I'm going to mark this FIXED.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•16 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 75•16 years ago
|
||
It's okay not to see tinderbox performance improvements, given how those are measured.
The primary purpose of this bug to to remove the many-too-many creations of the regexp arena. Removing these will permit other arena-related improvements to work as they should.
It so happened that on my machine, there was a 1.2% on an optimized shell build run of SunSpider (for patch v7b). As long as there is no performance degradation due to patch v8, then we should be happy.
Note that this bug blocked many others!
Updated•16 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•