In bug 675136 comment 2 I found we are allocating lots of 1056-byte chunks, which jemalloc rounds up to 2048 bytes. Here are the culprits:

  JS_InitArenaPool(&codePool, "code", 1024, sizeof(jsbytecode));
  JS_InitArenaPool(&notePool, "note", 1024, sizeof(jssrcnote));

1024 is the size of each chunk in the arena. Oh wait, that's the payload size; we also have 32 bytes in a header, resulting in 1056 bytes. Awesome.

All JSArenaPools use a power of two for their chunk size, resulting in this problem, except in jscntxt.cpp where we have this:

  static const size_t ARENA_HEADER_SIZE_HACK = 40;
  static const size_t TEMP_POOL_CHUNK_SIZE = 4096 - ARENA_HEADER_SIZE_HACK;
  ...
  JS_InitArenaPool(&cx->tempPool, "temp", TEMP_POOL_CHUNK_SIZE, sizeof(jsdouble));
  JS_InitArenaPool(&cx->regExpPool, "regExp", TEMP_POOL_CHUNK_SIZE, sizeof(int));

cdleary had the same problem with tempPool and regExpPool in bug 558971, and handled it in the above fashion. Wow, no gold star for you.

I intend to modify JSArena so that (a) it'll assert if you don't pass in a power of two, and (b) the payload size is adjusted by the size of the header. That will allow ARENA_HEADER_SIZE_HACK to be removed.

Avoiding the excess allocation for codePool will result in an 8% reduction in the amount of cumulative heap allocation done when starting the browser and opening Gmail(!) (See bug 675136 comments 1 and 2 for how I computed that.)

I might also experiment with a larger chunk size than 1024 for codePool. Maybe in a different bug.
Nicholas, I bet PL_InitArenaPool has similar issues. Not sure about making your proposed changes to NSPR, but we can at least fix the callers as needed...
bz: possibly, but I didn't see it being a problem in practice on gmail. Any workloads you can suggest that will stress the NSPR arenas more? Hmm, I think this bug is a dup of bug 408921, and that bug makes it sound like it's harder than it first seems... but that's one of numerous long-winded bugs that Robin Bate Boerop worked on but never landed. A fresh look will be good.
(In reply to comment #2)
> bz: possibly, but I didn't see it be a problem in practice on gmail. Any
> workloads you can suggest that will stress the NSPR arenas more?

Bugs 414088 and 419131 have more about NSPR arenas.
> Any workloads you can suggest that will stress the NSPR arenas more?

Hmm. We use NSPR arenas for stringbundles, the pref API, the category manager, the rulehash, presshell's arenas, line layout temporary data structures, display lists, and the various consumers of nsFixedSizeAllocator. Also various XBL/chrome stuff.

Of these, the obvious ones under page control that are not transient data at the moment are the rulehash and the presshell arenas. The presshell arenas are the "layout" line in about:memory, so any testcase that gives you a high "layout" number would stress-test that stuff. That's a 4096-byte setup, so look for allocations slightly larger than 4096 bytes while loading, say, large tables, especially tables with large colspans and rowspans.

Note that the presshell's stack arena works around this by using a block size of 4044+sizeof(void*) and hoping that plays nice with the allocator. So if we could take the guesswork out of that, it'd be nice.
Oh, to be clear, the stack arena is separate from the plarenas used by frame allocations; those are the PLArenas sized to 4096.
> Note that the presshell's stack arena works around this by using a block size
> of 4044+sizeof(void*) and hoping that plays nice with the allocator. So if we
> could take the guesswork out of that, it'd be nice.

One approach is bug 675226.
Created attachment 550272 [details] [diff] [review]
patch (on top of patch in bug 675545)

This ensures power-of-two arena sizes, and also increases some of the arena size requests, reducing the amount of mallocing going on. This makes very little difference to Sunspider (a tiny improvement according to Cachegrind) because it doesn't have much code, but on loading gmail the peak number of arenas in the "code" arena pool drops from ~2700 to ~650. The "script_analyze" peak numbers also dropped from many hundreds.
Created attachment 550295 [details] [diff] [review]
patch v2 (on top of patch in bug 675545)

Slightly tweaked version.
I opened bug 676457 for PLArenas.
Created attachment 550907 [details] [diff] [review]
instrumentation patch

I tried the attached instrumentation patch to see the peak amount of wasted memory. When starting the browser and opening Gmail, at peak there were 9,999,970 bytes requested and 10,002,432 allocated. This is because this problem had already been recognised and fixed (via ARENA_HEADER_SIZE_HACK) for cx->tempPool, which is by far the most important case. Still, it's worth fixing properly for simplicity and to avoid future problems.
Backed out of inbound due to possible OS X 10.5 Ts regression, a=dao:
http://hg.mozilla.org/integration/mozilla-inbound/rev/2c7e8a50af12
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/Bqnlw-ZvI54

Ts, MIN Dirty Profile increase 7.31% on MacOSX 10.5.8 Mozilla-Inbound
    Previous: avg 720.526 stddev 4.673 of 30 runs up to revision 2642c7b0dc60
    New     : avg 773.210 stddev 0.431 of 5 runs since revision 5f430b3d9aa4
    Change  : +52.684 (7.31% / z=11.275)

Ts, MED Dirty Profile increase 7.55% on MacOSX 10.5.8 Mozilla-Inbound
    Previous: avg 722.389 stddev 4.025 of 30 runs up to revision 2642c7b0dc60
    New     : avg 776.958 stddev 2.870 of 5 runs since revision 5f430b3d9aa4
    Change  : +54.568 (7.55% / z=13.558)
Even if it's almost unnoticeable, it looks like there may also be a plain Ts and Ts,paint regression on 10.5.8; the graph is higher. But there's no change on 10.6.
Thanks for backing out, Ed. There are two things going on in the patch:

1. It avoids the clownshoes (i.e. wasted space due to rounding). This should be a pure win, though a smaller one on Mac OS because its allocator doesn't round up as aggressively as jemalloc.

2. It increases several of the JSArena chunk sizes. I thought this would be a perf win because bigger arenas make for fewer calls to malloc, and these arena pools typically have lots of arenas.

I'll try the two parts individually and try to work out what's going on.
I did a Talos try run with just part 1 from comment 15. The results looked innocuous, but who knows if I was reading them correctly. So I'll roll the dice again: http://hg.mozilla.org/integration/mozilla-inbound/rev/240cfe9e5c2c
Backed out again. This time the fly in the ointment was a Win7 debug xpcshell failure in services\sync\tests\unit\test_errorhandler_filelog.js, like https://tbpl.mozilla.org/php/getParsedLog.php?id=6322901. I'm also a touch suspicious of the explosion in https://tbpl.mozilla.org/php/getParsedLog.php?id=6324035&full=1, a WinXP debug mochitest-1 that hit "Assertion failure: fun->getProto() == proto, at e:\builds\moz2_slave\m-in-w32-dbg\build\js\src\jsfun.h:458" while running test_audio_event_adopt.html (and then turned purple because it failed to kill off the unhappy process).
Bug 684039 completely rewrote JSArena and fixed the clownshoes in the process.