Last Comment Bug 675150 - Avoid wasted space in JSArenaPools due to jemalloc rounding up
: Avoid wasted space in JSArenaPools due to jemalloc rounding up
Status: RESOLVED DUPLICATE of bug 684039
[MemShrink:P2][clownshoes]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 4 votes (vote)
: ---
Assigned To: Nicholas Nethercote [:njn] (on vacation until July 11)
:
Mentors:
: 408921 (view as bug list)
Depends on: 675545
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-29 00:12 PDT by Nicholas Nethercote [:njn] (on vacation until July 11)
Modified: 2011-10-16 21:44 PDT (History)
26 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (on top of patch in bug 675545) (4.23 KB, patch)
2011-08-02 18:17 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review
patch v2 (on top of patch in bug 675545) (8.98 KB, patch)
2011-08-02 21:37 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
cdleary: review+
Details | Diff | Review
instrumentation patch (3.31 KB, patch)
2011-08-04 17:35 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review

Description Nicholas Nethercote [:njn] (on vacation until July 11) 2011-07-29 00:12:54 PDT
In bug 675136 comment 2 I found we are allocating lots of 1056 byte chunks, which jemalloc rounds up to 2048 bytes.  Here are the culprits:

    JS_InitArenaPool(&codePool, "code", 1024, sizeof(jsbytecode));
    JS_InitArenaPool(&notePool, "note", 1024, sizeof(jssrcnote));

1024 is the size of each chunk in the arena.  Oh wait, that's the payload size, we also have 32 bytes in a header, resulting in 1056 bytes.  Awesome.

All JSArenaPools use a power-of-two for their chunk size, resulting in this problem, except in jscntxt.cpp where we have this:

  static const size_t ARENA_HEADER_SIZE_HACK = 40;
  static const size_t TEMP_POOL_CHUNK_SIZE = 4096 - ARENA_HEADER_SIZE_HACK;
  ...
  JS_InitArenaPool(&cx->tempPool, "temp", TEMP_POOL_CHUNK_SIZE, sizeof(jsdouble));
  JS_InitArenaPool(&cx->regExpPool, "regExp", TEMP_POOL_CHUNK_SIZE, sizeof(int));

cdleary had the same problem with tempPool and regExpPool in bug 558971, and handled it in the above fashion.  Wow, no gold star for you.

I intend to modify JSArena so that (a) it'll assert if you don't pass in a power-of-two, and (b) the payload size is adjusted by the size of the header.  That will allow ARENA_HEADER_SIZE_HACK to be removed.  Avoiding the excess allocation for codePool will result in an 8% reduction in the amount of cumulative heap allocation done when starting the browser and opening gmail(!)  (See bug 675136 comment 1 and 2 for how I computed that.)

I might also experiment with a larger chunk size than 1024 for codePool.  Maybe in a different bug.
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-07-29 07:39:35 PDT
Nicholas, I bet PL_InitArenaPool has similar issues.  Not sure about making your proposed changes to NSPR, but we can at least fix the callers as needed...
Comment 2 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-07-29 13:42:49 PDT
bz: possibly, but I didn't see it be a problem in practice on gmail.  Any workloads you can suggest that will stress the NSPR arenas more?

Hmm, I think this bug is a dup of bug 408921, and that bug makes it sound like it's harder than it first seems... but that's one of a numerous long-winded bugs that Robin Bate Boerop worked on but never landed.  A fresh look will be good.
Comment 3 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-07-29 13:44:33 PDT
(In reply to comment #2)
> bz: possibly, but I didn't see it be a problem in practice on gmail.  Any
> workloads you can suggest that will stress the NSPR arenas more?

Bug 414088 and 419131 have more about NSPR arenas.
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-07-29 21:22:11 PDT
> Any workloads you can suggest that will stress the NSPR arenas more?

Hmm.  We use NSPR arenas for stringbundles, pref API, category manager, the rulehash, presshell's arenas, line layout temporary data structures, display lists, and the various consumers of nsFixedSizeAllocator.  Also various XBL/chrome stuff.

Of these, the obvious ones under page control that are not transient data at the moment are rulehash and the presshell arenas.

The presshell arenas are the "layout" line in about:memory.  So any testcase that gives you a high "layout" number would stress test that stuff.  That's a 4096-byte setup, so look for allocations slightly larger than 4096 bytes while loading, say, large tables?  Especially tables with large colspans and rowspans.

Note that the presshell's stack arena works around this by using a block size of 4044+sizeof(void*) and hoping that plays nice with the allocator.  So if we could take the guesswork out of that, it'd be nice.
Comment 5 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-07-29 21:22:55 PDT
Oh, to be clear the stack arena is separate from the plarenas used by frame allocations; those are the PLArena sized to 4096.
Comment 6 Justin Lebar (not reading bugmail) 2011-07-29 21:24:41 PDT
> Note that the presshell's stack arena works around this by using a block size 
> of 4044+sizeof(void*) and hoping that plays nice with the allocator.  So if we 
> could take the guesswork out of that, it'd be nice.

One approach is bug 675226.
Comment 7 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-02 18:17:46 PDT
Created attachment 550272 [details] [diff] [review]
patch (on top of patch in bug 675545)

This ensures power-of-two arena sizes, and also increases some of the arena size requests, reducing the amount of mallocing going on.  This makes very little difference to Sunspider (a tiny improvement according to Cachegrind) because it doesn't have much code, but on loading gmail the peak number of arenas in the "code" arena pool drops from ~2700 to ~650.  The "script_analyze" peak numbers also dropped down from many hundreds.
Comment 8 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-02 18:20:32 PDT
*** Bug 408921 has been marked as a duplicate of this bug. ***
Comment 9 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-02 21:37:47 PDT
Created attachment 550295 [details] [diff] [review]
patch v2 (on top of patch in bug 675545)

Slightly tweaked version.
Comment 10 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 17:31:49 PDT
I opened 676457 for PLArenas.
Comment 11 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 17:35:55 PDT
Created attachment 550907 [details] [diff] [review]
instrumentation patch

I tried the attached instrumentation patch to see the peak amount of wasted memory.  When starting the browser and opening Gmail, at peak there were   9,999,970 bytes requested and 1,000,2432 allocated.  This is because this problem had already been recognised and fixed (via ARENA_HEADER_SIZE_HACK) for cx->tempPool, which is by far the most important case.  Still, it's worth fixing properly for simplicity and to avoid future problems.
Comment 12 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-29 17:38:53 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/5f430b3d9aa4
Comment 13 Ed Morley [:emorley] 2011-08-30 04:02:56 PDT
Backed out of inbound due to possible OS X 10.5 Ts regression, a=dao:
http://hg.mozilla.org/integration/mozilla-inbound/rev/2c7e8a50af12

https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/Bqnlw-ZvI54

Ts, MIN Dirty Profile increase 7.31% on MacOSX 10.5.8 Mozilla-Inbound
    Previous: avg 720.526 stddev 4.673 of 30 runs up to revision 2642c7b0dc60
    New     : avg 773.210 stddev 0.431 of 5 runs since revision 5f430b3d9aa4
    Change  : +52.684 (7.31% / z=11.275)

Ts, MED Dirty Profile increase 7.55% on MacOSX 10.5.8 Mozilla-Inbound
    Previous: avg 722.389 stddev 4.025 of 30 runs up to revision 2642c7b0dc60
    New     : avg 776.958 stddev 2.870 of 5 runs since revision 5f430b3d9aa4
    Change  : +54.568 (7.55% / z=13.558)
Comment 14 Marco Bonardo [::mak] 2011-08-30 05:17:50 PDT
Even if almost unnoticeable, looks like there may be also a plain Ts and Ts,paint regression on 10.5.8, the graph is higher. But no change on 10.6
Comment 15 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-30 15:59:02 PDT
Thanks for backing out, Ed.

There are two things going on in the patch:

1. It avoids the clownshoes (i.e. wasted space due to rounding).  This should be pure win, though a smaller win on Mac OS because it's allocator doesn't round up as hard as jemalloc.

2. It increases the size of several JSArena sizes.  I thought this would be a perf win because bigger arenas makes for fewer calls to malloc, and these arena pools typically have lots of arenas.

I'll try the two parts individually, try to work out what's going on.
Comment 16 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-09-07 16:50:31 PDT
I did a Talos try run with just part 1 from comment 15.  The results looked innocuous, but who knows if I was reading them correctly.  So I'll roll the dice again:

http://hg.mozilla.org/integration/mozilla-inbound/rev/240cfe9e5c2c
Comment 17 Phil Ringnalda (:philor) 2011-09-07 21:27:24 PDT
Backed out again, this time the fly in the ointment was a Win7 debug xpcshell failure in services\sync\tests\unit\test_errorhandler_filelog.js like https://tbpl.mozilla.org/php/getParsedLog.php?id=6322901, and I'm a touch suspicious of the explosion in https://tbpl.mozilla.org/php/getParsedLog.php?id=6324035&full=1, a WinXP debug mochitest-1 that hit an "Assertion failure: fun->getProto() == proto, at e:\builds\moz2_slave\m-in-w32-dbg\build\js\src\jsfun.h:458" running test_audio_event_adopt.html (and then turned purple because it failed to kill off the unhappy process).
Comment 18 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-10-16 21:44:16 PDT
Bug 684039 completely rewrote JSArena and fixed the clownshoes in the process.

*** This bug has been marked as a duplicate of bug 684039 ***

Note You need to log in before you can comment on or make changes to this bug.