Closed Bug 675150 Opened 14 years ago Closed 14 years ago

Avoid wasted space in JSArenaPools due to jemalloc rounding up

Tracking

()

Status:

RESOLVED DUPLICATE of bug 684039

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: [MemShrink:P2][clownshoes])

Attachments

(2 files, 1 obsolete file)

patch (on top of patch in bug 675545) 14 years ago Nicholas Nethercote [inactive] 4.23 KB, patch		Details \| Diff \| Splinter Review
patch v2 (on top of patch in bug 675545) 14 years ago Nicholas Nethercote [inactive] 8.98 KB, patch	cdleary : review+	Details \| Diff \| Splinter Review
instrumentation patch 14 years ago Nicholas Nethercote [inactive] 3.31 KB, patch		Details \| Diff \| Splinter Review

Nicholas Nethercote [inactive]

Assignee

Description

•

14 years ago

In bug 675136 comment 2 I found we are allocating lots of 1056 byte chunks, which jemalloc rounds up to 2048 bytes. Here are the culprits: JS_InitArenaPool(&codePool, "code", 1024, sizeof(jsbytecode)); JS_InitArenaPool(&notePool, "note", 1024, sizeof(jssrcnote)); 1024 is the size of each chunk in the arena. Oh wait, that's the payload size, we also have 32 bytes in a header, resulting in 1056 bytes. Awesome. All JSArenaPools use a power-of-two for their chunk size, resulting in this problem, except in jscntxt.cpp where we have this: static const size_t ARENA_HEADER_SIZE_HACK = 40; static const size_t TEMP_POOL_CHUNK_SIZE = 4096 - ARENA_HEADER_SIZE_HACK; ... JS_InitArenaPool(&cx->tempPool, "temp", TEMP_POOL_CHUNK_SIZE, sizeof(jsdouble)); JS_InitArenaPool(&cx->regExpPool, "regExp", TEMP_POOL_CHUNK_SIZE, sizeof(int)); cdleary had the same problem with tempPool and regExpPool in bug 558971, and handled it in the above fashion. Wow, no gold star for you. I intend to modify JSArena so that (a) it'll assert if you don't pass in a power-of-two, and (b) the payload size is adjusted by the size of the header. That will allow ARENA_HEADER_SIZE_HACK to be removed. Avoiding the excess allocation for codePool will result in an 8% reduction in the amount of cumulative heap allocation done when starting the browser and opening gmail(!) (See bug 675136 comment 1 and 2 for how I computed that.) I might also experiment with a larger chunk size than 1024 for codePool. Maybe in a different bug.

Boris Zbarsky [:bzbarsky]

Comment 1

•

14 years ago

Nicholas, I bet PL_InitArenaPool has similar issues. Not sure about making your proposed changes to NSPR, but we can at least fix the callers as needed...

Nicholas Nethercote [inactive]

Assignee

Comment 2

•

14 years ago

bz: possibly, but I didn't see it be a problem in practice on gmail. Any workloads you can suggest that will stress the NSPR arenas more? Hmm, I think this bug is a dup of bug 408921, and that bug makes it sound like it's harder than it first seems... but that's one of a numerous long-winded bugs that Robin Bate Boerop worked on but never landed. A fresh look will be good.

Nicholas Nethercote [inactive]

Assignee

Comment 3

•

14 years ago

(In reply to comment #2) > bz: possibly, but I didn't see it be a problem in practice on gmail. Any > workloads you can suggest that will stress the NSPR arenas more? Bug 414088 and 419131 have more about NSPR arenas.

Boris Zbarsky [:bzbarsky]

Comment 4

•

14 years ago

> Any workloads you can suggest that will stress the NSPR arenas more? Hmm. We use NSPR arenas for stringbundles, pref API, category manager, the rulehash, presshell's arenas, line layout temporary data structures, display lists, and the various consumers of nsFixedSizeAllocator. Also various XBL/chrome stuff. Of these, the obvious ones under page control that are not transient data at the moment are rulehash and the presshell arenas. The presshell arenas are the "layout" line in about:memory. So any testcase that gives you a high "layout" number would stress test that stuff. That's a 4096-byte setup, so look for allocations slightly larger than 4096 bytes while loading, say, large tables? Especially tables with large colspans and rowspans. Note that the presshell's stack arena works around this by using a block size of 4044+sizeof(void*) and hoping that plays nice with the allocator. So if we could take the guesswork out of that, it'd be nice.

Boris Zbarsky [:bzbarsky]

Comment 5

•

14 years ago

Oh, to be clear the stack arena is separate from the plarenas used by frame allocations; those are the PLArena sized to 4096.

Justin Lebar (not reading bugmail)

Comment 6

•

14 years ago

> Note that the presshell's stack arena works around this by using a block size > of 4044+sizeof(void*) and hoping that plays nice with the allocator. So if we > could take the guesswork out of that, it'd be nice. One approach is bug 675226.

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Depends on: 675545

Nicholas Nethercote [inactive]

Assignee

Comment 7

•

14 years ago

Attached patch patch (on top of patch in bug 675545) (obsolete) — Details — Splinter Review

This ensures power-of-two arena sizes, and also increases some of the arena size requests, reducing the amount of mallocing going on. This makes very little difference to Sunspider (a tiny improvement according to Cachegrind) because it doesn't have much code, but on loading gmail the peak number of arenas in the "code" arena pool drops from ~2700 to ~650. The "script_analyze" peak numbers also dropped down from many hundreds.

Nicholas Nethercote [inactive]

Assignee

Comment 9

•

14 years ago

Attached patch patch v2 (on top of patch in bug 675545) — Details — Splinter Review

Slightly tweaked version.

Attachment #550272 - Attachment is obsolete: true

Attachment #550295 - Flags: review?(brendan)

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Whiteboard: [MemShrink] → [MemShrink][clownshoes]

Nicholas Nethercote [inactive]

Assignee

Comment 10

•

14 years ago

I opened 676457 for PLArenas.

Nicholas Nethercote [inactive]

Assignee

Comment 11

•

14 years ago

Attached patch instrumentation patch — Details — Splinter Review

I tried the attached instrumentation patch to see the peak amount of wasted memory. When starting the browser and opening Gmail, at peak there were 9,999,970 bytes requested and 1,000,2432 allocated. This is because this problem had already been recognised and fixed (via ARENA_HEADER_SIZE_HACK) for cx->tempPool, which is by far the most important case. Still, it's worth fixing properly for simplicity and to avoid future problems.

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Attachment #550295 - Flags: review?(brendan) → review?(cdleary)

Nicholas Nethercote [inactive]

Assignee

Updated

•

14 years ago

Whiteboard: [MemShrink][clownshoes] → [MemShrink:P2][clownshoes]

Chris Leary [:cdleary] (not checking bugmail)

Updated

•

14 years ago

Attachment #550295 - Flags: review?(cdleary) → review+

Nicholas Nethercote [inactive]

Assignee

Comment 12

•

14 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/5f430b3d9aa4

Whiteboard: [MemShrink:P2][clownshoes] → [MemShrink:P2][clownshoes][inbound]

Ed Morley [:emorley]

Comment 13

•

14 years ago

Backed out of inbound due to possible OS X 10.5 Ts regression, a=dao: http://hg.mozilla.org/integration/mozilla-inbound/rev/2c7e8a50af12 https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/Bqnlw-ZvI54 Ts, MIN Dirty Profile increase 7.31% on MacOSX 10.5.8 Mozilla-Inbound Previous: avg 720.526 stddev 4.673 of 30 runs up to revision 2642c7b0dc60 New : avg 773.210 stddev 0.431 of 5 runs since revision 5f430b3d9aa4 Change : +52.684 (7.31% / z=11.275) Ts, MED Dirty Profile increase 7.55% on MacOSX 10.5.8 Mozilla-Inbound Previous: avg 722.389 stddev 4.025 of 30 runs up to revision 2642c7b0dc60 New : avg 776.958 stddev 2.870 of 5 runs since revision 5f430b3d9aa4 Change : +54.568 (7.55% / z=13.558)

OS: Linux → All

Hardware: x86_64 → All

Whiteboard: [MemShrink:P2][clownshoes][inbound] → [MemShrink:P2][clownshoes]

Version: unspecified → Trunk

Marco Bonardo [:mak]

Comment 14

•

14 years ago

Even if almost unnoticeable, looks like there may be also a plain Ts and Ts,paint regression on 10.5.8, the graph is higher. But no change on 10.6

Nicholas Nethercote [inactive]

Assignee

Comment 15

•

14 years ago

Thanks for backing out, Ed. There are two things going on in the patch: 1. It avoids the clownshoes (i.e. wasted space due to rounding). This should be pure win, though a smaller win on Mac OS because it's allocator doesn't round up as hard as jemalloc. 2. It increases the size of several JSArena sizes. I thought this would be a perf win because bigger arenas makes for fewer calls to malloc, and these arena pools typically have lots of arenas. I'll try the two parts individually, try to work out what's going on.

Nicholas Nethercote [inactive]

Assignee

Comment 16

•

14 years ago

I did a Talos try run with just part 1 from comment 15. The results looked innocuous, but who knows if I was reading them correctly. So I'll roll the dice again: http://hg.mozilla.org/integration/mozilla-inbound/rev/240cfe9e5c2c

Whiteboard: [MemShrink:P2][clownshoes] → [MemShrink:P2][clownshoes][inbound]

Phil Ringnalda (:philor)

Comment 17

•

14 years ago

Backed out again, this time the fly in the ointment was a Win7 debug xpcshell failure in services\sync\tests\unit\test_errorhandler_filelog.js like https://tbpl.mozilla.org/php/getParsedLog.php?id=6322901, and I'm a touch suspicious of the explosion in https://tbpl.mozilla.org/php/getParsedLog.php?id=6324035&full=1, a WinXP debug mochitest-1 that hit an "Assertion failure: fun->getProto() == proto, at e:\builds\moz2_slave\m-in-w32-dbg\build\js\src\jsfun.h:458" running test_audio_event_adopt.html (and then turned purple because it failed to kill off the unhappy process).

Whiteboard: [MemShrink:P2][clownshoes][inbound] → [MemShrink:P2][clownshoes]

Nicholas Nethercote [inactive]

Assignee

Comment 18

•

14 years ago

Bug 684039 completely rewrote JSArena and fixed the clownshoes in the process.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.