Closed Bug 1005836 Opened 11 years ago Closed 11 years ago

Wasted space in NodePool::Block allocations

Categories

(Core :: XPCOM, defect)

32 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32
blocking-b2g 1.3T+
Tracking Status
b2g-v1.3T --- fixed

People

(Reporter: away, Assigned: n.nethercote)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file)

xul!GCGraphBuilder::NoteChild+0x42: 63e3a112 6818800200 push 28018h 63e3a117 ff1510e9cb64 call dword ptr [xul!_imp__moz_xmalloc (64cbe910)] That's just a hair over 40 pages, so it uses 41. It's not a huge waste, but I do see this number frequently in crash reports, so I assume there are many of these things.
Whiteboard: [MemShrink]
Yeah, it would be nice to save some space if possible. EdgePool would also be worth taking a look at. The purple buffer also does some stuff with blocks, but that got fixed up, and some static asserts were added to make sure we don't regress slop. (Despite the terrible name, GCGraphBuilder is part of the cycle collector, which is in the XPCOM component.)
Component: JavaScript: GC → XPCOM
from the purple buffer code: // Try to match the size of a jemalloc bucket, to minimize slop bytes. // - On 32-bit platforms sizeof(nsPurpleBufferEntry) is 12, so mEntries // is 16,380 bytes, which leaves 4 bytes for mNext. // - On 64-bit platforms sizeof(nsPurpleBufferEntry) is 24, so mEntries // is 32,544 bytes, which leaves 8 bytes for mNext. nsPurpleBufferEntry mEntries[1365]; Block() : mNext(nullptr) { // Ensure Block is the right size (see above). static_assert( sizeof(Block) == 16384 || // 32-bit sizeof(Block) == 32768, // 64-bit "ill-sized nsPurpleBuffer::Block" ); }
I think this is coming from NodePool, which says: class NodePool { private: enum { BlockSize = 8 * 1024 }; // could be int template parameter struct Block { // We create and destroy Block using NS_Alloc/NS_Free rather // than new and delete to avoid calling its constructor and // destructor. Block() { NS_NOTREACHED("should never be called"); } ~Block() { NS_NOTREACHED("should never be called"); } Block* mNext; PtrInfo mEntries[BlockSize + 1]; // +1 to store last child of last node }; PtrInfos are five words on 32-bit platforms, 4 words on 64-bit. So we have (8 * 1024 + 1) * 5 * 4 + 4 bytes being allocated for a NodePool::Block, which is...0x28018 bytes. It would be super nice to shrink PtrInfo on 32-bit platforms, but that looks...very hard. So recalibrating NodePool::BlockSize looks like the best bet here?
(In reply to Nathan Froyd (:froydnj) from comment #3) > I think this is coming from NodePool, which says: Yeah, there was some heavy inlining. NodePool is the one I was looking at.
(In reply to Nathan Froyd (:froydnj) from comment #3) > It would be super nice to shrink PtrInfo on 32-bit platforms, but that > looks...very hard. So recalibrating NodePool::BlockSize looks like the best > bet here? It looks like trying to make NodePool::Block consume 24 pages (96KB) would fit ~4K+ PtrInfo structures without any slop on 32-bit platforms. As a bonus, we'd be allocating smaller chunks of memory than we would before. We'd have a tiny bit of slop on 64-bit platforms, but that seems unavoidable given the structure sizes.
If you want to know how many of these there are in practice, you can look at telemetry and add up CYCLE_COLLECTOR_VISITED_GCED and CYCLE_COLLECTOR_VISITED_REF_COUNTED.
It looks like the median is about 14000.
Nice catch, dmajor. So who wants to take this? mccr8?
Whiteboard: [MemShrink] → [MemShrink:P2]
Assignee: nobody → n.nethercote
I just went with the dead simple approach of reducing BlockSize. On 32-bit, Block now takes up 39.997 pages. On 64-bit it takes up 63.994 pages. I can live with that level of unused space. I did some testing to see how many of these we have live at once. During normal browsing, the peaks are typically 1 or 2, and occasionally spike as high as 10 or so. A pathological case is to load about:memory and then hit "Measure" over and over; that gives seemingly unlimited increases due to the orphan DOM nodes accumulating. I got "explicit" up to 1 GiB and the Block count over 400 before I got bored and stopped. So in practice, this is often reducing the CC memory spike a couple of 10s of KiBs. Not bad for a trivial patch, and probably worth taking on Tarako.
Attachment #8422165 - Flags: review?(continuation)
BTW, having three different classes called |Block| in one file is no fun. I'm strongly tempted to rename them |EBlock|, |NBlock| and |PBlock|...
Comment on attachment 8422165 [details] [diff] [review] Avoid slop NodePool::Block allocations Review of attachment 8422165 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. A static assert like we have elsewhere in this file might be nice.
Attachment #8422165 - Flags: review?(continuation) → review+
This is a small win, but it's a trivial change, and so worthwhile for Tarako.
blocking-b2g: --- → 1.3T?
OS: Windows 7 → All
Hardware: x86_64 → All
(In reply to Nicholas Nethercote [:njn] from comment #10) > BTW, having three different classes called |Block| in one file is no fun. Cycle collector has still some odd looking code from the initial landing. It could be cleaned up a bit.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Flags: needinfo?(fabrice)
blocking-b2g: 1.3T? → 1.3T+
Flags: needinfo?(fabrice)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: