Closed Bug 403830
Opened 17 years ago
Closed 17 years ago

Use arena pools and a separate heap for content

Categories: Core :: DOM: Core & HTML, defect, P2
Status: RESOLVED FIXED
People: Reporter: pavlov; Assigned: smaug
Keywords: memory-footprint, perf

Attachments (2 files, 6 obsolete files):
47.30 KB, patch (Details | Diff | Splinter Review)
80.75 KB, patch, jst: superreview+ (Details | Diff | Splinter Review)
Given that content nodes, text nodes/fragments, etc. are tied to the lifetime of the document, it would make sense to use pools to group those allocations together so that we can destroy them together to reduce fragmentation. I would suggest that we also look at doing the pool allocations on a separate heap that all documents use, to further avoid fragmentation.

As an experiment, it would be pretty easy to allocate all content stuff in its own heap (which would have its own fragmentation problems) to get all of these allocations out of everything else and see what the other heap's fragmentation looks like with this in place.

Things jst and sicking suggested changing the allocators for include: DOM nodes, nsAttrAndChild stuff, DOM slots, text fragments, nodeinfo and attr values.
Assignee
Comment 1 • 17 years ago
(In reply to comment #0)
> Given that content nodes, text nodes/fragments, etc are tied to the lifetime of
> the document,

Since when are content nodes tied to the lifetime of the document? Have I missed that change during 1.9? Though, all nodes created for a document use the same nodeinfomanager, so deleting the nodeinfomanager could be used as a hint for when to release a pool. Nodeinfos already use one pool, but that is not per document.
Reporter
Comment 2 • 17 years ago
sorry, i should have said they are generally tied to that lifetime. they can certainly be moved to other documents.
Assignee
Comment 3 • 17 years ago
But when they are moved to some other document, only their nodeinfo is changed.
Assignee
Comment 4 • 17 years ago
And so the nodeinfo would be from one pool and the content node would be from some other pool. That doesn't really work, unless that situation keeps both pools alive.
Yes, we would need to be able to keep the pool alive longer than the document. Most likely all nodes will need to keep a pointer to the pool they were allocated from and refcount the pool itself, such that nodes are removed from the right pool when they die, and that the pool doesn't go away until all things in it are gone.
Reporter
Updated • 17 years ago
Flags: blocking1.9+
Priority: -- → P2
Assignee
Comment 7 • 17 years ago
(In reply to comment #5)
> Most likely all nodes will need to keep a pointer to the pool they were
> allocated

Are we willing to increase the size of content objects? Maybe yes in this case.
I'd say that we are. I think we should stick a pointer to it in each nsAttrAndChildArray, nsTextFragment and nsAttributeNode; that should cover all nodes, iirc. We could try to get fancy and just stick a pointer in the nsNodeInfoManager, and if a node is moved between documents (and thus gets another nodeinfomanager) we'd stick a pointer in the slots. However, we'd have to pass that pointer into nsTextFragment and nsAttrAndChildArray any time we make a call that could need to allocate memory (which is most calls), so I'm not sure it's worth it.
Assignee
Comment 9 • 17 years ago
Why not keep it simple and have the pointer in nsINode?
Because then we'll still have to pass it in to nsAttrAndChildArray/nsTextFragment whenever we're calling into it. I think the result would be uglier code. Would you have the cycles to work on this, Smaug? Feel free to use any implementation you think would be best, of course.
Assignee: jonas → Olli.Pettay
Reporter
Comment 11 • 17 years ago
Whoever ends up doing this, we probably want an nsArena class that is initially implemented in terms of PL_ARENA but can be replaced with something better.
Assignee
Comment 12 • 17 years ago
Our current arena implementation isn't very good for arbitrary-sized allocations, for example for textfragments. Or at least I haven't found any reasonable way to give/free X bytes back to the arena. FrameArena, for example, uses its own cache and eventually frees everything. But for content nodes, and maybe for slots and ELMs, it is pretty easy to use an arena. A patch for content nodes is coming.
Assignee
Comment 13 • 17 years ago
The arena is used for all nodes except nsIDocument, which creates the nodeinfomanager/nodeallocator. nsIDocument has its own new operator, so it doesn't use nsINode::new. nsNodeAllocator is based on FrameArena, but is refcounted and takes pointer size into account. Whether or not such an allocator will be used, after the patch it is easy to change the node allocator anyway. I'm adding mAllocator to nsINode because that makes it really easy to handle new/delete. If we want to arena-allocate arbitrary-sized textfragments, any comments on how to implement that are welcome; the allocation method provided by the patch doesn't really fit that. Some memory-fragmentation tests even with this patch would be great. I'll do some, but I don't have a tool to visualize memory fragmentation.
Assignee
Comment 14 • 17 years ago
Maybe not too reliable numbers, but still promising: running mochitest on 64-bit Linux with the patch showed that after the test the debug build took ~44MB less memory than without the patch (374MB vs. 330MB; the initial memory usage was ~110MB in both cases).
Assignee
Comment 15 • 17 years ago
Unfortunately the opt 32-bit Linux build doesn't seem to show any decreased memory usage. ...still testing.
Assignee
Comment 16 • 17 years ago
Or rather, the 32-bit opt build with the patch shows only a very small difference in memory usage: the final memory usage with the patch is ~149MB, without it ~151MB. While running mochitest's DOM1/DOM2 tests (lots of different pages) with the patch, memory usage doesn't increase at all, but without the patch memory usage increases slowly. One thing to test is performance, which, I think, shouldn't be affected much.
Assignee
Comment 17 • 17 years ago
32-bit opt Linux, tp2 total mean, 2 runs: with patch 5262, 5217; without patch 5310, 5233. The difference is probably just noise, but it doesn't look like the patch slows down page load.
Comment 18 • 17 years ago
Two questions:

1) nsFixedSizeAllocator is not practical here because of the number of sizes involved? I'm not sure how that compares to FrameArena.

2) If we want to avoid adding a word to every node, could we get the allocator off the nodeinfo manager (which we can always reach from a node)? To handle adopted nodes, we could have a flag, and if it's set look for the allocator in the slots (so adopting a node would in particular force slots creation). Not sure this is worth worrying about, but if we care...
Assignee
Comment 19 • 17 years ago
About nsFixedSizeAllocator: that could be used, yes, though the current implementation is slower because the right-sized bucket is searched iteratively; the FrameArena style uses simple indexing (using more memory though). Getting the allocator from the nodeinfomanager is possible, yes, but since adding mAllocator made the implementation so simple, I did that.
Comment 20 • 17 years ago
> but since adding mAllocator made the implementation so simple, I did that.
Makes sense. I'm just saying that we should try doing it the other way too, and measure memory usage. For a first cut, you could not worry about adopted nodes (since typically there aren't any, right?).
Assignee
Comment 21 • 17 years ago
Keeps the current nsINode size, but is uglier.
Assignee
Comment 22 • 17 years ago
Hmm, there is an nsNodeAllocator also in nsDTDUtils.cpp, and that disturbs the log. For some reason I didn't see the problem in the 64-bit build, only on 32-bit.
Assignee
Comment 23 • 17 years ago
Attachment #291187 - Attachment is obsolete: true
Assignee
Comment 24 • 17 years ago
Attachment #291257 - Attachment is obsolete: true
Assignee
Comment 25 • 17 years ago
Comment on attachment 291288 [details] [diff] [review] Better naming for the allocator class, mAllocator in slots What do you think about this as a first step? As I said ELM and slots could be handled similarly, textfragments would need something else. This is uglier than having mAllocator in nsINode, but still pretty simple. Uses |operator delete| in a similar way as nsIFrames and doesn't increase the size of nsINode. Passes mochitest without crashing :)
Attachment #291288 - Flags: review?(jonas)
Over in JS-land, we use careful page layout to let us find the GC flags associated with a given JSObject. If we use the same trick here (allocate page aligned where possible, revert to slight overallocation otherwise to ensure that we have appropriately-aligned bits -- our major platforms all support the mmap we need for page alignment, though) then we should be able to get from the nodes to their allocator in O(1) without adding to the size of each node. http://mxr.mozilla.org/mozilla/source/js/src/jsgc.c#114 is the novel, no word on casting for the movie.
This looks in general great. I especially like the idea of getting the allocator through the nodeinfo. However I think we should try to get the nsAttrAndChildArray buffer in there too. It's shown up pretty high in stats as a source of churning, which makes sense since it reallocs a lot. In doing that I really think that we want to not mess around with slots and instead stick the allocator in nsAttrAndChildArray. We probably should kill the realloc that lives in nsAttrAndChildArray, I don't think it's giving us anything at all really, we should just nuke it.
Assignee
Comment 28 • 17 years ago
PLArena, as far as I can see, isn't very useful for the variable-sized nsAttrAndChildArray mBuffer. Same problem with textfragments. Though, perhaps adding size checks to the nodeallocator could work quite well: if trying to allocate a chunk of memory bigger than X, allocate it from the heap, otherwise from the arena. I'll try that, and when someone comes up with a better arena implementation, domnodeallocator can be changed to use it.
Most malloc implementations will likely implement realloc for the sizes that are common for the child-array buffer by simply allocating a new block and freeing the old one. I.e. they will likely, for chunks this small, have separate areas for separate-sized blocks. So I think we should simply allocate a new block in nsAttrAndChildArray::GrowBy, which should work fine with the current PLArena implementation.
Assignee
Comment 30 • 17 years ago
(In reply to comment #29)
> So I think we should simply allocate a new block in
> nsAttrAndChildArray::GrowBy, which should work fine with the current PLArena
> implementation.

The allocation isn't the problem; the problem is, if the array gets resized many times, how to release the previously used memory so that other arena users can use it.
Comment 31 • 17 years ago
I'm a little concerned about using dumb pools of objects: don't most AJAXy pages go through lots of DOM nodes during their lifetime? Does this mean that all of those nodes would stay alive until the document was unloaded (which is "the end of time" for people with gmail permanently loaded in their browser)?
Assignee
Comment 32 • 17 years ago
This doesn't change the lifetime of DOM nodes. It is the memory which isn't released until all the objects created using some allocator are released. But memory does get reused/recycled, and most DOM nodes have the same size (or there are a few common node object sizes). A similar thing is already done with nsIFrame objects.
> The allocation isn't problem, but if array gets resized many times, how to
> release the previously used memory to be used other arena users.

All of the arrays go through the same list of sizes as they grow: 32, 64, 96, 128, 160, 192, 224, 256, 512, 1024, 2048, etc.[*] And in the vast majority of cases we're going to stay in the lower range of that list. So it should be possible to very often reuse the freed memory, both by other array buffers and by other random objects. So I don't think the freed array buffers are going to be harder to reuse than random freed nodes.

[*] This isn't currently 100% true, as we do a shrink-realloc at the end of parsing HTML nodes currently. But that's what I meant we should remove in comment 27.
Assignee
Comment 34 • 17 years ago
Comment on attachment 291288 [details] [diff] [review] Better naming for the allocator class, mAllocator in slots I'll post a new patch once I've found out why there is a random crash happening when running dom2 mochitest
Attachment #291288 - Attachment description: etter naming for the allocator class, mAllocator in slots → Better naming for the allocator class, mAllocator in slots
Attachment #291288 - Attachment is obsolete: true
Attachment #291288 - Flags: review?(jonas)
Assignee
Comment 35 • 17 years ago
Still no luck finding where the problem is :( Something very strange is happening...
Assignee
Comment 36 • 17 years ago
The problem was that nsDOMDocumentType is handled a bit differently.

Running mochitest on x86_64 Linux, DOMNodeAllocator arena allocations by size:

   8 bytes: 118681    16 bytes: 34476     24 bytes: 16803     32 bytes: 7203
  40 bytes: 13641     48 bytes: 9054      56 bytes: 12240     64 bytes: 607450
  72 bytes: 336973    80 bytes: 548233    88 bytes: 205335    96 bytes: 17510
 104 bytes: 23258    112 bytes: 1312     120 bytes: 5135     128 bytes: 63543
 136 bytes: 1331     144 bytes: 1854     152 bytes: 3416     160 bytes: 874
 168 bytes: 21       176 bytes: 27       184 bytes: 18       192 bytes: 11165
 200 bytes: 19       208 bytes: 22       216 bytes: 32       224 bytes: 2470
 232 bytes: 8707     240 bytes: 1267     248 bytes: 24       256 bytes: 1760
 264 bytes: 9        272 bytes: 12       280 bytes: 217      288 bytes: 24
 296 bytes: 90       304 bytes: 11       312 bytes: 10       320 bytes: 7
 328 bytes: 7        336 bytes: 5        344 bytes: 7        352 bytes: 13
 360 bytes: 21       368 bytes: 5        376 bytes: 17       384 bytes: 14
 392 bytes: 8        400 bytes: 9        408 bytes: 10       416 bytes: 10
 424 bytes: 9        432 bytes: 2        440 bytes: 3        448 bytes: 7
 456 bytes: 2        464 bytes: 5        472 bytes: 5        480 bytes: 5
 488 bytes: 4        496 bytes: 4        504 bytes: 3

DOMNodeAllocator, normal (non-arena) allocations: 6489 times
Assignee
Comment 37 • 17 years ago
I hope ELM handling could be done in bug 407442.
Assignee
Comment 38 • 17 years ago
I tested a few different plarena sizes, and on the 32-bit opt Linux build 8192 kept memory usage at a lower level than 4096 (which is used in the patch) or 16384. The testcase was to run mochitest, then open a new tab and close the tab where mochitest was run.
Assignee
Comment 39 • 17 years ago
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

Jonas, what do you think about this? I'd change PL_InitArenaPool(mPool, "nsDOMNodeAllocator", 4096, 0); to use 8192. That makes small documents use some extra memory, but that shouldn't be too bad. It would be great if Stuart could test how this affects memory fragmentation (and the 4096 -> 8192 change might be good to do in that case too).
Attachment #292227 - Flags: review?(jonas)
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

Why do you need nsSlots::mAllocator? Seems like the only place you use it is in nsNodeUtils::LastRelease? If so you could add an nsINode::GetAllocator function that uses the pointer in nsAttrAndChildArray or nsTextFragment. You'd also need to move destruction of the slots into the various node classes. If you do this you also wouldn't need to force the slots to be created when a node is moved between documents.

Make the various mAllocator members be nsRefPtrs rather than manually doing the refcounting.

>Index: content/base/public/nsINode.h
>+  void* operator new(size_t aSize, nsINodeInfo* aNodeInfo);
>+  void operator delete(void* aPtr, size_t aSize);
>+private:
>+  void* operator new(size_t aSize) CPP_THROW_NEW { return nsnull; }
> protected:

Can you simply leave this unimplemented so that it produces a linker error if anyone tries to use it. We do that in nsCOMPtr.h. Same for nsSlots.

Done until nsNodeInfoManager.cpp
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

You can remove nsIContent::Compact() now that there are no callers and all implementations are no-ops.

>Index: content/base/src/nsGenericElement.cpp
>+nsINode::operator delete(void* aPtr, size_t aSize)
>+{
>+  size_t* szPtr = static_cast<size_t*>(aPtr);
>+  *szPtr = aSize;
>+}

Ugh ugh ugh. Is there no other way to do this? This is really ugly :(

>Index: content/base/src/nsNodeInfoManager.cpp
>+#define NS_RECYCLER_MULTIPLIER (PR_FloorLog2(sizeof(void*)))

The PR_FloorLog2 call isn't going to compile away, so you'll get faster code if you simply use sizeof(void*) and divide by that instead. That'll make it more of a 'multiplier' anyway. Also comment on what this is used for.

>+nsresult
>+nsDOMNodeAllocator::Init()
>+{
>+#ifdef DEBUG
>+  ++gDOMNodeAllocators;
>+#endif
>+  mPool = new PLArenaPool();
>+  NS_ENSURE_TRUE(mPool, NS_ERROR_OUT_OF_MEMORY);
>+  PL_InitArenaPool(mPool, "nsDOMNodeAllocator", 4096, 0);
>+  memset(mRecyclers, 0, sizeof(mRecyclers));
>+  return NS_OK;
>+}

Can't you make mPool be the arena itself rather than a pointer to it? You're forgetting to delete it in the destructor. That way you could get rid of the Init function too.

>+nsrefcnt
>+nsDOMNodeAllocator::AddRef()
>+{
>+  NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");
>+  ++mRefCnt;
>+  NS_LOG_ADDREF(this, mRefCnt, "nsDOMNodeAllocator", sizeof(*this));
>+  return mRefCnt;
>+}

You might as well inline this if you make the NS_PRECONDITION into an assertion. That'll make for smaller and faster code. Don't inline Release of course.

>+nsDOMNodeAllocator::Alloc(size_t aSize)
>+  if (result) {
>+    NS_ADDREF_THIS();

Hmm.. So you're refcounting both for alloc/frees, and at all pointers to the pool. This seems excessive, you should be able to just refcount at all pointers to the pool, no? Anyone that allocates memory also needs to keep a reference to the pool somewhere so that they can free it.
>Index: content/base/src/nsNodeInfoManager.h
>+#define NS_NODE_RECYCLER_SIZE 64

Please document what this number is.

>+  // The recycler array is sparse with the indices being multiples of
>+  // sizeof(void*), i.e., 0, 4, 8, 12, 16, 20, ... or 0, 8, 16, ...
>+  void* mRecyclers[NS_NODE_RECYCLER_SIZE];

So I think technically 'sparse' is correct here, but usually it's taken to mean that you don't actually allocate all items up front, which this array does. So I would avoid that term. Also, the indices aren't actually multiples, it's the memory sizes at the indices that are.

>Index: content/base/src/nsNodeUtils.cpp
> nsNodeUtils::LastRelease(nsINode* aNode)
> {
>   nsINode::nsSlots* slots = aNode->GetExistingSlots();
>+  nsDOMNodeAllocator* allocator = nsnull;
>   if (slots) {
>+    allocator = slots->mAllocator;

As commented above, change this to call aNode->GetAllocator instead and assert that it is non-null for all nodes except documents.

>@@ -230,17 +232,27 @@ nsNodeUtils::LastRelease(nsINode* aNode)
>-  delete aNode;
>+  if (aNode->IsNodeOfType(nsINode::eDOCUMENT)) {
>+    delete aNode;
>+  } else {
>+    if (!allocator) {
>+      allocator = aNode->mNodeInfo->NodeInfoManager()->NodeAllocator();
>+    }
>+    NS_ASSERTION(allocator, "Should have allocator here!");
>+    delete aNode; // Calls destructor and sets size to *aNode.
>+    size_t* sz = reinterpret_cast<size_t*>(aNode);
>+    allocator->Free(*sz, static_cast<void*>(aNode));

This is really really weird. There has to be a better way to do this :(

>Index: content/base/src/nsTextFragment.cpp
>+  if (mState.mLength && mState.mInHeap) {

Are we ever setting mInHeap to true when mLength is 0? Or did you add that as extra precaution. I don't really care much either way, but it'd seem like a bug if we ever set mInHeap to true in such a case.

Looks great otherwise. I do want to take a look at it once mAllocator is removed from slots though.
Attachment #292227 - Flags: review?(jonas) → review-
Assignee
Comment 42 • 17 years ago
(In reply to comment #40)
> Why do you need nsSlots::mAllocator? Seems like the only place you use it is in

mAllocator points to the allocator of the slots, which is always the same as the allocator of the node. That makes the slots be created when adopting a node.

> If so you could add an nsINode::GetAllocator function that uses the pointer in
> nsAttrAndChildArray or nsTextFragment. You'd also need to move destruction of
> the slots into the various node classes.

Doesn't work. Attribute nodes don't have an nsAttrAndChildArray or nsTextFragment. Using the allocator from the nodeinfo always works, except when the node is adopted (in which case slots->mAllocator points to the right allocator).

> Make the various mAllocator members be nsRefPtrs rather than manually doing the
> refcounting.

I don't really want to include nsNodeInfoManager.h (or wherever the allocator definition is going to live).
(In reply to comment #42)
> > If so you could add an nsINode::GetAllocator function that uses the pointer
> > in nsAttrAndChildArray or nsTextFragment. You'd also need to move
> > destruction of the slots into the various node classes.
> Doesn't work. Attribute nodes don't have nsAttrAndChildArray or
> nsTextFragment. Using allocator from nodeinfo works always except when the
> node is adopted. (in which case slots->mAllocator points to the right
> allocator.)

You could add an mAllocator to nsDOMAttribute. They are very rarely created, so size doesn't matter.

> > Make the various mAllocator members be nsRefPtrs rather than manually doing
> > the refcounting.
> I don't really want to include nsNodeInfoManager.h (or whereever the allocator
> definition is going to live).

I think we'll eventually want to move the allocator into its own file, as we'll likely want to use it for other things too. But I'm fine with doing that as part of the pool-rewriting work.
Assignee
Comment 44 • 17 years ago
(In reply to comment #41)
> You can remove nsIContent::Compact()

Er, what?

> Ugh ugh ugh. Is there no other way to do this? This is really ugly :(

Haven't found any other way. The same thing is done in nsFrame::operator delete.

> >Index: content/base/src/nsNodeInfoManager.cpp
> >+#define NS_RECYCLER_MULTIPLIER (PR_FloorLog2(sizeof(void*)))
> The PR_FloorLog2 call isn't going to compile away, so you'll get faster code if
> you simply use sizeof(void*) and divide by that instead.

sizeof(void*) returns 4 or 8, PR_FloorLog2(sizeof(void*)) returns 2 or 3. But sure, I can optimize that.

> Can't you make mPool be the arena itself rather than a pointer to it?

I was again just trying not to include plarena.h in nsNodeInfoManager.h.

> >+  if (mState.mLength && mState.mInHeap) {
>
> Are we ever setting mInHeap to true when mLength is 0? Or did you add that as
> extra precaution.

Just some extra check that I added when trying to find the crash. But I'll re-check whether it is needed.

I'd keep the mAllocator in slots. It makes memory handling simpler: the allocator is in the slots if the slots are created, otherwise nodeinfo->nimanager->allocator.
Assignee
Comment 45 • 17 years ago
Using PLArenaPool* mPool; otherwise I got lots of errors because of the #include mess. Still using nsSlots::mAllocator; that IMO keeps memory handling simpler: no need to special-case nsDocument in even more cases, etc. But of course, if the size of the slots matters a lot, that could be changed.
Attachment #292227 - Attachment is obsolete: true
Attachment #292386 - Flags: review?(jonas)
Assignee
Updated • 17 years ago
Attachment #292386 - Attachment is obsolete: true
Attachment #292386 - Flags: review?(jonas)
Assignee
Comment 46 • 17 years ago
ok, it wasn't so difficult or ugly after all.
Attachment #292403 - Flags: review?(jonas)
Isn't this leaking all nsDocuments, since you end up using the new nsINode::operator delete, which doesn't actually free the memory?
Assignee
Comment 48 • 17 years ago
nsIDocument has its own new and delete operators
Assignee
Comment 49 • 17 years ago
Btw, I did test that nsIDocument's delete operator gets called, at least when compiled with gcc 4.1.2. If this causes problems with some other compilers, hopefully casting to nsIDocument and calling its delete operator explicitly helps (in nsNodeUtils::LastRelease).
Comment on attachment 292403 [details] [diff] [review]
removed slots:mAllocator

Where is the operator delete for nsIDocument. I don't see it in the patch or the tree.

>Index: content/base/src/nsNodeInfoManager.h
>+// The size of mRecyclers array. The max size of recycled memory is
>+// sizeof(void(*)) * (NS_NODE_RECYCLER_SIZE - 1).
>+#define NS_NODE_RECYCLER_SIZE 64

This means that you won't fit 256 byte allocations in the arena, which seems like it would be useful, so change this to 65.

>Index: content/base/src/nsNodeInfoManager.cpp
>+nsDOMNodeAllocator::Alloc(size_t aSize)
>+{
>+  void* result = nsnull;
>+
>+  // Ensure we have correct alignment for pointers.
>+  aSize = PR_ROUNDUP(aSize, sizeof(void*));
>+  // Check recyclers first
>+  if (aSize < NS_MAX_NODE_RECYCLE_SIZE) {
>+    static PRUint32 highestSetBit = PR_FloorLog2(sizeof(void*));
>+    const PRInt32 index = aSize >> highestSetBit;

Just do |const PRInt32 index = aSize / sizeof(void*)|. You could still keep the NS_RECYCLER_MULTIPLIER define though. Sorry if I was unclear before.

>Index: content/base/src/nsNodeUtils.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/content/base/src/nsNodeUtils.cpp,v
>retrieving revision 3.38
>diff -u -8 -p -r3.38 nsNodeUtils.cpp
>--- content/base/src/nsNodeUtils.cpp	4 Dec 2007 18:37:54 -0000	3.38
>+++ content/base/src/nsNodeUtils.cpp	10 Dec 2007 14:38:38 -0000
>@@ -183,25 +183,33 @@ nsNodeUtils::ParentChainChanged(nsIConte
>                              (aContent));
>   }
> }
>
> void
> nsNodeUtils::LastRelease(nsINode* aNode)
> {
>   nsINode::nsSlots* slots = aNode->GetExistingSlots();
>+  nsDOMNodeAllocator* allocator = aNode->GetAllocator();

You need to turn this into a strong reference (use nsRefPtr) since otherwise the arena could go away when the node's dtor runs.

r=me with that fixed.
Attachment #292403 - Flags: review?(jonas) → review+
Assignee
Comment 51 • 17 years ago
(In reply to comment #50)
> Where is the operator delete for nsIDocument. I don't see it in the patch or
> the tree.

NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW

> You need to turn this into a strong reference (use nsRefPtr) since otherwise
> the arena could go away when the nodes dtor runs.

The node is allocated using the same arena, so it is also addrefed.
Assignee
Updated • 17 years ago
Attachment #292403 - Flags: superreview?(jst)
Assignee
Comment 52 • 17 years ago
Stuart, you tested the patch. Did it show any improvement in memory fragmentation?
Reporter
Comment 53 • 17 years ago
I did test it; hard to get exact numbers, but I saw for sure a 20-30x reduction in allocations from content. It looks to reduce fragmentation somewhat, but we'd need to do some more testing to get better numbers. As Jonas noticed, changing the 64 to 65 will get all the 256-byte allocations in, which will probably cover 90% of the remaining allocations not caught.
(In reply to comment #51)
> > You need to turn this into a strong reference (use nsRefPtr) since otherwise
> > the arena could go away when the nodes dtor runs.
> Node is allocated using the same arena, so it is also addrefed.

Yes, but when you call |delete aNode| that will make the node release its reference to the arena, so the |allocator->Free| call two lines later might not be safe.
Assignee
Comment 55 • 17 years ago
Ah, you're right.
Assignee
Comment 56 • 17 years ago
I ran some more tests and noticed that to keep memory usage at a lower level also on x86_64 builds, 4096*(sizeof(void*)/2) is a better size for the arena: so 8192 on 32-bit, 16384 on 64-bit. That way memory usage stayed a few MB lower with the patch than without when running tp2 (25 iterations). I used a patch where the max recycled memory was (65-1)*sizeof(void*). tp2 also showed that the patch might affect performance positively: I ran tp2 (25 iterations) twice, and in both cases the -O3 build without the patch had a total mean of ~9000, with the patch ~8800.
So I realized one more problem with the patch. It currently sort of supports allocations of 0 bytes. The problem is that once you free such an allocation we try to put it in the mRecyclers list. Doing that writes to the first 4 bytes of the allocation (to write the 'next' pointer), which isn't safe if the allocation is 0 bytes. To fix this, either explicitly don't support 0-byte allocations by letting Alloc assert and return null, or treat 0-byte allocations (and frees) as 4-byte allocations, or treat 0-byte allocations as allocations bigger than 256 bytes. In any event, doing any of this would let you store 4-byte allocations in mRecyclers[0], 8-byte allocations in mRecyclers[1], etc. This would also let you keep NS_NODE_RECYCLER_SIZE at 64.
Assignee
Comment 58 • 17 years ago
I'll make 0-byte allocations be sizeof(void*).
Assignee
Comment 59 • 17 years ago
Attachment #292403 - Attachment is obsolete: true
Attachment #292789 - Flags: superreview?(jst)
Attachment #292403 - Flags: superreview?(jst)
Updated•17 years ago
|
Summary: Use pools and a separate heap for content → Use arena pools and a separate heap for content
Reporter | ||
Comment 60•17 years ago
|
||
I think it is better to return null for allocations of size 0 and assert. We shouldn't be doing this anywhere in our code. Also, these things should probably be in the content arena:
NS_NewJSEventListener
NS_NewEventListenerManager
NS_NewDOMEvent
Reporter | ||
Comment 61•17 years ago
|
||
also, nsMappedAttributes should live in the arena
Assignee | ||
Comment 62•17 years ago
|
||
(In reply to comment #60)
> I think it is better to return null for allocations of size 0 and assert. We
> shouldn't be doing this anywhere in our code.

This really isn't a very important point (because Alloc doesn't actually get called with size 0), but on Linux malloc(0) returns non-null.

I'm not sure DOM events should live in the arena. They are pretty commonly used in multiple document contexts. Though *usually* events are very short-lived objects; they often get killed by the next CC. Other things which could live in the arena are all the content's tearoffs, maybe also content lists. But adding more things to the arena could be a separate bug. We should first get tbox numbers with the patch and also some baking time to see how the patch behaves in the real world.
Agreed about separate bug for remaining classes. Though it doesn't hurt to document it here.
Reporter | ||
Comment 64•17 years ago
|
||
until this is fully reviewed and landed it would be nice to get all the classes possible hooked up to this and into the patch. certainly the things i mentioned are non-zero cost and we should get them hooked up as soon as possible. shouldn't take a lot of time to do so.
Comment 65•17 years ago
|
||
I should note that the more things we arena-allocate like this, the more chance we have that something is kept alive for a long time and keeps the arena alive. Not a problem for internal data structures or perhaps nodes. Not a problem for DOM events, since the DOM event keeps its target alive, so this would keep the arena alive anyway. At least when the DOM event and the target are from the same arena....
Reporter | ||
Comment 66•17 years ago
|
||
nsNodeInfos should be allocated from the arena rather than using their own nsFixedSizeAllocator in nsNodeInfo::Create.

bz: we should get some stats on the lifetime of the arenas past document destruction and verify that our lifetime assumptions are good.
Reporter | ||
Comment 67•17 years ago
|
||
while I'm adding things to the list, nsEventDispatcher::Dispatch should use the content arena rather than the nsFixedSizeAllocator stuff
Assignee | ||
Comment 68•17 years ago
|
||
In theory nsEventDispatcher::Dispatch may be used with |windowroot| or |window|, which don't have any access to the DOM, so there would need to be some special case when selecting which allocator to use. The same thing would happen if XHR started to use ELM and normal event dispatch and XHR was used in some JS component.
Assignee | ||
Comment 69•17 years ago
|
||
But I do have a patch somewhere to reuse the dispatcher pool.
Updated•17 years ago
|
Attachment #292789 -
Flags: superreview?(jst) → superreview+
Comment 70•17 years ago
|
||
Do we want to land the patch as is to begin testing and continue on with the other enhancements?
smaug did that yesterday, down 15K allocs and 5ms on Ts, I think.
Comment 72•17 years ago
|
||
(In reply to comment #70) > Do we want to land the patch as is to being testing and continue on with the > other enhancements? It landed last night. http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=Olli.Pettay%helsinki.fi&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-12-15+01%3A45&maxdate=2007-12-15+01%3A59&cvsroot=%2Fcvsroot
Comment 73•17 years ago
|
||
This also helped Linux Tp by 2% or so, right?
Assignee | ||
Comment 74•17 years ago
|
||
Yes, it seems like this helped Tp, also on Talos. The effect on memory usage is harder to measure, because more memory is possibly used (because of recycling etc.), but memory fragmentation shouldn't be as bad as before, so after some (possibly quite long) time memory usage should be lower. I'm testing this all the time on FF and TB. Marking this fixed. Patch(es) for nsJSEventListener, nsEventListenerManager, nsMappedAttributes etc. should go to some other bug. Please reopen this if you find some bad test results.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 75•17 years ago
|
||
(In reply to comment #74) > The effect to memory usage is harder to measure, because more memory is > possibly used (because of recycling etc.) Yes, the MaxHeap number for SeaMonkey is up significantly, but I guess that's expected due to the nature of the patch.
Reporter | ||
Comment 76•17 years ago
|
||
I wouldn't expect max heap usage to be "up significantly", but we should take a look at the arena implementation.

smaug: do you have any information on how long arenas live after most pages? Are they deleted after cycle collection, or do they stick around for a while? If the latter, do you know what is holding them alive?
Assignee | ||
Comment 77•17 years ago
|
||
The main reason why arenas don't get deleted immediately is bfcache.
Assignee | ||
Comment 78•17 years ago
|
||
And also, CC may not get called immediately, especially if the user is active. So one CC may then release several arenas.
(In reply to comment #75)
> Yes, the MaxHeap number for SeaMonkey is up significantly, but I guess that's
> expected due to the nature of the patch.

How much is "up significantly"? It's not really expected to go up a lot under normal usage patterns.
Assignee | ||
Comment 80•17 years ago
|
||
What is the MH test actually measuring? Starting the browser and closing it? It seems like SeaMonkey MH is 300K up. I did test running mochitest, and then the *final* memory usage went down.
Assignee | ||
Comment 81•17 years ago
|
||
AttrAndChildArrays (== elements), DOMAttributes and textfragments (== textnodes) are now sizeof(void*) larger than before. A few thousand elements are created during startup, I think; not sure how many textnodes.
Comment 82•17 years ago
|
||
As Smaug pointed out, SeaMonkey MaxHeap has gone from 17.6MB to 17.9MB, while Allocations dropped from 520K to 500K; see the nye column on http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey&maxdate=1197763889&hours=24&legend=0&norules=1 at the time the patch went in (2007/12/15, shortly after 01:00:00).
It actually surprises me that it went up that much. It's unlikely that the extra 4 bytes per node is making that big of a difference, and 300k of unused/recycled arena space sounds like a lot. I'd still say the benefits very much outweigh it though, but it'd be interesting to see where those 300k are going.
Comment 84•17 years ago
|
||
Everything in comment 77 and comment 78 would apply to the pre-arena world as well, since the DOM stays alive in those cases. I would assume MH measures the same thing as Rlk, which is running bloatcycle.html.
Comment 85•17 years ago
|
||
We shouldn't assume anything about benefits yet -- we have an unknown, it needs to be understood. Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)? In my dreams, arena allocation has the same or less per-object gross overhead than malloc. Keeping underutilized arenas around is a big hazard -- internal fragmentation. I've warned Stuart about this often. Is this what is driving MH high? If so, it's a problem. /be
Comment 86•17 years ago
|
||
In light of comment #85, we need a new bug, if this bug's patch isn't backed out and this bug reopened. Please file and cite it here, I guess as blocking this bug per usual custom that helps people find what a branch needs by following the blocking chain. /be
MH tracks the arguments to malloc, I believe, so moving to an arena system will cause allocation overhead to be counted now, where it wasn't before. There may also be other space inefficiencies, such as using a void* instead of offset math to find the allocator, but I strongly suspect that we're also seeing an artifact of the measurement mechanism here. Instrumenting arena usage to measure unfreed overhead wasn't hard when I did it for the presshell; not sure how similar this model is.
Reporter | ||
Comment 88•17 years ago
|
||
(In reply to comment #77)
> The main reason why arenas don't get deleted immediately is bfcache.

This doesn't really answer my question. If you disable bfcache, force cycle collection, etc., how long do the content arenas stick around? We need to make very sure that our assumptions are good and that they do in fact, under normal circumstances, go away, and that we didn't accidentally add some long-lifetime objects in there.
The arena allocates memory in 8k chunks, so 300K would be a lot of 'wasted' chunks. The reason nodes are sizeof(void*) bigger is that they keep a pointer to the arena they were allocated from, so that we can free them from the right arena. We could do the pointer-math stuff to save us those 4 bytes. This would also allow us to write proper delete operators rather than the ugliness that's currently required.
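The "pointer-math stuff" mentioned here can be sketched like this. The layout and names are illustrative (not the actual Gecko code): if every chunk is allocated on a chunk-size boundary, an object can recover its chunk header, and hence its arena, by masking its own address, so no per-object back-pointer is needed.

```cpp
#include <cstdint>
#include <cstdlib>

static const size_t kChunkSize = 8 * 1024;  // 8k chunks, as in the patch

struct ChunkHeader {
  void* arena;  // points back to the owning arena
};

struct Chunk {
  ChunkHeader header;
  char data[kChunkSize - sizeof(ChunkHeader)];  // objects are carved from here
};
static_assert(sizeof(Chunk) == kChunkSize, "chunk must be exactly one chunk-size");

Chunk* NewChunk(void* aArena) {
  // aligned_alloc (C11/C++17) gives the chunk-size alignment we rely on;
  // a real implementation would get aligned pages from the OS instead.
  Chunk* c = static_cast<Chunk*>(aligned_alloc(kChunkSize, kChunkSize));
  if (c) c->header.arena = aArena;
  return c;
}

void* ArenaFor(void* aObject) {
  // Mask off the low bits to recover the chunk start, then read the header.
  uintptr_t addr = reinterpret_cast<uintptr_t>(aObject);
  Chunk* c = reinterpret_cast<Chunk*>(addr & ~(uintptr_t(kChunkSize) - 1));
  return c->header.arena;
}
```

The trade-off is one header word per 8k chunk instead of one pointer per object, at the cost of requiring chunk-aligned allocation.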
Comment 90•17 years ago
|
||
Let's say the work so far has been good -- necessary but not sufficient. Before 1.9 is done I think we need to step up to the next level. The JS GC uses mmap or VirtualAlloc to get aligned pages from the OS. This allows small allocations to avoid any overhead for a back-pointer -- instead using what I think Sicking must mean by "pointer-math stuff". We should use the JS page allocator, either factor it into a libjsgcalloc or just rip out the (small amount of) code for the basics. We should certainly figure out why MH jumped so much -- sicking makes clear my point, which was that rounding up to 8K a few times won't get you to 300K. Where is the next bug to discuss all of this? Please cc: the right people from this bug. /be
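One common way to get chunk-aligned pages straight from the OS, in the spirit of what the JS GC does, is to over-map and trim. This is a POSIX-only illustrative sketch, not the actual jsgc allocator (which also handles VirtualAlloc on Windows, retries, and larger chunk sizes):

```cpp
#include <cstdint>
#include <sys/mman.h>

static const uintptr_t kChunkSize = 8 * 1024;

void* MapAlignedChunk() {
  // Map twice the chunk size so an aligned chunk must fit somewhere inside.
  size_t size = kChunkSize * 2;
  char* base = static_cast<char*>(mmap(nullptr, size, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
  if (base == MAP_FAILED) return nullptr;

  uintptr_t addr = reinterpret_cast<uintptr_t>(base);
  uintptr_t aligned = (addr + kChunkSize - 1) & ~(kChunkSize - 1);

  // Unmap the unused head and tail so only the aligned chunk stays mapped.
  if (aligned != addr)
    munmap(base, aligned - addr);
  size_t tail = (addr + size) - (aligned + kChunkSize);
  if (tail)
    munmap(reinterpret_cast<char*>(aligned + kChunkSize), tail);

  return reinterpret_cast<void*>(aligned);
}
```

With chunks aligned like this, the masking trick from the pointer-math discussion works without storing any back-pointer in the objects themselves.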
Assignee | ||
Comment 91•17 years ago
|
||
(In reply to comment #85)
> Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)?

A comment in the nsFrameArena implementation says that objects must be aligned based on sizeof(void*) (on some systems). I really had no reason not to believe it. nsDOMNodeAllocator has the same problem as nsFrameArena. It was shaver, I think, who tested nsFrameArena and noticed that in some cases it contains lots of not-in-real-use memory. So if we can fix the problem with nsDOMNodeAllocator, the same should be applied to nsFrameArena. (Those implementations should probably merge.) But anyway, I'll test this. And btw, to me the increase in MH isn't a surprise at all. The current plarena implementation just isn't very good for this, and nsDOMNodeAllocator adds even more overhead. Something similar was already noted in comment #11 and comment #12.
Assignee | ||
Comment 92•17 years ago
|
||
(In reply to comment #91)
> (In reply to comment #85)
> > Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)?

And of course storing a pointer to the next recycled memory area needs at least sizeof(void*), so smaller allocations aren't possible.
Comment 93•17 years ago
|
||
Smaug: I don't understand comment 92 -- the issue is not rounding up allocations smaller than 4 bytes (on 32-bit systems) to 4 bytes, it's adding 4 bytes of overhead. I really don't know of a good reason to expect a 300K MH increase, unless you have dozens of tabs open. That is an issue in itself -- we must make these things lazily and perhaps do something to avoid bloat for the mega-tabs use case. But it looks like bug 408720 is the place to talk about all of this, so I'll shut up here. /be
Can someone point me at test results that show the fragmentation-avoidance gains for this patch? I'm not sure where we're tracking that, or how we're measuring it (what workloads, what allocators, etc.).
Comment 95•16 years ago
|
||
Looks like content arenas are going to be removed. See bug 414894.
Depends on: 414894
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML