Closed Bug 403830 Opened 17 years ago Closed 17 years ago

Use arena pools and a separate heap for content

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: pavlov, Assigned: smaug)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(2 files, 6 obsolete files)

Given that content nodes, text nodes/fragments, etc are tied to the lifetime of the document, it would make sense to use pools to group those allocations together so that we can destroy them together to reduce fragmentation.  I would suggest that we also look at doing the pool allocations on a separate heap that all documents use to further avoid fragmentation.

As an experiment we it would be pretty easy to allocate all content stuff in its own heap (which would have its own fragmentation problems) to get all of these allocations out of everything else to see what the other heaps fragmentation looks like with this in place.

Things jst and sicking suggested changing the allocators for include:
dom nodes, nsAttrAndChild stuff, dom slots, text fragments, nodeinfo and attr values.
(In reply to comment #0)
> Given that content nodes, text nodes/fragments, etc are tied to the lifetime of
> the document, 

Since when content nodes are tied to the lifetime of the document?
Have I missed that change during 1.9?
Though, all nodes created for a document use the same nodeinfomanager, so
deleting nodeinfomanager could be used as a hint when to release a pool.
Nodeinfos already use one pool, but that is not per document.
sorry, i should have said they are generally tied to that lifetime. they can certainly be moved to other documents.
But when they are moved to some other document, only their nodeinfo is changed.
And so the nodeinfo would be from one pool and the content node would from some other pool. Doesn't really work, unless that situation keeps both pools alive.
Yes, we would need to be able to keep the pool alive longer than the document. Most likely all nodes will need to keep a pointer to the pool they were allocated from and refcount the pool itself, such that nodes are removed from the right pool when they die, and that the pool doesn't go away until all things in it are gone.
Flags: blocking1.9+
Priority: -- → P2
Sicking can you find the right owner?
Assignee: nobody → jonas
(In reply to comment #5)
> Most likely all nodes will need to keep a pointer to the pool they were
> allocated
Are we willing to increase the size of content objects?
Maybe yes in this case.

I'd say that we are. I think we should tick a pointer to it in each nsAttrAndChildArray, nsTextFragment and nsAttributeNode, that should cover all nodes iirc.

We could try to get fancy and just stick a pointer in the nsNodeInfoManager, and if a node is moved between documents (and thus gets another nodeinfomanager) we'd stick a pointer in the slots. However we'd have to pass that pointer into nsTextFragment and nsAttrAndChildArray any time we make a call that could need to allocate memory (which are most calls), so I'm not sure it's worth it.
Why not keep it simple and have the pointer in nsINode?
Because then we'll still have to pass it in to nsAttrAndChildArray/nsTextFragment whenever we're calling into it. I think the result would be uglier code.

Would you have the cycles to work on this Smaug? Feel free to use any implementation you think would be best of course.
Assignee: jonas → Olli.Pettay
whoever ends up doing this, we probably want an nsArena class that is probably initially implemented in terms of PL_ARENA but can be replaced with something better
Our current arena implementation isn't too good for arbitrary sized 
allocations, for example for textfragments. Or at least I haven't found
any reasonable way to give/free X bytes back to arena. FrameArena for example
uses its own cache and eventually frees everything.
But for content nodes and maybe for slots and ELMs it is pretty easy to
use an arena. A patch for content nodes coming.
Attached patch WIP, arena for content nodes (obsolete) — Splinter Review
The arena is used for other nodes except nsIDocument, which creates the nodeinfomanager/nodeallocator. nsIDocument has its own new -operator, so
it doesn't use nsINode::new.
nsNodeAllocator is based on FrameArena, but is refcounted and takes
account pointer size. Whether or not such allocator will be used,
after the patch it is anyway easy to change node allocator.
Adding mAllocator to nsINode, because in this way it really makes it easy to
handle new/delete.
If we want to arena allocate arbitrary sized textfragments, all comments how
to implement that are welcome. The allocation method provided by the patch 
doesn't really fit on that.
Some mem-fragmentation tests even with this patch would be great.
I'll do some, but I don't have the tool to visualize mem-fragmentation.
Maybe not too reliable numbers, but still promising:
Running mochitest on 64bit linux with the patch showed that after
the test debug-build took ~44MB less memory than without the patch.
(374MB vs 330MB, the initial memusage was in both cases ~110MB).
Unfortunately opt-32bit-linux-build doesn't seem show any decreased memusage.
...still testing.
Or 32bit-opt-build with the patch has only very small difference in mem usage.
The final mem usage with the patch is ~149MB, without ~151MB.

While running mochitest's DOM1/DOM2 tests (lots of different pages)
with the patch mem usage doesn't increase at all, but without the patch mem usage 
increases slowly.

One thing to test is performance, which, I think, shouldn't be affected much.
32bit-opt-linux, tp2-total-mean, 2 runs,
with patch 5262, 5217, without patch 5310, 5233
The difference is probably just noise, but doesn't look like the patch
slows down page load.
Two questions:

1)  nsFixedSizeAllocator is not practical here because of the number of sizes involved?  I'm not sure how that compares to framearena.

2)  If we want to avoid adding a word to every node, could we get the allocator off the nodeinfo manager (which we can always reach from a node)?  To handle adopted nodes, we could have a flag, and if it's set look for the allocator in slots (so adopting a node would in particular force slots creation).  Not sure this is worth worrying about, but if we care...
About nsFixedSizeAllocator, that could be used yes, though the current implementation is slower, because the right sized bucked is search iteratively, 
framearena-style uses simple indexing (using more memory though)

Getting the allocator from nodeinfomanager is possible yes, but
since adding mAllocator made the implementation so simple,
I did that. 
> but since adding mAllocator made the implementation so simple, I did that. 

Makes sense.  I'm just saying that we should try doing it the other way too, and measure memory usage.  For a first cut, you could not worry about adopted nodes (since typically there aren't any, right?).
Attached patch WIP, uglier, mAllocator to slots (obsolete) — Splinter Review
Keeps the current nsINode size, but is uglier.
Hmm, there is nsNodeAllocator also in nsDTDUtils.cpp, and that disturbs log.
For some reason I didn't see the problem in 64bit build, only on 32bit.
Comment on attachment 291288 [details] [diff] [review]
Better naming for the allocator class, mAllocator in slots

What do you think about this as a first step?
As I said ELM and slots could be handled similarly, textfragments would need something else.

This is uglier than having mAllocator in nsINode, but still pretty simple. Uses |operator delete| in a similar way as nsIFrames and doesn't increase the size of nsINode. Passes mochitest without crashing :)
Attachment #291288 - Flags: review?(jonas)
Over in JS-land, we use careful page layout to let us find the GC flags associated with a given JSObject.  If we use the same trick here (allocate page aligned where possible, revert to slight overallocation otherwise to ensure that we have appropriately-aligned bits -- our major platforms all support the mmap we need for page alignment, though) then we should be able to get from the nodes to their allocator in O(1) without adding to the size of each node.

http://mxr.mozilla.org/mozilla/source/js/src/jsgc.c#114 is the novel, no word on casting for the movie.
This looks in general great. I especially like the idea of getting the allocator through the nodeinfo.

However I think we should try to get the nsAttrAndChildArray buffer in there too. It's shown up pretty high in stats as a source of churning, which makes sense since it reallocs a lot.

In doing that I really think that we want to not mess around with slots and instead stick the allocator in nsAttrAndChildArray. We probably should kill the realloc that lives in nsAttrAndChildArray, I don't think it's giving us anything at all really, we should just nuke it.
PLArena, as far as I see, isn't very useful to be used with variable sized nsAttrAndChildArray mBuffer. Same problem with textfragments.
Though, perhaps adding size checks to nodeallocator could work quite well.
If trying to allocate bigger chunk of memory than X, allocate it from heap,
otherwise from arena.
I'll try that one and when someone comes up with better arena implementation, 
domnodeallocator can be changed to use that.
Most malloc allocations will likely implement realloc for the sizes that are common for the child-array buffer by simply allocating a new block and freeing the old. I.e. they will likely, for chunks this small, have separate areas for separate sized blocks.

So I think we should simply allocate a new block in nsAttrAndChildArray::GrowBy, which should work fine with the current PLArena implementation.
(In reply to comment #29)
> So I think we should simply allocate a new block in
> nsAttrAndChildArray::GrowBy, which should work fine with the current PLArena
> implementation.
The allocation isn't problem, but if array gets resized many times, how to
release the previously used memory to be used other arena users.

I'm a little concerned about using dumb pools of objects: don't most AJAXy pages go through lots of DOM nodes during their lifetime? Does this mean that all of those nodes would stay alive until the document was unloaded (which is "the end of time" for people with gmail permanently loaded in their browser)?
This doesn't change the lifetime of domnodes. It is the memory which isn't released until all the objects created using some allocator are released.
But memory does get reused/recycled and most dom nodes have the same size
(or there are few common node object sizes).
Similar thing is already done with nsIFrame objets.
> The allocation isn't problem, but if array gets resized many times, how to
> release the previously used memory to be used other arena users.

All of the arrays goes through the same list of sizes as they grow; 32, 64, 96, 128, 160, 192, 224, 256, 512, 1024, 2048, etc[*]. And in the far most cases we're going to stay in the lower range of that list. So it should be possible to very often reuse the freed memory. Both by other array buffers, and by other random objects.

So I don't think the freed array buffers are going to be harder to reuse than random freed nodes.

[*] This isn't currently 100% true as we do a shrink-realloc at the end of parsing HTML nodes currently. But that's what I meant we should remove in comment 27.
Comment on attachment 291288 [details] [diff] [review]
Better naming for the allocator class, mAllocator in slots

I'll post a new patch once I've found out why there is a random crash happening when running dom2 mochitest
Attachment #291288 - Attachment description: etter naming for the allocator class, mAllocator in slots → Better naming for the allocator class, mAllocator in slots
Attachment #291288 - Attachment is obsolete: true
Attachment #291288 - Flags: review?(jonas)
Still no luck to find where the problem is :( Something very strange is happening...
The problem was that nsDOMDocumentType is handled a bit differently.
Running mochitest on x86_64 linux:
DOMNodeAllocator, arena allocation: 8 bytes 118681 times
DOMNodeAllocator, arena allocation: 16 bytes 34476 times
DOMNodeAllocator, arena allocation: 24 bytes 16803 times
DOMNodeAllocator, arena allocation: 32 bytes 7203 times
DOMNodeAllocator, arena allocation: 40 bytes 13641 times
DOMNodeAllocator, arena allocation: 48 bytes 9054 times
DOMNodeAllocator, arena allocation: 56 bytes 12240 times
DOMNodeAllocator, arena allocation: 64 bytes 607450 times
DOMNodeAllocator, arena allocation: 72 bytes 336973 times
DOMNodeAllocator, arena allocation: 80 bytes 548233 times
DOMNodeAllocator, arena allocation: 88 bytes 205335 times
DOMNodeAllocator, arena allocation: 96 bytes 17510 times
DOMNodeAllocator, arena allocation: 104 bytes 23258 times
DOMNodeAllocator, arena allocation: 112 bytes 1312 times
DOMNodeAllocator, arena allocation: 120 bytes 5135 times
DOMNodeAllocator, arena allocation: 128 bytes 63543 times
DOMNodeAllocator, arena allocation: 136 bytes 1331 times
DOMNodeAllocator, arena allocation: 144 bytes 1854 times
DOMNodeAllocator, arena allocation: 152 bytes 3416 times
DOMNodeAllocator, arena allocation: 160 bytes 874 times
DOMNodeAllocator, arena allocation: 168 bytes 21 times
DOMNodeAllocator, arena allocation: 176 bytes 27 times
DOMNodeAllocator, arena allocation: 184 bytes 18 times
DOMNodeAllocator, arena allocation: 192 bytes 11165 times
DOMNodeAllocator, arena allocation: 200 bytes 19 times
DOMNodeAllocator, arena allocation: 208 bytes 22 times
DOMNodeAllocator, arena allocation: 216 bytes 32 times
DOMNodeAllocator, arena allocation: 224 bytes 2470 times
DOMNodeAllocator, arena allocation: 232 bytes 8707 times
DOMNodeAllocator, arena allocation: 240 bytes 1267 times
DOMNodeAllocator, arena allocation: 248 bytes 24 times
DOMNodeAllocator, arena allocation: 256 bytes 1760 times
DOMNodeAllocator, arena allocation: 264 bytes 9 times
DOMNodeAllocator, arena allocation: 272 bytes 12 times
DOMNodeAllocator, arena allocation: 280 bytes 217 times
DOMNodeAllocator, arena allocation: 288 bytes 24 times
DOMNodeAllocator, arena allocation: 296 bytes 90 times
DOMNodeAllocator, arena allocation: 304 bytes 11 times
DOMNodeAllocator, arena allocation: 312 bytes 10 times
DOMNodeAllocator, arena allocation: 320 bytes 7 times
DOMNodeAllocator, arena allocation: 328 bytes 7 times
DOMNodeAllocator, arena allocation: 336 bytes 5 times
DOMNodeAllocator, arena allocation: 344 bytes 7 times
DOMNodeAllocator, arena allocation: 352 bytes 13 times
DOMNodeAllocator, arena allocation: 360 bytes 21 times
DOMNodeAllocator, arena allocation: 368 bytes 5 times
DOMNodeAllocator, arena allocation: 376 bytes 17 times
DOMNodeAllocator, arena allocation: 384 bytes 14 times
DOMNodeAllocator, arena allocation: 392 bytes 8 times
DOMNodeAllocator, arena allocation: 400 bytes 9 times
DOMNodeAllocator, arena allocation: 408 bytes 10 times
DOMNodeAllocator, arena allocation: 416 bytes 10 times
DOMNodeAllocator, arena allocation: 424 bytes 9 times
DOMNodeAllocator, arena allocation: 432 bytes 2 times
DOMNodeAllocator, arena allocation: 440 bytes 3 times
DOMNodeAllocator, arena allocation: 448 bytes 7 times
DOMNodeAllocator, arena allocation: 456 bytes 2 times
DOMNodeAllocator, arena allocation: 464 bytes 5 times
DOMNodeAllocator, arena allocation: 472 bytes 5 times
DOMNodeAllocator, arena allocation: 480 bytes 5 times
DOMNodeAllocator, arena allocation: 488 bytes 4 times
DOMNodeAllocator, arena allocation: 496 bytes 4 times
DOMNodeAllocator, arena allocation: 504 bytes 3 times
DOMNodeAllocator, normal allocations: 6489 times
I hope ELM handling could be done in bug 407442.
I tested few different plarena sizes, and on 32bit-opt-linux-build
8192 kept memory usage at lower level than 4096 (which is used in the patch) or 
16384. Testcase was to run mochitest and then open a new tab and close the tab where mochitest was run.
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

Jonas, what do you think about this?
I'd change PL_InitArenaPool(mPool, "nsDOMNodeAllocator", 4096, 0);
 to use 8196. That makes small documents to use some extra memory, but that shouldn't be too bad.
It would be great if Stuart could test how this affects to the memory fragmentation (and the 4096 -> 8192 change might be good to be done in that case too).
Attachment #292227 - Flags: review?(jonas)
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

Why do you need nsSlots::mAllocator? Seems like the only place you use it is in nsNodeUtils::LastRelease?

If so you could add an nsINode::GetAllocator function that uses the pointer in nsAttrAndChildArray or nsTextFragment. You'd also need to move destruction of the slots into the various node classes.

If you do this you also wouldn't need to force the slots to be created when a node is moved between documents.

Make the various mAllocator members be nsRefPtrs rather than manually doing the refcounting.

>Index: content/base/public/nsINode.h
>+  void* operator new(size_t aSize, nsINodeInfo* aNodeInfo);
>+  void operator delete(void* aPtr, size_t aSize);
>+private:
>+  void* operator new(size_t aSize) CPP_THROW_NEW { return nsnull; }
> protected:

Can you simply leave this unimplemented so that it produces a linker error if anyone tries to use it. We do that in nsCOMPtr.h. Same for nsSlots.

Done until nsNodeInfoManager.cpp
Comment on attachment 292227 [details] [diff] [review]
nodes, slots, textfragments and attrandchildarrays in arena

You can remove nsIContent::Compact() now that there are no callers and all implementations are no-ops.

>Index: content/base/src/nsGenericElement.cpp
>+nsINode::operator delete(void* aPtr, size_t aSize)
>+{
>+  size_t* szPtr = static_cast<size_t*>(aPtr);
>+  *szPtr = aSize;
>+}

Ugh ugh ugh. Is there no other way to do this? This is really ugly :(

>Index: content/base/src/nsNodeInfoManager.cpp

>+#define NS_RECYCLER_MULTIPLIER (PR_FloorLog2(sizeof(void*)))
The PR_FloorLog2 call isn't going to compile away, so you'll get faster code if you simply use sizeof(void*) and divide by that instead. That'll make it more of a 'multiplier' anyway. Also comment on what this is used for.

>+nsresult
>+nsDOMNodeAllocator::Init()
>+{
>+#ifdef DEBUG
>+  ++gDOMNodeAllocators;
>+#endif
>+  mPool = new PLArenaPool();
>+  NS_ENSURE_TRUE(mPool, NS_ERROR_OUT_OF_MEMORY);
>+  PL_InitArenaPool(mPool, "nsDOMNodeAllocator", 4096, 0);
>+  memset(mRecyclers, 0, sizeof(mRecyclers));
>+  return NS_OK;
>+}

Can't you make mPool be the arena itself rather than a pointer to it? You're forgetting to delete it in the destructor. That way you could get rid of the Init function too.

>+nsrefcnt
>+nsDOMNodeAllocator::AddRef()
>+{
>+  NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");
>+  ++mRefCnt;
>+  NS_LOG_ADDREF(this, mRefCnt, "nsDOMNodeAllocator", sizeof(*this));
>+  return mRefCnt;
>+}

You might as well inline this if you make the NS_PRECONDITION into an assertion. That'll make for smaller and faster code. Don't inline Release of course.

>+nsDOMNodeAllocator::Alloc(size_t aSize)
>+  if (result) {
>+    NS_ADDREF_THIS();

Hmm.. So you're refcounting both for alloc/frees, and at all pointers to the pool. This seems excessive, you should be able to just refcount at all pointers to the pool, no? Anyone that allocates memory also needs to keep a reference to the pool somewhere so that they can free it.

>Index: content/base/src/nsNodeInfoManager.h
>+#define NS_NODE_RECYCLER_SIZE 64

Please document what this number is.

>+  // The recycler array is sparse with the indices being multiples of
>+  // sizeof(void*), i.e., 0, 4, 8, 12, 16, 20, ... or 0, 8, 16, ...
>+  void*        mRecyclers[NS_NODE_RECYCLER_SIZE];

So I think technically 'sparse' is correct here, but usually it's taken to mean that you don't actually allocate all items up front, which this array does. So I would avoid that term. Also, the indices aren't actually multiples, it's the memory sizes at the indices that are.

>Index: content/base/src/nsNodeUtils.cpp
> nsNodeUtils::LastRelease(nsINode* aNode)
> {
>   nsINode::nsSlots* slots = aNode->GetExistingSlots();
>+  nsDOMNodeAllocator* allocator = nsnull;
>   if (slots) {
>+    allocator = slots->mAllocator;

As commented above, change this to call aNode->GetAllocator instead and assert that it is non-null for all nodes except documents.

>@@ -230,17 +232,27 @@ nsNodeUtils::LastRelease(nsINode* aNode)
>-  delete aNode;
>+  if (aNode->IsNodeOfType(nsINode::eDOCUMENT)) {
>+    delete aNode;
>+  } else {
>+    if (!allocator) {
>+      allocator = aNode->mNodeInfo->NodeInfoManager()->NodeAllocator();
>+    }
>+    NS_ASSERTION(allocator, "Should have allocator here!");
>+    delete aNode; // Calls destructor and sets size to *aNode.
>+    size_t* sz = reinterpret_cast<size_t*>(aNode);
>+    allocator->Free(*sz, static_cast<void*>(aNode));

This is really really weird. There has to be a better way to do this :(

>Index: content/base/src/nsTextFragment.cpp

>+    if (mState.mLength && mState.mInHeap) {

Are we ever setting mInHeap to true when mLength is 0? Or did you add that as extra precaution. I don't really care much either way, but it'd seem like a bug if we ever set mInHeap to true in such a case.

Looks great otherwise. I do want to take a look at it once mAllocator is removed from slots though.
Attachment #292227 - Flags: review?(jonas) → review-
(In reply to comment #40)
> (From update of attachment 292227 [details] [diff] [review])
> Why do you need nsSlots::mAllocator? Seems like the only place you use it is in

mAllocator points to the allocator of slots, which is always the same as
allocator of node. The makes slots to be created when adopting node.

> If so you could add an nsINode::GetAllocator function that uses the pointer in
> nsAttrAndChildArray or nsTextFragment. You'd also need to move destruction of
> the slots into the various node classes.
Doesn't work. Attribute nodes don't have nsAttrAndChildArray or nsTextFragment.
Using allocator from nodeinfo works always except when the node is adopted.
(in which case slots->mAllocator points to the right allocator.)
 
> Make the various mAllocator members be nsRefPtrs rather than manually doing the
> refcounting.
I don't really want to include nsNodeInfoManager.h (or whereever the allocator
definition is going to live).
(In reply to comment #42)
> > If so you could add an nsINode::GetAllocator function that uses the pointer 
> > in nsAttrAndChildArray or nsTextFragment. You'd also need to move
> > destruction of the slots into the various node classes.
> Doesn't work. Attribute nodes don't have nsAttrAndChildArray or 
> nsTextFragment. Using allocator from nodeinfo works always except when the 
> node is adopted. (in which case slots->mAllocator points to the right
> allocator.)

You could add an mAllocator to nsDOMAttribute. They are very rarely created so size doesn't matter.

> > Make the various mAllocator members be nsRefPtrs rather than manually doing the
> > refcounting.
> I don't really want to include nsNodeInfoManager.h (or whereever the allocator
> definition is going to live).

I think we'll eventually will want to move the allocator into it's own file as we'll likely want to use it for other things too. But I'm fine with doing that as part of the pool-rewriting work.
(In reply to comment #41)
> You can remove nsIContent::Compact()
Er, what?

> Ugh ugh ugh. Is there no other way to do this? This is really ugly :(
Haven't found any other way. Same thing is done in nsFrame::operator delete.

> 
> >Index: content/base/src/nsNodeInfoManager.cpp
> 
> >+#define NS_RECYCLER_MULTIPLIER (PR_FloorLog2(sizeof(void*)))
> The PR_FloorLog2 call isn't going to compile away, so you'll get faster code if
> you simply use sizeof(void*) and divide by that instead.
sizeof(void*) returns 4 or 8, PR_FloorLog2(sizeof(void*) 2 or 3
But sure, I can optimize that.


> Can't you make mPool be the arena itself rather than a pointer to it? 
I was again just trying to not to include plarena.h nsNodeInfoManager.h
> >+    if (mState.mLength && mState.mInHeap) {
> 
> Are we ever setting mInHeap to true when mLength is 0? Or did you add that as
> extra precaution. 
Just some extra check that I added when trying to find the crash.
But I'll re-check if it is needed.

I'd keep the mAllocator in slots. Makes memory handling simpler.
Allocator is in slots if slots is created, otherwise nodeinfo->nimanager->allocator.
Using PLArenaPool* mPool, otherwise got lots of errors because of #include mess.
Still using nsSlots::mAllocator, that IMO keeps memory handling simpler.
No need to special case nsDocument even in more case etc.
But ofc if the size of slots matters a lot, that could be changed.
Attachment #292227 - Attachment is obsolete: true
Attachment #292386 - Flags: review?(jonas)
Attachment #292386 - Attachment is obsolete: true
Attachment #292386 - Flags: review?(jonas)
Attached patch removed slots:mAllocator (obsolete) — Splinter Review
ok, it wasn't so difficult or ugly after all.
Attachment #292403 - Flags: review?(jonas)
Isn't this leaking all nsDocuments since you end up using the new nsINode::operator delete which doesn't actually free the memory.
nsIDocument has its own new and delete operators
Btw, I did test that nsIDocument's delete operator gets called,
at least when compiled on gcc 4.1.2.
If this causes problems with some other compilers, hopefully casting
to nsIDocument and calling explicitly its delete operator helps (in nsNodeUtils::LastRelease).
Comment on attachment 292403 [details] [diff] [review]
removed slots:mAllocator

Where is the operator delete for nsIDocument. I don't see it in the patch or the tree.

>Index: content/base/src/nsNodeInfoManager.h

>+// The size of mRecyclers array. The max size of recycled memory is
>+// sizeof(void(*)) * (NS_NODE_RECYCLER_SIZE - 1).
>+#define NS_NODE_RECYCLER_SIZE 64

This means that you won't fit 256 byte allocations in the arena which seems like it would be useful, so change this to 65.

>Index: content/base/src/nsNodeInfoManager.cpp
>+nsDOMNodeAllocator::Alloc(size_t aSize)
>+{
>+  void* result = nsnull;
>+
>+  // Ensure we have correct alignment for pointers.
>+  aSize = PR_ROUNDUP(aSize, sizeof(void*));
>+  // Check recyclers first
>+  if (aSize < NS_MAX_NODE_RECYCLE_SIZE) {
>+    static PRUint32 highestSetBit = PR_FloorLog2(sizeof(void*));
>+    const PRInt32 index = aSize >> highestSetBit;

Just do |const PRInt32 index = aSize / sizeof(void*)|. You could still keep the NS_RECYCLER_MULTIPLIER define though. Sorry if I was unclear before.

>Index: content/base/src/nsNodeUtils.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/content/base/src/nsNodeUtils.cpp,v
>retrieving revision 3.38
>diff -u -8 -p -r3.38 nsNodeUtils.cpp
>--- content/base/src/nsNodeUtils.cpp	4 Dec 2007 18:37:54 -0000	3.38
>+++ content/base/src/nsNodeUtils.cpp	10 Dec 2007 14:38:38 -0000
>@@ -183,25 +183,33 @@ nsNodeUtils::ParentChainChanged(nsIConte
>         (aContent));
>   }
> }
> 
> void
> nsNodeUtils::LastRelease(nsINode* aNode)
> {
>   nsINode::nsSlots* slots = aNode->GetExistingSlots();
>+  nsDOMNodeAllocator* allocator = aNode->GetAllocator();

You need to turn this into a strong reference (use nsRefPtr) since otherwise the arena could go away when the nodes dtor runs.

r=me with that fixed.
Attachment #292403 - Flags: review?(jonas) → review+
(In reply to comment #50)
> (From update of attachment 292403 [details] [diff] [review])
> Where is the operator delete for nsIDocument. I don't see it in the patch or
> the tree.
NS_DECL_AND_IMPL_ZEROING_OPERATOR_NEW

> You need to turn this into a strong reference (use nsRefPtr) since otherwise
> the arena could go away when the nodes dtor runs.
Node is allocated using the same arena, so it is also addrefed.
Attachment #292403 - Flags: superreview?(jst)
Stuart, you tested the patch. Did it show any improvements to the memory fragmentation?
I did test it -- hard to get exact numbers, but I saw for sure a 20-30x reduction in allocations from content.  Looks to reduce fragmentation somewhat, but we'd need to do some more testing to get better numbers.  As jonas noticed, changing the 64 to 65 will get all the 256 byte allocations in which will cover probably 90% of the remaining allocations not caught.
(In reply to comment #51)
> > You need to turn this into a strong reference (use nsRefPtr) since otherwise
> > the arena could go away when the nodes dtor runs.
> Node is allocated using the same arena, so it is also addrefed.

Yes, but when you call |delete aNode| that will make the node release its reference to the arena, so the |allocator->Free| call two lines later might not be safe.
Ah, you're right.
I run still some tests and noticed that to keep memory usage at lower level also 
on x86_64 builds 4096*(sizeof(void*)/2) is better size for the arena.
So 8192 on 32bit, 16384 on 64bit. That way memory usage stayed few MB lower level
with the patch than without when running tp2 (25 iterations).
I used a patch where max recycled memory was (65-1)*sizeof(void*).

tp2 also showed that the patch might affect performance positively.
Run tp2(25 iterations) twice, and both cases -O3 build without patch
had total mean ~9000, with patch ~8800.
So I realized one more problem with the patch. It currently sort of supports allocations of 0 bytes. The problem is that once you free such an allocation we try to put it in the mRecyclers list. Doing that writes to the first 4 bytes of the allocation (to write the 'next' pointer), which isn't safe if the allocation is 0 bytes.

To fix this, either explicitly don't support 0 byte allocations by letting Alloc assert and return null. Or you could treat 0 byte allocations (and frees) as 4 byte allocations. Or you could treat 0 byte allocations as allocations bigger than 256 bytes.

In any event, doing any of this would let you store 4 byte allocations in mRecyclers[0], 8 byte allocations in mRecyclers[1] etc. This would also let you keep NS_NODE_RECYCLER_SIZE at 64
I'll make 0 byte allocations to be sizeof(void*)
Attached patch +commentsSplinter Review
Attachment #292403 - Attachment is obsolete: true
Attachment #292789 - Flags: superreview?(jst)
Attachment #292403 - Flags: superreview?(jst)
Summary: Use pools and a separate heap for content → Use arena pools and a separate heap for content
I think it is better to return null for allocations of size 0 and assert.  We shouldn't be doing this anywhere in our code.

Also, these things should probably be in the content arena:

NS_NewJSEventListener
NS_NewEventListenerManager
NS_NewDOMEvent
also, nsMappedAttributes should live int he arena
DOMEvents(In reply to comment #60)
> I think it is better to return null for allocations of size 0 and assert.  We
> shouldn't be doing this anywhere in our code.
This really isn't too important thing (because Alloc doesn't actually get called with size 0), but on linux malloc(0) returns non-null.

I'm not sure dom events should live in the arena. They are pretty commonly
used in multiple document contexts. Though *usually* events are very short
living objects, they get often killed by the next CC.

Other things which could live in the arena are all the content's tearoffs, maybe
also content lists. But adding more things to the arena could be a separate 
bug. We should first get tbox numbers with the patch and also some baking time
to see how the patch behaves in the real world.
Agreed about separate bug for remaining classes. Though it doesn't hurt to document it here.
until this is fully reviewed and landed it would be nice to get all the classes possible hooked up to this and in to the patch.  certainly the things i mentioned are non-0 cost and we should get them hooked up as possible.  shouldn't take a lot of time to do so.
I should note that the more things we arena-allocate like this, the more chance we have that something is kept alive for a long time and keeps the arena alive.  Not a problem for internal data structures or perhaps nodes.  Not a problem for DOM events, since the DOM event keeps its target alive, so this would keep the arena alive anyway.  At least when the DOM event and the target are from the same arena....
nsNodeInfo's should be allocated from the arena rather than using their own nsFixedSizeAllocator in nsNodeInfo::Create

bz: we should get some stats on lifetime of the arenas past document destruction and verify that our lifetime assumptions are good.

while I'm adding things to the list, nsEventDispatcher::Dispatch should use the content arena rather than the nsFixedSizeAllocator stuff
In theory nsEventDispatcher::Dispatch may be used with |windowroot| or |window|
which don't have any access to DOM and so there should be some special case
when selecting which allocator to use. Same thing would happen if XHR started 
to use ELM and normal event dispatch and XHR was used in some JS component. 
But I do have a patch somewhere to reuse the dispatcher pool.
Attachment #292789 - Flags: superreview?(jst) → superreview+
Do we want to land the patch as is to being testing and continue on with the other enhancements?
smaug did that yesterday, down 15K allocs and 5ms on Ts, I think.
This also helped Linux Tp by 2% or so, right?
Yes, seems like this helped tp, also on talos.
The effect to memory usage is harder to measure, because more memory is possibly
used (because of recycling etc.), but memory fragmentation shouldn't be
as bad as before so after some (possibly quite long) time memory usage
should be lower.
I'm testing this all the time on FF and TB.

Marking this fixed. Patch(es) for nsJSEventListener
nsEventListenerManager, nsMappedAttributes etc should go to some other bug.
Please reopen this if you find some bad test results.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #74)
> The effect to memory usage is harder to measure, because more memory is
> possibly used (because of recycling etc.)

Yes, the MaxHeap number for SeaMonkey is up significantly, but I guess that's expected due to the nature of the patch.
I wouldn't expect max heap usage to be "up significantly" but we should take a look at the arena implementation.

smaug: do you have any information on how long arenas live after most pages?  are they deleted after cycle collection or do they last around for a while?  if the latter, do you know what is holding them alive?
The main reason way arenas don't get deleted immediately is bfcache.
And also, CC may not get called immediately, especially if user is active.
So one CC may then release several arenas.
(In reply to comment #75)
> Yes, the MaxHeap number for SeaMonkey is up significantly, but I guess that's
> expected due to the nature of the patch.

How much is "up significantly". It's not really expected to go up a lot under normal usage patterns.
What is MH test actually measuring? Starting the browser and closing it?
Seems like Seamonkey MH is 300K up.
I did test running mochitest and then the *final* memory usage went down.
AttrandChildArrays (== elements), DOMAttributes and textfragments (==textnodes) 
are now sizeof(void*) larger than before. Few thousands elements are created 
during startup, I think, not sure how many textnodes. 
As Smaug pointed out, SeaMonkey MaxHeap has gone from 17.6MB to 17.9MB, which Allocations dropped from 520K to 500K, see the nye column on http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey&maxdate=1197763889&hours=24&legend=0&norules=1 at the time the patch went in (2007/12/15 shortly after 01:00:00).
It actually surprises me that it went up that much. It's unlikely that the extra 4 bytes per node is making that big of a difference, and 300k of unused/recycled arena space sounds like a lot.

I'd still say the benefits very much outweigh it though, but it'd be interesting to see where those 300k are going. 
Everything in comment 77 and comment 78 would apply to the pre-arena world as well, since the DOM stays alive in those cases.

I would assume MH measures the same thing as Rlk, which is running bloatcycle.html.
We shouldn't assume anything about benefits yet -- we have an unknown, it needs to be understood.

Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)? In my dreams, arena allocation has the same or less per-object gross overhead than malloc.

Keeping underutilized arenas around is a big hazard -- internal fragmentation. I've warned Stuart about this often. Is this what is driving MH high? If so, it's a problem.

/be
In light of comment #85, we need a new bug, if this bug's patch isn't backed out and this bug reopened. Please file and cite it here, I guess as blocking this bug per usual custom that helps people find what a branch needs by following the blocking chain.

/be
MH tracks the arguments to malloc, I believe, so moving to an arena system will cause allocation overhead to be counted now, where it wasn't before.  There may also be other space inefficiencies, such as using a void * instead of offset math to find the allocator, but I strongly suspect that we're also seeing an artifact of the measurement mechanism here as well.

Instrumenting arena usage to measure unfreed overhead wasn't hard when I did it for the presshell, not sure how similar this model is.
(In reply to comment #77)
> The main reason way arenas don't get deleted immediately is bfcache.
> 

This doesn't really answer my question.  If you disable bfcache, force cycle collection, etc, how long do the content arenas stick around?  We need to make very sure that our assumptions are good and that they do in fact, under normal circumstances, go away and that we didn't accidentally add some long lifetime objects in there.
The arena allocates memory in 8k chunks, so 300k would be a lot of 'wasted' chunks.

The reason nodes are sizeof(void*) bigger is that they keep a pointer to the arena they were allocated with, so that we can free them from the right arena.

We could do the pointer-math stuff to save us those 4 bytes. This would allow us to write proper delete operators too rather than the uglyness that's currently required.
Let's say the work so far has been good -- necessary but not sufficient. Before 1.9 is done I think we need to step up to the next level. The JS GC uses mmap or VirtualAlloc to get aligned pages from the OS. This allows small allocations to avoid any overhead for a back-pointer -- instead using what I think Sicking must mean by "pointer-math stuff".

We should use the JS page allocator, either factor it into a libjsgcalloc or just rip of the (small amount of) code for the basics.

We should certainly figure out why MH jumped so much -- sicking makes clear my point, which was that rounding up to 8K a few times won't get you to 300K.

Where is the next bug to discuss all of this? Please cc: the right people from this bug.

/be
(In reply to comment #85)
> Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)?
A comment in nsFrameArena implementation says that objects must be aligned based
on sizeof(void*) (in some systems). I had really no reason to not-to-believe it.

nsDOMNodeAllocator has the same problem as nsFrameArena.
Was it shaver or who, who tested nsFrameArena and noticed that in some cases
it contains lots of not-in-real-use memory. So if we can fix the problem with
nsDOMNodeAllocator, same should be applied to nsFrameArena. (Those implementations should probably merge).

But anyway, I'll test this. And btw, to me the increase in MH isn't a surprise 
at all. The current plarena implementation just isn't very good for this
and nsDOMNodeAllocator adds even more overhead. Something similar already in comment #11 and comment #12.
(In reply to comment #91)
> (In reply to comment #85)
> > Smaug, forgive my ignorance, but why did any object increase by sizeof(void*)?
And ofc storing pointer to the next recycled memory area needs at least sizeof(void*), so smaller allocations aren't possible.

Depends on: 408720
Smaug: I don't understand comment 92 -- the issue is not rounding up allocations smaller than 4 bytes (on 32-bit systems) to 4 bytes, it's adding 4 bytes overhead.

I really don't know of a good reason to expect 300K MH increase, unless you have dozens of tabs open. That is an issue in itself -- we must make these things lazily and perhaps do something to avoid bloat for the mega-tabs use case.

But it looks like bug 408720 is the place to talk about all of this, so I'll shut up here.

/be
Can someone point me at test results that show the fragmentation-avoidance gains for this patch?  I'm not sure where we're tracking that, or how we're measuring it (what workloads, what allocators, etc.).
Depends on: 409674
Looks like content arenas are going to be removed.  See bug 414894.
Depends on: 414894
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: