Closed Bug 408720 Opened 12 years ago Closed 12 years ago

MH regression because of bug 403830

Categories

(Core :: DOM: Core & HTML, defect, P1)

x86
All
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: smaug, Assigned: smaug)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(6 files)

The arena for content doesn't seem to work as well as it should.
Either it should be backed out (though it helps tp quite a bit) or
preferable tweaked or replaced with something better.
But first we need to know how the memory is used and what is causing the
MH increase.
Flags: blocking1.9?
So the first thing to optimize is to create pool lazily.
Testing shows that we're creating DOMNodeAllocators but not always really
using it (at least not the recycler part of it).
Flags: blocking1.9? → blocking1.9+
For some reason we're creating allocators which aren't used.
That is possible for example when createDocument or createDocumentType is used.
Creating pool lazily should help a bit.
I saw 7 empty allocators after starting/closing FF. 7*8192bytes is already 
something.
Assignee: nobody → Olli.Pettay
Status: NEW → ASSIGNED
Attachment #293568 - Flags: superreview?(jonas)
Attachment #293568 - Flags: review?(jonas)
can we please figure out why before checking in more patches?
Well, it is perfectly valid to create empty documents (or documenttypes which 
aren't connected to anywhere). But I'll try to hack something to check at least
the URL of those documents.
None of those empty documents have document uri...
Why not breakpoint in the document creation and see what the stack (C++ and JS) is?
Well, when creating a document it is impossible to know whether there will
be some elements in it. I could check *all* the documents which are created
between starting and shutting down FF and then somehow record the stack and
when document gets deleted connect the initial stack to it...
But still, it is perfectly valid to create empty documents. XHR for example
may do that.
You could add some instrumentation to the pool to see if any objects are
allocated. Or even better, compare "memory-high" in the pool vs how much memory
the pool actually uses.

7*8k is a lot, but still much short of the 300k. Especially given that those 7 pools probably weren't all alive at the same time.
Comment on attachment 293568 [details] [diff] [review]
step 1, create pool lazily

I'm fine with checking this in though to see how much of those 300k were because of this.
Attachment #293568 - Flags: superreview?(jonas)
Attachment #293568 - Flags: superreview+
Attachment #293568 - Flags: review?(jonas)
Attachment #293568 - Flags: review+
That 300k was from seamonkey, this 7*8k is from FF.
I have locally some instrumentation for the pool, that is how
I found these empty allocators.
Comment on attachment 293568 [details] [diff] [review]
step 1, create pool lazily

This didn't seem to have much effect :( But it is anyway the right thing to do.
Brendan, about increased object size:
nsTextFragment, nsAttrAndChildArray and nsDOMAttribute have now pointer to the
allocator which they use. nsGenericElement gets pointer to the allocator
through nsAttrAndChildArray and data nodes through nsTextFragment.
If nsTextFragments and nsAttrAndChildArray wouldn't use arena, then nodes could
get the pointer to allocator via nodeinfo->nodeinfomanager->allocator, except
when a node is adopted to a different document (in that case slots could be 
created and slots could contain pointer to allocator).
Now with the content arena patch sizeof(nsXMLElement) is the same in 1.8 and
1.9, sizeof(nsXULElement) is still smaller in 1.9.
Attached patch Smaller arenaSplinter Review
4096*(sizeof(void*)/2) was perhaps too large size for arena(, though on my machine
running mochitest with that consumes less memory than 4096*(sizeof(void*)/4)).
I tested the patch and not-used-memory-in-arenas gets reduced significantly
(~240k on 64bit with those documents alive when shutting down,
so possibly ~120 on 32bit). Anyway worth to try, even though this will increase
number of malloc calls.
Btw, MH may not be that useful testcase. It is testing only very few simple pages.
(If I looked and tested the right bloatcycle.html)
When using the patch with that, max mem usage seems to be 50-100k smaller on 
32bit opt-linux. I tested also using larger arena and mem usage increased, same thing happened when using even smaller arenas.
The talos machines that I linked to in comment 12 are a good test since they run through 400+ pages 10 times.   Esp we can land the patch in bug 408228 to have a delay before shutdown and/or setup a separate mini running with multiple windows at once we'll have a much better real-world test.

But even w/o those changes those mini graphs are meaningful.  I'd also be careful about directly correlating 64/32 bit systems - we should really be testing on 32bit since that's what 99% of folks are using right now.
I agree bloatcycle.html contains too few pages to be an ideal benchmark, but I'd still really like to understand why MH goes up that much.
One thing to test would be to try to share arena between all documents in a script context, i.e. stick the arena on the "top level" inner window and then pass that down to contained document/frames etc. This should avoid the problem of temporary documents/doctypes being created and destroyed.
If a large arena is used, it may contain lots of not-in-use-memory.
Arena will (almost) always contain some memory which could be otherwise used
elsewhere.
And yes, testing on 32bit is more important, but we really shouldn't forget
64bit systems. Those are getting more common all the time.
The tests I did with bloatcycle.html were done using optimized Linux build on
32bit system, as I said.
(In reply to comment #19)
> One thing to test would be to try to share arena between all documents in a
> script context, i.e. stick the arena on the "top level" inner window and then
> pass that down to contained document/frames etc. This should avoid the problem
> of temporary documents/doctypes being created and destroyed.
And how to handle then documents created by XHR or DOMParser in JS components?

Smaug: did sicking mean *arena* or *arena pool* at top level inner window? The latter makes sense, the former does not, as you note.

No one should be forcing different-lifetime objects to come from the same arena, but that can happen at the limit. Indeed, the worst case here wastes a whole arena (4K+/-) on each long-lived allocation, where all other allocations from the same arena have died and no new demand has reused the space they occupied.

This is why just trying to reduce malloc call counts by potentially rounding each allocation up to a much larger size can bite back -- so-called internal (to the arena) fragmentation. It isn't obvious that everyone is on the same page in fearing this.

/be
Attachment #293667 - Flags: superreview?(jonas)
Attachment #293667 - Flags: review?(jonas)
Both XHR and DOMParser should be able to get to the window that it is executing within, it should be able to get the allocator off of that window.
What if XHR or DOMParser in created in a JS component, or in C++ component.
Then there isn't necessarily a |window|.
There's always a window -- or something like it -- for finding a principal. Boris should weigh in here.

/be
Those may use nullprincipal.

But anyway, to reduce MH, attachment 293667 [details] [diff] [review] should help some, but that will cause 
more allocations. Having a better arena/memoryrecycler would be great, but my 
guess is that even in that case more memory would be used in testcases like
MH which just loads few (~5) pages so memory fragmentation doesn't yet get to bad.
> There's always a window -- or something like it -- for finding a principal.

That's not the case, no.  Non-window callers of DOMParser are expected to hand-init it with a principal, while non-window callers of XMLHttpRequest (including C++) are just more or less screwed right now.

Is it possible to get the actual data on MH from tinderbox or elsewhere?  That is, see what objects are actually allocated?  Seems better than taking wild guesses...

For what it's worth, I'm leery of sticking everything on the toplevel window because that can lead to things staying alive a lot longer than they should (as the AJAX case does now).
Comment on attachment 293667 [details] [diff] [review]
Smaller arena

I testlanded (and backed out) this and MH did drop (~100K), and I think also tp_pbytes_T/tp_memset_T.
'A' didn't jump up.
Comment on attachment 293667 [details] [diff] [review]
Smaller arena

OK, Seems reasonable to check this in. That still leaves us up 150K though, right?
Attachment #293667 - Flags: superreview?(jonas)
Attachment #293667 - Flags: superreview+
Attachment #293667 - Flags: review?(jonas)
Attachment #293667 - Flags: review+
(In reply to comment #29)
> (From update of attachment 293667 [details] [diff] [review])
> OK, Seems reasonable to check this in. That still leaves us up 150K though,
> right?
> 
Seamonkey MH went ~100k down. Similar results for Firefox. Number of allocations
didn't increase significantly or at all.
Seamonkey MH is still 150k-200k up because of bug 403830.
Not sure how to reduce that. Using DOMNodeAllocator uses more memory (because 
of increased content object sizes and memory reserved for recycling) and
running MH is so fast and only so few pages are loaded that memory
fragmentation shouldn't get too bad.

I'm not sure whether tp_pbytes_T and tp_memset_T are still up because of
the content arena; there is some other, significantly worse regression, which
happened 29/12/07. Attachment 293667 [details] [diff] did seem to help with those numbers 
though, just hard to figure out how much comparing to the original regression.
Testing shows that there are number of cases where only few hundreds bytes are
allocated from an arena. To keep memory usage lower in those cases the patch
introduces mSmallPool which is used until total allocations > (4096 * (sizeof(void*)/4)).
On 32bit linux this seems to keep MH ~50K lower level, at least on my machine.

It seems that quite often DOMNodeAllocator is used either to allocate "lots" of
memory (50k+) or just few hundred (or 1-2 K) bytes.
Attachment #295388 - Flags: superreview?(jonas)
Attachment #295388 - Flags: review?(jonas)
Jonas, any comments to the latest patch? Past that I see quite hard to improve
MH (MH is very simple test). As far as I understand, using an arena (especially 
the current PLArena + recycler) will always use more memory than an "optimal
malloc". The memory is just (hopefully significantly) less fragmented. And so 
after a long time (not MH test) and current or realistic malloc 
implementations memory usage should stay at lower level. That is what I saw when 
originally testing the patch for bug 403830 using Mochitest.
Ah, right, I forgot Tpbytes.
Would be great to get the 12/29 regression fixed. 
When attachment 293667 [details] [diff] [review] landed, tpbytes did go down, I think. That happened
1/1/2008
Comment on attachment 295388 [details] [diff] [review]
"small" and "large" pool

Looks good, sorry for the slow review.

Definitely keep an eye on Tpbytes and see what the grand total of all patches is after this patch lands.
Attachment #295388 - Flags: superreview?(jonas)
Attachment #295388 - Flags: superreview+
Attachment #295388 - Flags: review?(jonas)
Attachment #295388 - Flags: review+
Will bug 404824 help this, since it supposedly helps optimize PLArena? There's also some bit-ordering fixes in another bug tied to that one.
(In reply to comment #36)
> Will bug 404824 help this, since it supposedly helps optimize PLArena? 
I doubt that bug will have any effect here.

tpbytes regression is possibly fixed, though hard to see because of other
changes.
http://graphs.mozilla.org/graph.html#spst=range&spstart=0&spend=1200833894&bpst=cursor&bpstart=0&bpend=1200833894&m1tid=53240&m1bl=0&m1avg=0
The regression happened 12/15/2007, fixes 12/18/2007, 1/1/2008, 1/18/2007 (before the bad temporary regression.)

MH is clearly still up, especially on seamonkey where other smaller changes to
MH have pretty much eaten all the fixes from this bug.

Still trying to figure out how to optimize arena usage some more.
Suggestions welcome.
Btw, I tried removing nsFrameArena and making presshell to use DOMNodeallocator.
That had negative impact to MH.
This doesn't affect too much to MH.
Very ugly test. Even if reallocing released some memory for other usage, couldn't
be huge improvement to MH. On 64bit system max ~6k could be released during MH run.
The increased size of content objects is responsible of ~20k MH (on 32bit), 
if I didn't make any mistake.

I think reusing the same allocator for chrome XBL documents might make sense.
At least going to test that.
(In reply to comment #41)
> I think reusing the same allocator for chrome XBL documents might make sense.
> At least going to test that.
Tested. Doesn't help with MH :(

I'm less worried about MH, the better test is Tpbytes. As you say, MH is too short to see any advantages due to less fragmentation.
The amount of allocated bytes varies a lot.
When thinking about whether to keep the arena or not, one thing to remember is
the positive effect it had to performance .
Because this was backed out in bug 414894 I'm resolving wontfix
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Component: DOM: Core → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.