Closed
Bug 408720
Opened 17 years ago
Closed 17 years ago
MH regression because of bug 403830
Categories
(Core :: DOM: Core & HTML, defect, P1)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: smaug, Assigned: smaug)
References
Details
(Keywords: memory-footprint, perf)
Attachments
(6 files)
1.57 KB,
patch
|
sicking
:
review+
sicking
:
superreview+
|
Details | Diff | Splinter Review |
1004 bytes,
patch
|
sicking
:
review+
sicking
:
superreview+
|
Details | Diff | Splinter Review |
4.68 KB,
patch
|
sicking
:
review+
sicking
:
superreview+
|
Details | Diff | Splinter Review |
3.89 KB,
patch
|
Details | Diff | Splinter Review | |
2.83 KB,
patch
|
Details | Diff | Splinter Review | |
6.10 KB,
text/plain
|
Details |
The arena for content doesn't seem to work as well as it should.
Either it should be backed out (though it helps tp quite a bit) or
preferable tweaked or replaced with something better.
But first we need to know how the memory is used and what is causing the
MH increase.
Flags: blocking1.9?
Assignee | ||
Comment 1•17 years ago
|
||
So the first thing to optimize is to create pool lazily.
Testing shows that we're creating DOMNodeAllocators but not always really
using it (at least not the recycler part of it).
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Assignee | ||
Comment 2•17 years ago
|
||
For some reason we're creating allocators which aren't used.
That is possible for example when createDocument or createDocumentType is used.
Creating pool lazily should help a bit.
I saw 7 empty allocators after starting/closing FF. 7*8192bytes is already
something.
Assignee: nobody → Olli.Pettay
Status: NEW → ASSIGNED
Attachment #293568 -
Flags: superreview?(jonas)
Attachment #293568 -
Flags: review?(jonas)
Comment 3•17 years ago
|
||
can we please figure out why before checking in more patches?
Assignee | ||
Comment 4•17 years ago
|
||
Well, it is perfectly valid to create empty documents (or documenttypes which
aren't connected to anywhere). But I'll try to hack something to check at least
the URL of those documents.
Assignee | ||
Comment 5•17 years ago
|
||
None of those empty documents have document uri...
Comment 6•17 years ago
|
||
Why not breakpoint in the document creation and see what the stack (C++ and JS) is?
Assignee | ||
Comment 7•17 years ago
|
||
Well, when creating a document it is impossible to know whether there will
be some elements in it. I could check *all* the documents which are created
between starting and shutting down FF and then somehow record the stack and
when document gets deleted connect the initial stack to it...
But still, it is perfectly valid to create empty documents. XHR for example
may do that.
You could add some instrumentation to the pool to see if any objects are
allocated. Or even better, compare "memory-high" in the pool vs how much memory
the pool actually uses.
7*8k is a lot, but still much short of the 300k. Especially given that those 7 pools probably weren't all alive at the same time.
Comment on attachment 293568 [details] [diff] [review]
step 1, create pool lazily
I'm fine with checking this in though to see how much of those 300k were because of this.
Attachment #293568 -
Flags: superreview?(jonas)
Attachment #293568 -
Flags: superreview+
Attachment #293568 -
Flags: review?(jonas)
Attachment #293568 -
Flags: review+
Assignee | ||
Comment 10•17 years ago
|
||
That 300k was from seamonkey, this 7*8k is from FF.
I have locally some instrumentation for the pool, that is how
I found these empty allocators.
Comment 11•17 years ago
|
||
FWIW, The Firefox MH jumps are also ~300K (21.0MB -> 21.3MB) on Linux and 200-300K (17.1/17.7MB -> 17.4/17.9MB) on OSX.
Graphs are here:
http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=fxdbug-linux-tbox.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:52:12,22341313
http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=bm-xserve11.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:19:49,18766865
Comment 12•17 years ago
|
||
(In reply to comment #11)
> FWIW, The Firefox MH jumps are also ~300K (21.0MB -> 21.3MB) on Linux and
> 200-300K (17.1/17.7MB -> 17.4/17.9MB) on OSX.
> Graphs are here:
> http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=fxdbug-linux-tbox.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:52:12,22341313
> http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=bm-xserve11.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:19:49,18766865
>
Also shows up on the Mini's under private bytes:
http://graphs.mozilla.org/#spst=range&spstart=1196883676&spend=1197931646&bpst=cursor&bpstart=1196883676&bpend=1197931646&m1tid=53258&m1bl=0&m1avg=1
Assignee | ||
Comment 13•17 years ago
|
||
Comment on attachment 293568 [details] [diff] [review]
step 1, create pool lazily
This didn't seem to have much effect :( But it is anyway the right thing to do.
Assignee | ||
Comment 14•17 years ago
|
||
Brendan, about increased object size:
nsTextFragment, nsAttrAndChildArray and nsDOMAttribute have now pointer to the
allocator which they use. nsGenericElement gets pointer to the allocator
through nsAttrAndChildArray and data nodes through nsTextFragment.
If nsTextFragments and nsAttrAndChildArray wouldn't use arena, then nodes could
get the pointer to allocator via nodeinfo->nodeinfomanager->allocator, except
when a node is adopted to a different document (in that case slots could be
created and slots could contain pointer to allocator).
Now with the content arena patch sizeof(nsXMLElement) is the same in 1.8 and
1.9, sizeof(nsXULElement) is still smaller in 1.9.
Assignee | ||
Comment 15•17 years ago
|
||
4096*(sizeof(void*)/2) was perhaps too large size for arena(, though on my machine
running mochitest with that consumes less memory than 4096*(sizeof(void*)/4)).
I tested the patch and not-used-memory-in-arenas gets reduced significantly
(~240k on 64bit with those documents alive when shutting down,
so possibly ~120 on 32bit). Anyway worth to try, even though this will increase
number of malloc calls.
Assignee | ||
Comment 16•17 years ago
|
||
Btw, MH may not be that useful testcase. It is testing only very few simple pages.
(If I looked and tested the right bloatcycle.html)
When using the patch with that, max mem usage seems to be 50-100k smaller on
32bit opt-linux. I tested also using larger arena and mem usage increased, same thing happened when using even smaller arenas.
Comment 17•17 years ago
|
||
The talos machines that I linked to in comment 12 are a good test since they run through 400+ pages 10 times. Esp we can land the patch in bug 408228 to have a delay before shutdown and/or setup a separate mini running with multiple windows at once we'll have a much better real-world test.
But even w/o those changes those mini graphs are meaningful. I'd also be careful about directly correlating 64/32 bit systems - we should really be testing on 32bit since that's what 99% of folks are using right now.
I agree bloatcycle.html contains too few pages to be an ideal benchmark, but I'd still really like to understand why MH goes up that much.
One thing to test would be to try to share arena between all documents in a script context, i.e. stick the arena on the "top level" inner window and then pass that down to contained document/frames etc. This should avoid the problem of temporary documents/doctypes being created and destroyed.
Assignee | ||
Comment 20•17 years ago
|
||
If a large arena is used, it may contain lots of not-in-use-memory.
Arena will (almost) always contain some memory which could be otherwise used
elsewhere.
And yes, testing on 32bit is more important, but we really shouldn't forget
64bit systems. Those are getting more common all the time.
The tests I did with bloatcycle.html were done using optimized Linux build on
32bit system, as I said.
Assignee | ||
Comment 21•17 years ago
|
||
(In reply to comment #19)
> One thing to test would be to try to share arena between all documents in a
> script context, i.e. stick the arena on the "top level" inner window and then
> pass that down to contained document/frames etc. This should avoid the problem
> of temporary documents/doctypes being created and destroyed.
And how to handle then documents created by XHR or DOMParser in JS components?
Comment 22•17 years ago
|
||
Smaug: did sicking mean *arena* or *arena pool* at top level inner window? The latter makes sense, the former does not, as you note.
No one should be forcing different-lifetime objects to come from the same arena, but that can happen at the limit. Indeed, the worst case here wastes a whole arena (4K+/-) on each long-lived allocation, where all other allocations from the same arena have died and no new demand has reused the space they occupied.
This is why just trying to reduce malloc call counts by potentially rounding each allocation up to a much larger size can bite back -- so-called internal (to the arena) fragmentation. It isn't obvious that everyone is on the same page in fearing this.
/be
Assignee | ||
Updated•17 years ago
|
Attachment #293667 -
Flags: superreview?(jonas)
Attachment #293667 -
Flags: review?(jonas)
Both XHR and DOMParser should be able to get to the window that it is executing within, it should be able to get the allocator off of that window.
Assignee | ||
Comment 24•17 years ago
|
||
What if XHR or DOMParser in created in a JS component, or in C++ component.
Then there isn't necessarily a |window|.
Comment 25•17 years ago
|
||
There's always a window -- or something like it -- for finding a principal. Boris should weigh in here.
/be
Assignee | ||
Comment 26•17 years ago
|
||
Those may use nullprincipal.
But anyway, to reduce MH, attachment 293667 [details] [diff] [review] should help some, but that will cause
more allocations. Having a better arena/memoryrecycler would be great, but my
guess is that even in that case more memory would be used in testcases like
MH which just loads few (~5) pages so memory fragmentation doesn't yet get to bad.
Comment 27•17 years ago
|
||
> There's always a window -- or something like it -- for finding a principal.
That's not the case, no. Non-window callers of DOMParser are expected to hand-init it with a principal, while non-window callers of XMLHttpRequest (including C++) are just more or less screwed right now.
Is it possible to get the actual data on MH from tinderbox or elsewhere? That is, see what objects are actually allocated? Seems better than taking wild guesses...
For what it's worth, I'm leery of sticking everything on the toplevel window because that can lead to things staying alive a lot longer than they should (as the AJAX case does now).
Assignee | ||
Comment 28•17 years ago
|
||
Comment on attachment 293667 [details] [diff] [review]
Smaller arena
I testlanded (and backed out) this and MH did drop (~100K), and I think also tp_pbytes_T/tp_memset_T.
'A' didn't jump up.
Comment on attachment 293667 [details] [diff] [review]
Smaller arena
OK, Seems reasonable to check this in. That still leaves us up 150K though, right?
Attachment #293667 -
Flags: superreview?(jonas)
Attachment #293667 -
Flags: superreview+
Attachment #293667 -
Flags: review?(jonas)
Attachment #293667 -
Flags: review+
Assignee | ||
Comment 30•17 years ago
|
||
(In reply to comment #29)
> (From update of attachment 293667 [details] [diff] [review])
> OK, Seems reasonable to check this in. That still leaves us up 150K though,
> right?
>
Seamonkey MH went ~100k down. Similar results for Firefox. Number of allocations
didn't increase significantly or at all.
Seamonkey MH is still 150k-200k up because of bug 403830.
Not sure how to reduce that. Using DOMNodeAllocator uses more memory (because
of increased content object sizes and memory reserved for recycling) and
running MH is so fast and only so few pages are loaded that memory
fragmentation shouldn't get too bad.
I'm not sure whether tp_pbytes_T and tp_memset_T are still up because of
the content arena; there is some other, significantly worse regression, which
happened 29/12/07. Attachment 293667 [details] [diff] did seem to help with those numbers
though, just hard to figure out how much comparing to the original regression.
Assignee | ||
Comment 31•17 years ago
|
||
Testing shows that there are number of cases where only few hundreds bytes are
allocated from an arena. To keep memory usage lower in those cases the patch
introduces mSmallPool which is used until total allocations > (4096 * (sizeof(void*)/4)).
On 32bit linux this seems to keep MH ~50K lower level, at least on my machine.
It seems that quite often DOMNodeAllocator is used either to allocate "lots" of
memory (50k+) or just few hundred (or 1-2 K) bytes.
Attachment #295388 -
Flags: superreview?(jonas)
Attachment #295388 -
Flags: review?(jonas)
Assignee | ||
Comment 32•17 years ago
|
||
Jonas, any comments to the latest patch? Past that I see quite hard to improve
MH (MH is very simple test). As far as I understand, using an arena (especially
the current PLArena + recycler) will always use more memory than an "optimal
malloc". The memory is just (hopefully significantly) less fragmented. And so
after a long time (not MH test) and current or realistic malloc
implementations memory usage should stay at lower level. That is what I saw when
originally testing the patch for bug 403830 using Mochitest.
Comment 33•17 years ago
|
||
The private bytes test:
http://graphs.mozilla.org/#spst=range&spstart=1196881975&spend=1198106723&bpst=cursor&bpstart=1196881975&bpend=1198106723&m1tid=53222&m1bl=0&m1avg=0&m2tid=53258&m2bl=0&m2avg=0&m3tid=53240&m3bl=0&m3avg=0&m4tid=53280&m4bl=0&m4avg=0
Does ~4k page loads and the terminal number is the one on the test. So that's a better indication of long-term mem use
Assignee | ||
Comment 34•17 years ago
|
||
Ah, right, I forgot Tpbytes.
Would be great to get the 12/29 regression fixed.
When attachment 293667 [details] [diff] [review] landed, tpbytes did go down, I think. That happened
1/1/2008
Blocks: 407442
Priority: -- → P1
Comment on attachment 295388 [details] [diff] [review]
"small" and "large" pool
Looks good, sorry for the slow review.
Definitely keep an eye on Tpbytes and see what the grand total of all patches is after this patch lands.
Attachment #295388 -
Flags: superreview?(jonas)
Attachment #295388 -
Flags: superreview+
Attachment #295388 -
Flags: review?(jonas)
Attachment #295388 -
Flags: review+
Comment 36•17 years ago
|
||
Will bug 404824 help this, since it supposedly helps optimize PLArena? There's also some bit-ordering fixes in another bug tied to that one.
Assignee | ||
Comment 37•17 years ago
|
||
(In reply to comment #36)
> Will bug 404824 help this, since it supposedly helps optimize PLArena?
I doubt that bug will have any effect here.
tpbytes regression is possibly fixed, though hard to see because of other
changes.
http://graphs.mozilla.org/graph.html#spst=range&spstart=0&spend=1200833894&bpst=cursor&bpstart=0&bpend=1200833894&m1tid=53240&m1bl=0&m1avg=0
The regression happened 12/15/2007, fixes 12/18/2007, 1/1/2008, 1/18/2007 (before the bad temporary regression.)
MH is clearly still up, especially on seamonkey where other smaller changes to
MH have pretty much eaten all the fixes from this bug.
Still trying to figure out how to optimize arena usage some more.
Suggestions welcome.
Assignee | ||
Comment 38•17 years ago
|
||
Btw, I tried removing nsFrameArena and making presshell to use DOMNodeallocator.
That had negative impact to MH.
Assignee | ||
Comment 39•17 years ago
|
||
This doesn't affect too much to MH.
Assignee | ||
Comment 40•17 years ago
|
||
Very ugly test. Even if reallocing released some memory for other usage, couldn't
be huge improvement to MH. On 64bit system max ~6k could be released during MH run.
Assignee | ||
Comment 41•17 years ago
|
||
The increased size of content objects is responsible of ~20k MH (on 32bit),
if I didn't make any mistake.
I think reusing the same allocator for chrome XBL documents might make sense.
At least going to test that.
Assignee | ||
Comment 42•17 years ago
|
||
(In reply to comment #41)
> I think reusing the same allocator for chrome XBL documents might make sense.
> At least going to test that.
Tested. Doesn't help with MH :(
I'm less worried about MH, the better test is Tpbytes. As you say, MH is too short to see any advantages due to less fragmentation.
Assignee | ||
Comment 44•17 years ago
|
||
The amount of allocated bytes varies a lot.
Assignee | ||
Comment 45•17 years ago
|
||
When thinking about whether to keep the arena or not, one thing to remember is
the positive effect it had to performance .
Comment 46•17 years ago
|
||
Because this was backed out in bug 414894 I'm resolving wontfix
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•