Closed Bug 408720 Opened 17 years ago Closed 17 years ago

MH regression because of bug 403830

Tracking

()

Status:

RESOLVED WONTFIX

People

(Reporter: smaug, Assigned: smaug)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(6 files)

step 1, create pool lazily 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 1.57 KB, patch	sicking : review+ sicking : superreview+	Details \| Diff \| Splinter Review
Smaller arena 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 1004 bytes, patch	sicking : review+ sicking : superreview+	Details \| Diff \| Splinter Review
"small" and "large" pool 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 4.68 KB, patch	sicking : review+ sicking : superreview+	Details \| Diff \| Splinter Review
test patch 1, allow small memory blocks reuse memory reserved for large blocks 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 3.89 KB, patch		Details \| Diff \| Splinter Review
test patch 2, decrease arena size by reallocing. 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 2.83 KB, patch		Details \| Diff \| Splinter Review
Arena allocations / random browsing slashdot/cnn/bbc/m.o/linuxtoday/etc 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 6.10 KB, text/plain		Details

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Description

•

17 years ago

The arena for content doesn't seem to work as well as it should. Either it should be backed out (though it helps tp quite a bit) or preferable tweaked or replaced with something better. But first we need to know how the memory is used and what is causing the MH increase.

Flags: blocking1.9?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 1

•

17 years ago

So the first thing to optimize is to create pool lazily. Testing shows that we're creating DOMNodeAllocators but not always really using it (at least not the recycler part of it).

Stuart Parmenter

Updated

•

17 years ago

Flags: blocking1.9? → blocking1.9+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 2

•

17 years ago

Attached patch step 1, create pool lazily — Details — Splinter Review

For some reason we're creating allocators which aren't used. That is possible for example when createDocument or createDocumentType is used. Creating pool lazily should help a bit. I saw 7 empty allocators after starting/closing FF. 7*8192bytes is already something.

Assignee: nobody → Olli.Pettay

Status: NEW → ASSIGNED

Attachment #293568 - Flags: superreview?(jonas)

Attachment #293568 - Flags: review?(jonas)

Stuart Parmenter

Comment 3

•

17 years ago

can we please figure out why before checking in more patches?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 4

•

17 years ago

Well, it is perfectly valid to create empty documents (or documenttypes which aren't connected to anywhere). But I'll try to hack something to check at least the URL of those documents.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 5

•

17 years ago

None of those empty documents have document uri...

Boris Zbarsky [:bzbarsky]

Comment 6

•

17 years ago

Why not breakpoint in the document creation and see what the stack (C++ and JS) is?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 7

•

17 years ago

Well, when creating a document it is impossible to know whether there will be some elements in it. I could check *all* the documents which are created between starting and shutting down FF and then somehow record the stack and when document gets deleted connect the initial stack to it... But still, it is perfectly valid to create empty documents. XHR for example may do that.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 8

•

17 years ago

You could add some instrumentation to the pool to see if any objects are allocated. Or even better, compare "memory-high" in the pool vs how much memory the pool actually uses. 7*8k is a lot, but still much short of the 300k. Especially given that those 7 pools probably weren't all alive at the same time.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 9

•

17 years ago

Comment on attachment 293568 [details] [diff] [review] step 1, create pool lazily I'm fine with checking this in though to see how much of those 300k were because of this.

Attachment #293568 - Flags: superreview?(jonas)

Attachment #293568 - Flags: superreview+

Attachment #293568 - Flags: review?(jonas)

Attachment #293568 - Flags: review+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 10

•

17 years ago

That 300k was from seamonkey, this 7*8k is from FF. I have locally some instrumentation for the pool, that is how I found these empty allocators.

Robert Kaiser

Comment 11

•

17 years ago

FWIW, The Firefox MH jumps are also ~300K (21.0MB -> 21.3MB) on Linux and 200-300K (17.1/17.7MB -> 17.4/17.9MB) on OSX. Graphs are here: http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=fxdbug-linux-tbox.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:52:12,22341313 http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=bm-xserve11.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:19:49,18766865

Mike Schroepfer

Comment 12

•

17 years ago

(In reply to comment #11) > FWIW, The Firefox MH jumps are also ~300K (21.0MB -> 21.3MB) on Linux and > 200-300K (17.1/17.7MB -> 17.4/17.9MB) on OSX. > Graphs are here: > http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=fxdbug-linux-tbox.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:52:12,22341313 > http://build-graphs.mozilla.org/graph/query.cgi?testname=trace_malloc_maxheap&units=bytes&tbox=bm-xserve11.build.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2007:12:15:02:19:49,18766865 > Also shows up on the Mini's under private bytes: http://graphs.mozilla.org/#spst=range&spstart=1196883676&spend=1197931646&bpst=cursor&bpstart=1196883676&bpend=1197931646&m1tid=53258&m1bl=0&m1avg=1

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 13

•

17 years ago

Comment on attachment 293568 [details] [diff] [review] step 1, create pool lazily This didn't seem to have much effect :( But it is anyway the right thing to do.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 14

•

17 years ago

Brendan, about increased object size: nsTextFragment, nsAttrAndChildArray and nsDOMAttribute have now pointer to the allocator which they use. nsGenericElement gets pointer to the allocator through nsAttrAndChildArray and data nodes through nsTextFragment. If nsTextFragments and nsAttrAndChildArray wouldn't use arena, then nodes could get the pointer to allocator via nodeinfo->nodeinfomanager->allocator, except when a node is adopted to a different document (in that case slots could be created and slots could contain pointer to allocator). Now with the content arena patch sizeof(nsXMLElement) is the same in 1.8 and 1.9, sizeof(nsXULElement) is still smaller in 1.9.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 15

•

17 years ago

Attached patch Smaller arena — Details — Splinter Review

4096*(sizeof(void*)/2) was perhaps too large size for arena(, though on my machine running mochitest with that consumes less memory than 4096*(sizeof(void*)/4)). I tested the patch and not-used-memory-in-arenas gets reduced significantly (~240k on 64bit with those documents alive when shutting down, so possibly ~120 on 32bit). Anyway worth to try, even though this will increase number of malloc calls.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 16

•

17 years ago

Btw, MH may not be that useful testcase. It is testing only very few simple pages. (If I looked and tested the right bloatcycle.html) When using the patch with that, max mem usage seems to be 50-100k smaller on 32bit opt-linux. I tested also using larger arena and mem usage increased, same thing happened when using even smaller arenas.

Mike Schroepfer

Comment 17

•

17 years ago

The talos machines that I linked to in comment 12 are a good test since they run through 400+ pages 10 times. Esp we can land the patch in bug 408228 to have a delay before shutdown and/or setup a separate mini running with multiple windows at once we'll have a much better real-world test. But even w/o those changes those mini graphs are meaningful. I'd also be careful about directly correlating 64/32 bit systems - we should really be testing on 32bit since that's what 99% of folks are using right now.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 18

•

17 years ago

I agree bloatcycle.html contains too few pages to be an ideal benchmark, but I'd still really like to understand why MH goes up that much.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 19

•

17 years ago

One thing to test would be to try to share arena between all documents in a script context, i.e. stick the arena on the "top level" inner window and then pass that down to contained document/frames etc. This should avoid the problem of temporary documents/doctypes being created and destroyed.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 20

•

17 years ago

If a large arena is used, it may contain lots of not-in-use-memory. Arena will (almost) always contain some memory which could be otherwise used elsewhere. And yes, testing on 32bit is more important, but we really shouldn't forget 64bit systems. Those are getting more common all the time. The tests I did with bloatcycle.html were done using optimized Linux build on 32bit system, as I said.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 21

•

17 years ago

(In reply to comment #19) > One thing to test would be to try to share arena between all documents in a > script context, i.e. stick the arena on the "top level" inner window and then > pass that down to contained document/frames etc. This should avoid the problem > of temporary documents/doctypes being created and destroyed. And how to handle then documents created by XHR or DOMParser in JS components?

Brendan Eich [:brendan]

Comment 22

•

17 years ago

Smaug: did sicking mean *arena* or *arena pool* at top level inner window? The latter makes sense, the former does not, as you note. No one should be forcing different-lifetime objects to come from the same arena, but that can happen at the limit. Indeed, the worst case here wastes a whole arena (4K+/-) on each long-lived allocation, where all other allocations from the same arena have died and no new demand has reused the space they occupied. This is why just trying to reduce malloc call counts by potentially rounding each allocation up to a much larger size can bite back -- so-called internal (to the arena) fragmentation. It isn't obvious that everyone is on the same page in fearing this. /be

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

17 years ago

Attachment #293667 - Flags: superreview?(jonas)

Attachment #293667 - Flags: review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 23

•

17 years ago

Both XHR and DOMParser should be able to get to the window that it is executing within, it should be able to get the allocator off of that window.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 24

•

17 years ago

What if XHR or DOMParser in created in a JS component, or in C++ component. Then there isn't necessarily a |window|.

Brendan Eich [:brendan]

Comment 25

•

17 years ago

There's always a window -- or something like it -- for finding a principal. Boris should weigh in here. /be

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 26

•

17 years ago

Those may use nullprincipal. But anyway, to reduce MH, attachment 293667 [details] [diff] [review] should help some, but that will cause more allocations. Having a better arena/memoryrecycler would be great, but my guess is that even in that case more memory would be used in testcases like MH which just loads few (~5) pages so memory fragmentation doesn't yet get to bad.

Boris Zbarsky [:bzbarsky]

Comment 27

•

17 years ago

> There's always a window -- or something like it -- for finding a principal. That's not the case, no. Non-window callers of DOMParser are expected to hand-init it with a principal, while non-window callers of XMLHttpRequest (including C++) are just more or less screwed right now. Is it possible to get the actual data on MH from tinderbox or elsewhere? That is, see what objects are actually allocated? Seems better than taking wild guesses... For what it's worth, I'm leery of sticking everything on the toplevel window because that can lead to things staying alive a lot longer than they should (as the AJAX case does now).

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 28

•

17 years ago

Comment on attachment 293667 [details] [diff] [review] Smaller arena I testlanded (and backed out) this and MH did drop (~100K), and I think also tp_pbytes_T/tp_memset_T. 'A' didn't jump up.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 29

•

17 years ago

Comment on attachment 293667 [details] [diff] [review] Smaller arena OK, Seems reasonable to check this in. That still leaves us up 150K though, right?

Attachment #293667 - Flags: superreview?(jonas)

Attachment #293667 - Flags: superreview+

Attachment #293667 - Flags: review?(jonas)

Attachment #293667 - Flags: review+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 30

•

17 years ago

(In reply to comment #29) > (From update of attachment 293667 [details] [diff] [review]) > OK, Seems reasonable to check this in. That still leaves us up 150K though, > right? > Seamonkey MH went ~100k down. Similar results for Firefox. Number of allocations didn't increase significantly or at all. Seamonkey MH is still 150k-200k up because of bug 403830. Not sure how to reduce that. Using DOMNodeAllocator uses more memory (because of increased content object sizes and memory reserved for recycling) and running MH is so fast and only so few pages are loaded that memory fragmentation shouldn't get too bad. I'm not sure whether tp_pbytes_T and tp_memset_T are still up because of the content arena; there is some other, significantly worse regression, which happened 29/12/07. Attachment 293667 [details] [diff] did seem to help with those numbers though, just hard to figure out how much comparing to the original regression.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 31

•

17 years ago

Attached patch "small" and "large" pool — Details — Splinter Review

Testing shows that there are number of cases where only few hundreds bytes are allocated from an arena. To keep memory usage lower in those cases the patch introduces mSmallPool which is used until total allocations > (4096 * (sizeof(void*)/4)). On 32bit linux this seems to keep MH ~50K lower level, at least on my machine. It seems that quite often DOMNodeAllocator is used either to allocate "lots" of memory (50k+) or just few hundred (or 1-2 K) bytes.

Attachment #295388 - Flags: superreview?(jonas)

Attachment #295388 - Flags: review?(jonas)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 32

•

17 years ago

Jonas, any comments to the latest patch? Past that I see quite hard to improve MH (MH is very simple test). As far as I understand, using an arena (especially the current PLArena + recycler) will always use more memory than an "optimal malloc". The memory is just (hopefully significantly) less fragmented. And so after a long time (not MH test) and current or realistic malloc implementations memory usage should stay at lower level. That is what I saw when originally testing the patch for bug 403830 using Mochitest.

Mike Schroepfer

Comment 33

•

17 years ago

The private bytes test: http://graphs.mozilla.org/#spst=range&spstart=1196881975&spend=1198106723&bpst=cursor&bpstart=1196881975&bpend=1198106723&m1tid=53222&m1bl=0&m1avg=0&m2tid=53258&m2bl=0&m2avg=0&m3tid=53240&m3bl=0&m3avg=0&m4tid=53280&m4bl=0&m4avg=0 Does ~4k page loads and the terminal number is the one on the test. So that's a better indication of long-term mem use

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 34

•

17 years ago

Ah, right, I forgot Tpbytes. Would be great to get the 12/29 regression fixed. When attachment 293667 [details] [diff] [review] landed, tpbytes did go down, I think. That happened 1/1/2008

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Blocks: 407442

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Priority: -- → P1

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 35

•

17 years ago

Comment on attachment 295388 [details] [diff] [review] "small" and "large" pool Looks good, sorry for the slow review. Definitely keep an eye on Tpbytes and see what the grand total of all patches is after this patch lands.

Attachment #295388 - Flags: superreview?(jonas)

Attachment #295388 - Flags: superreview+

Attachment #295388 - Flags: review?(jonas)

Attachment #295388 - Flags: review+

Reed Loden [:reed]

Comment 36

•

17 years ago

Will bug 404824 help this, since it supposedly helps optimize PLArena? There's also some bit-ordering fixes in another bug tied to that one.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 37

•

17 years ago

(In reply to comment #36) > Will bug 404824 help this, since it supposedly helps optimize PLArena? I doubt that bug will have any effect here. tpbytes regression is possibly fixed, though hard to see because of other changes. http://graphs.mozilla.org/graph.html#spst=range&spstart=0&spend=1200833894&bpst=cursor&bpstart=0&bpend=1200833894&m1tid=53240&m1bl=0&m1avg=0 The regression happened 12/15/2007, fixes 12/18/2007, 1/1/2008, 1/18/2007 (before the bad temporary regression.) MH is clearly still up, especially on seamonkey where other smaller changes to MH have pretty much eaten all the fixes from this bug. Still trying to figure out how to optimize arena usage some more. Suggestions welcome.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 38

•

17 years ago

Btw, I tried removing nsFrameArena and making presshell to use DOMNodeallocator. That had negative impact to MH.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 39

•

17 years ago

Attached patch test patch 1, allow small memory blocks reuse memory reserved for large blocks — Details — Splinter Review

This doesn't affect too much to MH.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 40

•

17 years ago

Attached patch test patch 2, decrease arena size by reallocing. — Details — Splinter Review

Very ugly test. Even if reallocing released some memory for other usage, couldn't be huge improvement to MH. On 64bit system max ~6k could be released during MH run.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 41

•

17 years ago

The increased size of content objects is responsible of ~20k MH (on 32bit), if I didn't make any mistake. I think reusing the same allocator for chrome XBL documents might make sense. At least going to test that.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 42

•

17 years ago

(In reply to comment #41) > I think reusing the same allocator for chrome XBL documents might make sense. > At least going to test that. Tested. Doesn't help with MH :(

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 43

•

17 years ago

I'm less worried about MH, the better test is Tpbytes. As you say, MH is too short to see any advantages due to less fragmentation.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 44

•

17 years ago

Attached file Arena allocations / random browsing slashdot/cnn/bbc/m.o/linuxtoday/etc — Details

The amount of allocated bytes varies a lot.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 45

•

17 years ago

When thinking about whether to keep the arena or not, one thing to remember is the positive effect it had to performance .

Mike Schroepfer

Comment 46

•

17 years ago

Because this was backed out in bug 414894 I'm resolving wontfix

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Resolution: --- → WONTFIX

timeless

Updated

•

16 years ago

Component: DOM: Core → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.