Closed Bug 422245 Opened 12 years ago Closed 5 years ago

Tune sizes of arenas using PLArenas

Categories

(Core :: General, defect, P3, minor)


RESOLVED WONTFIX

People

(Reporter: moz, Assigned: moz)

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file)

The sizes of PLArenas should be larger.  See https://bugzilla.mozilla.org/show_bug.cgi?id=414088#c15.
Depends on: 414088, 419131
Keywords: perf
Priority: -- → P4
Summary: Tune sizes of arenas using plarenas → Tune sizes of arenas using PLArenas
Version: unspecified → Trunk
Changing priority to match bug 421435.
Priority: P4 → P3
The 'footprint' keyword now applies, too, as I was able to improve the memory consumption with tuning.
No longer depends on: 414088, 419131
Keywords: footprint
Bug 414088 and bug 419131 are not strictly dependencies, so I removed them.  However, their patches should be cumulative with the memory savings from the patch(es) in this bug.
This patch produces substantial improvements in memory use in the 'start and stop the browser' test.

High water mark memory consumption by PLArenas is reduced mildly (4.6%) from 978704 to 935552.  More importantly, the number of calls to malloc is reduced by a very serious 24% from 898 to 725.
Attachment #309043 - Flags: review?(pavlov)
Tdhtml seems to be improved on this test, though I'm not comparing apples with apples (different builds).

The number of calls to malloc is reduced by 13%.  The high water mark real memory use is reduced by 6%.

Some of these benefits will not appear in conjunction with patches from bug 419131 and 414088.  Though, things definitely won't get worse.

Bug 419131's effects should "trump" this bug's effects.
Attachment #309043 - Flags: superreview?(pavlov)
Attachment #309043 - Flags: review?(wtc)
Attachment #309043 - Flags: review?(pavlov)
Attachment #309043 - Flags: review?(jonas)
Comment on attachment 309043 [details] [diff] [review]
v1, alters a bunch of callers of PLArena code

I only reviewed the trivial change to mozilla/nsprpub/lib/ds/plarena.c.
r=wtc on that change.
Attachment #309043 - Flags: review?(wtc) → review+
(In reply to comment #4)
> High water mark memory consumption by PLArenas is reduced mildly (4.6%) from
> 978704 to 935552.  More importantly, the number of calls to malloc is reduced
> by a very serious 24% from 898 to 725.

Wait, we're just saving some 175 malloc calls? Isn't that a very tiny fraction of the total number of malloc calls?

I'm worried about increased memory usage here. For example the increase to kNodeInfoPoolInitialSize seems excessive, what was that based on?
(In reply to comment #7)
> Wait, we're just saving some 175 malloc calls? Isn't that a very tiny fraction
> of the total number of malloc calls?

Certainly, yes.  That datum (# of calls to malloc during start-stop test) is meant to be *representative* of how wisely the PLArena code collects memory from the malloc subsystem.  Memory allocated by PLArena code accounts for a significant fraction of Firefox's resident memory size.  Memory allocated by PLArena code also accounts for a great deal of Firefox memory allocation activity.  So, reducing the amount of memory consumed by PLArena code is good, and so is reducing the time required to allocate it.

Reducing the number of calls to malloc and reducing the number of distinct mallocation sizes are two things that can help reduce fragmentation in the memory managed by the malloc subsystem.  So, it is good to avoid unnecessary calls to malloc, for many reasons.

Note that the reduction in # of calls to malloc during a run of Tdhtml reported in comment #5 was 13%.  This corresponds to thousands of calls to malloc.

The intention of that datum is to show that we can allocate the same number of objects with less memory and with fewer malloc calls.  It's a win-win.

I certainly wasn't trying to suggest that removing 175 malloc calls in a sea of millions was an accomplishment.

> I'm worried about increased memory usage here. For example the increase to
> kNodeInfoPoolInitialSize seems excessive, what was that based on?

Remember: I measured the memory usage.  It is reduced by each of the changes in this patch.  That is true for both the 'start-stop' test and Tdhtml.  (Though, in Tdhtml, I only measured the effects of the changes all together, not one at a time.)

In the case of kNodeInfoPoolInitialSize, I increased it by a factor of 4, to about 8KB from about 2KB.  Here is all the info about pools named "NodeInfo Pool" from stats collected from a start-stop test:
      NodeInfo Pool
         Pool at address 22d39a70
            name: NodeInfo Pool
            arenaSize: 2060
            arenas: 13 (27079,33280)
               "350d800" (2083,2560)
               "3378400" (2083,2560)
               "32e3800" (2083,2560)
               "329b600" (2083,2560)
               "32ddc00" (2083,2560)
               "32ae800" (2083,2560)
               "3266400" (2083,2560)
               "32a6600" (2083,2560)
               "3277600" (2083,2560)
               "323b000" (2083,2560)
               "323a600" (2083,2560)
               "3187600" (2083,2560)
               "30f8400" (2083,2560)

The pairs (x,y) indicate that an allocation from malloc(x) occupies y bytes.  There is waste in each of the above-reported allocations.  Having fewer such allocations would reduce the waste.

[kNodeInfoPoolInitialSize is not well-named, because that value is not an "initial pool size", it is a "gulp size" by which PLArenaPools grow.  (It is 'PLArena.arenasize' in the code.)  Every time the NodeInfo Pool grows, it is by that much.  The PLArena code "gulps" kNodeInfoPoolInitialSize bytes from the malloc subsystem.  That's what those (x,y) values are, above.]

My logic was that increasing the "gulp size" for NodeInfo Pool would reduce the amount of malloc subsystem activity caused by allocations from that pool.  Worst case wastage is increased (by 6K), but typical wastage (as exemplified by the high water mark usage in the start-stop test) is decreased.

Similar logic applied to other pools.  Of course, I don't expect my argument to stand on its own - that's why I measured the effects of the changes.  I found them all positive.

What makes you think that the increase is excessive?
Sorry, there's a bit of information overload here, and some of it doesn't seem to be entirely accurate which is making me worried. Although I'm by no means an expert on memory allocation, so please do correct me if I'm wrong.

(In reply to comment #8)
> (In reply to comment #7)
> > Wait, we're just saving some 175 malloc calls? Isn't that a very tiny 
> > fraction of the total number of malloc calls?
> 
> Certainly, yes.  That datum (# of calls to malloc during start-stop test) is
> meant to be *representative* of how wisely the PLArena code collects memory
> from the malloc subsystem.

But if startup is representative then the patch doesn't seem very effective at reducing the number of calls to malloc, since we're just cutting out 175 mallocs out of several thousands.

But maybe there are tests that are showing better results.

> Memory allocated by PLArena code accounts for a
> significant fraction of Firefox's resident memory size.  Memory allocated by
> PLArena code also accounts for a great deal of Firefox memory allocation
> activity.  So, reducing the amount of memory consumed by PLArena code is good,
> and so is reducing the time required to allocate it.

But doesn't your patch increase the amount of memory consumed by PLArena since you're increasing the sizes everywhere? Though you are reducing the time spent allocating since you're allocating in larger chunks.

> Reducing the number of calls to malloc and reducing the number of distinct
> mallocation sizes are two things that can help reduce fragmentation in the
> memory managed by the malloc subsystem.  So, it is good to avoid unnecessary
> calls to malloc, for many reasons.

Not sure what you mean by "number of distinct mallocation sizes" here.

> Note that the reduction in # of calls to malloc during a run of Tdhtml 
> reported in comment #5 was 13%.  This corresponds to thousands of calls to
> malloc.

Now we're getting to the juicy parts. These are the numbers we're interested in. Though it sounds like you're saying that bug 419131 has gotten us most of this advantage already?

> The intention of that datum is to show that we can allocate the same number of
> objects with less memory and with fewer malloc calls.  It's a win-win.

How is this patch using "less memory"?


> In the case of kNodeInfoPoolInitialSize, I increased it by a factor of 4, to
> about 8KB from about 2KB.  Here is all the info about pools named "NodeInfo
> Pool" from stats collected from a start-stop test:
>       NodeInfo Pool
>          Pool at address 22d39a70
>             name: NodeInfo Pool
>             arenaSize: 2060
>             arenas: 13 (27079,33280)
>                "350d800" (2083,2560)
>                "3378400" (2083,2560)
>                "32e3800" (2083,2560)
>                "329b600" (2083,2560)
>                "32ddc00" (2083,2560)
>                "32ae800" (2083,2560)
>                "3266400" (2083,2560)
>                "32a6600" (2083,2560)
>                "3277600" (2083,2560)
>                "323b000" (2083,2560)
>                "323a600" (2083,2560)
>                "3187600" (2083,2560)
>                "30f8400" (2083,2560)

I'm not really following what this data means. Does it mean that we have a single nodeinfo pool which consists of 13 arenas, all 2083 bytes big? Was this really typical for nodeinfo pools, or was it the biggest one or the smallest one?

Also, where does the difference between x and y come from? malloc overhead or arena overhead? Or both?


> My logic was that increasing the "gulp size" for NodeInfo Pool would reduce 
> the amount of malloc subsystem activity caused by allocations from that pool. 
> Worst case wastage is increased (by 6K), but typical wastage (as exemplified 
> by the high water mark usage in the start-stop test) is decreased.

Ah, so the idea is that each arena has some overhead, so increasing the size of the arena will reduce the number of arenas and thus the amount of overhead?

> What makes you think that the increase is excessive?

I would expect the number of nodeinfos stored in a single arena to be on the order of 20 or so on average. I counted the number of nodenames in browser.xul and found around 30, and that should be an unusually high number. So I'm very surprised that increasing the "gulp size" to anything over 64 will be anything but a waste of memory. And the fact that we would need 13 2k arenas for a single nodeinfo pool seems to indicate some very serious bugs, if it's in fact true.
Actually, I guess that during startup we do create some unusual nodeinfo patterns due to the way we do fastloading of attribute names. This shouldn't be representative of how we behave past startup, though.
(In reply to comment #9)
> Sorry, there's a bit of information overload here, and some of it doesn't seem
> to be entirely accurate which is making me worried. Although I'm by no means an
> expert on memory allocation, so please do correct me if I'm wrong.

Yes, there's a lot to digest, here.  The patch was meant to prompt your questions.  If the information still seems inaccurate after my reply, please ask more questions.

> (In reply to comment #8)
> But if startup is representative then the patch doesn't seem very effective at
> reducing the number of calls to malloc, since we're just cutting out 175
> mallocs out of several thousands.

The goal of the patch is not to reduce calls to malloc throughout all of Firefox.  The goal is to improve the behaviour of the PLArena-related code.  Other patches improve the behaviour in other ways; this one does it by tweaking the calling parameters for various users.

> But doesn't your patch increase the amount of memory consumed by PLArena since
> you're increasing the sizes everywhere? Though you are reducing the time spent
> allocating since you're allocating in larger chunks.

(I think you answer this, yourself, correctly, below.)  Each "gulp" (arena) that the PLArena code takes from the malloc subsystem has an associated overhead.  The more gulps, the more overhead.  By taking fewer gulps, we can reduce the overhead, and thus reduce the memory consumption.

I'm not increasing the total memory consumed, just the gulp sizes.

> > Reducing the number of calls to malloc and reducing the number of distinct
> > mallocation sizes are two things that can help reduce fragmentation in the
> > memory managed by the malloc subsystem.  So, it is good to avoid unnecessary
> > calls to malloc, for many reasons.
> Not sure what you mean by "number of distinct mallocation sizes" here.

A "mallocation" is a memory allocation by use of malloc().

The degree of memory fragmentation is affected by the sizes of the mallocations.  Certain sizes are easier for the malloc subsystem to allocate efficiently.  For example, 10 allocations of size 4096 are allocated with less fragmentation than the 10 allocations of sizes 8190, 2, 2, 8190, 4095, 4097, 4097, 4095, 2047, 6145.  This is true even though exactly the same total memory is allocated in both sets of 10.

My comment #8 meant to express that the malloc subsystem will be less likely to fragment with the new choices for sizes that I provide in this patch.  

> > Note that the reduction in # of calls to malloc during a run of Tdhtml 
> > reported in comment #5 was 13%.  This corresponds to thousands of calls to
> > malloc.
> Now we're getting to the juicy parts. These are the numbers we're interested
> in. Though it sounds like you're saying that bug 419131 has gotten us most of
> this advantage already?

That bug achieves some of the same effects as this one.  I don't mean to say that only one of the two bugs should be fixed.  Both patches have positive effects, even when the other is also applied.

> > The intention of that datum is to show that we can allocate the same number of
> > objects with less memory and with fewer malloc calls.  It's a win-win.
> How is this patch using "less memory"?

You answer this below.

> I'm not really following what this data means. Does it mean that we have a
> single nodeinfo pool which consists of 13 areanas, all 2083 bytes big? Was this
> really typical for nodeinfo pools, or was it the biggest one or the smallest
> one?

It means we have a single "NodeInfo Pool" which consists of 13 arenas, each sized 2560 bytes.  Though, only 2083 of those 2560 bytes are used.  (Bug 419131 addresses that problem directly.)

> Also, where does the difference between x and y come from? malloc overhead or
> arena overhead? Or both?

It comes from the malloc overhead.

> > My logic was that increasing the "gulp size" for NodeInfo Pool would reduce 
> > the amount of malloc subsystem activity caused by allocations from that pool. 
> > Worst case wastage is increased (by 6K), but typical wastage (as exemplified 
> > by the high water mark usage in the start-stop test) is decreased.
> Ah, so the idea is that each arena has some overhead, so increasing the size of
> the arena will reduce the number of arenas and thus the amount of overhead?

True.

> > What makes you think that the increase is excessive?
> I would expect the number of nodeinfos stored in a single arena to be in the
> order 20 or so on average. I counted the number of nodenames in browser.xul and
> found around 30, and that should be an unusually high number. So I'm very
> surprised that increasing the "gulp size" to anything over 64 will be anything
> but waste of memory. And the fact that we would need 13 2k arenas for a single
> nodeinfo pool seems to indicate some very serious bugs if it's in fact true.

Given your following comment #10, are you now satisfied that NodeInfo Pool is not incorrectly sized?

[Note that NodeInfo Pool is not an important memory allocator - it doesn't consume much memory and there's only one such pool.  However, the questions above might well have been asked for a more important (active, bigger) PLArenaPool.]
(In reply to comment #11) 
> Other patches improve the behaviour in other ways; this one does it by tweaking
> the calling parameters for various users.

s/calling parameters/function call arguments/
s/users/callers of PLArena code/
I think one of sicking's main issues is with the assertion that overall memory usage is less even though arena sizes are larger.  One of the main reasons for the improvement here (as with JS Arenas) is not just in "overhead" (by which I mean the cost of headers, pointers linking arenas, etc.), but also in fragmentation.  For situations where arenas are too small to suit their overall usage pattern, we end up leaving large chunks of the end of each arena unused, which contributes to fragmentation and yields more and larger swaths of unused and unusable memory.
(In reply to comment #13)
> For situations where arenas are too small to suit their overall
> usage pattern, we end up leaving large chunks of the end of each arena unused,
> which contributes to fragmentation and yields more and larger swaths of unused
> and unusable memory.

Brian points out another way in which larger arenas save memory.  I'll illustrate with an example, to help clarify the kind of "internal fragmentation" that he's talking about.

Suppose that arenas are of size 9 units, and that a user's allocations are of size 2 units.  The user makes 4 such allocations, totaling 8 units.  The 5th allocation causes another arena (of size 9) to be allocated, because the allocation does not fit in the first arena.  1 memory unit is left unused at the end of that arena.

After the user has made 16 allocations, 4 9-unit arenas are allocated, occupying 36 units of memory.  The user has only allocated a total of 32 units.  Yet, no more 2-unit allocations can be made without allocating another arena.  There are 4 units of memory wasted (4/36 = 11% internal fragmentation).

Suppose that, in that scenario, the size of the arenas were doubled.  The arenas are now of size 18. After 16 allocations (of size 2), 9 allocations fill up one arena, and 7 allocations are in a second arena.  But, there's room for 2 more!

With the larger arenas, we get 2 extra 2-unit allocations using the same 36 units of memory.  This is because we reduced the "internal fragmentation" by increasing the arena size.

In general, increasing the arena size by a factor will reduce the internal fragmentation.  This is in addition to the other benefits that I describe in prior comments.
OK, I see what the overall idea with the patch is, though I'm still curious to know how you reached these specific sizes.

It still seems weird to me that the nodeinfo pools get that big often enough even during startup. And in any case it's definitely not the size we want after startup, since usually we'll just stick around 20 nodeinfos in there, so even 64 is arguably a waste of memory, and 256 definitely is.

I'm worried that this holds true even for the other pools you are increasing in size.
(In reply to comment #15)
> Ok, i see what the overall idea with the patch is, though I'm still curious to
> know how you reached these specific sizes.

For the most part, I simply tried to arrange that the arena sizes were multiples of 4096 bytes.  In some cases, this did not give a performance improvement.  I did not include the change in the patch, in those cases.  Please ask about a particular pool about which you are curious, and I will give my logic with the stats to accompany it.

> It still seems wierd to me that the nodeinfo pools get that big often enough
> even during startup. And in any case it's definitely not the size we want after
> startup since usually we'll just stick around 20 nodeinfos in there, so even 64
> is arguably a waste of memory, and 256 definitely is.

Okay.  It sounds like the NodeInfo Pool is not behaving as you expect it to.  We should address that.  What is the NodeInfo Pool used for, and how big do you expect it to get?  What are its growth patterns?

NodeInfo arenas are a little bit abused at startup, so their usage isn't representative then and I don't know off the top of my head how large I would expect it to be.

But during the rest of the app we just store names of nodes and namespaced attributes in the nodeinfo arena. So you'd get one entry for every unique name of a webpage. So if a page contains 1 <html>, 1 <body>, 1 <head>, 1 <title>, 5 <div>, 3 <p>, and 10 <span>, you'd get 7 entries. Plus 5 or so special ones. I just counted, and on this bugzilla page there are 32 different element names, so there should be fewer than 40 nodeinfo entries in this document.

I very strongly doubt that there is a bug somewhere causing us to have more than expected nodeinfos. I think what's skewing your numbers is the special startup code, which I don't think we should be optimizing for.

I'll look further at the other arenas to see if I have any specific questions there.
(In reply to comment #17) 
> I very strongly doubt that there is a bug somewhere causing us to have more
> than expected nodeinfos. I think what's skewing your numbers is the special
> startup code, which I don't think we should be optimizing for.

Just to be clear (and I'm not sure if you're suggesting otherwise), we are not "optimizing for startup code".

The start-stop test represents the minimum memory consumption we expect the browser to have.  (Not strictly true, but we need a 'representative' test of some sort.)  The *most* memory consumed at any one given time during the start-stop test is the high water mark.  It is for that high water mark of PLArenaPool memory consumption that I optimized.

In this patch, I tried to increase the arena sizes ("gulps") as much as possible without negatively impacting the high water mark.  The idea was to improve speed and efficiency of the memory allocations.  As a bonus, the patch improves memory footprint also.

So, I used startup code in my optimizations, but didn't really optimize "for" startup code.  I optimized for speed, while avoiding memory footprint increases.

Remember: arena sizes are increased, but footprint is decreased.  See comments #4 and #5.
So I looked at the change to nsElementMap.cpp. You are increasing the "gulp size" to 4k. Looks like we're only storing ContentListItems in it, which are 8 bytes big, meaning we allocate room for 500 items at a time. And we store one item for each node with an 'id' or 'ref'. Do we really usually have close to 500 such elements per document? Or how did you arrive at the 4k number?

Sorry, still trying to understand how you've reached the numbers in the patch since they are so much higher than what's in the code currently.
> In this patch, I tried to increase the arena sizes ("gulps") as much as
> possible without negatively impacting the high water mark.  The idea was to
> improve speed and efficiency of the memory allocations.  As a bonus, the patch
> improves memory footprint also.

Ah, this is the answer I was looking for. So the numbers aren't based on actual measurements of how much went into the various instances of the various pools, but simply trying to adjust the numbers upwards, until memory usage started increasing?

I guess this strategy is fine if we're trying to optimize for speed, though I'm slightly worried about using startup/shutdown as testcase in such a strategy, as memory consumption during normal browsing is more important to look out for. More below.

> So, I used startup code in my optimizations, but didn't really optimize "for"
> startup code.  I optimized for speed, while avoiding memory footprint
> increases.

Yeah, I'm sure you are not intentionally making things faster during startup at the cost of normal usage. However, when doing performance optimizations it's important to use "average" usage as your testcase, as in many cases performance (memory as well as CPU) optimizations that help one testcase will make other testcases worse.

For example, by increasing the "gulp size" you are reducing the amount of overhead from headers, but you are potentially increasing the amount of overhead from allocating more memory than needed.

In your example in comment 8, you changed the gulp size from 2560 bytes to 10240 bytes. Assuming that the overhead per gulp is about the same, i.e. 500 bytes (which sounds insanely high btw, that has to be more than just malloc overhead), then any use of less than 6k will result in more memory wasted due to too much memory being allocated, whereas any use of more than 40k will result in less memory used. Allocations between 6k and 40k will vary depending on the exact amount of memory allocated, with higher numbers more likely to be better after your patch.

So it's definitely the case that just looking at memory usage it's very important to use representative allocations when determining the gulp size. And I'm a little worried that startup/shutdown is not going to give representative numbers.

But you are right that performance is likely most of the time better with bigger gulp sizes as we'll hit malloc more rarely.


I guess I'd be fine with simply checking this patch in (minus the nodeinfo change, which I'm relatively sure is wrong during normal browsing) and seeing what the talos numbers look like.
(In reply to comment #19)
> So I looked at the change to nsElementMap.cpp. You are increasing the "gulp
> size" to 4k. Looks like we're only storing ContentListItems in it, which are 8
> bytes big, meaning we allocate room for 500 items at a time. And we store one
> item for each node with an 'id' or 'ref'. Do we really usually have close to
> 500 such elements per document? Or how did you arrive at the 4k number?
> 
> Sorry, still trying to understand how you've reached the numbers in the patch
> since they are so much higher than what's in the code currently.

Before my changes, in the start-stop test, at the high water mark, there were 2 nsElementMap pools, with the following stats:
      nsElementMap
         Pool: 30ac808
            name: nsElementMap
            arenaSize: 536
            arenas: (5590,10240)
               "31bd400" (559,1024)
               "31bd000" (559,1024)
               "31bcc00" (559,1024)
               "31bc800" (559,1024)
               "31bc000" (559,1024)
               "31bb800" (559,1024)
               "31bb400" (559,1024)
               "31bb000" (559,1024)
               "31aec00" (559,1024)
               "30aba00" (559,1024)
         Pool: 3240a08
            name: nsElementMap
            arenaSize: 536
            arenas: (11180,20480)
               "3272a00" (559,1024)
               "3261400" (559,1024)
               "3261000" (559,1024)
               "3260c00" (559,1024)
               "3255c00" (559,1024)
               "31c9200" (559,1024)
               "31c8e00" (559,1024)
               "31c8a00" (559,1024)
               "3252a00" (559,1024)
               "3255000" (559,1024)
               "3254c00" (559,1024)
               "3254800" (559,1024)
               "30b9000" (559,1024)
               "30b8c00" (559,1024)
               "3252600" (559,1024)
               "3252200" (559,1024)
               "323ee00" (559,1024)
               "323ea00" (559,1024)
               "323e600" (559,1024)
               "3239800" (559,1024)

This pool is one which definitely begs for a larger arenasize.  Currently, you can see that the arenasize is set to 536 bytes.  This results in mallocations of 559 bytes (536 + overhead from PLArena).  Each of these allocations actually uses 1024 bytes.  The extra bytes are wasted - that's internal fragmentation due to the malloc subsystem.

The first pool has 10 arenas, the second one 20.  The arenas are given with their addresses and (x,y) values which correspond to memory use of y bytes from x-byte requests.  y-x bytes are wasted in each case.

One can see that the first pool "wants" about 5K of memory, and the second one "wants" about 11K.  Instead, they're using 10K and 20K respectively.

After my change, the stats for the nsElementMap at the high water mark are:
      nsElementMap
         Pool: 3087208
            name: nsElementMap
            arenaSize: 4120
            arenas: (8286,9216)
               "31bde00" (4143,4608)
               "3148200" (4143,4608)
         Pool: 3239c08
            name: nsElementMap
            arenaSize: 4120
            arenas: (12429,13824)
               "3260200" (4143,4608)
               "3254e00" (4143,4608)
               "3239e00" (4143,4608)

Now, the first pool has the 5K that it wanted (and 3K more), and the second pool has the 11K that it wanted (and 1K more).  The actual memory consumed is about 23K, down from 30K (a savings of nearly 8K), and the buffers have memory still unused.

(Bug 419131 is for reducing the y-x waste further.)

So, in the start-stop test, increasing the arenasize for nsElementMap is clearly a good idea.  But, what about in Tdhtml?  As it happens, the high water mark has the same size numbers for nsElementMap as in the start-stop test.  So, increasing the arenasize for nsElementMap is also good for Tdhtml.  I extrapolated that increasing the arenasize for nsElementMap is always a good idea, so I increased it.
(In reply to comment #20)
> So the numbers aren't based on actual
> measurements of how much went into the various instances of the various pools,
> but simply trying to adjust the numbers upwards, until memory usage started
> increasing?

No.  The numbers *are* based on actual measurements of how much went into the various instances of the various pools.  Given those numbers, I increased the arenasizes upwards until I became worried about memory overconsumption by the pools (individually).  I considered each pool one at a time; I did *not* simply try to improve a global memory consumption quantity.

That is, each pool is individually improved by my patch, not merely memory consumption as a whole.

> I guess this strategy is fine if we're trying to optimize for speed, though I'm
> slightly worried about using startup/shutdown as testcase in such a strategy,
> as memory consumption during normal browsing is more important to look out for.
> More below.

I could have optimized for "normal browsing".  However, then, my patch would be subject to criticism about possibly increasing the memory consumption on "less than normal browsing".  So, I played it safe, and used the start-stop test.

> Yeah, I'm sure you are not intentionally making things faster during startup at
> the cost of normal usage. However, when during performance optimizations it's
> important to use "average" usage as your testcase as in many cases performance
> (memory as well as CPU) optimizations optimizing one testcase will make other
> testcases worse.

Sometimes, it is not good to use the average case.  Sometimes, one must worry about the worst case or some such.

I am aware that paying attention to only one test case while optimizing might be detrimental to some other ignored test case.  For this patch, if you know of a significant test case which is worsened by my patch, please let me know.

> For example, by increasing the "gulp size" you are reducing the amount of
> overhead from headers, but you are potentially increasing the amount of
> overhead from allocating more memory than needed.

I am keenly aware of this.  That is one of the reasons that I have made so many measurements - it is important to measure all of those quantities, to ensure that they are improved by the patch.

> In your example in comment 8, you changed the gulp size from 2560 bytes to
> 10240 bytes. Assuming that the overhead per gulp is about the same, i.e. 500
> bytes (which sounds insanely high btw, that has to be more than just malloc
> overhead), then any use of less than 6k will result in more memory wasted due
> to too much memory being allocated, whereas any use of more than 40k will
> result in less memory used. And allocations between 6k and 40k depending the
> exact amount of memory allocated, with higher numbers more likely to better
> after your patch.

That's not the best viewpoint.  In the > 6K case with the large arena size, the memory that you count as wasted is not wasted, because it is "in the arena" and available for use.  In the > 6K case with the small arena size, the memory is not in the arena, not available for use, and really is wasted.

> So it's definitely the case that just looking at memory usage it's very
> important to use representative allocations when determining the gulp size. And
> I'm a little worried that startup/shutdown is not going to give representative
> numbers.

I am willing to consider a test that is more representative of typical memory use.

I invented my own.  I opened the browser, read 1 message in my Gmail account, and closed my browser.  I call that the 'gmail' test.  The gmail test showed excellent results - better than the start-stop test.

> I guess I'd be fine with simply checking this patch in, minus the nodeinfo
> change i'd say as I'm relatively sure that one is wrong during normal browsing,
> and see what the talos numbers look like.

I'll further investigate the NodeInfo Pool numbers, and submit a new patch.

(In reply to comment #22)
> I am willing to consider a test that is more representative of typical memory
> use.

I meant to add: Please do suggest a workload that is representative of typical memory use.
> > In your example in comment 8, you changed the gulp size from 2560 bytes to
> > 10240 bytes. Assuming that the overhead per gulp is about the same, i.e. 500
> > bytes (which sounds insanely high btw, that has to be more than just malloc
> > overhead), then any use of less than 6k will result in more memory wasted 
> > due to too much memory being allocated, whereas any use of more than 40k 
> > will result in less memory used. And allocations between 6k and 40k 
> > depending the exact amount of memory allocated, with higher numbers more 
> > likely to better after your patch.
> 
> That's not the best viewpoint.  In the > 6K case, for the large arenasize, the
> memory that you count as wasted is not wasted because it is "in the arena" and
> available for use.  In the > 6K case, for the small arenasize, the memory is
> not in the arena, not available for use, and really is wasted. 

What do you mean by "best viewpoint"?  If we only need to store 2k of information in the PLArena, then the remaining allocated 6k *is* wasted; it won't be used by anyone.

I'm sort of thinking that the best strategy here is to get bug 419131 fixed first, since that will take care of the waste due to not being able to use all the memory that malloc has actually allocated.

Once that is done, we can make much fairer comparisons of which gulp sizes are optimal, based purely on PLArena overhead, fragmentation, and the number of mallocations.
(In reply to comment #24)
> > > In your example in comment 8, you changed the gulp size from 2560 bytes to
> > > 10240 bytes. Assuming that the overhead per gulp is about the same, i.e. 500
> > > bytes (which sounds insanely high btw, that has to be more than just malloc
> > > overhead), then any use of less than 6k will result in more memory wasted 
> > > due to too much memory being allocated, whereas any use of more than 40k 
> > > will result in less memory used. And allocations between 6k and 40k 
> > > depending the exact amount of memory allocated, with higher numbers more 
> > > likely to better after your patch.
> > 
> > That's not the best viewpoint.  In the > 6K case, for the large arenasize, the
> > memory that you count as wasted is not wasted because it is "in the arena" and
> > available for use.  In the > 6K case, for the small arenasize, the memory is
> > not in the arena, not available for use, and really is wasted. 
> 
> What do you mean by "best viewpoint". If we only need to store 2k of
> information in the PLArena, then the remaining allocated 6k *is* wasted, it
> won't be used by anyone.

I mean that what you wrote was not untrue, but that there was a better way to view the situation.  In particular, I would suggest that the memory inside the arena not be viewed as wasted.  Only the mallocated memory that is not inside the arena should be viewed as wasted.  The memory inside the arena could potentially be used by future allocations.  The memory inside the arena is "unused" but not "wasted".

> I'm sort of thinking that the best strategy here is to get bug 419131 fixed
> first since that will take care of the waste due to not being able to use all
> memory that malloc has actually allocated.
> 
> Once that is done we can make much fairer comparisons for what gulp sizes are
> optimal based purely on PLArena overhead, fragmentation, and reducing number of
> mallocations.

Yes, that's a good idea.  419131 will lessen the positive impact of this bug, but changes similar to my most recent patch for this bug will still be beneficial.

However, 419131 is already "fixed" by its most recent patch.  Any new patches for that bug won't change how it solves that bug's fundamental problem.  The only thing new patches for 419131 will change is how NSPR is built with the new feature.  So, we need not wait.

I have already tried tuning on top of 419131.  The numbers in the most recent patch for this bug also worked well with 419131's most recent patch applied.  It was only the impact numbers (measurements of benefit) that were smaller (as I predicted in comment #5, above).

Note that one should expect future bugs which do the same sort of tuning as this bug does.  That's because, as Firefox changes, so will the optimal numbers for arena sizes.  So, we are not committing to too much by applying my patch.

[Also, if anyone has an idea of a better workload against which to optimize, then please suggest it.]
> > What do you mean by "best viewpoint". If we only need to store 2k of
> > information in the PLArena, then the remaining allocated 6k *is* wasted, it
> > won't be used by anyone.
> 
> I mean that what you wrote was not untrue, but that there was a better way to
> view the situation.  In particular, I would suggest that the memory inside the
> arena not be viewed as wasted.  Only the mallocated memory that is not inside
> the arena should be viewed as wasted.  The memory inside the arena could
> potentially be used by future allocations.  The memory inside the arena is
> "unused" but not "wasted".

I very strongly disagree memory that is allocated never going to get used *is* wasted. It definitely concerns me that this is the strategy that has been used when tuning these numbers, but it does explain why they are as high as they are.

I think for long-lived arenas that vary in size, we generally want to use a gulp-size that is about half or a third of the typical total size needed.  This way we can shrink when less than the typical size is used, and we won't go over too much if slightly more than the typical size is used.

Of course these aren't hard numbers.  The less the size varies, the more it makes sense to set the gulp-size to the expected size.  And the shorter-lived the arena is, the more it makes sense to optimize for speed and use a larger gulp-size.

I'd say a good test case would be to start the browser, open 5 or so web pages, and shut down.  (The more pages you open, the better, but of course it takes more time to test.)

> Yes, that's a good idea.  419131 will lessen the positive impact of this bug,
> but changes similar to my most recent patch for this bug will still be
> beneficial.
> 
> However, 419131 is already "fixed" by its most recent patch.  Any new patches
> for that bug won't change how it solves that bug's fundamental problem.  The
> only thing new patches for 419131 will change is how NSPR is built with the
> new feature.  So, we need not wait.

Yay, that is excellent!

Sorry I'm such a pain about reviewing this. I just want to make sure that we're optimizing for the right things, under the right conditions. Especially since you are increasing some of the numbers so much.

But the numbers from comment 5 sound great, so it's definitely looking like tweaking these numbers can yield awesome improvements.
> I very strongly disagree memory that is allocated never going to get used *is*
> wasted.

Sorry, that should say:

I very strongly disagree; memory that is allocated but is never going to get used *is* wasted.
Comment on attachment 309043 [details] [diff] [review]
v1, alters a bunch of callers of PLArena code

Clearing old review request.  If anyone thinks we still want to use this patch, feel free to re-request review.  I'm personally still concerned about the optimization strategy used in this patch, though.
Jonas, my comments above explain a great deal about the optimization strategy used, why it is the right one, and why the numbers are what they are.  If I have failed to communicate any of this, please let me know what I need to explain.

Remember: each of these tweaks, individually, decreases high water mark memory use.  Putting them all together also decreases high water mark memory use (they don't somehow conflict).  Further, the number of calls to memory allocation code decreases.  It's win-win.

I would be glad to re-run the tweaking process, to make sure that the numbers in this patch are still the right ones.  The original tweaking was done more than a year and a half ago.  But I'm only willing to do that if the patch is going to be accepted.  I understand that to accept this patch, you must accept the methodology, which I am willing to further explain.
Some Arena size optimization happened a while back. Nothing more will come from this bug.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Attachment #309043 - Flags: superreview?(pavlov)