Efficiency tweak & fix wrong comments for nsVoidArray

RESOLVED FIXED in mozilla0.9.4

Status

()

defect
P2
normal
RESOLVED FIXED
18 years ago
18 years ago

People

(Reporter: jesup, Assigned: jesup)

Tracking

({memory-footprint, perf})

Trunk
mozilla0.9.4
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(19 attachments)

7.62 KB, patch
Details | Diff | Splinter Review
70.81 KB, patch
Details | Diff | Splinter Review
79.14 KB, patch
Details | Diff | Splinter Review
78.28 KB, patch
Details | Diff | Splinter Review
28.21 KB, patch
Details | Diff | Splinter Review
79.56 KB, patch
Details | Diff | Splinter Review
79.70 KB, patch
Details | Diff | Splinter Review
77.48 KB, patch
Details | Diff | Splinter Review
38.82 KB, patch
Details | Diff | Splinter Review
39.36 KB, patch
Details | Diff | Splinter Review
2.33 KB, text/plain
Details
23.67 KB, patch
Details | Diff | Splinter Review
2.36 KB, text/plain
Details
60.51 KB, patch
Details | Diff | Splinter Review
59.62 KB, patch
Details | Diff | Splinter Review
59.19 KB, patch
Details | Diff | Splinter Review
59.19 KB, patch
Details | Diff | Splinter Review
46.03 KB, patch
Details | Diff | Splinter Review
68.97 KB, patch
Details | Diff | Splinter Review
nsVoidArray.cpp is rather inefficient in how it sets up the Impl structure;
there's lots of duplicated code.  Also, the comments about how the Impl
structure works are just plain wrong (probably they date to before a major
change).

I also (and perhaps more controversially) patched the allocator for slightly
larger increases for small arrays (8 instead of 4), and moved the point where it
starts doubling to 32 (from 16), and modified the code to try to keep
allocations right on a power-of-two in size instead of often just over a
power-of-two - for larger allocations, this is far less inefficient of memory. 
With 'binned' allocators (such as the BSD allocator), there's an improvement for
mid-to-small-size allocations as well.

Patch with improvements (to be) attached.  I'm more than willing to attach a
patch without the changes to the allocation sizes.  I have tested these with no
problems.
Added a patch.  I've verified this using dmalloc (an allocation
checker/fencepost tool I have working with mozilla in my tree; to be released -
see www.dmalloc.org).

This should reduce the number of allocation calls (some), and reduce the amount
of wasted memory (some to a lot, depends on OS allocator).  It also simplifies
the code and corrects incorrect comments.
Keywords: footprint, patch, perf
Blocks: 67618

Comment 3

18 years ago
The casting between char[] and Impl* is quite ugly, IMHO.

There is an easy way to see if a number is a pure 2^x in which case you don't 
have to count bits and that is:

if ((x & (x-1)) == 0) // x = 2^n for some n

This seems like a typo, kLinearTreshold is a count that has nothing to do with 
pointer size:
+static const PRInt32 kLinearThreshold = 32 * sizeof(void *);
I want to round up to the next 2**n, so n&(n-1) doesn't work.

I'll look at the casting, however I think it was all there already.

I changed kLinearThreshold to be a byte threshold instead of a count threshold;
that was intentional (note the comparison changed too).

Here's something I added to bug 67618 I just found out:

:I found a lot of the reason for all the nsVoidArray::InsertElementAt
:allocations is the CSS code.  It uses lots of nsVoidArrays which are then 
:immediately appended to.  I'm going to look at either (a) switching some of 
:them to nsAutoVoidArrays, and/or (b) adding a constructor that takes a single 
:array element to be inserted at construction time.  (a) probably makes more
:sense.

You see thousands of void arrays going from size 0 (no mImpl) to size 8; almost
none going larger.
Adding hyatt because this impacts on his CSS optimizations possibly.
I checked; the NS_REINTERPRET_CAST()s were all there before (essentially).  I
just moved them.

Comment 7

18 years ago
I was thinking something like:

if ((oldSize & (oldSize-1)) == 0) // oldSize = 2^n for some n
    newSize = oldSize < 2;
else // count bits and stuff.

I'm applying your patch now and will try to get some quantify numbers before I 
fall asleep.

Comment 8

18 years ago
You probably want to update the comment "Grow by kGrowArrayBy slots if we're 
smaller than kLinearThreshold elements" if kLinearTreshold no longer is a count 
of elements.

Comment 9

18 years ago
Quantify numbers for a startup: 
                                  Before patch            After patch
Allocations by InsertElementAt     5769  (8,25ms)          4686 (27,6ms)
Memory freed by InsertElementAt    1268  (1,25ms)           181 (0,25ms)
Memory freed by destructor         1074  (1,37ms)          1078 (1,29ms)

That is 1000 reallocations fewer but the times are scary. I don't know if the 
memory handling times (which are clock timed by Quantify) should be ignored. 
The timing could be dependent on so much else. The state of the heap and the 
load of the computer and thousands of small things. 

I also don't know about memory impact. I guess you could add some statistics to 
keep track of wasted memory so that it too can be compared?

Comment 10

18 years ago
I did a new run and now the allocation time (which was 27ms above) was 9,21ms so 
the number in the twenties was something odd I think. So if the memory waste 
isn't too big, I would say go for the patch (with minor adjustments as mentioned 
earlier).

Comment 11

18 years ago
Btw, you might want to look at bug 63473 and bug 29873 if you are changing 
things in nsVoidArray.

More numbers. Total allocation CPU time during startup was before the change 866 
ms and after the change 835 ms. This is despite that there were 2% more 
allocations afterwards (for some reason). That can just be noise, but the 
numbers are at least on the right side.

I think the first bug number is wrong...

Thanks for all the testing!!  It's nice to see some confirmation.  I suspect
part of the improvement might be when we do allocate large blocks, they're
not wasting as much memory.

The BIG improvement will be reducing the number of 0 items -> 8 items
allocations in the CSS code.

I'll look at not counting bits if we don't need to, but that code doesn't get
invoked all that often, and it's fast.
Keywords: mozilla0.9.3

Comment 13

18 years ago
Right, it should have been bug 63743.
While I'm at it, I'll fix bug 63743 to.  I'll put up a new patch probably
monday, as well as some patches for CSS (perhaps) to minimize the number of
small allocations done.
Blocks: 91017
I've found that there are a LOT of places where we use nsVoidArray and
immediately append an element to it.  By some judicious conversions to
nsAutoVoidArrays, I've cut quite a few thousand allocations from startup, and
probably added little or no footprint (since the allocations often got rounded
up anyways).  I'm still working on it.
Well, I have the number of nsVoidArray mImpl allocations down from around 15000
to about 2500.  I _think_ the amount of VSS should be similar or unchanged.  I'm
going to need someone to test timing (and VM usage if possible, plus maybe
page-load times), since I can't easily do so on my system.  Daniel?  Dave Hyatt?

Comment 17

18 years ago
Sounds promising. Can jrgm do the perf tests you want?
Current results:
Start browser (to blank window), close with X box, in allocations of the
array portion (mImpl) of nsVoidArray:

                                0.9.2      my version
Including registration:         11093          4789
After registration (normal):     6797          1788
Goto CNN, quit (after reg):     10605          3779
Delta for CNN (after start):     3808          1991

(I've been mostly working on stuff touched in startup, so there's probably more
to be had in loading CNN).
Sure.  I'm going to make a patch in the next day or so with all these mods. 
(All are pretty simple other than the changes to nsVoidArray, which are mostly
in the patch already poste, and are still pretty simple.)

Comment 20

18 years ago
Is it possible to calculate the amount of "unnecessary" memory allocated and
number of reallocations done? Such statistics could be used to tweak the
constants even further.

For instance, is there a way for a user to specify wanted start size? If I know
that I will use 5000 slots, it would be bad to let the array reallocate itself
several times while adding stuff to it. 

Btw, does the array ever shrink? If I empty it, but leave the struct alive, will
it then be the same size as when it was as biggest?

Otherwise, I like these changes. I just hope they don't affect bloat.
nsVoidArray has always supported an nsVoidArray::nsVoidArray(PRInt32 aCount), so
you can set an initial size (this is done in a few places).  Also, there's a
::Compact() method which is also used a few places.

I modified ::operator= to Compact() if the array lost more than 50 entries. 
There's no auto-compact for RemoveElementAt(), though, and I'm not sure it's a
good idea - maybe if there's a big drop in size, AND we then add an entry back. 
Not worth the trouble given the usage I see.

nsSupportsArray was really dumb, and always grew by 8 entries - and unlike
nsVoidArrays, lots of nsSupportsArrays have hundreds of entries.  I'm fixing
that too.

I believe this will have minimal to no impact on bloat, with the _possible_
exception of the CSS changes.  Even there, some of them are quite unlikely to
bloat, and others I shied away from because of bloat.  Many things I changed
were basically:
    blah = new nsVoidArray; blah.AppendElement(foo);

Yeah!!!

Updated current results:
Start browser (to blank window), close with X box, in allocations of the
array portion (mImpl) of nsVoidArray:

                                0.9.2      my version   %decrease
Including registration:         11093          3843        65%
After registration (normal):     6797           840        87%
Goto CNN, quit (after reg):     10605          2219        79%
Delta for CNN (after start):     3808          1379        64%

The reduction in after-registration startup allocations is especially
significant.

Time for me to roll all this into a patch someone can try running through some
timings and some memory-usage tests. jrgm?  Daniel?

Comment 23

18 years ago
I can probably do some simple timing tests but I don't expect anything really 
big from it since there is so much else that also does allocations. It's just a 
step in the right direction. The average time for a memory allocation seems to 
be in the order of 10 us so 10000 saved allocations would be max 0.1 s.

How it affects the state of the heap is a little more complicated question and 
it's probably there the greatest wins can be found if you have managed to make 
the allocations better adapted in size. Very difficult to measure though.

Comment 24

18 years ago
If you roll that patch up, I can give it a spin, and I'll ping someone (curt?)
about memory tests. If this does fall into the 0.1 region of wall-clock time
changes though I may not be able to "see" this. I'm just not that accurate :-\
Sorry the patches haven't been posted yet; the fiber-cut that's affecting
east-coast makes it impossible for me to run a cvs diff or checkout.  I have
trouble even fetching a single bug report.

Updated

18 years ago
Blocks: 71814
Whomever added the 71814 dependancy got it wrong I think...

Updated

18 years ago
Blocks: 71874
No longer blocks: 71814
For anyone following this, I'm also switching from new char[...] to
PR_Malloc/etc for the mImpl structure so that when I grow the array, I can use
PR_Realloc.  I eliminated almost all memset's and many memcpy's.

Current figures:
startup blank after reg, quit:           700 (was  6797)
startup then load cnn after reg, quit:  1319 (was 10605)

So I'm around 90% reduction.  Also, nsSupportsArrays now double allocations
sizes instead of growing by 8 linearly (and they're often 500-1000 entries or
more!)

I'm hoping for a noticable tick down in page-load times as well as restart.

I'm taking this bug (surprise)
Assignee: kandrot → rjesup
Anyone who can, please review the patch.  I'll write some more details about it
later, but basically I rewrote nsVoidArray moderately extensively, and changed
many uses of nsVoidArray to nsAutoVoidArray.  I also added some methods to allow
users of nsVoidArray more control on how much to shrink it by.  The changes
outside of nsVoidArray should be generally quite safe.

We should have dramatically less allocations of both nsVoidArray->mImpl
structures and nsSupportsArray mArray structures (which was O(n**2) before, with
n often > 500).  nsVoidArray is much smarter about avoiding copying and
clearing, and some bugs have been quashed in the process (ReplaceElementAt()
could extend the array, but if it was an nsAutoVoidArray the implicit elements
added would be garbage).

Anyone who can run either timing tests (with and without the patch) at startup
or loading a set of URL's, or memory tests (VSS and/or RSS) to test for any
bloat, please give it a whirl.  Whatever tool was used in bug 67618 would be a
good test too, since that was the impetus for me to go after this.  Wherever
possible, please give before and after numbers.  Execution time differences may
be too small to measure with a stopwatch (though I'm hoping).

Note: one frequent source of nsVoidArrays with thousands of elements was
deferred GC cleanup objects for JS, which was then Compact()ed and freed.  I
used a new method (probably poorly named as nsVoidArray::ExpandTo()) to Compact
the nsVoidArray to no smaller than 250 entries, and only if it was more than
512.  There's only one of these arrays, and it gets used on every GC, and it's
always up in the hundreds of entries.

Targeting mozilla0.9.3
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.3

Comment 31

18 years ago
Doesn't compile on Windows. 
j:\moz\mozillaprofile\mozilla\xpcom\ds\nsVoidArray.cpp(451) : error C4716: 
'nsAutoVoidArray::ExpandTo' : must return a value

Comment 32

18 years ago
I looked at the patch. Comments interleaved with patch lines:


+  nsStringArray(PRInt32 aCount);  // initial count of aCount elements set to 
nsnull
+  nsCStringArray(PRInt32 aCount); // initial count of aCount elements set to 
nsnull
Don't understand that comment at all. 

+    mRangeList = (nsVoidArray *) new nsAutoVoidArray();
is the cast necessary? Isn't nsAutoVoidArray already a nsVoidArray?

Several times I see the pattern below:
-    nsVoidArray* vector = GetChildVector();
+    nsAutoVoidArray* vector = GetChildVector();
You don't have to know that it is an nsAutoVoidArray. Treat it as an nsVoidArray 
so you don't have to change code whenever there is a new type of nsVoidArray. 
It's only when creating it you need to know the exact type. If you use casts to 
call the right function you're doing it the wrong way. Make the relevant 
function |virtual| instead. That change will reduce the size of the patch alot 
to.

-    mOrder = new nsVoidArray();
+    mOrder = new nsAutoVoidArray();
How big is the win from allocating a nsAutoVoidArray from the heap instead of an 
nsVoidArray?

+    mCursor(nsnull),
 What is this?


+    gGlobalList = new nsFontNodeArray(32);
+    nodes = new nsFontNodeArray(32);
+    nsFontNodeArray nodes(32);
Magic uncommented constant
Daniel: Thanks for going over the code in detail!

>Doesn't compile on Windows. 
>j:\moz\mozillaprofile\mozilla\xpcom\ds\nsVoidArray.cpp(451) : error C4716: 
>'nsAutoVoidArray::ExpandTo' : must return a value

Sorry; I added a return to ExpandTo late in the game and didn't check
nsAutoVoidArray's version.  Easy fix.

>You don't have to know that it is an nsAutoVoidArray. Treat it as an nsVoidArray
>so you don't have to change code whenever there is a new type of nsVoidArray. 
>It's only when creating it you need to know the exact type. If you use casts to 
>call the right function you're doing it the wrong way. Make the relevant 
>function |virtual| instead. That change will reduce the size of the patch alot 
>to.

Good point; I should go back and clean that up some.  (Much of my serious OO
experience is with purely-dynamic OO systems where you don't need constructs
like virtual.)

>-    mOrder = new nsVoidArray();
>+    mOrder = new nsAutoVoidArray();
>How big is the win from allocating a nsAutoVoidArray from the heap instead of an 
>nsVoidArray?

Roughly 1/2 the overhead if the array never needs more than 8 items, and at some
point has an item in it.  If it never has an item in it, the cpu overhead is the
same.  The win is that you only call malloc (and free) once.

>+    mCursor(nsnull),
> What is this?

That was to fix a bug with initialization of mCursor, if I remember correctly -
when I added dmalloc (which trashes allocations to 0xc5), it went kaboom there.
 I'll try to double-check, but that sounds right.  Note that CSS_IF_COPY doesn't
touch val if aCopy.val isn't set.

>+    gGlobalList = new nsFontNodeArray(32);
>+    nodes = new nsFontNodeArray(32);
>+    nsFontNodeArray nodes(32);
>Magic uncommented constant

Sorry; I was fooling around with values to figure out a good default and didn't
go back to parameterize it.

>+  nsCStringArray(PRInt32 aCount); // initial count of aCount elements set to 
nsnull
>Don't understand that comment at all. 

I think my English deserted me.  Actually, I think someone interrupted me or was
talking to me while I typed and two versions got intermingled.  How about:

+  nsCStringArray(PRInt32 aCount); // Storage for aCount elements will be
pre-allocated

Priority: -- → P2
Hardware: PC → All
I updated the patch.  Daniel's comments should all have been addressed.  I also
implemented dbaron's MoveElement() (I did it separately, and when I looked at
his code it was almost identical) and changed the font cache to use it.

Functions that are overridden by nsAutoVoidArray/etc are now virtual.

I went through the source tree checking calls to ReplaceElementAt and found a
few that should really be AppendElement.

I found by inspection that the font cache arrays don't have many entries except
maybe the main one (which then gets nuked anyways.... see another bug I just
filed) so I just made them nsAutoVoidArrays.  (Except for the main list I never
saw one with more than 5 entries, most had either 1 or 2-3.)

I made one or two more things nsAutoVoidArrays (nsRDFXMLSerializer.cpp).

I also added dbaron's comments to nsQuickSort.h.

(While writing thing I realized I forgot to remove changes from one file; I'll
re-upload the patch as soon as I post this.)

Comment 37

18 years ago
Looks much better. And now it even seem to compile. I will try to do some
measurements later today if I'm home early.

Many lines similar to:
+  j = FillArrayWithAncestors((nsVoidArray *) &array2,aNode2);

You should never have to cast a nsAutoVoidArray pointer to a
nsVoidArray and if you really have to (don't know when that would
be) you shouldn't use a C cast. Use NS_STATIC_CAST instead so that
the compiler will stop obviously illegal casts. If you really have
to do a cast that is "unsafe", use NS_REINTERPRET_CAST.

The best is to always use pointers to nsVoidArrays and then only mention
nsAutoVoidArray when you create the array. The virtual functions will
take care of the rest. That will make it easier to maintain the code and
nobody will have to wonder why for instance the aData in DeleteSheetMap has
to a nsAutoVoidArray instead of any nsVoidArray.

+  nsAutoVoidArray*    mOrder;
Since it's just a pointer, you can just declare it as a nsVoidArray* and let
it point to whatever type of nsVoidArray you need, unless you want to be
perfectly sure that it really is a nsAutoVoidArray.


+
	  // We could use aBuffer.MoveElement(), but it wouldn't be much of
+
	  // a win if any for swapping two elements.

Do we need a nsVoidArray::SwapElements?

Comment 38

18 years ago
It almost compiles:

j:\moz\mozillaprofile\mozilla\mailnews\base\search\src\nsMsgFilterList.cpp(857)
: error C2039: 'MoveElement' : is not a member of 'nsDerivedSafe<class
nsISupportsArray>'
Darn.  Sorry; I usually build with --disable-mailnews and didn't think that I
hadn't added MoveElement to nsSupportsArray (I have now; it makes sense to do so
anyways -- If I was being really pedantic, nsSupportsArray would inherit from
nsVoidArray, or both would inherit from a virtual base class).

As for the casts, I didn't go through every file in the patch to remove them. 
I'll make another look and post another updated patch in a few minutes.  I'm
also adding a sanity check I missed to MoveElement (I was only checking aTo for
being past the end of the array, not aFrom).

If anyone else can run performance test (particularly bloat tests and tests that
log the number of allocations, or ibench and other startup/page-load tests) it
would be useful.  Waterson?  Hyatt? JRGM?
My apologies; I first uploaded a file of only the files changed from the
previous patch by mistake.  The latest patch
(http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43388) is the correct
one and updates all files.
*** Bug 91017 has been marked as a duplicate of this bug. ***
>Do we need a nsVoidArray::SwapElements?

I don't think so; there's only one place in the code I can see that would use
it, and the overhead is minimal to do it directly.  Moving an element to the
head (and sliding the ones inbetween) is something that should be handled inside
the class, both for abstraction and efficiency reasons.

Comment 45

18 years ago
Sun's compiler whines about code order issue:
"xpcom/ds/nsVoidArray.cpp", line 118: Error: "nsVoidArray::IsArrayOwner() const"
has already been called and cannot be defined inline.

Thas is easy to fix by moving IsArrayOwner() before code which calls it.

Comment 47

18 years ago
Startup CPU time on main thread decreased by ~100 ms for me. That is -4%. 

The number of mallocs decreased by ~1000. (-0.7%) But they were timed at times a 
couple of tenths of a second slower than my referance run. I don't know why. 
Maybe it's just too much noise. My computer isn't exactly unloaded when I'm 
doing the tests. That doesn't matter for Quantify withing Mozilla because it 
counts the instructions but it times the OS and libc calls giving much greater 
uncertainty there.

Time to load the page stress case seemed to decrease by 100-200 ms which 
surprises me. I wouldn't expect such a difference, but then that page is heavy 
on CSS. 

From a testrun:
Reflows -30 ms (-3%)
Tokenizer -25ms (-10%)
NavDTD::BuildModel -130ms (-6%)

I have a hard time believing these figures but they are at least in the right 
direction. It will be interesting to see the result from jgrm's more 
professional test runs. 

I have not looked very hard at the bloat change. The only thing I can say 
is that the change doesn't seem to be dramatic.



Comment 48

18 years ago
I'm in the middle of getting a couple of builds (win2k/linux) done to do some
tests (startup time, page loading) and I hope to get results by end tomorrow. 

[Maybe it's just me, but I had a lot of trouble patching on win32. However, 
there is one thing wrong in the patch attach_id=43418 (confusing but benign):
the same hunk for gfx/src/nsImageRequest.cpp is included twice, leading to 
a failure on the second pass).

I'm not really sure what the best memory/allocation/bloat tests should be 
run with this. waterson?

Comment 49

18 years ago
Randell, how about allocating memory with |alloca| in the nsAutoVoidArray case?
Since the array won't survive the current scope we could as well store the full
array on the stack reducing heap operations even further.

The problems would be that you can't realloc or free an |alloca|-allocated
memory but it could still be used for the initial buffer that is hard coded to 8
elements. If we allocate that buffer with alloca (allocate on the stack) we
don't need to hard code the stack size, but can allow the caller to say how big
the array will be. That should both reduce heap operations, increase performance
and reduce bloat.


(assuming there is an alloca equivalent in NSPR)
Daniel: Great numbers!  I figured there'd be some execution win in CSS by
avoiding lots of allocations, but I never figured it'd be that good.  It's also
really nice to see 3-10% drops in every test done, even if they may not hold up
entirely in a more rigorous test.

John: I fixed the duplicate hunk; because this is pulled from a tree with other
changes I don't want to release, the updates to the patch have been done by just
replacing files within the patch file, and I missed removing the old version of
one of them.  Thanks for noticing.

Daniel: I don't think there's an alloca equivalent in NSPR - it's too
compiler-specific (i.e. GCC).  Even if there was, I'd have to put the alloca at
the same scope as where the "nsAutoVoidArray foo;" is now, and that would mean
using macros or modifying a lot of code.  Even then, it might be problematic to
get C++ to be happy with this.

I had thought of overriding operator new, but that doesn't help with automatics
like the above.  If someone has a good _portable_ way to not have a fixed
initial arraysize _and_ only have a single allocation without overriding new,
I'd love to hear it (for a later patch perhaps).  I don't think C++ can do it. 
Fully dynamic object systems can (at least some, perhaps all).
Blocks: 92256
BTW, overriding operator new for nsVoidArrays is still a useful idea for things
such as "new nsVoidArray(100)".  This isn't done enough for me to have tried it
yet, and it doesn't help with automatics or arrays within other objects.

Comment 52

18 years ago
jrgm: re bloat tests, something like what tracy runs would be ideal (see
http://ftp.mozilla.org/pub/data/memtests/, e.g.); however, I'd be happy with
somebody doing a gross comparison of RSS and VM size after startup.

Comment 53

18 years ago
cc'ing some other people who've worked on nsVoidArray before (dbaron, rbs, scc).

- I'm frightened by the use of nsAutoVoidArray as a member variable. (It's ten
words, no?) Although it _looks_ like it's only being used in classes that don't
tend to have a lot of instances, we need to be doubly sure here. jst: it appears
that most of these replacements are in DOM code.

- I didn't check terribly carefully, but did you keep the special casing for
zero element arrays? (Ought we special-case single element arrays, too?) Perhaps
it'd be useful to add some statistics gathering code to count grows, shrinks,
size distribution, etc.?

- There are several places where we ought to hoist an invariant function out of
an expression where it's repeatedly evaluated -- e.g., GetArraySize(). It's not
|inline|, and I don't think that the compiler will optimize away the redundant
calls.

- Minor nit, but...you're asserting your own style here. That's fine, as it
appears you're rewriting most of the code; just don't leave the file as a
mish-mash of coding styles when you're done.

thanks...
mozilla.0.9.4, since I apparently missed the 0.9.3 boat.
Keywords: mozilla0.9.3
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I was pretty careful about where I made members into nsAutoVoidArrays; almost
all such cases have 1 or more entries inserted, so we end up saving space in
most cases.

0 element arrays (as nsVoidArrays) still don't have an mImpl structure allocated
until the first element is added.

I've considered making an nsVoidArray that stores single objects inline, and if
more than 1 is added, then allocate a real voidarray object.  It still would
require at least two words (maybe 3), and more complex code.  Note that one
class already does this itself (search for nsCheapVoidArray).

Making GetArraySize() an inline is a good idea.  (I had forgotten it wasn't.)
I'll check other invariants.

As for style: Since I'm rewriting so much, I guess it makes sense to assert
style (actually, emacs does it for me... ;-)  I left it a mismash because I
wanted the diffs to be readable; but so much has changed that perhaps I should
just go in and reformat.  Opinions?

Thanks for the review Chris!
Um, Chris, GetArraySize is:
inline PRInt32 nsVoidArray::GetArraySize() const {...}

So I think that's ok.
I strongly object to doing what looks like a search-and-replace job to use
nsAutoVoidArray in stead of nsVoidArray, some of these changes are to members of
classes whose size we don't wanto change just like that. How about we split this
work up so that the optimiziations to ns[Auto]VoidArray are in this bug and the
other changes get split into separate bugs for the different areas that are
affected?
It was definitely not a search-and-replace (though I did hit a lot of spots).  I
ran through many startups of mozilla (and page loads), looking where initial
allocations of mImpl structures were coming from, and looked to see which ones
were done a lot, and where they often had entries (but not lots of entrie).  If
an nsVoidArray is going to have entries anyways, it doesn't generally cost you
memory to make it an nsAutoVoidArray (and it can save memory, and it definitely
saves a trip through the allocator and deallocator, plus you get better
processor cache hit-rates).

Do you have any specific ones you question?  The ones I worried about the most
were the CSS classes, XUL Elements, and maybe one or two more.  Thats why I want
people to run VSS/RSS checks to see the impact on memory usage.  I'm hoping the
memory hit is in the 10's of K (total) - not bad for a multiple-% speedup.

Classes where I made members into nsAutoVoidArrays:

nsContentSubtreeIterator:
	nsAutoVoidArray mStartNodes;
	nsAutoVoidArray mStartOffsets;
	nsAutoVoidArray mEndNodes;
	nsAutoVoidArray mEndOffsets;
    These all normally get entries added in ::Init

nsBaseContentList:
	nsAutoVoidArray mElements;
    These seemed to always end up with elements in them in practice.

nsDocument: (a giant object if ever there was one)
	nsAutoVoidArray mStyleSheets;
	nsAutoVoidArray mObservers; // basically always has at least 1 entry
    Note I only did these two; other nsVoidArrays in nsDocument
    (mCharSetObservers, mPresShells, mSubDocuments) I left alone, because I was 
    less sure if they should be converted (perhaps mPresShells and 
    mCharSetObservers).

nsDocumentEncoder:
	nsAutoVoidArray   mCommonAncestors;
	nsAutoVoidArray   mStartNodes;
	nsAutoVoidArray   mStartOffsets;
	nsAutoVoidArray   mEndNodes;
	nsAutoVoidArray   mEndOffsets;
    Again, when DocumentEncoder is created and used, these seem to always end up 
    with elements in them.

nsFormControlList:
	nsAutoVoidArray mElements;  // Holds WEAK references - bug 36639
    Again, this seems to always have at least one item in it when it exists.

HTMLContentSink: (Another huge object)
	nsAutoVoidArray mContextStack;
    This gets used a lot.

nsCSSLoader:
	// mParsingData is (almost?) always needed, so create with storage
	nsAutoVoidArray   mParsingData; // array of data for sheets currently parsing
    As stated.  There are two other nsVoidArrays I didn't touch.

CSSStyleSheetInner:
	nsAutoVoidArray           mSheets;
    This does an AppendElement in it's constructor.

nsXBLStreamListener:
	nsAutoVoidArray mBindingRequests;
    These seem to get requests associated with them almost always.

nsXULElement:
	nsAutoVoidArray                     mChildren;           // [OWNER]
    Many (most) XUL elements seem to have children

nsXULContentSinkImpl:
	nsAutoVoidArray mNameSpaceStack;
    Again, almost always used - it's called from ::OpenContainer

nsXULDocument: (another BIG object)
	// This always has at least one observer
	nsAutoVoidArray            mObservers;
	// This is set in nsPresContext::Init, which calls SetShell.
	// Since I think this is almost always done, take the 32-byte hit for
	// an nsAutoVoidArray instead of having it be a separate allocation.
	nsAutoVoidArray            mCharSetObservers;
	// is we're attached to a DocumentViewImpl, we have a presshell
	nsAutoVoidArray            mPresShells;
   The comments speak for themselves.

nsHTMLReflowCommand:
	nsAutoVoidArray mPath;
   As best I can tell, this almost always gets set by BuildPath, which is called
   from ::Dispatch().

nsImageMap:
	nsAutoVoidArray mAreas; // almost always has some entries
   ImageMaps rarely have no entries...

nsLineLayout:
	nsAutoVoidArray mWordFrames;
   RecordWordFrame/ForgetWordFrame is called all the time.

nsTableCellMap:
	nsAutoVoidArray mCols;
   Tables rarely have no columns.

nsCellMap:
	nsAutoVoidArray mRows; 
   Tables rarely have no rows.

nsTableFrame:
	nsAutoVoidArray mColFrames; // XXX temporarily public 
   Ditto

nsHttpHeaderArray:
	nsAutoVoidArray mHeaders;
   We always have at least a few HTTP headers.  (This was one area where I 
   wished I could devote even more space to the autovoidarray, since 8 might be
   too low for many webservers.)

CompositeDataSourceImpl: (rdf)
	nsAutoVoidArray mDataSources;
	nsAutoVoidArray mObservers;
   I doubt there are any CompositeDataSources without data sources.  As for 
   Observers, i can't say they all have them, but it certainly showed up a lot 
   in my allocation traps.

CompositeEnumeratorImpl: (rdf)
	nsAutoVoidArray  mAlreadyReturned;
   If we return anything and mCoalesceDuplicateArcs is true, we stick it in 
   here.  I'd guess roughly 1/2 of them use this.

InMemoryArcsEnumeratorImpl: (rdf)
	nsAutoVoidArray  mAlreadyReturned;
   If we return anything, we stick it in here.

RDFContentSinkImpl:
	nsAutoVoidArray     mNameSpaceScopes;
   We add something to this when ::OpenContainer() is done.

nsDocLoaderImpl:
	nsAutoVoidArray mRequestInfoList;
   This is added to in ::OnStartRequest()

nsViewManager: (big object)
	nsAutoVoidArray   mDisplayList;
   This is used whenever we do ::RenderViews() (for Z sorting)
I should note that if there's an objection to any of these, or if they're
bloating us too much, they obviously can be dropped (or reworked to use
nsVoidArray *'s and new as needed).

Comment 60

18 years ago
I must admit that I share the misgivings of the previous reviewers. Wouldn't 
that be possible to instead manage these |mImpl| using an arena that will shield 
memory tricks from consumers.

Comment 61

18 years ago
Yes, I realize that an alloca solution requires macros but macros is an integral
part of both C and C++ so that is not extremely bad. 

Another stackbased solution would be to pass a buffer and a size to the
nsAutoVoid constructor. 

char initialArrayBuffer[4*sizeof(void*)];
nsAutoVoidArray stuffArray(initialArrayBuffer, 4);

That breaks abstraction somewhat but could be used if we really want to tune an
array. It is not as flexible as alloca since we have to know the size of the
array at compile time.


Arena solution: Who will own the arena?

Comment 62

18 years ago
>Arena solution: Who will own the arena?
A wrapper (helper) class that registers itself as a shutdown listener at its
creation (first use).

Comment 63

18 years ago
No, that's no good solution. An arena that lives as long as the program and
accepts requests for all kind of sizes (which this arena would have to) will
have major drawbacks. They will suffer from fragmentation and bloat and they
probably wont be as fast. An arena is at its best when they can be used to serve
blocks of fixed sizes that has a common predetermined life span. When that life
span is over  the whole arena can be release at once. As is the case of the
frame arena in the pres shell.

If an object has undetermined life span and size, it's the heap that is the
solution. If the heap can't handle that case, which is what it's optimized to
do, well, why should we be better?

Comment 64

18 years ago
In this context, the storage used by the arena is heap-allocated. Just that a
large amount is allocated at once. And you could have several default arenas for
the various sizes that are the current "defaults". The |mImpl| of a voidarray of
size 8 is picked from Arena8, that of size 24 is picked from Arena24; Full
arenas or a voidarray of large unexpected size are left alone and the space is
picked from the heap. And ExpandTo() could safely shuffle data around between
arenas because |mImpl| is private.

But I just noted another issue. The service manager that does the shutting down
is itself using nsVoidArray... So to avoid troubles, the wrapper class might
instead have to refcount its users: a |new| (resp. |delete|) through the wrapper
class increments (resp. decrements) the count of the arena being used. And the
wrapper can free any arena(s) anytime and itself when the counts goes to zero.
(In order words, it is implementable. Whether it is worth it is another matter.)
Randell, I know this wasn't done as a simple search-and-replace change, but from
looking at the diffs it looks more or less as it was one :-) Anyhow, I have
objections to the changes made to:

nsBaseContentList, nsFormControlList, nsLineLayout, nsGenericDOMDataNode,
nsGenericElement (all of them), nsEventListenerManager (all of them), nsXULElement.

I'd need to see real numbers showing that this is worth it before allowing these
changes in those classes (files), what's the speed/mem useage tradeoff here?
Maybe I'm misunderstanding something here, but you're changing members of
classes from nsVoidArray (8 bytes empty, on a 32-bit system) to nsAutoVoidArray
(~40 bytes empty), that seems excessive to me, maybe it makes sense in some
cases, but are you sure it's worth it in all these cases? Also, if you put more
than 8 elements in an nsAutoVoidArray you'll completely waste 32 bytes for the
now unused internal buffer.

- I highly doubt the changes in nsDocument are worth it, we create documents
very seldom (relatively speaking), and the items in the arrays mStyleSheets and
mObservers hardly ever change so I don't see the benefit.

- What's the point of changing the type of the pointer member in
nsXMLContentSink.h? Why not just leave that as nsVoidArray and allocate a
nsAutoVoidArray if we want to use one?

- This seems silly, nsXULContentSink.cpp, if you wanto use nsAutoVoidArray's,
then do that, but don't change API's to make it required:

-        nsresult GetTopChildren(nsVoidArray** aChildren);
+        nsresult GetTopChildren(nsAutoVoidArray** aChildren);

- There's only ever one single presshell in a document (for now), so I don't see
the point of making nsXULDocument::mPresShells a nsAutoVoidArray, maybe just
pre-allocate a one-element sized buffer on cunstruction in stead? Oh, and "// is
we're ..." should read "// i*f* we're ...".

- Oh, and what's the benefit of this change in nsCSSDeclaration.cpp (other than
the nsVoidArray -> nsAutoVoidArray)? Avoid the one IndexOf() call on an empty
array (which is trivial and very fast)? Hardly worth the added complexity IMO.

@@ -1837,13 +1838,16 @@
 
   if (NS_OK == result) {
     if (nsnull == mOrder) {
-      mOrder = new nsVoidArray();
+      mOrder = new nsAutoVoidArray();
     }
-    if (nsnull != mOrder) {
+    else {
+      // order IS important for CSS, so remove and add to the end
       PRInt32 index = mOrder->IndexOf((void*)aProperty);
       if (-1 != index) {
         mOrder->RemoveElementAt(index);
       }
+    }
+    if (nsnull != mOrder) {
       if (eCSSUnit_Null != aValue.GetUnit()) {
         mOrder->AppendElement((void*)(PRInt32)aProperty);
       }

- Fix the whitespace in nsCSSStyleSheet.cpp:

-  nsVoidArray           mSheets;
+  nsAutoVoidArray           mSheets;

- Fix the whitespace in nsRDFContentSink.cpp:

-    nsVoidArray     mNameSpaceScopes;
+    nsAutoVoidArray     mNameSpaceScopes;

- Hyatt really needs to have a look at the changes to the stylesystem before
this is checked in.

I think you're doing a good thing here, but we need to study the impact of these
changes more before checking in all these changes. Again, I think it'd be better
to split this bug into a few separate bugs for changes to separate areas of the
code.
Mass reply to Daniel, rbs, and jst:

Daniel wrote:
>> Arenas
>
>No, that's no good solution. An arena that lives as long as the program
>and accepts requests for all kind of sizes (which this arena would have
>to) will have major drawbacks. They will suffer from fragmentation and
>bloat and they probably wont be as fast. An arena is at its best when they
>can be used to serve blocks of fixed sizes that has a common predetermined
>life span. When that life span is over the whole arena can be release at
>once. As is the case of the frame arena in the pres shell.

        Arenas and pools are (as you said) good for specific allocation
patterns.  An allocation pool is good where the maximum number of identical
items in use is fairly static and they're often needed.  Pools for the
smaller allocation sizes here _might_ be useful (allocate N at a time, deal
with the "froth" aspect of people adding things to them and then destroying
them, rinse, repeat).  I may look further into that, but I'm not sure it
would be a win.  Arenas are probably not a great solution; these arrays
aren't in any way tied to each other, and their lifetimes will vary wildly.

rbs wrote:
>In this context, the storage used by the arena is heap-allocated. Just
>that a large amount is allocated at once. And you could have several
>default arenas for the various sizes that are the current "defaults". The
>|mImpl| of a voidarray of size 8 is picked from Arena8, that of size 24 is
>picked from Arena24; Full arenas or a voidarray of large unexpected size
>are left alone and the space is picked from the heap. And ExpandTo() could
>safely shuffle data around between arenas because |mImpl| is private.

        Certainly, though I think an allocation pool (or rather pools by
size) may make more sense.

>But I just noted another issue. The service manager that does the shutting
>down is itself using nsVoidArray... So to avoid troubles, the wrapper
>class might instead have to refcount its users: a |new| (resp. |delete|)
>through the wrapper class increments (resp. decrements) the count of the
>arena being used. And the wrapper can free any arena(s) anytime and itself
>when the counts goes to zero.  (In order words, it is
>implementable. Whether it is worth it is another matter.)

        That's certainly a reasonable solution if we want to do that.  As
you said, I'm not sure it's worth it - but it may be.

jst@netscape.com wrote:
>Randell, I know this wasn't done as a simple search-and-replace change,
>but from looking at the diffs it looks more or less as it was one :-)

        That's certainly true; it looks like I shotgunned the source tree.
And, to an extent, I did.  :-)  Mostly I was just putting breakpoints on
voidarray allocations, and seeing where they were coming from and why.
(I run FreeBSD, which doesn't have stack traceback code for tracemalloc/etc).

        In some cases, they were ones that were being hit a LOT.  In other
cases, they were ones that there was no gain to holding off on allocation.

>Anyhow, I have objections to the changes made to:
>
>nsBaseContentList, nsFormControlList, nsLineLayout, nsGenericDOMDataNode,
>nsGenericElement (all of them), nsEventListenerManager (all of them),
>nsXULElement.

        Thanks for a list; I'll try to address those in a second message.

>I'd need to see real numbers showing that this is worth it before allowing
>these changes in those classes (files), what's the speed/mem useage
>tradeoff here?

        I agree; that's why I want people to run CPU/layout and memory
tests with these patches.

>  Maybe I'm misunderstanding something here, but you're
>changing members of classes from nsVoidArray (8 bytes empty, on a 32-bit
>system) to nsAutoVoidArray (~40 bytes empty),

        Correct.

> that seems excessive to me,
>maybe it makes sense in some cases, but are you sure it's worth it in all
>these cases? Also, if you put more than 8 elements in an nsAutoVoidArray
>you'll completely waste 32 bytes for the now unused internal buffer.

        Correct; the space is wasted if it gets more than 8 elements.  When
doing my tests, I had a breakpoint/debug on an AutoVoidArray being grown so
I could keep an eye on that.  The total number of nsAutoVoidArrays grown
during startup was 280 (which are included in the total number of
allocations being ~700); the number of non-Auto's grown is down to 270.
The _maximum_ storage wasted if every one of those grown arrays sticks
around is 9K - and I'll bet many of them, if not most of them are freed.

>- I highly doubt the changes in nsDocument are worth it, we create documents
>very seldom (relatively speaking), and the items in the arrays mStyleSheets and
>mObservers hardly ever change so I don't see the benefit.

        Perhaps we don't create that many of them, but I kept seeing them,
especially when loading CNN.  The upside is that if we don't make many
nsDocuments, the memory overhead for an AutoVoidArray is minimal.

>- What's the point of changing the type of the pointer member in
>nsXMLContentSink.h? Why not just leave that as nsVoidArray and allocate a
>nsAutoVoidArray if we want to use one?

        I already changed that in a cleanup pass yesterday.

>- This seems silly, nsXULContentSink.cpp, if you wanto use nsAutoVoidArray's,
>then do that, but don't change API's to make it required:
>
>-        nsresult GetTopChildren(nsVoidArray** aChildren);
>+        nsresult GetTopChildren(nsAutoVoidArray** aChildren);

        Ditto; I hadn't gotten around to cleaning up my changes to that
before I posted.  I was waiting for more feedback before I changed the
patch again; I fixed that yesterday.  (I missed it in my first cleanup pass.)

>- There's only ever one single presshell in a document (for now), so I don't
see
>the point of making nsXULDocument::mPresShells a nsAutoVoidArray, maybe just
>pre-allocate a one-element sized buffer on cunstruction in stead? Oh, and "//
is
>we're ..." should read "// i*f* we're ...".

        Hmmm.  Then why is it an array at all?  There are all sorts of
DeleteShell, GetNumberOfShells, GetShellAt, etc methods.  If there can be
only one (hmmm, why am I thinking of Highlander?), it should be a simple
pointer.

        Thanks for the typo correction.

>- Oh, and what's the benefit of this change in nsCSSDeclaration.cpp (other than
>the nsVoidArray -> nsAutoVoidArray)? Avoid the one IndexOf() call on an empty
>array (which is trivial and very fast)? Hardly worth the added complexity IMO.

>+    else {
>+      // order IS important for CSS, so remove and add to the end
>       PRInt32 index = mOrder->IndexOf((void*)aProperty);
>       if (-1 != index) {
>         mOrder->RemoveElementAt(index);
>       }
>+    }
>+    if (nsnull != mOrder) {
>       if (eCSSUnit_Null != aValue.GetUnit()) {
>         mOrder->AppendElement((void*)(PRInt32)aProperty);
>       }

        Actually that's something I found when I was first starting to look
at nsVoidArray (it wasn't clear to me if order was important - I thought it
was, but I wasn't sure).  When I was adding the comment, I optimized the
code.  IndexOf isn't expensive, but for items with dozens of items (which
CSS arrays occasionally have) it does have some cost.  I think adding the
comment is important.  The rest of the change is minor but will be faster
than the original.

>- Fix the whitespace in nsCSSStyleSheet.cpp:
>
>-  nsVoidArray           mSheets;
>+  nsAutoVoidArray           mSheets;

>- Fix the whitespace in nsRDFContentSink.cpp:
>
>-    nsVoidArray     mNameSpaceScopes;
>+    nsAutoVoidArray     mNameSpaceScopes;

        No problem.

>- Hyatt really needs to have a look at the changes to the stylesystem before
>this is checked in.

        Absolutely; that's why I added him to the CC list:

:Adding hyatt because this impacts on his CSS optimizations possibly.:

The CSS stuff was something I was quite worried about touching, but I
think my changes are right (if anything, perhaps too conservative).

>I think you're doing a good thing here, but we need to study the impact of
>these changes more before checking in all these changes. Again, I think
>it'd be better to split this bug into a few separate bugs for changes to
>separate areas of the code.

        We could do that; however that will require a lot more r='s & sr='s
for things that are (mostly) the same, and may end up individually
deferred.  I'm focused on getting some real-world measurable gains out of
this, and as a springboard to future optimizations.  I've opened a tracking
bug for voidarray -> autovoidarray issues; I plan to file bugs/patches for
other areas of the system after this patch is committed.  We can split off
controversial pieces of this patch into other bugs and tie them to that
one (bug 92256).  I'd like to get as much of this as possible committed
together, however, to start.


       jst: thanks again for the feedback.  It's VERY useful, even where I
disagree with you about the tradeoffs.

Comment 67

18 years ago
>        That's certainly a reasonable solution if we want to do that.  As
>you said, I'm not sure it's worth it - but it may be.

I had a quick look at the implementation of FrameArena in nsPresShell.cpp. Very
neat. It seemed to me that there may actually be some merit in trying arena(s).
After all that's what layout and the Style System have been using with some
satisfaction.
I looked at presshell, and (again) at arenas.  For VoidArrays, I'm afraid arenas
don't make a lot of sense.  The problem is that you have lots of different uses
mixed together time-wise, with different lifetimes.  Arenas generally don't like
that; they like groups of allocations that grow and then all go away around the
same time.  They really don't like holes.   Arenas plus recyclers (ala
presshell) are somewhat better, but still are problematic when you mix
dramatically different lifetimes.  BTW, arenas are dreadfully documented, and
the code (while efficient) is even worse in terms of readability.  It
desperately needs comments (yet another task...).  Perhaps there's an HTML
document about them somewhere?  (I've been through this "figure out arenas" at
least once before.)

Recyclers on their own, however, or some other type of mImpl structure caches
backed by the main heap are quite promising, and I'll look into them.

Comment 69

18 years ago
Okay with a recycler of some sort as this might be nicer for the long run and 
offer a potential alternative to sprinkling autoVoidArrays which wouldn't be a 
good precedent.

Comment 71

18 years ago
- Why not let free() be the recycler? There is no initialization overhead for a
void array's storage, so what do we gain by maintaining a separate heap for
recycled void arrays?

- Once again, I'd like to make a plea for some instrumentation here so we could
base future implementation decisions on data instead of speculation. :-)

- I also agree with jst that we should start with the changes to nsVoidArray and
its ilk, and fan out the pre-allocated storage in a separate pass. (Preferrably
doing so based on the data we've collected from our instrumentation!)
With a low-overhead memory allocator (such as the BSD beerware allocator, which
is also in nsprpub as an option), we would gain little by keeping our own
recycler.  With a high-overhead allocator (such as Lea, and maybe glibc or
Win32) we do gain some.  Witness that your measurements of the Lea allocator
caused a 5-8% slowdown for pageload versus glibc, and I strongly suspect that
the BSD allocator would be at least that much faster than glibc (at the cost of
some memory).

Still, that's a good point.  I've implemented an mImpl recycler; perhaps after
all this settles I'll see if someone could run it through and see if it speeds
anything up.

As for instrumentation, I'm handcuffed by running BSD - tracemalloc/etc can't
grab stack traces from anything except linux or win32.  I have "tools" (i.e.
printfs, sort, and emacs) for capturing the number of voidarrays allocated and
their sizes, but to find out from where I've had to use GDB and 'up' until I
find the source of the allocation.

I do have access to a Linux machine; I'll try seeing if I can get tracemalloc to
work there if I can find time.  Unfortunately, I do have other things I'm
neglecting.

Since Chris and jst want it, I'll split the patch.  However, PLEASE remember
that the nsVoidArray and nsSupportsArray patches alone only have a medium to
small effect; the leverage comes from the changes elsewhere to make more use of
nsAutoVoidArrays.

Should I split out ALL the caller changes; just the ones that change member
variables; or just the ones that jst was worried about?

Personally, I think the issue isn't which of these should we not do; I think the
issue is how many _more_ places in the code should we replace nsVoidArrays?  I
was only taking ones I saw hit a lot, and were obviously going to have a few
entries in them - and my testing of non-startup code was exceedingly simplistic:
I loaded CNN.  No Mail, no menu, etc.  This is just IMO - I want to see
improvements that show up in the daily measurements after all this work.  ;-)

For reference, nsSupportsArray always has an 8-entry auto, and always has.

Comment 73

18 years ago
I made two optimized builds, base build and with
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43712 patch. 

I measured the startup timing, on win98 200mHz 128MB RAM, and I wasn't able to
get a big difference from the two... both came up with ~22.5 sec for base and
patched mozilla.exe builds. (did it with a watch, average)  I also used
taskinfo2000, and memory used by both builds are the same, 18,820kb.
Well, for a first-order check that's good - no major regressions in memory or
speed.  I didn't expect easy stopwatch-level improvements (given this was ~6-10K
out of 300+K allocations), and it appears I was right.  The pageload tests and
(machine-timed) startup tests may tell more of the story.  jrgm?

I'm currently trying to get a Linux build of the (unmodified) trunk up (and
failing).  Also, I've split the patch into two files - one base (nsVoidArray &
nsSupportsArray) plus important pieces, another for most of the
nsVoidArray->nsAutoVoidArray changes.  I'll be uploading them later.  Also, it
will include a minor fix for the Sun Forte compiler regarding 'inline' (GCC is
perfectly happy).

The second patch assumes the first has been applied.

Note that I also removed 'inline' from GetArraySize in the .h file to make the
Sun compiler happy.
Randell, if you could split out the changes that make members of classes
(directly or through allocations) be nsAutoVoidArray's by maybe one bug for
mozilla/content, one for mozilla/layout and one for the rest of the member
changes I think that'd be a good start.
Blocks: 92573
Blocks: 92575
Blocks: 92576
Layout: bug 92573; content: bug 92575; all others: bug 92576.
Blocks: 92614

Comment 80

18 years ago
Any news if the changes are of measurable benefit? If reducing the malloc proves 
helpful, then as an alternative to the shower of nsAutoVoidArray inside concrete 
class, I would think that other things might be worth trying as well so that 
there could be other options to choose from. To this end, re: recycler, I had 
this thought that something not that fancy could be tried.

Assuming N = defaut number of |mImpl| to be pre-allocated (note: since the focus 
here is startup time, the number could be picked accordingly. With enough room, 
the data for the nsVoidArrays that are currently allocated at startup will be 
picked from there.) A tailor-made recycler could start by |new|ing:

char ImplData[k*N] // k = default block size (e.g. 8, emulate nsAutoVoidArray)
int  ImplOffset[N] // _linked_ list(s) for occupied and non-occupied blocks 
                   // ImplOffset[ImplOffset[i]] is the successor in either list
                   // ImplData[k*ImplOffset[i]] is the actual block

Then, those |mImpl| that are |new|ed for the first-time could be obtained by 
just picking an offset in the non-ocuppied list. And the recyler will go away 
whenever the list of occupied blocks is empty. Not very clear but hope those who 
are familiar with these types of sparse data structures can see what I mean. 
Using the above scheme will mimic what the nsAutoVoidArrays are doing, but.. 
will 1) avoid the shower of nsAutoVoidArrays in the tree, and 2) shield memory 
tricks and allow all consumers to benefit from the scheme. Of course, if the 
current migration and testing that rjesup is doing shows that avoiding the 
malloc isn't helping that much, then there is little motivation at trying other 
things.

Comment 81

18 years ago
rbs, I think that is the best plan yet. That would reduce the bloat from
nsAutoVoidArrays and still make them very cheap to allocate. I've tried
something similar with nsTextFragments in layout (patch not uploaded yet).

The one problem is what if we allocate say 1 MB of these structures, because of
a page that does something that requires lots of arrays and then when the page
goes away it deallocates all structures but one per malloced block so that we
can't free them. Then we would sit with 1 MB allocated but only a small fraction
of that used. 

I don't think that problem is very big though. There will always be new users of
such array structures. 

(It is much worse with the nsTextFragments, because they are not fixed size)
And for it not helping by reducing memory allocations, it does but there are
many, many streams to stop and we have to start somewhere. I do think the memory
allocations hurt page load time more than startup time though. (Table stress
case spends 30% of the time allocating memory)

Comment 82

18 years ago
> (It is much worse with the nsTextFragments, because they are not fixed size)

Not sure how you are coding over there. But the idea is that all nsVoidArrays 
behave like nsAutoVoidArrays the first-time, i.e., the first AppendElement() 
always starts within a pre-allocated block (if an unocuppied block is still 
available). If there is no more block or if the array later grows pass the block 
size, its own separate space is then allocated in the heap. (Those nsVoidArrays 
with predefined size at construction may perhaps need to be left out of the 
equation.)

The practical difference with nsAutoVoidArray will be therefore that 
nsAutoVoidArray offers the _guarantee_ that an allocation is not happening the 
first-time. With the scheme, an allocation may happen or may not happen 
depending on whether there is still an unocuppied block. (Of course, it is 
possible to expand the number blocks on the fly, but that introduces other 
complications. The idea is to keep things simple and wait'n see).

Comment 83

18 years ago
WJat I did with nsTextFragments and what I suggest here is to not keep one big 
block, but to keep a number of blocks, so say that we have blocks of 500 mImpls. 
When those 500 are used we allocate a new block, and when one block becomes 
unused we free it. It will only be if the default size mImpl is too small (or 
big) that the mImpl is allocated directly from the heap. This will reduce the 
allocation cost to one ~500th (ignoring our own overhead).

If we have one big block we will 
1. Use all its space and don't gain anything at all or
2. Have a big allocated block that wastes space, increasing bloat.

We can't tune the size to be about right, because that size is dependant on what 
the user does, how many windows she has open and what pages are loaded. Remember 
that the big memory crook is not startup, it's page load.

I think there already exists structures that do these things in the tree so it 
wouldn't be very hard to implement. I remember that I've seen the name 
nsFixedSizeAllocator at least which sound as if it might do the right thing.

Comment 84

18 years ago
Sounds good. There is still the scenario where there could be sticking elements 
in several blocks at once. But thinking of such things would just discourage 
from doing anything (anyway, there is the possibility of combining an 
collapsing blocks if that becomes a serious issue later).

Since you already started something on a related problem, wanna pick up the 
torch here? Handling the inner 500 elements (those mImpl of fixed size=8 for 
example) inside each block is perhaps where some custom code is needed. 
When an |mImpl| is 'full', it is re-allocated outside and its previous location 
inside the block becomes ready for another element. If done properly, the final 
data structure can be sufficiently general so as to solve your nsTextFragment 
problem too.

BTW, in my previous terminology I used 'block' to refer to the smaller space 
taken by each inner |mImpl| (i.e., size=8 for example). But now that more 
details are known, I think the overall picture of the scheme is clear.

Comment 85

18 years ago
The nsTextFragment stuff is completely different, there all blocks are of 
different sizes, many being 2-10 bytes, but there are also blocks of several KB.

I don't think I can guarantee to have time to invest in nsVoidArray. I'm more 
focused on driving my other ~10 bugs to finish. Then, we'll see...

Comment 86

18 years ago
Hmm, I just had quick look and the two problems look similar to me. nsVoidArray 
owns |mImpl| while in the other case nsTextFragment owns its inner data which 
might be handled in a similar way. So it seems to me.

Comment 87

18 years ago
Yes, the problems are similar and yet so completely different.

1. nsVoidArrays can accept a little unused memory, it's part of a dynamic arrays 
life to be too big. nsTextFragments should use as little memory as possible, 
because even 10 bytes / block can be hundreds or thousands of KB.

2. Because of 1. we can allocate a reasonable sized block (say room for 8 void*) 
 for a default nsVoidArray thereby allocating lots of blocks of the same size. 
nsTextFragments have blocks that all are of different sizes.

3. When allocating many blocks of the same size we don't have to worry about 
fragmentation. When allocating blocks of different sizes, fragmentation becomes 
a very big problem. 

So for nsVoidArray we have a solution in nsFixedSizeAllocator (which I found in 
xpcom/ds http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h 
 ). For nsTextFragments we have the heap which is expert on allocating and 
freeing blocks of different sizes. To improve things for nsTextFragments we have 
to take advantage of something else that the heap cannot know. My plan was to 
use the fact that nsTextFragments have a life time that is the same as the page 
the text is displayed on, but I still haven't gotten the help I need to complete 
that work, even though it looks promising.

Btw, http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h has 
a comment at the top describing how to use it in a class. It might be less work 
then one might believe.

Comment 88

18 years ago
The mismatch of terminology may be making things not very clear. Let's start 
over. A 'block' is a large _container_ of N=500 (or so) contiguous 'chunks'. 
Then, the size of each 'chunk' is k='8' (or so) elements. (BTW, why would '500' 
become '8' in the nsTextFragment case?)

The management of the 'blocks' may be done via the nsFixedSizedAllocator. 
However, the management of the 'chunks' needs some custom code as I described 
earlier. Each fresh nsVoidArray object is first assigned a 'chunk' (not a 
'block'). Then, after 8 nsVoidArray::AppendElement(), the initial 8 entries of 
its chunk are filled up, it is then _re-allocated outside the block_, and takes 
on a life of its own away. And the 'chunk' that was occupied becomes available 
for another nsVoidArray that may come along. Hence the nsVoidArrays can be of 
different sizes, but if an nsVoidArray ends its life without doing more than 8 
AppendElement(), no malloc occurs and that's where the benefit comes up.

Isn't that the same problem with nsTextFragment, where the chunks might contain 
8 characters instead of 8 |void*|? Where N=500 and k=8 can be parameterized at 
construction. Anyway, rjesup has done a good job in gahering more attention 
towards speeding up nsVoidArray than nsTextFragment... So speeding up 
nsVoidArray can be done independently. But it would make sense if efforts are 
not unnecessarily duplicated.

Also nsFixedSizedAllocator is not making things as easy as you alluded. It will 
only help to manage the blocks. The handling of the 'chunks' need a special code 
(although not algorithmically difficult). And further work is needed to put all 
things together.

Comment 89

18 years ago
I am not sure that nsFixedSizedAllocator is going to be that useful in _this_ 
context. The benefit comes from the chunks. The blocks themselves can be handled 
with plain |new| and |delete|. Any idea about the advantages of using
nsFixedSizedAllocator in this context?

Comment 90

18 years ago
No, a fixed size chunk for nsTextFragments doesn't win anything. There is no 
size that could be used for more than a small fraction of the fragments on a 
page without introducing lots of wasted memory.

Ah, well I didn't read the nsFixedSizeAllocator code very much. I just thought 
it did what the name suggested, which would leave just a small change to the 
array code to do.

And I understand what you want. I think I have understood from your first 
presentation of the idea. :-)

Comment 91

18 years ago
>No, a fixed size chunk for nsTextFragments doesn't win anything.
The difference is therefore in the handling of the auxiliary 'offset' array.
Is it what you are doing (or intend to do)? Who knows, you might now win help 
from the large list of people cc:ed :-)

Comment 92

18 years ago
No, trust me, it isn't close to be as easy as the fixed size allocator. If the
sizes varies, you will have to maintain extra information (wastes memory) to be
able to free a chunk but it also causes bad fragmentation if the code is not
very advanced with memory block concatenation and seperating requests of
different size. The heap does just that and it makes allocations quite expensive.

Comment 93

18 years ago
FYI, brattel's mouth watering work is bug 89567!
I'll write a longer response later, but I should note that I already have a
recycler mostly written for nsVoidArray.  It keeps a specified maximum number of
free Impl structures a specific sizes instead of letting them go back to the
heap.  It does NOT "block" allocate the mImpl's, mostly because you then have
the problem of wildly varying lifetimes causing fragmentation of your blocks;
you can end up with a lot of mostly-empty blocks after something does a large
number of allocations, then frees them.  If you kept the number allocated
together small, you avoid most of that problem, but you lose much of the gain. 
Still, 5 or 10 mImpl's per block cuts the time in the malloc allocator by 5-10x,
and you probably  wouldn't have _too_ many partly-filled blocks.

Note that my recycler helps deal with allocation "froth"; it doesn't help too
much with the tidal-waves from, say, loading a CSS page - you end up allocated a
lot while loading it, and freeing a lot when the page goes away.  Still, you'd
end up allocating and freeing a lot less than now.

Also, more-expensive allocators would make the advantage of recyclers larger.
Daniel's worry about 1 autoarray in use per large allocated block is real -
arrays tend to have wildly divergent lifetimes.  Solutions include:

1) We _can_ compact the memory, since it all goes through the mImpl pointer. 
HOWEVER, this would mean locks around all mImpl access, which is a Bad Thing.

2) keep the number of allocation units in a block-allocation small, say 5 or
10.  There would be some memory bloat/loss, but it would be much smaller, and
allocation overhead would drop by ~5x (minus our overhead).

3) Use a recycler (as I've mostly coded) to deal with the froth of short-term
uses.  Things that grab and hold a lot of arrays, then release them, will still
cause allocator traffic (though less than now, depending on how high you set the
max in-recycle-bin sizes).  Unfortunately, pageload is one such case.  Upside is
limited (known) memory bloat.

Comment 96

18 years ago
Let me take a stab at implementing the scheme and see how it goes. Time for some 
experimentation -- there is still some time before the next milestone m9.4. My 
gut feeling is that those nVoidArrays that are created at the same time are 
likely going to have the same lifespan. So fragmentation might actually be rare 
(or negligible). Also, the maximum number of blocks at any one time can be kept 
bounded, and it seems to me that a one-block setup may subsume what you are 
doing, and may have less overhead due to the custom way of handling the 
auxiliary 'offset' array (though I am not sure how your implementation is. I 
couldn't get bratell to detail his mystery implementation :-) With the 
configurable parameters (number of blocks, number of chunks per block, maximum 
number of blocks), perhaps there could be some win in the scheme.

Comment 97

18 years ago
You'll need to make sure your recycler is threadsafe, which will likely negate
any performance benefits you'd otherwise see. Remember, the PresShell arena
``works'' because:

1. All objects are known to be created, accessed, and
   destroyed on one thread only. This implies that users
   pay no penalty for locking.

2. All objects have very similar lifetimes. This implies
   that the allocator's implementation can be extremely
   simple (e.g., ``malloc()'' increments a single pointer,
   and ``free()'' is a no-op).

You cannot make either simplification for nsVoidArrays. If you _can_ make this
simplification for a certain class of nsVoidArrays (say, those used by the
content model code), then you should do like stdc++ does and templatize across
an allocator.

We should make the global memory allocator better, rather than tinker with
one-off recycler hacks IMO.

Comment 98

18 years ago
Wise words. I could possibly templatize but... you just reminded me that all 
this sounds like too much work for little gain. The thing is that if the savings 
from these mallocs are not really significant (that's the evidence so far, 
sadly?) then there is little incentive at trying complicated things. So any news 
if there is any measurable benefit by avoiding these mallocs in nsVoidArray? If 
not, why bother.
I agree with Chris.  I think that we'll do better to commit the straightforward
changes I have in this patch and the 3 subisdary bugs/patches than to spend lots
more time implementing recyclers (and testing them, etc).  I think VSS/RSS
memory tests will show little to no extra memory use due to my patches.

(I had a recycler written (with locking), and I might want to test it after we
get the first, straightforward patches in.  But lets get the simpler, safer
patches in first.)  IMHO

BTW, chris, I'm still working on getting some stats incorporated into it - maybe
tonight.
Any more comments on the code in the base patch
(http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43753)?

I'd like to get this moving towards checkin.  We _should_ be getting some perf
numbers in the next day or two.  (I so have some stat-gathering code I'm adding,
but I'm not changing the main code in any way other than adding a macro for
calculating the size of an Impl structure.)
Ok, I have stats on nsVoidArrays with (all) my patches.

This is the distribution of max number of array entries for a startup to blank
and shutdown.  This includes nsVoidArray and nsAutoVoidArrays.

0: 3242  1: 4646  2: 1264  3: 994  4: 871  5: 573  6: 558  7: 419  8: 129  9:
39  10: 17  11: 18

Note: 8 seems like a pretty good cutoff, though you could make an argument for 1
or maybe 2.

I'll attach a dump of the stats, and copies of nsVoidArray.cpp and nsVoidArray.h
that collect and dump the stats.

Some of that data.  Note that AutoVoidArrays go from 0 (allocated mImpl size) to
72, so all AutoVoidArrays of 8 elements or less are included here under size 0. 
Number of VoidArrays includes AutoVoidArrays.

Size 0:
	Number of VoidArrays this size (max):     12358
	Number of AutoVoidArrays this size (max): 11956
	Number of allocations this size (total):  0
	Number of GrowsInPlace this size (total): 0
Size 40:
	Number of VoidArrays this size (max):     156
	Number of AutoVoidArrays this size (max): 0
	Number of allocations this size (total):  156
	Number of GrowsInPlace this size (total): 0
Size 72:
	Number of VoidArrays this size (max):     298
	Number of AutoVoidArrays this size (max): 276
	Number of allocations this size (total):  298
	Number of GrowsInPlace this size (total): 0

Note: that patch for getting stats is just nsVoidArray.[h|cpp], and is against
the trunk.  To apply it, install the base patch, revert those two files, and
then install the stats patch.  I'll make an updated base patch tomorrow.  There
are no functional changes, just stat-gathering and such.

Comment 106

18 years ago
FYI, http://lxr.mozilla.org/seamonkey/find?string=nsFixedSizeAllocator
which I found as I was trying to have another look at the allocator that bratell
mentionned earlier. And suprise... sfraser's version (on the mac) is already
doing what I described earlier (but with an inverted terminology, s/block/chunk/
and s/chunk/block/ -- an his 'chunks' are objects linked together in a standard
way instead of just being locations handled with an auxiliary 'offset' array).
The FixedSizeAllocator is not appropriate, I think.  It implements a recycler 
combined with block (chunk) allocation of additional items.  However, you have 
the same problem with fragmentation (or would, if it ever released memory back 
to the heap).  Basically, this is close to a worst-case allocator for footprint 
(it's probably pretty fast, however).

Comment 108

18 years ago
It was a FYI item -- a structural coincidence worth mentioning, but overkill for
the little _contiguous_ 'chunks' of size=8 suggested here. It is interesting how
he circumvented the explicit handling of the locks.
BTW, the reason there are no "GrowsInPlace" for my results is that FreeBSD's
allocator is binned, and thus a realloc() that goes over the current bin-size
always allocates a new buffer.  "Traditional" allocators will get GrowsInPlace
hits.
I added stats to nsSupportsArray.  THe number of memory allocations that
SupportsArray does with the base patch is reduced from 1672 (in 0.9) to 585 with
the patch.  This is in startup to a blank page and shutdown.  They also get used
a lot in pageload.  Also note that the distribution of nsSupportsArrays has a
very long tail: the largest array has 688 entries in it, and about a dozen had
more than 400.

Total bytes allocated for arrays by nsSupportsArray: 0.9: 957152  patch: 147264

Comment 111

18 years ago
Seems like the nsSupportArray changes are a win however you count. Very good.
This patch included stats-gathering (see other messages for results).  I also
(at waterson's suggestion) asserted style in nsVoidArray.cpp because I was
largely rewriting it.  This patch is against the trunk as of when I post this -
I needed to redo the patch because Brendan's changes for Fastload touched
nsSupportsArray in a way that bit-rotted my old patch.  I believe the subsidary
patches attached to 92256 are still valid.

Looking for performance/memory results(!!!), and r/sr.
Note: if you're using a debug build, nsAutoLock will do a bunch of extra
nsVoidArray usage (which we used to do always before this patch).  I added some
#if DEBUG's around some checks for possible coding errors.
- The comment in nsVoidArray::RemoveElement() sez:

+  // XXX should use IndexOf

so why doesn't it use IndexOf()?

- You're now changing the semantics of nsVoidArray::Compact() to only compact if
there are more than 256 elements in the array, I don't agree with that change. I
think Compact() should *always* compact the array, like it did before, since the
array implementation doesn't know how much memory will be saved by compacting a
bunch of arrays, the saving done by compacting a single array might be next to
none, but the savings gotten from compacting a huge number of arrays in a system
(like the content model, as an example) can be more than enough to make it worth
compacting even if the saving per array is only 1 pointer size. The code that
calls Compact() can choose not to compact if the number of elements in the array
is lower than a certain number that makes sense for that code, but the array
implementation should *not* restrict when things are actually compacted. Undo
that change.

- Looking at the changes to nsSupportsArray makes one wonder why nsSupportArray
doesn't inherit nsVoidArray...

- Add the new method in nsISupportsArray.idl to the *end* of the interface, the
current change is guaranteed to cause binary incompatibilities, if you add it
tot the end of the interface there's at least a chance for old callers to find
the right method when calling methods through nsISupportsArray.

- In nsAutoLock.cpp, change #if DEBUG to #ifdef DEBUG

- In nsFontMetricsGTK.cpp:

@@ -3136,12 +3143,7 @@
       found = 1;
     }
     else {
-      PRInt32 n = aNodes->Count();
...
-      }
+
  found = (aNodes->IndexOf(node) != -1);
     }

Bad indentation, and also bad indentation at:

@@ -3562,6 +3569,8 @@
 
     /*
      * count hyphens
+
 * XXX It might be good to try to pre-cache this information instead
+
 * XXX of recalculating it on every font access!
      */

- Don't check in the changes to xpcjsruntime.cpp until you have an a= from jband
and/or dbradley.

Style nits:

- In nsVoidArray::ExpandTo(PRInt32 aMin), fix the indentation of the code inside
the #ifdef to match other similar places (i.e. use 2 spaces, not 4):

+      if (newImpl) 
+      {
+#if DEBUG_VOIDARRAY
+          ADD_TO_STATS(AllocedOfSize,SIZEOF_IMPL(aMin));
+          if (aMin > mMaxSize)

- Oh how I hate changes like this:

-  if (mImpl) {
+  if (mImpl)
+  {

but at least you're consistent so I'll just haveto live with it :-)
With those issues fixed, r=jst


Hmm, this doesn't make sense to me (not your fault, it was in the old code too),
InsertElementAt() can only insert in the middle of the current array or append
immediately past the current array, but it can't insert elements at any position
in the array causing the array to grow to the required size needed for the
operation, and padding new unset elements with 0. Yet, ReplaceElementAt() can do
exactly that, what's the reasoning behind that I wonder? Should we file a new
bug on that?
>- The comment in nsVoidArray::RemoveElement() sez:
>
>+  // XXX should use IndexOf
>
>so why doesn't it use IndexOf()?

	No reason.  I happened to notice that while doing style changes and
noted it for future interest but didn't think it would do much but save a
few bytes of code (and clean things up).  How about this:

PRBool nsVoidArray::RemoveElement(void* aElement)
{
  PRInt32 theIndex = IndexOf(aElement);
  if (theIndex != -1)
    return RemoveElementAt(theIndex);

  return PR_FALSE;
}

>- You're now changing the semantics of nsVoidArray::Compact() to only
>compact if there are more than 256 elements in the array, I don't agree
>with that change. I think Compact() should *always* compact the array,
>like it did before, since the array implementation doesn't know how much
>memory will be saved by compacting a bunch of arrays, the saving done by
>compacting a single array might be next to none, but the savings gotten
>from compacting a huge number of arrays in a system (like the content
>model, as an example) can be more than enough to make it worth compacting
>even if the saving per array is only 1 pointer size. The code that calls
>Compact() can choose not to compact if the number of elements in the array
>is lower than a certain number that makes sense for that code, but the
>array implementation should *not* restrict when things are actually
>compacted. Undo that change.

	I understand your objection.  However: there is an issue here.
Compact() is often called when something may (or may not) get much smaller.
(The cases I trapped tended to be relatively small shrinkages.)  On many
systems (almost all), allocations are rounded up, and sometimes to
powers-of-two (in binned allocators such as the BSD allocator).  I was
trying to find a balance between the overhead of freeing/malloc/copying and
space.  However, I agree that the tradeoff I picked was pretty random and
mostly designed for speed, not memory efficiency (which was why I was
looking for bloat testers).

	Looking at LXR, I see that about the only use of Compact() is for

1) js/src/xpconnect/src/xpcjsruntime.cpp: JS GC code, which I'm already
   modifying to use ExpandTo().
2) uriloader/base/nsDocLoader.cpp: observer arrarys - it calls Compact()
   after every event.  Also it uses Clear()/Compact() to empty arrays.
3) a number of content files make it available as bool attr, which may be
   where what usage we have is coming from.

	So, should we realloc the memory when it has shrunken by a single
entry, as we were?  I may throw in some debugs to figure out how often this
happens.

	Note: I made the change to Compact before I did the major rewrite that
added ExpandTo(), which is now used in xpcjsruntime.cpp.  If you want me to, I
will redo Compact() to always realloc regardless of the savings (if >0).

>- Looking at the changes to nsSupportsArray makes one wonder why nsSupportArray
>doesn't inherit nsVoidArray...

	It would make quite a bit of sense.  However, I didn't think I should
go into such extensive and more-likely-to-have-unexpected-results changes.
It also would require considerably more testing and reviewing IMHO.  Also,
see the end of this message for info on some niggling semantic differences
between them.

>- Add the new method in nsISupportsArray.idl to the *end* of the interface, the
>current change is guaranteed to cause binary incompatibilities, if you add it
>tot the end of the interface there's at least a chance for old callers to find
>the right method when calling methods through nsISupportsArray.

	I didn't realize we had locked down binary compatibility on this
interface; I added it where it would read well.  No problem, though.

>- In nsAutoLock.cpp, change #if DEBUG to #ifdef DEBUG

	Ok, makes sense.  (For testing I was using #if 0).  #if DEBUG's are
scattered through the tree, but I do agree with you.  In any case, I looked
at the file more, and found a vast amount of it (including the change I
made) is already under #ifdef DEBUG, so I removed my change.

>- In nsFontMetricsGTK.cpp:
>Bad indentation, and also bad indentation at:

	Ok.  Emacs helpfully inserted tabs.

>- Don't check in the changes to xpcjsruntime.cpp until you have an a= from
jband
>and/or dbradley.

	No problem.  Email sent.

>Style nits:
>
>- In nsVoidArray::ExpandTo(PRInt32 aMin), fix the indentation of the code
inside
>the #ifdef to match other similar places (i.e. use 2 spaces, not 4):
>
>+      if (newImpl) 
>+      {
>+#if DEBUG_VOIDARRAY
>+          ADD_TO_STATS(AllocedOfSize,SIZEOF_IMPL(aMin));
>+          if (aMin > mMaxSize)

	No problem, that was just forgetting to have Emacs re-indent after
cut-and-paste.

>- Oh how I hate changes like this:
>
>-  if (mImpl) {
>+  if (mImpl)
>+  {
>
>but at least you're consistent so I'll just haveto live with it :-)
>With those issues fixed, r=jst

	I wasn't going to do that, but waterson suggested asserting style given
the depth of the changes to the file (instead of going back and making all
the new code match the old standard).

>Hmm, this doesn't make sense to me (not your fault, it was in the old code
>too), InsertElementAt() can only insert in the middle of the current array
>or append immediately past the current array, but it can't insert elements
>at any position in the array causing the array to grow to the required
>size needed for the operation, and padding new unset elements with 0. Yet,
>ReplaceElementAt() can do exactly that, what's the reasoning behind that I
>wonder? Should we file a new bug on that?

	I have _no_ idea.  I noted the same weirdness you did, and at least
commented it in the code:

    // An invalid index causes the insertion to fail
    // Invalid indexes are ones that add more than one entry to the
    // array (i.e., they can append).

and

  // Unlike InsertElementAt, ReplaceElementAt can implicitly add more
  // than just the one element to the array.


	For added fun, nsSupportsArray doesn't allow ReplaceElementAt to add
elements _at all_.  Also, nsSupportsArray has AppendElements; nsVoidArray
doesn't.


Johnny:  thanks for such a through review
I think all of jst's and jband's comments are now fully addressed.  At this
point, I need r= (jst should provide that), sr= (waterson?), and an a= from
jband or dbradley.

Compact() now always compacts again, as per jst's request.  We may eventually
want to look at callers of Compact, or revisit the time/space tradeoff within
nsVoidArray.
Blocks: 7251

Comment 121

18 years ago
Add this from my e-mail:

For the xpcjsruntime.cpp change. Why not just check if (count == 1) rather than
!count. This would save numerous uneeded calls to ExpandTo. The only counter for
this would be if the array shrunk to some small number and then releasing an
object triggerred additional objects to be placed in the array. jband, how
likely is the array to shrink below 256 and then grow again due to additional
releases? 
>For the xpcjsruntime.cpp change. Why not just check if (count == 1) rather than
>!count. This would save numerous uneeded calls to ExpandTo.

	Unless I miss my guess, the array is only size zero if everything in it has
been released and the array Compact()ed.  When it's releasing things, it does so
one at a time, and it rechecks count on each pass.  If count == 0, it Compact()s
the array and quits the GC.  In the code for DeferredRelease(), we only ExpandTo
if Count == 0, so this should only happen on the first call of a GC pass.

	Your comment did make me go check the code; there is one mod that would help: 
Change this:

      if (aMin > mImpl->mCount)
      {
        char* bytes = (char *) PR_Realloc(mImpl,SIZEOF_IMPL(aMin));

into this:

      if (aMin > mImpl->mCount && aMin != oldsize)
      {
        char* bytes = (char *) PR_Realloc(mImpl,SIZEOF_IMPL(aMin));

That will cut the overhead of ExpandTo() with the same size to almost nothing. 
I'll revise the patch later today.

Comment 123

18 years ago
I hate Monday mornings. I read the test backwards, sorry Randell. At least
something good came of it.

So the change is fine with me. So a=dbradley.

Comment 124

18 years ago
1. Instead of ``very evil macros'' (they're not evil AFAICT), how about
``statistics gathering macros''.

2. In |ExpandTo()|, why are we testing mCount at all here?

  if (aMin > mImpl->mCount) { ... }

Shouldn't we be testing against oldsize? Why force the array to be reallocated
if the space is already available? I assume it's written this way because we're
using |ExpandTo()| to _shrink_ the array as well? If so, wouldn't |SizeTo()| be
a better name for this method?

3. In |ExpandTo()|, I'm to wading through nested if/else clauses four levels
deep, which is a pet peeve of mine. And there are early returns in here to boot!
Can't we re-arrange this logic more sensically? For example,

  if (aMin <= 0) {
    // free the array
    return PR_TRUE;
  }

  if (mImpl && IsArrayOwner()) {
    // We currently own an array impl. Resize it appropriately.
    if (aMin <= mImpl->mCount) {
      return PR_FALSE; // wouldn't fit

    // (grow the array, etc.)
    return PR_TRUE;
  }

  // We don't own an array impl. Allocate one now.
  return PR_TRUE;

Perhaps this logic looked prettier before the statistics gathering code was in
place, but it's torturous now!

To a lesser degree, operator=() suffers the same way.

4. Why aren't |GetArraySize()| and |IsArrayOwner()| declared inline in the header?

5. You don't declare |nsAutoVoidArray::ExpandTo()| as virtual. Did you mean to
do that? (I can never remember if this matters.)

6. nsAutoVoidArray::Clear() is a bit complicated. Can't we unilaterally call
nsVoidArray::Clear(), and conditionally ExpandTo(0)?

7. It looks like gSupportsArraySizes is obsolete. Can it just be removed?

I've been whacking on this stuff in my tree, so I'll just attach a patch of what
I've got currently.
	Thanks for the thorough review.

>1. Instead of ``very evil macros'' (they're not evil AFAICT), how about
>``statistics gathering macros''.

        No problem.  :-)

>2. In |ExpandTo()|, why are we testing mCount at all here?
>
>  if (aMin > mImpl->mCount) { ... }
>
>Shouldn't we be testing against oldsize? Why force the array to be reallocated
>if the space is already available? I assume it's written this way because we're
>using |ExpandTo()| to _shrink_ the array as well? If so, wouldn't |SizeTo()| be
>a better name for this method?

        Correct, it is sizing.  As I mentioned earlier, the name is poor.
SizeTo() is fine by me and more descriptive.  Done.

>3. In |ExpandTo()|, I'm to wading through nested if/else clauses four levels
>deep, which is a pet peeve of mine. And there are early returns in here to
>boot!  Can't we re-arrange this logic more sensically? For example,

        Sure, I can reorder it.

>Perhaps this logic looked prettier before the statistics gathering code was in
>place, but it's torturous now!

        The stats code really made it harder to read.

>To a lesser degree, operator=() suffers the same way.

        I'll look at that.

>4. Why aren't |GetArraySize()| and |IsArrayOwner()| declared inline in the
header?

        I'll check on that.  I think the reason was that the only callers I
cared about for those were internal.  Those should get inlined, since the
code was written (in the .cpp) as 'inline'.  External callers would get a
non-inline copy.  This does waste some code space however.  I did it originally
because I like to have the implementation be in the .cpp file; I find
implementations split between two files painful.  However, this is C++, so
it's a pain I should expect.  Fixed.

>5. You don't declare |nsAutoVoidArray::ExpandTo()| as virtual. Did you mean to
>do that? (I can never remember if this matters.)

        I can't either.  I don't _think_ it matters, but I suppose it
doesn't hurt, or hurt much.  I'll add it.  Ditto for nsAutoVoidArray::Compact.

>6. nsAutoVoidArray::Clear() is a bit complicated. Can't we unilaterally call
>nsVoidArray::Clear(), and conditionally ExpandTo(0)?

        Sure.  The compiler would probably optimize it that way anyways.

>7. It looks like gSupportsArraySizes is obsolete. Can it just be removed?
>
>I've been whacking on this stuff in my tree, so I'll just attach a patch
>of what I've got currently.

        Ok, thanks.

	I've done some additional cleanup (factored out the Grow code into
nsVoidArray::GrowBy and nsSupportsArray::GrowBy), and added (with Pavlov's
agreement) the methods InsertElementsAt(aOther, aIndex) and
RemoveElementsAt(aIndex,aCount).  These may be useful in InsertBefore, etc
(especially for doc fragments to speed up mail reply).  Also, in SizeTo() I
added an early return for aMin == current size for efficiency.

	I'll post an updated patch shortly after testing all this.
I found a bug in nsSupportsArray while doing a little cleanup.  RemoveElement
takes a starting point, but ignores it.  I'll include the fix for this in the
patch.
Updated to include waterson's comments: ExpandTo() called SizeTo(), cleanup of
the code for SizeTo, inlines, etc.

Added InsertElementsAt/RemoveElementsAt (per discussion with Pav - see
n.p.m.performace/.porkjockeys/.xpcom) to both nsVoidArray and nsSupportsArray. 
nsSupportsArray::AppendElements() uses InsertElementsAt() now.

Cleanup: unify code for growing arrays into ::GrowBy().  Cleans up
Insert/Replace/etc. (Both nsVoidArray and nsSupportsArray).

nsSupportsArray: fixed RemoveElement(); it was ignoring the starting position. 
Made RemoveElement and RemoveLastElement() use IndexOf*.  Added SizeTo() to keep
the interface at least somewhat like nsVoidArray's.



How about checking this in now before adding even more changes to this same
patch? Please file new bugs on additional large changes to nsVoidArray n'
friends in stead of cooking them into this patch...
jst: sorry about that.  If you like, I can provide a patch with waterson's
comments addressed and no other changes, and do the other stuff in that last
patch in a new bug/patch.  (I'd originally planned to do that, actually, but
when I found the bug in nsSupports and realized that I had to make a bunch of
changed for Waterson I just rolled it together.)

Tell me what you'd like.  I can provide a split patch tomorrow if wanted.

Comment 132

18 years ago
xpcom/ds changes checked in.
Blocks: 94243
bug 94243 will cover the remaining parts of this patch (outside of xpcom/ds),
such as the xpcjsruntime.cpp change (a='d by dbradley & jband).  The base
changes are checked in and apparently stable in the trunk, so I'm closing this
bug.  This will leave bug 92573, bug 92575, and bug 92576 for the
member-variable changes, and bug 94243 for the usage fixes and auto-variable
changes (and the js GC fix).  Anyone who wants to continue to follow this should
add themselves to the CC list (I'm adding a few people myself).  Thanks for all
the comments, people!
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED

Comment 134

18 years ago
does this bug being fixed mean the landing plan entry should closed out?

here is what the landing plan sez...

* 3. nsVoidArray/nsSupportsArray

* 27-31-1: Reduce allocation overheads: bug 90545/etc
* 7-31-1: Ready for tests: speed, startup speed, # of allocations, memory use.
(Includes 3 dependant patches)
* 8-7-1: xpcom/ds patches applied. Next will be things using
nsVoid/SupportsArray, then member variable uses
Yes, that should be closed out.  What's the URL to adminster those?

Comment 136

18 years ago
one line:
http://komodo.mozilla.org/planning/branches.cgi? 
showname=nsVoidArray/nsSupportsArray
You need to log in before you can comment on or make changes to this bug.