Closed Bug 90545 Opened 24 years ago Closed 24 years ago

Efficiency tweak & fix wrong comments for nsVoidArray

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.4

People

(Reporter: jesup, Assigned: jesup)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(19 files)

7.62 KB, patch
Details | Diff | Splinter Review
70.81 KB, patch
Details | Diff | Splinter Review
79.14 KB, patch
Details | Diff | Splinter Review
78.28 KB, patch
Details | Diff | Splinter Review
28.21 KB, patch
Details | Diff | Splinter Review
79.56 KB, patch
Details | Diff | Splinter Review
79.70 KB, patch
Details | Diff | Splinter Review
77.48 KB, patch
Details | Diff | Splinter Review
38.82 KB, patch
Details | Diff | Splinter Review
39.36 KB, patch
Details | Diff | Splinter Review
2.33 KB, text/plain
Details
23.67 KB, patch
Details | Diff | Splinter Review
2.36 KB, text/plain
Details
60.51 KB, patch
Details | Diff | Splinter Review
59.62 KB, patch
Details | Diff | Splinter Review
59.19 KB, patch
Details | Diff | Splinter Review
59.19 KB, patch
Details | Diff | Splinter Review
46.03 KB, patch
Details | Diff | Splinter Review
68.97 KB, patch
Details | Diff | Splinter Review
nsVoidArray.cpp is rather inefficient in how it sets up the Impl structure; there's lots of duplicated code. Also, the comments about how the Impl structure works are just plain wrong (probably they date to before a major change). I also (and perhaps more controversially) patched the allocator for slightly larger increases for small arrays (8 instead of 4), and moved the point where it starts doubling to 32 (from 16), and modified the code to try to keep allocations right on a power-of-two in size instead of often just over a power-of-two - for larger allocations, this is far less inefficient of memory. With 'binned' allocators (such as the BSD allocator), there's an improvement for mid-to-small-size allocations as well. Patch with improvements (to be) attached. I'm more than willing to attach a patch without the changes to the allocation sizes. I have tested these with no problems.
Added a patch. I've verified this using dmalloc (an allocation checker/fencepost tool I have working with mozilla in my tree; to be released - see www.dmalloc.org). This should reduce the number of allocation calls (some), and reduce the amount of wasted memory (some to a lot, depends on OS allocator). It also simplifies the code and corrects incorrect comments.
Keywords: footprint, patch, perf
Blocks: 67618
The casting between char[] and Impl* is quite ugly, IMHO. There is an easy way to see if a number is a pure 2^x in which case you don't have to count bits and that is: if ((x & (x-1)) == 0) // x = 2^n for some n This seems like a typo, kLinearTreshold is a count that has nothing to do with pointer size: +static const PRInt32 kLinearThreshold = 32 * sizeof(void *);
I want to round up to the next 2**n, so n&(n-1) doesn't work. I'll look at the casting, however I think it was all there already. I changed kLinearThreshold to be a byte threshold instead of a count threshold; that was intentional (note the comparison changed too). Here's something I added to bug 67618 I just found out: :I found a lot of the reason for all the nsVoidArray::InsertElementAt :allocations is the CSS code. It uses lots of nsVoidArrays which are then :immediately appended to. I'm going to look at either (a) switching some of :them to nsAutoVoidArrays, and/or (b) adding a constructor that takes a single :array element to be inserted at construction time. (a) probably makes more :sense. You see thousands of void arrays going from size 0 (no mImpl) to size 8; almost none going larger.
Adding hyatt because this impacts on his CSS optimizations possibly.
I checked; the NS_REINTERPRET_CAST()s were all there before (essentially). I just moved them.
I was thinking something like: if ((oldSize & (oldSize-1)) == 0) // oldSize = 2^n for some n newSize = oldSize < 2; else // count bits and stuff. I'm applying your patch now and will try to get some quantify numbers before I fall asleep.
You probably want to update the comment "Grow by kGrowArrayBy slots if we're smaller than kLinearThreshold elements" if kLinearTreshold no longer is a count of elements.
Quantify numbers for a startup: Before patch After patch Allocations by InsertElementAt 5769 (8,25ms) 4686 (27,6ms) Memory freed by InsertElementAt 1268 (1,25ms) 181 (0,25ms) Memory freed by destructor 1074 (1,37ms) 1078 (1,29ms) That is 1000 reallocations fewer but the times are scary. I don't know if the memory handling times (which are clock timed by Quantify) should be ignored. The timing could be dependent on so much else. The state of the heap and the load of the computer and thousands of small things. I also don't know about memory impact. I guess you could add some statistics to keep track of wasted memory so that it too can be compared?
I did a new run and now the allocation time (which was 27ms above) was 9,21ms so the number in the twenties was something odd I think. So if the memory waste isn't too big, I would say go for the patch (with minor adjustments as mentioned earlier).
Btw, you might want to look at bug 63473 and bug 29873 if you are changing things in nsVoidArray. More numbers. Total allocation CPU time during startup was before the change 866 ms and after the change 835 ms. This is despite that there were 2% more allocations afterwards (for some reason). That can just be noise, but the numbers are at least on the right side.
I think the first bug number is wrong... Thanks for all the testing!! It's nice to see some confirmation. I suspect part of the improvement might be when we do allocate large blocks, they're not wasting as much memory. The BIG improvement will be reducing the number of 0 items -> 8 items allocations in the CSS code. I'll look at not counting bits if we don't need to, but that code doesn't get invoked all that often, and it's fast.
Keywords: mozilla0.9.3
Right, it should have been bug 63743.
While I'm at it, I'll fix bug 63743 to. I'll put up a new patch probably monday, as well as some patches for CSS (perhaps) to minimize the number of small allocations done.
Blocks: 91017
I've found that there are a LOT of places where we use nsVoidArray and immediately append an element to it. By some judicious conversions to nsAutoVoidArrays, I've cut quite a few thousand allocations from startup, and probably added little or no footprint (since the allocations often got rounded up anyways). I'm still working on it.
Well, I have the number of nsVoidArray mImpl allocations down from around 15000 to about 2500. I _think_ the amount of VSS should be similar or unchanged. I'm going to need someone to test timing (and VM usage if possible, plus maybe page-load times), since I can't easily do so on my system. Daniel? Dave Hyatt?
Sounds promising. Can jrgm do the perf tests you want?
Current results: Start browser (to blank window), close with X box, in allocations of the array portion (mImpl) of nsVoidArray: 0.9.2 my version Including registration: 11093 4789 After registration (normal): 6797 1788 Goto CNN, quit (after reg): 10605 3779 Delta for CNN (after start): 3808 1991 (I've been mostly working on stuff touched in startup, so there's probably more to be had in loading CNN).
Sure. I'm going to make a patch in the next day or so with all these mods. (All are pretty simple other than the changes to nsVoidArray, which are mostly in the patch already poste, and are still pretty simple.)
Is it possible to calculate the amount of "unnecessary" memory allocated and number of reallocations done? Such statistics could be used to tweak the constants even further. For instance, is there a way for a user to specify wanted start size? If I know that I will use 5000 slots, it would be bad to let the array reallocate itself several times while adding stuff to it. Btw, does the array ever shrink? If I empty it, but leave the struct alive, will it then be the same size as when it was as biggest? Otherwise, I like these changes. I just hope they don't affect bloat.
nsVoidArray has always supported an nsVoidArray::nsVoidArray(PRInt32 aCount), so you can set an initial size (this is done in a few places). Also, there's a ::Compact() method which is also used a few places. I modified ::operator= to Compact() if the array lost more than 50 entries. There's no auto-compact for RemoveElementAt(), though, and I'm not sure it's a good idea - maybe if there's a big drop in size, AND we then add an entry back. Not worth the trouble given the usage I see. nsSupportsArray was really dumb, and always grew by 8 entries - and unlike nsVoidArrays, lots of nsSupportsArrays have hundreds of entries. I'm fixing that too. I believe this will have minimal to no impact on bloat, with the _possible_ exception of the CSS changes. Even there, some of them are quite unlikely to bloat, and others I shied away from because of bloat. Many things I changed were basically: blah = new nsVoidArray; blah.AppendElement(foo);
Yeah!!! Updated current results: Start browser (to blank window), close with X box, in allocations of the array portion (mImpl) of nsVoidArray: 0.9.2 my version %decrease Including registration: 11093 3843 65% After registration (normal): 6797 840 87% Goto CNN, quit (after reg): 10605 2219 79% Delta for CNN (after start): 3808 1379 64% The reduction in after-registration startup allocations is especially significant. Time for me to roll all this into a patch someone can try running through some timings and some memory-usage tests. jrgm? Daniel?
I can probably do some simple timing tests but I don't expect anything really big from it since there is so much else that also does allocations. It's just a step in the right direction. The average time for a memory allocation seems to be in the order of 10 us so 10000 saved allocations would be max 0.1 s. How it affects the state of the heap is a little more complicated question and it's probably there the greatest wins can be found if you have managed to make the allocations better adapted in size. Very difficult to measure though.
If you roll that patch up, I can give it a spin, and I'll ping someone (curt?) about memory tests. If this does fall into the 0.1 region of wall-clock time changes though I may not be able to "see" this. I'm just not that accurate :-\
Sorry the patches haven't been posted yet; the fiber-cut that's affecting east-coast makes it impossible for me to run a cvs diff or checkout. I have trouble even fetching a single bug report.
Blocks: 71814
Whomever added the 71814 dependancy got it wrong I think...
Blocks: 71874
No longer blocks: 71814
For anyone following this, I'm also switching from new char[...] to PR_Malloc/etc for the mImpl structure so that when I grow the array, I can use PR_Realloc. I eliminated almost all memset's and many memcpy's. Current figures: startup blank after reg, quit: 700 (was 6797) startup then load cnn after reg, quit: 1319 (was 10605) So I'm around 90% reduction. Also, nsSupportsArrays now double allocations sizes instead of growing by 8 linearly (and they're often 500-1000 entries or more!) I'm hoping for a noticable tick down in page-load times as well as restart.
I'm taking this bug (surprise)
Assignee: kandrot → rjesup
Anyone who can, please review the patch. I'll write some more details about it later, but basically I rewrote nsVoidArray moderately extensively, and changed many uses of nsVoidArray to nsAutoVoidArray. I also added some methods to allow users of nsVoidArray more control on how much to shrink it by. The changes outside of nsVoidArray should be generally quite safe. We should have dramatically less allocations of both nsVoidArray->mImpl structures and nsSupportsArray mArray structures (which was O(n**2) before, with n often > 500). nsVoidArray is much smarter about avoiding copying and clearing, and some bugs have been quashed in the process (ReplaceElementAt() could extend the array, but if it was an nsAutoVoidArray the implicit elements added would be garbage). Anyone who can run either timing tests (with and without the patch) at startup or loading a set of URL's, or memory tests (VSS and/or RSS) to test for any bloat, please give it a whirl. Whatever tool was used in bug 67618 would be a good test too, since that was the impetus for me to go after this. Wherever possible, please give before and after numbers. Execution time differences may be too small to measure with a stopwatch (though I'm hoping). Note: one frequent source of nsVoidArrays with thousands of elements was deferred GC cleanup objects for JS, which was then Compact()ed and freed. I used a new method (probably poorly named as nsVoidArray::ExpandTo()) to Compact the nsVoidArray to no smaller than 250 entries, and only if it was more than 512. There's only one of these arrays, and it gets used on every GC, and it's always up in the hundreds of entries. Targeting mozilla0.9.3
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.3
Doesn't compile on Windows. j:\moz\mozillaprofile\mozilla\xpcom\ds\nsVoidArray.cpp(451) : error C4716: 'nsAutoVoidArray::ExpandTo' : must return a value
I looked at the patch. Comments interleaved with patch lines: + nsStringArray(PRInt32 aCount); // initial count of aCount elements set to nsnull + nsCStringArray(PRInt32 aCount); // initial count of aCount elements set to nsnull Don't understand that comment at all. + mRangeList = (nsVoidArray *) new nsAutoVoidArray(); is the cast necessary? Isn't nsAutoVoidArray already a nsVoidArray? Several times I see the pattern below: - nsVoidArray* vector = GetChildVector(); + nsAutoVoidArray* vector = GetChildVector(); You don't have to know that it is an nsAutoVoidArray. Treat it as an nsVoidArray so you don't have to change code whenever there is a new type of nsVoidArray. It's only when creating it you need to know the exact type. If you use casts to call the right function you're doing it the wrong way. Make the relevant function |virtual| instead. That change will reduce the size of the patch alot to. - mOrder = new nsVoidArray(); + mOrder = new nsAutoVoidArray(); How big is the win from allocating a nsAutoVoidArray from the heap instead of an nsVoidArray? + mCursor(nsnull), What is this? + gGlobalList = new nsFontNodeArray(32); + nodes = new nsFontNodeArray(32); + nsFontNodeArray nodes(32); Magic uncommented constant
Daniel: Thanks for going over the code in detail! >Doesn't compile on Windows. >j:\moz\mozillaprofile\mozilla\xpcom\ds\nsVoidArray.cpp(451) : error C4716: >'nsAutoVoidArray::ExpandTo' : must return a value Sorry; I added a return to ExpandTo late in the game and didn't check nsAutoVoidArray's version. Easy fix. >You don't have to know that it is an nsAutoVoidArray. Treat it as an nsVoidArray >so you don't have to change code whenever there is a new type of nsVoidArray. >It's only when creating it you need to know the exact type. If you use casts to >call the right function you're doing it the wrong way. Make the relevant >function |virtual| instead. That change will reduce the size of the patch alot >to. Good point; I should go back and clean that up some. (Much of my serious OO experience is with purely-dynamic OO systems where you don't need constructs like virtual.) >- mOrder = new nsVoidArray(); >+ mOrder = new nsAutoVoidArray(); >How big is the win from allocating a nsAutoVoidArray from the heap instead of an >nsVoidArray? Roughly 1/2 the overhead if the array never needs more than 8 items, and at some point has an item in it. If it never has an item in it, the cpu overhead is the same. The win is that you only call malloc (and free) once. >+ mCursor(nsnull), > What is this? That was to fix a bug with initialization of mCursor, if I remember correctly - when I added dmalloc (which trashes allocations to 0xc5), it went kaboom there. I'll try to double-check, but that sounds right. Note that CSS_IF_COPY doesn't touch val if aCopy.val isn't set. >+ gGlobalList = new nsFontNodeArray(32); >+ nodes = new nsFontNodeArray(32); >+ nsFontNodeArray nodes(32); >Magic uncommented constant Sorry; I was fooling around with values to figure out a good default and didn't go back to parameterize it. >+ nsCStringArray(PRInt32 aCount); // initial count of aCount elements set to nsnull >Don't understand that comment at all. I think my English deserted me. Actually, I think someone interrupted me or was talking to me while I typed and two versions got intermingled. How about: + nsCStringArray(PRInt32 aCount); // Storage for aCount elements will be pre-allocated
Priority: -- → P2
Hardware: PC → All
I updated the patch. Daniel's comments should all have been addressed. I also implemented dbaron's MoveElement() (I did it separately, and when I looked at his code it was almost identical) and changed the font cache to use it. Functions that are overridden by nsAutoVoidArray/etc are now virtual. I went through the source tree checking calls to ReplaceElementAt and found a few that should really be AppendElement. I found by inspection that the font cache arrays don't have many entries except maybe the main one (which then gets nuked anyways.... see another bug I just filed) so I just made them nsAutoVoidArrays. (Except for the main list I never saw one with more than 5 entries, most had either 1 or 2-3.) I made one or two more things nsAutoVoidArrays (nsRDFXMLSerializer.cpp). I also added dbaron's comments to nsQuickSort.h. (While writing thing I realized I forgot to remove changes from one file; I'll re-upload the patch as soon as I post this.)
Looks much better. And now it even seem to compile. I will try to do some measurements later today if I'm home early. Many lines similar to: + j = FillArrayWithAncestors((nsVoidArray *) &array2,aNode2); You should never have to cast a nsAutoVoidArray pointer to a nsVoidArray and if you really have to (don't know when that would be) you shouldn't use a C cast. Use NS_STATIC_CAST instead so that the compiler will stop obviously illegal casts. If you really have to do a cast that is "unsafe", use NS_REINTERPRET_CAST. The best is to always use pointers to nsVoidArrays and then only mention nsAutoVoidArray when you create the array. The virtual functions will take care of the rest. That will make it easier to maintain the code and nobody will have to wonder why for instance the aData in DeleteSheetMap has to a nsAutoVoidArray instead of any nsVoidArray. + nsAutoVoidArray* mOrder; Since it's just a pointer, you can just declare it as a nsVoidArray* and let it point to whatever type of nsVoidArray you need, unless you want to be perfectly sure that it really is a nsAutoVoidArray. + // We could use aBuffer.MoveElement(), but it wouldn't be much of + // a win if any for swapping two elements. Do we need a nsVoidArray::SwapElements?
It almost compiles: j:\moz\mozillaprofile\mozilla\mailnews\base\search\src\nsMsgFilterList.cpp(857) : error C2039: 'MoveElement' : is not a member of 'nsDerivedSafe<class nsISupportsArray>'
Darn. Sorry; I usually build with --disable-mailnews and didn't think that I hadn't added MoveElement to nsSupportsArray (I have now; it makes sense to do so anyways -- If I was being really pedantic, nsSupportsArray would inherit from nsVoidArray, or both would inherit from a virtual base class). As for the casts, I didn't go through every file in the patch to remove them. I'll make another look and post another updated patch in a few minutes. I'm also adding a sanity check I missed to MoveElement (I was only checking aTo for being past the end of the array, not aFrom). If anyone else can run performance test (particularly bloat tests and tests that log the number of allocations, or ibench and other startup/page-load tests) it would be useful. Waterson? Hyatt? JRGM?
My apologies; I first uploaded a file of only the files changed from the previous patch by mistake. The latest patch (http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43388) is the correct one and updates all files.
*** Bug 91017 has been marked as a duplicate of this bug. ***
>Do we need a nsVoidArray::SwapElements? I don't think so; there's only one place in the code I can see that would use it, and the overhead is minimal to do it directly. Moving an element to the head (and sliding the ones inbetween) is something that should be handled inside the class, both for abstraction and efficiency reasons.
Sun's compiler whines about code order issue: "xpcom/ds/nsVoidArray.cpp", line 118: Error: "nsVoidArray::IsArrayOwner() const" has already been called and cannot be defined inline. Thas is easy to fix by moving IsArrayOwner() before code which calls it.
Startup CPU time on main thread decreased by ~100 ms for me. That is -4%. The number of mallocs decreased by ~1000. (-0.7%) But they were timed at times a couple of tenths of a second slower than my referance run. I don't know why. Maybe it's just too much noise. My computer isn't exactly unloaded when I'm doing the tests. That doesn't matter for Quantify withing Mozilla because it counts the instructions but it times the OS and libc calls giving much greater uncertainty there. Time to load the page stress case seemed to decrease by 100-200 ms which surprises me. I wouldn't expect such a difference, but then that page is heavy on CSS. From a testrun: Reflows -30 ms (-3%) Tokenizer -25ms (-10%) NavDTD::BuildModel -130ms (-6%) I have a hard time believing these figures but they are at least in the right direction. It will be interesting to see the result from jgrm's more professional test runs. I have not looked very hard at the bloat change. The only thing I can say is that the change doesn't seem to be dramatic.
I'm in the middle of getting a couple of builds (win2k/linux) done to do some tests (startup time, page loading) and I hope to get results by end tomorrow. [Maybe it's just me, but I had a lot of trouble patching on win32. However, there is one thing wrong in the patch attach_id=43418 (confusing but benign): the same hunk for gfx/src/nsImageRequest.cpp is included twice, leading to a failure on the second pass). I'm not really sure what the best memory/allocation/bloat tests should be run with this. waterson?
Randell, how about allocating memory with |alloca| in the nsAutoVoidArray case? Since the array won't survive the current scope we could as well store the full array on the stack reducing heap operations even further. The problems would be that you can't realloc or free an |alloca|-allocated memory but it could still be used for the initial buffer that is hard coded to 8 elements. If we allocate that buffer with alloca (allocate on the stack) we don't need to hard code the stack size, but can allow the caller to say how big the array will be. That should both reduce heap operations, increase performance and reduce bloat. (assuming there is an alloca equivalent in NSPR)
Daniel: Great numbers! I figured there'd be some execution win in CSS by avoiding lots of allocations, but I never figured it'd be that good. It's also really nice to see 3-10% drops in every test done, even if they may not hold up entirely in a more rigorous test. John: I fixed the duplicate hunk; because this is pulled from a tree with other changes I don't want to release, the updates to the patch have been done by just replacing files within the patch file, and I missed removing the old version of one of them. Thanks for noticing. Daniel: I don't think there's an alloca equivalent in NSPR - it's too compiler-specific (i.e. GCC). Even if there was, I'd have to put the alloca at the same scope as where the "nsAutoVoidArray foo;" is now, and that would mean using macros or modifying a lot of code. Even then, it might be problematic to get C++ to be happy with this. I had thought of overriding operator new, but that doesn't help with automatics like the above. If someone has a good _portable_ way to not have a fixed initial arraysize _and_ only have a single allocation without overriding new, I'd love to hear it (for a later patch perhaps). I don't think C++ can do it. Fully dynamic object systems can (at least some, perhaps all).
Blocks: 92256
BTW, overriding operator new for nsVoidArrays is still a useful idea for things such as "new nsVoidArray(100)". This isn't done enough for me to have tried it yet, and it doesn't help with automatics or arrays within other objects.
jrgm: re bloat tests, something like what tracy runs would be ideal (see http://ftp.mozilla.org/pub/data/memtests/, e.g.); however, I'd be happy with somebody doing a gross comparison of RSS and VM size after startup.
cc'ing some other people who've worked on nsVoidArray before (dbaron, rbs, scc). - I'm frightened by the use of nsAutoVoidArray as a member variable. (It's ten words, no?) Although it _looks_ like it's only being used in classes that don't tend to have a lot of instances, we need to be doubly sure here. jst: it appears that most of these replacements are in DOM code. - I didn't check terribly carefully, but did you keep the special casing for zero element arrays? (Ought we special-case single element arrays, too?) Perhaps it'd be useful to add some statistics gathering code to count grows, shrinks, size distribution, etc.? - There are several places where we ought to hoist an invariant function out of an expression where it's repeatedly evaluated -- e.g., GetArraySize(). It's not |inline|, and I don't think that the compiler will optimize away the redundant calls. - Minor nit, but...you're asserting your own style here. That's fine, as it appears you're rewriting most of the code; just don't leave the file as a mish-mash of coding styles when you're done. thanks...
mozilla.0.9.4, since I apparently missed the 0.9.3 boat.
Keywords: mozilla0.9.3
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I was pretty careful about where I made members into nsAutoVoidArrays; almost all such cases have 1 or more entries inserted, so we end up saving space in most cases. 0 element arrays (as nsVoidArrays) still don't have an mImpl structure allocated until the first element is added. I've considered making an nsVoidArray that stores single objects inline, and if more than 1 is added, then allocate a real voidarray object. It still would require at least two words (maybe 3), and more complex code. Note that one class already does this itself (search for nsCheapVoidArray). Making GetArraySize() an inline is a good idea. (I had forgotten it wasn't.) I'll check other invariants. As for style: Since I'm rewriting so much, I guess it makes sense to assert style (actually, emacs does it for me... ;-) I left it a mismash because I wanted the diffs to be readable; but so much has changed that perhaps I should just go in and reformat. Opinions? Thanks for the review Chris!
Um, Chris, GetArraySize is: inline PRInt32 nsVoidArray::GetArraySize() const {...} So I think that's ok.
I strongly object to doing what looks like a search-and-replace job to use nsAutoVoidArray in stead of nsVoidArray, some of these changes are to members of classes whose size we don't wanto change just like that. How about we split this work up so that the optimiziations to ns[Auto]VoidArray are in this bug and the other changes get split into separate bugs for the different areas that are affected?
It was definitely not a search-and-replace (though I did hit a lot of spots). I ran through many startups of mozilla (and page loads), looking where initial allocations of mImpl structures were coming from, and looked to see which ones were done a lot, and where they often had entries (but not lots of entrie). If an nsVoidArray is going to have entries anyways, it doesn't generally cost you memory to make it an nsAutoVoidArray (and it can save memory, and it definitely saves a trip through the allocator and deallocator, plus you get better processor cache hit-rates). Do you have any specific ones you question? The ones I worried about the most were the CSS classes, XUL Elements, and maybe one or two more. Thats why I want people to run VSS/RSS checks to see the impact on memory usage. I'm hoping the memory hit is in the 10's of K (total) - not bad for a multiple-% speedup. Classes where I made members into nsAutoVoidArrays: nsContentSubtreeIterator: nsAutoVoidArray mStartNodes; nsAutoVoidArray mStartOffsets; nsAutoVoidArray mEndNodes; nsAutoVoidArray mEndOffsets; These all normally get entries added in ::Init nsBaseContentList: nsAutoVoidArray mElements; These seemed to always end up with elements in them in practice. nsDocument: (a giant object if ever there was one) nsAutoVoidArray mStyleSheets; nsAutoVoidArray mObservers; // basically always has at least 1 entry Note I only did these two; other nsVoidArrays in nsDocument (mCharSetObservers, mPresShells, mSubDocuments) I left alone, because I was less sure if they should be converted (perhaps mPresShells and mCharSetObservers). nsDocumentEncoder: nsAutoVoidArray mCommonAncestors; nsAutoVoidArray mStartNodes; nsAutoVoidArray mStartOffsets; nsAutoVoidArray mEndNodes; nsAutoVoidArray mEndOffsets; Again, when DocumentEncoder is created and used, these seem to always end up with elements in them. nsFormControlList: nsAutoVoidArray mElements; // Holds WEAK references - bug 36639 Again, this seems to always have at least one item in it when it exists. HTMLContentSink: (Another huge object) nsAutoVoidArray mContextStack; This gets used a lot. nsCSSLoader: // mParsingData is (almost?) always needed, so create with storage nsAutoVoidArray mParsingData; // array of data for sheets currently parsing As stated. There are two other nsVoidArrays I didn't touch. CSSStyleSheetInner: nsAutoVoidArray mSheets; This does an AppendElement in it's constructor. nsXBLStreamListener: nsAutoVoidArray mBindingRequests; These seem to get requests associated with them almost always. nsXULElement: nsAutoVoidArray mChildren; // [OWNER] Many (most) XUL elements seem to have children nsXULContentSinkImpl: nsAutoVoidArray mNameSpaceStack; Again, almost always used - it's called from ::OpenContainer nsXULDocument: (another BIG object) // This always has at least one observer nsAutoVoidArray mObservers; // This is set in nsPresContext::Init, which calls SetShell. // Since I think this is almost always done, take the 32-byte hit for // an nsAutoVoidArray instead of having it be a separate allocation. nsAutoVoidArray mCharSetObservers; // is we're attached to a DocumentViewImpl, we have a presshell nsAutoVoidArray mPresShells; The comments speak for themselves. nsHTMLReflowCommand: nsAutoVoidArray mPath; As best I can tell, this almost always gets set by BuildPath, which is called from ::Dispatch(). nsImageMap: nsAutoVoidArray mAreas; // almost always has some entries ImageMaps rarely have no entries... nsLineLayout: nsAutoVoidArray mWordFrames; RecordWordFrame/ForgetWordFrame is called all the time. nsTableCellMap: nsAutoVoidArray mCols; Tables rarely have no columns. nsCellMap: nsAutoVoidArray mRows; Tables rarely have no rows. nsTableFrame: nsAutoVoidArray mColFrames; // XXX temporarily public Ditto nsHttpHeaderArray: nsAutoVoidArray mHeaders; We always have at least a few HTTP headers. (This was one area where I wished I could devote even more space to the autovoidarray, since 8 might be too low for many webservers.) CompositeDataSourceImpl: (rdf) nsAutoVoidArray mDataSources; nsAutoVoidArray mObservers; I doubt there are any CompositeDataSources without data sources. As for Observers, i can't say they all have them, but it certainly showed up a lot in my allocation traps. CompositeEnumeratorImpl: (rdf) nsAutoVoidArray mAlreadyReturned; If we return anything and mCoalesceDuplicateArcs is true, we stick it in here. I'd guess roughly 1/2 of them use this. InMemoryArcsEnumeratorImpl: (rdf) nsAutoVoidArray mAlreadyReturned; If we return anything, we stick it in here. RDFContentSinkImpl: nsAutoVoidArray mNameSpaceScopes; We add something to this when ::OpenContainer() is done. nsDocLoaderImpl: nsAutoVoidArray mRequestInfoList; This is added to in ::OnStartRequest() nsViewManager: (big object) nsAutoVoidArray mDisplayList; This is used whenever we do ::RenderViews() (for Z sorting)
I should note that if there's an objection to any of these, or if they're bloating us too much, they obviously can be dropped (or reworked to use nsVoidArray *'s and new as needed).
I must admit that I share the misgivings of the previous reviewers. Wouldn't that be possible to instead manage these |mImpl| using an arena that will shield memory tricks from consumers.
Yes, I realize that an alloca solution requires macros but macros is an integral part of both C and C++ so that is not extremely bad. Another stackbased solution would be to pass a buffer and a size to the nsAutoVoid constructor. char initialArrayBuffer[4*sizeof(void*)]; nsAutoVoidArray stuffArray(initialArrayBuffer, 4); That breaks abstraction somewhat but could be used if we really want to tune an array. It is not as flexible as alloca since we have to know the size of the array at compile time. Arena solution: Who will own the arena?
>Arena solution: Who will own the arena? A wrapper (helper) class that registers itself as a shutdown listener at its creation (first use).
No, that's no good solution. An arena that lives as long as the program and accepts requests for all kind of sizes (which this arena would have to) will have major drawbacks. They will suffer from fragmentation and bloat and they probably wont be as fast. An arena is at its best when they can be used to serve blocks of fixed sizes that has a common predetermined life span. When that life span is over the whole arena can be release at once. As is the case of the frame arena in the pres shell. If an object has undetermined life span and size, it's the heap that is the solution. If the heap can't handle that case, which is what it's optimized to do, well, why should we be better?
In this context, the storage used by the arena is heap-allocated. Just that a large amount is allocated at once. And you could have several default arenas for the various sizes that are the current "defaults". The |mImpl| of a voidarray of size 8 is picked from Arena8, that of size 24 is picked from Arena24; Full arenas or a voidarray of large unexpected size are left alone and the space is picked from the heap. And ExpandTo() could safely shuffle data around between arenas because |mImpl| is private. But I just noted another issue. The service manager that does the shutting down is itself using nsVoidArray... So to avoid troubles, the wrapper class might instead have to refcount its users: a |new| (resp. |delete|) through the wrapper class increments (resp. decrements) the count of the arena being used. And the wrapper can free any arena(s) anytime and itself when the counts goes to zero. (In order words, it is implementable. Whether it is worth it is another matter.)
Randell, I know this wasn't done as a simple search-and-replace change, but from looking at the diffs it looks more or less as it was one :-) Anyhow, I have objections to the changes made to: nsBaseContentList, nsFormControlList, nsLineLayout, nsGenericDOMDataNode, nsGenericElement (all of them), nsEventListenerManager (all of them), nsXULElement. I'd need to see real numbers showing that this is worth it before allowing these changes in those classes (files), what's the speed/mem useage tradeoff here? Maybe I'm misunderstanding something here, but you're changing members of classes from nsVoidArray (8 bytes empty, on a 32-bit system) to nsAutoVoidArray (~40 bytes empty), that seems excessive to me, maybe it makes sense in some cases, but are you sure it's worth it in all these cases? Also, if you put more than 8 elements in an nsAutoVoidArray you'll completely waste 32 bytes for the now unused internal buffer. - I highly doubt the changes in nsDocument are worth it, we create documents very seldom (relatively speaking), and the items in the arrays mStyleSheets and mObservers hardly ever change so I don't see the benefit. - What's the point of changing the type of the pointer member in nsXMLContentSink.h? Why not just leave that as nsVoidArray and allocate a nsAutoVoidArray if we want to use one? - This seems silly, nsXULContentSink.cpp, if you wanto use nsAutoVoidArray's, then do that, but don't change API's to make it required: - nsresult GetTopChildren(nsVoidArray** aChildren); + nsresult GetTopChildren(nsAutoVoidArray** aChildren); - There's only ever one single presshell in a document (for now), so I don't see the point of making nsXULDocument::mPresShells a nsAutoVoidArray, maybe just pre-allocate a one-element sized buffer on cunstruction in stead? Oh, and "// is we're ..." should read "// i*f* we're ...". - Oh, and what's the benefit of this change in nsCSSDeclaration.cpp (other than the nsVoidArray -> nsAutoVoidArray)? Avoid the one IndexOf() call on an empty array (which is trivial and very fast)? Hardly worth the added complexity IMO. @@ -1837,13 +1838,16 @@ if (NS_OK == result) { if (nsnull == mOrder) { - mOrder = new nsVoidArray(); + mOrder = new nsAutoVoidArray(); } - if (nsnull != mOrder) { + else { + // order IS important for CSS, so remove and add to the end PRInt32 index = mOrder->IndexOf((void*)aProperty); if (-1 != index) { mOrder->RemoveElementAt(index); } + } + if (nsnull != mOrder) { if (eCSSUnit_Null != aValue.GetUnit()) { mOrder->AppendElement((void*)(PRInt32)aProperty); } - Fix the whitespace in nsCSSStyleSheet.cpp: - nsVoidArray mSheets; + nsAutoVoidArray mSheets; - Fix the whitespace in nsRDFContentSink.cpp: - nsVoidArray mNameSpaceScopes; + nsAutoVoidArray mNameSpaceScopes; - Hyatt really needs to have a look at the changes to the stylesystem before this is checked in. I think you're doing a good thing here, but we need to study the impact of these changes more before checking in all these changes. Again, I think it'd be better to split this bug into a few separate bugs for changes to separate areas of the code.
Mass reply to Daniel, rbs, and jst: Daniel wrote: >> Arenas > >No, that's no good solution. An arena that lives as long as the program >and accepts requests for all kind of sizes (which this arena would have >to) will have major drawbacks. They will suffer from fragmentation and >bloat and they probably wont be as fast. An arena is at its best when they >can be used to serve blocks of fixed sizes that has a common predetermined >life span. When that life span is over the whole arena can be release at >once. As is the case of the frame arena in the pres shell. Arenas and pools are (as you said) good for specific allocation patterns. An allocation pool is good where the maximum number of identical items in use is fairly static and they're often needed. Pools for the smaller allocation sizes here _might_ be useful (allocate N at a time, deal with the "froth" aspect of people adding things to them and then destroying them, rinse, repeat). I may look further into that, but I'm not sure it would be a win. Arenas are probably not a great solution; these arrays aren't in any way tied to each other, and their lifetimes will vary wildly. rbs wrote: >In this context, the storage used by the arena is heap-allocated. Just >that a large amount is allocated at once. And you could have several >default arenas for the various sizes that are the current "defaults". The >|mImpl| of a voidarray of size 8 is picked from Arena8, that of size 24 is >picked from Arena24; Full arenas or a voidarray of large unexpected size >are left alone and the space is picked from the heap. And ExpandTo() could >safely shuffle data around between arenas because |mImpl| is private. Certainly, though I think an allocation pool (or rather pools by size) may make more sense. >But I just noted another issue. The service manager that does the shutting >down is itself using nsVoidArray... So to avoid troubles, the wrapper >class might instead have to refcount its users: a |new| (resp. |delete|) >through the wrapper class increments (resp. decrements) the count of the >arena being used. And the wrapper can free any arena(s) anytime and itself >when the counts goes to zero. (In order words, it is >implementable. Whether it is worth it is another matter.) That's certainly a reasonable solution if we want to do that. As you said, I'm not sure it's worth it - but it may be. jst@netscape.com wrote: >Randell, I know this wasn't done as a simple search-and-replace change, >but from looking at the diffs it looks more or less as it was one :-) That's certainly true; it looks like I shotgunned the source tree. And, to an extent, I did. :-) Mostly I was just putting breakpoints on voidarray allocations, and seeing where they were coming from and why. (I run FreeBSD, which doesn't have stack traceback code for tracemalloc/etc). In some cases, they were ones that were being hit a LOT. In other cases, they were ones that there was no gain to holding off on allocation. >Anyhow, I have objections to the changes made to: > >nsBaseContentList, nsFormControlList, nsLineLayout, nsGenericDOMDataNode, >nsGenericElement (all of them), nsEventListenerManager (all of them), >nsXULElement. Thanks for a list; I'll try to address those in a second message. >I'd need to see real numbers showing that this is worth it before allowing >these changes in those classes (files), what's the speed/mem useage >tradeoff here? I agree; that's why I want people to run CPU/layout and memory tests with these patches. > Maybe I'm misunderstanding something here, but you're >changing members of classes from nsVoidArray (8 bytes empty, on a 32-bit >system) to nsAutoVoidArray (~40 bytes empty), Correct. > that seems excessive to me, >maybe it makes sense in some cases, but are you sure it's worth it in all >these cases? Also, if you put more than 8 elements in an nsAutoVoidArray >you'll completely waste 32 bytes for the now unused internal buffer. Correct; the space is wasted if it gets more than 8 elements. When doing my tests, I had a breakpoint/debug on an AutoVoidArray being grown so I could keep an eye on that. The total number of nsAutoVoidArrays grown during startup was 280 (which are included in the total number of allocations being ~700); the number of non-Auto's grown is down to 270. The _maximum_ storage wasted if every one of those grown arrays sticks around is 9K - and I'll bet many of them, if not most of them are freed. >- I highly doubt the changes in nsDocument are worth it, we create documents >very seldom (relatively speaking), and the items in the arrays mStyleSheets and >mObservers hardly ever change so I don't see the benefit. Perhaps we don't create that many of them, but I kept seeing them, especially when loading CNN. The upside is that if we don't make many nsDocuments, the memory overhead for an AutoVoidArray is minimal. >- What's the point of changing the type of the pointer member in >nsXMLContentSink.h? Why not just leave that as nsVoidArray and allocate a >nsAutoVoidArray if we want to use one? I already changed that in a cleanup pass yesterday. >- This seems silly, nsXULContentSink.cpp, if you wanto use nsAutoVoidArray's, >then do that, but don't change API's to make it required: > >- nsresult GetTopChildren(nsVoidArray** aChildren); >+ nsresult GetTopChildren(nsAutoVoidArray** aChildren); Ditto; I hadn't gotten around to cleaning up my changes to that before I posted. I was waiting for more feedback before I changed the patch again; I fixed that yesterday. (I missed it in my first cleanup pass.) >- There's only ever one single presshell in a document (for now), so I don't see >the point of making nsXULDocument::mPresShells a nsAutoVoidArray, maybe just >pre-allocate a one-element sized buffer on cunstruction in stead? Oh, and "// is >we're ..." should read "// i*f* we're ...". Hmmm. Then why is it an array at all? There are all sorts of DeleteShell, GetNumberOfShells, GetShellAt, etc methods. If there can be only one (hmmm, why am I thinking of Highlander?), it should be a simple pointer. Thanks for the typo correction. >- Oh, and what's the benefit of this change in nsCSSDeclaration.cpp (other than >the nsVoidArray -> nsAutoVoidArray)? Avoid the one IndexOf() call on an empty >array (which is trivial and very fast)? Hardly worth the added complexity IMO. >+ else { >+ // order IS important for CSS, so remove and add to the end > PRInt32 index = mOrder->IndexOf((void*)aProperty); > if (-1 != index) { > mOrder->RemoveElementAt(index); > } >+ } >+ if (nsnull != mOrder) { > if (eCSSUnit_Null != aValue.GetUnit()) { > mOrder->AppendElement((void*)(PRInt32)aProperty); > } Actually that's something I found when I was first starting to look at nsVoidArray (it wasn't clear to me if order was important - I thought it was, but I wasn't sure). When I was adding the comment, I optimized the code. IndexOf isn't expensive, but for items with dozens of items (which CSS arrays occasionally have) it does have some cost. I think adding the comment is important. The rest of the change is minor but will be faster than the original. >- Fix the whitespace in nsCSSStyleSheet.cpp: > >- nsVoidArray mSheets; >+ nsAutoVoidArray mSheets; >- Fix the whitespace in nsRDFContentSink.cpp: > >- nsVoidArray mNameSpaceScopes; >+ nsAutoVoidArray mNameSpaceScopes; No problem. >- Hyatt really needs to have a look at the changes to the stylesystem before >this is checked in. Absolutely; that's why I added him to the CC list: :Adding hyatt because this impacts on his CSS optimizations possibly.: The CSS stuff was something I was quite worried about touching, but I think my changes are right (if anything, perhaps too conservative). >I think you're doing a good thing here, but we need to study the impact of >these changes more before checking in all these changes. Again, I think >it'd be better to split this bug into a few separate bugs for changes to >separate areas of the code. We could do that; however that will require a lot more r='s & sr='s for things that are (mostly) the same, and may end up individually deferred. I'm focused on getting some real-world measurable gains out of this, and as a springboard to future optimizations. I've opened a tracking bug for voidarray -> autovoidarray issues; I plan to file bugs/patches for other areas of the system after this patch is committed. We can split off controversial pieces of this patch into other bugs and tie them to that one (bug 92256). I'd like to get as much of this as possible committed together, however, to start. jst: thanks again for the feedback. It's VERY useful, even where I disagree with you about the tradeoffs.
> That's certainly a reasonable solution if we want to do that. As >you said, I'm not sure it's worth it - but it may be. I had a quick look at the implementation of FrameArena in nsPresShell.cpp. Very neat. It seemed to me that there may actually be some merit in trying arena(s). After all that's what layout and the Style System have been using with some satisfaction.
I looked at presshell, and (again) at arenas. For VoidArrays, I'm afraid arenas don't make a lot of sense. The problem is that you have lots of different uses mixed together time-wise, with different lifetimes. Arenas generally don't like that; they like groups of allocations that grow and then all go away around the same time. They really don't like holes. Arenas plus recyclers (ala presshell) are somewhat better, but still are problematic when you mix dramatically different lifetimes. BTW, arenas are dreadfully documented, and the code (while efficient) is even worse in terms of readability. It desperately needs comments (yet another task...). Perhaps there's an HTML document about them somewhere? (I've been through this "figure out arenas" at least once before.) Recyclers on their own, however, or some other type of mImpl structure caches backed by the main heap are quite promising, and I'll look into them.
Okay with a recycler of some sort as this might be nicer for the long run and offer a potential alternative to sprinkling autoVoidArrays which wouldn't be a good precedent.
- Why not let free() be the recycler? There is no initialization overhead for a void array's storage, so what do we gain by maintaining a separate heap for recycled void arrays? - Once again, I'd like to make a plea for some instrumentation here so we could base future implementation decisions on data instead of speculation. :-) - I also agree with jst that we should start with the changes to nsVoidArray and its ilk, and fan out the pre-allocated storage in a separate pass. (Preferrably doing so based on the data we've collected from our instrumentation!)
With a low-overhead memory allocator (such as the BSD beerware allocator, which is also in nsprpub as an option), we would gain little by keeping our own recycler. With a high-overhead allocator (such as Lea, and maybe glibc or Win32) we do gain some. Witness that your measurements of the Lea allocator caused a 5-8% slowdown for pageload versus glibc, and I strongly suspect that the BSD allocator would be at least that much faster than glibc (at the cost of some memory). Still, that's a good point. I've implemented an mImpl recycler; perhaps after all this settles I'll see if someone could run it through and see if it speeds anything up. As for instrumentation, I'm handcuffed by running BSD - tracemalloc/etc can't grab stack traces from anything except linux or win32. I have "tools" (i.e. printfs, sort, and emacs) for capturing the number of voidarrays allocated and their sizes, but to find out from where I've had to use GDB and 'up' until I find the source of the allocation. I do have access to a Linux machine; I'll try seeing if I can get tracemalloc to work there if I can find time. Unfortunately, I do have other things I'm neglecting. Since Chris and jst want it, I'll split the patch. However, PLEASE remember that the nsVoidArray and nsSupportsArray patches alone only have a medium to small effect; the leverage comes from the changes elsewhere to make more use of nsAutoVoidArrays. Should I split out ALL the caller changes; just the ones that change member variables; or just the ones that jst was worried about? Personally, I think the issue isn't which of these should we not do; I think the issue is how many _more_ places in the code should we replace nsVoidArrays? I was only taking ones I saw hit a lot, and were obviously going to have a few entries in them - and my testing of non-startup code was exceedingly simplistic: I loaded CNN. No Mail, no menu, etc. This is just IMO - I want to see improvements that show up in the daily measurements after all this work. ;-) For reference, nsSupportsArray always has an 8-entry auto, and always has.
I made two optimized builds, base build and with http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43712 patch. I measured the startup timing, on win98 200mHz 128MB RAM, and I wasn't able to get a big difference from the two... both came up with ~22.5 sec for base and patched mozilla.exe builds. (did it with a watch, average) I also used taskinfo2000, and memory used by both builds are the same, 18,820kb.
Well, for a first-order check that's good - no major regressions in memory or speed. I didn't expect easy stopwatch-level improvements (given this was ~6-10K out of 300+K allocations), and it appears I was right. The pageload tests and (machine-timed) startup tests may tell more of the story. jrgm? I'm currently trying to get a Linux build of the (unmodified) trunk up (and failing). Also, I've split the patch into two files - one base (nsVoidArray & nsSupportsArray) plus important pieces, another for most of the nsVoidArray->nsAutoVoidArray changes. I'll be uploading them later. Also, it will include a minor fix for the Sun Forte compiler regarding 'inline' (GCC is perfectly happy).
The second patch assumes the first has been applied. Note that I also removed 'inline' from GetArraySize in the .h file to make the Sun compiler happy.
Randell, if you could split out the changes that make members of classes (directly or through allocations) be nsAutoVoidArray's by maybe one bug for mozilla/content, one for mozilla/layout and one for the rest of the member changes I think that'd be a good start.
Blocks: 92573
Blocks: 92575
Blocks: 92576
Layout: bug 92573; content: bug 92575; all others: bug 92576.
Blocks: 92614
Any news if the changes are of measurable benefit? If reducing the malloc proves helpful, then as an alternative to the shower of nsAutoVoidArray inside concrete class, I would think that other things might be worth trying as well so that there could be other options to choose from. To this end, re: recycler, I had this thought that something not that fancy could be tried. Assuming N = defaut number of |mImpl| to be pre-allocated (note: since the focus here is startup time, the number could be picked accordingly. With enough room, the data for the nsVoidArrays that are currently allocated at startup will be picked from there.) A tailor-made recycler could start by |new|ing: char ImplData[k*N] // k = default block size (e.g. 8, emulate nsAutoVoidArray) int ImplOffset[N] // _linked_ list(s) for occupied and non-occupied blocks // ImplOffset[ImplOffset[i]] is the successor in either list // ImplData[k*ImplOffset[i]] is the actual block Then, those |mImpl| that are |new|ed for the first-time could be obtained by just picking an offset in the non-ocuppied list. And the recyler will go away whenever the list of occupied blocks is empty. Not very clear but hope those who are familiar with these types of sparse data structures can see what I mean. Using the above scheme will mimic what the nsAutoVoidArrays are doing, but.. will 1) avoid the shower of nsAutoVoidArrays in the tree, and 2) shield memory tricks and allow all consumers to benefit from the scheme. Of course, if the current migration and testing that rjesup is doing shows that avoiding the malloc isn't helping that much, then there is little motivation at trying other things.
rbs, I think that is the best plan yet. That would reduce the bloat from nsAutoVoidArrays and still make them very cheap to allocate. I've tried something similar with nsTextFragments in layout (patch not uploaded yet). The one problem is what if we allocate say 1 MB of these structures, because of a page that does something that requires lots of arrays and then when the page goes away it deallocates all structures but one per malloced block so that we can't free them. Then we would sit with 1 MB allocated but only a small fraction of that used. I don't think that problem is very big though. There will always be new users of such array structures. (It is much worse with the nsTextFragments, because they are not fixed size) And for it not helping by reducing memory allocations, it does but there are many, many streams to stop and we have to start somewhere. I do think the memory allocations hurt page load time more than startup time though. (Table stress case spends 30% of the time allocating memory)
> (It is much worse with the nsTextFragments, because they are not fixed size) Not sure how you are coding over there. But the idea is that all nsVoidArrays behave like nsAutoVoidArrays the first-time, i.e., the first AppendElement() always starts within a pre-allocated block (if an unocuppied block is still available). If there is no more block or if the array later grows pass the block size, its own separate space is then allocated in the heap. (Those nsVoidArrays with predefined size at construction may perhaps need to be left out of the equation.) The practical difference with nsAutoVoidArray will be therefore that nsAutoVoidArray offers the _guarantee_ that an allocation is not happening the first-time. With the scheme, an allocation may happen or may not happen depending on whether there is still an unocuppied block. (Of course, it is possible to expand the number blocks on the fly, but that introduces other complications. The idea is to keep things simple and wait'n see).
WJat I did with nsTextFragments and what I suggest here is to not keep one big block, but to keep a number of blocks, so say that we have blocks of 500 mImpls. When those 500 are used we allocate a new block, and when one block becomes unused we free it. It will only be if the default size mImpl is too small (or big) that the mImpl is allocated directly from the heap. This will reduce the allocation cost to one ~500th (ignoring our own overhead). If we have one big block we will 1. Use all its space and don't gain anything at all or 2. Have a big allocated block that wastes space, increasing bloat. We can't tune the size to be about right, because that size is dependant on what the user does, how many windows she has open and what pages are loaded. Remember that the big memory crook is not startup, it's page load. I think there already exists structures that do these things in the tree so it wouldn't be very hard to implement. I remember that I've seen the name nsFixedSizeAllocator at least which sound as if it might do the right thing.
Sounds good. There is still the scenario where there could be sticking elements in several blocks at once. But thinking of such things would just discourage from doing anything (anyway, there is the possibility of combining an collapsing blocks if that becomes a serious issue later). Since you already started something on a related problem, wanna pick up the torch here? Handling the inner 500 elements (those mImpl of fixed size=8 for example) inside each block is perhaps where some custom code is needed. When an |mImpl| is 'full', it is re-allocated outside and its previous location inside the block becomes ready for another element. If done properly, the final data structure can be sufficiently general so as to solve your nsTextFragment problem too. BTW, in my previous terminology I used 'block' to refer to the smaller space taken by each inner |mImpl| (i.e., size=8 for example). But now that more details are known, I think the overall picture of the scheme is clear.
The nsTextFragment stuff is completely different, there all blocks are of different sizes, many being 2-10 bytes, but there are also blocks of several KB. I don't think I can guarantee to have time to invest in nsVoidArray. I'm more focused on driving my other ~10 bugs to finish. Then, we'll see...
Hmm, I just had quick look and the two problems look similar to me. nsVoidArray owns |mImpl| while in the other case nsTextFragment owns its inner data which might be handled in a similar way. So it seems to me.
Yes, the problems are similar and yet so completely different. 1. nsVoidArrays can accept a little unused memory, it's part of a dynamic arrays life to be too big. nsTextFragments should use as little memory as possible, because even 10 bytes / block can be hundreds or thousands of KB. 2. Because of 1. we can allocate a reasonable sized block (say room for 8 void*) for a default nsVoidArray thereby allocating lots of blocks of the same size. nsTextFragments have blocks that all are of different sizes. 3. When allocating many blocks of the same size we don't have to worry about fragmentation. When allocating blocks of different sizes, fragmentation becomes a very big problem. So for nsVoidArray we have a solution in nsFixedSizeAllocator (which I found in xpcom/ds http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h ). For nsTextFragments we have the heap which is expert on allocating and freeing blocks of different sizes. To improve things for nsTextFragments we have to take advantage of something else that the heap cannot know. My plan was to use the fact that nsTextFragments have a life time that is the same as the page the text is displayed on, but I still haven't gotten the help I need to complete that work, even though it looks promising. Btw, http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h has a comment at the top describing how to use it in a class. It might be less work then one might believe.
The mismatch of terminology may be making things not very clear. Let's start over. A 'block' is a large _container_ of N=500 (or so) contiguous 'chunks'. Then, the size of each 'chunk' is k='8' (or so) elements. (BTW, why would '500' become '8' in the nsTextFragment case?) The management of the 'blocks' may be done via the nsFixedSizedAllocator. However, the management of the 'chunks' needs some custom code as I described earlier. Each fresh nsVoidArray object is first assigned a 'chunk' (not a 'block'). Then, after 8 nsVoidArray::AppendElement(), the initial 8 entries of its chunk are filled up, it is then _re-allocated outside the block_, and takes on a life of its own away. And the 'chunk' that was occupied becomes available for another nsVoidArray that may come along. Hence the nsVoidArrays can be of different sizes, but if an nsVoidArray ends its life without doing more than 8 AppendElement(), no malloc occurs and that's where the benefit comes up. Isn't that the same problem with nsTextFragment, where the chunks might contain 8 characters instead of 8 |void*|? Where N=500 and k=8 can be parameterized at construction. Anyway, rjesup has done a good job in gahering more attention towards speeding up nsVoidArray than nsTextFragment... So speeding up nsVoidArray can be done independently. But it would make sense if efforts are not unnecessarily duplicated. Also nsFixedSizedAllocator is not making things as easy as you alluded. It will only help to manage the blocks. The handling of the 'chunks' need a special code (although not algorithmically difficult). And further work is needed to put all things together.
I am not sure that nsFixedSizedAllocator is going to be that useful in _this_ context. The benefit comes from the chunks. The blocks themselves can be handled with plain |new| and |delete|. Any idea about the advantages of using nsFixedSizedAllocator in this context?
No, a fixed size chunk for nsTextFragments doesn't win anything. There is no size that could be used for more than a small fraction of the fragments on a page without introducing lots of wasted memory. Ah, well I didn't read the nsFixedSizeAllocator code very much. I just thought it did what the name suggested, which would leave just a small change to the array code to do. And I understand what you want. I think I have understood from your first presentation of the idea. :-)
>No, a fixed size chunk for nsTextFragments doesn't win anything. The difference is therefore in the handling of the auxiliary 'offset' array. Is it what you are doing (or intend to do)? Who knows, you might now win help from the large list of people cc:ed :-)
No, trust me, it isn't close to be as easy as the fixed size allocator. If the sizes varies, you will have to maintain extra information (wastes memory) to be able to free a chunk but it also causes bad fragmentation if the code is not very advanced with memory block concatenation and seperating requests of different size. The heap does just that and it makes allocations quite expensive.
FYI, brattel's mouth watering work is bug 89567!
I'll write a longer response later, but I should note that I already have a recycler mostly written for nsVoidArray. It keeps a specified maximum number of free Impl structures a specific sizes instead of letting them go back to the heap. It does NOT "block" allocate the mImpl's, mostly because you then have the problem of wildly varying lifetimes causing fragmentation of your blocks; you can end up with a lot of mostly-empty blocks after something does a large number of allocations, then frees them. If you kept the number allocated together small, you avoid most of that problem, but you lose much of the gain. Still, 5 or 10 mImpl's per block cuts the time in the malloc allocator by 5-10x, and you probably wouldn't have _too_ many partly-filled blocks. Note that my recycler helps deal with allocation "froth"; it doesn't help too much with the tidal-waves from, say, loading a CSS page - you end up allocated a lot while loading it, and freeing a lot when the page goes away. Still, you'd end up allocating and freeing a lot less than now. Also, more-expensive allocators would make the advantage of recyclers larger.
Daniel's worry about 1 autoarray in use per large allocated block is real - arrays tend to have wildly divergent lifetimes. Solutions include: 1) We _can_ compact the memory, since it all goes through the mImpl pointer. HOWEVER, this would mean locks around all mImpl access, which is a Bad Thing. 2) keep the number of allocation units in a block-allocation small, say 5 or 10. There would be some memory bloat/loss, but it would be much smaller, and allocation overhead would drop by ~5x (minus our overhead). 3) Use a recycler (as I've mostly coded) to deal with the froth of short-term uses. Things that grab and hold a lot of arrays, then release them, will still cause allocator traffic (though less than now, depending on how high you set the max in-recycle-bin sizes). Unfortunately, pageload is one such case. Upside is limited (known) memory bloat.
Let me take a stab at implementing the scheme and see how it goes. Time for some experimentation -- there is still some time before the next milestone m9.4. My gut feeling is that those nVoidArrays that are created at the same time are likely going to have the same lifespan. So fragmentation might actually be rare (or negligible). Also, the maximum number of blocks at any one time can be kept bounded, and it seems to me that a one-block setup may subsume what you are doing, and may have less overhead due to the custom way of handling the auxiliary 'offset' array (though I am not sure how your implementation is. I couldn't get bratell to detail his mystery implementation :-) With the configurable parameters (number of blocks, number of chunks per block, maximum number of blocks), perhaps there could be some win in the scheme.
You'll need to make sure your recycler is threadsafe, which will likely negate any performance benefits you'd otherwise see. Remember, the PresShell arena ``works'' because: 1. All objects are known to be created, accessed, and destroyed on one thread only. This implies that users pay no penalty for locking. 2. All objects have very similar lifetimes. This implies that the allocator's implementation can be extremely simple (e.g., ``malloc()'' increments a single pointer, and ``free()'' is a no-op). You cannot make either simplification for nsVoidArrays. If you _can_ make this simplification for a certain class of nsVoidArrays (say, those used by the content model code), then you should do like stdc++ does and templatize across an allocator. We should make the global memory allocator better, rather than tinker with one-off recycler hacks IMO.
Wise words. I could possibly templatize but... you just reminded me that all this sounds like too much work for little gain. The thing is that if the savings from these mallocs are not really significant (that's the evidence so far, sadly?) then there is little incentive at trying complicated things. So any news if there is any measurable benefit by avoiding these mallocs in nsVoidArray? If not, why bother.
I agree with Chris. I think that we'll do better to commit the straightforward changes I have in this patch and the 3 subisdary bugs/patches than to spend lots more time implementing recyclers (and testing them, etc). I think VSS/RSS memory tests will show little to no extra memory use due to my patches. (I had a recycler written (with locking), and I might want to test it after we get the first, straightforward patches in. But lets get the simpler, safer patches in first.) IMHO BTW, chris, I'm still working on getting some stats incorporated into it - maybe tonight.
Any more comments on the code in the base patch (http://bugzilla.mozilla.org/showattachment.cgi?attach_id=43753)? I'd like to get this moving towards checkin. We _should_ be getting some perf numbers in the next day or two. (I so have some stat-gathering code I'm adding, but I'm not changing the main code in any way other than adding a macro for calculating the size of an Impl structure.)
Ok, I have stats on nsVoidArrays with (all) my patches. This is the distribution of max number of array entries for a startup to blank and shutdown. This includes nsVoidArray and nsAutoVoidArrays. 0: 3242 1: 4646 2: 1264 3: 994 4: 871 5: 573 6: 558 7: 419 8: 129 9: 39 10: 17 11: 18 Note: 8 seems like a pretty good cutoff, though you could make an argument for 1 or maybe 2. I'll attach a dump of the stats, and copies of nsVoidArray.cpp and nsVoidArray.h that collect and dump the stats. Some of that data. Note that AutoVoidArrays go from 0 (allocated mImpl size) to 72, so all AutoVoidArrays of 8 elements or less are included here under size 0. Number of VoidArrays includes AutoVoidArrays. Size 0: Number of VoidArrays this size (max): 12358 Number of AutoVoidArrays this size (max): 11956 Number of allocations this size (total): 0 Number of GrowsInPlace this size (total): 0 Size 40: Number of VoidArrays this size (max): 156 Number of AutoVoidArrays this size (max): 0 Number of allocations this size (total): 156 Number of GrowsInPlace this size (total): 0 Size 72: Number of VoidArrays this size (max): 298 Number of AutoVoidArrays this size (max): 276 Number of allocations this size (total): 298 Number of GrowsInPlace this size (total): 0
Attached file startup stats
Note: that patch for getting stats is just nsVoidArray.[h|cpp], and is against the trunk. To apply it, install the base patch, revert those two files, and then install the stats patch. I'll make an updated base patch tomorrow. There are no functional changes, just stat-gathering and such.
FYI, http://lxr.mozilla.org/seamonkey/find?string=nsFixedSizeAllocator which I found as I was trying to have another look at the allocator that bratell mentionned earlier. And suprise... sfraser's version (on the mac) is already doing what I described earlier (but with an inverted terminology, s/block/chunk/ and s/chunk/block/ -- an his 'chunks' are objects linked together in a standard way instead of just being locations handled with an auxiliary 'offset' array).
The FixedSizeAllocator is not appropriate, I think. It implements a recycler combined with block (chunk) allocation of additional items. However, you have the same problem with fragmentation (or would, if it ever released memory back to the heap). Basically, this is close to a worst-case allocator for footprint (it's probably pretty fast, however).
It was a FYI item -- a structural coincidence worth mentioning, but overkill for the little _contiguous_ 'chunks' of size=8 suggested here. It is interesting how he circumvented the explicit handling of the locks.
BTW, the reason there are no "GrowsInPlace" for my results is that FreeBSD's allocator is binned, and thus a realloc() that goes over the current bin-size always allocates a new buffer. "Traditional" allocators will get GrowsInPlace hits.
I added stats to nsSupportsArray. THe number of memory allocations that SupportsArray does with the base patch is reduced from 1672 (in 0.9) to 585 with the patch. This is in startup to a blank page and shutdown. They also get used a lot in pageload. Also note that the distribution of nsSupportsArrays has a very long tail: the largest array has 688 entries in it, and about a dozen had more than 400. Total bytes allocated for arrays by nsSupportsArray: 0.9: 957152 patch: 147264
Seems like the nsSupportArray changes are a win however you count. Very good.
This patch included stats-gathering (see other messages for results). I also (at waterson's suggestion) asserted style in nsVoidArray.cpp because I was largely rewriting it. This patch is against the trunk as of when I post this - I needed to redo the patch because Brendan's changes for Fastload touched nsSupportsArray in a way that bit-rotted my old patch. I believe the subsidary patches attached to 92256 are still valid. Looking for performance/memory results(!!!), and r/sr.
Note: if you're using a debug build, nsAutoLock will do a bunch of extra nsVoidArray usage (which we used to do always before this patch). I added some #if DEBUG's around some checks for possible coding errors.
- The comment in nsVoidArray::RemoveElement() sez: + // XXX should use IndexOf so why doesn't it use IndexOf()? - You're now changing the semantics of nsVoidArray::Compact() to only compact if there are more than 256 elements in the array, I don't agree with that change. I think Compact() should *always* compact the array, like it did before, since the array implementation doesn't know how much memory will be saved by compacting a bunch of arrays, the saving done by compacting a single array might be next to none, but the savings gotten from compacting a huge number of arrays in a system (like the content model, as an example) can be more than enough to make it worth compacting even if the saving per array is only 1 pointer size. The code that calls Compact() can choose not to compact if the number of elements in the array is lower than a certain number that makes sense for that code, but the array implementation should *not* restrict when things are actually compacted. Undo that change. - Looking at the changes to nsSupportsArray makes one wonder why nsSupportArray doesn't inherit nsVoidArray... - Add the new method in nsISupportsArray.idl to the *end* of the interface, the current change is guaranteed to cause binary incompatibilities, if you add it tot the end of the interface there's at least a chance for old callers to find the right method when calling methods through nsISupportsArray. - In nsAutoLock.cpp, change #if DEBUG to #ifdef DEBUG - In nsFontMetricsGTK.cpp: @@ -3136,12 +3143,7 @@ found = 1; } else { - PRInt32 n = aNodes->Count(); ... - } + found = (aNodes->IndexOf(node) != -1); } Bad indentation, and also bad indentation at: @@ -3562,6 +3569,8 @@ /* * count hyphens + * XXX It might be good to try to pre-cache this information instead + * XXX of recalculating it on every font access! */ - Don't check in the changes to xpcjsruntime.cpp until you have an a= from jband and/or dbradley. Style nits: - In nsVoidArray::ExpandTo(PRInt32 aMin), fix the indentation of the code inside the #ifdef to match other similar places (i.e. use 2 spaces, not 4): + if (newImpl) + { +#if DEBUG_VOIDARRAY + ADD_TO_STATS(AllocedOfSize,SIZEOF_IMPL(aMin)); + if (aMin > mMaxSize) - Oh how I hate changes like this: - if (mImpl) { + if (mImpl) + { but at least you're consistent so I'll just haveto live with it :-) With those issues fixed, r=jst Hmm, this doesn't make sense to me (not your fault, it was in the old code too), InsertElementAt() can only insert in the middle of the current array or append immediately past the current array, but it can't insert elements at any position in the array causing the array to grow to the required size needed for the operation, and padding new unset elements with 0. Yet, ReplaceElementAt() can do exactly that, what's the reasoning behind that I wonder? Should we file a new bug on that?
>- The comment in nsVoidArray::RemoveElement() sez: > >+ // XXX should use IndexOf > >so why doesn't it use IndexOf()? No reason. I happened to notice that while doing style changes and noted it for future interest but didn't think it would do much but save a few bytes of code (and clean things up). How about this: PRBool nsVoidArray::RemoveElement(void* aElement) { PRInt32 theIndex = IndexOf(aElement); if (theIndex != -1) return RemoveElementAt(theIndex); return PR_FALSE; } >- You're now changing the semantics of nsVoidArray::Compact() to only >compact if there are more than 256 elements in the array, I don't agree >with that change. I think Compact() should *always* compact the array, >like it did before, since the array implementation doesn't know how much >memory will be saved by compacting a bunch of arrays, the saving done by >compacting a single array might be next to none, but the savings gotten >from compacting a huge number of arrays in a system (like the content >model, as an example) can be more than enough to make it worth compacting >even if the saving per array is only 1 pointer size. The code that calls >Compact() can choose not to compact if the number of elements in the array >is lower than a certain number that makes sense for that code, but the >array implementation should *not* restrict when things are actually >compacted. Undo that change. I understand your objection. However: there is an issue here. Compact() is often called when something may (or may not) get much smaller. (The cases I trapped tended to be relatively small shrinkages.) On many systems (almost all), allocations are rounded up, and sometimes to powers-of-two (in binned allocators such as the BSD allocator). I was trying to find a balance between the overhead of freeing/malloc/copying and space. However, I agree that the tradeoff I picked was pretty random and mostly designed for speed, not memory efficiency (which was why I was looking for bloat testers). Looking at LXR, I see that about the only use of Compact() is for 1) js/src/xpconnect/src/xpcjsruntime.cpp: JS GC code, which I'm already modifying to use ExpandTo(). 2) uriloader/base/nsDocLoader.cpp: observer arrarys - it calls Compact() after every event. Also it uses Clear()/Compact() to empty arrays. 3) a number of content files make it available as bool attr, which may be where what usage we have is coming from. So, should we realloc the memory when it has shrunken by a single entry, as we were? I may throw in some debugs to figure out how often this happens. Note: I made the change to Compact before I did the major rewrite that added ExpandTo(), which is now used in xpcjsruntime.cpp. If you want me to, I will redo Compact() to always realloc regardless of the savings (if >0). >- Looking at the changes to nsSupportsArray makes one wonder why nsSupportArray >doesn't inherit nsVoidArray... It would make quite a bit of sense. However, I didn't think I should go into such extensive and more-likely-to-have-unexpected-results changes. It also would require considerably more testing and reviewing IMHO. Also, see the end of this message for info on some niggling semantic differences between them. >- Add the new method in nsISupportsArray.idl to the *end* of the interface, the >current change is guaranteed to cause binary incompatibilities, if you add it >tot the end of the interface there's at least a chance for old callers to find >the right method when calling methods through nsISupportsArray. I didn't realize we had locked down binary compatibility on this interface; I added it where it would read well. No problem, though. >- In nsAutoLock.cpp, change #if DEBUG to #ifdef DEBUG Ok, makes sense. (For testing I was using #if 0). #if DEBUG's are scattered through the tree, but I do agree with you. In any case, I looked at the file more, and found a vast amount of it (including the change I made) is already under #ifdef DEBUG, so I removed my change. >- In nsFontMetricsGTK.cpp: >Bad indentation, and also bad indentation at: Ok. Emacs helpfully inserted tabs. >- Don't check in the changes to xpcjsruntime.cpp until you have an a= from jband >and/or dbradley. No problem. Email sent. >Style nits: > >- In nsVoidArray::ExpandTo(PRInt32 aMin), fix the indentation of the code inside >the #ifdef to match other similar places (i.e. use 2 spaces, not 4): > >+ if (newImpl) >+ { >+#if DEBUG_VOIDARRAY >+ ADD_TO_STATS(AllocedOfSize,SIZEOF_IMPL(aMin)); >+ if (aMin > mMaxSize) No problem, that was just forgetting to have Emacs re-indent after cut-and-paste. >- Oh how I hate changes like this: > >- if (mImpl) { >+ if (mImpl) >+ { > >but at least you're consistent so I'll just haveto live with it :-) >With those issues fixed, r=jst I wasn't going to do that, but waterson suggested asserting style given the depth of the changes to the file (instead of going back and making all the new code match the old standard). >Hmm, this doesn't make sense to me (not your fault, it was in the old code >too), InsertElementAt() can only insert in the middle of the current array >or append immediately past the current array, but it can't insert elements >at any position in the array causing the array to grow to the required >size needed for the operation, and padding new unset elements with 0. Yet, >ReplaceElementAt() can do exactly that, what's the reasoning behind that I >wonder? Should we file a new bug on that? I have _no_ idea. I noted the same weirdness you did, and at least commented it in the code: // An invalid index causes the insertion to fail // Invalid indexes are ones that add more than one entry to the // array (i.e., they can append). and // Unlike InsertElementAt, ReplaceElementAt can implicitly add more // than just the one element to the array. For added fun, nsSupportsArray doesn't allow ReplaceElementAt to add elements _at all_. Also, nsSupportsArray has AppendElements; nsVoidArray doesn't. Johnny: thanks for such a through review
I think all of jst's and jband's comments are now fully addressed. At this point, I need r= (jst should provide that), sr= (waterson?), and an a= from jband or dbradley. Compact() now always compacts again, as per jst's request. We may eventually want to look at callers of Compact, or revisit the time/space tradeoff within nsVoidArray.
Blocks: 7251
Add this from my e-mail: For the xpcjsruntime.cpp change. Why not just check if (count == 1) rather than !count. This would save numerous uneeded calls to ExpandTo. The only counter for this would be if the array shrunk to some small number and then releasing an object triggerred additional objects to be placed in the array. jband, how likely is the array to shrink below 256 and then grow again due to additional releases?
>For the xpcjsruntime.cpp change. Why not just check if (count == 1) rather than >!count. This would save numerous uneeded calls to ExpandTo. Unless I miss my guess, the array is only size zero if everything in it has been released and the array Compact()ed. When it's releasing things, it does so one at a time, and it rechecks count on each pass. If count == 0, it Compact()s the array and quits the GC. In the code for DeferredRelease(), we only ExpandTo if Count == 0, so this should only happen on the first call of a GC pass. Your comment did make me go check the code; there is one mod that would help: Change this: if (aMin > mImpl->mCount) { char* bytes = (char *) PR_Realloc(mImpl,SIZEOF_IMPL(aMin)); into this: if (aMin > mImpl->mCount && aMin != oldsize) { char* bytes = (char *) PR_Realloc(mImpl,SIZEOF_IMPL(aMin)); That will cut the overhead of ExpandTo() with the same size to almost nothing. I'll revise the patch later today.
I hate Monday mornings. I read the test backwards, sorry Randell. At least something good came of it. So the change is fine with me. So a=dbradley.
1. Instead of ``very evil macros'' (they're not evil AFAICT), how about ``statistics gathering macros''. 2. In |ExpandTo()|, why are we testing mCount at all here? if (aMin > mImpl->mCount) { ... } Shouldn't we be testing against oldsize? Why force the array to be reallocated if the space is already available? I assume it's written this way because we're using |ExpandTo()| to _shrink_ the array as well? If so, wouldn't |SizeTo()| be a better name for this method? 3. In |ExpandTo()|, I'm to wading through nested if/else clauses four levels deep, which is a pet peeve of mine. And there are early returns in here to boot! Can't we re-arrange this logic more sensically? For example, if (aMin <= 0) { // free the array return PR_TRUE; } if (mImpl && IsArrayOwner()) { // We currently own an array impl. Resize it appropriately. if (aMin <= mImpl->mCount) { return PR_FALSE; // wouldn't fit // (grow the array, etc.) return PR_TRUE; } // We don't own an array impl. Allocate one now. return PR_TRUE; Perhaps this logic looked prettier before the statistics gathering code was in place, but it's torturous now! To a lesser degree, operator=() suffers the same way. 4. Why aren't |GetArraySize()| and |IsArrayOwner()| declared inline in the header? 5. You don't declare |nsAutoVoidArray::ExpandTo()| as virtual. Did you mean to do that? (I can never remember if this matters.) 6. nsAutoVoidArray::Clear() is a bit complicated. Can't we unilaterally call nsVoidArray::Clear(), and conditionally ExpandTo(0)? 7. It looks like gSupportsArraySizes is obsolete. Can it just be removed? I've been whacking on this stuff in my tree, so I'll just attach a patch of what I've got currently.
Thanks for the thorough review. >1. Instead of ``very evil macros'' (they're not evil AFAICT), how about >``statistics gathering macros''. No problem. :-) >2. In |ExpandTo()|, why are we testing mCount at all here? > > if (aMin > mImpl->mCount) { ... } > >Shouldn't we be testing against oldsize? Why force the array to be reallocated >if the space is already available? I assume it's written this way because we're >using |ExpandTo()| to _shrink_ the array as well? If so, wouldn't |SizeTo()| be >a better name for this method? Correct, it is sizing. As I mentioned earlier, the name is poor. SizeTo() is fine by me and more descriptive. Done. >3. In |ExpandTo()|, I'm to wading through nested if/else clauses four levels >deep, which is a pet peeve of mine. And there are early returns in here to >boot! Can't we re-arrange this logic more sensically? For example, Sure, I can reorder it. >Perhaps this logic looked prettier before the statistics gathering code was in >place, but it's torturous now! The stats code really made it harder to read. >To a lesser degree, operator=() suffers the same way. I'll look at that. >4. Why aren't |GetArraySize()| and |IsArrayOwner()| declared inline in the header? I'll check on that. I think the reason was that the only callers I cared about for those were internal. Those should get inlined, since the code was written (in the .cpp) as 'inline'. External callers would get a non-inline copy. This does waste some code space however. I did it originally because I like to have the implementation be in the .cpp file; I find implementations split between two files painful. However, this is C++, so it's a pain I should expect. Fixed. >5. You don't declare |nsAutoVoidArray::ExpandTo()| as virtual. Did you mean to >do that? (I can never remember if this matters.) I can't either. I don't _think_ it matters, but I suppose it doesn't hurt, or hurt much. I'll add it. Ditto for nsAutoVoidArray::Compact. >6. nsAutoVoidArray::Clear() is a bit complicated. Can't we unilaterally call >nsVoidArray::Clear(), and conditionally ExpandTo(0)? Sure. The compiler would probably optimize it that way anyways. >7. It looks like gSupportsArraySizes is obsolete. Can it just be removed? > >I've been whacking on this stuff in my tree, so I'll just attach a patch >of what I've got currently. Ok, thanks. I've done some additional cleanup (factored out the Grow code into nsVoidArray::GrowBy and nsSupportsArray::GrowBy), and added (with Pavlov's agreement) the methods InsertElementsAt(aOther, aIndex) and RemoveElementsAt(aIndex,aCount). These may be useful in InsertBefore, etc (especially for doc fragments to speed up mail reply). Also, in SizeTo() I added an early return for aMin == current size for efficiency. I'll post an updated patch shortly after testing all this.
I found a bug in nsSupportsArray while doing a little cleanup. RemoveElement takes a starting point, but ignores it. I'll include the fix for this in the patch.
Updated to include waterson's comments: ExpandTo() called SizeTo(), cleanup of the code for SizeTo, inlines, etc. Added InsertElementsAt/RemoveElementsAt (per discussion with Pav - see n.p.m.performace/.porkjockeys/.xpcom) to both nsVoidArray and nsSupportsArray. nsSupportsArray::AppendElements() uses InsertElementsAt() now. Cleanup: unify code for growing arrays into ::GrowBy(). Cleans up Insert/Replace/etc. (Both nsVoidArray and nsSupportsArray). nsSupportsArray: fixed RemoveElement(); it was ignoring the starting position. Made RemoveElement and RemoveLastElement() use IndexOf*. Added SizeTo() to keep the interface at least somewhat like nsVoidArray's.
How about checking this in now before adding even more changes to this same patch? Please file new bugs on additional large changes to nsVoidArray n' friends in stead of cooking them into this patch...
jst: sorry about that. If you like, I can provide a patch with waterson's comments addressed and no other changes, and do the other stuff in that last patch in a new bug/patch. (I'd originally planned to do that, actually, but when I found the bug in nsSupports and realized that I had to make a bunch of changed for Waterson I just rolled it together.) Tell me what you'd like. I can provide a split patch tomorrow if wanted.
xpcom/ds changes checked in.
Blocks: 94243
bug 94243 will cover the remaining parts of this patch (outside of xpcom/ds), such as the xpcjsruntime.cpp change (a='d by dbradley & jband). The base changes are checked in and apparently stable in the trunk, so I'm closing this bug. This will leave bug 92573, bug 92575, and bug 92576 for the member-variable changes, and bug 94243 for the usage fixes and auto-variable changes (and the js GC fix). Anyone who wants to continue to follow this should add themselves to the CC list (I'm adding a few people myself). Thanks for all the comments, people!
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
does this bug being fixed mean the landing plan entry should closed out? here is what the landing plan sez... * 3. nsVoidArray/nsSupportsArray * 27-31-1: Reduce allocation overheads: bug 90545/etc * 7-31-1: Ready for tests: speed, startup speed, # of allocations, memory use. (Includes 3 dependant patches) * 8-7-1: xpcom/ds patches applied. Next will be things using nsVoid/SupportsArray, then member variable uses
Yes, that should be closed out. What's the URL to adminster those?
one line: http://komodo.mozilla.org/planning/branches.cgi? showname=nsVoidArray/nsSupportsArray
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: