Closed
Bug 67618
Opened 24 years ago
Closed 8 years ago
Reduce number of memory allocations at startup
Categories: Core Graveyard :: Tracking (defect)
Tracking: (Not tracked)
Status: RESOLVED INCOMPLETE
People: Reporter: bratell; Assigned: cathleennscp
References: Blocks 1 open bug
Details: Keywords: meta, perf
Attachments: 3 files
It has been seen that we spend 40% of the CPU time during startup allocating memory, making around 300 000 calls to malloc or new. If we could reduce that number significantly, we might be able to make the program start faster. I've made a list of which functions are responsible for the memory allocations, which I will attach to this bug.
Methods to reduce the number of memory allocations are:
1. Use a local buffer and pass it into the function that would otherwise return a newed pointer.
2. Use the original data instead of cloning it.
3. Use an arena.
4. Avoid the code altogether.
5. Pass objects by reference.
As can be seen from the list, there are no top offenders, so I guess the only option is to make small improvements everywhere, gaining some ms here and some ms there. One thing I noticed, though, is that we initialize more than a thousand RDF resources. Now, I don't know what an RDF resource is, but that seems to be a lot of them during startup. The measurements were done on a Windows 2000 build from 2001-02-02.
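Methods 1 and 5 can be shown with a minimal C++ sketch. The function names (NewResourceName, FormatResourceName, HasPrefix) are hypothetical and do not come from the Mozilla tree. The point is only the calling convention. The first version makes one heap allocation per call, and the caller must free it. The caller-supplied-buffer and const-reference versions allocate nothing themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Before: every call heap-allocates its result, and the caller must delete[] it.
char* NewResourceName(const char* prefix, int id) {
    char* result = new char[64];
    std::snprintf(result, 64, "%s%d", prefix, id);
    return result;
}

// Method 1: the caller passes in a buffer (often a local array), so this
// function allocates nothing. Returns false if the result did not fit.
bool FormatResourceName(const char* prefix, int id, char* buf, std::size_t len) {
    int n = std::snprintf(buf, len, "%s%d", prefix, id);
    return n >= 0 && static_cast<std::size_t>(n) < len;
}

// Method 5: a const reference avoids copying the caller's std::string; taking
// `std::string name` by value would copy it, which can mean a heap allocation.
bool HasPrefix(const std::string& name, const std::string& prefix) {
    return name.compare(0, prefix.size(), prefix) == 0;
}
```

A typical caller would declare `char buf[64];`, call `FormatResourceName("urn:res:", id, buf, sizeof buf)`, and fall back to the heap only if it returns false.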
Comment 1•24 years ago (Reporter)
Comment 2•24 years ago
This is interesting information. How did you collect it?
Comment 3•24 years ago (Reporter)
I started Mozilla with Quantify and then quit immediately. I then looked at the functions new and malloc. Since they can't be blamed for being called, I replaced them with their respective callers. Among those there were also many functions that weren't responsible themselves, like "Clone" functions, strdup, PR_Malloc and so on, and I replaced those with their callers until at least the top 50 contained (mostly) functions where I couldn't put the blame elsewhere, at least not without distorting the picture. I also merged functions that appeared from many directions (nsHashtable::Put comes to mind). So it's not very difficult to collect the data; it was just a lot of manual work. :-) I still have the .qfy file if someone wants to look at it.
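The manual attribution described above could be automated roughly as follows. The steps are: skip allocator wrappers such as malloc, new, strdup and PR_Malloc; charge the first real caller; and merge stacks that end up blaming the same function. This sketch assumes the profiler can export each distinct call stack, leaf first, with its allocation count. StackSample and AttributeAllocations are hypothetical names, and the wrapper set is whatever the analyst decides is "not to blame".

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One distinct call stack, leaf first (frames[0] is the allocator itself),
// and how many allocations were recorded with exactly that stack.
struct StackSample {
    std::vector<std::string> frames;
    long count;
};

// Charge each sample to the first frame that is not an allocation wrapper,
// then merge all samples that blame the same function.
std::map<std::string, long> AttributeAllocations(
        const std::vector<StackSample>& samples,
        const std::set<std::string>& wrappers) {
    std::map<std::string, long> blame;
    for (const StackSample& s : samples) {
        std::string culprit = "(unknown)";  // stack made only of wrappers
        for (const std::string& frame : s.frames) {
            if (wrappers.count(frame) == 0) {
                culprit = frame;
                break;
            }
        }
        blame[culprit] += s.count;
    }
    return blame;
}
```

Unlike per-full-stack counts, the merged result gives the per-routine totals that are asked for later in this bug (for example, how many times each routine called PL_strdup).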
Comment 4•24 years ago
The js_NewScopeProperty malloc craze is bug 62164. Track it from here via a dependency if you like, but please don't dup it. /be
Comment 5•24 years ago
Did you start and stop Mozilla with Quantify running the whole time, or did you start Mozilla, then pause Quantify, then quit? They will generate slightly different timings/results. I believe the second method is what we want, so I just want to verify.
Comment 6•24 years ago (Reporter)
I'm not 100% sure, but I think the run that I extracted the data from includes the shutdown. Quantify just crashed when I tried to pause it before shutting down. Anyway, I would think that the number of memory allocations happening during shutdown is negligible. If not, something is seriously wrong.
Comment 7•24 years ago
Does someone who knows more want to take this bug from me?
Updated•24 years ago
Status: NEW → ASSIGNED
Comment 10•24 years ago
Here's my data on malloc sites: <http://www.mozilla.org/performance/mallocsites.html>
Comment 11•24 years ago
I will see what can be done. There are a lot of mallocs spread all over the code, and it will take time to track down the owners and find out whether they are really needed. I am currently looking for a global solution.
Target Milestone: --- → mozilla1.0
Comment 12•23 years ago
Maybe you could try a faster implementation of malloc. Just an idea.
Comment 13•23 years ago
Changing Platform/OS to All/All from PC/W2k. If I'm wrong, put it back.
OS: Windows 2000 → All
Hardware: PC → All
Comment 14•23 years ago
The fundamental problem is simply the number of objects (allocations) that get used, especially ones used for short periods and then thrown away. Pools and arenas are one way to attack the problem (beyond the obvious but tough solution of writing the code so it doesn't need to allocate zillions of objects). Others might involve making things like arrays (nsVoidArray::InsertElementAt is high on the list) grow their allocations by more than one entry each time. This is already (I think) done for strings. I may take a look at the array code, but this won't solve the problem of algorithms that assume/require a lot of objects.
Comment 15•23 years ago
Sorry, nsVoidArray and nsSupportsArray do grow by more than 1 (nsSupportsArray always grows by 8, and nsVoidArray grows by either 4 or double the current size). However, they're still high on the list of allocations, especially nsVoidArray, so I'm looking at them some more.
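To see why the growth policy matters, here is a small sketch that counts how many buffer (re)allocations are needed to append n elements one at a time. It compares three policies: grow by one, grow by 8 (as described for nsSupportsArray), and start at 4 then double. The doubling rule is an assumed reading of "either 4 or double the current size", not the actual nsVoidArray code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Number of buffer (re)allocations needed to append n elements one at a time,
// given a policy that maps the current capacity to the next capacity.
std::size_t CountGrowths(std::size_t n,
                         const std::function<std::size_t(std::size_t)>& grow) {
    std::size_t capacity = 0;
    std::size_t allocations = 0;
    for (std::size_t size = 0; size < n; ++size) {
        if (size == capacity) {  // buffer is full: grow before inserting
            capacity = grow(capacity);
            ++allocations;
        }
    }
    return allocations;
}

std::size_t GrowByOne(std::size_t cap) { return cap + 1; }
std::size_t GrowByEight(std::size_t cap) { return cap + 8; }
// Assumed policy: first allocation holds 4 elements, then capacity doubles.
std::size_t GrowDoubling(std::size_t cap) { return cap == 0 ? 4 : cap * 2; }
```

For 3600 appends to a single array, this gives 3600, 450 and 11 allocations respectively. No policy helps arrays that only ever hold a few elements, though: any heap-backed array that receives even one element costs at least one allocation.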
Comment 16•23 years ago (Reporter)
Places to look at (from my old collected data) for nsVoidArray stuff could be:
- CSSDeclarationImpl::AppendValue (creates ~900 arrays, does 3600 InsertElementAt)
- nsGenericHTMLContainerElement::AppendChildTo (does 3500 AppendElement)
- nsXULElement::AppendChildTo (does 1800 InsertElementAt)
- nsGenericContainerElement::SetAttribute (does 1500 InsertElementAt)
Another point: maybe the nsDeque that nsHTMLTokenizer uses should be bigger initially. It seems to grow several times during startup.
Comment 17•23 years ago
nsVoidArray patch in bug 90545. I'll look at some of the stats, thanks.
Depends on: 90545
Comment 18•23 years ago
I found a lot of the reason for all the nsVoidArray::InsertElementAt allocations is the CSS code. It uses lots of nsVoidArrays which are then immediately appended to. I'm going to look at either (a) switching some of them to nsAutoVoidArrays, and/or (b) adding a constructor that takes a single array element to be inserted at construction time. (a) probably makes more sense.
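The idea behind switching to nsAutoVoidArray can be sketched like this: reserve a few element slots inside the array object itself, so that small, short-lived arrays never touch the heap. AutoPtrArray is a hypothetical class written for illustration. It is not the real nsAutoVoidArray, whose inline size and growth rules may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Array of pointers with N inline slots; the heap is used only past N elements.
template <std::size_t N>
class AutoPtrArray {
    static_assert(N > 0, "need at least one inline slot");
public:
    AutoPtrArray() : data_(inline_), size_(0), capacity_(N) {}
    ~AutoPtrArray() { if (data_ != inline_) std::free(data_); }
    AutoPtrArray(const AutoPtrArray&) = delete;
    AutoPtrArray& operator=(const AutoPtrArray&) = delete;

    // Appends p; returns false only if a needed heap allocation fails.
    bool Append(void* p) {
        if (size_ == capacity_) {
            std::size_t newCapacity = capacity_ * 2;
            void** heap = static_cast<void**>(std::malloc(newCapacity * sizeof(void*)));
            if (!heap) return false;
            std::memcpy(heap, data_, size_ * sizeof(void*));
            if (data_ != inline_) std::free(data_);  // never free the inline slots
            data_ = heap;
            capacity_ = newCapacity;
            ++heapAllocations_;
        }
        data_[size_++] = p;
        return true;
    }

    void* ElementAt(std::size_t i) const { return i < size_ ? data_[i] : nullptr; }
    std::size_t Count() const { return size_; }
    std::size_t HeapAllocations() const { return heapAllocations_; }

private:
    void* inline_[N];  // storage used until the array outgrows N elements
    void** data_;
    std::size_t size_;
    std::size_t capacity_;
    std::size_t heapAllocations_ = 0;
};
```

With N = 8, appending 4 elements makes no heap allocation. Appending 20 makes two: one growing to 16 slots and one growing to 32.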
Comment 19•23 years ago
I have some major reductions in nsVoidArray allocations (largely by judicious use of nsAutoVoidArrays, and by tweaking the grow routine in nsVoidArray). Currently I'm down to ~1900 allocations/reallocations of arrays, down from close to 15000 when I started. See bug 90545 for ongoing analysis and patches. I could use some help (once I post my next patch to bug 90545) doing profiles and startup measurements, and making sure I didn't hurt footprint too much (I'm not really set up to do them here).
Comment 20•23 years ago
jrgm's our profiling guru I believe.
Comment 21•23 years ago
Simon Fraser wrote:
>
> Here's my data on malloc sites:
>
> <http://www.mozilla.org/performance/mallocsites.html>

Ummm... Simon, I looked at your data, and I'm confused. What _exactly_ was it that you put in that file?
Comment 22•23 years ago
sfraser is away right now, so you may want to ping him with email, which he'll be more likely to see when he gets back (overflowing inbox and all).
Comment 23•23 years ago
Carolyn is supposed to get new Quantify and trace-malloc data in the next day or so. I advise emailing her if you have advice on how to generate some useful reports like what's attached here.
Comment 24•23 years ago
cathleen
Comment 25•23 years ago
Cathleen. Duh. (slaps forehead). Sorry. Late night (I work with a Carolyn who does similar things.) One thing to note about the (old) stats that Daniel produced: strings and js are by far the worst at startup. Not that this is surprising... Perhaps someone should look at JS allocation habits as well.
Comment 26•23 years ago
randell: will the JS numbers still be of interest if most of the Mozilla JS code is precompiled? That's bug 68045... There is also bug 62164 for JS mallocs.
Comment 27•23 years ago
JS may still be allocating a lot of memory even if precompiled. We'll see.
Comment 28•23 years ago (Assignee)
brendan needs to fix JS malloc bug 62164, after his fast load work. :-)
Comment 29•23 years ago
rjesup: see bug 62164 and bug 72748. We use JS, it uses memory, so far so good. The problem in the engine (62164) will help somewhat, but the bigger gain may come from the FastLoad work (currently torching a tree near you). shaver's perl script discloses ~9500 function objects, many of which I bet are not called at startup, or typically any time soon after. Via nsFastLoadPtr and similar such techniques, I hope we can lazily deserialize these from the FastLoad file. /be
Comment 30•23 years ago
Any luck getting more recent data than Feb? Cathleen, John?
Comment 31•23 years ago
From a warm-start jprof in bug 7251: malloc/free/new time is ~16-18% of startup on a fast Linux box.
Comment 32•23 years ago
http://www.mozilla.org/performance/mallocsites.html shows the number of times malloc() was called with a particular call stack. Simple.
Comment 33•23 years ago
I think that to get traction on a bug like this, there needs to be data that says there were exactly N allocations from double-click to complete startup. That data is needed for at least each milestone, so one can see that progress is being made. Daily data might be excessive, but it would at least provide an early warning of regressions in this area. rjesup has already checked in several patches that should decrease the number of allocations. It would be really helpful to be able to quantify the progress being made, so that we know exactly which direction to take next.
Comment 34•23 years ago
Simon: is that updated? If not (and I'm pretty sure it's not), when was it run? I was really hoping for some recent memory allocation traces. Note: your data is useful, though it would also be nice to know the number of times for a particular routine, as opposed to a match on the entire call stack. For example, I'm more interested in how many times each routine called PL_strdup, not the number of PL_strdups. It is possible to extract that, but not easy. That was why Daniel's info was handy. We could almost use a jprof-like output: full call stack matches, and jprof-like call trees. In any case, I'd be happy for _any_ recent or useful data (and your data is useful).
Comment 35•23 years ago
reassigning kandrot perf bugs.
Assignee: kandrot → cathleen
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---
Comment 36•23 years ago
Any chance we can get a list of startup allocations from a modern build to attack? A bunch of allocation-reductions have hit the tree recently (I plan to land the last of the nsVoidArray -> nsAutoVoidArray patches in the morning).
Comment 37•23 years ago (Assignee)
Discussed a bit in the performance meeting today. We plan to tackle the following areas for reducing the number of memory allocs:
- JS malloc happy (brendan)
- change the xpcom registry format (shaver?)
- CSS parser (dbaron/perrier?)
- XML parser (harishd)
- string sharing, utf8 string type (jag/scc/jband)
- unify fastload (dp/brendan)
Component: Browser-General → Tracking
Comment 38•23 years ago (Reporter)
On popular demand I did a quick Quantify run to confirm the old numbers:

          calls     CPU time
malloc:   142191    676.9 ms
new:      108792    310.5 ms
free:      86710    120.0 ms
delete:    58963     54.7 ms
------------------------------
memory:   396656   1162.1 ms

I won't have time to break this into guilty callers, but it looks as if the number of allocations has been reduced by ~20% since February. The big part of the startup time still seems to be non-CPU time, like locking and I/O wait. Note that this is on a profile with very few bookmarks. Bookmarks cause a huge number of memory operations, both in RDF and in the bookmarks service, as revealed by Simon's allocation lists. I have a bug about that which I fixed a long time ago, but I have not been able to get it reviewed. (Tests done on a Duron 800MHz.) (And please, Randell, spell my name correctly. :-) )
Comment 40•23 years ago
New data on malloc sites at startup will soon show at: <http://www.mozilla.org/performance/mallocsites20011031.html>
Comment 41•23 years ago
sfraser: The xpti stuff at the top would indicate that this is probably a first run (or that we are not properly doing autoregistration and saving xpti.dat on Mac?). Assuming that this is in fact a first run, doesn't that skew the results compared to a typical run? First run is usually part of the install process and not indicative of normal usage.
Comment 42•23 years ago
Notable things from the data:
* The top malloc site is use of nsXPIDLCString in nsComponentManagerImpl::PlatformPrePopulateRegistry (2094 mallocs)
* Next site is nsGenericContainerElement::SetAttr on nsXMLElements (2088)
* Next is string foo called from NS_NewAtom (2033)
* Loading xpt files accounts for the next two sites (two lots of 1884 allocs)
* nsCSSSelector::nsCSSSelector is next (1663)
etc.
nsComponentManagerImpl::PlatformPrePopulateRegistry accounts for over 2094 + 1420 + 746 + 674 + 674 + 674 + 2 + 1 = 6285 allocations altogether.
Comment 43•23 years ago
jband: this was not a first run. File read tracing also showed that all the XPT files were getting read at startup each time too, so there seems to be something wrong there. Maybe it's because I have aliases to XPT files in Components, rather than the actual files? In theory that should still work.
Comment 44•23 years ago
For some of the fixed-size high-activity allocations, it may make sense to pool them, and quite possibly bunch-allocate them from system malloc. I'm thinking explicitly of things like Atoms (maybe; they have strings associated) and Rules. Quite possibly others. Or find ways to include a bunch of them as static data instead of allocating at startup.
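A fixed-size pool of the kind suggested here can be sketched as a free list carved out of large chunks obtained from malloc, so that many small allocations become one malloc per chunk. FixedSizePool is hypothetical. This sketch makes three assumptions of its own: the chunk size, rounding each slot up to alignof(std::max_align_t), and never returning chunks to the system until the pool is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: slots of one size are carved out of chunks
// bunch-allocated from malloc; freed slots go on a free list and are reused.
class FixedSizePool {
public:
    FixedSizePool(std::size_t objectSize, std::size_t slotsPerChunk)
        : slotSize_(RoundUp(objectSize)), slotsPerChunk_(slotsPerChunk) {
        assert(slotsPerChunk > 0);
    }
    ~FixedSizePool() {
        for (void* chunk : chunks_) std::free(chunk);  // releases every slot at once
    }
    FixedSizePool(const FixedSizePool&) = delete;
    FixedSizePool& operator=(const FixedSizePool&) = delete;

    // Returns storage for one object, or nullptr if malloc fails.
    void* Alloc() {
        if (!freeList_ && !Refill()) return nullptr;
        FreeNode* node = freeList_;
        freeList_ = node->next;
        return node;
    }
    // Puts a slot back on the free list; p must have come from Alloc().
    void Free(void* p) { freeList_ = new (p) FreeNode{freeList_}; }
    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    struct FreeNode { FreeNode* next; };

    // A slot must be able to hold a FreeNode and keep every slot aligned.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(FreeNode)) n = sizeof(FreeNode);
        return (n + align - 1) / align * align;
    }
    // One malloc provides slotsPerChunk_ slots, all threaded onto the free list.
    bool Refill() {
        char* chunk = static_cast<char*>(std::malloc(slotSize_ * slotsPerChunk_));
        if (!chunk) return false;
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < slotsPerChunk_; ++i) Free(chunk + i * slotSize_);
        return true;
    }

    std::size_t slotSize_;
    std::size_t slotsPerChunk_;
    FreeNode* freeList_ = nullptr;
    std::vector<void*> chunks_;
};
```

Callers would construct objects in the returned storage with placement new and destroy them explicitly before calling Free().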
Comment 45•23 years ago
rjesup: atoms coalesce each atom's allocation with its string. sfraser: I see nsXULPrototypeScript::Serialize in two stacks, showing that you didn't have a valid FastLoad file. If the normal startup has one (and it should), we should measure that. If the file got invalidated for some bogus reason (see bug 106021, now fixed), I may need a new bug. /be
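Coalescing an atom's allocation with its string means allocating the object header and its characters in one block, so creating an atom costs one malloc instead of two. The struct below is a hypothetical illustration of that layout, not Mozilla's actual atom implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical atom: the characters are stored directly after the header,
// in the same malloc'd block.
struct Atom {
    std::size_t length;

    const char* String() const { return reinterpret_cast<const char*>(this + 1); }

    // One allocation for header + characters + terminating NUL.
    static Atom* Create(const char* s) {
        std::size_t len = std::strlen(s);
        void* block = std::malloc(sizeof(Atom) + len + 1);
        if (!block) return nullptr;
        Atom* atom = new (block) Atom{len};
        std::memcpy(reinterpret_cast<char*>(atom + 1), s, len + 1);
        return atom;
    }

    // One free releases both the header and the string.
    static void Destroy(Atom* atom) {
        atom->~Atom();
        std::free(atom);
    }
};
```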
Comment 46•23 years ago
sfraser: I'll refresh my Mac build and poke around a bit at xpti.
Comment 47•23 years ago
jband: how should xpt file loading behave? I have a 68K xpti.dat file. Debugging shows that it opens each .xpt file as necessary to get interface info.
Comment 48•23 years ago
sfraser: what it should do is attempt to read xpti.dat (this is human readable if you care). If it finds reasonable stuff there, then it will only load .xpt files when it needs to resolve a particular interface info stored in that file. At that point it will also pull in any other interface info from that .xpt file. If reading xpti.dat fails in the first place (including when the appdir or components dir path has changed), then it will do a full autoregister. This entails reading each and every .xpt file and then writing a new xpti.dat for use in the next run. If for some reason xpti is not happy with what it finds in xpti.dat on each run, then it is going to read each and every .xpt file on each run. We want to ensure that this is not happening. I'm curious whether the persistent file path stuff (nsILocalFile::GetPersistentDescriptor) is perhaps unhappy with your aliases or something. Something like that might also (perhaps) cause fastload problems. If you want to debug, then fine. Otherwise I'll poke at it tonight or tomorrow.
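The policy described above can be modelled as a toy. If the cached manifest is usable, trust it and open each .xpt file lazily on first lookup; otherwise autoregister by reading every file. XptLoader and its members are hypothetical. A std::map stands in for parsing .xpt files, and writing the new xpti.dat to disk is left out. The model makes the failure mode easy to see: if the cache is rejected on every run, every run pays for reading all the files.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model of the startup policy; none of these names are xpti's real API.
class XptLoader {
public:
    // fileContents stands in for parsing: .xpt file name -> interfaces it defines.
    explicit XptLoader(std::map<std::string, std::vector<std::string>> fileContents)
        : files_(std::move(fileContents)) {}

    // Usable cache: read no .xpt files now. No cache (nullptr): autoregister by
    // reading every file, rebuilding the manifest that would be saved for next run.
    void Startup(const std::map<std::string, std::string>* cachedManifest) {
        if (cachedManifest) {
            manifest_ = *cachedManifest;
            return;
        }
        manifest_.clear();
        for (const auto& entry : files_) Read(entry.first);
    }

    // Opens an interface's .xpt file only the first time it is needed; doing so
    // also makes every other interface in that file available.
    bool Resolve(const std::string& iface) {
        auto it = manifest_.find(iface);
        if (it == manifest_.end()) return false;
        Read(it->second);
        return true;
    }

    const std::map<std::string, std::string>& Manifest() const { return manifest_; }
    std::size_t FilesRead() const { return read_.size(); }

private:
    void Read(const std::string& file) {
        if (!read_.insert(file).second) return;  // already parsed this file
        for (const auto& iface : files_[file]) manifest_[iface] = file;
    }

    std::map<std::string, std::vector<std::string>> files_;
    std::map<std::string, std::string> manifest_;  // interface -> .xpt file
    std::set<std::string> read_;                   // .xpt files opened so far
};
```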
Comment 49•23 years ago
The data in mallocsites20011031.html were collected for a run after a previous run that crashed. I suspect that's why both xpt loading and fastload were confused. New data have been pushed to mozilla.org, at the same URL, for a good run.
Comment 50•23 years ago
New data up. And brendan is the winner! Top allocations are now: * 3129 + 2817 JS_Mallocs called from nsXULPrototypeScript::Deserialize().
Comment 51•23 years ago
That's more like it. :-) We have a lot of chrome JS functions; that's not by itself bad, nor is it my bug to fix by somehow reducing malloc calls. We need to whip out the JS profiler (http://www.mozilla.org/performance/jsprofiler.html) and see how many are called during startup -- if it's significantly fewer than the total, I'll start using nsFastLoadPtrs (sort of; JS engine hooks required) to defer deserialization till they're about to be called. /be
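Deferring deserialization until a function is about to be called can be sketched with a small wrapper that holds a loader and runs it on first access. LazyPtr is a hypothetical class in the spirit of nsFastLoadPtr, not its real interface. In the real case, the loader would read from the FastLoad file, and (as noted above) JS engine hooks would be needed to trigger it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Holds only a loader until the value is first needed, then caches the result.
template <typename T>
class LazyPtr {
public:
    explicit LazyPtr(std::function<std::unique_ptr<T>()> loader)
        : loader_(std::move(loader)) {}

    // Runs the loader on first use; returns nullptr if loading failed
    // (the loader is kept so a later call can retry).
    T* Get() {
        if (!value_ && loader_) {
            value_ = loader_();
            if (value_) loader_ = nullptr;  // drop whatever the loader captured
        }
        return value_.get();
    }

    bool IsLoaded() const { return value_ != nullptr; }

private:
    std::function<std::unique_ptr<T>()> loader_;
    std::unique_ptr<T> value_;
};
```

Comment 56 reports that 376 of the 456 compiled functions are never called during startup. With a wrapper like this, most of their loaders would simply never run.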
Comment 52•23 years ago
I'm relieved to hear that I don't have to go figure out why xpti autoreg was not working right on Mac.
Comment 53•23 years ago
I'm extracting the CSS data from this page, if someone wants to do the same for the other modules. At first sight, the main culprits seem to be Parser, XUL, CSS, Prefs and i18n.
Comment 54•23 years ago
Nice data, sfraser. I will see if I can reduce the PrePopulateRegistry() allocation count.
Comment 55•23 years ago
The style system is responsible for approximately 24300 allocations, or 6.6% of the total number of allocations at startup:
Selectors      10500
Declarations    3800
Rules           7000
Other           3000
See bug 106356 for more details.
Comment 56•23 years ago
JS profiler says that starting up through profile manager to first browser window (it loads the default mozilla milestone home page) compiles 376 uncalled JS functions consuming at least 108192 bytes of net-request-total heap space, while 80 compiled functions are called at least once and take at least 20350 bytes. Spinning off dependency bug 107907. /be
Depends on: 107907
Comment 57•23 years ago
Sites that do over 100 mallocs in 2001-10-31 build
Comment 58•23 years ago
Sites that do over 100 mallocs in 2001-11-29 build
Updated•23 years ago
Keywords: mozilla1.0+
Comment 59•8 years ago
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE. If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•8 years ago
Product: Core → Core Graveyard