Closed Bug 67618 Opened 24 years ago Closed 8 years ago

Reduce number of memory allocations at startup

Categories

(Core Graveyard :: Tracking, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: bratell, Assigned: cathleennscp)

References

(Blocks 1 open bug)

Details

(Keywords: meta, perf)

Attachments

(3 files)

It has been seen that we spend 40% of the CPU time during startup allocating memory, making around 300,000 calls to malloc or new. If we could reduce that number significantly we might be able to make the program start more quickly. I've made a list of which functions are responsible for the memory allocations, which I will attach to this bug. Methods to reduce the number of memory allocations are:
1. Use a local buffer and pass it into the function that would otherwise return a newed pointer.
2. Use the original data instead of cloning it.
3. Use an arena.
4. Avoid the code altogether.
5. Pass objects by reference.
As can be seen from the list, there are no top offenders, so I guess it's just a matter of making small improvements everywhere, gaining some ms here and some ms there. One thing I noticed, though, is that we initialize more than a thousand RDF resources. Now, I don't know what an RDF resource is, but that seems to be a lot of them during startup. The measurements were done on a Windows 2000 build from 2001-02-02.
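Method 1 from the list can be sketched as follows. This is a hedged illustration with hypothetical function names (not actual Mozilla code): the first style heap-allocates a buffer on every call, while the second writes into a caller-supplied (typically stack) buffer and allocates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Allocation-heavy style: every call news a fresh buffer that the
// caller must later delete[]. (Hypothetical example function.)
char* GetNameAlloc() {
    char* buf = new char[16];
    std::strcpy(buf, "mozilla");
    return buf;
}

// Method 1: the caller passes in a local buffer, so no heap
// allocation happens at all. Returns false if the buffer is too small.
bool GetNameInto(char* buf, std::size_t bufLen) {
    const char name[] = "mozilla";
    if (bufLen < sizeof(name))
        return false;
    std::memcpy(buf, name, sizeof(name));
    return true;
}
```

The second form trades a fixed-size stack buffer (and a failure path for oversized data) for one fewer malloc per call, which adds up when the call count is in the hundreds of thousands.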
Blocks: 7251
Keywords: perf
This is interesting information. How did you collect it?
I started Mozilla with Quantify and then quit immediately. I then started to look at the functions new and malloc. Since they themselves couldn't be held responsible for being called, I replaced them with their respective callers. Among those there were also many functions not responsible themselves, like "Clone" functions, strdup, PR_Malloc and so on, and I replaced those with their callers until at least the top 50 contained (mostly) functions where I couldn't put the blame elsewhere. At least not without distorting the picture. I also merged functions that appeared from many directions (nsHashtable::Put comes to mind). So it's not very difficult to collect the data. It was just a lot of manual work. :-) I still have the qfy file if someone wants to look at it.
The js_NewScopeProperty malloc craze is bug 62164. Track it from here via a dependency if you like, but please don't dup it. /be
Did you start and stop Mozilla with Quantify running the whole time, or did you start Mozilla, then pause Quantify, then quit? They will generate slightly different timings/results. I believe that the second method is what we want, so I just want to verify.
I'm not 100% sure, but I think the run that I extracted the data from includes the shutdown. Quantify just crashed when I tried to pause it before shutting down. Anyway, I would think that the number of memory allocations happening during shutdown is negligible. If not, something is seriously wrong.
someone who knows more want to take this bug from me?
since waterson recognized it, *poof*
Assignee: asa → waterson
kandrot has been studin' this.
Assignee: waterson → kandrot
Status: NEW → ASSIGNED
I will see what can be done. There are a lot of mallocs spread all over the code, and it will take time to track down each owner and find out if they are really needed. I am looking for a global solution currently.
Target Milestone: --- → mozilla1.0
Blocks: 71874
No longer blocks: 7251
Maybe you could try a faster implementation of malloc. Just an idea.
Platform/OS All/All from PC/W2k. If I'm wrong, put it back.
OS: Windows 2000 → All
Hardware: PC → All
The fundamental problem is simply the number of objects (allocations) that get used, especially for short periods and then thrown away. Pools and arenas are one way to attack the problem (beyond the obvious but tough solution of writing the code to not need to allocate zillions of objects). Others might involve making things like arrays (nsVoidArray::InsertElementAt is high on the list) increase allocations by more than one entry each time. This is already (I think) done for strings. I may take a look at the Array code, but this won't solve the problem of algorithms that assume/require a lot of objects.
Sorry, nsVoidArray and nsSupportsArray do increase by more than 1 (nsSupports increases by 8 always, and nsVoidArray increases by either 4 or double the current size). However, they're still high on the list of allocations, especially void, so I'm looking at them some more.
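The growth policy being discussed can be sketched as follows. This is a minimal toy, not the real nsVoidArray code: by doubling the capacity when full, N appends cost O(log N) reallocations instead of one per element.

```cpp
#include <cstdlib>

// Minimal dynamic array that doubles its capacity when it fills up.
// 'reallocs' counts trips to the allocator, to show the effect of
// geometric growth. (Sketch only; real code must check realloc failure.)
struct GrowArray {
    void** data = nullptr;
    int count = 0;
    int capacity = 0;
    int reallocs = 0;

    void Append(void* e) {
        if (count == capacity) {
            capacity = capacity ? capacity * 2 : 4;  // 4, 8, 16, 32, ...
            data = static_cast<void**>(
                std::realloc(data, capacity * sizeof(void*)));
            ++reallocs;
        }
        data[count++] = e;
    }
    ~GrowArray() { std::free(data); }
};
```

With this policy, 1000 appends touch the allocator only 9 times (capacities 4 through 1024), at the cost of up to 2x transient over-allocation.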
Places to look at (from my old collected data) for nsVoidArray stuff could be:
- CSSDeclarationImpl::AppendValue (creates ~900 arrays, does 3600 InsertElementAt)
- nsGenericHTMLContainerElement::AppendChildTo (does 3500 AppendElement)
- nsXULElement::AppendChildTo (does 1800 InsertElementAt)
- nsGenericContainerElement::SetAttribute (does 1500 InsertElementAt)
Another point: maybe the nsDeque the nsHTMLTokenizer uses should be bigger initially. It seems to grow several times during the startup.
nsVoidArray patch in bug 90545. I'll look at some of the stats, thanks.
Depends on: 90545
I found a lot of the reason for all the nsVoidArray::InsertElementAt allocations is the CSS code. It uses lots of nsVoidArrays which are then immediately appended to. I'm going to look at either (a) switching some of them to nsAutoVoidArrays, and/or (b) adding a constructor that takes a single array element to be inserted at construction time. (a) probably makes more sense.
I have some major reductions in nsVoidArray allocations (largely by judicious use of nsAutoVoidArrays, and by tweaking the grow routine in nsVoidArray). Currently I'm down to ~1900 allocations/reallocations of arrays, down from close to 15000 when I started. See bug 90545 for ongoing analysis and patches. I could use some help (once I post my next patch to bug 90545) doing profiles and startup measurements, and making sure I didn't hurt footprint too much (I'm not really set up to do them here).
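The "auto array" idea above can be sketched like this. It is a simplified model of the nsAutoVoidArray approach, not the actual implementation: a small buffer lives inline with the object (usually on the stack), and the heap is touched only if the array outgrows it.

```cpp
#include <cstdlib>
#include <cstring>

// Simplified auto-array sketch: the first kInline elements are stored
// inline, so short-lived small arrays never call malloc at all.
struct AutoArray {
    static const int kInline = 8;
    void* inlineBuf[kInline];
    void** data = inlineBuf;
    int count = 0;
    int capacity = kInline;
    bool heapUsed = false;

    void Append(void* e) {
        if (count == capacity) {
            // Spill to the heap (or grow the heap buffer) by doubling.
            int newCap = capacity * 2;
            void** bigger = static_cast<void**>(
                std::malloc(newCap * sizeof(void*)));
            std::memcpy(bigger, data, count * sizeof(void*));
            if (heapUsed)
                std::free(data);
            data = bigger;
            capacity = newCap;
            heapUsed = true;
        }
        data[count++] = e;
    }
    ~AutoArray() { if (heapUsed) std::free(data); }
};
```

For the CSS case described above, where most arrays get only a handful of elements and then die, the common path performs zero allocations.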
jrgm's our profiling guru I believe.
Blocks: 7251
Simon Fraser wrote:
> Here's my data on malloc sites:
> <http://www.mozilla.org/performance/mallocsites.html>
Ummm... Simon - I looked at your data, and I'm confused. What _exactly_ was it that you put in that file?
sfraser is away right now, so you may want to ping him with email, which he'll be more likely to see when he gets back (overflowing inbox and all).
Carolyn is supposed to get new quantify and trace-malloc data in the next day or so. I advise emailing her if you have advice on how to generate some useful reports like what's attached here.
cathleen
Cathleen. Duh. (slaps forehead). Sorry. Late night (I work with a Carolyn who does similar things.) One thing to note about the (old) stats that Daniel produced: strings and js are by far the worst at startup. Not that this is surprising... Perhaps someone should look at JS allocation habits as well.
randell: will the JS numbers still be of interest, if most of the Mozilla-JS code is precompiled? That's bug 68045... There is also a bug 62164 for JS mallocs.
JS may still be allocating a lot of memory even if precompiled. We'll see.
brendan needs to fix JS malloc bug 62164, after his fast load work. :-)
rjesup: see bug 62164 and bug 72748. We use JS, it uses memory, so far so good. The problem in the engine (62164) will help somewhat, but the bigger gain may come from the FastLoad work (currently torching a tree near you). shaver's perl script discloses ~9500 function objects, many of which I bet are not called at startup, or typically any time soon after. Via nsFastLoadPtr and similar such techniques, I hope we can lazily deserialize these from the FastLoad file. /be
Any luck getting more recent data than Feb? Cathleen, John?
No longer blocks: 7251
Blocks: 7251
From a warm-start jprof in bug 7251: malloc/free/new time is ~16-18% of startup on a fast Linux box.
http://www.mozilla.org/performance/mallocsites.html shows the number of times malloc() was called with a particular call stack. Simple.
I think to get traction on a bug like this there needs to be data that says there were exactly N allocations from double click to complete startup. That data is needed for at least each milestone so one can see that progress is being made. Daily data might be excessive but would at least provide an early warning of regressions in this area. rjesup has already checked in several patches which should decrease the number of allocations. It would be really helpful to be able to quantify the progress that is being made, so that one would know exactly what direction to take next.
Simon: is that updated? If not (and I'm pretty sure it's not), when was it run? I was really hoping for some recent memory allocation traces. Note: your data is useful, though it would also be nice to know the number of times for a particular routine, as opposed to a match on the entire call stack. For example, I'm more interested in how many times each routine called PL_strdup, not the number of PL_strdups. It is possible to extract that, but not easy. That was why Daniel's info was handy. We could almost use a jprof-like output: full call stack matches, and jprof-like call trees. In any case, I'd be happy for _any_ recent or useful data (and your data is useful).
reassigning kandrot perf bugs.
Assignee: kandrot → cathleen
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---
Any chance we can get a list of startup allocations from a modern build to attack? A bunch of allocation-reductions have hit the tree recently (I plan to land the last of the nsVoidArray -> nsAutoVoidArray patches in the morning).
discussed a bit in performance meeting today, we plan to tackle the following areas for reducing the number of memory allocs:
- JS malloc happy (brendan)
- change xpcom registry format (shaver?)
- CSS parser (dbaron/perrier?)
- XML parser (harishd)
- string sharing, utf8 string type (jag/scc/jband)
- unify fastload (dp/brendan)
Component: Browser-General → Tracking
On popular demand I did a quick Quantify run to confirm the old numbers:

                       CPU time
malloc:  142191 calls   676.9 ms
new:     108792 calls   310.5 ms
free:     86710 calls   120.0 ms
delete:   58963 calls    54.7 ms
---------------------------------
memory:  396656 calls  1162.1 ms

I won't have time to break this down into guilty callers, but it looks as if the number of allocations has been reduced by ~20% since February. The big part of the startup time still seems to be non-CPU time, like locking and I/O wait. Note that this is on a profile with very few bookmarks. Bookmarks cause a huge number of memory operations both in RDF and in the bookmarks service, as revealed by Simon's allocation lists. I have a bug about that that I fixed a long time ago, but that I have not been able to get reviewed. (tests done on a Duron 800MHz) (and please Randell, spell my name correctly. :-) )
Adding meta keyword. Sorry for the spam!
Keywords: meta
sfraser: The xpti stuff at the top would indicate that this is probably a first run (or that we are not properly doing autoregistration and saving xpti.dat on Mac?). Assuming that this is in fact a first run, doesn't that skew the results compared to a typical run? First run is usually part of the install process and not indicative of normal usage.
Notable things from the data:
* The top malloc site is use of nsXPIDLCString in nsComponentManagerImpl::PlatformPrePopulateRegistry (2094 mallocs)
* Next site is nsGenericContainerElement::SetAttr on nsXMLElements (2088)
* Next is string foo called from NS_NewAtom (2033)
* Loading xpt files accounts for the next two sites (two lots of 1884 allocs)
* nsCSSSelector::nsCSSSelector is next (1663)
etc.
nsComponentManagerImpl::PlatformPrePopulateRegistry accounts for over 2094 + 1420 + 746 + 674 + 674 + 674 + 2 + 1 = 6285 allocations altogether.
jband: this was not a first run. File read tracing also showed that all the XPT files were getting read at startup each time too, so there seems to be something wrong there. Maybe it's because I have aliases to XPT files in Components, rather than the actual files? In theory that should still work.
For some of the fixed-size high-activity allocations, it may make sense to pool them, and quite possibly bunch-allocate them from system malloc. I'm thinking explicitly of things like Atoms (maybe; they have strings associated) and Rules. Quite possibly others. Or find ways to include a bunch of them as static data instead of allocating at startup.
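The pooling idea for fixed-size, high-activity objects can be sketched with a free list: grab one large chunk from the system allocator up front, then hand out and recycle slots without further mallocs. This is an illustrative sketch (the names are made up, not Mozilla APIs), with no refill path when the pool runs dry.

```cpp
#include <cstdlib>

// Fixed-size slot pool: one bulk malloc at construction, then Take()
// and Give() just push/pop a free list - no per-object heap traffic.
class FixedPool {
    union Slot {
        Slot* next;      // link when the slot is free
        char space[32];  // payload when the slot is in use
    };
    Slot* freeList = nullptr;
    Slot* chunk = nullptr;
public:
    explicit FixedPool(int slots) {
        chunk = static_cast<Slot*>(std::malloc(slots * sizeof(Slot)));
        for (int i = slots - 1; i >= 0; --i) {  // thread all slots onto the free list
            chunk[i].next = freeList;
            freeList = &chunk[i];
        }
    }
    void* Take() {
        if (!freeList)
            return nullptr;  // pool exhausted (a real pool would grow here)
        Slot* s = freeList;
        freeList = s->next;
        return s;
    }
    void Give(void* p) {  // recycle a slot back onto the free list
        Slot* s = static_cast<Slot*>(p);
        s->next = freeList;
        freeList = s;
    }
    ~FixedPool() { std::free(chunk); }
};
```

The same structure underlies arena-style allocators; the trade-off is that pooled memory is not returned to the system until the whole pool dies.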
rjesup: atoms coalesce each atom's allocation with its string. sfraser: I see nsXULPrototypeScript::Serialize in two stacks, showing that you didn't have a valid FastLoad file. If the normal startup has one (and it should), we should measure that. If the file got invalidated for some bogus reason (see bug 106021, now fixed), I may need a new bug. /be
sfraser: I'll refresh my Mac build and poke around a bit at xpti.
jband: how should xpt file loading behave? I have a 68K xpti.dat file. Debugging shows that it opens each .xpt file as necessary to get interface info.
sfraser: what it should do is: attempt to read xpti.dat (this is human readable if you care); if it finds reasonable stuff there then it will only load .xpt files when it needs to resolve a particular interface info stored in that file. At that point it will also pull in any other interface info from that .xpt file. If reading the xpti.dat fails in the first place (including the appdir or components dir path having changed) then it will do a full autoregister. This entails reading each and every .xpt file and then writing a new xpti.dat for use in the next run. If for some reason xpti is not happy with what it finds in the xpti.dat on each run, then it is going to read each and every .xpt file on each run. We want to ensure that this is not happening. I'm curious whether the persistent file path stuff (nsILocalFile::GetPersistentDescriptor) is perhaps unhappy with your aliases or something. Something like that might also (perhaps) cause fastload problems. If you want to debug then fine. Otherwise I'll poke at it tonight or tomorrow.
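The decision described above reduces to a small branch. The sketch below models it with stand-in booleans (the real logic lives in the xpti interface-info manager and is considerably more involved):

```cpp
// The two startup paths described in the comment above.
enum class StartupPath { LazyLoad, FullAutoreg };

// Sketch of the cache decision: a readable xpti.dat whose recorded
// paths still match means .xpt files are only opened on demand;
// anything else forces a full autoregister (read every .xpt file,
// then rewrite xpti.dat for the next run).
StartupPath ChooseXptPath(bool xptiDatReadable, bool pathsUnchanged) {
    if (xptiDatReadable && pathsUnchanged)
        return StartupPath::LazyLoad;
    return StartupPath::FullAutoreg;
}
```

The failure mode worried about here is the cache check failing on every run (e.g. because of alias/path resolution), which silently turns every startup into a full autoregister.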
The data in mallocsites20011031.html were collected for a run after a previous run that crashed. I suspect that's why both xpt loading and fastload were confused. New data have been pushed to mozilla.org, at the same URL, for a good run.
New data up. And brendan is the winner! Top allocations are now: * 3129 + 2817 JS_Mallocs called from nsXULPrototypeScript::Deserialize().
That's more like it. :-) We have a lot of chrome JS functions; that's not by itself bad, nor is it my bug to fix by somehow reducing malloc calls. We need to whip out the JS profiler (http://www.mozilla.org/performance/jsprofiler.html) and see how many are called during startup -- if it's significantly fewer than the total, I'll start using nsFastLoadPtrs (sort of; JS engine hooks required) to defer deserialization till they're about to be called. /be
I'm relieved to hear that I don't have to go figure out why xpti autoreg was not working right on Mac.
I'm extracting the CSS data from this page, if someone wants to do the same for the other modules. At first sight, the main culprits seem to be Parser, XUL, CSS, Prefs and i18n.
Nice data sfraser. I will look to see if I can reduce the PrePopulateRegistry() allocation count
Depends on: 106356
The style system is responsible for approximately 24300 allocations, or 6.6% of the total number of allocations at startup:

Selectors     10500
Declarations   3800
Rules          7000
Other          3000

See bug 106356 for more details.
JS profiler says that starting up through profile manager to first browser window (it loads the default mozilla milestone home page) compiles 376 uncalled JS functions consuming at least 108192 bytes of net-request-total heap space, while 80 compiled functions are called at least once and take at least 20350 bytes. Spinning off dependency bug 107907. /be
Depends on: 107907
No longer depends on: 107907
Depends on: 107907
Depends on: 108071
Sites that do over 100 mallocs in 2001-10-31 build
Sites that do over 100 mallocs in 2001-11-29 build
Keywords: mozilla1.0+
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE. If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard