Closed
Bug 67618
Opened 24 years ago
Closed 8 years ago
Reduce number of memory allocations at startup
Categories
(Core Graveyard :: Tracking, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: bratell, Assigned: cathleennscp)
References
(Blocks 1 open bug, )
Details
(Keywords: meta, perf)
Attachments
(3 files)
It has been seen that we spend 40% of the CPU time during startup allocating
memory, making around 300 000 calls to malloc or new. If we could reduce that
number significantly we might be able to make the program start quicker.
I've made a list of which functions are responsible for the memory allocations
which I will attach to this bug.
Methods to reduce the number of memory allocations are:
1. Use a local buffer and pass it into the function that would otherwise return
a newed pointer.
2. Use the original data instead of cloning it.
3. Use an arena.
4. Avoid the code altogether.
5. Pass objects by reference.
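A minimal sketch of method 1 from the list above, contrasting a function that returns a newed pointer with one that writes into a caller-supplied buffer. The names JoinPathAlloc and JoinPathInto are hypothetical and exist only for illustration; they are not Mozilla functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Allocating style: every call costs one heap allocation, and the caller
// must remember to delete[] the result.
char* JoinPathAlloc(const char* dir, const char* leaf) {
    std::size_t len = std::strlen(dir) + 1 + std::strlen(leaf) + 1;
    char* out = new char[len];
    std::strcpy(out, dir);
    std::strcat(out, "/");
    std::strcat(out, leaf);
    return out;
}

// Buffer-passing style (method 1): the caller owns the storage, typically
// on the stack. Returns false if the joined path does not fit.
bool JoinPathInto(const char* dir, const char* leaf,
                  char* buf, std::size_t bufSize) {
    std::size_t dirLen = std::strlen(dir);
    std::size_t leafLen = std::strlen(leaf);
    if (dirLen + 1 + leafLen + 1 > bufSize) return false;
    std::memcpy(buf, dir, dirLen);
    buf[dirLen] = '/';
    std::memcpy(buf + dirLen + 1, leaf, leafLen + 1);  // copies the NUL too
    return true;
}
```

The second form needs no heap call at all when the caller can guess a reasonable upper bound, at the cost of handling the "does not fit" case.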
As can be seen from the list, there are no top offenders, so I guess we just
have to try to make small improvements everywhere, gaining some ms here and some
ms there. One thing I noticed, though, is that we initialize more than a thousand
RDF resources. Now, I don't know what an RDF resource is, but that seems to be
a lot of them during startup.
The measurements were done on a Windows 2000 build from 2001-02-02.
Reporter | ||
Comment 1•24 years ago
|
||
Comment 2•24 years ago
|
||
This is interesting information. How did you collect it?
Reporter | ||
Comment 3•24 years ago
|
||
I started Mozilla with Quantify and then directly quit. I then started to look
at the functions new and malloc. Since they couldn't be responsible for being
called, I replaced them with their respective callers. Among those there were
also many functions not responsible themselves like "Clone" functions, strdup,
PR_Malloc and so on and I replaced those with their callers until at least the
top 50 contained (mostly) functions where I couldn't put the blame elsewhere. At
least not without distorting the picture. I also merged functions that appeared
from many directions (nsHashtable::Put comes to mind). So it's not very
difficult to collect the data. It was just a lot of manual work. :-)
I still have the qfy file if someone wants to look at it.
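The manual procedure described above (replacing allocators and thin wrappers with their callers) could be automated roughly as follows. This is only a sketch under assumptions: it presumes the profiler data can be exported as call stacks with counts, and the StackSample type and BlameCallers function are hypothetical names, not part of Quantify.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One profiler sample: a call stack (innermost frame first) and how many
// allocations were observed with exactly that stack.
struct StackSample {
    std::vector<std::string> frames;
    long count;
};

// Charges each sample to its innermost frame that is not a known
// pass-through (allocator or thin wrapper) and sums the counts per function.
std::map<std::string, long> BlameCallers(
        const std::vector<StackSample>& samples,
        const std::set<std::string>& passThrough) {
    std::map<std::string, long> blame;
    for (const StackSample& s : samples) {
        for (const std::string& frame : s.frames) {
            if (passThrough.count(frame) == 0) {
                blame[frame] += s.count;
                break;  // only the first responsible frame is charged
            }
        }
    }
    return blame;
}
```

Summing per function also merges functions that are reached from many directions, which is the same effect as the manual merging mentioned for nsHashtable::Put.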
Comment 4•24 years ago
|
||
The js_NewScopeProperty malloc craze is bug 62164. Track it from here via a
dependency if you like, but please don't dup it.
/be
Comment 5•24 years ago
|
||
Did you start and stop Mozilla with Quantify running the whole time, or did you
start Mozilla, then pause Quantify, then quit? They will generate slightly
different timings/results. I believe that the second method is what we want, so
I just want to verify.
Reporter | ||
Comment 6•24 years ago
|
||
I'm not 100% sure but I think the run that I extracted the data from is
including the shutdown. Quantify just crashed when I tried to pause it before
shutting down. Anyway, I would think that the number of memory allocations
happening during shutdown is negligible. If not, something is seriously wrong.
Comment 7•24 years ago
|
||
Does someone who knows more want to take this bug from me?
Updated•24 years ago
|
Status: NEW → ASSIGNED
Comment 10•24 years ago
|
||
Here's my data on malloc sites:
<http://www.mozilla.org/performance/mallocsites.html>
Comment 11•24 years ago
|
||
I will see what can be done. There are a lot of mallocs spread all over the
code, and it will take time to track down the owners and find out whether they
are really needed. I am currently looking for a global solution.
Target Milestone: --- → mozilla1.0
Comment 12•24 years ago
|
||
Maybe you could try a faster implementation of malloc. Just an idea.
Comment 13•24 years ago
|
||
Changing Platform/OS to All/All from PC/W2k. If I'm wrong, put it back.
OS: Windows 2000 → All
Hardware: PC → All
Comment 14•23 years ago
|
||
The fundamental problem is simply the number of objects (allocations) that get
used, especially for short periods and then thrown away. Pools and arenas are
one way to attack the problem (beyond the obvious but tough solution of writing
the code so it doesn't need to allocate zillions of objects). Others might involve
making things like arrays (nsVoidArray::InsertElementAt is high on the list)
increase allocations by more than one entry each time. This is already (I
think) done for strings. I may take a look at the Array code, but this won't
solve the problem of algorithms that assume/require a lot of objects.
Comment 15•23 years ago
|
||
Sorry, nsVoidArray and nsSupportsArray do increase by more than 1 (nsSupports
increases by 8 always, and nsVoidArray increases by either 4 or double the
current size). However, they're still high on the list of allocations,
especially void, so I'm looking at them some more.
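To see why the growth policy matters, here is a small sketch that counts how many (re)allocations an array performs for a given number of appends under different policies. The three policies are illustrative: GrowByEight follows the "increases by 8 always" description, and GrowFourOrDouble is one possible reading of "either 4 or double the current size". Neither is copied from the actual nsVoidArray or nsSupportsArray code.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times a growable array must (re)allocate its storage while
// `appends` elements are appended one at a time, starting from capacity 0.
// `grow` maps the current capacity to the next one.
template <typename GrowFn>
std::size_t CountReallocs(std::size_t appends, GrowFn grow) {
    std::size_t capacity = 0;
    std::size_t reallocs = 0;
    for (std::size_t size = 0; size < appends; ++size) {
        if (size == capacity) {  // full: storage has to grow before the append
            capacity = grow(capacity);
            ++reallocs;
        }
    }
    return reallocs;
}

inline std::size_t GrowByOne(std::size_t cap) { return cap + 1; }
inline std::size_t GrowByEight(std::size_t cap) { return cap + 8; }
// Assumed reading of "4 or double": the first allocation gets 4 slots,
// later ones double the capacity.
inline std::size_t GrowFourOrDouble(std::size_t cap) {
    return cap < 4 ? 4 : cap * 2;
}
```

For 100 appends this gives 100, 13 and 6 reallocations respectively, so even a reasonable policy still costs several heap calls per array when thousands of short arrays are created.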
Reporter | ||
Comment 16•23 years ago
|
||
Places to look at (from my old collected data) for nsVoidArray stuff could be:
CSSDeclarationImpl::AppendValue (creates ~900 arrays, does 3600 InsertElementAt)
nsGenericHTMLContainerElement::AppendChildTo (does 3500 AppendElement)
nsXULElement::AppendChildTo (does 1800 InsertElementAt)
nsGenericContainerElement::SetAttribute (does 1500 InsertElementAt)
Another point: maybe the nsDeque the nsHTMLTokenizer uses should be bigger
initially. It seems to grow several times during the startup.
Comment 17•23 years ago
|
||
nsVoidArray patch in bug 90545. I'll look at some of the stats, thanks.
Depends on: 90545
Comment 18•23 years ago
|
||
I found that much of the reason for all the nsVoidArray::InsertElementAt allocations
is the CSS code. It uses lots of nsVoidArrays which are then immediately
appended to. I'm going to look at either (a) switching some of them to
nsAutoVoidArrays, and/or (b) adding a constructor that takes a single array
element to be inserted at construction time. (a) probably makes more sense.
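The idea behind option (a) is an array that carries a small buffer inside the object itself, so that short arrays never touch the heap. The class below is a hypothetical stand-in written for illustration only; it is not the real nsAutoVoidArray, and the inline capacity of 8 is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical small-buffer pointer array: the first kInline elements live
// inside the object; the heap is used only once that space is exhausted.
class SmallPtrArray {
public:
    static constexpr std::size_t kInline = 8;  // illustrative value

    SmallPtrArray()
        : mData(mInline), mSize(0), mCapacity(kInline), mHeapAllocs(0) {}
    ~SmallPtrArray() { if (mData != mInline) delete[] mData; }
    SmallPtrArray(const SmallPtrArray&) = delete;
    SmallPtrArray& operator=(const SmallPtrArray&) = delete;

    void Append(void* p) {
        if (mSize == mCapacity) Grow();
        mData[mSize++] = p;
    }
    void* At(std::size_t i) const { assert(i < mSize); return mData[i]; }
    std::size_t Size() const { return mSize; }
    std::size_t HeapAllocs() const { return mHeapAllocs; }

private:
    // Doubles the capacity, moving the elements to a new heap buffer.
    void Grow() {
        std::size_t newCap = mCapacity * 2;
        void** bigger = new void*[newCap];
        std::memcpy(bigger, mData, mSize * sizeof(void*));
        if (mData != mInline) delete[] mData;
        mData = bigger;
        mCapacity = newCap;
        ++mHeapAllocs;
    }

    void* mInline[kInline];
    void** mData;
    std::size_t mSize;
    std::size_t mCapacity;
    std::size_t mHeapAllocs;
};
```

The trade-off is footprint: every instance pays for the inline buffer whether or not it is used, which is why this fits arrays that are almost always appended to right after construction.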
Comment 19•23 years ago
|
||
I have some major reductions in nsVoidArray allocations (largely by judicious
use of nsAutoVoidArrays, and by tweaking the grow routine in nsVoidArray).
Currently I'm down to ~1900 allocations/reallocations of arrays, down from close
to 15000 when I started. See bug 90545 for ongoing analysis and patches. I
could use some help (once I post my next patch to bug 90545) doing profiles and
startup measurements, and making sure I didn't hurt footprint too much (I'm not
really set up to do them here).
Comment 20•23 years ago
|
||
jrgm's our profiling guru I believe.
Comment 21•23 years ago
|
||
Simon Fraser wrote:
>
>Here's my data on malloc sites:
>
> <http://www.mozilla.org/performance/mallocsites.html>
Ummm... Simon - I looked at your data, and I'm confused. What _exactly_ was it
that you put in that file?
Comment 22•23 years ago
|
||
sfraser is away right now, so you may want to ping him with email, which he'll
be more likely to see when he gets back (overflowing inbox and all).
Comment 23•23 years ago
|
||
Carolyn is supposed to get new quantify and trace-malloc data in the next day
or so. I advise emailing her if you have advice on how to generate some useful
reports like what's attached here.
Comment 24•23 years ago
|
||
cathleen
Comment 25•23 years ago
|
||
Cathleen. Duh. (slaps forehead). Sorry. Late night (I work with a Carolyn
who does similar things.)
One thing to note about the (old) stats that Daniel produced: strings and js are
by far the worst at startup. Not that this is surprising... Perhaps someone
should look at JS allocation habits as well.
Comment 26•23 years ago
|
||
Comment 27•23 years ago
|
||
JS may still be allocating a lot of memory even if precompiled. We'll see.
Assignee | ||
Comment 28•23 years ago
|
||
brendan needs to fix JS malloc bug 62164, after his fast load work. :-)
Comment 29•23 years ago
|
||
rjesup: see bug 62164 and bug 72748. We use JS, it uses memory, so far so good.
Fixing the problem in the engine (62164) will help somewhat, but the bigger gain may
come from the FastLoad work (currently torching a tree near you). shaver's perl
script discloses ~9500 function objects, many of which I bet are not called at
startup, or typically any time soon after. Via nsFastLoadPtr and similar such
techniques, I hope we can lazily deserialize these from the FastLoad file.
/be
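The general shape of lazy deserialization can be sketched as a holder that keeps a loader and runs it only on first use. This is a generic illustration, not the nsFastLoadPtr API: LazyPtr and its members are hypothetical names, and the loader here stands in for whatever would read an object back from the FastLoad file.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical lazy holder: stores a loader and runs it only on first use,
// so objects that are never touched are never allocated.
template <typename T>
class LazyPtr {
public:
    explicit LazyPtr(std::function<std::unique_ptr<T>()> loader)
        : mLoader(std::move(loader)) {}

    bool IsLoaded() const { return static_cast<bool>(mValue); }

    T& Get() {
        if (!mValue) {
            mValue = mLoader();
            assert(mValue && "loader must produce an object");
            mLoader = nullptr;  // release anything the loader captured
        }
        return *mValue;
    }

private:
    std::function<std::unique_ptr<T>()> mLoader;
    std::unique_ptr<T> mValue;
};
```

With thousands of function objects of which only a fraction are called at startup, the allocations for the uncalled ones would be deferred until (or unless) they are needed.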
Comment 30•23 years ago
|
||
Any luck getting more recent data than Feb? Cathleen, John?
Comment 31•23 years ago
|
||
From a warm-start jprof in bug 7251:
malloc/free/new time is ~16-18% of startup on a fast Linux box.
Comment 32•23 years ago
|
||
http://www.mozilla.org/performance/mallocsites.html shows the number of times
malloc() was called with a particular call stack. Simple.
Comment 33•23 years ago
|
||
I think to get traction on a bug like this there needs to be data that says
there were exactly N allocations from double click to complete startup.
That data is needed for at least each milestone so one can see that progress is
being made. Daily data might be excessive but would at least provide an early
warning of regressions in this area.
rjesup has already checked in several patches which should decrease the number
of allocations. It would be really helpful to be able to quantify the progress
that is being made so that one would know exactly what direction to take next.
Comment 34•23 years ago
|
||
Simon: is that updated? If not (and I'm pretty sure it's not), when was it
run? I was really hoping for some recent memory allocation traces.
Note: your data is useful, though it would also be nice to know the number of
times for a particular routine, as opposed to a match on the entire call stack.
For example, I'm more interested in how many times each routine called
PL_strdup, not the number of PL_strdups. It is possible to extract that, but
not easy. That was why Daniel's info was handy. We could almost use a
jprof-like output: full call stack matches, and jprof-like call trees.
In any case, I'd be happy for _any_ recent or useful data (and your data is
useful).
Comment 35•23 years ago
|
||
reassigning kandrot perf bugs.
Assignee: kandrot → cathleen
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---
Comment 36•23 years ago
|
||
Any chance we can get a list of startup allocations from a modern build to
attack? A bunch of allocation-reductions have hit the tree recently (I plan to
land the last of the nsVoidArray -> nsAutoVoidArray patches in the morning).
Assignee | ||
Comment 37•23 years ago
|
||
discussed a bit in performance meeting today, we plan to tackle the following
areas for reducing the number of memory allocs:
- JS malloc happy (brendan)
- change xpcom registry format (shaver?)
- CSS parser (dbaron/perrier?)
- XML parser (harishd)
- string sharing, utf8 string type (jag/scc/jband)
- unify fastload (dp/brendan)
Component: Browser-General → Tracking
Reporter | ||
Comment 38•23 years ago
|
||
By popular demand, I did a quick Quantify run to confirm the old numbers:
           calls    CPU time
malloc:   142191    676.9 ms
new:      108792    310.5 ms
free:      86710    120.0 ms
delete:    58963     54.7 ms
----------------------------
memory:   396656   1162.1 ms
I won't have time to break this into guilty callers, but it looks as if the
number of allocations has been reduced by ~20% since February. The big part of
the startup time still seems to be non-CPU time, like locking and I/O wait.
Note that this is on a profile with very few bookmarks. Bookmarks cause a huge
number of memory operations both in RDF and in the bookmarks service, as revealed
by Simon's allocation lists. I have a bug about that that I fixed a long time ago,
but that I have not been able to get reviewed.
(tests done on a Duron 800MHz)
(and please Randell, spell my name correctly. :-) )
Comment 40•23 years ago
|
||
New data on malloc sites at startup will soon show at:
<http://www.mozilla.org/performance/mallocsites20011031.html>
Comment 41•23 years ago
|
||
sfraser: The xpti stuff at the top would indicate that this is probably a first
run (or that we are not properly doing autoregistration and saving xpti.dat on
Mac?). Assuming that this is in fact a first run, doesn't that skew the results
compared to a typical run? First run is usually part of the install process and
not indicative of normal usage.
Comment 42•23 years ago
|
||
Notable things from the data:
* The top malloc site is use of nsXPIDLCString in
nsComponentManagerImpl::PlatformPrePopulateRegistry (2094 mallocs)
* Next site is nsGenericContainerElement::SetAttr on nsXMLElements (2088)
* Next is string foo called from NS_NewAtom (2033)
* Loading xpt files accounts for the next two sites (two lots of 1884 allocs)
* nsCSSSelector::nsCSSSelector is next (1663)
etc.
nsComponentManagerImpl::PlatformPrePopulateRegistry accounts for over
2094 + 1420 + 746 + 674 + 674 + 674 + 2 + 1 = 6285 allocations altogether.
Comment 43•23 years ago
|
||
jband: this was not a first run. File read tracing also showed that all the XPT
files were getting read at startup each time too, so there seems to be something
wrong there. Maybe it's because I have aliases to XPT files in Components, rather
than the actual files? In theory that should still work.
Comment 44•23 years ago
|
||
For some of the fixed-size high-activity allocations, it may make sense to pool
them, and quite possibly bunch-allocate them from system malloc. I'm thinking
explicitly of things like Atoms (maybe; they have strings associated) and
Rules. Quite possibly others. Or find ways to include a bunch of them as
static data instead of allocating at startup.
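A rough sketch of the pooling idea: carve many equal-sized slots out of each block obtained from the system allocator and recycle freed slots through a free list. FixedPool is a hypothetical class written for illustration; it is not an existing Mozilla or NSPR API, and the slot and block sizes are up to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal fixed-size pool: one malloc call provides slotsPerBlock slots,
// and freed slots are reused before any new block is requested.
class FixedPool {
public:
    FixedPool(std::size_t slotSize, std::size_t slotsPerBlock)
        : mSlotSize(RoundUp(slotSize)), mSlotsPerBlock(slotsPerBlock),
          mFree(nullptr) {}
    ~FixedPool() { for (void* b : mBlocks) std::free(b); }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* Alloc() {
        if (!mFree) Refill();
        if (!mFree) return nullptr;  // system allocator failed
        Slot* s = mFree;
        mFree = s->next;
        return s;
    }
    void Free(void* p) {
        Slot* s = static_cast<Slot*>(p);
        s->next = mFree;
        mFree = s;
    }
    std::size_t SystemAllocs() const { return mBlocks.size(); }

private:
    struct Slot { Slot* next; };

    // Slots must hold a free-list link and stay suitably aligned.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Slot)) n = sizeof(Slot);
        return (n + a - 1) / a * a;
    }
    void Refill() {
        char* block =
            static_cast<char*>(std::malloc(mSlotSize * mSlotsPerBlock));
        if (!block) return;
        mBlocks.push_back(block);
        for (std::size_t i = 0; i < mSlotsPerBlock; ++i) {
            Slot* s = reinterpret_cast<Slot*>(block + i * mSlotSize);
            s->next = mFree;
            mFree = s;
        }
    }

    std::size_t mSlotSize;
    std::size_t mSlotsPerBlock;
    Slot* mFree;
    std::vector<void*> mBlocks;
};
```

With 64 slots per block, 7000 rule-sized objects would need about 110 system allocations instead of 7000, though memory is only returned to the system when the whole pool is destroyed.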
Comment 45•23 years ago
|
||
rjesup: atoms coalesce each atom's allocation with its string.
sfraser: I see nsXULPrototypeScript::Serialize in two stacks, showing that you
didn't have a valid FastLoad file. If the normal startup has one (and it
should), we should measure that. If the file got invalidated for some bogus
reason (see bug 106021, now fixed), I may need a new bug.
/be
Comment 46•23 years ago
|
||
sfraser: I'll refresh my Mac build and poke around a bit at xpti.
Comment 47•23 years ago
|
||
jband: how should xpt file loading behave? I have a 68K xpti.dat file. Debugging
shows that it opens each .xpt file as necessary to get interface info.
Comment 48•23 years ago
|
||
sfraser: what it should do is: attempt to read xpti.dat (this is human readable
if you care), if it finds reasonable stuff there then it will only load .xpt
files when it needs to resolve a particular interface info stored in that file.
At that point it will also pull in any other interface info from that .xpt file.
If reading the xpti.dat fails in the first place (including the appdir or
components dir path having changed) then it will do a full autoregister. This
entails reading each and every .xpt file and then writing a new xpti.dat for use
in the next run. If for some reason xpti is not happy with what it finds in the
xpti.dat on each run then it is going to read each and every .xpt file on each
run. We want to ensure that this is not happening. I'm curious if the persistent
file path stuff (nsILocalFile::GetPersistentDescriptor) is perhaps unhappy with
your aliases or something. Something like that might also (perhaps) cause
fastload problems. If you want to debug then fine. Otherwise I'll poke at it
tonight or tomorrow.
Comment 49•23 years ago
|
||
The data in mallocsites20011031.html were collected for a run after a previous
run that crashed. I suspect that's why both xpt loading and fastload were
confused.
New data have been pushed to mozilla.org, at the same URL, for a good run.
Comment 50•23 years ago
|
||
New data up. And brendan is the winner! Top allocations are now:
* 3129 + 2817 JS_Mallocs called from nsXULPrototypeScript::Deserialize().
Comment 51•23 years ago
|
||
That's more like it. :-)
We have a lot of chrome JS functions; that's not by itself bad, nor is it my bug
to fix by somehow reducing malloc calls. We need to whip out the JS profiler
(http://www.mozilla.org/performance/jsprofiler.html) and see how many are called
during startup -- if it's significantly fewer than the total, I'll start using
nsFastLoadPtrs (sort of; JS engine hooks required) to defer deserialization till
they're about to be called.
/be
Comment 52•23 years ago
|
||
I'm relieved to hear that I don't have to go figure out why xpti autoreg was not
working right on Mac.
Comment 53•23 years ago
|
||
I'm extracting the CSS data from this page, if someone wants to do the same for
the other modules. At first sight, the main culprits seem to be Parser, XUL,
CSS, Prefs and i18n.
Comment 54•23 years ago
|
||
Nice data sfraser. I will look to see if I can reduce the PrePopulateRegistry()
allocation count
Comment 55•23 years ago
|
||
The style system is responsible for approximately 24300 allocations, or 6.6% of
the total number of allocations at startup:
Selectors 10500
Declarations 3800
Rules 7000
Other 3000
See bug 106356 for more details.
Comment 56•23 years ago
|
||
JS profiler says that starting up through profile manager to first browser
window (it loads the default mozilla milestone home page) compiles 376 uncalled
JS functions consuming at least 108192 bytes of net-request-total heap space,
while 80 compiled functions are called at least once and take at least 20350 bytes.
Spinning off dependency bug 107907.
/be
Depends on: 107907
Comment 57•23 years ago
|
||
Sites that do over 100 mallocs in 2001-10-31 build
Comment 58•23 years ago
|
||
Sites that do over 100 mallocs in 2001-11-29 build
Updated•23 years ago
|
Keywords: mozilla1.0+
Comment 59•8 years ago
|
||
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE.
If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work
being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•8 years ago
|
Product: Core → Core Graveyard