Closed Bug 67618 Opened 23 years ago Closed 8 years ago

Reduce number of memory allocations at startup

Categories

(Core Graveyard :: Tracking, defect)


Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: bratell, Assigned: cathleennscp)

References

(Blocks 1 open bug)

Details

(Keywords: meta, perf)

Attachments

(3 files)

It has been seen that we spend 40% of the CPU time during startup allocating
memory, making around 300 000 calls to malloc or new. If we could reduce that
number significantly, we might be able to make the program start more quickly. 

I've made a list of which functions are responsible for the memory allocations
which I will attach to this bug. 

Methods to reduce the number of memory allocations are:
1. Use a local buffer and pass it into the function that would otherwise return
a newed pointer.
2. Use the original data instead of cloning it.
3. Use an arena.
4. Avoid the code altogether.
5. Pass objects by reference.


As can be seen from the list, there are no top offenders, so I guess it's just
a matter of making small improvements everywhere, gaining some ms here and some
ms there. One thing I noticed, though, is that we initialize more than a
thousand RDF resources. Now, I don't know what an RDF resource is, but that
seems to be a lot of them during startup.

The measurements were done on a Windows 2000 build from 2001-02-02.
Blocks: 7251
Keywords: perf
This is interesting information. How did you collect it?
I started Mozilla with Quantify and then directly quit. I then started to look 
at the functions new and malloc. Since they aren't themselves responsible for 
being called, I replaced them with their respective callers. Among those there were 
also many functions not responsible themselves like "Clone" functions, strdup, 
PR_Malloc and so on and I replaced those with their callers until at least the 
top 50 contained (mostly) functions where I couldn't put the blame elsewhere. At 
least not without distorting the picture. I also merged functions that appeared 
from many directions (nsHashtable::Put comes to mind). So it's not very 
difficult to collect the data. It was just a lot of manual work. :-)

I still have the qfy file if someone wants to look at it.
The js_NewScopeProperty malloc craze is bug 62164.  Track it from here via a
dependency if you like, but please don't dup it.

/be
Did you start and stop Mozilla with Quantify running the whole time, or did you
start Mozilla, then pause Quantify, then quit?  They will generate slightly
different timings/results.  I believe that the second method is what we want, so
I just want to verify.
I'm not 100% sure but I think the run that I extracted the data from is 
including the shutdown. Quantify just crashed when I tried to pause it before 
shutting down. Anyway, I would think that the number of memory allocations 
happening during shutdown is negligible. If not, something is seriously wrong.
Does someone who knows more want to take this bug from me?
since waterson recognized it, *poof*
Assignee: asa → waterson
kandrot has been studyin' this.
Assignee: waterson → kandrot
Status: NEW → ASSIGNED
I will see what can be done.  There are a lot of mallocs spread all over the
code, and it will take time to track down each owner and find out if they are
really needed.  I am currently looking for a global solution.
Target Milestone: --- → mozilla1.0
Blocks: 71874
No longer blocks: 7251
Maybe you could try a faster implementation of malloc. Just an idea.
Platform/OS All/All from PC/W2k. If I'm wrong, put it back.
OS: Windows 2000 → All
Hardware: PC → All
The fundamental problem is simply the number of objects (allocations) that get
used, especially for short periods and then thrown away.  Pools and arenas are
one way to attack the problem (beyond the obvious but tough solution of
rewriting the code to not need zillions of objects).  Others might involve
making things like arrays (nsVoidArray::InsertElementAt is high on the list)
increase allocations by more than one entry each time.  This is already (I
think) done for strings.  I may take a look at the Array code, but this won't
solve the problem of algorithms that assume/require a lot of objects.
Sorry, nsVoidArray and nsSupportsArray do increase by more than 1 (nsSupports
increases by 8 always, and nsVoidArray increases by either 4 or double the
current size).  However, they're still high on the list of allocations,
especially void, so I'm looking at them some more.
Places to look at (from my old collected data) for nsVoidArray stuff could be:

CSSDeclarationImpl::AppendValue (creates ~900 arrays, does 3600 InsertElementAt)
nsGenericHTMLContainerElement::AppendChildTo (does 3500 AppendElement)
nsXULElement::AppendChildTo  (does 1800 InsertElementAt)
nsGenericContainerElement::SetAttribute (does 1500 InsertElementAt)

Another point: maybe the nsDeque that nsHTMLTokenizer uses should be bigger 
initially. It seems to grow several times during the startup.



nsVoidArray patch in bug 90545.  I'll look at some of the stats, thanks.
Depends on: 90545
I found a lot of the reason for all the nsVoidArray::InsertElementAt allocations
is the CSS code.  It uses lots of nsVoidArrays which are then immediately
appended to.  I'm going to look at either (a) switching some of them to
nsAutoVoidArrays, and/or (b) adding a constructor that takes a single array
element to be inserted at construction time.  (a) probably makes more sense.
I have some major reductions in nsVoidArray allocations (largely by judicious
use of nsAutoVoidArrays, and by tweaking the grow routine in nsVoidArray).
Currently I'm down to ~1900 allocations/reallocations of arrays, down from close
to 15000 when I started.  See bug 90545 for ongoing analysis and patches.  I
could use some help (once I post my next patch to bug 90545) doing profiles and
startup measurements, and making sure I didn't hurt footprint too much (I'm not
really set up to do them here).
jrgm's our profiling guru I believe.
Blocks: 7251
Simon Fraser wrote:
>
>Here's my data on malloc sites:
>
>   <http://www.mozilla.org/performance/mallocsites.html>

Ummm... Simon - I looked at your data, and I'm confused.  What _exactly_ was it
that you put in that file?
sfraser is away right now, so you may want to ping him with email, which he'll
be more likely to see when he gets back (overflowing inbox and all).
Carolyn is supposed to get new quantify and trace-malloc data in the next day 
or so.  I advise emailing her if you have advice on how to generate some useful 
reports like what's attached here.
cathleen
Cathleen.  Duh.  (slaps forehead).  Sorry.  Late night (I work with a Carolyn
who does similar things.)

One thing to note about the (old) stats that Daniel produced: strings and js are
by far the worst at startup.  Not that this is surprising...  Perhaps someone
should  look at JS allocation habits as well.
randell: will the JS numbers still be of interest if most of the Mozilla JS
code is precompiled? That's bug 68045... There is also bug 62164 for JS mallocs.
JS may still be allocating a lot of memory even if precompiled.  We'll see.
brendan needs to fix JS malloc bug 62164, after his fast load work.  :-)
rjesup: see bug 62164 and bug 72748.  We use JS, it uses memory, so far so good.
 The problem in the engine (62164) will help somewhat, but the bigger gain may
come from the FastLoad work (currently torching a tree near you).  shaver's perl
script discloses ~9500 function objects, many of which I bet are not called at
startup, or typically any time soon after.  Via nsFastLoadPtr and similar such
techniques, I hope we can lazily deserialize these from the FastLoad file.

/be
Any luck getting more recent data than Feb?  Cathleen, John?
No longer blocks: 7251
Blocks: 7251
From a warm-start jprof in bug 7251:

malloc/free/new time is ~16-18% of startup on a fast Linux box.
http://www.mozilla.org/performance/mallocsites.html shows the number of times 
malloc() was called with a particular call stack. Simple.
I think to get traction on a bug like this there needs to be data that says
there was exactly N number of allocations from double click to complete startup.
That data is needed for at least each milestone so one can see that progress is
being made. Daily data might be excessive but would at least provide an early
warning of regressions in this area.

rjesup has already checked in several patches which should decrease the number
of allocations. It would be really helpful to be able to quantify the progress
being made so that one would know exactly what direction to take next.
Simon: is that updated?  If not (and I'm pretty sure it's not), when was it
run?  I was really hoping for some recent memory allocation traces. 

Note: your data is useful, though it would also be nice to know the number of
times for a particular routine, as opposed to a match on the entire call stack. 
For example, I'm more interested in how many times each routine called
PL_strdup, not the number of PL_strdups.  It is possible to extract that, but
not easy.  That was why Daniel's info was handy.  We could almost use a
jprof-like output: full call stack matches, and jprof-like call trees.

In any case, I'd be happy for _any_ recent or useful data (and your data is
useful).
reassigning kandrot perf bugs.
Assignee: kandrot → cathleen
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---
Any chance we can get a list of startup allocations from a modern build to
attack?  A bunch of allocation-reductions have hit the tree recently (I plan to
land the last of the nsVoidArray -> nsAutoVoidArray patches in the morning).

discussed a bit in performance meeting today, we plan to tackle the following
areas for reducing the number of memory allocs:

- JS malloc happy  (brendan)
- make change xpcom registry format  (shaver?)
- CSS parser (dbaron/perrier?)
- XML parser (harishd)
- string sharing, utf8 string type (jag/scc/jband)
- unify fastload (dp/brendan)
Component: Browser-General → Tracking
By popular demand, I did a quick Quantify run to confirm the old numbers:

                         CPU time
malloc:   142191 calls   676.9 ms
new:      108792 calls   310.5 ms
free:      86710 calls   120.0 ms
delete:    58963 calls    54.7 ms
---------------------------------
memory:   396656 calls  1162.1 ms

I won't have time to break this down into guilty callers, but it looks as if
the number of allocations has been reduced by ~20% since February. The big part
of the startup time still seems to be non-CPU time, like locking and I/O waits.


Note that this is on a profile with very few bookmarks. Bookmarks cause a huge
number of memory operations, both in RDF and in the bookmarks service, as
revealed by Simon's allocation lists. I have a bug about that which I fixed a
long time ago, but I have not been able to get it reviewed.


(tests done on a Duron 800MHz)
(and please Randell, spell my name correctly. :-) )
Adding meta keyword. Sorry for the spam!
Keywords: meta
sfraser: The xpti stuff at the top would indicate that this is probably a first 
run (or that we are not properly doing autoregistration and saving xpti.dat on 
Mac?). Assuming that this is in fact a first run, doesn't that skew the results 
compared to a typical run? First run is usually part of the install process and 
not indicative of normal usage.
Notable things from the data:
* The top malloc site is use of nsXPIDLCString in 
  nsComponentManagerImpl::PlatformPrePopulateRegistry (2094 mallocs)
* Next site is nsGenericContainerElement::SetAttr on nsXMLElements (2088)
* Next is string foo called from NS_NewAtom (2033)
* Loading xpt files accounts for the next two sites (two lots of 1884 allocs)
* nsCSSSelector::nsCSSSelector is next (1663)
etc.

nsComponentManagerImpl::PlatformPrePopulateRegistry accounts for over
2094 + 1420 + 746 + 674 + 674 + 674 + 2 + 1 = 6285 allocations altogether.
jband: this was not a first run. File read tracing also showed that all the XPT 
files were getting read at startup each time too, so there seems to be something 
wrong there. Maybe it's because I have aliases to XPT files in Components, rather 
than the actual files? In theory that should still work.
For some of the fixed-size high-activity allocations, it may make sense to pool
them, and quite possibly bunch-allocate them from system malloc.  I'm thinking
explicitly of things like Atoms (maybe; they have strings associated) and
Rules.  Quite possibly others.  Or find ways to include a bunch of them as
static data instead of allocating at startup.
rjesup: atoms coalesce each atom's allocation with its string.

sfraser: I see nsXULPrototypeScript::Serialize in two stacks, showing that you
didn't have a valid FastLoad file.  If the normal startup has one (and it
should), we should measure that.  If the file got invalidated for some bogus
reason (see bug 106021, now fixed), I may need a new bug.

/be
sfraser: I'll refresh my Mac build and poke around a bit at xpti.
jband: how should xpt file loading behave? I have a 68K xpti.dat file. Debugging 
shows that it opens each .xpt file as necessary to get interface info.
sfraser: what it should do is: attempt to read xpti.dat (this is human-readable 
if you care); if it finds reasonable stuff there, then it will only load .xpt 
files when it needs to resolve a particular interface info stored in that file. 
At that point it will also pull in any other interface info from that .xpt file. 
If reading the xpti.dat fails in the first place (including the appdir or 
components dir path having changed), then it will do a full autoregister. This 
entails reading each and every .xpt file and then writing a new xpti.dat for use 
in the next run. If for some reason xpti is not happy with what it finds in the 
xpti.dat on each run, then it is going to read each and every .xpt file on each 
run. We want to ensure that this is not happening. I'm curious whether the 
persistent file path stuff (nsILocalFile::GetPersistentDescriptor) is perhaps 
unhappy with your aliases or something. Something like that might also (perhaps) 
cause fastload problems. If you want to debug then fine. Otherwise I'll poke at 
it tonight or tomorrow. 
The data in mallocsites20011031.html were collected for a run after a previous 
run that crashed. I suspect that's why both xpt loading and fastload were 
confused.

New data have been pushed to mozilla.org, at the same URL, for a good run.
New data up. And brendan is the winner! Top allocations are now:
* 3129 + 2817 JS_Mallocs called from nsXULPrototypeScript::Deserialize().
That's more like it.  :-)

We have a lot of chrome JS functions; that's not by itself bad, nor is it my bug
to fix by somehow reducing malloc calls.  We need to whip out the JS profiler
(http://www.mozilla.org/performance/jsprofiler.html) and see how many are called
during startup -- if it's significantly fewer than the total, I'll start using
nsFastLoadPtrs (sort of; JS engine hooks required) to defer deserialization till
they're about to be called.

/be
I'm relieved to hear that I don't have to go figure out why xpti autoreg was not 
working right on Mac.
I'm extracting the CSS data from this page, if someone wants to do the same for 
the other modules.  At first sight, the main culprits seem to be Parser, XUL, 
CSS, Prefs and i18n.
Nice data sfraser. I will look to see if I can reduce the PrePopulateRegistry()
allocation count
Depends on: 106356
The style system is responsible for approximately 24300 allocations, or 6.6% of 
the total number of allocations at startup:
    Selectors     10500
    Declarations   3800
    Rules          7000
    Other          3000
See bug 106356 for more details.  
JS profiler says that starting up through profile manager to first browser
window (it loads the default mozilla milestone home page) compiles 376 uncalled
JS functions consuming at least 108192 bytes of net-request-total heap space,
while 80 compiled functions are called at least once and take at least 20350 bytes.

Spinning off dependency bug 107907.

/be
Depends on: 107907
No longer depends on: 107907
Depends on: 107907
Depends on: 108071
Sites that do over 100 mallocs in 2001-10-31 build
Sites that do over 100 mallocs in 2001-11-29 build
Keywords: mozilla1.0+
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE.
If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work
being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard