Closed Bug 30753 Opened 20 years ago Closed 19 years ago

typelib loading improvements for time and space performance

Categories

(Core :: XPCOM, defect, P3)

RESOLVED FIXED

People

(Reporter: jband_mozilla, Assigned: jband_mozilla)

Details

(Keywords: perf)

We are still using our original simple-minded scheme of loading all interface 
infos (II) from all typelibs at startup. As things have evolved this is now a 
performance hit in terms of both time and especially space. I have some numbers 
and a plan...

The numbers below are from just starting up the browser, navigating to a page, 
and then shutting down. Note that the info about typelib file loading would not 
be affected by use of other parts of the product, but the numbers on which IIs 
are actually used would change somewhat if we, say, used some of the mail/news 
features in this test run.

I wrote a very simple arena suballocator for libxpt that can be turned on for 
decoding only. It is not currently part of the build. There are some obvious 
things that could be done to improve its use of space. But the stats gathered 
from it are of real interest (since they reflect the memory usage of the current 
builds):

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
Start xpt memory use stats

14779 times arena malloc called
305343 total bytes requested from arena malloc
20 average bytes requested per call to arena malloc
30 average bytes used per call to arena malloc

1452 during loading times arena free called
38824 during loading total bytes requested to free
50440 during loading total bytes not freed

14771 total times arena free called
331184 total bytes requested to free
449352 total bytes not freed

2 total times arena realloc called to shrink
8 total times arena realloc called to alloc
48 total bytes requested in realloc alloc
0 total times arena realloc called to free
0 total bytes not freed in realloc free
8 total times arena realloc called to grow
144 total bytes not freed in realloc grow
174 total bytes used in realloc grow

45 times arena called system malloc
460800 total bytes arena requested from system

End xpt memory use stats
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

Timewise the fact that malloc is called 14779 times is significant. Spacewise 
305k is being requested and 50k is being released. Since the blocks are on 
average small the spacewise heap overhead matters a lot. We are using upwards of 
1/2 a meg to hold II structs in libxpt!
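
As a quick check on those figures, the arithmetic can be redone directly from 
the dump. This is only a sketch that restates numbers printed above; the 
variable names are mine, and the overhead it reports covers only what the arena 
itself holds, not whatever bookkeeping the system malloc adds on top.

```python
# Figures copied from the "xpt memory use stats" dump above.
malloc_calls = 14779
bytes_requested = 305343
system_malloc_calls = 45
bytes_from_system = 460800

# Average size of a single request made to the arena.
avg_requested = bytes_requested / malloc_calls

# Average size of each chunk the arena obtained from the system allocator.
avg_chunk_size = bytes_from_system / system_malloc_calls

# Bytes the arena holds beyond what callers actually asked for.
arena_overhead = bytes_from_system - bytes_requested
overhead_ratio = bytes_from_system / bytes_requested

print(f"average request:     {avg_requested:.1f} bytes")
print(f"average chunk size:  {avg_chunk_size:.0f} bytes")
print(f"held but not asked:  {arena_overhead} bytes")
print(f"held / requested:    {overhead_ratio:.2f}x")
```

The 45 system allocations average 10240 bytes each, and the arena holds roughly 
1.5 times what was actually requested.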

I then instrumented nsInterfaceInfoManager to get some stats on our actual usage 
of the II structs that are loaded...

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
START interface info stats

Interface infos that were actually used (useCount,name,filename):
301, nsISupports, xpcom_base.xpt
1, nsIRDFCompositeDataSource, rdf.xpt
1, nsIProgressEventSink, necko.xpt
2, nsILocale, locale.xpt
3, nsIRDFResource, rdf.xpt
1, nsIHTTPNotify, necko_http.xpt
36, nsIGlobalHistory, history.xpt
36, nsIBookmarksService, bookmarks.xpt
2, nsIRDFRemoteDataSource, rdf.xpt
1, nsINetNotify, necko.xpt
1, nsIFileSpec, xpcom_io.xpt
21, nsIJSIID, xpconnect.xpt
1, nsIRDFObserver, rdf.xpt
21, nsIXPCComponents_Interfaces, xpconnect.xpt
35, nsIInternetSearchService, search.xpt
2, nsISimpleEnumerator, xpcom_ds.xpt
3, nsIPref, pref.xpt
2, nsIDialogParamBlock, appshell.xpt
2, nsIJSCID, xpconnect.xpt
2, nsIAppShellService, appshell.xpt
1, nsIXPCComponents, xpconnect.xpt
3, nsIRDFDataSource, rdf.xpt
4, nsIStringBundleService, intl.xpt
2, nsIBrowserInstance, mozbrwsr.xpt
2, nsIRDFNode, rdf.xpt
4, nsIRDFService, rdf.xpt
18, nsIController, rdf.xpt
2, nsIRDFLiteral, rdf.xpt
2, nsIStringBundle, intl.xpt
2, nsICmdLineHandler, appshell.xpt
2, nsIXPCException, xpconnect.xpt
2, nsIRegistryDataSource, regviewer.xpt
20, nsICurrentCharsetListener, uconv.xpt
2, nsIFileLocator, appshell.xpt
4, nsILocaleService, locale.xpt
2, nsIXPCComponents_Classes, xpconnect.xpt
2, nsIProfile, profile.xpt
=======================
37 of 498 interfaces accessed

Interface info files that were actually used:
xpconnect.xpt
xpcom_io.xpt
xpcom_ds.xpt
xpcom_base.xpt
uconv.xpt
search.xpt
regviewer.xpt
rdf.xpt
profile.xpt
pref.xpt
necko_http.xpt
necko.xpt
mozbrwsr.xpt
locale.xpt
intl.xpt
history.xpt
bookmarks.xpt
appshell.xpt
=======================
18 of 83 interface info files accessed

for each file (interfaces used, total interfaces, filename):
2, 4, locale.xpt
1, 1, pref.xpt
9, 26, rdf.xpt
1, 3, search.xpt
2, 36, necko.xpt
1, 1, bookmarks.xpt
1, 1, profile.xpt
1, 30, xpcom_ds.xpt
4, 9, appshell.xpt
1, 1, history.xpt
1, 17, xpcom_io.xpt
1, 3, uconv.xpt
2, 2, intl.xpt
1, 1, regviewer.xpt
1, 6, xpcom_base.xpt
6, 24, xpconnect.xpt
1, 4, necko_http.xpt
1, 1, mozbrwsr.xpt
=======================
37 of 170 interfaces accessed in interface info files accessed

END interface info stats
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

In the first table above we see that less than 10% of the loaded IIs were 
actually used (without running mail/news, etc.). Note that the usage counts are 
based on how many times nsInterfaceInfoManager (IIM) was asked for this data. 
The clients of IIM do their own caching, so the counts do not reflect every use.

The second table shows that only about 22% (18 of 83) of the typelib files 
loaded were accessed at all.

The third table shows we accessed only about 20% (37 of 170) of the interfaces 
in the typelibs that had *any* interfaces accessed. This is because we currently 
use xpt_link to link the typelibs in each idl directory.
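
The three ratios can be recomputed from the tables themselves. The sketch below 
copies the per-file rows from the third table and checks that they add up to 
the printed totals; the helper name `percent` is just an illustration.

```python
# (interfaces used, total interfaces, filename), copied from the third table.
per_file = [
    (2, 4, "locale.xpt"), (1, 1, "pref.xpt"), (9, 26, "rdf.xpt"),
    (1, 3, "search.xpt"), (2, 36, "necko.xpt"), (1, 1, "bookmarks.xpt"),
    (1, 1, "profile.xpt"), (1, 30, "xpcom_ds.xpt"), (4, 9, "appshell.xpt"),
    (1, 1, "history.xpt"), (1, 17, "xpcom_io.xpt"), (1, 3, "uconv.xpt"),
    (2, 2, "intl.xpt"), (1, 1, "regviewer.xpt"), (1, 6, "xpcom_base.xpt"),
    (6, 24, "xpconnect.xpt"), (1, 4, "necko_http.xpt"), (1, 1, "mozbrwsr.xpt"),
]

# Totals printed under the first two tables.
loaded_interfaces = 498
loaded_files = 83


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


used_interfaces = sum(used for used, _, _ in per_file)
interfaces_in_used_files = sum(total for _, total, _ in per_file)
used_files = len(per_file)

print(f"interfaces used overall:       {percent(used_interfaces, loaded_interfaces):.1f}%")
print(f"files touched:                 {percent(used_files, loaded_files):.1f}%")
print(f"interfaces used in those files: {percent(used_interfaces, interfaces_in_used_files):.1f}%")
```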

Also, Purify shows me that using my simple arena code made loading the same full 
set of typelibs take only 25% of the time used without the arena code (on NT 
optimized).

We have other data that shows that searching for and loading many .xpt files at 
startup is using too much time. For beta1 it was decided to use xpt_link at 
package time to build a big .xpt file. This way we spend less time finding .xpt 
files and doing file i/o to open/read/close them. (see bug 28964)

All of the above leads me to believe that the best approach is a combination of:

1) Stop using xpt_link in the building of the .idl directories. Instead export 
all .xpt files to dist.

2) Use information from this runtime instrumentation to build a manifest of 
.xpt files that we use. This will be based on some set of 'normal' application 
activities. This manifest will be used to drive xpt_link and build a big .xpt 
file.

3) Add support to use the zipfile support in libjar for loading .xpt files out 
of .zip files.

4) Zip up all the .xpt files *not* merged in item '2'. This allows us to deliver 
one or a few large .xpt files with the most used IIs and one or a few .zip files 
with all the rest of the IIs. A driving factor here is that the typelib format 
uses a data pool scheme and thus when reading *any* IIs from a typelib it is 
probably best to read them all. So, the commonly used IIs will go in big linked 
.xpt files (and be read all together) and the less used ones will go in .zip 
files as separate subfiles that can be individually extracted into memory and 
converted by libxpt. (Measuring performance of reading from zipfiles to 
check the viability of this part of the plan comes next).

5) The other big piece is that we need to extend libxpt so that it can extract 
the interface table without reading everything in the file. The interaction 
would be: a) IIM asks libxpt how many bytes to read from a file to read just the 
header. b) IIM reads in a file's header and hands that to libxpt. c) libxpt 
figures out from the header info how many bytes need to be read in order to read 
the interface table and responds with that number to IIM. d) IIM reads in those 
bytes from the file and gives them to libxpt, which builds the in-memory 
interface table.

6) Given item '5' then we can get rid of the requirement to read in .xpt files 
each and every time we start the app. Instead we can build a manifest table of 
(iface_name, uuid, zip_file_name, file_name) records *only* at autoreg time. 
Just like with DLLs we will require that in order for the system to know about 
new II files (.xpt and .zip) it will have to be told to do autoreg. We can 
connect this to the same DLL autoreg system. So, only at autoreg time do we 
build the manifest of mappings of IIs to files. We can store this in a file in 
the file system or in the registry. We load that manifest at each startup. The 
IIM then builds a small in memory record of each interface without touching any 
typelib files. This allows us to enumerate the interfaces, etc. It lazily loads 
any interface for which the info that we need from the typelibs is requested. If 
we read one II from a file then we read all IIs from that file. So, the commonly 
used IIs will be gang loaded from the big .xpt file(s). The rarely used 
IIs will be grabbed from the .zip file(s) as needed. Typelibs for extensions 
(etc.) can be dropped into the components directory (or elsewhere if we support 
that) as .xpt or .zip files and autoreg forced. Note that we might want to use 
some other file extension instead of "zip", e.g. "xar" or something.

7) The above will save hugely on space and time. We can then decide whether or 
not to use the arena suballocator scheme. Its weaknesses are that it can't 
release any memory, it needs to align memory, and it needs to track block size 
to support realloc. With some tuning in libxpt I think that the wasted 
non-released memory might be smaller. The XPT_Malloc calls could specify 
alignment requirements to avoid byte waste. And we might be able to segregate 
the very few allocs that require later reallocs so that arena alloc'd blocks 
need not carry size info. So, with just a little work we may save a bunch more 
memory here by tightly packing these blocks.
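
To make the alignment and size-header costs in item 7 concrete, here is a toy 
model of a bump allocator that only tracks offsets. It is not the libxpt arena 
code: the 8-byte forced alignment, the 4-byte size header, and the request list 
are all assumptions chosen for illustration.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


class BumpArena:
    """Toy bump allocator: tracks offsets only, never frees anything."""

    def __init__(self, header_size, forced_alignment=None):
        self.header_size = header_size            # bytes of size info per block
        self.forced_alignment = forced_alignment  # None means trust the caller
        self.offset = 0                           # bytes consumed so far

    def alloc(self, size, alignment):
        """Reserve size bytes and return the offset where the block starts."""
        if self.forced_alignment is not None:
            alignment = self.forced_alignment
        # The size header sits just before the aligned start of the block.
        start = align_up(self.offset + self.header_size, alignment)
        self.offset = start + size
        return start


# Hypothetical (size, alignment) requests: short strings need no alignment,
# small structs are assumed to need 4-byte alignment.
requests = [(1, 1), (2, 1), (20, 4), (3, 1), (16, 4)]

conservative = BumpArena(header_size=4, forced_alignment=8)
tuned = BumpArena(header_size=0)  # caller-specified alignment, no size header
for size, alignment in requests:
    conservative.alloc(size, alignment)
    tuned.alloc(size, alignment)

print("bytes requested:       ", sum(size for size, _ in requests))
print("conservative arena use:", conservative.offset)
print("tuned arena use:       ", tuned.offset)
```

With these made-up requests the conservative arena consumes 72 bytes for 42 
bytes of payload, while the tuned one consumes 44. Dropping the header is only 
safe for blocks that never need realloc, which is why those allocations would 
have to be segregated.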
 
I think that we can implement the above without a huge amount of work and that 
it all constitutes a reasonable tradeoff a huge will in footprint.
er, "...and a huge win in footprint"
Status: NEW → ASSIGNED
Very very cool.
Blocks: 27510
Keywords: perf
A status update...

I have this substantially working.

I did make a strategy change. Rather than extend libxpt to read only headers, I 
realized that the times when we need to go back and look at exactly what is in a 
changed xpt file is the perfect time to look deeply into the file and make 
certain that it is not in conflict with any other xpt file. As we get past the 
stage of locked-down shipped interface definitions we need to check that 
modified interface definitions do not creep in. This is especially dangerous in 
add-on components where some developer might foolishly modify some shipping 
interface definition. This is a good place to guard against that and to warn 
developers if they do something stupid. I have a scheme for preserving old 
interface definition search order and for knowing which definitions are suspect 
as new xpt files are added. I think that this is important.
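
A minimal sketch of the kind of check I mean, assuming each xpt file can be 
reduced to a map from IID to a (name, fingerprint) pair. The fingerprint, the 
function name, and the file and interface names below are all hypothetical; 
this is not the xpti code, just the first-definition-wins idea.

```python
def find_conflicts(files_in_search_order):
    """Accept the first definition of each IID; flag later ones that differ.

    files_in_search_order is a list of (filename, interfaces) pairs, where
    interfaces maps an IID string to a (name, fingerprint) tuple.
    """
    accepted = {}   # iid -> (filename, name, fingerprint) from the earliest file
    conflicts = []  # (iid, name, accepted_filename, suspect_filename)
    for filename, interfaces in files_in_search_order:
        for iid, (name, fingerprint) in interfaces.items():
            if iid not in accepted:
                accepted[iid] = (filename, name, fingerprint)
            elif accepted[iid][2] != fingerprint:
                conflicts.append((iid, name, accepted[iid][0], filename))
    return accepted, conflicts


# Placeholder data: shipped.xpt is earlier in the search order than addon.xpt.
files = [
    ("shipped.xpt", {"iid-A": ("nsIFirst", "f1"), "iid-B": ("nsIExample", "f2")}),
    ("addon.xpt", {"iid-B": ("nsIExample", "f2-modified"), "iid-C": ("nsINew", "f3")}),
]

accepted, conflicts = find_conflicts(files)
for iid, name, kept, suspect in conflicts:
    print(f"warning: {name} ({iid}) in {suspect} differs from {kept}")
```

Because earlier files win, adding a new xpt file can never silently replace a 
definition that was already accepted; it can only add new IIDs or be flagged.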

As far as what is currently working goes... My code is now reading and writing 
interface manifests and doing incremental on-demand loading of interface 
definitions. Mozilla runs. Loading from .zip files is working. AutoReg does a 
full grovel. I have frameworks in place for doing interface definition 
verification, but I need to add code to do that. I also need to add the code 
that will do the minimal work when no xpt/zip files have changed or when only 
additions have been made.
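
The manifest-plus-on-demand-loading behavior can be sketched roughly as below. 
The record fields follow the (iface_name, uuid, zip_file_name, file_name) tuple 
from the original plan; the class names, the loader callback, and the sample 
files are assumptions for illustration and do not mirror the real xpti code or 
the xpti.dat format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestRecord:
    iface_name: str
    uuid: str
    zip_file_name: Optional[str]  # None when the .xpt file is not in an archive
    file_name: str


class LazyInterfaceInfoManager:
    """Keeps one small record per interface; loads typelibs only on demand."""

    def __init__(self, records, load_file):
        self._by_name = {r.iface_name: r for r in records}
        # Hypothetical callback: (zip_file_name, file_name) -> {iface_name: details}
        self._load_file = load_file
        self._details = {}
        self.files_loaded = []

    def names(self):
        """Enumerate interfaces without touching any typelib file."""
        return sorted(self._by_name)

    def get_details(self, iface_name):
        """Return full info, loading every II in the owning file on first use."""
        if iface_name not in self._details:
            rec = self._by_name[iface_name]
            self._details.update(self._load_file(rec.zip_file_name, rec.file_name))
            self.files_loaded.append((rec.zip_file_name, rec.file_name))
        return self._details[iface_name]


# Made-up typelib contents standing in for what libxpt would decode.
FAKE_FILES = {
    (None, "common.xpt"): {"nsISupports": "info:nsISupports", "nsIPref": "info:nsIPref"},
    ("rare.zip", "regviewer.xpt"): {"nsIRegistryDataSource": "info:nsIRegistryDataSource"},
}


def fake_load_file(zip_file_name, file_name):
    return dict(FAKE_FILES[(zip_file_name, file_name)])


records = [
    ManifestRecord("nsISupports", "uuid-1", None, "common.xpt"),
    ManifestRecord("nsIPref", "uuid-2", None, "common.xpt"),
    ManifestRecord("nsIRegistryDataSource", "uuid-3", "rare.zip", "regviewer.xpt"),
]

manager = LazyInterfaceInfoManager(records, fake_load_file)
print(manager.names(), manager.files_loaded)   # enumeration loads nothing
manager.get_details("nsISupports")
manager.get_details("nsIPref")                 # already gang-loaded with nsISupports
print(manager.files_loaded)
```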

The memory footprint looks like it is going down from ~400K to 
~50K.

I intend to soon check in my changes to libxpt and changes to xptinfo that will 
allow it to conditionally compile to use either the old or new xptinfo scheme. 
When my work is done we'll make it use only the new scheme and cvs remove the 
old files.
When you do implement .zip support you will be giving them some unique 
extension or directory, right? It would be unfriendly to steal the generic .zip 
extension. May I suggest .xptz? I don't really care, though, as long as it's not 
.jar, .zip, or some other already-taken extension.
zip support *is* implemented. We're just not making use of it in our 
packaging yet. To try it out do something like:

del xpti.dat
zip nsi.zip nsI*.xpt
del nsI*.xpt

Dan, didn't we have this conversation? Right now xpti will scan .zip and .jar 
files for top level .xpt entries. We could change that, but I don't see why. 
Putting .xpt files into .zip or .jar files does not preclude putting other things 
in files with those extensions or into the very same files. We might even extend 
the JS component loader to load .js files from jars and let people just drop 
.jar files containing .js and .xpt files into the components directory and 
force an autoreg without having to unzip or anything.
Oops, I was doing my earlier testing in a tree that exported .xpt files without 
doing xpt_link so there *were* nsI*.xpt files. If anyone cares to play with this 
zip loader stuff then try instead:

del xpti.dat
zip nsi.zip xpc*.xpt
del xpc*.xpt

Also I see that the progid for the zipreader changed. This broke my code. I just 
checked in a fix to xpti to use the new progid. Should work on the tip.



Was any of this ever checked in?
selmer: This pretty much got morphed into bug 46707 (about which your same 
question could be asked!). The .zip support did get implemented. However, a 
different strategy for packaging evolved. The packaging changes were not 
implemented, so the gain is still to be made. I'll close this bug and comment 
in bug 46707.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Component: xpidl → XPCOM
QA Contact: mike+mozilla → xpcom