Closed Bug 30753 Opened 20 years ago Closed 19 years ago
typelib loading improvements for time and space performance
We are still using our original simple-minded scheme of loading all interface infos (II) from all typelibs at startup. As things have evolved this is now a performance hit in terms of both time and especially space. I have some numbers and a plan...

The numbers below are from just starting up the browser, navigating to a page, and then shutting down. Note that the info about typelib file loading would not be affected by use of other parts of the product, but the numbers on which IIs are actually used would change some if we, say, used some of the mail/news features in this test run.

I wrote a very simple arena suballocator for libxpt that can be turned on for decoding only. It is not currently part of the build. There are some obvious things that can be done to improve its use of space. But the stats gathered from it are of real interest (since they reflect the memory usage of the current builds):

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
Start xpt memory use stats
14779 times arena malloc called
305343 total bytes requested from arena malloc
20 average bytes requested per call to arena malloc
30 average bytes used per call to arena malloc
1452 during loading times arena free called
38824 during loading total bytes requested to free
50440 during loading total bytes not freed
14771 total times arena free called
331184 total bytes requested to free
449352 total bytes not freed
2 total times arena realloc called to shrink
8 total times arena realloc called to alloc
48 total bytes requested in realloc alloc
0 total times arena realloc called to free
0 total bytes not freed in realloc free
8 total times arena realloc called to grow
144 total bytes not freed in realloc grow
174 total bytes used in realloc grow
45 times arena called system malloc
460800 total bytes arena requested from system
End xpt memory use stats
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()

Timewise the fact that malloc is called 14779 times is significant.
Spacewise 305k is being requested and 50k is being released. Since the blocks are on average small the spacewise heap overhead matters a lot. We are using upwards of 1/2 a meg to hold II structs in libxpt!

I then instrumented nsInterfaceInfoManager to get some stats on our actual usage of the II structs that are loaded...

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
START interface info stats
Interface infos that were actually used (useCount,name,filename):
301, nsISupports, xpcom_base.xpt
1, nsIRDFCompositeDataSource, rdf.xpt
1, nsIProgressEventSink, necko.xpt
2, nsILocale, locale.xpt
3, nsIRDFResource, rdf.xpt
1, nsIHTTPNotify, necko_http.xpt
36, nsIGlobalHistory, history.xpt
36, nsIBookmarksService, bookmarks.xpt
2, nsIRDFRemoteDataSource, rdf.xpt
1, nsINetNotify, necko.xpt
1, nsIFileSpec, xpcom_io.xpt
21, nsIJSIID, xpconnect.xpt
1, nsIRDFObserver, rdf.xpt
21, nsIXPCComponents_Interfaces, xpconnect.xpt
35, nsIInternetSearchService, search.xpt
2, nsISimpleEnumerator, xpcom_ds.xpt
3, nsIPref, pref.xpt
2, nsIDialogParamBlock, appshell.xpt
2, nsIJSCID, xpconnect.xpt
2, nsIAppShellService, appshell.xpt
1, nsIXPCComponents, xpconnect.xpt
3, nsIRDFDataSource, rdf.xpt
4, nsIStringBundleService, intl.xpt
2, nsIBrowserInstance, mozbrwsr.xpt
2, nsIRDFNode, rdf.xpt
4, nsIRDFService, rdf.xpt
18, nsIController, rdf.xpt
2, nsIRDFLiteral, rdf.xpt
2, nsIStringBundle, intl.xpt
2, nsICmdLineHandler, appshell.xpt
2, nsIXPCException, xpconnect.xpt
2, nsIRegistryDataSource, regviewer.xpt
20, nsICurrentCharsetListener, uconv.xpt
2, nsIFileLocator, appshell.xpt
4, nsILocaleService, locale.xpt
2, nsIXPCComponents_Classes, xpconnect.xpt
2, nsIProfile, profile.xpt
=======================
37 of 498 interfaces accessed

Interface info files that were actually used:
xpconnect.xpt
xpcom_io.xpt
xpcom_ds.xpt
xpcom_base.xpt
uconv.xpt
search.xpt
regviewer.xpt
rdf.xpt
profile.xpt
pref.xpt
necko_http.xpt
necko.xpt
mozbrwsr.xpt
locale.xpt
intl.xpt
history.xpt
bookmarks.xpt
appshell.xpt
=======================
18 of 83 interface info files accessed

for each file (interfaces used, total interfaces, filename):
2, 4, locale.xpt
1, 1, pref.xpt
9, 26, rdf.xpt
1, 3, search.xpt
2, 36, necko.xpt
1, 1, bookmarks.xpt
1, 1, profile.xpt
1, 30, xpcom_ds.xpt
4, 9, appshell.xpt
1, 1, history.xpt
1, 17, xpcom_io.xpt
1, 3, uconv.xpt
2, 2, intl.xpt
1, 1, regviewer.xpt
1, 6, xpcom_base.xpt
6, 24, xpconnect.xpt
1, 4, necko_http.xpt
1, 1, mozbrwsr.xpt
=======================
37 of 170 interfaces accessed in interface info files accessed
END interface info stats
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

In the first table above we see that less than 10% of the loaded IIs were actually used (without running mail/news etc.). Note that the usage counts are based on how many times nsInterfaceInfoManager (IIM) was asked for this data. The clients of IIM do their own caching.

The second table shows that only about 25% of the typelib files loaded were accessed at all.

The third table shows we accessed only about 20% of the interfaces of each typelib that had *any* interfaces accessed. This is because we currently use xpt_link to link the typelibs in each idl directory.

Also, Purify shows me that using my simple arena code made loading the same full set of typelibs take only 25% of the time used without the arena code (on NT optimized).

We have other data that shows that searching for and loading many .xpt files at startup is using too much time. For beta1 it was decided to use xpt_link at package time to build a big .xpt file. This way we spend less time finding .xpt files and doing file i/o to open/read/close them. (See bug 28964.)

All of the above leads me to believe that the best approach is a combination of:

1) Stop using xpt_link in the building of the .idl directories. Instead export all .xpt files to dist.

2) Use information from this runtime instrumentation to build a manifest of .xpt files that we use.
This will be based on some set of 'normal' application activities. This manifest will be used to drive xpt_link and build a big .xpt file.

3) Add support to use the zipfile support in libjar for loading .xpt files out of .zip files.

4) Zip up all the .xpt files *not* merged in item '2'. This allows us to deliver one or a few large .xpt files with the most used IIs and one or a few .zip files with all the rest of the IIs. A driving factor here is that the typelib format uses a data pool scheme and thus when reading *any* IIs from a typelib it is probably best to read them all. So, the commonly used IIs will go in big linked .xpt files (and be read all together) and the less used ones will go in .zip files as separate subfiles that can be individually extracted into memory and converted by libxpt. (Measuring performance of reading from zipfiles to check the viability of this part of the plan comes next.)

5) The other big piece is that we need to extend libxpt so that it can extract the interface table without reading everything in the file. The interaction would be:
a) IIM asks libxpt how many bytes to read from a file to read just the header.
b) IIM reads in a file's header and hands that to libxpt.
c) libxpt figures out from the header info how many bytes need to be read in order to read the interface table and responds with that number to IIM.
d) IIM reads in those bytes from the file and gives them to libxpt, which builds the in-memory interface table.

6) Given item '5' we can then get rid of the requirement to read in .xpt files each and every time we start the app. Instead we can build a manifest table of (iface_name, uuid, zip_file_name, file_name) records *only* at autoreg time. Just like with DLLs we will require that in order for the system to know about new II files (.xpt and .zip) it will have to be told to do autoreg. We connect this to the same DLL autoreg system. So, only at autoreg time do we build the manifest of mappings of IIs to files.
We can store this in a file in the file system or in the registry. We load that manifest at each startup. The IIM then builds a small in-memory record of each interface without touching any typelib files. This allows us to enumerate the interfaces, etc. It lazily loads any interface for which the info that we need from the typelibs is requested. If we read one II from a file then we read all IIs from that file. So, the commonly used IIs will be gang loaded from the big .xpt file(s). The rarely used IIs will be grabbed from the .zip file(s) as needed. Typelibs for extensions (etc.) can be dropped into the components directory (or elsewhere if we support that) as .xpt or .zip files and autoreg forced. Note that we might want to use some other file extension instead of "zip"; e.g. "xar" or something.

7) The above will save hugely on space and time. We can then decide whether or not to use the arena suballocator scheme. Its weaknesses are that it can't release any memory, it needs to align memory, and it needs to track block size to support realloc. With some tuning in libxpt I think that the wasted non-released memory might be smaller. The XPT_Malloc calls could specify alignment requirements to avoid byte waste. And we might be able to segregate the very few allocs that require later reallocs so that arena alloc'd blocks need not carry size info. So, with just a little work we may save a bunch more memory here by tightly packing these blocks.

I think that we can implement the above without a huge amount of work and that it all constitutes a reasonable tradeoff a huge will in footprint.
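
For illustration only, here is a rough Python sketch of the manifest-driven lazy loading described in item 6. It is not libxpt or xptinfo code. The class and function names (InterfaceRecord, LazyInterfaceInfoManager, read_typelib) are made up for the example, and the decoded typelib is stood in for by a plain dict. The only things taken from the plan are the (iface_name, uuid, zip_file_name, file_name) record shape, enumeration without touching typelib files, and "read one II from a file, read them all".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# One manifest row as listed in item 6:
# (iface_name, uuid, zip_file_name, file_name). zip_file_name is None
# for a plain .xpt file that is not inside an archive.
ManifestRow = Tuple[str, str, Optional[str], str]


@dataclass
class InterfaceRecord:
    """Small in-memory record built from the manifest alone (hypothetical)."""
    name: str
    uuid: str
    zip_file_name: Optional[str]
    file_name: str
    info: Optional[dict] = None  # stays None until the typelib is read


class LazyInterfaceInfoManager:
    """Hypothetical stand-in for the IIM; not the real class."""

    def __init__(self,
                 manifest: List[ManifestRow],
                 read_typelib: Callable[[Optional[str], str], Dict[str, dict]]):
        # read_typelib is an assumed callback that decodes one typelib and
        # returns {iface_name: decoded_info} for every interface in it.
        self._read_typelib = read_typelib
        self._by_name = {row[0]: InterfaceRecord(*row) for row in manifest}
        self.files_read: List[Tuple[Optional[str], str]] = []

    def names(self) -> List[str]:
        # Enumeration works from the manifest; no typelib file is touched.
        return sorted(self._by_name)

    def get_info(self, name: str) -> dict:
        rec = self._by_name[name]
        if rec.info is None:
            self._load_file(rec.zip_file_name, rec.file_name)
        return rec.info

    def _load_file(self, zip_file_name: Optional[str], file_name: str) -> None:
        # Gang load: resolve every interface that the manifest maps to this
        # file. A KeyError here would mean the manifest is stale.
        self.files_read.append((zip_file_name, file_name))
        decoded = self._read_typelib(zip_file_name, file_name)
        for other in self._by_name.values():
            if (other.zip_file_name, other.file_name) == (zip_file_name, file_name):
                other.info = decoded[other.name]
```

With a setup like this, asking for one commonly used interface pulls in everything from the big linked .xpt file in a single read, while a rarely used interface only costs a read of its own subfile.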
er, "...and a huge win in footprint"
Very very cool.
A status update...

I have this substantially working. I did make a strategy change. Rather than extend libxpt to read only headers, I realized that the time when we need to go back and look at exactly what is in a changed xpt file is the perfect time to look deeply into the file and make certain that it is not in conflict with any other xpt file. As we get past the stage of locked-down shipped interface definitions we need to check that modified interface definitions do not creep in. This is especially dangerous in add-on components where some developer might foolishly modify some shipping interface definition. This is a good place to guard against that and to warn developers if they do something stupid. I have a scheme for preserving old interface definition search order and for knowing which definitions are suspect as new xpt files are added. I think that this is important.

As far as what is currently working goes... My code is now reading and writing interface manifests and doing incremental on-demand loading of interface definitions. Mozilla runs. Loading from .zip files is working. AutoReg does a full grovel. I have frameworks in place for doing interface definition verification, but I need to add code to do that. I also need to add the code that will do the minimal work when no xpt/zip files have changed or when only additions have been made.

The memory footprint looks like it is going down from ~400K to ~50K.

I intend to soon check in my changes to libxpt and changes to xptinfo that will allow it to conditionally compile to use either the old or new xptinfo scheme. When my work is done we'll make it use only the new scheme and cvs remove the old files.
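
To make the conflict check concrete, here is one way it could look, as a Python sketch. This is an assumption-laden illustration, not the xptinfo verification code (which the comment above says is not written yet). The function name find_suspect_definitions and the idea of comparing a (uuid, digest) pair per interface are hypothetical. The sketch only captures two points from the comment: the earlier file in search order keeps winning, and a later file that redefines an interface differently gets flagged.

```python
from typing import Dict, List, Tuple

# Hypothetical summary of one interface definition: its uuid plus some
# digest of the decoded definition. How to compare two definitions for
# real is left open here.
Definition = Tuple[str, str]


def find_suspect_definitions(
    files_in_search_order: List[Tuple[str, Dict[str, Definition]]],
):
    """Return (winners, suspects).

    winners maps iface_name -> (file_name, definition) for the first file
    in search order that defines the interface. suspects lists
    (iface_name, offending_file, winning_file) for every later file whose
    definition of that interface differs from the winner's.
    """
    winners: Dict[str, Tuple[str, Definition]] = {}
    suspects: List[Tuple[str, str, str]] = []
    for file_name, definitions in files_in_search_order:
        for iface_name, definition in definitions.items():
            if iface_name not in winners:
                # First definition seen in search order wins.
                winners[iface_name] = (file_name, definition)
                continue
            winner_file, winner_def = winners[iface_name]
            if definition != winner_def:
                # Same name, different uuid or contents: warn about it.
                suspects.append((iface_name, file_name, winner_file))
    return winners, suspects
```

An identical duplicate in a later file is ignored rather than flagged, since it cannot change what any client sees.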
When you do implement .zip support you will be giving them some unique extension or directory, right? It would be unfriendly to steal the generic .zip extension. May I suggest .xptz? I don't really care, though, as long as it's not .jar, .zip or some other already taken extension.
zip support *is* implemented. We're just not making use of it in our packaging yet. To try it out do something like:

del xpti.dat
zip nsi.zip nsI*.xpt
del nsI*.xpt

Dan, didn't we have this conversation? Right now xpti will scan .zip and .jar files for top level .xpt entries. We could change that, but I don't see why. Putting .xpt files into .zip or .jar files does not preclude putting other things in files with those extensions or into the very same files. We might even extend the JS component loader to load .js files from jars and let people just drop .jar files containing .js and .xpt files into the components directory and force an autoreg without having to unzip or anything.
Oops, I was doing my earlier testing in a tree that exported .xpt files without doing xpt_link so there *were* nsI*.xpt files. If anyone cares to play with this zip loader stuff then try instead:

del xpti.dat
zip nsi.zip xpc*.xpt
del xpc*.xpt

Also I see that the progid for the zipreader changed. This broke my code. I just checked in a fix to xpti to use the new progid. Should work on the tip.
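
For anyone who wants to see what "scan for top level .xpt entries" amounts to, here is a small sketch using Python's standard zipfile module rather than libjar. The function names are made up for the example, and the case-sensitive ".xpt" match is an assumption (how xpti actually matches names is not stated here).

```python
import zipfile


def top_level_xpt_entries(archive):
    """List .xpt entries at the root of a zip/jar, skipping nested ones.

    archive may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist()
                if name.endswith(".xpt") and "/" not in name]


def read_xpt_entry(archive, entry_name):
    """Extract one entry into memory as bytes, without unzipping to disk."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(entry_name)
```

Other entries in the same archive (.js files, say) are simply ignored by the scan, which is why sharing the .zip or .jar extension does not get in anybody's way.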
Was any of this ever checked in?
selmer: This pretty much got morphed into bug 46707 (about which your same question could be asked!). The .zip support did get implemented. However, a different strategy for packaging evolved. The packaging changes were not implemented, so the gain is still to be made. I'll close this bug and comment in bug 46707.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED