Closed Bug 28964 Opened 20 years ago Closed 20 years ago
_Init XPCOM() takes 3 .35 secs out of 22 secs of startup
921 bytes, patch
|Details | Diff | Splinter Review|
Function % #call Time (secs) ----------------------------------------------------------- main 100.00 1 25.35 main1 86.79 1 22.00 NS_InitXPCOM 13.21 1 3.35 That is about 15% of startup. Most of it is xpt. Will post more details.
Nominating for beta1 for startup performance
Function % #call Time (milsecs) ----------------------------------------------------------- NS_InitXPCOM 13.21 1 3350 XPTI_GetInterfaceInfoManager 84.66 1 2835 nsComponentManagerImpl::Init 8.43 1 282 PlatformPrePopulateRegistry 6.64 1 222 RegisterGenericFactory 0.07 35 2 So as we can see, 85% of the init xpcom time is being spent in XPTI_GetInterfaceInfoManager. We need to optmize the heck out of XPTI_InterfaceInfoManager, CompMgr::Init() and CompMgr::PrePopulateRegistry()
dp, Are you saying that you want .xpt loading out of NS_InitXPCOM or that you just want it to be faster? The xpt files are going to get loaded before we show a window anyway. NS_InitXPCOM is not such a bad place. Various people have looked into solutions without finding anything that makes a lot of difference. There have been newsgroup threads on the topic. Some options that come to mind... 1) tune the set of .xpt file to the minimum set needed. 2) link together .xpt files in the package phase into fewer large files. 3) load xpt stuff more incrementally. 4) look for speed problems in the .xpt loading code. notes: - There are real platform issues here. sfraser was seeing a lot of cost in iterating the directory and stat'ing files looking for .xpt files on Mac. - Incremental loading schemes would require either prereading the file headers to find *where* particular interface info is, or building a manifest(s) ahead of time and loading *it*. Without careful tuning of sets of files with sets of interfaces that are known to be needed and loaded together there is real danger that we'll be spending a lot of time going back again and again to the file system - incremental loading could be a net loss. The real point here is that too much of our info on how to improve this is anecdotal. It may be that we can't help it much. I agree that this is a good place to do some focused work to get the numbers and evaluate options. I don't mind doing some of that myself.
I wanted to see if we can improve the performance of XPTI_InterfaceInfoManager. Here is more data for you: Function % #call Time (secs) ------------------------------------------------------------------- initInterfaceTables 100.00 1 2835.37 nsInterfaceInfoManager::indexify_file 90.05 83 2553.24 nsFileSpec::IsFile 9.60 185 272.10 All that you said sounds good. Reassigning to jband. Doing incremental stuff would be good. Let us talk more about this. Do you bandwidth to tackle this for beta1
Assignee: dp → jband
Status: ASSIGNED → NEW
Putting on PDT- radar for beta. If you get compelling data to guarantee improvement, with low risk, can do for beta1.
I'm clearing PDT- since I think we need to improve startup time and since DP said in mail he'd like the PDT to reconsider it.
dp, I got sidetracked today to work on a crashing bug. I did want to relay some info... My quantify results on a fast (450mhz) NT machine with a fast disk subsystem and no files in cache running a release build shows NS_InitXPCOM taking about 1/3 of a second. Of course most machines will be a lot slower. Here is the breakdown I see: (each indent level accounts for 100% of previous level) 100% NS_InitXPCOM 48% XPTI_GetInterfaceInfoManager 30% iterating files and doing stat to find .xpt files 178 calls 66% nsInterfaceInfoManager::indexify_file 16% doing stat (again) for file size 88 calls 72% XPT_DoHeader (in memory only - malloc/free/memcpy/etc) > 50,000 calls 1% doing file reads (only!?!) 40% nsCompentManager::PrePopulateRegistry 30% nsID::Parse (1006 calls - uses PR_sscanf) 70% everything else 12% misc. other stuff As far as xpt stuff goes I think that we should focus on 3 things: 1) Reduce the number of xpt files at runtime by linking them at package time to one or a handful of files. We have the xpt_link tool that does this to make one .xpt file per idl source directory. But we are not using it to reduce the count at package time. We'd need a system to do this. I really haven't learned much about xpinstall yet. *Maybe* we could just leverage the packager manifests that exist now and have the packager just apply a different rule to .xpt files in the lists... Now the packager just copies each .xpt file in the manifest. Maybe we could have the packager build a script to run xpt_link by extracting the name of each .xpt file from the existing manifests and then use that script to link together one .xpt file and package it. Anyway, someone needs to look into this. If it is the case that the file i/o part is more expensive on most machines than on mine then reducing the file count could be the biggest win. We gain both in reducing the number of files whose size we need to assertain before allocing enough memory to read them and we gain in reduxcing the number of files to open/read/close. 2) Build a runtime manifest of the .xpt files to load. This would be a list generated each and every time we do autoreg. It would include our big .xpt file generated in item '1' above and also whatever other .xpt files might have found their way into the components dir. With this manifest we don't have to search the components dir on each startup. Still, it is not hardcoded; so it can be extended as components get installed. This manifest could be a standalone file at a hardcoded relative path location or it could be a (short!) list in the registry - assuming that we can read it before needing to know *any* typelib info. I think that this should just be a list of .xpt file. I don't think we need to handle .xpt files in different directories; i.e. no need to store path info. I also think that we don't need to fiddle with info about which interface info is in which file - incremental xpt loading is not the thing to do anytime soon. This system should be very linked to the system that does autoreg of DLLs. 3) The in memory stuff in libxpt is heavy on small allocs and little memcpy calls. It builds trees of structs that are never incrementally torn down. It is a small system. I think that it is a prime candidate for an arena. This could be a big win. I think that these are three separate complementary tasks. If we can get some one(s) to help then I'd be inclined to work on item '3' because I'm familiar with libxpt and the junk around it. Someone would need to figure out xpinstall and the packaging system to do item '1'. We might need that team's help for that. Someone familiar with the component loading ought to be able to rip through item '2'. If I'm to do these things myself I'd probably go 1-2-3 because limiting the amount of repetative i/o just seems like the biggest win. If we can get help on 1 and/or 2 Then I'd rather jump on 3. P.S. nsID::Parse looks pretty ugly to me. I wonder if the time gain of a special purpose parser for these things would be worth the cost of the complexity and risk of a not-quite-perfect implementation.
Gotcha! You focus on (3) then. (2) is engineer doable. (1) we can delay if we do (2) as that is going to improve only the autoreg time. I am going to try get help to implement xpt manifest similar to the xpcom autoreg manifest.
I disagree with your characterization of task (1). It is not going to only improve autoreg time. It is going to reduce the number of times that we have to stat each .xpt file for size and then open and read it on each and every run. Reducing this from 80+ files to 1+ files is bound to make a difference. That said, there is nothing wrong with doing (2) first.
(1) We got to do this merging carefully right. If not, when we need could be reading in more data than needed 'cause some interface in a xpt file was used while the rest wont be. Once I do (2) we will figure out the right combination of (1). Also, I posted a bug on nsID::Parse(). It isn't spending much time. only 52 millisecs for 1200 calls. bug# 29241
Depends on: 29241
No, the xptinfo system currently reads all the .xpt files in the components dir and builds an in memory tree. (1) will just allow us to do the exact same thing but will be reading the info from less files. Any tuning, partitioning of files, and/or incremental loading of xpt files comes later. The packaging manifests already say which xpt files the product needs. I'm just suggesting we use that information at package time to link into one file rather than copy many. There isn't any special knowledge of the files' contents required for this task.
Putting on PDT+ based on conversation with dp. We will get 10% perf improvement and this is low risk to combining all the xpt files. dp, can you give us a ETA on fix date in Status Whiteboard please.
Jband you were so right. I did the test of putting all xpt files together. The result the timing went down from 985 to 200 millisecs (a lot of call graph timeing deleted). So I expect to get about 13% improvement if we just put all the xpt files together using xpt_link Ccing leaf and jj. what would be right way to do this. Doing it at every install or when we make the release bits. The command to do that is simple: xpt_link all.xpt file1.xpt file2.xpt .... Install time sounds better as then everyone will be testing with what we are going to ship and maybe we can turn of directory enumeration (more win for mac) and instead load all.xpt until we do autoreg style grovelling manifest building. Jband, the malloc improvement: do you have any overall improvement number for it. Does it sound risky for beta.
Whiteboard: [PDT+] → [PDT+]stage1 2/29
dp, Sorry, I was still caught up in fixing a crasher and have not done anything about reduing the malloc calls in libxpt. It may be considered a relatively risky bit of code to play with since it is used as in xpidl and xpt_link at build time too. If you have a target system where you've reduced the .xpt file count to one, then you can easily measure the time spent in this libxpt code by checking the time on entry to XPT_DoHeader and the time on exit. This is called once per .xpt file in nsInterfaceInfoManager::indexify_file. Though on my fast system this cost was a big percentage. I really doubt that the realtime cost is high even on a on slower target system. Can you easily tests this? Reducing the touching of the file system may be the real low hanging fruit here. I've attached a patch that reorders two calls in nsInterfaceInfoManager::initInterfaceTables. It turns out that nsFileSpec::IsFile() takes about twenty times longer than nsFileSpec::GetLeafName(). So, in the code that is looking through the components dir for files whose names end in ".xpt" this reordering checks the filename (and rejects most files!) *before* checking to see if the thing is a file or directory. So, we only do the 'stat' in isFile on the very few .xpt files instead of all files. This is another big win if stat is expensive and we have very few .xpt files. This is a *very* safe fix. I think we should commit it.
Sounds great. This patch along with the xpt file globbing should do the trick for this bug. And I agree that malloc twiddling is risky and not for beta. I already have pdt approval for stat changing and xpt file combining. I have code reviewed your patch and approve/review it. Can you check it in before monday builds ?
My "GetLeafName() before GetLeafName()" patch is checked in. a=jar.
If this is really checked in... can we take it off the PDT+ radar?? Is there something else that we're waiting for? I'm adding the comment that we'll have to go to PDT- on 3/3
Whiteboard: [PDT+]stage1 2/29 → [PDT+]stage1 2/29 w/b minus on 3/3
We got PDT+ bugs filed to cover the beta1 issues (see dependecies for this bug). So this bug need not be PDT+ anymore. Removing PDT+ notation for reconsideration.
No longer depends on: 30151
Whiteboard: [PDT+]stage1 2/29 w/b minus on 3/3
dp, I've been working on code to handle bug http://bugzilla.mozilla.org/show_bug.cgi?id=30753 What more do you want to do with this bug?
Assignee: jband → dp
Status: ASSIGNED → NEW
Let me see the new timing to see if we need to find more stuff to fix...
Status: NEW → ASSIGNED
This is so optimized now.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
- Per last comments, age of bug, and no reopen - Marking Verified/Fixed. Please reopen if still a problem.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.