Closed Bug 28964 Opened 25 years ago Closed 24 years ago

NS_InitXPCOM() takes 3.35 secs out of 22 secs of startup

Categories

(Core :: XPCOM, defect, P1)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dp, Assigned: dp)

References

Details

(Keywords: perf, Whiteboard: [PDT-])

Attachments

(1 file)

Function			%	#call	Time (secs)
-----------------------------------------------------------
main				100.00	1	25.35
  main1				86.79	1	22.00
  NS_InitXPCOM			13.21	1	 3.35

That is about 15% of startup. Most of it is xpt. Will post more details.
Nominating for beta1 for startup performance
Status: NEW → ASSIGNED
Keywords: beta1, perf
Priority: P3 → P1
Target Milestone: M14
Function                        %       #call   Time (milsecs)
-----------------------------------------------------------
NS_InitXPCOM			13.21	1	3350
  XPTI_GetInterfaceInfoManager	84.66	1	2835
  nsComponentManagerImpl::Init	8.43	1	 282
  PlatformPrePopulateRegistry	6.64	1	 222
  RegisterGenericFactory	0.07	35	   2

So as we can see, 85% of the init xpcom time is being spent in 
XPTI_GetInterfaceInfoManager.

We need to optmize the heck out of XPTI_InterfaceInfoManager, CompMgr::Init() 
and CompMgr::PrePopulateRegistry()
dp, Are you saying that you want .xpt loading out of NS_InitXPCOM or that you 
just want it to be faster? The xpt files are going to get loaded before we show 
a window anyway. NS_InitXPCOM is not such a bad place.

Various people have looked into solutions without finding anything that makes a 
lot of difference. There have been newsgroup threads on the topic.

Some options that come to mind...

1) tune the set of .xpt file to the minimum set needed.
2) link together .xpt files in the package phase into fewer large files.
3) load xpt stuff more incrementally.
4) look for speed problems in the .xpt loading code.

notes:
- There are real platform issues here. sfraser was seeing a lot of cost in 
iterating the directory and stat'ing files looking for .xpt files on Mac.
- Incremental loading schemes would require either prereading the file headers 
to find *where* particular interface info is, or building a manifest(s) ahead of 
time and loading *it*. Without careful tuning of sets of files with sets of 
interfaces that are known to be needed and loaded together there is real danger 
that we'll be spending a lot of time going back again and again to the file 
system - incremental loading could be a net loss.

The real point here is that too much of our info on how to improve this is 
anecdotal. It may be that we can't help it much. I agree that this is a good 
place to do some focused work to get the numbers and evaluate options. I don't 
mind doing some of that myself.
I wanted to see if we can improve the performance of XPTI_InterfaceInfoManager.

Here is more data for you:
Function                                %       #call   Time (secs)
-------------------------------------------------------------------
initInterfaceTables			100.00	  1	2835.37
  nsInterfaceInfoManager::indexify_file	 90.05	 83	2553.24
  nsFileSpec::IsFile			  9.60	185	 272.10

All that you said sounds good. Reassigning to jband. Doing incremental stuff 
would be good. Let us talk more about this. Do you bandwidth to tackle this for 
beta1
Assignee: dp → jband
Status: ASSIGNED → NEW
Blocks: 7251
Putting on PDT- radar for beta.  If you get compelling data to guarantee 
improvement, with low risk, can do for beta1.
Whiteboard: [PDT-]
I'm clearing PDT- since I think we need to improve startup time and since DP 
said in mail he'd like the PDT to reconsider it.
Whiteboard: [PDT-]
dp, I got sidetracked today to work on a crashing bug. I did want to relay some 
info...

My quantify results on a fast (450mhz) NT machine with a fast disk subsystem 
and no files in cache running a release build shows NS_InitXPCOM taking about 
1/3 of a second. Of course most machines will be a lot slower. Here is the 
breakdown I see:

(each indent level accounts for 100% of previous level)
100% NS_InitXPCOM
  48% XPTI_GetInterfaceInfoManager
    30% iterating files and doing stat to find .xpt files 178 calls
    66% nsInterfaceInfoManager::indexify_file
       16% doing stat (again) for file size 88 calls
       72% XPT_DoHeader (in memory only - malloc/free/memcpy/etc) > 50,000 calls
        1% doing file reads (only!?!)
  40% nsCompentManager::PrePopulateRegistry
    30% nsID::Parse (1006 calls - uses PR_sscanf)
    70% everything else
  12% misc. other stuff

As far as xpt stuff goes I think that we should focus on 3 things:

1) Reduce the number of xpt files at runtime by linking them at package time to 
one or a handful of files. We have the xpt_link tool that does this to make one 
.xpt file per idl source directory. But we are not using it to reduce the count 
at package time. We'd need a system to do this. I really haven't learned much 
about xpinstall yet. *Maybe* we could just leverage the packager manifests that 
exist now and have the packager just apply a different rule to .xpt files in 
the lists... Now the packager just copies each .xpt file in the manifest. Maybe 
we could have the packager build a script to run xpt_link by extracting the name 
of each .xpt file from the existing manifests and then use that script to link 
together one .xpt file and package it. Anyway, someone needs to look into this. 
If it is the case that the file i/o part is more expensive on most machines than 
on mine then reducing the file count could be the biggest win. We gain both in 
reducing the number of files whose size we need to assertain before allocing 
enough memory to read them and we gain in reduxcing the number of files to 
open/read/close.

2) Build a runtime manifest of the .xpt files to load. This would be a list 
generated each and every time we do autoreg. It would include our big .xpt file 
generated in item '1' above and also whatever other .xpt files might have found 
their way into the components dir. With this manifest we don't have to search 
the components dir on each startup. Still, it is not hardcoded; so it can be 
extended as components get installed. This manifest could be a standalone file 
at a hardcoded relative path location or it could be a (short!) list in the 
registry - assuming that we can read it before needing to know *any* typelib 
info. I think that this should just be a list of .xpt file. I don't think we 
need to handle .xpt files in different directories; i.e. no need to store path 
info. I also think that we don't need to fiddle with info about which interface 
info is in which file - incremental xpt loading is not the thing to do anytime 
soon. This system should be very linked to the system that does autoreg of DLLs.

3) The in memory stuff in libxpt is heavy on small allocs and little memcpy 
calls. It builds trees of structs that are never incrementally torn down. It is 
a small system. I think that it is a prime candidate for an arena. This could be 
a big win.

I think that these are three separate complementary tasks. If we can get some 
one(s) to help then I'd be inclined to work on item '3' because I'm familiar 
with libxpt and the junk around it. Someone would need to figure out xpinstall 
and the packaging system to do item '1'. We might need that team's help for 
that. Someone familiar with the component loading ought to be able to rip 
through item '2'.

If I'm to do these things myself I'd probably go 1-2-3 because limiting the 
amount of repetative i/o just seems like the biggest win. If we can get help on 
1 and/or 2 Then I'd rather jump on 3.

P.S. nsID::Parse looks pretty ugly to me. I wonder if the time gain of a special 
purpose parser for these things would be worth the cost of the complexity and 
risk of a not-quite-perfect implementation.
Gotcha! You focus on (3) then. (2) is engineer doable. (1) we can delay if we 
do (2) as that is going to improve only the autoreg time.

I am going to try get help to implement xpt manifest similar to the 
xpcom autoreg manifest.
I disagree with your characterization of task (1). It is not going to only 
improve autoreg time. It is going to reduce the number of times that we have to 
stat each .xpt file for size and then open and read it on each and every run. 
Reducing this from 80+ files to 1+ files is bound to make a difference. That 
said, there is nothing wrong with doing (2) first.
(1) We got to do this merging carefully right. If not, when we need could be 
reading in more data than needed 'cause some interface in a xpt file was used 
while the rest wont be.

Once I do (2) we will figure out the right combination of (1).

Also, I posted a bug on nsID::Parse(). It isn't spending much time. only 52 
millisecs for 1200 calls. bug# 29241
Depends on: 29241
No, the xptinfo system currently reads all the .xpt files in the components dir 
and builds an in memory tree. (1) will just allow us to do the exact same thing 
but will be reading the info from less files. Any tuning, partitioning of files, 
and/or incremental loading of xpt files comes later. The packaging manifests 
already say which xpt files the product needs. I'm just suggesting we use that 
information at package time to link into one file rather than copy many. There 
isn't any special knowledge of the files' contents required for this task.
Putting on PDT+ based on conversation with dp.  We will get 10% perf improvement 
and this is low risk to combining all the xpt files.

dp, can you give us a ETA on fix date in Status Whiteboard please.
Whiteboard: [PDT+]
Jband you were so right. I did the test of putting all xpt files together. The 
result the timing went down from 985 to 200 millisecs (a lot of call graph 
timeing deleted).

So I expect to get about 13% improvement if we just put all the xpt files 
together using xpt_link

Ccing leaf and jj. what would be right way to do this. Doing it at every 
install or when we make the release bits. The command to do that is simple:

xpt_link all.xpt file1.xpt file2.xpt ....

Install time sounds better as then everyone will be testing with what we are 
going to ship and maybe we can turn of directory enumeration (more win for mac) 
and instead load all.xpt until we do autoreg style grovelling manifest 
building.

Jband, the malloc improvement: do you have any overall improvement number for 
it. Does it sound risky for beta.
Whiteboard: [PDT+] → [PDT+]stage1 2/29
dp,

Sorry, I was still caught up in fixing a crasher and have not done anything 
about reduing the malloc calls in libxpt. It may be considered a relatively 
risky bit of code to play with since it is used as in xpidl and xpt_link at 
build time too.

If you have a target system where you've reduced the .xpt file count to one, 
then you can easily measure the time spent in this libxpt code by checking the 
time on entry to XPT_DoHeader and the time on exit. This is called once per .xpt 
file in nsInterfaceInfoManager::indexify_file. Though on my fast system this 
cost was a big percentage. I really doubt that the realtime cost is high even 
on a on slower target system. Can you easily tests this? Reducing the touching 
of the file system may be the real low hanging fruit here.

I've attached a patch that reorders two calls in 
nsInterfaceInfoManager::initInterfaceTables. It turns out that 
nsFileSpec::IsFile() takes about twenty times longer than 
nsFileSpec::GetLeafName(). So, in the code that is looking through the 
components dir for files whose names end in ".xpt" this reordering checks the 
filename (and rejects most files!) *before* checking to see if the thing is a 
file or directory. So, we only do the 'stat' in isFile on the very few .xpt 
files instead of all files. This is another big win if stat is expensive and we 
have very few .xpt files. This is a *very* safe fix. I think we should commit 
it.

Sounds great. This patch along with the xpt file globbing should do the 
trick for this bug. And I agree that malloc twiddling is risky and not for 
beta.

I already have pdt approval for stat changing and xpt file combining. I have 
code reviewed your patch and approve/review it. Can you check it in before 
monday builds ?
Status: NEW → ASSIGNED
My "GetLeafName() before GetLeafName()" patch is checked in. a=jar.
Depends on: 29658
If this is really checked in... can we take it off the PDT+ radar??  Is there
something else that we're waiting for?
I'm adding the comment that we'll have to go to PDT- on 3/3
Whiteboard: [PDT+]stage1 2/29 → [PDT+]stage1 2/29 w/b minus on 3/3
Depends on: 30151
We got PDT+ bugs filed to cover the beta1 issues (see dependecies for this bug).
So this bug need not be PDT+ anymore. Removing PDT+ notation for
reconsideration.
No longer depends on: 30151
Whiteboard: [PDT+]stage1 2/29 w/b minus on 3/3
[PDT-]
Whiteboard: [PDT-]
Depends on: 30151
dp, I've been working on code to handle bug
http://bugzilla.mozilla.org/show_bug.cgi?id=30753

What more do you want to do with this bug?
Assignee: jband → dp
Status: ASSIGNED → NEW
Let me see the new timing to see if we need to find more stuff to fix...
Status: NEW → ASSIGNED
Target Milestone: M14 → M16
This is so optimized now.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
- Per last comments, age of bug, and no reopen - Marking Verified/Fixed.  Please 
reopen if still a problem. 
Status: RESOLVED → VERIFIED
No longer blocks: 7251
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: