28964 - NS_InitXPCOM() takes 3.35 secs out of 22 secs of startup

Assignee

Description

•

25 years ago

Function			%	#call	Time (secs)
-----------------------------------------------------------
main				100.00	1	25.35
  main1				86.79	1	22.00
  NS_InitXPCOM			13.21	1	 3.35

That is about 15% of startup. Most of it is xpt. Will post more details.

Suresh Duddi (gone)

Assignee

Comment 1

•

25 years ago

Nominating for beta1 for startup performance

Status: NEW → ASSIGNED

Keywords: beta1, perf

Priority: P3 → P1

Target Milestone: M14

Suresh Duddi (gone)

Assignee

Comment 2

•

25 years ago

Function                        %       #call   Time (milsecs)
-----------------------------------------------------------
NS_InitXPCOM			13.21	1	3350
  XPTI_GetInterfaceInfoManager	84.66	1	2835
  nsComponentManagerImpl::Init	8.43	1	 282
  PlatformPrePopulateRegistry	6.64	1	 222
  RegisterGenericFactory	0.07	35	   2

So as we can see, 85% of the init xpcom time is being spent in 
XPTI_GetInterfaceInfoManager.

We need to optmize the heck out of XPTI_InterfaceInfoManager, CompMgr::Init() 
and CompMgr::PrePopulateRegistry()

John Bandhauer

Comment 3

•

25 years ago

dp, Are you saying that you want .xpt loading out of NS_InitXPCOM or that you 
just want it to be faster? The xpt files are going to get loaded before we show 
a window anyway. NS_InitXPCOM is not such a bad place.

Various people have looked into solutions without finding anything that makes a 
lot of difference. There have been newsgroup threads on the topic.

Some options that come to mind...

1) tune the set of .xpt file to the minimum set needed.
2) link together .xpt files in the package phase into fewer large files.
3) load xpt stuff more incrementally.
4) look for speed problems in the .xpt loading code.

notes:
- There are real platform issues here. sfraser was seeing a lot of cost in 
iterating the directory and stat'ing files looking for .xpt files on Mac.
- Incremental loading schemes would require either prereading the file headers 
to find *where* particular interface info is, or building a manifest(s) ahead of 
time and loading *it*. Without careful tuning of sets of files with sets of 
interfaces that are known to be needed and loaded together there is real danger 
that we'll be spending a lot of time going back again and again to the file 
system - incremental loading could be a net loss.

The real point here is that too much of our info on how to improve this is 
anecdotal. It may be that we can't help it much. I agree that this is a good 
place to do some focused work to get the numbers and evaluate options. I don't 
mind doing some of that myself.

Suresh Duddi (gone)

Assignee

Comment 4

•

25 years ago

I wanted to see if we can improve the performance of XPTI_InterfaceInfoManager.

Here is more data for you:
Function                                %       #call   Time (secs)
-------------------------------------------------------------------
initInterfaceTables			100.00	  1	2835.37
  nsInterfaceInfoManager::indexify_file	 90.05	 83	2553.24
  nsFileSpec::IsFile			  9.60	185	 272.10

All that you said sounds good. Reassigning to jband. Doing incremental stuff 
would be good. Let us talk more about this. Do you bandwidth to tackle this for 
beta1

Assignee: dp → jband

Status: ASSIGNED → NEW

leger

Updated

•

25 years ago

Blocks: 7251

leger

Comment 5

•

25 years ago

Putting on PDT- radar for beta.  If you get compelling data to guarantee 
improvement, with low risk, can do for beta1.

Whiteboard: [PDT-]

chriss

Comment 6

•

25 years ago

I'm clearing PDT- since I think we need to improve startup time and since DP 
said in mail he'd like the PDT to reconsider it.

Whiteboard: [PDT-]

John Bandhauer

Comment 7

•

25 years ago

dp, I got sidetracked today to work on a crashing bug. I did want to relay some 
info...

My quantify results on a fast (450mhz) NT machine with a fast disk subsystem 
and no files in cache running a release build shows NS_InitXPCOM taking about 
1/3 of a second. Of course most machines will be a lot slower. Here is the 
breakdown I see:

(each indent level accounts for 100% of previous level)
100% NS_InitXPCOM
  48% XPTI_GetInterfaceInfoManager
    30% iterating files and doing stat to find .xpt files 178 calls
    66% nsInterfaceInfoManager::indexify_file
       16% doing stat (again) for file size 88 calls
       72% XPT_DoHeader (in memory only - malloc/free/memcpy/etc) > 50,000 calls
        1% doing file reads (only!?!)
  40% nsCompentManager::PrePopulateRegistry
    30% nsID::Parse (1006 calls - uses PR_sscanf)
    70% everything else
  12% misc. other stuff

As far as xpt stuff goes I think that we should focus on 3 things:

1) Reduce the number of xpt files at runtime by linking them at package time to 
one or a handful of files. We have the xpt_link tool that does this to make one 
.xpt file per idl source directory. But we are not using it to reduce the count 
at package time. We'd need a system to do this. I really haven't learned much 
about xpinstall yet. *Maybe* we could just leverage the packager manifests that 
exist now and have the packager just apply a different rule to .xpt files in 
the lists... Now the packager just copies each .xpt file in the manifest. Maybe 
we could have the packager build a script to run xpt_link by extracting the name 
of each .xpt file from the existing manifests and then use that script to link 
together one .xpt file and package it. Anyway, someone needs to look into this. 
If it is the case that the file i/o part is more expensive on most machines than 
on mine then reducing the file count could be the biggest win. We gain both in 
reducing the number of files whose size we need to assertain before allocing 
enough memory to read them and we gain in reduxcing the number of files to 
open/read/close.

2) Build a runtime manifest of the .xpt files to load. This would be a list 
generated each and every time we do autoreg. It would include our big .xpt file 
generated in item '1' above and also whatever other .xpt files might have found 
their way into the components dir. With this manifest we don't have to search 
the components dir on each startup. Still, it is not hardcoded; so it can be 
extended as components get installed. This manifest could be a standalone file 
at a hardcoded relative path location or it could be a (short!) list in the 
registry - assuming that we can read it before needing to know *any* typelib 
info. I think that this should just be a list of .xpt file. I don't think we 
need to handle .xpt files in different directories; i.e. no need to store path 
info. I also think that we don't need to fiddle with info about which interface 
info is in which file - incremental xpt loading is not the thing to do anytime 
soon. This system should be very linked to the system that does autoreg of DLLs.

3) The in memory stuff in libxpt is heavy on small allocs and little memcpy 
calls. It builds trees of structs that are never incrementally torn down. It is 
a small system. I think that it is a prime candidate for an arena. This could be 
a big win.

I think that these are three separate complementary tasks. If we can get some 
one(s) to help then I'd be inclined to work on item '3' because I'm familiar 
with libxpt and the junk around it. Someone would need to figure out xpinstall 
and the packaging system to do item '1'. We might need that team's help for 
that. Someone familiar with the component loading ought to be able to rip 
through item '2'.

If I'm to do these things myself I'd probably go 1-2-3 because limiting the 
amount of repetative i/o just seems like the biggest win. If we can get help on 
1 and/or 2 Then I'd rather jump on 3.

P.S. nsID::Parse looks pretty ugly to me. I wonder if the time gain of a special 
purpose parser for these things would be worth the cost of the complexity and 
risk of a not-quite-perfect implementation.

Suresh Duddi (gone)

Assignee

Comment 8

•

25 years ago

Gotcha! You focus on (3) then. (2) is engineer doable. (1) we can delay if we 
do (2) as that is going to improve only the autoreg time.

I am going to try get help to implement xpt manifest similar to the 
xpcom autoreg manifest.

John Bandhauer

Comment 9

•

25 years ago

I disagree with your characterization of task (1). It is not going to only 
improve autoreg time. It is going to reduce the number of times that we have to 
stat each .xpt file for size and then open and read it on each and every run. 
Reducing this from 80+ files to 1+ files is bound to make a difference. That 
said, there is nothing wrong with doing (2) first.

Suresh Duddi (gone)

Assignee

Comment 10

•

25 years ago

(1) We got to do this merging carefully right. If not, when we need could be 
reading in more data than needed 'cause some interface in a xpt file was used 
while the rest wont be.

Once I do (2) we will figure out the right combination of (1).

Also, I posted a bug on nsID::Parse(). It isn't spending much time. only 52 
millisecs for 1200 calls. bug# 29241

Depends on: 29241

John Bandhauer

Comment 11

•

25 years ago

No, the xptinfo system currently reads all the .xpt files in the components dir 
and builds an in memory tree. (1) will just allow us to do the exact same thing 
but will be reading the info from less files. Any tuning, partitioning of files, 
and/or incremental loading of xpt files comes later. The packaging manifests 
already say which xpt files the product needs. I'm just suggesting we use that 
information at package time to link into one file rather than copy many. There 
isn't any special knowledge of the files' contents required for this task.

leger

Comment 12

•

25 years ago

Putting on PDT+ based on conversation with dp.  We will get 10% perf improvement 
and this is low risk to combining all the xpt files.

dp, can you give us a ETA on fix date in Status Whiteboard please.

Whiteboard: [PDT+]

Suresh Duddi (gone)

Assignee

Comment 13

•

25 years ago

Jband you were so right. I did the test of putting all xpt files together. The 
result the timing went down from 985 to 200 millisecs (a lot of call graph 
timeing deleted).

So I expect to get about 13% improvement if we just put all the xpt files 
together using xpt_link

Ccing leaf and jj. what would be right way to do this. Doing it at every 
install or when we make the release bits. The command to do that is simple:

xpt_link all.xpt file1.xpt file2.xpt ....

Install time sounds better as then everyone will be testing with what we are 
going to ship and maybe we can turn of directory enumeration (more win for mac) 
and instead load all.xpt until we do autoreg style grovelling manifest 
building.

Jband, the malloc improvement: do you have any overall improvement number for 
it. Does it sound risky for beta.

Whiteboard: [PDT+] → [PDT+]stage1 2/29

John Bandhauer

Comment 14

•

25 years ago

Attached patch patch to only call IsFile after we see that name has .xpt — Details — Splinter Review

John Bandhauer

Comment 15

•

25 years ago

dp,

Sorry, I was still caught up in fixing a crasher and have not done anything 
about reduing the malloc calls in libxpt. It may be considered a relatively 
risky bit of code to play with since it is used as in xpidl and xpt_link at 
build time too.

If you have a target system where you've reduced the .xpt file count to one, 
then you can easily measure the time spent in this libxpt code by checking the 
time on entry to XPT_DoHeader and the time on exit. This is called once per .xpt 
file in nsInterfaceInfoManager::indexify_file. Though on my fast system this 
cost was a big percentage. I really doubt that the realtime cost is high even 
on a on slower target system. Can you easily tests this? Reducing the touching 
of the file system may be the real low hanging fruit here.

I've attached a patch that reorders two calls in 
nsInterfaceInfoManager::initInterfaceTables. It turns out that 
nsFileSpec::IsFile() takes about twenty times longer than 
nsFileSpec::GetLeafName(). So, in the code that is looking through the 
components dir for files whose names end in ".xpt" this reordering checks the 
filename (and rejects most files!) *before* checking to see if the thing is a 
file or directory. So, we only do the 'stat' in isFile on the very few .xpt 
files instead of all files. This is another big win if stat is expensive and we 
have very few .xpt files. This is a *very* safe fix. I think we should commit 
it.

Suresh Duddi (gone)

Assignee

Comment 16

•

25 years ago

Sounds great. This patch along with the xpt file globbing should do the 
trick for this bug. And I agree that malloc twiddling is risky and not for 
beta.

I already have pdt approval for stat changing and xpt file combining. I have 
code reviewed your patch and approve/review it. Can you check it in before 
monday builds ?

John Bandhauer

Updated

•

25 years ago

Status: NEW → ASSIGNED

John Bandhauer

Comment 17

•

25 years ago

My "GetLeafName() before GetLeafName()" patch is checked in. a=jar.

Suresh Duddi (gone)

Assignee

Updated

•

25 years ago

Depends on: 29658

Jim Roskind

Comment 18

•

25 years ago

If this is really checked in... can we take it off the PDT+ radar??  Is there
something else that we're waiting for?
I'm adding the comment that we'll have to go to PDT- on 3/3

Whiteboard: [PDT+]stage1 2/29 → [PDT+]stage1 2/29 w/b minus on 3/3

Suresh Duddi (gone)

Assignee

Updated

•

25 years ago

Depends on: 30151

Suresh Duddi (gone)

Assignee

Comment 19

•

25 years ago

We got PDT+ bugs filed to cover the beta1 issues (see dependecies for this bug).
So this bug need not be PDT+ anymore. Removing PDT+ notation for
reconsideration.

No longer depends on: 30151

Whiteboard: [PDT+]stage1 2/29 w/b minus on 3/3

Phil Peterson

Comment 20

•

25 years ago

[PDT-]

Whiteboard: [PDT-]

Suresh Duddi (gone)

Assignee

Updated

•

25 years ago

Depends on: 30151

John Bandhauer

Comment 21

•

24 years ago

dp, I've been working on code to handle bug
http://bugzilla.mozilla.org/show_bug.cgi?id=30753

What more do you want to do with this bug?

Assignee: jband → dp

Status: ASSIGNED → NEW

Suresh Duddi (gone)

Assignee

Comment 22

•

24 years ago

Let me see the new timing to see if we need to find more stuff to fix...

Status: NEW → ASSIGNED

Suresh Duddi (gone)

Assignee

Updated

•

24 years ago

Target Milestone: M14 → M16

Suresh Duddi (gone)

Assignee

Comment 23

•

24 years ago

This is so optimized now.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Tracy Walker [:tracy]

Comment 24

•

24 years ago

- Per last comments, age of bug, and no reopen - Marking Verified/Fixed.  Please 
reopen if still a problem.

Status: RESOLVED → VERIFIED

Cathleen

Updated

•

24 years ago

No longer blocks: 7251