Closed Bug 29063 Opened 25 years ago Closed 23 years ago

excessive stat() calls

Categories

(Core Graveyard :: Tracking, defect, P4)

All
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla0.9.5

People

(Reporter: colin, Assigned: dp)

References

Details

(Keywords: perf)

Attachments

(4 files, 1 obsolete file)

Not sure which category to put this into. I think it should start life belonging 
to RDF, but there doesn't appear to be an RDF category. 

I originally posted this in a couple of n.p.m newsgroups, but got no response 
other than Dan Veditz saying "You are not being paranoid. These stats are 
killing us on the Mac as well."

M14 code, and I'm still seeing more local file accesses than I would 
expect/hope. Once the browser is up and running, I go to a page, and then hit 
the back button. These are the local file accesses I see. Note that these are 
real hard accesses.

Access a simple page:

stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /dkb100/work/m14opt/chrome/global/skin/default/animthrob.gif
open /dkb100/work/m14opt/chrome/global/skin/default/animthrob.gif
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /dkb100/work/m14opt/chrome/global/skin/default/progressmeter-busy.gif
open /dkb100/work/m14opt/chrome/global/skin/default/progressmeter-busy.gif


Go back to previous page:

stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /dkb100/work/m14opt/chrome/global/skin/default/button32-bg-active.gif
open /dkb100/work/m14opt/chrome/global/skin/default/button32-bg-active.gif
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /dkb100/work/m14opt/chrome/global/skin/default/animthrob.gif
open /dkb100/work/m14opt/chrome/global/skin/default/animthrob.gif
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /alpha$dka0/users/cblake/.mozilla/mozProfile
stat /dkb100/work/m14opt/chrome/global/skin/default/progressmeter-busy.gif
open /dkb100/work/m14opt/chrome/global/skin/default/progressmeter-busy.gif

Two questions:

1. Why do we keep stat'ing the profile directory all the time?

2. Why do we always seem to stat a file before opening it? To see if its there? 
Gee, the open call would tell us that.

The pairs of stat calls are coming from nsChromeRegistry::GetProfileRoot. We get 
two stat calls per time through this.

Does anyone else think this is excessive and I should file a bug? Or am I just 
being paranoid because I live on a system that has slow stat's and open's?

Colin.
Forgot to say. I am seeing this on the OpenVMS platform, but that's built from 
the normal code base as if it were a UNIX derivative.
lets start with waterson.  also adding dp, mcafee, pav, sspit, alecf,
sfraser as general unix/mac hacks to cc list to enjoin them in the hunt.
RELEASE THE HOUNDS!
Assignee: chofmann → waterson
perf keyword
Keywords: perf
Stat is certainly an issue. In xpt loading we found 10% of time reading in 
about 50 small xpt files was being spent on stat. Jband is looking into 
removing that.

I can add more info on this when I get to my office machine with quantify data. 
I think we need to fix many part of this.

Waterson, may I take this bug from you. This is a performance issue. Once we 
find the culprits I will cycle the bug or file individual bugs to try 
eliminating them.
all yours dp.
Assignee: waterson → dp
Hooking into startup performance
Blocks: 7251
Severity: normal → major
Status: NEW → ASSIGNED
Priority: P3 → P1
Target Milestone: M14
putterman was seeing the excessive stat calls, too.  cc'ing him.
Stat timings on startup

Function			%	#call	Time(sec)
---------------------------------------------------------
stat				100	557	0.58
  nsFileSpec::IsFile		 45.32	185	0.26 (bug# 28964)
  nsFileSpec::Exists		 25.26	165	0.15
  nsFileSpec::GetFileSize	 20.91	 92	0.12
  nsFileSpec::IsDirectory	  8.50	115	0.05

All the 185 IsFile() calls are from xpt that are taken care in bug# 28964. So 
let us skip that. Looking at Exists().

Function				%	#call	Time(sec)
-----------------------------------------------------------------
nsFileSpec::Exists		 	100	165	0.15
  GetProfileDirectory			 74.74	 96	0.12 <got fix>
  nsChromeRegistry::			 12.93	 24	0.02
    CheckForProfileFile
  nsFileSpec::ResolveSymlink		  7.71	 28	0.01

GetProfileDirectory() and ChromeRegistry() look like they need to looked to see 
if they need to be doing this.

GetProfileDirectory() was simple enough to fix. Fixing chrome registry could 
give a bigger longer term yeild. Who want to take that ?
Depends on: 18048
With one ProfileDir fix and xpt file globbing, we have gotten rid of 280 out of 
550 calls.

What remains is:

  nsFileSpec::Exists             25.26   70     0.15
  nsFileSpec::GetFileSize        20.91   92     0.12
  nsFileSpec::IsDirectory         8.50  115     0.05
do we have stack traces for each of these? even the top 4 frames would be great.
The callers are not just one. So there would be more than one trace per 
category. I would just break in the debugger for say isDirectory() and see the 
stack.
Target Milestone: M14 → M16
prefs calls nsFileSpec::Exists MANY times more than it needs (in fact, it never
needs to call it) - I have patches in my tree awaiting review which cleans up
25-30 of them from startup.
Alec is your patch for pref checked in.
yes - I had to put one back because it broke some stuff. But I got rid of all 
but that one.
Hoping you would take care of this.
Assignee: dp → warren
Status: ASSIGNED → NEW
M16 has been out for a while now, these bugs target milestones need to be 
updated.
Can somebody measure this again and report back? Things have changed a lot in 
the file transport.

Or maybe we should close/ignore this bug since everything will change when the 
chrome comes out of jar files.
Target Milestone: M16 → M20
Its better (see below). mozProfile isn't being accessed all the time now. But 
we're still stat'ing a file before open'ing it. 

Is the plan to have all the chrome in a jar file AND have that jar file read 
permanently into memory?

FORWARD

stat /m16opt/chrome/skins/modern/global/skin/progressmeter-busy.gif
stat /m16opt/chrome/skins/modern/global/skin/animthrob.gif
open /m16opt/chrome/skins/modern/global/skin/animthrob.gif
open /m16opt/chrome/skins/modern/global/skin/progressmeter-busy.gif

BACKWARD

stat /m16opt/chrome/skins/modern/global/skin/progressmeter-busy.gif
stat /m16opt/chrome/skins/modern/global/skin/animthrob.gif
open /m16opt/chrome/skins/modern/global/skin/animthrob.gif
open /m16opt/chrome/skins/modern/global/skin/progressmeter-busy.gif
This now just seems like a dup of the bug that the throbber hits the disk each 
time. Bug 41268.
Warren, I am reassigning some bugs which you own over to me (dougt).  If you 
think that you will have time to address them please reassign them back to 
yourself.  This reassignment was suggested by choffman.
Assignee: warsome → dougt
leger is no longer @netscape. changing qa contact to component's default
QA Contact: leger → chofmann
I think someone needs to look at this issue again.

Last February Suresh reported that there were 557 stat calls at startup. Some
work was done to reduce this number quite considerably, and startup time 
improved.

I've noticed a slowdown at startup time and so I checked the number of stat
calls being made (now that we are using jar files I would have expected the
number to be small). We are making 744 stat calls to start up the browser and
display 'about:blank'. Whoa!!!

And add to that 215 access() calls, the majority of which I bet are just 
checking for the existance of a file before opening it. Talk about a waste
of cycles!

I can provide a list of the stat and access calls if required, but I think
the output that Suresh posted last time is much more meaningful.

Colin.
anybody know how to run dp juju?   is it just some output from prlog?
colin is that in a debug build?  Also, do you have jar packaging disabled?
Yes, it is a debug build (from a few days ago). I can only trace the calls in a
debug build on OpenVMS.

jar packaging in certainly ON. My bin tree is void of all the files that make
up the jar files so I know its using the jar files.
the dp juju look surprisingly familiar.. I want to say it's gprof - not sure.
when I do this on linux:

strace -f mozilla -P seth about:blank |& tee ~/strace.txt

I see us making 286 stat calls.

I'll attach my strace log, for those are interested.  
Seth, your trace is very interesting. More about that in a moment. First I 
should say that yesterday I found an RDF problem that was causing OpenVMS to 
always reinstall all the chrome upon startup. Now this is fixed my number of 
stat calls is down from 744 to 196, so that has helped a lot, but its still 
higher than your 119 (I was counting only stat, not fstat).

Comparing your trace and mine, they start out pretty much the same, but when 
yours stops mine continues. I see stat'ing of:

  rdf files in bin/chrome
  rdf files in user-profile/chrome
  css files in user-profile/chrome
  all the files in bin/searchplugins
  rdf files in user-profile/chrome/overlayinfo/xxx/xxx
  rdf files in bin/chrome/overlayino/xxx/xxx
  xml files in bin/res/builtin
  panels.rdf in user-profile

Any idea why OpenVMS is stat'ing all these files while UNIX is not?
mcafee discovered some interesting stuff last night that I would've never 
thought of. Specifically, it looks like when you run strace on ``just an 
optimized build'', there is a significant amount of filesystem activity 
following symlinks from the ``dist'' directory to the actual files (duh!). So, 
be sure you're measuring on a ``real'' installation that doesn't symlink to 
files built in another directory.
Once I downloaded a build, I got a run similar to what sspitzer posted here.
But ! jprof isn't enabled by default.  We either need this on by-default,
or: what's the mechanism to package up a build on unix?
You mean you're not seeing all the additional files getting stat'ed, like I am 
on OpenVMS? Damn. What is going on on OpenVMS???
jprof will not find excessive stat() calls because it is a time-based sampling
profiler. only strace or gprof (which probably doesn't work with mozilla) will
count these.
add topperf
Keywords: topperf
Comment from Alan Stange (stange@rentec.com) in
news://news.mozilla.org/3A8CB264.3000804%40rentec.com :

> Typically, [open and stat calls] are paired up.  First stat the file, then 
> open it.  There is a very nice optimization here to open() the file, if that 
> works, then do an fstat().  The fstat() is essentially free once the 
> open() is done.  The idiom of stat() then open() shows up all over.
dougt, are you working on this?  are you the right owner?
chris,

no I am not actively looking at this bug.  I don't think that this bug will be
found in one central place, but rather it is a bunch of usage cases.

As a sidenote, the nsIFile caches stat info.  If you create one of these
objects, all sucessive getter() calls will result in only one stat every being
made.  

Briefly looking at the OpenVMS stat log, it looks like we can quickly identify
some problems, one of which we know about.  However, I am not really sure what
this output is.  I am guessing these are calls from the application to the os
|stat| api.  (for example, a stat then an open would only result in one stat
being logged, right??)

The first problem is that gif files are loading (or at least stat'ed) multiply
times.  This is a known problem which will go away when necko lands the new
image library.  

The second problem is that some rdf files are also loading multiple times.  For
example, the search.rdf file is stat'ed twice and localstore.rdf is being loaded
three times? Why is this?   

Another problem is that some js files are being loaded twice.  I think that I
spoke to waterson (or brendan) about this in the passed.  If you look at
security-prefs.js, it is being loaded twice.  I bet this is caused by it being
references twice from XUL.  Waterson, does your xul cache hold onto compiled js?

Prefernces looks like it is pounding stat second only to imagelib.  It stats
unix.js and initprefs.js three times, and the other prefs javascript files
twice.  Prefs io needs a enema.

But on the brigher side, I do not see any duplicates of jar files or libraries.

The OpenVMS stat log is showing exactly what you say, Doug. That is, the calls 
from Mozilla to the O/S. So each stat you see is a real stat() call and NOT one 
that has been satisfied by the nsIFile stat cache.

If you see anything that looks like an OpenVMS specific problem (eg. other 
platforms only stat some file once but OpenVMS does it twice), please let me 
know!!
see also bug 69140
Blocks: 69140
kandrot is going to knock this one silly...
Assignee: dougt → kandrot
On a current Linux build, I see about 250 stat() calls at startup, then no more.
 I used the PRE_LOAD to gather this info (then verified it by adding the same
code to XPCOM), overriding stat(), inserting a printf (or fprintf), then calling
the original stat().  I do not see the gifs and all, mentioned in previous posts
(though I do see the two stat() before an open, which I shall look into).
Status: NEW → ASSIGNED
looks like this should fall of the top perf list if
we are in good shape...  anyone have any evidence that
stat() call are killing startup?
Keywords: topperf
Marking this as 0.9.1, since I do not know if there is enough info yet to
complete this bug.
Target Milestone: --- → mozilla0.9.1
are we still seeing lots of stat() calls?  Is this still affecting the startup
performance?
Summary: excessive stat calls → excessive stat() calls
forgot we decided in last performance meeting (3/9/01) that there doesn't appear 
to be any issues with stat() call, will take this off tracking list for now
No longer blocks: 7251
Severity: major → normal
Priority: P1 → P3
moving off the 0.9.1 radar
Target Milestone: mozilla0.9.1 → mozilla0.9.2
linux Bug 65845 incorporate code coverage info into link process 
esp Comments From Roger Chickering on:
2001-06-08 15:10, 2001-06-08 23:18, 2001-06-08 23:28 and 2001-06-08 23:54.
and the bug: 84871 Half of Linux cold start is spent stat()ing components files 

Target Milestone: mozilla0.9.2 → mozilla1.0
reassigning kandrot bugs.
Assignee: kandrot → dougt
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → ---
Blocks: 98275
to dp.
Assignee: dougt → dp
Need to analyze this to figure out exactly what we should do about it.
Target Milestone: --- → mozilla0.9.5
Status: NEW → ASSIGNED
Priority: P3 → P4
Analysis on stat calls:

Test case was

strace ./mozilla -P 'Default User' file:///usr/tmp/quit.html > trace.out

stat()
 86 .so
 47 .rdf
 34 .xpt
 31 .js
  6 .css
 24 searchplugin
 12 directory
----------------
240 total accounted
256 total

125 fstat()

Comments:
- 80 .so are comming from the prepopulation of registry code. We can cut this
down to 40
- 24 searchplugin : why are we loading the search plugin stuff
- 47 rdf files mmh!
Attachment #24569 - Attachment is obsolete: true
Attachment #50155 - Attachment description: List of files we stat (in order) on startup (post above patch) → List of files we stat (in order) on startup - BEFORE above patch
Comment on attachment 50154 [details] [diff] [review]
xpcom patch to eliminate 86 stats

Looks good to me. r=waterson
Attachment #50154 - Flags: review+
Comment on attachment 50154 [details] [diff] [review]
xpcom patch to eliminate 86 stats

awesome! So glad to see this part done. sr=alecf
Attachment #50154 - Flags: superreview+
Attached file stats after patch
bug# 100834 eliminates the 24 stats from searchplugins and a directory enumeration

Will look into plugins next. Looks like we are enumerating (loading too ?)
plugins on startup.
Depends on: 100834
Checked in xpcom portions removing 86 stats (Patch 09/20/01 15:28) into trunk.
Timings on these:

After fixing searchplugins and xpcom component stats, we have about 101 stats
taking 2.6msec out of 8000msec.

That is about 0.03% of startup. Not worth going after.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: