Closed Bug 65845 Opened 24 years ago Closed 9 years ago

incorporate code coverage info into link process

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: waterson, Unassigned)

References

Details

(Keywords: embed, memory-footprint)

Attachments

(14 files)

5.21 KB, patch
Details | Diff | Splinter Review
625 bytes, application/octet-stream
Details
6.64 KB, patch
Details | Diff | Splinter Review
4.80 KB, patch
Details | Diff | Splinter Review
677 bytes, text/plain
Details
2.78 KB, text/plain
Details
2.83 KB, text/plain
Details
6.03 KB, text/plain
Details
5.06 KB, patch
Details | Diff | Splinter Review
60.04 KB, text/plain
Details
543 bytes, patch
Details | Diff | Splinter Review
1.85 KB, patch
Details | Diff | Splinter Review
2.72 KB, patch
Details | Diff | Splinter Review
14.89 KB, application/octet-stream
Details
Per jband's suggestion, we should collect function usage information and use the 
windows linker's `/ORDER:@filename' flag to layout the function to minimize (or 
at least reduce) the working set. Ideally, we could simply:

1. Do a `special' MOZ_COVERAGE build

2. Run the build and exercise a well-defined test suite that would get us decent 
coverage information.

3. Shut down the browser, at which point the coverage information (suitable 
as input to the linker) would be dumped into each DLL's `build' directory.

4. Modify release builds to incorporate the coverage file in the linker's 
/OUTPUT directive.

I've done a bit of exploration on gathering coverage info. I'll attach the 
work-in-progress to the bug.
Keywords: embed, footprint
The above patch modifies the build to respond to MOZ_COVERAGE. When MOZ_COVERAGE 
is set in the environment, it will add `/Gh' to the CFLAGS, and link `trace.lib' 
with every executable and DLL in the produce. `trace.dll' is a proof-of-concept 
for the __penter() stuff, and records call counts by address in a hashtable. It 
uses the `imagehlp' stuff to dump demangled symbols on exit.

Still to do:

- determine input format for linker's /OUTPUT, make `trace.dll' dump in that 
format.

- make `trace.dll' dump information per DLL, not just in one big mess. jband 
suggested that we could use the imagehlp routines to figure out what module a 
symbol comes from, as well as a path to the `.pdb' for the module.

- figure out how to *not* build the `xpidl' compiler with /Gh! (Right now, 
you'll get call count information on `xpidl' every time we compile a type 
library!)

Anyway, I wouldn't get too worried about crossing the t's and dotting the i's 
until we verify that this actually has a good net impact on resident set size, 
so doing the first two tasks is probably most important.
dprice told me that the docs he read indicate that the linker will error out if
it finds exrtaneous entries in the order file (though the experiments are yet to
be done). Assuming this turns out to be true, I think the longterm plan should
be to write the per dll info NOT in a format that the linker will take, but
instead with the additional per line call count data we can use when merging
these files on subsequent runs. This will give us both extra per line info and
left over lines for defunct symbols. That's fine. At release build time we would
(as you suggested yesterday) do a link without an orderfile and then run a
script to take the generated mapfile and the checked-in orderfile and produce
the ready-to-use orderfile and then link again using that new orderfile. Double
linking each dll (and the exe) should not be that bad and the ready-to-use
orderfile can be built in the obj dir and not get in anyone's way.

Also, you may find you need to rebase the MOZ_COVERAGE builds to get accurate
info when mapping addresses to symbols - or is the current generation of imghelp
smart enough to do the right thing on a dynamically relocated dll? The older
imghelp did not used to get this right.
The output I'm getting from imagehlp on Win2K seems pretty reasonable, even 
without rebasing.
Hmm, digging around a bit more, I found this:

http://msdn.microsoft.com/library/default.asp?URL=/library/winresource/dnwinnt/s
800a.htm

Looks like there is a `working set tuner' that comes with the `Windows NT 
Resource Kit'? (Looking more closely, maybe this only shipped with NT 3.51?)
Okay, a couple more data points.

- We'll need to make sure that we compile with `/Gy' to generate a COMDAT for
  each function.

- The `order' file must contain decorated names for C++, and undecorated
  names for C; e.g.,

  extern "C" int foo(int a) {...} --> foo
  int bar(int a, int b) {...}     --> ?bar@@YAHHH@Z

- If you specify a symbol in the order file that doesn't exist, you get a
  warning (LNK4037), not a hard error.
I dug out the recent articles I mentioned. Robbins poinst out that newer
resource kits no longer ship that working set tuner. Its code (and Robbins'
better code) are messy to use on a big project. That's why I suggested the
scheme I did where we automatically dump the info in to each dll directory at
shutdown. 
adding build gurus to the cc list.

What's the benefit to having this in the release builds over just doing this for
profiling or footprint/performance analysis?
granrose: we will see. The theory is that we will significantly reduce the
working set size.
Talked to sfraser, who reminded me that Mac CW linker supports "arrange" files,
and we thought maybe the Windows-derived trace could produce an order file that
is XP enough to help Mac, and that can be re-mangled and otherwise translated to
work with the Mac linker, too.

/be
I applied the work-in-progress to my tree and rebuilt it.  I had to copy
trace.dll and nspr4.dll to dist/WIN32_D.OBJ/install  to get it to work.  One of
the executables generated in xpinstall needed it to finish linking.  But the
build ran and produced a ton of output.

I found a simple program and recompiled it with /Gy  Using dumpbin I was able to
make a file containing COMDATs of all the functions.  I reordered them, and
rebuilt with /order enabled.  When I dumped the contents of the new .obj none of
the functions had moved.  I haven't been able to figure out why.  
You mean `when I dumped the contents of the new *.[exe|dll]*, none of the 
functions had moved', right? The linker won't re-order the contents of the .obj 
files...
So I just did a down and nasty test, which I'll attach. The /ORDER flag seems to
work fine for me. I also discovered that the file you pass to `/ORDER' allows a
semicolon (';') as a comment character, which is groovy.
Dave, I was able to get the 2nd set of diffs to run through a clean build with 
no problems. (It also turns off tracing in xpidl.exe, which was slowing down the 
build a lot!)
Ok, I was so damn curious to see if this would do anything that I ran out and 
hacked stuff up far enough to create /ORDER files and wedge them into the build. 
The bad news is this. According to `TaskInfo 2000', re-ordering doesn't make a 
whit of difference.

Here's what I did.

1. Instrument a MOZ_PROFILE build with MOZ_COVERAGE, too.
2. Start winEmbed, shutdown. (Autoreg happens.)
3. Start winEmbed again. Shutdown. Collect symbols into `order.raw' file.
4. Write perl script to grovel through `order.raw' and produce per-module
   ordering information.

Then, in another tree:

1. Build with MOZ_DEBUG off and BUILD_OPT on. This is a full-on release
   build, baby.

2. Start winEmbed. Autoreg happens. Shut down.

3. Start winEmbed. Look at TaskInfo 2000. Note working set numbers. Shutdown.

4. Install linker ordering files created above in the proper locations.

5. Twiddle `config/dll.inc' to notice ordering files.

6. Rebuild; watch all DLLs relink, check `cmd.cfg' and `dumpbin'
   resulting DLLs to make sure that ordering info is really getting
   picked up.

7. Start winEmbed. (Autoreg happens). Shutdown.

8. Start winEmbed. Note TaskInfo 2000 numbers (specifically, `In Mem KB'
   and `Working Set KB') look a whole lot like they did before. Specifically,
   9,972KB before re-ordering, 9,984KB after re-ordering.

9. Curse.

I think I've probably tested a `best case' scenario for our code reording magic. 
(Our working set is only gonna get bigger as more pages load, images start to 
pile in, and more code is touched.) Since `nothing happened', I conclude that 
either

- I have screwed stuff up somehow
- There are more sinister problems afoot

cc'ing other Win32 gurus.
You didn't mention rebasing. Did you do it? The loader might load a lot of pages 
that would not otherwise be loaded if module base fixups are necessary. Also, it 
is not just a question of startup memory. If you cause lots of pages to be 
swapped out (say by iconizing) and then bring the app back to the front and do 
stuff, then the hope would be that the vm would need to map in less of the code 
pages.

 
Oh damn. I forgot to rebase. Let me try that.
Okay, now I've tried this with rebasing. Working set of a rebased build, without 
re-ordering: ~8860KB. Working set of a rebased build, with re-ordering, ~8528KB. 
 So that's about a 300KB savings in working set, or ~3%.

dprice: it might be cool to use that tool you've got that reads `dumpbin' output 
to see if the math works out here. Specifically, for each DLL, what do we 
compute the sum of the `required' code size to be? Does this match up with the 
the numbers that the OS is giving us?
Ok, so I've spent some more time with this hack, and I think maybe it's looking
a bit better. jband suggested that my `just startup winEmbed' stuff wasn't going
to be a very good test. So, I collected data by starting mozilla, browsing
around doing normal stuff for an hour or so.

Then, in my rebased but unordered `release' build, I start up and browse through
several pages; get a working set size of ~14MB. Apply the order files, relink,
rebase. Do a similar sprawl through several pages, get a working set size of ~12MB.

In both cases, I started the browser and it peaked at 15MB. I then minimized the
browser (which kicks in some Win32 logic that puts as much of the process on the
`standby list' as possible). Then bring the browser window back up and continue
to use it.
Chris, You don't sound real excited. I think that a change to keep 2 megs of
unused code from clogging up physical memory pages is an awesome achievement.
12/14 may not sound like a huge proportion, but that represents all the memory
used by the process, no? The thing you are fixing is the amount of physical
memory dedicated to unused code. I expect that a measurement of code pages
mapped to memory before and after your change will show a much more impresive
looking proportion. It seems to me that you are gettig into the ballpark of the
numbers that our code coverage analysis suggested we were wasting. Good work.
Sorry about the spam. An unset priority should be less than P5.  
Priority: -- → P5
I've attached a new version of trace.cpp that creates sorted order files all
over the tree.  Please review it and let me know what you think.  
Nice work! Couple of comments...

- Add the MPL license verbage at the top of the file.

- You've hard-coded `y:\' as the search path. Is there a better way to get
  this value? If we can't deduce it from within trace.cpp, maybe you could
  communicate the MOZ_SRC environment variable into this modules compile by
  adding a `-DMOZ_SRC=$(MOZ_SRC)'

- Go ahead and nuke the commented-out code in Reporter::~Reporter()
how do you feel about hard coding the depth, "..\..\..\..\"?
jband mentioned that the location on the .pdb file is at the very end of the .dll
It should be fairly simple to extract what we need from the .dlls themselves.
Sure, that might work.
I went with the MOZ_SRC option, in case someone set MOZ_OUT to somewhere other
than MOZ_SRC
Attached patch final diffsSplinter Review
attached final versions.  Asking leaf and reviewers@mozilla.org for a review.
sr=leaf for the build changes.
checked in
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Should this really have been closed? Did we really finish the task: "incorporate 
code coverage info into link process"?
Re-opening, because there are a few more things to finish up...

1. Either I gave you a bad patch, or it applied wrong, or something. We want
   `/Gh' to be added to CFLAGS independently of whether or not MOZ_DEBUG is
   defined. I'll attach a patch that corrects this.

2. We need to check in a first cut at the `win32.order' files, generated from
   an optimized build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Also, we should probably let this bug cover the Win32 code, and open new bugs 
for Mac and Linux.
as cls-prime for windows, sr=leaf for the patch to config.mak posted by
waterson.
I've sent mail to drivers@mozilla.org asking permission to check this in.  Right
now the tree is closed unless they bless your bug :(
blizzard? this is relatively safe.
Yep.  Go for it.
order files are checked in.
To find the right home for the order files, trace.dll is doing a search for the
.ilk files.  When you pass an order file to the linker, it doesn't do any
incremental linking.  So no .ilk files are created.  I'm gunna attach a patch
that searched for the .pdb files.
That patch includes changes to makefile.win to only build trace.dll if
MOZ_COVERAGE is defined.  jband, waterson wanna give me an r= or sr=?
r=waterson
Hey dprice, I'm seeing a fair number linker warnings when doing a MOZ_PROFILE 
build; e.g.:

win32.order : warning LNK4097: duplicate name 
"?GetAttributeMappingFunctions@nsHTMLObjectElement@@UBGIAAP6AXPBVnsIHTMLMappedAt
tributes@@PAVnsIMutableStyleContext@@PAVnsIPresContext@@@Z3@Z" in order file; 
using last one

Could you double-check the files you added and make sure that there aren't 
duplicates?

Also, we should probably make sure that we ignore the order files when doing a 
debug build...probably need to fix mozilla/config/dll.inc
I modified trace.dll to find duplicate entries in the list of function names.
It finds a TON of them.  I can modify it to remove duplicate entries that have
lower call counts.  Or I can pass through the list once, summing the call
counts.  Then pass through a second time to insert the function into the list.  
sr=leaf for the MOZ_COVERAGE check
Here's a fix to make us not use the order files in debug builds.

Index: dll.inc
===================================================================
RCS file: /cvsroot/mozilla/config/dll.inc,v
retrieving revision 3.5
diff -u -r3.5 dll.inc
--- dll.inc	2001/02/07 02:19:29	3.5
+++ dll.inc	2001/02/13 20:34:37
@@ -83,7 +83,7 @@
 !ifdef MAPFILE
 		/MAP:$(MAPFILE)
 !endif
-!if exist(win32.order)
+!if exist(win32.order) && !defined(MOZ_DEBUG)
         /ORDER:@win32.order
 !endif
         $(LFLAGS)

I wonder if, for makefile hygene, we should use a variable (e.g., $(ORDERFILE)) 
instead of hardcoding the file name. leaf?
yes, ORDERFILE should probably be used, and defined in the local makefiles. good 
catch, chris.
A few comments....

It looks like this is mostly working. Great!

On the duplicate entries question... What do you think is causing this? Each 
function has only one real entry point. Given a set of addresses that claim to 
map to the same symbol you can figure out which one *really* does map by using 
SymGetSymbolInfo. SymGetSymPrev and SymGetSymNext can tell you about the 
boundaries of the symbols that are known to be near by. Do you think these are 
functions without known symbols, or what? I think it is important to figure out 
what is really happening here because if we are improperly attributing some 
calls to the wrong symbol (or under-counting calls) then we can end up 
reordering such that calls are being made into pages where we thought no calls 
were being made. This can defeat much of what we are trying to gain. Also, I 
take it the subtraction of 5 from addr in the Log function is the known offset 
back from the return address in the call to penter to the real entry address of 
the function calling penter, right?

I'm looking at an order file and seeing a bunch of functions that should be 
inlined. I'm guessing that this was generated on a debug build, right? These 
should be generated on a rebased optimized build after doing some reasonable set 
of user activities. That is the plan, right?

As I originally imagined this, we would be accumulating this information on 
various runs over time. This requires the ability to read the order files, merge 
in info from the current run, and then write out the new file. You are planning 
to support this, right?

*Please* rename trace.cpp to something that says a little more about what it is 
and why it is there. We are not limited to 8.3 names. Maybe something like 
Win32SymbolOrderer.cpp or some such thing.
> On the duplicate entries question... What do you think is causing this? 

`static' functions. I've talked to dprice about this, and we aren't going to
incude any functions where SymGetSymFromAddr() shows a non-zero displacement.
Also N.B. that the ordering files that were initially checked in *were*
generated from a debug build, which (as you note) includes many out-of-line
impls :-(. dprice is going to fix that.

With respect to reading existing order files, filed bug 68802.
> `static' functions. I've talked to dprice about this, and we aren't going to
> incude any functions where SymGetSymFromAddr() shows a non-zero displacement.

I don't understand why this would generate function calls targeting multiple
addresses (maybe I don't need to understand). Does this happen in release builds
too? But anyway, why not include them? It seems like it would be better to lump
them together - sum the the call counts for the dups and add only one entry
rather than risk missing the symbol. Or did someone confirm that symbols with
non-zero displacement entries *always* also have a symbol with
zero-displacement? We for sure don't want to miss any.
new order files that greatly reduce the number of warnings
milestone = .9
Target Milestone: --- → mozilla0.9
As I see these win32.order files popping up recently, I assume this is still
being worked on. Are there any new numbers on performance/footprint gain? Is it
really worthwhile?
I am thinking about enabling this on OS/2 as well. Unfortunately, these
win32.order files use the mangled Win32 names so we would have to write a tool
that reads in the file, transmangles the names to our OS/2 compiler. Maybe the
same thing can be done for Unix/gcc and Mac/CW. Does that make sense?
There's still work going on to clear up some problems with the order files, so
we don't have solid numbers on the overall footprint changes.  Right now we
aren't seeing a huge win, but some of the larger dlls aren't being reordered
properly.

The MSVC++ linker requires that we give it mangled names.  We can tweak it to
provide unmangled names.  I don't know what the compilers on unix or OS/2
require for function ordering.
dprice, I'm still seeing a ton of warnings doing an optimized build; e.g.,

  _PreshShell::RepaintSelection does not exist; ingored

Please fix this. Do you want me to file another bug.
And my tests with filemon show that we're still reading scattered chunks from 
our .dll's on startup, so it doesn't appear to be helping.
Dude! Puh-leeze fix this! (Or yell ``uncle'' and hand it off if you're too swamped.)

The latest order files (as of 2001-04-04) do not contain ``decorated'' names.
Here's a sample of mozilla/content/build/win32.order:

  StyleContextImpl::GetStyleData ; 64925329
  NS_NewStyleContext ; 48923225
  nsDocumentFragment::AddRef ; 37241576
  CSSStyleRuleImpl::FirstSelector ; 28952348
  nsGenericElement::Release ; 20657952

So, something broke recently, and you're basically generating useless files
right now. :-(
I have brand new happy order files, ready to check in.
dprice: do these order files have demangled names? (Like the last set you
checked in?) If not, wouldn't you have changed something in
mozilla/config/trace.cpp? Could you attach the patch to that file?
A while back I had checked in order files that contained undecorated names.  I
had my build set up incorrectly.  I went back in and thought I had fixed the
problem.  It turns out that some of the files were missed.  pdb files were not
generated for some of the modules, so trace.cpp failed to find a good home for
the order file.  The forthcomig diffs make certain PDB files are made when
you've set both MOZ_PROFILE and MOZ_COVERGE  trace.cpp now creates a logfile
where one can check for errors.

The demangling of the C names wasn't quite right either.  trace.cpp now removes
the leading _ from C style names.
Status: REOPENED → ASSIGNED
- Don't nest MOZ_PROFILE and MOZ_COVERAGE. It turns out MOZ_PROFILE doesn't
  really need /PDB:NONE, so you can just do either/or here.

  !if defined(MOZ_PROFILE) || defined(MOZ_COVERAGE)

- You're inflicting a rather schizophrenic style on the file; e.g.,

  ``if('' vs. ``if (''
    Does whitespace preceded the ``if'' (while, for, ...) condition or not?
    Stock mozilla style is the latter.

  ``(cond)'' vs. ``( cond )''
    Are we offsetting parens or not? Stok mozilla style is the latter.

  ``(param, param)'' vs. ``(param,param)''
    Are parameters separated by space, or not? Stock mozilla style is
    the former.

  Make up your mind!

- For this:

  +        if( displacement > 0 ) return PL_DHASH_NEXT;

  Put the return on the next line, so you can set a breakpoint on it in
  the debugger.

- This could use a little explanation (by which I mean, a comment)

       while( cur ) {
  -       if(cur->function[0] == '_' && cur->function[1] == '_')
  +        if(cur->function[0] == '_' )
               fprintf(orderFile,"%s ; %d\n", cur->function+1, cur->count );

- Is a separate logfile really necessary?
  ``(cond)'' vs. ``( cond )''
    Are we offsetting parens or not? Stock mozilla style is the latter.

s/latter/former/ -- right?

/be
You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build.  Do
you prefer

!if defined(MOZ_COVERAGE) || defined(MOZ_PROFILE)

or leaving it as
!ifdef (MOZ_PROFILE)
and changing it to /PDB:$(PDBNAME) ?

I'm not in love with the logfile.  I'd be happy with it printing to the console.
 I tried using printf and running with -console, but none of the output was
displayed.  Should I do something more?  Or was whatever controls output to the
console already shutdown before the code in trace.dll got to run?

> You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build.

Why would you need to set MOZ_PROFILE in addition to MOZ_COVERAGE? (Assuming you 
do the MOZ_PROFILE || MOZ_COVERAGE thing.) I think that's the only file and line 
where the MOZ_PROFILE variable is used.

Regardless, MOZ_PROFILE (that is, a build meant to be used with Quantify) will 
work with either /PDB:NONE or /PDB:$(PDBFILE), so do whichever thing incurs the 
least amount of brain-print. :-)

you're right, MOZ_PROFILE isn't required for MOZ_COVERAGE, so I'll go with the A
|| B code.  I've also straightened up all of the schitzo spacing problems.  Any
thoughts on the output to console problem?  Do ya need to see another diff?
r=jband on last patch.
->0.9.1
Target Milestone: mozilla0.9 → mozilla0.9.1
dprice:

Please take the time to post to this bug the answers to two questions:

1) How much measured benefit do we get from these order files? What methodology 
was used in that measurement?

2) What *exactly* is the proceedure used to generate the order files? I'm 
especially interested in exactly what 'user-activity' is performed in the 
browser during the run(s) when these order files are generated. Is this done on 
first run? What parts of the browser get exercised?

Thanks!
I've done two things to test the impact of the order files.  I've run through
jrgm's page loader test.  There was no noticable improvement in page loads.  I
also measured the size of the resident set.  I used vadump.exe (from the win2k
resource kit) and saw there there's no real difference in the number of pages
kept in memory.

I've seen that everything is working correctly (vadump shows me the symbols on
each page, and they're in the order specified in the order files)  But I haven't
seen signifigant wins from the reordering.


To generate the order files, I build a MOZ_COVERAGE build which links in
trace.dll Then I run that build.  I got to my.netscape.com, www.cnn.com, run 5
iterations through jrgm's page loader test, open my email and read a few
messages, send a test email, read a newsgroup, visit the bugzilla query page and
run a query then edit the results page.  I make certain the mozilla/dist
directory is gone, to be sure that it is a first time execution.
Thanks Dave.

Hmm. A few thoughts...

With this stuff all in place we should put some effort into *making* it show 
measurable improvement. If we can't show said improvement then there is no good 
reason to keep it in place and we should remove it if it's not helping.

Your proceedure for generating the order files sounds good except, I suggest, 
you do this on a second run. There may well be a lot of code that *only* runs 
the first time and there's no use adding it to the mix. I don't think optimizing 
first run memory useage is the goal.

On testing, I suggest a test where you load the browser, load a page or two, 
then iconize the browser (and run some other app if necessary) to get it to swap 
pages out. *Then* bring the browser forward and load a page and see how much 
memory it needs to swap back in so that it can service the request.

We *know* we only running roughly half the code. Segregating the used code from 
the unused code has got to save *something*. Either the measurements are not 
showing the gain, or we're just not doing the right thing to segregate the code. 
It has been agreed that I should look into this.  I have some
previous experience using cord/pixie on IRIX which hopefully
I can apply to this problem.  There some relevant discussion
in bug 68802 that should probably have take place here.
Assignee: dprice → rogc
Status: ASSIGNED → NEW
Finally go that patch checked in today.
new entry point not found bugs keep cropping up.  We're not seeing a real win
for the ordering that's there now.  So after talking this over with waterson,
choffman and kandrot I'm going to disable the use of the win32.order files
unless MOZ_COVERAGE is set.  This will turn it off for the optimized builds.

here's a diff for your enjoyment:

Index: dll.inc
===================================================================
RCS file: /cvsroot/mozilla/config/dll.inc,v
retrieving revision 3.6
diff -u -r3.6 dll.inc
--- dll.inc     2001/02/15 23:04:33     3.6
+++ dll.inc     2001/05/10 00:23:11
@@ -83,7 +83,7 @@
!ifdef MAPFILE
/MAP:$(MAPFILE)
!endif
-!if exist(win32.order) && !defined(MOZ_DEBUG)
+!if exist(win32.order) && !defined(MOZ_DEBUG) && defined(MOZ_COVERAGE)
/ORDER:@win32.order
!endif
$(LFLAGS)
ever since dprice's changes to mozilla/config, talkback symbols have been
torched; still investigating why, if you want to help me look, dave, i'd
appreciate the help.
oh, i get it. you're making the pdb files explicit, while we were expecting the
debugging symbols to be in the binaries in dist/ (we use another utility to
strip them out into pdb files). If i can't get talkback reporting to work with
the several .pdb files scattered around the tree, is it ok to split up the
PROFILE and COVERAGE cases?
leaf: go for it.
Moving milestone out.  Still doing research.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Blocks: 7251, 71874
*** Bug 68855 has been marked as a duplicate of this bug. ***
I've been doing some stuff on Linux.  I've figured out how to
gather a trace of function calls and specify a function ordering
to the linker.  I've tried a couple of different algorithms for
specifying the order: Nat Friedman's "node grouping" algorithm
from the grope paper, and just putting them in the order in
which they were called ("startup").  "startup" does better than "node
grouping" for both memory use and startup time.

Using the "startup" algorithm, mozilla's resident size is slightly
reduced:

   normal static mozilla-bin:       20704 KB
   reordered "startup" mozilla-bin: 18932 KB

The reordered executable is also slightly smaller than the normal
static executable.

If I strip both the normal static mozilla-bin and the reordered
mozilla-bin, I get a slight startup time improvement:

   normal static mozilla-bin stripped:      19.914 seconds
   reorderd "startup" mozilla-bin stripped: 18.948 seconds

If I don't strip the executables, the reordered mozilla-bin actually
does worse:

   normal static mozilla-bin unstripped:      21.726 seconds
   reorderd "startup" mozilla-bin unstripped: 23.411 seconds

This is mystery #1.

Second startup is between 2 and 2.5 seconds for all the cases that I
tried.  The command "cp mozilla-bin /dev/null" takes about 8 seconds
if mozilla-bin has not been run.

Running two different versions of mozilla-bin linked against all the
same system libraries (exluding NSPR and ldap libraries) takes about
the same amount of time: there is not a significant advantage to
having another program that uses the same libraries run first.

I would expect our cold startup time to be about:

     2.5 seconds (warm start)
   +   8 seconds (cp mozilla-bin /dev/null)
   -  ~2 seconds (some savings from reordering to avoid reading the entire exe)
   +  ~2 seconds (some small number of seconds for mapping other libraries)
 --------
   ~10.5 seconds

Cold start is taking about 8 seconds longer than I would expect.  This
is mystery #2.

All these experiments have taken place on my 1GHz Pentium III w/ 512MB
memory running RedHat Linux 7.0 (+ newer compilers).

I will continue digging into these mysteries.  Any and all tips and
suggestions are appreciated :-)

Thanks!
-Roger
OK, I've found at least part of the answer to mystery #2.  By looking
at a timeline and strace output, I see that we spend between 6 and 11
seconds on cold start stat()ing the 126 .xpt files in dist/bin/components
4 times each.  Sure enough, I run a shell script that runs "ls -lL" 4
times in succession on that directory and it takes 9 seconds.  I'll
file a separate bug.

This is all happening from within a call to NS_SetupRegistry_1()
in nsAppRunner.cpp.

-Roger
I filed bug 84871 for the stat() issue in case you're interested.

-Roger
It turns out the stat() issue goes away when you have real
files in the components directory rather than symlinks, so I
marked bug 84871 invalid.

The upshot is that I've learned something about how to set up
my test environment.  Sorry for the SPAM.  :-)

-Roger
rogc: Are you wasting your time profiling debug builds? We do autoregistration 
of xpt file and DLLS on every startup of debug builds but not for optimized 
builds. This is very much intentional. Pleny of other stuff happens in debug 
builds that would skew measurement. 

What exectly do you mean by "Cold start"? First run in a clean directory that 
has no autoreg created registries? Or just a run of a previously run build after 
a reboot to clear any caching? Or something else?
No, I'm not doing timings on a debug build, but I turns out
I neglected to "export BUILD_OPT=1" which may be why I'm
running into the problem with all the stat() calls.

OK, so with this in my .mozconfig:

  ac_add_options --disable-shared
  ac_add_options --enable-static
  ac_add_options --disable-debug
  ac_add_options --enable-optimize

and with BUILD_OPT=1 in my environment, will I really get a
static optimized build?  Is there anything else?

Can someone please explain to me why I have to set the 
BUILD_OPT environment variable in addition to two config 
options to get an optimized build?

What I'm measuring as "cold start" is a second or subsequent
run of a build after rebooting:

  1. build it.
  2. run it.
  3. reboot.
  4. time it.
  5. repeat ad nauseum.

Thanks!
-Roger

OK, I changed my test setup to copy the dist directory structure and
thus remove the resolution of symlinks from my timings.  Now I'm
getting results that make a lot more sense.  The reordered executable
was faster and used less memory for both stripped and unstripped
binaries.  The numbers below are from stripped executables in both
cases.

I ran each test case five times, and computed the mean and standard
deviation for cold start time, cold memory use, warm start time, and
warm memory use.

Reordering saves us 10% of cold start time and 8.6% of cold start
memory use, and 2.4% of warm start time and 9.2% of warm start memory
use on my 1GHz PIII/512MB/RHLinux7.0 machine.  I will run these
experiments on a slower machine in Mountain View tomorrow or Wednesday
and I will also run experiments on Windows as soon as I can.

Timing numbers:

static default link ordering:
cold start   	cold mem    	 warm start    	 warm mem
8.985        	20940       	 2.305           20756
9.174        	20776       	 2.409           20816
9.174        	20800       	 2.304           20808
9.029        	20780       	 2.304           20768
9.373        	20780       	 2.425           20764
----------------------------------------------------------------
9.147 (0.152)   20815.2 (70.4)   2.349 (0.062)   20782.4 (27.51)

static "startup" link ordering:
cold start      cold mem         warm start      warm mem
8.395        	19084       	 2.237       	 18976
8.074        	18880       	 2.25        	 18852
8.129        	19100       	 2.356       	 18872
8.373        	18904       	 2.254       	 18812
8.151        	19184       	 2.362       	 18880
----------------------------------------------------------------
8.224 (0.149)   19030.4 (132.2)  2.292 (0.062)   18878.4 (60.57)


cold start mean delta: .923 seconds (10.1%)
cold mem mean delta:   1784.8 MB    (8.6%)
warm start mean delta: 0.57 secons  (2.4%)
warm mem mean delta:   1904 MB      (9.2%)

While I'm disappointed that the cold start time didn't improve by 20%
or more, I still think this optimization is worth doing for Linux.
So from seeing Roger's work under Linux would I be right in thinking
that this bug# is no longer mswindows2000-specific?  If that's the
case, someone might want to change the bug's platform/OS.
The work I'm doing will apply to all the Win32 platforms, and
to any platform that uses GNU cc and GNU ld.  I'm also planning
on looking into this on the Mac.  There's really no way to specify 
that with the choices on the menus.  :-)

I have no problem with just leaving it Windows 2000.  If another
platform/OS combination would help people keep track of things better
I'll be happy to accomodate, though.
OS: Windows 2000 → All
Hardware: PC → All
A couple of people have asked about the tools I've been using, 
so I attached them to this bug.  They're still very rough!
Here's some data from a slower Linux machine: 200MHzPPro/64MB/RedHat7.1


static default link ordering:
cold start   	cold mem    	 warm start    	 warm mem
21.217          20564            13.787          20740
21.991          20616            12.034          20604
22.042          20600            11.822          20636
22.143          20544            12.032          20604
22.026          20628            12.302          20604
--------------------------------------------------------------
21.884 (0.377)  20590.4 (35.4)   12.3954(0.796)  20637.6(58.9)

static startup link ordering:
cold start   	cold mem    	 warm start    	 warm mem
19.12           18652            11.579          18660        
19.228          18920            11.695          18668        
19.278          18800            11.744          18668        
19.194          18688            11.664          18656        
18.778          18704            11.743          18656        
--------------------------------------------------------------
19.1196(0.199)  18752.8(108.3)   11.685(0.068)   18661.6(6.07)

cold start mean delta: 2.76 seconds (12.6%)
cold mem mean delta:   1837.6 MB    (8.9%)
warm start mean delta: .71 seconds  (5.7%)
warm mem mean delta:   1976 MB      (9.5%)
Cathleen checked some patches into STATIC_BUILD_20010612_BRANCH that I 
posted to bug 46755 that do function reordering for Linux.  So far it's
just the build changes and an initial ordering file; the tools aren't
in yet.
Argh, that should have been bug 46775 in my previous message!
I've been digging into why reordering doesn't help very much on
Windows.  During startup mozilla executes 8272 functions from
mozilla.exe.  Of these, 6704 can be reordered and 1568 cannot.  Of the
latter, 695 are functions which have been declared static.  The others
are functions that have names like "_$E10".  I didn't track down all
873 of them, but the ones that I did track down correspond to static
or global variables of classes that have constructors.  For example,
this code:

  class Foo {
    public:
      Foo();
  };
  
  Foo foo;

will provoke VC++ into creating two functions named "_$E1" and "_$E2",
presumably so the C++ runtime can arrange to construct foo at the
proper time.  (Apparently there are lots of violators of the C++
portability guideline
http://www.mozilla.org/hacking/portable-cpp.html#dont_use_static_constructors 
:-)

Anyway, these two kinds of non-reorderable functions that we call
during startup are distributed fairly evenly throughout the .text
region of mozilla.exe without a whole lot of clustering.  Not only
that, but moving an orderable function away from a non-orderable
function from the same compilation unit could potentially make things
worse, because two functions from the same compilation unit are likely
to be near one another in the address space with the default ordering
(remember, we're talking about functions that are actually getting
called).

I believe this explains why reordering doesn't help as much on Windows
as it does on Linux.  I could try to improve this by changing
functions to be global instead of static and finding alternatives to
global objects with constructors, but I'll bet I could help out
performance more by spending time on other stuff.

I'm still trying to decide whether it's worth it to go ahead with
Windows reordering anyway.  Based on my experiments I would expect to
see a slight but measurable improvement of around 5% on startup time.
The reason I'm hesitating is that if we don't keep the .order files
current I'm worried that startup time will actually be worse than with
the default ordering because of the static functions.  Comments?
*** Bug 68802 has been marked as a duplicate of this bug. ***
I'm leaving this open but moving the milestone out.  I've
opened a separate bug 87695 for checking in new order files
to the branch prior to shipping.

After I get CVS write access I'll check my tools into the trunk.

-Roger
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Doesn't look like this is getting fixed before the freeze tomorrow night.
Pushing out a milestone.  Please correct if I'm mistaken.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
No longer blocks: 7251
Blocks: 7251
I had an Aha! moment on this bug. It is true that file static functions are not 
reordered and this apparently leaves us with tons of pages where uncalled 
functions are interleaved with called static functions. My insight was that we 
should be reordering both the called and uncalled functions! Though uncalled 
statics will still be mixed with called statics, we should be able to lump 
together the bulk of the uncalled functions and end up with a lot of code pages 
that never need to get swapped in.

I did some small scale experiments and this ought to do what I expect. I started  
to setup a tree to do this and discovered some whacky stuff in the system 
currently in the tree...

1) It looks to me like we overload MOZ_COVERAGE to both mean: "generate hook 
functions and dump win32.order files" AND "use win32.order files at link time". 
That seems crazy.

2) The code in trace.cpp makes some bad assumptions about hashtable init and so 
it crashes during shutdown during the build process in something called 
nsztool.exe.

3) We don't seem to be ordering exe files. mozilla.exe is not so small that this 
would not help.

4) We should have code in the exe to trigger the trace.cpp dump on shutdown so 
that only the exe's we care about (mozilla.exe) will do this.

I'm inclined to find some time to fiddle with all this and see where I end up.
jband: Good thinking!

order.tgz has a bunch of work I did on this which is not checked
into the tree.  My "trace.cpp" is completely different; it just
logs all function access to a file, which is then processed
later by the "order" program.  You can hack "order" to put all
unused non-static functions at the end.

I would suggest doing this work on a static build, for maximum
return on investment.

-Roger
Target Milestone: mozilla0.9.4 → ---
To Cathleen
Assignee: rogc → cathleen
"
I would suggest doing this work on a static build, for maximum
return on investment.
"

see bug 104483 (for windows)
and the attached patches
Reassigning to shaver for now. shaver, if you don't want it, send it back,
postage is guaranteed.
Assignee: cathleen → shaver
I've been working on this the past week or so (on windows), so unless shaver
really wants it I'll take it.
Assignee: shaver → dveditz
Status: NEW → ASSIGNED
Keywords: mozilla1.0+
remove self
Keywords: mozilla1.0+mozilla1.0-
Product: Browser → Seamonkey
Assignee: dveditz → nobody
Status: ASSIGNED → NEW
Priority: P5 → --
QA Contact: granrosebugs → build-config
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
Status: NEW → UNCONFIRMED
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.

If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.

Query tag for this change: mass-UNCONFIRM-20090614
Ever confirmed: false
Product: SeaMonkey → Core
Actually, this bug has nothing to do with code coverage, despite its name. Its intent appears to be to do PGO for our builds... which we've been doing already for a few years.
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago9 years ago
Resolution: --- → WORKSFORME
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: