Closed
Bug 65845
Opened 24 years ago
Closed 10 years ago
incorporate code coverage info into link process
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: waterson, Unassigned)
References
Details
(Keywords: embed, memory-footprint)
Attachments
(14 files)
5.21 KB,
patch
|
Details | Diff | Splinter Review | |
625 bytes,
application/octet-stream
|
Details | |
6.64 KB,
patch
|
Details | Diff | Splinter Review | |
4.80 KB,
patch
|
Details | Diff | Splinter Review | |
677 bytes,
text/plain
|
Details | |
2.78 KB,
text/plain
|
Details | |
2.83 KB,
text/plain
|
Details | |
6.03 KB,
text/plain
|
Details | |
5.06 KB,
patch
|
Details | Diff | Splinter Review | |
60.04 KB,
text/plain
|
Details | |
543 bytes,
patch
|
Details | Diff | Splinter Review | |
1.85 KB,
patch
|
Details | Diff | Splinter Review | |
2.72 KB,
patch
|
Details | Diff | Splinter Review | |
14.89 KB,
application/octet-stream
|
Details |
Per jband's suggestion, we should collect function usage information and use the
windows linker's `/ORDER:@filename' flag to layout the function to minimize (or
at least reduce) the working set. Ideally, we could simply:
1. Do a `special' MOZ_COVERAGE build
2. Run the build and exercise a well-defined test suite that would get us decent
coverage information.
3. Shut down the browser, at which point the coverage information (suitable
as input to the linker) would be dumped into each DLL's `build' directory.
4. Modify release builds to incorporate the coverage file in the linker's
/OUTPUT directive.
I've done a bit of exploration on gathering coverage info. I'll attach the
work-in-progress to the bug.
Reporter | ||
Updated•24 years ago
|
Reporter | ||
Comment 1•24 years ago
|
||
Reporter | ||
Comment 2•24 years ago
|
||
The above patch modifies the build to respond to MOZ_COVERAGE. When MOZ_COVERAGE
is set in the environment, it will add `/Gh' to the CFLAGS, and link `trace.lib'
with every executable and DLL in the produce. `trace.dll' is a proof-of-concept
for the __penter() stuff, and records call counts by address in a hashtable. It
uses the `imagehlp' stuff to dump demangled symbols on exit.
Still to do:
- determine input format for linker's /OUTPUT, make `trace.dll' dump in that
format.
- make `trace.dll' dump information per DLL, not just in one big mess. jband
suggested that we could use the imagehlp routines to figure out what module a
symbol comes from, as well as a path to the `.pdb' for the module.
- figure out how to *not* build the `xpidl' compiler with /Gh! (Right now,
you'll get call count information on `xpidl' every time we compile a type
library!)
Anyway, I wouldn't get too worried about crossing the t's and dotting the i's
until we verify that this actually has a good net impact on resident set size,
so doing the first two tasks is probably most important.
Comment 3•24 years ago
|
||
dprice told me that the docs he read indicate that the linker will error out if
it finds exrtaneous entries in the order file (though the experiments are yet to
be done). Assuming this turns out to be true, I think the longterm plan should
be to write the per dll info NOT in a format that the linker will take, but
instead with the additional per line call count data we can use when merging
these files on subsequent runs. This will give us both extra per line info and
left over lines for defunct symbols. That's fine. At release build time we would
(as you suggested yesterday) do a link without an orderfile and then run a
script to take the generated mapfile and the checked-in orderfile and produce
the ready-to-use orderfile and then link again using that new orderfile. Double
linking each dll (and the exe) should not be that bad and the ready-to-use
orderfile can be built in the obj dir and not get in anyone's way.
Also, you may find you need to rebase the MOZ_COVERAGE builds to get accurate
info when mapping addresses to symbols - or is the current generation of imghelp
smart enough to do the right thing on a dynamically relocated dll? The older
imghelp did not used to get this right.
Reporter | ||
Comment 4•24 years ago
|
||
The output I'm getting from imagehlp on Win2K seems pretty reasonable, even
without rebasing.
Reporter | ||
Comment 5•24 years ago
|
||
Hmm, digging around a bit more, I found this:
http://msdn.microsoft.com/library/default.asp?URL=/library/winresource/dnwinnt/s
800a.htm
Looks like there is a `working set tuner' that comes with the `Windows NT
Resource Kit'? (Looking more closely, maybe this only shipped with NT 3.51?)
Reporter | ||
Comment 6•24 years ago
|
||
Okay, a couple more data points.
- We'll need to make sure that we compile with `/Gy' to generate a COMDAT for
each function.
- The `order' file must contain decorated names for C++, and undecorated
names for C; e.g.,
extern "C" int foo(int a) {...} --> foo
int bar(int a, int b) {...} --> ?bar@@YAHHH@Z
- If you specify a symbol in the order file that doesn't exist, you get a
warning (LNK4037), not a hard error.
Comment 7•24 years ago
|
||
I dug out the recent articles I mentioned. Robbins poinst out that newer
resource kits no longer ship that working set tuner. Its code (and Robbins'
better code) are messy to use on a big project. That's why I suggested the
scheme I did where we automatically dump the info in to each dll directory at
shutdown.
Comment 8•24 years ago
|
||
adding build gurus to the cc list.
What's the benefit to having this in the release builds over just doing this for
profiling or footprint/performance analysis?
Reporter | ||
Comment 9•24 years ago
|
||
granrose: we will see. The theory is that we will significantly reduce the
working set size.
Comment 10•24 years ago
|
||
Talked to sfraser, who reminded me that Mac CW linker supports "arrange" files,
and we thought maybe the Windows-derived trace could produce an order file that
is XP enough to help Mac, and that can be re-mangled and otherwise translated to
work with the Mac linker, too.
/be
Comment 11•24 years ago
|
||
I applied the work-in-progress to my tree and rebuilt it. I had to copy
trace.dll and nspr4.dll to dist/WIN32_D.OBJ/install to get it to work. One of
the executables generated in xpinstall needed it to finish linking. But the
build ran and produced a ton of output.
I found a simple program and recompiled it with /Gy Using dumpbin I was able to
make a file containing COMDATs of all the functions. I reordered them, and
rebuilt with /order enabled. When I dumped the contents of the new .obj none of
the functions had moved. I haven't been able to figure out why.
Reporter | ||
Comment 12•24 years ago
|
||
You mean `when I dumped the contents of the new *.[exe|dll]*, none of the
functions had moved', right? The linker won't re-order the contents of the .obj
files...
Reporter | ||
Comment 13•24 years ago
|
||
So I just did a down and nasty test, which I'll attach. The /ORDER flag seems to
work fine for me. I also discovered that the file you pass to `/ORDER' allows a
semicolon (';') as a comment character, which is groovy.
Reporter | ||
Comment 14•24 years ago
|
||
Reporter | ||
Comment 15•24 years ago
|
||
Reporter | ||
Comment 16•24 years ago
|
||
Dave, I was able to get the 2nd set of diffs to run through a clean build with
no problems. (It also turns off tracing in xpidl.exe, which was slowing down the
build a lot!)
Reporter | ||
Comment 17•24 years ago
|
||
Ok, I was so damn curious to see if this would do anything that I ran out and
hacked stuff up far enough to create /ORDER files and wedge them into the build.
The bad news is this. According to `TaskInfo 2000', re-ordering doesn't make a
whit of difference.
Here's what I did.
1. Instrument a MOZ_PROFILE build with MOZ_COVERAGE, too.
2. Start winEmbed, shutdown. (Autoreg happens.)
3. Start winEmbed again. Shutdown. Collect symbols into `order.raw' file.
4. Write perl script to grovel through `order.raw' and produce per-module
ordering information.
Then, in another tree:
1. Build with MOZ_DEBUG off and BUILD_OPT on. This is a full-on release
build, baby.
2. Start winEmbed. Autoreg happens. Shut down.
3. Start winEmbed. Look at TaskInfo 2000. Note working set numbers. Shutdown.
4. Install linker ordering files created above in the proper locations.
5. Twiddle `config/dll.inc' to notice ordering files.
6. Rebuild; watch all DLLs relink, check `cmd.cfg' and `dumpbin'
resulting DLLs to make sure that ordering info is really getting
picked up.
7. Start winEmbed. (Autoreg happens). Shutdown.
8. Start winEmbed. Note TaskInfo 2000 numbers (specifically, `In Mem KB'
and `Working Set KB') look a whole lot like they did before. Specifically,
9,972KB before re-ordering, 9,984KB after re-ordering.
9. Curse.
I think I've probably tested a `best case' scenario for our code reording magic.
(Our working set is only gonna get bigger as more pages load, images start to
pile in, and more code is touched.) Since `nothing happened', I conclude that
either
- I have screwed stuff up somehow
- There are more sinister problems afoot
cc'ing other Win32 gurus.
Reporter | ||
Comment 18•24 years ago
|
||
Reporter | ||
Comment 19•24 years ago
|
||
Reporter | ||
Comment 20•24 years ago
|
||
Reporter | ||
Comment 21•24 years ago
|
||
Comment 22•24 years ago
|
||
You didn't mention rebasing. Did you do it? The loader might load a lot of pages
that would not otherwise be loaded if module base fixups are necessary. Also, it
is not just a question of startup memory. If you cause lots of pages to be
swapped out (say by iconizing) and then bring the app back to the front and do
stuff, then the hope would be that the vm would need to map in less of the code
pages.
Reporter | ||
Comment 23•24 years ago
|
||
Oh damn. I forgot to rebase. Let me try that.
Reporter | ||
Comment 24•24 years ago
|
||
Okay, now I've tried this with rebasing. Working set of a rebased build, without
re-ordering: ~8860KB. Working set of a rebased build, with re-ordering, ~8528KB.
So that's about a 300KB savings in working set, or ~3%.
dprice: it might be cool to use that tool you've got that reads `dumpbin' output
to see if the math works out here. Specifically, for each DLL, what do we
compute the sum of the `required' code size to be? Does this match up with the
the numbers that the OS is giving us?
Reporter | ||
Comment 25•24 years ago
|
||
Ok, so I've spent some more time with this hack, and I think maybe it's looking
a bit better. jband suggested that my `just startup winEmbed' stuff wasn't going
to be a very good test. So, I collected data by starting mozilla, browsing
around doing normal stuff for an hour or so.
Then, in my rebased but unordered `release' build, I start up and browse through
several pages; get a working set size of ~14MB. Apply the order files, relink,
rebase. Do a similar sprawl through several pages, get a working set size of ~12MB.
In both cases, I started the browser and it peaked at 15MB. I then minimized the
browser (which kicks in some Win32 logic that puts as much of the process on the
`standby list' as possible). Then bring the browser window back up and continue
to use it.
Comment 26•24 years ago
|
||
Chris, You don't sound real excited. I think that a change to keep 2 megs of
unused code from clogging up physical memory pages is an awesome achievement.
12/14 may not sound like a huge proportion, but that represents all the memory
used by the process, no? The thing you are fixing is the amount of physical
memory dedicated to unused code. I expect that a measurement of code pages
mapped to memory before and after your change will show a much more impresive
looking proportion. It seems to me that you are gettig into the ballpark of the
numbers that our code coverage analysis suggested we were wasting. Good work.
Comment 27•24 years ago
|
||
Sorry about the spam. An unset priority should be less than P5.
Priority: -- → P5
Comment 28•24 years ago
|
||
Comment 29•24 years ago
|
||
I've attached a new version of trace.cpp that creates sorted order files all
over the tree. Please review it and let me know what you think.
Reporter | ||
Comment 30•24 years ago
|
||
Nice work! Couple of comments...
- Add the MPL license verbage at the top of the file.
- You've hard-coded `y:\' as the search path. Is there a better way to get
this value? If we can't deduce it from within trace.cpp, maybe you could
communicate the MOZ_SRC environment variable into this modules compile by
adding a `-DMOZ_SRC=$(MOZ_SRC)'
- Go ahead and nuke the commented-out code in Reporter::~Reporter()
Comment 31•24 years ago
|
||
how do you feel about hard coding the depth, "..\..\..\..\"?
jband mentioned that the location on the .pdb file is at the very end of the .dll
It should be fairly simple to extract what we need from the .dlls themselves.
Reporter | ||
Comment 32•24 years ago
|
||
Sure, that might work.
Comment 33•24 years ago
|
||
I went with the MOZ_SRC option, in case someone set MOZ_OUT to somewhere other
than MOZ_SRC
Comment 34•24 years ago
|
||
Comment 35•24 years ago
|
||
Comment 36•24 years ago
|
||
attached final versions. Asking leaf and reviewers@mozilla.org for a review.
Comment 37•24 years ago
|
||
sr=leaf for the build changes.
Comment 38•24 years ago
|
||
checked in
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 39•24 years ago
|
||
Should this really have been closed? Did we really finish the task: "incorporate
code coverage info into link process"?
Reporter | ||
Comment 40•24 years ago
|
||
Re-opening, because there are a few more things to finish up...
1. Either I gave you a bad patch, or it applied wrong, or something. We want
`/Gh' to be added to CFLAGS independently of whether or not MOZ_DEBUG is
defined. I'll attach a patch that corrects this.
2. We need to check in a first cut at the `win32.order' files, generated from
an optimized build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 41•24 years ago
|
||
Reporter | ||
Comment 42•24 years ago
|
||
Also, we should probably let this bug cover the Win32 code, and open new bugs
for Mac and Linux.
Comment 43•24 years ago
|
||
as cls-prime for windows, sr=leaf for the patch to config.mak posted by
waterson.
Comment 44•24 years ago
|
||
I've sent mail to drivers@mozilla.org asking permission to check this in. Right
now the tree is closed unless they bless your bug :(
Comment 45•24 years ago
|
||
blizzard? this is relatively safe.
Comment 46•24 years ago
|
||
Yep. Go for it.
Comment 47•24 years ago
|
||
order files are checked in.
To find the right home for the order files, trace.dll is doing a search for the
.ilk files. When you pass an order file to the linker, it doesn't do any
incremental linking. So no .ilk files are created. I'm gunna attach a patch
that searched for the .pdb files.
That patch includes changes to makefile.win to only build trace.dll if
MOZ_COVERAGE is defined. jband, waterson wanna give me an r= or sr=?
Comment 48•24 years ago
|
||
Reporter | ||
Comment 49•24 years ago
|
||
r=waterson
Reporter | ||
Comment 50•24 years ago
|
||
Hey dprice, I'm seeing a fair number linker warnings when doing a MOZ_PROFILE
build; e.g.:
win32.order : warning LNK4097: duplicate name
"?GetAttributeMappingFunctions@nsHTMLObjectElement@@UBGIAAP6AXPBVnsIHTMLMappedAt
tributes@@PAVnsIMutableStyleContext@@PAVnsIPresContext@@@Z3@Z" in order file;
using last one
Could you double-check the files you added and make sure that there aren't
duplicates?
Also, we should probably make sure that we ignore the order files when doing a
debug build...probably need to fix mozilla/config/dll.inc
Comment 51•24 years ago
|
||
I modified trace.dll to find duplicate entries in the list of function names.
It finds a TON of them. I can modify it to remove duplicate entries that have
lower call counts. Or I can pass through the list once, summing the call
counts. Then pass through a second time to insert the function into the list.
Comment 52•24 years ago
|
||
sr=leaf for the MOZ_COVERAGE check
Reporter | ||
Comment 53•24 years ago
|
||
Here's a fix to make us not use the order files in debug builds.
Index: dll.inc
===================================================================
RCS file: /cvsroot/mozilla/config/dll.inc,v
retrieving revision 3.5
diff -u -r3.5 dll.inc
--- dll.inc 2001/02/07 02:19:29 3.5
+++ dll.inc 2001/02/13 20:34:37
@@ -83,7 +83,7 @@
!ifdef MAPFILE
/MAP:$(MAPFILE)
!endif
-!if exist(win32.order)
+!if exist(win32.order) && !defined(MOZ_DEBUG)
/ORDER:@win32.order
!endif
$(LFLAGS)
I wonder if, for makefile hygene, we should use a variable (e.g., $(ORDERFILE))
instead of hardcoding the file name. leaf?
Comment 54•24 years ago
|
||
yes, ORDERFILE should probably be used, and defined in the local makefiles. good
catch, chris.
Comment 55•24 years ago
|
||
A few comments....
It looks like this is mostly working. Great!
On the duplicate entries question... What do you think is causing this? Each
function has only one real entry point. Given a set of addresses that claim to
map to the same symbol you can figure out which one *really* does map by using
SymGetSymbolInfo. SymGetSymPrev and SymGetSymNext can tell you about the
boundaries of the symbols that are known to be near by. Do you think these are
functions without known symbols, or what? I think it is important to figure out
what is really happening here because if we are improperly attributing some
calls to the wrong symbol (or under-counting calls) then we can end up
reordering such that calls are being made into pages where we thought no calls
were being made. This can defeat much of what we are trying to gain. Also, I
take it the subtraction of 5 from addr in the Log function is the known offset
back from the return address in the call to penter to the real entry address of
the function calling penter, right?
I'm looking at an order file and seeing a bunch of functions that should be
inlined. I'm guessing that this was generated on a debug build, right? These
should be generated on a rebased optimized build after doing some reasonable set
of user activities. That is the plan, right?
As I originally imagined this, we would be accumulating this information on
various runs over time. This requires the ability to read the order files, merge
in info from the current run, and then write out the new file. You are planning
to support this, right?
*Please* rename trace.cpp to something that says a little more about what it is
and why it is there. We are not limited to 8.3 names. Maybe something like
Win32SymbolOrderer.cpp or some such thing.
Reporter | ||
Comment 56•24 years ago
|
||
> On the duplicate entries question... What do you think is causing this?
`static' functions. I've talked to dprice about this, and we aren't going to
incude any functions where SymGetSymFromAddr() shows a non-zero displacement.
Also N.B. that the ordering files that were initially checked in *were*
generated from a debug build, which (as you note) includes many out-of-line
impls :-(. dprice is going to fix that.
With respect to reading existing order files, filed bug 68802.
Comment 57•24 years ago
|
||
> `static' functions. I've talked to dprice about this, and we aren't going to
> incude any functions where SymGetSymFromAddr() shows a non-zero displacement.
I don't understand why this would generate function calls targeting multiple
addresses (maybe I don't need to understand). Does this happen in release builds
too? But anyway, why not include them? It seems like it would be better to lump
them together - sum the the call counts for the dups and add only one entry
rather than risk missing the symbol. Or did someone confirm that symbols with
non-zero displacement entries *always* also have a symbol with
zero-displacement? We for sure don't want to miss any.
Comment 58•24 years ago
|
||
new order files that greatly reduce the number of warnings
Comment 60•24 years ago
|
||
As I see these win32.order files popping up recently, I assume this is still
being worked on. Are there any new numbers on performance/footprint gain? Is it
really worthwhile?
I am thinking about enabling this on OS/2 as well. Unfortunately, these
win32.order files use the mangled Win32 names so we would have to write a tool
that reads in the file, transmangles the names to our OS/2 compiler. Maybe the
same thing can be done for Unix/gcc and Mac/CW. Does that make sense?
Comment 61•24 years ago
|
||
There's still work going on to clear up some problems with the order files, so
we don't have solid numbers on the overall footprint changes. Right now we
aren't seeing a huge win, but some of the larger dlls aren't being reordered
properly.
The MSVC++ linker requires that we give it mangled names. We can tweak it to
provide unmangled names. I don't know what the compilers on unix or OS/2
require for function ordering.
Reporter | ||
Comment 62•24 years ago
|
||
dprice, I'm still seeing a ton of warnings doing an optimized build; e.g.,
_PreshShell::RepaintSelection does not exist; ingored
Please fix this. Do you want me to file another bug.
Comment 63•24 years ago
|
||
And my tests with filemon show that we're still reading scattered chunks from
our .dll's on startup, so it doesn't appear to be helping.
Reporter | ||
Comment 64•24 years ago
|
||
Dude! Puh-leeze fix this! (Or yell ``uncle'' and hand it off if you're too swamped.)
The latest order files (as of 2001-04-04) do not contain ``decorated'' names.
Here's a sample of mozilla/content/build/win32.order:
StyleContextImpl::GetStyleData ; 64925329
NS_NewStyleContext ; 48923225
nsDocumentFragment::AddRef ; 37241576
CSSStyleRuleImpl::FirstSelector ; 28952348
nsGenericElement::Release ; 20657952
So, something broke recently, and you're basically generating useless files
right now. :-(
Comment 65•24 years ago
|
||
I have brand new happy order files, ready to check in.
Reporter | ||
Comment 66•24 years ago
|
||
dprice: do these order files have demangled names? (Like the last set you
checked in?) If not, wouldn't you have changed something in
mozilla/config/trace.cpp? Could you attach the patch to that file?
Comment 67•24 years ago
|
||
A while back I had checked in order files that contained undecorated names. I
had my build set up incorrectly. I went back in and thought I had fixed the
problem. It turns out that some of the files were missed. pdb files were not
generated for some of the modules, so trace.cpp failed to find a good home for
the order file. The forthcomig diffs make certain PDB files are made when
you've set both MOZ_PROFILE and MOZ_COVERGE trace.cpp now creates a logfile
where one can check for errors.
The demangling of the C names wasn't quite right either. trace.cpp now removes
the leading _ from C style names.
Status: REOPENED → ASSIGNED
Comment 68•24 years ago
|
||
Reporter | ||
Comment 69•24 years ago
|
||
- Don't nest MOZ_PROFILE and MOZ_COVERAGE. It turns out MOZ_PROFILE doesn't
really need /PDB:NONE, so you can just do either/or here.
!if defined(MOZ_PROFILE) || defined(MOZ_COVERAGE)
- You're inflicting a rather schizophrenic style on the file; e.g.,
``if('' vs. ``if (''
Does whitespace preceded the ``if'' (while, for, ...) condition or not?
Stock mozilla style is the latter.
``(cond)'' vs. ``( cond )''
Are we offsetting parens or not? Stok mozilla style is the latter.
``(param, param)'' vs. ``(param,param)''
Are parameters separated by space, or not? Stock mozilla style is
the former.
Make up your mind!
- For this:
+ if( displacement > 0 ) return PL_DHASH_NEXT;
Put the return on the next line, so you can set a breakpoint on it in
the debugger.
- This could use a little explanation (by which I mean, a comment)
while( cur ) {
- if(cur->function[0] == '_' && cur->function[1] == '_')
+ if(cur->function[0] == '_' )
fprintf(orderFile,"%s ; %d\n", cur->function+1, cur->count );
- Is a separate logfile really necessary?
Comment 70•24 years ago
|
||
``(cond)'' vs. ``( cond )''
Are we offsetting parens or not? Stock mozilla style is the latter.
s/latter/former/ -- right?
/be
Comment 71•24 years ago
|
||
You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build. Do
you prefer
!if defined(MOZ_COVERAGE) || defined(MOZ_PROFILE)
or leaving it as
!ifdef (MOZ_PROFILE)
and changing it to /PDB:$(PDBNAME) ?
I'm not in love with the logfile. I'd be happy with it printing to the console.
I tried using printf and running with -console, but none of the output was
displayed. Should I do something more? Or was whatever controls output to the
console already shutdown before the code in trace.dll got to run?
Reporter | ||
Comment 72•24 years ago
|
||
> You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build.
Why would you need to set MOZ_PROFILE in addition to MOZ_COVERAGE? (Assuming you
do the MOZ_PROFILE || MOZ_COVERAGE thing.) I think that's the only file and line
where the MOZ_PROFILE variable is used.
Regardless, MOZ_PROFILE (that is, a build meant to be used with Quantify) will
work with either /PDB:NONE or /PDB:$(PDBFILE), so do whichever thing incurs the
least amount of brain-print. :-)
Comment 73•24 years ago
|
||
you're right, MOZ_PROFILE isn't required for MOZ_COVERAGE, so I'll go with the A
|| B code. I've also straightened up all of the schitzo spacing problems. Any
thoughts on the output to console problem? Do ya need to see another diff?
Comment 74•24 years ago
|
||
r=jband on last patch.
Comment 76•24 years ago
|
||
dprice:
Please take the time to post to this bug the answers to two questions:
1) How much measured benefit do we get from these order files? What methodology
was used in that measurement?
2) What *exactly* is the proceedure used to generate the order files? I'm
especially interested in exactly what 'user-activity' is performed in the
browser during the run(s) when these order files are generated. Is this done on
first run? What parts of the browser get exercised?
Thanks!
Comment 77•24 years ago
|
||
I've done two things to test the impact of the order files. I've run through
jrgm's page loader test. There was no noticable improvement in page loads. I
also measured the size of the resident set. I used vadump.exe (from the win2k
resource kit) and saw there there's no real difference in the number of pages
kept in memory.
I've seen that everything is working correctly (vadump shows me the symbols on
each page, and they're in the order specified in the order files) But I haven't
seen signifigant wins from the reordering.
To generate the order files, I build a MOZ_COVERAGE build which links in
trace.dll Then I run that build. I got to my.netscape.com, www.cnn.com, run 5
iterations through jrgm's page loader test, open my email and read a few
messages, send a test email, read a newsgroup, visit the bugzilla query page and
run a query then edit the results page. I make certain the mozilla/dist
directory is gone, to be sure that it is a first time execution.
Comment 78•24 years ago
|
||
Thanks Dave.
Hmm. A few thoughts...
With this stuff all in place we should put some effort into *making* it show
measurable improvement. If we can't show said improvement then there is no good
reason to keep it in place and we should remove it if it's not helping.
Your proceedure for generating the order files sounds good except, I suggest,
you do this on a second run. There may well be a lot of code that *only* runs
the first time and there's no use adding it to the mix. I don't think optimizing
first run memory useage is the goal.
On testing, I suggest a test where you load the browser, load a page or two,
then iconize the browser (and run some other app if necessary) to get it to swap
pages out. *Then* bring the browser forward and load a page and see how much
memory it needs to swap back in so that it can service the request.
We *know* we only running roughly half the code. Segregating the used code from
the unused code has got to save *something*. Either the measurements are not
showing the gain, or we're just not doing the right thing to segregate the code.
Comment 79•24 years ago
|
||
It has been agreed that I should look into this. I have some
previous experience using cord/pixie on IRIX which hopefully
I can apply to this problem. There some relevant discussion
in bug 68802 that should probably have take place here.
Assignee: dprice → rogc
Status: ASSIGNED → NEW
Comment 80•24 years ago
|
||
Finally go that patch checked in today.
Comment 81•24 years ago
|
||
new entry point not found bugs keep cropping up. We're not seeing a real win
for the ordering that's there now. So after talking this over with waterson,
choffman and kandrot I'm going to disable the use of the win32.order files
unless MOZ_COVERAGE is set. This will turn it off for the optimized builds.
here's a diff for your enjoyment:
Index: dll.inc
===================================================================
RCS file: /cvsroot/mozilla/config/dll.inc,v
retrieving revision 3.6
diff -u -r3.6 dll.inc
--- dll.inc 2001/02/15 23:04:33 3.6
+++ dll.inc 2001/05/10 00:23:11
@@ -83,7 +83,7 @@
!ifdef MAPFILE
/MAP:$(MAPFILE)
!endif
-!if exist(win32.order) && !defined(MOZ_DEBUG)
+!if exist(win32.order) && !defined(MOZ_DEBUG) && defined(MOZ_COVERAGE)
/ORDER:@win32.order
!endif
$(LFLAGS)
Comment 82•24 years ago
|
||
ever since dprice's changes to mozilla/config, talkback symbols have been
torched; still investigating why, if you want to help me look, dave, i'd
appreciate the help.
Comment 83•24 years ago
|
||
oh, i get it. you're making the pdb files explicit, while we were expecting the
debugging symbols to be in the binaries in dist/ (we use another utility to
strip them out into pdb files). If i can't get talkback reporting to work with
the several .pdb files scattered around the tree, is it ok to split up the
PROFILE and COVERAGE cases?
Comment 84•24 years ago
|
||
leaf: go for it.
Comment 85•24 years ago
|
||
Moving milestone out. Still doing research.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Comment 86•23 years ago
|
||
*** Bug 68855 has been marked as a duplicate of this bug. ***
Comment 87•23 years ago
|
||
I've been doing some stuff on Linux. I've figured out how to
gather a trace of function calls and specify a function ordering
to the linker. I've tried a couple of different algorithms for
specifying the order: Nat Friedman's "node grouping" algorithm
from the grope paper, and just putting them in the order in
which they were called ("startup"). "startup" does better than "node
grouping" for both memory use and startup time.
Using the "startup" algorithm, mozilla's resident size is slightly
reduced:
normal static mozilla-bin: 20704 KB
reordered "startup" mozilla-bin: 18932 KB
The reordered executable is also slightly smaller than the normal
static executable.
If I strip both the normal static mozilla-bin and the reordered
mozilla-bin, I get a slight startup time improvement:
normal static mozilla-bin stripped: 19.914 seconds
reorderd "startup" mozilla-bin stripped: 18.948 seconds
If I don't strip the executables, the reordered mozilla-bin actually
does worse:
normal static mozilla-bin unstripped: 21.726 seconds
reorderd "startup" mozilla-bin unstripped: 23.411 seconds
This is mystery #1.
Second startup is between 2 and 2.5 seconds for all the cases that I
tried. The command "cp mozilla-bin /dev/null" takes about 8 seconds
if mozilla-bin has not been run.
Running two different versions of mozilla-bin linked against all the
same system libraries (exluding NSPR and ldap libraries) takes about
the same amount of time: there is not a significant advantage to
having another program that uses the same libraries run first.
I would expect our cold startup time to be about:
2.5 seconds (warm start)
+ 8 seconds (cp mozilla-bin /dev/null)
- ~2 seconds (some savings from reordering to avoid reading the entire exe)
+ ~2 seconds (some small number of seconds for mapping other libraries)
--------
~10.5 seconds
Cold start is taking about 8 seconds longer than I would expect. This
is mystery #2.
All these experiments have taken place on my 1GHz Pentium III w/ 512MB
memory running RedHat Linux 7.0 (+ newer compilers).
I will continue digging into these mysteries. Any and all tips and
suggestions are appreciated :-)
Thanks!
-Roger
Comment 88•23 years ago
|
||
OK, I've found at least part of the answer to mystery #2. By looking
at a timeline and strace output, I see that we spend between 6 and 11
seconds on cold start stat()ing the 126 .xpt files in dist/bin/components
4 times each. Sure enough, I run a shell script that runs "ls -lL" 4
times in succession on that directory and it takes 9 seconds. I'll
file a separate bug.
This is all happening from within a call to NS_SetupRegistry_1()
in nsAppRunner.cpp.
-Roger
Comment 89•23 years ago
|
||
I filed bug 84871 for the stat() issue in case you're interested.
-Roger
Comment 90•23 years ago
|
||
It turns out the stat() issue goes away when you have real
files in the components directory rather than symlinks, so I
marked bug 84871 invalid.
The upshot is that I've learned something about how to set up
my test environment. Sorry for the SPAM. :-)
-Roger
Comment 91•23 years ago
|
||
rogc: Are you wasting your time profiling debug builds? We do autoregistration
of xpt file and DLLS on every startup of debug builds but not for optimized
builds. This is very much intentional. Pleny of other stuff happens in debug
builds that would skew measurement.
What exectly do you mean by "Cold start"? First run in a clean directory that
has no autoreg created registries? Or just a run of a previously run build after
a reboot to clear any caching? Or something else?
Comment 92•23 years ago
|
||
No, I'm not doing timings on a debug build, but I turns out
I neglected to "export BUILD_OPT=1" which may be why I'm
running into the problem with all the stat() calls.
OK, so with this in my .mozconfig:
ac_add_options --disable-shared
ac_add_options --enable-static
ac_add_options --disable-debug
ac_add_options --enable-optimize
and with BUILD_OPT=1 in my environment, will I really get a
static optimized build? Is there anything else?
Can someone please explain to me why I have to set the
BUILD_OPT environment variable in addition to two config
options to get an optimized build?
What I'm measuring as "cold start" is a second or subsequent
run of a build after rebooting:
1. build it.
2. run it.
3. reboot.
4. time it.
5. repeat ad nauseum.
Thanks!
-Roger
Comment 93•23 years ago
|
||
OK, I changed my test setup to copy the dist directory structure and
thus remove the resolution of symlinks from my timings. Now I'm
getting results that make a lot more sense. The reordered executable
was faster and used less memory for both stripped and unstripped
binaries. The numbers below are from stripped executables in both
cases.
I ran each test case five times, and computed the mean and standard
deviation for cold start time, cold memory use, warm start time, and
warm memory use.
Reordering saves us 10% of cold start time and 8.6% of cold start
memory use, and 2.4% of warm start time and 9.2% of warm start memory
use on my 1GHz PIII/512MB/RHLinux7.0 machine. I will run these
experiments on a slower machine in Mountain View tomorrow or Wednesday
and I will also run experiments on Windows as soon as I can.
Timing numbers:
static default link ordering:
cold start cold mem warm start warm mem
8.985 20940 2.305 20756
9.174 20776 2.409 20816
9.174 20800 2.304 20808
9.029 20780 2.304 20768
9.373 20780 2.425 20764
----------------------------------------------------------------
9.147 (0.152) 20815.2 (70.4) 2.349 (0.062) 20782.4 (27.51)
static "startup" link ordering:
cold start cold mem warm start warm mem
8.395 19084 2.237 18976
8.074 18880 2.25 18852
8.129 19100 2.356 18872
8.373 18904 2.254 18812
8.151 19184 2.362 18880
----------------------------------------------------------------
8.224 (0.149) 19030.4 (132.2) 2.292 (0.062) 18878.4 (60.57)
cold start mean delta: .923 seconds (10.1%)
cold mem mean delta: 1784.8 MB (8.6%)
warm start mean delta: 0.57 secons (2.4%)
warm mem mean delta: 1904 MB (9.2%)
While I'm disappointed that the cold start time didn't improve by 20%
or more, I still think this optimization is worth doing for Linux.
Comment 94•23 years ago
|
||
So from seeing Roger's work under Linux would I be right in thinking
that this bug# is no longer mswindows2000-specific? If that's the
case, someone might want to change the bug's platform/OS.
Comment 95•23 years ago
|
||
The work I'm doing will apply to all the Win32 platforms, and
to any platform that uses GNU cc and GNU ld. I'm also planning
on looking into this on the Mac. There's really no way to specify
that with the choices on the menus. :-)
I have no problem with just leaving it Windows 2000. If another
platform/OS combination would help people keep track of things better
I'll be happy to accomodate, though.
Comment 96•23 years ago
|
||
Comment 97•23 years ago
|
||
A couple of people have asked about the tools I've been using,
so I attached them to this bug. They're still very rough!
Comment 98•23 years ago
|
||
Here's some data from a slower Linux machine: 200MHzPPro/64MB/RedHat7.1
static default link ordering:
cold start cold mem warm start warm mem
21.217 20564 13.787 20740
21.991 20616 12.034 20604
22.042 20600 11.822 20636
22.143 20544 12.032 20604
22.026 20628 12.302 20604
--------------------------------------------------------------
21.884 (0.377) 20590.4 (35.4) 12.3954(0.796) 20637.6(58.9)
static startup link ordering:
cold start cold mem warm start warm mem
19.12 18652 11.579 18660
19.228 18920 11.695 18668
19.278 18800 11.744 18668
19.194 18688 11.664 18656
18.778 18704 11.743 18656
--------------------------------------------------------------
19.1196(0.199) 18752.8(108.3) 11.685(0.068) 18661.6(6.07)
cold start mean delta: 2.76 seconds (12.6%)
cold mem mean delta: 1837.6 MB (8.9%)
warm start mean delta: .71 seconds (5.7%)
warm mem mean delta: 1976 MB (9.5%)
Comment 99•23 years ago
|
||
Cathleen checked some patches into STATIC_BUILD_20010612_BRANCH that I
posted to bug 46755 that do function reordering for Linux. So far it's
just the build changes and an initial ordering file; the tools aren't
in yet.
Comment 100•23 years ago
|
||
Argh, that should have been bug 46775 in my previous message!
Comment 101•23 years ago
|
||
I've been digging into why reordering doesn't help very much on
Windows. During startup mozilla executes 8272 functions from
mozilla.exe. Of these, 6704 can be reordered and 1568 cannot. Of the
latter, 695 are functions which have been declared static. The others
are functions that have names like "_$E10". I didn't track down all
873 of them, but the ones that I did track down correspond to static
or global variables of classes that have constructors. For example,
this code:
class Foo {
public:
Foo();
};
Foo foo;
will provoke VC++ into creating two functions named "_$E1" and "_$E2",
presumably so the C++ runtime can arrange to construct foo at the
proper time. (Apparently there are lots of violators of the C++
portability guideline
http://www.mozilla.org/hacking/portable-cpp.html#dont_use_static_constructors
:-)
Anyway, these two kinds of non-reorderable functions that we call
during startup are distributed fairly evenly throughout the .text
region of mozilla.exe without a whole lot of clustering. Not only
that, but moving an orderable function away from a non-orderable
function from the same compilation unit could potentially make things
worse, because two functions from the same compilation unit are likely
to be near one another in the address space with the default ordering
(remember, we're talking about functions that are actually getting
called).
I believe this explains why reordering doesn't help as much on Windows
as it does on Linux. I could try to improve this by changing
functions to be global instead of static and finding alternatives to
global objects with constructors, but I'll bet I could help out
performance more by spending time on other stuff.
I'm still trying to decide whether it's worth it to go ahead with
Windows reordering anyway. Based on my experiments I would expect to
see a slight but measurable improvement of around 5% on startup time.
The reason I'm hesitating is that if we don't keep the .order files
current I'm worried that startup time will actually be worse than with
the default ordering because of the static functions. Comments?
Comment 102•23 years ago
|
||
*** Bug 68802 has been marked as a duplicate of this bug. ***
Comment 103•23 years ago
|
||
I'm leaving this open but moving the milestone out. I've
opened a separate bug 87695 for checking in new order files
to the branch prior to shipping.
After I get CVS write access I'll check my tools into the trunk.
-Roger
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Comment 104•23 years ago
|
||
Doesn't look like this is getting fixed before the freeze tomorrow night.
Pushing out a milestone. Please correct if I'm mistaken.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Comment 105•23 years ago
|
||
I had an Aha! moment on this bug. It is true that file static functions are not
reordered and this apparently leaves us with tons of pages where uncalled
functions are interleaved with called static functions. My insight was that we
should be reordering both the called and uncalled functions! Though uncalled
statics will still be mixed with called statics, we should be able to lump
together the bulk of the uncalled functions and end up with a lot of code pages
that never need to get swapped in.
I did some small scale experiments and this ought to do what I expect. I started
to setup a tree to do this and discovered some whacky stuff in the system
currently in the tree...
1) It looks to me like we overload MOZ_COVERAGE to both mean: "generate hook
functions and dump win32.order files" AND "use win32.order files at link time".
That seems crazy.
2) The code in trace.cpp makes some bad assumptions about hashtable init and so
it crashes during shutdown during the build process in something called
nsztool.exe.
3) We don't seem to be ordering exe files. mozilla.exe is not so small that this
would not help.
4) We should have code in the exe to trigger the trace.cpp dump on shutdown so
that only the exe's we care about (mozilla.exe) will do this.
I'm inclined to find some time to fiddle with all this and see where I end up.
Comment 106•23 years ago
|
||
jband: Good thinking!
order.tgz has a bunch of work I did on this which is not checked
into the tree. My "trace.cpp" is completely different; it just
logs all function access to a file, which is then processed
later by the "order" program. You can hack "order" to put all
unused non-static functions at the end.
I would suggest doing this work on a static build, for maximum
return on investment.
-Roger
Updated•23 years ago
|
Target Milestone: mozilla0.9.4 → ---
Comment 108•23 years ago
|
||
"
I would suggest doing this work on a static build, for maximum
return on investment.
"
see bug 104483 (for windows)
and the attached patches
Reporter | ||
Comment 109•23 years ago
|
||
Reassigning to shaver for now. shaver, if you don't want it, send it back,
postage is guaranteed.
Assignee: cathleen → shaver
Comment 110•23 years ago
|
||
I've been working on this the past week or so (on windows), so unless shaver
really wants it I'll take it.
Assignee: shaver → dveditz
Updated•23 years ago
|
Status: NEW → ASSIGNED
Updated•23 years ago
|
Keywords: mozilla1.0+
Comment 111•23 years ago
|
||
remove self
Updated•23 years ago
|
Keywords: mozilla1.0+ → mozilla1.0-
Updated•20 years ago
|
Product: Browser → Seamonkey
Updated•18 years ago
|
Assignee: dveditz → nobody
Status: ASSIGNED → NEW
Updated•16 years ago
|
Priority: P5 → --
QA Contact: granrosebugs → build-config
Comment 112•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Status: NEW → UNCONFIRMED
Comment 113•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Comment 114•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Comment 115•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Comment 116•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Comment 117•15 years ago
|
||
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state.
If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way.
If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar).
If no action happens within the next few months, we move this bug report to an EXPIRED state.
Query tag for this change: mass-UNCONFIRM-20090614
Updated•15 years ago
|
Ever confirmed: false
Product: SeaMonkey → Core
Comment 118•10 years ago
|
||
https://developer.mozilla.org/en-US/docs/Measuring_Code_Coverage_on_Firefox and https://wiki.mozilla.org/QA:CodeCoverage imply that this has already been implemented. Close?
Comment 119•10 years ago
|
||
Actually, this bug has nothing to do with code coverage, despite its name. Its intent appears to be to do PGO for our builds... which we've been doing already for a few years.
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago → 10 years ago
Resolution: --- → WORKSFORME
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•