Closed Bug 65845 Opened 24 years ago Closed 10 years ago

incorporate code coverage info into link process

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: waterson, Unassigned)

References

Details

(Keywords: embed, memory-footprint)

Attachments

(14 files)

5.21 KB, patch
Details | Diff | Splinter Review
625 bytes, application/octet-stream
Details
6.64 KB, patch
Details | Diff | Splinter Review
4.80 KB, patch
Details | Diff | Splinter Review
677 bytes, text/plain
Details
2.78 KB, text/plain
Details
2.83 KB, text/plain
Details
6.03 KB, text/plain
Details
5.06 KB, patch
Details | Diff | Splinter Review
60.04 KB, text/plain
Details
543 bytes, patch
Details | Diff | Splinter Review
1.85 KB, patch
Details | Diff | Splinter Review
2.72 KB, patch
Details | Diff | Splinter Review
14.89 KB, application/octet-stream
Details
Per jband's suggestion, we should collect function usage information and use the windows linker's `/ORDER:@filename' flag to layout the function to minimize (or at least reduce) the working set. Ideally, we could simply: 1. Do a `special' MOZ_COVERAGE build 2. Run the build and exercise a well-defined test suite that would get us decent coverage information. 3. Shut down the browser, at which point the coverage information (suitable as input to the linker) would be dumped into each DLL's `build' directory. 4. Modify release builds to incorporate the coverage file in the linker's /OUTPUT directive. I've done a bit of exploration on gathering coverage info. I'll attach the work-in-progress to the bug.
Keywords: embed, footprint
The above patch modifies the build to respond to MOZ_COVERAGE. When MOZ_COVERAGE is set in the environment, it will add `/Gh' to the CFLAGS, and link `trace.lib' with every executable and DLL in the produce. `trace.dll' is a proof-of-concept for the __penter() stuff, and records call counts by address in a hashtable. It uses the `imagehlp' stuff to dump demangled symbols on exit. Still to do: - determine input format for linker's /OUTPUT, make `trace.dll' dump in that format. - make `trace.dll' dump information per DLL, not just in one big mess. jband suggested that we could use the imagehlp routines to figure out what module a symbol comes from, as well as a path to the `.pdb' for the module. - figure out how to *not* build the `xpidl' compiler with /Gh! (Right now, you'll get call count information on `xpidl' every time we compile a type library!) Anyway, I wouldn't get too worried about crossing the t's and dotting the i's until we verify that this actually has a good net impact on resident set size, so doing the first two tasks is probably most important.
dprice told me that the docs he read indicate that the linker will error out if it finds exrtaneous entries in the order file (though the experiments are yet to be done). Assuming this turns out to be true, I think the longterm plan should be to write the per dll info NOT in a format that the linker will take, but instead with the additional per line call count data we can use when merging these files on subsequent runs. This will give us both extra per line info and left over lines for defunct symbols. That's fine. At release build time we would (as you suggested yesterday) do a link without an orderfile and then run a script to take the generated mapfile and the checked-in orderfile and produce the ready-to-use orderfile and then link again using that new orderfile. Double linking each dll (and the exe) should not be that bad and the ready-to-use orderfile can be built in the obj dir and not get in anyone's way. Also, you may find you need to rebase the MOZ_COVERAGE builds to get accurate info when mapping addresses to symbols - or is the current generation of imghelp smart enough to do the right thing on a dynamically relocated dll? The older imghelp did not used to get this right.
The output I'm getting from imagehlp on Win2K seems pretty reasonable, even without rebasing.
Hmm, digging around a bit more, I found this: http://msdn.microsoft.com/library/default.asp?URL=/library/winresource/dnwinnt/s 800a.htm Looks like there is a `working set tuner' that comes with the `Windows NT Resource Kit'? (Looking more closely, maybe this only shipped with NT 3.51?)
Okay, a couple more data points. - We'll need to make sure that we compile with `/Gy' to generate a COMDAT for each function. - The `order' file must contain decorated names for C++, and undecorated names for C; e.g., extern "C" int foo(int a) {...} --> foo int bar(int a, int b) {...} --> ?bar@@YAHHH@Z - If you specify a symbol in the order file that doesn't exist, you get a warning (LNK4037), not a hard error.
I dug out the recent articles I mentioned. Robbins poinst out that newer resource kits no longer ship that working set tuner. Its code (and Robbins' better code) are messy to use on a big project. That's why I suggested the scheme I did where we automatically dump the info in to each dll directory at shutdown.
adding build gurus to the cc list. What's the benefit to having this in the release builds over just doing this for profiling or footprint/performance analysis?
granrose: we will see. The theory is that we will significantly reduce the working set size.
Talked to sfraser, who reminded me that Mac CW linker supports "arrange" files, and we thought maybe the Windows-derived trace could produce an order file that is XP enough to help Mac, and that can be re-mangled and otherwise translated to work with the Mac linker, too. /be
I applied the work-in-progress to my tree and rebuilt it. I had to copy trace.dll and nspr4.dll to dist/WIN32_D.OBJ/install to get it to work. One of the executables generated in xpinstall needed it to finish linking. But the build ran and produced a ton of output. I found a simple program and recompiled it with /Gy Using dumpbin I was able to make a file containing COMDATs of all the functions. I reordered them, and rebuilt with /order enabled. When I dumped the contents of the new .obj none of the functions had moved. I haven't been able to figure out why.
You mean `when I dumped the contents of the new *.[exe|dll]*, none of the functions had moved', right? The linker won't re-order the contents of the .obj files...
So I just did a down and nasty test, which I'll attach. The /ORDER flag seems to work fine for me. I also discovered that the file you pass to `/ORDER' allows a semicolon (';') as a comment character, which is groovy.
Dave, I was able to get the 2nd set of diffs to run through a clean build with no problems. (It also turns off tracing in xpidl.exe, which was slowing down the build a lot!)
Ok, I was so damn curious to see if this would do anything that I ran out and hacked stuff up far enough to create /ORDER files and wedge them into the build. The bad news is this. According to `TaskInfo 2000', re-ordering doesn't make a whit of difference. Here's what I did. 1. Instrument a MOZ_PROFILE build with MOZ_COVERAGE, too. 2. Start winEmbed, shutdown. (Autoreg happens.) 3. Start winEmbed again. Shutdown. Collect symbols into `order.raw' file. 4. Write perl script to grovel through `order.raw' and produce per-module ordering information. Then, in another tree: 1. Build with MOZ_DEBUG off and BUILD_OPT on. This is a full-on release build, baby. 2. Start winEmbed. Autoreg happens. Shut down. 3. Start winEmbed. Look at TaskInfo 2000. Note working set numbers. Shutdown. 4. Install linker ordering files created above in the proper locations. 5. Twiddle `config/dll.inc' to notice ordering files. 6. Rebuild; watch all DLLs relink, check `cmd.cfg' and `dumpbin' resulting DLLs to make sure that ordering info is really getting picked up. 7. Start winEmbed. (Autoreg happens). Shutdown. 8. Start winEmbed. Note TaskInfo 2000 numbers (specifically, `In Mem KB' and `Working Set KB') look a whole lot like they did before. Specifically, 9,972KB before re-ordering, 9,984KB after re-ordering. 9. Curse. I think I've probably tested a `best case' scenario for our code reording magic. (Our working set is only gonna get bigger as more pages load, images start to pile in, and more code is touched.) Since `nothing happened', I conclude that either - I have screwed stuff up somehow - There are more sinister problems afoot cc'ing other Win32 gurus.
You didn't mention rebasing. Did you do it? The loader might load a lot of pages that would not otherwise be loaded if module base fixups are necessary. Also, it is not just a question of startup memory. If you cause lots of pages to be swapped out (say by iconizing) and then bring the app back to the front and do stuff, then the hope would be that the vm would need to map in less of the code pages.
Oh damn. I forgot to rebase. Let me try that.
Okay, now I've tried this with rebasing. Working set of a rebased build, without re-ordering: ~8860KB. Working set of a rebased build, with re-ordering, ~8528KB. So that's about a 300KB savings in working set, or ~3%. dprice: it might be cool to use that tool you've got that reads `dumpbin' output to see if the math works out here. Specifically, for each DLL, what do we compute the sum of the `required' code size to be? Does this match up with the the numbers that the OS is giving us?
Ok, so I've spent some more time with this hack, and I think maybe it's looking a bit better. jband suggested that my `just startup winEmbed' stuff wasn't going to be a very good test. So, I collected data by starting mozilla, browsing around doing normal stuff for an hour or so. Then, in my rebased but unordered `release' build, I start up and browse through several pages; get a working set size of ~14MB. Apply the order files, relink, rebase. Do a similar sprawl through several pages, get a working set size of ~12MB. In both cases, I started the browser and it peaked at 15MB. I then minimized the browser (which kicks in some Win32 logic that puts as much of the process on the `standby list' as possible). Then bring the browser window back up and continue to use it.
Chris, You don't sound real excited. I think that a change to keep 2 megs of unused code from clogging up physical memory pages is an awesome achievement. 12/14 may not sound like a huge proportion, but that represents all the memory used by the process, no? The thing you are fixing is the amount of physical memory dedicated to unused code. I expect that a measurement of code pages mapped to memory before and after your change will show a much more impresive looking proportion. It seems to me that you are gettig into the ballpark of the numbers that our code coverage analysis suggested we were wasting. Good work.
Sorry about the spam. An unset priority should be less than P5.
Priority: -- → P5
I've attached a new version of trace.cpp that creates sorted order files all over the tree. Please review it and let me know what you think.
Nice work! Couple of comments... - Add the MPL license verbage at the top of the file. - You've hard-coded `y:\' as the search path. Is there a better way to get this value? If we can't deduce it from within trace.cpp, maybe you could communicate the MOZ_SRC environment variable into this modules compile by adding a `-DMOZ_SRC=$(MOZ_SRC)' - Go ahead and nuke the commented-out code in Reporter::~Reporter()
how do you feel about hard coding the depth, "..\..\..\..\"? jband mentioned that the location on the .pdb file is at the very end of the .dll It should be fairly simple to extract what we need from the .dlls themselves.
Sure, that might work.
I went with the MOZ_SRC option, in case someone set MOZ_OUT to somewhere other than MOZ_SRC
Attached patch final diffsSplinter Review
attached final versions. Asking leaf and reviewers@mozilla.org for a review.
sr=leaf for the build changes.
checked in
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Should this really have been closed? Did we really finish the task: "incorporate code coverage info into link process"?
Re-opening, because there are a few more things to finish up... 1. Either I gave you a bad patch, or it applied wrong, or something. We want `/Gh' to be added to CFLAGS independently of whether or not MOZ_DEBUG is defined. I'll attach a patch that corrects this. 2. We need to check in a first cut at the `win32.order' files, generated from an optimized build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Also, we should probably let this bug cover the Win32 code, and open new bugs for Mac and Linux.
as cls-prime for windows, sr=leaf for the patch to config.mak posted by waterson.
I've sent mail to drivers@mozilla.org asking permission to check this in. Right now the tree is closed unless they bless your bug :(
blizzard? this is relatively safe.
Yep. Go for it.
order files are checked in. To find the right home for the order files, trace.dll is doing a search for the .ilk files. When you pass an order file to the linker, it doesn't do any incremental linking. So no .ilk files are created. I'm gunna attach a patch that searched for the .pdb files. That patch includes changes to makefile.win to only build trace.dll if MOZ_COVERAGE is defined. jband, waterson wanna give me an r= or sr=?
r=waterson
Hey dprice, I'm seeing a fair number linker warnings when doing a MOZ_PROFILE build; e.g.: win32.order : warning LNK4097: duplicate name "?GetAttributeMappingFunctions@nsHTMLObjectElement@@UBGIAAP6AXPBVnsIHTMLMappedAt tributes@@PAVnsIMutableStyleContext@@PAVnsIPresContext@@@Z3@Z" in order file; using last one Could you double-check the files you added and make sure that there aren't duplicates? Also, we should probably make sure that we ignore the order files when doing a debug build...probably need to fix mozilla/config/dll.inc
I modified trace.dll to find duplicate entries in the list of function names. It finds a TON of them. I can modify it to remove duplicate entries that have lower call counts. Or I can pass through the list once, summing the call counts. Then pass through a second time to insert the function into the list.
sr=leaf for the MOZ_COVERAGE check
Here's a fix to make us not use the order files in debug builds. Index: dll.inc =================================================================== RCS file: /cvsroot/mozilla/config/dll.inc,v retrieving revision 3.5 diff -u -r3.5 dll.inc --- dll.inc 2001/02/07 02:19:29 3.5 +++ dll.inc 2001/02/13 20:34:37 @@ -83,7 +83,7 @@ !ifdef MAPFILE /MAP:$(MAPFILE) !endif -!if exist(win32.order) +!if exist(win32.order) && !defined(MOZ_DEBUG) /ORDER:@win32.order !endif $(LFLAGS) I wonder if, for makefile hygene, we should use a variable (e.g., $(ORDERFILE)) instead of hardcoding the file name. leaf?
yes, ORDERFILE should probably be used, and defined in the local makefiles. good catch, chris.
A few comments.... It looks like this is mostly working. Great! On the duplicate entries question... What do you think is causing this? Each function has only one real entry point. Given a set of addresses that claim to map to the same symbol you can figure out which one *really* does map by using SymGetSymbolInfo. SymGetSymPrev and SymGetSymNext can tell you about the boundaries of the symbols that are known to be near by. Do you think these are functions without known symbols, or what? I think it is important to figure out what is really happening here because if we are improperly attributing some calls to the wrong symbol (or under-counting calls) then we can end up reordering such that calls are being made into pages where we thought no calls were being made. This can defeat much of what we are trying to gain. Also, I take it the subtraction of 5 from addr in the Log function is the known offset back from the return address in the call to penter to the real entry address of the function calling penter, right? I'm looking at an order file and seeing a bunch of functions that should be inlined. I'm guessing that this was generated on a debug build, right? These should be generated on a rebased optimized build after doing some reasonable set of user activities. That is the plan, right? As I originally imagined this, we would be accumulating this information on various runs over time. This requires the ability to read the order files, merge in info from the current run, and then write out the new file. You are planning to support this, right? *Please* rename trace.cpp to something that says a little more about what it is and why it is there. We are not limited to 8.3 names. Maybe something like Win32SymbolOrderer.cpp or some such thing.
> On the duplicate entries question... What do you think is causing this? `static' functions. I've talked to dprice about this, and we aren't going to incude any functions where SymGetSymFromAddr() shows a non-zero displacement. Also N.B. that the ordering files that were initially checked in *were* generated from a debug build, which (as you note) includes many out-of-line impls :-(. dprice is going to fix that. With respect to reading existing order files, filed bug 68802.
> `static' functions. I've talked to dprice about this, and we aren't going to > incude any functions where SymGetSymFromAddr() shows a non-zero displacement. I don't understand why this would generate function calls targeting multiple addresses (maybe I don't need to understand). Does this happen in release builds too? But anyway, why not include them? It seems like it would be better to lump them together - sum the the call counts for the dups and add only one entry rather than risk missing the symbol. Or did someone confirm that symbols with non-zero displacement entries *always* also have a symbol with zero-displacement? We for sure don't want to miss any.
new order files that greatly reduce the number of warnings
milestone = .9
Target Milestone: --- → mozilla0.9
As I see these win32.order files popping up recently, I assume this is still being worked on. Are there any new numbers on performance/footprint gain? Is it really worthwhile? I am thinking about enabling this on OS/2 as well. Unfortunately, these win32.order files use the mangled Win32 names so we would have to write a tool that reads in the file, transmangles the names to our OS/2 compiler. Maybe the same thing can be done for Unix/gcc and Mac/CW. Does that make sense?
There's still work going on to clear up some problems with the order files, so we don't have solid numbers on the overall footprint changes. Right now we aren't seeing a huge win, but some of the larger dlls aren't being reordered properly. The MSVC++ linker requires that we give it mangled names. We can tweak it to provide unmangled names. I don't know what the compilers on unix or OS/2 require for function ordering.
dprice, I'm still seeing a ton of warnings doing an optimized build; e.g., _PreshShell::RepaintSelection does not exist; ingored Please fix this. Do you want me to file another bug.
And my tests with filemon show that we're still reading scattered chunks from our .dll's on startup, so it doesn't appear to be helping.
Dude! Puh-leeze fix this! (Or yell ``uncle'' and hand it off if you're too swamped.) The latest order files (as of 2001-04-04) do not contain ``decorated'' names. Here's a sample of mozilla/content/build/win32.order: StyleContextImpl::GetStyleData ; 64925329 NS_NewStyleContext ; 48923225 nsDocumentFragment::AddRef ; 37241576 CSSStyleRuleImpl::FirstSelector ; 28952348 nsGenericElement::Release ; 20657952 So, something broke recently, and you're basically generating useless files right now. :-(
I have brand new happy order files, ready to check in.
dprice: do these order files have demangled names? (Like the last set you checked in?) If not, wouldn't you have changed something in mozilla/config/trace.cpp? Could you attach the patch to that file?
A while back I had checked in order files that contained undecorated names. I had my build set up incorrectly. I went back in and thought I had fixed the problem. It turns out that some of the files were missed. pdb files were not generated for some of the modules, so trace.cpp failed to find a good home for the order file. The forthcomig diffs make certain PDB files are made when you've set both MOZ_PROFILE and MOZ_COVERGE trace.cpp now creates a logfile where one can check for errors. The demangling of the C names wasn't quite right either. trace.cpp now removes the leading _ from C style names.
Status: REOPENED → ASSIGNED
- Don't nest MOZ_PROFILE and MOZ_COVERAGE. It turns out MOZ_PROFILE doesn't really need /PDB:NONE, so you can just do either/or here. !if defined(MOZ_PROFILE) || defined(MOZ_COVERAGE) - You're inflicting a rather schizophrenic style on the file; e.g., ``if('' vs. ``if ('' Does whitespace preceded the ``if'' (while, for, ...) condition or not? Stock mozilla style is the latter. ``(cond)'' vs. ``( cond )'' Are we offsetting parens or not? Stok mozilla style is the latter. ``(param, param)'' vs. ``(param,param)'' Are parameters separated by space, or not? Stock mozilla style is the former. Make up your mind! - For this: + if( displacement > 0 ) return PL_DHASH_NEXT; Put the return on the next line, so you can set a breakpoint on it in the debugger. - This could use a little explanation (by which I mean, a comment) while( cur ) { - if(cur->function[0] == '_' && cur->function[1] == '_') + if(cur->function[0] == '_' ) fprintf(orderFile,"%s ; %d\n", cur->function+1, cur->count ); - Is a separate logfile really necessary?
``(cond)'' vs. ``( cond )'' Are we offsetting parens or not? Stock mozilla style is the latter. s/latter/former/ -- right? /be
You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build. Do you prefer !if defined(MOZ_COVERAGE) || defined(MOZ_PROFILE) or leaving it as !ifdef (MOZ_PROFILE) and changing it to /PDB:$(PDBNAME) ? I'm not in love with the logfile. I'd be happy with it printing to the console. I tried using printf and running with -console, but none of the output was displayed. Should I do something more? Or was whatever controls output to the console already shutdown before the code in trace.dll got to run?
> You need to set MOZ_PROFILE and MOZ_COVERAGE when you do a coverage build. Why would you need to set MOZ_PROFILE in addition to MOZ_COVERAGE? (Assuming you do the MOZ_PROFILE || MOZ_COVERAGE thing.) I think that's the only file and line where the MOZ_PROFILE variable is used. Regardless, MOZ_PROFILE (that is, a build meant to be used with Quantify) will work with either /PDB:NONE or /PDB:$(PDBFILE), so do whichever thing incurs the least amount of brain-print. :-)
you're right, MOZ_PROFILE isn't required for MOZ_COVERAGE, so I'll go with the A || B code. I've also straightened up all of the schitzo spacing problems. Any thoughts on the output to console problem? Do ya need to see another diff?
r=jband on last patch.
->0.9.1
Target Milestone: mozilla0.9 → mozilla0.9.1
dprice: Please take the time to post to this bug the answers to two questions: 1) How much measured benefit do we get from these order files? What methodology was used in that measurement? 2) What *exactly* is the proceedure used to generate the order files? I'm especially interested in exactly what 'user-activity' is performed in the browser during the run(s) when these order files are generated. Is this done on first run? What parts of the browser get exercised? Thanks!
I've done two things to test the impact of the order files. I've run through jrgm's page loader test. There was no noticable improvement in page loads. I also measured the size of the resident set. I used vadump.exe (from the win2k resource kit) and saw there there's no real difference in the number of pages kept in memory. I've seen that everything is working correctly (vadump shows me the symbols on each page, and they're in the order specified in the order files) But I haven't seen signifigant wins from the reordering. To generate the order files, I build a MOZ_COVERAGE build which links in trace.dll Then I run that build. I got to my.netscape.com, www.cnn.com, run 5 iterations through jrgm's page loader test, open my email and read a few messages, send a test email, read a newsgroup, visit the bugzilla query page and run a query then edit the results page. I make certain the mozilla/dist directory is gone, to be sure that it is a first time execution.
Thanks Dave. Hmm. A few thoughts... With this stuff all in place we should put some effort into *making* it show measurable improvement. If we can't show said improvement then there is no good reason to keep it in place and we should remove it if it's not helping. Your proceedure for generating the order files sounds good except, I suggest, you do this on a second run. There may well be a lot of code that *only* runs the first time and there's no use adding it to the mix. I don't think optimizing first run memory useage is the goal. On testing, I suggest a test where you load the browser, load a page or two, then iconize the browser (and run some other app if necessary) to get it to swap pages out. *Then* bring the browser forward and load a page and see how much memory it needs to swap back in so that it can service the request. We *know* we only running roughly half the code. Segregating the used code from the unused code has got to save *something*. Either the measurements are not showing the gain, or we're just not doing the right thing to segregate the code.
It has been agreed that I should look into this. I have some previous experience using cord/pixie on IRIX which hopefully I can apply to this problem. There some relevant discussion in bug 68802 that should probably have take place here.
Assignee: dprice → rogc
Status: ASSIGNED → NEW
Finally go that patch checked in today.
new entry point not found bugs keep cropping up. We're not seeing a real win for the ordering that's there now. So after talking this over with waterson, choffman and kandrot I'm going to disable the use of the win32.order files unless MOZ_COVERAGE is set. This will turn it off for the optimized builds. here's a diff for your enjoyment: Index: dll.inc =================================================================== RCS file: /cvsroot/mozilla/config/dll.inc,v retrieving revision 3.6 diff -u -r3.6 dll.inc --- dll.inc 2001/02/15 23:04:33 3.6 +++ dll.inc 2001/05/10 00:23:11 @@ -83,7 +83,7 @@ !ifdef MAPFILE /MAP:$(MAPFILE) !endif -!if exist(win32.order) && !defined(MOZ_DEBUG) +!if exist(win32.order) && !defined(MOZ_DEBUG) && defined(MOZ_COVERAGE) /ORDER:@win32.order !endif $(LFLAGS)
ever since dprice's changes to mozilla/config, talkback symbols have been torched; still investigating why, if you want to help me look, dave, i'd appreciate the help.
oh, i get it. you're making the pdb files explicit, while we were expecting the debugging symbols to be in the binaries in dist/ (we use another utility to strip them out into pdb files). If i can't get talkback reporting to work with the several .pdb files scattered around the tree, is it ok to split up the PROFILE and COVERAGE cases?
leaf: go for it.
Moving milestone out. Still doing research.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Blocks: 7251, 71874
*** Bug 68855 has been marked as a duplicate of this bug. ***
I've been doing some stuff on Linux. I've figured out how to gather a trace of function calls and specify a function ordering to the linker. I've tried a couple of different algorithms for specifying the order: Nat Friedman's "node grouping" algorithm from the grope paper, and just putting them in the order in which they were called ("startup"). "startup" does better than "node grouping" for both memory use and startup time. Using the "startup" algorithm, mozilla's resident size is slightly reduced: normal static mozilla-bin: 20704 KB reordered "startup" mozilla-bin: 18932 KB The reordered executable is also slightly smaller than the normal static executable. If I strip both the normal static mozilla-bin and the reordered mozilla-bin, I get a slight startup time improvement: normal static mozilla-bin stripped: 19.914 seconds reorderd "startup" mozilla-bin stripped: 18.948 seconds If I don't strip the executables, the reordered mozilla-bin actually does worse: normal static mozilla-bin unstripped: 21.726 seconds reorderd "startup" mozilla-bin unstripped: 23.411 seconds This is mystery #1. Second startup is between 2 and 2.5 seconds for all the cases that I tried. The command "cp mozilla-bin /dev/null" takes about 8 seconds if mozilla-bin has not been run. Running two different versions of mozilla-bin linked against all the same system libraries (exluding NSPR and ldap libraries) takes about the same amount of time: there is not a significant advantage to having another program that uses the same libraries run first. I would expect our cold startup time to be about: 2.5 seconds (warm start) + 8 seconds (cp mozilla-bin /dev/null) - ~2 seconds (some savings from reordering to avoid reading the entire exe) + ~2 seconds (some small number of seconds for mapping other libraries) -------- ~10.5 seconds Cold start is taking about 8 seconds longer than I would expect. This is mystery #2. All these experiments have taken place on my 1GHz Pentium III w/ 512MB memory running RedHat Linux 7.0 (+ newer compilers). I will continue digging into these mysteries. Any and all tips and suggestions are appreciated :-) Thanks! -Roger
OK, I've found at least part of the answer to mystery #2. By looking at a timeline and strace output, I see that we spend between 6 and 11 seconds on cold start stat()ing the 126 .xpt files in dist/bin/components 4 times each. Sure enough, I run a shell script that runs "ls -lL" 4 times in succession on that directory and it takes 9 seconds. I'll file a separate bug. This is all happening from within a call to NS_SetupRegistry_1() in nsAppRunner.cpp. -Roger
I filed bug 84871 for the stat() issue in case you're interested. -Roger
It turns out the stat() issue goes away when you have real files in the components directory rather than symlinks, so I marked bug 84871 invalid. The upshot is that I've learned something about how to set up my test environment. Sorry for the SPAM. :-) -Roger
rogc: Are you wasting your time profiling debug builds? We do autoregistration of xpt file and DLLS on every startup of debug builds but not for optimized builds. This is very much intentional. Pleny of other stuff happens in debug builds that would skew measurement. What exectly do you mean by "Cold start"? First run in a clean directory that has no autoreg created registries? Or just a run of a previously run build after a reboot to clear any caching? Or something else?
No, I'm not doing timings on a debug build, but I turns out I neglected to "export BUILD_OPT=1" which may be why I'm running into the problem with all the stat() calls. OK, so with this in my .mozconfig: ac_add_options --disable-shared ac_add_options --enable-static ac_add_options --disable-debug ac_add_options --enable-optimize and with BUILD_OPT=1 in my environment, will I really get a static optimized build? Is there anything else? Can someone please explain to me why I have to set the BUILD_OPT environment variable in addition to two config options to get an optimized build? What I'm measuring as "cold start" is a second or subsequent run of a build after rebooting: 1. build it. 2. run it. 3. reboot. 4. time it. 5. repeat ad nauseum. Thanks! -Roger
OK, I changed my test setup to copy the dist directory structure and thus remove the resolution of symlinks from my timings. Now I'm getting results that make a lot more sense. The reordered executable was faster and used less memory for both stripped and unstripped binaries. The numbers below are from stripped executables in both cases. I ran each test case five times, and computed the mean and standard deviation for cold start time, cold memory use, warm start time, and warm memory use. Reordering saves us 10% of cold start time and 8.6% of cold start memory use, and 2.4% of warm start time and 9.2% of warm start memory use on my 1GHz PIII/512MB/RHLinux7.0 machine. I will run these experiments on a slower machine in Mountain View tomorrow or Wednesday and I will also run experiments on Windows as soon as I can. Timing numbers: static default link ordering: cold start cold mem warm start warm mem 8.985 20940 2.305 20756 9.174 20776 2.409 20816 9.174 20800 2.304 20808 9.029 20780 2.304 20768 9.373 20780 2.425 20764 ---------------------------------------------------------------- 9.147 (0.152) 20815.2 (70.4) 2.349 (0.062) 20782.4 (27.51) static "startup" link ordering: cold start cold mem warm start warm mem 8.395 19084 2.237 18976 8.074 18880 2.25 18852 8.129 19100 2.356 18872 8.373 18904 2.254 18812 8.151 19184 2.362 18880 ---------------------------------------------------------------- 8.224 (0.149) 19030.4 (132.2) 2.292 (0.062) 18878.4 (60.57) cold start mean delta: .923 seconds (10.1%) cold mem mean delta: 1784.8 MB (8.6%) warm start mean delta: 0.57 secons (2.4%) warm mem mean delta: 1904 MB (9.2%) While I'm disappointed that the cold start time didn't improve by 20% or more, I still think this optimization is worth doing for Linux.
So from seeing Roger's work under Linux would I be right in thinking that this bug# is no longer mswindows2000-specific? If that's the case, someone might want to change the bug's platform/OS.
The work I'm doing will apply to all the Win32 platforms, and to any platform that uses GNU cc and GNU ld. I'm also planning on looking into this on the Mac. There's really no way to specify that with the choices on the menus. :-) I have no problem with just leaving it Windows 2000. If another platform/OS combination would help people keep track of things better I'll be happy to accomodate, though.
OS: Windows 2000 → All
Hardware: PC → All
A couple of people have asked about the tools I've been using, so I attached them to this bug. They're still very rough!
Here's some data from a slower Linux machine: 200MHzPPro/64MB/RedHat7.1 static default link ordering: cold start cold mem warm start warm mem 21.217 20564 13.787 20740 21.991 20616 12.034 20604 22.042 20600 11.822 20636 22.143 20544 12.032 20604 22.026 20628 12.302 20604 -------------------------------------------------------------- 21.884 (0.377) 20590.4 (35.4) 12.3954(0.796) 20637.6(58.9) static startup link ordering: cold start cold mem warm start warm mem 19.12 18652 11.579 18660 19.228 18920 11.695 18668 19.278 18800 11.744 18668 19.194 18688 11.664 18656 18.778 18704 11.743 18656 -------------------------------------------------------------- 19.1196(0.199) 18752.8(108.3) 11.685(0.068) 18661.6(6.07) cold start mean delta: 2.76 seconds (12.6%) cold mem mean delta: 1837.6 MB (8.9%) warm start mean delta: .71 seconds (5.7%) warm mem mean delta: 1976 MB (9.5%)
Cathleen checked some patches into STATIC_BUILD_20010612_BRANCH that I posted to bug 46755 that do function reordering for Linux. So far it's just the build changes and an initial ordering file; the tools aren't in yet.
Argh, that should have been bug 46775 in my previous message!
I've been digging into why reordering doesn't help very much on Windows. During startup mozilla executes 8272 functions from mozilla.exe. Of these, 6704 can be reordered and 1568 cannot. Of the latter, 695 are functions which have been declared static. The others are functions that have names like "_$E10". I didn't track down all 873 of them, but the ones that I did track down correspond to static or global variables of classes that have constructors. For example, this code: class Foo { public: Foo(); }; Foo foo; will provoke VC++ into creating two functions named "_$E1" and "_$E2", presumably so the C++ runtime can arrange to construct foo at the proper time. (Apparently there are lots of violators of the C++ portability guideline http://www.mozilla.org/hacking/portable-cpp.html#dont_use_static_constructors :-) Anyway, these two kinds of non-reorderable functions that we call during startup are distributed fairly evenly throughout the .text region of mozilla.exe without a whole lot of clustering. Not only that, but moving an orderable function away from a non-orderable function from the same compilation unit could potentially make things worse, because two functions from the same compilation unit are likely to be near one another in the address space with the default ordering (remember, we're talking about functions that are actually getting called). I believe this explains why reordering doesn't help as much on Windows as it does on Linux. I could try to improve this by changing functions to be global instead of static and finding alternatives to global objects with constructors, but I'll bet I could help out performance more by spending time on other stuff. I'm still trying to decide whether it's worth it to go ahead with Windows reordering anyway. Based on my experiments I would expect to see a slight but measurable improvement of around 5% on startup time. The reason I'm hesitating is that if we don't keep the .order files current I'm worried that startup time will actually be worse than with the default ordering because of the static functions. Comments?
*** Bug 68802 has been marked as a duplicate of this bug. ***
I'm leaving this open but moving the milestone out. I've opened a separate bug 87695 for checking in new order files to the branch prior to shipping. After I get CVS write access I'll check my tools into the trunk. -Roger
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Doesn't look like this is getting fixed before the freeze tomorrow night. Pushing out a milestone. Please correct if I'm mistaken.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
No longer blocks: 7251
Blocks: 7251
I had an Aha! moment on this bug. It is true that file static functions are not reordered and this apparently leaves us with tons of pages where uncalled functions are interleaved with called static functions. My insight was that we should be reordering both the called and uncalled functions! Though uncalled statics will still be mixed with called statics, we should be able to lump together the bulk of the uncalled functions and end up with a lot of code pages that never need to get swapped in. I did some small scale experiments and this ought to do what I expect. I started to setup a tree to do this and discovered some whacky stuff in the system currently in the tree... 1) It looks to me like we overload MOZ_COVERAGE to both mean: "generate hook functions and dump win32.order files" AND "use win32.order files at link time". That seems crazy. 2) The code in trace.cpp makes some bad assumptions about hashtable init and so it crashes during shutdown during the build process in something called nsztool.exe. 3) We don't seem to be ordering exe files. mozilla.exe is not so small that this would not help. 4) We should have code in the exe to trigger the trace.cpp dump on shutdown so that only the exe's we care about (mozilla.exe) will do this. I'm inclined to find some time to fiddle with all this and see where I end up.
jband: Good thinking! order.tgz has a bunch of work I did on this which is not checked into the tree. My "trace.cpp" is completely different; it just logs all function access to a file, which is then processed later by the "order" program. You can hack "order" to put all unused non-static functions at the end. I would suggest doing this work on a static build, for maximum return on investment. -Roger
Target Milestone: mozilla0.9.4 → ---
To Cathleen
Assignee: rogc → cathleen
" I would suggest doing this work on a static build, for maximum return on investment. " see bug 104483 (for windows) and the attached patches
Reassigning to shaver for now. shaver, if you don't want it, send it back, postage is guaranteed.
Assignee: cathleen → shaver
I've been working on this the past week or so (on windows), so unless shaver really wants it I'll take it.
Assignee: shaver → dveditz
Status: NEW → ASSIGNED
Keywords: mozilla1.0+
remove self
Keywords: mozilla1.0+mozilla1.0-
Product: Browser → Seamonkey
Assignee: dveditz → nobody
Status: ASSIGNED → NEW
Priority: P5 → --
QA Contact: granrosebugs → build-config
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
Status: NEW → UNCONFIRMED
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
This bug report is registered in the SeaMonkey product, but has been without a comment since the inception of the SeaMonkey project. This means that it was logged against the old Mozilla suite and we cannot determine that it's still valid for the current SeaMonkey suite. Because of this, we are setting it to an UNCONFIRMED state. If you can confirm that this report still applies to current SeaMonkey 2.x nightly builds, please set it back to the NEW state along with a comment on how you reproduced it on what Build ID, or if it's an enhancement request, why it's still worth implementing and in what way. If you can confirm that the report doesn't apply to current SeaMonkey 2.x nightly builds, please set it to the appropriate RESOLVED state (WORKSFORME, INVALID, WONTFIX, or similar). If no action happens within the next few months, we move this bug report to an EXPIRED state. Query tag for this change: mass-UNCONFIRM-20090614
Ever confirmed: false
Product: SeaMonkey → Core
Actually, this bug has nothing to do with code coverage, despite its name. Its intent appears to be to do PGO for our builds... which we've been doing already for a few years.
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago10 years ago
Resolution: --- → WORKSFORME
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: