Open
Bug 553721
Opened 15 years ago
Updated 2 years ago
Improve code locality in Windows builds
Categories
(Firefox :: General, defect)
Tracking
()
NEW
People
(Reporter: dietrich, Unassigned)
References
Details
(Whiteboard: [ts][win])
Attachments
(2 files, 4 obsolete files)
8.22 KB,
patch
|
Details | Diff | Splinter Review | |
204.41 KB,
application/x-zip-compressed
|
Details |
From Ehsan:
So, for Windows, the key is the /ORDER linker option[1]. With this
function, you should pass in a text file which specifies the exact
order in which functions should be linked. This file, of course, uses
the mangled method names. Here's how you can get the mangles names
for arbitrary method names, if needed: [2]. These are the basics of
how to get it to work on Windows. IIRC, if the function order file
you pass to the linker does not contain all linked functions, the
linker will dump the unspecified functions after all specified ones,
in an unpredictable order.
[1] http://msdn.microsoft.com/en-us/library/00kh39zz.aspx
[2] http://msdn.microsoft.com/en-us/library/b06ww5dd.aspx
Reporter | ||
Updated•15 years ago
|
Whiteboard: [ts]
Updated•15 years ago
|
Whiteboard: [ts] → [ts][win]
Comment 1•15 years ago
|
||
Isn't this something that PGO does for us automatically? Function-level ordering may not always be the best way (i.e. you want the hot paths on the same page and the cold paths on a different page).
Shouldn't prefetch and super-fetch make this work almost redundant? Also, the address space layout randomization in Windows Vista+ could make this a wasted effort if the code pages are moved around in memory.
Comment 3•15 years ago
|
||
(In reply to comment #2)
> Shouldn't prefetch and super-fetch make this work almost redundant?
The prefetch/super-fetch features predict which pages from disk will be needed but we could still improve performance by laying out related blocks of code in adjacent pages since disk seek time still matters for those of us (read: most) who don't have an SSD.
> Also, the
> address space layout randomization in Windows Vista+ could make this a wasted
> effort if the code pages are moved around in memory.
The address space randomization in Vista+ is not that sophisticated - it shifts executables up and down by increments of page size - doing otherwise could hurt performance of programs by destroying alignment. This randomizes the upper bits of addresses which is enough to cause trouble in most cases for shell code.
>>
The address space randomization in Vista+ is not that sophisticated - it shifts
executables up and down by increments of page size - doing otherwise could hurt
performance of programs by destroying alignment. This randomizes the upper bits
of addresses which is enough to cause trouble in most cases for shell code.
>>
Aah, thanks for that clarification.
As regards Prefetch, it predicts which pages need to be read from disk by recording what the application does if a prefetch log doesn't exist for the application. And, the recording isn't just done once; it's done a few times to allow for variance in the startup sequence of an application. You're right in that Prefetch is "best effort"; my assertion is that any kind of hand-drafted file that specifies order of function execution is going to be not comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a few point releases of the browser and/or the OS.
Comment 5•15 years ago
|
||
(In reply to comment #4)
> my assertion is that any kind of hand-drafted
> file that specifies order of function execution is going to be not
> comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a
> few point releases of the browser and/or the OS.
Indeed. Thats why the goal is to generate these automatically during the build process. The idea is that our worst-case performance with a rearranged binary will be about as bad as doing nothing(ie random). So there isn't much too lose, but a fair bit to gain.
Comment 6•15 years ago
|
||
I do have a hard time believing that the PGO link doesn't already take care of this for us. It should be simple enough for someone to compare the dumpbin output of a non-PGO Firefox build vs. a nightly build, right?
Comment 7•15 years ago
|
||
(In reply to comment #6)
> I do have a hard time believing that the PGO link doesn't already take care of
> this for us. It should be simple enough for someone to compare the dumpbin
> output of a non-PGO Firefox build vs. a nightly build, right?
Yes and check where XRE_main(or some other known function) is relative to the functions called from it.
Comment 8•15 years ago
|
||
(In reply to comment #6)
> I do have a hard time believing that the PGO link doesn't already take care of
> this for us. It should be simple enough for someone to compare the dumpbin
> output of a non-PGO Firefox build vs. a nightly build, right?
Looking at <http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx>, one of the PG optimizations is function layout, which seems to be taking care of this for us (still I think we need to verify this, because if the real behavior differs from the documentations, it won't be the first time!).
Comment 9•15 years ago
|
||
I think someone can test what I said in comment 8 this way:
1. Use the technique in <http://www.johnpanzer.com/aci_cuj/index.html> to figure out the actual function call order during startup on Windows.
2. Generate a PGO build, preferably using the same startup profiling test case as step 1.
3. Grab the order of functions in the PGO build using dumpbin.
4. Compare the results of steps 1 and 3.
Comment 10•15 years ago
|
||
I'm writing the tool which I suggested in comment 9. I'm adding a --enable-function-call-log mozconfig option to generate a log file containing the function calls inside Firefox in a chronological order. So far I'm struggling with the build system mostly. I will post my progress here when I have something useful.
Comment 11•15 years ago
|
||
I need to add the -Gh compiler option *only to c/cpp modules which end up to be part of Firefox*.
First I tried modifying CFLAGS/CXXFLAGS, but I found out that those variables are used for host programs such as mkdepend.exe. Then, I modified COMPILE_CFLAGS/COMPILE_CXXFLAGS, but there are many more modules inside the codebase which use those flags, and therefore end up getting -Gh upon compilation, and won't link because of the missing _penter symbol.
So I added build/win32/dummy_penter.c, which implements a dummy _penter symbol. Now, there are two problems. One of them is that as you can see inside this patch (which is not finished yet) I have to do this all over the place, and will create maintenance headaches. The other problem is that the compilation flags for dummy_penter.c differ, so for example I can't link mozalloc.dll because one of the obj files uses the static crt and the other uses the dynamic crt.
Asking for Ted's feedback to see if he knows of a better way for doing this.
Attachment #439653 -
Flags: feedback?(ted.mielczarek)
Comment 12•15 years ago
|
||
Ehsan, the other way to do this is to run the binary under wine under valgrind. I haven't had much luck with extracting symbols that way, but if this proves insufficient we can try the valgrind trick.
Comment 13•15 years ago
|
||
Well, this trick always works (if I can figure out the build system part of it, that is!), and we can be sure that the call order the same as it would be running of top of the real Win32 API, and it's just a mozconfig flag, so we can potentially get this on our release build automation as well, therefore I really prefer to do this inside the build system.
Updated•15 years ago
|
Attachment #439653 -
Flags: feedback?(blassey.bugs)
Attachment #439653 -
Flags: feedback?(benjamin)
Comment 14•15 years ago
|
||
For reference. This function-collection trick has been done on wince in bug 493427.
Comment 15•15 years ago
|
||
--callcap and --fastcap work on desktop windows as well. That might be cleaner.
Comment 16•15 years ago
|
||
things seem to be building without too many hacks. Not yet sure why xpidl is crashing.
Comment 17•15 years ago
|
||
Comment on attachment 442872 [details] [diff] [review]
wip
Is this patch correct? I think touching the HOST variables is not safe (because it will change how our host binaries are built.) The xpidl crash might be because of this, and this was my main problem when I was trying to work on this bug.
Also, there's the smaller problem that the Makefile changes need to be MSVC-specific, but we can solve it later.
Comment 18•15 years ago
|
||
(In reply to comment #17)
> (From update of attachment 442872 [details] [diff] [review])
> Is this patch correct? I think touching the HOST variables is not safe
> (because it will change how our host binaries are built.) The xpidl crash
> might be because of this, and this was my main problem when I was trying to
> work on this bug.
problem is that we need to touch those to avoid having the tracing CXXFLAGS leak into host tool building.
I haven't yet debugged as to why xpidl is being linked with tracing stuff(I think it should be using HOST_* settings for building, but it may not be).
> Also, there's the smaller problem that the Makefile changes need to be
> MSVC-specific, but we can solve it later.
Yeah. It's a WIP.
Comment 19•15 years ago
|
||
This is a hack that seems to work for collecting the log.
Currently I only log the C++ files(cos modifying CFLAGS would mean fixing xpidl linkage). This is incredibly inefficient, has no locking. So builds run slowly and sometimes crash. But at the end of a successful run firefox leaves a xul.dll.txt(+for of the other libs) log files. Tomorrow I'll try to coax the linker into accepting them.
Attachment #439653 -
Attachment is obsolete: true
Attachment #442872 -
Attachment is obsolete: true
Attachment #439653 -
Flags: feedback?(ted.mielczarek)
Attachment #439653 -
Flags: feedback?(blassey.bugs)
Attachment #439653 -
Flags: feedback?(benjamin)
Comment 20•15 years ago
|
||
so in theory this patch works. I have tested this in --enable-debug and it produced useful log files.
The proper way to use this is to specify --enable-penter --enable-debug-symbols, compile&run. Then do another normal build and override compilation of libxul to specify -order:@/path/to/xul.dll.log
Unfortunately builds with --enable-debug-symbols produce a weird error about missing msvcp09[dr].dll. Suggestions on what the heck is going on would be appreciated.
Attachment #443769 -
Attachment is obsolete: true
Comment 21•15 years ago
|
||
(In reply to comment #20)
> Unfortunately builds with --enable-debug-symbols produce a weird error about
> missing msvcp09[dr].dll. Suggestions on what the heck is going on would be
> appreciated.
Could you check firefox.exe and our DLLs with depends.exe to see which symbols we're importing from those DLLs?
Comment 22•15 years ago
|
||
Turned out I was compiling my penter_log.cpp file with wrong flags which confused the runtime linker(i think).
To use this produce 2 builds.
1) --enable-debug-symbols --enable-penter, run it a few times, get a log you like(those get written on exit)
2) compile a second build without above flags. use -order:@xul.dll.log to link
Attachment #443959 -
Attachment is obsolete: true
Comment 23•15 years ago
|
||
Here is a sample log(try it on your own machine!). My problem is that I am not seeing any significant difference in page faulting patterns.
Comment 24•15 years ago
|
||
(In reply to comment #23)
> Created an attachment (id=444000) [details]
> sample log file
>
> Here is a sample log(try it on your own machine!). My problem is that I am not
> seeing any significant difference in page faulting patterns.
Maybe you can run an xperf hard fault profile with stacks, so that we determine which parts of the binary are mapped when and by which code. The offset of the symbols inside xul.dll can be determined using dumpbin. Then we can match the xperf log with the actual layout of the image to try to determine what's going on here, and come up with a way of improving things.
Comment 25•14 years ago
|
||
fwiw, mozilla used to use /order in the netscape days....
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•