Open Bug 553721 Opened 15 years ago Updated 2 years ago

Improve code locality in Windows builds

Categories

(Firefox :: General, defect)

x86
Windows XP
defect

Tracking

()

People

(Reporter: dietrich, Unassigned)

References

Details

(Whiteboard: [ts][win])

Attachments

(2 files, 4 obsolete files)

From Ehsan: So, for Windows, the key is the /ORDER linker option[1]. With this function, you should pass in a text file which specifies the exact order in which functions should be linked. This file, of course, uses the mangled method names. Here's how you can get the mangles names for arbitrary method names, if needed: [2]. These are the basics of how to get it to work on Windows. IIRC, if the function order file you pass to the linker does not contain all linked functions, the linker will dump the unspecified functions after all specified ones, in an unpredictable order. [1] http://msdn.microsoft.com/en-us/library/00kh39zz.aspx [2] http://msdn.microsoft.com/en-us/library/b06ww5dd.aspx
Whiteboard: [ts]
Whiteboard: [ts] → [ts][win]
Isn't this something that PGO does for us automatically? Function-level ordering may not always be the best way (i.e. you want the hot paths on the same page and the cold paths on a different page).
Shouldn't prefetch and super-fetch make this work almost redundant? Also, the address space layout randomization in Windows Vista+ could make this a wasted effort if the code pages are moved around in memory.
(In reply to comment #2) > Shouldn't prefetch and super-fetch make this work almost redundant? The prefetch/super-fetch features predict which pages from disk will be needed but we could still improve performance by laying out related blocks of code in adjacent pages since disk seek time still matters for those of us (read: most) who don't have an SSD. > Also, the > address space layout randomization in Windows Vista+ could make this a wasted > effort if the code pages are moved around in memory. The address space randomization in Vista+ is not that sophisticated - it shifts executables up and down by increments of page size - doing otherwise could hurt performance of programs by destroying alignment. This randomizes the upper bits of addresses which is enough to cause trouble in most cases for shell code.
>> The address space randomization in Vista+ is not that sophisticated - it shifts executables up and down by increments of page size - doing otherwise could hurt performance of programs by destroying alignment. This randomizes the upper bits of addresses which is enough to cause trouble in most cases for shell code. >> Aah, thanks for that clarification. As regards Prefetch, it predicts which pages need to be read from disk by recording what the application does if a prefetch log doesn't exist for the application. And, the recording isn't just done once; it's done a few times to allow for variance in the startup sequence of an application. You're right in that Prefetch is "best effort"; my assertion is that any kind of hand-drafted file that specifies order of function execution is going to be not comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a few point releases of the browser and/or the OS.
(In reply to comment #4) > my assertion is that any kind of hand-drafted > file that specifies order of function execution is going to be not > comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a > few point releases of the browser and/or the OS. Indeed. Thats why the goal is to generate these automatically during the build process. The idea is that our worst-case performance with a rearranged binary will be about as bad as doing nothing(ie random). So there isn't much too lose, but a fair bit to gain.
I do have a hard time believing that the PGO link doesn't already take care of this for us. It should be simple enough for someone to compare the dumpbin output of a non-PGO Firefox build vs. a nightly build, right?
(In reply to comment #6) > I do have a hard time believing that the PGO link doesn't already take care of > this for us. It should be simple enough for someone to compare the dumpbin > output of a non-PGO Firefox build vs. a nightly build, right? Yes and check where XRE_main(or some other known function) is relative to the functions called from it.
(In reply to comment #6) > I do have a hard time believing that the PGO link doesn't already take care of > this for us. It should be simple enough for someone to compare the dumpbin > output of a non-PGO Firefox build vs. a nightly build, right? Looking at <http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx>, one of the PG optimizations is function layout, which seems to be taking care of this for us (still I think we need to verify this, because if the real behavior differs from the documentations, it won't be the first time!).
I think someone can test what I said in comment 8 this way: 1. Use the technique in <http://www.johnpanzer.com/aci_cuj/index.html> to figure out the actual function call order during startup on Windows. 2. Generate a PGO build, preferably using the same startup profiling test case as step 1. 3. Grab the order of functions in the PGO build using dumpbin. 4. Compare the results of steps 1 and 3.
I'm writing the tool which I suggested in comment 9. I'm adding a --enable-function-call-log mozconfig option to generate a log file containing the function calls inside Firefox in a chronological order. So far I'm struggling with the build system mostly. I will post my progress here when I have something useful.
Attached patch Build system patch (WIP) (obsolete) — Splinter Review
I need to add the -Gh compiler option *only to c/cpp modules which end up to be part of Firefox*. First I tried modifying CFLAGS/CXXFLAGS, but I found out that those variables are used for host programs such as mkdepend.exe. Then, I modified COMPILE_CFLAGS/COMPILE_CXXFLAGS, but there are many more modules inside the codebase which use those flags, and therefore end up getting -Gh upon compilation, and won't link because of the missing _penter symbol. So I added build/win32/dummy_penter.c, which implements a dummy _penter symbol. Now, there are two problems. One of them is that as you can see inside this patch (which is not finished yet) I have to do this all over the place, and will create maintenance headaches. The other problem is that the compilation flags for dummy_penter.c differ, so for example I can't link mozalloc.dll because one of the obj files uses the static crt and the other uses the dynamic crt. Asking for Ted's feedback to see if he knows of a better way for doing this.
Attachment #439653 - Flags: feedback?(ted.mielczarek)
Ehsan, the other way to do this is to run the binary under wine under valgrind. I haven't had much luck with extracting symbols that way, but if this proves insufficient we can try the valgrind trick.
Well, this trick always works (if I can figure out the build system part of it, that is!), and we can be sure that the call order the same as it would be running of top of the real Win32 API, and it's just a mozconfig flag, so we can potentially get this on our release build automation as well, therefore I really prefer to do this inside the build system.
Attachment #439653 - Flags: feedback?(blassey.bugs)
Attachment #439653 - Flags: feedback?(benjamin)
For reference. This function-collection trick has been done on wince in bug 493427.
--callcap and --fastcap work on desktop windows as well. That might be cleaner.
Depends on: 561883
Attached patch wip (obsolete) — Splinter Review
things seem to be building without too many hacks. Not yet sure why xpidl is crashing.
Comment on attachment 442872 [details] [diff] [review] wip Is this patch correct? I think touching the HOST variables is not safe (because it will change how our host binaries are built.) The xpidl crash might be because of this, and this was my main problem when I was trying to work on this bug. Also, there's the smaller problem that the Makefile changes need to be MSVC-specific, but we can solve it later.
(In reply to comment #17) > (From update of attachment 442872 [details] [diff] [review]) > Is this patch correct? I think touching the HOST variables is not safe > (because it will change how our host binaries are built.) The xpidl crash > might be because of this, and this was my main problem when I was trying to > work on this bug. problem is that we need to touch those to avoid having the tracing CXXFLAGS leak into host tool building. I haven't yet debugged as to why xpidl is being linked with tracing stuff(I think it should be using HOST_* settings for building, but it may not be). > Also, there's the smaller problem that the Makefile changes need to be > MSVC-specific, but we can solve it later. Yeah. It's a WIP.
Attached patch wip (obsolete) — Splinter Review
This is a hack that seems to work for collecting the log. Currently I only log the C++ files(cos modifying CFLAGS would mean fixing xpidl linkage). This is incredibly inefficient, has no locking. So builds run slowly and sometimes crash. But at the end of a successful run firefox leaves a xul.dll.txt(+for of the other libs) log files. Tomorrow I'll try to coax the linker into accepting them.
Attachment #439653 - Attachment is obsolete: true
Attachment #442872 - Attachment is obsolete: true
Attachment #439653 - Flags: feedback?(ted.mielczarek)
Attachment #439653 - Flags: feedback?(blassey.bugs)
Attachment #439653 - Flags: feedback?(benjamin)
Attached patch basically works (obsolete) — Splinter Review
so in theory this patch works. I have tested this in --enable-debug and it produced useful log files. The proper way to use this is to specify --enable-penter --enable-debug-symbols, compile&run. Then do another normal build and override compilation of libxul to specify -order:@/path/to/xul.dll.log Unfortunately builds with --enable-debug-symbols produce a weird error about missing msvcp09[dr].dll. Suggestions on what the heck is going on would be appreciated.
Attachment #443769 - Attachment is obsolete: true
(In reply to comment #20) > Unfortunately builds with --enable-debug-symbols produce a weird error about > missing msvcp09[dr].dll. Suggestions on what the heck is going on would be > appreciated. Could you check firefox.exe and our DLLs with depends.exe to see which symbols we're importing from those DLLs?
Attached patch working patchSplinter Review
Turned out I was compiling my penter_log.cpp file with wrong flags which confused the runtime linker(i think). To use this produce 2 builds. 1) --enable-debug-symbols --enable-penter, run it a few times, get a log you like(those get written on exit) 2) compile a second build without above flags. use -order:@xul.dll.log to link
Attachment #443959 - Attachment is obsolete: true
Attached file sample log file
Here is a sample log(try it on your own machine!). My problem is that I am not seeing any significant difference in page faulting patterns.
(In reply to comment #23) > Created an attachment (id=444000) [details] > sample log file > > Here is a sample log(try it on your own machine!). My problem is that I am not > seeing any significant difference in page faulting patterns. Maybe you can run an xperf hard fault profile with stacks, so that we determine which parts of the binary are mapped when and by which code. The offset of the symbols inside xul.dll can be determined using dumpbin. Then we can match the xperf log with the actual layout of the image to try to determine what's going on here, and come up with a way of improving things.
fwiw, mozilla used to use /order in the netscape days....
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: