Last Comment Bug 553721 - Improve code locality in Windows builds
: Improve code locality in Windows builds
Status: NEW
[ts][win]
:
Product: Firefox
Classification: Client Software
Component: General (show other bugs)
: Trunk
: x86 Windows XP
: -- normal with 4 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 561883
Blocks:
  Show dependency treegraph
 
Reported: 2010-03-19 15:38 PDT by Dietrich Ayala (:dietrich)
Modified: 2010-05-24 00:32 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Build system patch (WIP) (5.53 KB, patch)
2010-04-16 17:06 PDT, :Ehsan Akhgari (busy, don't ask for review please)
no flags Details | Diff | Review
wip (4.17 KB, patch)
2010-04-30 17:44 PDT, (dormant account)
no flags Details | Diff | Review
wip (7.98 KB, patch)
2010-05-05 17:14 PDT, (dormant account)
no flags Details | Diff | Review
basically works (8.12 KB, patch)
2010-05-06 14:21 PDT, (dormant account)
no flags Details | Diff | Review
working patch (8.22 KB, patch)
2010-05-06 17:31 PDT, (dormant account)
no flags Details | Diff | Review
sample log file (204.41 KB, application/x-zip-compressed)
2010-05-06 17:32 PDT, (dormant account)
no flags Details

Description Dietrich Ayala (:dietrich) 2010-03-19 15:38:57 PDT
From Ehsan:

So, for Windows, the key is the /ORDER linker option[1].  With this
function, you should pass in a text file which specifies the exact
order in which functions should be linked.  This file, of course, uses
the mangled method names.  Here's how you can get the mangles names
for arbitrary method names, if needed: [2].  These are the basics of
how to get it to work on Windows.  IIRC, if the function order file
you pass to the linker does not contain all linked functions, the
linker will dump the unspecified functions after all specified ones,
in an unpredictable order.

[1] http://msdn.microsoft.com/en-us/library/00kh39zz.aspx
[2] http://msdn.microsoft.com/en-us/library/b06ww5dd.aspx
Comment 1 Rob Arnold [:robarnold] 2010-03-19 16:46:47 PDT
Isn't this something that PGO does for us automatically? Function-level ordering may not always be the best way (i.e. you want the hot paths on the same page and the cold paths on a different page).
Comment 2 Manoj 2010-03-20 11:06:29 PDT
Shouldn't prefetch and super-fetch make this work almost redundant? Also, the address space layout randomization in Windows Vista+ could make this a wasted effort if the code pages are moved around in memory.
Comment 3 Rob Arnold [:robarnold] 2010-03-20 11:25:08 PDT
(In reply to comment #2)
> Shouldn't prefetch and super-fetch make this work almost redundant?

The prefetch/super-fetch features predict which pages from disk will be needed but we could still improve performance by laying out related blocks of code in adjacent pages since disk seek time still matters for those of us (read: most) who don't have an SSD.

> Also, the
> address space layout randomization in Windows Vista+ could make this a wasted
> effort if the code pages are moved around in memory.

The address space randomization in Vista+ is not that sophisticated - it shifts executables up and down by increments of page size - doing otherwise could hurt performance of programs by destroying alignment. This randomizes the upper bits of addresses which is enough to cause trouble in most cases for shell code.
Comment 4 Manoj 2010-03-20 11:36:03 PDT
>>
The address space randomization in Vista+ is not that sophisticated - it shifts
executables up and down by increments of page size - doing otherwise could hurt
performance of programs by destroying alignment. This randomizes the upper bits
of addresses which is enough to cause trouble in most cases for shell code.
>>

Aah, thanks for that clarification.

As regards Prefetch, it predicts which pages need to be read from disk by recording what the application does if a prefetch log doesn't exist for the application. And, the recording isn't just done once; it's done a few times to allow for variance in the startup sequence of an application. You're right in that Prefetch is "best effort"; my assertion is that any kind of hand-drafted file that specifies order of function execution is going to be not comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a few point releases of the browser and/or the OS.
Comment 5 (dormant account) 2010-03-20 15:05:38 PDT
(In reply to comment #4)
> my assertion is that any kind of hand-drafted
> file that specifies order of function execution is going to be not
> comprehensive, tedious to maintain and suffer from bitrot/obsolescence within a
> few point releases of the browser and/or the OS.

Indeed. Thats why the goal is to generate these automatically during the build process. The idea is that our worst-case performance with a rearranged binary will be about as bad as doing nothing(ie random). So there isn't much too lose, but a fair bit to gain.
Comment 6 Ted Mielczarek [:ted.mielczarek] 2010-03-22 07:51:54 PDT
I do have a hard time believing that the PGO link doesn't already take care of this for us. It should be simple enough for someone to compare the dumpbin output of a non-PGO Firefox build vs. a nightly build, right?
Comment 7 (dormant account) 2010-03-22 08:27:27 PDT
(In reply to comment #6)
> I do have a hard time believing that the PGO link doesn't already take care of
> this for us. It should be simple enough for someone to compare the dumpbin
> output of a non-PGO Firefox build vs. a nightly build, right?

Yes and check where XRE_main(or some other known function) is relative to the functions called from it.
Comment 8 :Ehsan Akhgari (busy, don't ask for review please) 2010-03-22 09:05:32 PDT
(In reply to comment #6)
> I do have a hard time believing that the PGO link doesn't already take care of
> this for us. It should be simple enough for someone to compare the dumpbin
> output of a non-PGO Firefox build vs. a nightly build, right?

Looking at <http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx>, one of the PG optimizations is function layout, which seems to be taking care of this for us (still I think we need to verify this, because if the real behavior differs from the documentations, it won't be the first time!).
Comment 9 :Ehsan Akhgari (busy, don't ask for review please) 2010-03-30 13:08:35 PDT
I think someone can test what I said in comment 8 this way:

1. Use the technique in <http://www.johnpanzer.com/aci_cuj/index.html> to figure out the actual function call order during startup on Windows.
2. Generate a PGO build, preferably using the same startup profiling test case as step 1.
3. Grab the order of functions in the PGO build using dumpbin.
4. Compare the results of steps 1 and 3.
Comment 10 :Ehsan Akhgari (busy, don't ask for review please) 2010-04-16 16:29:16 PDT
I'm writing the tool which I suggested in comment 9.  I'm adding a --enable-function-call-log mozconfig option to generate a log file containing the function calls inside Firefox in a chronological order.  So far I'm struggling with the build system mostly.  I will post my progress here when I have something useful.
Comment 11 :Ehsan Akhgari (busy, don't ask for review please) 2010-04-16 17:06:00 PDT
Created attachment 439653 [details] [diff] [review]
Build system patch (WIP)

I need to add the -Gh compiler option *only to c/cpp modules which end up to be part of Firefox*.

First I tried modifying CFLAGS/CXXFLAGS, but I found out that those variables are used for host programs such as mkdepend.exe.  Then, I modified COMPILE_CFLAGS/COMPILE_CXXFLAGS, but there are many more modules inside the codebase which use those flags, and therefore end up getting -Gh upon compilation, and won't link because of the missing _penter symbol.

So I added build/win32/dummy_penter.c, which implements a dummy _penter symbol.  Now, there are two problems.  One of them is that as you can see inside this patch (which is not finished yet) I have to do this all over the place, and will create maintenance headaches.  The other problem is that the compilation flags for dummy_penter.c differ, so for example I can't link mozalloc.dll because one of the obj files uses the static crt and the other uses the dynamic crt.

Asking for Ted's feedback to see if he knows of a better way for doing this.
Comment 12 (dormant account) 2010-04-16 17:36:09 PDT
Ehsan, the other way to do this is to run the binary under wine under valgrind. I haven't had much luck with extracting symbols that way, but if this proves insufficient we can try the valgrind trick.
Comment 13 :Ehsan Akhgari (busy, don't ask for review please) 2010-04-19 13:42:46 PDT
Well, this trick always works (if I can figure out the build system part of it, that is!), and we can be sure that the call order the same as it would be running of top of the real Win32 API, and it's just a mozconfig flag, so we can potentially get this on our release build automation as well, therefore I really prefer to do this inside the build system.
Comment 14 (dormant account) 2010-04-20 17:23:55 PDT
For reference. This function-collection trick has been done on wince in bug 493427.
Comment 15 Brad Lassey [:blassey] (use needinfo?) 2010-04-20 19:35:13 PDT
--callcap and --fastcap work on desktop windows as well. That might be cleaner.
Comment 16 (dormant account) 2010-04-30 17:44:16 PDT
Created attachment 442872 [details] [diff] [review]
wip

things seem to be building without too many hacks. Not yet sure why xpidl is crashing.
Comment 17 :Ehsan Akhgari (busy, don't ask for review please) 2010-05-03 17:11:12 PDT
Comment on attachment 442872 [details] [diff] [review]
wip

Is this patch correct?  I think touching the HOST variables is not safe (because it will change how our host binaries are built.)  The xpidl crash might be because of this, and this was my main problem when I was trying to work on this bug.

Also, there's the smaller problem that the Makefile changes need to be MSVC-specific, but we can solve it later.
Comment 18 (dormant account) 2010-05-04 11:59:19 PDT
(In reply to comment #17)
> (From update of attachment 442872 [details] [diff] [review])
> Is this patch correct?  I think touching the HOST variables is not safe
> (because it will change how our host binaries are built.)  The xpidl crash
> might be because of this, and this was my main problem when I was trying to
> work on this bug.

problem is that we need to touch those to avoid having the tracing CXXFLAGS leak into host tool building. 

I haven't yet debugged as to why xpidl is being linked with tracing stuff(I think it should be using HOST_* settings for building, but it may not be).

> Also, there's the smaller problem that the Makefile changes need to be
> MSVC-specific, but we can solve it later.

Yeah. It's a WIP.
Comment 19 (dormant account) 2010-05-05 17:14:49 PDT
Created attachment 443769 [details] [diff] [review]
wip

This is a hack that seems to work for collecting the log.
Currently I only log the C++ files(cos modifying CFLAGS would mean fixing xpidl linkage). This is incredibly inefficient, has no locking. So builds run slowly and sometimes crash. But at the end of a successful run firefox leaves a xul.dll.txt(+for of the other libs) log files. Tomorrow I'll try to coax the linker into accepting them.
Comment 20 (dormant account) 2010-05-06 14:21:03 PDT
Created attachment 443959 [details] [diff] [review]
basically works

so in theory this patch works. I have tested this in --enable-debug and it produced useful log files. 
The proper way to use this is to specify --enable-penter --enable-debug-symbols, compile&run. Then do another normal build and override compilation of libxul to specify -order:@/path/to/xul.dll.log

Unfortunately builds with --enable-debug-symbols produce a weird error about missing msvcp09[dr].dll. Suggestions on what the heck is going on would be appreciated.
Comment 21 :Ehsan Akhgari (busy, don't ask for review please) 2010-05-06 16:27:00 PDT
(In reply to comment #20)
> Unfortunately builds with --enable-debug-symbols produce a weird error about
> missing msvcp09[dr].dll. Suggestions on what the heck is going on would be
> appreciated.

Could you check firefox.exe and our DLLs with depends.exe to see which symbols we're importing from those DLLs?
Comment 22 (dormant account) 2010-05-06 17:31:18 PDT
Created attachment 443999 [details] [diff] [review]
working patch

Turned out I was compiling my penter_log.cpp file with wrong flags which confused the runtime linker(i think).

To use this produce 2 builds.
1) --enable-debug-symbols --enable-penter, run it a few times, get a log you like(those get written on exit)
2) compile a second build without above flags. use -order:@xul.dll.log  to link
Comment 23 (dormant account) 2010-05-06 17:32:39 PDT
Created attachment 444000 [details]
sample log file

Here is a sample log(try it on your own machine!). My problem is that I am not seeing any significant difference in page faulting patterns.
Comment 24 :Ehsan Akhgari (busy, don't ask for review please) 2010-05-07 10:56:08 PDT
(In reply to comment #23)
> Created an attachment (id=444000) [details]
> sample log file
> 
> Here is a sample log(try it on your own machine!). My problem is that I am not
> seeing any significant difference in page faulting patterns.

Maybe you can run an xperf hard fault profile with stacks, so that we determine which parts of the binary are mapped when and by which code.  The offset of the symbols inside xul.dll can be determined using dumpbin.  Then we can match the xperf log with the actual layout of the image to try to determine what's going on here, and come up with a way of improving things.
Comment 25 timeless 2010-05-24 00:32:15 PDT
fwiw, mozilla used to use /order in the netscape days....

Note You need to log in before you can comment on or make changes to this bug.