Closed Bug 871712 Opened 7 years ago Closed 7 years ago

Investigate which files and directories do not participate in the PGO profile and exclude them from the PGO build

Categories

(Core :: General, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla24

People

(Reporter: ehsan, Assigned: ted)

References

Details

Attachments

(5 files, 1 obsolete file)

I met with some Microsoft compiler folks last week, and we discussed this issue in detail.  Their suggestion was to stop PGOing the files and directories which do not matter in terms of PGO wins.

The idea here is to use the pgomgr tool to come up with a list of files and directories which do not benefit a lot from PGO, disable PGO on them, and see how much of a win that would be in terms of linker memory usage, rinse and repeat.

Please find some of the information that my Microsoft contacts provided below.

·         What is a .pgd file ?

·         A .pgd file is an encapsulation of multiple .pgc files generated during the training phase of the PGO build. The .pgd files include all the training data required and used by the optimize phase (backend compiler) of the PGO process to make optimization decisions.  

 

·         How to selectively find and remove cold functions/modules from a PGO session

·         We provide a tool called ‘pgomgr’ <http://msdn.microsoft.com/en-us/library/2kw46d8w(v=vs.80).aspx>. Among other things, it can be used to understand the hotness of methods/functions. This will help to identify scenario cold functions. Removing these scenario cold functions should have minimal impact on performance
and ease the pain from the memory utilization perspective.

pgomgr /summary pgd-filename.pgd

Using this command, you should verbose output on per function statistics. For example take a look at the logfile attached from running the ‘pgomgr /summary NBodyGravityCPU.pgd >> logfile’
Functions with lower ‘dynamic instruction count’ and lower ‘%’ are essentially scenario cold functions. Removing modules attributing to these should have minimal performance impact.
At the same time, we should make pgo opt-in on windows, instead of opt-out.
(In reply to comment #1)
> At the same time, we should make pgo opt-in on windows, instead of opt-out.

With the data in hand, that would be a good implementation strategy indeed.
We don't seem to update PGO profiling script (Current PGO is based by bug 418772.) several years.

We may have to consider that this profiling script fits usages in current.
Changing the PGO profiling script from "start and shutdown the browser" to "start the browser, run a bunch of tests and shutdown" had almost un-measurable perf impact. I don't think that's a worthwhile thing to pursue. Most of our PGO perf wins are from exercising code that gets hit during startup, it seems.
The current PGO scripts are doing "start the browser, run a bunch of tests and shutdown". They're however not running a lot of tests, so code such as canvas is not exercised. We would probably get performance boosts from actually specifically profiling those. But let's already get memory usage down, while avoiding impact on what we do profile, and see about running more tests later.
(In reply to comment #5)
> The current PGO scripts are doing "start the browser, run a bunch of tests and
> shutdown". They're however not running a lot of tests, so code such as canvas
> is not exercised. We would probably get performance boosts from actually
> specifically profiling those. But let's already get memory usage down, while
> avoiding impact on what we do profile, and see about running more tests later.

Yes, I agree.  What the profile generation script does is sort of orthogonal to our goal here, and hopefully we can do the same thing that we're going to do here again the next time that we want to update the profile generation script.
Ted, can you attach the summary from pgomgr to this bug as you do this?  I'm pretty curious to know what it sees as the hot functions.
It's 23MB of text, so I uploaded it here (2.2MB gzipped):
http://people.mozilla.com/~tmielczarek/xul-pgo.txt.gz

The #1 hottest function in libxul is apparently nsID::Equals.
This is not at all friendly to scripting, but I'm working on it. What we really want is a list of object files and the cumulative total of their functions' dynamic instruction count, so we can find places to disable PGO. Unfortunately the pgomgr output only gives function names, so for functions declared inline in header files there are multiple lines in the output with the same function name. I wrote a Python script to munge the pgomgr output into a slightly more usable format, and I wrote a little tool to use DIA to output a list of object file,function name, and now I'm going to try to mash them together and see if I can get sensible results.
The data is a little bit suspect (for reasons described above, I'll share more in a minute), but here are the dynamic instruction counts summed up per directory (within the objdir) from a local PGO build. It seems like a no-brainer to disable PGO in the directories with zero instructions executed.
Here's the same data but broken down by object file.
So, per comment 9, it's hard to get really accurate data here since pgomgr will only give me function names, and I really want object files. I wrote a tool to dump all the functions from the PDB file and their associated object files using DIA:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/dumpfuncs

Then I wrote a couple of scripts to munge the output of pgomgr and the output of this tool into the output attached above:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pgo-datamangling

The workflow went like:
cd $objdir/toolkit/library
pgomgr -summary -unique xul.pgd > xul.pgosummary
DumpFuncs.exe xul.pdb > xul.funcs
python pgosummary.py < xul.pgosummary > xul.pgodat
python pgomangle.py xul.funcs xul.pgodat > xul.dircounts

(pgomangle.py needs to be edited to switch between outputing dircounts and objcounts.)

The biggest caveat here is that pgomangle.py outputs:
Dropping 574070188/2301629391 instructions [24.94%] (64869/166244 functions [39.02%])

meaning that nearly 25% of the instructions executed, and 39% of the functions in the pgomgr output could not be mapped back to an object file properly. I spent a lot of time looking at this and I couldn't figure out a better way to deal with that. If anyone has any ideas I'm all ears.

Even with this imperfect data we should be able to make some improvements.
This script takes the per-directory output from above and modifies Makefiles to opt-in to PGO if their instruction count is above a certain level. I arbitrarily chose 100k, we could tweak that.
This switches MSVC PGO from opt-out to opt-in. Directories will need to set MSVC_ENABLE_PGO = 1 in their Makefiles to opt-in.
Attachment #749960 - Flags: review?(mh+mozilla)
Attachment #749960 - Flags: review?(mh+mozilla) → review+
And this is the result of running the above script on the previously attached output. This is not quite ready because it only enables PGO for things that are linked into libxul, whereas currently we can PGO js and gkmedias libraries.
I manually added js/src, and I did some scripting to add everything that gets linked into gkmedias. I guess I'm still missing mozalloc/mozglue. Do those matter?
Attachment #749965 - Attachment is obsolete: true
(In reply to comment #16)
> Created attachment 749981 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=749981&action=edit
> opt-in to MSVC PGO in the directories that matter
> 
> I manually added js/src, and I did some scripting to add everything that gets
> linked into gkmedias. I guess I'm still missing mozalloc/mozglue. Do those
> matter?

It shouldn't.
tbsaunde pointed out that it was curious that netwerk/protocol/ftp shows up as relatively hot in this data, and I agreed, since we don't actually run any FTP code during the profiling run. I poked around, and it turns out that almost all of the instructions executed are from:
nsTArray_base<nsTArrayInfallibleAllocator>::IsEmpty(void)       1506798

So this is just ICF folding all the implementations of that templated method under the name of the one from FTPChannelChild, and it's apparently a reasonably hot method, so it inflates the stats for this directory. This is unfortunate, but we can probably just look at the list of top directories manually to see if there are any other weird outliers like this.
I'll push this to Try when it's open again.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16)
> I guess I'm still missing mozalloc/mozglue. Do those matter?

mozalloc is essentially a wrapper, so it probably doesn't count, but mozglue contains jemalloc, it might be better to PGO it.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> tbsaunde pointed out that it was curious that netwerk/protocol/ftp shows up
> as relatively hot in this data, and I agreed, since we don't actually run
> any FTP code during the profiling run. I poked around, and it turns out that
> almost all of the instructions executed are from:
> nsTArray_base<nsTArrayInfallibleAllocator>::IsEmpty(void)       1506798
> 
> So this is just ICF folding all the implementations of that templated method
> under the name of the one from FTPChannelChild, and it's apparently a
> reasonably hot method, so it inflates the stats for this directory. This is
> unfortunate, but we can probably just look at the list of top directories
> manually to see if there are any other weird outliers like this.

Or... you could try getting new data with ICF disabled.
For nsID::Equals -- shall we revive bug 164580, and see how things are in the new world of Sandy Bridge/Haswell and ARMv7?
Pushed a Windows PGO build to Try with a full set of Talos runs:
https://tbpl.mozilla.org/?tree=Try&rev=3dc2a576e9ff
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> Pushed a Windows PGO build to Try with a full set of Talos runs:
> https://tbpl.mozilla.org/?tree=Try&rev=3dc2a576e9ff

linker max vsize: 3218993152

\o/
The try push looks pretty good. The vsize number looks lower:
linker max vsize: 3218993152

I used jmaher's compare.py from the Talos repo (with a local patch from jmaher to fix some issues) to compare results:
Win:
    tdhtmlr: No results found
    :) tresize: 27.6884 -> 30.3916; 27.68.   [N/A] [PGO: N/A]
    tdhtmlr_nochrome: No results found
    :) tspaint_places_med: 879.053 -> 1001.58; 784.526.   [N/A] [PGO: N/A]
    :) tspaint_places_max: 875.158 -> 998.368; 787.789.   [N/A] [PGO: N/A]
    :) kraken: 3315.0 -> 3572.0; 3131.5.   [N/A] [PGO: N/A]
    :( v8: 8311.95 -> 9468.1; 9590.99.   [N/A] [PGO: N/A]
    :) dromaeo_css: 2410.51 -> 2895.04; 3206.49.   [N/A] [PGO: N/A]
    :) dromaeo_dom: 337.907 -> 399.28; 487.813.   [N/A] [PGO: N/A]
    :) tscrollr: 10094.2 -> 10106.0; 10089.9.   [N/A] [PGO: N/A]
    :) a11yr: 380.5 -> 400.0; 254.0.   [N/A] [PGO: N/A]
    :) ts_paint: 880.105 -> 992.368; 786.737.   [N/A] [PGO: N/A]
    :) tpaint: 246.0 -> 344.0; 198.0.   [N/A] [PGO: N/A]
    tsvgr: 2202.9 -> 3421.95; 3357.27.   [N/A] [PGO: N/A]
    tsvgr_opacity: 180.5 -> 499.5; 320.5.   [N/A] [PGO: N/A]
    tp5n: No results found
    :) tp5o: 359.708 -> 432.74; 274.083.   [N/A] [PGO: N/A]

Everything looks like it is either within the normal range or better. the v8 results are actually OK, the script just doesn't know that higher-is-better for v8.

Ehsan triggered a bunch more jobs to sanity check, but I think we're ok here. This might actually have *improved* perf slightly, somehow.
For comparison, from a recent inbound Windows PGO build:
linker max vsize: 3709313024
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

Ok, we're probably good to go with this then.
Attachment #749981 - Flags: review?(mh+mozilla)
OS: Mac OS X → Windows 7
Took about 50 minutes off the build time too.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #25)
> The try push looks pretty good. The vsize number looks lower:
> linker max vsize: 3218993152

That's the level we had when the graphs start, in january. With the switch to PGO as opt-in, we can expect that future growth will be slower too, which means we should be good until well into next year. Although, if we do want to switch to 2012 at some point, we have to remember that 2012 does suck more memory than 2010.
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

Review of attachment 749981 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/library/Makefile.in
@@ +20,5 @@
>  ifeq ($(MOZ_WIDGET_TOOLKIT),cocoa)
>  # This is going to be a framework named "XUL", not an ordinary library named
>  # "libxul.dylib"
>  LIBRARY_NAME=XUL
> +MSVC_ENABLE_PGO := 1

You don't need this one :)
Attachment #749981 - Flags: review?(mh+mozilla) → review+
This patch also gives everyone a pony. Folded into one changeset for landing, fixed the issue glandium pointed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9bfd469a26df
Blocks: 873038
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

If we are PGOing widget/windows should we add the other widget backends too (since a Windows only analysis generated this)?
https://hg.mozilla.org/mozilla-central/rev/9bfd469a26df
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
(In reply to comment #32)
> Comment on attachment 749981 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=749981
> opt-in to MSVC PGO in the directories that matter
> 
> If we are PGOing widget/windows should we add the other widget backends too
> (since a Windows only analysis generated this)?

You mean on non-Windows builds?  Those don't have the opt-in PGO requirements, so everything will be PGOed there.
Depends on: 874141
Should we re-run this analysis periodically to see if any directories need to be added or removed from the whitelist?
(In reply to comment #35)
> Should we re-run this analysis periodically to see if any directories need to
> be added or removed from the whitelist?

We should do that if we get close to the ceiling again.  Otherwise, I'm not sure if it adds too much benefit, unless it's really easy for anyone to just run a script.
It's not unreasonable, but it's also unlikely to be super useful given how crappy the data is.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #36)
> We should do that if we get close to the ceiling again.  Otherwise, I'm not
> sure if it adds too much benefit, unless it's really easy for anyone to just
> run a script.

I'm mostly concerned about new directories (from code that is added or moved) that should be included in PGO but are not.  We could be hurting performance by excluding them for no good reason when we are not approaching the ceiling.
(In reply to comment #38)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #36)
> > We should do that if we get close to the ceiling again.  Otherwise, I'm not
> > sure if it adds too much benefit, unless it's really easy for anyone to just
> > run a script.
> 
> I'm mostly concerned about new directories (from code that is added or moved)
> that should be included in PGO but are not.  We could be hurting performance by
> excluding them for no good reason when we are not approaching the ceiling.

Do we do that often enough?
You need to log in before you can comment on or make changes to this bug.