Investigate which files and directories do not participate in the PGO profile and exclude them from the PGO build

RESOLVED FIXED in mozilla24

Status

()

Core
General
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Away for a while, Assigned: ted)

Tracking

Trunk
mozilla24
x86
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
I met with some Microsoft compiler folks last week, and we discussed this issue in detail.  Their suggestion was to stop PGOing the files and directories which do not matter in terms of PGO wins.

The idea here is to use the pgomgr tool to come up with a list of files and directories which do not benefit a lot from PGO, disable PGO on them, and see how much of a win that would be in terms of linker memory usage, rinse and repeat.

Please find some of the information that my Microsoft contacts provided below.

·         What is a .pgd file ?

·         A .pgd file is an encapsulation of multiple .pgc files generated during the training phase of the PGO build. The .pgd files include all the training data required and used by the optimize phase (backend compiler) of the PGO process to make optimization decisions.  

 

·         How to selectively find and remove cold functions/modules from a PGO session

·         We provide a tool called ‘pgomgr’ <http://msdn.microsoft.com/en-us/library/2kw46d8w(v=vs.80).aspx>. Among other things, it can be used to understand the hotness of methods/functions. This will help to identify scenario cold functions. Removing these scenario cold functions should have minimal impact on performance
and ease the pain from the memory utilization perspective.

pgomgr /summary pgd-filename.pgd

Using this command, you should verbose output on per function statistics. For example take a look at the logfile attached from running the ‘pgomgr /summary NBodyGravityCPU.pgd >> logfile’
Functions with lower ‘dynamic instruction count’ and lower ‘%’ are essentially scenario cold functions. Removing modules attributing to these should have minimal performance impact.
At the same time, we should make pgo opt-in on windows, instead of opt-out.
(Reporter)

Comment 2

5 years ago
(In reply to comment #1)
> At the same time, we should make pgo opt-in on windows, instead of opt-out.

With the data in hand, that would be a good implementation strategy indeed.
We don't seem to update PGO profiling script (Current PGO is based by bug 418772.) several years.

We may have to consider that this profiling script fits usages in current.
(Assignee)

Comment 4

5 years ago
Changing the PGO profiling script from "start and shutdown the browser" to "start the browser, run a bunch of tests and shutdown" had almost un-measurable perf impact. I don't think that's a worthwhile thing to pursue. Most of our PGO perf wins are from exercising code that gets hit during startup, it seems.
The current PGO scripts are doing "start the browser, run a bunch of tests and shutdown". They're however not running a lot of tests, so code such as canvas is not exercised. We would probably get performance boosts from actually specifically profiling those. But let's already get memory usage down, while avoiding impact on what we do profile, and see about running more tests later.
(Reporter)

Comment 6

5 years ago
(In reply to comment #5)
> The current PGO scripts are doing "start the browser, run a bunch of tests and
> shutdown". They're however not running a lot of tests, so code such as canvas
> is not exercised. We would probably get performance boosts from actually
> specifically profiling those. But let's already get memory usage down, while
> avoiding impact on what we do profile, and see about running more tests later.

Yes, I agree.  What the profile generation script does is sort of orthogonal to our goal here, and hopefully we can do the same thing that we're going to do here again the next time that we want to update the profile generation script.
Ted, can you attach the summary from pgomgr to this bug as you do this?  I'm pretty curious to know what it sees as the hot functions.
(Assignee)

Comment 8

5 years ago
It's 23MB of text, so I uploaded it here (2.2MB gzipped):
http://people.mozilla.com/~tmielczarek/xul-pgo.txt.gz

The #1 hottest function in libxul is apparently nsID::Equals.
(Assignee)

Comment 9

5 years ago
This is not at all friendly to scripting, but I'm working on it. What we really want is a list of object files and the cumulative total of their functions' dynamic instruction count, so we can find places to disable PGO. Unfortunately the pgomgr output only gives function names, so for functions declared inline in header files there are multiple lines in the output with the same function name. I wrote a Python script to munge the pgomgr output into a slightly more usable format, and I wrote a little tool to use DIA to output a list of object file,function name, and now I'm going to try to mash them together and see if I can get sensible results.
(Assignee)

Comment 10

5 years ago
Created attachment 749938 [details]
instruction counts of the PGO profile run per-directory

The data is a little bit suspect (for reasons described above, I'll share more in a minute), but here are the dynamic instruction counts summed up per directory (within the objdir) from a local PGO build. It seems like a no-brainer to disable PGO in the directories with zero instructions executed.
(Assignee)

Comment 11

5 years ago
Created attachment 749939 [details]
instruction counts of the PGO profile run per-object file

Here's the same data but broken down by object file.
(Assignee)

Comment 12

5 years ago
So, per comment 9, it's hard to get really accurate data here since pgomgr will only give me function names, and I really want object files. I wrote a tool to dump all the functions from the PDB file and their associated object files using DIA:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/dumpfuncs

Then I wrote a couple of scripts to munge the output of pgomgr and the output of this tool into the output attached above:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/pgo-datamangling

The workflow went like:
cd $objdir/toolkit/library
pgomgr -summary -unique xul.pgd > xul.pgosummary
DumpFuncs.exe xul.pdb > xul.funcs
python pgosummary.py < xul.pgosummary > xul.pgodat
python pgomangle.py xul.funcs xul.pgodat > xul.dircounts

(pgomangle.py needs to be edited to switch between outputing dircounts and objcounts.)

The biggest caveat here is that pgomangle.py outputs:
Dropping 574070188/2301629391 instructions [24.94%] (64869/166244 functions [39.02%])

meaning that nearly 25% of the instructions executed, and 39% of the functions in the pgomgr output could not be mapped back to an object file properly. I spent a lot of time looking at this and I couldn't figure out a better way to deal with that. If anyone has any ideas I'm all ears.

Even with this imperfect data we should be able to make some improvements.
(Assignee)

Comment 13

5 years ago
Created attachment 749957 [details]
script to enable PGO in certain dirs

This script takes the per-directory output from above and modifies Makefiles to opt-in to PGO if their instruction count is above a certain level. I arbitrarily chose 100k, we could tweak that.
(Assignee)

Comment 14

5 years ago
Created attachment 749960 [details] [diff] [review]
make MSVC PGO opt-in per-directory

This switches MSVC PGO from opt-out to opt-in. Directories will need to set MSVC_ENABLE_PGO = 1 in their Makefiles to opt-in.
Attachment #749960 - Flags: review?(mh+mozilla)
Attachment #749960 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 15

5 years ago
Created attachment 749965 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter for libxul

And this is the result of running the above script on the previously attached output. This is not quite ready because it only enables PGO for things that are linked into libxul, whereas currently we can PGO js and gkmedias libraries.
(Assignee)

Comment 16

5 years ago
Created attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

I manually added js/src, and I did some scripting to add everything that gets linked into gkmedias. I guess I'm still missing mozalloc/mozglue. Do those matter?
(Assignee)

Updated

5 years ago
Attachment #749965 - Attachment is obsolete: true
(Reporter)

Comment 17

5 years ago
(In reply to comment #16)
> Created attachment 749981 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=749981&action=edit
> opt-in to MSVC PGO in the directories that matter
> 
> I manually added js/src, and I did some scripting to add everything that gets
> linked into gkmedias. I guess I'm still missing mozalloc/mozglue. Do those
> matter?

It shouldn't.
(Assignee)

Comment 18

5 years ago
tbsaunde pointed out that it was curious that netwerk/protocol/ftp shows up as relatively hot in this data, and I agreed, since we don't actually run any FTP code during the profiling run. I poked around, and it turns out that almost all of the instructions executed are from:
nsTArray_base<nsTArrayInfallibleAllocator>::IsEmpty(void)       1506798

So this is just ICF folding all the implementations of that templated method under the name of the one from FTPChannelChild, and it's apparently a reasonably hot method, so it inflates the stats for this directory. This is unfortunate, but we can probably just look at the list of top directories manually to see if there are any other weird outliers like this.
(Assignee)

Comment 19

5 years ago
I'll push this to Try when it's open again.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16)
> I guess I'm still missing mozalloc/mozglue. Do those matter?

mozalloc is essentially a wrapper, so it probably doesn't count, but mozglue contains jemalloc, it might be better to PGO it.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> tbsaunde pointed out that it was curious that netwerk/protocol/ftp shows up
> as relatively hot in this data, and I agreed, since we don't actually run
> any FTP code during the profiling run. I poked around, and it turns out that
> almost all of the instructions executed are from:
> nsTArray_base<nsTArrayInfallibleAllocator>::IsEmpty(void)       1506798
> 
> So this is just ICF folding all the implementations of that templated method
> under the name of the one from FTPChannelChild, and it's apparently a
> reasonably hot method, so it inflates the stats for this directory. This is
> unfortunate, but we can probably just look at the list of top directories
> manually to see if there are any other weird outliers like this.

Or... you could try getting new data with ICF disabled.
For nsID::Equals -- shall we revive bug 164580, and see how things are in the new world of Sandy Bridge/Haswell and ARMv7?
(Assignee)

Comment 23

5 years ago
Pushed a Windows PGO build to Try with a full set of Talos runs:
https://tbpl.mozilla.org/?tree=Try&rev=3dc2a576e9ff
(Reporter)

Comment 24

5 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> Pushed a Windows PGO build to Try with a full set of Talos runs:
> https://tbpl.mozilla.org/?tree=Try&rev=3dc2a576e9ff

linker max vsize: 3218993152

\o/
(Assignee)

Comment 25

5 years ago
The try push looks pretty good. The vsize number looks lower:
linker max vsize: 3218993152

I used jmaher's compare.py from the Talos repo (with a local patch from jmaher to fix some issues) to compare results:
Win:
    tdhtmlr: No results found
    :) tresize: 27.6884 -> 30.3916; 27.68.   [N/A] [PGO: N/A]
    tdhtmlr_nochrome: No results found
    :) tspaint_places_med: 879.053 -> 1001.58; 784.526.   [N/A] [PGO: N/A]
    :) tspaint_places_max: 875.158 -> 998.368; 787.789.   [N/A] [PGO: N/A]
    :) kraken: 3315.0 -> 3572.0; 3131.5.   [N/A] [PGO: N/A]
    :( v8: 8311.95 -> 9468.1; 9590.99.   [N/A] [PGO: N/A]
    :) dromaeo_css: 2410.51 -> 2895.04; 3206.49.   [N/A] [PGO: N/A]
    :) dromaeo_dom: 337.907 -> 399.28; 487.813.   [N/A] [PGO: N/A]
    :) tscrollr: 10094.2 -> 10106.0; 10089.9.   [N/A] [PGO: N/A]
    :) a11yr: 380.5 -> 400.0; 254.0.   [N/A] [PGO: N/A]
    :) ts_paint: 880.105 -> 992.368; 786.737.   [N/A] [PGO: N/A]
    :) tpaint: 246.0 -> 344.0; 198.0.   [N/A] [PGO: N/A]
    tsvgr: 2202.9 -> 3421.95; 3357.27.   [N/A] [PGO: N/A]
    tsvgr_opacity: 180.5 -> 499.5; 320.5.   [N/A] [PGO: N/A]
    tp5n: No results found
    :) tp5o: 359.708 -> 432.74; 274.083.   [N/A] [PGO: N/A]

Everything looks like it is either within the normal range or better. the v8 results are actually OK, the script just doesn't know that higher-is-better for v8.

Ehsan triggered a bunch more jobs to sanity check, but I think we're ok here. This might actually have *improved* perf slightly, somehow.
(Assignee)

Comment 26

5 years ago
For comparison, from a recent inbound Windows PGO build:
linker max vsize: 3709313024
(Assignee)

Comment 27

5 years ago
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

Ok, we're probably good to go with this then.
Attachment #749981 - Flags: review?(mh+mozilla)
(Assignee)

Updated

5 years ago
OS: Mac OS X → Windows 7
Took about 50 minutes off the build time too.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #25)
> The try push looks pretty good. The vsize number looks lower:
> linker max vsize: 3218993152

That's the level we had when the graphs start, in january. With the switch to PGO as opt-in, we can expect that future growth will be slower too, which means we should be good until well into next year. Although, if we do want to switch to 2012 at some point, we have to remember that 2012 does suck more memory than 2010.
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

Review of attachment 749981 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/library/Makefile.in
@@ +20,5 @@
>  ifeq ($(MOZ_WIDGET_TOOLKIT),cocoa)
>  # This is going to be a framework named "XUL", not an ordinary library named
>  # "libxul.dylib"
>  LIBRARY_NAME=XUL
> +MSVC_ENABLE_PGO := 1

You don't need this one :)
Attachment #749981 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 31

5 years ago
This patch also gives everyone a pony. Folded into one changeset for landing, fixed the issue glandium pointed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9bfd469a26df
(Reporter)

Updated

5 years ago
Blocks: 873038
Comment on attachment 749981 [details] [diff] [review]
opt-in to MSVC PGO in the directories that matter

If we are PGOing widget/windows should we add the other widget backends too (since a Windows only analysis generated this)?
https://hg.mozilla.org/mozilla-central/rev/9bfd469a26df
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
(Reporter)

Comment 34

5 years ago
(In reply to comment #32)
> Comment on attachment 749981 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=749981
> opt-in to MSVC PGO in the directories that matter
> 
> If we are PGOing widget/windows should we add the other widget backends too
> (since a Windows only analysis generated this)?

You mean on non-Windows builds?  Those don't have the opt-in PGO requirements, so everything will be PGOed there.
Depends on: 874141
Should we re-run this analysis periodically to see if any directories need to be added or removed from the whitelist?
(Reporter)

Comment 36

5 years ago
(In reply to comment #35)
> Should we re-run this analysis periodically to see if any directories need to
> be added or removed from the whitelist?

We should do that if we get close to the ceiling again.  Otherwise, I'm not sure if it adds too much benefit, unless it's really easy for anyone to just run a script.
(Assignee)

Comment 37

5 years ago
It's not unreasonable, but it's also unlikely to be super useful given how crappy the data is.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #36)
> We should do that if we get close to the ceiling again.  Otherwise, I'm not
> sure if it adds too much benefit, unless it's really easy for anyone to just
> run a script.

I'm mostly concerned about new directories (from code that is added or moved) that should be included in PGO but are not.  We could be hurting performance by excluding them for no good reason when we are not approaching the ceiling.
(Reporter)

Comment 39

5 years ago
(In reply to comment #38)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #36)
> > We should do that if we get close to the ceiling again.  Otherwise, I'm not
> > sure if it adds too much benefit, unless it's really easy for anyone to just
> > run a script.
> 
> I'm mostly concerned about new directories (from code that is added or moved)
> that should be included in PGO but are not.  We could be hurting performance by
> excluding them for no good reason when we are not approaching the ceiling.

Do we do that often enough?
You need to log in before you can comment on or make changes to this bug.