Open Bug 347592 (PCH) Opened 18 years ago Updated 2 years ago

Use precompiled headers (GCC, clang, and VC)

Categories

(Firefox Build System :: General, enhancement)

Tracking

(Not tracked)

mozilla2.0

People

(Reporter: wbrana, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1b1) Gecko/20060805 BonEcho/2.0b1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1b1) Gecko/20060805 BonEcho/2.0b1

GCC 3.4 and newer support precompiled headers. http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html


Reproducible: Always

Actual Results:  
Precompiled headers aren't used during building.

Expected Results:  
Precompiled headers could be used to speed up compilation.
OS: All → Linux
Summary: using precompiled headers with GCC → Use precompiled headers with GCC
We actually will want to do this for Visual C++ as well as GCC.
OS: Linux → All
Product: Firefox → Core
QA Contact: build.config → build-config
Hardware: PC → All
Summary: Use precompiled headers with GCC → Use precompiled headers (GCC and VC)
Target Milestone: --- → mozilla2.0
Assignee: nobody → ted.mielczarek
Status: UNCONFIRMED → NEW
Ever confirmed: true
bsmedberg and I discussed this a while ago; our thought was to make one header per module that included everything for that module, so like <xpcom.h>. Then, per compile directory, we would generate a PCH from the contents of REQUIRES.
I'm for this... having seen PCHs speed up builds from 5 minutes to 1 minute (with VC++, but I assume it's the same for gcc), I know they can be a huge improvement. I'm willing to help if any help is needed or desired.
Last night I was motivated to experiment with this after noticing that files like layout/base/nsPresShell.cpp include ridiculous numbers of files.  Although nsPresShell.cpp is 8000 lines of code, the preprocessed input that gets compiled for it is almost 900,000 lines of code that comes from ultimately including 1725 (not distinct) headers.

I hacked together the following:
A makefile defines PCH_EXPORT = <module> to export its headers to a stub header that can later be pulled in by defining PCH_IMPORT = <module>.
A makefile defines PCH_HEADERS = <headers> to export the given headers.  All IDL-generated headers are added automatically.
A makefile defines PCH_IMPORT = <modules> to import the given modules and compile a PCH for that directory.  Some guards are present to ensure that if compile flags change or weirdness happens we don't hose the build.
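
At the compiler level, the effect of this machinery boils down to roughly the following (a hand-written sketch with hypothetical file names, not the actual rules.mk output; GCC automatically prefers a foo.h.gch sitting next to foo.h when the compile flags are compatible):

# Export step: collect the module's exported headers into a stub header
cat nsCOMPtr.h nsISupports.h generated-idl-headers.h > xpcom-pch.h

# Import step: precompile the stub once per importing directory, with that
# directory's exact compile flags
g++ $CXXFLAGS -x c++-header xpcom-pch.h -o xpcom-pch.h.gch

# Subsequent compiles in the directory force-include the stub and pick up
# the .gch sitting next to it
g++ $CXXFLAGS -include xpcom-pch.h -c nsLocalFile.cpp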

So I added PCH_EXPORT = xpcom to everything relevant in xpcom/ and also exported nsCOMPtr.h and a few others (in addition to the automatically exported IDL generated headers) and then added PCH_IMPORT = xpcom to xpcom/io and timed a complete build of that directory with and without PCH.

That is, I did

cd xpcom/io
<disable PCH>
make clean
make #timed this
<enable PCH>
make clean
make #timed this

The time for a total build of the directory (including IDL processing, XPT generation, etc) dropped 20%.

I would expect to see even more substantial gains in directories like content/ and to a lesser extent layout/ which include those absurd numbers of headers.
1724 headers.  I didn't count unique headers but there is a LOT of duplication (nsCOMPtr.h gets included 52 times) which balloons the time to preprocess these files.  I didn't do any measurements of this but making nsPresShell.i takes a substantial fraction (feels like at least 50%) of the time it takes to make the object file outright.
Attachment #449889 - Attachment mime type: application/octet-stream → text/plain
(All of these measurements are for a debug build)
More timing measurements (opt build this time, on Win32).

layout/base (no PCH)
real    1m41.992s
user    0m1.521s
sys     0m3.643s

layout/base (PCH)
real    1m32.650s
user    0m1.380s
sys     0m3.790s

10% win just from xpcom, probably a lot more win to be had with PCHifying dom/, content/, and widget/

We also appear to spend a ridiculous amount of time building dependencies.  I wonder if we could instead grab the output of cl's /showIncludes and process that to make our .pp files.
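
Something along those lines might look like the following sketch, assuming the English-locale "Note: including file:" prefix that cl's /showIncludes emits (untested; file names illustrative):

# Turn /showIncludes output into a make-style .pp dependency fragment
# instead of running a separate dependency-generation pass
cl /showIncludes /c nsPresShell.cpp 2>&1 \
  | sed -n 's/^Note: including file: *//p' \
  | sed 's/^/nsPresShell.obj: /' > nsPresShell.obj.pp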
Alias: PCH
Unsurprisingly, there's a bug filed on that (bug 518136). There's also bug 462463 about dependencies.
Assignee: ted.mielczarek → nobody
I experimented with this a bit in clang.

As an extremely simple test, I moved all of nsGlobalWindow.cpp's includes into a PCH. Build time for that file went from 5s to 3s, a speedup of 1.7.  The PCH takes 4s to compile.

Crucially, I found that commenting out the includes in nsGlobalWindow.cpp when using PCH had no effect on build speed.  This means that we don't need to modify every source file in order to take advantage of PCH.
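
For reference, a minimal sketch of this kind of clang PCH workflow (the aggregated header name is illustrative, not taken from the actual experiment):

# Precompile the aggregated header; flags must match the compile below
clang++ $CXXFLAGS -x c++-header window-includes.h -o window-includes.h.pch

# Compile the source against the precompiled form
clang++ $CXXFLAGS -include-pch window-includes.h.pch -c nsGlobalWindow.cpp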

It would be cool if we could do something here.
Summary: Use precompiled headers (GCC and VC) → Use precompiled headers (GCC, clang, and VC)
I took a look at trying to get these into the build. Initial results can look promising, but I don't think that managing them is quite as easy as I had hoped. If they are used incorrectly, they can result in worse incremental build performance than without them, since extra files may be unnecessarily re-compiled. Granted I'm not an expert in using PCH, so if I missed something obvious here please let me know.

First, I am able to reproduce some of the results above. I tried to start out small by making a PCH to be shared among files in xpcom/base, and manually ran some tests with a shell script to compile *.cpp and compare timings. In order to figure out which .h files should be listed in the PCH, I grepped the *.pp files and pulled out a list of header files included by every .cpp file (~30 .h files). In Linux, this resulted in a ~6% speedup (7.753s vs 7.330s), whereas Windows fared much better with a ~20% speedup (17.290s vs 14.365s).

Here is where I ran into the first problem - how do we really decide what goes into the PCH? I was manually creating it by grepping *.pp, but that doesn't lend itself to automation, since to create *.pp, we have to first pre-process every file, which is what we're trying to avoid. Another option is to let an expert in the subsystem manually decide what goes into the PCH (essentially list what they think is needed by *.cpp in the subsystem, perhaps by grepping *.pp :), and check that file in and keep it up-to-date. However, this list may bit-rot, which has the following downsides:

1) The PCH may list a .h file that is not actually needed by all of the .cpp files. Take jlebar's example - if we put all of nsGlobalWindow.cpp's headers into the PCH for that directory, then when changing nsGlobalWindow.cpp there is a net gain of 2s. However, if we were to instead change nsGlobalWindow.h, there is now a net loss. The first effect is because we have to re-generate the PCH and then build nsGlobalWindow.cpp (3s + 4s == 7s, a loss of 2 seconds). The second, and more significant effect, is that we are now re-building all 29 files in the directory, instead of just the 12 that actually include nsGlobalWindow.h. Clearly this is not as simple as it might seem at first.

2) Listing "foo.h" in the PCH means it is automatically included by every .cpp. So now a new .cpp file may start using things from foo.h, even though it doesn't explicitly include it. Updating the PCH to remove foo.h due to 1) is now more difficult, since you have to go back and re-add the headers everywhere they are needed.

3) We would also need to include the PCH even if it's not actually pre-compiled, otherwise the list of headers included in a PCH build will differ from a non-PCH build. Without this, someone may check something in that compiles fine with PCH on, but is broken without PCH since the .cpp file is missing required headers. So we either pay a price of a higher number of broken commits, or pay the price of unnecessary re-compiles in 1) even when PCH is disabled.
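
For point 3), the parity could be kept with a forced include, roughly like this (a sketch; the stub header name is hypothetical):

# PCH build: inject the precompiled form ahead of the source
clang++ $CXXFLAGS -include-pch xpcom-pch.h.pch -c nsMultiplexInputStream.cpp

# Non-PCH build: force-include the same header as plain text, so both
# configurations see an identical set of declarations
clang++ $CXXFLAGS -include xpcom-pch.h -c nsMultiplexInputStream.cpp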

Assuming there is a way to get a correct PCH listing and keep it up-to-date, I continued testing on a larger scale. I repeated this test for xpcom/io by generating a new list of .h files used by xpcom/io/*.cpp and pre-compiling it. Here things were even better, as re-compiling *.cpp in Linux is ~10% faster (6.563s vs 5.955s), and Windows up to ~25% faster (15s vs 12s).

I now tried to merge these two directories so that they could share a single "xpcom.h" file, by listing only the headers included by all files in both directories. Since the list of headers being pre-compiled was now a bit smaller, the gain was a little smaller as well (xpcom/base fell from 6% to 5% in Linux). However, this is where I hit the second problem - PCHs can't be shared across files where -D flags are different. I built the PCH using xpcom/base's flags, so the PCH was ignored when I tried to use it in xpcom/io. To get an idea of the scale of this problem, grep through Makefile.in's for DEFINES (find . -name Makefile.in | xargs grep 'DEFINES\s*+=' or some such). Although we might create an xpcom.h, it would need to be compiled separately in at least 23 directories under xpcom. (Or, some poor soul could go through and try to consolidate all of the defines...). This means each PCH is not compiled just once, but once per directory.
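
One quick way to gauge whether two directories could share a single PCH is to diff their directory-local define sets (a sketch built on the Makefile.in grep above; paths illustrative, and global configure defines are ignored):

# An empty diff means no directory-local -D differences stand in the way
diff <(grep -o -- '-D[A-Za-z0-9_]*' xpcom/base/Makefile.in | sort -u) \
     <(grep -o -- '-D[A-Za-z0-9_]*' xpcom/io/Makefile.in | sort -u)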

In summary, I believe to implement PCH we would need to do the following:

A) Come up with a way to figure out which files should go into the PCH list. This list needs to be pre-compiled for each directory that uses it, given the scope of the -D flag problem. So we get xpcom/base/xpcom.pch and xpcom/io/xpcom.pch which are both derived from xpcom/xpcom.h. Note that each .cpp file can only have one PCH, so there's no point in trying to share them across subsystems.

B) Use -include to force including the header list first (even if PCH is disabled) to avoid the problem of PCH builds behaving differently from non-PCH builds due to missing #includes.

C) Keep these lists up-to-date to avoid hurting incremental build performance.

I am concerned that by introducing PCH, people will be tempted to include more files in the list than should actually be there because it makes a clobber build faster. I know that clobber build times seem like an attractive thing to optimize, but they are only really necessary because of the deficiencies of make. Given my focus on incremental build performance I am worried this will be more damaging long-term, to the point where changing any .h file results in many more files being re-compiled than are strictly necessary. I would much prefer that the number of things that need to be rebuilt when a .h file changes goes down, not up.

Some questions I have: Are there strategies for managing or automating what goes in to the PCH, both initially and over time as people forget about them? Is there a way to mitigate the -D flags added by the various Makefile.ins so a PCH could be shared at a higher level without being compiled multiple times? Is there an argument for how this won't negatively affect incremental builds?

I was hoping this would end up being a --build-faster flag, but I think it is not so simple.
A lot of good things to reply to here.

>Although we might create an xpcom.h, it would need to be compiled separately
>in at least 23 directories under xpcom. (Or, some poor soul could go through
>and try to consolidate all of the defines...). [...]
>This means each PCH is not compiled just once, but once per directory.

I think figuring this out is pretty key to getting PCH to work.  If we have to
do a new PCH for each directory, we're likely missing key gains in top-level
directories where we spend a lot of time (dom, content come to mind).

I wonder if things are not as bad as they seem, and if the vast majority of
source files share the same -Ds.  (Or, if they're nearly the same, if they
could be made to be the same.)

One way you could check this is by doing a clobber build, recording all of the
command-line invocations, and counting how many sets of -Ds are used in how
many different invocations of CC.

>A) Come up with a way to figure out which files should go into the PCH list.
>This list needs to be pre-compiled for each directory that uses it, given the
>scope of the -D flag problem. So we get xpcom/base/xpcom.pch and
>xpcom/io/xpcom.pch which are both derived from xpcom/xpcom.h. Note that each
>.cpp file can only have one PCH, so there's no point in trying to share them
>across subsystems.

Clang does support multiple PCH's per cpp file (the dependency graph can be
either a list or a DAG).  Given that the vast majority of developers are on
Linux or Mac, I'd be quite happy with a solution which required clang.

http://clang.llvm.org/docs/PCHInternals.html

But I recognize this doesn't get us around the -D problem.

>C) Keep these lists up-to-date to avoid hurting incremental build performance.
>
>I am concerned that by introducing PCH, people will be tempted to include more
>files in the list than should actually be there because it makes a clobber
>build faster. I know that clobber build times seem like an attractive thing to
>optimize, but they are only really necessary because of the deficiencies of
>make.

gps said something like this in his recent build-faster presentation, so you're
in good company.  But I don't think it is true at all.

I think the misconception may stem from choosing the dichotomy of "clobber"
versus "incremental" builds.  But I don't think that's the relevant dichotomy
here.

Instead, we should think about the differences between builds which recompile
many files and builds which recompile few files.  It's my contention that
builds which recompile many files will be common even if we have a build system
which never requires us to rm -rf the objdir.

To wit, you end up compiling many files when you

* change compiler flags (e.g., I usually make only debug builds, but
  sometimes I need a release build, a valgrind build, or a trace-malloc build).

* change a header which is included by many files (we have many such
  headers, and some, such as nsContentUtils.h, are changed frequently).

* pull a change from upstream which modifies such a header.  In practice, I
  end up rebuilding much of the tree every time I pull from upstream.

* switch between branches which include a change to such a header. In
  practice, nearly all of my substantial branches include such a change.

PCH helps us in all of these cases, and no build system change is going to
substantially reduce the number of files we have to re-compile in these cases.

In addition, our automated builders end up re-compiling most object files on
most builds, due to the above.  So PCH should be a build-faster win on our
automated builders.

Previous attempts I'm aware of to reduce the number of headers that our cpp
files depend on have not yielded great success, in part because many of these
dependencies are true dependencies.

>Given my focus on incremental build performance I am worried this will
>be more damaging long-term, to the point where changing any .h file results in
>many more files being re-compiled than are strictly necessary. I would much
>prefer that the number of things that need to be rebuilt when a .h file
>changes goes down, not up.

I agree this is a concern; I don't have a good answer.

One way to address this is to ensure that everything builds both with and
without PCH.  Then you can have the option of not using PCH for your
incremental builds, or of using the dependencies generated from a non-PCH
invocation of GCC to decide which files to rebuild.
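
The second option could take roughly this shape (a sketch; the source and stub names are hypothetical):

# Dependencies (.pp) come from a pass that omits the forced PCH stub, so
# they reflect only what the .cpp itself includes ...
g++ $CXXFLAGS -MM -MF nsFoo.o.pp nsFoo.cpp

# ... while the object itself is still compiled against the PCH
g++ $CXXFLAGS -include xpcom-pch.h -c nsFoo.cpp -o nsFoo.o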

But in practice, people break builds unless they're on tinderbox.  So we'd have
to have non-PCH builds on tinderbox, which would result in more, not less, load
on builders.

Running non-PCH builds on tinderbox doesn't entirely defeat our purpose here,
because we wouldn't need to wait for non-PCH builds to finish before running
tests, so tests could still complete faster, and developers could still get
tryserver results faster.  And of course builds on devs' machines would be
faster.  But it's not a great solution.
(In reply to Justin Lebar [:jlebar] from comment #11)
> Clang does support multiple PCH's per cpp file (the dependency graph can be
> either a list or a DAG).  Given that the vast majority of developers are on
> Linux or Mac, I'd be quite happy with a solution which required clang.

This is a feedback loop that results in the vast majority of developers using a different product (non-Windows Firefox) from what 95% of our users are using (Firefox on Windows). I've heard many times that we'd like to have more developers developing on Windows so we have more "genuine" dogfooding of the product, but we're making nearly everything painful and painfully slow on Windows (including, especially, B2G and Fennec, since they require the build to be done in a VM). It's wrong to hold back improvements on other platforms that Windows can't get as easily, but it's also wrong to suggest we can just avoid making those improvements on Windows.

> Instead, we should think about the differences between builds which recompile
> many files and builds which recompile few files.  It's my contention that
> builds which recompile many files will be common even if we have a build
> system which never requires us to rm -rf the objdir.

[snip examples]

In my experience, none of your examples are nearly as common as the case where I make a change to a single .cpp file and I want to recompile just that file and relink libxul. My second most common type of build is where I update a header file used by a few dozen files only in my own module. I do dozens of those builds a day. However, I pull in changes that require me to clobber only about once a day. Consequently, I'd guess that it is extremely important to make incremental builds as incremental as possible, as often as possible, and we should be extremely careful (at least) about avoiding regressing this aspect.
> This is a feedback loop that results in the vast majority of developers using a 
> different product (non-Windows Firefox) from what 95% of our users are using (Firefox 
> on Windows). 

I don't think there's any disagreement between us that if we can make Windows builds fast, that's great, and if we can't without regressing other things, oh well.  But I would /really/ like to avoid getting into the more general issue of how we incentivise developers to work on particular platforms here, if you don't mind.

> In my experience, none of your examples are nearly as common as the case where I make a 
> change to a single .cpp file and I want to recompile just that file and relink libxul.
> [...]
> Consequently, I'd guess that it is extremely important to make incremental builds as 
> incremental as possible, as often as possible, and we should be extremely careful (at 
> least) about avoiding regressing this aspect.

I don't disagree.  What I object to is the notion that full rebuilds "are only really necessary because of the deficiencies of make," so therefore, given that our long-run plan is to eliminate make, we don't care about full-rebuild time.

Note that we already have hacks to make recompiling one file and relinking libxul fast.  I don't disagree that those hacks are harmful and shouldn't have to exist, but the fact remains that hacks exist for getting around this pain to some degree.

In contrast, there are no hacks for speeding up the cases I listed, short of modifying your hardware in some way (*).  Other than our bugs to entirely rewrite the build, this bug is the only one I'm aware of with a large potential speedup for full rebuilds.

(*) Well, okay, ccache helps with some of them.
Once we have moz.build files, we should more easily be able to obtain the sets of all files sharing the same set of compiler flags across the tree. We should then be able to construct chained PCH files and share PCH files across directories.

There is an initial seeding problem to identify the full set of .h files for each PCH "bundle." One way to solve that is with Clang's API. We could use the Clang Python bindings or write a small C/C++ program to traverse the set of #include'd files. Unfortunately, this only works where we use Clang.
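
A cheaper (if clang/gcc-only) stopgap than the Clang API is the compiler's dependency output, which enumerates every header a translation unit pulls in; it still runs the preprocessor once per file, but far more cheaply than a full compile. A sketch:

# Rank headers by how many translation units in a directory include them;
# -MM skips system headers, -MG tolerates not-yet-generated ones
clang++ $CXXFLAGS -MM -MG *.cpp \
  | tr ' \\' '\n' | grep '\.h$' | sort | uniq -c | sort -rn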

We should also minimize the variance in compiler arguments across the tree to minimize the number of PCH combinations. As Mike said, there are many directory-local defines. I'm willing to bet a lot of these are cruft, cargo culted, or could be pulled out into global defines. Minimizing the variance here could also allow us to reduce the number of compiler invocations if we ever start compiling multiple source files per invocation.
Here's evidence of someone going to great lengths to optimize their full-rebuild times on Linux and Mac: http://glandium.org/blog/?p=2908
(In reply to Justin Lebar [:jlebar] from comment #11)
> A lot of good things to reply to here.
> 
> >Although we might create an xpcom.h, it would need to be compiled separately
> >in at least 23 directories under xpcom. (Or, some poor soul could go through
> >and try to consolidate all of the defines...). [...]
> >This means each PCH is not compiled just once, but once per directory.
> 
> I think figuring this out is pretty key to getting PCH to work.  If we have
> to
> do a new PCH for each directory, we're likely missing key gains in top-level
> directories where we spend a lot of time (dom, content come to mind).
> 
> I wonder if things are not as bad as they seem, and if the vast majority of
> source files share the same -Ds.  (Or, if they're nearly the same, if they
> could be made to be the same.)
> 
> One way you could check this is by doing a clobber build, recording all of
> the
> command-line invocations, and counting how many sets of -Ds are used in how
> many different invocations of CC.

Ok, I did this on OSX. I used the following perl script to process the output:

#! /usr/bin/perl

while(<STDIN>) {
        if(m,^/usr/bin/clang,) {
                @args = split(/ /);
                $filename = $args[$#args];
                chomp($filename);
                $filename =~ m,(.*/)[^/]*,;
                $dirs{$1}++;
                $argstring = "";
                foreach $arg (sort @args) {
                        if($arg =~ /^-D/) {
                                $argstring .= "$arg ";
                        }
                }
                $myargs{$argstring}++;

                $numfiles++;
        }
}

foreach $dir (keys %dirs) {
        print "In dir $dir built " . $dirs{$dir} . " files\n";
}

foreach $arg (keys %myargs) {
        print "Args '$arg' used " . $myargs{$arg} . " times\n";
        if($myargs{$arg} > $max) {
                $max = $myargs{$arg};
                $maxarg = $arg;
        }
}
print "Max: $max, Arg = $maxarg\n";
print "Total number of directories : " . scalar(keys %dirs) . "\n";
print "Total number of argment strings: " . scalar(keys %myargs) . "\n";
print "Total number of files compiled: $numfiles\n";

Basically it just grabs all of the lines that call clang, picks out the -D options, sorts them into a string and counts the number of times that string of -D options appear. Here's the output summary:

# snip: Lots of data about each -D set
Max: 457, Arg = -DEXPORT_XPTC_API -DEXPORT_XPT_API -DIMPL_NS_NET -DIMPL_THEBES -DIMPL_XREAPI -DMOZILLA_CLIENT -DMOZILLA_INTERNAL_API -DNDEBUG -DNO_NSPR_10_SUPPORT -DNO_X11 -DSTATIC_EXPORTABLE_JS_API -DTRIMMED -D_IMPL_NS_COM -D_IMPL_NS_GFX -D_IMPL_NS_LAYOUT -D_IMPL_NS_WIDGET 
Total number of directories : 512
Total number of argment strings: 145
Total number of files compiled: 5359

So 457 files share that set of -D flags, which is the most widely used set. Some other sets of flags are only used by a handful of files - there are 145 different sets of flags used across the 512 directories that contain compiled files. Unless these are consolidated, that means we'd need 145 sets of PCHs (less than the one-per-directory I was suggesting, but figuring out where they can be shared by settings in the Makefiles may be more difficult than a simple one-per-directory model).

> 
> >A) Come up with a way to figure out which files should go into the PCH list.
> >This list needs to be pre-compiled for each directory that uses it, given the
> >scope of the -D flag problem. So we get xpcom/base/xpcom.pch and
> >xpcom/io/xpcom.pch which are both derived from xpcom/xpcom.h. Note that each
> >.cpp file can only have one PCH, so there's no point in trying to share them
> >across subsystems.
> 
> Clang does support multiple PCH's per cpp file (the dependency graph can be
> either a list or a DAG).  Given that the vast majority of developers are on
> Linux or Mac, I'd be quite happy with a solution which required clang.
> 
> http://clang.llvm.org/docs/PCHInternals.html
> 
> But I recognize this doesn't get us around the -D problem.

Ahh, I wasn't aware of that! I naively assumed it was similar to gcc and MSVC. I believe that would address my concerns of adding unnecessary header dependencies, though as you point out we'd really need to slim down the -D mess to take advantage of it. Then it might be as simple as pre-processing .h files after they are installed in dist/include (I'm speculating here...)

> Instead, we should think about the differences between builds which recompile
> many files and builds which recompile few files.  It's my contention that
> builds which recompile many files will be common even if we have a build
> system
> which never requires us to rm -rf the objdir.
> 
> To wit, you end up compiling many files when you
> 
> * change compiler flags (e.g., I usually make only debug builds, but
>   sometimes I need a release build, a valgrind build, or a trace-malloc
> build).
> 
> * change a header which is included by many files (we have many such
>   headers, and some, such as nsContentUtils.h, are changed frequently).
> 
> * pull a change from upstream which modifies such a header.  In practice, I
>   end up rebuilding much of the tree every time I pull from upstream.
> 
> * switch between branches which include a change to such a header. In
>   practice, nearly all of my substantial branches include such a change.
> 
> PCH helps us in all of these cases, and no build system change is going to
> substantially reduce the numer of files we have ot re-compile in these cases.

But what I'm pointing out is that PCH (the gcc/MSVC way) can also hurt in many of these cases. Someone may put nsContentUtils.h in the PCH, for example, because it makes a clobber build some percentage faster. Currently without PCH, changing nsContentUtils.h will re-compile only the files that include it (eg: in content/base/src, there are 68 that include it according to my *.o.pp files). However if it's in the PCH, we'll re-compile everything in that directory (since every .cpp file includes the PCH), for a total of 86 files -- now you're ~26% slower.
> Unless these are consolidated, that means we'd need 145 sets of PCHs

Not necessarily, because we don't need to apply PCH to every file in the tree.  But it does sound like we'd want to consolidate the defines.  It's hard to tell from your data whether that's feasible.

> But what I'm pointing out is that PCH (the gcc/MSVC way) can also hurt in many of these 
> cases.

Understood.  I suggested one way we could try to work around this, at the end of comment 11, but it's not a great solution.  If we can't figure it out, maybe we have to bail on PCH.  :(
(In reply to Michael Shal [:mshal] from comment #16)
> #! /usr/bin/perl

use strict;
use warnings FATAL => 'all';

;)
 

> # snip: Lots of data about each -D set
> Max: 457, Arg = -DEXPORT_XPTC_API -DEXPORT_XPT_API -DIMPL_NS_NET
> -DIMPL_THEBES -DIMPL_XREAPI -DMOZILLA_CLIENT -DMOZILLA_INTERNAL_API -DNDEBUG
> -DNO_NSPR_10_SUPPORT -DNO_X11 -DSTATIC_EXPORTABLE_JS_API -DTRIMMED
> -D_IMPL_NS_COM -D_IMPL_NS_GFX -D_IMPL_NS_LAYOUT -D_IMPL_NS_WIDGET 
> Total number of directories : 512
> Total number of argment strings: 145
> Total number of files compiled: 5359

We need to minimize the number of unique argument strings. We should be able to start this project today, even with Makefile.in's. The problem isn't going to go away with the mechanical transition to moz.build.

As far as "to PCH or not to PCH," wouldn't it be awesome if the build backend could make intelligent choices depending on the number of invalidated targets and remembered resource usage from previous builds?
(In reply to Gregory Szorc [:gps] from comment #18)
> (In reply to Michael Shal [:mshal] from comment #16)
> > #! /usr/bin/perl
> 
> use strict;
> use warnings FATAL => 'all';
> 
> ;)

It was a quick hack! Avert your eyes! :)

> As far as "to PCH or not to PCH," wouldn't it be awesome if the build
> backend could make intelligent choices depending on the number of
> invalidated targets and remembered resource usage from previous builds?

I'm not sure what exactly you mean here - can you give an example?
(In reply to Michael Shal [:mshal] from comment #19)
> > As far as "to PCH or not to PCH," wouldn't it be awesome if the build
> > backend could make intelligent choices depending on the number of
> > invalidated targets and remembered resource usage from previous builds?
> 
> I'm not sure what exactly you mean here - can you give an example?

The build backend would be able to attribute a cost to:

a) rebuild PCH file and rebuild everything that depends on it
b) rebuild individual .cpp files without rebuilding the PCH
c) some variant of above

and then automatically select the option at build time with the lowest cost.

I suspect we would use PCH for clobber builds and less so for incremental builds. But, that's just wild speculation which I haven't spent many brain cycles thinking through.
> Currently without PCH, changing nsContentUtils.h will re-compile only the files that 
> include it (eg: in content/base/src, there are 68 that include it according to my *.o.pp 
> files). However if it's in the PCH, we'll re-compile everything in that directory (since 
> every .cpp files includes the PCH), for a total of 86 files -- now you're ~26% slower.

Thinking about this more, I don't think this is a problem: We can use PCH and also get exact dependencies without running the preprocessor on every file.

Consider: ccache does exactly this, by searching through files for "#include".  If we cached the "#include"s inside each header file, this tool could be made very fast; in the common case you'd just have to stat the relevant headers to ensure that they haven't been modified.  (Using mtime here is safe, since that's what make itself relies on for correctness.  In contrast, ccache has to hash files to ensure that they haven't changed, which increases its overhead here.)
Attached file moz-global-pch.h
I just ran a test:
- threw in some very common includes and with some GFX headers for fun.
- Added CXXFLAGS += -include-pch moz-global-pch.h.pch in a few dirs

I got a saving of 30 CPU seconds. It seems a bit less than what we're getting from unification, but noteworthy nevertheless.

I think we could precompile a generic PCH for mozilla. Then each dir by default would not use PCH, could request the global PCH or (for large modules) could specify their own PCH that the module owner would maintain. We can add something to the moz.build file to specify PCH for that directory.
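
(For anyone reproducing this: the .pch that -include-pch consumes would be produced by a one-off step along these lines, with flags matching the directories that use it; this step is assumed, not part of the attachment.)

# Precompile the attached header
clang++ $CXXFLAGS -x c++-header moz-global-pch.h -o moz-global-pch.h.pch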
(In reply to Benoit Girard (:BenWa) from comment #22)
> I got a saving of 30 CPU seconds.

Define 30 CPU seconds. If it's 30 seconds on the 70+ minutes of cpu time used during a build, it's worthless.
I did a targeted test for gfx/layers. With the PCH header posted above, the time went from 17 to 13 CPU seconds. That's a 25% saving in CPU time, which is worth doing. This is on a directory where we unify sources. This confirms that PCHs are still useful with unified sources.
(In reply to Mike Hommey [:glandium] from comment #23)
> (In reply to Benoit Girard (:BenWa) from comment #22)
> > I got a saving of 30 CPU seconds.
> 
> Define 30 CPU seconds. If it's 30 seconds on the 70+ minutes of cpu time
> used during a build, it's worthless.

This statement is very premature.  Benoit's results are quite worth pursuing, and they will only get better as we use PCH in more places in the tree.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #25)
> (In reply to Mike Hommey [:glandium] from comment #23)
> > (In reply to Benoit Girard (:BenWa) from comment #22)
> > > I got a saving of 30 CPU seconds.
> > 
> > Define 30 CPU seconds. If it's 30 seconds on the 70+ minutes of cpu time
> > used during a build, it's worthless.
> 
> This statement is very premature.  Benoit's results are quite worth
> pursuing, and they will only get better as we use PCH in more places in the
> tree.

The comment didn't mention how many directories were involved, or how long those directories normally take to build; thus the question.

Now, comment 24 is more useful to get an idea.

With that being said, I'd rather not venture into precompiled-header territory without a *very* careful plan. For example, we need to be very careful about defines (thankfully, they are now in moz.build). Maybe we should just go forward with the bug that I can't find that's essentially about moving defines to specialized headers (so instead of -D foo, we'd e.g. #include "foo.h").
Found it: bug 902825.
(In reply to comment #26)
> With that being said, I'd rather not venture in precompile headers territory
> without a *very* careful plan. For example, we need to be very careful about
> defines (thankfully, they are now in moz.build). Maybe we should just go
> forward with the bug that i can't find that's essentially about moving defines
> to specialized headers (so instead of -D foo, we'd e.g. #include "foo.h").

The defines are not going to be a problem for now.  I did some experiments today, and clang only cares about the defines which are actually used in the headers that you precompile.  Other defines are ignored during AST validation.

In addition to the defines, we need to be careful with things such as LOCAL_INCLUDES which change the include search order, flags affecting the codegen target such as -msse2 (because precompiled header files are not target agnostic) and perhaps other things which we are going to find out about along the way.

Here is my proposed plan:

1. Start small, create a mozilla-precompiled.h which is included *everywhere* and add things such as libc headers to it.  Work on that patch and get it landed.
2. Get a list of _highly_ popular headers everywhere in our tree, verify that they do not depend on defines etc, and add them to mozilla-precompiled.h.
3. Once we're there, and only then, start to worry about other headers.

How does this sound?
(In reply to Mike Hommey [:glandium] from comment #26)
> For example, we need to be very careful about
> defines (thankfully, they are now in moz.build).

Yes of course.

> Maybe we should just go
> forward with the bug that i can't find that's essentially about moving
> defines to specialized headers (so instead of -D foo, we'd e.g. #include
> "foo.h").

Bug 902825 has a much larger scope than that, which isn't easy to explain here. Moving defines to headers is an interesting idea, but at this point we have no solid reason to block PCH on having that piece.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #28)
> How does this sound?

This sounds reasonable, but we need to ensure that those headers do not ever depend on defines.
That is, not just when we add them to mozilla-precompiled.h.
(In reply to comment #30)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #28)
> > How does this sound?
> 
> This sounds reasonable, but we need to ensure that those headers do not ever
> depend on defines.

Yes.  I still don't have a good solution for that...
(In reply to Mike Hommey [:glandium] from comment #30)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #28)
> > How does this sound?
> 
> This sounds reasonable, but we need to ensure that those headers do not ever
> depend on defines.

I guess these defines are not only from the command-line, but also added by moz.build files?  Can't we do something like:

if ! cmp -s <($CC -E $ALL_THE_OPTIONS -o - $header) <($CC -E -o - $header); then
  echo header depends on defines, fixme
fi

in any necessary directory?
(In reply to Nathan Froyd (:froydnj) (AFK 23-30 November) from comment #33)
> (In reply to Mike Hommey [:glandium] from comment #30)
> > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #28)
> > > How does this sound?
> > 
> > This sounds reasonable, but we need to ensure that those headers do not ever
> > depend on defines.
> 
> I guess these defines are not only from the command-line, but also added by
> moz.build files?  Can't we do something like:
> 
> if ! cmp -s <($CC -E $ALL_THE_OPTIONS -o - $header) <($CC -E -o - $header);
> then
>   echo header depends on defines, fixme
> fi
> 
> in any necessary directory?

It's not that easy.  See prlog.h for example.  You can have the following in a translation unit:

#define FORCE_PR_LOG
#include "prlog.h"

Therefore we cannot precompile that header.  It's not clear to me how your proposal would detect that situation.
This patch implements PCH support for the entire tree, and has only been tested on clang/OSX.  Here I went through the most popular included headers across the code base, and precompiled most of them.  Here's the size of the database with this patch:

$ ls -lah obj-ff-dbg/mozilla-config.h.precompiledheader
-rw-r--r--  1 ehsan  staff   3.4M  1 Dec 16:06 obj-ff-dbg/mozilla-config.h.precompiledheader

And here are my timings before and after this patch:

Before:
real	7m24.288s
user	37m20.551s
sys	4m26.605s

After:
real	7m22.874s
user	36m43.693s
sys	4m24.324s

The effect of this patch for me is between 40-60 seconds of CPU time, but always a negligible amount of wall time.  I guess a big enough part of my build is so far from fully parallel that these kinds of CPU-time improvements get diminished in the wall time.  (And indeed, according to mach resource-usage, I'm spending around 2:30 with very poor parallelism in this build now.)

I'd like to ask for feedback on the patch, and also that other people please grab the patch and use it on other machines and measure their build times before and after.  Then we can decide whether we should pursue this path.
Assignee: nobody → ehsan
Attachment #8340794 - Attachment is obsolete: true
I'd suggest trying again with some large but less common headers like jsapi.h, layers.h and the like. When I tried including simple but common headers like you have here, I didn't see a win either when testing in comment 22. You need to be more aggressive with large headers.
Comment on attachment 8340800 [details] [diff] [review]
Experimental support for precompiled headers with clang

Hang on, it seems a very weird, context-sensitive behavior in the expansion of $< caused me to overlook a bug in my patch, which caused all of the tree to be built *without* the precompiled header. :(  So my timings above are basically useless.  I need to fix up the patch to make the tree build again and then submit actual numbers.
Attachment #8340800 - Attachment is obsolete: true
You need CXXPCHFLAGS =, not CXXPCHFLAGS :=
OK, here's an actual patch which builds, and here is the build time:

real	7m2.328s
user	34m51.262s
sys	4m7.972s

So this gives us around 5-6% improvement across the tree.  I should also note that I stopped precompiling some of the headers because they were depending on MOZILLA_INTERNAL_API.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #40)
> So this gives us around 5-6% improvement across the tree.  I should also
> note that I stopped precompiling some of the headers because they were
> depending on MOZILLA_INTERNAL_API.

Very few c++ files are built without MOZILLA_INTERNAL_API. Maybe the right thing to do here is to use precompiled headers only for C++ files that are built with MOZILLA_INTERNAL_API (which essentially is whenever LIBXUL_LIBRARY is set).
Also, please measure the build time difference with gcc too, if you can, because i suspect the wins would be bigger there.
(In reply to Mike Hommey [:glandium] from comment #41)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #40)
> > So this gives us around 5-6% improvement across the tree.  I should also
> > note that I stopped precompiling some of the headers because they were
> > depending on MOZILLA_INTERNAL_API.
> 
> Very few c++ files are built without MOZILLA_INTERNAL_API. Maybe the right
> thing to do here is to use precompiled headers only for C++ files that are
> built with MOZILLA_INTERNAL_API (which essentially is whenever
> LIBXUL_LIBRARY is set).

Is there an easy build system level trick which I can employ to differentiate those files?  I agree that limiting PCH to those files makes a lot of sense.

(In reply to Mike Hommey [:glandium] from comment #42)
> Also, please measure the build time difference with gcc too, if you can,
> because i suspect the wins would be bigger there.

Will do.  For now I've only focused on my local build environment, but extending this to GCC should be fairly simple.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #43)
> Is there an easy build system level trick which I can employ to
> differentiate those files?  I agree that limiting PCH to those files makes a
> lot of sense.

ifdef LIBXUL_LIBRARY in rules.mk or Makefile.in (but only after config.mk is included)
This patch adds gcc support, but it actually doesn't work yet.  The gcc PCH docs are a disaster: apparently there is no way to tell gcc to use a given PCH file explicitly; you have to do it indirectly by placing a header.h.gch file in the same directory as the header, except that gcc doesn't pick it up during the build at all.  I tested this by sticking a #error in mozilla-config.h after the PCH was generated.  gcc has warnings that it issues when it cannot use a PCH file that it found, and when I turn them into errors they're not triggered even once, so I suspect that gcc doesn't even try to pick up the .gch file at all.  And to make things worse, all of this works when I try out the commands on a small test.cpp/pch.h that I keep around for testing purposes; it's not clear to me why it stops working in our build system.
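
For reference, the documented gcc workflow the patch is trying to trigger looks roughly like this (a sketch; the source file name is hypothetical). -Winvalid-pch, the warning mentioned above, is what reports a .gch that was found but rejected:

# Precompile the header in place; gcc should then prefer the .gch whenever
# mozilla-config.h is included with compatible flags
g++ $CXXFLAGS -x c++-header mozilla-config.h -o mozilla-config.h.gch

# Promote the warning to an error to catch a found-but-unusable .gch
g++ $CXXFLAGS -Werror=invalid-pch -include mozilla-config.h -c nsFoo.cpp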
Attachment #8340817 - Attachment is obsolete: true
Note that I haven't tried comment 44 yet, I would like to figure out how we can convince gcc to use the PCH first.
This patch only uses PCH on internal API code, and precompiles a few more headers.  With this patch, I get the following times:

Before:
real	7m1.261s
user	35m28.100s
sys	4m2.561s

After:
real	6m37.862s
user	34m14.911s
sys	3m53.180s

The gcc support still doesn't work.

glandium, what do you think about this patch?  (Note that if we don't figure out the gcc problem, we can disable it and land the initial support for clang only.)
Attachment #8341100 - Attachment is obsolete: true
Attachment #8344414 - Flags: feedback?(mh+mozilla)
For windows:

To create pch:

cl -FIforced.h -Ycforced.h -Fphdrs.pch -c -TP -Foignored.obj forced.h

(-FI: force-include forced.h into every translation unit, -Yc: create the PCH from it, -Yu: use the PCH, -Fp: PCH file name, -TP: compile everything as C++, -Fo: use this output .obj file)

To use pch:

cl -FIforced.h -Yuforced.h -Fphdrs.pch hello.cpp
With comment 48, this should be very straightforward to port to MSVC, where this is presumably the most useful.
Comment on attachment 8344414 [details] [diff] [review]
Experimental support for precompiled headers with clang and gcc

Clearing the flag since even if I got feedback after 2 years, I wouldn't be motivated to address it.
Attachment #8344414 - Flags: feedback?(mh+mozilla)
Assignee: ehsan → nobody
Product: Core → Firefox Build System
Severity: normal → S3