Open Bug 789317 Opened 12 years ago Updated 2 years ago

Disable ccache

Categories

(Firefox Build System :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

Details

I instrumented the build system to gather system metrics during building and pushed a few try builds to gather data.

Data is limited to OS X builders because they are the only ones with Python 2.7 + the Python development files.
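
For reference, the instrumentation is essentially a 0.5s psutil polling loop. A minimal sketch of the idea (not the actual patch; field names follow psutil's documented counters):

import time
import psutil

def sample_resources(interval=0.5):
    # Yield one system-wide sample roughly every `interval` seconds.
    # The I/O counters are cumulative, so consecutive samples are diffed
    # to get per-interval figures like the ones quoted below.
    while True:
        start = time.time()
        # cpu_percent(interval=...) blocks for the interval and returns per-core %.
        cpu = psutil.cpu_percent(interval=interval, percpu=True)
        yield {
            'start': start,
            'end': time.time(),
            'cpu': cpu,                          # per-core utilization (%)
            'io': psutil.disk_io_counters(),     # read/write counts, bytes, times
            'virt': psutil.virtual_memory(),
            'swap': psutil.swap_memory(),
        }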

ccache enabled:

https://tbpl.mozilla.org/?tree=Try&rev=d5b5305bb373 (bld-lion-r5-035)
https://tbpl.mozilla.org/?tree=Try&rev=e85bd40362c4 (bld-lion-r5-040)

ccache disabled:

https://tbpl.mozilla.org/?tree=Try&rev=806b3441e4a7 (bld-lion-r5-006)
https://tbpl.mozilla.org/?tree=Try&rev=8eaa60c9dc14 (bld-lion-r5-026)
https://tbpl.mozilla.org/?tree=Try&rev=13ab77733075 (bld-lion-r5-006)

The raw data is printed at the bottom of the build logs.

System resource usage for the opt builds breaks down as follows (mean followed by raw values):

ccache enabled
==============

Wall time (s):        1266           (1281; 1251)
Effective CPU (%):    65.99          (64.14; 67.84)
I/O Read (bytes):     431,390,720    (441,974,784; 420,806,656)
I/O Write (bytes):    10,212,594,688 (10,196,536,320; 10,228,653,056)
I/O wait read (ms):   284,524,691    (338,120,630; 230,928,752)
I/O wait write (ms):  717,335,601    (791,824,447; 642,846,754)

ccache disabled
===============

Wall time (s):       1032           (1029; 1036; 1032)
Effective CPU (%):   73.34          (73.25; 73.09; 73.68)
I/O Read (bytes):    322,984,618    (280,956,928; 279,068,672; 408,928,256)
I/O Write (bytes):   2,588,996,608  (2,590,042,112; 2,586,801,152; 2,590,146,560)
I/O wait read (ms):  86,867,161     (67,943,805; 86,817,741; 105,839,937)
I/O wait write (ms): 264,928,619    (259,033,841; 275,819,475; 259,932,542)

Average Delta (ccache to no ccache)
===================================

Wall time (s):       -234 (3:54)
Effective CPU (%):   7.35
I/O Read (bytes):    -108,406,102
I/O Write (bytes):   -7,623,598,080
I/O wait read (ms):  -197,659,530
I/O wait write (ms): -452,406,982

Percentages (no ccache over ccache, then inverse)
=================================================

Wall time:      81.5%  :  122.7%
I/O Read:       74.9%  :  133.6%
I/O Write:      25.4%  :  394.5%
I/O wait read:  30.5%  :  327.5%
I/O wait write: 36.9%  :  270.8%
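
These ratio columns are just the means from the two tables divided both ways. A quick check using the mean values quoted above:

# Means from the two tables above.
ccache    = {'wall': 1266.0, 'read': 431390720.0, 'write': 10212594688.0}
no_ccache = {'wall': 1032.0, 'read': 322984618.0, 'write': 2588996608.0}

for key in ('wall', 'read', 'write'):
    ratio = no_ccache[key] / ccache[key]
    print('%-5s %5.1f%% : %6.1f%%' % (key, 100 * ratio, 100 / ratio))
# wall   81.5% :  122.7%
# read   74.9% :  133.6%
# write  25.4% :  394.5%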

Before we analyze, some asterisks:

* Try builders always do clobber builds
* Are Try builders on the same hardware as actual builders?
* The sample set is small.
* There could be variance in the hardware configurations
* Try builds aren't incremental builds (saying it again because it is important)
* Repeat builders could be influenced by the page cache (although bld-lion-r5-006 was responsible for no-ccache runs #1 and #3 and the numbers are very similar, with the exception of read bytes; no clue why that's so different).

FWIW, I tried to get a larger sample set, but I didn't want to abuse Try. I got a few repeat builders for the ccache builds, which threw off the metrics, so I threw them away. Those runs do show that clobber builds with a populated ccache are pretty fast - about 558s total (roughly 2x faster than without a populated ccache)! So, ccache *does* have its place.

While the sample set is small, it shows that ccache has an overall negative impact on build performance, at least under the conditions tested. Specifically, it increases I/O significantly. The extra 7.6GB write I/O for ccache is IMO absurd. The extra time spent in I/O wait is probably the main contributor to slower build times with ccache.

Anyway, I would like us to conduct an experiment: I would like for ccache to be disabled for a day or two (maybe a week so we have a bigger data set). If any build configurations show an improvement in build times, we leave it disabled. If not, we just revert the change. It's quite possible we'll see mixed results across Linux and OS X, clobber and incremental, try and non-try, and possibly even debug and non-debug.
This contradicts some of the analysis done in bug 775305.
(In reply to Chris AtLee [:catlee] from comment #1)
> This contradicts some of the analysis done in bug 775305.

So it does. And that is pretty recent, too. Hmmm.

FWIW, I pushed a try build that performed a |ccache -C -z| right before configure and |ccache -s| right after |make build| finished. The results:

total Time - 1213.80863404s
total CPU - Average CPU Usage: 72.2763733416%
total I/O - Reads: 27862; Writes: 185471; Read Bytes: 398737408; Write Bytes: 10115586048; Read Time (ms): 130798731; Write time (ms): 572693542

cache directory                     /builds/ccache
cache hit (direct)                     5
cache hit (preprocessed)               2
cache miss                          4827
called for link                       23
called for preprocessing               8
compile failed                         1
preprocessor error                     1
bad compiler arguments                 1
unsupported source language            3
autoconf compile/link                 24
unsupported compiler option          121
no input file                        178
files in cache                     14303
cache size                         450.3 Mbytes

This is with CCACHE_COMPRESS=1. If the size of the ccache is only 450MB, how do ccache builds consume 7+GB more write I/O than non-ccache builds?!

My resource logging also grabs per-tier and per-directory-in-tier resource usage (I don't believe there is any PARALLEL_DIRS magic happening here, so the numbers should be exclusive). Take a look at a log for yourself: https://tbpl.mozilla.org/php/getParsedLog.php?id=15035063&tree=Try&full=1. Here are some that stand out:

 platform libs content: 1786042368
 platform libs layout: 1272828928

We probably have a dozen directories in the hundreds of megabytes range. Where is this I/O coming from?!

If I had root on these machines I'd throw DTrace at the problem. (http://dtrace.org/blogs/brendan/2011/10/10/top-10-dtrace-scripts-for-mac-os-x/ has some good scripts).

There is also a non-0 chance that psutil and/or my resource monitoring code is misreporting data. Someone with shell access to a build slave should be able to validate these numbers.

If you do the math, 7.623GB spread over an extra 234s comes out to ~32.6MB/s. That's pretty impressive if you ask me. Something smells fishy...
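
Spelled out, that is just the write-byte delta over the wall-time delta:

extra_write_bytes = 7623598080.0   # I/O write delta from the table above
extra_seconds = 234.0              # wall time delta
print(extra_write_bytes / extra_seconds / 1e6)  # ~32.6 (MB/s)
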
The plot thickens.

I pushed a build that unsets CCACHE_COMPRESS from the environment. I can confirm it works by looking at ccache -s after the build:

2339.90 cache directory                     /builds/ccache
2339.90 cache hit (direct)                     5
2339.90 cache hit (preprocessed)               2
2339.90 cache miss                          4827
2339.90 called for link                       23
2339.90 called for preprocessing               8
2339.90 compile failed                         1
2339.90 preprocessor error                     1
2339.90 bad compiler arguments                 1
2339.90 unsupported source language            3
2339.90 autoconf compile/link                 24
2339.90 unsupported compiler option          121
2339.90 no input file                        178
2339.90 files in cache                     14292
2339.90 cache size                           1.5 Gbytes
2339.90 max cache size                      10.0 Gbytes

The impact on build times is thus:

2339.91 total Time - 1212.30662608s
2339.92 total CPU - Average CPU Usage: 71.3377127439%
2339.93 total I/O - Reads: 77569; Writes: 178390; Read Bytes: 771170304; Write Bytes: 9791247360; Read Time (ms): 386029714; Write time (ms): 782197788

Read bytes almost doubles, and write bytes goes down by a few hundred MB. In percentage terms the write-byte change isn't much; it could be some other background process. Still, this is somewhat unexpected. For this single sample, wall time has gone down slightly. CPU overhead from doing compression? A larger statistical sample is needed to conclude anything.

I also enabled dumping of the raw data returned by psutil. You can see it interleaved in the output at https://tbpl.mozilla.org/php/getParsedLog.php?id=15038182&tree=Try&full=1

It shows there are two disks on the machine and both are active during the build. I'll let RelEng clarify what each does.

There is some interesting data buried in there. For example:

SystemResourceUsage(start=1346990934.988518, end=1346990935.494347, cpu=[100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], io=iostat(read_count=3L, write_count=190L, read_bytes=16384L, write_bytes=16151552L, read_time=7943L, write_time=639238L), virt=vmem(total=8589934592L, available=5837058048L, percent=32.0, used=4822716416L, free=3757694976L, active=984604672L, inactive=2079363072L, wired=1758748672L), swap=swap(total=67108864L, used=0L, free=67108864L, percent=0.0, sin=228769792L, sout=0L))

These samples are emitted every 0.5s. What gets me is that write_time is reported as increasing by 639238ms in just 0.5s! Um, no way. I've always been a little skeptical of psutil's claim that it reports I/O wait in ms, because the times seem a bit high to me. Now I'm all but certain there is more to this story. I'll have to dig into how psutil grabs I/O wait on OS X and see if there are any gotchas (network sockets, etc.).

Also, there is memory data in there, and AFAICT free memory gets down to about 1.75GB but never lower. So, um, is OS X not as greedy about the page cache as it could be?

So many questions...
> 2339.90 cache hit (direct)                     5
> 2339.90 cache hit (preprocessed)               2
> 2339.90 cache miss                          4827

Obviously ccache is going to hurt if you have 0 cache hits.

The question is how often that happens in practice.

Can we just dump ccache -s on a random sample of builders (try and otherwise) and see what that looks like?
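
Something like this would do it; a rough sketch, assuming shell access to the slaves and the ccache -s output format shown above (the host names here are made up):

import subprocess

def ccache_hit_rate(host):
    # Run `ccache -s` on a builder over ssh and return (hits, misses).
    out = subprocess.check_output(['ssh', host, 'ccache', '-s']).decode('utf-8', 'replace')
    stats = {}
    for line in out.splitlines():
        # Lines look like "cache hit (direct)                     5"
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0].strip()] = int(parts[1])
    hits = stats.get('cache hit (direct)', 0) + stats.get('cache hit (preprocessed)', 0)
    return hits, stats.get('cache miss', 0)

# Hypothetical host names for illustration only.
for host in ('bld-lion-r5-006', 'bld-lion-r5-026'):
    hits, misses = ccache_hit_rate(host)
    print('%s: %d hits / %d misses' % (host, hits, misses))
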
Wish retrospectively granted: it's been clogging up the tail end of build logs for ages.

https://tbpl.mozilla.org/php/getParsedLog.php?id=15039078&tree=Mozilla-Inbound should have been a fairly good case, the slave had done inbound opt Linux32 three times in a row, 0 hits. The build before was https://tbpl.mozilla.org/php/getParsedLog.php?id=15036828&tree=Mozilla-Inbound, 2400 hits.
Philor is magic sometimes.  :)

I suspect that ccache is not helpful on m-i except when we back out or clobber.  Try should be a better use-case, I should think.

Also, I can't think of a reason not to use ccache_hardlink on the builders.  (It kind of messes you up if you have multiple trees, but that's not an issue.)  That should greatly reduce the I/O involved.
psutil is reporting I/O wait as microseconds because of a bug (it should be reporting milliseconds). I filed https://code.google.com/p/psutil/issues/detail?id=323.
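
With that correction, the suspicious sample from earlier is at least in a sane range:

reported = 639238          # the write_time increase from the 0.5s sample above
print(reported / 1000.0)   # microseconds -> milliseconds: ~639 ms, not 639 seconds
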
(In reply to Justin Lebar [:jlebar] from comment #6)
> Also, I can't think of a reason not to use ccache_hardlink on the builders. 
> (It kind of messes you up if you have multiple trees, but that's not an
> issue.)  That should greatly reduce the I/O involved.

It's worth testing at least! Need to be sure to unset CCACHE_COMPRESS at the same time.

How will using multiple trees mess it up? Our non-try builders build multiple trees all on the same machine.

(In reply to Phil Ringnalda (:philor) from comment #5)
> Wish retrospectively granted: it's been clogging up the tail end of build
> logs for ages.
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=15039078&tree=Mozilla-
> Inbound should have been a fairly good case, the slave had done inbound opt
> Linux32 three times in a row, 0 hits. The build before was
> https://tbpl.mozilla.org/php/getParsedLog.php?id=15036828&tree=Mozilla-
> Inbound, 2400 hits.

I think in this case the builds with 0 hits did almost no real work. Only 0/801 cache hits/misses vs. 2407/1925 for the previous build.
> How will using multiple trees mess it up? Our non-try builders build multiple trees all on the 
> same machine.

ccache uses the mtime of the file in its cache to implement its LRU eviction mechanism.

As a result, whenever ccache has a cache hit, it updates the mtime of the object file in its cache to now.

The mtime lives in the inode, so if you're compiling with HARDLINK, a cache hit updates the mtime of the object file in all objdirs that it lives in.

So that means that if you compile tree A, then compile tree B, then compile tree A again, the final compilation of tree A may have spurious relinks because some of its object files may be newer than its libraries, even though the object files didn't change.

I don't think this is a big deal in our tree.  You almost always have to relink libxul anyway, and the other libraries are small and pretty quick to link.  But it's something to be aware of.
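
A small sketch of that mtime/hard-link interaction (illustration only; the file names are made up):

import os
import tempfile
import time

d = tempfile.mkdtemp()
cache_obj = os.path.join(d, 'cache_copy.o')    # stand-in for ccache's cached object
objdir_obj = os.path.join(d, 'objdir_copy.o')  # stand-in for the object file in an objdir

open(cache_obj, 'w').close()
os.link(cache_obj, objdir_obj)   # roughly what CCACHE_HARDLINK does: one inode, two names

time.sleep(1)
os.utime(cache_obj, None)        # ccache touching its copy on a cache hit (LRU bookkeeping)

# Same inode, so the objdir copy now looks newer than the library it was
# linked into, and make relinks even though nothing changed.
print(os.stat(objdir_obj).st_mtime == os.stat(cache_obj).st_mtime)  # True
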
Product: Core → Firefox Build System
Severity: normal → S3