I instrumented the build system to gather system metrics during building and pushed a few try builds to gather data. Data is limited to OS X builders because they are the only ones with Python 2.7 + the Python development files. ccache enabled: https://tbpl.mozilla.org/?tree=Try&rev=d5b5305bb373 (bld-lion-r5-035) https://tbpl.mozilla.org/?tree=Try&rev=e85bd40362c4 (bld-lion-r5-040) ccache disabled: https://tbpl.mozilla.org/?tree=Try&rev=806b3441e4a7 (bld-lion-r5-006) https://tbpl.mozilla.org/?tree=Try&rev=8eaa60c9dc14 (bld-lion-r5-026) https://tbpl.mozilla.org/?tree=Try&rev=13ab77733075 (bld-lion-r5-006) The raw data is printed at the bottom of the build logs. System resource usage for the opt builds breaks down as follows (mean followed by raw values): ccache enabled ============== Wall time (s): 1266 (1281; 1251) Effective CPU (%): 65.99 (64.14; 67.84) I/O Read (bytes): 431,390,720 (441,974,784; 420,806,656) I/O Write (bytes): 10,212,594,688 (10,196,536,320; 10,228,653,056) I/O wait read (ms): 284,524,691 (338,120,630; 230,928,752) I/O wait write (ms): 717,335,601 (791,824,447; 642,846,754) ccache disabled =============== Wall time (s): 1032 (1029; 1036; 1032) Effective CPU (%): 73.34 (73.25; 73.09; 73.68) I/O Read (bytes): 322,984,618 (280,956,928; 279,068,672; 408,928,256) I/O Write (bytes): 2,588,996,608 (2,590,042,112; 2,586,801,152; 2,590,146,560) I/O wait read (ms): 86,867,161 (67,943,805; 86,817,741; 105,839,937) I/O wait write (ms): 264,928,619 (259,033,841; 275,819,475; 259,932,542) Average Delta (ccache to no ccache) =================================== Wall time (s): -234 (3:54) Effective CPU (%): 7.35 I/O Read (bytes): -108,406,102 I/O Write (bytes): -7,623,598,080 I/O wait read (ms): -197,659,530 I/O wait write (ms): -452,406,982 Percentages (no ccache over ccache then inverse) ========================================= Wall time: 81.5% : 122.7% I/O Read: 74.9% : 133.6% I/O Write: 25.4% : 394.5% I/O wait read: 30.5% : 327.5% I/O wait write: 36.9% : 270.8% Before we analyze, some asterisks: * Try builders always do clobber builds * Are Try builders on the same hardware as actual builders? * The sample set is small. * There could be variance in the hardware configurations * Try builds aren't incremental builds (saying it again because it is important) * Repeat builders could be influenced by page cache (although bld-lion-r5-006 was responsible for no ccache #1 and #3 and the numbers are very similar with the exception of read bytes. No clue why that's so different. FWIW, I tried to get a larger sample set, but I didn't want to abuse Try. I got a few repeat builders for ccache builds, which threw off metrics so I threw them away. These show that clobber builds with populated ccache are pretty fast - about 558s total (or about 2x faster than without a populated ccache)! So, ccache *does* have its place. While the sample set is small, it shows that ccache has an overall negative impact on build performance, at least under the conditions tested. Specifically, it increases I/O significantly. The extra 7.6GB write I/O for ccache is IMO absurd. The extra time spent in I/O wait is probably the main contributor to slower build times with ccache. Anyway, I would like us to conduct an experiment: I would like for ccache to be disabled for a day or two (maybe a week so we have a bigger data set). If any build configurations show an improvement in build times, we leave it disabled. If not, we just revert the change. It's quite possible we'll see mixed results across Linux and OS X, clobber and incremental, try and non-try, and possibly even debug and non-debug.
This contradicts some of the analysis done in bug 775305.
(In reply to Chris AtLee [:catlee] from comment #1) > This contradicts some of the analysis done in bug 775305. So it does. And that is pretty recent, too. Hmmm. FWIW, I pushed a try build that performed a |ccache -C -z| right before configure and |ccache -s| right after |make build| finished. The results: total Time - 1213.80863404s total CPU - Average CPU Usage: 72.2763733416% total I/O - Reads: 27862; Writes: 185471; Read Bytes: 398737408; Write Bytes: 10115586048; Read Time (ms): 130798731; Write time (ms): 572693542 cache directory /builds/ccache cache hit (direct) 5 cache hit (preprocessed) 2 cache miss 4827 called for link 23 called for preprocessing 8 compile failed 1 preprocessor error 1 bad compiler arguments 1 unsupported source language 3 autoconf compile/link 24 unsupported compiler option 121 no input file 178 files in cache 14303 cache size 450.3 Mbytes This is with CCACHE_COMPRESS=1. If the size of the ccache is only 450MB, how do ccache builds consume 7+GB more write I/O than non-ccache builds?! My resource logging also grabs per-tier and per-directory-in-tier resource usage (I don't believe there is in PARALLEL_DIRS magic happening here, so the numbers should be exclusive). Take a look at a log for yourself: https://tbpl.mozilla.org/php/getParsedLog.php?id=15035063&tree=Try&full=1. Here are some that stand out: platform libs content: 1786042368 platform libs layout: 1272828928 We probably have a dozen directories in the hundreds of megabytes range. Where is this I/O coming from?! If I had root on these machines I'd throw DTrace at the problem. (http://dtrace.org/blogs/brendan/2011/10/10/top-10-dtrace-scripts-for-mac-os-x/ has some good scripts). There is also a non-0 chance that psutil and/or my resource monitoring code is misreporting data. Someone with shell access to a build slave should be able to validate these numbers. If you do the math, 7.623GB spread over an extra 234s comes out to ~32.6MB/s. That's pretty impressive if you ask me. Something smells fishy...
The plot thickens. I pushed a build that unsets CCACHE_COMPRESS from the environment. I can confirm it works by looking at ccache -s after the build: 2339.90 cache directory /builds/ccache 2339.90 cache hit (direct) 5 2339.90 cache hit (preprocessed) 2 2339.90 cache miss 4827 2339.90 called for link 23 2339.90 called for preprocessing 8 2339.90 compile failed 1 2339.90 preprocessor error 1 2339.90 bad compiler arguments 1 2339.90 unsupported source language 3 2339.90 autoconf compile/link 24 2339.90 unsupported compiler option 121 2339.90 no input file 178 2339.90 files in cache 14292 2339.90 cache size 1.5 Gbytes 2339.90 max cache size 10.0 Gbytes The impact on build times is thus: 2339.91 total Time - 1212.30662608s 2339.92 total CPU - Average CPU Usage: 71.3377127439% 2339.93 total I/O - Reads: 77569; Writes: 178390; Read Bytes: 771170304; Write Bytes: 9791247360; Read Time (ms): 386029714; Write time (ms): 782197788 Read bytes almost doubles and write bytes goes down by a few hundred MB. In terms of percentage, write bytes isn't that much. It could be some other background process. Still, this is somewhat unexpected. Although for this single sample, wall time has gone down. CPU overhead for doing compression? A larger statistical sample is needed to conclude anything. I almost enabled dumping of the raw data return by psutil. You can see it interleaved in the output at https://tbpl.mozilla.org/php/getParsedLog.php?id=15038182&tree=Try&full=1 It shows there are two disks on the machine and both are active during the build. I'll let RelEng clarify what each does. There is some interesting data buried in there. For example: SystemResourceUsage(start=1346990934.988518, end=1346990935.494347, cpu=[100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], io=iostat(read_count=3L, write_count=190L, read_bytes=16384L, write_bytes=16151552L, read_time=7943L, write_time=639238L), virt=vmem(total=8589934592L, available=5837058048L, percent=32.0, used=4822716416L, free=3757694976L, active=984604672L, inactive=2079363072L, wired=1758748672L), swap=swap(total=67108864L, used=0L, free=67108864L, percent=0.0, sin=228769792L, sout=0L)) These are dropped every 0.5s. What gets me about this is write_time is reported as increasing by 639238ms in just 0.5s! Um, no way. I've always been a little skeptical of psutil saying it is reporting I/O wait in ms because the times seem a bit high to me. Now I'm all but certain there is more to this story. I'll have to dig around how psutil grabs I/O wait on OS X then see if there are any gotchas (network sockets, etc). Also, there is memory data there and AFAICT inactive memory gets down to 1.75GB free but never lower. So, um, is OS X not as greedy about the page cache as it could be? So many questions...
> 2339.90 cache hit (direct) 5 > 2339.90 cache hit (preprocessed) 2 > 2339.90 cache miss 4827 Obviously ccache is going to hurt if you have 0 cache hits. The question is how often that happens in practice. Can we just dump ccache -s on a random sample of builders (try and otherwise) and see what that looks like?
Wish retrospectively granted: it's been clogging up the tail end of build logs for ages. https://tbpl.mozilla.org/php/getParsedLog.php?id=15039078&tree=Mozilla-Inbound should have been a fairly good case, the slave had done inbound opt Linux32 three times in a row, 0 hits. The build before was https://tbpl.mozilla.org/php/getParsedLog.php?id=15036828&tree=Mozilla-Inbound, 2400 hits.
Philor is magic sometimes. :) I suspect that ccache is not helpful on m-i except when we back out or clobber. Try should be a better use-case, I should think. Also, I can't think of a reason not to use ccache_hardlink on the builders. (It kind of messes you up if you have multiple trees, but that's not an issue.) That should greatly reduce the I/O involved.
psutil is reporting I/O wait as microseconds because of a bug (it should be reporting milliseconds). I filed https://code.google.com/p/psutil/issues/detail?id=323.
(In reply to Justin Lebar [:jlebar] from comment #6) > Also, I can't think of a reason not to use ccache_hardlink on the builders. > (It kind of messes you up if you have multiple trees, but that's not an > issue.) That should greatly reduce the I/O involved. It's worth testing at least! Need to be sure to unset CCACHE_COMPRESS at the same time. How will using multiple trees mess it up? Our non-try builders build multiple trees all on the same machine. (In reply to Phil Ringnalda (:philor) from comment #5) > Wish retrospectively granted: it's been clogging up the tail end of build > logs for ages. > > https://tbpl.mozilla.org/php/getParsedLog.php?id=15039078&tree=Mozilla- > Inbound should have been a fairly good case, the slave had done inbound opt > Linux32 three times in a row, 0 hits. The build before was > https://tbpl.mozilla.org/php/getParsedLog.php?id=15036828&tree=Mozilla- > Inbound, 2400 hits. I think in this case the builds with 0 hits did almost no real work. Only 0/801 cache hits/misses vs. 2407/1925 for the previous build.
> How will using multiple trees mess it up? Our non-try builders build multiple trees all on the > same machine. ccache uses the mtime of the file in its cache to implement its LRU eviction mechanism. As a result, whenever ccache has a cache hit, it updates the mtime of the object file in its cache to now. The mtime lives in the inode, so if you're compiling with HARDLINK, a cache hit updates the mtime of the object file in all objdirs that it lives in. So that means that if you compile tree A, then compile tree B, then compile tree A again, the final compilation of tree A may have spurious relinks because some of its object files may be newer than its libraries, even though the object files didn't change. I don't think this is a big deal in our tree. You almost always have to relink libxul anyway, and the other libraries are small and pretty quick to link. But it's something to be aware of.
You need to log in before you can comment on or make changes to this bug.