Open Bug 988976 Opened 10 years ago Updated 2 years ago

BEWARE: ccache does not work efficiently with -DGRE_BUILDID=YYYYMMDDhhmmss compiler option

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect

Tracking

(Not tracked)

People

(Reporter: ishikawa, Unassigned)

References

Details

(This applies to OSX also, and Windows, too, if ccache becomes available there.)

BEWARE: ccache does not work efficiently with -DGRE_BUILDID=YYYYMMDDhhmmss

(This was originally posted
Bug 905646 - Enable -gsplit-dwarf by default on local builds.
by mistake as
https://bugzilla.mozilla.org/show_bug.cgi?id=905646#c22 
Robert Kaiser pointed out that this should be put in a
DIFFERENT bugzilla entry, and so here it is after a minor edit.)

On a PC (Debian GNU/Linux 32-bit), I noticed ccache did not use
cache very well. Where I thought hits would have been produced, there were
cache misses.

After investigation, I realize the problem.
Depending on the setup in mozconfig file, I think,
[and maybe the use of |mach| command ?]
a following option in the C compiler command line is produced:
e.g., -DGRE_BUILDID=20140327174446

It looks like a timestamp of a sort.  Since ccache uses the command
line arguments as well as the preprocessed C source file to create the
hash key for cached object lookup, if the value of GRE_BUILDID is
different at the each invocation of compiler even if the preprocessed
source file is the same, so is the hash and there is no cache hit.
[I am not sure when the GRE_BUILDID becomes different. It may be that
it is recreated at the time of reconfigure. I have only seen this
behavior yesterday and do not have much to say yet.]

Also, ccache has a feature called "direct hit" check which seems to
use only the timestamps of involved files (basically the list of files
produced by -MD to, say, gcc), and the command line arguments. If the
source file and involved files have not changed, and the compiler
command arguments are the same, then we don't even need to create a
preprocessed output, and can pick up the cached object immediately.
This obviously won't work if there is a changing GRE_BUILDID define on
the command line.)

So we have to be careful in setting up the mozconfig for local
compilation to make use of ccache effectively.

I am not sure which mozconfig setting controls the appearance of
-DGRE_BUILDID=YYYYMMHHmmss on the command line. (Or can it be an
environment variable setting? Does the use of |mach| affect this?)

At least, this -DGRE_BUILDID is not seen on another PC, so I think
it has something to do with different mozconfig setup or something.


Anyway, this is something one wants to be aware of when one wants to
use ccache for local development.  (Such an argument to C compiler
won't make ccache useful on shared compilation server, too.)

Someone in the know can fill in when this -DGRE_BUILDID is placed
on the command line, and when the value changes.

TIA

cf.
I checked if there is any related entry that handles ccache efficiency head on.
Here is a summary.

The short answer is not quite.

The long answer is as follows.
I checked if there is relevant entries for ccache.

After searching for "ccache" without the quotes, and
brief inspection to filter unrelated entries,
I ended up with the following possibly relevant entries.

Bug 927499 	Core 	Build Config 	NEW 	--- 	bootstrap.py should install ccache 	2013-10-16

Bug 911640 	Core 	mach 	 	NEW 	--- 	mach should warn if using ccache but the cache size is small 	2013-09-03

Bug 795183 	Core 	Build Config 	NEW 	--- 	mach misses build warnings that are printed using relative paths (due to use of "--with-ccache" and CCACHE_BASEDIR) 	2012-09-30

Bug 789317 	Core 	Build Config 	NEW 	--- 	Disable ccache 	2012-09-10

Bug 778711 	Thunderbird 	Build Config 	NEW 	--- 	Remove --with-ccache from Thunderbird mozconfigs 	2012-07-30

275790 	Firefox 	Build Config 	REOP 	--- 	more ccache friendly NSPR_CFLAGS 	2010-03-03


Note that no entries seem to be related to efficiency, i.e., of 
cache hits directly.
(Bug 911640 is somewhat related. It asks for the warning from |mach|
when
cache size is too small (and so there can not be many cache hits. I
use 3GB for cache directory.)

Bug 275790 (filed in 2004) discusses the issue of superficially different pathnames
that actually point to the files of exactly the same content in the
source tree.  Such source files ought to produce the same object and
thus can produce cache hit, but due to the different paths used on the
command line which was used as part of the input to hash function, it
did not produce the same hash value and thus there was no hit.
(But later version of ccache seemed to handle this different path
issue cleverly according to  a comment filed in 2010.
https://bugzilla.mozilla.org/show_bug.cgi?id=275790#c8
I have not verified the behavior of ccache yet.)

Bug 789317 and Bug 778711 seem rather disturbing because
it asks for disabling the use of ccache in the standard config (!?). 

Bug 789317 - Disable ccache
I read this and it reported a rather dismal performance  of ccache
WHEN THERE ARE VERY FEW HITS. This is to be expected.
It all depends on the compilation scenario.
As is noted in 
https://bugzilla.mozilla.org/show_bug.cgi?id=789317#c6
I think it is worthwhile on the try server when someone pushes the
build a few times, etc. or the configuration is changed, but
the source files remain the same (as in clobber), etc.

Bug 778711 has a more ominous title.
But as I read it, I realize
it has something to do with the use (or non-use) of ccache when
gcc/clang transition took place(?) on OS X.
Eventually, then in "Bug 775305 - reenable ccache on os x", ccache was shown
to be effective on try server with some data.
( https://bugzilla.mozilla.org/show_bug.cgi?id=775305#c9
and the followups.)
So was installed back again on OSX. So the use of ccache was
resurrected (Bug 703232 - Clean up mozconfigs a bit), I think.  
I now realize Bug 778711 "Remove --with-ccache from Thunderbird" was never acted upon
although the bugzilla entry was entered(?). 
So using ccache with thunderbird should be OK for your local config.

TIA
Hi,

This is the difference of the content of mozconfig for different PCs.

/tmp/t.tmp.1 is the sorted non-comment lines from the MOZCONFIG
where -DGRE_BUILDID is NOT produced.
/tmp/t.tmp.2 is the same data from a PC where -DGRE_BUILDID was
produced on the command line.

--- /tmp/t.tmp.1	2014-03-28 14:08:12.000000000 +0900
+++ /tmp/t.tmp.2	2014-03-28 14:08:12.000000000 +0900
@@ -1,27 +1,17 @@
+ 
 ac_add_options --disable-gstreamer
 ac_add_options --disable-jemalloc
 ac_add_options --disable-libjpeg-turbo
-ac_add_options --disable-necko-wifi
-ac_add_options --disable-necko-wifi
 ac_add_options --disable-unified-compilation
 ac_add_options --disable-webrtc
 ac_add_options --enable-application=mail
 ac_add_options --enable-crashreporter
 ac_add_options --enable-debug
 ac_add_options --enable-debug-symbols=-gsplit-dwarf
-ac_add_options --enable-default-toolkit=cairo-gtk2
-ac_add_options --enable-gc-zeal
-ac_add_options --enable-official-branding
 ac_add_options --enable-optimize="-g -O2 -freorder-blocks"
-ac_add_options --enable-shared
 ac_add_options --enable-tests
 ac_add_options --enable-valgrind
 ac_add_options --with-ccache=/usr/bin/ccache
-ac_add_options --with-system-ply
-mk_add_options BUILD_MODULES=all
-mk_add_options BUILD_OFFICIAL=1
-mk_add_options MOZILLA_OFFICIAL=1
-mk_add_options MOZ_CO_PROJECT=mail
-mk_add_options MOZ_MAKE_FLAGS="-j4"
-mk_add_options MOZ_MAKE_FLAGS="-s" 
-mk_add_options MOZ_OBJDIR=/REF-OBJ-DIR/objdir-tb3
+mk_add_options MOZ_MAKE_FLAGS="-j2"
+mk_add_options MOZ_MAKE_FLAGS="-s"
+mk_add_options MOZ_OBJDIR=/extra/ishikawa/download/TB-3HG/objdir-tb3

I think the following settings, which I didn't pay attention to much
when I created mozconfig for my local use, may have bearings on the
appearance of -DGRE_BUILDID.

I don't have the following settings on a PC (Debian GNU/Linux 32-bit)
where -DGRE_BUILDID was produced on the command line, whereas
I do have the following on a PC (Debian GNU/Linux 64-bit) where
I don't see -DGRE_BUILDID on the command line.

-ac_add_options --enable-official-branding
-ac_add_options --enable-shared
-ac_add_options --with-system-ply
-mk_add_options BUILD_MODULES=all
-mk_add_options BUILD_OFFICIAL=1
-mk_add_options MOZILLA_OFFICIAL=1
-mk_add_options MOZ_CO_PROJECT=mail

I have no idea which one may be responsible.
(Also, maybe I should check the environment variable setting?)
Or the script to invoke configure.
I searched for GRE_BUILDID in mxr.mozilla.org, but could not figure out exactly where
things are done this way.

On a production server that creates a daily release, having a timestamp makes sense.

But on a tryserver (or local development machine), this timestamp may not have
that much value (unless one is doing a private release to third parties, etc.).

So a warning about this would be appreciated in MDN document SOMEWHERE.
(I could not find it using google after a few tries. It means it is hidden very well if it is written at all. :-( )

TIA

PS: All I want is a "rock-solid mail client (tm)" that can be used in a
cross-platform manner. Well, it takes a long time to reach "rock-solid" part. It is
as if I am going through an adventure game :-)
OK, this problem may happen only for the top-level (in view of application structure)
nsAppRunner.o, etc.

If this were the case, the problem may not have such a big negative impact.

I am checking it in the background. Stay tuned.

TIA
OK, I tested a few compilation scenarios.

The situation is not as bad as it seemed at first sight.

1 - There *ARE* files that are compiled with -DGRE_BUILDID with different time stamps
  for each compilation after hg qpop -a; and hg qpush -a.
  (The particular set of qpop/qpush on my PC did not cause configure to run.)

2 - But the number of such files are not that large (only a couple in my couple of dozen hg patch queues).
  They are under mozilla/xre/ subdirectory of c-c.

3 - Also, although "direct hit" does NOT occur for these files, it turned out that cache hit after 
  the preprocessed matching DID produce cache hit. (I am not sure why I didn't see it before.
  It could be that the cache was purged without my knowing it: unlikely, but possible. After the 
  incident, I cleared the cache used by ccache for local testing.)

I am not sure why, in my particular observed compilation, the configure took place (after qpop/qpush, etc.), and excessive number of recompilation took place. But given my observation above,
if someone in the know can explain how/where -DGRE_BUILDID is generated on the fly to compile
certain files, I think my evaluation of the issue to be of grave importance can be
degraded to something that can be ignored safely (especially considering that the
large change to the files often results in configure, or even clobber.)
On an individual development machine, that is.

Yes, though, on a shared machine, if many users submit jobs as on tryserver, the
appearance of -DGRE_BUILDID=... with varying time will result in certainly a loss of
direct hit, and unnecessary CPP invocation that could have been saved.
Looking at the discussion of build infrastructure machine usage and maintenance,
anything that can lessen the workload should be tried. This may be one such target.

So if someone can explain why/how -DGRE_BUILDID is produced on the fly,
I will be very delighted.

TIA
> 3 - Also, although "direct hit" does NOT occur for these files, it turned
> out that cache hit after 
>   the preprocessed matching DID produce cache hit.

I would think the preprocessed file would only match for cases where we pass in -DGRE_BUILDID, but the file doesn't actually use the macro. For example, in toolkit/xre, only nsAppRunner.cpp uses GRE_BUILDID, but the define is passed into all the .cpp files in that directory. So I'd expect you'd get no direct hits, then get preprocessor hits for *.cpp minus nsAppRunner.cpp, then a cache miss for nsAppRunner.cpp

> I am not sure why, in my particular observed compilation, the configure took
> place (after qpop/qpush, etc.), and excessive number of recompilation took
> place. But given my observation above,
> if someone in the know can explain how/where -DGRE_BUILDID is generated on
> the fly to compile
> certain files, I think my evaluation of the issue to be of grave importance
> can be
> degraded to something that can be ignored safely (especially considering
> that the
> large change to the files often results in configure, or even clobber.)
> On an individual development machine, that is.

The buildid is generated from config/Makefile.in:

# Generate a new buildid every time we "export" in config... that's only
# supposed to be once per-build!
export::
ifdef MOZ_BUILD_DATE
        printf '%s' $(MOZ_BUILD_DATE) > buildid
else
        $(PYTHON) $(topsrcdir)/toolkit/xre/make-platformini.py --print-buildid > buildid
endif

When you do a top-level build, you'll see a line like this in the output:

/home/mshal/mozilla-central-git/obj-x86_64-unknown-linux-gnu/_virtualenv/bin/python /home/mshal/mozilla-central-git/toolkit/xre/make-platformini.py --print-buildid > buildid

That generates the new buildid file, which is used by some Makefiles to generate the '-DGRE_BUILDID=XYZ' command-line argument.

Although I agree the problem is not as big as you initially suspected (-DGRE_BUILDID is confined to ~7 Makefiles, gets passed in to ~23 .cpp files, and is only actually used in ~3 .cpp files), we could still make some improvements here:

1) Only pass in -DGRE_BUILDID to files that actually use it (this would help with the direct hit problem)
2) Disable caching on files that do use GRE_BUILDID, since they'll pretty much never get a cache hit anyway
3) Fix other the buildid dependency issue (bug 729488)
(In reply to Michael Shal [:mshal] from comment #4)
> > 3 - Also, although "direct hit" does NOT occur for these files, it turned
> > out that cache hit after 
> >   the preprocessed matching DID produce cache hit.
> 
> I would think the preprocessed file would only match for cases where we pass
> in -DGRE_BUILDID, but the file doesn't actually use the macro. For example,
> in toolkit/xre, only nsAppRunner.cpp uses GRE_BUILDID, but the define is
> passed into all the .cpp files in that directory. So I'd expect you'd get no
> direct hits, then get preprocessor hits for *.cpp minus nsAppRunner.cpp,
> then a cache miss for nsAppRunner.cpp
> 
> > I am not sure why, in my particular observed compilation, the configure took
> > place (after qpop/qpush, etc.), and excessive number of recompilation took
> > place. But given my observation above,
> > if someone in the know can explain how/where -DGRE_BUILDID is generated on
> > the fly to compile
> > certain files, I think my evaluation of the issue to be of grave importance
> > can be
> > degraded to something that can be ignored safely (especially considering
> > that the
> > large change to the files often results in configure, or even clobber.)
> > On an individual development machine, that is.
> 
> The buildid is generated from config/Makefile.in:
> 
> # Generate a new buildid every time we "export" in config... that's only
> # supposed to be once per-build!
> export::
> ifdef MOZ_BUILD_DATE
>         printf '%s' $(MOZ_BUILD_DATE) > buildid
> else
>         $(PYTHON) $(topsrcdir)/toolkit/xre/make-platformini.py
> --print-buildid > buildid
> endif
> 
> When you do a top-level build, you'll see a line like this in the output:
> 
> /home/mshal/mozilla-central-git/obj-x86_64-unknown-linux-gnu/_virtualenv/bin/
> python /home/mshal/mozilla-central-git/toolkit/xre/make-platformini.py
> --print-buildid > buildid
> 
> That generates the new buildid file, which is used by some Makefiles to
> generate the '-DGRE_BUILDID=XYZ' command-line argument.
> 

I see, thank you for the explanation.

> Although I agree the problem is not as big as you initially suspected
> (-DGRE_BUILDID is confined to ~7 Makefiles, gets passed in to ~23 .cpp
> files, and is only actually used in ~3 .cpp files), we could still make some
> improvements here:

Yes, it is true.
If this happens on a shared compilation farm or something, we would like to
use up CPU to the last remaining cycle and
the following optimization would be desirable.
 
> 1) Only pass in -DGRE_BUILDID to files that actually use it (this would help
> with the direct hit problem)
> 2) Disable caching on files that do use GRE_BUILDID, since they'll pretty
> much never get a cache hit anyway
> 3) Fix other the buildid dependency issue (bug 729488)

TIA
See Also: → 729488
Product: Core → Firefox Build System
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.