Tool to summarize build dependencies

NEW
Unassigned

Status

Firefox Build System
General
5 years ago
2 months ago

People

(Reporter: gps, Unassigned)

Tracking

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments, 2 obsolete attachments)

(Reporter)

Description

5 years ago
We know we have a header dependency problem in Gecko with some headers getting included into thousands of files and causing mass dependency invalidation.

I'd like to put a tool in developer's hands that allows them to summarize dependency information.

I'm thinking for the first incantation we can just search for .pp files in the objdir and print summaries or dump lists of dependent targets.

I wrote a tool to do this for http://gregoryszorc.com/presentations/2012-11-29-firefox-build-system/. I'm sure I have it somewhere in my patch queue. I'll resurrect it as a mach command or something and get it checked in.
I hope you're using the -H option supported by GCC and clang!  It prints out the headers that are read and makes this kind of thing really easy.  

Crucially, both compilers avoid reading .h files that are #included multiple times by a .cpp file if they can (assuming the .h files have the appropriate #ifndef guards -- see bug 881579 for details), and if you try to compute how many headers are #included without accounting for that you'll probably get the wrong number.
Nick did a bunch of work in bug 634839 and bug 888768 on reducing includes in SpiderMonkey, which is worth a read for anybody looking to do that in the browser.
I also posted a detailed summary to the related dev.build/dev.platform thread.  It's not visible yet at https://groups.google.com/forum/#!topic/mozilla.dev.builds/sQgQtSDr6yI but hopefully will be soon.
(Reporter)

Comment 4

5 years ago
Created attachment 785541 [details] [diff] [review]
mach command to extract dependency counts for pp files

Turns out I had an old patch in my patch queue that prints counts of
dependencies. It does this by parsing all the .pp dependency files in
the object directory created during builds. The files listed in these
files come from the compiler, so I assume the list is as robust as
passing -H to the compiler.

We probably want to make this more useful before landing. What other
features do people want?
(Reporter)

Updated

5 years ago
Assignee: nobody → gps
Can you give some example output?  I'm still not clear exactly what this tool does.
Created attachment 785594 [details]
Python code to write an inclusion graph

My computer helpfully crashed just as I was uploading this the first time :-(

This a simple python script that tries to read all of the files in the source and object directories of mozilla-central/comm-central [I ran it on a ccrework'd comm-central locally] and build up a complete directed graph of all inclusions in the tree. It's not perfect--it can't resolve about 2500 of the 115000 inclusions to specific files--but the heuristics I used were:

1. Find the header in dist/include[/nspr]? Use that!
2. Is it in the local srcdir? Use that!
3. Is it in the relevant objdir? Use that!
4. Is there only one file with that name? Use that!
5. Either there's no file or too many files... we're screwed :-)

The output is a DOT file written to stdout (yes, I'm lazy). The graph is way too big for DOT to handle, but the format is well-known enough that other graph libraries can read or write it.
Created attachment 785599 [details]
cycles.txt

As an example of what this map can tell us, here is a list of all simple cycles in the inclusion graph. It's amusing that the number of cycles with 1 node in them is not 0...
Created attachment 785612 [details]
How many non-.h files #include a header transitively

I ran this python code with my full includes list:

import networkx as nx

includesG = nx.DiGraph(nx.read_dot('/src/mozilla-tools/full-includes.dot'))

files = dict()
for n in includesG.nodes():
    if n.endswith('.h'):
        continue
    for x in nx.descendants(includesG, n):
        files[x] = files.get(x, 0) + 1
includeCount = list((v,k) for k,v in files.items())
includeCount.sort(reverse=True)
for count, name in includeCount:
    print '%5d  %s' % (count, name)


And that made this file. The results are somewhat surprising--the most heavily included non-XPCOM/MFBT/JS/mozalloc header is PSpdyPush3.h. No, I am not kidding, and I looked it up, and the rationale actually makes sense.
Created attachment 787875 [details]
Headers by number of changes

This is another data point to correlate with: the number of times .h/.idl/.ipdl files have been changed in the last 10000 changesets before a11cd50edb0d (getting us back to the beginning of May, so about 3 months worth).

Conclusion, we're rebuilding more than 3000 files daily on average because of jsapi.h alone.
A lot of those JS headers have seen a lot of churn recently because of all the header/#include refactoring I've done in the past couple of months.

Having said that, jsapi.h is enormous and does get a lot of churn.
Created attachment 789006 [details]
List of inclusion directives by criticality

This is a list of all include relations, ranked by their "criticality": the number of source files that would no longer include this header if the #include directive were dropped. Note that I'm scraping by the source code here, so this is going to assume that every #ifdef foo/#include/#endif is being activated. I've also eliminated the 92% of the file where the criticality is 1 or 0.

Cycles are broken in arbitrary places, and I've changed my definition of source files from "anything not named .h or .idl" to "any filename that contains .c or .C or .m and is never included by anyone else"
At first glance I wonder if templatizing XPCOM strings would be a net win over the file I/O incurred by all those includes...
(In reply to Nathan Froyd (:froydnj) from comment #12)
> At first glance I wonder if templatizing XPCOM strings would be a net win
> over the file I/O incurred by all those includes...

Bug 717356? I don't know if C-style templating is slower or faster than C++-style templating.
> This is a list of all include relations, ranked by their "criticality"

This line appears to be bogus:

  707 js/src/jit/shared/IonFrames-shared.h from ipc/chromium/src/base/hash_tables.h
Blocks: 904962
Blocks: 905017
I was cooking up a patch as part of bug 858927 to limit the inclusion of TimeStamp.h because every time I touched that header it seemed that I had to rebuild half of my tree. Shall I make a separate bug for it or is it enough to track that work there?

Comment 16

5 years ago
(In reply to comment #15)
> I was cooking up a patch as part of bug 858927 to limit the inclusion of
> TimeStamp.h because every time I touched that header it seemed that I had to
> rebuild half of my tree. Shall I make a separate bug for it or is it enough to
> track that work there?

Please file a separate bug, thanks!
(In reply to Joshua Cranmer [:jcranmer] from comment #11)
> Created attachment 789006 [details]
> List of inclusion directives by criticality

Might it be worthwhile to put this on an Etherpad or something so people can investigate the top problems and annotate them with bug numbers/reasons why they can't be removed/etc.?
> Might it be worthwhile to put this on an Etherpad or something so people can
> investigate the top problems and annotate them with bug numbers/reasons why
> they can't be removed/etc.?

That would be useful.  Having said that, we'll need to re-run the analysis every so often (e.g. I just landed bug 904962 which removed a few critical ones).  I'm not sure how to combine those two needs.  I guess we could get the diff each time it's re-run, and then somehow incorporate the changes into the etherpad?
jcranmer, a bunch of header improvements have been made.  Can you regenerate the "transitively" and "criticality" lists?  Thanks!
Flags: needinfo?(Pidgeot18)
Created attachment 798117 [details]
Transitive includes of headers
Attachment #785612 - Attachment is obsolete: true
Flags: needinfo?(Pidgeot18)
Created attachment 798118 [details]
List of inclusion directives by criticality

And the updated version of criticality. These are up-to-date as of 4ccfffb6262a.
Attachment #789006 - Attachment is obsolete: true
Attachment #798117 - Attachment is patch: false
Here are the top 5 transitively included files, before:

 7915  mozilla/obj-x86_64-unknown-linux-gnu/dist/include/nspr/prcpucfg.h
 7726  mozilla/obj-x86_64-unknown-linux-gnu/dist/include/nspr/prtypes.h
 7726  mozilla/obj-x86_64-unknown-linux-gnu/dist/include/nspr/obsolete/protypes.h
 7584  mozilla/obj-x86_64-unknown-linux-gnu/dist/include/nspr/prinrval.h
 7327  mozilla/obj-x86_64-unknown-linux-gnu/dist/include/nspr/prthread.h

and after:

 5922  --GENERATED--/dist/include/nspr/prcpucfg.h
 5733  --GENERATED--/dist/include/nspr/prtypes.h
 5733  --GENERATED--/dist/include/nspr/obsolete/protypes.h
 5516  --GENERATED--/dist/include/nspr/prinrval.h
 5257  --GENERATED--/dist/include/nspr/prlong.h

Four of the five are the same files, but the numbers are way down.  Looks like the work that various people are doing is having an effect!
(In reply to Nicholas Nethercote [:njn] from comment #22)
> Four of the five are the same files, but the numbers are way down.  Looks
> like the work that various people are doing is having an effect!

The two lists aren't comparable: I excluded only files ending in .h for the old version, but both .h and .idl in the second version. I think I may also have used comm-central for the old version, but all recent stats are computed with mozilla-central.
(Reporter)

Updated

5 years ago
Assignee: gps → nobody

Updated

3 months ago
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.