Open Bug 1422041 Opened 7 years ago Updated 2 years ago

Detect unused files in the tree

Categories

(Developer Infrastructure :: Lint and Formatting, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: Gijs, Unassigned)

References

(Blocks 1 open bug)

Details

We have tests that detect frontend files we *ship* (js/css/images, at least - not sure if we cover html/xul, but we should!) that are unused. This is very nice.

We unfortunately do not have tests that check for stuff that's hanging around our tree that isn't shipped. It would be nice if we did, because that would help with:

1) not having dead code hanging around the source tree, leading to confusing DXR queries ("oh, foo is using feature X, I guess I have to fix foo when making changes to X" despite foo being unused), *even* more files for new folk to wrap their head around, etc.
2) warnings when people add files but forget the manifest entries or other integration bits that ensure we actually do something with the files
3) quicker checkouts. This is probably the least important.


Ideally, I'd like to run this as a static-ish lint/build check. I don't know much about those, how to write them, or how I'd go about doing this. My understanding right now is that we include frontend js/css/image files by means of:

- js modules
- js XPCOM components
- jar.mn files
- %includes and similar preprocessing of files that are in one of the other categories
- test manifests and their support files

I don't know if there's an easy way to, effectively, build a graph of used files out of those 4-5-ish sources of truth, and compare it with the total list of files.

:gps, do you? :-)
Flags: needinfo?(gps)
(this after the recent changes wrt the -moz-border-*-colors CSS property prompted me to find 2 unused CSS files in DXR, and a bunch of unused images when I started digging in some of the related dirs... I expect there's more where that came from, just not sure how to find it right now...)
At https://marco-c.github.io/code-coverage-reports/, we have a report of source files with 0 coverage. This is not the same thing you are asking, but it can be useful for the same purpose.
Clearly 0 coverage could mean either that the file is not tested but should be, or that it is dead code.
(In reply to Marco Castelluccio [:marco] from comment #2)
> At https://marco-c.github.io/code-coverage-reports/, we have a report of
> source files with 0 coverage. This is not the same thing you are asking, but
> it can be useful for the same purpose.
> Clearly 0 coverage could mean either that the file is not tested but should
> be, or that it is dead code.

I'm a little confused about what "0 coverage" means here. aboutRobots is listed, for instance, but looking at:


https://codecov.io/gh/marco-c/gecko-dev/src/master/browser/base/content/aboutRobots.xhtml

at least one line of script in that file did get executed.

(this is obviously not very good for overall coverage*** but it shows the file is used in the product)

I'm also unsure if this report even lists files that we don't package in the product, but I don't know of a good JS example off-hand. :-(

***
Note that actually, there are only about 10-ish lines of script in that file, so I hope we actually generate coverage based on executable lines, not just based on total number of lines, as "coverage" for the HTML/CSS itself is a bit meaningless of course...
(In reply to :Gijs from comment #3)
> (In reply to Marco Castelluccio [:marco] from comment #2)
> > At https://marco-c.github.io/code-coverage-reports/, we have a report of
> > source files with 0 coverage. This is not the same thing you are asking, but
> > it can be useful for the same purpose.
> > Clearly 0 coverage could mean either that the file is not tested but should
> > be, or that it is dead code.
> 
> I'm a little confused about what "0 coverage" means here. aboutRobots is
> listed, for instance, but looking at:
> 
> 
> https://codecov.io/gh/marco-c/gecko-dev/src/master/browser/base/content/
> aboutRobots.xhtml
> 
> at least one line of script in that file did get executed.

In all JavaScript files the top-level is always executed (unless the file is not loaded at all), so there are no files which are completely not covered. We consider as "0 coverage" the files which contain at least one function and whose functions are all not covered.


> I'm also unsure if this report even lists files that we don't package in the
> product, but I don't know of a good JS example off-hand. :-(

No, it doesn't. That's why using this report can be useful for dead code removal, but restricted to files which we actually at least load (which are the most problematic ones in my opinion, as they not only make the code less maintainable, but also negatively affect performance).
We could generate such a report by gathering all JS files in the tree in a list and removing the covered ones from it.

> 
> ***
> Note that actually, there are only about 10-ish lines of script in that
> file, so I hope we actually generate coverage based on executable lines, not
> just based on total number of lines, as "coverage" for the HTML/CSS itself
> is a bit meaningless of course...

The coverage percentage is calculated as covered lines / coverable lines, so we're fine on this point.
The build system and packaging tools have a reasonable idea of which files are used. But we don't track this data explicitly.

A potential way I see this working is we produce a build system artifact listing all files consumed by the build. We can aggregate all these file sets for a particular revision in a lint task [that depends on all packaging tasks] and then look for orphans.

Of course, getting the build system to emit a list of all files touched by packaging may not be trivial. I feel like we can get 90% there with little effort. The long tail I'm less confident about.
Flags: needinfo?(gps)
One problem is that we have *many* different ways things are built and packaged, some of which are out of tree (comm-central), and while one particular build can possibly tell you which files weren't used, that won't tell you if those files are never ever used by some other configuration.
(In reply to Mike Hommey [:glandium] from comment #6)
> One problem is that we have *many* different ways things are built and
> packaged, some of which are out of tree (comm-central), and while one
> particular build can possibly tell you which files weren't used, that won't
> tell you if those files are never ever used by some other configuration.

Sure, we'll need to check things manually, and need a list of exceptions of some sort so nobody tries to remove CLOBBER or all the readme files. I can live with that - we've done something along those lines for most static checks we use, and they still add value for everything else we check.
Perfect is very much the enemy of good here. If we can get something that gets us coverage for specific directories and allows us to have strict rules for those directories - allowing us to push all the "wonky" things we can't properly annotate in their own directories - that's better than nothing because it allows us to focus all our scrutiny on a limited set of inputs.

But yeah, glandium's point still stands. There's a long tail of random things and even refactoring things so all the "bad parts" are isolated might be a lot of work.
I'll add that the way things are currently set, we may very well be shipping things that aren't actually needed, so looking at what we ship may not be the right thing to look at. Also note that "shipping" is very broad, from the Firefox installer to test archives, etc.
(In reply to Mike Hommey [:glandium] from comment #9)
> I'll add that the way things are currently set, we may very well be shipping
> things that aren't actually needed

We have a test trying to detect this for files shipped within the Firefox installer: https://searchfox.org/mozilla-central/source/browser/base/content/test/static/browser_all_files_referenced.js (Obviously the whitelist there is still longer than I would like). Do you see ways that unneeded files could get packaged but not caught by this test? (other than bug 1356029 that I still want to finish)
See the part of my comment that you removed. Also, that test is limited to frontend files.
Product: Testing → Firefox Build System
Product: Firefox Build System → Developer Infrastructure
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.