Closed Bug 1494069 Opened 1 year ago Closed 1 year ago

[mozlint] Be more explicit with arguments and return values in `filterpaths`

Categories

(Firefox Build System :: Lint and Formatting, enhancement)

enhancement
Not set

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(8 files, 1 obsolete file)

46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
Currently the pathutils.filterpaths function accepts a linter definition and lintargs. It then extracts the values out of those that it needs. It can also modify these dictionaries in-place sometimes.

This can lead to surprising behaviour and make debugging more difficult. I'd like to change it so that we need to explicitly pass in only the values that it needs to perform the filtering. This way not only will it be easier to keep track of, but we won't accidentally modify any references in-place. Instead, we'll need to explicitly return the values that the greater mozlint module expects.

This will give us clear insight into the inputs and outputs of this function, in a sense making it more "dumb".
Scandir is a faster implementation of os.listdir since it caches file metadata
as it works. Using listdir and then calling things like os.path.isfile after
the fact will result in multiple system calls. Whereas with scandir, there will
only be a single system call per file.

Scandir is part of the stdlib in Python 3.6+, so the following can be used for
Python 2/3 compatible code:

try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk
Often we specify globs in our exclude patterns, e.g:

exclude:
    - **/node_modules
    - obj*

However, these globs get expanded out to *every* file that matches them. This
can sometimes be thousands or even tens of thousands of files.

We then pass these paths on to the underlying linters and tell them to
exclude them all. This causes a lot of overhead and slows down performance.

This commit implements a "collapse" function. Given a set of paths, it'll
collapse them into the smallest set of parent directories that contain the
original set, and that don't contain any extra files.

For example, given a directory structure like this:

a
-- foo.txt
-- b
   -- bar.txt
   -- baz.txt
-- c
   -- ham.txt
   -- d
      -- spam.txt

Then the following will happen:

  >>> collapse(['a/foo.txt', 'a/b/bar.txt', 'a/c/ham.txt', 'a/c/d/spam.txt'])
  ['a/foo.txt', 'b/bar.txt', 'c']

Since all files under directory 'c' are specified by the original set (both
'c/ham.txt' and 'c/d/spam.txt'), we can collapse it down to just 'c'. However
not all files under 'b' are specified (we're missing 'a/b/baz.txt'), so we
can't collapse 'b' (and by extension also can't collapse 'a').

If we had included 'a/b/baz.txt':

  >>> collapse(['a/foo.txt', 'a/b/bar.txt', 'a/b/baz.txt', 'a/c/ham.txt', 'a/c/d/spam.txt'])
  ['a']

In both cases, the smallest set of paths that contains the original set (and
only the original set) is computed.

The collapse function has a little bit of overhead but it's not too bad.
For example collapsing all files matched by '**/node_modules' takes ~0.015s.
Collapsing two full objdirs, takes ~0.6s. But a follow up commit is planned to
make sure we stop using 'obj*' to reduce that overhead.

Depends on D7738
When using globs in exclude directorives, FileFinder will return every *file*
that gets matched. This is can be thousands of files in the case of an objdir.
While we now collapse these files down to highest possible directories, this
collapse operation can still take a noticeable amount of time (0.6s). This
simply scans topsrcdir for files that start with 'obj' to avoid the glob.

This also moves the '_activate_virtualenv' call to the top of the function
because in CI, this will cause an objdir to be created (to store the
virtualenv). If this happens *after* calculating the global excludes, we won't
catch it since it doesn't exist yet. This will result in the objdir's
virtualenv being linted and erroneous failures.

Depends on D7739
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a2956764213e
Vendor scandir==1.9.0 to third_party/python, r=davehunt
https://hg.mozilla.org/integration/autoland/rev/fe0fb280dbfc
[mozlint] Collapse exclude paths into their smallest possible set, r=egao
https://hg.mozilla.org/integration/autoland/rev/9752f179b9c3
[lint] Explicitly list out objdirs rather than depend on 'obj*' in the global exclude, r=rwood
Backout by ncsoregi@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/580c03c8ae38
Backed out 3 changesets for blocking 1498215. a=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Turns out vendoring scandir causes problems if people happen to have an older version of it installed on their system. Also I decided to try using os.listdir and do some profiling comparisons. It turns out that using scandir isn't even much of a perf improvement for mozlint's common case (where we use ** with FileFinder). There likely are edge cases where we would see some bigger improvements, but it's not really worth the headaches that vendoring scandir causes.

So I'm just going to re-implement this with os.listdir. If we ever end up vendoring scandir for some other purpose (or start requiring python 3), we can switch back to using it at that point.
Keywords: leave-open
Attachment #9014452 - Attachment is obsolete: true
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7e586708ecc
[mozlint] Collapse exclude paths into their smallest possible set, r=egao
https://hg.mozilla.org/integration/autoland/rev/2064477895c3
[lint] Explicitly list out objdirs rather than depend on 'obj*' in the global exclude, r=rwood
Patch dump incoming. This bug kind of ended up being a bit of a catch-all for all the minor issues I found with path filtering as I was working through my mozlint refactor. The good news is that I've been adding tests along the way and I'm now way more confident that the filtering logic is doing what it's supposed to.

Also while there may be a large number of patches, I hope this makes them a lot easier to review than one giant monolithic patch.
I missed an edge case where the computed base itself was specified in the
paths input.
This fixes a latent bug that is currently not being hit (by sheer luck). Basically
the 'ignore' argument of a FileFinder object needs all paths to be relative to the
base. As luck would have it, most of the time it worked out that way if you were
running |mach lint| from the root of the repo.

However there are edge cases where this will cause an 'exclude' directive to get
missed. Plus this bug is about to be exposed 100% of the time in the next commit :).

Depends on D8842
Right now there are excludes defined in the linter definition (via the .yml
files), as well as excludes defined in lintargs (via the mach_commands.py).

This is a minor simplification that extends each linter definition's local
excludes with the global ones right off the bat. This just makes it a bit
easier to keep track of.

Depends on D5863
We were currently adding exclude paths to the "discard" set if the path contains
the include, but we weren't adding them if the include contains the path.

Depends on D5863
Currently pathutils.filterpaths computes some extra paths that the underlying linter
should try to exclude if it can. It sets these on lintargs['exclude'], and relies on
the fact that it was passed by reference to work.

However, it seems like pytest does some magic under the hood that prevents this value
from being propagated back. It will also fail if 'filterpaths' was invoked in a
subprocess.

It's much simpler and cleaner to pass this value back directly, rather than rely on a
reference.

Depends on D5862
Attachment #9017500 - Attachment description: Bug 1494069 - [mozlint] Return extra 'excludes' directly from pathutils.filterpaths, r=rwood → Bug 1494069 - [mozlint] Verify the expected_exclude paths in test_filterpaths, r?rwood
Attachment #9017500 - Attachment is obsolete: true
Currently pathutils.filterpaths computes some extra paths that the underlying linter
should try to exclude if it can. It sets these on lintargs['exclude'], and relies on
the fact that it was passed by reference to work.

However, it seems like pytest does some magic under the hood that prevents this value
from being propagated back. It will also fail if 'filterpaths' was invoked in a
subprocess.

It's much simpler and cleaner to pass this value back directly, rather than rely on a
reference.
Attachment #9017500 - Attachment is obsolete: false
Duplicate of this bug: 1467234
Attachment #9017496 - Attachment description: Bug 1494069 - [mozlint] Fix edge case in pathutils.collapse, r?egao → Bug 1494069 - [mozlint] Fix edge case in pathutils.collapse, r?rwood
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fbd19192e4fc
[mozlint] Fix edge case in pathutils.collapse, r=rwood
https://hg.mozilla.org/integration/autoland/rev/3cbd82a01bc2
[mozlint] Properly exclude files in LineType linters, r=rwood
https://hg.mozilla.org/integration/autoland/rev/9c587734513a
[mozlint] Return extra 'excludes' directly from pathutils.filterpaths, r=rwood
https://hg.mozilla.org/integration/autoland/rev/53abbe1d64d4
[mozlint] Append global 'excludes' to each linter at parse time, r=rwood
https://hg.mozilla.org/integration/autoland/rev/b9beb91e9992
[mozlint] Verify the expected_exclude paths in test_filterpaths, r=rwood
https://hg.mozilla.org/integration/autoland/rev/8a1fd7460ad5
[mozlint] Make sure exclude paths always get discarded when necessary, r=rwood
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.