Closed Bug 1511122 Opened 6 years ago Closed 3 years ago

Implement |mach format|

Categories

(Developer Infrastructure :: Lint and Formatting, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ahal, Unassigned)

References

Details

Attachments

(1 obsolete file)

We currently have a |mach clang-format| command that lives here:
https://searchfox.org/mozilla-central/rev/e22c0d152060f4f8d4ca8904094f15f65a1b6f93/python/mozbuild/mozbuild/mach_commands.py#2339

There's talk of adding rustfmt in bug 1369792. There's also a good chance we'll want to start formatting other languages (e.g we may want to use yapf on our python code).

This is reminiscent of the early days of linting at Mozilla. I.e we're at risk of having many different incohesive formatting implementations for various different languages. We don't want developers to need to remember which formatters have to be run against which file types. We also don't want automated tooling to need integration shims for each different formatter.

Similar to |mach lint|, I propose we implement |mach format|. It will give us a single entry point to all of the formatters we end up adding to mozilla-central. By default (no args), it would run all the formatters, though developers would still have the ability to only run a single formatter.

E.g:
$ ./mach format                    # runs all formatters
$ ./mach format --formatter clang  # only runs clang-format
$ ./mach format -f rustfmt         # only runs rustfmt
$ ./mach format --check            # print diff and return 1 if formatting needed (useful in CI)
Attached file Bug 1511122 - Implement |mach format| (obsolete) —
This also adds rustfmt as an example. Usage:

$ ./mach format           # apply formatting
$ ./mach format --check   # print diff and exit
As outlined here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1369792#c27

I believe we should re-use mozlint (the library) under the hood. The attached patch is a very quick and dirty example of what that might look like. It is by no means even close to ready to land, but I wanted to illustrate that using mozlint would be easy and appropriate for this case. It took me ~30 min to write.

In the past, there has been resistance to the idea of using 'mozlint' to support our formatting. I think mostly due to the fact that "linting" and "formatting" are two very different tasks. I completely agree, we should not try to shoe-horn formatting tools into linting infrastructure.

The reason I still advocate the use of "mozlint", is that I think "mozlint" is a terrible name and I regret choosing it. If you distill mozlint down to its essential components, you'll notice that aside from unfortunate class/function naming, it really has little to do with linting, it is much more generic than that. I now think of mozlint as a job execution framework that can run any arbitrary tasks which take file paths as inputs. This definition applies equally well to both linting and formatting.

To re-iterate some points from my comment in the other bug:

1) I think we should rename mozlint and refactor out any lint specific assumptions
2) Have two distinct entry points for linting/formatting (./mach lint and ./mach format respectively)
3) Have two distinct sets of configuration for linting/formatting (tools/lint and tools/format respectively)
4) Share the common library which handles task registration, execution, error handling, parallelization, etc.
See Also: → 1369792
Sorry if this is an unrelated feature request, but does mozlint support parallel invocations of the underlying tool?  Right now the way that clang-format runs through mach is extremely poorly designed, as it's purely sequential so reformatting of large portions of the tree only uses a single core.  It'd be nice to have something more efficient and since your comment mentioned parallelization I decided to probe more.  :-)
Yes it does!

At the moment mozlint parallelizes across two vectors: tools and paths. For example, each tool is guaranteed to run in its own subprocess. But in addition to that if you specify many paths (which is common when running |mach lint --outgoing| on a revision that touches lots of files), it'll create multiple subprocesses for the same tool, each paired with a different set of paths. The logic for figuring this out depends on the number of CPUs you have, the number of paths specified and the number of tools that are running at once.

I will say that I'm not happy with how this is currently implemented and I have a patch in my local queue that refactors the job control logic to make it more flexible. But I need to find time to test and polish it up.
Also should have mentioned that bug 1369792 was my main motivation for filing this. While I think it would be nice to have a generic |mach format| that also runs clang-format, I think trying to run `rustfmt` as a linter is going to prove to be a mistake. This is my attempt to prevent that, while also preventing us from re-inventing the wheel for each new formatter.
The use case I have in mind for parallelization is commit hooks, where you use a VCS command to get a list of modified files, break them into NCORES buckets and hand them off to mach format-worker jobs of sorts, where each job figures out what file type it has been fed, whether it needs formatting (Is it JS?  OK, no.  Is it cpp?  Alright, clang-format.  etc.) and does the work.  (Of course I'm speaking conceptually, it's probably more efficient to figure out the job types a priori.)

What makes me interested in this is that in order to give developers decent code formatting experience, we should try to catch misformatting mistakes as early and as often as possible, so we really want everyone to enable commit hooks, and for that we want the commit hooks to be blazing fast.  Right now, ./mach clang-format is the opposite.  :-)

Is this a use case you've thought about?
(In reply to :Ehsan Akhgari from comment #6)
> Is this a use case you've thought about?

Yes, quite a bit actually! We have the same need for speed in the linting hooks. The good news is that the job sorting you mention is exactly what mozlint already handles. Every "task" we define registers up front which extensions it should apply to, as well as directories to include/exclude. Here's an example:
https://searchfox.org/mozilla-central/source/tools/lint/yaml.yml

When you specify a list of paths (which may include directories), mozlint automatically figures out which linter's are "triggered" by that set of paths based on their config. That said, there is a noticeable overhead to |mach lint|. There are two reasons for this:

1) The path filtering logic is a lot more complicated than I initially thought it would be. We need to take into account included/excluded paths, specified paths, cwd, globs, directories (e.g do we need to search for filetypes under this directory?). If most formatters are meant to apply to the tree globally, and if we make certain simplifications (e.g say no globs allowed) we can make this a lot faster.

2) Python process overhead. You probably know that Python isn't the greatest choice for parallelism due to the GIL and relatively large process overhead. I suspect that for a small number of jobs, the benefits of parallelism are being negated by the process overhead (though with a large number of specified paths the benefits are obvious). I've toyed with the idea of re-writing mozlint in Rust.
I think there's room to improve the performance in the current python implementation and the job control refactoring I mentioned I had locally was a first step towards that. However, I'm not sure the python implementation will be able to meet your definition of "blazing fast".
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> (In reply to :Ehsan Akhgari from comment #6)
> > Is this a use case you've thought about?
> 
> Yes, quite a bit actually! We have the same need for speed in the linting
> hooks. The good news is that the job sorting you mention is exactly what
> mozlint already handles. Every "task" we define registers up front which
> extensions it should apply to, as well as directories to include/exclude.
> Here's an example:
> https://searchfox.org/mozilla-central/source/tools/lint/yaml.yml

Fantastic!

> When you specify a list of paths (which may include directories), mozlint
> automatically figures out which linter's are "triggered" by that set of
> paths based on their config. That said, there is a noticeable overhead to
> |mach lint|. There are two reasons for this:
> 
> 1) The path filtering logic is a lot more complicated than I initially
> thought it would be. We need to take into account included/excluded paths,
> specified paths, cwd, globs, directories (e.g do we need to search for
> filetypes under this directory?). If most formatters are meant to apply to
> the tree globally, and if we make certain simplifications (e.g say no globs
> allowed) we can make this a lot faster.

For this use case at least, do you think there would be formatters that are *not* going to apply to the tree globally?  Most of the code we have that I can think of is spread across many directories, perhaps with the notable exception of python code but even that is spread across a few...

> 2) Python process overhead. You probably know that Python isn't the greatest
> choice for parallelism due to the GIL and relatively large process overhead.
> I suspect that for a small number of jobs, the benefits of parallelism are
> being negated by the process overhead (though with a large number of
> specified paths the benefits are obvious). I've toyed with the idea of
> re-writing mozlint in Rust.

Right, yes I do expect to run into that limitation at some point, but I haven't done any measurements so I don't know how close to that threshold we're running right now.  Even with the Python based implementation it should be possible to tune things a bit (e.g. decide how many jobs per core would make it "worth" parallelizing, etc.)  But yeah, I'm doubtful if Python is going to be a great tool for a speedy tool here long term...

(In reply to Andrew Halberstadt [:ahal] from comment #8)
> I think there's room to improve the performance in the current python
> implementation and the job control refactoring I mentioned I had locally was
> a first step towards that. However, I'm not sure the python implementation
> will be able to meet your definition of "blazing fast".

FWIW I think what matters here is how much we'll make people's commits become slower.  For me at least (using git) there is no noticeable delay when typing "git commit" right now.  While a bit of a delay can be acceptable I think if "git commit" suddenly started to take a minute to finish, I'd probably disable the offending hook...

(Also, do you know whether commit hooks would run during operations such as rebases?  If yes, the performance of the hooks would be even more critical.)
Severity: normal → enhancement
Priority: -- → P2
See Also: → 1521772
See Also: → 1466070

@ahal Is this still required? Can I take this up?

Hey nisarg, Sylvestre and I had a chat about this bug and we decided it's not worth the effort. We currently have a bunch of formatters already integrated with mach lint and the effort involved with this bug doesn't seem worth the trouble (especially since we'd still want to run them in check mode in CI anyway). I filed a new bug for a much easier solution to provide a mach format command here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1698838

Sylvestre also suggested the following two bugs might be a better use of time if you were looking for something else:
https://bugzilla.mozilla.org/show_bug.cgi?id=1670949
https://bugzilla.mozilla.org/show_bug.cgi?id=1670950

Thanks for the interest though!

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX

Sure. Thanks.

Product: Firefox Build System → Developer Infrastructure
Attachment #9028721 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: