Closed Bug 794509 Opened 12 years ago Closed 12 years ago

mach support for auto-loading command modules

Categories

(Firefox Build System :: Mach Core, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
mozilla18

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(3 files)

Currently mach manually imports modules and registers classes with the command driver. It would be nice if modules/classes could "self register" by either existing in a certain place or providing some kind of hook registered via distutils/site.py/etc.

This isn't a huge priority. But, it is annoying to have to update mach every time you add a new sub-command.
so if we use setuptools/distribute they, for better or worse it has entry points: http://packages.python.org/distribute/pkg_resources.html#entry-points  However, if we're not running setup.py {install|develop}, then these won't work

There are a few third-party "registry" type packages as well.

Or we can just either continue with the current system for now or roll-our-own registry.
Component: Build Config → mach
This arguably doesn't need to be in this bug, but I'm too lazy to create a new bug. This patch refactors mach commands to be defined in terms of Python decorators. This paves the road for slightly easier command defining, IMO.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #667694 - Flags: review?(jhammel)
I definitely like the ideas for decorators here.  One idea, though:

+    @Command('mochitest-a11y', help='Run an a11y mochitest.')
+    @CommandArgument('test_file', default=None, nargs='?', metavar='TEST',
+        help=generic_help)
+    def run_mochitest_a11y(self, test_file):

Instead of having two decorators here (or more for more command arguments), what about

@Command('mochiest-ally', help='Run an a11y mochitest.', test_file=dict(nargs='?', metavar='TEST'))

I'm actually somewhat hesistent about the duplication here.  I tend to be a bit too implicit for these kinds of parsers (see http://k0s.org/mozilla/hg/bzconsole/ for an example), but the level of duplication between the definition for CLI vs function signature and docstring does bother me.
With auto-loading command modules, commands no longer need to be inside the mach source distribution. So, I'm moving all the mozbuild-specific commands into mozbuild because that's where they belong.
Attachment #667717 - Flags: review?(jhammel)
Automatic discovery and loading of Python modules!
Attachment #667720 - Flags: review?(jhammel)
Attachment #667720 - Flags: feedback?(tarek)
B2G is going to get some mach love, so bumping priority.
Priority: P5 → P3
(In reply to Jeff Hammel [:jhammel] from comment #3)
> Instead of having two decorators here (or more for more command arguments),
> what about
> 
> @Command('mochiest-ally', help='Run an a11y mochitest.',
> test_file=dict(nargs='?', metavar='TEST'))

This isn't so bad for the single argument case. But, once you have multiple arguments things get a bit unwieldy and I think multiple decorators makes more sense.

> I'm actually somewhat hesistent about the duplication here.  I tend to be a
> bit too implicit for these kinds of parsers (see
> http://k0s.org/mozilla/hg/bzconsole/ for an example), but the level of
> duplication between the definition for CLI vs function signature and
> docstring does bother me.

Well, we had that duplication before. Now, it's just all consolidated in proximity at the method definitions so it seems worse.

I agree there is some duplication. Docstrings are arguably not necessary now. If we wanted to be "crazy," we could probably parse docstrings and/or function signatures into commands/arguments. I /think/ there are some Python packages in the wild that do this. I think that would be an interesting feature to explore. And, we could certainly support that in addition to decorators at some future date. The current implementation certainly doesn't preclude it.
jhammel: I should add that I didn't use entry points for auto-discovery after talking things over with Tarek (one of our resident Python gurus) and realizing it isn't optimal for us.

Essentially, entry points rely on metadata in packaged eggs. Since we don't use eggs in the m-c virtualenv (just .pth files), we can't use entry points. The solution was thus to reinvent part of Python's module loading.

FWIW, we can't just call "import" on the corresponding module names for discovered .py files because once Python has discovered the directory for a package or sub-module, it assumes all modules within should be there. In other words, it doesn't like having the .py files for a package/sub-module scattered over multiple directories. And, in order for different components to provide subcommands to mach without having to modify the mach package itself and without having to import the world, we essentially need packages/sub-modules split over multiple directories. Hence the low-level module loading.
Comment on attachment 667720 [details] [diff] [review]
Part 3: Discover mach command modules and automatically load, v1

Review of attachment 667720 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. You might consider using os.splitext() instead of endswith() to filter out py file

Or even better: use a glob-style matching, see http://docs.python.org/library/glob
Attachment #667720 - Flags: feedback?(tarek) → feedback+
(In reply to Tarek Ziadé (:tarek) from comment #9)
> Comment on attachment 667720 [details] [diff] [review]
> Part 3: Discover mach command modules and automatically load, v1
> 
> Review of attachment 667720 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good. You might consider using os.splitext() instead of endswith() to
> filter out py file
> 
> Or even better: use a glob-style matching, see
> http://docs.python.org/library/glob

I'm not sure glob-style matching would make things clearer, IMHO.  I think the way it is is fine
(In reply to Gregory Szorc [:gps] from comment #8)
> jhammel: I should add that I didn't use entry points for auto-discovery
> after talking things over with Tarek (one of our resident Python gurus) and
> realizing it isn't optimal for us.
> 
> Essentially, entry points rely on metadata in packaged eggs. Since we don't
> use eggs in the m-c virtualenv (just .pth files), we can't use entry points.
> The solution was thus to reinvent part of Python's module loading.

Right, I figured as such
(In reply to Gregory Szorc [:gps] from comment #7)
> (In reply to Jeff Hammel [:jhammel] from comment #3)
> > Instead of having two decorators here (or more for more command arguments),
> > what about
> > 
> > @Command('mochiest-ally', help='Run an a11y mochitest.',
> > test_file=dict(nargs='?', metavar='TEST'))
> 
> This isn't so bad for the single argument case. But, once you have multiple
> arguments things get a bit unwieldy and I think multiple decorators makes
> more sense.

I would tend to disagree, but will defer on this one.

> > I'm actually somewhat hesistent about the duplication here.  I tend to be a
> > bit too implicit for these kinds of parsers (see
> > http://k0s.org/mozilla/hg/bzconsole/ for an example), but the level of
> > duplication between the definition for CLI vs function signature and
> > docstring does bother me.
> 
> Well, we had that duplication before. Now, it's just all consolidated in
> proximity at the method definitions so it seems worse.

Sure.  The only reason I didn't bring it up then was because it was an initial implementation.  As we move towards some sort of final solution, I get more picky about things.  I wonder if we should actually make the CLI parsing stuff its own package.

> I agree there is some duplication. Docstrings are arguably not necessary
> now. If we wanted to be "crazy," we could probably parse docstrings and/or
> function signatures into commands/arguments. I /think/ there are some Python
> packages in the wild that do this. 

Yep; see the above linked-to http://k0s.org/mozilla/hg/bzconsole/ for one.

> I think that would be an interesting
> feature to explore. And, we could certainly support that in addition to
> decorators at some future date. The current implementation certainly doesn't
> preclude it.

I think some sort of notation or API will be necessary to note which methods are CLI functions.  I'd also like to explore this in the future, namely the problem space: "Given: a python CLI with subcommands, each subcommand linking to a function with function parameters as CLI arguments, (etc)". This of course has been done in a bunch of different ways; I tend to like mine the best, though I'm biased and it's very implicit.  In general, since at Mozilla alone I've seen (and written, for that matter) several different ways of doing this, it would be nice to standardize on something that is best of breed. Isil'zha
Comment on attachment 667694 [details] [diff] [review]
Part 1: Use decorators for defining commands, v1

+        # Clean up after decorator because it isn't important now.
+        del value._mach_command
+
+        if arguments is not None:
+            del value._mach_command_args

Do we really need to? (If so, that should be said vs "isn't important")

Also more comments on CommandProvider telling where these attrs come from would be nice.

This definitely looks nicer.  Let's go with this for now.
Attachment #667694 - Flags: review?(jhammel) → review+
Comment on attachment 667717 [details] [diff] [review]
Part 2: Move some command modules elsewhere, v1

I'm a bit iffy about combining these modules into one package, but I guess that is okay
Attachment #667717 - Flags: review?(jhammel) → review+
Comment on attachment 667720 [details] [diff] [review]
Part 3: Discover mach command modules and automatically load, v1

+        for path in sys.path:
+            # We only support importing .py files from directories.
+            commands_path = os.path.join(path, 'mach', 'commands')

??? Why? Why look over all the directories on the path?  Can't we just look for the one directory that we know this will actually exist?

I'm also a little bit conflicted on the importing all of these ahead of time versus saving import time by not importing gross functionality until function time.  I realize that this is done for perf reasons and I suppose I can live with that
(In reply to Jeff Hammel [:jhammel] from comment #13)
> Comment on attachment 667694 [details] [diff] [review]
> Part 1: Use decorators for defining commands, v1
> 
> +        # Clean up after decorator because it isn't important now.
> +        del value._mach_command
> +
> +        if arguments is not None:
> +            del value._mach_command_args
> 
> Do we really need to? (If so, that should be said vs "isn't important")

No, we don't need to. However, I figured they weren't doing anything after the class is scanned, so why have them? At run-time, they can only cause trouble (albeit only in rare scenarios which I'll admit to not having a good example of. This is just me being defensive.

(In reply to Jeff Hammel [:jhammel] from comment #15)
> Comment on attachment 667720 [details] [diff] [review]
> Part 3: Discover mach command modules and automatically load, v1
> 
> +        for path in sys.path:
> +            # We only support importing .py files from directories.
> +            commands_path = os.path.join(path, 'mach', 'commands')
> 
> ??? Why? Why look over all the directories on the path?  Can't we just look
> for the one directory that we know this will actually exist?
>
> I'm also a little bit conflicted on the importing all of these ahead of time
> versus saving import time by not importing gross functionality until
> function time.  I realize that this is done for perf reasons and I suppose I
> can live with that

We need to look everywhere and import everything we find in order to populate the argument parser. And, a side-effect of populating the argument parser is exposing help information. Even if we lazily-populated (e.g. only if help is requested), we still need to perform a brute force scan and load modules until we find the one that satisfies the requested command. I suppose we could tighten the convention a bit (one command per module of the same name). But, that seems somewhat restrictive and results in a lot of small Python modules. I suppose if performance of scanning and loading all the modules becomes an issue we can look at ways to optimize it. I'm not convinced it will be an issue.
(In reply to Gregory Szorc [:gps] from comment #16)
> (In reply to Jeff Hammel [:jhammel] from comment #13)
> > Comment on attachment 667694 [details] [diff] [review]
> > Part 1: Use decorators for defining commands, v1
> > 
> > +        # Clean up after decorator because it isn't important now.
> > +        del value._mach_command
> > +
> > +        if arguments is not None:
> > +            del value._mach_command_args
> > 
> > Do we really need to? (If so, that should be said vs "isn't important")
> 
> No, we don't need to. However, I figured they weren't doing anything after
> the class is scanned, so why have them? At run-time, they can only cause
> trouble (albeit only in rare scenarios which I'll admit to not having a good
> example of. This is just me being defensive.

So that's kind of worrisome.  I think if there is a concrete reason we should document it.  If there's not, we shouldn't delete these attributes.

> (In reply to Jeff Hammel [:jhammel] from comment #15)
> > Comment on attachment 667720 [details] [diff] [review]
> > Part 3: Discover mach command modules and automatically load, v1
> > 
> > +        for path in sys.path:
> > +            # We only support importing .py files from directories.
> > +            commands_path = os.path.join(path, 'mach', 'commands')
> > 
> > ??? Why? Why look over all the directories on the path?  Can't we just look
> > for the one directory that we know this will actually exist?
> >
> > I'm also a little bit conflicted on the importing all of these ahead of time
> > versus saving import time by not importing gross functionality until
> > function time.  I realize that this is done for perf reasons and I suppose I
> > can live with that
> 
> We need to look everywhere and import everything we find in order to
> populate the argument parser. And, a side-effect of populating the argument
> parser is exposing help information. Even if we lazily-populated (e.g. only
> if help is requested), we still need to perform a brute force scan and load
> modules until we find the one that satisfies the requested command. I
> suppose we could tighten the convention a bit (one command per module of the
> same name). But, that seems somewhat restrictive and results in a lot of
> small Python modules. I suppose if performance of scanning and loading all
> the modules becomes an issue we can look at ways to optimize it. I'm not
> convinced it will be an issue.

So I understand why we have to load all of the files in the directory where 'mach.commands' resides manually.  What I don't understand is why we look through all of sys.path for this.

The native sys.path on my machine is:

['', '/home/jhammel/python', '/home/jhammel/mozbase', '/home/jhammel/virtualenv', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client', '/usr/lib/python2.7/dist-packages/ubuntuone-client', '/usr/lib/python2.7/dist-packages/ubuntuone-control-panel', '/usr/lib/python2.7/dist-packages/ubuntuone-couch', '/usr/lib/python2.7/dist-packages/ubuntuone-installer', '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol']

Why would we want to scan through that?  We know the location of mach.commands. Can't we just use it?
(In reply to Jeff Hammel [:jhammel] from comment #17)
> (In reply to Gregory Szorc [:gps] from comment #16)
> > (In reply to Jeff Hammel [:jhammel] from comment #13)
> > > Comment on attachment 667694 [details] [diff] [review]
> > > Part 1: Use decorators for defining commands, v1
> > > 
> > > +        # Clean up after decorator because it isn't important now.
> > > +        del value._mach_command
> > > +
> > > +        if arguments is not None:
> > > +            del value._mach_command_args
> > > 
> > > Do we really need to? (If so, that should be said vs "isn't important")
> > 
> > No, we don't need to. However, I figured they weren't doing anything after
> > the class is scanned, so why have them? At run-time, they can only cause
> > trouble (albeit only in rare scenarios which I'll admit to not having a good
> > example of. This is just me being defensive.
> 
> So that's kind of worrisome.  I think if there is a concrete reason we
> should document it.  If there's not, we shouldn't delete these attributes.

OK. I'll remove the deletion and keep the attributes.

> > (In reply to Jeff Hammel [:jhammel] from comment #15)
> > > Comment on attachment 667720 [details] [diff] [review]
> > > Part 3: Discover mach command modules and automatically load, v1
> > > 
> > > +        for path in sys.path:
> > > +            # We only support importing .py files from directories.
> > > +            commands_path = os.path.join(path, 'mach', 'commands')
> > > 
> > > ??? Why? Why look over all the directories on the path?  Can't we just look
> > > for the one directory that we know this will actually exist?
> > >
> > > I'm also a little bit conflicted on the importing all of these ahead of time
> > > versus saving import time by not importing gross functionality until
> > > function time.  I realize that this is done for perf reasons and I suppose I
> > > can live with that
> > 
> > We need to look everywhere and import everything we find in order to
> > populate the argument parser. And, a side-effect of populating the argument
> > parser is exposing help information. Even if we lazily-populated (e.g. only
> > if help is requested), we still need to perform a brute force scan and load
> > modules until we find the one that satisfies the requested command. I
> > suppose we could tighten the convention a bit (one command per module of the
> > same name). But, that seems somewhat restrictive and results in a lot of
> > small Python modules. I suppose if performance of scanning and loading all
> > the modules becomes an issue we can look at ways to optimize it. I'm not
> > convinced it will be an issue.
> 
> So I understand why we have to load all of the files in the directory where
> 'mach.commands' resides manually.  What I don't understand is why we look
> through all of sys.path for this.
> 
> The native sys.path on my machine is:
> 
> ['', '/home/jhammel/python', '/home/jhammel/mozbase',
> '/home/jhammel/virtualenv', '/usr/lib/python2.7',
> '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk',
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
> '/usr/local/lib/python2.7/dist-packages',
> '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL',
> '/usr/lib/python2.7/dist-packages/gst-0.10',
> '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7',
> '/usr/lib/python2.7/dist-packages/ubuntu-sso-client',
> '/usr/lib/python2.7/dist-packages/ubuntuone-client',
> '/usr/lib/python2.7/dist-packages/ubuntuone-control-panel',
> '/usr/lib/python2.7/dist-packages/ubuntuone-couch',
> '/usr/lib/python2.7/dist-packages/ubuntuone-installer',
> '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol']
> 
> Why would we want to scan through that?  We know the location of
> mach.commands. Can't we just use it?

We know the location of mach.commands *today*. However, a long term goal is for people to be able "drop" mach.commands modules into something on sys.path and it magically works. If the module is in your virtualenv, mach will find it. In this magically world, people can easily add custom mach commands. You could even have multiple projects, each providing their own commands/modules.

IMO we could apply some heuristic on whether to scan a directory in sys.path. But, I really don't think we need to bother. Once you've run mach once, the stat() results are cached by the OS and things should be pretty fast.

I concede we can't get there just yet because the main mach script sets up sys.path. Presumably we'll have a more robust solution for that once bug 794506 is resolved.
I'm not convinced but we can go with this for now.  I find this both a bit cumbersome, a bit implicit, and a bit reinventing the wheel (even if the wheel we have, that is setuptools entry points, is a fairly lopsided wheel).

If I wanted to add a mach command to the current system that is local to me, I would have the following directory structure:

/home/jhammel/python
|
-mach
 |
 -__init__.py # not actually necessary for this patch, but necessary if i ever want to import on its own
 |
 -commands
  |
  -__init__.py # I can't put commands in here, either o_O
  |
  -myactualcommands.py

If I wanted to turn this off, I would have to remove this directory structure.

I would much rather have an environment variable and/or some combination of dotfiles to explicitly be able to control what commands I have available.

While setuptools entry points is fairly crippled and requires setuptools, it is at least a known bad.  I'd much rather look at a registry-type system such as this versus becoming too attached to the methodology of doing this in patch.

(This way of course also prohibits e.g. zipped eggs but I don't really care about that. Zipped eggs can die die die.)
Comment on attachment 667720 [details] [diff] [review]
Part 3: Discover mach command modules and automatically load, v1

Again, I'm very hesistent on this methodology and hope we can find another before we get too attached to this, but fine for now
Attachment #667720 - Flags: review?(jhammel) → review+
(In reply to Jeff Hammel [:jhammel] from comment #19)
> I'm not convinced but we can go with this for now.  I find this both a bit
> cumbersome, a bit implicit, and a bit reinventing the wheel (even if the
> wheel we have, that is setuptools entry points, is a fairly lopsided wheel).

I would use setuptools entry points in a heart beat if they were compatible with our usage of Python in m-c.

> If I wanted to add a mach command to the current system that is local to me,
> I would have the following directory structure:
> 
> /home/jhammel/python
> |
> -mach
>  |
>  -__init__.py # not actually necessary for this patch, but necessary if i
> ever want to import on its own
>  |
>  -commands
>   |
>   -__init__.py # I can't put commands in here, either o_O
>   |
>   -myactualcommands.py
> 
> If I wanted to turn this off, I would have to remove this directory
> structure.
> 
> I would much rather have an environment variable and/or some combination of
> dotfiles to explicitly be able to control what commands I have available.

I agree that is sub-optimal. I think we can change the heuristics later to look for a semaphore file, environment variable, config file, etc.

> While setuptools entry points is fairly crippled and requires setuptools, it
> is at least a known bad.  I'd much rather look at a registry-type system
> such as this versus becoming too attached to the methodology of doing this
> in patch.

I'm open to alternatives. The requirement that things work without relying on entry points does limit our choices. I'm pretty sure that without entry points or some one-time hook to install things in a certain way in the virtualenv, our only remaining option is filesystem globbing.
https://hg.mozilla.org/mozilla-central/rev/15c2bcb1a982
https://hg.mozilla.org/mozilla-central/rev/d04782dc0091
https://hg.mozilla.org/mozilla-central/rev/533f124453a0
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
(In reply to Jeff Hammel [:jhammel] from comment #20)
> Comment on attachment 667720 [details] [diff] [review]
> Part 3: Discover mach command modules and automatically load, v1
> 
> Again, I'm very hesistent on this methodology and hope we can find another
> before we get too attached to this, but fine for now

FWIW it's *very* similar to what setuptools does with entry points. It scans sys.path and looks for the egg metadata to read the entries.txt file.

What Greg did is basically the same but using convention over configuration (e.g. the plugin location in each package) exactly like what Django does.
AIUI, setuptools entry points are

[myentrypointnames]
myentrypoint = foo.bar:fleem

which means 'the "fleem" object in the foo.bar' module.  One of the real disadvantages here is that since this is done in setup.py, you're basically globbing together in whichever order you've happened to run setup.pys, again AIUI.

If we're allowing/encouraging developers to add their own mach commands, I would personally like something a bit more explicit, though obviously I can only speak as to what I would want

The above format (or whatever variant) is basically fine; I would just like a better way to control this than having a 'mach.commands' top level module.  See comment 21.  In other words, if there was a dotfile that allowed me to extend and override commands, that would be golden.  The default config could come from scanning the particular 'mach.commands' directory from the tree.  But additional overrides/extensions should be able to come from anywhere.

Another disadvantage of setuptools entry points is the requirement to be on sys.path.  What if I want a /home/jhammel/mymachcommands.py file?  Should I have to put that on system path? I would much rather just specify a path to a file.

Putting it all together, and arbitrarily choosing .ini format because that's what setuptools (can) use, this would look something like

[mach.commands]

# default commands generated by mach
mach.commands.* =

# commands from other installed python packages
jhammels.mach.extension.* =

# other local resources that aren't packaged
/home/jhammel/mylocalmachfile.py =

I realize that this is probably bikeshedding.  That said, I've seen this feature done wrong a bunch, and I'd rather us do it right.
> Another disadvantage of setuptools entry points is the requirement to be on sys.path.  What if I want a /home/jhammel/mymachcommands.py file? 
> Should I have to put that on system path? I would much rather just specify a path to a file.

Yeah well that's because you want auto-discovery. I like explicit better, like what Mercurial does: a central config file where you just list the plugins you want to activate using "foo.bar.fleem"

> I realize that this is probably bikeshedding.  
> That said, I've seen this feature done wrong a bunch, and I'd rather us do it right.
 
I think it's just deciding if you want auto-discovery or not. I personally find the feature dangerous because you implicitly activate code -- vs an explicit activation in a central config file.

I gave my initial feedback on the assumption auto-discovery was required though
IMHO, autodiscovery is good for the inbuilt modules, so they don't have to be double-catalogued, and less so for user-extensions </opinion>
I want some minimal form of auto-discovery because I think it is a better overall user experience. One of my big complaints with Mercurial is that a lot of useful functionality is disabled by default. If people don't know some useful feature exists, how will they know to use it? If some useful feature exists, why require a user to configure it?

I think supplementing the current heuristic of scanning sys.path + mach.modules can be supplemented. mach does have config files. Let's put them to use to allow defining custom locations of mach command modules.
(In reply to Gregory Szorc [:gps] from comment #27)
> I think supplementing the ... can be supplemented.

Don't let me give out any r+ today. Ugh.
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.