Closed Bug 1388894 Opened 8 years ago Closed 4 years ago

Centralize registration of commands

Categories

(Firefox Build System :: Mach Core, enhancement)

enhancement

Tracking

(firefox98 fixed)

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: ahal, Assigned: alex.lopez.zorzano)

References

Details

Attachments

(4 files)

Yes, I know a major design goal of mach was to avoid precisely this, but hear me out :). The main goal here is to avoid loading every mach_commands.py on every invocation of `mach`. Afaict (and I might be wrong), the only reason we need to import every mach_commands.py file is to populate the main |mach help|. For each command, we need: A) The command's name B) The command's description C) The file the command is defined in C is already centrally registered in mach_bootstrap.py, so I'm proposing we simply add A + B to the mix (whether in mach_bootstrap.py or another file, I don't care). An example format: ('build', 'Build the tree.', 'python/mozbuild/mozbuild/mach_commands.py') There are some downsides to this approach: 1. Less convenient for command authors 2. Can't use 'conditions' to filter commands out of main help (unless we figure out a way to centrally register that too). But I don't think 1) is going to be that big of a burden, and 2) isn't a terribly compelling feature anyway. I think not needing to worry about the performance cost of mach_commands.py outweighs the cons.
We can also get most of the way there with caching. If the mach_commands.py hasn't changed since last run, we can load a cached command -> file mapping. We do have lazily loaded argument parsers. But I think those are *after* the @Command. i.e. a mach_commands.py should *always* have the command name in @Command, which means a file change check can safely be used to determine if the set of commands for that file needs refreshed. Regarding conditions and `mach help`, `mach help` is already special in that it loads all the lazily loaded things. Note how `mach help` takes several hundred ms longer to run than say `mach uuid`. So things should just work for `mach help`. I'm not sure where to cache this data though. I'd prefer to not write it to the source directory because I really want source directories to be read-only. We do find the ~/.mozbuild state directory during bootstrap of the mach driver though. We could write there if the directory exists.
Comment on attachment 8895626 [details] Bug 1388894 - Move loading of mach commands modules to registrar; https://reviewboard.mozilla.org/r/166868/#review172302
Attachment #8895626 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8895627 [details] Bug 1388894 - Cache mach commands dispatch table; https://reviewboard.mozilla.org/r/166870/#review172282 Thanks for putting this up, I like it! ::: commit-message-ecafe:13 (Diff revision 2) > +* settings providers need to always be loaded so all settings > + are registered when settings files loaded at dispatch time I wonder if we should be more strict about this, i.e force commands to declare what settings they use so we only have to load the settings when also loading a relevant command. Better left to a follow-up. ::: commit-message-ecafe:15 (Diff revision 2) > + > +TODO > + > +* settings providers need to always be loaded so all settings > + are registered when settings files loaded at dispatch time > +* need to cache categories? If we also cached category and description, we could use the cache for |mach help| instead of doing force_load. Would be good follow-up fodder. Beyond that I don't see why it would be necessary. ::: build/mach_bootstrap.py:253 (Diff revision 2) > os.path.join(topsrcdir, 'build', > 'submit_telemetry_data.py'), > get_state_dir()[0]], > stdout=devnull, stderr=devnull) > > + def topsrcdir_state_dir(): Can this also live in mozboot.util? I know of several instances already where we store srcdir-specific data that should be moved to this new directory. It would be more convenient to update if we could simply call this new method instead of requiring a mach context. ::: build/mach_bootstrap.py:333 (Diff revision 2) > mach.define_category(category, meta['short'], meta['long'], > meta['priority']) > > - for path in MACH_MODULES: > - mach.load_commands_from_file(os.path.join(mozilla_dir, path)) > + state_dir, state_dir_exists = topsrcdir_state_dir() > + if state_dir_exists: > + dispatch_cache = os.path.join(state_dir, 'mach_dispatch.json') The cached tasks for |mach try| are being stored under a state_dir/cache directory, we should put caching related stuff in the same directory. ::: python/mach/mach/registrar.py:41 (Diff revision 2) > + except ValueError: > + return None I think we should at least print a warning anytime the cache isn't formatted properly. Otherwise developers won't realize anything is wrong and we won't detect bugs. I'd prefer we just raised, but understand you might want to avoid that for fear of breaking mach for everyone. ::: python/mach/mach/registrar.py:290 (Diff revision 2) > + # For the cache to be used: > + # > + # 1. The set of commands files must align exactly. > + # 2. All commands files must be unchanged from cached data. > + if cache and set(cache['source_files']) == self.source_files: > + mtimes = cache['source_files'] > + if all(os.path.getmtime(p) <= mtimes[p] for p in self.source_files): > + self.command_handlers = LazyCommandsDict(self, cache['commands']) > + cache_used = True > + > + # Cache wasn't used. Load everything normally. > + if not cache_used: > + for path in source_files: > + self.register_commands_file(path) Why does this need to be all or nothing? Let's load what we can from the cache, and load what's left normally.
Attachment #8895627 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8895627 [details] Bug 1388894 - Cache mach commands dispatch table; https://reviewboard.mozilla.org/r/166870/#review173874 I must say, I don't really like that this relies on a cache, that needs to be (in)validated. Very few people add mach commands, and that's not something that happens often. Why not keep a hardcoded dict of command->file mappings in tree, and validate it corresponds to the current code in a python test? ::: build/mach_bootstrap.py:262 (Diff revision 2) > + for state belonging to this topsrcdir. > + """ > + import hashlib > + > + state_dir = get_state_dir()[0] > + ident = hashlib.md5(topsrcdir).hexdigest() You should canonicalize topsrcdir here. Something like os.path.normcase(os.path.abspath())
Attachment #8895627 - Flags: review?(mh+mozilla)
Since lazy loading of mach_commands.py will facilitate running `mach` with Python 3, tracking this against overall Python 3 efforts.
Blocks: buildpython3
Product: Core → Firefox Build System
If this is still something we plan on fixing and we go the cache route, we should depend on bug 1483228.
Depends on: 1483228

Re-reading the cache patches, I think I'd prefer a single static place to register commands. Instead of a tuple, maybe it can be a yml/toml/json file where each command is keyed by its top-level name and contains some metadata. For example, 'path', 'description', 'conditions', etc. Just the bare minimum we need to show |mach help| and dispatch. After dispatch, things can keep working the way they do now. If we really wanted to avoid centralization, we could register this information in moz.build I guess, but that adds its own overhead.

We also don't need to overthink settings. Settings that should be loaded globally, need to be defined somewhere globally (e.g in mach itself). Settings that only affect a specific command, get defined next to that command. I think the only corner case is if a command depends on settings that were originally defined somewhere else for another purpose. I don't think a case like this exists, but if it does maybe we can get that second command to re-define the settings.

We will need to adjust the mach-commands, mach-debug-commands and mach-completion commands though.

Blocks: 1519990
Blocks: 1555762

FWIW I wrote a very simple implementation of this idea for wpt [1]. Commands are defined in json files, there's a central list of places to search for the json files, and we don't execute any code except that required to run the actual command. In theory this model could be extended to do things like install additional components before even accessing the argument parser, but that's not a requirement for wpt (it has been suggested that there are use cases for gecko to have commands where all the code lives in an external library).

[1] https://github.com/web-platform-tests/wpt/blob/master/tools/wpt/wpt.py

See Also: → 1695312
Blocks: 1695312
See Also: 1695312

I'd like to tweak this bug to focus more on the central registration and less on the performance/lazy-load factor (that has been relegated to 1695312).

I imagine that this central registration is very similar to what :ahal has recommended above: rather than centrally defining a block of python scripts to import together, we should centrally define a list of Mach commands.

So, I'm guessing it would look like this:

(
    builder
        .register_command(<command name>, <command description>, <function reference>)
        .register_command("try", "Push selected tasks to the try server", tools.tryselect.try_default)
        .register_command("build", "Build the tree", mozbuild.mozbuild.build_commands.build)
        ...
)

A couple notes and requirements:

  • As previously brought up, there's concerns around conditions. They either need to be ported, or maybe they're unused enough that they can be removed.
  • I'd prefer to have this registration function with parameters as opposed to individual tuples so that each parameter is associated with a name (and, arguably, a type).
  • I feel strongly that this is better than a cached command -> file mapping, because it has lower complexity and there's no extra cache state to manage.

Adding "top-level function commands" as a dependent, because that makes referencing commands in this centralized list easier, since we don't need to manage the instantiation of the specific MachCommandBase object.

Depends on: 1696251
Summary: Central registration of commands to avoid import penalty → Centralize registration of commands
No longer blocks: buildpython3

This reduces some code duplication and paves the way for modifying how
Mach loads modules.

The plan is to follow this up by making this function take a more fleshed out 'spec'
that maps command names to the files they're defined at and possibly more metadata.
Since this may affect how Mach works internally (e.g. with lazy loading in the roadmap),
it makes sense to move this logic inside the Mach class.

Assignee: nobody → alex.lopez.zorzano
Status: NEW → ASSIGNED

This starts a centralized place to keep track of mach commands with some essential
metadata.

Depends on D136789

Attachment #9260475 - Attachment description: Bug 1388894 - Replace list of mach modules to load with a dict of mach commands r=mhentges → WIP: Bug 1388894 - Replace list of mach modules to load with a dict of mach commands r=mhentges
Attachment #9260475 - Attachment description: WIP: Bug 1388894 - Replace list of mach modules to load with a dict of mach commands r=mhentges → Bug 1388894 - Replace list of mach modules to load with a dict of mach commands r=mhentges
No longer blocks: 1519990, 1555762
Severity: normal → --
Pushed by mhentges@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9bcccaed059c Add function in Mach to load commands from files in bulk. r=mhentges https://hg.mozilla.org/integration/autoland/rev/6123fcc09630 Replace list of mach modules to load with a dict of mach commands r=mhentges
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: