Open Bug 1388894 Opened 2 years ago Updated 6 months ago

Central registration of commands to avoid import penalty


(Firefox Build System :: Mach Core, enhancement)

Not set


(firefox57 wontfix)

Tracking Status
firefox57 --- wontfix


(Reporter: ahal, Unassigned)


(Blocks 2 open bugs)



(2 files)

Yes, I know a major design goal of mach was to avoid precisely this, but hear me out :). The main goal here is to avoid loading every on every invocation of `mach`.

Afaict (and I might be wrong), the only reason we need to import every file is to populate the main |mach help|. For each command, we need:
A) The command's name
B) The command's description
C) The file the command is defined in

C is already centrally registered in, so I'm proposing we simply add A + B to the mix (whether in or another file, I don't care). An example format:

('build', 'Build the tree.', 'python/mozbuild/mozbuild/')

There are some downsides to this approach:
1. Less convenient for command authors
2. Can't use 'conditions' to filter commands out of main help (unless we figure out a way to centrally register that too).

But I don't think 1) is going to be that big of a burden, and 2) isn't a terribly compelling feature anyway. I think not needing to worry about the performance cost of outweighs the cons.
We can also get most of the way there with caching. If the hasn't changed since last run, we can load a cached command -> file mapping.

We do have lazily loaded argument parsers. But I think those are *after* the @Command. i.e. a should *always* have the command name in @Command, which means a file change check can safely be used to determine if the set of commands for that file needs refreshed.

Regarding conditions and `mach help`, `mach help` is already special in that it loads all the lazily loaded things. Note how `mach help` takes several hundred ms longer to run than say `mach uuid`. So things should just work for `mach help`.

I'm not sure where to cache this data though. I'd prefer to not write it to the source directory because I really want source directories to be read-only. We do find the ~/.mozbuild state directory during bootstrap of the mach driver though. We could write there if the directory exists.
Comment on attachment 8895626 [details]
Bug 1388894 - Move loading of mach commands modules to registrar;
Attachment #8895626 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8895627 [details]
Bug 1388894 - Cache mach commands dispatch table;

Thanks for putting this up, I like it!

::: commit-message-ecafe:13
(Diff revision 2)
> +* settings providers need to always be loaded so all settings
> +  are registered when settings files loaded at dispatch time

I wonder if we should be more strict about this, i.e force commands to declare what settings they use so we only have to load the settings when also loading a relevant command.

Better left to a follow-up.

::: commit-message-ecafe:15
(Diff revision 2)
> +
> +
> +* settings providers need to always be loaded so all settings
> +  are registered when settings files loaded at dispatch time
> +* need to cache categories?

If we also cached category and description, we could use the cache for |mach help| instead of doing  force_load. Would be good follow-up fodder.

Beyond that I don't see why it would be necessary.

::: build/
(Diff revision 2)
>                                os.path.join(topsrcdir, 'build',
>                                             ''),
>                                get_state_dir()[0]],
>                                stdout=devnull, stderr=devnull)
> +    def topsrcdir_state_dir():

Can this also live in mozboot.util? I know of several instances already where we store srcdir-specific data that should be moved to this new directory. It would be more convenient to update if we could simply call this new method instead of requiring a mach context.

::: build/
(Diff revision 2)
>          mach.define_category(category, meta['short'], meta['long'],
>              meta['priority'])
> -    for path in MACH_MODULES:
> -        mach.load_commands_from_file(os.path.join(mozilla_dir, path))
> +    state_dir, state_dir_exists = topsrcdir_state_dir()
> +    if state_dir_exists:
> +        dispatch_cache = os.path.join(state_dir, 'mach_dispatch.json')

The cached tasks for |mach try| are being stored under a state_dir/cache directory, we should put caching related stuff in the same directory.

::: python/mach/mach/
(Diff revision 2)
> +    except ValueError:
> +        return None

I think we should at least print a warning anytime the cache isn't formatted properly. Otherwise developers won't realize anything is wrong and we won't detect bugs.

I'd prefer we just raised, but understand you might want to avoid that for fear of breaking mach for everyone.

::: python/mach/mach/
(Diff revision 2)
> +        # For the cache to be used:
> +        #
> +        # 1. The set of commands files must align exactly.
> +        # 2. All commands files must be unchanged from cached data.
> +        if cache and set(cache['source_files']) == self.source_files:
> +            mtimes = cache['source_files']
> +            if all(os.path.getmtime(p) <= mtimes[p] for p in self.source_files):
> +                self.command_handlers = LazyCommandsDict(self, cache['commands'])
> +                cache_used = True
> +
> +        # Cache wasn't used. Load everything normally.
> +        if not cache_used:
> +            for path in source_files:
> +                self.register_commands_file(path)

Why does this need to be all or nothing? Let's load what we can from the cache, and load what's left normally.
Attachment #8895627 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8895627 [details]
Bug 1388894 - Cache mach commands dispatch table;

I must say, I don't really like that this relies on a cache, that needs to be (in)validated.
Very few people add mach commands, and that's not something that happens often. Why not keep a hardcoded dict of command->file mappings in tree, and validate it corresponds to the current code in a python test?

::: build/
(Diff revision 2)
> +        for state belonging to this topsrcdir.
> +        """
> +        import hashlib
> +
> +        state_dir = get_state_dir()[0]
> +        ident = hashlib.md5(topsrcdir).hexdigest()

You should canonicalize topsrcdir here. Something like os.path.normcase(os.path.abspath())
Attachment #8895627 - Flags: review?(mh+mozilla)
Since lazy loading of will facilitate running `mach` with Python 3, tracking this against overall Python 3 efforts.
Blocks: buildpython3
Product: Core → Firefox Build System
If this is still something we plan on fixing and we go the cache route, we should depend on bug 1483228.
Depends on: 1483228

Re-reading the cache patches, I think I'd prefer a single static place to register commands. Instead of a tuple, maybe it can be a yml/toml/json file where each command is keyed by its top-level name and contains some metadata. For example, 'path', 'description', 'conditions', etc. Just the bare minimum we need to show |mach help| and dispatch. After dispatch, things can keep working the way they do now. If we really wanted to avoid centralization, we could register this information in I guess, but that adds its own overhead.

We also don't need to overthink settings. Settings that should be loaded globally, need to be defined somewhere globally (e.g in mach itself). Settings that only affect a specific command, get defined next to that command. I think the only corner case is if a command depends on settings that were originally defined somewhere else for another purpose. I don't think a case like this exists, but if it does maybe we can get that second command to re-define the settings.

We will need to adjust the mach-commands, mach-debug-commands and mach-completion commands though.

Blocks: 1519990
Blocks: 1555762
You need to log in before you can comment on or make changes to this bug.