Closed Bug 883954 Opened 11 years ago Closed 9 years ago

Figure out how to express code generation in moz.build

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox38 fixed)

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: ted, Assigned: froydnj)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(3 files, 7 obsolete files)

8.98 KB, patch
gps
: review+
Details | Diff | Splinter Review
2.56 KB, patch
gps
: review+
Details | Diff | Splinter Review
16.57 KB, patch
gps
: review+
Details | Diff | Splinter Review
We have a bunch of places where we do code generation in Makefiles, and they all have custom rules at the moment. We'll need to migrate this all to moz.build, so we should figure how to express these things in a moz.build world.

Here's a strawman:
GENERATED_FILES += {
  "outputs": ["foo.h"],
  "inputs": ["foo.tmpl"],
  "generator": "foo.py",
  "arguments": ["-d", ...]
}

I'm not sure how we'd express arguments precisely, I think we'd probably want to be able to interpolate the values of "inputs" and "outputs" in there.

Thoughts?
No longer blocks: nomakerules
So a few things about our WebIDL generator:

1)  The list of inputs blows out the command line with all the other stuff that has to be on the command line length.  Right now we pass it via writing it to a file and then passing that file as an argument.

2)  We want one of the arguments to the generator to be the value of $?.  But it blows out the command line length, so we write $? to a file and pass that filename.

3)  The outputs have dependencies other than the inputs and the generator script.

4)  The outputs don't depend directly on the inputs, but on a dummy target that the generator knows how to regenerate.  This dummy target depends on some other targets (which are the things that should go into $?) which themselves depend on the inputs indirectly (that's expressed via .pp files).

5)  The sets of .pp files that are relevant during export (when the codegen is done) and libs are disjoint.
Looking at the comm-central build rules (<http://www.tjhsst.edu/~jcranmer/bxr-comm-central.html>), things that fall under this category loosely include (not counting things that look packager-ish or l10n-ish):

1. libical's autogenerated files, done by Perl scripts. But calendar folks are working on killing libical, so I'm not going to put heavy weight on it.
2. Calendar builds a timezone.sqlite by running some sqlite update commands and some xpcshell JS logic. (o_O)
3. Some png files get converted to icons by a png2ico Python script.
4. MIDL generation (arguably, this is used just enough that it should be a "core" config/rules.mk thing).
5. Suite does some sed file mucking about that looks like it might just want some CONFIG_SUBST_FILES love.

If we limit our concerns to python-based generators, we look at tweaking our preprocessor architecture for some other things (e.g., installer stuff), and we add MIDL to rules.mk, then that should solve all major use cases; everything else could arguably be shoved into a python wrapper script if need be.

In terms of input/output constraints, I've been looking at the rules mostly in terms of a "sh -c" approach. Since the python files are under our control, we can change some stuff as necessary to suit needs, but I don't think $(PYTHON) args $(infiles) > $(outfile) is necessarily the correct thing, though I could probably make it work for stuff in comm-central.
We might get lucky with a number of "generation" "tasks," but as bz pointed out, I don't think there is a silver bullet and we'll likely always have to deal with one-offs.
We talked about this and we think we want to scope this very tightly to cover only simple cases. Complicated things like dom/bindings will be treated as special cases and get handled separately in moz.build.
Assignee: nobody → ted
Blocks: 892644
We can likely hook this up to the work done in bug 891474 and require that the "generator" function be the name of a module in mozbuild.actions. This would have the side-effect (IMO a positive one) of requiring all known build actions to be "registered" in a central location. This is in-line with our goal of reducing foot guns since it establishes a clear and obvious review requirement for the build team to r+ any new build actions to the build system. This should help reduce duplication and increase quality.
Version: 22 Branch → Trunk
For reference:

http://facebook.github.io/buck/rule/genrule.html
https://code.google.com/p/gyp/wiki/GypLanguageSpecification#Rules
https://github.com/twitter/commons/tree/master/src/python/twitter/pants/targets

I /think/ the pants target we'd be interested in is in python_target.py.

Looking at those, my strawman would be:

build_action('<action>', inputs=['foo.in'], outputs=['foo.out'], arguments=['-i', 'foo.in', '-o', 'foo.out'])

Maybe also throw in a stage/tier in there (at least temporarily). Long term, I imagine tiers go away and we do all this basic install/action stuff wheneer in the build.

Although, I wish arguments could be defined easier. Ideally we'd have something that allows simple repeated arguments for each input. e.g. "add |-i <arg>| for each item in <inputs>." Short of inventing a custom formatting language, I'm not sure what else we can do here.
How hard would it be to, rather than passing the arguments on the command line, call some function in the build action? Then we could default to passing through the inputs/outputs arguments to that function.
In terms of scoping requirements, and in the interest of making all of this stuff explicit, there are some questions I have:

1. Are we looking at only supporting python code generation "scripts"? Or is this designed to support other kinds of code generation, e.g., midl generation?

2. What are paths relative to? Inputs seem like they should be relative to srcdir and outputs to the object directory, but there may be cases where other directories are intended.

3. How are paths passed to the tool? Absolute, or relative? If absolute, which format of path is used on Windows? What is the working directory of the invocation?

4. If this is explicitly being scoped to only python code generation, how exactly will the python script be invoked?

5. How to handle implicit dependency generation? Some of our "more advanced" dependency scripts (e.g., those in js/xpconnect/src) generate extra Makefile dependency lists. These are useful only to a make backend, and I wonder if there are other things we can do to be more backend-neutral.
(In reply to :Ms2ger from comment #7)
> How hard would it be to, rather than passing the arguments on the command
> line, call some function in the build action? Then we could default to
> passing through the inputs/outputs arguments to that function.

Oh, that's a fantastic idea! If we require all code generation "actions" to be Python functions/modules somewhere, that opens the door to a lot of awesomeness. If an underlying action isn't implemented in Python, that Python action can just shell out or POpen the underling script/executable.
Depends on: 941450
So here's a strawman to play with. I haven't actually tested it all the way through, I ran out of time hacking it together. You can look at the test moz.build files here to see how it works, this basically lets you do:

GENERATED_FILES += [
    GeneratedFile("foo.bar", action="foo", inputs=["a","b"], depends="c")
]

...which will invoke "foo" as a py_command. Comments welcome. Needs to be fleshed out a bit more obviously.
How about using the same kind of construct as other variables?

GENERATED_FILES += ['foo.bar']

GENERATED_FILES['foo.bar'].action = 'foo'
GENERATED_FILES['foo.bar'].inputs += [ 'a', 'b']
GENERATED_FILES['foo.bar'].depends = 'c'
That seems a lot more verbose given that every one of these will have to specify at least outputs and action.
As I mentioned on IRC: there's no provision for extra arguments in this patch.
Blocks: 973708
This is fleshed out to the point of usefulness. You can look at the unit tests to see how it works, but basically it boils down to:

generate_file('generated_file',
              'script.py',
              inputs=['whatever'])

This will add a rule to the Makefile that will invoke a main() method from script.py with an output and inputs parameters which are a FileAvoidWrite opened to the output file, and a dict of input file basenames -> open file objects on output files.

glandium has already said he doesn't like this approach, and that's okay, I put this up here to have a working prototype that we can talk about. I think I could reuse a lot of this code to implement what he suggested in comment 11, I may take a crack at that.

I successfully converted one existing code-generation instance to use this, I'll attach that in another patch.
Attachment #8379947 - Attachment is obsolete: true
This converts the netwerk/dns/etld_data.inc code generation Makefile/script to use the new moz.build setup.
Attachment #8464388 - Flags: feedback?(mshal)
Attachment #8464388 - Flags: feedback?(mh+mozilla)
Attachment #8464388 - Flags: feedback?(gps)
Here's glandium's suggested approach. I think I agree with him that this is nicer.
...and the converted Makefile using this approach.
Comment on attachment 8464672 [details] [diff] [review]
Support for generated files in moz.build by way of flags on GENERATED_FILES entries

Feedback on which approach you prefer would be most useful, but feedback on any other part is welcome as well.
Attachment #8464672 - Flags: feedback?(mshal)
Attachment #8464672 - Flags: feedback?(mh+mozilla)
Attachment #8464672 - Flags: feedback?(gps)
I promise I'll take a look tomorrow.
I added 'args' to the script input, I found a case where it would be useful pretty quickly after looking for more generated files that could be converted.
Attachment #8465489 - Flags: feedback?(mh+mozilla)
Attachment #8464388 - Attachment is obsolete: true
Attachment #8464388 - Flags: feedback?(mshal)
Attachment #8464388 - Flags: feedback?(mh+mozilla)
Attachment #8464388 - Flags: feedback?(gps)
Comment on attachment 8465489 [details] [diff] [review]
Support for generated files in moz.build by way of flags on GENERATED_FILES entries

Review of attachment 8465489 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is on the right track. There's one case neither versions try to handle, and for which the function version would actually work better, however much I don't like it: multi-input multi-output. A (probably awkward) way to do it with assignments would be something like (deriving from the unit-test):

GENERATED_FILES += [
    'abc.xyz',
    'foo.bar',
]
 
f = GENERATED_FILES['abc.xyz', 'foo.bar']
f.script = 'foo'
f.inputs = ['a', 'b']
f.args = ['1', 2, True]
f.deps = ['c']

::: python/mozbuild/mozbuild/action/file_generate.py
@@ +72,5 @@
> +        return 1
> +    return ret if ret is not None else 0
> +
> +if __name__ == '__main__':
> +    sys.exit(main(sys.argv[1:]))

I think it would be less complex if you added a module providing a base class that generator action scripts would derive from, instead of having a wrapper script.

You could also get rid of the json file by using nargs='+' arguments with argparse (less files to write by the backend == better).

::: python/mozbuild/mozbuild/backend/recursivemake.py
@@ +1093,5 @@
> +""".format(output=mozpath.basename(info.output),
> +           inputs=' '.join(info.inputs),
> +           deps=' '.join(info.deps),
> +           script=info.script,
> +           script_input=script_input))

For a future improvement, it would be good to have dependency generation (e.g. with deep python module dependencies, preprocessor dependencies, etc. as appropriate). Speaking of which, what kind of manual dependency did you have in mind when you added .deps?

::: python/mozbuild/mozbuild/frontend/emitter.py
@@ +411,5 @@
> +                    # TODO: validate inputs/deps existence
> +                    g = GeneratedFile(sandbox,
> +                                      mozpath.join(sandbox['OBJDIR'], f),
> +                                      mozpath.join(sandbox['SRCDIR'],
> +                                                   flags.script),

Would be better to do the path adjustments in GeneratedFile, especially because SandboxDerived instances have direct attributes for that (self.objdir, self.srcdir)

::: python/mozbuild/mozbuild/frontend/sandbox_symbols.py
@@ +116,5 @@
>          and built as a single source file. This can help make the build faster
>          and reduce the debug info size.
>          """, None),
>  
> +    'GENERATED_FILES': (StrictOrderingOnAppendListWithFlagsFactory({'script': unicode, 'inputs': list, 'args': list, 'deps': list}), list,

indent

Maybe to make it more obvious it's meant to be used with "pyaction", replace "script" with "action"?

@@ +121,4 @@
>          """Generic generated files.
>  
> +        This variable contains a list of generated files for the build system
> +        to generate at export time.

description of the flags would be nice :)
Attachment #8465489 - Flags: feedback?(mh+mozilla) → feedback+
(In reply to Mike Hommey [:glandium] from comment #21)
> I think this is on the right track. There's one case neither versions try to
> handle, and for which the function version would actually work better,
> however much I don't like it: multi-input multi-output. A (probably awkward)
> way to do it with assignments would be something like (deriving from the
> unit-test):

I originally had multi-output support in the functional version, but I removed it. I haven't looked at every instance in the tree, but I suspect that anything that's generating multiple files is going to want custom moz.build infrastructure anyway (thinking specifically of MIDL here). My plan was to start going through and converting Makefile rules to use this and see if it was actually necessary.

 
> f = GENERATED_FILES['abc.xyz', 'foo.bar']

I had no idea this was even legal Python. It's not terrible, I think we can keep this in our back pocket and use it if we find cases where it would make things work in a straightforward way.

> ::: python/mozbuild/mozbuild/action/file_generate.py
> @@ +72,5 @@
> > +        return 1
> > +    return ret if ret is not None else 0
> > +
> > +if __name__ == '__main__':
> > +    sys.exit(main(sys.argv[1:]))
> 
> I think it would be less complex if you added a module providing a base
> class that generator action scripts would derive from, instead of having a
> wrapper script.

So, I really want to allow generator scripts to live in the source tree next to their inputs and moz.build definition, I think that makes sense. Given that I think the file_generate action is necessary, so I don't see how your suggestion will improve things. Can you show me some sample code of what you had in mind?

> You could also get rid of the json file by using nargs='+' arguments with
> argparse (less files to write by the backend == better).

I agree with the file writing, and maybe we could just write out one big JSON file for all generated files, but one nicety of the JSON file is that the 'args' parameter will get serialized in a sane way in the trip between moz.build and the generator script.


> For a future improvement, it would be good to have dependency generation
> (e.g. with deep python module dependencies, preprocessor dependencies, etc.
> as appropriate). Speaking of which, what kind of manual dependency did you
> have in mind when you added .deps?

Yeah, I tried to get a start at this with providing file objects for inputs/output, ideally we'd be able to ban the use of open/file directly and force everything to go through wrappers that could write useful deps. I don't remember what 'deps' were for, they may not be useful as-is. If we find they're not being used we can always remove them.

> Would be better to do the path adjustments in GeneratedFile, especially
> because SandboxDerived instances have direct attributes for that
> (self.objdir, self.srcdir)

OK

> Maybe to make it more obvious it's meant to be used with "pyaction", replace
> "script" with "action"?

I had it that way before, but as written I use the file_generate action to load a script. I should update the documentation to be clear about this.

> > +        This variable contains a list of generated files for the build system
> > +        to generate at export time.
> 
> description of the flags would be nice :)


Yeah, this needs docs. I noticed that SOURCES doesn't have docs for its flags, I was trying to figure out if we had an example I could crib from.
Attachment #8464672 - Attachment is obsolete: true
Attachment #8464672 - Flags: feedback?(mshal)
Attachment #8464672 - Flags: feedback?(mh+mozilla)
Attachment #8464672 - Flags: feedback?(gps)
Attachment #8464390 - Attachment is obsolete: true
Comment on attachment 8464673 [details] [diff] [review]
convert etld generation to new moz.build mechanism (v2)

Review of attachment 8464673 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/dns/moz.build
@@ +61,5 @@
>      'etld_data.inc',
>  ]
>  
> +GENERATED_FILES['etld_data.inc'].script = 'prepare_tlds.py'
> +GENERATED_FILES['etld_data.inc'].inputs = ['effective_tld_names.dat']

Idea: Have GENERATED_FILES be a defaultdict. That would get rid of some boilerplate. We could easily check for empty entries post-evaluation and raise.
I'm not sure how that helps? I realize I could rewrite those lines like:
g = GENERATED_FILES['etld_data.inc']
g.script = ...
g.inputs = ...

I also considered adding a .update method to the Flags class so I could write:
GENERATED_FILES['etld_data.inc'].update(script=..., inputs=...)
Depends on: 1050922
This patch is mostly useful for being able to see these changes
independently of the major changes to GENERATED_FILES.  We are going to
need proper moz.build objects for GENERATED_FILES when we add the
ability to define scripts and arguments for them, so we might as well do
that first.
Attachment #8537384 - Flags: review?(mshal)
Now that we have proper moz.build objects for GENERATED_FILES, we can
add 'script' flags and 'args' flags in moz.build for select
GENERATED_FILES.  We restrict 'args' to being filenames for ease of
implementing checks for file existence, and many (all?) of the examples
of file generation throughout the tree don't need arbitrary strings or
Python data.

I chose to do things differently than Ted's write-out-a-json-file approach for
several reasons:

- Fewer files being generated by build backends is a good thing.

- I didn't see much evidence that build scripts throughout the tree needed
  arbitrary arguments, so I decided to enforce filenames only.

- Even with only requiring filenames, I didn't implement Ted's "pass in
  already-opened filehandles for input files" because we have several scripts
  that get invoked like:

    script.py input-data1 > output1
    script.py input-data2 > output2

  and I didn't see a good way to make that work with Ted's approach.

We will see how well these choices hold up.
Attachment #8537387 - Flags: review?(mshal)
As a proof-of-usefulness, let's change the build process for
etld_data.inc to write out the rules generating it automatically through
moz.build.
Attachment #8537388 - Flags: review?(mshal)
(In reply to Nathan Froyd (:froydnj) from comment #26)
> Created attachment 8537387 [details] [diff] [review]
> - I didn't see much evidence that build scripts throughout the tree needed
>   arbitrary arguments, so I decided to enforce filenames only.

I kind of stalled out on this because of trying to figure out requirements for what's already in the tree. I will note that one of the in-process patches I had was converting instances of manual Preprocessor.py invocation, where the same input file was being preprocessed with different preprocessor args into different output files:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/0ca1814a9f1b/generated-files

> 
> - Even with only requiring filenames, I didn't implement Ted's "pass in
>   already-opened filehandles for input files" because we have several scripts
>   that get invoked like:
> 
>     script.py input-data1 > output1
>     script.py input-data2 > output2
> 
>   and I didn't see a good way to make that work with Ted's approach.

My plan was to simply fix the scripts as we did this, you can see from my etld_data.inc changes that it wasn't terribly invasive for that script (not sure about other scripts):
http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/0ca1814a9f1b/generated-file-usage-2#l62

I'm not incredibly attached to these choices though, I was just trying to build something that would have minimal footguns for developers trying to use it. If you're going to finish this off you should do whatever you think is expedient. :)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28)
> (In reply to Nathan Froyd (:froydnj) from comment #26)
> > Created attachment 8537387 [details] [diff] [review]
> > - I didn't see much evidence that build scripts throughout the tree needed
> >   arbitrary arguments, so I decided to enforce filenames only.
> 
> I kind of stalled out on this because of trying to figure out requirements
> for what's already in the tree. I will note that one of the in-process
> patches I had was converting instances of manual Preprocessor.py invocation,
> where the same input file was being preprocessed with different preprocessor
> args into different output files:
> http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/0ca1814a9f1b/
> generated-files

Ah, bleh.  Maybe we should say we can do something like:

gfile.script = 'whatever.py'
gfile.options = ['-D' 'foo']
gfile.inputs = ['file1', 'file2']

?

> > - Even with only requiring filenames, I didn't implement Ted's "pass in
> >   already-opened filehandles for input files" because we have several scripts
> >   that get invoked like:
> > 
> >     script.py input-data1 > output1
> >     script.py input-data2 > output2
> > 
> >   and I didn't see a good way to make that work with Ted's approach.
> 
> My plan was to simply fix the scripts as we did this, you can see from my
> etld_data.inc changes that it wasn't terribly invasive for that script (not
> sure about other scripts):
> http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/0ca1814a9f1b/
> generated-file-usage-2#l62

I was mostly thinking about something like:

http://mxr.mozilla.org/mozilla-central/source/security/apps/Makefile.in#5
(In reply to Nathan Froyd (:froydnj) from comment #26)
> Created attachment 8537387 [details] [diff] [review]
> part 2 - add support for defining generating scripts for GENERATED_FILES
> 
> Now that we have proper moz.build objects for GENERATED_FILES, we can
> add 'script' flags and 'args' flags in moz.build for select
> GENERATED_FILES.  We restrict 'args' to being filenames for ease of
> implementing checks for file existence, and many (all?) of the examples
> of file generation throughout the tree don't need arbitrary strings or
> Python data.

I don't really understand restricting 'args' to being filenames.  Is the idea that you're going to add the args as pre-requisite Make targets?  I think this is useful but not sufficient.  We don't have a way to define "high level" dependencies in moz.build right now, but it seems clear to me that this is not tenable over time, and we shouldn't paper over this forever.

One place that I currently use "non filename args" is in mobile/android/base/locales/Makefile.in, where we run custom Python scripts to generate l10n resources.  I expect that as I we move l10n forward we'll have lots of similar Python scripts that care about AB_CD.  How would such cases fit into this proposal?

(It's possible we'll skew hard the other way and remove l10n from the regular Make process entirely, and therefore run no scripts that care about AB_CD, in which case this is a nice forcing function.)
(In reply to Nick Alexander :nalexander from comment #30)
> I don't really understand restricting 'args' to being filenames.  Is the
> idea that you're going to add the args as pre-requisite Make targets?  I
> think this is useful but not sufficient.  We don't have a way to define
> "high level" dependencies in moz.build right now, but it seems clear to me
> that this is not tenable over time, and we shouldn't paper over this forever.
>
> One place that I currently use "non filename args" is in
> mobile/android/base/locales/Makefile.in, where we run custom Python scripts
> to generate l10n resources.  I expect that as I we move l10n forward we'll
> have lots of similar Python scripts that care about AB_CD.  How would such
> cases fit into this proposal?

I was mostly attempting to ensure that I could easily tell what things are filenames, so that we could check for existence of those files at moz.build execution time and complain, something that's already found stupid mistakes while I was converting etld_data.inc.  What's in the patch is The Simplest Thing That Could Possibly Work(tm).

But as you and Ted have pointed out, if we really want this to be workable for all the scripts in the tree, we're going to need to figure out some way to tack on arguments.  (I think I'm going to ignore the AB_CD stuff in **/locales/Makefile.in, as I don't think there's a good way to represent that in moz.build short of stuffing "$(AB_CD)" into moz.build files, and letting that pass through the recursive make backend unmodified, which is a hack.  Ideally glandium will solve those issues in Q1.)

Maybe we can say that, pace comment 29, scripts can have "options" and "inputs"; inputs must be filenames, but options are permitted to be freeform things, and scripts are always invoked:

  script.py $(options) $(inputs)

?  I realize that means that people might define their input filenames as options, and lose out on extra checking, but maybe that's not so bad...?
I guess I am just re-proposing Ted's scheme, aren't I?
(In reply to Nathan Froyd (:froydnj) from comment #31)
> Maybe we can say that, pace comment 29, scripts can have "options" and
> "inputs"; inputs must be filenames, but options are permitted to be freeform
> things, and scripts are always invoked:
> 
>   script.py $(options) $(inputs)

http://dxr.mozilla.org/comm-central/source/mail/app/Makefile.in#62
write-message.ico: $(mailtoolbar) $(srcdir)/png2ico.py
  $(PYTHON) $(srcdir)/png2ico.py $(mailtoolbar) 19 1 16 write-message.ico
(after undoing some substitutions)
Any reference to AB_CD is neck deep in l10n repacks and is dependent on glandium's work to rewrite l10n repacks. It is best to avoid anything dealing with AB_CD or jar.mn files.
(In reply to Joshua Cranmer [:jcranmer] from comment #33)
> (In reply to Nathan Froyd (:froydnj) from comment #31)
> > Maybe we can say that, pace comment 29, scripts can have "options" and
> > "inputs"; inputs must be filenames, but options are permitted to be freeform
> > things, and scripts are always invoked:
> > 
> >   script.py $(options) $(inputs)
> 
> http://dxr.mozilla.org/comm-central/source/mail/app/Makefile.in#62
> write-message.ico: $(mailtoolbar) $(srcdir)/png2ico.py
>   $(PYTHON) $(srcdir)/png2ico.py $(mailtoolbar) 19 1 16 write-message.ico
> (after undoing some substitutions)

mailtoolbar = ...
GENERATED_FILES += ['write-message.ico']
ico = GENERATED_FILES['write-message.ico']
ico.script = 'png2ico.py'
ico.options = ['19', '1', '16']
ico.inputs = [mailtoolbar]

?

I mean, we'd have to rewrite the script a little bit for the different argument order, but presumably we'd have to do that anyway for getting the output file right.

Actually, that raises another interesting point: write-message.ico needs to be written in binary mode...but we don't want to open every output file in binary mode...do we?  Maybe we need a .binary flag on GENERATED_FILES things...
> but we don't want to open every output file in binary mode...do we?

Actually, do we actually want to open most output files in text mode? All that achieves is having files with automatic \r\n on Windows, which is pretty much pointless.
Comment on attachment 8537384 [details] [diff] [review]
part 1 - make GENERATED_FILES emit proper moz.build objects

Moving these over to gps, as mshal is busy and going on PTO.
Attachment #8537384 - Flags: review?(mshal) → review?(gps)
Attachment #8537387 - Flags: review?(mshal) → review?(gps)
Attachment #8537388 - Flags: review?(mshal) → review?(gps)
Attachment #8537384 - Flags: review?(gps) → review+
Comment on attachment 8537387 [details] [diff] [review]
part 2 - add support for defining generating scripts for GENERATED_FILES

Review of attachment 8537387 [details] [diff] [review]:
-----------------------------------------------------------------

This gets my blessing. We could bikeshed on missing features, etc. But something is better than nothing. IMO having the full definition in moz.build is better than having Makefile.in files stick around.

Regarding using Python scripts for doing the generation, my utopian end vision is that we not invoke things as Python scripts. Instead, everything is a module and can be imported from the virtualenv. I would favor a solution that treated generation code as modules, not scripts. But we are so far from this ideal vision that it probably isn't worth worrying about now. As long as we have a migration path for converting things to modules, I'm happy.

BTW, you should install a Python linter into your version control setup. The "mozext" extension from https://hg.mozilla.org/hgcustom/version-control-tools/file/default/hgext/mozext will do this (assuming you use Mercurial). flake8 has some Git hooks that you should consider installing globally.

::: python/mozbuild/mozbuild/action/file_generate.py
@@ +3,5 @@
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +# Given a Python script and arguments describing the output file, and
> +# the arguments that can be used to generate the output file, call the
> +# script's |main| method appropriate arguments.

"with" appropriate arguments.

@@ +26,5 @@
> +    args = parser.parse_args(argv)
> +
> +    script = args.python_script
> +    sys.path.append(os.path.dirname(script))
> +    module = __import__(os.path.splitext(os.path.basename(script))[0])

sys.path hackery should not be needed (in the ideal world). But for this short-lived script, it is probably OK.

You probably want imp.load_module(). I won't hold back r+ on this, however.

@@ +30,5 @@
> +    module = __import__(os.path.splitext(os.path.basename(script))[0])
> +    if not hasattr(module, 'main'):
> +        print('Error: script "{0}" is missing a main method'.format(script),
> +              file=sys.stderr)
> +        return 1

I'm not a huge fan of requiring a "main" function in the script. The |if __name__ == '__main__'| hack is standard practice, as ugly as that may be.

I'd be a fan of changing the definition of the generation method to path/to/file.py:method or package.module.method. I reckon this could be a follow-up bug.

@@ +38,5 @@
> +        with FileAvoidWrite(args.output_file) as output:
> +            ret = module.main(output, *args.additional_arguments)
> +    except IOError as e:
> +        print('Error opening file "{0}"'.format(e.filename), file=sys.stderr)
> +        import traceback

Just import traceback at the file level.

@@ +41,5 @@
> +        print('Error opening file "{0}"'.format(e.filename), file=sys.stderr)
> +        import traceback
> +        traceback.print_exc()
> +        return 1
> +    return ret if ret is not None else 0

sys.exit(None) == sys.exit(0). You should just be able to |return ret| here.

::: python/mozbuild/mozbuild/frontend/context.py
@@ +492,5 @@
>          """Generic generated files.
>  
> +        This variable contains a list of files for the build system to
> +        generate at export time. The generation method may be declared
> +        with optional 'script' and 'args' flags on individual entries.

These docs get parsed as RST. Please use ``var`` for quoting, as it gets turned into <code> by Sphinx. e.g. https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/build/buildsystem/mozbuild-symbols.html#test-harness-files

@@ +499,5 @@
> +        the associated Makefile.in.
> +
> +        Example::
> +
> +           GENERATED_FILES += [ 'bar.c', 'baz.c', 'foo.c' ]

Nit: please cuddle braces

::: python/mozbuild/mozbuild/frontend/emitter.py
@@ +516,5 @@
> +                output = f
> +                if flags.script:
> +                    script = mozpath.join(context.srcdir, flags.script)
> +                    args = [mozpath.join(context.srcdir, a) for a in flags.args]
> +                        

Nit: trailing whitespace

@@ +520,5 @@
> +                        
> +                    if not os.path.exists(script):
> +                        raise SandboxValidationError(
> +                            'Script for generating %s does not exist: %s'
> +                            % (f, script), context)

I imagine one day we'll want to define the generation method using package.module.method so we can just import modules and invoke them like any other Python method. I reckon we can invent a .method attribute for these with different processing semantics.

@@ +529,5 @@
> +                    for a in args:
> +                        if not os.path.exists(a):
> +                            raise SandboxValidationError(
> +                                'Input for generating %s does not exist: %s'
> +                                % (f, a), context)

It seems weird to me that arguments must be files. Can you not have e.g. --foo flags to the scripts we invoke?

::: python/mozbuild/mozbuild/test/frontend/test_emitter.py
@@ +198,5 @@
> +        reader = self.reader('generated-files-no-script')
> +        with self.assertRaisesRegexp(SandboxValidationError,
> +            'Script for generating bar.c does not exist'):
> +            objs = self.read_topsrcdir(reader)
> +        

Nit: 3 lines here having trailing whitespace.

@@ +209,5 @@
> +    def test_generated_files_no_python_script(self):
> +        reader = self.reader('generated-files-no-python-script')
> +        with self.assertRaisesRegexp(SandboxValidationError,
> +            'Script for generating bar.c is not a python script'):
> +            objs = self.read_topsrcdir(reader)

Thank you for writing these tests!
Attachment #8537387 - Flags: review?(gps) → feedback+
Attachment #8537388 - Flags: review?(gps) → review+
(In reply to Gregory Szorc [:gps] from comment #38)
> @@ +529,5 @@
> > +                    for a in args:
> > +                        if not os.path.exists(a):
> > +                            raise SandboxValidationError(
> > +                                'Input for generating %s does not exist: %s'
> > +                                % (f, a), context)
> 
> It seems weird to me that arguments must be files. Can you not have e.g.
> --foo flags to the scripts we invoke?

Yeah, this is comment 28 and following.  As in comment 31, I'd like to make sure that we have some way of differentiating filename arguments from non-filename arguments.  I don't have much to add to the discussion that's not already been said.  The proposed schemes, AFAICT, boil down to:

- Have some split between "options" and "inputs"/"args", where things in the latter category are guaranteed to be filenames.  This means that non-filename, non-optional arguments need to be placed in "options", which is a little weird (comment 35).  This also opens the possibility of people putting files in the "options" category, and losing the static checking this scheme was designed to bring.
- Just stuff everything together and lose the static check that script file inputs appear in the source tree.

Ideally, there would be something better...
Flags: needinfo?(gps)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #39)
> (In reply to Gregory Szorc [:gps] from comment #38)
> > @@ +529,5 @@
> > > +                    for a in args:
> > > +                        if not os.path.exists(a):
> > > +                            raise SandboxValidationError(
> > > +                                'Input for generating %s does not exist: %s'
> > > +                                % (f, a), context)
> > 
> > It seems weird to me that arguments must be files. Can you not have e.g.
> > --foo flags to the scripts we invoke?
> 
> Yeah, this is comment 28 and following.  As in comment 31, I'd like to make
> sure that we have some way of differentiating filename arguments from
> non-filename arguments.  I don't have much to add to the discussion that's
> not already been said.  The proposed schemes, AFAICT, boil down to:
> 
> - Have some split between "options" and "inputs"/"args", where things in the
> latter category are guaranteed to be filenames.  This means that
> non-filename, non-optional arguments need to be placed in "options", which
> is a little weird (comment 35).  This also opens the possibility of people
> putting files in the "options" category, and losing the static checking this
> scheme was designed to bring.

There's options, arguments, parameters, inputs, ... enough to confuse no matter which semantic distinction we draw.  For some of the Android stuff, I broke down and annotated certain compound data types with "extra_recursive_make_targets" to be explicit about (Make-based) dependency tracking.

> - Just stuff everything together and lose the static check that script file
> inputs appear in the source tree.

I anticipate pipelining, meaning feeding generated files into code generating functions, so while this has value it probably needs to be more than "static in source tree".
(In reply to Nick Alexander :nalexander from comment #40)
> > - Have some split between "options" and "inputs"/"args", where things in the
> > latter category are guaranteed to be filenames.  This means that
> > non-filename, non-optional arguments need to be placed in "options", which
> > is a little weird (comment 35).  This also opens the possibility of people
> > putting files in the "options" category, and losing the static checking this
> > scheme was designed to bring.
> 
> There's options, arguments, parameters, inputs, ... enough to confuse no
> matter which semantic distinction we draw.  For some of the Android stuff, I
> broke down and annotated certain compound data types with
> "extra_recursive_make_targets" to be explicit about (Make-based) dependency
> tracking.

Yeah, requiring annotations in moz.build files is an option too.

> > - Just stuff everything together and lose the static check that script file
> > inputs appear in the source tree.
> 
> I anticipate pipelining, meaning feeding generated files into code
> generating functions, so while this has value it probably needs to be more
> than "static in source tree".

Do you have an example of this sort of thing?
> > I anticipate pipelining, meaning feeding generated files into code
> > generating functions, so while this has value it probably needs to be more
> > than "static in source tree".
> 
> Do you have an example of this sort of thing?

Not many, but Android builds do more generation than vanilla builds.  For example, at a high level, we preprocess strings.xml, which is an input to aapt, which generates R.java, which gets compiled into gecko.ap_, which gets packaged into fennec.apk.

A smaller, more concrete example is: in the past, we used to preprocess a "fragment" file into $OBJDIR, and then #include that fragment into AndroidManifest.xml (again using the preprocessor).  I got rid of that :)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #39)
> (In reply to Gregory Szorc [:gps] from comment #38)
> > @@ +529,5 @@
> > > +                    for a in args:
> > > +                        if not os.path.exists(a):
> > > +                            raise SandboxValidationError(
> > > +                                'Input for generating %s does not exist: %s'
> > > +                                % (f, a), context)
> > 
> > It seems weird to me that arguments must be files. Can you not have e.g.
> > --foo flags to the scripts we invoke?
> 
> Yeah, this is comment 28 and following.  As in comment 31, I'd like to make
> sure that we have some way of differentiating filename arguments from
> non-filename arguments.  I don't have much to add to the discussion that's
> not already been said.  The proposed schemes, AFAICT, boil down to:
> 
> - Have some split between "options" and "inputs"/"args", where things in the
> latter category are guaranteed to be filenames.  This means that
> non-filename, non-optional arguments need to be placed in "options", which
> is a little weird (comment 35).  This also opens the possibility of people
> putting files in the "options" category, and losing the static checking this
> scheme was designed to bring.
> - Just stuff everything together and lose the static check that script file
> inputs appear in the source tree.

In my original patch series I figured we'd just modify the scripts as we switched them to this feature. I think trying to support arbitrary commandlines on scripts is just going to be a headache and you should force a specific commandline on them. My patches went farther and imported the script as a module and called a method with a specific set of arguments, but that may not be necessary.
I think this version addresses review comments.

I've kept the must-be-filename restriction on 'args', and not provided any
extra support for --options or similar.  I looked at all the cases in tree
where we invoke $(PYTHON) directly; ~90% of the current use cases are OK with
this restriction.

For those that aren't (e.g. mobile/android/base/fennec-ids-generator.py), I'd
argue that the support for --options is really a case of overgeneralization.
Most of these scripts are written to generate one specific set of files; it is
premature generalization to dress them up with lots of options.  We ought to be
able to do away with the -o/--options in the conversion to moz.build
GENERATED_FILES.  (I admit that -o/--options can make reading things slightly
nicer; I think this could be addressed by generating things from config.status,
see below.)

For those things that actually need --options (I think
toolkit/xre/make-platformini.py falls into this category), I think they can be
addressed with the:

GENERATED_FILES['foo'].script = 'bar.py:function'

syntax that gps suggested in comment 38.  |function| can then invoke whatever
other functions it needs to with the appropriate flags toggled.  Once we do
that, we could have many GENERATED_FILES be built during config.status runs.
Since we can run everything from Python at that point, we don't have to go to
the shell, and we could consider something like:

GENERATED_FILES['foo'].args = { 'named_arg': ..., 'other_arg': ... }

and let Python keyword arguments do some heavy lifting.

There are still some things that don't fall nicely into the GENERATED_FILES
paradigm (addon-sdk/'s cfx script, and config/'s system/stl header wrapping).
I'm punting on those for now.  Uses of $(PYTHON) in */installer/* or
*/locales/* directories don't seem to fit into GENERATED_FILES-type things, so
I am ignoring those.  I'm also punting on uses of
$(call py_action,preprocessor,...), such as those in browser/themes/*.  I think
it might be feasible to do something like:

GENERATED_FILES['foo'].preprocess = True
GENERATED_FILES['foo'].defines = ...

to take care of preprocessor-requiring cases, since preprocessed files are
relatively common.  Failing that, we can write wrapper Python scripts that
invoke the preprocessor.

nalexander said to me in #build, "Honestly, landing something is better than
nothing".  Not that nalexander would necessarily agree with all the things
here, but I think it's good enough to be refined as we convert things.
Attachment #8537387 - Attachment is obsolete: true
Attachment #8559316 - Flags: review?(gps)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #44)
> For those that aren't (e.g. mobile/android/base/fennec-ids-generator.py), I'd
> argue that the support for --options is really a case of overgeneralization.
> Most of these scripts are written to generate one specific set of files; it
> is
> premature generalization to dress them up with lots of options.  We ought to
> be
> able to do away with the -o/--options in the conversion to moz.build
> GENERATED_FILES.  (I admit that -o/--options can make reading things slightly
> nicer; I think this could be addressed by generating things from
> config.status,
> see below.)

+1 million. We have a tendency to write overly-general tools for things. YAGNI, let's make things do what they need to, and we can add needed complexity *if* we need it down the road.
Assignee: ted → nfroyd
Attachment #8464673 - Attachment is obsolete: true
Attachment #8465489 - Attachment is obsolete: true
gps only needs one thing to respond to in this bug.
Flags: needinfo?(gps)
Comment on attachment 8559316 [details] [diff] [review]
part 2 - add support for defining generating scripts for GENERATED_FILES

Review of attachment 8559316 [details] [diff] [review]:
-----------------------------------------------------------------

I have a terminology nit. But that horse has been beaten to death. We can always mass rewrite later if you don't feel like changing it now.

::: python/mozbuild/mozbuild/action/file_generate.py
@@ +42,5 @@
> +            ret = module.main(output, *args.additional_arguments)
> +    except IOError as e:
> +        print('Error opening file "{0}"'.format(e.filename), file=sys.stderr)
> +        traceback.print_exc()
> +        return 1

You could probably re-raise and have similar end result. (This is a nit - this code is fine.)

::: python/mozbuild/mozbuild/frontend/context.py
@@ +502,5 @@
> +
> +           GENERATED_FILES += ['bar.c', 'baz.c', 'foo.c']
> +           bar = GENERATED_FILES['bar.c']
> +           bar.script = 'generate.py'
> +           bar.args = [ 'datafile-for-bar' ]

"args" feels weird because the srcdir path gets expanded in emitter. Rename to "inputs"?

::: python/mozbuild/mozbuild/frontend/emitter.py
@@ +532,5 @@
> +                            % (f, script), context)
> +                    if os.path.splitext(script)[1] != '.py':
> +                        raise SandboxValidationError(
> +                            'Script for generating %s is not a python script: %s'
> +                            % (f, script), context)

Files without ".py" but with python in the #! are valid python scripts.

Please change the error message to indicate that scripts must end in ".py"
Attachment #8559316 - Flags: review?(gps) → review+
https://hg.mozilla.org/mozilla-central/rev/5e837e179aab
https://hg.mozilla.org/mozilla-central/rev/21b8e8473781
https://hg.mozilla.org/mozilla-central/rev/81256bde8592
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Nathan: thanks for finishing this!
Depends on: 1135043
Blocks: 1135046
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: