Enforce good #include hygiene in SpiderMonkey

RESOLVED FIXED in mozilla25

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

unspecified
mozilla25
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [js:t])

Attachments

(1 attachment, 5 obsolete attachments)

We should write a script that analyzes SpiderMonkey's #include statements to make sure we don't do anything bad.  Examples:

- No recursive dependencies (see jorendorff's script in bug 872416 for this).

- No normal header should #include a inlines.h/-inl.h file.

- We could even check the #include ordering if we want to be pedantic.  That's more of an aesthetic thing, whereas the previous two items can cause actual problems.

We need to fix the existing problems first, of course.
Depends on: 634839
Other things:

- Check there is:
  (a) a #ifndef wrapper in all .h files
  (b) that it's at the top of the file
  (c) that it's in the vanilla form that GCC and clang can avoid multiple-inclusion (see bug 881579)
  (d) that the name used is appropriate for the filename

Each of these four points are currently violated by some of our headers.
(copied from bug 881577)

Driveby comment, but if we're going to do this (I'm certainly not opposed!), we should use include guards don't violate C++ naming rules.  Most particularly they shouldn't start with "_", and they shouldn't include sequential underscores.  A whole bunch of our existing include guards violate these rules right now, because copypasta for one, and because who ever has time to clean up that sort of junk?

It might also be worth standardizing include-guard naming at the same time.  I have no feelings on the format, except that it seems worth poking the Gecko people to pick one scheme that everyone, everywhere, could use.
I sent an email to the JS internals list about standardizing the guard names.

None of our current guards start with underscore, but all of them end with two or three underscores.  (I want to standardize on two.)  Is that a problem?
Two or three underscores is a problem, because C++ reserves all identifiers containing more than one underscore in a row, as was noted in that thread.
As for Gecko, we use the way the header is supposed to be included, with punctuation replaced by underscores, and no trailing underscores (nsINode_h, mozilla_dom_Element_h).
> As for Gecko, we use the way the header is supposed to be included, with
> punctuation replaced by underscores, and no trailing underscores (nsINode_h,
> mozilla_dom_Element_h).

When I looked at Gecko that was one of about 11 different styles in use...
I should have said: "in new code, when someone is looking"
Another thing to check:  #includes should have full paths, e.g. "ion/Ion.h", not "Ion.h".
Depends on: 883697
Depends on: 883696
Depends on: 886205
Depends on: 887558
Depends on: 888083
Depends on: 888088
This is a first cut at a header/#include checker script.  It does some of the
checking, including cycle checking.  Comments in the script explain.  It also
has some self-tests, which is good.
Attachment #768797 - Flags: review?(jorendorff)
I realized that this script can be generalized for non-header-related style
checks of Spidermonkey code, so I renamed various things.  I also made the
cycle printing deterministic, and made the error message easier to read.

gps, I'm r?ing you as a build peer.  This patch adds a script that checks
various aspects of SpiderMonkey code style, which is run on |make check|.  I
don't think there's anything objectionable here but it's a non-trivial build
system change so I thought I'd get your input.
Attachment #769536 - Flags: review?(jorendorff)
Attachment #769536 - Flags: review?(gps)
Comment on attachment 769536 [details] [diff] [review]
Introduce check_spidermonkey_style.py, which currently checks SpiderMonkey header and #include hygiene, and some tests for it.  code=njn,jorendorff.

Review of attachment 769536 [details] [diff] [review]:
-----------------------------------------------------------------

::: config/check_spidermonkey_style.py
@@ +1,1 @@
> +# vim: set ts=8 sts=4 et sw=4 tw=99:

pep8 says 80 columns

@@ +9,5 @@
> +#----------------------------------------------------------------------------
> +# Check the following things in headers.
> +#
> +# - No cyclic dependencies.
> +# 

Please drop the trailing whitespace

::: js/src/Makefile.in
@@ +329,5 @@
> +	    echo "TES""T-P""ASS | check_spidermonkey_style.py | ok"; \
> +	    exitcode=0; \
> +	fi; \
> +	rm -f $@.tmp $@.tmp2; \
> +	exit $$exitcode

This part should be in python as well, imo.
Comment on attachment 769536 [details] [diff] [review]
Introduce check_spidermonkey_style.py, which currently checks SpiderMonkey header and #include hygiene, and some tests for it.  code=njn,jorendorff.

Review of attachment 769536 [details] [diff] [review]:
-----------------------------------------------------------------

Man, all this cool work happening in SpiderMonkey land - when does Gecko get love? ;)

Serious question now: since this falls under the realm of static analysis, should we implement this on top of the existing static analysis builder (bug 880434)? Unfortunately, that may mean only running where Clang is used (notably not Windows). Of course, we could always change the static analysis builder configuration to invoke more functionality (this patch) and run everywhere.

Another high-level concern: I think people want feedback as early as possible during a build (e.g. during the actual build, not |make check|). It would be rad if we could incorporate static analysis into build proper so failures are caught at compile time, not during (frequently optional) post analysis. But, I think this is mostly follow-up fodder.

Not granting r+ because of assumed Git compatibility issue and because you are coding a script in a Makefile.in rule. I could be convinced to let the latter slide on the grounds that I am a fan of static analysis and am willing to bend to see it employed more at Mozilla.

::: config/check_spidermonkey_style.py
@@ +101,5 @@
> +    CPP   = 2
> +    INL_H = 3
> +    H     = 4
> +    TBL   = 5
> +    MSG   = 6

Serious Python programmers frown upon vertical alignment like this. I think it's technically against PEP-8. I like flake8 (https://bitbucket.org/tarek/flake8/wiki/Home) for critiquing Python style.

@@ +126,5 @@
> +
> +        test_failure(filename, 'unknown file kind')
> +
> +
> +def main():

Holy main() function, Batman! Any chance you could split things into a few functions and make main less monolithic? I don't want you to architect astronaut, but I suspect others may want to extend this in the future to run static analysis on other parts of the source tree. It would be nice if there were some configuration.

@@ +128,5 @@
> +
> +
> +def main():
> +    root_dir = subprocess.check_output(['hg', 'root'], universal_newlines=True).strip()
> +    all_filenames = subprocess.check_output(['hg', 'manifest'], universal_newlines=True).split('\n')

The Git people aren't going to like this. They are going to like it even less if it breaks |make check|. I /think/ the b2g builds might even be pulling from Git. If that's the case, failing to handle Git constitutes r-.

I've long wanted to add a Python API to the tree to facilitate VCS lookups. In hindsight, I wish I could point you at that :/

Also, I guess since that this is limited to SpiderMonkey, you have the luxury of not having to worry about auto-generated files (e.g. xpidl-derived .h files). I predict there will come a day when this script needs to interface with the objdir in some form. I do not look forward to that day.

@@ +181,5 @@
> +
> +            with open(os.path.join(root_dir, filename)) as f:
> +                linenum = 0
> +                for line in f:
> +                    linenum += 1

for linenum, line in enumerate(f):

@@ +182,5 @@
> +            with open(os.path.join(root_dir, filename)) as f:
> +                linenum = 0
> +                for line in f:
> +                    linenum += 1
> +                    m = re.match(r'\s*#\s*include\s+"([^"]*)"', line)

... and I stopped reviewing this file here. It must have been the regular expression :)

::: js/src/Makefile.in
@@ +329,5 @@
> +	    echo "TES""T-P""ASS | check_spidermonkey_style.py | ok"; \
> +	    exitcode=0; \
> +	fi; \
> +	rm -f $@.tmp $@.tmp2; \
> +	exit $$exitcode

I echo Ms2ger's sentiments that this should be in Python. We are shying away from this script-like targets in make files and opting to move actions into mach commands (which are really just specially-formulated Python scripts). The latter is more powerful and easily discoverable. Although, mach isn't part of the build currently. So, you'll probably want to port this logic to a standalone .py file and have the check:: target below simply invoke $PYTHON blah.py.
Attachment #769536 - Flags: review?(gps) → feedback+
Joshua may have comments on integrating this with the new static analysis builder he's been championing.
Thanks for suggesting flake8, Gregory - I'll apply that to the script I'm working on for bug 888088.

I'm not looking forward to comparing this script with what I'm cobbling together for that bug, by the way. I'm hoping my script will make things nicer, but considering how new I am to Python..
> Man, all this cool work happening in SpiderMonkey land - when does Gecko get
> love? ;)

When the Gecko style guide specifies some detailed rules about #include usage, and somebody takes the time to fix the existing code to follow them!

> Serious question now: since this falls under the realm of static analysis,
> should we implement this on top of the existing static analysis builder (bug
> 880434)? Unfortunately, that may mean only running where Clang is used
> (notably not Windows). Of course, we could always change the static analysis
> builder configuration to invoke more functionality (this patch) and run
> everywhere.

At first I thought you were suggesting that I rewrite this as a clang plugin, but now I think you're not suggesting that.

> Another high-level concern: I think people want feedback as early as
> possible during a build (e.g. during the actual build, not |make check|). It
> would be rad if we could incorporate static analysis into build proper so
> failures are caught at compile time, not during (frequently optional) post
> analysis. But, I think this is mostly follow-up fodder.

I'd prefer all the above suggestions to be follow-ups.  The current patch works on all platforms and even though it's a check-time thing it's easy to invoke:  |make check-style| (which is invoked by |make check|).  I've gone to non-trivial effort to get SpiderMonkey's and #includes into a good state, with some real benefits (see bug 886205 comment 29) and I'd like to get this working in some form ASAP to prevent backsliding.

> Not granting r+ because of assumed Git compatibility issue and because you
> are coding a script in a Makefile.in rule. I could be convinced to let the
> latter slide on the grounds that I am a fan of static analysis and am
> willing to bend to see it employed more at Mozilla.

I can move the make rule into a script, that's easy.  As for the git compatibility, what do you suggest?  I guess I can just do a braindead directory traversal, though that has the disadvantage that if any files get put into the srcdir that aren't in the repo, they'll be considered too.

> Also, I guess since that this is limited to SpiderMonkey, you have the
> luxury of not having to worry about auto-generated files (e.g. xpidl-derived
> .h files). I predict there will come a day when this script needs to
> interface with the objdir in some form. I do not look forward to that day.

SpiderMonkey has a few generated files, but I just ignore them, which makes a certain amount of sense the style of a generated file is arguably less important than the style of a handwritten file.  See |included_inclnames_to_ignore|.
(In reply to Nicholas Nethercote [:njn] from comment #15)
> > Serious question now: since this falls under the realm of static analysis,
> > should we implement this on top of the existing static analysis builder (bug
> > 880434)? Unfortunately, that may mean only running where Clang is used
> > (notably not Windows). Of course, we could always change the static analysis
> > builder configuration to invoke more functionality (this patch) and run
> > everywhere.
> 
> At first I thought you were suggesting that I rewrite this as a clang
> plugin, but now I think you're not suggesting that.

It's not clear to me if gps is requesting a clang plugin or something else. Clang plugins only work on Linux non-cross-compiled builds right now (because the build system for doing this sort of stuff is a piece of crap, and I don't have a mac to test Mac builds on), and the builds are hidden on tbpl until we get better try support (bug 887641 and bug 887642).

Now, that said, I personally tend to disfavor ad-hoc C++ parsing (even if it is just lexing), and this patch illustrates one of the reasons why: we so muck around the compiler path that you're ending up hard-coding special knowledge of circumstances. This may work fine for SpiderMonkey, but it will be an unmaintainable mess in Gecko. I don't recall much about the include situation, but looking for files in dist/include, objdir, srcdir, and LOCAL_INCLUDES should solve almost all the include issues, at the cost of needing to track files between the objdir and the srcdir. I think work would be better spent putting this on the clang infrastructure than building increasingly hacky mockups of the same thing in Python.

Looking at your requirements, none of them are difficult with clang's infrastructure:
> - No recursive dependencies (see jorendorff's script in bug 872416 for this).

This is the hardest to do in clang, but LLVM comes with an SCC finder builtin.

> - No normal header should #include a inlines.h/-inl.h file.
> - We could even check the #include ordering if we want to be pedantic.

Some basic ad-hoc checks.

>   (a) a #ifndef wrapper in all .h files
>   (b) that it's at the top of the file
>   (c) that it's in the vanilla form that GCC and clang can avoid
> multiple-inclusion (see bug 881579)

These three actually basically come for free: check that HeaderFileInfo::getControllingMacro is not null...

>   (d) that the name used is appropriate for the filename
... and this ain't much harder.
Requiring this as a clang plugin will kill this patch and bug.  Let's not let perfect be the enemy of good here.  If someone wants to reimplement this as a clang plugin later that's fine by me.
Changes in this version.

- It should now work with Git, though I haven't tried it.
  - It now tries both |hg manifest| and |git ls-files| to get the list of all
    repository files, and bails if neither works.
  - I hard-coded the repo root dir as '../../' instead of using |hg root|.

- I moved the logic in the |check-style| rule, and the expected output, into
  check_spidermonkey_style.py.

- I made main() less monolithic.

I still think having it run in |make check| (on all platforms) is the best
option.  Making this only part of the static analysis builds means it would
only run once per day on the test machines -- is that right?
Attachment #771537 - Flags: review?(jorendorff)
Attachment #771537 - Flags: review?(gps)
Attachment #769536 - Attachment is obsolete: true
Attachment #769536 - Flags: review?(jorendorff)
(In reply to Nicholas Nethercote [:njn] from comment #18)
> I still think having it run in |make check| (on all platforms) is the best
> option.  Making this only part of the static analysis builds means it would
> only run once per day on the test machines -- is that right?

Static analysis runs on every checkin these days, except try (for arcane reasons).
Attachment #771537 - Attachment is obsolete: true
Attachment #771537 - Flags: review?(jorendorff)
Attachment #771537 - Flags: review?(gps)
I found a bug where I was treating inline headers differently from vanilla
headers in cycle detection.  This version fixes that and modifies the cycle
detection tests so some inline headers are involved.
Attachment #771890 - Flags: review?(jorendorff)
Attachment #771890 - Flags: review?(gps)
Attachment #771886 - Attachment is obsolete: true
Attachment #771886 - Flags: review?(jorendorff)
Attachment #771886 - Flags: review?(gps)
Comment on attachment 771890 [details] [diff] [review]
Introduce check_spidermonkey_style.py, which currently checks SpiderMonkey header and #include hygiene, and some tests for it.  code=njn,jorendorff.

Review of attachment 771890 [details] [diff] [review]:
-----------------------------------------------------------------

r+ conditional on fixed bareword excepts.

As for the rest, perfect is the enemy of good. I wish we didn't reinvent parsing. I kinda wished it integrated with other systems better. But, you have to start somewhere. Something is better than nothing.

::: config/check_spidermonkey_style.py
@@ +149,5 @@
> +        out('    ' + line)
> +    out('')
> +
> +
> +class FileKind:

class FileKind(object):

Omitting the "(object)" makes it an old style Python class. You don't want to do that.

@@ +189,5 @@
> +            all_filenames = subprocess.check_output(cmd, universal_newlines=True,
> +                                                    stderr=subprocess.PIPE).split('\n')
> +            return all_filenames
> +        except:
> +            continue

Python rule: never use bareword except. One reason is it catches *all* exceptions, including KeyboardInterrupt and SystemExit. In this code, if a user hits ctrl+c during |hg manifest| |git ls-files| will be executed (I would expect the script to stop execution).

For this code, I'd catch just subprocess.CalledProcessError (http://docs.python.org/2/library/subprocess.html#subprocess.CalledProcessError).

@@ +257,5 @@
> +                                     fromfile='check_spider_monkey_style.py expected output',
> +                                       tofile='check_spider_monkey_style.py actual output')
> +    for diffline in difflines:
> +        global exit_code
> +        exit_code = 1

Can't you just return a value from the function instead of modifying a global?

@@ +263,5 @@
> +
> +
> +def do_line(filename, inclname, file_kind, linenum, line, all_inclnames, included_h_inclnames):
> +    # Look for a |#include "..."| line.
> +    m = re.match(r'\s*#\s*include\s+"([^"]*)"', line)

I don't believe Python is yet smart enough to cache the compilation of regular expressions at parse or first-use time. So, this will get recompiled on every execution. You typically see module scope assignments of the compiled regular expression. e.g.

RE_LINE_INCLUDE = re.compile('...')
...
RE_LINE_INCLUDE.match(line)

@@ +381,5 @@
> +
> +
> +def main():
> +    try:
> +        check_style()

I wish this returned something instead of relying on global state. A more formal API more easily allows other uses of this script, including as a mach command.

@@ +384,5 @@
> +    try:
> +        check_style()
> +    except:
> +        exctype, value = sys.exc_info()[:2]
> +        print("EXCEPTION:", exctype, value)

The better way to write this is:

except Exception as e:
    exctype = type(e)
    value = e

Or, you may be satisfied with:

except Exception as e:
    print('EXCEPTION: %s' % e)
Attachment #771890 - Flags: review?(gps) → review+
> I don't believe Python is yet smart enough to cache the compilation of
> regular expressions at parse or first-use time. So, this will get recompiled
> on every execution. You typically see module scope assignments of the
> compiled regular expression. e.g.

It is smart enough, according to http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile.  Furthemore, I tried using re.compile() with a module-scope assignment, because I thought it might be more readable, and it actually slowed things down slightly!
> I wish we didn't reinvent
> parsing. I kinda wished it integrated with other systems better.

Agreed!  But, FWIW, I've been running this locally, frequently, while doing some work on streamlining #include statements (bug 891215) to check for #include cycles.  So the ability to run it locally without requiring special infrastructure *is* useful.
(In reply to Nicholas Nethercote [:njn] from comment #23)
> > I don't believe Python is yet smart enough to cache the compilation of
> > regular expressions at parse or first-use time. So, this will get recompiled
> > on every execution. You typically see module scope assignments of the
> > compiled regular expression. e.g.
> 
> It is smart enough, according to
> http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-
> compile.  Furthemore, I tried using re.compile() with a module-scope
> assignment, because I thought it might be more readable, and it actually
> slowed things down slightly!

The cache (2.5-6 era) used to be really small -- just 10 entries, iirc. With the potential of a sharp perf cliff, people would habitually make a manual cache: premature optimization, mostly.
Attachment #768797 - Attachment is obsolete: true
Attachment #768797 - Flags: review?(jorendorff)
jorendorff, can I get an ETA on a review for this patch?
Flags: needinfo?(jorendorff)
Blocks: 898274
> r+ conditional on fixed bareword excepts.

I just removed that try/catch block altogether.  IIRC it was needed to catch any exceptions within the script itself (e.g. syntax error).  But now that I don't have the complicated make target, that's no longer needed.

> > +def main():
> > +    try:
> > +        check_style()
> 
> I wish this returned something instead of relying on global state. A more
> formal API more easily allows other uses of this script, including as a mach
> command.

check_style() now returns a boolean indicating if the test passed.
New version, addresses gps' comments.

jorendorff, helloooooooooo... can you hear meeeeeeeee...
Attachment #781519 - Flags: review?(jorendorff)
Attachment #771890 - Attachment is obsolete: true
Attachment #771890 - Flags: review?(jorendorff)
I've waited long enough.  On Monday I will land this with r=gps.
Flags: needinfo?(jorendorff)
https://hg.mozilla.org/mozilla-central/rev/eb3e049e7558
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Attachment #781519 - Flags: review?(jorendorff)
You need to log in before you can comment on or make changes to this bug.