Closed
Bug 558723
Opened 14 years ago
Closed 13 years ago
clean up JS #includes
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 634839
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
Attachments
(1 file, 1 obsolete file)
2.47 KB,
patch
|
n.nethercote
:
review-
|
Details | Diff | Splinter Review |
SpiderMonkey's #includes have become a bit of a mess, particularly because of the recent introduction of jsfooinlines.h files, which themselves became necessary due to the conversion of various macros into inline functions. (Nb: This messiness isn't just academic, it has made life difficult for me recently -- when trying to add new inline functions I've had hard-to-decipher warnings about some of them being used-but-undefined, and I'm sick of playing the game of "add a random #include to a file to see if that fixes it", as well as its sibling "try to work out why jsfoo.h gets #included in jsbar.cpp but not in jsbaz.cpp".) I'll start by enumerating the principles that Brendan wants, as discussed on IRC. Once we've agreed on those, I'll start fixing the code to satisfy these rules. - A .h file should be includable by itself, e.g. if you include a.h in a.cpp you shouldn't have to include b.h as well. This is testable by compiling a C++ module that contains a single #include and nothing else. - A corollary of this is that any inline function defined in a .h file cannot depend on any names (variables, functions, even macros) defined in another .h file. Any function with such a dependency must instead be defined in a jsfooinlines.cpp file. - Includes should be listed at the top of a file. The exception is for include that must be within a #if/#ifdef; they should be near the top of the file where possible. - System includes should be listed first, in alphabetical order. Then local includes, in alphabetical order. Then a blank line and any jsfooinlines.h includes, in alphabetical order. For example: #include <a.h> #include <q.h> #include "jsp.h" #include "jsx.h" #include "jspinlines.h" #include "jsxinlines.h" Does that sound right? Something else I've wondered... some #includes have this comment: #include "jshash.h" /* Added by JSIFY */ Is that significant? Judging from jsify.pl these comments were added almost 12 years ago. Can they be removed?
Assignee | ||
Comment 1•14 years ago
|
||
I guess another corollary of the first principle is that circular dependencies between .h files are not allowed.
Comment 2•14 years ago
|
||
Thanks for this bug -- it's a real issue, for sure. > - A .h file should be includable by itself, e.g. if you include a.h in a.cpp > you shouldn't have to include b.h as well. This is testable by compiling a C++ > module that contains a single #include and nothing else. This is sound. It should be clear that if the x.cpp that includes a.h also has need of interface declarations from b.h, then x.cpp should include b.h directly. No bootlegging via internal/implementation/interface-abstracted .h file nesting. > - A corollary of this is that any inline function defined in a .h file cannot > depend on any names (variables, functions, even macros) defined in another .h > file. This is slightly too strong. If b.h includes a.h because b.h defines a class subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof dependencies that require b.h to include a.h, then inline methods/functions in b.h may depend on a.h contents. > Any function with such a dependency must instead be defined in a > jsfooinlines.cpp file. This is important when there are many, and non-interface (non-signature-type) dependencies, for implementing the inline function/method bodies. It's always clear without inline that implementation != interface, since .cpp != .h. But inline introduces a new kind of .h, wherefore our jsfooinlines.h convention. > - System includes should be listed first, in alphabetical order. Then local > includes, in alphabetical order. Then a blank line and any jsfooinlines.h > includes, in alphabetical order. For example: > > #include <a.h> > #include <q.h> > #include "jsp.h" > #include "jsx.h" > > #include "jspinlines.h" > #include "jsxinlines.h" > > > Does that sound right? Right, but we have historical NSPR-forked jsprf.h, jsutil.h, jsarena.h, jsbit.h and other files segregated without a blank line, just by being grouped ahead of the "JS not NSPR" .h files. We should probably lose this confusing bit of history and sort all the local non-inline includes. > Something else I've wondered... some #includes have this comment: > > #include "jshash.h" /* Added by JSIFY */ > > Is that significant? Judging from jsify.pl these comments were added almost 12 > years ago. Can they be removed? Yes, please. That kind of noise mattered for about a month, then became ugly deadwood. It's due to that NSPR forking of files into js/src to make it stand alone. From 1998. /be
Comment 3•14 years ago
|
||
(In reply to comment #2) > This is important when there are many, and non-interface (non-signature-type) > dependencies, for implementing the inline function/method bodies. It's always > clear without inline that implementation != interface, since .cpp != .h. Of course, it's not "always clear" even excluding inline methods, since C++ and C seem to require types to be visible to declare members, and members (esp. private ones in C++) are implementation-not-interface. What saves us from over-including due to member types, often, is the member being a pointer-to-T, and the ability to declare opaque pointer types in C via struct tags and struct tag typedefs, hence jsprvtd.h and jspubtd.h; in C++ via class and struct forward declarations (alas nothing for enum). But implementation chocolate gets into interface peanut butter even without inline -- the primal sin is #include, the pre-processor. /be
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #2) > > This is sound. It should be clear that if the x.cpp that includes a.h also has > need of interface declarations from b.h, then x.cpp should include b.h > directly. No bootlegging via internal/implementation/interface-abstracted .h > file nesting. Yes, but it's easy to bootleg unintentionally. But it's a good aim. > > - A corollary of this is that any inline function defined in a .h file cannot > > depend on any names (variables, functions, even macros) defined in another .h > > file. > > This is slightly too strong. If b.h includes a.h because b.h defines a class > subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof > dependencies that require b.h to include a.h, then inline methods/functions in > b.h may depend on a.h contents. Hmm, actually, if b.h include a.h then it should be ok for any inline function body in b.h to refer to any identifier from a.h as well, right? That preserves the first principle. > > - System includes should be listed first, in alphabetical order. Then local > > includes, in alphabetical order. Then a blank line and any jsfooinlines.h > > includes, in alphabetical order. For example: > > > > #include <a.h> > > #include <q.h> > > #include "jsp.h" > > #include "jsx.h" > > > > #include "jspinlines.h" > > #include "jsxinlines.h" > > Right, but we have historical NSPR-forked jsprf.h, jsutil.h, jsarena.h, jsbit.h > and other files segregated without a blank line, just by being grouped ahead of > the "JS not NSPR" .h files. We should probably lose this confusing bit of > history and sort all the local non-inline includes. That's what I figured, from what you said on IRC. > > #include "jshash.h" /* Added by JSIFY */ > > > > Can they be removed? > > Yes, please. That kind of noise mattered for about a month, then became ugly > deadwood. It's due to that NSPR forking of files into js/src to make it stand > alone. From 1998. Can jsify.pl be removed too?
Assignee | ||
Comment 5•14 years ago
|
||
I did the "compile with a single #include" test on js/src/*.h, the following list shows which ones (34 out of 66) had problems. Note that not all of these necessarily need fixing, as fixing one broken header may cause others to work again. jsapi.h - jsarena.h - jsarray.h bad jsatom.h - jsatominlines.h bad jsbit.h - jsbool.h bad jsbuiltins.h - jsclist.h - jscntxt.h bad jscntxtinlines.h bad jscompat.h - jscpucfg.h - jsdate.h bad jsdbgapi.h - jsdhash.h - jsdtoa.h - jsdtracef.h bad (?) jsemit.h bad jsexn.h bad jsfun.h bad jsgc.h bad jshash.h - jshashtable.h - jsinterp.h bad jsinttypes.h - jsiter.h bad jslibmath.h - jslock.h - jslong.h - jsmath.h bad jsnum.h bad jsobj.h bad jsobjinlines.h bad json.h bad jsopcode.h - jsotypes.h bad jsparse.h bad jsprf.h - jspropertycache.h - jspropertycacheinlines.h bad jspropertytree.h - jsprvtd.h - jspubtd.h - jsregexp.h bad jsscan.h bad jsscope.h bad jsscopeinlines.h bad jsscript.h - jsscriptinlines.h bad jsstaticcheck.h - jsstdint.h - jsstr.h bad jsstrinlines.h bad jstask.h bad jstl.h - jstracer.h bad jstypedarray.h bad jstypes.h - jsutil.h bad jsvector.h bad jsversion.h - jsxdrapi.h - jsxml.h bad prmjtime.h - resource.h -
Assignee | ||
Comment 6•14 years ago
|
||
BTW, js-confdefs.h is auto-included in by the build system for all compiler invocations, so I think it's safe for a header to implicitly depend on it.
Assignee | ||
Comment 7•14 years ago
|
||
It also pays to use the same compile flags as a normal build, when doing an include-a-single-header-file build.
Comment 8•14 years ago
|
||
Why should a user be required to #include both "jsfoo.h" and jsfooinlines.h"? It seems to me that jsfoo.h should just #include jsfooinlines.h itself; that way, one can think of "jsfoo.h" as "what I need to use foos". Whether foos use inline functions is an implementation detail.
Comment 9•14 years ago
|
||
(In reply to comment #8) > Why should a user be required to #include both "jsfoo.h" and jsfooinlines.h"? > It seems to me that jsfoo.h should just #include jsfooinlines.h itself; that > way, one can think of "jsfoo.h" as "what I need to use foos". Whether foos use > inline functions is an implementation detail. The problem is that the implementation detail costs everyone including jsfoo.h, even if most do not invoke the inline(s) that drag in the kitchen sink. We do not want over-including just for an inlining optimization. Over-including breaks abstraction too: * It makes stuff visible you shouldn't see (leading to more bootlegging by accident or on purpose). * It makes stuff visible that may collide with local code in the including .cpp that did not expect the implementation-not-interface namespace pollution. This is mostly about macros, yeah. * It can slow down builds -- we've seen this in spades in Firefox, when someone put a #include "jsnum.h" in "nsContentUtils.h". /be
Comment 10•14 years ago
|
||
Oh, and perhaps the best reason not to force implementation includes on all interface includers just because some inline methods need the implementation .h files: * It will create cycles in the current codebase. This could be fixed by splitting of headers into pieces that can be topologically sorted. But that sometimes is make-work if the only cause is inline-ing. The jsfooinlines.h compromise seemed better to us at the time we started doing it for all the reasons given. Maybe it's not, but even if we come up with a minimal, topo-sorted dependency graph, implementation invading interface due to the damn pre-processor seems a poor reason for all the make-work. The valgrind no-nesting approach (I recall it from SVR-old AT&T Unixes) goes to the other extreme. jsfooinlines.h is in the middle, and at least keeps impl. bits (as opposed to interface) pulled together only in .cpp files, not in .h files. Bug agree that it's a compromise, so unsatisfying. In particular, forgetting to include the right jsfooinlines.h means you get a fairly obscure error message. /be
Comment 11•14 years ago
|
||
jimb, inlines often use multiple header files, so including jsobjinlines.h in jsobj.h is pointless. We could have put the stuff in jsobj.h directly if that worked.
Comment 12•14 years ago
|
||
(In reply to comment #11) > jimb, inlines often use multiple header files, so including jsobjinlines.h in > jsobj.h is pointless. We could have put the stuff in jsobj.h directly if that > worked. Right, I understand that now. I didn't realize the distinction was necessary for correctness; I had thought it was for organizational purposes.
Assignee | ||
Comment 13•14 years ago
|
||
(In reply to comment #2) > > It should be clear that if the x.cpp that includes a.h also has > need of interface declarations from b.h, then x.cpp should include b.h > directly. No bootlegging via internal/implementation/interface-abstracted .h > file nesting. I'm most of the way through a patch, and it's clear to me that this is a nice aim, but almost impossible to attain in practice. How do you verify you aren't bootlegging in x.cpp? You'd have to look at every single thing imported by x.cpp, work out which .h file declares it, and make sure you've included that .h file directly. The transitivity of #include really sucks. I say "almost impossible" because there is a way to attain it -- use the Valgrind-style headers-don't-include-headers approach. That way bootlegging is impossible. More generally, with that approach the compiler will enforce your principles for you. The downside is that it can be verbose; it's really annoying if you want to include jscntxt.h and you have to include eight other .h files first. (This problem has never been so bad in Valgrind, maybe because Valgrind is relatively small, or maybe because the module system was designed under these constraints from the start.) > If b.h includes a.h because b.h defines a class > subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof > dependencies that require b.h to include a.h, then inline methods/functions in > b.h may depend on a.h contents. Again, it's a nice idea but really hard to enforce. If bootlegging is so difficult to avoid, is it worth outlawing it? The only real downside I can see is that it can be difficult to know what is being included, but that'll always be true unless we use the Valgrind-style approach.
Comment 14•14 years ago
|
||
Bootlegging is the least evil here. Anti-modularity of the valgrind kind, where dependents have to know about a dependency's dependencies, I call worse :-P. /be
Assignee | ||
Comment 15•14 years ago
|
||
(In reply to comment #14) > Bootlegging is the least evil here. Anti-modularity of the valgrind kind, where > dependents have to know about a dependency's dependencies, I call worse :-P. Sure, there's pros and cons to both approaches and I'm not suggesting we change SpiderMonkey to use Valgrind's approach. But we shouldn't fool ourselves into thinking that we'll avoid bootlegging, esp. since the typical programmer's approach is to randomly add #include statements until the thing compiles again :)
Comment 16•14 years ago
|
||
Yeah, I never thought laws would prevent crime either ;-). Srsly, the only way to stamp out bootlegging is automation. At SGI many years ago I dreamt of a static analysis framework that would use-count every definition and declaration in a .h file, and identify bootleggers. Sixgill and possibly even our GCC-based static analysis plugins may be up to this task. Low priority! /be
Comment 17•14 years ago
|
||
(In reply to comment #16) > Srsly, the only way to stamp out bootlegging is automation. At SGI many years > ago I dreamt of a static analysis framework that would use-count every > definition and declaration in a .h file, and identify bootleggers. Also identify useless direct and nested includes, of course. /be
Assignee | ||
Comment 18•14 years ago
|
||
I've hit one roadblock in the quest to alphabeticize include statements. On Windows, jsstdint.h must be included before nanojit/nanojit.h, because both files try to define the stdint types, because jsstdint.h #defines _STDINT_H which suppresses nanojit/nanojit.h's attempt. Hmm.
Comment 19•14 years ago
|
||
I mentioned this on irc but forgot to repeat it. There's often one weird rule that makes no sense. Long ago, for Windows 3.1, we had a "jsstddef.h" you *had* to include first (even before system headers). Let's make jsstdint.h the one weird thing. It'll be ok if we keep it to one. /be
Comment 20•14 years ago
|
||
(In reply to comment #18) > I've hit one roadblock in the quest to alphabeticize include statements. On > Windows, jsstdint.h must be included before nanojit/nanojit.h, because both > files try to define the stdint types, because jsstdint.h #defines _STDINT_H > which suppresses nanojit/nanojit.h's attempt. Hmm. Yeah, the problem there is the asymmetry: VMPI.h checks for jsstdint.h's definitions, but not vice versa; and jsstdint.h defines _STDINT_H, but VMPI.h doesn't.
Comment 21•14 years ago
|
||
How would this do, to decouple jsstdint.h and VMPI.h? It needs approval for nanojit, too; who should I r? for that?
Attachment #438857 -
Flags: review?(nnethercote)
Assignee | ||
Comment 22•14 years ago
|
||
(In reply to comment #21) > > How would this do, to decouple jsstdint.h and VMPI.h? It needs approval for > nanojit, too; who should I r? for that? Not sure if that's ok -- Nanojit is used by Tamarin as well. I don't know how significant _STDINT_H, eg. if it's generated by configure or something. And putting MOZ_* things into Nanojit is a bit weird.
Comment 23•14 years ago
|
||
jsstdint.h must come first as an extra rule is not terrible, if you can't get Nanojit buy-in (using a non-MOZ_ macro name might help there :-P). /be
Assignee | ||
Comment 24•14 years ago
|
||
Bug 559132 might lead to the relaxation of the jsstdint.h-first rule, but for this bug I can live with it.
Comment 25•14 years ago
|
||
(In reply to comment #22) > Not sure if that's ok -- Nanojit is used by Tamarin as well. I don't know how > significant _STDINT_H, eg. if it's generated by configure or something. And > putting MOZ_* things into Nanojit is a bit weird. Right, I was trying to accomodate Tamarin. _STDINT_H was chosen arbitrarily by Graydon in bug 519856. It just happens to be the preprocessor symbol #defined by <stdint.h> on Linux; it would never be #defined if AVMPLUS_WIN32 is. So its use in Tamarin is already an accomodation of SM. The idea here would simply be to stop using a symbol outside our namespace. I've updated the patch to use NANOJIT_STDINT_TYPES_DEFINED instead.
Comment 26•14 years ago
|
||
Second cut at patch: use NANOJIT_STDINT_TYPES_DEFINED as symbol name.
Attachment #439284 -
Flags: review?(nnethercote)
Assignee | ||
Updated•14 years ago
|
Attachment #438857 -
Attachment is obsolete: true
Attachment #438857 -
Flags: review?(nnethercote)
Assignee | ||
Comment 27•14 years ago
|
||
(In reply to comment #26) > Created an attachment (id=439284) [details] > Let js/src/jsstdint.h and js/src/nanojit/VMPI.h communicate about <stdint.h> > emulation. > > Second cut at patch: use NANOJIT_STDINT_TYPES_DEFINED as symbol name. This allows arbitrary ordering, but I don't think that allowing TM or NJ (whoever gets there first) to define the stdint types is the right way to do it. I mentioned 559132, I think it should fix this by requiring either NJ or TM/TR to define them, but not both.
Assignee | ||
Updated•14 years ago
|
Attachment #439284 -
Flags: review?(nnethercote) → review-
Comment 28•14 years ago
|
||
(In reply to comment #27) > This allows arbitrary ordering, but I don't think that allowing TM or NJ > (whoever gets there first) to define the stdint types is the right way to do > it. I mentioned 559132, I think it should fix this by requiring either NJ or > TM/TR to define them, but not both. I'm not getting how this would work. If NJ needs them, it is a separable component, so it must define them. Thus TM should depend on NJ's definitions. Is that what you have in mind?
Assignee | ||
Comment 29•14 years ago
|
||
NJ is designed to be embedded in a bigger program, the "embedder". In practice, TM and TR are the only two embedders. NJ provides a bunch of declarations/definitions for the embedder to use. NJ is not totally stand-alone, however, and so requires the embedder to provide a few declarations/definitions. (See nanojit/avmplus.cpp, for example, which despite its name, is not a core part of nanojit -- TM uses it, but TR doesn't.) The interface between NJ and the embedder isn't very clear, hence bug 559132. That's why TM (the embedder) could provide the definitions of these stdint types for NJ's use. In this case that is probably best, as TM needs to define those types for many places where NJ isn't used, eg. in the interpreter. Make sense?
Comment 30•14 years ago
|
||
So NJ will not define them, but require its clients to define them somehow? "Don't include NJ headers until after you have defined the <stdint.h> types, using whatever shenanigans you like"?
Assignee | ||
Comment 31•14 years ago
|
||
(In reply to comment #30) > So NJ will not define them, but require its clients to define them somehow? > "Don't include NJ headers until after you have defined the <stdint.h> types, > using whatever shenanigans you like"? Something like that, I guess, it's all rather unclear at the moment, hence bug 559132. But having whichever gets there first (NJ or the embedder) declare them is not good.
Assignee | ||
Comment 32•14 years ago
|
||
(In reply to comment #16) > > Srsly, the only way to stamp out bootlegging is automation. At SGI many years > ago I dreamt of a static analysis framework that would use-count every > definition and declaration in a .h file, and identify bootleggers. Sixgill and > possibly even our GCC-based static analysis plugins may be up to this task. Low > priority! Bug 634839 may be the panacea here!
Assignee | ||
Comment 33•13 years ago
|
||
I will dup this forward to bug 634839; if this stuff is ever going to be improved, an automated tool is the right way to do it.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•