Closed Bug 604701 Opened 14 years ago Closed 14 years ago

Create a script to generate tracers for properties in C++ objects

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

Attachments

(2 files, 6 obsolete files)

The script will take as input .as and .h files with annotations on traceable properties, and generate appropriate C++ member functions to trace those objects exactly. The script must handle both ScriptObject-derived types (that have an AS3 presence) but must also be able to handle most internal managed data types. The script should perform error checking: in particular, it should be able to discover misspelled annotations. The script will be called from nativegen.py; changes to nativegen.py are considered part of this bug.
Blocks: 604702
Attached file Documentation for annotation system (obsolete) —
This is the documentation for the annotation system. I'd like a review on this. A small example follows in the next attachment.
Attachment #485633 - Flags: review?(treilly)
Attachment #485633 - Flags: review?(stejohns)
Attachment #485634 - Flags: review?(treilly)
Attachment #485634 - Flags: review?(stejohns)
To be specific, what I'm looking for are opinions on the soundness of this system. It's clear that annotation does not come without risk, but here the risk is reduced in several ways: - The use of macros rather than structured comments means that most misspellings will result in compilation errors. - There is redundancy between GC_CPP_EXACT and GC_AS3_EXACT on the one hand and GC_DATA_BEGIN and GC_DATA_END on the other, and this redundancy will be checked by the script. - The script will check that a file that contains any annotation contains a meaningful set of annotations that belong together (eg, GC_POINTER by itself is an error because it requires a GC_CPP_EXACT or GC_AS3_EXACT, and those require a matching GC_DATA_BEGIN and GC_DATA_END). In other words, assume roughly as much error checking as possible in the generation of the tracers. Note in the example that it's self-contained; no other code needs to be written in either .h or .cpp files.
Comment on attachment 485633 [details] Documentation for annotation system I'm guessing that we will probably also want a simpler, ReadersDigest version for Flash Player glue engineers, with a simple cookbook of best practices.
Attachment #485633 - Flags: review?(stejohns) → review+
Attachment #485634 - Flags: review?(stejohns) → review+
Comment on attachment 485633 [details] Documentation for annotation system Why does GC_AS3_EXACT come after class but GC_CPP_EXACT subsumes the "class", or is that a typo? I'm not sure I can review this thoroughly w/o seeing the definitions of all these macros.
Attachment #485633 - Flags: review?(treilly) → review-
Comment on attachment 485634 [details] Example of the annotation system in use Looks fine based on docs, what happens if you leave off GC_CPP_EXACT? What happens if you forget a GC_DATA_BEGIN or GC_DATA_END macro? Ditto for GC_POINTER and GC_STRUCTURE? I suspect some are compile errors and some are silent errors (which will be addressed by our exact tracing audit bug).
Attachment #485634 - Flags: review?(treilly) → review+
(In reply to comment #6) > Comment on attachment 485634 [details] > Example of the annotation system in use > > Looks fine based on docs, what happens if you leave off GC_CPP_EXACT? What > happens if you forget a GC_DATA_BEGIN or GC_DATA_END macro? Ditto for > GC_POINTER and GC_STRUCTURE? I suspect some are compile errors and some are > silent errors (which will be addressed by our exact tracing audit bug). There are interlocks in the script, so GC_CPP_EXACT without GC_DATA_BEGIN or GC_DATA_END, or either end of that pair missing, is a static error. If you leave off GC_POINTER or GC_STRUCTURE there's nothing I can do; that's just a fact of life. You want garbage collection with C++? Sure, I'll take your money...
(In reply to comment #5) > Comment on attachment 485633 [details] > Documentation for annotation system > > Why does GC_AS3_EXACT come after class but GC_CPP_EXACT subsumes the "class", > or is that a typo? Typo. > I'm not sure I can review this thoroughly w/o seeing the > definitions of all these macros. You ought to be able to, since it's the documentation that needs review. (If you need to see the implementation then the documentation is not good enough.)
(In reply to comment #4) > Comment on attachment 485633 [details] > Documentation for annotation system > > I'm guessing that we will probably also want a simpler, ReadersDigest version > for Flash Player glue engineers, with a simple cookbook of best practices. A lot of the complexity comes from catering to various bizarre needs in the VM; the avmglue classes should be a lot simpler. But once you start using GC'd storage for arbitrary C++ data types the complexity goes up again.
Attachment #485633 - Attachment is obsolete: true
Attachment #485634 - Attachment is obsolete: true
These macros fit into GC.h and are used to annotate C++ code for exact tracing. An extensive user's guide is in the script that is part of the next patch.
Attachment #488837 - Flags: review?(treilly)
Attachment #488837 - Flags: review?(stejohns)
Three things here: - A new script, utils/exactgc.as, that is used to generate trace functions from annotations added by hand to .as and .h files. Extensive documentation at the beginning of the script. - Changes to utils/nativegen.py to generate a trace function for those slots that are part of the hidden C++ slot structure for the native class. This change is /only/ necessary because the slot descriptor (currently the "destroy info") does not track those slots. If it did, the slots would be traced by more general machinery. - Changes to builtin.py and shell_toplevel.py to invoke the new script when the builtins are built. Again, this change is /only/ necessary because the slot descriptor does not track slots in the native classes. (There's a slight bug in the change to nativegen.py that I will not fix right now, for TracePointer(p) read TraceLocation(&p).)
Attachment #488839 - Flags: review?(treilly)
Attachment #488839 - Flags: review?(stejohns)
Comment on attachment 488837 [details] [diff] [review] Macros used for annotating C++ code style nit: IMHO this would be more readable with vertical alignment for groups of macro payloads, eg #define GC_POINTER(field) field #define GC_POINTER_IFDEF(field,ifdef) field also, the intended usage of the _IFDEF variants is not clear from this patch alone; presumably it will be made clear in subsequent patches?
Attachment #488837 - Flags: review?(stejohns) → review+
Comment on attachment 488839 [details] [diff] [review] Script and changes to generate exact tracers since we're adding yet more autogenerated files, I wonder if this is a good excuse to go ahead and stop checking them in to source control (along with builtin.cpp.h/etc)? (would require revising the build process and adding asc.jar to the required toolset, those definite scope creep for this bug)
> > since we're adding yet more autogenerated files, I wonder if this is a good > excuse to go ahead and stop checking them in to source control (along with > builtin.cpp.h/etc)? (would require revising the build process and adding > asc.jar to the required toolset, those definite scope creep for this bug) This bug discusses the pros/cons of removing the generated builtin code, bug #493609
Comment on attachment 488839 [details] [diff] [review] Script and changes to generate exact tracers uber-nit: the comment "Substructures, like List<> members, AtomArray" refers to classes that no longer exist in TR tip. For the line: while (s.length > 0 && s.charAt(s.length-1) == "\n") will this handle all possible line-endings? I suspect not; we should ensure that we handle at least LF and CRLF properly. (Or is this code only used for internally-generated output? If the latter, nevermind) typo nit: // an inline fixed-size pointer array for the lookup cache, and it it ^^ 2000 seems to be a magic chunk-size constant used in a few places; naming it might be useful for future experimentation and code hygiene. The technique used in noUsefulTracer() is interesting, I've never seen that before... efficiency q: while efficiency doesn't matter (much) in this script execution, are we constructing that object literal every time we call the function? If so, pulling it into a constant seems like a simple win.
Attachment #488839 - Flags: review?(stejohns) → review+
Comment on attachment 488837 [details] [diff] [review] Macros used for annotating C++ code At one point I went through and made all GC macros start with MMGC, obviously some (notably all the write barrier macros) weren't changed. I don't feel strongly either way just wanted to bring it up. We could take this opportunity adopt the rule that MMgc end user macros must start with GC_. With the safegc work most of the write barrier related exception go away and the GC prefix on GCRef,GCMember jives well with GC_ for macros.
Depends on: 610946
No longer depends on: 610946
Comment on attachment 488837 [details] [diff] [review] Macros used for annotating C++ code DECLARE_EXACT_GC_METHODS is the only one that doesn't start with GC_ but its not likely to cause collisions and is only for exceptions. For some oddball player use case I presume?
Attachment #488837 - Flags: review?(treilly) → review+
I'm not done but here's what I got so far: Docs about writing manual tracers in the script for auto generating tracers is surprising. But I like that all the docs are in one place. I worry about shell limits, exactgc should take an argument that is a file containing file names or read file names from stdin. I thought names starting with _ in C were verbotten? Can we add some comments on what the the interlock system does and how it works? Why are all the gen'd headers included avmshell.cpp? Doesn't seem like the shell/player should have to include the core stuff. Maybe avmplus.cpp or builtin.cpp?
Could emitArrayChunked use a helper in the GC so we don't emit so much boilerplate? What happens if I have a class with more than one array to trace chunkedly? won't the names collide? emitArrayChunked has a return at the end of it but code appears after it, ie: return true; #ifndef GC_TRIVIAL_TRACER_VTable #endif } This looks like an error waiting to happen
(In reply to comment #15) > Comment on attachment 488839 [details] [diff] [review] > Script and changes to generate exact tracers > > uber-nit: the comment "Substructures, like List<> members, AtomArray" refers to > classes that no longer exist in TR tip. Good catch. > For the line: > > while (s.length > 0 && s.charAt(s.length-1) == "\n") > > will this handle all possible line-endings? I suspect not; we should ensure > that we handle at least LF and CRLF properly. (Or is this code only used for > internally-generated output? If the latter, nevermind) It is only used for internally-generated output so \n should mean \n. > typo nit: > > // an inline fixed-size pointer array for the lookup cache, and it it > ^^ OK. > 2000 seems to be a magic chunk-size constant used in a few places; naming it > might be useful for future experimentation and code hygiene. It's more arbitrary than magic actually but you're right, if it's used more than one place it should be named. > The technique used in noUsefulTracer() is interesting, I've never seen that > before... Hm, I thought nativegen.py did very similar things with dictionaries. No matter. > efficiency q: while efficiency doesn't matter (much) in this script > execution, are we constructing that object literal every time we call the > function? Yes. In principle the VM could discover - at least at run-time - that it does not need to be constructed every time (plain object, hasOwnProperty does not modify it unless the method has been replaced in Object.prototype), but I'd be surprised. > If so, pulling it into a constant seems like a simple win. I'll shelve that suggestion for now; if the performance of the exactgc.as script becomes important I think it will be useful to profile it before going after specific optimization opportunities.
(In reply to comment #17) > Comment on attachment 488837 [details] [diff] [review] > Macros used for annotating C++ code > > DECLARE_EXACT_GC_METHODS is the only one that doesn't start with GC_ but its > not likely to cause collisions and is only for exceptions. For some oddball > player use case I presume? That's the thought. The name is accidental; I'll clean it up.
(In reply to comment #18) > Docs about writing manual tracers in the script for auto > generating tracers is surprising. But I like that all the docs > are in one place. And I wish more scripts would be documented that way. (*cough* nativegen *cough*). > I worry about shell limits, exactgc should take an argument that > is a file containing file names or read file names from stdin. I suppose that's a good point. I'll not make that block the initial landing but I'll create a followup work item that we'll need for Player work. > I thought names starting with _ in C were verbotten? They are, but we use them everywhere, and I gave up tilting at that particular windmill. > Can we add some comments on what the the interlock system does > and how it works? Sure. > Why are all the gen'd headers included avmshell.cpp? Doesn't > seem like the shell/player should have to include the core stuff. > Maybe avmplus.cpp or builtin.cpp? This is partly rational, partly accidental. The rational bit is that the interlock system depends on seeing all interlock definitions, thus avmplus-gc-interlock.h must be included when we compile avmshell-cpp-gc.h and avmshell-as3-gc.h. At that point it seemed easier simply to include all files at the same point. The Player will have the same problem and I expected it would be solved the same way there. I don't mind having the avmplus-*-gc.h files handled exclusively by core code though, I can change that. And I guess it's possible to duplicate the core interlocks in the shell/player files somehow, but it just moves the complexity around, it doesn't reduce it or remove it. So I'll probably not try to change that.
(In reply to comment #19) > Could emitArrayChunked use a helper in the GC so we don't emit so much > boilerplate? Maybe. > What happens if I have a class with more than one array to trace chunkedly? > won't the names collide? There is an arbitrary restriction (resulting in an abort) in exactgc.as that prevents the use of more than one array per class. If that were to be listed it would still be the case that only one array could be variable length, and it would be relatively astonishing if the fixed-length array were so large that it should be traced chunkedly. So you can call this a punt or a deliberately complexity-reducing design decision. > emitArrayChunked has a return at the end of it but code appears after it, ie: > > return true; > #ifndef GC_TRIVIAL_TRACER_VTable > > #endif > } > > This looks like an error waiting to happen Yes, that's a bug. Thanks.
(In reply to comment #23) > If that were to be listed ^^^^^^ lifted
(In reply to comment #23) > (In reply to comment #19) > > Could emitArrayChunked use a helper in the GC so we don't emit so much > > boilerplate? > > Maybe. Specifically, there are three instances of this in all of avmcore; the others locations for trailing arrays hint that the array is small and avoid the boilerplate. So I think I'll leave this alone for now, and if it turns out to be a problem we'll fix it later.
Only one change: the name of the macro to declare the tracers.
Attachment #488837 - Attachment is obsolete: true
Attachment #489800 - Flags: review?(treilly)
In this version I've addressed most comments from Steven and Tommy (including the ability to read filenames from a file using the standard '@filename' syntax).
Attachment #488839 - Attachment is obsolete: true
Attachment #489801 - Flags: review?(treilly)
Attachment #488839 - Flags: review?(treilly)
Attachment #489800 - Flags: review?(treilly) → review+
Comment on attachment 489801 [details] [diff] [review] Script and changes to generate exact tracers, v2 Some builtin.cpp changes here, do you have a newer ASC or something?
Attachment #489801 - Flags: review?(treilly) → review+
(In reply to comment #28) > Comment on attachment 489801 [details] [diff] [review] > Script and changes to generate exact tracers, v2 > > Some builtin.cpp changes here, do you have a newer ASC or something? I may have an older ASC, actually...
Two to-do items I will note for posterity: - It wouldn't be very hard to support classes that derive from templated classes with a syntax like "GC_CPP_EXACT_T(myclass, baseclass, parameter)"; at this time, tracers for this case must be hand-written (eg VectorObject). - It wouldn't be very hard to support generation of tracers for in-line structures, like LookupCache in MethodEnv.h, with a structure along the lines of "GC_CPP_STRUCT(name)".
Very minor update to the script to deal with some refactoring in Tamarin, plus support for generating tracers for native classes whose names in the AS3 files are of the form "::avmshell::Fnord". I'm going to a assume a re-review is not necessary.
Attachment #489801 - Attachment is obsolete: true
(Without builtin.cpp and builtin.abc changes)
Attachment #491159 - Attachment is obsolete: true
changeset: 5584:4b1b3f32c071 user: Lars T Hansen <lhansen@adobe.com> summary: ExactGC work: Part of 604701 - Create a script to generate tracers for properties in C++ objects: annotation macros (r=treilly) http://hg.mozilla.org/tamarin-redux/rev/4b1b3f32c071
Comment on attachment 491160 [details] [diff] [review] Script and changes to generate exact tracers, v3 I mistakenly thought I had to review the nativegen patch in redux-revolution as part of the work for Bug 604702. So since I already went to the trouble of writing up comments on the exactgc.as script there, here they are. (If it is excessively bad form to post an unrequested review, let me know.) utils/exactgc.as - the GC_CONSERVATIVE doc says: The field can be a uintptr_t, intptr_t, or some pointer type (if that pointer carries tags). GC_CONSERVATIVE do not actually require the pointer to carry tags, but the parenthetical could be easily interpreted as an "only if." Possibly rephase as "(where that pointer may carry tags)" - In exactgc.as, note that the _IFDEF and _IF variants are meant to work in _tandem_ with ifdef's that are also in the input text (right?). - (Orthogonal Scope): It seems like this and a few other scripts (apivergen.as, opcodes.as, peephole.as) have hard-coded the LICENSE block for insertion into the generated code. Would it make sense to consolidate the content of the text into a one text file somewhere (at least -- one might go further and try to consolidate the scattered quotation mechanisms)? - RE: hard-coding skipping of GC.h into exactgc.as: the exactgc is special-purpose, so I don't mind this sort of wart; but I am debating whether mentioning it in "Usage & options" at the top is worthwhile. Maybe just say in "Usage & options" that "(some files are skipped; see processOptionsAndFindFiles fcn below)" - Style: I hate staring at regexps in-line, e.g. in parseAttrValue (maybe I'm just not a PCRE-hacker; obviously so, since I had never encountered the non-capturing-grouping operator ``(?:pattern)'' before now), and I'm still unsure whether I've properly identified what each of them is doing. Anyway, consider lifting the regexp-producing expressions out of your if-tests and giving them names that reflect what they do, e.g.: var matchString = /^\s*\"([^\"]*)\"\s*$/ var matchWithTemplates = /^\s*((?:< | >|[a-zA-Z0-9_:<>])+)\s*$/ var matchNumber = /^\s*([0-9]+(?:\.[0-9]+)?)\s*$/ (Extra-credit: after doing so, update splitAttrs so that the other occurrences of these patterns in a single regexp are replaced with references to the bound regexps instead.) Also, assuming that I have properly identified what the regexp I've named matchWithTemplates is doing, what's the consequence if the input has more than one space after < or before >? (I.E. how unreadable is the resulting error message from your script? Can it silently produce crazy output?) - I'm surprised that the standard pattern for type-dispatch is throw/catch, but I guess the displined alternative is the visitor pattern and that's probably overkill here. - in the three printToFile invocations at end the emitTracers include a note about which script generated it. Something like: "machine generated file via exactgc.as -- do not edit" utils/nativegen.py - Yay documentation! Make sure to open a bug for resolving the "?"'s next to the methods and customconstruct named attributes in your comment.
(In reply to comment #34) > Comment on attachment 491160 [details] [diff] [review] > Script and changes to generate exact tracers, v3 > > (If it is > excessively bad form to post an unrequested review, let me know.) It isn't :-) > utils/exactgc.as > - the GC_CONSERVATIVE doc says: > The field can be a uintptr_t, intptr_t, or some pointer type > (if that pointer carries tags). > GC_CONSERVATIVE do not actually require the pointer to carry > tags, but the parenthetical could be easily interpreted as an > "only if." Possibly rephase as "(where that pointer may carry tags)" Right. > - In exactgc.as, note that the _IFDEF and _IF variants are meant > to work in _tandem_ with ifdef's that are also in the input > text (right?). Yes indeed, and good point. > - (Orthogonal Scope): It seems like this and a few other scripts > (apivergen.as, opcodes.as, peephole.as) have hard-coded the > LICENSE block for insertion into the generated code. Would it > make sense to consolidate the content of the text into a one > text file somewhere (at least -- one might go further and try > to consolidate the scattered quotation mechanisms)? Maybe, but I'm not going to worry about it. > - RE: hard-coding skipping of GC.h into exactgc.as: the exactgc is > special-purpose, so I don't mind this sort of wart; but I am > debating whether mentioning it in "Usage & options" at the top > is worthwhile. Maybe just say in "Usage & options" that "(some > files are skipped; see processOptionsAndFindFiles fcn below)" I think I had it in there but I removed it as "too much information". > - Style: I hate staring at regexps in-line, e.g. in parseAttrValue > (maybe I'm just not a PCRE-hacker; obviously so, since I had > never encountered the non-capturing-grouping operator > ``(?:pattern)'' before now), and I'm still unsure whether I've > properly identified what each of them is doing. Anyway, > consider lifting the regexp-producing expressions out of your > if-tests and giving them names that reflect what they do, e.g.: > > var matchString = /^\s*\"([^\"]*)\"\s*$/ > var matchWithTemplates = /^\s*((?:< | >|[a-zA-Z0-9_:<>])+)\s*$/ > var matchNumber = /^\s*([0-9]+(?:\.[0-9]+)?)\s*$/ That actually changes the meaning if you're not careful :-) It may not matter too much in this case, but a literal regular expression is recreated every time it is encountered, this matters because a regular expression object is stateful, and sometimes the state affects the match. > Also, assuming that I have properly identified what the regexp > I've named matchWithTemplates is doing, what's the consequence > if the input has more than one space after < or before >? > (I.E. how unreadable is the resulting error message from your > script? Can it silently produce crazy output?) Probably. I'll look into it. > - I'm surprised that the standard pattern for type-dispatch is > throw/catch, but I guess the displined alternative is the > visitor pattern and that's probably overkill here. That's a bit of a hack. For ES4 we invented 'switch type ...', which is easily translated into a try/catch, and for some reason I decided to just go with that style here, because it saves me from a type test followed by an explicit cast for every branch (ick). > - in the three printToFile invocations at end the emitTracers > include a note about which script generated it. Something like: > "machine generated file via exactgc.as -- do not edit" Sure. > utils/nativegen.py > - Yay documentation! Thank you :-) > Make sure to open a bug for resolving the > "?"'s next to the methods and customconstruct named attributes in > your comment. Will do.
> > - I'm surprised that the standard pattern for type-dispatch is > > throw/catch, but I guess the displined alternative is the > > visitor pattern and that's probably overkill here. > > That's a bit of a hack. For ES4 we invented 'switch type ...', which is easily > translated into a try/catch, and for some reason I decided to just go with that > style here, because it saves me from a type test followed by an explicit cast > for every branch (ick). (offtopic, but i couldnt help it) FWIW, I consider this a first class idiom, and can say with 60% certianty that some future JIT will support it and turn switch type into something smart (either fast series of load/cmp/branch tests, a perfect hash of possible types, or something), and also boil away the exception handling overhead when possible. (eg if the exception can't escape, no setjmp/longjmp mess is required.) The alternatives you'd open code in AS3 (if (x is T) {} elseif ...) are wordier and harder to optimize because you have to count on predicate analysis to optimize away the explicit casts. otoh, a dedicated typeswitch opcode could be nice, but also requires new (different) surface syntax, compiler/ide support, abc changes, and so on.
Blocks: 617936
Blocks: 617939
changeset: 5660:1c6307dd90fc user: Lars T Hansen <lhansen@adobe.com> summary: Fix 604701 - Create a script to generate tracers for properties in C++ objects (r=treilly, r=stejohns) http://hg.mozilla.org/tamarin-redux/rev/1c6307dd90fc
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: flashplayer-bug-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: