Extend nativegen.py to allow declaring slots with native C++ types

ASSIGNED
Unassigned

Status

Tamarin
Virtual Machine
--
enhancement
ASSIGNED
8 years ago
7 years ago

People

(Reporter: Steven Johnson, Unassigned)

Tracking

unspecified
Bug Flags:
flashplayer-qrb +
flashplayer-bug -

Details

(Whiteboard: Discussion-locus; invalid)

(Reporter)

Description

8 years ago
In TT, we allowed you to declare private slots with metadata that augmented the slots with C++ type information; this allowed us to remove the need to subclass ScriptObject in C++ to add member variables (they could be declared in AS3 and accessed via getters/setters). We should re-add this ability to nativegen.py, as it would simplify object layout and make partly-exact tracing easier and less error-prone for avmglue objects.

(See bug 576307 for related info.)
(Reporter)

Comment 1

8 years ago
Note also that completion of bug 538943 (DRC removal) will make this slightly simpler (though not required).

Updated

8 years ago
See Also: → bug 557688

Comment 2

8 years ago
I've experimented with various approaches to this issue for a day and it's complicated to cover all cases well unless we want to rewrite working code or restrict what people can do in C++.

The approach I've taken is to allow [native] metdata on slots, as in this case from XML.as:

    [native]
    public static var kAttribute:String;

With this information, nativegen.py can generate a C++-visible field called kAttribute, it can wrap that in a DRCWB macro (because it knows what String is), and it can generate a gc tracer for the class that traces that field.

Similarly nativegen.py can know about int and uint and avoid WB macros and tracing:

    [native]
    public static var m_prettyIndent:int;
    
A slightly more interesting example is this from Vector.as:

    [native(type="ClassClosure*")]
    private static var index_type:*; 

or again, this from XML.as, where the C++ code currently uses a byte:

    [native(type="uint8_t")]
    public static var m_flags:*;

In either case the type is unknown to nativegen.py but the presence of the '*' at the end of "ClassClosure*" allows it to infer that it is a pointer; absent any other information it assumes it's an RCObject* which again allows for proper WB macros and tracers.  On the other hand, for m_flags the type does not end in * and absent other information it is assumed to be an atomic non-traced member type.

Supposing it were a GCObject/GCFinalizedObject we can annotate it further to indicate that we want DWB and not DRCWB:

    [native(type="MethodEnv*", rc="no")]
    protected var env:*; 

Or we can indicate a non-pointer-containing field withe the "gc" specificer ("atomic" implies rc="no"; all RC objects are assumed to be pointer-containing):

    [native(type="uint8_t*", gc="atomic")]
    private var data:*; 

So far so good, but what is the meaning of the annotations, how are the properties visible, and so on (observe the mix of private and public above, which mirror the access modifiers in the existing C++ code).

My initial approach was that the [native] properties in the AS file would boil away completely and would only show up in the C++ file.  The appeal of that approach is that little or no C++ code need be rewritten save for cases where a C++ property name clashes with an existing property name in the AS3 code ('prototype' in ClassClass, 'date' in DateObject).  Hacking nativegen.py to emit the correct code was not hard but we currently have no infrastructure for stripping the [native] properties from the ABC file (or regenerating the ABC file without those properties present) and it's not clear to me that that would not change the meaning of the ABC file and the generated C++ unless we take a sophisticated approach.  So I've left the approach alone for now.

The other approach would be to treat the [native] properties as actual slots in the AS3 object, and generate C++ accessors for them on the C++ side as we do with 'native' properties now.  Initially that's less appealing because it means rewriting more code - possibly a lot of code - to use the new accessors rather than access what's now non-private properties directly.  But it would go like this:

    [native(access="public")]
    private native static var kAttribute:String;

    [native(type="ClassClosure*")]
    private native static var index_type:*; 

    [native(type="MethodEnv*", rc="no", access="protected")]
    private native var env:*; 

In this case the "native" specifier would say something only about the C++ side of things; the AS3 side of things would be specified entirely by AS3 code.  Properties without anything special going on - a private String or int slot, say - wouldn't even need a [native] attribute.  All the properties would have to be private or they'd be visible to user code, a definite no-no.

There's an open issue on whether intellisense etc would see and reveal the existence of the private properties, I'm going to assume not.

So the appeal here is that properties are simply moved from the C++ side to the AS3 side, and sometimes we specify something about how they are typed and how to GC them and what kind of barrier they have.

The downside is chiefly the churn in adapting C++ code to use getters and setters, but there's a trickier issue: C++ properties that don't fit into the "slots" idiom of AS3.  Consider this case from Array.as:

        [native(type="AtomArray", gc="struct", access="public")]
        private var m_denseArr:*;

AtomArray is not a pointer type, it is a structure type.  (gc="struct" means that the structure has a method gcTrace() that will be called to trace the contents.)  nativegen.py does not know its size, which causes problems for the slot layout algorithm for one (size of the slot array; alignment in the slot array).  We could expand it into separate properties in AS3 but now the churn is becoming unreasonable.

The case from XML.as discussed earlier is also instructive:

    [native(type="uint8_t")]
    public static var m_flags:*;

Suppose there were a second field:

    [native(type="uint8_t")]
    public static var m_capabilities:*;

The C++ programmer would expect those two fields to be packed together; how we would handle that in nativegen.py without manually packing them is not clear.  (The obvious hack is to add 'bytes="1"' to both to tell nativegen.py to pack them together in a slot; ick.)

Finally, a complication is that if an AS3 slot contains a named field whose contents is a pointer to managed storage but not an AvmPlusScriptableObject or a boxed double (consider the MethodEnv* example) then there is a grave danger that the AS3 code in that class will be rewritten to access that field.  In that case, all bets are off - if it gets read it will be treated as an atom, following which it will be re-tagged if it is stored.  It will not end well.

All that said, these are exceptions from the rule, which is that so far it looks like a lot of glue code has simple needs: simple pointer and non-pointer fields.  Autogenerating code for those fields will be beneficial in most cases because it removes the need to manually add WB macros and manually write precise tracers.  It is possible to live with not handling eg the Array object because a precise tracer can be hand-written at modest cost.

Updated

8 years ago
See Also: → bug 576307

Updated

8 years ago
Depends on: 509070

Comment 3

8 years ago
Just for the sake of completion, a dehydra approach could work along these lines:

* gc_trace_maker.js could see all C++ type definitions, and take commandline args or a config file that specified which types we wanted to generate tracers for.

* it would have hardwired knowlege of each kind of traceable field (int, uint, int64, the write barrier smart pointers, naked pointers, etc.  I'm going out on a limb that it would know about at least the same types as the nativegen.py approach knew about, plus be smart about WB's and structs.

* it could print errors for illegal constructs (such as? tbd).  e.g. we could require WB's and not allow unwrapped pointers to GC objects.

* it could generate the T::gcTrace() method for structs, and for the overall class, and the output could be #included into an existing source file.

* a different dehydra script could run during build automation to *check* that generated gcTrace() methods are still valid.  So, a dev could ignore the whole toolchain until needing to add/remove/rename fields.  

Limitations:
* linux only, gcc-4.3 or 4.5 only.  This could be a deal killer, but with the other approaches having some hard tradeoffs, maybe its not.  Over time the situation could improve.

* I've done enough experimentation to find that use or non-use of GCObject as a base class is pretty random.  We might need to tidy up pointer declarations to make them visible to the script.  But we'd have to do that anyway as part of moving them to AS3 declarations.

* packaging the whole tool in a small vmware vm would not be hard.

Comment 4

8 years ago
Som further notes from me.

1) Slots cannot be declared 'native' in AS3, but this is no big deal, the 'native' qualifier was a red herring anyway.

2) I have working code that allows me to move pointer fields into the AS3 files with a disciplined escape to C++ to handle odd cases like AtomArray, as described above.  The tracers are autogenerated.  This is the high bit and I'm reasonably happy with how little work is involved, modulo changing from field accesses to getter/setter calls.  (Non-pointer fields can be left in the C++ file... whether that's a good idea or not does not matter to me right now, but it reduces the engineering work and churn probably.)

I will try to post some patches tomorrow, the patch queue is getting pretty long.

Comment 5

8 years ago
A complication in the above scheme is members that are defined under #ifdef control in the C++ header file, notably #ifdef DEBUGGER.

How do we express this on the AS3 side?

We don't want the member to be there unconditionally, so a similar kind of configuration control will be required in AS3.  We can use CONFIG:: controls but that means (re)building the AS3 builtins differently for different kinds of builds.

If DEBUGGER fields are exceptions to the rule then the "disciplined escape mechanism" can be used to handle them, I suppose.

Comment 6

8 years ago
(In reply to comment #2)

I'm sure you covered this, but I want to call it out specifically. In

    [native(type="uint8_t")]
    public static var m_flags:*;

It's fairly clear what the corresponding C++ code would look like, but I can't even begin to imagine what it would mean to access it in ActionScript. Because it would look exactly like a tagged atom to ASC and at the bytecode level.
(Reporter)

Comment 7

8 years ago
(In reply to comment #6)
> I'm sure you covered this, but I want to call it out specifically. In
> 
>     [native(type="uint8_t")]
>     public static var m_flags:*;

Hmm... for non-AS3-visible fields, maybe we don't want (or need) an AS3 var declaration at all, maybe we could do it entirely with metadata:


    [native(var="m_flags", type="uint8_t", access="private")]

Comment 8

8 years ago
(In reply to comment #7)
> (In reply to comment #6)
> > I'm sure you covered this, but I want to call it out specifically. In
> > 
> >     [native(type="uint8_t")]
> >     public static var m_flags:*;
> 
> Hmm... for non-AS3-visible fields, maybe we don't want (or need) an AS3 var
> declaration at all, maybe we could do it entirely with metadata:
> 
> 
>     [native(var="m_flags", type="uint8_t", access="private")]

My thinking has moved on a bit from the earlier post, in response to similar issues, and right now the (simple) rules are:

 - Fields that will hold native pointers must be declared 'uint' in AS3:

       [native(type="ClassClosure*")]
       private static var env:uint;

   That is so because declaring it 'uint' provides us with untagged Atom-sized
   storage initialized to zero, and that's not likely to change.  We could
   stick with '*' as I earlier proposed but it would mean either hacking
   ASC or doing some bytecode rewriting, neither of which is attractive at
   this point.

 - Fields that hold atoms should be declared '*' obviously, and no [native]
   is needed.

 - Fields that hold pointers to AvmPlusScriptableObject subtypes should be
   declared with their native type (String, Namespace, XyzObject).  Again
   no [native] is needed.  The only tricky part here is 'ScriptObject' which
   would have to be declared as 'Object' in AS3 but that actually means
   'Atom'.  So here we'd have to fall back to [native(type="ScriptObject*")].

 - In the example Stan cites, the only reasonable type for m_flags is uint,
   not * as I earlier wrote (again modulo ASC/bytecode hacking).

 - Generally, however, fields that don't hold pointers or Atoms should
   simply not be moved into AS3: there's no actual advantage to doing so,
   save regularity in how members are accessed.  My only concern right now is
   garbage collection and pointers, and since I have rules and mechanisms
   to cover those I am likely to leave it at that.

(Currently conducting some experiments in the Flash Player, see bug #576307 and the Watson bug linked from there.)

Comment 9

8 years ago
>  That is so because declaring it 'uint' provides us with untagged Atom-sized
>  storage initialized to zero, and that's not likely to change

The gotcha is that on 64-bit, uint = uint32_t, and pointers and Atom are 64bits.  But, the code that lays out slots might be able to take [native] into account and give [native] uint's 64 bits (ugly, but this feature sort of threads the needle, so maybe okay).

Comment 10

8 years ago
(In reply to comment #9)
> >  That is so because declaring it 'uint' provides us with untagged Atom-sized
> >  storage initialized to zero, and that's not likely to change
> 
> The gotcha is that on 64-bit, uint = uint32_t, and pointers and Atom are
> 64bits.

Hm, good point.

But uint values in an Atom are more than 32 bits, are they not?

> But, the code that lays out slots might be able to take [native] into
> account and give [native] uint's 64 bits (ugly, but this feature sort of
> threads the needle, so maybe okay).

The code that lays slots out will not take [native] into account, that's the whole point.  [native] is a signal to nativegen.py about how to emit C++ code, not to anyone else for any other purpose - as it stands now.  It's becoming clear that that's not tenable.

Comment 11

8 years ago
(In reply to comment #10)

> Hm, good point.
> 
> But uint values in an Atom are more than 32 bits, are they not?

on 64-bit, yeah, they can get up to 53 bits.  well... whatever # of bits such that conversion to double won't lose precision in the mantissa.  on 32-bit, they can get up to 28 bits (MSB must be 0) before overflowing to kDoubleType.   But a field declared :uint won't use Atom.

> > But, the code that lays out slots might be able to take [native] into
> > account and give [native] uint's 64 bits (ugly, but this feature sort of
> > threads the needle, so maybe okay).
> 
> The code that lays slots out will not take [native] into account, that's the
> whole point.  [native] is a signal to nativegen.py about how to emit C++ code,
> not to anyone else for any other purpose - as it stands now.  It's becoming
> clear that that's not tenable.

sigh, yeah.

Comment 12

8 years ago
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
>    That is so because declaring it 'uint' provides us with untagged Atom-sized
>    storage initialized to zero, and that's not likely to change.  

I'm not sure I follow this...how are these values being cleared?
(Reporter)

Comment 13

8 years ago
(In reply to comment #12)
> I'm not sure I follow this...how are these values being cleared?

Any member var declared "uint" is (by default) initialized to zero on object creation.

Comment 14

8 years ago
Actually this appears to be a better approach:

In Object.as we add these that are only visible to the builtins:

    [API(CONFIG::AIR_SYS)]
    public class GCObject extends Object {}

    [API(CONFIG::AIR_SYS)]
    public class GCAtomicObject extends Object {}

    [API(CONFIG::AIR_SYS)]
    public class RCObject extends Object {}

and then in Vector.as we can do this:

    [native(type="ClassClosure*")]
    private static var index_type:RCObject; 

Now the 'type' attribute is used to give the proper C++ type name to the field, but whether the pointed-to object is RC or pointer-containing is encoded in the AS type name, and since the AS type is reprsented an untagged pointer the field will be of the correct size on all platforms while still being initialized as the desired all-zero-bits native C++ NULL pointer.

All that's required, I think, is to enshrine the three internal class names as something nativegen.py can depend on, and then make it depend on those when it  generates tracers.  Slot layout will be correct automatically.  ASC need not change at all.
Assignee: nobody → lhansen
Blocks: 576307
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P3
Hardware: x86 → All
Target Milestone: --- → flash10.x - Serrano
(Reporter)

Comment 15

8 years ago
(In reply to comment #14)
>     [API(CONFIG::AIR_SYS)]
>     public class GCObject extends Object {}

Tricky gotcha: right now all slots that are descendants-of-Object are assumed to be descendant-of-ScriptObject (and thus RCObjects); various bits of code assume this. (When we one day nuke RCObject this will of course become a nonissue, but that day is not here yet.)

Comment 16

8 years ago
(In reply to comment #15)
> (In reply to comment #14)
> >     [API(CONFIG::AIR_SYS)]
> >     public class GCObject extends Object {}
> 
> Tricky gotcha: right now all slots that are descendants-of-Object are assumed
> to be descendant-of-ScriptObject (and thus RCObjects); various bits of code
> assume this.

Details on which bits of code might assume it would be most welcome, of course.

> (When we one day nuke RCObject this will of course become a
> nonissue, but that day is not here yet.)

I don't understand why it would become a nonissue, as we'll still have the distinction between GCObject and GCFinalizedObject - that might go away too but it's going to be tricky in practice, so I wouldn't bet on it.

I think that nails it - it's going to be entirely inappropriate to declare these internal pointer types in AS3.  I'm going to be looking for a different approach.

(We should treat this bug as RESOLVED - INVALID but I'm keeping it open as a discussion locus for the time being.)
No longer blocks: 576307
No longer depends on: 509070
Priority: P3 → --
Whiteboard: Discussion-locus; invalid
Target Milestone: flash10.x - Serrano → ---

Comment 17

8 years ago
here's a thought, if ScriptObject is subclassed, move slots out of object into a separate allocation and exact mark those using the traits aware marking scheme.   If you look at the most common player object (SpriteObject) it doesn't have a lot of C++ members and they are all pointers (pointer dense).   However some of the flex subclasses of SpriteObject have hundreds of properties.  Ie its more important to exactly mark the slots than the native stuff.

Comment 18

8 years ago
ie: the Application class has an extra size of ~1488 and the SpriteObject itself is only like 76 bytes
(Reporter)

Comment 19

8 years ago
(In reply to comment #16)
> Details on which bits of code might assume it would be most welcome, of course.

Ah yes. Most obvious examples are ScriptObject::getSlotAtom/getSlotObject/coerceAndSetSlotAtom. Probably others too if I dug more.
 
> I don't understand why it would become a nonissue, as we'll still have the
> distinction between GCObject and GCFinalizedObject - that might go away too but
> it's going to be tricky in practice, so I wouldn't bet on it.

Doh. Yes.

Comment 20

8 years ago
(In reply to comment #17)
> here's a thought, if ScriptObject is subclassed, move slots out of object into
> a separate allocation and exact mark those using the traits aware marking
> scheme.   If you look at the most common player object (SpriteObject) it
> doesn't have a lot of C++ members and they are all pointers (pointer dense).  
> However some of the flex subclasses of SpriteObject have hundreds of
> properties.  Ie its more important to exactly mark the slots than the native
> stuff.

That may be so.  For now my aim is to be able to exactly mark also the native stuff, and without changing more than I have to, and also without breaking up allocations or creating indirections more than I have to.

It's not hard to write code by hand to perform the scanning, and that code can be very stylized so that it's hard to introduce errors when constructing it.  It's a little harder to maintain it unfortunately, which is why it's appealing to have the scanner be generated from the same slot description used by the AS3 or C++ compiler.

There's some discussion of this now in the design document for bug #576307, likely some machine-readable annotation in the .h files will be most appropriate.  Maintainability is still going to be awkward given how a lot of the glue classes in the Flash Player are laid out (large classes with data members declared in various places).
(Reporter)

Comment 21

8 years ago
(In reply to comment #20)
> Maintainability is still going to be awkward given how a lot of
> the glue classes in the Flash Player are laid out (large classes with data
> members declared in various places).

That's an antipattern that is (sadly) quite prevalent in Player glue code. IMHO we should take this opportunity to clean that as it presents itself.

Updated

8 years ago
Assignee: lhansen → stejohns

Updated

7 years ago
Assignee: stejohns → nobody
Flags: flashplayer-qrb+
Flags: flashplayer-bug-
You need to log in before you can comment on or make changes to this bug.