Closed Bug 629746 Opened 13 years ago Closed 6 years ago

ANI: need an opaque Atom type for outside-the-VM code

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: stejohns, Unassigned)

References

Details

Attachments

(2 files, 3 obsolete files)

      No description provided.
Blocks: 629757
Attached patch Sketch of Box (obsolete) — Splinter Review
Here's my initial sketch, based loosely on the old TT Box work. Important differences to note:

-- This API is designed for things *outside the VM*; code inside the VM will get a superset (or possibly a completely different API).

-- I deliberately simplified the API and removed stuff that outside-the-VM code is unlikely to need. Want it simple.

-- API needs to have identical semantics for initial Box (which is same bit pattern as Atom) and future NaN-encoded Box.

-- Collapsed all numeric types into a single "BoxType" variant; there are two good reasons for this: (1) Atom and TT-Box have subtly different semantics about when they are Integer vs Double (as does 32-bit vs 64-bit Atom); (2) I don't think that I believe that there's code outside the VM that needs to know about the distinction, either for functional or performance reasons. So the semantics I'm offering are that a Box holds numeric values; you can extract them as double or int, and create the same way, but we don't promise the internal rep.

-- All creation (etc) is done via a Boxer utility class (which is likely to be a thin wrapper around AvmCore). The motivation for this is that some calls need extra args currently, for GC etc, (eg boxFromDouble) but won't in the future; having everything be an instance method of this simplifies the transition at minimal cost (assuming that compilers optimize away pointless references to "this".

(The other thing this will need is a way to convert between Box and Atom; obviously in the initial state that's trivial, with more complex tricks involved if Box is really a NaN-box approach, but IMHO those conversions don't belong in this conversation since they will be a transitory thing.)
Assignee: nobody → stejohns
Attachment #512357 - Flags: feedback?(edwsmith)
Attachment #512357 - Flags: feedback?(lhansen)
Attachment #512357 - Flags: feedback?(treilly)
(It appears my tab-fu is once again busted; sigh...)
OS: Mac OS X → Windows 7
Comment on attachment 512357 [details] [diff] [review]
Sketch of Box

BoxIsNull looks wrong, it returns true even for "invalid" values; at a minimum you should catch this in debug builds but I suspect you want to just catch it.  If they're invalid they're invalid, and you don't want to have the semantics change if you change representations.

I think this API will be robust even if we end up with multiple manifest number types (eg float, decimal).

The use of the prefix "box" on every method inside Boxer seems superflous and not a good idea.
Attachment #512357 - Flags: feedback?(lhansen) → feedback+
(In reply to comment #3)
> BoxIsNull looks wrong, it returns true even for "invalid" values; at a minimum
> you should catch this in debug builds but I suspect you want to just catch it. 
> If they're invalid they're invalid, and you don't want to have the semantics
> change if you change representations.

yeah, could be, I'll do a re-scrub. 
 
> I think this API will be robust even if we end up with multiple manifest number
> types (eg float, decimal).

Agreed, that's the idea.
 
> The use of the prefix "box" on every method inside Boxer seems superflous and
> not a good idea.

Yeah, probably right; the box prefix is a remnant from when the api was all standalone functions, rather than member functions. I'll try nuking them uniformly and see how usage looks.
Comment on attachment 512357 [details] [diff] [review]
Sketch of Box

In the time since TT, I've come to think of Box and Atom as implementations, and the abstract "thing" is simply an actionscript value.  So, the class (Boxer) would be a bag of operations that produce & consume values.  +1 to static methods on a class -- instance methods is too inflexible, methods in a namespace is to un-encapsulated.

If you're thinking about names, sadly, "Value" is taken (see framestate.h) but is easy to change if you want to, it doesn't leak outside the vm.

If you're thinking ahead to pluggable implementations, The many vm primitive functions that peek inside value would need changing, if the value encoding changes -- its not easy to abstract a switch statement.  If we want to support >1 value representation (via build switch), then the various operations also need to be customized.  

Ergo, the "operations" class probably will get big, and could divide into at least two -- operations we let VM embedders call (like Boxer), and ones we don't (don't put them in Boxer, or put them there and make them private).
Attachment #512357 - Flags: feedback?(edwsmith) → feedback+
(In reply to comment #6)
> In the time since TT, I've come to think of Box and Atom as implementations,
> and the abstract "thing" is simply an actionscript value.  So, the class
> (Boxer) would be a bag of operations that produce & consume values.  +1 to
> static methods on a class -- instance methods is too inflexible, methods in a
> namespace is to un-encapsulated.

Actually, instance methods are indeed what I'm proposing (though most methods could be transparently implemented as static methods)... why do you consider them too inflexible?

> If you're thinking about names, sadly, "Value" is taken (see framestate.h) but
> is easy to change if you want to, it doesn't leak outside the vm.

Originally I was going to call this "Value", but decided that it would make for weird discussions, eg "what's the value of this Value"? I'm not married to "Box" though, open for other suggestions (but probably not "Value")
 
> If you're thinking ahead to pluggable implementations, The many vm primitive
> functions that peek inside value would need changing, if the value encoding
> changes -- its not easy to abstract a switch statement.  If we want to support
> >1 value representation (via build switch), then the various operations also
> need to be customized.  
> 
> Ergo, the "operations" class probably will get big, and could divide into at
> least two -- operations we let VM embedders call (like Boxer), and ones we
> don't (don't put them in Boxer, or put them there and make them private).

Yep, that's precisely it; I'm *certain* that the VM itself will need much more access than what's here. This is intended to be a minimum viable set of calls that embedders will need. (Whether the VM-private calls are private or elsewhere is not clear to me; I'm leaning towards entirely-elsewhere, with "Box" being a name that is only meaningful to the embedder; under the hood, "Box" is simply a layer on top of "Atom" (currently) or "future-NaNbox-thing"....)
(In reply to comment #7)
> (In reply to comment #6)
> > In the time since TT, I've come to think of Box and Atom as implementations,
> > and the abstract "thing" is simply an actionscript value.  So, the class
> > (Boxer) would be a bag of operations that produce & consume values.  +1 to
> > static methods on a class -- instance methods is too inflexible, methods in a
> > namespace is to un-encapsulated.
> 
> Actually, instance methods are indeed what I'm proposing (though most methods
> could be transparently implemented as static methods)... why do you consider
> them too inflexible?

a manager class (Boxer) with instance methods seems ok.  I meant that instances methods on the value type is too inflexible (rules out "int" as the value type, for example)

> I'm not married to
> "Box" though, open for other suggestions (but probably not "Value")

I don't have concrete ones I'm in love with.  ASVal is better than Value.  spidermonkey uses "jsval".  The general case is that the handle is a reference (hence specializations for member, which wrap write barriers), so Ref (too generic), ASRef, etc come to mind.  Box is good enough for now I guess.

> Yep, that's precisely it; I'm *certain* that the VM itself will need much more
> access than what's here. This is intended to be a minimum viable set of calls
> that embedders will need. (Whether the VM-private calls are private or
> elsewhere is not clear to me; I'm leaning towards entirely-elsewhere, with
> "Box" being a name that is only meaningful to the embedder; under the hood,
> "Box" is simply a layer on top of "Atom" (currently) or
> "future-NaNbox-thing"....)

A lot of the VM should end up using the same abstraction.  Only where we need to know the encoding, should we see implementation names like Atom or NaNval, and that will be in code that itself is part of a module tied to the encoding.

"nanval" seems to be the defacto name for the nan encoding scheme.  nunval was a pun on it.
(In reply to comment #8)
> a manager class (Boxer) with instance methods seems ok.  I meant that instances
> methods on the value type is too inflexible (rules out "int" as the value type,
> for example)

Ah, right, agreed.
 
> I don't have concrete ones I'm in love with.  ASVal is better than Value. 
> spidermonkey uses "jsval".  The general case is that the handle is a reference
> (hence specializations for member, which wrap write barriers), so Ref (too
> generic), ASRef, etc come to mind.  Box is good enough for now I guess.

I could live with ASVal, or ASValue. Any other commenters have votes?
 
> A lot of the VM should end up using the same abstraction.  Only where we need
> to know the encoding, should we see implementation names like Atom or NaNval,
> and that will be in code that itself is part of a module tied to the encoding.

Agreed.
Comment on attachment 512357 [details] [diff] [review]
Sketch of Box

Should we hit the ground running with Box implemenated as a class larger than uintptr_t for DEBUG builds to ensure clean usage?   Like maybe stuff a dummy uintptr_t in from the of the real box or something.  +1 on ASValue.
Attachment #512357 - Flags: feedback?(treilly) → feedback+
-1 on ASValue.  AS3Value on the other hand...  There is more than one ActionScript outside Tamarin; no sense in confusing the matter.  If we are going to have to retain reference counting for AS2 objects but not have it for AS3 objects I'd like as little space for confusion as possible.

(AVMValue is no better, since there's the AVM- as well as the AVM+.)
AS3Value is hard to type :-(
This code is more aligned with the vm than the language so what about TValue (tamarin value).   The AS2 value class is called ScriptAtom so I think we're good if we avoid that ;-)
AS3Value will look pretty silly if we call some future language AS4
(In reply to comment #14)
> AS3Value will look pretty silly if we call some future language AS4

How so?  Either this hypothetical new language runs on the same VM and is largely compatible with AS3; AS3Value will probably continue to be just fine then.  Or it runs on another VM and is incompatible, and then we really need AS4Value.

(Most likely there will never be an AS4, just an evolution of AS3 over time.  At some point Marketing may choose to call a particular iteration along that line "AS4", but that's just marketing.)
We could always go left-field and call it some random noun, eg "Cookie"...
I'll just ramble out a few more neutral names:  vmval, avmval.  add 'ue' if val over-abbreviates, add caps to taste.
Is it a given that this thing lives in the avmplus namespace?   We started putting public stuff in MMgc::GCAPI and forcing it to be open for GCRef/GCMember thinking to a future where MMgc:: isn't used.

-1 on avmval if its in the avmplus (avmplus::avmval?) namespace.  +1 if not.   Maybe we could have an avmplus::API namespace that's meant to always be open that it lives in.
Left field: orb, cog, base, note
(In reply to comment #18)
> Is it a given that this thing lives in the avmplus namespace? 

No. My prototypes so far are putting new, opaque types into the "avm" namespace where possible.
Okay so then I'd vote for val, value or Value in that order.    

I wonder if the GCMember, GCRef etc types should move from MMgc::GCAPI to the avm namespace (unified front theory).
Originally I was going with "avm::Value", but decided that (even namespaced) "Value" was too generic. Arguments to the contrary?
more pseudo random names; box,tub,vat,silo,can,crate...add prefix if you like e.g. tamcan, vmcan, vmcrate etc...but seriously, my vote so far is for vmval.
I haven't found any name I like substantially more than "Box", so unless someone presents a compelling argument against it, that's what I'm going with.
(In reply to comment #24)
> I haven't found any name I like substantially more than "Box", so unless
> someone presents a compelling argument against it, that's what I'm going with.

I dislike "Box."  To me, the term "box" is strongly tied to the presence of a pointer to a payload value, in the case of either (1) a mutable single-entry cell, or (2) implementation strategy for bignums and/or doubles.  I don't think your intent with this API is to bring either topic to mind.

Are you just returning back to the issue of english statements like "what is the value of that Value?"  (I would argue such problematic statements are easily replaced, at least in writing, by phrases like "what does that Value denote" ; off-the-cuff speech is a separate problem less easily addressed.)

Even assuming that "Value" is no good, we can still do better.  This is the top of our dynamic type hierarchy, our universal type, right?  So perhaps "Univ", "Unival", "Any", "Top", or (sigh) "Obj"?

If we prefer to start with the term "Atom" and come up with something related to that: "Quark", "Hadron", "Ion", "Element", "Particle", "Unit" (though I dislike this for other reasons), " 

Or best of all, going back to "Value" again: how about "Datum"?
(In reply to comment #25)
> I dislike "Box."  To me, the term "box" is strongly tied to the presence of a
> pointer to a payload value, in the case of either (1) a mutable single-entry
> cell, or (2) implementation strategy for bignums and/or doubles.  I don't think
> your intent with this API is to bring either topic to mind.

I don't see either connotation, personally. If other people agree, though, I'll listen.

> Even assuming that "Value" is no good, we can still do better.  This is the top
> of our dynamic type hierarchy, our universal type, right?  So perhaps "Univ",
> "Unival", "Any", "Top", or (sigh) "Obj"?

I greatly want to avoid an abbreviation or acronym, and something terse is highly desirable IMHO. That rules out "Univ" and "Unival". "Any" is too wildcard-y. "Top" I just don't get at all.
 
> If we prefer to start with the term "Atom" and come up with something related
> to that: "Quark", "Hadron", "Ion", "Element", "Particle", "Unit" (though I
> dislike this for other reasons), " 

I toyed with some of those but they all seem a little too precious. This is a type we're going to see a *lot* of; an annoying name is going to be *really* annoying. 
 
> Or best of all, going back to "Value" again: how about "Datum"?

Well, I guess it does sound a little like "Atom"... but my initial response is "bleah", as common usage has evolved to the point where even people who know better use "Data" for both singular and plural, leaving "Datum" orphaned...
(In reply to comment #26)
> (In reply to comment #25)
> > I dislike "Box."  To me, the term "box" is strongly tied to the presence of a
> > pointer to a payload value, in the case of either (1) a mutable single-entry
> > cell, or (2) implementation strategy for bignums and/or doubles.  I don't think
> > your intent with this API is to bring either topic to mind.
> 
> I don't see either connotation, personally. If other people agree, though, I'll
> listen.

I forgot about the third instance of the pointer-to-payload pattern where the term "box" arises, namely (3) Java-style boxing classes that hold a single field of the appropriate primitive type, e.g. Boolean, Integer, etc.  (Arguably (2) and (3) could probably be merged into one case if one squinted hard enough at it.)

Is this third case, or something similar in .NET, the source of your use of the term "box"?  (Just curious.)

My goal is not exhaustively list all the cases where the pointer-to-payload pattern arises; I'm just stating that I tend to associate the word "box" with that pattern.  (Of course, I'm not the target customer for your API.)
If we switch from molecular to biological, how about "cell"?
(Is there any particular reason you could not make "avm::Atom" and "avmplus::Atom" coexist?  You can safeguard the abstraction by defining the former as some structure type now and again.)

I'm with Felix; "Box" is an implementation term.  "Atom" at least has the benefit of being conventional.  "Value" describes what it is: an immutable value (whose representation could be in-line or boxed).  "Cell" is usually used about an updateable location; not the connotation you want.

In regard to "AS3Value is hard to type", how often do you write these things?  Almost never, once you've ported the glue code.  People seem to write MMgc::Blah hundreds of times without thinking, when "using namespace MMgc" at the top of the file would have been the obvious thing to do.  Frankly I think people don't care a lot about whether something is easy to type; they will read it thousands of times for every time it's written.
Having slept on it, I agree that Atom and Box are implementation terms, and my money is on Value or Val, with namespace qualification if necessary.  Renaming avmplus::Value to FrameValue (its a verify-only helper class) would get it out of the way easily.

Or, what Lars said: just seal up Atom and we can anoint it as the api name, changing the implementation name to something else, later. (e.g. TaggedPtrAtom... etc, not a big deal if it is well encapsulated).
OK, I'll revisit "Value" as the name, and rename FrameValue so there's no confusing "avmplus::Value" of any sort, and see how it feels.
Attached patch Revised to Value (obsolete) — Splinter Review
Mostly nomenclature changes, with a few other tweaks.
Attachment #512357 - Attachment is obsolete: true
Attachment #516038 - Flags: review?(edwsmith)
Attachment #516038 - Flags: feedback?(lhansen)
Not strictly necessary, but will definitely avoid possibilities of ambiguity with avm::Value
Attachment #516125 - Flags: review?(edwsmith)
Comment on attachment 516125 [details] [diff] [review]
rename avmplus::Value to avmplus::FrameValue

Cool, I'm in favor of landing this either way, sooner than later.
Attachment #516125 - Flags: review?(edwsmith) → review+
Comment on attachment 516038 [details] [diff] [review]
Revised to Value

I like the structure a lot so when I F- it is for the following reason:

I dislike abbreviations intensely; they almost never pay for themselves.  They introduce confusion.  They create the need for mental mapping where none should be needed.  They tax a thousand readers for every writer who gets to shave off a second.

Some examples, by no means exhaustive:

toStr / fromStr etc should be toString / fromString.
toNs / fromNs should be toNamespace / fromNamespace.
getObj should be getObject.
getNumber should be called getDouble, probably (cf fromDouble, toDouble)
Attachment #516038 - Flags: feedback?(lhansen) → feedback-
I could go either way on abbreviations, but I have a vague memory of Edwin advocating them in this case; Edwin, do I recall correctly? If so, let's hear a counterargument. If not, I'll unabbreviate.
Hows this for a counter-counterargument:  To quote Brendan, "that's not the hill we are going to die on", I would just make the changes.

The google style guide (*) says about the same thing as Lars, with the only exception being made for extremely idiomatic uses, like a "Ptr" suffix on smart pointer classes, vs Pointer (their example, not mine), or "i" and "n" for iteration. (*) The great thing about style guides is there's so many to choose from.

When words get really long, and get mentioned a lot, is when I get tempted (Instruction vs Instr, for example).  Lars's examples are not like that, even the full words aren't very long.  When synonyms are available, we should just choose simple short words instead of abbreviating long ones.

I'd add ValueMgr to the list -- Manager is okay.  (ExecMgr notwithstanding.. I volunteer to rename it).

Moreover, it pays to err on "fullness" in an API, even if one starts abbreviating in internal code -- more readers justifies more formality.
Comment on attachment 516125 [details] [diff] [review]
rename avmplus::Value to avmplus::FrameValue

TR 6029:c74137815a52
(In reply to comment #37)
> When words get really long, and get mentioned a lot, is when I get tempted
> (Instruction vs Instr, for example).  Lars's examples are not like that, even
> the full words aren't very long.  When synonyms are available, we should just
> choose simple short words instead of abbreviating long ones.

Fair enough. (Though I want to keep "Bool" vs "Boolean")
(In reply to comment #35)
> getNumber should be called getDouble, probably (cf fromDouble, toDouble)

I disagree on this one: it is symmetric with the underlying Value type, which is "Number"; the idea of the "get" methods is that there is no coercion done (as opposed to the "to" methods, which coerce to the return type).

Granted, this is a bit confusing, especially since a "Number" Value might actually be stored as an int (or double) representation, while GetNumber() always returns as a double, but the alternative (GetDouble) implies that isNumber() implies a double storage type.
Attached patch Value, v2 (obsolete) — Splinter Review
de-abbreviated, but Bool and getNumber left as-is
Attachment #516038 - Attachment is obsolete: true
Attachment #516366 - Flags: review?(edwsmith)
Attachment #516366 - Flags: feedback?(lhansen)
Attachment #516038 - Flags: review?(edwsmith)
(In reply to comment #40)
> 
> ... but the alternative (GetDouble) implies that
> isNumber() implies a double storage type.

Why should it not?  Number will never be anything other than a "double".  Or is there something more specific that sentence is saying than I think it's saying?
The point here is that all the other "get" methods correspond syntactically to a kValueType... eg kValueTypeBool -> getBool(), kValueTypeString -> getString(), etc. Thus we have kValueTypeNumber -> getNumber(), rather than kValueTypeNumber -> getDouble().
Comment on attachment 516366 [details] [diff] [review]
Value, v2

As usual the value of calling it _values (rather than "values" or something else) eludes me.

The AS3 type is called "Boolean" (and the constant is even kBooleanType), not "Bool", so "isBool", "getBool" etc remain misnamed.  (Obviously the ones operating on C++ "bool" are exceptions there.)

Otherwise no particular misgivings.
Attachment #516366 - Flags: feedback?(lhansen) → feedback+
Comment on attachment 516366 [details] [diff] [review]
Value, v2

> _values

Sadly, all three styles are used in AvmCore.  Since it is an instance of ValueManager, how about calling it m_value_manager or valueManager?  The existence of a getter argues for m_value_manager.

I agree with Lars about Boolean, in the context of AS3 type names, it should be spelled Boolean.  In the context of C++ types or machine/storage types, it 'bool' or 'bool32' would be correct.

To test this, you should add a dummy field to struct Value{} and make sure everything compiles/runs okay.  No client of Value should care about its size or copy-by-value vs by-const-ref.

kValueTypeInvalid -- ValueMember fields can be in this state and visible to both ABC and native code, during object construction.  In that case, the whole Value must be zero'd -- any nonzero bits are definitely a bug.

ValueManager could use a comment explaining what its for.  Particularly, why it is a class with state, and why all methods are nonstatic, nonconst even though many could be static and/or const.  (future-proofing the api, right?)

need to edit avmplus2008.vcproj too, and maybe the 2010 project file.
Attachment #516366 - Flags: review?(edwsmith) → review+
(In reply to comment #45)
> Sadly, all three styles are used in AvmCore.  Since it is an instance of
> ValueManager, how about calling it m_value_manager or valueManager?  The
> existence of a getter argues for m_value_manager.

standardizing on m_ for private members is probably best.
 
> I agree with Lars about Boolean, in the context of AS3 type names, it should be
> spelled Boolean.  In the context of C++ types or machine/storage types, it
> 'bool' or 'bool32' would be correct.

Fair enough, willfix.

> To test this, you should add a dummy field to struct Value{} and make sure
> everything compiles/runs okay.  No client of Value should care about its size
> or copy-by-value vs by-const-ref.

Okey doke.
 
> kValueTypeInvalid -- ValueMember fields can be in this state and visible to
> both ABC and native code, during object construction.  In that case, the whole
> Value must be zero'd -- any nonzero bits are definitely a bug.


Will add as comment.
 
> ValueManager could use a comment explaining what its for.  Particularly, why it
> is a class with state, and why all methods are nonstatic, nonconst even though
> many could be static and/or const.  (future-proofing the api, right?)

Future-proofing is indeed the goal.
Attached patch Value, v3Splinter Review
Version tweaked with suggested, some other cosmetic cleanup. Not requesting a re-review, but feedback welcomed. Not planning on landing it until I have something ready to use it (hopefully soon)
Attachment #516366 - Attachment is obsolete: true
Moving to Future for now.  Highly desirable but holding change until it is required or we can focus a small team on ANI.
Assignee: stejohns → nobody
Flags: flashplayer-qrb+
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Target Milestone: --- → Future
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: