Closed
Bug 572204
Opened 15 years ago
Closed 6 years ago
String could be independent of AvmCore.
Categories
(Tamarin Graveyard :: Virtual Machine, enhancement)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: kpalacz, Assigned: kpalacz)
References
Details
Attachments
(2 files)
30.37 KB,
patch
|
stejohns
:
feedback-
|
Details | Diff | Splinter Review |
35.87 KB,
patch
|
edwsmith
:
feedback+
|
Details | Diff | Splinter Review |
Currently String depends on AvmCore, mostly for storage of its 'statics' (i.e., data shared by all instances in a given AvmCore). If String-related aspects of AvmCore were factored out of it, it would be potentially possible to use Strings before AvmCore is initialized. Potentially, it could also be a first step towards sharing String instances across AvmCores (a multiprocessing optimization).
Assignee | ||
Comment 1•15 years ago
|
||
Experimental work in progress (ended up way beyond the intended scope), posting for early feedback, since the implementation technique is likely to be controversial. The patch overwrites every String's C++ vtable pointer to point into a heap-allocated object, which can hold the String's statics.
The statics can then be found by indexing off of that object, instead of an instance of AvmCore.
Several consequences of the approach were found and addressed in the patch.
This technique could be also used to allow virtual C++ methods and AS3 method pointers to coexist in a single vtable (an alternative to the approach in bug 566043) while saving one word per object (the actual memory footprint reduction would be relatively sensitive to the number of virtual methods in Object).
Assignee: nobody → kpalacz
Attachment #451385 -
Flags: feedback?(stejohns)
Attachment #451385 -
Flags: feedback?(rreitmai)
Attachment #451385 -
Flags: feedback?(edwsmith)
Comment 2•15 years ago
|
||
> The patch overwrites every String's C++ vtable pointer
I have not reviewed the patch yet, but this scares the HELL out of me... :-)
Comment 3•15 years ago
|
||
Comment on attachment 451385 [details] [diff] [review]
Separation of AvmCore and String using overriding of vtable pointers.
While I laud the goal, I'd be... reluctant... to use this approach in real code. It may work in some (or even most) compilers, but overwriting the C++ vtable is a recipe for pain somewhere down the road, guaranteed. But... if we want to do this, why not roll our own? Make a struct (with no virtual methods) and fill in an explicit vtable-like structure.
Attachment #451385 -
Flags: feedback?(stejohns) → feedback-
Assignee | ||
Comment 4•15 years ago
|
||
Uses a thread local var to find the StringFactory, which may or may not be the same object as AvmCore (but only in the cases where a thread local var was already being used). StringFactory would probably be better off as a component of AvmCore (rather than an ancestor).
Attachment #452008 -
Flags: feedback?(stejohns)
Attachment #452008 -
Flags: feedback?(edwsmith)
Assignee | ||
Comment 5•15 years ago
|
||
(In reply to comment #3)
> (From update of attachment 451385 [details] [diff] [review])
> While I laud the goal, I'd be... reluctant... to use this approach in real
> code. It may work in some (or even most) compilers, but overwriting the C++
> vtable is a recipe for pain somewhere down the road, guaranteed. But... if we
> want to do this, why not roll our own? Make a struct (with no virtual methods)
> and fill in an explicit vtable-like structure.
It would be interesting to look into adopting or reinventing a part of, say, GObject
(http://en.wikipedia.org/wiki/GObject), but seems like this this would entail convincing MMgc to give up on C++ virtual dispatch, reimplementing String finalization, or living with an extra word of overhead per String.
Caution definitely legitimate here, however, we are already in the business of second-guessing C++ compilers.
What would be the likely scenarios that would invalidate this technique without simultaneously confusing MMgc? The compiler can overwrite the vtable pointer, and in fact, it looks like g++ indeed does it in constructors and destructors, but that can be compensated for.
While the cost/benefit calculation in this particular case may not be convincing, if the technique proves feasible, it could potentially be used to shave a word from every ScriptObject, which is tempting.
Comment 6•15 years ago
|
||
Is there much point in making it independent of AvmCore if it's not independent of a GC instance?
If it's independent of a GC instance, we're going to get into accounting trouble - somebody needs to account for the large volume of string data. The GC works by knowing about storage volume.
MMgc will not give up on virtual dispatch for the foreseeable future, I think. It is certainly not something I'm keen on fitting into the other work that's ongoing, unless there's a significant benefit. We've discussed decoupling MMgc from the hierarchy of GC'd storage by removing the requirement to subclass a GC class, and I think that's a good idea, but it's not on the roadmap for 10.2 and we'd have to have a compelling reason to do it (functionality, stability, performance, memory use) and some data to back that up.
How are we already in the business of second-guessing C++ compilers?
G++ overwrites vtables in constructors and destructors because C++ semantics call for it, I believe - something to do with the visibility of virtual methods inside those methods.
Assignee | ||
Comment 7•15 years ago
|
||
(In reply to comment #6)
> Is there much point in making it independent of AvmCore if it's not independent
> of a GC instance?
I think there is. We could (1) start making strings after GC is created but before AvmCore is created, (2) make strings in a separate GC instance (3) templatize String with allocation strategies to allow a malloc version. It seems to me that (1) could be actually acceptable for bootstrapping a configuration mechanism that would apply to GC itself, because although several GC options set at GC instantiation, fewer are actually read until later, so perhaps the initialization could be delayed (and GC could run with default settings until configured). If that's not acceptable, there's (2) or (3).
[...]
> How are we already in the business of second-guessing C++ compilers?
AFAIK, C++ puts very few restrictions on the object layout (unless it's a POD type in C++ speak), it doesn't require vtables (MMgc has some debugging code that expects them), and it doesn't even require that objects be laid out contiguously. So, for example, in principle, an object could be represented as a linked list connected using hidden compressed pointers, which won't necessarily look like pointers to MMgc. We're assuming that this doesn't happen, but I think theoretically it could. I can't say if compressing visible pointers would be legal or not, but it might.
>
> G++ overwrites vtables in constructors and destructors because C++ semantics
> call for it, I believe - something to do with the visibility of virtual methods
> inside those methods.
I believe the C++ definition calls for the particular visibility semantics of virtual methods, but it doesn't mandate that this be implemented by overwriting vtable pointers (not that I could come up with an alternative implementation strategy, but that's no proof of absence).
Comment 8•15 years ago
|
||
As a potential user, I could get by with AvmPlus strings for many purposes unless I needed to operate on multiple threads. And in this case being bound to a GC would be problematic as well.
Comment 9•15 years ago
|
||
(In reply to comment #6)
> G++ overwrites vtables in constructors and destructors because C++ semantics
> call for it, I believe - something to do with the visibility of virtual methods
> inside those methods.
*all* compliant C++ compilers do this -- they are required to. See http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.5
Comment 10•15 years ago
|
||
(In reply to comment #6)
> Is there much point in making it independent of AvmCore if it's not independent
> of a GC instance?
In theory, this would allow String to be used by Flash's AS1/AS2 engine, which relies on MMgc but not AvmCore. (In practice, it's unclear whether such a rewrite would happen.)
Making String separate from AvmCore is conceptually appealing, but it's not clear to me there's a likely compelling use-case at the moment.
Comment 11•15 years ago
|
||
(In reply to comment #10)
> (In reply to comment #6)
> > Is there much point in making it independent of AvmCore if it's not independent
> > of a GC instance?
>
> In theory, this would allow String to be used by Flash's AS1/AS2 engine, which
> relies on MMgc but not AvmCore. (In practice, it's unclear whether such a
> rewrite would happen.)
And, again, we would want a string class that could be used regardless of thread environment and without having to think about write barriers. MMgc dependence is a deal-killer for a general-purpose string class.
Assignee | ||
Comment 12•15 years ago
|
||
If there are more use cases for non-MMgc'd avmplus::String, perhaps it's worthwhile to parametrize this class by allocation strategies, non unlike avmpus::List. The original reason for this work was bug 570919, which would also benefit.
Assignee | ||
Comment 13•15 years ago
|
||
(In reply to comment #9)
> (In reply to comment #6)
> > G++ overwrites vtables in constructors and destructors because C++ semantics
> > call for it, I believe - something to do with the visibility of virtual methods
> > inside those methods.
>
> *all* compliant C++ compilers do this -- they are required to. See
> http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.5
They can't be required to overwrite what they're not required to have, but that's how the mandatory source language semantics can implemented.
Comment 14•15 years ago
|
||
(In reply to comment #12)
> If there are more use cases for non-MMgc'd avmplus::String, perhaps it's
> worthwhile to parametrize this class by allocation strategies, non unlike
> avmpus::List.
Yep. Of course, such strings couldn't be used interchangeably, but maintaining a single codecase and API is still a good thing. I'll defer to Stan as to whether he thinks that having two flavors of String (gc and non-gc) would help or hinder his Flash work....
Comment 15•15 years ago
|
||
The usual problem people have is that there are too many string classes but that none is sufficiently general to meet their general-purpose needs. So in practice, introducing a new string type just sets everybody back (more choices) unless you can go in and replace one or more of the existing string classes with the new one.
I've surveyed the strings I'm dealing with and I think the best strategy, for us anyway, is to nominate an existing string and generalize it. It turns out there is a great candidate for that.
So in this case generalizing the avm string wouldn't really help much and, to the extent that it introduces one more choice that developers don't need, it's probably a net loss. That's likely to be the case for other projects as well, I bet.
Not using an AVM-based string probably means walking away from rep-sharing opportunities but the more I study that the more I'm convinced that rep-sharing is a net loss: the overheads of sharing (especially if you start to think about threading) are significant in locality-of-reference, interlocked bus transactions, cache misses and pipeline stalls. Most strings are short, memcpy is fast, and "const String&" parameters cover most interesting "sharing" cases. Allocator overheads are not to be sneezed at, but making the allocator faster is always a win and there are lots of good techniques for that.
Assignee | ||
Comment 16•15 years ago
|
||
(In reply to comment #15)
So I went through similar reasoning and thought that generalizing avmplus::String is the right thing to do, but that's the only choice available from the point of view of a standalone Tamarin build. If I understand correctly, you are saying that there's another choice? Would it be usable in standalone Tamarin?
[...]
> Not using an AVM-based string probably means walking away from rep-sharing
> opportunities but the more I study that the more I'm convinced that rep-sharing
> is a net loss: the overheads of sharing (especially if you start to think about
> threading) are significant in locality-of-reference, interlocked bus
> transactions, cache misses and pipeline stalls.
Also, in the case of avmplus::String at least, rep-sharing entails internal polymorphism, which results in manual dispatch on type and longer code paths for frequent operations, which makes them less likely candidates for JIT inlining.
Comment 17•15 years ago
|
||
re: rep-sharing, this paper (recent work, too) makes a good case against rep-sharing, in favor of plain allocated strings, in Java, for use cases simliar to ones that matter for Flash:
http://www.christianwimmer.at/Publications/Haeubl10a/
Comment 18•15 years ago
|
||
Comment on attachment 451385 [details] [diff] [review]
Separation of AvmCore and String using overriding of vtable pointers.
withdrawing f? request due to previous comments. If MMgc vtable pointer is here to stay for the near future due to strings being Rc objects, this seems not very feasible.
Attachment #451385 -
Flags: feedback?(edwsmith)
Comment 19•15 years ago
|
||
(In reply to comment #15)
> I've surveyed the strings I'm dealing with and I think the best strategy, for
> us anyway, is to nominate an existing string and generalize it. It turns out
> there is a great candidate for that.
What is the candidate? I take it that you don't mean avmplus::String
The motivating use case here is strings needed for parsing configuration data during VM bootstrapping. A usable string class could also simplify commandline argument processing, maybe, but its not a motivating use case for now.
one more use case for non-avmplus::String strings, (again, maybe) is in the type system; symbol tables (MultinameHashtable) consist of Namespace and String instances; this is heavyweight and painful for bootstrapping. A symbol table not composed of RCObjects would facilitate fixed-allocation or arena allocation since we know the death time of these items. (its when their owning PoolObject is finalized).
Factoring String junk out of AvmCore into a manager class is good hygene if nothing else. Agree that it should be a member, not a base class. One might even argue for it being a GC member, or related to a GC instance, instead of AvmCore.
Updated•15 years ago
|
Attachment #452008 -
Flags: feedback?(edwsmith) → feedback+
Comment 20•14 years ago
|
||
Comment on attachment 451385 [details] [diff] [review]
Separation of AvmCore and String using overriding of vtable pointers.
removing self from feedback...let me know if you want to revisit this patch.
Attachment #451385 -
Flags: feedback?(rreitmai)
Comment 21•14 years ago
|
||
Comment on attachment 452008 [details] [diff] [review]
More conventional approach.
Removing myself from feedback since this bug appears to be dormant...
Attachment #452008 -
Flags: feedback?(stejohns)
Updated•13 years ago
|
QA Contact: dschaffe → brbaker
Comment 22•13 years ago
|
||
I have a candidate String class that is designed to interoperate well with StxxxString and which does not use the GC. Please discuss with me before proceeding any further on this bug.
Updated•13 years ago
|
QA Contact: brbaker → vm
Comment 23•13 years ago
|
||
Note that this bug gets a +1 from the ANI effort b/c it wants to hide AvmCore completely from the avmplus public interface.
Updated•6 years ago
|
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•