Closed Bug 617252 Opened 14 years ago Closed 14 years ago

Optimistically stack allocated doubles

Categories

(Tamarin Graveyard :: Virtual Machine, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: alexmac, Unassigned)

References

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4
Build Identifier: 

Tight loops in as3 code that contain calls to op_add resulting in double atoms are very slow due to all of the doubles that end up being allocated.

This patch adds a new codepath for op_add which instead of returning a newly allocated double Atom will instead store the value through a pointer to a pre-allocated Atom. The AOT compiler will optimistically allocate an double *on the stack* so that a call to op_add never needs to perform a heap allocation. Clearly this stack allocated Atom can't outlive its callframe so it is promoted back to a heap allocation after one of three situations arrises:

1) Atom is returned out of this call frame
2) Someone calls incr_atom on it (indicating a store to a property or similar escape from the callframe)
3) The Atom is thrown

Currently this is only used by AOT, but this optimization could also be done in the JIT. I don't have any up to date numbers on the performance right now but I'll do a performance run tomorrow and update the bug.

Reproducible: Always
Attachment #495759 - Flags: review?(stejohns)
Attachment #495759 - Flags: review?(edwsmith)
Comment on attachment 495759 [details] [diff] [review]
Optimistic Double Atoms

I don't see any obvious flaws, but the proper reviewers are probably Ed (for JIT implications) and Lars (for Atom implications)... I suspect that having the possibility of double-atom-on-stack will have ramifications for ExactGC.
Attachment #495759 - Flags: review?(stejohns) → review?(lhansen)
I'd be much happier if a change like this were encapsulated.  Spreading it through AvmCore and messing with the write barrier is asking for trouble.

In short:

  +1 on allocating a double on the stack and lazy-creating an Atom
  -100 on hijacking Atom to do it.

potential problems:

* this can promote an atom (good) but it doesn't save the promoted heap allocated atom; the result could be a penalty where the same stack allocated double gets promoted over and over.

* At some point soonish, we want to support doing exact marking of stack frames.  If the JIT doesnt know if a double atom is heap allocated or not, that complicates exact marking.

* what happens if you miss one of the ways a stack allocated atom can escape to the heap?

I would rather see us spend the engineering time on using a different encoding from Atom, in localized areas, and specializing the VM narrowly in those areas, than to "play with fire" by stack allocating something that many separate pieces of code currently assume is heap allocated.

One possible avenue is to align with the JIT.  Instead of creating an Atom in the first place (on the stack or the heap), instead create a double and a separate "kind" value.  Pass those values to helper functions which know what they are, and that can work on them -- delay creating a heap allocated Atom as much as possible.

Another way to do that is to just define a struct wrapper for the stack allocated double, and make helpers that work on it in a type-safe way.
Blocks: 616780
(In reply to comment #3)
> I'd be much happier if a change like this were encapsulated.  Spreading it
> through AvmCore and messing with the write barrier is asking for trouble.
> 
> In short:
> 
>   +1 on allocating a double on the stack and lazy-creating an Atom
>   -100 on hijacking Atom to do it.

we're not lazy creating the Atom, we're lazily heap-allocating the double when it is known to escape the callframe the double was stack-allocated in... I guess you could say we're lazily creating the "real" Atom.

> 
> potential problems:
> 
> * this can promote an atom (good) but it doesn't save the promoted heap
> allocated atom; the result could be a penalty where the same stack allocated
> double gets promoted over and over.

True, but in practice we've not seen this, and even if we did the "penalty" would simply bring us back to square one modulo some small hit due to the extra check in incr_atom.

> 
> * At some point soonish, we want to support doing exact marking of stack
> frames.  If the JIT doesnt know if a double atom is heap allocated or not, that
> complicates exact marking.

can you be more clear about soonish? we're hoping to do this optimization for spicy...

> * what happens if you miss one of the ways a stack allocated atom can escape to
> the heap?

As I'm sure you can imagine this would result in bogus data/crash.... bad in other words :)

> I would rather see us spend the engineering time on using a different encoding
> from Atom, in localized areas, and specializing the VM narrowly in those areas,
> than to "play with fire" by stack allocating something that many separate
> pieces of code currently assume is heap allocated.
> 
> One possible avenue is to align with the JIT.  Instead of creating an Atom in
> the first place (on the stack or the heap), instead create a double and a
> separate "kind" value.  Pass those values to helper functions which know what
> they are, and that can work on them -- delay creating a heap allocated Atom as
> much as possible.
> 
> Another way to do that is to just define a struct wrapper for the stack
> allocated double, and make helpers that work on it in a type-safe way.

Lets chat about this offline, I'm not entirely sure how this "kind" tracking the JIt is doing works, or how this would be different from what we've done...
(In reply to comment #4)
> (In reply to comment #3)

> we're not lazy creating the Atom, we're lazily heap-allocating the double when
> it is known to escape the callframe the double was stack-allocated in... I
> guess you could say we're lazily creating the "real" Atom.

Yep, I understood that.  Delaying the heap allocation is what I meant.  I'm seriously concerned about all sorts of bad things that will go wrong when generic code gets ahold of a kDoubleType atom that doesn't point into the heap.

> True, but in practice we've not seen this, and even if we did the "penalty"
> would simply bring us back to square one modulo some small hit due to the extra
> check in incr_atom.

I'm thinking of something like this, with many variants possible:

   x = /* stack-allocate a double atom */
   for (i=0; i < bazillion; i++)
      a[i] = x /* heap-allocation when incr_atom called internally */

> > * At some point soonish, we want to support doing exact marking of stack
> > frames.  If the JIT doesnt know if a double atom is heap allocated or not, that
> > complicates exact marking.
> 
> can you be more clear about soonish? we're hoping to do this optimization for
> spicy...

nigel, not serrano or spicy. 

> Lets chat about this offline, I'm not entirely sure how this "kind" tracking
> the JIt is doing works, or how this would be different from what we've done...

Sounds good.

You can tell from my reaction that I think this would be fragile, and will further over-constrain what we can do with the already over-constrained Atom encoding, even if its a short-term win.  I'll try to think up more risks.

I'm still gung-ho to find ways not to heap allocate doubles in hot loops, preferrably ways we can use in the JIT too, and start supporting cleanly throughout the VM code.
(In reply to comment #3)
> In short:
> 
>   +1 on allocating a double on the stack and lazy-creating an Atom
>   -100 on hijacking Atom to do it.
> 
> * At some point soonish, we want to support doing exact marking of stack
> frames.  If the JIT doesnt know if a double atom is heap allocated or not,
> that complicates exact marking.

This is a subtle point and exact stack marking is not going to happen this year (Nigel is the earliest, and we're only going to do it if we can't effectively time the stack scanning so as to reduce the impact of conservative scanning), but if a slot is designated as an Atom slot then the pointer, if not null, *must* point into GC-allocated storage.  Anything else is going to be a recipe for disaster or worse performance.
What is the rationale for using

>  template <bool useOptimistic...>
>  ...
>     if (useOptimistic...) {} else {}

as opposed to #ifdef, which is used everywhere else?  Not that I mind, necessarily, (its prettier), but it seems like an outlier.
Comment on attachment 495759 [details] [diff] [review]
Optimistic Double Atoms

On the whole, I think this is a seriously dangerous hack, and I'd hate to see it in the code.  The patch is safe insofar that it ifdefs the necessary pieces of code.

But, it puts #ifdefs in code that we're actively working on (GC, exact marking, write barriers, etc), and the patch also doesn't include tests which heavily exersize the functionality and ensure that its safe.  (I do think those tests should go in tamarin).

R- for those second reasons.  I think with sufficient explanation in comments, tests, then I could be sold to land the code.  But really, I'd rather pursue ways to avoid allocating doubles entirely, that both AOT and JIT's can use.
Attachment #495759 - Flags: review?(edwsmith) → review-
(In reply to comment #8)
> On the whole, I think this is a seriously dangerous hack, and I'd hate to see
> it in the code.  The patch is safe insofar that it ifdefs the necessary pieces
> of code.

To be pedantic regarding the review process: any code that qualifies as "dangerous hack" should automatically get R-, regardless of ifdef-ing. (To paraphrase Brendan, "ifdef is not the root password to hg-push".) 

Please don't interpret this as an attack on this code or the author, but this code feels extremely risky and fragile in its current form; if I give it an R+, I'm implicitly saying "I think this patch will make the code better", but I don't think that's the case.
Comment on attachment 495759 [details] [diff] [review]
Optimistic Double Atoms

Several problems here:

- Code assumes stack lives at higher addresses than heap.  This is not assumed
  anywhere else, and in fact we should expect this not to be true as we go
  increasingly multi-threaded.

- Further slows down reference counting by adding a call on the hot path in
  incr_atom.

- Introduces an ad-hoc notion of objects that may be relocated.  I understand
  identity does not matter in this case but I don't care for it, it means all
  calling code has to be careful not to cache values in registers / locals
  across write barrier calls and that kind of invariant is system-wide and
  something we'd want to consider carefully.

- Makes tagged storage point into the stack, which is illegal as far as
  partly-exact GC is concerned.  (Partly-exact GC has now landed in the VM
  and will soon land in the Player.)  I understand that tagged pointers
  pointing into the stack aren't supposed to escape into the heap so it's
  not a problem at present; it will become a problem if/when we implement
  precise scanning of JIT/interpreter stack frames.
Attachment #495759 - Flags: review?(lhansen) → review-
Given the potential conflicts with future GC work I'm going to mark the bug won't fix. Hopefully we can find other ways of reducing the number of untyped adds or improve the efficiency of AllocDouble (couldn't it just be a simple pointer bump allocator in a page dedicated to storing doubles?)
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
(In reply to comment #11)
> Given the potential conflicts with future GC work I'm going to mark the bug
> won't fix. Hopefully we can find other ways of reducing the number of untyped
> adds or improve the efficiency of AllocDouble (couldn't it just be a simple
> pointer bump allocator in a page dedicated to storing doubles?)

Generally speaking we're moving toward pointer-bumping allocators, hopefully a lot of that work will happen in 2011.  AllocDouble may or may not be an easy special case that we could handle even if we don't introduce it generally.
bulk verifying resolved !fixed issues
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: