Closed Bug 616780 Opened 14 years ago Closed 14 years ago

AOT optimizations

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED INVALID

People

(Reporter: alexmac, Unassigned)

References

Details

Attachments

(1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4
Build Identifier: 

This patch contains various tweaks to the VM to support some of the AOT optimizations we've been implementing over the past month.

- when an AbcEnv is first created for a given abc we parse all the multinames, AOT generated code digs static multinames directly out of this cache.

- Each AbcEnv also contains a GenericGuard (patch coming soon!) used to do exception based caching of pushstrings and finddefs

- Faster argument array creation without needing a redundant copy of the atoms

- Optimistic double Atoms: the AOT compiler now provides a heap-allocated double so that calls to op_add never need to perform heap allocations. A modified version of op_add takes a pointer to this heap allocated double and fills it in with the result producing an Atom that points to this optimistic double. The AOT compiler promotes this stack-allocated double to a heap-allocated double whenever an atom that references it escapes.

I've push the body of op_add into a templatized "op_add_impl" function so it can be shared by the normal and optimistic variants of op_add. I validated that gcc does indeed inline op_add_impl into the two varianst of op_add and that the resulting asm is the same as the original.

- cdecl didn't change much but I re-indented it to better follow the styles used elsewhere (verify by doing a whitespace ignoring diff)

- static mops storage: allows you to use one fixed size static buffer for the alchemy/mops opcodes to work on, it also allows you to disable range-checking. Obviously neither of these apply to the shipping flash player/AIR, but on certain platforms like iOS where we generate native apps with captive runtimes that can't execute untrusted code we want to expose an option for people to use this as it allows the AOt compiler the opportunity to generate significantly faster code from alchemy'ed swfs.

Reproducible: Always
Attached patch AOT optimizations (obsolete) — Splinter Review
Attachment #495334 - Flags: review?(stejohns)
Attachment #495334 - Flags: review?(edwsmith)
(In reply to comment #0)
> - Optimistic double Atoms: the AOT compiler now provides a heap-allocated
> double so that calls to op_add never need to perform heap allocations

That first line should say "stack-allocated"
Comment on attachment 495334 [details] [diff] [review]
AOT optimizations

Adding Lars, as several parts of this are likely to collide with ongoing ExactGC work.
Attachment #495334 - Flags: review?(lhansen)
Comment on attachment 495334 [details] [diff] [review]
AOT optimizations

(In reply to comment #0)

General comment: this really needs to be multiple patches; there are several completely different issues here that really need to be considered separately. (In point of fact, it needs to be multiple *bugs*, each addressing one issue.) When the patches are minor tweaks, making a grab-bag of changes has been fine, but these are major changes, several of which are on a collision course with other work that's underway:

The changes to ArrayObject are definitely going to collide with the changes in bug 610063 when they land (but shouldn't be too hard to resolve); I'd recommend you look those over to see if any unpleasant surprises await. In particular, though, "getDenseCopy()" disturbs me, as it makes an assumption about the underlying Array representation (which is ok) and exposes this assumption to the outside world (which is not)...

All of class AvmValue is presented without any explanation or documentation as to what it's for, when it's used, etc; all of this needs to be done. Also consider that Atom tagging schemes are under revision as part of the changes for ExactGC and also for float and float4 support; in particular, the representation of kDoubleType is changing (see bug 611484), which will definitely impact this in a major way.

> - Each AbcEnv also contains a GenericGuard (patch coming soon!) used to do
> exception based caching of pushstrings and finddefs

If you are proposing (re)adding MMU support to do this, be aware that this was a source of extreme pain and suffering in the MIR days, and proved a huge debug-and-maintain win in removing. I'd be loathe to re-add it, and would definitely want to see evidence of *substantial* performance wins that can't be replicated in other ways.
 
> - Optimistic double Atoms: the AOT compiler now provides a heap-allocated
> double so that calls to op_add never need to perform heap allocations. A
> modified version of op_add takes a pointer to this heap allocated double and
> fills it in with the result producing an Atom that points to this optimistic
> double. The AOT compiler promotes this stack-allocated double to a
> heap-allocated double whenever an atom that references it escapes.

This makes no sense to me; a double Atom is always heap-allocated in the first place. (Or at least it's supposed to be; if AOT violates this contract then severe pain awaits us)

> - static mops storage: allows you to use one fixed size static buffer for the
> alchemy/mops opcodes to work on, it also allows you to disable range-checking.
> Obviously neither of these apply to the shipping flash player/AIR, but on
> certain platforms like iOS where we generate native apps with captive runtimes
> that can't execute untrusted code we want to expose an option for people to use
> this as it allows the AOt compiler the opportunity to generate significantly
> faster code from alchemy'ed swfs.

disabling range-checking sounds like an extremely dangerous option, even for AOT code.


Style nit: tab spacing is wrong in some cases (eg in the AOT specific ArrayObject ctor, it's 2 spaces rather than 4 spaces)

Style nit: Tamarin prefers constant-on-RHS for new code, thus

	AvmAssert(!(7 & (uintptr_t)dst));

should be

	AvmAssert(!((uintptr_t)dst & 7));
Attachment #495334 - Flags: review?(stejohns) → review-
(In reply to comment #4)
> Comment on attachment 495334 [details] [diff] [review]
> AOT optimizations
> 
> (In reply to comment #0)
> 
> General comment: this really needs to be multiple patches; there are several
> completely different issues here that really need to be considered separately.
> (In point of fact, it needs to be multiple *bugs*, each addressing one issue.)
> When the patches are minor tweaks, making a grab-bag of changes has been fine,
> but these are major changes, several of which are on a collision course with
> other work that's underway:

Ok no problem, I've started breaking the patch apart.

> All of class AvmValue is presented without any explanation or documentation as
> to what it's for, when it's used, etc; all of this needs to be done. Also

The diff is confusing, but its mostly whitespace/tabbing changes + some minor bug fixes, the AvmValue class already exists.

I've broken the cdecl refactoring out as bug 617246

> consider that Atom tagging schemes are under revision as part of the changes
> for ExactGC and also for float and float4 support; in particular, the
> representation of kDoubleType is changing (see bug 611484), which will
> definitely impact this in a major way.

Yeah we'll definitely need to update cdecl to take that into account when it lands

> 
> > - Each AbcEnv also contains a GenericGuard (patch coming soon!) used to do
> > exception based caching of pushstrings and finddefs
> 
> If you are proposing (re)adding MMU support to do this, be aware that this was
> a source of extreme pain and suffering in the MIR days, and proved a huge
> debug-and-maintain win in removing. I'd be loathe to re-add it, and would
> definitely want to see evidence of *substantial* performance wins that can't be
> replicated in other ways.

The implementation is certainly a bit low-level but the code is all contained within one file and exposed via a straightforward API, it doesn't have any ugly tendrils into the rest of the codebase. We could probably get 90% of the performance by just using plain old pointers but we need every last drop of performance on mobile. Lets debate this more when I post the patch for the Generic Guard code.

> > - Optimistic double Atoms: the AOT compiler now provides a heap-allocated
> > double so that calls to op_add never need to perform heap allocations. 
> 
> This makes no sense to me; a double Atom is always heap-allocated in the first
> place. (Or at least it's supposed to be; if AOT violates this contract then
> severe pain awaits us)

I fudged the description, that should have said stack-allocated, anyway I've broken it out as a separate bug as requested: bug 617252

> > - static mops storage: allows you to use one fixed size static buffer for the
> > alchemy/mops opcodes to work on, it also allows you to disable range-checking.
> 
> disabling range-checking sounds like an extremely dangerous option, even for
> AOT code.

Given that devs are already allowed to inject arbitrary native code into their iOS binary I don't think this is an issue. It will of course be opt-in for those people who want speed at the expense of safety. I've broken this out as bug 617250
Attachment #495334 - Attachment is obsolete: true
Attachment #495334 - Flags: review?(lhansen)
Attachment #495334 - Flags: review?(edwsmith)
Depends on: 617252
Depends on: 617250
Whiteboard: Tracking
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 617642
Depends on: 617682
Depends on: 617254
Depends on: 617246
This bug is no longer being used as the AOT tracking bug, see bug 620678 instead
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
Whiteboard: Tracking
bulk verifying resolved !fixed issues
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: