Closed
Bug 600723
Opened 14 years ago
Closed 6 years ago
Background Compiler v2
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), enhancement)
Tamarin Graveyard
Baseline JIT (CodegenLIR)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: siwilkin, Unassigned)
References
Details
(Whiteboard: Tracking)
Attachments
(1 file, 5 obsolete files)
83.17 KB,
patch
|
Details | Diff | Splinter Review |
This is version 2 of the background compiler for Tamarin.
In brief, it implements a multi-threaded, hot-spotting, background compiler based on the current foreground JIT.
It shares very little with version 1 (tracked by bugs 538181, 538185, 540487, 545094 and 556837). Version 1 was built with the knowledge that performing compilation required access to managed objects, but the VM's GC supported only a single mutator. Hence, the majority of version 1's patch-stack works hard to solve the GC safety issues.
For version 2, a GC that supports multiple mutators is available (bug 582770), together with a framework for executing asynchronous tasks via a per-AvmCore pool of background threads (bug 582772). Built on this framework, version 2 provides identical features to that of version 1, but with a 90% reduction in the number of source-file changes.
Full details of the work and ongoing performance results will be added in the comments below.
For the eager, here's instructions for building and running:
Build:
To enable the background compiler in an avmshell build, enable AVMFEATURE_BG_COMPILATION. Note, FlashPlayer builds require very few changes to the player codebase (details published internally).
Run:
By default, 1 compiler thread is used, with a compilation threshold of 1 (the compilation threshold determines how many times a method is interpreted before triggering its background compilation). These parameters can be adjusted for the avmshell with commandline options '-exec_thread_qty' and '-bg_comp_threshold'.
Reporter | ||
Comment 1•14 years ago
|
||
Reporter | ||
Comment 2•14 years ago
|
||
Reporter | ||
Updated•14 years ago
|
Reporter | ||
Comment 3•14 years ago
|
||
Current design notes:
1. Overview
The background compiler is implemented mainly by the BgCompilerExecMgr class, which is a new implementation of the ExecMgr interface. An AvmCore delegates AS3 execution responsibilities to an ExecMgr, including 'invocation, policy decisions for how to execute, when to verify, and when to translate'. The BgCompilerExecMgr class (and some friends) represent approximately 70% of the patch; the remaining 30% is concerned with making existing VM data-structures thread-safe. Note the current diffstat:
exec-jit.cpp | 462 ++++++++++++++++++++++++++++++++++++++++++++++++-
exec.h | 150 +++++++++++++++
exec-inlines.h | 57 ++++++
CodegenLIR.cpp | 52 +++--
MultinameHashtable.h | 45 ++++
Verifier.cpp | 41 ++++
MultinameHashtable.cpp | 23 ++
Traits.h | 20 +-
PoolObject.cpp | 17 +
Traits.cpp | 16 +
PoolObject.h | 13 +
Verifier.h | 11 +
Traits-inlines.h | 9
QCache.h | 7
MethodInfo.h | 4
15 files changed, 888 insertions(+), 39 deletions(-)
2. BgCompilerExecMgr
Fortunately, the existing BaseExecMgr implementation of the ExecMgr interface provides much of what is needed for the background compiler (BaseExecMgr implements the current foreground JIT driver). Hence, BgCompilerExecMgr does not implement ExecMgr directly, rather, it is a subclass of BaseExecMgr and does not even need to specialize any of the ExecMgr methods, the BaseExecMgr versions are sufficient. It turns out that the only hook BgCompilerExecMgr needs is to override BaseExecMgr::verifyJit(...) (which was not virtual previously, but is in this patch).
The BaseExecMgr version of verifyJit(...) invokes the verifier on an AS3 method with the JIT attached as a CodeWriter, and then binds the resulting native code as the AS3 method's implementation. The overridden BgCompilerExecMgr version invokes the verifier on the AS3 method with a null CodeWriter. When verified, a BgCompilationTask object is then created for the AS3 method, with a small amount of metadata persisted from the verification phase. BgCompilationTask derives from ExecutorTask, which is the base-class for closures that can be passed to a AvmCore's ExecutorService thread-pool for background execution (see bug 582772). The BgCompilationTask object acts as the eventual compiler driver for the AS3 method (note that the BgCompilationTask and ExecutorService are GCObjects managed by the VM's GC, and the ExecutorService's threads are mutators of the same GC instance). The BgCompilationTask object is hung from the AS3 method's MethodInfo, in preparation for future submission to the VM's ExecutorService. The final task of BgCompilerExecMgr::verifyJit(...) is to set the AS3 method's implementation to a 'countdown' trampoline that matches its signature. Control-flow now returns to AS3 execution, so the trampoline will be called.
'Countdown' trampolines perform multiple functions. Firstly they decrement a counter that represents the number of interpreted invocations of the given MethodEnv's method. If this is greater than zero, then the interpreter is invoked on the MethodEnv, but the countdown trampoline remains in-place. If the count is less than or equal to zero, then the BgCompilationTask object hanging from the MethodEnv's MethodInfo is submitted to the AvmCore's ExecutorService for background execution, the countdown trampoline is replaced with a 'polling' trampoline, and finally the interpreter is invoked on the MethodEnv. Hence, countdown trampolines drive hot-spot compilation; each method can be given a different compilation threshold, however, for now, a global threshold is used.
It is the responsibility of polling trampolines to check the progress of compilation tasks (by inspecting the BgCompilationTask object hanging from the given MethodEnv's MethodInfo), and if the compilation is complete, bind the resulting native code as the method's implementation. Note that it is not possible for a compiler thread to perform this task without heavy-handed synchonrization; swapping an AS3 method's implementation from interpreted to compiled involves setting at least 4 fields in the MethodInfo, and there can also potentially be many stale MethodEnvProcHolder::_implGPR pointers that need resetting to the new code (these will still point to the countdown trampoline).
After the polling trampoline has installed the compiled code as the AS3 method's implementation (replacing itself), the BgCompilationTask object is dropped on the floor (it is a GCObject), and the compiled code is called.
3. BgCompilationTask
A BgCompilationTask object (closure) acts a method's compilation driver by overriding its base-class's ExecutorTask::run() method. After submission to an AvmCore's ExecutorService, the run() method will eventually be called by a background thread. The driver is very simple:
void BgCompilationTask::run() {
GprMethodProc codePrivate;
{
// This is block scoped as we need the CodegenLIR destructor
// to run promptly, to set our code memory as executable. This must
// happen before we make the code visible to the application thread.
CodegenLIR jit(methodInfo, methodSignature, toplevel);
Verifier verifier(methodInfo, methodSignature, toplevel, abc_env);
verifier.setPreVerifiedMaxTryRegion(m_from, m_to, m_handlerIsReachable);
verifier.disableScopeWriter();
verifier.verify(&jit);
codePrivate = jit.emitMD();
}
// Make sure all generated code is visible before the application
// thread gets chance to install and call it.
AtomicOps::memoryBarrier();
m_code = codePrivate == NULL
? BgCompilationTask::COMPILATION_FAILED
: codePrivate;
}
Note that the Verifier is still used as the CodegenLIR driver, albeit with slight modifications to disable the ScopeWriter and exception table parsing (both of which will have happened in the verification prior to interpretation). The polling trampolines discussed earlier inspect a BgCompilationTask's m_code field to determine if the compilation is complete. Note we can get away with no synchronization (except a fence) to do the hand-off.
4. InvokerCompiler
No major changes have been made to the operation of the InvokerCompiler. It is still run inline on the application thread (following the installation of its trampoline by a polling trampoline, when installing background compiled code). Note however, that it will run concurrently with the background compiler.
5. Thread-Safety
This patch depends on bug 582776, which provides thread-safe String and Namespace interning. Apart from these, there are few areas of the VM that need special consideration. The tree of ABC metadata objects hanging from an AS3 method's parent PoolObject is heavily accessed during the method's compilation, however the overwhelming majority of its mutation are one-time initializations that will have occurred during ABC parsing or verification prior to interpretation.
Below is a brief explanation of the thread-safety added by this patch (Note that the MethodSignature and TraitsBindings caches do not have a full thread-safety solution as yet, and so must be set to infinite size. Rather than provide an adhoc solution for these structures, a general-solution for efficient one-time-initialization and weakref-style caching in the face of multiple-threads is currently being developed.)
- avmplus::CodeMgr. Currently a PoolObject has a single CodeMgr. This has been replaced with BgCompilerCodeMgr, which maintains a pool of CodeMgrs; one for each thread which could possibly perform compilation. The BgCompilerCodeMgr retains the property of deallocating its code memory when the PoolObject is collected. With the current JIT, flushing of a CodeMgr's binding caches occurs within a postsweep() callback from the GC, hence this now occurs within a safepoint, so no extra synchronization is required.
- PoolObject::m_namedTraits. This field is a MultinameHashtable which can be updated by the application thread whilst being read by a compiler thread (all other similar MultinameHashtables are only ever written to during ABC parsing or verification prior to interpretation, so are essential immutable by the time they are accessed by multiple threads). However, even put()/get() races are benign for this structure, it is only the grow-and-rehash operation from within a put() that must be made thread-safe. As this is the slow-path, this operation is move to within a safepoint task.
- Verifier::findCommonBase. The current algorithm tracks its search by writing to a Trait's bitfield. This has been replaced with a thread-private algorithm based on sub-type depth.
- QCaches. Even when set to infinite capacity, their access requires synchronization. This is currently provided by a per-instance lock. This will be later subsumed by the general one-time-initialization and weakref-style caching solution discussed previously.
Reporter | ||
Comment 4•14 years ago
|
||
This update fixes a latent race condition in how nanojit's CodeAlloc protects executable memory.
Previously, when re-using a partially full memory block, a CodeAlloc would reset the block's permissions from RX to RW (although this is a little obscured by the SPI). This is clearly not going to work with a background compiler; the application thread may want to execute existing code from the block (why this race has taken so long to show-up, even after some extreme stress-testing, I don't know!)
The fix adds an new VMPI function, VMPI_makeCodeMemoryExecutableAndWritable, which allows CodeAllocs to set the correct permissions temporarily as needed.
Attachment #479614 -
Attachment is obsolete: true
Comment 5•14 years ago
|
||
(In reply to comment #4)
>
> The fix adds an new VMPI function, VMPI_makeCodeMemoryExecutableAndWritable,
> which allows CodeAllocs to set the correct permissions temporarily as needed.
A security impact analysis is probably required for that.
Reporter | ||
Comment 6•14 years ago
|
||
(In reply to comment #5)
> (In reply to comment #4)
> >
> > The fix adds an new VMPI function, VMPI_makeCodeMemoryExecutableAndWritable,
> > which allows CodeAllocs to set the correct permissions temporarily as needed.
>
> A security impact analysis is probably required for that.
These are the exact permissions that the Windows implementation of VMPI_makeCodeMemoryExecutable sets. So I assumed it wasn't an issue.
Are thing different with Posix? I make sure that blocks are set to non-executable before being handed back to the general pool...
Reporter | ||
Comment 7•14 years ago
|
||
Latest version fixes how a Trait's supertypeDepth is calculated.
This pre-calculated value is used in the thread-safe version of Verifier::findCommonBase.
Attachment #480010 -
Attachment is obsolete: true
Comment 8•14 years ago
|
||
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > The fix adds an new VMPI function, VMPI_makeCodeMemoryExecutableAndWritable,
> > which allows CodeAllocs to set the correct permissions temporarily as needed.
>
> A security impact analysis is probably required for that.
I don't see the addition of this API as making our implementation any less secure, since we already have periods of time wherein page(s) containing previously compiled code is switched over to RW while we add new code to the page.
Some proposals in this area; 595049 and more discussion can be found in 506693.
My larger concern is that CodeAlloc (on x86 at least) opens up all the pages in a block, rather than the sub-set available for adding new code.
See CodeAlloc::alloc() . Come to think of it, that is probably worth fixing on its own (bug 601165).
Comment 9•14 years ago
|
||
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > (In reply to comment #4)
> > > The fix adds an new VMPI function, VMPI_makeCodeMemoryExecutableAndWritable,
> > > which allows CodeAllocs to set the correct permissions temporarily as needed.
> >
> > A security impact analysis is probably required for that.
>
> I don't see the addition of this API as making our implementation any less
> secure, since we already have periods of time wherein page(s) containing
> previously compiled code is switched over to RW while we add new code to the
> page.
The new API makes code RWX. Previously, on reasonable platforms, code memory is allocated RW, and transitions between RW and RX - it should never be RWX. At least that's according to documentation of the desired behavior in VMPI.h. (If that behavior's not properly implemented there's little I can do about it, except complain.)
I realize that as long as code memory is ever writeable (and how can it not be) there's a window in which it can be corrupted by an attacker. But only if it is RWX can the attacking code hope to write to the page containing the attacking code without going through an intermediary on a different page. So from my (probably limited) perspective the RW <-> RX transition adds an additional layer of integrity protection.
Comment 10•14 years ago
|
||
We might be in a "local minimum" with the RX/RW scheme. We knew this possibility loomed, and chose the RX/RW scheme despite it (best tradeoff at the time). It doesn't work well with frequent code patching, nor with concurrent compilation/execution. (JVM's do both, and JS-VM's all at least do frequent patching).
Bug 506693 proposes to map code memory twice (RW and RX at different points in the address space), and is controversial. It's a way to have pseudo-RWX with less risk, although whether it is less-risk is debated.
If we generated PIC (or at least, relocatable code), then could we use a safe point to install new code? synchronously stop threads, change permissions, copy and relocate code, then continue. Seems like a lot of work.
Reporter | ||
Comment 11•14 years ago
|
||
(In reply to comment #10)
> If we generated PIC (or at least, relocatable code), then could we use a safe
> point to install new code? synchronously stop threads, change permissions,
> copy and relocate code, then continue. Seems like a lot of work.'
We wouldn't need a safepoint.
The main thread could do the copy and relocate from the polling trampoline.
Comment 12•14 years ago
|
||
(In reply to comment #10)
> Seems like a lot of work.
The patch for bug 601724 opens the door for reducing the # of pages affected by a RW <-> RX transitions.
We'd then be able to toggle the page permission bits on a page-by=page basis (as opposed to multiple pages at a time). Assuming performance is reasonable, this is probably the tightest we could get in terms of securing the permission bits without relocating/rebuilding code.
As another layer of protection we could resort to a technique proposed in bug 595049, which boils down to:
(1) compute CRC of page contents
(2) toggle permission to RWX
(3) apply patch
(4) toggle permission bits back to RX
(5) recompute CRC of page contents + patch contents
(6) does recomputed value == expected ? If so, we're fine, otherwise 'intruder detected'
But again seems like a lot of work.
Updated•14 years ago
|
Flags: flashplayer-qrb+
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → edwsmith
Reporter | ||
Comment 13•14 years ago
|
||
Attachment #479615 -
Attachment is obsolete: true
Attachment #480146 -
Attachment is obsolete: true
Reporter | ||
Comment 14•14 years ago
|
||
(In reply to comment #13)
> Created attachment 526071 [details] [diff] [review]
> Latest. TR rev 6090. Patch queue rev 280
This latest patch is not fully thread-safe:
The synchronization of access to ABC meta-data has been removed as the rebasing effort over many months is much more than just adding it when required. The main areas to look at are Domains and anything QCache'd.
All other thread-safety concerns remain fully solved by this patch and its 'depends on' patches. Note that the thread-safe implementation of Verifier::findCommonBase has been factored out of this patch into bug 593431
Reporter | ||
Comment 15•14 years ago
|
||
Attachment #526071 -
Attachment is obsolete: true
Comment 16•14 years ago
|
||
Maybe a dumb question, but will this affect jägermonkey too? TM+JM compiling time is significant in some benchmarks, so running in interpreted mode while compiling in the background might provide a nice performance boost.
Comment 17•14 years ago
|
||
This patch is to tamarin code, so won't have an effect on jägermonkey, but something similar probably could be designed. I beleive previous attempts to move TM compilation to a background thread were successful, but only marginal improvements because a significant part of TM compile time is interleaved with interpreter execution.
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Target Milestone: --- → Future
Comment 18•13 years ago
|
||
Not to be done in the next releases, removing Andre blocker.
No longer blocks: 538185
Updated•13 years ago
|
Assignee: edwsmith → nobody
Whiteboard: Tracking
Comment 19•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 20•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•