Closed Bug 539094 (OSR) Opened 15 years ago Closed 6 years ago

OSR: On-Stack replacement of interpreted methods by JIT-ted methods

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX
Q2 12 - Cyril

People

(Reporter: bmathisk, Assigned: wmaddox)

References

Details

(Whiteboard: Swift,fixed-in-spicier)

Attachments

(8 files, 15 obsolete files)

45.85 KB, patch
Details | Diff | Splinter Review
21.91 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
1.38 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
742 bytes, patch
Details | Diff | Splinter Review
3.65 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
12.93 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
16.26 KB, patch
rreitmai
: review+
Details | Diff | Splinter Review
92.68 KB, patch
rreitmai
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10
Build Identifier: 

Tamarin JIT-compiles almost every method it encounters in a conservative attempt to avoid having an interpreted loop active on the stack. If such a loop were to become hot, it would continue to be interpreted, since there is no way to bail out from interpretation, JIT-compile and then resume execution. This is what OSR provides.

By compiling almost all methods, Tamarin also compiles many that execute only once or otherwise too rarely to amortize compilation cost.

If it is easier to implement, an acceptable solution is to create an extra frame that does not actually replace the interpreter frame at hand, but supplements it with a JIT-compiled version. On return from the JIT-generated routine, the interpreter frame also returns.

Corner cases that can be dealt with by workarounds, since they seem relatively infrequent: 
- Methods that contain exception handlers may still get JIT-compiled right away. It might otherwise be too complicated to establish setjmp contexts on the stack.
- OSR at multiple different entry points in the same method may be omitted/suppressed for now.


Reproducible: Always

Steps to Reproduce:
1. Run Tamarin on a method that has a loop that calls one or more leaf methods.
2. Trace what gets JIT-compiled or observe a breakpoint in the JIT.
3. The loop method gets JIT-compiled immediately.
4. Any simple method without a loop also gets compiled, with few exceptions, e.g. initializers.
Actual Results:  
Loop method compiled immediately.

Expected Results:  
Loop method to be compiled after a number of loops, while still running.

This feature enables advanced dynamic compilation strategies. It will be complemented with "deoptimization", i.e. OSR backwards, i.e. replacement of a JIT-compiled frame with an interpreter frame. Together, these will then enable elaborate code cache management (and more).
Priority: -- → P3
Assignee: nobody → bmathisk
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: flashplayer-qrb+
Priority: P3 → P2
Target Milestone: --- → flash10.2
Attachment #426536 - Flags: feedback?(edwsmith)
Attachment #426536 - Flags: feedback?(edwsmith)
Depends on: 486699
Depends on: 561366
Depends on: 561912
Now that the verifier runs in two phases, we have enough information to skip early-jit for methods that don't have loops (even before OSR is ready).
Depends on: 528375
No longer depends on: 561366
crude #s from checkintest.swf on development build with the call trigger fixed, using the same threshold for calls and loops.

config   methods jit'd  startup time  avg
------   -------------  ------------  ---
TR       2076           573,543,540   552
osr 2    1463           530,536,528   531
osr 10    915           516,489,499   501
osr 20    782           506,488,502   499
osr 50    561           510,541,536   529
osr 100   423           534,524,520   526
interp     47           638,604,636   626
Ideas for testing

* footprint: also measure memory and method sizes (abc opcodes)
* interactiviity: compilation pause stats, pause clusters.  is there a good way to automatically detect phase changes so we can measure compilation pauses around phase changes?
* tuning:  one compilation threshold, or two (loops vs calls).  smarter estimates?  count up or down?
if every method could use a different threshold then countdown-to-zero seems to make sense.

On a recent Conference with Bernd, he stated that the counting mechanism isn't very important, but what is important is that there be some kind of decay so that recency is taken into account and not just absolute # of calls or loops.  (thought experiment:  a method called once per minute should not necessarily ever get jit compiled, even after N minutes (e.g. 20).  

Currently the prototype has no such mechanism (nor should it, per se.  just noting for future work).
Depends on: 563119
No longer depends on: 528375
Component: Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: vm → nanojit
Whiteboard: Swift
Changes since last posting

* mostly rebased and reworked as an extension to ExecMgr.
* Invocation counting refactored to use trampolines instead of inserting code in Interpreter prolog.
* ExecMgr::setInterp() tweaked to distinguish between interp-and-trigger-osr and plain interp
Attachment #426536 - Attachment is obsolete: true
Attachment #440747 - Attachment is obsolete: true
Assignee: bmathisk → edwsmith
Blocks: 583853
ESC compiling itself (x64)
-Ojit    6.4
-Dinterp 6.3
osr 2    5.7
osr 10   5.4
osr 20   5.3
osr 30   5.25
osr 50   5.2
osr 100+ ERROR (x64-only)

(x86)
redux    6.0
osr-20   5.8
osr-30   5.6
osr-50   5.7

The best threshold seems to be lower for x86 than for x64.
.digression
On a PACMAN note, given the good -Dinterp results ESC must be spending a lot of its time in C++ support routines.  My guesses would be charCodeAt and prototype chain lookups (the AS3 namespace is not open).  Indeed, the source contains special uses of the AS3 namespace in performance-critical parts:

function hash_string(s: String) {
 // See http://www.cse.yorku.ca/~oz/hash.html; this is the djb2 algorithm
 var h = 5381;
 for ( var i=0, limit=s.length ; i < limit ; i++ )
   h = ((h << 5) + h) + s."http://adobe.com/AS3/2006/builtin"::charCodeAt(i);
 return uint(h);
}

Another job for shape trees / hidden classes, maybe.
.endd
Changes since last refresh:

Had to move the adjust_frame() call earlier in the prolog so that it skips over generated code that initializes object fields.  This required having adjust_frame call debugEnter() when debugging is enabled.

With this, it works like before, but is weird:  if debugging (or profiling) is enabled, there will be a debugEnter() call when a method starts running in the interpreter, and then another one when we trigger OSR.  and they nest, so there will later be two debugExits.  really, there ought to be one, but this will take yet more fiddling.

options:
1. skip debugEnter on OSR transition, run the jit's debugExit, then skip the interpreter's debugExit
2. skip debugEnter on OSR entry, and set a flag so the jit's debugExit is also skipped, then run the interpreter's debugExit.

Debugging jit'd code is a royal pain, and painful enough to tempt me to just disable OSR entirely in debugger mode, looking forward to when deopt makes the whole issue go away.  the only reason i hesitate is that "debugger mode" includes profilng, and, well, people want profiling to be somewhat accurate.
Attachment #450707 - Attachment is obsolete: true
Depends on: 563146
Depends on: 604062
Introduced a VARSIZE constant (8) in CodegenLIR.h, use it instead of
the literal 8 in lots of places.

Factored out argSize() calculation.

Factored emitDebugEnter(), called by writePrologue(), and moved
initNotNull(), emitInitializers(), and emitDebugEnter() to after
we switch the LirWriter from prolog to body.  This shouldn't matter
now, but is needed when OSR lands.

Made subclasses of InitVisitor have virtual destructors.
Attachment #486638 - Flags: review?
Attached patch (v5) OSR prototype (obsolete) — Splinter Review
Changes since last patch:

* Moved OSR.cpp/h to exec-osr.cpp/h to match conventions.
* Added and cleaned up more comments.
* Combined invocation counter and loop counter into one,
  and count down instead of up.
* Added config variables, and commandline option.
* Class OSR now encapsulates the state of a single compilation
  session, instead of using shared variables on BaseExecMgr.  The OSR
  instance is stack allocated.

The master todo list is at the top of exec-osr.cpp.
Attachment #486680 - Flags: feedback?
Attachment #486638 - Flags: review? → review?(wsharp)
Attachment #486680 - Flags: feedback? → feedback?(wsharp)
Comment on attachment 486638 [details] [diff] [review]
Code cleanup factored out of OSR branch

Any reason the debugEnter logic moved slightly in emitPrologue to after the "SWITCH PIPELINE FROM PROLOG TO BODY" block?

With patch applied, two *8 still exist here:

src = FramePtr(uintptr_t(src) + srcPos*8);
dst = FramePtr(uintptr_t(dst) + dstPos*8);

Not sure if they are related or need to be changed.

We have two argSize functions that basically do the same thing.  Should they be coalesced into one?

  C:\cygwin\home\wsharp\tamarin-redux\core\CodegenLIR.h(349):        static int32_t argSize(MethodSignaturep, int32_t i);
  C:\cygwin\home\wsharp\tamarin-redux\core\MethodInfo.cpp(552):    static uint32_t argSize(Traits* t) { return Traits::getBuiltinType(t) == BUILTIN_number ? sizeof(double) : sizeof(Atom); }

I would have done MethodSignature::argSize() for the codegenlir one since that object is all that is referenced in the LirHelper variant.  Maybe it just calls a static Traits::argSize helper?
Attachment #486638 - Flags: review?(wsharp) → review+
Attachment #463405 - Attachment is obsolete: true
(In reply to comment #14)

> Any reason the debugEnter logic moved slightly in emitPrologue to after the
> "SWITCH PIPELINE FROM PROLOG TO BODY" block?

Yes, the point where the prolog ends and the method body begins (ie where the SWITCH happens), must be before the new check-for-osr-and-branch code that the OSR patch will add.  And, that branch must be before debugEnter().

> With patch applied, two *8 still exist here:
> 
> src = FramePtr(uintptr_t(src) + srcPos*8);
> dst = FramePtr(uintptr_t(dst) + dstPos*8);
> 
> Not sure if they are related or need to be changed.

They are related, but I don't want to include CodegenLIR.h at this point (major breakage of encapsulation).  The real problem is that this method and a few others on MethodInfo, access the stack layout and therefore must know what the static layout is.

I'll move ARGSIZE from CodgenLIR.h to exec.h (even though it's really defined by the JIT implementation) as a low-churn intermediate fix.

> We have two argSize functions that basically do the same thing.  Should they be
> coalesced into one?

Good catch, I didn't notice the one in MethodInfo.cpp.  argSize() as an aspect of the vm-wide calling convention, which is declared in exec.h (GprMethodProc/FprMethodProc), so I'll move the definition.
Comment on attachment 486638 [details] [diff] [review]
Code cleanup factored out of OSR branch

http://hg.mozilla.org/tamarin-redux/rev/394e2a25905a
Attachment #486638 - Attachment is obsolete: true
Target Milestone: flash10.x - Serrano → flash10.2.x-Spicy
Assignee: edwsmith → wmaddox
Attached patch v5 patch rebased (obsolete) — Splinter Review
I'm seeing a couple of test failures in a debug build (IA32, MacOS X):

wmaddox-macpro:~ wmaddox$ ./tr-osr/deb32/shell/avmshell tr-osr/test/acceptance/as3/RuntimeErrors/Error1115NotAConstructor.abc
Runtime Error = TypeError: Error #1007 FAILED! expected: TypeError: Error #1115
wmaddox-macpro:~ wmaddox$ ./tr-osr/deb32/shell/avmshell tr-osr/test/acceptance/as3/Vector/nonindexproperty.abc
  Vector non-index properties
Assertion failed: "((name && name->isInterned()))" ("/Users/wmaddox/tr-osr/core/Multiname-inlines.h":138)
Trace/BPT trap

Are either of these known issues?  I'll investigate further tomorrow.
Attachment #489400 - Flags: feedback?(edwsmith)
Error1115NotAConstructor.as and nonindexproperty.as both test behavior differences between interpreter and jit.  (see dependent bug chain).  

The assertion is new, I haven't seen it before.  I'm also looking at a few windows compile failures.
Comment on attachment 489400 [details] [diff] [review]
v5 patch rebased

Nothing obviously wrong -- be sure to commit to the working repo, I'm also working from the same repo.  no need for formal reviews in that repo except for anything major.
Attachment #489400 - Flags: feedback?(edwsmith) → feedback+
(In reply to comment #18)
> Error1115NotAConstructor.as and nonindexproperty.as both test behavior
> differences between interpreter and jit.  (see dependent bug chain).  
> 
> The assertion is new, I haven't seen it before.  I'm also looking at a few
> windows compile failures.

the two tests are expected failures in tr when running -Ojit or -Djitordie
Is running -osr=0 (or default) equivalent to -Ojit?
-osr=0 is equivalent to no option (default policy).
I can do more testing on the patch.  I'll wait for the changes to land in the osr working repo.
No longer depends on: 563146
Latest code.  Patch was generated from this repository:

  http://asteam.macromedia.com/hg/users/bmathisk/osr/

with this command:

  hg diff -rdefault -X .cproject
Attachment #486680 - Attachment is obsolete: true
Attachment #489400 - Attachment is obsolete: true
Attachment #486680 - Flags: feedback?(wsharp)
This is a diff against the working repository at
http://asteam.macromedia.com/hg/users/wmaddox/tr-spicy-osr.new .
At present, the variable VMCFG_OSR is manually defined in avbuild.h.
Acceptance test results are the same as for tr-spicy when OSR is configured out,
and do not show the anomalies seen when OSR is included but supposedly disabled.
Attachment #490785 - Flags: review?(edwsmith)
This patch contains all changes to the public tr-spicy branch that are in play in the working tr-spicy-osr.new repository.  This should make clear exactly what code is conditionalized, and what is not.  In particular, the osr-prep cleanup patch is unconditionally included, as are some other apparent cleanups.

As it is, I think I've conditionalized the code a bit too heavily for the purpose, and would consider using, for example, dummy types and arguments to  avoid conditionalizing for method signatures.  If we feel that it is necessary to more strictly partition the changes, I'd be inclined to resort to code duplication over further #ifdef intrusions.  At some point, the conditionalization is counterproductive because you really don't know what code you're compiling anymore.
Attachment #490787 - Flags: feedback?(edwsmith)
Attached file Acceptance test results (obsolete) —
Current status of tr-spicy acceptance tests, including ifdef'd version
(In reply to comment #25)

> As it is, I think I've conditionalized the code a bit too heavily for the
> purpose, and would consider using, for example, dummy types and arguments to 
> avoid conditionalizing for method signatures.  If we feel that it is necessary
> to more strictly partition the changes, I'd be inclined to resort to code
> duplication over further #ifdef intrusions.  At some point, the
> conditionalization is counterproductive because you really don't know what code
> you're compiling anymore.

Agreed.  Lets just #ifdef around the contents of exec-osr.cpp, and do just the minimum #ifdefs everywhere else to make it work. 

If you add AVMFEATURE_OSR to avmfeatures.as, like other features are defined, then flash/avmhost-features.h can enable/disable OSR on just the desired builds.
Comment on attachment 490787 [details] [diff] [review]
Cumulative Spicy OSR patch against tr-spicy

too many #ifdefs, per previous comment.
Attachment #490787 - Flags: feedback?(edwsmith) → feedback-
Attachment #490785 - Flags: review?(edwsmith) → review-
(In reply to comment #24)
> At present, the variable VMCFG_OSR is manually defined in avbuild.h.

Awesome! When do we get VMCFG_OMGWTFBBQ?

:-)
Patch against http://asteam.macromedia.com/hg/users/wmaddox/tr-spicy-osr.new .
Attachment #490785 - Attachment is obsolete: true
Attachment #490787 - Attachment is obsolete: true
Attachment #490791 - Attachment is obsolete: true
Attachment #491011 - Flags: review?(edwsmith)
Same as previous patch, except the OSR environment variable facility is not included unless VMCFG_OSR_ENV_VAR is defined in avmbuild.h.  The define is present but commented out, with a suitable warning that it is a dirty hack for testing.
Attachment #491011 - Attachment is obsolete: true
Attachment #491050 - Flags: review?(edwsmith)
Attachment #491011 - Flags: review?(edwsmith)
Comment on attachment 491050 [details] [diff] [review]
(v2) Make OSR a configurable feature AVMFEATURE_OSR

only nits:

Interpreter.cpp:270
exec.cpp:79
exec.cpp:230
 - only need #ifdef VMCFG_OSR since OSR implies JIT.

exec.cpp:349
 - the verbose output should not be #ifdef'd
Attachment #491050 - Flags: review?(edwsmith) → review+
http://asteam.macromedia.com/hg/users/wmaddox/tr-spicy-osr.new/rev/17a36fcafac9

Removed #ifdef on verbose output at exec.cpp:349

Left in redundant VMCFG_NANOJIT configuration to make it easier to strip
out AVMFEATURE_OSR for TR as discussed in IM with Edwin.
With no -osr option on the command line, config.osr_threshold was not being properly initialized.  This issue also appears in the earlier v6 patch for TR.
Attachment #491468 - Flags: review?(edwsmith)
Comment on attachment 491468 [details] [diff] [review]
Properly initalize config.osr_threshold in default case

Nice catch. I take it this is why the default policy wasn't taking effect?
Attachment #491468 - Flags: review?(edwsmith) → review+
(In reply to comment #36)

> Nice catch. I take it this is why the default policy wasn't taking effect?

Unfortunately, the default policy is still broken.  The patch does correct failure
to honor the OSR environment variable on the Mac platform, and possibly elsewhere.
Note that if the uninitialized memory were fortuitously zero, the bug I've fixed here would not manifest.
This patch fixes the spurious acceptance test failures with -osr=0.
Previously, if osr_threshold == 0 in mode RM_mixed, we always compiled,
even static initializers.
Attachment #492583 - Flags: review?(edwsmith)
This allows ATS build smokes to run with OSR integrated into Win32 Flash Player.
Attachment #492598 - Flags: review?(edwsmith)
Attachment #492598 - Flags: feedback?(rreitmai)
Whiteboard: Swift → Swift,fixed-in-spicier
Comments and assertions in the code claim that we cannot recover from a JIT failure during OSR, and thus we attempt to preempt such failure. We require the following, among others, else a method will be compiled:

    (size_t)ms->frame_size() < NJ_MAX_STACK_ENTRY/sizeof(Atom)

This looks very similar to the test we use to fix bug 601794:

    ms->frame_size() * 2 > NJ_MAX_STACK_ENTRY

This test correctly acknowledges that frame_size is measured in 8-byte units NJ_MAX_STACK_ENTRY in 4 byte units.  The JIT may still fail even when this test is satisfied, as there is some overhead in the stack frame not accounted for in frame_size.  This is by design.

On the face of it, the test used in OSR makese seems to assume that frame_size is reported in bytes, and includes all overhead.  This test must never fail to predict a JIT failure, thus it is not safe to ignore overhead.  In fact, by assuming a byte count rather than a slot count, we have made it so conservative that the overhead is certainly covered, but at great cost, and in a way that does not appear to be intended.

While this defect could be fixed, it is not clear to me why we cannot easily recover from JIT failure, as Bernd pointed out to me in a recent offfline discussion. It appears that we need only provide a second result from OSR::execute() indicating whether the compilation succeeded, and simply exit the OSR() macro calls in the interpreter normally, just as if OSR::countEdge() had returned false, in the event that the JIT failed.  This assumes that the mere fact of running the JIT has not corrupted data structures required to continue interpreting the method, but I don't see anything here that wouldn't be a problem for the normal JIT-fail case as well.
(In reply to comment #41)

> following, among others, else a method will be compiled:

Compiled eagerly, that is, instead of deferring for possible OSR.
Given that spill slots also take up space in the frame, it appears that reliably predicting JIT success conservatively, such that predicted success is never followed by actual failure, is impractical.  Unlike MethodFrame overhead and space for special variables like coreAddr, the number of spill slots can only be computed once we have generated the code.
This patch implements what seems to be the obvious and straightforward way to deal with JIT failure.  The stack limit has been downsized to 75 to provoke a large number of JIT failures.  Running the acceptance suite, it appears to work for a releae build, but debug builds show an assertion in MMgc indicating a storage leak:

#3  0x00001fc8 in VMPI_debugBreak () at /Users/wmaddox/tr-spicy-osr/VMPI/MacDebugUtils.cpp:52
#4  0x000077de in avmplus::AvmDebugMsg (p=0x1864a3 "Leaks!", debugBreak=true) at /Users/wmaddox/tr-spicy-osr/vmbase/AvmAssert.cpp:69
#5  0x000074b5 in avmplus::AvmAssertFail (message=0x1864a3 "Leaks!") at AvmAssert.h:66
#6  0x000074cf in avmplus::_AvmAssertMsg (condition=0, message=0x1864a3 "Leaks!") at AvmAssert.h:72
#7  0x0005af04 in MMgc::GCHeap::DestroyInstance (this=0x1ca4f8) at /Users/wmaddox/tr-spicy-osr/MMgc/GCHeap.cpp:273
#8  0x0005b254 in MMgc::GCHeap::Destroy () at /Users/wmaddox/tr-spicy-osr/MMgc/GCHeap.cpp:175
#9  0x0002bc90 in avmshell::Shell::run (argc=2, argv=0xbffff9fc) at /Users/wmaddox/tr-spicy-osr/shell/avmshell.cpp:151
#10 0x0003bbf7 in main (argc=2, argv=0xbffff9fc) at /Users/wmaddox/tr-spicy-osr/shell/avmshellMac.cpp:114

Note that stack size limits of 64 and 32 cause nanojit assertion failures, duplicated in TR.  I suspect this is indicative of an NJ bug, and I'll investigate this further.
Attachment #493169 - Flags: feedback?(edwsmith)
Attachment #493169 - Flags: feedback?(bmathisk)
(In reply to comment #41)

> On the face of it, the test used in OSR makes seems to assume that frame_size
> is reported in bytes, and includes all overhead.  This test must never fail to
> predict a JIT failure, thus it is not safe to ignore overhead.  In fact, by
> assuming a byte count rather than a slot count, we have made it so conservative
> that the overhead is certainly covered, but at great cost, and in a way that
> does not appear to be intended.

That was exactly my intent, despite appearances.  The cost should be eager compilation of methods that fail the conservative test, and if the eager
compilation fails we also shouldn't try to osr-compile it later.  (we should
double-check that dark corner of policy logic).

> While this defect could be fixed, it is not clear to me why we cannot easily
> recover from JIT failure, as Bernd pointed out to me in a recent offfline
> discussion. It appears that we need only provide a second result from
> OSR::execute() indicating whether the compilation succeeded, and simply exit
> the OSR() macro calls in the interpreter normally, just as if OSR::countEdge()
> had returned false, in the event that the JIT failed.  This assumes that the
> mere fact of running the JIT has not corrupted data structures required to
> continue interpreting the method, but I don't see anything here that wouldn't
> be a problem for the normal JIT-fail case as well.

Recovering from JIT failure is doable; I was just not as pessimistic about
the conservative test, and not as comfortable with a refactoring that had
to return a pass/fail result in addition to a value.  (shrug).

In the OSR() macro in Interpreter.cpp, taking the address of a1 is bad;
See the comments above its definition.  A dedicated tmp_atom might be fine,
following the pattern of tmp_u30 and tmp_pc.  If the values in framep[] were
dead after successful-jit, then we could pass the result value in framep[0].
But they aren't strictly dead due to the redundant call to debugExit.

exec-osr.cpp still has this cryptic "todo", which if fixed might
make it okay to pass the return value via framep[0].

  * ensure debugEnter/Exit pairing is correct
Attachment #493169 - Flags: feedback?(edwsmith) → feedback+
Attachment #492583 - Flags: review?(edwsmith)
Attachment #492598 - Flags: review?(edwsmith) → review+
(In reply to comment #45)

> That was exactly my intent, despite appearances.  The cost should be eager
> compilation of methods that fail the conservative test, and if the eager
> compilation fails we also shouldn't try to osr-compile it later.  (we should
> double-check that dark corner of policy logic).

It appears to me that the "conservative" test, however, is not conservative,
in the sense that it does not guarantee success of the JIT in the worst case.
Am I missing something?
(In reply to comment #46)
> (In reply to comment #45)
> 
> > That was exactly my intent, despite appearances.  The cost should be eager
> > compilation of methods that fail the conservative test, and if the eager
> > compilation fails we also shouldn't try to osr-compile it later.  (we should
> > double-check that dark corner of policy logic).
> 
> It appears to me that the "conservative" test, however, is not conservative,
> in the sense that it does not guarantee success of the JIT in the worst case.
> Am I missing something?

The worst case being that so many things spill (or are not accounted for by frame_size*C) that the test passes, yet JIT fails anyway?  The intent was to fudge C so that this could not happen.

Anyway, you're right that its a fragile test that must be overly conservative to be accurate.  I still support the patch that allows osr-jit to fail.  And, lets make sure failure makes the function blacklist itself against future OSR compilation.

I think right now, we might attempt to re-compile the function each time the invocation counter wraps around.  the fast-path should still be decrement-and-branch; any blacklisting should be inside OSR::execute, probably.
(In reply to comment #47)

> The worst case being that so many things spill (or are not accounted for by
> frame_size*C) that the test passes, yet JIT fails anyway?

Yes.  It should be possible to construct an arithmetic expression that forces a spill for as many common subexpressions as you desire.
Comment on attachment 493169 [details] [diff] [review]
Patch to experiment with recovery from JIT failure during OSR

As Edwin already mentioned, it would be preferable not to take the address of a1. Instead, return the boolean indicating success by simulated pass-by-reference with a boolean argument whose address is taken. Then branch on the boolean var's value. Keep assigning to a1 from the return value.

The test for stack size that may suppress JIT-ing is taken out for testing only, right? In production it would stay in, I suppose to suppress pointless JIT attempts. Then the added test for JIT failure is just a safety device.
This patch corrects a storage leak and incorporates the fast-fail test to address bug 601794.
Attachment #493169 - Attachment is obsolete: true
Attachment #493859 - Flags: review?(edwsmith)
Attachment #493169 - Flags: feedback?(bmathisk)
(In reply to comment #49)
> As Edwin already mentioned, it would be preferable not to take the address of
> a1.

Actually I had hoped there would be at least one compiler out there that would obey the 'register' keyword and flag &a1 as an outright error.  It's pretty sad if compilers ignore the semantic implications of 'register' even if they (rightly) ignore it as a hint for which variables to place in a register.
Comment on attachment 493859 [details] [diff] [review]
Handle JIT failure during OSR, avoid compiling methods sure to fail

Cool.

in OSR::execute(), you could test method->_hasFailedJit instead of isInterpreted(), assuming the bit is set early enough -- hasFailedJit is
more precise and with our evolving policy, its good not to add more places
we call isInterpreted().

My coder within can't help but point out that

  if (OSR::countEdge(..)) {
     if (OSR::execute(..)) {

would be tighter as:

  if (OSR::countEdge(...) &&
      OSR::execute(...)) {

but it should generate exactly the same code.  shrug.
Attachment #493859 - Flags: review?(edwsmith) → review+
Attached patch Implement versioning for OSR (obsolete) — Splinter Review
OSR needs to be versioned to the same SWF version as the blocking bugs (interpreter/JIT semantic divergences).

Issue: currentBugCompatibility() takes the bug compatibility data from either the current CodeContext or from the builtin pool.  I am unclear on exactly how code contexts are used.  I am particularly concerned if I am going to get the right bug compatibility data from the query I've inserted in OSR:isSupported().

In a discussion with Rick, he cast doubt on whether it was possible to bind the appropriate compatibility flags at JIT time.  It is essential to do this for OSR, otherwise we can't version OSR itself, but would have to make the JIT generate code to emulate the behavior that would have obtained had the default eager compilation strategy been used.  This would have unpleasant implications for performance.
Attachment #494594 - Flags: superreview?(edwsmith)
Attachment #494594 - Flags: review?(lhansen)
Attachment #494594 - Flags: feedback?(rreitmai)
Pushed to tr-spicy:

http://hg.mozilla.org/users/stejohns_adobe.com/tr-spicy/rev/870b29a6bb9d

JIT failure recovery and fast-fail for methods with large frames.
Attachment #492598 - Flags: feedback?(rreitmai)
Attachment #494594 - Flags: review?(lhansen) → review+
Comment on attachment 494594 [details] [diff] [review]
Implement versioning for OSR

I'd like to see an assert or something that balks if OSR is enabled when one of the dependent bugs is not, to guard against misconfiguration.

Double check the implementation of bugCompatibility -- will it turn itself on and off for the right method?  ie is the top of the CodeContext stack pointing to the about-to-be-compiled method, when the bug check is invoked?  the verifier + jit run "inbetween" invocations (along the caller->callee edge), so its fiddly and important to get this right.
Attachment #494594 - Flags: superreview?(edwsmith) → superreview+
(In reply to comment #55)
> Comment on attachment 494594 [details] [diff] [review]
> Implement versioning for OSR

> Double check the implementation of bugCompatibility -- will it turn itself on
> and off for the right method?  ie is the top of the CodeContext stack pointing
> to the about-to-be-compiled method, when the bug check is invoked?  the
> verifier + jit run "inbetween" invocations (along the caller->callee edge), so
> its fiddly and important to get this right.

We don't set up a normal (non-override) MethodFrame until function entry, in the prologue, so currentMethodFrame should refer to the caller at the time the JIT is invoked (for invocation count) or the current function (for loop backedge).  I discussed this a bit with Rick.  I think that the correct approach is to first check whether the function about to be compiled is a builtin.  If it is not, use bug compability taken from the CodeContext in the AbcEnv associated with the method, otherwise currentBugCompatibility() at the call site should be correct.  This ties in with Bug 617424.
The essential thing we must guarantee is that compilation is only deferred when we know that the method will always be executed with a bug compatibility in which the interpreter/compiler divergences are fixed.  The bug compatibility of a builtin function is dependent on its caller.  There is no guarantee that subsequent calls will provide the same bug compatibility as the first.  Thus, absent a provision for multiple JIT'd versions of the same method, we cannot defer compilation of builtin methods, but must follow the legacy compilation policy so that unfixed divergences will manifest as they did pre-OSR.

Likewise, when implementing versioned bug fixes that affect code generation, we can check bug compatibility for those bugs at JIT time in the same manner as we check for the OSR "bug" as well.  Fixes may be freely baked into compiled *non-builtin* methods, but interpreted methods must implement a runtime check.

These claims are based on the understanding of the bug compatibility mechanism explicated by Steven, and reinforced by various comments in the avmplus code,
but see bug 617424.
Attachment #496373 - Flags: feedback?(edwsmith)
Attachment #496373 - Flags: feedback?(rreitmai)
Attachment #496373 - Attachment is patch: true
Attachment #496373 - Flags: feedback?(lhansen)
(In reply to comment #57)
> Created attachment 496373 [details] [diff] [review]
> 494594: Implement versioning for OSR (v2)

> *non-builtin* methods, but interpreted methods must implement a runtime check.
                             ^^^^^^^^^^^ 
                             should read 'builtin'
Attachment #496373 - Flags: review?(edwsmith)
Attachment #496373 - Flags: feedback?(edwsmith)
Attachment #496373 - Flags: feedback?
Attachment #496373 - Flags: feedback?
(In reply to comment #57)

I claimed above that we cannot OSR builtins, because they may be called with different BugCompatibility objects, and, in particular, one that does not allow for an interpreter/compiler divergence to be fixed.

In a meeting this morning, Edwin suggested that builtin code should always be treated as belonging to the most recent version (and thus would always use the
builtinBugCompatibility object).

It is inherent in the way that currentBugCompatibility() works, and in the design of the versioning mechanism, that builtins inherit bug compatibility from the innermost non-builtin activation.  This allows builtins to query the bug compatibility and provide the versioned behavior expected by their callers. I have assumed (and this understanding has received at least passing acquiescence from Steven) that versioning of VM-level behavior should follow this principle as well, and that is also what we have currently implemented.  The presumed benefit of this is that behavior of the builtins that could be induced by buggy behavior of the underlying VM will remain compatible where expected, even if the dependency has not been explicitly noted.  This is not a terribly robust scheme, as evolution of the builtin code may cause old bugs to manifest differently, and it remains necessary to test the APIs provided by the builtins at each version and explicitly emulate compatible behavior when required.

It is open to question whether the sort of VM-level bugs that we are concerned with here are actually relevant to behavior of the builtins as observed from user code.  Perhaps we should determine bug compatibility for the purpose of versioning these sorts of fixes in a different way, in which the bug compatibility for builtins is fixed, and cannot be overridden by the caller. One choice is to always choose the latest version.  Another is to *always* preserve old behavior in builtins, which is a bit more robust in the sense of retaining compatibility in the face of accidental/unknown dependencies.

In other words, the VM should compute current bug compatibility like so:
If the method executing (interpreter) or about to be compiled (jit)
is a builtin, use builtinBugCompatibility, else use env->abcEnv()->codeContext()->bugCompatibility(), e.g., the bug compatibility associated with the ABC or SWF from which the method bytecode was obtained.

Does it make sense/is it safe for the VM to bypass the usual bug compatibility mechanism, whose design center appears to be API compatibility, when dealing with minor VM-level errata?
(In reply to comment #59)

Proposal: OSR and OSR blocking bugs (and possibly other VM bugs) should get
their BugCompatibility object like so:

const BugCompatibility* AvmCore::currentVMBugCompatibility(MethodEnv* env) const
{
    if (env->method->pool()->isBuiltin)
        return env->codeContext()->bugCompatibility();
    else
        return builtinBugCompatibility;
}
(In reply to comment #60)

I should proofread my code:

const BugCompatibility* AvmCore::currentVMBugCompatibility(MethodEnv* env) const
{
    if (env->method->pool()->isBuiltin)
        return builtinBugCompatibility;
    else
        return env->abcEnv()->codeContext()->bugCompatibility();
}
(In reply to comment #59)
> Does it make sense/is it safe for the VM to bypass the usual bug compatibility
> mechanism, whose design center appears to be API compatibility, when dealing
> with minor VM-level errata?

The goal of the bugCompatibility mechanism is that SWF built for version N always sees the behavior it expects. What you are describing seems to directly contradict this, such that SWF built for version "N" will behave differently in newer VMs. This would be terribly problematic for compatibility of existing content; we can't break existing content unless there's compelling reason to do so.
(In reply to comment #62)

Do you object to the treatment of non-builtins here?  Or just builtins?
In the case of builtins, the issue is whether the bug in question (or it's fix) can affect the observable behavior of the platform.  My assumption has been, up to now, that we must assume that it can -- you seem to have assumed the same.  If I understood Edwin correctly this morning, however, he was questioning the necessity of that assumption.
(In reply to comment #63)
> Do you object to the treatment of non-builtins here?  Or just builtins?
> In the case of builtins, the issue is whether the bug in question (or it's fix)
> can affect the observable behavior of the platform.  My assumption has been, up
> to now, that we must assume that it can -- you seem to have assumed the same. 

Just the builtins, and only if it's observable in a way that might reasonably affect existing content.
(In reply to comment #63)
> (In reply to comment #62)
> 
> Do you object to the treatment of non-builtins here?  Or just builtins?
> In the case of builtins, the issue is whether the bug in question (or it's fix)
> can affect the observable behavior of the platform.  My assumption has been, up
> to now, that we must assume that it can -- you seem to have assumed the same. 
> If I understood Edwin correctly this morning, however, he was questioning the
> necessity of that assumption.

You might have misunderstood me.  Builtin code that implements some feature, has to emulate bugs in that feature for old content, whether its implemented in AS3 or C++.  However, that builtin code itself, is running on the latest VM always, and should benefit from bugfixes in the VM.  As long as fixes like this dont change the user-visible behavior, its fine.  Builtin code also sees the latest API version even when loaded content doesn't.
(In reply to comment #65)

> However, that builtin code itself, is running on the latest VM
> always, and should benefit from bugfixes in the VM. 
> Builtin code also sees the latest API version even when loaded content doesn't.

Unless I am thoroughly confused, neither of these statements is presently true, either in what is implemented, or in the design intent as communicated by Steven.  It does seem to be a reasonable position, if we are in fact willing to live up to the testing obligations that verify that compatible back-rev API behavior is maintained even when builtins are running on the latest platform.

What I was starting to drive at is that "VM errata" (think Intel's Specfication Updates) might be handled differently than "API errata", which seems to capture the essence of your argument.

What I'd like to see clarified is whether we have explicitly chosen to version such VM errata in builtin code because we understand that it *might*, in principle, indirectly affect the user-visible behavior of that code, or whether we are currently doing so simply because we don't distinguish such errata from clearly user-visible API changes.
Comment on attachment 496373 [details] [diff] [review]
494594: Implement versioning for OSR (v2)

f+ 

(A) nit, this just hurts my eyes '--m->_abc.countdown == 0'

(B) I'd love to see comments on each of the lines of the isSupported() method.  Or maybe references back to the comments , like so:

/**
   OSR only runs under the following conditions:
      [1]  ..
      [2]  …
   …
*/

then in the code 

    core.confg.runmode == RM_mixed &&   /* [1] */
   ….

which would clear up some confusion for example I'm guessing that osr_threshold == 0 implies disabled.
Attachment #496373 - Flags: feedback?(rreitmai) → feedback+
Comment on attachment 494594 [details] [diff] [review]
Implement versioning for OSR

see above comments
Attachment #494594 - Flags: feedback?(rreitmai) → feedback+
Comment on attachment 496373 [details] [diff] [review]
494594: Implement versioning for OSR (v2)

I think this is right... thinking out loud, we're only going to OSR a method if it's not builtin (so we dont have to worry about varying expectations of different callers), and if the method lives in an AbcEnv that expects OSR to be available.
Attachment #496373 - Flags: feedback+
Note: I'm pretty sure that the "intrinsic" bugCompatibility for the AbcEnv's own CodeContext is well-known up front, so if OSR::isSupported() shows itself to be hot, we could collapse some of this test into a single bit marked on MethodInfo, a la:

	isOSRCandidate = !hasExceptions() && !pool->builtin && bugCompatibility;

         return (m->isOSRCandidate() &&
                core->config.runmode == RM_mixed &&
                core->config.osr_threshold != 0 &&
                !m->hasFailedJit() &&
                !CodegenLIR::jitWillFail(ms))
This patch updates the v2 patch with some extra commentary per Rick's feedback, but does not make substantive changes to the versioning logic.

I've also added code to generate a dynamic check, in debug builds only, that every invocation of a compiled non-builtin (and thus potentially OSR-able) method receives bugCompatbility via currrentBugCompatibility() identical to what OSR would have assumed at JIT time.  This will help us flush out abuses of CodeContext that may break OSR, or future versioned fixes that bake the fix into the compiled code.
Attachment #494594 - Attachment is obsolete: true
Attachment #496373 - Attachment is obsolete: true
Attachment #496762 - Flags: review?(edwsmith)
Attachment #496373 - Flags: review?(edwsmith)
Attachment #496373 - Flags: feedback?(lhansen)
(In reply to comment #55)
> Comment on attachment 494594 [details] [diff] [review]
> Implement versioning for OSR
> 
> I'd like to see an assert or something that balks if OSR is enabled when one of
> the dependent bugs is not, to guard against misconfiguration.

Since the BugCompatibility objects are generally treated as read only,
misconfiguration is only possible at the one point in AvmCore::BugCompatibility() where these objects are constructed.  There doesn't seem to be much point in a separate assertion.  We could, however, break up the numerical ordering of the bugzillaXXXX bits, and group all of the OSR dependencies in the same place.

In v3, I've added a more substantive assertion that the BugCompatibility object itself is consistent between what is assumed at JIT time and what applies at execution time.
Attachment #496762 - Flags: review?(edwsmith) → review?(rreitmai)
Comment on attachment 496762 [details] [diff] [review]
Implement versioning for OSR (v3)

rubber stamp, since I've seen (and commented on) the prior n versions.
Attachment #496762 - Flags: review?(rreitmai) → review+
Severity: enhancement → major
Target Milestone: flash10.2.x-Spicy → flash10.x-Wasabi
Currently open work items:
1) Port tr-spicy patch to TR.
2) Flash integration issues to deploy in Wasabi.
3) Continued attention to blocking issues (bug 561368) in the Wasabi timeframe.
Target Milestone: flash10.x-Wasabi → flash11-Nigel
Attached patch OSR patch for TR (obsolete) — Splinter Review
OSR patch as landed in spicy, rebased for TR.
This is the OSR support as landed in Spicy, rebased for TR.  It includes the versioning support, and supersedes attachment 524326 [details] [diff] [review], in which the fast-fail mechanism was silently disabled due to a misapplied patch.

The OSR support must be explicitly enabled via --enable-osr, and is disabled by default.  It does not represent the last word on OSR, or what we want to make generally available in Brannan.  It is important that this get into TR and the flash mainline, however, in order to preserve the OSR support already shipped on AIR Android for Spicy/Spicier in the Serrano release.  An additional FP patch will be needed as well to carry the Spicy support forward.
Attachment #524326 - Attachment is obsolete: true
Attachment #524515 - Flags: review?(rreitmai)
Comment on attachment 524515 [details] [diff] [review]
Cumulative OSR patch for TR (port from Spicy)

R+ rubber stamp since all of these pieces have been previously reviewed.
Attachment #524515 - Flags: review?(rreitmai) → review+
Is there some limited amount of testing that should be done in the build system for OSR now that it has landed in tamarin-redux?

Setting in-testsuite to "?"
Flags: in-testsuite?
Tamarin should build with --enable-osr selected.  There's not much point in doing extensive testing until we're ready for general deployment, but we should
check for bit-rot.

The acceptance testing that was done for OSR for Spicy should likely be revisited for Serrano, particularly to make sure we're still seeing a similar performance profile.  It is really more important that this be examined in player than in the shell, so this may properly fall to the AIR Android folks.
There is a player patch needed, which has been prepared, and should land in the next day or two.
Blocks: 653484
Blocks: 661723
Retargeting to Cyril to align with Half Moon.
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
Alias: OSR
Summary: OSR (On-Stack replacement of interpreted methods by JIT-ted methods) → OSR: On-Stack replacement of interpreted methods by JIT-ted methods
Depends on: 699855
Depends on: 724631
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: