Closed Bug 569321 Opened 14 years ago Closed 14 years ago

VM should optimize rest arguments

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

(Whiteboard: PACMAN, has-patch)

Attachments

(9 files, 9 obsolete files)

2.48 KB, text/plain
2.47 KB, text/plain
2.50 KB, text/plain
2.50 KB, text/plain
4.02 KB, patch (stejohns: review+, brbaker: review+)
38.63 KB, patch (edwsmith: review+)
1.72 KB, patch
878 bytes, patch
796 bytes, patch (brbaker: review+)
Investigation suggests it's possible for the VM to optimize the use of ...rest arguments so as to avoid allocating the rest arguments array in common cases, by means of a lightweight analysis coupled with a fallback.

Benchmarks suggest that this optimization significantly speeds up functions that use rest arguments, especially if those functions are early-bound (15x for simple cases) but also if they're found in the prototype chain (2x).
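As a concrete illustration of the targeted pattern (my sketch, not code from this bug; the name 'sum' is made up), a function is "benign" when the only operations on the rest array are reading its length and indexing it through the public namespace:

  // Hypothetical example of benign rest-argument usage: 'rest' is only
  // consumed by .length and indexed reads, so the VM can satisfy both
  // from the raw argument area and defer or skip allocating the Array.
  function sum(...rest) {
    var total = 0;
    for (var i = 0; i < rest.length; i++)
      total += rest[i];
    return total;
  }
  print(sum(1, 2, 3));  // 6, using the avm shell's print()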
Attached patch Work in progress (obsolete) — Splinter Review
This sketch is wrong for numerous reasons but provides the overall flavor: the verifier, or perhaps another front-end stage, cheaply recognizes functions whose use of the rest arguments is benign.  LIR generation then emits the new internal restarg/restargc instructions and makes the prologue not allocate the rest array immediately.  The implementations of restarg/restargc normally access the unconsed representation, but if run-time usage turns out to violate the assumptions, the restarg instruction will cons up the array and the new instructions will subsequently access that instead.

The reasons this patch is wrong are at least:
- the analysis in the verifier is too lenient
- the fallback cases in restarg/restargc are not implemented
- the way I emit code in the verifier's case for getproperty is wrong, because the verifier crashes subsequently.
Attached patch Work in progress (obsolete) — Splinter Review
Promoted myself from "novice" to "bumbling amateur" on the JIT.  This works on simple cases; the other objections still stand.
Attachment #448496 - Attachment is obsolete: true
It would be interesting to see the assembly output from the newly-optimized cases.
(In reply to comment #3)
> It would be interesting to see the assembly output from the newly-optimized
> cases.

I'll see what I can do.

The current generated code calls out to C++ helper functions always, but if we could prove that rest[n] is accessed with n always a valid Uint32 value then we could do significantly better: a couple of instructions each for length and element access.  The verifier might have that information for typed code and interesting access patterns (I worry about 'int' vs 'uint'), so that would be an interesting follow-on patch.
This is ready for feedback / preliminary review:

- It passes the test suite on Mac in R and DD modes
- It doubles the performance on the untyped 'push' benchmark (from 27 to 54 iterations, Mac Pro) and improves the performance for calls to early-bound functions that use rest arguments in "nice" ways by a factor of 10 (from 50 to 475 iterations).  The discrepancy in gains is due to the very expensive calls to prototype-bound methods; once we clean that up, this optimization will matter more for JS-style code.
- It requires no user changes to code, nor recompilation - a big win, really.
- It is reasonably clean and surprisingly small.

In particular I would be interested in hearing whether the analysis fits into the verifier or whether it should be factored out, and if so, how.  I put it there because it was expedient.

The following also remain to be done:

- lazyrest_arg has a FIXME, where it needs to construct a complete multiname before accessing an array that has been consed up when the property name turned out not to be "nice".
- I need to have some test cases for the fallback functionality
- lazyrest_arg should probably allow 'double' property names that turn out to be uint32 values
- Since the verifier is pushing OP_restarg and OP_restargc down the coder pipeline I need to decide whether to support those in the wordcode interpreter (and possibly elsewhere? if so, where?) or whether to disable them entirely in non-jit builds.  In principle it would be a real win for the wordcode interpreter to support them but perhaps that's a follow-on bug.
- lazyrest_arg and lazyrest_argc should probably be called restargHelper and restargcHelper, respectively

I will post benchmarks and some generated code subsequently.
Attachment #448556 - Attachment is obsolete: true
Attachment #448735 - Flags: feedback?(stejohns)
Attachment #448735 - Flags: feedback?(edwsmith)
This was derived from the untyped 'push' benchmark.
Compile with -strict.

Performance (Mac Pro):
Without the optimization: 50 iterations/sec
With the optimization: 475 iterations/sec
Directly comparable to the previous benchmark, this represents the best case for the optimization.
Compile with -strict.

Performance (Mac Pro): 892 iterations/sec.

The 2x advantage of this benchmark over the previous one with the optimization enabled shows that there are further gains to be made if we extend the analysis so that we avoid the subroutine calls to the helpers when we know that all property names used to access the rest array are Uint32 values.  In the case of the previous benchmark, the value is a constant - that's probably not terribly common.  But Array.prototype.push uses a uint-typed variable to access the array, and the optimization would apply in that case.
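To illustrate the distinction (my sketch, not the attached benchmark; 'sumTyped' is a made-up name): when the index is statically typed uint, every rest[i] is provably a valid Uint32 property name, so an extended analysis could replace the restargHelper/restargcHelper calls with a couple of inline instructions each:

  // Hypothetical typed variant: n and i are uint, so all accesses are
  // known-valid array indices and the helper calls could be avoided.
  function sumTyped(...rest):Number {
    var total:Number = 0;
    var n:uint = rest.length;
    for (var i:uint = 0; i < n; i++)
      total += rest[i];
    return total;
  }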
The "untyped push benchmark" referenced earlier is in test/performance/jsmicro/array-push-1.as.  Be careful how you run that - normally the performance tests are compiled with -AS3 and so these benchmarks will call AS3 methods, not prototype methods.  The AS3 method is native-coded and does not create the rest array (that's not possible for the prototype method, since it's generic).  Be sure to pass -f --nooptimize to the test runner.
Search for restargHelper and restargcHelper to see where those calls are generated; compare with the original's call to createRestHelper (I haven't included that code here).

Another possible improvement is visible here: right now restargcHelper always returns an atom, but in performance contexts, like Array.prototype.push, it will be converted to int or uint.  It would probably be better for the function to just return uint; that saves time both in the function and in important callers.
Design caution: it's imperative that anything happening in the verifier not cause user-visible changes to the types or nullability of the data model.  As long as any analysis that piggybacks on the verifier's analysis is passive, it's okay.  "Passive" means it can't make the verifier fail, and it can't affect the Value.traits or Value.notNull fields in FrameState.  From what I can tell, this patch honors the contract.

Replacing OBJ.length with restargc is a kind of early binding and we should think of it that way -- you should get faster code with the same type effects.  (Early binding to Array.length should yield uint, for example.)

Could the analysis be done within a CodeWriter?  we have pending work to separate verification from jit-compiling, and it will be easier to tease apart the code if piggyback analysis is well encapsulated.  (but, if the refactoring is awkward, comments are enough).  I do see some annoyances... for example OP_getlocal calls CW.write() instead of writeOp1, forcing analyzers to re-parse abc to get the local var #.

if copies (getlocal/setlocal, etc) propagated the Value.isRestArray bit, would we still need the produceRest/consumeRest accounting?  the accounting is a simple/conservative escape analysis right?  (thinking about how its assumptions might be violate-able)

does produce/consumeRest stay valid in the face of exception handling?  (thinking out loud -- stack contents are erased along the exception edge, so is it enough to zero produceRest/consumeRest at the start of the catch block?  (should produce/consumeRest be fields in FrameState and participate in control-flow merge logic?)  e.g.  OP_getlocal<rest>; OP_jump; ... OP_getproperty<length>

Since restVar is a local variable, it is visible to an active debugger stepping through the code.  debugging should probably just disable the whole optimization.
Attachment #448735 - Flags: feedback?(edwsmith) → feedback+
(In reply to comment #8)
> The "untyped push benchmark" referenced earlier is in
> test/performance/jsmicro/array-push-1.as.  Be careful how you run that -
> normally the performance tests are compiled with -AS3 and so these benchmarks
> will call AS3 methods, not prototype methods.  The AS3 method is native-coded
> and does not create the rest array (that's not possible for the prototype
> method, since it's generic).  Be sure to pass -f --nooptimize to the test
> runner.

You can add a file with asc directives that should allow you to specify that this file should not be optimized. Something like this /should/ do what you want:

file: test/performance/jsmicro/array-push-1.as.asc_args
contents: merge| -no-optimize

example: test\acceptance\regress\vector_domain_bug.as.asc_args
Comment on attachment 448735 [details] [diff] [review]
Functional, performant, nearly complete

nits:

-- disabling the optimization for debugger makes sense, but why disable for the sampler? Trying to profile code that has this as a bottleneck will be misleading...

-- why is NO_VAR a #define rather than a well-typed const?

-- since lazyRest() is only valid post-verification, adding an assert that the MethodInfo has been verified would be nice.

-- instead of multiname.getName()->equalsLatin1("length"), you can use multiname.getName() == core->klength, as getName() will be interned.

-- in lazyrest_argc, we assume that argc will always fit into an intptr atom... is it possible for it to be greater than 2^28? (seems unlikely, but is it *possible*?)

-- meta-question: everywhere we deal with argc, it's a (signed) int, meaning we have to check for < 0... this seems insane, and while it's definitely outside the scope of this bug, it seems nutty that it isn't uint32_t...
Attachment #448735 - Flags: feedback?(stejohns) → feedback+
(In reply to comment #12)
> (From update of attachment 448735 [details] [diff] [review])
> nits:
> 
> -- disabling the optimization for debugger makes sense, but why disable for the
> sampler? Trying to profile code that has this as a bottleneck will be
> misleading...

Sampler != profiler, I believe.  Anyway, the reason I did this is that the sampler 'callback' test (yes, that one again) fails with the optimization enabled, and I assumed without investigating too deeply that the sampler was grubbing around in state on the C++ level.  I should probably dig a little deeper.

> -- why is NO_VAR a #define rather than a well-typed const?

WILLFIX.

> -- since lazyRest() is only valid post-verification, adding an assert that the
> MethodInfo has been verified would be nice.

Good idea.

> -- instead of multiname.getName()->equalsLatin1("length"), you can use
> multiname.getName() == core->klength, as getName() will be interned.

Thank you.

> -- in lazyrest_argc, we assume that argc will always fit into an intptr atom...
> is it possible for it to be greater than 2^28? (seems unlikely, but is it
> *possible*?)

In the vicinity of 2^30 array elements is probably possible on 32-bit systems, but the maximum value we will have here is tied to the maximum number of arguments that can be passed to a function, and I assumed - again without digging - that that is much lower.  Will investigate if nobody provides an answer.

> -- meta-question: everywhere we deal with argc, it's a (signed) int, meaning we
> have to check for < 0... this seems insane, and while it's definitely outside
> the scope of this bug, it seems nutty that it isn't uint32_t...

Yes, I think the right thing to do is to morph the argc variable to AS3 'uint'.
(In reply to comment #10)
> Design caution: it's imperative that anything happening in the verifier not
> cause user-visible changes to the types or nullability of the data model.  As
> long as any analysis that piggybacks on the verifier's analysis is passive,
> it's okay.  "Passive" means it can't make the verifier fail, and it can't
> affect the Value.traits or Value.notNull fields in FrameState.  From what I
> can tell, this patch honors the contract.

"If you lie to the computer, it will get you".  Yes, this has been on my mind and was the root cause of many of my headaches along the way, and in the current design I'm reasonably happy.  I still lie a little bit - the restargs local is marked not-null even if it is 0, the reasoning being that that is not observable to the program, and it ensures that pass1 and pass2 do not diverge.

> Replacing OBJ.length with restargc is a kind of early binding and we should
> think of it that way -- you should get faster code with the same type effects.
> (Early binding to Array.length should yield uint, for example.)

At the moment I model this by returning Atom from restargcHelper and then coercing to whatever type the context expects; is this kosher?

> Could the analysis be done within a CodeWriter?  we have pending work to
> separate verification from jit-compiling, and it will be easier to tease apart
> the code if piggyback analysis is well encapsulated.  (but, if the refactoring
> is awkward, comments are enough).  I do see some annoyances... for example
> OP_getlocal calls CW.write() instead of writeOp1, forcing analyzers to re-parse
> abc to get the local var #.

No opinion on this, I'll probably be inclined to try to land as-is with comments, but I have no good view of the long-term ramifications.

> if copies (getlocal/setlocal, etc) propagated the Value.isRestArray bit, would
> we still need the produceRest/consumeRest accounting?  the accounting is a
> simple/conservative escape analysis right?  (thinking about how its assumptions
> might be violate-able)
> 
> does produce/consumeRest stay valid in the face of exception handling? 
> (thinking out loud -- stack contents are erased along the exception edge, so is
> it enough to zero produceRest/consumeRest at the start of the catch block? 
> (should produce/consumeRest be fields in FrameState and participate in
> control-flow merge logic?)  e.g.  OP_getlocal<rest>; OP_jump; ...
> OP_getproperty<length>

It's a speculative escape analysis and I'm not yet completely convinced of its correctness.  The produceRest/consumeRest values are a short-hand so that I did not have to add code to every instruction case, but I may have missed some cases: I'm not sure what the effect of 'dup' is, for example.

Right now the analysis requires that if the rest array is produced in a block it is consumed in that block, one consumer for each producer.  (So 'dup' probably needs to conditionally increment produceRest, or bail: set restVar to NO_VAR.)  Going beyond basic blocks is attractive but not necessary for the basic case, imo, which is to make it possible to use rest args in stylized ways without paying a penalty.  And that being the case, an exception shouldn't be a problem: it prunes the stack (throwing away any unconsumed implicit 'rest') and enters a block at its beginning, and that block will always produce a rest array before consuming it.

> Since restVar is a local variable, it is visible to an active debugger stepping
> through the code.  debugging should probably just disable the whole
> optimization.

The four lines following the initialization of restVar do just that.

I will follow up tomorrow re the analysis, with some propositions about its correctness.
(In reply to comment #11)
> 
> You can add a file with asc directives that should allow you to specify that
> this file should not be optimized.

Is it possible to have one such file per directory, rather than one per test file?  I can automate the creation of those files, but it just turns into such a mass of files.
Following is the algorithm for the least ambitious but still useful analysis.  Please check it to see if you believe it.  I know it can be extended, but I would much prefer to start with something simple and useful than to try to create something general and much harder to implement.

-----

We aim to determine whether the 'rest array' is used in only two ways:

 - to access the 'length' property with either the public namespace or a public namespace in the namespace set

 - to access an unknown property with either the public namespace or a public namespace in the namespace set

If those are the only uses of the rest array then the rest array is eligible for a speculative optimization that delays or avoids its construction.  (The optimization can be non-speculative if more is known about the property names in the second case but I won't address that further.  In important use cases we won't have that information.)

To determine whether we can enable the optimization we analyze the ABC code for the method; this amounts to a simple escape analysis.  We say the analysis "fails" if the optimization is rejected because we can't prove that the rest array does not escape.

Nested methods cause the array to escape; the analysis fails if the code requires an activation record.  (It is possible to do better.)

A debugger causes the array to escape; the analysis fails if a debugger is present.

The rest array is stored in a distinguished local slot in the ABC code, and we know which slot this is.  We assume ASC does not do something clever with the array, eg, by moving it to a different local slot.  (It is possible to do better.)

The analysis needs to prove the following:

* If the rest local is read by a GETLOCAL and the rest array value is produced, and if that value is consumed by an instruction (rather  than being discarded by an exception) then that value must be consumed by a GETPROPERTY instruction of one of two forms:

 - It reads a "length" property with a public namespace or a public namespace in its namespace set.

 - It reads a property with a run-time name, with a public namespace or a public namespace in its namespace set.

* If the rest local is read by any other instruction than a GETLOCAL, or is consumed or inspected by any other instruction than a GETPROPERTY of the two forms above, then the analysis fails.

The analysis is fairly weak but handles code that occurs in practice.

A bit, isRestArray, on each Value in the FrameState tracks whether a value is a rest array.  This bit is set by GETLOCAL.  The bit is checked by every other instruction: the operands to the instruction are checked.  If the bit is set but the instruction is not a GETPROPERTY of the correct form then the analysis fails.  (This means the bit does not propagate with the value as the value is duplicated, moved into locals, etc.)

The analysis checks the values that remain in the FrameState at block boundaries.  If isRestArray is set on any value, the analysis fails.  (It is possible to do better but not necessary to do so at this point.)
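Two hypothetical functions illustrating the boundary (my examples, not from the test suite):

  // Analysis succeeds: every GETLOCAL of 'rest' is consumed by a
  // GETPROPERTY of an allowed form - public 'length' or a public
  // run-time name.
  function ok(...rest):int {
    return rest.length > 0 ? rest[0] : -1;
  }

  // Analysis fails: a GETLOCAL of 'rest' is consumed by a call
  // instruction, so the array escapes and must be consed as usual.
  function leak(...rest):void {
    inspect(rest);
  }
  function inspect(a:Array):void {
    print(a.length);  // avm shell's print()
  }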
(In reply to comment #15)
> Is it possible to have one such file per directory, rather than one per test
> file?  I can automate the creation of those files, but it just turns into such
> a mass of files.

Yes you can have a dir.asc_args file and it will be used
I can't find any flaws in the algorithm relative to its aim, which is to identify methods that only use the rest array in those two ways.  However, the second way amounts to an escape, which the runtime must handle by lazily creating the array (right?)

If lazy creation of the array happens after some previously early-bound code is emitted, the code could be incorrect.  Can you have a looping situation where the analysis succeeds, allowing early binding to rest.length, then the runtime creates the array on the fly due to escaping, and then on a second iteration the early-bound length is wrong?

sketch:
  function test(name, ...rest) {
    for (var i=0; i < 2; i++) {
      getlocal #rest
      getproperty public::length  // early binds to argc
      getlocal #rest
      getlocal #name
      getproperty {public,AS3}::[] // does not cause analysis to fail
      pushint 1
      call     // invokes this.AS3::push(1) which modifies length
    }
  }
  test("push")
(In reply to comment #18)
> I can't find any flaws in the algorithm relative to its aim, which is to
> identify methods that only use the rest array in those two ways.  However,
> the second way amounts to an escape, which the runtime must handle by lazily
> creating the array (right?)

Right.

> If lazy creating the array happens after some previously early bound code is
> emitted, the code could be incorrect.  Can you have a looping situation where
> the analysis succeeds, allowing early binding to rest.length, then runtime
> creates the array on the fly due to escaping, and then on a second iteration,
> the early bound length is wrong?

You can have a loop, but the bug here would be in the proposed implementation of early binding to 'length'.  Every early-bound access to 'length', and every property access on the object, must check whether the rest array has been consed up, and if so access the array rather than the cached information.  I.e., in the current implementation the code would be

  function test(name, ...rest) {
    for (var i=0; i < 2; i++) {
      getlocal #rest
      restargc
      getlocal #rest
      getlocal #name
      restarg {public,AS3}::[] // does not cause analysis to fail
      pushint 1
      call     // invokes this.AS3::push(1) which modifies length
    }
  }
  test("push")

where restargc and restarg are pretty much as in the provided patch and dispatch on whether the rest array exists or not.
I buy that.  I didn't look at the code but I believe the implementation of restargc will be right as long as it handles lazy creation.

For rest to escape via a nested function it would have to be copied to the scope chain via OP_getlocal+OP_pushscope, which would cause the analysis to fail.  (Not that we're trying to make the analysis more general, just noting.)

I think what the analysis is really buying us is avoiding aliasing - no copies of the rest array will exist statically in the frame of the function, if the analysis passes.
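A hypothetical illustration of that escape route (my sketch, made-up names): capturing the rest parameter in a nested function requires an activation record, which already makes the analysis fail under the algorithm above, before the OP_getlocal+OP_pushscope copy is even considered:

  // The closure captures 'rest', so the method needs an activation
  // record; the analysis fails and the array is consed eagerly.
  function makeGetter(...rest):Function {
    return function(i:uint):* { return rest[i]; };
  }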
(In reply to comment #14)
> > Could the analysis be done within a CodeWriter?  we have pending work to
> > separate verification from jit-compiling, and it will be easier to tease apart
> > the code if piggyback analysis is well encapsulated.  (but, if the refactoring
> > is awkward, comments are enough).  I do see some annoyances... for example
> > OP_getlocal calls CW.write() instead of writeOp1, forcing analyzers to re-parse
> > abc to get the local var #.
> 
> No opinion on this, I'll probably be inclined to try to land as-is with
> comments, but I have no good view of the long-term ramifications.

Just to comment on this some more before I offer the patch for review, I've restructured the analysis to use another CodeWriter, because this (a) removes the hacks with the counters and makes it more credible that the analysis is correct and (b) it makes it possible to do more sophisticated analysis (which I have not implemented yet but am pondering).

However, the current structure of the Verifier and CodeWriter is not friendly here.  The problem is that the Verifier is not simply a verifier; it also performs analysis and rewriting of its own.  In particular, it may decide by its own analysis that a particular property access is early-bindable or whatever, and thus rewrite the code in some fashion.  At that point it may have preempted any optimization in a following CodeWriter which would have simply removed or otherwise significantly optimized the code, as I'm trying to do here.  On the other hand, the latter CodeWriter could not come before the Verifier without having to significantly duplicate the Verifier's effort, and even then it could not insert internal-only instructions in the instruction stream.

So the way I've had to do it is to rely on capturing intra-instruction behavior in write, writeOp1, etc, and post-instruction behavior in writeOpcodeVerified, and /then/ I've had to insert code inside the verifier to handle cases where the verifier's rewriting needs to be overridden (for OP_getproperty).

Based on a fairly naive understanding of things it seems to me that the Verifier should not be rewriting code at all; it should be verifying that the ABC is correct.  Rewriting should be left to later pipeline stages, and that would allow optimization stages to be inserted into the pipeline without having to change the verifier.

(Not a big deal, just observing.)
Attachment #449849 - Attachment description: Benchmark: class with early-bindable → Benchmark: class with early-bindable 'push' that uses varargs and for which the analysis succeeds, but where there's a fallback to the general case at run-time.
(In reply to comment #23)
> Created an attachment (id=449849) [details]
> Benchmark: class with early-bindable

Should have read "Benchmark: class with early-bindable 'push' that uses varargs and for which the analysis succeeds, but where there's a fallback to the general case at run-time."

A variant on this benchmark is to use an index value that is an integer but out of range, eg "1"; the performance profile is rather different (31 iterations/sec rather than 6 iterations/sec).

Generally what these benchmarks show is that if the analysis succeeds, but there's a fallback to allocating the array at run-time, then there's a regression in performance relative to always consing up the varargs in the first place.  Some of that overhead is inherent in calling the helper methods (we see a slowdown from the onearg case to the optimized-varargs case too), some probably comes from suboptimal handling in restargHelper.  It does not seem important, but it could be improved upon.
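To make the fallback concrete (my example, not one of the attached benchmarks): the analysis can succeed statically while a run-time property name still forces the helper to cons up the array:

  // 'rest' is only consumed by a run-time-name GETPROPERTY, so the
  // analysis succeeds; but a name that is not a valid Uint32 index
  // cannot be served from the raw arguments, and restargHelper falls
  // back to creating the Array lazily.
  function probe(name:*, ...rest):* {
    return rest[name];
  }
  probe(0, "a", "b");      // fast path: valid index
  probe("foo", "a", "b");  // fallback: array consed at run-time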
Attached patch Patch (obsolete) — Splinter Review
Addresses all concerns expressed previously.  Also see previous comments I've made, and benchmarks.

Follow-on work items could be:
 - a stronger escape analysis (less brittle in the face of ASC whims)
 - analysis that makes use of type and range information to avoid calling the helpers
 - more fast paths in restargHelper
 - (your idea here)
Attachment #448735 - Attachment is obsolete: true
Attachment #448749 - Attachment is obsolete: true
Attachment #449852 - Flags: review?(stejohns)
Attachment #449852 - Flags: review?(edwsmith)
Whiteboard: PACMAN → PACMAN, has-patch
One more observation: the main reason the patch is so big is that it has to decode instructions to determine the number of stack slots read.  If we had a table with this information the patch would have been a lot smaller.  The normal attribute table does not have that information; the wordcode interpreter does, however, and Chris Brichford suggested merging all these tables together (because of the AOT work and general utility), but bug #545254 has been languishing since late February.
(In reply to comment #21)
> (In reply to comment #14)

> Based on a fairly naive understanding of things it seems to me that the
> Verifier should not be rewriting code at all; it should be verifying that the
> ABC is correct.  Rewriting should be left to later pipeline stages, and that
> would allow optimization stages to be inserted into the pipeline without
> having to change the verifier.
> 
> (Not a big deal, just observing.)

Completely agree, and that's also the subject of this cluster of bugs:

https://bugzilla.mozilla.org/showdependencytree.cgi?id=540487&hide_resolved=1
Attachment #449849 - Attachment mime type: application/applefile → text/plain
FYI, this item from earlier is still unresolved:

"- Since the verifier is pushing OP_restarg and OP_restargc down the coder
pipeline I need to decide whether to support those in the wordcode interpreter
(and possibly elsewhere? if so, where?) or whether to disable them entirely in
non-jit builds.  In principle it would be a real win for the wordcode
interpreter to support them but perhaps that's a follow-on bug."

The most likely fix is to make the optimization jit-only for now and open a bug to track the possibility of implementing the optimization for the wordcode interpreter.  It's an easy fix - disable the optimization in the RestArgAnalyzer's constructor if it's a wordcode build.

Also pending is rebasing to TR tip and a compile fix for Windows (unused parameters).
(In reply to comment #28)
> non-jit builds.  In principle it would be a real win for the wordcode
> interpreter to support them but perhaps that's a follow-on bug."

Brings up a meta-question, namely: what's the future of the wordcode interpreter? It was created to improve performance on JIT-prohibited systems, but at present there are only two of those on our radar (iOS and PS3), of which the former is obviously a nonfactor for us... are we likely to revive wordcode in a shipping system anytime in the near-term future? Is it worth spending time maintaining?
Let's take the discussion about wordcode futures to email.
Attached patch Patch, v2 (obsolete) — Splinter Review
Virtually the same as the previous patch, but disables the optimization for wordcode and non-jit compiles (in the RestArgAnalyzer constructor) and fixes a Windows compile problem by removing two unused parameters on RestArgAnalyzer::getProperty.  Also rebased to tr tip.
Attachment #449852 - Attachment is obsolete: true
Attachment #449894 - Flags: review?(stejohns)
Attachment #449894 - Flags: review?(edwsmith)
Attachment #449852 - Flags: review?(stejohns)
Attachment #449852 - Flags: review?(edwsmith)
Comment on attachment 449894 [details] [diff] [review]
Patch, v2

My only concern is that this patch seems somewhat fragile, in that the necessary bits are (of necessity) strewn in various places in Verifier... probably unavoidable, and maybe I'm overreacting. Still, can we make specific acceptance tests for this to ensure it doesn't accidentally break in the future due to unforeseen Verifier tweaks?
Attachment #449894 - Flags: review?(stejohns) → review+
Comment on attachment 449894 [details] [diff] [review]
Patch, v2

Even though the patch is bigger, it's nice to get the encapsulation, and the code should shrink when the opcode tables land and whenever we better isolate verification.

The only hazard I can see is that when a verify exception is thrown, restAnalyzer.cleanup() won't be called.  If you fix it by creating a field in Verifier for restAnalyzer, then ~Verifier can clean up, and (optionally) verifyBlock() doesn't need the extra argument.  Or did I miss something?  -- This is an example of a locally allocated CodeWriter, of the kind the comment near the top of Verifier::verify() warns about.

R+ once cleanup is fixed/assured.
Attachment #449894 - Flags: review?(edwsmith) → review+
(In reply to comment #32)
> (From update of attachment 449894 [details] [diff] [review])
> My only concern is that this patch seems somewhat fragile, in that the
> necessary bits are (of necessity) strewn in various places in Verifier...
> probably unavoidable, and maybe I'm overreacting. Still, can we make specific
> acceptance tests for this to ensure it doesn't accidentally break in the future
> due to unforeseen Verifier tweaks?

I think it will be possible to construct a simple benchmark that would reliably show a large gain when the optimization kicks in.  I would drop this into the acceptance tests.  Separate patch coming up for that.
(In reply to comment #32)
> (From update of attachment 449894 [details] [diff] [review])
> My only concern is that this patch seems somewhat fragile, in that the
> necessary bits are (of necessity) strewn in various places in Verifier...

And indeed bugs remain.  I read verifyBlock superficially and assumed it would return at the end of a block, but this is not so: it only returns if the block ends unconditionally, by JUMP, LOOKUPSWITCH, RETURN, or THROW - not on conditional jumps.  Yet the soundness of the analysis is predicated on no unconsumed rest arguments remaining on the stack at any jump (because states are not merged).

The correct thing to do is to move the calls to endBlock out of the verifier main loop and into the analysis itself, in writeOpcodeVerified most likely.
Attached patch Patch, v3 (obsolete) — Splinter Review
One final(?) round.  Changes here relative to the previous version are:

- The definition of RestArgAnalyzer has moved into Verifier.h
- RestArgAnalyzer is a field on Verifier, with two-stage setup (trivial constructor + init)
- RestArgAnalyzer::cleanup is gone
- The 'analyzer' arg on verifyBlock is gone
- The hooking of restArgAnalyzer into the pipeline has been abstracted in RestArgAnalyzer::hookup
- The calls to endBlock have been moved into restArgAnalyzer::writeOpcodeVerified
- The assert in NullWriter::NullWriter has been removed because (a) it's in the way and (b) if it's a condition for this field to be not-null then it should also be const.  LMK if this change is a hardship.
- In RestArgAnalyzer, optimize is set to false during analysis by calling a new method 'fail'; this aids debugging.

In other words, cleanup should now function correctly and the end-of-block condition on the analysis phase should be adhered to properly.
Attachment #449894 - Attachment is obsolete: true
Attachment #450079 - Flags: review?(edwsmith)
Attached patch Acceptance testSplinter Review
Measures two programs: one where the optimization must kick in, one where it won't.  The ratio of the v8 score of the former to the latter must be at least 5.  I will post a follow-up change to testconfig.txt that disables this test for debugger builds, since the debugger disables the optimization.

Though this test measures performance I put it in the acceptance tests (it goes into as3/Definitions/Function/RestOptimization.as).  Feel free to argue, but I felt that this is not primarily a performance test, but a test that the VM does what it's supposed to do with a specific type of program.
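The shape of the test is roughly as follows (my sketch of the idea only, not the actual RestOptimization.as; it assumes the avm shell's getTimer() and print()):

  // 'fast' is eligible for the optimization; 'slow' defeats it by
  // letting the rest array escape into an ordinary call.
  function fast(...rest):uint { return rest.length; }
  function consume(a:Array):uint { return a.length; }
  function slow(...rest):uint { return consume(rest); }

  // Score each form by iterations completed in a fixed interval and
  // require the optimized form to win by at least a factor of 5.
  function score(f:Function):uint {
    var n:uint = 0;
    var t0:int = getTimer();
    while (getTimer() - t0 < 1000) { f(1, 2, 3); n++; }
    return n;
  }
  print(score(fast) >= 5 * score(slow) ? "PASSED" : "FAILED");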
Attachment #450087 - Flags: review?(stejohns)
Attachment #450087 - Flags: review?(brbaker)
Attached patch Testconfig patch (obsolete) — Splinter Review
Attachment #450088 - Flags: review?(brbaker)
Blocks: 570944
The only argument I can think of in favor of the test being a performance test would be if it is not reliable... sampler acceptance tests are notorious this way.  If it runs in a reasonable time (all platforms/configs) and isn't flaky then it's fine in acceptance tests.  IIRC, the spidermonkey regression tests have at least one test that tries to ensure the big-O performance of Array.sort is O(N log N).  This new one is the same idea.
(In reply to comment #36)
> - The assert in NullWriter::NullWriter has been removed because (a) it's in the
> way and (b) if it's a condition for this field to be not-null then it should
> also be const.  LMK if this change is a hardship.

Presumably you still utilize some default methods of NullWriter, and coder becomes non-null at lazy-init time?  Cool; otherwise extending CodeWriter would be fine.  No hardship.  The assert would only mask an immediate NPE in broken code anyway - limited value.
Attachment #450079 - Flags: review?(edwsmith) → review+
Comment on attachment 450079 [details] [diff] [review]
Patch, v3

I'm sorry I didn't review the jit code more carefully earlier; I was obsessed with the analyze phase.

Curious, why the goto in restargcHelper() instead of:

   if (*restLocal == 0)
      return argc;
   return /* call get_length */

keeping style similar to restargHelper()?

Passing restLocal by reference to restargcHelper() means that a call to restargcHelper() must be modelled by dead-variable removal as a "read"; without that modelling, stores to that variable might be found to be dead and incorrectly removed.  (Odds are, there are other reads, but with loops and wonky control-flow paths this is not an area to count on luck.)

This modelling is done in analyze_call().  An example of another function that passes vars by reference and is modelled as a 'read' is makeatom(), at the bottom of analyze_call().

An alternative for restargcHelper() is to pass the restLocal variable by value, loaded with "LIR_ldp" (pointer-sized value).  Then deadvars analysis will see the load and get the modelling right based on that.

restargHelper() both reads and writes restLocal, so it should be modelled as such as well, which is a little trickier.  The only precedent like this is ordinary calls in debuggable code - they're modelled as a read and write of every local.  We need to model the read from restLocal in analyze_call(), and the [possible] write needs to be modelled in VarTracker::insCall() by clearing the remembered value in vars[].  (Set vars[param_count] = 0 if we see a call to restargHelper().)

In CodegenLIR::writeEpilogue(), add

    if (restArgc) livep(restArgc)

to keep the stack slot live for the whole method; this is required when the method contains a loop and the last ordinary use is inside the loop (we must extend the lifetime of the slot to the end of the loop).
(On the "wishlist" is a debug-only analyzer that spots this category of mistake... we don't do live-range analysis automatically because it's too slow and is only needed for a handful of jit-generated values.)

In writing this, it's a refreshing reminder that the jit is too fragile.  But it is what it is.
A corollary to my above comment is that any modelling/mutation of vars[] should be accompanied by corresponding modelling/mutation of tags[].  I racked my brain on this a bit -- I think restarg optimization won't happen in any situation where tags[] has a meaningful value... but again, that's a fragile assumption.

A scenario where tags[restLocal] is read:

  if ... 
    restLocal = OP_newarray
  else
    restLocal = /* something else */
  print(restLocal[1])

The reference in the control-flow-join block causes tags[] to be read if the two assignments aren't the same type (SST_object vs SST_atom, for example).  I think the analysis would reject this code though, so just noting.

During verify phase 1, this situation can be detected when FrameState.value(restLocal).sst_mask contains more than one bit set.

Two cheap insurance policies would be:
   1. detect two bits set in sst_mask, and fail() in that case
   2. synchronize modelling of the restarg helpers to both vars[restLocal] and tags[restLocal] in the jit.

1 is cheaper than 2, probably.
Following up to the last four comments:

We get an order of magnitude improvement with the optimization in release mode and much more in debug mode so I expect a factor of five cutoff to be safe.  (Famous last words.)  I need to update the testconfig patch however because the test will fail in the interpreter.

Piggy-backing on NullWriter is convenient; I only implement five functions of CodeWriter.

Fixing restargcHelper to take the array value (or NULL) by-value is clearly the right thing.

I will model reads and writes for restargHelper as this seems like the right thing to do.

As discussed on IM, restLocal is written once and its type is set to ARRAY_TYPE.  The value is either a real array or 0.  If the value is 0, jitted code reads the value to pass it to restargcHelper, but never writes the value.  Thus I'm pretty sure I don't need to worry about consistency in the sense discussed in comment #42.

New patch coming.
Attached patch Patch, v4Splinter Review
(And once more, with feeling.)

Made the fixes outlined in my previous comment, no other changes.
Attachment #450079 - Attachment is obsolete: true
Attachment #450174 - Attachment is patch: true
Attachment #450174 - Attachment mime type: application/octet-stream → text/plain
Attachment #450174 - Flags: review?(edwsmith)
Attached patch Testconfig patch, v2 (obsolete) — Splinter Review
Also disable for interpreter runs
Attachment #450088 - Attachment is obsolete: true
Attachment #450176 - Flags: review?(brbaker)
Attachment #450088 - Flags: review?(brbaker)
Comment on attachment 450174 [details] [diff] [review]
Patch, v4

substance:

As discussed on IM, CodegenLIR::restArgc is defined in the prolog and only read elsewhere, so it doesn't need to be allocated storage; it's just an SSA expression.

ship it!

nits:

jit-calls.h: 864:  if (restLocal == 0) // Array has been created lazily

missing "not" in the comment?

jit-calls.h: 913: could simplify toplevel->toVTable(obj) to (*restLocal)->vtable, if desired.  We know obj is a non-null ScriptObject.  Since this is the slow path, whatever, but it avoids a call and a switch before calling getproperty (which is a monster).

* restLocal, firstLocal, and param_count()+1 are used in various places in CodegenLIR.cpp; this should probably be a field in CodegenLIR, calculated once in writeProlog (but we'd still have to pass it around a couple of places, like to VarTracker).
Attachment #450174 - Flags: review?(edwsmith) → review+
Component: Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: vm → nanojit
Blocks: 420368
Blocks: 477139
Attachment #450087 - Flags: review?(stejohns) → review+
Fixed restArgc issue.
Fixed jit-calls.h nits.
Introduced a restLocal property on CodegenLIR; that clears up many remaining nits.

tamarin-redux changeset:4790:293e97059b2a, including test case and testconfig changes (awaiting Brent's review).
Blocks: 571468
Blocks: 571469
Note, a fix for 64-bit systems was posted in 4791 (LIR_livep should be LIR_livei) and then the optimization was disabled in 4792 while we investigate problems on PPC, ARM, and MIPS.
Remember to re-enable the acceptance test when re-enabling the optimization.
This will land once it's been tested in the sandbox.
Two more fixes noticed:

Toplevel* isn't constant with respect to code; loadEnvToplevel() produces a possibly hoisted Toplevel*.

Must convert index argument to Atom before calling helper (opportunity to specialize the helper for int/uint/double in the future, to avoid native->atom conversion).
Comment on attachment 450176 [details] [diff] [review]
Testconfig patch, v2

There is an acceptance run in the deep phase of the build that runs a debugger build with the -Dnodebugger switch. This config /should/ optimize the restargs and therefore should be able to properly run the testcase. The config string for those runs looks like .*debugger-Dnodebugger
Attachment #450176 - Flags: review?(brbaker) → review-
Attachment #450087 - Flags: review?(brbaker) → review+
Pushed my bug fix, Ed's bug fix, reenabled the optimization, but left the acceptance test disabled until I can properly debug the test selection:

tamarin-redux changeset:   4812:1c8482dc0b74
(A background paper on design alternatives etc for this optimization was added to the "Papers, Demos, etc" section on the ActionScript Engineering home page on zerowing.)
Trying again.
Attachment #450176 - Attachment is obsolete: true
Attachment #452263 - Flags: review?(brbaker)
Attachment #452263 - Flags: review?(brbaker) → review+
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: flashplayer-bug+