Closed Bug 682280 Opened 8 years ago Closed 8 years ago

Vector.length is slow

Categories

(Tamarin Graveyard :: Library, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED
Q1 12 - Brannan

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

(Whiteboard: PACMAN)

Attachments

(3 files, 6 obsolete files)

Attached file Benchmark (obsolete) —
Not as slow as Array.length... but slow.  The problem is that it's implemented as a getter, and the call to that getter is relatively expensive.  Since Vector is final and we control its layout we should just inline the two obvious instructions when the type information is available; this will be much faster and also smaller.  It will benefit all code that uses Vector, but especially code that uses Vector.length as a loop limit without loading it into a local first.

Attaching a benchmark that illustrates the cost of Vector.length, Array.length, and accessing the length property of a known class.
Oh, and the timings for that benchmark (TR tip Release build, Mac Pro 2x2.93GHz Xeon):

Array: 1127
Vector: 911
MyArray: 1

If we can't get the full benefit of the simulated MyArray case because of the Vector representation (level of indirection requires two loads), I'd be willing to settle for a 500x speedup.
Attached patch Work in progress (obsolete) — Splinter Review
Some updates.  A "fair" benchmark will be attached next, because there was a bug in the original one, making MyArray look much too good.  With the fair benchmark the time for MyArray increases to 367ms (wipes egg from face).

Anyway the fix would be basically as attached.  This yields a 2x speedup on Vector.length and brings the time down to 425ms.  If we take MyArray as a base case where length access has zero cost (obviously an underestimate), this means that Vector.length access is reduced from (911-376)/911=58% of the running time to (425-376)/425=12% of the running time.

How could we do better?  In principle the JIT could hoist and CSE the loading of vector->data if it could prove that the vector won't be updated in the loop, to do that we need to create an aliasing region for vector metadata, probably.
Attached file Less eggy benchmark (obsolete) —
Attachment #556019 - Attachment is obsolete: true
Summary: Vector.length is deadly slow → Vector.length is slow
The three test cases: extract "length" from Vector, Array, and object of a known type with a fixed, public "length" field.
Attachment #556047 - Attachment is obsolete: true
Attachment #556381 - Flags: review?(edwsmith)
Attached patch Optimize vector.length (obsolete) — Splinter Review
Looking for additional insights, and barring that I'll be inclined to offer this for a proper review as I don't know that anything's missing.

This should be independent of other pending optimizations (inlining, rcobject specialization).

Without this optimization we have TR tip (it/sec):

  array-read-length-1                                  87.4
  object-read-length-1                                273.9
  vector-read-length-1                                109.9

With this optimization:

  array-read-length-1                                  88.2
  object-read-length-1                                275.4
  vector-read-length-1                                236.1
Attachment #556046 - Attachment is obsolete: true
Attachment #556382 - Flags: feedback?(edwsmith)
Blocks: 681888
Attachment #556381 - Flags: review?(edwsmith) → review?(fklockii)
Assignee: nobody → lhansen
Flags: flashplayer-qrb+
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Priority: -- → P3
Target Milestone: --- → Q4 11 - Anza
Comment on attachment 556381 [details] [diff] [review]
Benchmarks extracted for asmicro/

R+

Nits first, since they *must* be fixed:

* Tabs in patch.

* Copyright notice says 2010.

Less important feedback:

* Why are you using a doubly-nested loop, instead of just feeding in 1000000 as the iter count and using a single for loop?  (If you decide to switch to a single loop, it will also address the issue immediately below.)

* The benchmarks have references to x.length in the loop-termination condition and in the loop-body.  (As you know from your eggy condition in comment 2.)  As humorous as that was, I'd lift this to a local, or just use iter again or some other constant, so that the expression under scrutiny (i.e. x.length) is confined to the loop body.

(The doubling of fetches of x.length doesn't *really* matter, since you are not doing anything like dividing by the iteration count to estimate the running time for each .length evaluation.  And maybe your implicit goal here is to illustrate that this problem matters in real code, since this is such a common coding pattern.  As long as all of the test cases remain in sync with each other, its obviously fine and can land as is if you like.)
Attachment #556381 - Flags: review?(fklockii) → review+
(In reply to Felix S Klock II from comment #6)

> * Tabs in patch.
> 
> * Copyright notice says 2010.

Fixed.

> * Why are you using a doubly-nested loop, instead of just feeding in 1000000
> as the iter count and using a single for loop?
...
> * The benchmarks have references to x.length in the loop-termination
> condition and in the loop-body.
...
> (The doubling of fetches of x.length doesn't *really* matter, since you are
> not doing anything like dividing by the iteration count to estimate the
> running time for each .length evaluation.  And maybe your implicit goal here
> is to illustrate that this problem matters in real code, since this is such
> a common coding pattern.  As long as all of the test cases remain in sync
> with each other, its obviously fine and can land as is if you like.)

These are good questions.  In truth, the structure evolved a little haphazardly.

In order to cast the overhead of the length accessor into the most unfavorable light possible the work of reading length should dominate in the loop.  So that justifies, if that's the word, the double access to x.length.  A more structured way of achieving that would be to just unroll a loop with a lot of accesses to vector.length, as I've done elsewhere.  On the other hand, that gives CSE a lot to work with.  On the third hand, the ground rule for asmicro is "don't worry about that", and indeed, would that not hold for the current code too, and would we not want CSE to exploit that if it could?  On the fourth hand, I found in some of the other benchmarks on vectors that I did in fact have to worry about that to get non-noisy results.

I like your justification that it "illustrates that this problem matters in real code, since this is such a common coding pattern" and will use that as an excuse to land the code as-is.
changeset: 6558:3515bec38de2
user:      Lars T Hansen <lhansen@adobe.com>
summary:   For 682280 - Vector.length is slow: benchmarks (r=fklockii)

http://hg.mozilla.org/tamarin-redux/rev/3515bec38de2
Comment on attachment 556382 [details] [diff] [review]
Optimize vector.length


This is the predicate you used to detect that we're calling the length getter:

> !name->isRuntime() && name->getName() == core->klength && name->containsAnyPublicNamespace()

I'd suggest using the known native method id instead.  Once we know we're calling a known native function (Vector.<T>.length getter) on a known Vector type (Vector.<T>), like this (code adapted from the fall-through case):

 if (AvmCore::hasGetterBinding(b)) {
   int disp_id = AvmCore::bindingToGetterId(b);
   MethodInfo* getter_info =obj.traits->getTraitsBindings()->getMethod(disp_id)
   switch (getter_info->method_id()) {
     case avmplus::NativeID::__AS3___vec_Vector_object_length_get:
     case avmplus::NativeID::__AS3___vec_Vector_double_length_get:
   }
 }

Take a look at CodegenLIR::specializedFunctions[] to see how the method id is used like this already.
Attachment #556382 - Flags: feedback?(edwsmith) → feedback+
Target Milestone: Q4 11 - Anza → Q1 12 - Brannan
Status: NEW → ASSIGNED
Whiteboard: PACMAN
Attached patch Optimize vector.length, v2 (obsolete) — Splinter Review
You mean like this, no additional testing needed?  (Sweet, if so.)
Attachment #556382 - Attachment is obsolete: true
Attachment #566569 - Flags: feedback?(edwsmith)
Comment on attachment 566569 [details] [diff] [review]
Optimize vector.length, v2

Yep, exactly what I meant.
Attachment #566569 - Flags: feedback?(edwsmith) → feedback+
The same patch except for a pair of braces inserted to placate GCC 4.4.
Attachment #566569 - Attachment is obsolete: true
Attachment #567735 - Flags: review?(edwsmith)
Attachment #567735 - Flags: review?(edwsmith) → review+
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
changeset: 6663:18834e3c7831
user:      Lars T Hansen <lhansen@adobe.com>
summary:   Fix 682280 - Vector.length is slow (r=edwsmith)

http://hg.mozilla.org/tamarin-redux/rev/18834e3c7831
(In reply to Tamarin Bot from comment #13)
> changeset: 6663:18834e3c7831
> user:      Lars T Hansen <lhansen@adobe.com>
> summary:   Fix 682280 - Vector.length is slow (r=edwsmith)
> 
> http://hg.mozilla.org/tamarin-redux/rev/18834e3c7831

FYI: according to my bisection, this changeset appears to introduce behavioral change that is blocking integration.

Still investigating (I don't even have a reduced test case yet), but I thought I'd provide a heads up in case you have an aha moment about what might be wrong.
No aha, but there's a bug in that patch that was corrected later, specifically in this changeset:

changeset:   6731:4dbf526976e9
user:        Lars T Hansen <lhansen@adobe.com>
date:        Tue Nov 15 09:47:20 2011 +0100
summary:     Guard the inlining of blessed getters on the pool id being the builtin pool (quasi-r=edwsmith)
(In reply to Lars T Hansen from comment #15)
> No aha, but there's a bug in that patch that was corrected later,
> specifically in this changeset:
> 
> changeset:   6731:4dbf526976e9
> user:        Lars T Hansen <lhansen@adobe.com>
> date:        Tue Nov 15 09:47:20 2011 +0100
> summary:     Guard the inlining of blessed getters on the pool id being the
> builtin pool (quasi-r=edwsmith)

Hmm, I had seen that when I started this process X weeks ago, and had thought I would "get away" with integrating without it this time around.  I don't know if the same bug is what I am seeing it, but it is worth checking.

Is there a pure .as regression test case that illustrates the bug that this changeset is fixing?  (I am ignorant of how one would flow into this code without the guard that rev6731 introduced.)
There is no test case, just an observation from Edwin in email that this is a problem he should have caught during review; he caught it when we were walking through the float code and expanding the set of intrinsics.  Presumably in a rare test case the method ID for some random getter could be confused with the method ID for the vector length getters.
(In reply to Lars T Hansen from comment #17)
> There is no test case, just an observation from Edwin in email that this is
> a problem he should have caught during review; he caught it when we were
> walking through the float code and expanding the set of intrinsics. 
> Presumably in a rare test case the method ID for some random getter could be
> confused with the method ID for the vector length getters.

Ah, yes, then that is probably what is happening to me: One of the getters being used internally within the AIR test infrastructure (not a test, but the infrastructure itself) is being miscompiled to the vector .length referencing code, and producing some bogus value in response.

Okay, great, that means I have a prayer of finishing up the integration today.
Regression test for getter guard bug discussed in comment 15 and comment 17.

It is large by necessity; the spraying technique it is using needs to cover a lot of getters in order to find a collision.  I have included a small model version that goes through only 10 getter/setters in order to allow one to understand if other bugs that it signals are happening across the board or only for certain extreme method ids.  (I have confirmed that the large version, and only the large version, does catch the bug discussed in comment 15 and comment 17.)

Note this draft of the test is currently only passing 3 out of the 4 tests.  So either we still have some bugs to flush out here, or the logic in the 4th test is somehow flawed in a way I do not yet see.  Illustration on a DebugDebugger build, rev 6743:4473be1f334c:

% python runtests.py -f --avm ../../objdir-ged64/shell/avmshell as3/Definitions/FunctionAccessors/AccessorSpray*.as

Tamarin tests started: 2011-11-24 15:01:31.273302
current configuration: x64-mac-tvm-debugdebugger
avm version: cyclone
asc version: 0
thread count: 4
Executing 2 tests against vm: ../../objdir-ged64/shell/avmshell
1 running as3/Definitions/FunctionAccessors/AccessorSpraySmall.as  
   uint setter spray = false FAILED! expected: true
   FAILED passes:3 fails:1 unexpected passes: 0 expected failures: 0

2 running as3/Definitions/FunctionAccessors/AccessorSpray.as  
   uint setter spray = false FAILED! expected: true
   FAILED passes:3 fails:1 unexpected passes: 0 expected failures: 0


FAILURES:
  as3/Definitions/FunctionAccessors/AccessorSpraySmall.abc : uint setter spray = false FAILED! expected: true
  as3/Definitions/FunctionAccessors/AccessorSpray.abc : uint setter spray = false FAILED! expected: true
END FAILURES

test run FAILED!
Tests complete at 2011-11-24 15:01:42.400093
Start Date: 2011-11-24 15:01:31.273302
End Date  : 2011-11-24 15:01:42.400093
Test Time : 0:00:11.126791
passes               : 6
failures             : 2
Attachment #576740 - Flags: superreview?(brbaker)
Attachment #576740 - Flags: review?(lhansen)
Attachment #576740 - Attachment is patch: true
(In reply to Felix S Klock II from comment #19)
> Note this draft of the test is currently only passing 3 out of the 4 tests. 
> So either we still have some bugs to flush out here, or the logic in the 4th
> test is somehow flawed in a way I do not yet see.  Illustration on a
> DebugDebugger build, rev 6743:4473be1f334c:

Its the former; the logic is flawed, because the negative number is being coerced to a positive integer.  Will fix.
Comment on attachment 576740 [details] [diff] [review]
patch R v1: regression test for getter guard bug

Switching F? in light of bug in test i noted in previous comment, so not review ready, but probably feedback is all i really want in the end anyway.
Attachment #576740 - Flags: superreview?(brbaker)
Attachment #576740 - Flags: review?(lhansen)
Attachment #576740 - Flags: feedback?(lhansen)
fixed bug.  now all the tests pass.  I don't see a better way to try to catch these sorts of bugs in avmshell, so I figure either we push something like this or decide its distasteful.

thus the F? for Lars.  Feedback from QE is welcome as well.
Attachment #576740 - Attachment is obsolete: true
Attachment #576740 - Flags: feedback?(lhansen)
Attachment #576742 - Flags: feedback?(lhansen)
Comment on attachment 576742 [details] [diff] [review]
patch R v2: regression test for getter guard bug

That's pretty funny.

Seems that a program to generate the appropriate ABC (or AS3, if eval were up to it, which I somehow doubt) at run-time would be smaller ;-)

(No, that was not a suggestion to consider that.)
Attachment #576742 - Flags: feedback?(lhansen) → feedback+
TR->BZ auto-comment-on-commit appears broken.  (Does the server need rebooting?)

changeset:   6744:482c94bbd5d9
user:        Felix S Klock II <fklockii@adobe.com>
date:        Fri Nov 25 11:44:47 2011 +0100
summary:     Bug 682280: regression test for getter guard (f=lhansen, r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/482c94bbd5d9
You need to log in before you can comment on or make changes to this bug.