Closed Bug 605324 Opened 14 years ago Closed 14 years ago

Vector<> should use the new List suite once that lands

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stejohns, Assigned: stejohns)

References

Details

Attachments

(3 files, 2 obsolete files)

Most of the smarts in Vector<> are now duplicative of those in the new List<> replacement suite; rewriting Vector<> to use that as its basis will simplify the code.
Add explicit setLength and setCapacity calls, needed for the next patch. (Note that it's possible to do without these, but much cleaner to simply add them at this level.)
Attachment #484158 - Flags: feedback?(wsharp)
Attachment #484158 - Flags: feedback?(wsharp) → feedback?(lhansen)
Passes existing acceptance tests, but may need more scrubbing, and probably needs some performance torture testing too. Posting here for feedback on general direction.
Attachment #484162 - Flags: feedback?(wsharp)
Assignee: nobody → stejohns
Blocks: 601817
Blocks: 588364
Blocks: 589827
Comment on attachment 484162 [details] [diff] [review]
Rewrite of Vector using new List suite

Overall this seems fine. I'd be curious to know the before and after code size with this change.

I don't think the double get/set prop stuff is quite right. See bug 599357 and the range error you can get from a value that fits in a uint but not an int. This should go away if we decide to inline the double checks anyway.

Also, this change might make bug 599099 a bit harder to implement. If we want to directly access our data/length properties from the JIT, we'll need to access the m_list object.

Performance testing would be very useful. Tests for the OPTIMIZEME tags you added, as well as general push/pop/shift/unshift etc. tests, would be nice to know if this causes any performance degradation.

nit: Why the Traits cast?

    toErrorString((Traits*)traits)
Attachment #484162 - Flags: feedback?(wsharp) → feedback+
(In reply to comment #3)
> Overall this seems fine. I'd be curious to know the before and after code size
> with this change.

Yep, needs to be measured. Based on what I saw with a similar restructuring of List<>, it's probably smaller due to less gratuitous inlining, but gotta measure...

> Also, this change might make bug 599099 a bit harder to implement. If we want
> to directly access our data/length properties from the JIT, we'll need to
> access the m_list object.

Hmm, true, but not insurmountable, I suspect.

> Performance testing would be very useful.

Absolutely essential prior to landing.

> nit: Why the traits cast?

Urr, good question, looks like a code hickey.
Comment on attachment 484158 [details] [diff] [review]
Add setLength, setCap calls

Technically this condition in set_length is redundant:

    + else if (len > oldlen)

In set_capacity this assertion is incorrect if the capacity on entry is greater than kListMinCapacity and the call sets the capacity to kListMinCapacity:

    + AvmAssert(cap > kListMinCapacity);

I'm a little hazy on why it would be correct for the assert in RCListHelper::clearRange and AtomListHelper::clearRange to test against cap and not len, but if I understand correctly it is correct to do so as long as all elements are always initialized to a sane default value.
Attachment #484158 - Flags: feedback?(lhansen) → feedback+
I did a quick code-size comparison, and interestingly, the new code is about 40k larger on MacTel32 builds... which is odd because it's definitely less code. Will investigate.
OK, after some twiddling to avoid duplicate functions, it's down to about 12k larger, which is in the nuisance range. (Note: this is MacTel32 only, with dead-strip enabled and debug symbols disabled... for some reason we don't normally build avmshell with dead-strip.) Will check MSVC too.

Benchmarks with tweaked patches (which I'll post shortly) look promising:

                            avm              avm2
test                        best    avg      best    avg      %dBst  %dAvg
Dir: jsbench/typed/
  Crypt                     1108    1109.7   1081    1083.7     2.4    2.3  +
  Euler                     9395    9431.7   9328    9355.7     0.7    0.8  +
  FFT                       1889    1901.3   1734    1741.3     8.2    8.4  ++
  HeapSort                  1500    1508     1375    1380       8.3    8.5  ++
  LUFact                    1738    1743.7   1339    1343      23.0   23.0  ++
  Moldyn                    4702    4732.3   4670    4717       0.7    0.3
  RayTracer                 1360    1364.7   1368    1370.3    -0.6   -0.4  -
  SOR                       5522    5545.7   4450    4458.3    19.4   19.6  ++
  Series                    7846    7868.7   7745    7765.3     1.3    1.3  +
  SparseMatmult             2683    2696.7   2689    2703      -0.2   -0.2
Dir: sunspider/as3vector/
  access-fannkuch             26      26.3     22      22.3    15.4   15.2  ++
  access-nbody                 6       6.7      6       6.3     0      5.0
  access-nsieve               13      13.7     12      12.7     7.7    7.3  +
  bitops-nsieve-bits           8       8.7      8       8.3     0      3.8
  math-cordic                 17      17       16      16.7     5.9    2.0  ++
  math-spectral-norm          16      16.7     17      17      -6.2   -2.0  --
  s3d-cube                    19      19.3     17      17      10.5   12.1  ++
  s3d-morph                   16      16       16      16.7     0     -4.2
  string-fasta                43      43.3     43      43.7     0     -0.8
  string-validate-input       37      44       37      37       0     15.9
Dir: v8/typed/
  crypto                     461     460.7    462     461.7     0.2    0.2  +
  deltablue                 2413    2410.3   2422    2413.7     0.4    0.1
  earley-boyer               984     981      984     981.7     0      0.1
  raytrace                  6062    5993.3   6218    6186.7     2.6    3.2  +
  richards                  1877    1873     1847    1846      -1.6   -1.4  -
Dir: v8.5/typed/
  crypto                    2130    2118     2186    2181.7     2.6    3.0  +
  deltablue                 3183    3174.7   3219    3208.7     1.1    1.1  +
  earley-boyer               963     957.3    961     949.3    -0.2   -0.8
  raytrace                  7417    7372.3   7446    7402       0.4    0.4
  regexp                      83      82.7     85.9    85.7     3.5    3.6  +
  richards                  3645    3634     3692    3676.3     1.3    1.2  +
  splay                      916     914.7    930     925       1.5    1.1  +
Dir: v8.5/optimized/
  crypto                    3234    3210.7   3483    3460.7     7.7    7.8  ++
  deltablue                 2998    2957.3   3005    2959.7     0.2    0.1
  earley-boyer               976     960      971     959.3    -0.5   -0.1
  raytrace                  7461    7427     7402    7397.3    -0.8   -0.4  -
  regexp                      82.7    82.1     85.8    85.7     3.7    4.3  +
  richards                  3626    3612.3   3684    3677.3     1.6    1.8  +
  splay                     5399    5366.3   5432    5408.7     0.6    0.8
Attached patch Rewrite, v2 (obsolete) — Splinter Review
Compacted and cleaned a bit; I think it's worthy of a full review. (Code size comparison still needs to be done on MSVC before it can land.)
Attachment #484162 - Attachment is obsolete: true
Attachment #486241 - Flags: review?(wsharp)
VC2008 avm win32 release shell is 512 bytes bigger with the new implementation (857088 to 857600 bytes). A zipped exe is 487 bytes bigger.

Your performance improvements are because you fixed the double->uint32_t slowdown from set/getDoubleProperty. Do we have any microbenchmarks on Vector that can be run to tell if just growing a vector is slower? Or any benchmarks on a single Vector API? Or I'd like to see those numbers above with a comparison shell that has the faster double->int32_t implementation for get/setDoubleProperty. I'll still review the patch as-is, but some more perf numbers would be great.
(In reply to comment #9)
> Your performance improvements are because you fixed the double->uint32_t

Doh! Good catch, apples vs oranges. I'll apply a similar fix to the "before" numbers and re-run to see.

No microbenchmarks on whether growing is faster or slower, but the same growth algo is being used, so I'm not expecting a dramatic change there (bench still needs to be done, though).
Comment on attachment 486241 [details] [diff] [review]
Rewrite, v2

Mostly nits:

(Still) Why the Traits* cast?

    core()->toErrorString((Traits*)traits));

This is awkward:

    ScriptObject* so_args = AvmCore::isObject(argv[1]) ? AvmCore::atomToScriptObject(argv[1]) : NULL;
    if (!so_args)
        throw

Why not this instead?

    if (!isObject())
        throw
    so_args = atomToScriptObject(..)

Does TypedVectorClassBase::call ever get called with a large number of args[], and should we have a faster implementation than the for loop:

    v->setUintProperty(i, so_args->getUintProperty(i));

Or at the very least pre-size the Vector?

Three OPTIMIZEME comments need research and a decision about whether there is more work required.

I think there may be an obscure bug with vector access of large unsigned integers (going through the Atom path). If I have a string of "3000000000.0", the old code (getVectorIndex) was only dealing with integer values while the new code (VectorIndex) is dealing with unsigned integer values (and a slower double->uint32). One throws a reference error while the other would throw an out-of-range error.

NO BUILTINS PLEASE!

Otherwise looks good.
Attachment #486241 - Flags: review?(wsharp) → review-
> (Still) Why the Traits *cast?
>
>     core()->toErrorString((Traits*)traits));

Where is that? I don't see it.

> Why not this instead?
>
>     if (!isObject())
>         throw

Will investigate.

> Does TypedVectorClassBase::call ever get called with a large number of args[]
> and we should have a faster implementation than the for loop:
>
>     v->setUintProperty(i, so_args->getUintProperty(i));

I don't know -- FWIW, that was basically pre-existing code; I didn't de-optimize it.

> Three OPTIMIZEME comments need research and decision about whether there is
> more work required.

Yep
    + void Toplevel::throwReferenceError(int id, String* name, const Traits* traits) const
    + {
    +     Multiname mn(core()->findPublicNamespace(), name);
    +     referenceErrorClass()->throwError(id, core()->toErrorString(mn), core()->toErrorString((Traits*)traits));
    + }
> Does TypedVectorClassBase::call ever get called with a large number of args[]

Doesn't matter, we only look at the first one.

> and we should have a faster implementation than the for loop:
>     v->setUintProperty(i, so_args->getUintProperty(i));

Not possible in the general case, since the argument might not be any kind of object, but it appears that by far the most common usage is when the arg is an Array... that could be optimized some, but not much. I'd want to see evidence this is a meaningful bottleneck before bothering.

> Or at the very least pre-size the Vector?

That sounds worth doing, willfix.

> Three OPTIMIZEME comments need research and decision about whether there is
> more work required.

The first OPTIMIZEME (in set_length) is probably FOL in order to maintain existing growth semantics. The other two (using alloca for calls to splice/insert) are arguably FOL, since we have to convert Atom-to-value somehow in order to complete the operation. We could avoid the alloca in both cases by replicating the logic of splice/insert inline and doing the conversion one at a time, but I'm skeptical (without proof) that many calls have a large enough argc for this to matter. Still, any thoughts as to how to verify?

> I think there may be an obscure bug with vector access of large unsigned
> integers (going through the Atom path). If I have a string of "3000000000.0",
> the old code (getVectorIndex) was only dealing with integer values while the
> new code (VectorIndex) is dealing with unsigned integer values (and a slower
> double->uint32). One throws a reference error while the other would throw an
> out-of-range error.

Hmm, really? Aside from packaging it as a utility class I thought I left that function as-is; will re-examine... and I don't see that either getVectorIndex or VectorIndex throws any kind of error at all. Which code path are you looking at?
Attached patch Rewrite, v3Splinter Review
Notable changes:

-- Rewrote _spliceHelper to avoid using temp storage entirely, and replaced insert() with another variant of _splice. (Two OPTIMIZEMEs down.)
-- The remaining OPTIMIZEME is FOL to maintain existing semantics, so removed the comment.
-- Reworked the call() method to go through _spliceHelper (which does the presizing for us).

The only remaining known issue is the "obscure bug with vector access of large unsigned integers"; not sure what you mean by this, so awaiting further info on that from you.

Perf numbers for this patch: (Note that for the baseline numbers, I fixed the double->uint32 slowness in the baseline so that get/setDoubleProperty should be comparable, but I'm still getting a significant speed bump in those methods... not sure if we have apples-to-apples or if this is a genuine speedup.)

                            avm              avm2
test                        best    avg      best    avg      %dBst  %dAvg
Dir: jsbench/typed/
  Crypt                     1104    1109     1064    1067.7     3.6    3.7  +
  Euler                     9458    9499.7   9119    9176.3     3.6    3.4  +
  FFT                       1759    1761.7   1666    1681.7     5.3    4.5  ++
  HeapSort                  1388    1396.7   1357    1359       2.2    2.7  +
  LUFact                    1548    1553     1314    1314.3    15.1   15.4  ++
  Moldyn                    4645    4687.7   4522    4550.7     2.6    2.9  +
  RayTracer                 1372    1376     1406    1408      -2.5   -2.3  -
  SOR                       4938    4952.3   4430    4455.7    10.3   10.0  ++
  Series                    7744    7777.7   7675    7698       0.9    1.0  +
  SparseMatmult             2659    2660.7   2687    2690.7    -1.1   -1.1  -
Dir: sunspider/as3vector/
  access-fannkuch             24      24.7     22      22.7     8.3    8.1  ++
  access-nbody                 6       6        8       8.3   -33.3  -38.9  --
  access-nsieve               13      13       12      12       7.7    7.7  ++
  bitops-nsieve-bits           8       8.3      8       8       0      4.0
  math-cordic                 16      16       16      16.7     0     -4.2
  math-spectral-norm          16      16.3     16      16.7     0     -2.0
  s3d-cube                    19      19.3     16      16.7    15.8   13.8  ++
  s3d-morph                   16      16.7     16      16.3     0      2.0
  string-fasta                42      43       43      43      -2.4    0    -
  string-validate-input       37      42       37      37       0     11.9
Dir: v8/typed/
  crypto                     460     459      464     460.7     0.9    0.4
  deltablue                 2429    2424.7   2434    2428.7     0.2    0.2
  earley-boyer               993     991.3    989     987.7    -0.4   -0.4  -
  raytrace                  6149    6131     5979    5952.3    -2.8   -2.9  -
  richards                  1837    1832.7   1835    1834.3    -0.1    0.1
Dir: v8.5/typed/
  crypto                    2172    2165     2220    2210.7     2.2    2.1  +
  deltablue                 3229    3224.3   3226    3192.3    -0.1   -1.0
  earley-boyer               986     985      985     979.3    -0.1   -0.6
  raytrace                  7502    7485     7502    7495       0      0.1
  regexp                      84      83.6     85.6    85.5     1.9    2.3  +
  richards                  3735    3727     3771    3761.7     1.0    0.9  +
  splay                      934     931.3    941     936.3     0.7    0.5  +
Dir: v8.5/optimized/
  crypto                    3298    3269     3550    3538.7     7.6    8.2  ++
  deltablue                 3037    3035.7   3005    2998      -1.1   -1.2  -
  earley-boyer               998     987      982     979.7    -1.6   -0.7  -
  raytrace                  7517    7504.7   7532    7527.3     0.2    0.3  +
  regexp                      84      83.7     85.9    85.5     2.3    2.1  +
  richards                  3763    3743     3776    3771.3     0.3    0.8
  splay                     5525    5498.3   5520    5487.7    -0.1   -0.2
Attachment #486241 - Attachment is obsolete: true
Attachment #486519 - Flags: review?(wsharp)
Highly obscure test case:

    var v:Vector.<int> = new Vector.<int>;
    trace(v["3000000000.0"]);

This used to generate:

    ReferenceError: Error #1069: Property 3000000000.0 not found on __AS3__.vec.Vector.<int> and there is no default value.

and now generates a RangeError. The new error is correct, but different than what we had before.
win32 release shell shows no difference in LUFact or SOR performance for me with a fixed double->int32_t shell and one with this patch applied. LUFact is bottlenecked through _getNativeDoubleProperty and _setNativeDoubleProperty (along with ftol2 helpers on win32). Maybe Mac, with SSE2 always enabled, is faster for some reason. Not really important, though... as long as our perf results show no slowdown.
Comment on attachment 486519 [details] [diff] [review]
Rewrite, v3

+review with the following comments:

If deemed important, we should deal with the obscure test case with high unsigned integer values throwing a reference error vs. a range error. The VectorIndex class would need to do double->int32_t conversion instead of:

    index = uint32_t(index_d);

Ed brought up a good point via IM the other day that we should start hiding JIT wrappers as private functions and friend CodegenLIR so it can access them. The last thing we want is player code to start calling these APIs directly. This is beyond the scope of this bug, but if you want to make this change, please do so.
Attachment #486519 - Flags: review?(wsharp) → review+
(In reply to comment #16)
> Highly obscure test case:

Woot, thanks, will investigate & resolve.

(In reply to comment #17)
> win32 release shell shows no difference in LUFact or SOR performance for me
> with a fixed double->int32_t shell and one with this patch applied.

Hmm, odd, I'll re-check this. (FWIW, the fact that this patch is a small-but-nonzero size increase and a generally small-but-nonzero perf increase makes it of modest value for reasons other than general code hygiene... IMHO the code cleanup makes it worthwhile, but opinions to the contrary are welcomed.)
followup patch (additive) to fix the odd parsing issue Werner identified.
Attachment #486690 - Flags: review?(wsharp)
Comment on attachment 486690 [details] [diff] [review]
fix VectorIndex parsing

nits:

Do we need to put the comment on two lines?

    + // Note: convert using int, not uint, as it's
    + // much faster

Leftovers:

    +// bool isNumber;
    +// bool isValid;

We need to add some vector tests for these edge cases. I will do it as part of my Vector.get/set numeric bug.
Attachment #486690 - Flags: review?(wsharp) → review+
(In reply to comment #21)
> Do we need to put the comment on two lines?
>
> + // Note: convert using int, not uint, as it's
> + // much faster

I personally like seeing code and comments that conform to the 80-column width limit, which is what this seems to be doing here. Except that a few lines down in the patch there's a line of code that goes over 80 columns, so it's a pretty weak explanation.

I'm in favor of the 80-column width because I often put code up side-by-side; going over 80 columns makes that a bit painful when I'm working on my portrait-oriented display (or my laptop, for that matter), or in some IDEs that seem to enjoy putting lots of tool bars up on the sides of my display. But I'll acknowledge ahead of time that other engineers have clamored against a rule requiring 80 columns; monitors are quite wide in their more common orientation these days.
TR 5420:821faaf8d7ff
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED