TT: support dense arrays not starting at 0

Status: RESOLVED FIXED

Product: Tamarin
Component: Tracing Virtual Machine
Reported: 11 years ago
Last modified: 10 years ago

People

(Reporter: Edwin Smith, Assigned: Steven Johnson)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

11 years ago
Say you start indexing at 1 or 2 and the array is dense. Neither TC nor TT will detect the array as dense, since it doesn't start at 0.

sunspider/access-nsieve.as exploits this: isPrime[] starts at 2, so it never becomes a real array, and hashtable access ends up in the critical path.

In TT, it's worse because the numeric keys are converted to string before accessing the hashtable.

Comment 1

11 years ago

FWIW, in the upcoming Array work in SpiderMonkey, we're going to use a load factor test (something like "more than N slots empty, less than M% loaded between 0 and max-set-index") to determine whether we're going to be dense or not.
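
Such a load-factor test might look like this (a hedged sketch; the N and M values here are placeholders, not SpiderMonkey's actual numbers, which are discussed later in this bug):

```python
def should_be_dense(max_set_index, num_set_slots,
                    max_empty=32, min_load_pct=75):
    """Sketch of the heuristic: stay dense unless 'more than N slots
    empty' AND 'less than M% loaded between 0 and max-set-index'.
    max_empty (N) and min_load_pct (M) are illustrative values only."""
    capacity = max_set_index + 1
    empty = capacity - num_set_slots
    if empty <= max_empty:           # small arrays / tuples stay dense
        return True
    load_pct = 100 * num_set_slots // capacity
    return load_pct >= min_load_pct
```

Note that an array with indices 2..9999 set passes this test easily: only two slots are empty, so it stays dense even though it doesn't start at 0.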
(Reporter)

Comment 2

11 years ago
Cool.  Honestly, the dense case is so common it seems like a good idea to start with the assumption it will be dense, at least if the keys start out numeric.  Also, arrays-as-tuples are common, so below a certain size (10?) it could make sense to disregard the load factor as long as keys are numeric.
(Assignee)

Comment 3

11 years ago
(In reply to comment #1)
> FWIW, in the upcoming Array work in SpiderMonkey, we're going to use a load

Mike, we should coordinate the work here to the extent that it makes sense... if/when you come up with good heuristics, perhaps we should apply a similar test in TT.

Comment 4

11 years ago

Yeah, the "more than N slots empty" part will handle tuple use, with N of ~30.  We handle non-numeric properties in a distinct map, though that may run afoul of ES4's requirement for enum in set order, in which case we'd go to the hash more eagerly.

stejohn: absolutely.
(Assignee)

Comment 5

10 years ago
Hey Mike, I am currently reworking the dense-array support in TT and would like to find out the SpiderMonkey tests you have in mind (specifically, the values for N and M in your example above)... I've been experimenting with values, but I'm guessing you have waaaay more data to work with here.
(Reporter)

Updated

10 years ago
Assignee: nobody → stejohns
Component: Virtual Machine → Tracing Virtual Machine
(Assignee)

Comment 6

10 years ago
The patch for https://bugzilla.mozilla.org/show_bug.cgi?id=416924 (theoretically) fixes this, but not optimally, as the heuristics for moving between dense and sparse are arbitrary and a little stupid at times. 

Comment 7

10 years ago

N and M for SpiderMonkey are (will be, once 322889 lands) based on the memory overhead of our non-dense property case, so yours will be different, I'm sure.  We do have a minimum of 32 slots, though, to let odd access patterns that are mostly-dense get started.

Our rules are that we're dense until:
- we exceed the N/M sparseness threshold (and the 32 slots minimum)
- someone sets a non-numeric property
- someone fills a hole < the highest set index

The latter two cases cause us to fault to non-denseness so that we can preserve enumeration-order-is-property-set-order per the upcoming ES4 stricture (and web-compatibility requirement, though I do find myself wondering if people depend on that for arrays in the real world).  NB that some array methods like splice or reverse can therefore cause an array to fault to non-dense in our case, because ES3 specifies some unfortunate ordering for these operations, in its charming pseudocode.
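
The three rules above can be sketched as a single predicate (a hypothetical Python model, not the SpiderMonkey source; the load percentage is an assumed value since M isn't given here):

```python
MIN_SLOTS = 32  # minimum dense size before sparseness checks apply

def must_fault_to_sparse(highest_set, num_set, key, existing_keys,
                         max_empty=32, min_load_pct=25):
    """Model of the three fault rules:
    1. exceed the N/M sparseness threshold (past the 32-slot minimum)
    2. set a non-numeric property
    3. fill a hole below the highest set index"""
    if not isinstance(key, int):
        return True                              # rule 2
    if key < highest_set and key not in existing_keys:
        return True                              # rule 3
    capacity = highest_set + 1
    if capacity <= MIN_SLOTS:
        return False                             # small arrays stay dense
    empty = capacity - num_set
    load_pct = 100 * num_set // capacity
    return empty > max_empty and load_pct < min_load_pct  # rule 1
```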

I am thinking about asking TG1 to relax the enumeration order stricture for arrays, or at least permit alternate implementations of methods so that we can avoid faulting in those cases.  Your thoughts on such a proposal are welcome!
(Assignee)

Comment 8

10 years ago
The existing enumeration-order requirements are definitely limiting, and hugely problematic for efficient dense arrays. De facto usage might prevent us from reforming this, but IMHO Arrays shouldn't have to keep any history of set order for enumeration. I would vote for something like: Arrays always enumerate 0...length-1, followed by any remaining properties in an unpredictable order.
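
The proposed order is easy to state concretely (a sketch using a hypothetical property-map model, not VM code):

```python
def enumerate_keys(length, props):
    """Proposed Array enumeration: indices 0..length-1 first (skipping
    holes), then remaining non-index properties in unspecified order,
    regardless of the order properties were originally set."""
    present = [i for i in range(length) if i in props]
    extras = [k for k in props
              if not (isinstance(k, int) and 0 <= k < length)]
    return present + extras
```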
(Reporter)

Comment 9

10 years ago
Tamarin doesn't honor any enumeration order rules, and ES3 doesn't have any.  obviously the web has rules.  and es4 has rules.  

if you make Array enumerate 0..length-1, it won't break anything in tamarin or flash, at least.
(Assignee)

Comment 10

10 years ago
Joy. Might as well do it right the first time. At least Vector has been excepted from these issues.
(Assignee)

Comment 11

10 years ago
So, I have prototyped an adaptation of the SpiderMonkey approach that Mike outlined, and the enumeration-order requirements can cause major performance degradation in some cases (in particular, the requirement that filling in holes in the dense area out of order causes a fallback to sparse arrays) -- a 300% slowdown (or more) in some benchmarks.

This may be a de facto requirement for web compatibility, but it's definitely something that we should try to regulate away for ES4 if at all possible, as the performance tradeoff is significant and probably impractical to reconcile with current practice.

In the meantime, I will probably make the TT version of this code conditional, as Flash Player has no such enumeration order requirement and definitely won't want to pay the performance penalty.
(Assignee)

Comment 12

10 years ago
So after more work, the penalty for out-of-order filling is not as large as I indicated earlier, but is as much as 50% in some cases, at least in TT (e.g., a benchmark that consistently allocates 1600-entry arrays and starts filling them in from the end rather than the beginning).

I also added code that reconverts sparse->dense when a sparse array gets over 50% full, which makes a substantial difference in the above benchmark -- we end up with a slight net slowdown, but it's fairly modest. Unfortunately, this optimization isn't compatible with preserving the enumeration order, so it will be associated with the conditional section described above -- Flash will almost certainly take advantage of it since it doesn't have the existing legacy behavior to support, but web clients might have to take the slower path.
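
The reconversion trigger described here reduces to a simple load check (a sketch with a hypothetical helper name):

```python
def should_reconvert_to_dense(num_set, max_index):
    """Reconvert a sparse array to dense once it is over 50% full
    (the threshold quoted in the comment above)."""
    return num_set * 2 > (max_index + 1)
```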
(Assignee)

Comment 13

10 years ago
Created attachment 305049 [details] [diff] [review]
Patch

Rework Array to convert from dense to sparse in accordance with new SpiderMonkey approach: if an array drops below 25% usage, convert it to sparse. 
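
The new threshold can be sketched as (hypothetical helper name; the real logic lives in the C++ Array implementation):

```python
def should_convert_to_sparse(num_set, capacity):
    """Per the patch description: go dense -> sparse when fewer than
    25% of the allocated dense slots are actually in use."""
    return num_set * 4 < capacity
```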

There's also code that can be enabled to convert to sparse more aggressively in order to preserve the existing de facto order-of-enumeration behavior that web content depends on; this can cause significant performance degradation in some cases, though, so it's disabled by default (opt-in via a compiler definition).

Removed all the "dense" optimizations from Array.as -- the semantics for them were no longer correct, and it turns out not to make a measurable performance difference for traced code anyway.

Performance is pretty much a wash, except for a couple of benchmarks that are slightly faster, and one (DrawGridHeadless) that is slightly slower -- this latter is a pathological case that allocates many 1600-entry Arrays and fills each one in starting at the back. Ordinarily I wouldn't want to check in a fix that slowed down a benchmark, but the old code was wrong (it always allocated a dense array at the start, regardless of size) and the new behavior is probably about as good as we can do while still balancing memory usage.

Also fixed a latent bug where native slots that held an RC object (as opposed to a GC object) weren't being set properly in the native slot wrapper code. Now we sniff the type of the object if the atom is kSpecialType and do the right thing.

Sample benchmarks:
                                TT-OLD  TT-NEW
prime4                             176     174
boidsHeadless                     3198    3233
boidshackHeadless                  586     574
DrawGridHeadless                   564     594
access-binary-trees                269     250
access-fannkuch                    217     221
access-nbody                       302     300
access-nsieve                      138      86
bitops-3bit-bits-in-byte            25      29
bitops-bits-in-byte                 84      83
bitops-bitwise-and                 373     374
bitops-nsieve-bits                  87      84
crypto-md5                         864     875
crypto-sha1                        106     107
math-cordic                         84      85
math-partial-sums                  347     462
math-spectral-norm                  74      84
s3d-cube                           509     437
s3d-morph                          117     119
s3d-raytrace                       596     575
string-fasta                       286     284
Attachment #305049 - Flags: review?(edwsmith)
(Reporter)

Comment 14

10 years ago
why is DrawGridHeadless pathological?  it fills in from the back to the front but it doesn't create any holes at any time.  so this array is 100% dense, all the time, it just doesn't happen to start at 0 until the end.
(Assignee)

Comment 15

10 years ago
Good point -- it's pathological only in that my dense-detection code only starts from the front, not the back. Let me see if it can be smartened to check both...
(Reporter)

Comment 16

10 years ago
ah, cool. that was, after all, the point of this bug... (see summary field).  I guess I meant implicitly also to not care about which direction they grow.  

access-nsieve starts at 2 and grows forward, and that one sped up, so something's definitely working.  maybe it's just passing the 0-based density threshold since there are only two holes between 0 and length.
(Assignee)

Comment 17

10 years ago
yeah, nsieve et al improve because we have a threshold of size (32) below which we are always dense, which works well for tuple-ish usage as well as things that start at small but nonzero indices.

the problem with things that fill in backwards is that it implies our dense area needs to be able to start at a nonzero index, then grow downwards, which is certainly doable -- but I wonder if this is a common enough scenario to make it worth specializing for. It also requires that our underlying storage be able to efficiently grow "down" rather than "up" which isn't currently the case.
(Reporter)

Comment 18

10 years ago
gotcha.  well, if you're soliciting opinions, i do think it's worth specializing for -- or rather, making the code agnostic about downward v. upward growth; i think of it as un-specializing for upward growth (sure, just semantics).  although it requires shifting, shifting infrequently is nearly the same optimization as reallocing / copying infrequently; it can have some hysteresis built in.  maybe List<T> really is Deque<T>.
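
The grow-in-either-direction idea amounts to keeping a start offset on the dense storage (a deque-like Python sketch with hypothetical names, not the Tamarin List<T> code; a real implementation would amortize the shifting as suggested above):

```python
class OffsetDense:
    """Dense storage covering [start, start+len(slots)) that can grow
    at either end; the first write fixes where the dense area begins."""
    def __init__(self):
        self.start = 0
        self.slots = []

    def set(self, index, value):
        if not self.slots:
            self.start = index
            self.slots = [value]
        elif self.start <= index < self.start + len(self.slots):
            self.slots[index - self.start] = value   # overwrite in place
        elif index == self.start + len(self.slots):
            self.slots.append(value)                 # grow up
        elif index == self.start - 1:
            self.slots.insert(0, value)              # grow down (O(n)
            self.start = index                       # shift; amortize it)
        else:
            raise KeyError("hole -> would fall back to sparse")

    def get(self, index):
        return self.slots[index - self.start]
```

Filling an array from the back to the front (the DrawGridHeadless pattern) stays dense the whole time under this model.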
(Assignee)

Comment 19

10 years ago
ok, after a day of experimenting with this, I have something that seems to fit the bill. let me run compliance tests and other builds tonight and hopefully I should have a new patch tomorrow...
(Assignee)

Comment 20

10 years ago
Created attachment 305847 [details] [diff] [review]
Patch

Revised patch that incorporates Ed's suggestions: we now allow dense arrays to start at places other than zero, and to grow downward. This enabled removal of the reconvert-to-dense optimization. Most benches are the same speed (within noise factor), and a handful are slightly faster:

                                TT-OLD  TT-NEW
prime4                             176     176
boidsHeadless                     3205    3228
boidshackHeadless                  578     557
DrawGridHeadless                   546     538
access-binary-trees                248     244
access-fannkuch                    211     213
access-nbody                       295     288
access-nsieve                      133      95
bitops-3bit-bits-in-byte            28      27
bitops-bits-in-byte                101      82
bitops-bitwise-and                 370     368
bitops-nsieve-bits                  85      83
crypto-md5                         850     864
crypto-sha1                        105     104
math-cordic                         82      82
math-partial-sums                  333     331
math-spectral-norm                  72      72
s3d-cube                           448     416
s3d-morph                          117     109
s3d-raytrace                       587     559
string-fasta                       280     280
Attachment #305049 - Attachment is obsolete: true
Attachment #305847 - Flags: review?(edwsmith)
Attachment #305049 - Flags: review?(edwsmith)
(Reporter)

Comment 21

10 years ago
doing something right, check this out:  

AS3Chess_Headless               2407    256   (lower is better)
Crypt                           10228   1335  (from JavaScriptGrande)

(Assignee)

Comment 22

10 years ago
whoa... I didn't get anything like those results on AS3Chess, are you using a different version from what I have? and/or does this include other changes?
(Reporter)

Comment 23

10 years ago
i pulled your patch, built & tested on windows, my AS3Chess is mod'd to print a time value (lower is better), otherwise it's stock.
(Reporter)

Comment 24

10 years ago
if a user subclasses Array, will it mess up the initialization?  i.e. will fields end up with nonzero-but-default values when we're expecting 0-values?
(Assignee)

Comment 25

10 years ago
I don't think so, but I'll double-check.
(Reporter)

Comment 26

10 years ago
* in Array.hasOwnProperty(V),

			var uv:uint = V;
			if (uv == V)
				return this._hasUintProperty(uv);

will call V.valueOf() if V is an object.  The ES3 spec says we'll call
V.toString().  it's a super nit, and unfortunate, but if the fix is painful
it's something we need to track somehow.  maybe a separate bug for ES3
compatibility issues.

* Array.push() is almost always called with just one arg.  i don't know if there's a good way to optimize for that case if the args have already been turned into an array of rest args, but something to bug-ize for future enhancement, maybe.

when the tracer is allowed to follow at least one backedge, then this push loop will iterate one time, so we'll inline exactly 1 iteration of the loop if there is 1 arg to push().  the value will still have been laundered through the args array, but we can think about array load/store optimizations in the future.  (i.e. if you start with an arg, put it in an array, get it back out, and the array dies on the same trace, we shouldn't even need to allocate the array.)

* does iteration distinguish holes from undefined?  e.g.
var a = [undefined]
for each (var e in a) { print(e) } // should print undefined

var a = [1]
delete a[0]
for each (var e in a) { print(e) } // prints nothing

* if the test suite didn't catch this, can we add a test or create a bug to track the add-the-test-task. (above test is fine).

* the old sparse->dense converter was O(N^2) for some common cases, but the new one could be very expensive for the cases the old one handled cheaply.  (credit: Werner).  fact of life?  maybe have Werner review too.
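
The hole-vs-undefined question in the third bullet can be modeled concretely (a toy Python model of the intended semantics, not VM code; `HoleyArray` is a hypothetical name):

```python
class HoleyArray:
    """Toy model distinguishing a stored undefined from a hole."""
    UNDEFINED = object()          # stand-in for AS3's undefined value

    def __init__(self, values):
        self.props = dict(enumerate(values))
        self.length = len(values)

    def delete(self, index):
        self.props.pop(index, None)    # leaves a hole; length unchanged

    def iterate(self):
        # for-each visits only indices that exist: a stored undefined
        # IS visited, a hole is NOT
        return [self.props[i] for i in range(self.length)
                if i in self.props]

a = HoleyArray([HoleyArray.UNDEFINED])
assert a.iterate() == [HoleyArray.UNDEFINED]   # undefined is visited

b = HoleyArray([1])
b.delete(0)
assert b.iterate() == []                       # the hole is skipped
```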

(Assignee)

Comment 27

10 years ago
the new sparse-to-dense converter is never used (commented out) -- currently, once we go sparse, we never go back to dense.
(Assignee)

Comment 28

10 years ago
re: if a user subclasses Array

good catch! subclasses of Array aren't at all happy with this patch. working on it now.
(Assignee)

Comment 29

10 years ago
Re: hasOwnProperty, wouldn't it be sufficient to change it to

			var sv = V.toString();
			var uv:uint = sv;
			if (uv == sv)
				return this._hasUintProperty(uv);
?
(Reporter)

Comment 30

10 years ago
should work, although ideally only do that if V is an object.  hasOwnProperty isn't called by OP_findproperty right?  if not then i don't think it's performance critical.
(Assignee)

Comment 31

10 years ago
-- Entered new bug for Array.push issue as https://bugzilla.mozilla.org/show_bug.cgi?id=420368

-- hasOwnProperty isn't performance critical, but how's this:

			var sv = (V is Object) ? V.toString() : V;
			var uv:uint = sv;
			if (uv == sv)
				return this._hasUintProperty(uv);

-- iteration was in fact broken, as you predicted, as were subclasses of Array. I have fixes I'm testing now and will submit a new patch soon; I'll also enter another bug to add a test case to catch them (I have a prototype test that just needs massaging to fit into the test framework).

-- re: sparse-to-dense, since it's not used, want me to eliminate the code entirely (vs leaving it in for future reference)? it isn't well-tested because it didn't seem to be nearly so useful now that we can have dense arrays that don't start at 0.
(Reporter)

Comment 32

10 years ago
sounds good.

an untested slab of commented-out dead code? if it was me i'd delete it.  unless it's turned on and tested it will just bitrot, so it's a matter of how much brainpower to rewrite later if we want to.
(Assignee)

Comment 33

10 years ago
Created attachment 306604 [details] [diff] [review]
Patch

One more patch, incorporating Edwin's suggestions. (I did delete the not-in-use reconvert-to-dense code.)
Attachment #305847 - Attachment is obsolete: true
Attachment #306604 - Flags: review?(edwsmith)
Attachment #305847 - Flags: review?(edwsmith)
(Reporter)

Updated

10 years ago
Attachment #306604 - Flags: review?(edwsmith) → review+
(Assignee)

Comment 34

10 years ago
pushed as changeset:   308:6c26de327b1c
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED