Closed Bug 689828 Opened 13 years ago Closed 13 years ago

[workitem] instrument Array to gather stats on density and monomorphic-ness

Categories

(Tamarin Graveyard :: Library, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pnkfelix, Assigned: pnkfelix)

References

Details

Attachments

(4 files, 2 obsolete files)

Bug 688486 and Bug 688772 both require data about how Arrays tend to be used in practice.

Questions include:

* Would the simple-dense heuristic from Bug 688486 suffice?  Or does it need to be generalized to cover cases outlined by Lars in Bug 688486, comment 7

* Are sufficiently many arrays monomorphic [*] to make speculation worthwhile for Array::setUintProperty (see Bug 688772, comment 1).

This bug is dedicated to patches for gathering such data, as well as presenting results from runs on real-world code.

[*] The notion of monomorphic-ness here has not yet been fully defined.  It is not yet clear whether the array jit code could benefit in the same way that vector has from properties like "contains only RCObjects."  Unlike Vector, the Array jit code can only speculate that such properties hold; further discussion points are presented on Bug 688772, comment 3 and Bug 688772, comment 4.
Assignee: nobody → fklockii
Flags: flashplayer-bug-
The killer cases for monomorphism are probably int, uint, and Number, with some generic RCObject notion a runner-up (would be interesting to see what the JS engines are doing here, too).

Since int and uint can reasonably be coerced to Number in an Array (there's no requirement that the int representation is retained, for example) the analysis should be careful to also count "numeric" arrays, ie arrays containing mixtures of int, uint, and Number - who knows what we'll find.

One of the issues that are unresolved in my head is whether monomorphism vs unboxed-representation-Arrays are orthogonal or whether they're two ways of attacking the same problem.  If monomorphic arrays dominate strongly then a monomorphic-representation change might be a winner; if not, an unboxed-representation change might be more appropriate.  Either will be a leg up on what we have now if a lot of Arrays have numeric data.
I've got a plausible patch queue with instrumentation for array shapes on each array operation.  It generates a ton of data if you ask for all array operations, but it was easy (I think) to specialize it to only track array operations from the jit (i.e. the invocations to ArrayObject::_getUintProperty and friends that originate from the jit, as opposed to originating from within ArrayObject.cpp or elsewhere in the VM).

Here's the somewhat interesting data item: Unless the instrumentation is broken, it appears that relatively few of our avmshell performance tests are exercising the jit->array control paths.

In particular, out of 517 benchmarks in tests/performance, only 44 following report jit->array invocations.  If you take out asmicro, then only these 14 have jit->array invocations:

jsmicro/arguments-3.abc.log
misc/SHA256-array.abc.log
misc/boids.abc.log
misc/gameoflife.abc.log
scimark/FFT.abc.log
scimark/LU.abc.log
scimark/MonteCarlo.abc.log
scimark/SOR.abc.log
scimark/SparseCompRow.abc.log
sunspider/access-nbody.abc.log
sunspider/crypto-md5.abc.log
sunspider/crypto-sha1.abc.log
sunspider/s3d-raytrace.abc.log
sunspider/string-validate-input.abc.log
(In reply to Felix S Klock II from comment #2)
> Here's the somewhat interesting data item: Unless the instrumentation is
> broken, it appears that relatively few of our avmshell performance tests are
> exercising the jit->array control paths.
> 
> In particular, out of 517 benchmarks in tests/performance, only 44 following
> report jit->array invocations.  If you take out asmicro, then only these 14
> have jit->array invocations:

Ignore.  User error; I was not gathering these numbers the right way.  (The above are underestimates, and probably gross underestimates)
Also try -Ojit, a lot of the test cases use top-level code.
(In reply to Felix S Klock II from comment #3)
> (In reply to Felix S Klock II from comment #2)
> > Here's the somewhat interesting data item: Unless the instrumentation is
> > broken, it appears that relatively few of our avmshell performance tests are
> > exercising the jit->array control paths.
> > 
> > In particular, out of 517 benchmarks in tests/performance, only 44 following
> > report jit->array invocations.  If you take out asmicro, then only these 14
> > have jit->array invocations:
> 
> Ignore.  User error; I was not gathering these numbers the right way.  (The
> above are underestimates, and probably gross underestimates)

Corrected figure: 104 benchmarks (out of 517) has jit->array invocations; after taking out asmicro, the figure is 74 (out of 305 total).

(I will report results acquired via -Ojit shortly.)
(In reply to Felix S Klock II from comment #5)
> Corrected figure: 104 benchmarks (out of 517) has jit->array invocations;
> after taking out asmicro, the figure is 74 (out of 305 total).
> 
> (I will report results acquired via -Ojit shortly.)

Number of benchmarks that exercise jit->array control paths is unchanged when running with -Ojit.

(The nature of how each such benchmarks exercises the control paths *is* changed, at least for some of the benchmarks; in particular the number of such invocations seems to increase with -Ojit, based on a brief skim of a diff of the logs.  So that gives me a little bit of confidence that I'm not totally out to lunch here.)

Anyway 74/305 is a lot better than 14/305; I'm not as worried about this hypothetical problem (that our benchmarks are not exercising certain paths as heavily as they should to be considered real representatives); the current amount of exercise may well be good enough.

Next up: Player land.
Attached file swf generated from attachment D (obsolete) —
This and attachment D are just meant for double-checking the sanity of my instrumentation.

(I will post my instrumentation patches after I am confident in the results they are producing; which will hopefully be tonight or tomorrow morning.)
Attachment #568146 - Attachment mime type: application/octet-stream → application/x-shockwave-flash
Initial results showed that 100 operations is too small to stand out in the data, where a baseline run that doesn't hit any of the buttons already hits several of these paths as much as 5 thousand times.  So amp'ed up the number of operations to 100,000.
Attachment #568144 - Attachment is obsolete: true
(update to match D v2)
Attachment #568146 - Attachment is obsolete: true
Attachment #568384 - Attachment mime type: application/octet-stream → application/x-shockwave-flash
Posted current patch queue for the instrumentation work to:

  http://hg.mozilla.org/users/fklockii_adobe.com/arrayshape-patches/

(and I plan to keep future revisions there; this code is not targetted to be landed in TR itself, and thus won't require peer review, so there is not much reason to post patches for it to this ticket.)

I'll be posting a script for processing the output of the instrumentation [1] shortly, and then begin evaluating Flash content.

[1] The instrumentation output is stored into a log file, arrayshape.txt -- see the patches add-avmcore-redirect-vmpilog and add-redirectlog-to-arrayshapes.txt on the patch queue.
Sample output when applied to smoke test D v2 for three cases:

Log file: ArrayDemo-logs/baseline.arrayshapes.txt
{
 "_jit_getIntProperty": {
  "nonsimple": 5152,
  "simple": 3502
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 167
 },
 "_jit_setIntProperty": {
  "nonsimple": 44,
  "simple": 92
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 88
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 19
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 6
 }
}
Log file: ArrayDemo-logs/untyped-int-array.arrayshapes.txt
{
 "_jit_getIntProperty": {
  "nonsimple": 6832,
  "simple": 5352
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 393
 },
 "_jit_setIntProperty": {
  "nonsimple": 44,
  "simple": 188
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 184
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 37
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 100006
 }
}
Log file: ArrayDemo-logs/typed-int-array.arrayshapes.txt
{
 "_jit_getIntProperty": {
  "nonsimple": 6992,
  "simple": 5505
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 100413
 },
 "_jit_setIntProperty": {
  "nonsimple": 44,
  "simple": 196
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 100192
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 39
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 6
 }
}

Looking at this, I realize that while I'm gathering data on getpropertylate_* (to see how often we make such a call on a simple-dense array lacking a type annotation on the receiver), I forgot to put in similar instrumentation for setpropertylate_*.

For now, lets go ahead and gather the end data.  If it looks like there are a lot of hits for simple-dense-getpropertylate, then I'll throw in the setpropertylate instrumentation after that.  But if simple-dense-getpropertylate does not seem like a promising case to optimize, then it is reasonable to conclude that setpropertylate would have a similar outcome.
1. Some of the MMgc Stress Test Media is exposing a problem in my instrumentation code and/or the Player:
  - Flex DataGridScroll causes a crash.  I wasn't able to immediately determine the cause; looks like jit issue.  Need to double-check with a vanilla player later.
  - {Moodstream, MonoFace, GetTheGlass} all fail to generate a log file at all.  This may be symptom of flaw with my logic/hack for dealing with the issue that I've put the data on the AvmCore but our VMPI_log and RedirectLogOutput is global and not AvmCore-specific.

2. Having said that, with the remaining benchmarks in MMgc Stress Test Media, I think there is solid evidence that simple arrays show up as the majority (often vast majority) of the arrays (and objects in general) encountered by the JIT when it knows that the index expression is int/uint.

  - Slight exceptions (arguable): checkin, cignatabbing

  - The low number of hits for any of the operations post-postprocessing on some of the benchmarks (e.g. crushthecastle) leads me to think that I should either broaden the scope of the benchmarks I am surveying, *or* wrap this up quickly because the hypothetical optimization won't justify putting much more effort into it.  (I may want to extend the instrumentation to include other objects like Vector and then double-check these cases.)

Log file: BrentMark-Logs/bigpixelzombies-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 12741
 },
 "_jit_getIntProperty": {
  "nonsimple": 0,
  "simple": 775356
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 37520
 },
 "_jit_setDoubleProperty": {
  "nonsimple": 0,
  "simple": 528
 },
 "_jit_setIntProperty": {
  "nonsimple": 0,
  "simple": 23195
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 129
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 52
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 50
 }
}
Log file: BrentMark-Logs/brainstorm-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 58,
  "simple": 1704247
 },
 "_jit_getIntProperty": {
  "nonsimple": 3338647,
  "simple": 480769096
 },
 "_jit_getUintProperty": {
  "nonsimple": 2226,
  "simple": 11256
 },
 "_jit_setDoubleProperty": {
  "nonsimple": 0,
  "simple": 11
 },
 "_jit_setIntProperty": {
  "nonsimple": 251,
  "simple": 697026
 },
 "_jit_setUintProperty": {
  "nonsimple": 1100,
  "simple": 2206
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 1412521
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 19627
 }
}
Log file: BrentMark-Logs/checkin-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 584
 },
 "_jit_getIntProperty": {
  "nonsimple": 33247,
  "simple": 3070
 },
 "_jit_getUintProperty": {
  "nonsimple": 99,
  "simple": 279
 },
 "_jit_setIntProperty": {
  "nonsimple": 48,
  "simple": 1409
 },
 "_jit_setUintProperty": {
  "nonsimple": 68,
  "simple": 85
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 11554
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 505
 }
}
Log file: BrentMark-Logs/cignatabbing-arrayshapes.txt
{
 "_jit_getIntProperty": {
  "nonsimple": 37614627,
  "simple": 2422812
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 1791
 },
 "_jit_setIntProperty": {
  "nonsimple": 21,
  "simple": 14090
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 1
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 18733428
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 16427
 }
}
Log file: BrentMark-Logs/coverflow-arrayshapes.txt
{
 "_jit_getIntProperty": {
  "nonsimple": 0,
  "simple": 3038
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 92
 }
}
Log file: BrentMark-Logs/crushthecastle-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 24
 },
 "_jit_getIntProperty": {
  "nonsimple": 0,
  "simple": 983
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 513
 },
 "_jit_setDoubleProperty": {
  "nonsimple": 0,
  "simple": 4
 },
 "_jit_setIntProperty": {
  "nonsimple": 0,
  "simple": 126
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 123
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 486
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 33
 }
}
Log file: BrentMark-Logs/mechanism-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 1496
 },
 "_jit_getIntProperty": {
  "nonsimple": 222,
  "simple": 4632715
 },
 "_jit_getUintProperty": {
  "nonsimple": 0,
  "simple": 829810
 },
 "_jit_setDoubleProperty": {
  "nonsimple": 0,
  "simple": 113
 },
 "_jit_setIntProperty": {
  "nonsimple": 0,
  "simple": 1697715
 },
 "_jit_setUintProperty": {
  "nonsimple": 84,
  "simple": 136320
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 1893
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 75910
 }
}
Log file: BrentMark-Logs/waterlife-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 215050
 },
 "_jit_getIntProperty": {
  "nonsimple": 0,
  "simple": 672373
 },
 "_jit_getUintProperty": {
  "nonsimple": 746,
  "simple": 795909
 },
 "_jit_setIntProperty": {
  "nonsimple": 0,
  "simple": 46466
 },
 "_jit_setUintProperty": {
  "nonsimple": 0,
  "simple": 2011
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 71
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 284
 }
}
Log file: BrentMark-Logs/watson-arrayshapes.txt
{
 "_jit_getDoubleProperty": {
  "nonsimple": 0,
  "simple": 11959
 },
 "_jit_getIntProperty": {
  "nonsimple": 2091456,
  "simple": 216971
 },
 "_jit_getUintProperty": {
  "nonsimple": 1933,
  "simple": 12853
 },
 "_jit_setIntProperty": {
  "nonsimple": 45,
  "simple": 25375
 },
 "_jit_setUintProperty": {
  "nonsimple": 986,
  "simple": 1992
 },
 "getpropertylate_i": {
  "nonsimple": 0,
  "simple": 992891
 },
 "getpropertylate_u": {
  "nonsimple": 0,
  "simple": 10565
 }
}
the data in comment 13 was from running the post-process script (attachment 568393 [details]) on the data in the log files in this tar.gz ball.
(In reply to Felix S Klock II from comment #13)
>   - Flex DataGridScroll causes a crash.  I wasn't able to immediately
> determine the cause; looks like jit issue.  Need to double-check with a
> vanilla player later.

This didn't reproduce with the vanilla p4 player, so there's no immediate fire to put out there on that side.  I haven't tried with a vanilla FRR+TR yet.

>   - {Moodstream, MonoFace, GetTheGlass} all fail to generate a log file at
> all.  This may be symptom of flaw with my logic/hack for dealing with the
> issue that I've put the data on the AvmCore but our VMPI_log and
> RedirectLogOutput is global and not AvmCore-specific.

This is notabug; all three of the above are AS2 (not AS3), and therefore not eligible for these optimizations.
(In reply to Felix S Klock II from comment #13)
> Log file: BrentMark-Logs/cignatabbing-arrayshapes.txt
> {
>  "_jit_getIntProperty": {
>   "nonsimple": 37614627,
>   "simple": 2422812
>  },

Looked at this case (cignatabbing) randomly while trying to decide what to work on next.

As one can see from the rest of the histogram for this case, nearly all of the other instrumented operations are on simple arrays.  But then these 37 million calls to getIntProperty are dominating, and are nonsimple.  So: What are they?  From the log file, here's the answer:

[array] _jit_getIntProperty denseStart:2 denseUsed:6 m_length:8 m_isSimple:0 {untagged:0, object:6, string:0, namespace:0, undefined:0, boolean:0, integer:0, double:0, denseRC:6, denseDI32:0 denseDI64:0 nonDense:0} = count:7221996
[array] _jit_getIntProperty denseStart:2 denseUsed:7 m_length:9 m_isSimple:0 {untagged:0, object:7, string:0, namespace:0, undefined:0, boolean:0, integer:0, double:0, denseRC:7, denseDI32:0 denseDI64:0 nonDense:0} = count:30389741

Note in particular: these are non-simple because they have a non-zero denseStart; but they appear to be otherwise simple (because they start at 2, and their denseUsed+2=length).  So it might be worth spending a little bit of time seeing if I can generalize the approach of Bug 688486 so allow non-zero denseStart for simple arrays.  (Its somewhat unpalatable because it would require loading the denseStart, but its possible the cost will be restricted to that and a subtraction op.)
Instead of introducing denseStart would it not be easier simply to accept that there may be holes, and check for them, and then when denseStart is low enough the C++ code would just make it zero and insert holes at the beginning?  Not clear the inline code bloat will be any worse than having to deal with denseStart.
Good point.  I had been avoiding allowing holes precisely due to the inline code bloat cost.  But the inline code bloat for allowing holes is probably comparable to that for denseStart-compensentation, and would avoid the memory read imposed by loading denseStart.

Checking for a hole does introduce a branch instruction that would not necessarily be there with denseStart.  I'll try to evaluate both approaches.

(There is one case the hole-support approach would not suffice but the denseStart-compensation would: if you look at watson-arrayshapes.txt (within tarball TB), there are 2,091,320 hits on getIntProperty with an array that holds one object at index 526041152.  But I think that is an exceptional corner case; Watson looks like a strange beast.)

(Also, any of this could probably land in a later phase of optimization.  There's a decent argument that I should focus on landing the current (very-)simple dense work before attempting to extend its coverage.)
(In reply to Felix S Klock II from comment #18)

> Checking for a hole does introduce a branch instruction that would not
> necessarily be there with denseStart.  I'll try to evaluate both approaches.

You may be able to merge the two branches by making use of the fact that a hole is represented as all-bits-zero (conditional move / setcc may be your friend, don't know about ARM).

> (Also, any of this could probably land in a later phase of optimization. 
> There's a decent argument that I should focus on landing the current
> (very-)simple dense work before attempting to extend its coverage.)

Words out of my mouth.  Always Be Closing!
(In reply to Felix S Klock II from comment #18)
> Checking for a hole does introduce a branch instruction that would not
> necessarily be there with denseStart.  I'll try to evaluate both approaches.

Filed as a work-item: Bug 702260.

There is another implicit work-item that could be forked off here: to analyze the data logs attached to this ticket i.e. Bug 689828 to determine the frequency of monomorphic arrays of e.g. pure int, pure uint, pure Number, etc, and using that data to make speculative constructions of monomorphic array representations dynamically.  (But this may not be the right direction to go in long term, versus just encouraging people to use Vectors...)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: