All users were logged out of Bugzilla on October 13th, 2018

[pacman] Speculatively inline Array::getUintProperty in the JIT

RESOLVED FIXED

Status

RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: pnkfelix, Assigned: pnkfelix)

Tracking

Details

(Whiteboard: has-patch, PACMAN)

Attachments

(6 attachments, 6 obsolete attachments)

3.86 KB, patch
treilly
: review+
lhansen
: superreview+
lhansen
: feedback+
Details | Diff | Splinter Review
41.97 KB, text/plain
Details
41.89 KB, text/plain
Details
4.48 KB, patch
treilly
: review+
lhansen
: superreview+
Details | Diff | Splinter Review
28.70 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
8.27 KB, patch
lhansen
: review+
lhansen
: superreview+
Details | Diff | Splinter Review
For Array fetches on arrays that (1) are dense, (2) are not subclassed [*], (3) have no holes and (4) start at 0, we can use a similar path to the inline code that recently landed for Vector (Bug 599099).

Doing this properly might require adding a new member that tracks whether an array is "simple" (ie conforms to the above constraints) to the Array representation; that is probably acceptable, especially since simple-ness is likely to be required to do a similar treatment for Array::setUintProperty.

[*] Regarding the "not subclassed" constraint: Maybe this constraint is unnecessary?  Sealed-ness should not matter for reads (only for writes, i.e. setUintProperty).  Do overrides of the length getter matter for the semantics of Array referencing?
(Assignee)

Updated

7 years ago
Blocks: 687060
(Assignee)

Updated

7 years ago
Assignee: nobody → fklockii
(FYI: it isn't out of the question to remove other constraints, e.g. (3) and (4) may not be necessary, depending on how much code generation we're willing to pay for in the jit.  But the constraints as written provide a simple start that is likely to match common use cases.)
Created attachment 562044 [details] [diff] [review]
patch A v1: drive-by refactor, moving some inlines to .cpp

These inline routines are not be called from outside ArrayObject.cpp, at least within AVM.

When I started, I thought it would ease development (mostly by reducing compile times) to have them in the .cpp rather than the -inlines.h header.

In hindsight these routines were not touched by my subsequent work on this bug.  So it probably does not matter for now if this patch lands.
Created attachment 562049 [details] [diff] [review]
patch B: track "simple" arrays via explicit boolean field.

I haven't come up with a more clever representation than a separate field in the ArrayObject, and arguably its not worth trying to do so.

This patch isn't 100% of the story; it gets things started so that code like:
  var a = []
produces a simple-marked array.  But that's not good enough, since it doesn't cover:
  var b = new Array(0);

Follow-on patch C will show one way to cover the second case.
Attachment #562049 - Flags: feedback?(lhansen)
Created attachment 562055 [details] [diff] [review]
patch C: mark Array() constructed arrays as simple.

I admit up front that this patch smells bad.

The main goal is to identify which constructor calls are direct calls to new Array(..) itself, versus calls to the Array constructor from one of its subclasses.

The approach is modeled after what Steven did for Bug 654807.

I briefly tried to take a more disciplined approach by directly changing ArrayClass::createInstanceProc but quickly realized that I don't know enough about nativegen to do that.

(Also, the description for this bug mentions that it may not be necessary to keep the not-subclassed constraint.  Even so, its probably better to track m_isSimple in a manner analogous to what this patch does, to enable speculative inlining of setUintProperty.)
Attachment #562055 - Flags: feedback?(lhansen)
Created attachment 562059 [details] [diff] [review]
patch D v1: jit speculative fast-path for simple arrays

The payoff!  :)

This is my first foray into CodegenLIR.

My use of suspendCSE and resumeCSE is based on a conversation with wmaddox about how to generate code with a diamond control flow.

Here are some preliminary performance results; note that the read-{C,Number,int}-2 cases are all sparse arrays, so its not surprising that they regressed.  The degree of slow-down in the init's is a little more troubling:

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  array-1                    1706.3  1702.9  3802.2  3800.6   122.8   123.2 ++
  array-2                     553.4   550.8   542.9   542.4    -1.9    -1.5 - 
  array-init-1                398.2   398.0   372.6   371.7    -6.4    -6.6 --
  array-init-2                418.7   417.2   395.6   392.1    -5.5    -6.0 --
  array-init-3                266.2   265.0   257.2   256.1    -3.4    -3.4 - 
  array-init-4                253.0   252.9   247.5   247.2    -2.2    -2.3 - 
  array-pop-1                 383.2   380.1   350.3   346.5    -8.6    -8.8 --
  array-push-1                240.3   239.6   257.7   257.4     7.3     7.4 ++
  array-read-C-1              176.8   176.7   393.8   393.7   122.7   122.8 ++
  array-read-C-2               26.5    26.5    26.1    26.0    -1.7    -1.5 - 
  array-read-Number-1         174.8   174.7   397.2   393.1   127.2   125.1 ++
  array-read-Number-2          26.1    26.0    26.1    25.9    -0.0    -0.6   
  array-read-int-1            171.8   171.7   396.2   394.2   130.6   129.6 ++
  array-read-int-2             26.3    26.2    26.1    26.0    -0.8    -0.5   
  array-read-int-5            161.5   161.1   276.7   276.5    71.3    71.6 ++
  array-read-int-6            158.9   158.0   271.5   271.1    70.8    71.6 ++
  array-read-int-7            158.2   157.6   268.9   268.4    70.0    70.3 ++
  array-read-length-1          69.7    69.6    69.6    69.5    -0.1    -0.2
Attachment #562059 - Flags: feedback?(lhansen)
(In reply to Felix S Klock II from comment #0)
> [*] Regarding the "not subclassed" constraint: Maybe this constraint is
> unnecessary?  Sealed-ness should not matter for reads (only for writes, i.e.
> setUintProperty).  Do overrides of the length getter matter for the
> semantics of Array referencing?

(In reply to Felix S Klock II from comment #4)
> (Also, the description for this bug mentions that it may not be necessary to
> keep the not-subclassed constraint.  Even so, its probably better to track
> m_isSimple in a manner analogous to what this patch does, to enable
> speculative inlining of setUintProperty.)

Note also that getUintProperty is virtual on the C++ side; this is not just about preserving behavior for AS3 subclasses of Array, but also for C++ subclasses of Array.  (On the other hand, its conceivable that SemiSealedArray is the only such subclass...)

Anyway, the simple thing for the short term is to use the definition of simple-ness that I originally wrote in this description; that is, make subclassing (from either C++ or AS3) immediately imply non-simpleness.
(Assignee)

Updated

7 years ago
Whiteboard: has-patch

Comment 7

7 years ago
F+ on all three with the following comments.

- We need some stats on whether our "simple" heuristic is really good enough
  in real content.  It may well be that it isn't, and that we need to make
  allowances for some arrays that have a few holes, notably at the beginning
  and end, and possibly temporarily - ie arrays that are dense but not simple
  by this definition.  Good start though - and it might be enough!  Indeed if
  it is enough then type-specialized arrays will be a little easier to deal with.

- We need to look carefully at the generated code on x86 and ARM.  My gut feeling
  is that the m_simple field is not the way to go and that an m_simpleLength field
  would be better; it would be nonzero only if the array is simple.  Thus
  a single load and compare and branch will be necessary, while the code as it
  is here now has two loads and will generate an intermediate boolean result
  (sometimes expensive - cmov, setcc, whatever) and there is then additional
  manipulation of that.

  That setup will also come in handy if we think about type-specializing simple
  arrays - one length field per type we care about.

- For perf results, please always provide info about the machine they were 
  obtained on...  Anyway in this case I think that you should try a second
  architecture.  The slowdown on the init benchmarks is probably caused by
  cache effects or microarchitecture.

Memo to self: I need to look into whether the Vector optimizations should worry about disabling CSE in the manner you do here.

Updated

7 years ago
Attachment #562049 - Flags: feedback?(lhansen) → feedback+

Updated

7 years ago
Attachment #562055 - Flags: feedback?(lhansen) → feedback+

Updated

7 years ago
Attachment #562059 - Flags: feedback?(lhansen) → feedback+
(In reply to Lars T Hansen from comment #7)
> - For perf results, please always provide info about the machine they were 
>   obtained on...  Anyway in this case I think that you should try a second
>   architecture.  The slowdown on the init benchmarks is probably caused by
>   cache effects or microarchitecture.

My bad, I'm usually better about that.

The results in comment 5 were from Release 32-bit avmshell builds on my MacBook Pro (a 2.8 Ghz Intel Core 2 Duo).

Comment 9

7 years ago
> Memo to self: I need to look into whether the Vector optimizations should
> worry about disabling CSE in the manner you do here.

As a rule of thumb, if you're in CodegenLIR and generating diamond-shaped code,
then you need to disable/enable CSE around the fork/join flow.  Assume that
a label clears the CSE state.  Whether or not clearing CSE state really matters
depends more on the surrounding code than on the inlined path itself, so the 
effects of resetting CSE will likely be smaller on microbenchmarks than real code.
(Assignee)

Updated

7 years ago
Depends on: 689828
(Assignee)

Updated

7 years ago
Whiteboard: has-patch → has-patch, PACMAN
Created attachment 569288 [details] [diff] [review]
patch B v2: separate simple len and boolean

(we still need a boolean internally because we need to distinguish 0-length arrays that will remain simple when populated from 0-length arrays that cannot be simple when populated, namely the ones that have been subclassed.  But now the jit can just load the simple length flag on its own.)
Attachment #562049 - Attachment is obsolete: true
Created attachment 569289 [details] [diff] [review]
patch D v2: jit simple lenflag

(In case its not clear, patch B v2 and patch D v2 are incorporating Lars' suggestion from comment 7 to use a m_simpleLength field.)

I'll post results in a little bit.
Attachment #562059 - Attachment is obsolete: true
Created attachment 569356 [details]
transcript N1 v1: output from perf tests on Nexus One

Nexus One results

Executive summary (filtering out results where |%dAvg| < 4%, which I arbitrarily chose, and may distort true nature of results, etc etc):

Tamarin tests started: 2011-10-25 08:26:33.847245
Executing 588 test(s)
avm: ../../objdir-reldroid/shell/android_shell.baseline-r6676.py  version: 6679:42790e63d9bc
avm2: ../../objdir-reldroid/shell/android_shell.simple-lenflag.py  version: 6685:57759fa81991
iterations: 5

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  alloc-2                       3.1     2.6     3.0     3.0    -1.1    16.1   
  array-1                     200.8   200.1   462.5   462.3   130.3   131.0 ++
  array-read-C-1               22.4    22.3    74.9    74.9   235.2   236.4 ++
  array-read-Number-1          23.3    23.3    75.0    74.9   222.0   222.1 ++
  array-read-int-1             23.3    23.3    74.9    74.9   221.9   222.2 ++
  array-read-int-5             20.0    19.8    46.8    46.7   134.1   135.7 ++
  array-read-int-6             19.5    19.1    49.0    49.0   151.7   156.8 ++
  array-read-int-7             20.0    19.7    46.8    46.7   134.3   136.9 ++
  array-shift-1                16.0    15.8    15.3    15.2    -4.0    -4.2 - 
  parseInt-1                   19.2    19.2    17.7    17.5    -7.9    -8.7 --
  string-indexOf-3             19.7    19.7    21.5    21.4     8.7     8.5 ++
vector-nested-read-Number-2    13.9    13.1    13.9    13.9     0.1     5.6   
  vector-write-C-1             11.9    11.9    10.8    10.8    -9.5    -9.5 --
Metric: iterations/second 
Dir: jsmicro/
  arguments-3                   2.1     2.1     2.2     2.2     4.3     4.2 + 
  array-shift-1                 5.2     5.2     5.5     5.5     4.3     4.8 + 
  array-sort-1                  4.0     3.4     4.0     4.0     0.7    18.4   
  parseInt-1                   13.4    13.4    12.6    12.6    -6.3    -6.5 --
  string-indexOf-3              6.5     6.5     6.9     6.9     6.6     6.5 ++
Metric: time 
Dir: language/string/
  static_ascii_array_50      1914.0  1919.8  1950.0  1998.0    -1.9    -4.1   
Dir: language/string/typed/
  indexOf                    1458.0  1458.6  1518.0  1518.8    -4.1    -4.1 - 
  search                      839.0   889.2   790.0   830.8     5.8     6.6   
Metric: time 
Dir: scimark/
  MonteCarlo                22607.0 22650.8 20918.0 20951.4     7.5     7.5 ++
  SparseCompRow               746.0   750.8   597.0   604.4    20.0    19.5 ++
Dir: sunspider/
  bitops-nsieve-bits          268.0   269.6   251.0   251.4     6.3     6.8 ++
Dir: sunspider/as3/
  access-fannkuch             257.0   258.2   222.0   222.2    13.6    13.9 ++
  access-nbody                 55.0    55.0    51.0    51.4     7.3     6.5 ++
  bitops-bitwise-and           12.0    12.6    12.0    12.0     0.0     4.8   
  bitops-nsieve-bits          181.0   181.4   166.0   167.2     8.3     7.8 ++
  crypto-aes                  228.0   228.2   210.0   210.8     7.9     7.6 ++
  crypto-md5                  196.0   196.6   211.0   211.4    -7.7    -7.5 --
  math-cordic                 155.0   155.2   149.0   149.0     3.9     4.0 + 
Dir: sunspider-0.9.1/typed/
  access-fannkuch             327.0   327.8   294.0   294.8    10.1    10.1 ++
  access-nbody                 39.0    39.0    35.0    35.4    10.3     9.2 ++
  bitops-nsieve-bits          185.0   185.2   167.0   168.0     9.7     9.3 ++
  crypto-aes                  194.0   195.0   175.0   175.8     9.8     9.8 ++
  math-cordic                 165.0   165.2   155.0   155.8     6.1     5.7 ++
  s3d-raytrace                 77.0    77.2    73.0    73.2     5.2     5.2 ++


(My caveat at the top was half tongue-in-cheek; but note that there *are* a large number of additional regressions in the transcript; they just all had |%dAvg| < 4.)

Anyway, there are also the significant regressions listed above, which I don't have any explanation for.  (Why on earth would vector-write-C-1 regress by 10%?)
Created attachment 569363 [details] [diff] [review]
patch D v3: jit simple lenflag

This is essentially the same as patch D v2 except that I inverted the direction of the comparison and the corresponding branch in the hopes that it would improve branch prediction.

From what I can tell, the effect of this is either in the noise, or merely dampens the effect of the change (so that most of the regressions I saw seem to boil away to close to nothing, and the improvements may be lessened but only slightly so on the microbenchmarks).
Attachment #569289 - Attachment is obsolete: true
Created attachment 569373 [details]
transcript MBP v1: output from perf tests on fklockii-MacBookPro

fklockii-MacBookPro (32-bit release shell, 2.8 Ghz Core 2 Duo) results

Executive summary (filtering out results where |%dAvg| < 4%):

Tamarin tests started: 2011-10-25 13:57:46.694479
Executing 588 test(s)
avm: ../../objdir-rel32/shell/avmshell.baseline-r6676  version: 6679:42790e63d9bc
avm2: ../../objdir-rel32/shell/avmshell.simple-lenflag  version: 6683:86dd6b710726
iterations: 5

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  alloc-11                     13.3    13.2    12.5    12.4    -5.8    -6.0 --
  alloc-12                     13.2    13.1    12.3    12.3    -6.7    -6.4 --
  alloc-4                      45.3    45.3    42.6    42.6    -6.0    -6.0 --
  alloc-8                       8.9     8.3     9.0     8.7     0.4     4.9   
  arguments-1                 724.3   723.3   677.3   676.8    -6.5    -6.4 --
  array-1                    1707.3  1706.9  4127.9  4124.9   141.8   141.7 ++
  array-init-1                395.2   393.8   374.9   374.7    -5.1    -4.9 --
  array-init-2                426.1   425.1   396.2   395.8    -7.0    -6.9 --
  array-init-3                273.2   272.5   249.8   249.4    -8.6    -8.5 --
  array-init-4                259.7   259.5   240.5   240.4    -7.4    -7.4 --
  array-read-C-1              174.1   173.7   441.1   440.4   153.3   153.5 ++
  array-read-Number-1         177.5   177.3   448.1   447.7   152.5   152.5 ++
  array-read-int-1            170.5   170.3   448.6   448.4   163.1   163.4 ++
  array-read-int-5            163.3   163.3   321.7   321.5    96.9    96.9 ++
  array-read-int-6            163.5   163.2   321.0   320.8    96.3    96.5 ++
  array-read-int-7            159.8   159.8   318.0   317.8    99.0    98.9 ++
  array-sort-1                 25.2    24.9    25.1    23.6    -0.5    -5.2   
  array-sort-4                  8.8     8.7     8.7     8.3    -1.0    -4.7   
  array-unshift-1             151.2   151.0   149.4   138.9    -1.2    -8.0   
  array-write-int-1            50.4    47.4    51.6    50.7     2.4     7.0   
  array-write-int-2            27.5    27.4    27.6    26.3     0.5    -4.2   
  bytearray-read-byte-3        64.2    56.1    63.9    53.1    -0.5    -5.2   
  bytearray-read-double-2      22.8    21.4    22.6    22.4    -0.9     4.8   
  bytearray-read-int-1         19.9    19.8    21.0    20.9     5.5     5.4 ++
  bytearray-read-int-2         25.6    25.5    24.0    24.0    -6.1    -6.0 --
  bytearray-read-ubyte-1       28.2    28.1    26.3    26.3    -6.7    -6.6 --
  bytearray-read-utfbytes-1    27.1    27.1    26.0    26.0    -4.0    -4.0 - 
  bytearray-write-byte-1       13.4    12.8    13.5    13.4     1.1     4.9   
  bytearray-write-utf-1        36.0    35.9    30.9    30.9   -14.0   -13.9 --
  bytearray-write-utf-2        22.3    22.2    20.2    20.1    -9.6    -9.6 --
 bytearray-write-utfbytes-1    47.2    47.2    36.3    36.3   -23.1   -23.1 --
 bytearray-write-utfbytes-2    24.9    24.8    22.7    22.6    -8.8    -8.7 --
  closedvar-write-1          3115.9  3109.9  3430.6  3394.4    10.1     9.1 ++
  funcall-4                   107.6   106.9   103.3   102.4    -4.0    -4.2 - 
  restarg-1                   723.3   721.5   675.6   673.9    -6.6    -6.6 --
  string-indexOf-1            189.4   189.2   199.8   199.2     5.5     5.3 ++
  vector-init-1               213.4   212.2   194.8   194.8    -8.7    -8.2 --
  vector-init-2               236.1   235.6   220.3   219.1    -6.7    -7.0 --
  vector-init-3               146.4   145.9   130.5   130.3   -10.9   -10.7 --
Dir: jsmicro/
  alloc-1                      37.2    37.0    35.6    35.6    -4.3    -4.0 - 
  alloc-10                      8.4     8.1     7.9     7.7    -5.5    -4.2   
  alloc-7                      34.0    33.4    33.5    31.8    -1.6    -4.7   
  arguments-3                  15.6    15.4    16.6    16.3     6.7     6.2 ++
  array-2                     262.2   261.1   283.2   278.5     8.0     6.7 ++
Metric: time 
Dir: language/string/
  replace2                   1105.0  1106.4  1244.0  1245.2   -12.6   -12.5 --
Dir: language/string/typed/
  replace2                   1101.0  1101.8  1246.0  1247.8   -13.2   -13.3 --
Metric: iterations/second 
Dir: misc/
  bytearray-to-vector-be      569.9   565.4   526.9   522.0    -7.5    -7.7 --
  bytearray-to-vector-le      656.3   654.6   597.4   585.2    -9.0   -10.6 --
  bytearray-to-vector-worst   573.4   562.7   523.5   522.6    -8.7    -7.1 --
Metric: time 
Dir: scimark/
  MonteCarlo                 2884.0  2912.6  2702.0  2720.6     6.3     6.6 ++
  SparseCompRow                99.0   100.2    80.0    80.0    19.2    20.2 ++
Dir: sunspider/
  bitops-bits-in-byte          27.0    27.6    28.0    28.8    -3.7    -4.3 - 
Dir: sunspider/as3/
  access-fannkuch              42.0    43.2    39.0    39.4     7.1     8.8 + 
  access-nsieve                14.0    14.0    14.0    14.8     0.0    -5.7   
  bitops-3bit-bits-in-byte      5.0     5.8     5.0     5.4     0.0     6.9   
  bitops-bitwise-and            1.0     1.6     1.0     1.2     0.0    25.0   
  bitops-nsieve-bits           25.0    25.8    24.0    24.2     4.0     6.2 + 
  controlflow-recursive         4.0     4.4     4.0     4.2     0.0     4.5   
  crypto-aes                   30.0    31.0    28.0    28.8     6.7     7.1 + 
  crypto-md5                   28.0    28.0    29.0    29.8    -3.6    -6.4 - 
  math-cordic                  14.0    15.0    13.0    13.4     7.1    10.7   
  math-spectral-norm            6.0     6.0     5.0     5.0    16.7    16.7 ++
  s3d-morph                    29.0    31.4    29.0    29.8     0.0     5.1   
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     5.0     5.0   -25.0   -25.0 --
  s3d-cube                     23.0    23.0    24.0    24.2    -4.3    -5.2 - 
Dir: sunspider-0.9.1/typed/
  access-fannkuch              57.0    58.0    54.0    54.6     5.3     5.9 + 
  access-nbody                  3.0     4.2     3.0     3.6     0.0    14.3   
  bitops-nsieve-bits           24.0    24.8    22.0    22.8     8.3     8.1 ++
  controlflow-recursive         3.0     3.6     4.0     4.0   -33.3   -11.1 --
  crypto-aes                   25.0    25.4    23.0    23.8     8.0     6.3 + 
  crypto-md5                    3.0     3.4     4.0     4.0   -33.3   -17.6 --
  math-cordic                  16.0    16.0    14.0    14.6    12.5     8.8 ++
  math-spectral-norm            7.0     7.8     7.0     8.2     0.0    -5.1   
  s3d-cube                     16.0    16.4    15.0    15.4     6.2     6.1 + 
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/js/
  regexp                       69.7    69.3    64.8    63.3    -7.0    -8.6 - 
  richards                    259.0   242.4   258.0   254.8    -0.4     5.1   
Dir: v8.5/optimized/
  regexp                       71.8    71.4    66.1    65.8    -7.9    -7.9 --
  splay                      5197.0  5177.6  4991.0  4939.6    -4.0    -4.6 - 
Dir: v8.5/typed/
  regexp                       71.9    71.6    66.4    66.2    -7.6    -7.5 --
Dir: v8.5/untyped/
  regexp                       71.1    70.0    65.4    65.0    -8.0    -7.2 --


This was intended as a sanity check.  But some of the regressions above are pretty troubling, e.g. bytearray-write-utfbytes-1.  (OTOH, there are no regressions where %dAvg<=-10 on the nexus one results).

I'm going to put up a review request just to keep the patch rolling through the pipeline (and maybe get outside insight), but I hope to have an answer for the 9 cases here where %dAvg<=-10 before atttempting to land this.
Comment on attachment 562044 [details] [diff] [review]
patch A v1: drive-by refactor, moving some inlines to .cpp

(if you veto this, that is fine, I'll just rebase the queue.  Also, I find it a little strange that ArrayObject::getLength is inline and virtual and that everything still builds without an out-of-line definition for it.)
Attachment #562044 - Flags: review?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #569288 - Flags: review?(lhansen)
Comment on attachment 562055 [details] [diff] [review]
patch C: mark Array() constructed arrays as simple.

(I already noted my own displeasure with this in comment 4; but I also don't know a better way to do it yet.)
Attachment #562055 - Flags: review?(lhansen)
Comment on attachment 569363 [details] [diff] [review]
patch D v3: jit simple lenflag

(I think there's a bug fix for patch B in the portion of the patch to ArrayObject.cpp; I will probably rearrange things so that that part is associated with patch B  before landing)

Nanojit experts: Is there a bias in branch-instructions like LIR_jt in that I should choose one direction over the other in terms of branch-prediction?
Attachment #569363 - Flags: review?(lhansen)

Comment 18

7 years ago
(In reply to Felix S Klock II from comment #17)
> Comment on attachment 569363 [details] [diff] [review] [diff] [details] [review]
> patch D v3: jit simple lenflag
> 
> (I think there's a bug fix for patch B in the portion of the patch to
> ArrayObject.cpp; I will probably rearrange things so that that part is
> associated with patch B  before landing)
> 
> Nanojit experts: Is there a bias in branch-instructions like LIR_jt in that
> I should choose one direction over the other in terms of branch-prediction?

not really.  There are two effects at play that might bias your choice:

1. the cpu's we care about will be happier with the fast-path being the fall-through path, at least for forward branches.

2. nanojit allocates registers bottom up, so register decisions in the else-block of an if/then/else code fragment will tend to take precedence over register decisions in the then-block.  The then-block is more likely to spill after the branch position, if there are conflicts.

Which of these two effects is dominant is anyone's guess; my guess is the second one dominates, if either.
Comment on attachment 569288 [details] [diff] [review]
patch B v2: separate simple len and boolean

retracting review request; somethings not quite right with this patch queue.
Attachment #569288 - Flags: review?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #562055 - Flags: review?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #569363 - Flags: review?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #562044 - Flags: review?(lhansen)
Created attachment 570683 [details] [diff] [review]
patch A v2: move some inlines to .cpp; ease toggling verify mode
Attachment #562044 - Attachment is obsolete: true
Created attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

cleanup, fixed some bugs, added a few more assertions to internal verify mode.
Attachment #569288 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #570686 - Attachment description: patch B v2: state to track array simpleness. → patch B v3: state to track array simpleness.
(Assignee)

Updated

7 years ago
Attachment #570683 - Flags: review?(treilly)
(Assignee)

Updated

7 years ago
Attachment #570686 - Flags: review?(treilly)
(Assignee)

Updated

7 years ago
Attachment #562055 - Flags: review?(treilly)
(Assignee)

Updated

7 years ago
Attachment #569363 - Flags: review?(treilly)
(Assignee)

Updated

7 years ago
Attachment #570683 - Flags: superreview?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #570686 - Flags: superreview?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #562055 - Flags: superreview?(lhansen)
(Assignee)

Updated

7 years ago
Attachment #569363 - Flags: superreview?(lhansen)
I'm finally restarting the review request cycle (I had aborted the first last week after I discovered bugs in the code).

This code has been tested against the array white box tests in Bug 693431 (which I am planning to put up for review as well, as soon as they go through a buildbot pass).

The one remaining to-do item is to investigate the more troubling regressions that I observed on my MacBookPro, logged in comment 14, namely the following cases:

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  bytearray-write-utf-1        36.0    35.9    30.9    30.9   -14.0   -13.9 --
 bytearray-write-utfbytes-1    47.2    47.2    36.3    36.3   -23.1   -23.1 --
  vector-init-3               146.4   145.9   130.5   130.3   -10.9   -10.7 --
Metric: time 
Dir: language/string/
  replace2                   1105.0  1106.4  1244.0  1245.2   -12.6   -12.5 --
Dir: language/string/typed/
  replace2                   1101.0  1101.8  1246.0  1247.8   -13.2   -13.3 --
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     5.0     5.0   -25.0   -25.0 --
Dir: sunspider-0.9.1/typed/
  controlflow-recursive         3.0     3.6     4.0     4.0   -33.3   -11.1 --
  crypto-md5                    3.0     3.4     4.0     4.0   -33.3   -17.6 --

I am planning to fire up shark to do this next.  But I did not want that to hold up initiating the review process.
FYI: not all of the regressions noted in comment 22 reproduce on fklockii-iMac (2.66 Ghz Intel Core i5):

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  bytearray-write-utf-1        36.6    36.4    36.9    36.4     1.0     0.1   
  bytearray-write-utf-2        24.1    24.1    24.2    24.0     0.3    -0.3   
 bytearray-write-utfbytes-1    48.0    47.7    47.7    47.2    -0.7    -1.0   
 bytearray-write-utfbytes-2    27.0    26.6    27.1    26.8     0.6     0.7   
Metric: time 
Dir: language/string/
  replace                     506.0   509.2   500.0   505.2     1.2     0.8 + 
  replace2                    853.0   857.4   855.0   861.8    -0.2    -0.5   
Dir: language/string/typed/
  replace                     508.0   512.4   493.0   500.4     3.0     2.3 + 
  replace2                    846.0   852.0   854.0   858.0    -0.9    -0.7   
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     4.0     4.0     0.0     0.0   
Dir: sunspider-0.9.1/typed/
  controlflow-recursive         3.0     3.2     3.0     3.6     0.0   -12.5   
  crypto-md5                    3.0     3.0     3.0     3.4     0.0   -13.3   

I haven't done a full run of the performance benchmarks in this context; I just wanted to see if they reproduced somewhere other than my MacBookPro (a 2.8 Ghz Core 2 Duo), because it would probably be nicer to use Shark on the iMac.
Created attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

trivial rebase of the patch D that I missed yesterday.  (there was a bugfix that needed to be shifted backward into patch B to get it to pass my tests on its own.)
Attachment #569363 - Attachment is obsolete: true
Attachment #569363 - Flags: superreview?(lhansen)
Attachment #569363 - Flags: review?(treilly)
Attachment #570950 - Flags: superreview?(lhansen)
Attachment #570950 - Flags: review?(treilly)
(In reply to Felix S Klock II from comment #24)
> (there was a bugfix that needed to be shifted backward into patch B ...)

Should have said "needed to be and has been shifted as part of patch B v3"; the rebase of patch D was to accomodate that shift.

Updated

7 years ago
Attachment #570683 - Flags: superreview?(lhansen) → superreview+

Comment 26

7 years ago
Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

I have to read this more carefully (it's early and Starbucks is noisy) but this approach is indeed what I had in mind and you should consider this response an SR+ for practical purposes (ie landing, if it comes to that).

Re the comment in the patch header: I believe we go to various lengths to obey the "length" getter/setters in Array subclasses and we've had bugs in the past when we didn't do that right, so assume that we have to continue to worry about that until proven otherwise.

It might be useful to poll Lego or the forums to ask how many have subclassed Array in real code.  Usually people do this to add non-prototype methods, not to muck with the length property.  So a more lenient test is whether the instance is Array /or/ a subclass with the original length getter/setter pair.

Comment 27

7 years ago
Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

Looks reasonable.  I wonder whether it would be more correct to widen before scaling rather than vice versa but with a < 4GB object size limitation it should not matter.
Attachment #570950 - Flags: superreview?(lhansen) → superreview+

Comment 28

7 years ago
Comment on attachment 562055 [details] [diff] [review]
patch C: mark Array() constructed arrays as simple.

Effective SR+ based on earlier F+.

Updated

7 years ago
Attachment #570683 - Flags: review?(treilly) → review+

Comment 29

7 years ago
Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

why do we have m_isSimple?  Does m_lengthIfSimple != 0 not work?  Is there some corner case where a zero length array is simple?

Updated

7 years ago
Attachment #562055 - Flags: review?(treilly) → review+
(In reply to Tommy Reilly from comment #29)
> Comment on attachment 570686 [details] [diff] [review] [diff] [details] [review]
> patch B v3: state to track array simpleness.
> 
> why do we have m_isSimple?  Does m_lengthIfSimple != 0 not work?  Is there
> some corner case where a zero length array is simple?

m_isSimple tracks we are ever allowed to have (m_lengthIfSimple != 0).  Under the current design, once an array has been marked as non-simple (i.e. !m_isSimple), it will never become simple.  (Perhaps I should name the member 'm_canBeSimple' rather than 'm_isSimple'; the current name is an artifact of a previous variant.)

The main case I am concerned about is array subclasses.  Subclassed arrays are never simple (see comment 0 and comment 26).   When we construct an array subclass, its m_isSimple flag is set to false.  Then when the array subclas goes from empty to non-empty, its m_lengthIfSimple flag is kept at zero.

A bit more explanation:

Most arrays start off empty (even if their length is zero, their denseArray holds no elements).  You want to allow such arrays to treated as simple, obviously.  If I just used (m_lengthIfSimple != 0), then I would have to handle arrays transitioning from non-simple (when they have m_length == m_lengthIfSimple == 0) to simple.

If I did not have to worry about array subclasees, then I might be able to get rid of the m_isSimple flag, though with that would come a slight policy change, because arrays would then be able to go from simple -> non-simple -> simple again, which they currently cannot do (see my second paragraph above).

(But I think future PACMAN bugs for optimizing Array .length access are going to rely on the simplicity definition used here, or something very close to it.  So I'm not yet too interested in trying to revise the representation here until I prove to myself that it doesn't pay off.)
(Assignee)

Updated

7 years ago
Blocks: 683839
(Assignee)

Updated

7 years ago
Attachment #570686 - Flags: superreview?(lhansen)
Attachment #570686 - Flags: review?(treilly)
Attachment #570686 - Flags: review?(lhansen)
Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

(Moving treilly R? to lhansen, and then treating lhansen's SR+ as an R+.)

Lars: Obviously I can't push this until patch B lands, so you've got a shot at re-review if you want it.  :)
Attachment #570950 - Flags: review?(treilly)

Comment 32

7 years ago
Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

Nice.

Perhaps file a followup bug to address the expedient solution used in try_splice (not sure that it's worth it).
Attachment #570686 - Flags: review?(lhansen) → review+

Updated

7 years ago
Attachment #562055 - Flags: superreview?(lhansen) → superreview+

Comment 33

7 years ago
Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

This comment in ArrayObject.cpp is a little curious:

        // Bugzilla 688486: Should recheck above assertion after
        // jit speculative fast-path lands.

Otherwise nothing.  R+.
Attachment #570950 - Flags: review+

Comment 34

7 years ago
changeset: 6716:2aa4ce80999a
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: move some inlines to .cpp, ease toggling Array verify mode (r=treilly, sr=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/2aa4ce80999a

Comment 35

7 years ago
changeset: 6717:2a58c781d85d
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: state tracking array simpleness (r=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/2a58c781d85d

Comment 36

7 years ago
changeset: 6718:ef9e442690e0
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: mark arrays constructed via 'Array()' as simple (r=treilly, sr=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/ef9e442690e0
(In reply to Lars T Hansen from comment #33)
> Comment on attachment 570950 [details] [diff] [review] [diff] [details] [review]
> patch D v4: jit simple lenflag
> 
> This comment in ArrayObject.cpp is a little curious:
> 
>         // Bugzilla 688486: Should recheck above assertion after
>         // jit speculative fast-path lands.

The "assertion" its referring to is this:

         // This is a hot function.

I didn't want to leave the comment in there unchallenged.  But it might well still be a hot function even with this optimization in place, especially for interpreted systems... so I did not want to just delete it.

(But at this point I'm thinking I might just delete it.)

Comment 38

7 years ago
changeset: 6719:b15c9b7c1bcb
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: make jit emit fast path for simple arrays (r=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/b15c9b7c1bcb
(In reply to Lars T Hansen from comment #32)
> Perhaps file a followup bug to address the expedient solution used in
> try_splice (not sure that it's worth it).

Forked off as Bug 701097 as a minor enhancement.

Comment 40

7 years ago
changeset: 6720:cddb276c4350
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: windows builds broke so backing out to rev 6715:a2224e2b2a43; will revisit tomorrow (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/cddb276c4350

Comment 41

7 years ago
changeset: 6721:4f7123852123
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: attempt to fix windows build by putting FASTCALL after fcn return type in signature (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/4f7123852123
(In reply to Tamarin Bot from comment #35)
> changeset: 6717:2a58c781d85d
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 688486: state tracking array simpleness (r=lhansen).
> 
> http://hg.mozilla.org/tamarin-redux/rev/2a58c781d85d

(In reply to Tamarin Bot from comment #36)
> changeset: 6718:ef9e442690e0
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 688486: mark arrays constructed via 'Array()' as simple
> (r=treilly, sr=lhansen).
> 
> http://hg.mozilla.org/tamarin-redux/rev/ef9e442690e0

changeset 6718 injected a failure in the deep builds; the actual bug is really in changeset 6717, in my opinion.

Filed as Bug 702262.

Comment 43

7 years ago
changeset: 6728:ca173c61d9fb
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: fix template instantiations to fix AOT builds (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/ca173c61d9fb
(Assignee)

Updated

7 years ago
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.