Closed Bug 688486 Opened 13 years ago Closed 13 years ago

[pacman] Speculatively inline Array::getUintProperty in the JIT

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: pnkfelix, Assigned: pnkfelix)

References

Details

(Whiteboard: has-patch, PACMAN)

Attachments

(6 files, 6 obsolete files)

patch A v1: drive-by refactor, moving some inlines to .cpp 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.86 KB, patch		Details \| Diff \| Splinter Review
patch B: track "simple" arrays via explicit boolean field. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 12.42 KB, patch	lhansen : feedback+	Details \| Diff \| Splinter Review
patch C: mark Array() constructed arrays as simple. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 3.86 KB, patch	treilly : review+ lhansen : superreview+ lhansen : feedback+	Details \| Diff \| Splinter Review
patch D v1: jit speculative fast-path for simple arrays 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 6.86 KB, patch	lhansen : feedback+	Details \| Diff \| Splinter Review
patch B v2: separate simple len and boolean 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 14.41 KB, patch		Details \| Diff \| Splinter Review
patch D v2: jit simple lenflag 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 7.64 KB, patch		Details \| Diff \| Splinter Review
transcript N1 v1: output from perf tests on Nexus One 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 41.97 KB, text/plain		Details
patch D v3: jit simple lenflag 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 7.64 KB, patch		Details \| Diff \| Splinter Review
transcript MBP v1: output from perf tests on fklockii-MacBookPro 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 41.89 KB, text/plain		Details
patch A v2: move some inlines to .cpp; ease toggling verify mode 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 4.48 KB, patch	treilly : review+ lhansen : superreview+	Details \| Diff \| Splinter Review
patch B v3: state to track array simpleness. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 28.70 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
patch D v4: jit simple lenflag 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 8.27 KB, patch	lhansen : review+ lhansen : superreview+	Details \| Diff \| Splinter Review

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Description

•

13 years ago

For Array fetches on arrays that (1) are dense, (2) are not subclassed [*], (3) have no holes and (4) start at 0, we can use a similar path to the inline code that recently landed for Vector (Bug 599099).

Doing this properly might require adding a new member that tracks whether an array is "simple" (ie conforms to the above constraints) to the Array representation; that is probably acceptable, especially since simple-ness is likely to be required to do a similar treatment for Array::setUintProperty.

[*] Regarding the "not subclassed" constraint: Maybe this constraint is unnecessary?  Sealed-ness should not matter for reads (only for writes, i.e. setUintProperty).  Do overrides of the length getter matter for the semantics of Array referencing?

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Blocks: array-tracker

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Assignee: nobody → fklockii

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 1

•

13 years ago

(FYI: it isn't out of the question to remove other constraints, e.g. (3) and (4) may not be necessary, depending on how much code generation we're willing to pay for in the jit.  But the constraints as written provide a simple start that is likely to match common use cases.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 2

•

13 years ago

Attached patch patch A v1: drive-by refactor, moving some inlines to .cpp (obsolete) — Details — Splinter Review

These inline routines are not be called from outside ArrayObject.cpp, at least within AVM.

When I started, I thought it would ease development (mostly by reducing compile times) to have them in the .cpp rather than the -inlines.h header.

In hindsight these routines were not touched by my subsequent work on this bug.  So it probably does not matter for now if this patch lands.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 3

•

13 years ago

Attached patch patch B: track "simple" arrays via explicit boolean field. (obsolete) — Details — Splinter Review

I haven't come up with a more clever representation than a separate field in the ArrayObject, and arguably its not worth trying to do so.

This patch isn't 100% of the story; it gets things started so that code like:
  var a = []
produces a simple-marked array.  But that's not good enough, since it doesn't cover:
  var b = new Array(0);

Follow-on patch C will show one way to cover the second case.

Attachment #562049 - Flags: feedback?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 4

•

13 years ago

Attached patch patch C: mark Array() constructed arrays as simple. — Details — Splinter Review

I admit up front that this patch smells bad.

The main goal is to identify which constructor calls are direct calls to new Array(..) itself, versus calls to the Array constructor from one of its subclasses.

The approach is modeled after what Steven did for Bug 654807.

I briefly tried to take a more disciplined approach by directly changing ArrayClass::createInstanceProc but quickly realized that I don't know enough about nativegen to do that.

(Also, the description for this bug mentions that it may not be necessary to keep the not-subclassed constraint.  Even so, its probably better to track m_isSimple in a manner analogous to what this patch does, to enable speculative inlining of setUintProperty.)

Attachment #562055 - Flags: feedback?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 5

•

13 years ago

Attached patch patch D v1: jit speculative fast-path for simple arrays (obsolete) — Details — Splinter Review

The payoff!  :)

This is my first foray into CodegenLIR.

My use of suspendCSE and resumeCSE is based on a conversation with wmaddox about how to generate code with a diamond control flow.

Here are some preliminary performance results; note that the read-{C,Number,int}-2 cases are all sparse arrays, so its not surprising that they regressed.  The degree of slow-down in the init's is a little more troubling:

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  array-1                    1706.3  1702.9  3802.2  3800.6   122.8   123.2 ++
  array-2                     553.4   550.8   542.9   542.4    -1.9    -1.5 - 
  array-init-1                398.2   398.0   372.6   371.7    -6.4    -6.6 --
  array-init-2                418.7   417.2   395.6   392.1    -5.5    -6.0 --
  array-init-3                266.2   265.0   257.2   256.1    -3.4    -3.4 - 
  array-init-4                253.0   252.9   247.5   247.2    -2.2    -2.3 - 
  array-pop-1                 383.2   380.1   350.3   346.5    -8.6    -8.8 --
  array-push-1                240.3   239.6   257.7   257.4     7.3     7.4 ++
  array-read-C-1              176.8   176.7   393.8   393.7   122.7   122.8 ++
  array-read-C-2               26.5    26.5    26.1    26.0    -1.7    -1.5 - 
  array-read-Number-1         174.8   174.7   397.2   393.1   127.2   125.1 ++
  array-read-Number-2          26.1    26.0    26.1    25.9    -0.0    -0.6   
  array-read-int-1            171.8   171.7   396.2   394.2   130.6   129.6 ++
  array-read-int-2             26.3    26.2    26.1    26.0    -0.8    -0.5   
  array-read-int-5            161.5   161.1   276.7   276.5    71.3    71.6 ++
  array-read-int-6            158.9   158.0   271.5   271.1    70.8    71.6 ++
  array-read-int-7            158.2   157.6   268.9   268.4    70.0    70.3 ++
  array-read-length-1          69.7    69.6    69.6    69.5    -0.1    -0.2

Attachment #562059 - Flags: feedback?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 6

•

13 years ago

(In reply to Felix S Klock II from comment #0)
> [*] Regarding the "not subclassed" constraint: Maybe this constraint is
> unnecessary?  Sealed-ness should not matter for reads (only for writes, i.e.
> setUintProperty).  Do overrides of the length getter matter for the
> semantics of Array referencing?

(In reply to Felix S Klock II from comment #4)
> (Also, the description for this bug mentions that it may not be necessary to
> keep the not-subclassed constraint.  Even so, its probably better to track
> m_isSimple in a manner analogous to what this patch does, to enable
> speculative inlining of setUintProperty.)

Note also that getUintProperty is virtual on the C++ side; this is not just about preserving behavior for AS3 subclasses of Array, but also for C++ subclasses of Array.  (On the other hand, its conceivable that SemiSealedArray is the only such subclass...)

Anyway, the simple thing for the short term is to use the definition of simple-ness that I originally wrote in this description; that is, make subclassing (from either C++ or AS3) immediately imply non-simpleness.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Whiteboard: has-patch

Lars T Hansen

Comment 7

•

13 years ago

F+ on all three with the following comments.

- We need some stats on whether our "simple" heuristic is really good enough
in real content. It may well be that it isn't, and that we need to make
allowances for some arrays that have a few holes, notably at the beginning
and end, and possibly temporarily - ie arrays that are dense but not simple
by this definition. Good start though - and it might be enough! Indeed if
it is enough then type-specialized arrays will be a little easier to deal with.

- We need to look carefully at the generated code on x86 and ARM. My gut feeling
is that the m_simple field is not the way to go and that an m_simpleLength field
would be better; it would be nonzero only if the array is simple. Thus
a single load and compare and branch will be necessary, while the code as it
is here now has two loads and will generate an intermediate boolean result
(sometimes expensive - cmov, setcc, whatever) and there is then additional
manipulation of that.

That setup will also come in handy if we think about type-specializing simple
arrays - one length field per type we care about.

- For perf results, please always provide info about the machine they were
obtained on... Anyway in this case I think that you should try a second
architecture. The slowdown on the init benchmarks is probably caused by
cache effects or microarchitecture.

Memo to self: I need to look into whether the Vector optimizations should worry about disabling CSE in the manner you do here.

Lars T Hansen

Updated

•

13 years ago

Attachment #562049 - Flags: feedback?(lhansen) → feedback+

Lars T Hansen

Updated

•

13 years ago

Attachment #562055 - Flags: feedback?(lhansen) → feedback+

Lars T Hansen

Updated

•

13 years ago

Attachment #562059 - Flags: feedback?(lhansen) → feedback+

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 8

•

13 years ago

(In reply to Lars T Hansen from comment #7)
> - For perf results, please always provide info about the machine they were 
>   obtained on...  Anyway in this case I think that you should try a second
>   architecture.  The slowdown on the init benchmarks is probably caused by
>   cache effects or microarchitecture.

My bad, I'm usually better about that.

The results in comment 5 were from Release 32-bit avmshell builds on my MacBook Pro (a 2.8 Ghz Intel Core 2 Duo).

Edwin Smith

Comment 9

•

13 years ago

> Memo to self: I need to look into whether the Vector optimizations should
> worry about disabling CSE in the manner you do here.

As a rule of thumb, if you're in CodegenLIR and generating diamond-shaped code,
then you need to disable/enable CSE around the fork/join flow.  Assume that
a label clears the CSE state.  Whether or not clearing CSE state really matters
depends more on the surrounding code than on the inlined path itself, so the 
effects of resetting CSE will likely be smaller on microbenchmarks than real code.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Depends on: 689828

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Whiteboard: has-patch → has-patch, PACMAN

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 10

•

13 years ago

Attached patch patch B v2: separate simple len and boolean (obsolete) — Details — Splinter Review

(we still need a boolean internally because we need to distinguish 0-length arrays that will remain simple when populated from 0-length arrays that cannot be simple when populated, namely the ones that have been subclassed.  But now the jit can just load the simple length flag on its own.)

Attachment #562049 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 11

•

13 years ago

Attached patch patch D v2: jit simple lenflag (obsolete) — Details — Splinter Review

(In case its not clear, patch B v2 and patch D v2 are incorporating Lars' suggestion from comment 7 to use a m_simpleLength field.)

I'll post results in a little bit.

Attachment #562059 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 12

•

13 years ago

Attached file transcript N1 v1: output from perf tests on Nexus One — Details

Nexus One results

Executive summary (filtering out results where |%dAvg| < 4%, which I arbitrarily chose, and may distort true nature of results, etc etc):

Tamarin tests started: 2011-10-25 08:26:33.847245
Executing 588 test(s)
avm: ../../objdir-reldroid/shell/android_shell.baseline-r6676.py  version: 6679:42790e63d9bc
avm2: ../../objdir-reldroid/shell/android_shell.simple-lenflag.py  version: 6685:57759fa81991
iterations: 5

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  alloc-2                       3.1     2.6     3.0     3.0    -1.1    16.1   
  array-1                     200.8   200.1   462.5   462.3   130.3   131.0 ++
  array-read-C-1               22.4    22.3    74.9    74.9   235.2   236.4 ++
  array-read-Number-1          23.3    23.3    75.0    74.9   222.0   222.1 ++
  array-read-int-1             23.3    23.3    74.9    74.9   221.9   222.2 ++
  array-read-int-5             20.0    19.8    46.8    46.7   134.1   135.7 ++
  array-read-int-6             19.5    19.1    49.0    49.0   151.7   156.8 ++
  array-read-int-7             20.0    19.7    46.8    46.7   134.3   136.9 ++
  array-shift-1                16.0    15.8    15.3    15.2    -4.0    -4.2 - 
  parseInt-1                   19.2    19.2    17.7    17.5    -7.9    -8.7 --
  string-indexOf-3             19.7    19.7    21.5    21.4     8.7     8.5 ++
vector-nested-read-Number-2    13.9    13.1    13.9    13.9     0.1     5.6   
  vector-write-C-1             11.9    11.9    10.8    10.8    -9.5    -9.5 --
Metric: iterations/second 
Dir: jsmicro/
  arguments-3                   2.1     2.1     2.2     2.2     4.3     4.2 + 
  array-shift-1                 5.2     5.2     5.5     5.5     4.3     4.8 + 
  array-sort-1                  4.0     3.4     4.0     4.0     0.7    18.4   
  parseInt-1                   13.4    13.4    12.6    12.6    -6.3    -6.5 --
  string-indexOf-3              6.5     6.5     6.9     6.9     6.6     6.5 ++
Metric: time 
Dir: language/string/
  static_ascii_array_50      1914.0  1919.8  1950.0  1998.0    -1.9    -4.1   
Dir: language/string/typed/
  indexOf                    1458.0  1458.6  1518.0  1518.8    -4.1    -4.1 - 
  search                      839.0   889.2   790.0   830.8     5.8     6.6   
Metric: time 
Dir: scimark/
  MonteCarlo                22607.0 22650.8 20918.0 20951.4     7.5     7.5 ++
  SparseCompRow               746.0   750.8   597.0   604.4    20.0    19.5 ++
Dir: sunspider/
  bitops-nsieve-bits          268.0   269.6   251.0   251.4     6.3     6.8 ++
Dir: sunspider/as3/
  access-fannkuch             257.0   258.2   222.0   222.2    13.6    13.9 ++
  access-nbody                 55.0    55.0    51.0    51.4     7.3     6.5 ++
  bitops-bitwise-and           12.0    12.6    12.0    12.0     0.0     4.8   
  bitops-nsieve-bits          181.0   181.4   166.0   167.2     8.3     7.8 ++
  crypto-aes                  228.0   228.2   210.0   210.8     7.9     7.6 ++
  crypto-md5                  196.0   196.6   211.0   211.4    -7.7    -7.5 --
  math-cordic                 155.0   155.2   149.0   149.0     3.9     4.0 + 
Dir: sunspider-0.9.1/typed/
  access-fannkuch             327.0   327.8   294.0   294.8    10.1    10.1 ++
  access-nbody                 39.0    39.0    35.0    35.4    10.3     9.2 ++
  bitops-nsieve-bits          185.0   185.2   167.0   168.0     9.7     9.3 ++
  crypto-aes                  194.0   195.0   175.0   175.8     9.8     9.8 ++
  math-cordic                 165.0   165.2   155.0   155.8     6.1     5.7 ++
  s3d-raytrace                 77.0    77.2    73.0    73.2     5.2     5.2 ++


(My caveat at the top was half tongue-in-cheek; but note that there *are* a large number of additional regressions in the transcript; they just all had |%dAvg| < 4.)

Anyway, there are also the significant regressions listed above, which I don't have any explanation for.  (Why on earth would vector-write-C-1 regress by 10%?)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 13

•

13 years ago

Attached patch patch D v3: jit simple lenflag (obsolete) — Details — Splinter Review

This is essentially the same as patch D v2 except that I inverted the direction of the comparison and the corresponding branch in the hopes that it would improve branch prediction.

From what I can tell, the effect of this is either in the noise, or merely dampens the effect of the change (so that most of the regressions I saw seem to boil away to close to nothing, and the improvements may be lessened but only slightly so on the microbenchmarks).

Attachment #569289 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 14

•

13 years ago

Attached file transcript MBP v1: output from perf tests on fklockii-MacBookPro — Details

fklockii-MacBookPro (32-bit release shell, 2.8 Ghz Core 2 Duo) results

Executive summary (filtering out results where |%dAvg| < 4%):

Tamarin tests started: 2011-10-25 13:57:46.694479
Executing 588 test(s)
avm: ../../objdir-rel32/shell/avmshell.baseline-r6676  version: 6679:42790e63d9bc
avm2: ../../objdir-rel32/shell/avmshell.simple-lenflag  version: 6683:86dd6b710726
iterations: 5

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  alloc-11                     13.3    13.2    12.5    12.4    -5.8    -6.0 --
  alloc-12                     13.2    13.1    12.3    12.3    -6.7    -6.4 --
  alloc-4                      45.3    45.3    42.6    42.6    -6.0    -6.0 --
  alloc-8                       8.9     8.3     9.0     8.7     0.4     4.9   
  arguments-1                 724.3   723.3   677.3   676.8    -6.5    -6.4 --
  array-1                    1707.3  1706.9  4127.9  4124.9   141.8   141.7 ++
  array-init-1                395.2   393.8   374.9   374.7    -5.1    -4.9 --
  array-init-2                426.1   425.1   396.2   395.8    -7.0    -6.9 --
  array-init-3                273.2   272.5   249.8   249.4    -8.6    -8.5 --
  array-init-4                259.7   259.5   240.5   240.4    -7.4    -7.4 --
  array-read-C-1              174.1   173.7   441.1   440.4   153.3   153.5 ++
  array-read-Number-1         177.5   177.3   448.1   447.7   152.5   152.5 ++
  array-read-int-1            170.5   170.3   448.6   448.4   163.1   163.4 ++
  array-read-int-5            163.3   163.3   321.7   321.5    96.9    96.9 ++
  array-read-int-6            163.5   163.2   321.0   320.8    96.3    96.5 ++
  array-read-int-7            159.8   159.8   318.0   317.8    99.0    98.9 ++
  array-sort-1                 25.2    24.9    25.1    23.6    -0.5    -5.2   
  array-sort-4                  8.8     8.7     8.7     8.3    -1.0    -4.7   
  array-unshift-1             151.2   151.0   149.4   138.9    -1.2    -8.0   
  array-write-int-1            50.4    47.4    51.6    50.7     2.4     7.0   
  array-write-int-2            27.5    27.4    27.6    26.3     0.5    -4.2   
  bytearray-read-byte-3        64.2    56.1    63.9    53.1    -0.5    -5.2   
  bytearray-read-double-2      22.8    21.4    22.6    22.4    -0.9     4.8   
  bytearray-read-int-1         19.9    19.8    21.0    20.9     5.5     5.4 ++
  bytearray-read-int-2         25.6    25.5    24.0    24.0    -6.1    -6.0 --
  bytearray-read-ubyte-1       28.2    28.1    26.3    26.3    -6.7    -6.6 --
  bytearray-read-utfbytes-1    27.1    27.1    26.0    26.0    -4.0    -4.0 - 
  bytearray-write-byte-1       13.4    12.8    13.5    13.4     1.1     4.9   
  bytearray-write-utf-1        36.0    35.9    30.9    30.9   -14.0   -13.9 --
  bytearray-write-utf-2        22.3    22.2    20.2    20.1    -9.6    -9.6 --
 bytearray-write-utfbytes-1    47.2    47.2    36.3    36.3   -23.1   -23.1 --
 bytearray-write-utfbytes-2    24.9    24.8    22.7    22.6    -8.8    -8.7 --
  closedvar-write-1          3115.9  3109.9  3430.6  3394.4    10.1     9.1 ++
  funcall-4                   107.6   106.9   103.3   102.4    -4.0    -4.2 - 
  restarg-1                   723.3   721.5   675.6   673.9    -6.6    -6.6 --
  string-indexOf-1            189.4   189.2   199.8   199.2     5.5     5.3 ++
  vector-init-1               213.4   212.2   194.8   194.8    -8.7    -8.2 --
  vector-init-2               236.1   235.6   220.3   219.1    -6.7    -7.0 --
  vector-init-3               146.4   145.9   130.5   130.3   -10.9   -10.7 --
Dir: jsmicro/
  alloc-1                      37.2    37.0    35.6    35.6    -4.3    -4.0 - 
  alloc-10                      8.4     8.1     7.9     7.7    -5.5    -4.2   
  alloc-7                      34.0    33.4    33.5    31.8    -1.6    -4.7   
  arguments-3                  15.6    15.4    16.6    16.3     6.7     6.2 ++
  array-2                     262.2   261.1   283.2   278.5     8.0     6.7 ++
Metric: time 
Dir: language/string/
  replace2                   1105.0  1106.4  1244.0  1245.2   -12.6   -12.5 --
Dir: language/string/typed/
  replace2                   1101.0  1101.8  1246.0  1247.8   -13.2   -13.3 --
Metric: iterations/second 
Dir: misc/
  bytearray-to-vector-be      569.9   565.4   526.9   522.0    -7.5    -7.7 --
  bytearray-to-vector-le      656.3   654.6   597.4   585.2    -9.0   -10.6 --
  bytearray-to-vector-worst   573.4   562.7   523.5   522.6    -8.7    -7.1 --
Metric: time 
Dir: scimark/
  MonteCarlo                 2884.0  2912.6  2702.0  2720.6     6.3     6.6 ++
  SparseCompRow                99.0   100.2    80.0    80.0    19.2    20.2 ++
Dir: sunspider/
  bitops-bits-in-byte          27.0    27.6    28.0    28.8    -3.7    -4.3 - 
Dir: sunspider/as3/
  access-fannkuch              42.0    43.2    39.0    39.4     7.1     8.8 + 
  access-nsieve                14.0    14.0    14.0    14.8     0.0    -5.7   
  bitops-3bit-bits-in-byte      5.0     5.8     5.0     5.4     0.0     6.9   
  bitops-bitwise-and            1.0     1.6     1.0     1.2     0.0    25.0   
  bitops-nsieve-bits           25.0    25.8    24.0    24.2     4.0     6.2 + 
  controlflow-recursive         4.0     4.4     4.0     4.2     0.0     4.5   
  crypto-aes                   30.0    31.0    28.0    28.8     6.7     7.1 + 
  crypto-md5                   28.0    28.0    29.0    29.8    -3.6    -6.4 - 
  math-cordic                  14.0    15.0    13.0    13.4     7.1    10.7   
  math-spectral-norm            6.0     6.0     5.0     5.0    16.7    16.7 ++
  s3d-morph                    29.0    31.4    29.0    29.8     0.0     5.1   
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     5.0     5.0   -25.0   -25.0 --
  s3d-cube                     23.0    23.0    24.0    24.2    -4.3    -5.2 - 
Dir: sunspider-0.9.1/typed/
  access-fannkuch              57.0    58.0    54.0    54.6     5.3     5.9 + 
  access-nbody                  3.0     4.2     3.0     3.6     0.0    14.3   
  bitops-nsieve-bits           24.0    24.8    22.0    22.8     8.3     8.1 ++
  controlflow-recursive         3.0     3.6     4.0     4.0   -33.3   -11.1 --
  crypto-aes                   25.0    25.4    23.0    23.8     8.0     6.3 + 
  crypto-md5                    3.0     3.4     4.0     4.0   -33.3   -17.6 --
  math-cordic                  16.0    16.0    14.0    14.6    12.5     8.8 ++
  math-spectral-norm            7.0     7.8     7.0     8.2     0.0    -5.1   
  s3d-cube                     16.0    16.4    15.0    15.4     6.2     6.1 + 
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/js/
  regexp                       69.7    69.3    64.8    63.3    -7.0    -8.6 - 
  richards                    259.0   242.4   258.0   254.8    -0.4     5.1   
Dir: v8.5/optimized/
  regexp                       71.8    71.4    66.1    65.8    -7.9    -7.9 --
  splay                      5197.0  5177.6  4991.0  4939.6    -4.0    -4.6 - 
Dir: v8.5/typed/
  regexp                       71.9    71.6    66.4    66.2    -7.6    -7.5 --
Dir: v8.5/untyped/
  regexp                       71.1    70.0    65.4    65.0    -8.0    -7.2 --


This was intended as a sanity check.  But some of the regressions above are pretty troubling, e.g. bytearray-write-utfbytes-1.  (OTOH, there are no regressions where %dAvg<=-10 on the nexus one results).

I'm going to put up a review request just to keep the patch rolling through the pipeline (and maybe get outside insight), but I hope to have an answer for the 9 cases here where %dAvg<=-10 before atttempting to land this.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 15

•

13 years ago

Comment on attachment 562044 [details] [diff] [review]
patch A v1: drive-by refactor, moving some inlines to .cpp

(if you veto this, that is fine, I'll just rebase the queue.  Also, I find it a little strange that ArrayObject::getLength is inline and virtual and that everything still builds without an out-of-line definition for it.)

Attachment #562044 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #569288 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 16

•

13 years ago

Comment on attachment 562055 [details] [diff] [review]
patch C: mark Array() constructed arrays as simple.

(I already noted my own displeasure with this in comment 4; but I also don't know a better way to do it yet.)

Attachment #562055 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 17

•

13 years ago

Comment on attachment 569363 [details] [diff] [review]
patch D v3: jit simple lenflag

(I think there's a bug fix for patch B in the portion of the patch to ArrayObject.cpp; I will probably rearrange things so that that part is associated with patch B  before landing)

Nanojit experts: Is there a bias in branch-instructions like LIR_jt in that I should choose one direction over the other in terms of branch-prediction?

Attachment #569363 - Flags: review?(lhansen)

Edwin Smith

Comment 18

•

13 years ago

(In reply to Felix S Klock II from comment #17)
> Comment on attachment 569363 [details] [diff] [review] [diff] [details] [review]
> patch D v3: jit simple lenflag
> 
> (I think there's a bug fix for patch B in the portion of the patch to
> ArrayObject.cpp; I will probably rearrange things so that that part is
> associated with patch B  before landing)
> 
> Nanojit experts: Is there a bias in branch-instructions like LIR_jt in that
> I should choose one direction over the other in terms of branch-prediction?

not really.  There are two effects at play that might bias your choice:

1. the cpu's we care about will be happier with the fast-path being the fall-through path, at least for forward branches.

2. nanojit allocates registers bottom up, so register decisions in the else-block of an if/then/else code fragment will tend to take precedence over register decisions in the then-block.  The then-block is more likely to spill after the branch position, if there are conflicts.

Which of these two effects is dominant is anyone's guess; my guess is the second one dominates, if either.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 19

•

13 years ago

Comment on attachment 569288 [details] [diff] [review]
patch B v2: separate simple len and boolean

retracting review request; somethings not quite right with this patch queue.

Attachment #569288 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #562055 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #569363 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #562044 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 20

•

13 years ago

Attached patch patch A v2: move some inlines to .cpp; ease toggling verify mode — Details — Splinter Review

Attachment #562044 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 21

•

13 years ago

Attached patch patch B v3: state to track array simpleness. — Details — Splinter Review

cleanup, fixed some bugs, added a few more assertions to internal verify mode.

Attachment #569288 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570686 - Attachment description: patch B v2: state to track array simpleness. → patch B v3: state to track array simpleness.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570683 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570686 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #562055 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #569363 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570683 - Flags: superreview?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570686 - Flags: superreview?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #562055 - Flags: superreview?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #569363 - Flags: superreview?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 22

•

13 years ago

I'm finally restarting the review request cycle (I had aborted the first last week after I discovered bugs in the code).

This code has been tested against the array white box tests in Bug 693431 (which I am planning to put up for review as well, as soon as they go through a buildbot pass).

The one remaining to-do item is to investigate the more troubling regressions that I observed on my MacBookPro, logged in comment 14, namely the following cases:

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  bytearray-write-utf-1        36.0    35.9    30.9    30.9   -14.0   -13.9 --
 bytearray-write-utfbytes-1    47.2    47.2    36.3    36.3   -23.1   -23.1 --
  vector-init-3               146.4   145.9   130.5   130.3   -10.9   -10.7 --
Metric: time 
Dir: language/string/
  replace2                   1105.0  1106.4  1244.0  1245.2   -12.6   -12.5 --
Dir: language/string/typed/
  replace2                   1101.0  1101.8  1246.0  1247.8   -13.2   -13.3 --
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     5.0     5.0   -25.0   -25.0 --
Dir: sunspider-0.9.1/typed/
  controlflow-recursive         3.0     3.6     4.0     4.0   -33.3   -11.1 --
  crypto-md5                    3.0     3.4     4.0     4.0   -33.3   -17.6 --

I am planning to fire up shark to do this next.  But I did not want that to hold up initiating the review process.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 23

•

13 years ago

FYI: not all of the regressions noted in comment 22 reproduce on fklockii-iMac (2.66 Ghz Intel Core i5):

                                        avm            avm2
test                           best     avg    best     avg   %dBst   %dAvg
Metric: iterations/second 
Dir: asmicro/
  bytearray-write-utf-1        36.6    36.4    36.9    36.4     1.0     0.1   
  bytearray-write-utf-2        24.1    24.1    24.2    24.0     0.3    -0.3   
 bytearray-write-utfbytes-1    48.0    47.7    47.7    47.2    -0.7    -1.0   
 bytearray-write-utfbytes-2    27.0    26.6    27.1    26.8     0.6     0.7   
Metric: time 
Dir: language/string/
  replace                     506.0   509.2   500.0   505.2     1.2     0.8 + 
  replace2                    853.0   857.4   855.0   861.8    -0.2    -0.5   
Dir: language/string/typed/
  replace                     508.0   512.4   493.0   500.4     3.0     2.3 + 
  replace2                    846.0   852.0   854.0   858.0    -0.9    -0.7   
Dir: sunspider/as3vector/
  access-nbody                  4.0     4.0     4.0     4.0     0.0     0.0   
Dir: sunspider-0.9.1/typed/
  controlflow-recursive         3.0     3.2     3.0     3.6     0.0   -12.5   
  crypto-md5                    3.0     3.0     3.0     3.4     0.0   -13.3   

I haven't done a full run of the performance benchmarks in this context; I just wanted to see if they reproduced somewhere other than my MacBookPro (a 2.8 Ghz Core 2 Duo), because it would probably be nicer to use Shark on the iMac.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 24

•

13 years ago

Attached patch patch D v4: jit simple lenflag — Details — Splinter Review

trivial rebase of the patch D that I missed yesterday.  (there was a bugfix that needed to be shifted backward into patch B to get it to pass my tests on its own.)

Attachment #569363 - Attachment is obsolete: true

Attachment #569363 - Flags: superreview?(lhansen)

Attachment #569363 - Flags: review?(treilly)

Attachment #570950 - Flags: superreview?(lhansen)

Attachment #570950 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 25

•

13 years ago

(In reply to Felix S Klock II from comment #24)
> (there was a bugfix that needed to be shifted backward into patch B ...)

Should have said "needed to be and has been shifted as part of patch B v3"; the rebase of patch D was to accomodate that shift.

Lars T Hansen

Updated

•

13 years ago

Attachment #570683 - Flags: superreview?(lhansen) → superreview+

Lars T Hansen

Comment 26

•

13 years ago

Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

I have to read this more carefully (it's early and Starbucks is noisy) but this approach is indeed what I had in mind and you should consider this response an SR+ for practical purposes (ie landing, if it comes to that).

Re the comment in the patch header: I believe we go to various lengths to obey the "length" getter/setters in Array subclasses and we've had bugs in the past when we didn't do that right, so assume that we have to continue to worry about that until proven otherwise.

It might be useful to poll Lego or the forums to ask how many have subclassed Array in real code.  Usually people do this to add non-prototype methods, not to muck with the length property.  So a more lenient test is whether the instance is Array /or/ a subclass with the original length getter/setter pair.

Lars T Hansen

Comment 27

•

13 years ago

Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

Looks reasonable.  I wonder whether it would be more correct to widen before scaling rather than vice versa but with a < 4GB object size limitation it should not matter.

Attachment #570950 - Flags: superreview?(lhansen) → superreview+

Lars T Hansen

Comment 28

•

13 years ago

Comment on attachment 562055 [details] [diff] [review]
patch C: mark Array() constructed arrays as simple.

Effective SR+ based on earlier F+.

Tommy Reilly

Updated

•

13 years ago

Attachment #570683 - Flags: review?(treilly) → review+

Tommy Reilly

Comment 29

•

13 years ago

Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

why do we have m_isSimple?  Does m_lengthIfSimple != 0 not work?  Is there some corner case where a zero length array is simple?

Tommy Reilly

Updated

•

13 years ago

Attachment #562055 - Flags: review?(treilly) → review+

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 30

•

13 years ago

(In reply to Tommy Reilly from comment #29)
> Comment on attachment 570686 [details] [diff] [review] [diff] [details] [review]
> patch B v3: state to track array simpleness.
> 
> why do we have m_isSimple?  Does m_lengthIfSimple != 0 not work?  Is there
> some corner case where a zero length array is simple?

m_isSimple tracks we are ever allowed to have (m_lengthIfSimple != 0).  Under the current design, once an array has been marked as non-simple (i.e. !m_isSimple), it will never become simple.  (Perhaps I should name the member 'm_canBeSimple' rather than 'm_isSimple'; the current name is an artifact of a previous variant.)

The main case I am concerned about is array subclasses.  Subclassed arrays are never simple (see comment 0 and comment 26).   When we construct an array subclass, its m_isSimple flag is set to false.  Then when the array subclas goes from empty to non-empty, its m_lengthIfSimple flag is kept at zero.

A bit more explanation:

Most arrays start off empty (even if their length is zero, their denseArray holds no elements).  You want to allow such arrays to treated as simple, obviously.  If I just used (m_lengthIfSimple != 0), then I would have to handle arrays transitioning from non-simple (when they have m_length == m_lengthIfSimple == 0) to simple.

If I did not have to worry about array subclasees, then I might be able to get rid of the m_isSimple flag, though with that would come a slight policy change, because arrays would then be able to go from simple -> non-simple -> simple again, which they currently cannot do (see my second paragraph above).

(But I think future PACMAN bugs for optimizing Array .length access are going to rely on the simplicity definition used here, or something very close to it.  So I'm not yet too interested in trying to revise the representation here until I prove to myself that it doesn't pay off.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Blocks: 683839

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #570686 - Flags: superreview?(lhansen)

Attachment #570686 - Flags: review?(treilly)

Attachment #570686 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 31

•

13 years ago

Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

(Moving treilly R? to lhansen, and then treating lhansen's SR+ as an R+.)

Lars: Obviously I can't push this until patch B lands, so you've got a shot at re-review if you want it.  :)

Attachment #570950 - Flags: review?(treilly)

Lars T Hansen

Comment 32

•

13 years ago

Comment on attachment 570686 [details] [diff] [review]
patch B v3: state to track array simpleness.

Nice.

Perhaps file a followup bug to address the expedient solution used in try_splice (not sure that it's worth it).

Attachment #570686 - Flags: review?(lhansen) → review+

Lars T Hansen

Updated

•

13 years ago

Attachment #562055 - Flags: superreview?(lhansen) → superreview+

Lars T Hansen

Comment 33

•

13 years ago

Comment on attachment 570950 [details] [diff] [review]
patch D v4: jit simple lenflag

This comment in ArrayObject.cpp is a little curious:

        // Bugzilla 688486: Should recheck above assertion after
        // jit speculative fast-path lands.

Otherwise nothing.  R+.

Attachment #570950 - Flags: review+

Tamarin Bot

Comment 34

•

13 years ago

changeset: 6716:2aa4ce80999a
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: move some inlines to .cpp, ease toggling Array verify mode (r=treilly, sr=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/2aa4ce80999a

Tamarin Bot

Comment 35

•

13 years ago

changeset: 6717:2a58c781d85d
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: state tracking array simpleness (r=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/2a58c781d85d

Tamarin Bot

Comment 36

•

13 years ago

changeset: 6718:ef9e442690e0
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: mark arrays constructed via 'Array()' as simple (r=treilly, sr=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/ef9e442690e0

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 37

•

13 years ago

(In reply to Lars T Hansen from comment #33)
> Comment on attachment 570950 [details] [diff] [review] [diff] [details] [review]
> patch D v4: jit simple lenflag
> 
> This comment in ArrayObject.cpp is a little curious:
> 
>         // Bugzilla 688486: Should recheck above assertion after
>         // jit speculative fast-path lands.

The "assertion" its referring to is this:

         // This is a hot function.

I didn't want to leave the comment in there unchallenged.  But it might well still be a hot function even with this optimization in place, especially for interpreted systems... so I did not want to just delete it.

(But at this point I'm thinking I might just delete it.)

Tamarin Bot

Comment 38

•

13 years ago

changeset: 6719:b15c9b7c1bcb
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: make jit emit fast path for simple arrays (r=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/b15c9b7c1bcb

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 39

•

13 years ago

(In reply to Lars T Hansen from comment #32)
> Perhaps file a followup bug to address the expedient solution used in
> try_splice (not sure that it's worth it).

Forked off as Bug 701097 as a minor enhancement.

Tamarin Bot

Comment 40

•

13 years ago

changeset: 6720:cddb276c4350
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: windows builds broke so backing out to rev 6715:a2224e2b2a43; will revisit tomorrow (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/cddb276c4350

Tamarin Bot

Comment 41

•

13 years ago

changeset: 6721:4f7123852123
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: attempt to fix windows build by putting FASTCALL after fcn return type in signature (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/4f7123852123

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 42

•

13 years ago

(In reply to Tamarin Bot from comment #35)
> changeset: 6717:2a58c781d85d
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 688486: state tracking array simpleness (r=lhansen).
> 
> http://hg.mozilla.org/tamarin-redux/rev/2a58c781d85d

(In reply to Tamarin Bot from comment #36)
> changeset: 6718:ef9e442690e0
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 688486: mark arrays constructed via 'Array()' as simple
> (r=treilly, sr=lhansen).
> 
> http://hg.mozilla.org/tamarin-redux/rev/ef9e442690e0

changeset 6718 injected a failure in the deep builds; the actual bug is really in changeset 6717, in my opinion.

Filed as Bug 702262.

Tamarin Bot

Comment 43

•

13 years ago

changeset: 6728:ca173c61d9fb
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 688486: fix template instantiations to fix AOT builds (r=fklockii).

http://hg.mozilla.org/tamarin-redux/rev/ca173c61d9fb

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.