Closed Bug 562571 Opened 14 years ago Closed 14 years ago

TM: don't have two bounds checks for array getelem

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file)

Attached patch patchSplinter Review
This is a follow-on from bug 552837.

This patch adds a MINLENCAP slot to dense arrays, which allows the
"index < length" and "index < capacity" checks for getelem to be replaced
with a single "index < minLenCap" check.

- I split setArrayLength() into setDenseArrayLengt() and
  setSlowArrayLength().  This was to avoid a "isDenseArray()" test in
  setArrayLength() that was hit a lot.

- There's a function isDenseArrayMinLenCapOk() which checks that 
  minLenCap == min(length, capacity) whenever we get length or cap.  This is
  a great sanity check.

- I added JSCLASS_CONSTRUCT_PROTOTYPE to js_ArrayClass, this was necessary
  so that JSSLOT_ARRAY_LENGTH was set before the first call to
  isDenseArrayMinLenCapOk() occurred.

- I changed voidDenseArrayCount() to voidDenseArrayOnlySlots(), which
  indicates better what it does.

- I changed on-trace getelem to only check minLenCap, the whole point of the
  exercise.  Note that in the "not idx < min(length, capacity)" we no longer
  have to worry about the case where index==length.

- I didn't change the GETELEM op in the interpreter, it didn't seem worth
  it, but I can do it if someone thinks it's worthwhile.

The biggest controversy about this patch will probably be the "minLenCap"
name.  I like it because it's very descriptive -- if we invent a new term
like "limit" then everyone will just have to remember that
limit==min(length,capacity).  I'd be happy with other variants of minLenCap,
eg.  minLengthCapacity or whatever, but "len" and "cap" do appear in various
places (eg. "newlen", "newcap").

Here's the generated code diff for a typical array access:

-      ldi20 = ldi.o $stack2_3[28]
+      ldi14 = ldi.o $stack2_3[28]
   ......   mov ecx,28(edx)
-      ldi22 = ldi.o $stack2_3[16]
-  ......   mov eax,16(edx)
-      ltui8 = ltui ldi1, ldi22
-      xf14: xf ltui8 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003)
-  ......   cmp ebx,eax
-  ......   jnb   ......
-  ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf)
-  ......:
-  ## merging registers (intersect) with existing edge
-  ......   mov ecx,-16(ebp)  <= restore state
-  ......   mov ebx,-12(ebp)  <= restore ebx
-  ......   mov esi,-8(ebp)  <= restore esi
-  ......   mov edi,-4(ebp)  <= restore edi
-  ......   mov eax,......
-  ......   mov esp,ebp
-  ......:
-  ......   jmp ......
-  ----------------------------------- ## END exit block ......
-      eqi14 = eqi ldi20, immi2/*0*/
-      xt10: xt eqi14 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=004)
-  ......   test ecx,ecx
-  ......   je    ......
-  ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf)
-  ......:
-  ## merging registers (intersect) with existing edge
-  ......   mov ecx,-16(ebp)  <= restore state
-  ......   mov ebx,-12(ebp)  <= restore ebx
-  ......   mov esi,-8(ebp)  <= restore esi
-  ......   mov edi,-4(ebp)  <= restore edi
-  ......   mov eax,......
-  ......   mov esp,ebp
-  ......:
-  ......   jmp ......
-  ----------------------------------- ## END exit block ......
-      ldi21 = ldi.o ldi20[-4]
-  ......   mov eax,-4(ecx)
-      ltui7 = ltui ldi1, ldi21
-      xf13: xf ltui7 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=005)
+      minLenCap = ldi.o $stack2_3[24]
+  ......   mov eax,24(edx)
+      ltui4 = ltui ldi1, minLenCap
+      xf10: xf ltui4 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003)
   ......   cmp ebx,eax
   ......   jnb   ......

As well as the performance benefits this makes the LIR for array code much
easier to read.

SunSpider performance is about 1--1.5% better;  the most consistent
improvers are 3d-raytrace (4--5% better) and crypto-{sha1,aes} (3% better).
V8 is barely changed.  Instruction counts from Cachegrind:

------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
3d-cube:
  total         128,160,941     126,135,726     1.016x better
  on-trace       26,754,012      25,350,918     1.055x better
3d-morph:
  total          74,088,301      74,385,789    *1.004x worse*
  on-trace       26,384,915      26,384,230     -- 
3d-raytrace:
  total         146,285,201     141,266,931     1.036x better
  on-trace       27,644,337      26,154,392     1.057x better
access-binary-trees:
  total          81,945,453      81,948,256     -- 
  on-trace       18,746,077      18,746,091     -- 
access-fannkuch:
  total         249,655,251     241,919,269     1.032x better
  on-trace      124,802,969     117,430,433     1.063x better
access-nbody:
  total          84,972,662      84,430,151     1.006x better
  on-trace       31,472,323      31,020,142     1.015x better
access-nsieve:
  total          62,077,684      61,367,794     1.012x better
  on-trace       29,971,503      29,271,346     1.024x better
bitops-3bit-bits-in-byte:
  total          19,618,677      19,620,568     -- 
  on-trace        7,086,136       7,086,141     -- 
bitops-bits-in-byte:
  total          47,257,301      47,259,144     -- 
  on-trace       34,580,793      34,580,798     -- 
bitops-bitwise-and:
  total          24,307,385      24,309,378     -- 
  on-trace       12,019,326      12,019,331     -- 
bitops-nsieve-bits:
  total          93,413,931      90,512,810     1.032x better
  on-trace       50,621,770      47,730,405     1.061x better
controlflow-recursive:
  total          43,963,287      43,965,009     -- 
  on-trace       28,565,053      28,565,058     -- 
crypto-md5:
  total          48,713,639      48,540,006     1.004x better
  on-trace        5,908,817       5,831,001     1.013x better
crypto-sha1:
  total          34,176,458      33,734,164     1.013x better
  on-trace        8,135,265       7,854,316     1.036x better
date-format-tofte:
  total         148,708,101     148,789,672    *1.001x worse*
  on-trace       11,887,981      11,882,053     -- 
date-format-xparb:
  total         124,302,025     124,210,805     1.001x better
  on-trace       11,967,892      11,923,521     1.004x better
math-cordic:
  total          57,014,197      55,493,882     1.027x better
  on-trace       32,194,286      30,693,935     1.049x better
math-partial-sums:
  total          38,518,090      38,519,471     -- 
  on-trace        6,561,457       6,561,460     -- 
math-spectral-norm:
  total          30,368,921      29,817,602     1.018x better
  on-trace       12,262,111      11,770,822     1.042x better
regexp-dna:
  total         118,394,430     118,397,026     -- 
  on-trace       75,250,329      75,250,338     -- 
string-base64:
  total          54,888,741      54,713,331     1.003x better
  on-trace        8,452,906       8,288,783     1.020x better
string-fasta:
  total         198,961,457     198,962,830     -- 
  on-trace       31,559,026      31,559,039     -- 
string-tagcloud:
  total         286,758,374     286,837,572     -- 
  on-trace        7,461,579       7,434,727     1.004x better
string-unpack-code:
  total         257,417,869     257,161,931     1.001x better
  on-trace        5,425,999       5,374,035     1.010x better
string-validate-input:
  total          83,281,070      83,577,783    *1.004x worse*
  on-trace        8,165,827       8,045,488     1.015x better
-------
all:
  total       2,537,249,446   2,515,876,900     1.008x better
  on-trace      643,882,689     626,808,803     1.027x better


------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
v8-crypto:
  total       2,515,615,331   2,427,104,735     1.036x better
  on-trace    1,568,701,072   1,502,006,825     1.044x better
v8-deltablue:
  total       2,566,464,360   2,559,627,555     1.003x better
  on-trace    1,274,484,845   1,262,760,558     1.009x better
v8-earley-boyer:
  total       7,246,502,527   7,230,838,736     1.002x better
  on-trace       98,106,127      97,836,836     1.003x better
v8-raytrace:
  total       2,345,453,598   2,345,071,470     -- 
  on-trace      101,109,538     100,819,646     1.003x better
v8-regexp:
  total       3,961,519,742   3,961,588,339     -- 
  on-trace      288,227,091     288,229,538     -- 
v8-richards:
  total       2,540,833,609   2,533,293,083     1.003x better
  on-trace    2,156,391,594   2,148,948,873     1.003x better
v8-splay:
  total       3,926,401,052   3,929,236,483    *1.001x worse*
  on-trace      172,717,716     174,395,824    *1.010x worse*
-------
all:
  total      25,102,790,219  24,986,760,401     1.005x better
  on-trace    5,659,737,983   5,574,998,100     1.015x better


The ones which are worse are due to either (a) random fluctuations, or (b)
setting dense array length and capacity are slightly more expensive now.
Attachment #442321 - Flags: review?(brendan)
Attachment #442321 - Flags: feedback?(gal)
Gal, I'm asking for feedback in case this clashes badly with your array-ng stuff.
Ignore array-ng -- it's going to need at least one major restructuring before it gets to landing, so don't let it slow you down.

I don't think that people need to remember that it is the minimum of cap and len: I think they need to remember what it *means*, which is that slots accesses below that value are valid.
Thinking some more:  why do we need two checks?  It's obvious that we need the index < capacity check, otherwise we could access past the end of dslots.  But what about the length < index < capacity case?  Would we get a value that hasn't been correctly initialised?  I looked at the GETELEM code in jsops.cpp but ended up none the wiser.
Comment on attachment 442321 [details] [diff] [review]
patch

>+ *  - The array's length property as a uint32, accessible with
>+ *    getArrayLength(), setDenseArrayLength().

No mention of setSlowArrayLength here?

Nit on voidDenseArrayOnlySlots -- seems like the Only should go before Array.

r=me, great work.

/be
Attachment #442321 - Flags: review?(brendan) → review+
(In reply to comment #4)
> (From update of attachment 442321 [details] [diff] [review])
> >+ *  - The array's length property as a uint32, accessible with
> >+ *    getArrayLength(), setDenseArrayLength().
> 
> No mention of setSlowArrayLength here?

That comment is talking only about dense arrays.

> Nit on voidDenseArrayOnlySlots -- seems like the Only should go before Array.

Pushed with that nit fixed and the totally bogus change to jsnum.cpp removed -- not sure how that got in there.

http://hg.mozilla.org/tracemonkey/rev/990192b0e052
http://hg.mozilla.org/mozilla-central/rev/990192b0e052
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Attachment #442321 - Flags: feedback?(gal)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: