Closed
Bug 562571
Opened 14 years ago
Closed 14 years ago
TM: don't have two bounds checks for array getelem
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(1 file)
30.22 KB,
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
This is a follow-on from bug 552837. This patch adds a MINLENCAP slot to dense arrays, which allows the "index < length" and "index < capacity" checks for getelem to be replaced with a single "index < minLenCap" check. - I split setArrayLength() into setDenseArrayLengt() and setSlowArrayLength(). This was to avoid a "isDenseArray()" test in setArrayLength() that was hit a lot. - There's a function isDenseArrayMinLenCapOk() which checks that minLenCap == min(length, capacity) whenever we get length or cap. This is a great sanity check. - I added JSCLASS_CONSTRUCT_PROTOTYPE to js_ArrayClass, this was necessary so that JSSLOT_ARRAY_LENGTH was set before the first call to isDenseArrayMinLenCapOk() occurred. - I changed voidDenseArrayCount() to voidDenseArrayOnlySlots(), which indicates better what it does. - I changed on-trace getelem to only check minLenCap, the whole point of the exercise. Note that in the "not idx < min(length, capacity)" we no longer have to worry about the case where index==length. - I didn't change the GETELEM op in the interpreter, it didn't seem worth it, but I can do it if someone thinks it's worthwhile. The biggest controversy about this patch will probably be the "minLenCap" name. I like it because it's very descriptive -- if we invent a new term like "limit" then everyone will just have to remember that limit==min(length,capacity). I'd be happy with other variants of minLenCap, eg. minLengthCapacity or whatever, but "len" and "cap" do appear in various places (eg. "newlen", "newcap"). Here's the generated code diff for a typical array access: - ldi20 = ldi.o $stack2_3[28] + ldi14 = ldi.o $stack2_3[28] ...... mov ecx,28(edx) - ldi22 = ldi.o $stack2_3[16] - ...... mov eax,16(edx) - ltui8 = ltui ldi1, ldi22 - xf14: xf ltui8 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003) - ...... cmp ebx,eax - ...... jnb ...... - ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf) - ......: - ## merging registers (intersect) with existing edge - ...... mov ecx,-16(ebp) <= restore state - ...... mov ebx,-12(ebp) <= restore ebx - ...... mov esi,-8(ebp) <= restore esi - ...... mov edi,-4(ebp) <= restore edi - ...... mov eax,...... - ...... mov esp,ebp - ......: - ...... jmp ...... - ----------------------------------- ## END exit block ...... - eqi14 = eqi ldi20, immi2/*0*/ - xt10: xt eqi14 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=004) - ...... test ecx,ecx - ...... je ...... - ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf) - ......: - ## merging registers (intersect) with existing edge - ...... mov ecx,-16(ebp) <= restore state - ...... mov ebx,-12(ebp) <= restore ebx - ...... mov esi,-8(ebp) <= restore esi - ...... mov edi,-4(ebp) <= restore edi - ...... mov eax,...... - ...... mov esp,ebp - ......: - ...... jmp ...... - ----------------------------------- ## END exit block ...... - ldi21 = ldi.o ldi20[-4] - ...... mov eax,-4(ecx) - ltui7 = ltui ldi1, ldi21 - xf13: xf ltui7 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=005) + minLenCap = ldi.o $stack2_3[24] + ...... mov eax,24(edx) + ltui4 = ltui ldi1, minLenCap + xf10: xf ltui4 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003) ...... cmp ebx,eax ...... jnb ...... As well as the performance benefits this makes the LIR for array code much easier to read. SunSpider performance is about 1--1.5% better; the most consistent improvers are 3d-raytrace (4--5% better) and crypto-{sha1,aes} (3% better). V8 is barely changed. Instruction counts from Cachegrind: ------------------------------------------------------------------ instructions executed (nb: "on-trace" may overestimate slightly) ------------------------------------------------------------------ 3d-cube: total 128,160,941 126,135,726 1.016x better on-trace 26,754,012 25,350,918 1.055x better 3d-morph: total 74,088,301 74,385,789 *1.004x worse* on-trace 26,384,915 26,384,230 -- 3d-raytrace: total 146,285,201 141,266,931 1.036x better on-trace 27,644,337 26,154,392 1.057x better access-binary-trees: total 81,945,453 81,948,256 -- on-trace 18,746,077 18,746,091 -- access-fannkuch: total 249,655,251 241,919,269 1.032x better on-trace 124,802,969 117,430,433 1.063x better access-nbody: total 84,972,662 84,430,151 1.006x better on-trace 31,472,323 31,020,142 1.015x better access-nsieve: total 62,077,684 61,367,794 1.012x better on-trace 29,971,503 29,271,346 1.024x better bitops-3bit-bits-in-byte: total 19,618,677 19,620,568 -- on-trace 7,086,136 7,086,141 -- bitops-bits-in-byte: total 47,257,301 47,259,144 -- on-trace 34,580,793 34,580,798 -- bitops-bitwise-and: total 24,307,385 24,309,378 -- on-trace 12,019,326 12,019,331 -- bitops-nsieve-bits: total 93,413,931 90,512,810 1.032x better on-trace 50,621,770 47,730,405 1.061x better controlflow-recursive: total 43,963,287 43,965,009 -- on-trace 28,565,053 28,565,058 -- crypto-md5: total 48,713,639 48,540,006 1.004x better on-trace 5,908,817 5,831,001 1.013x better crypto-sha1: total 34,176,458 33,734,164 1.013x better on-trace 8,135,265 7,854,316 1.036x better date-format-tofte: total 148,708,101 148,789,672 *1.001x worse* on-trace 11,887,981 11,882,053 -- date-format-xparb: total 124,302,025 124,210,805 1.001x better on-trace 11,967,892 11,923,521 1.004x better math-cordic: total 57,014,197 55,493,882 1.027x better on-trace 32,194,286 30,693,935 1.049x better math-partial-sums: total 38,518,090 38,519,471 -- on-trace 6,561,457 6,561,460 -- math-spectral-norm: total 30,368,921 29,817,602 1.018x better on-trace 12,262,111 11,770,822 1.042x better regexp-dna: total 118,394,430 118,397,026 -- on-trace 75,250,329 75,250,338 -- string-base64: total 54,888,741 54,713,331 1.003x better on-trace 8,452,906 8,288,783 1.020x better string-fasta: total 198,961,457 198,962,830 -- on-trace 31,559,026 31,559,039 -- string-tagcloud: total 286,758,374 286,837,572 -- on-trace 7,461,579 7,434,727 1.004x better string-unpack-code: total 257,417,869 257,161,931 1.001x better on-trace 5,425,999 5,374,035 1.010x better string-validate-input: total 83,281,070 83,577,783 *1.004x worse* on-trace 8,165,827 8,045,488 1.015x better ------- all: total 2,537,249,446 2,515,876,900 1.008x better on-trace 643,882,689 626,808,803 1.027x better ------------------------------------------------------------------ instructions executed (nb: "on-trace" may overestimate slightly) ------------------------------------------------------------------ v8-crypto: total 2,515,615,331 2,427,104,735 1.036x better on-trace 1,568,701,072 1,502,006,825 1.044x better v8-deltablue: total 2,566,464,360 2,559,627,555 1.003x better on-trace 1,274,484,845 1,262,760,558 1.009x better v8-earley-boyer: total 7,246,502,527 7,230,838,736 1.002x better on-trace 98,106,127 97,836,836 1.003x better v8-raytrace: total 2,345,453,598 2,345,071,470 -- on-trace 101,109,538 100,819,646 1.003x better v8-regexp: total 3,961,519,742 3,961,588,339 -- on-trace 288,227,091 288,229,538 -- v8-richards: total 2,540,833,609 2,533,293,083 1.003x better on-trace 2,156,391,594 2,148,948,873 1.003x better v8-splay: total 3,926,401,052 3,929,236,483 *1.001x worse* on-trace 172,717,716 174,395,824 *1.010x worse* ------- all: total 25,102,790,219 24,986,760,401 1.005x better on-trace 5,659,737,983 5,574,998,100 1.015x better The ones which are worse are due to either (a) random fluctuations, or (b) setting dense array length and capacity are slightly more expensive now.
Attachment #442321 -
Flags: review?(brendan)
Attachment #442321 -
Flags: feedback?(gal)
Assignee | ||
Comment 1•14 years ago
|
||
Gal, I'm asking for feedback in case this clashes badly with your array-ng stuff.
Ignore array-ng -- it's going to need at least one major restructuring before it gets to landing, so don't let it slow you down. I don't think that people need to remember that it is the minimum of cap and len: I think they need to remember what it *means*, which is that slots accesses below that value are valid.
Assignee | ||
Comment 3•14 years ago
|
||
Thinking some more: why do we need two checks? It's obvious that we need the index < capacity check, otherwise we could access past the end of dslots. But what about the length < index < capacity case? Would we get a value that hasn't been correctly initialised? I looked at the GETELEM code in jsops.cpp but ended up none the wiser.
Comment 4•14 years ago
|
||
Comment on attachment 442321 [details] [diff] [review] patch >+ * - The array's length property as a uint32, accessible with >+ * getArrayLength(), setDenseArrayLength(). No mention of setSlowArrayLength here? Nit on voidDenseArrayOnlySlots -- seems like the Only should go before Array. r=me, great work. /be
Attachment #442321 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #4) > (From update of attachment 442321 [details] [diff] [review]) > >+ * - The array's length property as a uint32, accessible with > >+ * getArrayLength(), setDenseArrayLength(). > > No mention of setSlowArrayLength here? That comment is talking only about dense arrays. > Nit on voidDenseArrayOnlySlots -- seems like the Only should go before Array. Pushed with that nit fixed and the totally bogus change to jsnum.cpp removed -- not sure how that got in there. http://hg.mozilla.org/tracemonkey/rev/990192b0e052
Comment 6•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/990192b0e052
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•14 years ago
|
Attachment #442321 -
Flags: feedback?(gal)
You need to log in
before you can comment on or make changes to this bug.
Description
•