Closed
Bug 600127
Opened 14 years ago
Closed 14 years ago
nanojit: do implicit constant propagation after guards
Categories
(Core Graveyard :: Nanojit, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)
Attachments
(2 files)
4.44 KB,
patch
|
edwsmith
:
review+
|
Details | Diff | Splinter Review |
3.28 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
Once you get past a guard such as "xt cmp" you know that 'cmp' must be
false. Likewise, if you get past "xf cmp2" you know that 'cmp2' must be
true.
For a bigger example, consider the JS code:
a[i] += 3;
The relevant LIR looks like this (with the getelem and setelem annotated):
#------------------------------------------------------------------------
# getelem
#------------------------------------------------------------------------
ldi5 = ldi.o $stack5[4]
clasp = immi 0x2ee060
guard(class is Array) = eqi ldi5, clasp/*0x2ee060*/
xf4: xf guard(class is Array) -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=002)
capacity = ldi.o $stack5[40]
ltui1 = ltui ldi1, capacity
+ xf3: xf ltui1 -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=003)
dslots = ldi.o $stack5[24]
immi5 = immi 3
lshi2 = lshi ldi1, immi5/*3*/
addi2 = addi dslots, lshi2
ldi4 = ldi.o addi2[4]
JSVAL_TAG_INT32 = immi 0xffff0001
eqi4 = eqi ldi4, JSVAL_TAG_INT32/*0xffff0001*/
xf2: xf eqi4 -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=004)
ldi3 = ldi.o addi2[0]
sti.s sp[16] = ldi3
#--------------------------------------------------------------------
sti.s sp[24] = immi5/*3*/
addxovi2 = addxovi ldi3, immi5/*3*/ -> pc=0x613535 imacpc=0x0 sp+32 rp+0 (GuardID=005)
sti.s sp[16] = addxovi2
#------------------------------------------------------------------------
# setelem
#------------------------------------------------------------------------
* jt ltui1 -> label3
* js_EnsureDenseArrayCapacity1 = calli.all #js_EnsureDenseArrayCapacity ( cx $stack5 ldi1 )
* eqi3 = eqi js_EnsureDenseArrayCapacity1, strict/*0*/
* xt2: xt eqi3 -> pc=0x613536 imacpc=0x0 sp+24 rp+0 (GuardID=007)
* label3:
** dslots2 = ldi.o $stack5[24]
** immi4 = immi 3
** lshi1 = lshi ldi1, immi4/*3*/
** addi1 = addi dslots2, lshi1
JSVAL_TAG_MAGIC = immi 0xffff0004
** ldi2 = ldi.o addi1[4]
eqi2 = eqi ldi2, JSVAL_TAG_MAGIC/*0xffff0004*/
jf eqi2 -> label2
hasNoIndexedProperties = calli.all #js_Array_dense_setelem_hole ( cx $stack5 ldi1 )
immi3 = immi 0
eqi1 = eqi hasNoIndexedProperties, immi3/*0*/
xt1: xt eqi1 -> pc=0x613536 imacpc=0x0 sp+24 rp+0 (GuardID=008)
label2:
JSVAL_TAG_INT32_2 = immi 0xffff0001
sti.o addi1[4] = JSVAL_TAG_INT32_2/*0xffff0001*/
sti.o addi1[0] = addxovi2
#------------------------------------------------------------------------
We know that after "xf ltui1" (marked with '+') that 'ltui1' is true (else
we would have exited). This allows us to determine that "jt ltui1" will
always be taken, and so we can get rid of all the lines marked with '*'.
Because this gets rid of the js_EnsureDenseArrayCapacity() call, which was
causing some aliasing, CSE can do more and gets rid of the lines
marked '**'.
This pattern shows up frequently in code that interleaves array gets and
sets.
(Furthermore, once those are gone we can see that ldi4 gets compared against
JSVAL_TAG_INT32 and then later against JSVAL_TAG_MAGIC, and that the latter
cannot succeed, ie. the hole check can be removed. But that's fodder for a
follow-up bug.)
This patch implements this constant-propagation-after-guards. It does it in
the CseFilter, which seems the best place as that's when repeated
comparisons (which are what can be determined as constant after guards) are
detected. The clearing of the table holding the guard info follows the same
pattern as the clearing of the CSE tables.
Instruction counts for Sunspider and Kraken (V8's results aren't much changed):
---------------------------------------------------------------
| millions of instructions executed |
| total | on-trace (may overestimate) |
---------------------------------------------------------------
| 99.984 100.164 (0.998x) | 46.959 46.960 (------) | 3d-cube
| 47.143 47.152 (------) | 25.478 25.478 (------) | 3d-morph
| 106.615 106.640 (------) | 43.142 43.139 (------) | 3d-raytrace
| 42.552 42.556 (------) | 14.534 14.534 (------) | access-binary-
| 111.993 110.415 (1.014x) | 92.709 91.145 (1.017x) | access-fannkuc
| 38.230 38.277 (0.999x) | 17.185 17.186 (------) | access-nbody
| 48.448 48.452 (------) | 28.184 28.184 (------) | access-nsieve
| 16.410 16.412 (------) | 3.250 3.250 (------) | bitops-3bit-bi
| 45.795 45.798 (------) | 32.523 32.523 (------) | bitops-bits-in
| 24.833 24.833 (------) | 12.020 12.020 (------) | bitops-bitwise
| 52.266 48.980 (1.067x) | 37.967 34.686 (1.095x) | bitops-nsieve-
| 30.212 30.212 (------) | 17.050 17.050 (------) | controlflow-re
| 92.342 92.005 (1.004x) | 30.433 29.963 (1.016x) | crypto-aes
| 41.186 41.312 (0.997x) | 4.782 4.782 (------) | crypto-md5
| 28.889 28.885 (------) | 6.422 6.422 (------) | crypto-sha1
| 77.692 77.701 (------) | 21.841 21.841 (------) | date-format-to
| 80.715 80.866 (0.998x) | 9.730 9.730 (------) | date-format-xp
| 53.899 53.903 (------) | 31.069 31.070 (------) | math-cordic
| 29.318 29.321 (------) | 6.173 6.173 (------) | math-partial-s
| 31.074 31.105 (0.999x) | 13.412 13.413 (------) | math-spectral-
| 58.342 58.346 (------) | 34.589 34.589 (------) | regexp-dna
| 39.969 39.977 (------) | 9.412 9.412 (------) | string-base64
| 94.827 94.840 (------) | 24.564 24.564 (------) | string-fasta
| 121.170 121.184 (------) | 17.288 17.288 (------) | string-tagclou
| 173.959 173.976 (------) | 21.394 21.394 (------) | string-unpack-
| 52.211 52.219 (------) | 8.562 8.562 (------) | string-validat
-------
| 1640.087 1635.543 (1.003x) | 610.686 605.368 (1.009x) | all
---------------------------------------------------------------
| millions of instructions executed |
| total | on-trace (may overestimate) |
---------------------------------------------------------------
| 3316.501 3316.590 (------) | 3023.065 3023.066 (------) | ai-astar
| 3108.953 2990.084 (1.040x) | 1766.336 1647.540 (1.072x) | audio-beat-det
| 1527.903 1527.938 (------) | 1344.244 1344.244 (------) | audio-dft
| 3054.483 2934.490 (1.041x) | 1743.453 1623.644 (1.074x) | audio-fft
| 2794.566 2774.118 (1.007x) | 1856.143 1835.663 (1.011x) | audio-oscillat
| 7825.059 7825.088 (------) | 5634.200 5634.200 (------) | imaging-gaussi
| 3464.357 3462.532 (1.001x) | 892.333 890.520 (1.002x) | imaging-darkro
| 6820.828 6798.927 (1.003x) | 4739.454 4717.559 (1.005x) | imaging-desatu
| 707.028 707.028 (------) | 10.676 10.676 (------) | json-parse-fin
| 515.755 515.756 (------) | 6.323 6.323 (------) | json-stringify
| 1550.083 1550.791 (------) | 738.006 738.005 (------) | stanford-crypt
| 833.973 833.878 (------) | 374.553 374.466 (------) | stanford-crypt
| 1913.334 1913.157 (------) | 1171.623 1171.362 (------) | stanford-crypt
| 572.239 570.295 (1.003x) | 233.664 232.360 (1.006x) | stanford-crypt
-------
| 38005.068 37720.679 (1.008x) | 23534.080 23249.634 (1.012x) | all
The Kraken results measure the reading of the data file as well, so the
improvement is under-exaggerated.
Attachment #478972 -
Flags: review?(edwsmith)
Comment 2•14 years ago
|
||
Comment on attachment 478972 [details] [diff] [review]
NJ patch (against TM 54567:5bd0b374d87b)
Ah, the benefits of linear traces! This looks correct and elegant to boot. I don't think the empty hashmap will impact TR negatively.
I bet this could be extended to work for LIR_jt/jf as well, although from looking at lots of code, it would only apply over small regions (until the next LIR_label clears CSE state).
Attachment #478972 -
Flags: review?(edwsmith) → review+
![]() |
Assignee | |
Comment 3•14 years ago
|
||
(In reply to comment #2)
> Comment on attachment 478972 [details] [diff] [review]
> NJ patch (against TM 54567:5bd0b374d87b)
>
> Ah, the benefits of linear traces! This looks correct and elegant to boot. I
> don't think the empty hashmap will impact TR negatively.
Yeah... my plans for bug 600143 may impact TR more, in which case I will add a boolean parameter to CseFilter to indicate if it should do this guard stuff.
> I bet this could be extended to work for LIR_jt/jf as well, although from
> looking at lots of code, it would only apply over small regions (until the next
> LIR_label clears CSE state).
Yes... that case is definitely of less interest to TM.
Updated•14 years ago
|
Attachment #478973 -
Flags: review?(gal) → review+
![]() |
Assignee | |
Comment 4•14 years ago
|
||
http://hg.mozilla.org/projects/nanojit-central/rev/2e21c25656ee
http://hg.mozilla.org/tracemonkey/rev/7e831598f812
http://hg.mozilla.org/tracemonkey/rev/566ddadcedcd
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
![]() |
Assignee | |
Comment 5•14 years ago
|
||
I had to back this out, it was causing test failures -- some insBranch() calls were returning NULL due to the increased amount of optimization, but we weren't checking for NULL before calling setTarget() on the result. See bug 559268 comment 1.
Comment 6•14 years ago
|
||
Curious if we did a similar optimization for conditional branches, i.e. on the fall-thru case if there would be any gain.
![]() |
Assignee | |
Comment 7•14 years ago
|
||
(In reply to comment #6)
> Curious if we did a similar optimization for conditional branches, i.e. on the
> fall-thru case if there would be any gain.
I know it won't help TM because our conditional branches only cover short distances. In comparison, guards affect long stretches of code (everything that follows).
Comment 8•14 years ago
|
||
TR
original patch: http://hg.mozilla.org/tamarin-redux/rev/c04594f235d0
backout patch: http://hg.mozilla.org/tamarin-redux/rev/230f6a2181c2
I'm not marking this fixed-in-tamarin yet, because of the backout patch.
![]() |
Assignee | |
Updated•14 years ago
|
![]() |
Assignee | |
Comment 9•14 years ago
|
||
I didn't end up needing the TM patch on the 2nd landing, it has been subsumed by other patches.
http://hg.mozilla.org/projects/nanojit-central/rev/7ed1632ff307
http://hg.mozilla.org/tracemonkey/rev/5feb11557dae
Comment 10•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 11•14 years ago
|
||
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Updated•11 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•