nanojit: do implicit constant propagation after guards

RESOLVED FIXED

Status

Core Graveyard
Nanojit
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(2 attachments)

(Assignee)

Description

8 years ago
Created attachment 478972 [details] [diff] [review]
NJ patch (against TM 54567:5bd0b374d87b)

Once you get past a guard such as "xt cmp" you know that 'cmp' must be
false.  Likewise, if you get past "xf cmp2" you know that 'cmp2' must be
true.

For a bigger example, consider the JS code:

    a[i] += 3;

The relevant LIR looks like this (with the getelem and setelem annotated):

    #------------------------------------------------------------------------
    # getelem
    #------------------------------------------------------------------------
    ldi5 = ldi.o $stack5[4]
    clasp = immi 0x2ee060
    guard(class is Array) = eqi ldi5, clasp/*0x2ee060*/
    xf4: xf guard(class is Array) -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=002)
    capacity = ldi.o $stack5[40]
    ltui1 = ltui ldi1, capacity
+   xf3: xf ltui1 -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=003)
    dslots = ldi.o $stack5[24]
    immi5 = immi 3
    lshi2 = lshi ldi1, immi5/*3*/
    addi2 = addi dslots, lshi2
    ldi4 = ldi.o addi2[4]
    JSVAL_TAG_INT32 = immi 0xffff0001
    eqi4 = eqi ldi4, JSVAL_TAG_INT32/*0xffff0001*/
    xf2: xf eqi4 -> pc=0x613532 imacpc=0x0 sp+32 rp+0 (GuardID=004)
    ldi3 = ldi.o addi2[0]
    sti.s sp[16] = ldi3
    #--------------------------------------------------------------------
    sti.s sp[24] = immi5/*3*/
    addxovi2 = addxovi ldi3, immi5/*3*/ -> pc=0x613535 imacpc=0x0 sp+32 rp+0 (GuardID=005)
    sti.s sp[16] = addxovi2
    #------------------------------------------------------------------------
    # setelem
    #------------------------------------------------------------------------
*   jt ltui1 -> label3
*   js_EnsureDenseArrayCapacity1 = calli.all #js_EnsureDenseArrayCapacity ( cx $stack5 ldi1 )
*   eqi3 = eqi js_EnsureDenseArrayCapacity1, strict/*0*/
*   xt2: xt eqi3 -> pc=0x613536 imacpc=0x0 sp+24 rp+0 (GuardID=007)
*   label3:
**  dslots2 = ldi.o $stack5[24]
**  immi4 = immi 3
**  lshi1 = lshi ldi1, immi4/*3*/
**  addi1 = addi dslots2, lshi1
    JSVAL_TAG_MAGIC = immi 0xffff0004
**  ldi2 = ldi.o addi1[4]
    eqi2 = eqi ldi2, JSVAL_TAG_MAGIC/*0xffff0004*/
    jf eqi2 -> label2
    hasNoIndexedProperties = calli.all #js_Array_dense_setelem_hole ( cx $stack5 ldi1 )
    immi3 = immi 0
    eqi1 = eqi hasNoIndexedProperties, immi3/*0*/
    xt1: xt eqi1 -> pc=0x613536 imacpc=0x0 sp+24 rp+0 (GuardID=008)
    label2:
    JSVAL_TAG_INT32_2 = immi 0xffff0001
    sti.o addi1[4] = JSVAL_TAG_INT32_2/*0xffff0001*/
    sti.o addi1[0] = addxovi2
    #------------------------------------------------------------------------

We know that after "xf ltui1" (marked with '+') that 'ltui1' is true (else
we would have exited).  This allows us to determine that "jt ltui1" will
always be taken, and so we can get rid of all the lines marked with '*'.
Because this gets rid of the js_EnsureDenseArrayCapacity() call, which was
causing some aliasing, CSE can do more and gets rid of the lines
marked '**'.

This pattern shows up frequently in code that interleaves array gets and
sets.

(Furthermore, once those are gone we can see that ldi4 gets compared against
JSVAL_TAG_INT32 and then later against JSVAL_TAG_MAGIC, and that the latter
cannot succeed, ie. the hole check can be removed.  But that's fodder for a
follow-up bug.)

This patch implements this constant-propagation-after-guards.  It does it in
the CseFilter, which seems the best place as that's when repeated
comparisons (which are what can be determined as constant after guards) are
detected.  The clearing of the table holding the guard info follows the same
pattern as the clearing of the CSE tables.

Instruction counts for Sunspider and Kraken (V8's results aren't much changed):

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|    99.984   100.164 (0.998x) |    46.959    46.960 (------) | 3d-cube
|    47.143    47.152 (------) |    25.478    25.478 (------) | 3d-morph
|   106.615   106.640 (------) |    43.142    43.139 (------) | 3d-raytrace
|    42.552    42.556 (------) |    14.534    14.534 (------) | access-binary-
|   111.993   110.415 (1.014x) |    92.709    91.145 (1.017x) | access-fannkuc
|    38.230    38.277 (0.999x) |    17.185    17.186 (------) | access-nbody
|    48.448    48.452 (------) |    28.184    28.184 (------) | access-nsieve
|    16.410    16.412 (------) |     3.250     3.250 (------) | bitops-3bit-bi
|    45.795    45.798 (------) |    32.523    32.523 (------) | bitops-bits-in
|    24.833    24.833 (------) |    12.020    12.020 (------) | bitops-bitwise
|    52.266    48.980 (1.067x) |    37.967    34.686 (1.095x) | bitops-nsieve-
|    30.212    30.212 (------) |    17.050    17.050 (------) | controlflow-re
|    92.342    92.005 (1.004x) |    30.433    29.963 (1.016x) | crypto-aes
|    41.186    41.312 (0.997x) |     4.782     4.782 (------) | crypto-md5
|    28.889    28.885 (------) |     6.422     6.422 (------) | crypto-sha1
|    77.692    77.701 (------) |    21.841    21.841 (------) | date-format-to
|    80.715    80.866 (0.998x) |     9.730     9.730 (------) | date-format-xp
|    53.899    53.903 (------) |    31.069    31.070 (------) | math-cordic
|    29.318    29.321 (------) |     6.173     6.173 (------) | math-partial-s
|    31.074    31.105 (0.999x) |    13.412    13.413 (------) | math-spectral-
|    58.342    58.346 (------) |    34.589    34.589 (------) | regexp-dna
|    39.969    39.977 (------) |     9.412     9.412 (------) | string-base64
|    94.827    94.840 (------) |    24.564    24.564 (------) | string-fasta
|   121.170   121.184 (------) |    17.288    17.288 (------) | string-tagclou
|   173.959   173.976 (------) |    21.394    21.394 (------) | string-unpack-
|    52.211    52.219 (------) |     8.562     8.562 (------) | string-validat
-------
|  1640.087  1635.543 (1.003x) |   610.686   605.368 (1.009x) | all

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | on-trace (may overestimate)  |
---------------------------------------------------------------
|  3316.501  3316.590 (------) |  3023.065  3023.066 (------) | ai-astar
|  3108.953  2990.084 (1.040x) |  1766.336  1647.540 (1.072x) | audio-beat-det
|  1527.903  1527.938 (------) |  1344.244  1344.244 (------) | audio-dft
|  3054.483  2934.490 (1.041x) |  1743.453  1623.644 (1.074x) | audio-fft
|  2794.566  2774.118 (1.007x) |  1856.143  1835.663 (1.011x) | audio-oscillat
|  7825.059  7825.088 (------) |  5634.200  5634.200 (------) | imaging-gaussi
|  3464.357  3462.532 (1.001x) |   892.333   890.520 (1.002x) | imaging-darkro
|  6820.828  6798.927 (1.003x) |  4739.454  4717.559 (1.005x) | imaging-desatu
|   707.028   707.028 (------) |    10.676    10.676 (------) | json-parse-fin
|   515.755   515.756 (------) |     6.323     6.323 (------) | json-stringify
|  1550.083  1550.791 (------) |   738.006   738.005 (------) | stanford-crypt
|   833.973   833.878 (------) |   374.553   374.466 (------) | stanford-crypt
|  1913.334  1913.157 (------) |  1171.623  1171.362 (------) | stanford-crypt
|   572.239   570.295 (1.003x) |   233.664   232.360 (1.006x) | stanford-crypt
-------
| 38005.068 37720.679 (1.008x) | 23534.080 23249.634 (1.012x) | all

The Kraken results measure the reading of the data file as well, so the
improvement is under-exaggerated.
Attachment #478972 - Flags: review?(edwsmith)
(Assignee)

Comment 1

8 years ago
Created attachment 478973 [details] [diff] [review]
TM patch (against TM 54567:5bd0b374d87b)

TM part.
Attachment #478973 - Flags: review?(gal)
(Assignee)

Updated

8 years ago
Blocks: 600143

Comment 2

8 years ago
Comment on attachment 478972 [details] [diff] [review]
NJ patch (against TM 54567:5bd0b374d87b)

Ah, the benefits of linear traces!  This looks correct and elegant to boot.  I don't think the empty hashmap will impact TR negatively.

I bet this could be extended to work for LIR_jt/jf as well, although from looking at lots of code, it would only apply over small regions (until the next LIR_label clears CSE state).
Attachment #478972 - Flags: review?(edwsmith) → review+
(Assignee)

Comment 3

8 years ago
(In reply to comment #2)
> Comment on attachment 478972 [details] [diff] [review]
> NJ patch (against TM 54567:5bd0b374d87b)
> 
> Ah, the benefits of linear traces!  This looks correct and elegant to boot.  I
> don't think the empty hashmap will impact TR negatively.

Yeah... my plans for bug 600143 may impact TR more, in which case I will add a boolean parameter to CseFilter to indicate if it should do this guard stuff.

> I bet this could be extended to work for LIR_jt/jf as well, although from
> looking at lots of code, it would only apply over small regions (until the next
> LIR_label clears CSE state).

Yes... that case is definitely of less interest to TM.

Updated

8 years ago
Attachment #478973 - Flags: review?(gal) → review+
(Assignee)

Comment 5

8 years ago
I had to back this out, it was causing test failures -- some insBranch() calls were returning NULL due to the increased amount of optimization, but we weren't checking for NULL before calling setTarget() on the result.  See bug 559268 comment 1.
(Assignee)

Updated

8 years ago
Depends on: 559268
(Assignee)

Updated

8 years ago
Depends on: 600489
(Assignee)

Updated

8 years ago
No longer depends on: 559268

Comment 6

8 years ago
Curious if we did a similar optimization for conditional branches, i.e. on the fall-thru case if there would be any gain.
(Assignee)

Comment 7

8 years ago
(In reply to comment #6)
> Curious if we did a similar optimization for conditional branches, i.e. on the
> fall-thru case if there would be any gain.

I know it won't help TM because our conditional branches only cover short distances.  In comparison, guards affect long stretches of code (everything that follows).

Comment 8

8 years ago
TR
original patch: http://hg.mozilla.org/tamarin-redux/rev/c04594f235d0
backout patch: http://hg.mozilla.org/tamarin-redux/rev/230f6a2181c2

I'm not marking this fixed-in-tamarin yet, because of the backout patch.
(Assignee)

Updated

8 years ago
Depends on: 600779, 601009
(Assignee)

Updated

8 years ago
Depends on: 604297
(Assignee)

Comment 9

8 years ago
I didn't end up needing the TM patch on the 2nd landing, it has been subsumed by other patches.

http://hg.mozilla.org/projects/nanojit-central/rev/7ed1632ff307
http://hg.mozilla.org/tracemonkey/rev/5feb11557dae

Comment 10

8 years ago
http://hg.mozilla.org/mozilla-central/rev/5feb11557dae
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Depends on: 607856

Comment 11

8 years ago
http://hg.mozilla.org/tamarin-redux/rev/468941735974
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Depends on: 608646
Component: Nanojit → Nanojit
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.