IonMonkey: add alignment, or partial constant, propagation to the range analysis

(Reporter: dougc, Assigned: dougc)

(1 attachment, 1 obsolete attachment)
Alignment type information is needed for the elimination of
unnecessary asm.js-style pointer alignment masks.

A representation that notes constant bit ranges might be
better than simple alignment information, and would leave
room for improvement.

Perhaps the stride of loop counters could be noted in the
alignment information.

The information might also help in adding support for
hoisting masking operations.

A start has been made on this support. It might be possible
to break up the work if resources are available, or land
partial support and improve it as necessary.
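For illustration, here is a hedged sketch of the asm.js-style pattern the comment above refers to; the buffer size, constants, and function names are hypothetical. The base pointer is masked so every access is provably in bounds and aligned; if the analysis can prove which bits of the index are already zero, the mask (and, with bounds info, the bounds check) becomes removable.

```javascript
// Hypothetical asm.js-style alignment mask. `base & 0xff00` pins the base to
// a 256-byte-aligned block below 64 KiB, so every access in the loop is
// provably in bounds and 4-byte aligned.
const heap = new Int32Array(0x4000); // 64 KiB backing store (hypothetical size)
function fill(base) {
  base = base & 0xff00;              // align and bound the base pointer
  for (let i = 0; i < 64; i++) {
    // (base + (i << 2)) has its low two bits known zero, so >> 2 is exact
    // and the resulting index is provably < heap.length.
    heap[(base + (i << 2)) >> 2] = i;
  }
}
```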
(In reply to Douglas Crosher [:dougc] from comment #0)
> Perhaps the stride of loop counters could be noted in the
> alignment information.

The LinearSum should already provide such info.  It gives step information.

> A start has been made on this support. It might be possible
> to break up the work if resources are available, or land
> partial support and improve it as necessary.

This is a good idea. I'll suggest adding another field alongside decimal_, which represents the number of zero bits in the low part of the number.  Or this could even be mixed with the decimal part: if we can notice that we divide by two, we can have an alignment of at least -1.

I don't know if this might help, but by default the element vector and all allocations are aligned on 8 bytes in SpiderMonkey.  I don't know if there is anything special about small typed arrays.
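A minimal sketch of the alignment-only representation suggested above (the helper names are hypothetical, not from the patch): alignment is tracked as a count of known-zero low bits, addition keeps the weaker of the two alignments, and a right shift (division by two) costs one bit of alignment, floored at zero.

```javascript
// Alignment tracked as a count of known-zero low bits (hypothetical helpers).
// x aligned to 2^a plus y aligned to 2^b is aligned to at least 2^min(a, b).
function addAlignment(a, b) {
  return Math.min(a, b);
}
// Shifting right by s (e.g. dividing by two) loses s bits of alignment.
function shiftRightAlignment(a, s) {
  return Math.max(a - s, 0);
}
```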
WIP to add partial constant support to range analysis.

This also includes MBitAnd::foldUnnecessaryBitAnd(), which uses this information to eliminate redundant bitwise-and operations, such as those generated in typed array accesses using the pattern b[i&m>>n].

It moves some of the op folding into the range analysis phase, before the beta nodes are removed (see RangeAnalysis::foldOps()), and this also folds away unnecessary Op_BoundsCheck.

This approach to optimizing typed array accesses is much more complex, but it might help optimize other code too and works for a much wider range of patterns.
Attachment #8521411 - Flags: feedback?(jdemooij)
Comment on attachment 8521411 [details] [diff] [review]
Add alignment, or partial constant, propagation to the range analysis

Thanks Douglas. I'm not very familiar with range analysis so I'll forward the feedback request to our range analysis gurus :)
Attachment #8521411 - Flags: feedback?(sunfish)
Attachment #8521411 - Flags: feedback?(nicolas.b.pierron)
Attachment #8521411 - Flags: feedback?(jdemooij)
Comment on attachment 8521411 [details] [diff] [review]
Add alignment, or partial constant, propagation to the range analysis

Review of attachment 8521411 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this looks like a promising start on an important feature!

::: js/src/jit/RangeAnalysis.cpp
@@ +1441,5 @@
> +    for (int i = 0; i < 32; i++) {
> +        if (value & (1 << i))
> +            return i;
> +    }
> +    return 32;

Please use CountTrailingZeroes32.
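A sketch of the equivalence the reviewer points out, expressed here with Math.clz32 rather than the mfbt CountTrailingZeroes32 helper: the quoted loop computes the index of the lowest set bit, with 32 for a zero input.

```javascript
// Count trailing zeroes of a 32-bit value: isolate the lowest set bit with
// value & -value, then derive its position from the leading-zero count.
function countTrailingZeroes32(value) {
  value = value | 0;
  if (value === 0)
    return 32;
  return 31 - Math.clz32(value & -value);
}
```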

@@ +1856,2 @@
>        case Scalar::Uint16:
> +        return Range::NewUInt32Range(alloc, 0, UINT16_MAX, 0, 0xffff0000);

It looks tedious to have to special-case these partial-constant bits everywhere, when what we mainly have are known ranges. This patch includes code to automatically refine partial-constant bits using lower/upper information. Could many of these cases be computed automatically?

@@ +3285,5 @@
> +                    uint32_t maximum = ins->maximum();
> +                    if (indexRange) {
> +                        int32_t lower = indexRange->lower();
> +                        int32_t upper = indexRange->upper();
> +                        if (lower + minimum >= 0 && upper + maximum < len) {

Since minimum is unsigned, the + will be unsigned, so these comparisons will be unsigned, which is not what you want here.

::: js/src/jit/RangeAnalysis.h
@@ +237,5 @@
>          // implementation in some places.
> +        int32_t lowerLimit = (JSVAL_INT_MIN & ~partialConstantMask_) | partialConstant_;
> +        int32_t upperLimit = (JSVAL_INT_MAX & ~partialConstantMask_) | partialConstant_;
> +        MOZ_ASSERT_IF(!hasInt32LowerBound_, lower_ == lowerLimit);
> +        MOZ_ASSERT_IF(!hasInt32UpperBound_, upper_ == upperLimit);

Much of the existing range analysis code is set up to check hasInt32LowerBound_ and hasInt32UpperBound_, and to skip looking at the lower_ and upper_ values if those flags are false. Consequently, I think it'd be better to make sure that when we have partialConstantMask_ bits set, that we set the hasInt32LowerBound_ and hasInt32UpperBound_ flags to true. This can be handled in the optimize() function, or in the code which computes partialConstantMask_ values.

@@ +290,3 @@
>              hasInt32LowerBound_ = false;
>          } else {
> +            lower_ = (int32_t(x) & ~partialConstantMask_) | partialConstant_;

I believe the optimize() function will handle these cases, so we don't need to do this optimization here.

@@ +303,3 @@
>              hasInt32UpperBound_ = true;
>          } else {
> +            upper_ = (int32_t(x) & ~partialConstantMask_) | partialConstant_;

Ditto.

@@ +389,5 @@
> +                        upper_ = 0;
> +                }
> +            }
> +
> +            MOZ_ASSERT(lower_ <= upper_);

Can we do a full assertInvariants() here?

@@ +409,5 @@
> +                if (upper_ < lower_)
> +                     lower_ = upper_;
> +             }
> +
> +            MOZ_ASSERT(lower_ <= upper_);

We should do a full assertInvariants() here.

@@ +435,5 @@
>              assertInvariants();
>          }
> +
> +        if (!canHaveRoundingErrors())
> +            refinePartialConstantByBounds();

We have a similar situation between lower/upper and max-exponent information, where new information of either kind can imply better information of the other kind. Currently, the rule is that inside the Range class, lower/upper are primary; max-exponent is refined if lower/upper are better, and it is asserted to be not worse, and this assertion has actually been quite useful for turning up unexpected situations. For the places where we get new max-exponent information, we have a standalone function outside the Range class, refineInt32BoundsByExponent, so that we can maintain the invariant within the Range class.

partial-constant bits introduce a third player into this arena, and right now this patch has partial-constant bits refining lower/upper, and lower/upper refining partial-constant bits. Would it make sense to convert this to a similar asymmetric relationship with one direction being asserted?
Attachment #8521411 - Flags: feedback?(sunfish)
Comment on attachment 8521411 [details] [diff] [review]
Add alignment, or partial constant, propagation to the range analysis

Review of attachment 8521411 [details] [diff] [review]:
-----------------------------------------------------------------

This patch computes the partialConstant and the partialConstantMask.  The purpose of these two fields is to track the bits of Int32 numbers which are statically known ahead of time.
I think it is quite complicated to reason about the partialConstantMask, and the partialConstant is slightly redundant with the Range.  This code is doing constant propagation of the known bits.

The intent of this modification is to remove the unnecessary bit mask by keeping a mask of the known bits and by checking that the known bits are already zeroed, in which case the filtering bit-and operator can be removed.

Range analysis is already a complex beast with tons of edge cases (:scary code:).  This code is large and consequently hard to follow; in addition, it is entangled with the bounds of the Range, which adds inter-dependencies and complexity.  An alternative which is less entangled with the rest of the Range information would be easier to follow.

As we are interested in the fact that the partial constant is zero for the filtered bits, I am convinced that the partialConstant can be replaced by a flag which states the status of all the bits masked by the partialConstantMask.  Then, I do not yet see the advantage of having knowledge about the inner bits, while the alignment seems to be the most discriminating factor for the accesses, in which case the partialConstantMask can be replaced by a knownLowBitsExponent_ constant ranging over [0, 32].

::: js/src/jit/RangeAnalysis.cpp
@@ +719,5 @@
>          canBeNegativeZero_ = ExcludesNegativeZero;
>  
> +    if (!canHaveRoundingErrors()) {
> +        partialConstant_ = d;
> +        partialConstantMask_ = -1L;

Any reason to support doubles? Is that only for the UInt32 max?

@@ +769,5 @@
> +        int32_t carryMask = 1;
> +
> +        int32_t i, b;
> +
> +        for (i = 0, b = 1; i < 32; i++, b <<= 1) {

Could we use a conservative equivalent instead, such as:

  // Conservative approximation which keep the knowledge about the low bits.
  addCstMask = (~(lhs->partialConstantMask() + 1) & lhs->partialConstantMask())
             & (~(rhs->partialConstantMask() + 1) & rhs->partialConstantMask());
  addCst = lhs->partialConstant() + rhs->partialConstant();
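A runnable sketch of the conservative combine suggested above (helper names are hypothetical): `~(m + 1) & m` keeps only the contiguous run of known bits starting at bit 0. That is exactly the region where addition stays predictable, because the carry into each known low bit is determined by the known bits below it, while knowledge of isolated higher bits is lost to carries propagating through unknown bits.

```javascript
// Keep only the contiguous run of known bits starting at bit 0; knowledge of
// isolated higher bits is dropped because carries through unknown bits below
// them cannot be tracked.
function lowKnownRun(mask) {
  return ~(mask + 1) & mask;
}
// Conservative known-bits mask for an addition: both inputs must know a low
// run for the sum's low run to be determined.
function addKnownMask(lhsMask, rhsMask) {
  return lowKnownRun(lhsMask) & lowKnownRun(rhsMask);
}
```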

@@ +1150,5 @@
> +    if (c > 0) {
> +        if (partialConstantMask & 0x80000000) {
> +            MOZ_ASSERT((lhs->lower_ >= 0 && lhs->upper_ >= 0) || (lhs->lower_ < 0 && lhs->upper_ < 0));
> +            // If the bounds are the same sign then constant bits are shifted in.
> +            partialConstantMask = (partialConstantMask >> shift) | ((int32_t)0x80000000 >> (shift - 1));

nit: prefer C++ cast, for its unambiguous scoping.

@@ +1452,5 @@
>          double d = value().toNumber();
>          setRange(Range::NewDoubleSingletonRange(alloc, d));
>      } else if (value().isBoolean()) {
>          bool b = value().toBoolean();
> +        setRange(Range::NewInt32Range(alloc, b, b, b, 0xffffffff));

Isn't the mask supposed to be 0xfffffffe ?

@@ +1602,5 @@
>  
>  void
>  MClz::computeRange(TempAllocator &alloc)
>  {
> +    setRange(Range::NewUInt32Range(alloc, 0, 32, 0, 0xfffffc00));

Isn't the mask supposed to be 0xffffffe0 ?
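The two mask questions above can be checked with a small sketch (hedged: this derives the mask purely from an unsigned upper bound, which may not be the convention the patch uses): for a range [0, max], exactly the bits above the most significant bit of max are known zero.

```javascript
// Known-zero high bits implied by an unsigned upper bound: every bit above
// the most significant bit of `max` must be zero.
function knownZeroHighMask(max) {
  const bits = 32 - Math.clz32(max); // bits needed to represent max
  if (bits === 32)
    return 0;                        // no high bits are known
  return (~0 << bits) | 0;
}
```

By this derivation the boolean mask for [0, 1] is indeed 0xfffffffe, as the reviewer suggests; for MClz's [0, 32] it comes out as 0xffffffc0 rather than 0xffffffe0, since the value 32 itself sets bit 5 — though the patch's intended mask convention may differ.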
Attachment #8521411 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8540984 [details] [diff] [review]
Add alignment, or partial constant, propagation to the range analysis

Review of attachment 8540984 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +2651,5 @@
> +            int32_t partialConstant = rhsRange->partialConstant();
> +            int32_t partialConstantMask = rhsRange->partialConstantMask();
> +            if (!(~value & (~partialConstantMask | (partialConstantMask & partialConstant)))) {
> +                return rhs;
> +            }

No braces, also on the other single-liner below.  No need for js:: qualification inside this method, either -- you're in js::jit, so no need for it.  (Unless there's js::jit::Value, but that would be insane.)
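A sketch of the fold condition quoted above, restated as a predicate (names hypothetical): the bitwise-and is a no-op exactly when every bit the constant would clear is already a known-zero bit of the other operand.

```javascript
// `x & value` leaves x unchanged iff every bit cleared by `value` (~value)
// is neither unknown (~knownMask) nor known to be one (knownMask & knownBits).
function bitAndIsRedundant(value, knownMask, knownBits) {
  return (~value & (~knownMask | (knownMask & knownBits))) === 0;
}
```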

::: js/src/jit/RangeAnalysis.cpp
@@ +780,5 @@
> +        int32_t carryMask = 1;
> +
> +        int32_t i, b;
> +
> +        for (i = 0, b = 1; i < 32; i++, b <<= 1) {

Supposing this goes all the way to i == 31, at the end you have b = 2**30 << 1, which is not representable in int32_t, thereby triggering undefined behavior in C++11.

I don't know of a way to concisely write this algorithm to loop through all the bits in a twos-complement int32_t, unfortunately.  Best I can come up with would be

  for (uint32_t i = 0, ub = 1; i < 32; i++, ub <<= 1) {
    int32_t b = mozilla::BitwiseCast<int32_t>(ub);

    ...
  }

but even doing that is at least kind of dodgy.  Maybe sunfish would know of something.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #7)
> Comment on attachment 8540984 [details] [diff] [review]
> Add alignment, or partial constant, propagation to the range analysis

Thank you for the feedback. This patch is still very much a WIP. And thank you to the other reviewers too for their feedback, which has a lot of merit and which I am looking into.

> > +        for (i = 0, b = 1; i < 32; i++, b <<= 1) {
> 
> Supposing this goes all the way to i == 31, at the end you have b = 2**30 <<
> 1, which is not representable in int32_t, thereby triggering undefined
> behavior in C++11.
>
> I don't know of a way to concisely write this algorithm to loop through all
> the bits in a twos-complement int32_t, unfortunately.  Best I can come up
> with would be
> 
>   for (uint32_t i = 0, ub = 1; i < 32; i++, ub <<= 1) {
>     int32_t b = mozilla::BitwiseCast<int32_t>(ub);
> 
>     ...
>   }

Thank you for alerting me to the issue. The operation can be performed on unsigned values to avoid left-shifting a signed value, coercing the inputs and outputs. However, can I assume the representation is two's complement?
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> Comment on attachment 8521411 [details] [diff] [review]
> Add alignment, or partial constant, propagation to the range analysis
> 
> Review of attachment 8521411 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch computes the partialConstant and the partialConstantMask.  The
> purpose of these 2 fields are to compute the bits of Int32 numbers which are
> statically known ahead of time.
> I think this is quite complicated to reason about the partialConstantMask,
> and the partialConstant is slightly redundant with the Range.  This code is
> doing a constant propagation of the known bits.
> 
> The intent of this modification is to remove the unnecessary bit mask by
> keeping a mask of the known bits and by checking that the known bits are
> already zero-ed, in which case the filtering bit-and operator can be removed.

I am still exploring this. Yes, the partial constants add complexity and could cause a testing and maintenance problem. However, I would like to explore the potential of the partial constant propagation a little more to help answer some questions I have about its utility.

* For example, it might be possible to optimize away bounds checks in code that aligns a pointer to a 2^n boundary when it can be proven that the index cannot overflow the 2^n block. A guard area at the end of the buffer would work as well in this case, but using this pattern the guard zone could be avoided and the block size could vary for different operations. This could not be handled by an interval plus alignment info alone.

  ptr = ptr & 0xff00;
  for (i = 0; i < 256; i++)
    u8[ptr + i] = 0;

* For example, being able to represent known-zero low and high bits might be useful for hoisted index masking. In this example the hoisted pointer mask serves two purposes: it masks the high bits so the index can be proven in bounds and the bounds check optimized away, and it masks the low bits so the low-bit masking can be eliminated. Although an interval range plus alignment would have been sufficient here.

  ptr &= 0xfffc; // Mask high bits and low bits. Buffer has a guard zone beyond 0xffff;
  for (i = 0; i < 256; i += 4)
    u32[(ptr + i) >> 2] = u32[(ptr + i + 4) >> 2];

* If only the number of low zero bits is represented (for alignment info) then even adding a one-bit value to an index destroys the entire represented info, whereas a partial constant does much better. For example, a pointer might usefully use low-bit tagging. Let's say it uses the two low bits as a type tag, and 01 represents an object with a few 32-bit integer slots. In this example the bounds checking can be optimized away and the de-tagging offset de-hoisted into the access instructions.

  ptr &= 0xffff; // Mask high bits. Buffer has a guard zone beyond 0xffff;
  if ((ptr & 3) == 1)
    u32[(ptr - 1) >> 2] = u32[(ptr + 4 - 1) >> 2] + u32[(ptr + 8 - 1) >> 2]

The wip patches do not yet handle all these, but I'd like to explore if such patterns can be optimized.