Closed Bug 1140932 Opened 11 years ago Closed 6 years ago

JIT: precise bitwise-or derived result range for all int32 input ranges.

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: dougc, Assigned: dougc)

Details

Attachments

(3 files, 4 obsolete files)

The compilers range derivation for the bitwise-or operation can be more precise. This bug adds precise range derivation for the case of positive input ranges. It happens to be possible to derive a precise range in this case with a simple fast algorithm. The algorithm has been cross-contributed to V8, BSD Licensed. More thorough and consistent type derivation in JS compilers will assist app developers writing optimized code and help make it more practical for code generators such as Emscripten to depend on type derivation when making code generation decisions. Thus the cross-contribution. https://codereview.chromium.org/986073003/
Comment on attachment 8574479 [details] [diff] [review] Precise derived range for the bitwise-or operation given positive input ranges. Review of attachment 8574479 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/RangeAnalysis.cpp @@ +834,5 @@ > + int32_t rhsUpperBit = rhsUpper & bit; > + if (lhsLowerBit != lhsUpperBit || rhsLowerBit != rhsUpperBit) { > + if (lhsLowerBit == lhsUpperBit) { > + if (lhsLowerBit) > + rhsLower = (rhsLower & ~mask) | bit; I am sorry, but can you comment this algorithm or refer to a description of it. One thing I do not understand in this algorithm, is why do we have a "mutable" variable named rhsLower. I would either expect a different name, especially since this variable is being modified based on the content of lhsLower & lhsUpper. From what I understand this loop is try to handle 2 things at once. The first one is what is the final bit of lower at position p. The second, is to fill the "boundaries" with a mask if the bits are different. What I mean is that any "| bit" and "& ~bit" is made to go to the result of "lower", and any "| mask" and "& ~mask" is made to account for the "boundaries". I don't know exactly yet, but I have the feeling that this loop can be replaced with simple expressions, as we have below.
Attachment #8574479 - Flags: review?(nicolas.b.pierron)
Comment the code.
Attachment #8574479 - Attachment is obsolete: true
(In reply to Nicolas B. Pierron [:nbp] from comment #2) > Comment on attachment 8574479 [details] [diff] [review] > Precise derived range for the bitwise-or operation given positive input ... > I am sorry, but can you comment this algorithm or refer to a description of > it. An attempt to comment the code in the source has been made. Please take another look. > One thing I do not understand in this algorithm, is why do we have a > "mutable" variable named rhsLower. I would either expect a different name, > especially since this variable is being modified based on the content of > lhsLower & lhsUpper. The naming made sense to me when developing the algorithm as it represents the input ranges. If it still seems inappropriate then I am happy to change the names. > From what I understand this loop is try to handle 2 things at once. The > first one is what is the final bit of lower at position p. The second, is > to fill the "boundaries" with a mask if the bits are different. What I mean > is that any "| bit" and "& ~bit" is made to go to the result of "lower", and > any "| mask" and "& ~mask" is made to account for the "boundaries". > > I don't know exactly yet, but I have the feeling that this loop can be > replaced with simple expressions, as we have below. I've tried to explain my thoughts in the comments. It is accumulating the result in the ranges but this seems appropriate. I've taken another look over it and could not spot any way to simplify it further. A challenge is that on each iteration all the lower bits can be modified, not just an adjacent bit, and these are inputs into later iteration decisions. There may be another way to solve this problem, but it escapes me at present. I'll work on some black-box testing to try and give confidence in the algorithm even if the correctness is difficult to see on review. Exhaustive testing of narrower bit-length integer ranges is practical and might give the necessary confidence. It is a good deal of work to get this to the necessary standard but I think a good set of well scrutinised algorithms will be a good assert for the JS compiler developer community. This patch is just a start at improving range derivation and there appears to be much more to do. There is also the matter of adding support for deriving a low bit alignment, but I was planning to start by landing improvements to the existing code.
Clarify a comment.
Attachment #8575744 - Attachment is obsolete: true
Commented and with range derivation tests added.
Attachment #8575771 - Attachment is obsolete: true
Attachment #8576490 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8576490 [details] [diff] [review] Precise derived range for the bitwise-or operation given positive input ranges. Review of attachment 8576490 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/RangeAnalysis.cpp @@ +830,5 @@ > + rhsLower = rhs->lower(), rhsUpper = rhs->upper(); > + bit = 0x40000000, mask = 0x3fffffff; > + // Loop, terminating when the ranges are equal, moving down one bit on > + // each iteration. > + for (; lhsLower != lhsUpper || rhsLower != rhsUpper; bit >>= 1, mask >>= 1) { I do not understand this ending condition, nor why it would be safe and not do infinite loops. Also, this does not match the comment above which mention that the ranges are equal. IF we were to compare ranges shouldn't we compare something like: lhsLower != rhsLower || lhsUpper != rhsUpper ? Also, I still don't like these loops. Please have a look at the suggestion that I will attach, which is focused on replacing this loop by some simple operations which do not involve any loop. @@ +856,5 @@ > + // range is not significant and is culled. This also > + // maintains a one in this position which matches the > + // lhs and maintains the above invariant when comparing > + // the next bit. > + rhsLower = (rhsLower & ~mask) | bit; Thanks for the comment. I really have hard time understanding why this is safe to have these reenter the loop, and still change other values based on this lower/upper bound.
Attachment #8576490 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #7) > Comment on attachment 8576490 [details] [diff] [review] > Precise derived range for the bitwise-or operation given positive input > ranges. > > Review of attachment 8576490 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/jit/RangeAnalysis.cpp > @@ +830,5 @@ > > + rhsLower = rhs->lower(), rhsUpper = rhs->upper(); > > + bit = 0x40000000, mask = 0x3fffffff; > > + // Loop, terminating when the ranges are equal, moving down one bit on > > + // each iteration. > > + for (; lhsLower != lhsUpper || rhsLower != rhsUpper; bit >>= 1, mask >>= 1) { > > I do not understand this ending condition, nor why it would be safe and not > do infinite loops. > Also, this does not match the comment above which mention that the ranges > are equal. IF we were to compare ranges shouldn't we compare something like: Comment should have been '... when the range limits are equal.', shall fix. When both range limits are equal then the input ranges are singletons and the result is trivially known by applying the OR operation. Shall add an assertion that the upper bits of each range limit are equal and this should make it clearer. > Also, I still don't like these loops. Please have a look at the suggestion > that I will attach, which is focused on replacing this loop by some simple > operations which do not involve any loop. I follow that the body of the loop can be replaced with bit operations that avoid branching, but there is a big leap near the end of your work in which the loop is replaced by clz etc operations and it is not immediately obvious that this is correct and it lacks working code to demonstrate it, and is it any easier to validate by inspection? I can confirm that the low bits only need setting or clearing once, but it's not clear that the position at which to apply this can be simply calculated. I have recorded the first bit position that each limit low bits is set or cleared and compared them to your equations for lhsHighestCommonBit, rhsHighestCommonBit, lhsHighestDiffBitOfRhs, and rhsHighestDiffBitOfLhs but none of these seem to correlate with the noted bit positions. As noted in comment 4, a challenge for eliminating the loop is that the changes to the low bits are used as inputs into decisions in later loop iterations. Perhaps at any loop iteration it might be possible to jump ahead a number of iterations by using clz etc, looking for the next low bit change. This might even expose more opportunities to exit early. But this would appear to just add more complexity, and the performance of the current algorithm appears adequate. Searching for a simplification might be useful if it is easier to verify the simplification by inspection, or if it is performance critical, but is the lack of a non-looping algorithm really blocking this patch? Do I need to write up a web page documenting a derivation of the algorithm? Shall do this if really necessary, just let me know. Is the exhaustive black-box validation (of a much narrower bit width) not sufficient? > @@ +856,5 @@ > > + // range is not significant and is culled. This also > > + // maintains a one in this position which matches the > > + // lhs and maintains the above invariant when comparing > > + // the next bit. > > + rhsLower = (rhsLower & ~mask) | bit; > > Thanks for the comment. > > I really have hard time understanding why this is safe to have these reenter > the loop, and still change other values based on this lower/upper bound. I do not envy anyone trying to understand this. There is probably a lot of missing context. Perhaps a decision needs to be made if the source can be adequately commented or if an external derivation is really need?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Douglas Crosher [:dougc] from comment #9) > (In reply to Nicolas B. Pierron [:nbp] from comment #7) > > ::: js/src/jit/RangeAnalysis.cpp > > @@ +830,5 @@ > > > + rhsLower = rhs->lower(), rhsUpper = rhs->upper(); > > > + bit = 0x40000000, mask = 0x3fffffff; > > > + // Loop, terminating when the ranges are equal, moving down one bit on > > > + // each iteration. > > > + for (; lhsLower != lhsUpper || rhsLower != rhsUpper; bit >>= 1, mask >>= 1) { > > > > I do not understand this ending condition, nor why it would be safe and not > > do infinite loops. > > Also, this does not match the comment above which mention that the ranges > > are equal. IF we were to compare ranges shouldn't we compare something like: > > Comment should have been '... when the range limits are equal.', shall fix. > When both range limits are equal then the input ranges are singletons and > the result is trivially known by applying the OR operation. > > Shall add an assertion that the upper bits of each range limit are equal and > this should make it clearer. I think the loop condition might be easier to read with a loop..until, i-e written as: !(lhsLower == lhsUpper && rhsLower == rhsUpper) which is the state that we want to reach at the end. > > Also, I still don't like these loops. Please have a look at the suggestion > > that I will attach, which is focused on replacing this loop by some simple > > operations which do not involve any loop. > > I follow that the body of the loop can be replaced with bit operations that > avoid branching, but there is a big leap near the end of your work in which > the loop is replaced by clz etc operations and it is not immediately obvious > that this is correct and it lacks working code to demonstrate it, and is it > any easier to validate by inspection? I tihnk a finite sequence of instruction which can be back by the intuition is way better. I understand the steps that for modifying the rhsLower & LhsLower bounds. What I do not understand is why do we need to modify the rhsUpper & lhsUpper, and precisely how it plays when re-entering the loop. I have problems to understand why the previous condition is enough to guarantee that the loop will end. I agree, I did a giant leap at the end, and it might be incorrect as if we want to respect the loop, we should modify only one lhsLower / rhsLower before re-doing the same expression again, but I feel way safer with 5 lines out-side a loop, than having a loop where it takes me hours and I am still unable to explain why it will converge, nor to what it will converge. > I can confirm that the low bits only need setting or clearing once, but it's > not clear that the position at which to apply this can be simply calculated. The clz is used to calculate the first bit (as the loop does by iterating) at which the condition is changing. The bitwise expression is basically computing 32 conditions in parallel, and clz isolate the first which is evaluated differently. we could do same with: bit = 1 << (32 - clz(cond)); > I have recorded the first bit position that each limit low bits is set or > cleared and compared them to your equations for lhsHighestCommonBit, > rhsHighestCommonBit, lhsHighestDiffBitOfRhs, and rhsHighestDiffBitOfLhs but > none of these seem to correlate with the noted bit positions. As noted in > comment 4, a challenge for eliminating the loop is that the changes to the > low bits are used as inputs into decisions in later loop iterations. Which is exactly why I do not understand this algorithm. > Perhaps at any loop iteration it might be possible to jump ahead a number of > iterations by using clz etc, looking for the next low bit change. This might > even expose more opportunities to exit early. But this would appear to just > add more complexity, and the performance of the current algorithm appears > adequate. I agree, the clz does not bring anything, the clz was an attempt to remove the first mutation made by the loop. > Searching for a simplification might be useful if it is easier to verify the > simplification by inspection, or if it is performance critical, but is the > lack of a non-looping algorithm really blocking this patch? The simplification is mandatory for getting this code in a state which can be understood by new contributors. If I accept this patch, then I would be responsible for fixing it later. If I do not understand today how this patch is working, I will not be able to fix it. > Do I need to write up a web page documenting a derivation of the algorithm? > Shall do this if really necessary, just let me know. > > Is the exhaustive black-box validation (of a much narrower bit width) not > sufficient? Unfortunately no, a exhaustive (multiple billions of cases) black-box testing just validate that we have the expected output. It does not walk you through a proof which is humanly comprehensible.
Flags: needinfo?(nicolas.b.pierron)
Updated to handle all int32 input ranges, not just positive ranges. Also spotted a few more early-out opportunities.
Attachment #8576490 - Attachment is obsolete: true
Summary: JIT: precise derived range for the bitwise-or operation given positive input ranges. → JIT: precise bitwise-or derived result range for all int32 input ranges.
Priority: -- → P5
(In reply to Robin Templeton from comment #12) > Created attachment 8817205 [details] > correctness proof for BitwiseOrLow Thanks, I will review the proof, and look if the code match the proof.
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(nicolas.b.pierron) → needinfo?(jdemooij)

Clearing the NI. I think this isn't worth it for Ion, especially considering the WebAssembly/Cranelift future.

Flags: needinfo?(jdemooij)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: