Differential Testing: Different output message involving Math operations

RESOLVED FIXED in Firefox 46

Status


defect
major
RESOLVED FIXED

People

(Reporter: gkw, Assigned: nbp)

Tracking

(Blocks 2 bugs, {regression, testcase})

Trunk
mozilla48
x86_64
All

Firefox Tracking Flags

(firefox46 fixed, firefox47 fixed, firefox48 fixed, firefox-esr45 fixed)


Attachments

(7 attachments, 1 obsolete attachment)

function f(x) {
    return ~(((~0 | 0) >>> 0 || 0) + Math.pow(Math.cos(x >>> 0), Math.atan2(0, x)))
}
f(0)
print(f(-9999))
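For reference (my own illustration, not from the original report), the divergence hinges on `>>> 0` producing an unsigned value that does not fit in a signed 32-bit integer, so wrapping it back to int32 changes it:

```javascript
// -1 >>> 0 reinterprets the bits as unsigned: 0xffffffff.
console.log(-1 >>> 0);        // 4294967295

// Bitwise ops (like the ~ in the testcase) first apply ToInt32,
// which wraps the value back into signed 32-bit range.
console.log((-1 >>> 0) | 0);  // -1
```

An engine that truncates such a value to int32 too early, and then resumes in the interpreter with the truncated value, will compute a different result than the untruncated path.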


$ ./js-dbg-64-dm-darwin-6020a4cb41a7 --fuzzing-safe --no-threads --baseline-eager testcase.js
0

$ ./js-dbg-64-dm-darwin-6020a4cb41a7 --fuzzing-safe --no-threads --ion-eager testcase.js
-1

Tested this on m-c rev 6020a4cb41a7.

My configure flags are:

CC="clang -Qunused-arguments" CXX="clang++ -Qunused-arguments" AR=ar AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=x86_64-apple-darwin12.5.0 --enable-debug --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

python -u ~/funfuzz/js/compileShell.py -b "--enable-debug --enable-more-deterministic" -r 6020a4cb41a7

This goes back past m-c rev bcacb5692ad9 (early 2015), so it would be great if Jan or Hannes could take a first look at diagnosing this, and move it on if necessary.
Flags: needinfo?(jdemooij)
Flags: needinfo?(hv1989)
Looks like a truncation issue. Looking
Flags: needinfo?(jdemooij)
Assignee: nobody → hv1989
Attachment #8707805 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8707805 [details] [diff] [review]
Spew some truncate info.

Review of attachment 8707805 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/RangeAnalysis.cpp
@@ +131,5 @@
> +        JitSpewHeader(JitSpew_Range);
> +        Fprinter& out = JitSpewPrinter();
> +        out.printf("truncating ");
> +        def->printName(out);
> +        out.printf(" (kind: %s, clone %d)\n", MDefinition::TruncateKindString(kind), shouldClone);

nit: Add a colon after "clone".
Attachment #8707805 - Flags: review?(nicolas.b.pierron) → review+
I think the fault is here:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/RangeAnalysis.cpp#2855

The comment is correct, but it doesn't cover everything.
It doesn't take into account that something else can make an assumption which we are guarding against, but which is not directly on the path from this instruction to the end.

      [block 1]
      MUrsh 0xffffffff 0

[block 2]         [block 3]
                  MConstant

      [block 4]
      (resumepoint mphi ...)
  (2) MPhi mursh mconstant
      MPow
  (4) MToInt32 mpow
      MToDouble mtoint32
  (3) MAdd mphi mtodouble
  (1) MBitNot madd

(1) activates the truncation analysis
(2) is the phi we are looking at if we need to clone
(3) is the assumption made that will fail
(4) is where we will bailout

For the MPhi -> MAdd -> MBitNot chain there are no removed uses and no resume point 'observing' the result. We correctly identify that the MUrsh cannot get truncated. Though under the current logic we can still remove the MPhi...

That is incorrect. We made an assumption in (3): we assumed that all inputs will be int32, which is currently the case. Only on that assumption does our truncation algorithm treat the MPhi (2) as truncatable.

Now that assumption is going to fail during execution at instruction (4). But since we were too eager in truncating (2), we will now see the wrong value for (2) during bailout...

(Attached is a patch disabling the MFilterTypeSet to show that this issue still reproduces without it.)

-------------------------------------------------------------


I'm not sure how to fix this. My quick response would be to be thorough and mark everything as needing 'shouldClone', which would possibly cause regressions...

I'm also interested in the feasibility of never replacing instructions, i.e. always keeping the resumepoint instructions the same and only being aggressive with optimizations on the non-resumepoint uses. That way bailing out will always have the right values, while we can be much more aggressive than we are now with optimizations, and hope to remove most resumepoint uses through the recover instructions.
Flags: needinfo?(hv1989) → needinfo?(nicolas.b.pierron)
I landed the patch for extra spew in bug 1240055, in order to keep this bug dedicated to solving the issue.
Ok, the problem is not related to MPhi, but probably to the MToInt32 of Math.pow that is added early in the pipeline.  The following test case suffers from the same issue, without any use of MPhi instructions.


function g() { with({}){}; }
function f(y, x) {
    var a = y >>> 0;
    a = a - 1 + 1;
    g(); // Capture the truncate result after the call.
    var b = Math.pow(x / 2, x); // bailout.
    return ~(a + b);
}
f(-1, 0)
print(f(-1, 1))
Has STR: --- → yes
Same test case without MPow, nor MToInt32.

function g() { with({}){}; }
function f(y, x) {
    var a = y >>> 0;
    a = a - 1 + 1;
    g(); // Capture the truncate result after the call.
    var b = x / 2; // bailout.
    return ~(a + b);
    // also works with  return ~((a + b) + 1 - 1);
}
f(-1, 0)
print(f(-1, 1))
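A hand evaluation (my own, not from the bug) of f(-1, 1) with full double math shows why the fractional part must survive until the final `~`:

```javascript
var a = -1 >>> 0;       // 4294967295 (outside int32 range)
a = a - 1 + 1;          // still 4294967295 in double arithmetic
var b = 1 / 2;          // 0.5, the fractional part that must survive
// ~ applies ToInt32 first: ToInt32(4294967295.5) wraps to -1,
// so the correct result is ~(-1) === 0. Truncating `a` to int32
// early (-1) would instead give ~ToInt32(-1 + 0.5) === ~0 === -1.
console.log(~(a + b));  // 0
```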


The problem with this test case is that we are producing a fractional part.
The fractional part is used to overflow the Int32 range, such that the negate operation makes this change visible on all the bits.

It sounds like the early specialization of some math operations (MPow, MDiv, MMod, …) which use the IC to truncate the results of the operation is fine, as long as we keep the remaining results as-is.

The problem is that specializing these operations to Int32 results is equivalent to removing a branch which handles numbers with a fractional part.  Thus, we should follow the same logic as when we remove a branch, and flag other instructions as having removed uses.
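A minimal sketch (my own illustration) of why eagerly truncating operands is only sound for integer-valued inputs; a fractional part breaks the equivalence:

```javascript
// With integer-valued doubles, int32 wrap-around distributes over
// addition, so truncating operands early is safe:
console.log((4294967295 + 2) | 0);          // 1
console.log((((4294967295 | 0) + 2)) | 0);  // 1

// With a fractional part it is not -- this is the removed branch
// that the Int32 specialization silently drops:
console.log((0.5 + 0.5) | 0);               // 1
console.log(((0.5 | 0) + (0.5 | 0)) | 0);   // 0
```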
The problem seen in this bug is that we are doing the truncation on a branch
of the data-flow, and recall this truncated result instead of the original
one.  As we bailout, we resume the execution with the truncated data instead
of the original one.

This patch assumes that if there is a resume point, then we might have a
snapshot built with it.  Thus we might have a bailout which might have used
the non-truncated version in the original code.  Thus "isCaptured" implies
"hasUseRemoved" (not isUseRemoved) if-and-only-if we are going to truncate
the result.

I did not notice any octane / kraken / sunspider regression while running
benchmarks with this patch.

Side-note 1: for slices of code where the results are not captured by any
resume points / recovered instructions, we will truncate without cloning
the instruction.  The reason is that if there is a bailout after, it will
use a resume point which does not indirectly capture this result.

Example:

 1  MCall
      ResumePoint
 2  MAdd .. ..
 3  MAdd 2 2
 4  MDiv .. .. // bailout
 5  MOr 3 4
    [...]

  In this example, (2) and (3) can safely be truncated, because their
  results are not captured by any resume point, and thus their hypothetical
  recover instruction clones would have no uses.

Side-note 2: we might want to revisit the choice of adding resume points
inside for loops, which was made to reduce the liveness of input variables,
while recover instructions might increase the liveness of variables because
they are captured by resume points.
Attachment #8722550 - Flags: review?(hv1989)
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8722550 [details] [diff] [review]
RangeAnalysis: Assume that all captured results are used in bailing branches.

Review of attachment 8722550 [details] [diff] [review]:
-----------------------------------------------------------------

De-complicating the code. Awesome!
Attachment #8722550 - Flags: review?(hv1989) → review+
https://hg.mozilla.org/mozilla-central/rev/b4300d783a34
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Depends on: 1253586
Posted file regalloc.diff
Brian, do you have any idea how we should proceed to fix the following issues?

As there is a performance issue, and recover instructions have the potential to increase the live-span of virtual registers, I looked at the differences in regalloc caused by the current patch.

While looking at the graph, I noticed that we are not even using all the registers.  In particular, r9, which was used before, is not used anymore, while registers such as r12, r13, r14 and r15 remain unused in both compilations.

I can think of some work-arounds that I have to test, but this sounds like a regalloc issue, and I have no idea how to investigate it yet.  It sounds to me that we should attempt to use the unused registers instead of spilling content to the stack, especially if we are reloading the values afterwards.

Also, I do not yet understand how the backtracking allocator is able to produce code such as:

[MoveGroup] [stack:44 -> rdi]
[338,339 AddI] [def v127<i>:rdi] [use v120:r rdi] [use v126:r? rbx]
[MoveGroup] [rdi -> stack:44]
[MoveGroup] [stack:44 -> rdi]


This diff was generated by running a debug build of the JS shell without (---) and with (+++) attachment 8722550 [details] [diff] [review], using the following command:

  js --no-asmjs --ion-offthread-compile=off ./ubench.js -- ./corrections.js 4
Flags: needinfo?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> Created attachment 8727442 [details]
> regalloc.diff
> 
> Brian, do you have any idea how we should proceed to fix the following
> issues?

Can you attach the full output using IONFLAGS=regalloc and point to the spots we should be generating better code?

> Also, I do not yet understand how the backtracking allocator is able to produce
> code such as:
> 
> [MoveGroup] [stack:44 -> rdi]
> [338,339 AddI] [def v127<i>:rdi] [use v120:r rdi] [use v126:r? rbx]
> [MoveGroup] [rdi -> stack:44]
> [MoveGroup] [stack:44 -> rdi]

Well, before the AddI stack:44 is storing v120, and afterwards it is storing v127 and v120 has been loaded into rdi.  The allocator will try to use the same stack slot for different live ranges if they do not overlap.  It's hard to tell how reasonable this is without seeing the entire regalloc spew.
The previous file should include the full content, within a diff; otherwise, here is the full log of the output without any diff annotation.

I would expect us to use any of the unused registers such as r9, r12, r13, r14 or r15 instead of "stack:44", in Block 23, 25, 27 and 31.
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> I would expect us to use any of the unused registers such as r9, r12, r13,
> r14 or r15 instead of "stack:44", in Block 23, 25, 27 and 31.

It looks like the associated live ranges do not have any register uses, which will cause the register allocator to put the range in memory even if there are available registers.  When developing the allocator it seemed that speed was more or less proportional to the number of instructions executed, regardless of the number of register vs. stack operands.

The main question I have here is what effect this patch series had on vreg live ranges in the hot parts of the function.  Extending live ranges is a huge problem for the register allocator, especially on x86/x64 where so many instructions reuse an input register for the output.  If the input vreg is still live after the instruction then the allocator is forced to copy it beforehand.
Flags: needinfo?(bhackett1024)
Is this a recent regression, in 46? Should we be tracking this?
Flags: needinfo?(hv1989)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #17)
> Is this a recent regression, in 46? Should we be tracking this?

Not recent; it was already in the tree before early 2015.
Flags: needinfo?(hv1989)
(In reply to Brian Hackett (:bhackett) from comment #16)
> (In reply to Nicolas B. Pierron [:nbp] from comment #15)
> > I would expect us to use any of the unused registers such as r9, r12, r13,
> > r14 or r15 instead of "stack:44", in Block 23, 25, 27 and 31.
> 
> It looks like the associated live ranges do not have any register uses,
> which will cause the register allocator to put the range in memory even if
> there are available registers.  When developing the allocator it seemed that
> speed was more or less proportional to the number of instructions executed,
> regardless of the number of register vs. stack operands.

This is what the code lets us infer, but then I do not understand why we would produce these 2 lines:

[MoveGroup] [stack:44 -> rdi]
[338,339 AddI] [def v127<i>:rdi] [use v120:r rdi] [use v126:r? rbx]

We are moving something that we considered to have no register uses, and we move it to a register to do a computation.  Which I guess corresponds to the way we lower[1] the value, but sounds wrong, as we are spilling the value to the stack because we have no explicit register use.

[1] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/x86-shared/Lowering-x86-shared.cpp#120

> The main question I have here is what effect this patch series had on vreg
> live ranges in the hot parts of the function.  Extending live ranges is a
> huge problem for the register allocator, especially on x86/x64 where so many
> instructions reuse an input register for the output.  If the input vreg is
> still live after the instruction then the allocator is forced to copy it
> beforehand.

Yes, this patch basically considers all uses of resume points as having removed uses, because it would be a correctness issue to resume with a truncated result while we are bailing out because of the lack of consideration for the fractional part which flows into the truncation.  The correct alternative to that is to compute double math, so an extra register->register move sounds unavoidable, unless we can remove the resume points.
OK, thanks Hannes, I won't track this then.
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> This is what the code let us infer, but then I do not understand why we
> would produce these 2 lines:
> 
> [MoveGroup] [stack:44 -> rdi]
> [338,339 AddI] [def v127<i>:rdi] [use v120:r rdi] [use v126:r? rbx]
> 
> As we are moving something that we considered to have no register uses, and
> we move it to a register to do a computation.  Which I guess correspond to
> the way we lower[1] the value, but sounds wrong as we are spilling the value
> to the stack because we have no explicit register use.

The allocation step that seems most relevant is here:

[RegAlloc] Allocating  v116 [303,305) (def) v116:r?@304 ## v117 [305,312) (def) v117:r@307 v117:r@311 ## v120 [314,339) v120:r?@338 ## v127 [339,342) (def) [priority 37] [weight 324]
[RegAlloc]   Requirement rax, fixed by definition
[RegAlloc]   rax collides with  v119 [311,314) rax (def) [weight 666]
[RegAlloc]   bundle does not contain cold code
[RegAlloc]   bundle is defined by a register
[RegAlloc]   split after last register use at 308
[RegAlloc]     splitting bundle  v116 [303,305) (def) ## v117 [305,312) (def) ## v120 [314,339) ## v127 [339,342) (def) into:
[RegAlloc]        v116 [303,305) (def) v116:r?@304 ## v117 [305,308) (def) v117:r@307
[RegAlloc]        v117 [310,312) v117:r@311 ## v120 [338,339) v120:r?@338 ## v127 [339,340) (def)
[RegAlloc]        v116 [304,305) ## v117 [306,312) ## v120 [314,339) ## v127 [340,342)

v120 has been separated into two bundles which are allocated separately.  One contains just the register use at 338, and one contains no register uses at all, and is spilled to the stack.  The allocator needs to insert a move between the two bundles at some point, which is the [MoveGroup] above.

Now, the allocator's behavior here isn't all that good.  The 'split after last register use' is an allocation strategy that is supposed to trim stack uses off the end of a bundle to shorten its ranges and allow things to be spilled which are never reloaded.  The presence of a conflicting range for v119 has caused it to split at an early point, 308, which has an effect similar to what would have happened if it had split at all register uses (which is the allocator's fallback strategy --- it will always make it easier to get a valid allocation, but the resulting code will be pretty bad).  The allocator could probably do better in cases like this, where there is a builtin conflict because of the fixed register uses imposed by DivI and UDivOrMod, but I don't see why that is relevant to this bug.

> > The main question I have here is what effect this patch series had on vreg
> > live ranges in the hot parts of the function.  Extending live ranges is a
> > huge problem for the register allocator, especially on x86/x64 where so many
> > instructions reuse an input register for the output.  If the input vreg is
> > still live after the instruction then the allocator is forced to copy it
> > beforehand.
> 
> Yes, this patch basically consider all uses of resume points as having
> removed uses, because it would be a correctness issue to resume with a
> truncated result while we are doing a bailout because of the lack of
> consideration for the fractional part which flow into the truncation.  The
> correct alternative to that is to compute double math, so an extra
> register->register move sounds unavoidable, unless we can remove the resume
> points.

Can you be more specific?  Which vregs had their live ranges extended, and what effect did this have on the register allocation?
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Brian Hackett (:bhackett) from comment #21)
> (In reply to Nicolas B. Pierron [:nbp] from comment #19)
> > > The main question I have here is what effect this patch series had on vreg
> > > live ranges in the hot parts of the function.  Extending live ranges is a
> > > huge problem for the register allocator, especially on x86/x64 where so many
> > > instructions reuse an input register for the output.  If the input vreg is
> > > still live after the instruction then the allocator is forced to copy it
> > > beforehand.
> > 
> > Yes, this patch basically consider all uses of resume points as having
> > removed uses, because it would be a correctness issue to resume with a
> > truncated result while we are doing a bailout because of the lack of
> > consideration for the fractional part which flow into the truncation.  The
> > correct alternative to that is to compute double math, so an extra
> > register->register move sounds unavoidable, unless we can remove the resume
> > points.
> 
> Can you be more specific?  Which vregs had their live ranges extended, and
> what effect did this have on the register allocation?

Basically, every instruction which is captured by a resume point and recovered on bailout would have its operands captured by the resume point as well, if one of the following instructions encodes a snapshot.  This increases the live ranges of the operands of recover instructions up to the last snapshot (resume point) which captures them.

The alternative solution (Bug 1253586), which would work in this special case because we have no side effects in this innermost loop, would be to get rid of all the resume points (including entry resume points) in the innermost loop, thus leaving only the entry resume point of the loop, capturing no math operations, and needing no recover instructions.

The above spilling issue sounded to me like a more urgent problem, as we might be using the stack while we still have a pile of spare registers.

side-note: The move-group issue mentioned before corresponds to what is contained in block 31.



(In reply to Brian Hackett (:bhackett) from comment #21)
> v120 has been separated into two bundles which are allocated separately. 
> One contains just the register use at 338, and one contains no register uses
> at all, and is spilled to the stack.  The allocator needs to insert a move
> between the two bundles at some point, which is the [MoveGroup] above.

So the problem is that splitting the range removes the register-use requirement that we had on the bundle.  Should we add a new requirement kind which is based on the initial bundle, such that we attempt to allocate both parts as registers, but reduce the weight of the one which does not have any explicit register use?
Flags: needinfo?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #24)
> > Can you be more specific?  Which vregs had their live ranges extended, and
> > what effect did this have on the register allocation?
> 
> Basically every instruction which is captured by a resume point, and
> recovered on bailout would have its operand captured by the resume point as
> well, if one of the following instructions encode a snapshot.  Thus
> increasing the live ranges of the operands of recover instruction to the
> last snapshot (resume point) which captures it.

My question was about the specific vregs in this test case.  Which vregs in performance sensitive parts of this testcase had their live ranges extended?

> Seeing the above spilling issue sounded to me as a more urgent issue, as we
> might be using the stack while we still have a pile of spare registers.

I disagree, see comment 16.
Flags: needinfo?(bhackett1024)
Attachment #8727442 - Attachment mime type: text/x-patch → text/plain
(In reply to Brian Hackett (:bhackett) from comment #25)
> My question was about the specific vregs in this test case.  Which vregs in
> performance sensitive parts of this testcase had their live ranges extended?

Here is a sub-part of attachment 8727442, which I think answers your question from comment 16 about which vregs got their live ranges extended, as these are the only live range differences between the 2 graphs.


>  Live ranges by virtual register:
>    v45 [97,100) (def)
> -  v46 [107,109) (def) v46:r?@108 ## v46 [108,118) v46:*@111 v46:*@117
> +  v46 [107,109) (def) v46:r?@108
>    v47 [109,116) (def) v47:r@111 v47:r@115
>    […]
>    v119 [311,314) (def)
> -  v120 [314,340) v120:r?@339
> +  v120 [314,339) v120:r?@338
>    v121 [317,320) (def) v121:r@319
>    v122 [323,326) (def) v122:r@325
>    v123 [327,330) (def)
>    v124 [331,334) (def)
>    v125 [334,337) v125:r?@336
> -  v126 [337,339) (def) v126:r?@338
> +  v126 [337,340) (def) v126:r?@339
>    v127 [339,342) (def)
>    […]

I do not think this could be v46, as it is not involved inside the loop body.  Thus this must be v120 and/or v126.

This is also the reason why I think it is related to the fact that we spill this register to the stack instead of keeping it in a register, as you mentioned in comment 21.

(In reply to Brian Hackett (:bhackett) from comment #25)
> (In reply to Nicolas B. Pierron [:nbp] from comment #24)
> > Seeing the above spilling issue sounded to me as a more urgent issue, as we
> > might be using the stack while we still have a pile of spare registers.
> 
> I disagree, see comment 16.

Knowing that these 3 live ranges are the only 3 differences, and that v120 got split and spilled on the stack, I think this is likely the issue we are looking for.
Flags: needinfo?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #26)
> (In reply to Brian Hackett (:bhackett) from comment #25)
> > My question was about the specific vregs in this test case.  Which vregs in
> > performance sensitive parts of this testcase had their live ranges extended?
> 
> Here is a sub-part of attachement 8727442, which I think answer your
> question from comment 16, about what register got their live range extended,
> as these are the only live range differences between the 2 graphs.
> 
> >  Live ranges by virtual register:
> >    v45 [97,100) (def)
> > -  v46 [107,109) (def) v46:r?@108 ## v46 [108,118) v46:*@111 v46:*@117
> > +  v46 [107,109) (def) v46:r?@108
> >    v47 [109,116) (def) v47:r@111 v47:r@115
> >    […]
> >    v119 [311,314) (def)
> > -  v120 [314,340) v120:r?@339
> > +  v120 [314,339) v120:r?@338
> >    v121 [317,320) (def) v121:r@319
> >    v122 [323,326) (def) v122:r@325
> >    v123 [327,330) (def)
> >    v124 [331,334) (def)
> >    v125 [334,337) v125:r?@336
> > -  v126 [337,339) (def) v126:r?@338
> > +  v126 [337,340) (def) v126:r?@339
> >    v127 [339,342) (def)
> >    […]
> 
> I do not think this could be v46, as it is not involved inside the loop
> body.  Thus this must be either v120 and/or v126.

The change with v120/v126 is that the operands to the AddI at 338 ended up being swapped somehow (some random perturbation in the MIR?) rather than there being a change in live ranges.  If that's the case then this patch isn't really at fault at all.  Small perturbations in the MIR are capable of having a large effect on the allocation, which is unfortunate but unavoidable imo.  Any performance difference before/after on this testcase should not I think prevent this patch from landing (are there other regressions that can be more meaningfully attributed to this patch?).

Fixing the regalloc to perform better on the allocation in comment 21 or to better utilize spare registers (I think it would be best to do this by modifying the pass that assigns physical stack slots to virtual stack slots) would both be good things to do, but I don't think such improvements are relevant to this bug.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #27)
> The change with v120/v126 is that the operands to the AddI at 338 ended up
> being swapped somehow (some random perturbation in the MIR?) rather than
> there being a change in live ranges.  If that's the case then this patch
> isn't really at fault at all.  Small perturbations in the MIR are capable of
> having a large effect on the allocation, which is unfortunate but
> unavoidable imo.  Any performance difference before/after on this testcase
> should not I think prevent this patch from landing (are there other
> regressions that can be more meaningfully attributed to this patch?).

This is the only function compiled in asmjs-ubench-corrections, and apart from the MIR graph looking different due to additional recover instructions, this is the smallest difference which has the potential of causing such a performance issue.

One reason would be the extra latency of the compilation, but knowing that we OSR in the unskippable inner loop, I would expect that we jump into the compiled code as soon as it is compiled.

A second reason would be the memory used by the compiler and for the compilation, but I do not see how this could influence the execution, knowing that the memory encoded for the snapshots will remain unused until the end of the program.

Hannes, based on the above messages, how should we proceed with this patch?
Flags: needinfo?(hv1989)
(In reply to Nicolas B. Pierron [:nbp] from comment #28)
> Hannes, based on the above messages, how should we proceed with this patch?

Since you looked into asmjs-ubench-corrections, I looked into crypto. I also think that the crypto regression is more important.
The issue is that recover instructions can increase the liveness of the operands, which is detrimental for dense loops
(especially on 32-bit, given register pressure).

E.g.
...
MAdd a b
MNop
...
Only needs to keep the result, but with recover instructions needs to keep the result, a and b! That is a 3x increase in variables that need to be kept live.
(This is possibly an issue when using recover instructions even more, and it will need to get addressed, but that would be a bigger change and not quite suitable for fixing and maybe uplifting in this time frame.)

Previously we removed instructions, which is incorrect in some cases, e.g. in the case above.
Instead we now use a recover instruction in most cases, which increases liveness.

I thought about other solutions (RUndoAdd), but I think the best solution would be
to still sometimes remove instructions instead of always using a recover instruction.
When an instruction flows directly into BitAnd/BitOr/BitRsh ... we can still eagerly truncate that instruction without cloning.
We can do this because we are not eagerly changing the graph given type information, but know for certain
that there is no way that the truncated result would be wrong.

(=> no uses removed and all uses are BitAnd/BitOr/BitRsh or resumepoint)

A quick and dirty patch suggests this allows us to land your simplification patch
without this regression on crypto!

@nbp: can you try this approach?
Flags: needinfo?(hv1989)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Hannes Verschore [:h4writer] from comment #29)
> In the case of an instruction flows directly into BitAnd/BitOr/BitRsh ... we
> still can eagerly truncate that instruction without cloning.

Unfortunately, this is not true if the instruction has removed uses, but I think I can restore half of the removed changes and add an additional condition.
This patch does what got suggested in comment 29.

Unfortunately I had to re-add the hasUseRemoved and the isObservableResult
flags.  These flags are used to determine whether it is safe to always truncate
the value.  Thus I am replacing the old patch with this one.

When an operation is explicitly truncated, the value of kind is set to
MDefinition::Truncate, as opposed to the MDefinition::IndirectTruncate set by
MAdd.

To determine whether it is safe, we have to check that all uses explicitly
truncate the value, which implies checking that we have no removed uses and
that the value does not flow into any observable slots of a resume point.
Attachment #8735891 - Flags: review?(hv1989)
Attachment #8722550 - Attachment is obsolete: true
Comment on attachment 8735891 [details] [diff] [review]
RangeAnalysis: Assume that all captured results are used in bailing branches.

Review of attachment 8735891 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! This indeed fixes the crypto regression.
Attachment #8735891 - Flags: review?(hv1989) → review+
Flags: needinfo?(nicolas.b.pierron)
https://hg.mozilla.org/mozilla-central/rev/8ce603f20bac
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: mozilla47 → mozilla48
Hannes, do you mind nominating this for approval-mozilla-esr45 as well? It would aid fuzzing on that branch.
Flags: needinfo?(hv1989)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #35)
> Hannes, do you mind nominating this for approval-mozilla-esr45 as well? It
> would aid fuzzing on that branch.

Correcting the bookkeeping: this bug has been solved by nbp, therefore forwarding to nbp.
Assignee: hv1989 → nicolas.b.pierron
Flags: needinfo?(hv1989) → needinfo?(nicolas.b.pierron)
Comment on attachment 8735891 [details] [diff] [review]
RangeAnalysis: Assume that all captured results are used in bailing branches.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
Differential testing request (comment 35)

Feature/regressing bug #: Bug 1072188
User impact if declined: low
Fix Landed on Version: 48
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch:  N/A
Flags: needinfo?(nicolas.b.pierron)
Attachment #8735891 - Flags: approval-mozilla-release?
Attachment #8735891 - Flags: approval-mozilla-esr45?
Attachment #8735891 - Flags: approval-mozilla-beta?
Attachment #8735891 - Flags: approval-mozilla-aurora?
Comment on attachment 8735891 [details] [diff] [review]
RangeAnalysis: Assume that all captured results are used in bailing branches.

We have had this bug since 35. What is the rush to take it into a dot release (or ESR)?
Attachment #8735891 - Flags: approval-mozilla-release? → approval-mozilla-release-
I don't understand the importance either, though better fuzzing sounds like a good thing. Nicolas, can you explain more to me: what is the risk if we don't take this change on beta?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Sylvestre Ledru [:sylvestre] from comment #38)
> We had this bug from 35. What is the rush to take into a dot release?
> (or esr)

I do not know; Gary wanted to fuzz on esr (comment 35).

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #39)
> I don't understand the importance either, though better fuzzing sounds like
> a good thing. Nicolas can you explain more to  me, what is the risk if we
> don't take this change on beta?

Except for the correctness issue while going back to baseline, there is not much risk.
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8735891 [details] [diff] [review]
RangeAnalysis: Assume that all captured results are used in bailing branches.

Makes sense to take this so we can more accurately find issues in esr. 
This should also land for beta 11.
Attachment #8735891 - Flags: approval-mozilla-esr45?
Attachment #8735891 - Flags: approval-mozilla-esr45+
Attachment #8735891 - Flags: approval-mozilla-beta?
Attachment #8735891 - Flags: approval-mozilla-beta+
Attachment #8735891 - Flags: approval-mozilla-aurora?
Attachment #8735891 - Flags: approval-mozilla-aurora+
has problems applying to esr45
Tomcats-MacBook-Pro-2:mozilla-central Tomcat$ hg graft --edit -r af47156de41a41fe20cb148971e8569a75b9aa2b
grafting 339111:af47156de41a "Bug 1239075 - RangeAnalysis: Assume that all captured results are used in bailing branches. r=h4writer, a=lizzard"
merging js/src/jit/RangeAnalysis.cpp
warning: conflicts while merging js/src/jit/RangeAnalysis.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(
Flags: needinfo?(nicolas.b.pierron)
(In reply to Carsten Book [:Tomcat] from comment #45)
> has problems to apply to esr45
> warning: conflicts while merging js/src/jit/RangeAnalysis.cpp

Testing locally, this is blocked by the fact that Bug 1215921 did not get cherry-picked to 45.
I think we should consider taking changeset b4046ca29b14 [1] from Bug 1215921 as well.

[1] https://hg.mozilla.org/mozilla-central/rev/b4046ca29b14
Flags: needinfo?(nicolas.b.pierron)
Sylvestre, before we uplift the work in bug 1215921, I wanted to double-check with you what you think about this work for esr45.
Flags: needinfo?(sledru)
Should be possible with bug 1215921 (only one patch is needed)
Flags: needinfo?(sledru) → needinfo?(cbook)
(In reply to Sylvestre Ledru [:sylvestre] from comment #48)
> Should be possible with bug 1215921 (only one patch is needed)

done sir!
Flags: needinfo?(cbook)