Closed Bug 814997 Opened 7 years ago Closed 7 years ago

IonMonkey: safepoints shouldn't artificially extend vreg lifetimes

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: bhackett, Assigned: bhackett)

References

Details

Attachments

(2 files)

Here is the body of the main loop in v8-crypto's am3:

  while(--n >= 0) {
    var l = this_array[i]&0x3fff;
    var h = this_array[i++]>>14;
    var m = xh*l+h*xl;
    l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
    c = (l>>28)+(m>>14)+xh*h;
    w_array[j++] = l&0xfffffff;
  }

The loop body is almost all straight line code, with only one side effect near the end.  A resume point will be placed at the start of the body, and the next one will not be until after the store.  This initial resume point is used for all safepoints up to the store, and those safepoints need to preserve the original state at the start of the body.  For this loop, that means keeping the original values of i, j and c alive until the store, which for the former two dramatically extends their live ranges and necessitates additional spilling in the regalloc.

If there were resume points after the writes to local variables in the loop, the lifetimes of the dead values at the start of the body would not be artifically extended this way and we would be able to generate better code.  I don't think there are really any drawbacks to this in terms of being able to apply optimizations or generate quality code, but need to try this change out and see.
It turns out that adding new resume points within a block can also extend the lifetimes of SSA values, if those resume points capture values which are created after the old resume point and are dead by the time of the new resume point --- the value will still be on the interpreter stack, which is exactly reconstructed.  This patch relaxes the latter restriction, by adding a short pass that changes provably dead resume point operands into constant undefined values.  This is done as a block local liveness analysis.

Doesn't affect v8 scores (need part 2 for that), but curiously enough using -D points to a 30% reduction in spill code executed on the (not time based) check-v8-crypto in jit-tests.  Still need to see whether that is bogus or not.
Assignee: general → bhackett1024
Attachment #685583 - Flags: review?(dvander)
(In reply to Brian Hackett (:bhackett) from comment #1)
> Doesn't affect v8 scores (need part 2 for that), but curiously enough using
> -D points to a 30% reduction in spill code executed on the (not time based)
> check-v8-crypto in jit-tests.  Still need to see whether that is bogus or
> not.

The reduction here is real, but the problem is that I've been measuring bytes of executed code rather than the actual number of instructions.  With this patch LSRA uses stack slots with smaller offsets (presumably less contention for them) which allows a more compact encoding, but the number of executed instructions hasn't changed.
(In reply to Brian Hackett (:bhackett) from comment #1)
> Created attachment 685583 [details] [diff] [review]
> part 1 (45c61a3e9678)
> 
> It turns out that adding new resume points within a block can also extend
> the lifetimes of SSA values, if those resume points capture values which are
> created after the old resume point and are dead by the time of the new
> resume point.

Have a look at IsPhiObservable I am sure some of the logic might be shared between the 2 phases.
Attached patch part 2Splinter Review
This patch allows introducing new resume points at arbitrary points in the MIR.  For now this is done after pop and arithmetic instructions inside loop bodies, which is enough for am3 but may need refinement in the future.  Per comment 0 this shortens the live ranges of i, c and other terms considerably, but does not affect LSRA's behavior all that much.  I think that a more sophisticated allocator (bug 814966) will be needed to really benefit from these changes on this benchmark.
Attachment #685778 - Flags: review?(dvander)
Attachment #685583 - Flags: review?(dvander) → review+
Attachment #685778 - Flags: review?(dvander) → review+
Oh, one concern on the earlier patch: is it possible for MResumePoint's instruction to be eliminated, but not the resumepoint itself? If so that could be problematic.

It would be nice to know where this benefits before checking in - if we need a new register allocator to see it, it might be worth waiting.
Depends on: 816786
(In reply to David Anderson [:dvander] from comment #5)
> Oh, one concern on the earlier patch: is it possible for MResumePoint's
> instruction to be eliminated, but not the resumepoint itself? If so that
> could be problematic.

After the two patches the instruction associated with an MResumePoint should not be able to be moved or eliminated (I needed to tweak EliminateDeadCode in the second patch to ensure the latter).

> It would be nice to know where this benefits before checking in - if we need
> a new register allocator to see it, it might be worth waiting.

After applying these patches and the patches in bug 773666 (needs this to be effective) and bug 816786 (fixes a regression caused by having too many resume points), approximating executed Ion instructions with -D I get a 2% overall reduction on v8 and 1% on ss (spill instructions are reduced by 10% and 6% respectively).  ss is regressing a couple % for me though, for no obvious reason (e.g. on benchmarks whose executed instructions did not change at all), need to look into this some more.
Part 1.  Testing this by itself, it seemed to leave ss alone (or a slight improvement).

https://hg.mozilla.org/integration/mozilla-inbound/rev/0952f7c80055
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/6477f8bf2be5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
(In reply to Brian Hackett (:bhackett) from comment #7)
> Part 1.  Testing this by itself, it seemed to leave ss alone (or a slight
> improvement).
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/0952f7c80055

This patch caused a 14.3% slowdown on misc-bugs-612930-solve-sudoku at the date of the 30 of November: http://arewefastyet.com/static/#machine=11&view=breakdown&suite=misc
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #11)
> (In reply to Brian Hackett (:bhackett) from comment #7)
> > Part 1.  Testing this by itself, it seemed to leave ss alone (or a slight
> > improvement).
> > 
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/0952f7c80055
> 
> This patch caused a 14.3% slowdown on misc-bugs-612930-solve-sudoku at the
> date of the 30 of November:
> http://arewefastyet.com/static/#machine=11&view=breakdown&suite=misc

Weird, this should be a strict speed improvement (except a minor ding to compilation time), will look into this.
Testing this locally, I do not see any performance difference on misc-bugs-612930-solve-sudoku whether I enable or disable the changes made in this bug.
Depends on: 820873
You need to log in before you can comment on or make changes to this bug.