Closed Bug 1396822 Opened 7 years ago Closed 3 years ago

Remove dead branches from IonBuilder.

Categories

(Core :: JavaScript Engine: JIT, enhancement, P5)

enhancement

Tracking

()

RESOLVED INCOMPLETE
Performance Impact medium
Tracking Status
firefox61 --- fix-optional

People

(Reporter: nbp, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(1 file)

Currently, we are generating a lot of branches which would bailout on the first execution for invalidation reasons.  Such as empty MFilterTypeSet (Bug 1377710), or empty type barriers.  These code paths will always bailout and invalidate the generated code, until we record them in Baseline.

We could remove these branches under the branch pruning phase, but this has the side effect of still adding a lot of extra cost to IonBuilder, while being of no use.

Doing the same from IonBuilder, could reduce the memory usage, by not generating MIR, and remove overhead from the main thread.

Unfortunately, removing branches from IonBuilder implies that we should flag the some of the slots at the entry of the removed basic blocks as having removed uses, as we are removing the branches.

Fortunately, we can probably do the same thing, as done in branch pruning, in a way which is both faster and works on the CFG, i-e before generating any MIR node.
Priority: -- → P3
For information, I currently have a dirty working directory with changes to make this work.
Unfortunately I still have issues to investigate to fix it completely.
This adds a logic similar made for Branch Pruning, but for the moment
focussed on code paths which are known to always bailout.

This patch does 2 things:
  1. It adds a way to collect the use-flags information.
  2. Add a MUnreachable if branches are unreachable because of an existing
  empty type set.

1/ In order to bailout correctly when we are removing branches, we have to
falg a bunch of instructions as having removed uses.  This apporach might be way
too conservative, but it guarantees that we will never bail out with an
optimized-out magic value, while the branch or one of the instruction after
depends on the value.

This is computed by using a bit mask where 1 bit corresponds to one
local/stack slot.  First we compute what each instruction depends on, and
then we populate this information to the predecessor blocks until we reach a
fix-point.

2/ When a MFilterTypeSet has an existing empty type set, the CodeGenerator
will always bailout.  This adds a function endIfUnreachable which end the
target branch by MUnreachable. This instruction should fail if it is ever
reached, thus if we made a mistake in removing a branch.

This might causes extra issues with OSR if we Ion compile on the first
execution of a loop (Interpreter => Baseline OSR => Ion OSR).  Currently
this patch does not attempt to fix this issue, and follow the same logic as
we did when we attempt to OSR in catch-blocks, i.e. to disable Ion
compilations.
Attachment #8932129 - Flags: review?(tcampbell)
Attachment #8932129 - Flags: feedback?(jdemooij)
Do you have benchmark numbers for at least Sunspider/Kraken/Octane/Speedometer?
Looking at Octane, there is a major regression on octane-raytrace (96k -> 44k) locally.
I will investigate and make an additional patch, to add heuristics if needed.
If you are still playing with patch, I'd prefer the BitSet stuff be split into its own patch file.
Comment on attachment 8932129 [details] [diff] [review]
Remove dead branches from IonBuilder. r=

Review of attachment 8932129 [details] [diff] [review]:
-----------------------------------------------------------------

I like the overall idea, but I think we need another iteration. Cancelling review.

Running this patch on a debug browser trips up assertions on Netflix. Please investigate.

I'd like to see the follow-up to your regression investigation too.

::: js/src/jit/BitSet.cpp
@@ +61,5 @@
>          bits[i] &= ~otherBits[i];
>  }
>  
>  void
> +BitSet::intersect(const BitSet& other, size_t nBits)

This behavior isn't great. If this was only an optimization, it seems hardly worth the complexity for typical sizes. Can we make the pass work with original intersect implementation?

@@ +144,5 @@
> +
> +bool
> +BitSetVector::init(TempAllocator& alloc, bool defaultBit)
> +{
> +    size_t sizeRequired = numEntries_ * BitSet::RawLengthForBits(numBits_);

This multiplication should be checked. On 32-bit this seems to risk being exploited.

::: js/src/jit/BitSet.h
@@ +22,5 @@
>  // of bits.
>  class BitSet
>  {
>    public:
> +    using bits_t = uint32_t;

I like adding the bits_t abstraction :)

@@ +183,5 @@
>      }
>  };
>  
> +// This is used to avoid many Bit set allocation with the same size.
> +class BitSetVector

Overall, I'm not convinced that have a single complex allocation saves a whole lot over a series of LifoAlloc allocations in a vector. If you can measure that's fine, but otherwise I'd prefer simpler code.

::: js/src/jit/IonBuilder.cpp
@@ +3005,5 @@
> +    // reports a flag for all locals and stack variable usage.
> +    MResumePoint* rp = block->entryResumePoint();
> +    size_t stackDepth = rp->stackDepth();
> +    BitSet unused = cfg->unusedFlags(cfgid);
> +    if (!unused.getNumBits()) {

This is completely un-intuitive. Why does getNumBits() == 0 imply try block? Better comment please.

::: js/src/jit/IonControlFlow.cpp
@@ +2257,5 @@
> +
> +        // Check the control flow instruction, and update the stack depth value
> +        // accordingly. Then update the depth at the entry of the successors.
> +        CFGControlInstruction* ins = block->stopIns();
> +        switch (ins->type()) {

Have a default statement guard (even if debug-only)

@@ +2318,5 @@
> +    //
> +    // Bit which are set represent unused slots, we aggregate the used-flags (0)
> +    // by doing an intersection of the bit which are set.
> +    Vector<size_t, 8, SystemAllocPolicy> backedges;
> +    for (size_t i = blocks_.length() - 1, e = i; i <= e; ) {

Add explicit assertion that blocks_.length() > 0 somewhere above as a form of documentation.

::: js/src/jit/IonControlFlow.h
@@ +847,3 @@
>          if (!cfg)
>              return nullptr;
> +        if (!cfg->init(alloc, blocks_, unusedInputSlots_))

Would be nice if we didn't need to pass unusedInputSlots_ to bother ::New and ::init
Attachment #8932129 - Flags: review?(tcampbell)
Comment on attachment 8932129 [details] [diff] [review]
Remove dead branches from IonBuilder. r=

Clearing f? for now.
Attachment #8932129 - Flags: feedback?(jdemooij)
Status: NEW → ASSIGNED
Flags: needinfo?(nicolas.b.pierron)
Priority: P3 → P1
Severity: normal → enhancement
Blocks: 1477211
In this profile [1] with have an IonCompile which last 144ms with a restartLoop take 35ms, and a lot of time is spent in creating basic blocks and resume points, which is usually a side effects of having tons of live variables within one frame.

This bug might help reduce this overhead by removing branches sooner than the BranchPruning phase. Which should reduce the amount of time in Basic block and resume points creation.

[1] https://perf-html.io/public/af86f00f11f7c7719ed6018a24dab7a86ecc507b/flame-graph/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=2573-0~&range=3.3981_8.3281&thread=5&v=3
Keywords: perf
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1:64]
Whiteboard: [qf:p1:64] → [qf:p1:f64]
Changing QF Target to qf:p1:f67, we don't believe we can provide a fix in the FF64 time frame.
Whiteboard: [qf:p1:f64] → [qf:p1:f67]
Blocks: 1490847
Whiteboard: [qf:p1:f67] → [qf:p2]
Since bug 1489142 landed, this issue is less urgent.

What remains is the extra time added by the compilation of zero-hit blocks and the inlining which occurs in them. I opened Bug 1507432 to address the inlining issue in the mean time as it should be a simpler patch than the current one.

Once Bug 1507432 is fixed, what will remain would be the compilation overhead of large functions and can only be addressed with this issue.
Priority: P1 → P2

IonMonkey is mostly disabled, we no longer care about doing these kind of optimization at the moment.

Flags: needinfo?(nicolas.b.pierron)
Priority: P2 → P5
Assignee: nicolas.b.pierron → nobody
Status: ASSIGNED → NEW

We've fully moved from IonBuilder to Warp now.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE

Is it possible https://bugzilla.mozilla.org/show_bug.cgi?id=1488435 isn't an issue now on Warp then?

(In reply to Mike Kaply [:mkaply] from comment #13)

Is it possible https://bugzilla.mozilla.org/show_bug.cgi?id=1488435 isn't an issue now on Warp then?

I commented there. Yes it's possible, Warp was a pretty big improvement on GDocs, but there might be other issues.

Performance Impact: --- → P2
Whiteboard: [qf:p2]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: