Closed Bug 1004363 Opened 6 years ago Closed 6 years ago

Dominator-tree DFS GVN

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: sunfish, Assigned: sunfish)

References

(Blocks 1 open bug)

Details

Attachments

(14 files, 10 obsolete files)

2.70 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
3.46 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
3.87 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
2.78 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
4.47 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
1.31 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
11.91 KB, patch
shu
: review+
Details | Diff | Splinter Review
4.73 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
10.66 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
4.06 KB, patch
nbp
: review+
Details | Diff | Splinter Review
2.20 KB, patch
nbp
: review+
Details | Diff | Splinter Review
8.88 KB, patch
nbp
: review+
Details | Diff | Splinter Review
3.51 KB, patch
nbp
: review+
Details | Diff | Splinter Review
111.30 KB, patch
nbp
: review+
Details | Diff | Splinter Review
Attached are patches for a new GVN implementation in IonMonkey. Highlights:

 - On-the-fly dead-code and unreachable-block elimination, without a separate UCE pass.
 - In most cases it does all its work in a single pass, by taking advantage of RPO traversal and dominance information.
 - It takes a little more than half the time of the current GVN (not even counting the separate UCE pass) over a full Octane run.
 - No value-number information stored in MDefinitions.

It eliminates almost the exact same set of redundancies as the old GVN. With that, and with GVN not usually being a significant contributor to compile time in IonMonkey, this patch doesn't make a big difference on most benchmarks.
Assignee: nobody → sunfish
Attached patch gvn-tocontrolinstruction.patch (obsolete) — Splinter Review
A preperatory patch which just adds a nice helper function.
Attached patch gvn-congruenttos.patch (obsolete) — Splinter Review
This patch implements congruentTo for several operators which previously lacked it.
Attached patch gvn.patch (obsolete) — Splinter Review
This is the main GVN patch.
(In reply to Dan Gohman [:sunfish] from comment #0)
>  - On-the-fly dead-code and unreachable-block elimination, without a separate UCE pass.
>  - In most cases it does all its work in a single pass, by taking advantage
> of RPO traversal and dominance information.
>  - It takes a little more than half the time of the current GVN (not even
> counting the separate UCE pass) over a full Octane run.
>  - No value-number information stored in MDefinitions.

Awesome!

> It eliminates almost the exact same set of redundancies as the old GVN. With
> that, and with GVN not usually being a significant contributor to compile
> time in IonMonkey, this patch doesn't make a big difference on most
> benchmarks.

FWIW, Kraken gaussian-blur is the poster child for GVN, but it doesn't require anything too fancy I think so I'm sure this patch handles it :)
Comment on attachment 8415741 [details] [diff] [review]
gvn.patch

My tree has evolved substantially from this patch as I've discovered more about how OSR blocks fit in, how resume points fit in, how ParallelSafetyAnalysis works, and numerous other details.
Attachment #8415741 - Attachment is obsolete: true
The following is a series of cleanup patches split out from the main GVN patch so that they can be reviewed and checked in independently.
Keywords: leave-open
This patch just deletes some unused functions.
Attachment #8418834 - Flags: review?(mrosenberg)
This adds a toControlInstruction() helper function, similar to toInstruction() and other methods, to help avoid unchecked casting.
Attachment #8415739 - Attachment is obsolete: true
Attachment #8418837 - Flags: review?(mrosenberg)
This patch adds congruentTo methods to a few more classes to allow then to be GVN'd.
Attachment #8415740 - Attachment is obsolete: true
Attachment #8418839 - Flags: review?(mrosenberg)
This patch adds dominator checking to AssertExtendedGraphCoherency.
Attachment #8418840 - Flags: review?(mrosenberg)
This patch changes a few places to use the entryBlock() accessor function, rather than calling begin(), which is a litle easier to read.
Attachment #8418841 - Flags: review?(mrosenberg)
This patch just expands a comment.
Attachment #8418842 - Flags: review?(mrosenberg)
This patch rewrites the way ParallelSafetyAnalysis rewrites blocks into MAbortPar blocks. Previously it created a new block and replaced the old one with it. With this patch, it just converts the existing block in place. This is simpler, and it makes it easier to keep the CFG consistent.

This also converts the code for updating setSuccessorWithPhis and clearLoopHeader out of being custom code in the UCE pass and into MBasicBlock::removePredecessor itself, so that other callers of removePredecessor (now including ParallelSafetyAnalysis) automatically keep these fields current.

Feel free to reassign the review for this (or any other) patch if there's someone more suited.
Attachment #8418849 - Flags: review?(mrosenberg)
This patch makes MBasicBlock::discardBlock also discard the resume points. There is a comment about multiple blocks sharing a resume point, but this does not seem to happen (anymore?) as far as I can tell. Several places which call removeBlock were previously also removing the resume points manually; with this patch they no longer need to do so.
Attachment #8418850 - Flags: review?(mrosenberg)
This patch adds dump debugging methods for LIRGraph and LBlock, it makes control LInstructions (such as LGoto) print their successors, and makes MBasicBlock print its block id.
Attachment #8418852 - Flags: review?(mrosenberg)
This patch eliminates the UnsplitEdges pass, since it does strange things to the IL, the the only excuse being that nothing cares about the IL beyond that point. Instead, it just has Codegen skip trivial blocks containing just a goto. One tricky point is that it can't skip trivial blocks which form infinite loops.
Attachment #8418856 - Flags: review?(mrosenberg)
Attachment #8418834 - Flags: review?(mrosenberg) → review+
Comment on attachment 8418837 [details] [diff] [review]
gvn-tocontrolinstruction.patch

Review of attachment 8418837 [details] [diff] [review]:
-----------------------------------------------------------------

wtb: sm-specific counts of the number of casts that we have that should really be done with accessor functions
Attachment #8418837 - Flags: review?(mrosenberg) → review+
Comment on attachment 8418839 [details] [diff] [review]
gvn-congruenttos.patch

Review of attachment 8418839 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't realize there were so many things not tracked by gvn!
Attachment #8418839 - Flags: review?(mrosenberg) → review+
Attachment #8418840 - Flags: review?(mrosenberg) → review+
Attachment #8418841 - Flags: review?(mrosenberg) → review+
Attachment #8418842 - Flags: review?(mrosenberg) → review+
Attachment #8418852 - Flags: review?(mrosenberg) → review+
Attachment #8418856 - Flags: review?(mrosenberg) → review+
Comment on attachment 8418849 [details] [diff] [review]
gvn-abortpar.patch

This patch greatly simplifies the way ParallelSafetyAnalysis rewrites blocks into MAbortPar blocks. Previously it created a new aborting block for each predecessor of the problematic block. With this patch, it just clears out the existing block and inserts an MAbortPar instruction. This takes advantage of some recent MIR utility function improvements, and it makes it easier to keep the CFG and RPO consistent.

This also converts the code for updating setSuccessorWithPhis and clearLoopHeader out of being custom code in the UCE pass and into MBasicBlock::removePredecessor itself, so that other callers of removePredecessor (now including ParallelSafetyAnalysis) automatically keep these fields current.

Niko, r? to you because hg suggests you're relevant for this code; let me know if you'd like someone else to review this. Thanks!
Attachment #8418849 - Flags: review?(mrosenberg) → review?(nmatsakis)
(In reply to Dan Gohman [:sunfish] from comment #0)
> ... and with GVN not usually being a significant contributor to compile
> time in IonMonkey ...

That is a wrong statement. GVN takes the 3th most time during compiling. Between 10%-20% of the full compile time. Only register allocation (33%) and code generation (30%) take longer.
Comment on attachment 8418849 [details] [diff] [review]
gvn-abortpar.patch

This is the ParallelSafetyAnalysis patch I mentioned. See comment 13 above for more background.
Attachment #8418849 - Flags: review?(nmatsakis) → review?(shu)
Got an in-person r=shu for gvn-abortpar.patch. This should simply his upcoming parallel bailout work as well.

https://hg.mozilla.org/integration/mozilla-inbound/rev/0ac90fa5f24a
Comment on attachment 8418849 [details] [diff] [review]
gvn-abortpar.patch

Review of attachment 8418849 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: js/src/jit/MIRGraph.cpp
@@ +1151,5 @@
>              JS_ASSERT(pred->successorWithPhis());
>              JS_ASSERT(pred->positionInPhiSuccessor() == i);
>              for (MPhiIterator iter = phisBegin(); iter != phisEnd(); iter++)
>                  iter->removeOperand(i);
> +            pred->setSuccessorWithPhis(nullptr, 0);

Why is this necessary?
Attachment #8418849 - Flags: review?(shu) → review+
(In reply to Shu-yu Guo [:shu] from comment #27)
> ::: js/src/jit/MIRGraph.cpp
> @@ +1151,5 @@
> >              JS_ASSERT(pred->successorWithPhis());
> >              JS_ASSERT(pred->positionInPhiSuccessor() == i);
> >              for (MPhiIterator iter = phisBegin(); iter != phisEnd(); iter++)
> >                  iter->removeOperand(i);
> > +            pred->setSuccessorWithPhis(nullptr, 0);
> 
> Why is this necessary?

It's code that previously was done manually in UCE, and is now being done in removePredesessor so that other code (now including ParallelSafetyAnalysis) that wants to call removePredesessor doesn't need to do it manually too.
Depends on: 1016137
Refresh.
Attachment #8418850 - Attachment is obsolete: true
Attachment #8418850 - Flags: review?(mrosenberg)
Attached patch gvn-more-congruentto.patch (obsolete) — Splinter Review
Implement congruentTo on more classes.
Tweak the hash function. This is still *very* simple, but it reduces the typical collision rate from about 45% to about 15%. I tried a few fancier hash functions, but I didn't find anything significantly better.
This patch changes the meaning of numDominated() to include the block itself, since dominance is typically defined such that a block dominates itself. This eliminates several awkward "+1"s.
This replaces the one-level-only recursive propagation of unreachableness from setUnreachable() with full propagation in range analysis itself, which is already doing an RPO walk.
Attached patch gvn.patch (obsolete) — Splinter Review
This is the main GVN patch. This patch just does GVN and does not include the integrated UCE.
Attached patch gvn-uce.patch (obsolete) — Splinter Review
This is the current patch to add integrated UCE to GVN. I've separated it out because it currently causes a mysterious (but reproducible) regression in one of the browser chrome mochitests.
Attached patch gvn-fold-more-control.patch (obsolete) — Splinter Review
This patch builds on the integrated UCE by implementing several more foldsTo rules for control instructions. This does most of the folding that is currently done in Lowering.cpp.
Attachment #8430157 - Flags: review?(nicolas.b.pierron)
Attachment #8430159 - Flags: review?(nicolas.b.pierron)
Attachment #8430164 - Flags: review?(nicolas.b.pierron)
Attachment #8430165 - Flags: review?(nicolas.b.pierron)
Attachment #8430169 - Flags: review?(nicolas.b.pierron)
Attachment #8430170 - Flags: review?(nicolas.b.pierron)
Attachment #8430157 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8430159 [details] [diff] [review]
gvn-more-congruentto.patch

Review of attachment 8430159 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +8001,5 @@
>          return AliasSet::None();
>      }
> +
> +    bool congruentTo(const MDefinition *ins) const {
> +        return congruentIfOperandsEqual(ins);

This sounds wrong to me.  I feel like the following test case might cause the problem to be observable:

function f(a) {
  return function g() { return a; }
}

function topLevel() {
  var x = f;
  var r1 = x(1); /* inlined */
  var r2 = x(2); /* inlined */
  assertEq(r1(), 1);
  assertEq(r2(), 2);
}

for (var i = 0; i < 10000; i++)
  topLevel();

In this case we might alias the scope chain of both inlined versions, and as we alias these, we would erase the value of "a" during the second call to f.
Attachment #8430159 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8430164 [details] [diff] [review]
gvn-tweak-value-hash.patch

Review of attachment 8430164 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +852,5 @@
>      {
>          MDefinition *lhs = getOperand(0);
>          MDefinition *rhs = getOperand(1);
>  
> +        return op() + lhs->valueNumber() + rhs->valueNumber();

Stupid question, I think this might make sense for commutative expressions such as arithmetic instructions, but in general don't we want to have a non-commutative way of combining the value numbers?
Attachment #8430164 - Flags: review?(nicolas.b.pierron) → review+
Duplicate of this bug: 1018998
Attached patch gvn.patch (obsolete) — Splinter Review
A minor update to gvn.patch.

The most interesting change here is the removal of the init(graph_.numInstructionIds()) on the main hash set. It turns out that (a) that's a *lot* bigger than we typically need, and (b) even if we end up using the amount of memory we ask for, allocating too much up front causes the hash set to think it needs to do compaction along the way. Just letting the hash set size itself seems to work much better.
Attachment #8430170 - Attachment is obsolete: true
Attachment #8430170 - Flags: review?(nicolas.b.pierron)
Attachment #8432854 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #38)
> ::: js/src/jit/MIR.h
> @@ +8001,5 @@
> >          return AliasSet::None();
> >      }
> > +
> > +    bool congruentTo(const MDefinition *ins) const {
> > +        return congruentIfOperandsEqual(ins);
> 
> This sounds wrong to me.  I feel like the following test case might cause
> the problem to be observable:
> 
> function f(a) {
>   return function g() { return a; }
> }
> 
> function topLevel() {
>   var x = f;
>   var r1 = x(1); /* inlined */
>   var r2 = x(2); /* inlined */
>   assertEq(r1(), 1);
>   assertEq(r2(), 2);
> }
> 
> for (var i = 0; i < 10000; i++)
>   topLevel();
> 
> In this case we might alias the scope chain of both inlined versions, and as
> we alias these, we would erase the value of "a" during the second call to f.

The inliner can't currently inline these, but if it could, I assumed they'd get different operand values. If so, congruentIfOperandsEqual does the right thing. If not, how can they compute different values? Looking at CodeGenerator::visitFunctionEnvironment, I don't see how they could.

(In reply to Nicolas B. Pierron [:nbp] from comment #39)
> ::: js/src/jit/MIR.h
> @@ +852,5 @@
> >      {
> >          MDefinition *lhs = getOperand(0);
> >          MDefinition *rhs = getOperand(1);
> >  
> > +        return op() + lhs->valueNumber() + rhs->valueNumber();
> 
> Stupid question, I think this might make sense for commutative expressions
> such as arithmetic instructions, but in general don't we want to have a
> non-commutative way of combining the value numbers?

Quite possibly. There are probably several ways in which these hash functions could probably be improved. For now, my patch here is just a simple tweak that happens to have a big impact.
Comment on attachment 8430159 [details] [diff] [review]
gvn-more-congruentto.patch

Review of attachment 8430159 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +3517,5 @@
> +        return AliasSet::None();
> +    }
> +    bool congruentTo(const MDefinition *ins) const {
> +        return congruentIfOperandsEqual(ins);
> +    }

Drive-by comment, this is wrong if the input is an object; MToId can call its toString() method etc. We could still do this if we know the input is not an object (string is probably common).
Comment on attachment 8430165 [details] [diff] [review]
gvn-num-dominated.patch

Review of attachment 8430165 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIRGraph.cpp
@@ +1104,5 @@
>  MBasicBlock::clearDominatorInfo()
>  {
>      setImmediateDominator(nullptr);
>      immediatelyDominated_.clear();
> +    numDominated_ = 1;

hum … I think it might be nicer to keep resetting this info to 0, such as 0 remains the default uninitialized number of dominated blocks.

This way with the assert in the MBasicBloc::numDominated() function, we can prevent misuses of the dominator.  Also the graph assertion could either check that all blocks have zero numDominated or the right number.
Attachment #8430165 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8430169 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #44)
> ::: js/src/jit/MIRGraph.cpp
> @@ +1104,5 @@
> >  MBasicBlock::clearDominatorInfo()
> >  {
> >      setImmediateDominator(nullptr);
> >      immediatelyDominated_.clear();
> > +    numDominated_ = 1;
> 
> hum … I think it might be nicer to keep resetting this info to 0, such as 0
> remains the default uninitialized number of dominated blocks.
> 
> This way with the assert in the MBasicBloc::numDominated() function, we can
> prevent misuses of the dominator.  Also the graph assertion could either
> check that all blocks have zero numDominated or the right number.

Good idea. I updated the patch to do this.

https://hg.mozilla.org/integration/mozilla-inbound/rev/33b398048270
https://hg.mozilla.org/integration/mozilla-inbound/rev/41f1efb02920
https://hg.mozilla.org/integration/mozilla-inbound/rev/07ab6ede1eff
https://hg.mozilla.org/integration/mozilla-inbound/rev/51b269a9c5c9
Summary: Combined GVN and UCE → Dominator-tree DFS GVN
Attachment #8430159 - Attachment is obsolete: true
Attachment #8430172 - Attachment is obsolete: true
Attachment #8430173 - Attachment is obsolete: true
I've now dropped gvn-more-congruentto.patch from this bug for now, since there were issues raised above, and it's not especially important. I've also dropped the UCE parts, gvn-uce.patch and gvn-fold-more-control.patch, for now, as I still haven't diagnosed the browser chrome mochitest regression mentioned above. I'll create a new bugs for those patches.

There's now just one remaining patch here, gvn.patch, the main patch to GVN itself.
Comment on attachment 8432854 [details] [diff] [review]
gvn.patch

Review of attachment 8432854 [details] [diff] [review]:
-----------------------------------------------------------------

(review in progress)

::: js/src/jit/Ion.cpp
@@ +1386,5 @@
>  
>      if (mir->optimizationInfo().gvnEnabled()) {
>          AutoTraceLog log(logger, TraceLogger::GVN);
> +        ValueNumberer gvn(mir, graph);
> +        if (!gvn.run(/*updateAliasAnalysis=*/true))

nit: add an enumerated type instead of "comment + boolean", such as nobody forget the comment ;)

::: js/src/jit/IonAnalysis.cpp
@@ +1120,5 @@
> +
> +/// Remove all blocks not marked with isMarked(). Unmark all remaining blocks.
> +/// Alias analysis dependencies may be invalid after calling this function.
> +bool
> +jit::RemoveUnmarkedBlocks(MIRGenerator *mir, MIRGraph &graph, uint32_t marked)

nit: marked -> numMarkedBlocks

@@ +1130,5 @@
> +    }
> +
> +    size_t id = marked;
> +    for (ReversePostorderIterator iter(graph.rpoBegin()); iter != graph.rpoEnd();) {
> +        MBasicBlock *block = *iter++;

(quoted for the comment below)

@@ +1142,5 @@
> +
> +        // Unconditionally clear the dominators.  It's somewhat complex to
> +        // adjust the values and relatively fast to just recompute.
> +        block->clearDominatorInfo();
> +        block->setId(--id);

This sounds wrong to me, I would expect the remaining blocks to be numbered in the reverse post-order order.
Also note that AccountForCFGChanges already do this renumbering of blocks.

::: js/src/jit/ValueNumbering.cpp
@@ +109,5 @@
> +static bool deadIfUnused(const MDefinition *def)
> +{
> +    // TODO: The old GVN didn't check isImplicitlyUsed() here. Should we?
> +    return !def->isEffectful() && !def->isGuard() && !def->isControlInstruction() &&
> +           (!def->isInstruction() || !def->toInstruction()->resumePoint());

isImplicitlyUsed was a miss-understood work around which should be replaced by a hasObservableUses() function.
I suggest to still check for isImplicitlyUsed and I should clean-up this code to get rid of it.

@@ +135,5 @@
> +    from->replaceAllUsesWith(to);
> +
> +    // Also transfer the ImplicitlyUsed flag. TODO: Should this move
> +    // to MDefinition::replaceAllUsesWith?
> +    if (from->isImplicitlyUsed()) {

Yes, this should move to replaceAllUsesWith.

@@ +153,5 @@
> +}
> +
> +/// Given a block which has had predecessors removed but is still reachable,
> +/// test whether the block's new dominator will be closer than its old one
> +/// and whether it will expose potential optimization opportunities.

nit: s,///,//,

@@ +161,5 @@
> +    MBasicBlock *now = block->getPredecessor(0);
> +
> +    for (size_t i = 1, e = block->numPredecessors(); i != e; ++i) {
> +        MBasicBlock *pred = block->getPredecessor(i);
> +        while (!now->dominates(pred)) {

I do not think it matters, but having the following condition would make more sense based on the comment as we are looking for a new dominator for block (which is also equivalent to the dominator of all the predecessors of the block)

  while (!now->dominates(block)) {

@@ +182,5 @@
> +    // newly-dominating definitions which look interesting.
> +    MOZ_ASSERT(old->dominates(now), "Refined dominator not dominated by old dominator");
> +    for (MBasicBlock *i = now; i != old; i = i->immediateDominator()) {
> +        if (!i->phisEmpty() || *i->begin() != i->lastIns())
> +            return true;

nit: I think we should move the "expose potential optimization opportunities" in a separated function by returning the new dominator.  This functions really seems to be doing 2 different things.
Comment on attachment 8432854 [details] [diff] [review]
gvn.patch

Review of attachment 8432854 [details] [diff] [review]:
-----------------------------------------------------------------

(review in progress)

::: js/src/jit/ValueNumbering.cpp
@@ +122,5 @@
> +/// Test whether the given definition will no longer be needed after its user
> +/// is deleted.
> +static bool willBecomeDead(const MDefinition *def)
> +{
> +    return def->hasOneUse() && deadIfUnused(def);

Note for later: I think you might want to add hasOneDefUse. The difference might matter for removing duplicated inputs of dead-Phi.

  var a = b = c = …;
  switch (…) {
    case 0: a = …; break;
    case 1: b = …; break;
    case 2: c = …; break;
  }
  // <--- Phi with multiple times the same operands.

@@ +286,3 @@
>  
> +/// Delete the given block and any block in its dominator subtree.
> +bool ValueNumberer::removeBlocksRecursively(MBasicBlock *start, const MBasicBlock *root)

nit:

bool
ValueNumberer::removeBlocksRecursively(MBasicBlock *start, const MBasicBlock *root)
{

@@ +288,5 @@
> +bool ValueNumberer::removeBlocksRecursively(MBasicBlock *start, const MBasicBlock *root)
> +{
> +    // Remove this block from its dominator parent's subtree. This is the only
> +    // immediately-dominated-block information we need to manually update,
> +    // because everything dominated by this block it is about to be swept away.

nit: s/block it is/block is/

@@ +291,5 @@
> +    // immediately-dominated-block information we need to manually update,
> +    // because everything dominated by this block it is about to be swept away.
> +    MBasicBlock *parent = start->immediateDominator();
> +    if (parent != start)
> +        parent->removeImmediatelyDominatedBlock(start);

Assert that this is not the start nor the osr block.

@@ +337,5 @@
> +        // detect this and delete them. In practice though, when this happens,
> +        // we often end up re-running GVN for other reasons anyway.
> +        graph_.removeBlockIncludingPhis(block);
> +        blocksRemoved_ = true;
> +    } while (!unreachableBlocks_.empty());

I guess we are not using the dominated info, because it might not be up-to-date as we might already have removed some block which might have added extra block under the dominated blocks of start.

right?

@@ +355,5 @@
>  {
> +    // If the value isn't suitable for eliminating, don't bother hashing it. The
> +    // convention is that congruentTo returns false for node kinds that wish to
> +    // opt out of redundance elimination.
> +    // TODO: It'd be nice to clean up that convention.

I agree with this Todo, and I think a simple way might be to check if the congruentTo method resolves to MDefinition::congruentTo or to another one.

bool
MDefinition::canFoldDefs() const {
  return &this->congruentTo != &this->MDefinition::congruentTo;
}

@@ +371,5 @@
> +            // dominator tree, so overwrite it.
> +            values_.overwrite(p, def);
> +        } else {
> +            // No match. Add a new entry.
> +            if (!values_.insert(p, def))

I would prefer if the declaration of p goes outside the if condition as it use it in the else part.
Depends on: 1022321
Comment on attachment 8432854 [details] [diff] [review]
gvn.patch

Review of attachment 8432854 [details] [diff] [review]:
-----------------------------------------------------------------

(review complete)

::: js/src/jit/MIRGraph.h
@@ +414,5 @@
>          numDominated_ += n;
>      }
>  
>      bool addImmediatelyDominatedBlock(MBasicBlock *child);
> +    void removeImmediatelyDominatedBlock(MBasicBlock *child);

These names are a bit confusing, as we expect that we either add a BasicBlock or remove one.
Add a comment to mention that we only add/remove the "immediately dominated" edge to the child block.

::: js/src/jit/ValueNumbering.cpp
@@ +423,5 @@
> +            def->block()->insertAfter(def->toInstruction(), sim->toInstruction());
> +
> +        IonSpew(IonSpew_GVN, "    Folded %s%u to %s%u",
> +                def->opName(), def->id(), sim->opName(), sim->id());
> +        replaceAllUsesWith(def, sim);

I guess we should assert at the end of replaceAllUsesWith that the all the isObservableUses as well as the ImplicitlyUsed flag are correctly moved and removed from the original definition.

In such case we might just assert in isDeadAfterReplace() that these are never can never be set.
Note that we still need these conditions for isDead as we use it in visitBlock().

@@ +466,5 @@
>      return true;
>  }
>  
> +/// Visit the control instruction at the end of the given block.
> +bool ValueNumberer::visitControlInstruction(MBasicBlock *block, const MBasicBlock *root)

nit: Replace root by something like dominatorRoot or outermostDominator.  Having a basic block named "root" in a CFG is confusing.

@@ +474,5 @@
> +    MDefinition *rep = simplified(control);
> +    if (rep == control)
> +        return true;
> +    if (rep == nullptr)
> +        return false;

nit: Add an extra new line above this if, to highlight the shortcut of the "return true".

@@ +540,5 @@
> +            uint64_t(root->numDominated()), root->id(),
> +            root == root->graph().entryBlock() ? " (normal entry block)" :
> +            root == root->graph().osrBlock() ? " (OSR entry block)" :
> +            " (normal entry and OSR entry merge point)");
> +    MOZ_ASSERT(numBlocksDeleted_ == 0, "numBlocksDeleted_ wasn't reset");

nit: MOZ_ASSERT(root->immediateDominator() == root, "root is not a dominator tree root.");

@@ +660,5 @@
> +            return false;
> +
> +        // If no further opportunities are possible, we're done.
> +        if (!rerun_)
> +            break;

I understand that this is finite but I think we need a better upper bound than the potential number of rerun.  From what I understand from this algorithm, I guess I can make it re-run a linear number of time with a code which might look like (assuming we were doing proper folding of constants in MTest):

  var c0 = false, c1 = false, c2 = false, c3 = false;
  for (;;) {
    if (c2) c3 = true else c3 = false;
    if (c1) c2 = true else c2 = false;
    if (c0) c1 = true else c1 = false;
  }

After each run, we would be removing the last then-block, and keeping the else-block.  The else-block set the value to its original content, which makes the Phi node of the loop header redundant.  As the Phi node is redundant, we can optimize the last MTest and continue with the removal of its then-block.
Attachment #8432854 - Flags: review?(nicolas.b.pierron)
Attached patch gvn.patch (obsolete) — Splinter Review
Attached is an updated patch. I've addressed all your comments except where noted below.

(In reply to Nicolas B. Pierron [:nbp] from comment #48)
> ::: js/src/jit/ValueNumbering.cpp
> @@ +109,5 @@
> > +static bool deadIfUnused(const MDefinition *def)
> > +{
> > +    // TODO: The old GVN didn't check isImplicitlyUsed() here. Should we?
> > +    return !def->isEffectful() && !def->isGuard() && !def->isControlInstruction() &&
> > +           (!def->isInstruction() || !def->toInstruction()->resumePoint());
> 
> isImplicitlyUsed was a miss-understood work around which should be replaced
> by a hasObservableUses() function.
> I suggest to still check for isImplicitlyUsed and I should clean-up this
> code to get rid of it.

Actually, the old GVN does not use isImplicitlyUsed at all, either here or at replaceAllUsesWith. And thinking about this more, I don't see how transferring isImplicitlyUsed here is correct; whatever is using the value isn't getting transfered by replaceAllUsesWith. I removed this code from the patch.

> @@ +161,5 @@
> > +    MBasicBlock *now = block->getPredecessor(0);
> > +
> > +    for (size_t i = 1, e = block->numPredecessors(); i != e; ++i) {
> > +        MBasicBlock *pred = block->getPredecessor(i);
> > +        while (!now->dominates(pred)) {
> 
> I do not think it matters, but having the following condition would make
> more sense based on the comment as we are looking for a new dominator for
> block (which is also equivalent to the dominator of all the predecessors of
> the block)
> 
>   while (!now->dominates(block)) {

We haven't recomputed dominators at this point yet. We're trying to compute what block's new immediate dominator will be, so we look for the block that dominates all of block's *remaining* predecessors. I added a comment.

> ::: js/src/jit/ValueNumbering.cpp
> @@ +122,5 @@
> > +/// Test whether the given definition will no longer be needed after its user
> > +/// is deleted.
> > +static bool willBecomeDead(const MDefinition *def)
> > +{
> > +    return def->hasOneUse() && deadIfUnused(def);
> 
> Note for later: I think you might want to add hasOneDefUse. The difference
> might matter for removing duplicated inputs of dead-Phi.
> 
>   var a = b = c = …;
>   switch (…) {
>     case 0: a = …; break;
>     case 1: b = …; break;
>     case 2: c = …; break;
>   }
>   // <--- Phi with multiple times the same operands.

That's a good point. This can affect regular instructions too. I added a comment for this.

> @@ +337,5 @@
> > +        // detect this and delete them. In practice though, when this happens,
> > +        // we often end up re-running GVN for other reasons anyway.
> > +        graph_.removeBlockIncludingPhis(block);
> > +        blocksRemoved_ = true;
> > +    } while (!unreachableBlocks_.empty());
> 
> I guess we are not using the dominated info, because it might not be
> up-to-date as we might already have removed some block which might have
> added extra block under the dominated blocks of start.
> 
> right?

Right, when we're removing blocks, the dominator tree isn't kept up to date. Doing so would require more work in many cases because we'd end up recomputing dominators for blocks about to be removed anyway.

> @@ +355,5 @@
> >  {
> > +    // If the value isn't suitable for eliminating, don't bother hashing it. The
> > +    // convention is that congruentTo returns false for node kinds that wish to
> > +    // opt out of redundance elimination.
> > +    // TODO: It'd be nice to clean up that convention.
> 
> I agree with this Todo, and I think a simple way might be to check if the
> congruentTo method resolves to MDefinition::congruentTo or to another one.
> 
> bool
> MDefinition::canFoldDefs() const {
>   return &this->congruentTo != &this->MDefinition::congruentTo;
> }

Unfortunately, C++ doesn't permit this.

> ::: js/src/jit/ValueNumbering.cpp
> @@ +423,5 @@
> > +            def->block()->insertAfter(def->toInstruction(), sim->toInstruction());
> > +
> > +        IonSpew(IonSpew_GVN, "    Folded %s%u to %s%u",
> > +                def->opName(), def->id(), sim->opName(), sim->id());
> > +        replaceAllUsesWith(def, sim);
> 
> I guess we should assert at the end of replaceAllUsesWith that the all the
> isObservableUses as well as the ImplicitlyUsed flag are correctly moved and
> removed from the original definition.
> 
> In such case we might just assert in isDeadAfterReplace() that these are
> never can never be set.
> Note that we still need these conditions for isDead as we use it in
> visitBlock().

I don't understand what you're saying here. After replaceAllUsesWith, there are no isObservableUses because there are no uses. It replaces *all* uses.

> @@ +660,5 @@
> > +            return false;
> > +
> > +        // If no further opportunities are possible, we're done.
> > +        if (!rerun_)
> > +            break;
> 
> I understand that this is finite but I think we need a better upper bound
> than the potential number of rerun.  From what I understand from this
> algorithm, I guess I can make it re-run a linear number of time with a code
> which might look like (assuming we were doing proper folding of constants in
> MTest):
> 
>   var c0 = false, c1 = false, c2 = false, c3 = false;
>   for (;;) {
>     if (c2) c3 = true else c3 = false;
>     if (c1) c2 = true else c2 = false;
>     if (c0) c1 = true else c1 = false;
>   }
> 
> After each run, we would be removing the last then-block, and keeping the
> else-block.  The else-block set the value to its original content, which
> makes the Phi node of the loop header redundant.  As the Phi node is
> redundant, we can optimize the last MTest and continue with the removal of
> its then-block.

Yes, that can happen. My claim is that it's rare for real-world code to do anything resembling this -- triggering a rerun once is even pretty uncommon -- and that it's ok for pathologically inefficient code to be compiled somewhat slower (of course, termination is still guaranteed). However, if you disagree, I can add a cutoff limit.
Attachment #8432854 - Attachment is obsolete: true
(In reply to Dan Gohman [:sunfish] from comment #51)
> > ::: js/src/jit/ValueNumbering.cpp
> > @@ +423,5 @@
> > > +            def->block()->insertAfter(def->toInstruction(), sim->toInstruction());
> > > +
> > > +        IonSpew(IonSpew_GVN, "    Folded %s%u to %s%u",
> > > +                def->opName(), def->id(), sim->opName(), sim->id());
> > > +        replaceAllUsesWith(def, sim);
> > 
> > I guess we should assert at the end of replaceAllUsesWith that the all the
> > isObservableUses as well as the ImplicitlyUsed flag are correctly moved and
> > removed from the original definition.
> > 
> > In such case we might just assert in isDeadAfterReplace() that these are
> > never can never be set.
> > Note that we still need these conditions for isDead as we use it in
> > visitBlock().
> 
> I don't understand what you're saying here. After replaceAllUsesWith, there
> are no isObservableUses because there are no uses. It replaces *all* uses.

Right. :)

> > @@ +660,5 @@
> > > +            return false;
> > > +
> > > +        // If no further opportunities are possible, we're done.
> > > +        if (!rerun_)
> > > +            break;
> > 
> > I understand that this is finite but I think we need a better upper bound
> > than the potential number of rerun.  From what I understand from this
> > algorithm, I guess I can make it re-run a linear number of time with a code
> > which might look like (assuming we were doing proper folding of constants in
> > MTest):
> > 
> >   var c0 = false, c1 = false, c2 = false, c3 = false;
> >   for (;;) {
> >     if (c2) c3 = true else c3 = false;
> >     if (c1) c2 = true else c2 = false;
> >     if (c0) c1 = true else c1 = false;
> >   }
> > 
> > After each run, we would be removing the last then-block, and keeping the
> > else-block.  The else-block set the value to its original content, which
> > makes the Phi node of the loop header redundant.  As the Phi node is
> > redundant, we can optimize the last MTest and continue with the removal of
> > its then-block.
> 
> Yes, that can happen. My claim is that it's rare for real-world code to do
> anything resembling this -- triggering a rerun once is even pretty uncommon
> -- and that it's ok for pathologically inefficient code to be compiled
> somewhat slower (of course, termination is still guaranteed). However, if
> you disagree, I can add a cutoff limit.

The idea of the cut-off limit is to prevent one pathological case to degrade Gecko's performances overall.  If a malicious person make such code, we can end-up with an infinite compilation cycle.

The problem being that if the compilation last long enough and that we are able to trigger a full GC which is interrupting the compilation.  Then our GC, which is still synchronous, has first to wait on the end of the compilation (as we kill all compilations before GCing), and once the GC is done, we will trigger this compilation from scratch again.
Attached patch gvn.patchSplinter Review
I've now added a re-run cutoff, so I believe I've addressed all of your concerns from the review round, and am ready for another round :-).
Attachment #8439614 - Attachment is obsolete: true
Attachment #8441012 - Flags: review?(nicolas.b.pierron)
Blocks: 1029830
Comment on attachment 8441012 [details] [diff] [review]
gvn.patch

Review of attachment 8441012 [details] [diff] [review]:
-----------------------------------------------------------------

(reviewing the interdiff)

This sounds good :)

::: js/src/jit/ValueNumbering.cpp
@@ +137,5 @@
> +// Test whether the given definition will no longer be needed after its user
> +// is deleted. TODO: This misses cases where the definition is used multiple
> +// times by the same user.
> +static bool
> +WillBecomeDead(const MDefinition *def)

Add a bug number to the TODO.

@@ +364,5 @@
>  
> +        // TODO: Removing a block deletes the phis, instructions, and resume
> +        // points in the block, and their operands may become dead. We should
> +        // detect this and delete them. In practice though, when this happens,
> +        // we often end up re-running GVN for other reasons anyway.

Do we need these TODO, as DCE should remove dead operands too?

@@ +387,5 @@
> +{
> +    // If the value isn't suitable for eliminating, don't bother hashing it. The
> +    // convention is that congruentTo returns false for node kinds that wish to
> +    // opt out of redundance elimination.
> +    // TODO: It'd be nice to clean up that convention.

Todo: add a bug number for this todo.

@@ +480,5 @@
> +            // This is effectively what the old GVN did. It allows isGuard()
> +            // instructions to be deleted if they are redundant, and the
> +            // replacement is not even guaranteed to have isGuard() set.
> +            // TODO: Understand this.
> +            def->setNotGuardUnchecked();

Is that clear now?

@@ +658,5 @@
> +
> +bool
> +ValueNumberer::run(UpdateAliasAnalysisFlag updateAliasAnalysis)
> +{
> +    updateAliasAnalysis_ = updateAliasAnalysis == UpdateAliasAnalysis;

haha. :D
Attachment #8441012 - Flags: review?(nicolas.b.pierron) → review+
Blocks: 1031396
Blocks: 1031406
Blocks: 1031410
Blocks: 1031412
(In reply to Nicolas B. Pierron [:nbp] from comment #54)
> @@ +364,5 @@
> >  
> > +        // TODO: Removing a block deletes the phis, instructions, and resume
> > +        // points in the block, and their operands may become dead. We should
> > +        // detect this and delete them. In practice though, when this happens,
> > +        // we often end up re-running GVN for other reasons anyway.
> 
> Do we need these TODO, as DCE should remove dead operands too?

DCE does do this, so we're not missing a lot, but there are passes that run between GVN and DCE, and the fewer instructions they have to think about, the faster and more aggressive they can be. I filed bug 1031412 for this (and added it to the TODO comment).

> @@ +480,5 @@
> > +            // This is effectively what the old GVN did. It allows isGuard()
> > +            // instructions to be deleted if they are redundant, and the
> > +            // replacement is not even guaranteed to have isGuard() set.
> > +            // TODO: Understand this.
> > +            def->setNotGuardUnchecked();
> 
> Is that clear now?

I filed bug 1031410 to track cleaning this up (and added it to the TODO comment).

I filed bugs for the other TODOs and added them to the comments also.

https://hg.mozilla.org/integration/mozilla-inbound/rev/6f2c1e191d9d
Keywords: leave-open
This caused a performance regression on awfy x64 misc-typedobj-write-struct-field-typedobj.js (besides several speedups on more prominent benchmarks). I filed bug 1031607 to track this.
https://hg.mozilla.org/mozilla-central/rev/6f2c1e191d9d
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Duplicate of this bug: 677415
Duplicate of this bug: 677143
You need to log in before you can comment on or make changes to this bug.