Closed Bug 845068 Opened 7 years ago Closed 5 years ago

slow behavior in ValueNumberer for large pathological asm.js function

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: luke, Assigned: sunfish)

Details

(Whiteboard: [js:p2])

Attachments

(4 files)

Attached file slow
On http://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey, the attached patch spends a huge amount of time (>1 minute) compiling 1 8KLOC function, with all the time spent in ValueNumberer::analyze.

In general, Odin will start running the backend on much bigger functions that IonMonkey would have previously rejected on basis of size so it'd be good if we had a general strategy to mitigate quadratic behavior.  One simple option would be to turn off known-quadratic passes for large scripts.  It's also questionable how much these passes even buy, given that LLVM does it's own optimization passes.  We should measure this.  I'm mostly filing this bug as a placeholder for future investigation after the initial Odin landing.
I may peek at this later, IIRC gvn is supposed to run in O(edges + blocks) time, which should never be quadratic (unless you have some horribly hairy control flow).  One thought is that we generate an absurd number of phi nodes, and spend lots of time processing those.
Whiteboard: [js:p2]
To wit this now takes 24s.  That's still a rather long time for 8KLOC, but I guess this is just a pathological worst case.  Resolving wfm, but Dan let me know if you think there is more to do here.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
I took a quick look at this testcase under a profiler and noticed that we're spending a surprising amount of time in ValueNumberer::VisibleValues::forget. In the common case when we're deleting the instruction we just looked at, it won't be in the map yet, so we don't need to forget it.
Assignee: general → sunfish
Severity: normal → enhancement
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: quadratic behavior in ValueNumberer::analyze for large asm.js function → slow behavior in ValueNumberer for large pathological asm.js function
This patch tweak's GVN's deleteDefsRecursively function to delete the first definition without calling forget, instead of just putting it on the worklist for processDeadDefs to process, which will call forget. This speeds it up in the case where it's doing lots of deleting. It cuts off a third of the time for the attached testcase.
Attachment #8456436 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8456436 [details] [diff] [review]
gvn-avoid-unnecessary-forget.patch

Review of attachment 8456436 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Dan Gohman [:sunfish] from comment #4)
> This patch tweak's GVN's deleteDefsRecursively function to delete the first
> definition without calling forget, instead of just putting it on the
> worklist for processDeadDefs to process, which will call forget. This speeds
> it up in the case where it's doing lots of deleting. It cuts off a third of
> the time for the attached testcase.

It seems that this code is only avoiding the call to forget() which call the HashTable remove function, which it-self does a lookup.
So, the cost that you mention is only in the failing lookup of the HashTable.

Maybe we can improve the valueHash() keys to avoid collisions?
Attachment #8456436 - Flags: review?(nicolas.b.pierron) → review+
Attached patch gvn-loads.patchSplinter Review
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> It seems that this code is only avoiding the call to forget() which call the
> HashTable remove function, which it-self does a lookup.
> So, the cost that you mention is only in the failing lookup of the HashTable.
> 
> Maybe we can improve the valueHash() keys to avoid collisions?

Why worry about pathological hashing when it helps expose nice constant-factor improvements? ;-) Looking into it, GVN is doing 568961 lookups (reasonable) and the "steps" field in the hash map stats which measures collisions is 1954854228 (eep). I've now prepared two more patches to fix this.

One adds a valueHash override to MAsmJSLoadGlobalVar so that loads to separate global variables hash to separate buckets. The testcase has a few different global variables, and this reduces "steps" by a few percent. While here, I also added a valueHash for MLoadSlot as well, and valueHash and congruentTo overrides for MAsmJSLoadFuncPtr and MAsmJSLoadFFIFunc.
Attachment #8456910 - Flags: review?(nicolas.b.pierron)
The other is the big win; When two nodes are congruent except for their dependency() field, they hash to the same bucket. The attached testcase has a very long sequence that's effectively "load call load call load call load call ..." where the loads are all loading from the same global variable, but because there's a call between each of them, GVN can't eliminate them. Currently, those loads all collect in that one bucket, causing it to grow very large. The fix for this is to make GVN overwrite an old load when there's a new load that's congruent but with a different dependency. GVN has no use for those loads after the calls that follow them, and this patch lets it forget about them.

With all three of these patches, the "steps" field is down to 409961, and the testcase compiles and runs in under a second.
Attachment #8456916 - Flags: review?(nicolas.b.pierron)
(In reply to Dan Gohman [:sunfish] from comment #7)
> With all three of these patches, the "steps" field is down to 409961, and
> the testcase compiles and runs in under a second.

\o/ Impressive.
Attachment #8456910 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8456916 [details] [diff] [review]
gvn-dependence-difference.patch

Review of attachment 8456916 [details] [diff] [review]:
-----------------------------------------------------------------

Nice :)

::: js/src/jit/ValueNumbering.cpp
@@ +410,5 @@
>              MDefinition *rep = *p;
> +            if (rep->block()->dominates(def->block()) &&
> +                rep->dependency() == def->dependency())
> +            {
> +                // It domintes and has the same dependency. Use it!

nit: s/domintes/dominates/
Attachment #8456916 - Flags: review?(nicolas.b.pierron) → review+
Haha, wow.  Any broader improvements, or would this only help pathological cases like this?
(In reply to Luke Wagner [:luke] from comment #10)
> Haha, wow.  Any broader improvements, or would this only help pathological
> cases like this?

I believe I'm seeing a 1% total compilation time speedup (measured by -w) compiling sqlite. 

https://hg.mozilla.org/integration/mozilla-inbound/rev/e0c6d4eb2d4a
https://hg.mozilla.org/integration/mozilla-inbound/rev/f92f0477237f
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b6ec15fea1b
https://hg.mozilla.org/mozilla-central/rev/e0c6d4eb2d4a
https://hg.mozilla.org/mozilla-central/rev/f92f0477237f
https://hg.mozilla.org/mozilla-central/rev/2b6ec15fea1b
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in before you can comment on or make changes to this bug.