Last Comment Bug 678687 - TI: ammo.js runs very slowly
: TI: ammo.js runs very slowly
Status: RESOLVED FIXED
[paladin]
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla10
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
Depends on: 723574
Blocks: infer-perf-regress
  Show dependency treegraph
 
Reported: 2011-08-12 21:07 PDT by Alon Zakai (:azakai)
Modified: 2012-05-18 07:25 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
stress benchmark (2.32 KB, application/javascript)
2011-08-12 21:07 PDT, Alon Zakai (:azakai)
no flags Details
ammo.js (needed by stress benchmark) (1.41 MB, text/plain)
2011-08-12 21:08 PDT, Alon Zakai (:azakai)
no flags Details
ammo.js, pretty-printed version (270.01 KB, application/octet-stream)
2011-08-13 11:11 PDT, Alon Zakai (:azakai)
no flags Details
WIP (26.98 KB, patch)
2011-08-26 05:15 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
patch against c537139643a9 (23.45 KB, patch)
2011-10-19 08:23 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Review
alternative (1b4d0987b18d) (5.34 KB, patch)
2011-10-31 08:05 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Review

Description Alon Zakai (:azakai) 2011-08-12 21:07:43 PDT
Created attachment 552829 [details]
stress benchmark

The attachments are ammo.js (bullet physics engine + autogenerated js bindings) and a small stress benchmark. Running the benchmark gives much poorer results with -n than without:

mozilla-central  -m -j -p     1.88
jaegermonkey     -m -j -p     1.96
jaegermonkey     -m -j -p -n  5.47

Results are printed in the benchmark itself and are in seconds (lower is better). jaegermonkey without -n is slightly slower than m-c, while jaegermonkey with -n is almost 3 times slower.

This code is implicitly statically typed (it's compiled from C++), so I was hoping -n would make it faster :(

If it's important, I can provide a non-minified version of ammo.js.
Comment 1 Alon Zakai (:azakai) 2011-08-12 21:08:34 PDT
Created attachment 552830 [details]
ammo.js (needed by stress benchmark)
Comment 2 Brian Hackett (:bhackett) 2011-08-13 06:53:57 PDT
Can you attach or link to the non-minified version?
Comment 3 Alon Zakai (:azakai) 2011-08-13 11:11:04 PDT
Created attachment 552885 [details]
ammo.js, pretty-printed version

Here is a pretty-printed version of ammo.js.
Comment 4 Brian Hackett (:bhackett) 2011-08-24 16:12:02 PDT
The -D output is pointing to a bunch of element accesses which are almost always stubbed.  I looked at the inferred types for some of these and the element type accessed is int|float.  It's kind of silly to still not have a fast path for accesses on double-representable-as-int elements, but I think that a better fix is to try to convert doubles to ints at type barriers which have never seen doubles before.  This will speed up not just the element accesses but also other uses of the not-really-a-double doubles.  e.g. we take 4 million stub calls in the below function because a/c are thought to be doubles, but the 'a + e' and 'c + e' will go faster if a/c are known integers.

function B(a, c, d) {
  for(var e = 0;e < d;e++) {
    z[a + e] = z[c + e]
  }
}
Comment 5 Alon Zakai (:azakai) 2011-08-24 16:33:42 PDT
I'm curious, why are these values thought to be doubles in the first place? Is that just the default before anything else is known, or something else? (I am fairly sure that in functions similar to that in ammo.js, only integers would ever be passed in)
Comment 6 Brian Hackett (:bhackett) 2011-08-24 16:50:56 PDT
The arguments will only be marked as doubles if a double value has actually been observed to have been passed in.  The problem is that this property holds of doubles like '4.0', and JM+TI will coerce integers into doubles within blocks of code used to manipulate both ints and doubles (this allows JM to store the value in a floating point register and avoid testing its type whenever it is accessed).  If those ints-converted-to-doubles are then passed to functions that had previously seen only integers, the arguments get marked as doubles, every time the function gets called any integer argument is converted to a double, and if those new ints-converted-to-doubles are then passed somewhere else the process repeats.

This problem of imprecision steadily leaking across the program is normally addressed by inserting dynamic type tests ('type barriers') at key points (property access sites and function entry points), but as currently performed these type barriers can't cope with the above kind of leaking and need to be improved (should not be hard).
Comment 7 Alon Zakai (:azakai) 2011-08-24 17:26:51 PDT
Very interesting, thanks for the explanation. Sounds like ammo.js (which has heavy use of both ints and floats) would greatly benefit from limiting that imprecision leaking.
Comment 8 Brian Hackett (:bhackett) 2011-08-26 05:15:31 PDT
Created attachment 556004 [details] [diff] [review]
WIP

WIP for coercing doubles at type barriers, improves TI's time from ~1.9s to 1.0s (m-c is .8s, d8 is 2.5s).
Comment 9 Alon Zakai (:azakai) 2011-10-07 15:53:17 PDT
Here are courtesy of humph three very cool demos on the web that use ammo.js,

http://granular.cs.umu.se/ammo/Demos/index.html

They run much slower in FF9 and Nightly compared to earlier FFs because of this bug.
Comment 10 Brian Hackett (:bhackett) 2011-10-19 08:23:16 PDT
Created attachment 568063 [details] [diff] [review]
patch against c537139643a9

Updated patch.  Sorry about the delay here.
Comment 11 Brian Hackett (:bhackett) 2011-10-26 18:47:28 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/3b5b10d76887
Comment 12 Marco Bonardo [::mak] 2011-10-27 01:51:02 PDT
https://hg.mozilla.org/mozilla-central/rev/3b5b10d76887
Comment 13 Marco Bonardo [::mak] 2011-10-27 02:21:50 PDT
hm, tree-management points out a V8 regression of 2% on Linux and 4% on Linux64...
Comment 14 Marco Bonardo [::mak] 2011-10-27 02:49:31 PDT
and also a Dromaeo (String/Array/Eval/Regex) 8% regression
Comment 15 Ed Morley [:emorley] 2011-10-27 08:14:17 PDT
Backed out due to further dev.tree-management confirmation of regressing V8/Dromaeo on multiple platforms since comment 14:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/mps5nX3AvzA
...even if the fix ends up being easy, it will be easier to see if it is successful, if it lands as a relanding :-)

https://hg.mozilla.org/integration/mozilla-inbound/rev/6049e8ad9755
Comment 16 Brian Hackett (:bhackett) 2011-10-27 08:22:32 PDT
(In reply to Ed Morley [:edmorley] from comment #15)
> Backed out due to further dev.tree-management confirmation of regressing
> V8/Dromaeo on multiple platforms since comment 14:
> https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/
> mps5nX3AvzA
> ...even if the fix ends up being easy, it will be easier to see if it is
> successful, if it lands as a relanding :-)
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6049e8ad9755

Thanks!  This also dinged some AWFY numbers a little.  I think I will need to use a more targeted fix instead of this patch.
Comment 17 Brian Hackett (:bhackett) 2011-10-31 08:05:18 PDT
Created attachment 570702 [details] [diff] [review]
alternative (1b4d0987b18d)

Alternative approach which just allows GETELEM/SETELEM paths to work on known doubles, by first trying to convert the double to an integer.  This gives a similar speedup to the ammo benchmark.

pre-TI js -m -j -p:  3.49
js -m -n (old):      4.52
js -m -n (new):      3.46
d8:                  6.22

The Dromaeo regressions were caused by the VM changes, trying to convert doubles to ints after element/property accesses and when making function calls.  I rearranged things to be more efficient but the slowdown remained when pushing things to try.  I want to figure out what's up and get the real fix in, but will not be able to do so for Firefox 10 (shouldn't affect ammo.js performance much, however).
Comment 18 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-10-31 08:46:51 PDT
So that last change just makes ammo.js faster without making the VM changes that affected dromaeo?
Comment 19 Brian Hackett (:bhackett) 2011-10-31 08:53:34 PDT
(In reply to Boris Zbarsky (:bz) from comment #18)
> So that last change just makes ammo.js faster without making the VM changes
> that affected dromaeo?

Yes.  The latest patch just improves compiler paths so that x[y] and x[y] = z are faster when y is a double convertible to an int, and makes things strictly faster than they are now.
Comment 20 Brian Hackett (:bhackett) 2011-11-01 08:44:47 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/b979f9d59e89
Comment 21 Brian Hackett (:bhackett) 2011-11-01 14:54:19 PDT
And backed out again due to an intermittent timeout on test_jQuery...

https://hg.mozilla.org/integration/mozilla-inbound/rev/7cb178191144
Comment 22 Brian Hackett (:bhackett) 2011-11-03 09:26:53 PDT
Round 3:

https://hg.mozilla.org/integration/mozilla-inbound/rev/f8bbd27f41ce

I tweaked the patch a bit to slow path when accessing a double identifier in GETELEM or SETELEM ops which require an IC.  With this change, I wasn't able to reproduce the timeout on try:

https://tbpl.mozilla.org/?tree=Try&rev=995136d9f804
Comment 23 Ed Morley [:emorley] 2011-11-03 13:11:18 PDT
https://hg.mozilla.org/mozilla-central/rev/f8bbd27f41ce
Comment 24 Alon Zakai (:azakai) 2011-11-03 13:48:23 PDT
This patch is a big improvement, I now get

-m -j -p   0.699
-m -n      0.959

but it is still 37% slower with TI - and TI is something which should, in principle, make this kind of code *faster* (like it does on most benchmarks).

Do we know what the remaining issues are here?
Comment 25 Brian Hackett (:bhackett) 2011-11-03 14:00:29 PDT
(In reply to Alon Zakai (:azakai) from comment #24)
> This patch is a big improvement, I now get
> 
> -m -j -p   0.699
> -m -n      0.959
> 
> but it is still 37% slower with TI - and TI is something which should, in
> principle, make this kind of code *faster* (like it does on most benchmarks).
> 
> Do we know what the remaining issues are here?

What is the command line you are running?  I do js -f ammo.pretty.js -f ammo.stress.js, and get similar times with -mjp and -mn of about 3.5 seconds.
Comment 26 Alon Zakai (:azakai) 2011-11-03 15:35:26 PDT
I am using the original 2 files attached in this bug, and doing |js [opts] stress.js|. I'm on 32-bit linux. As mentioned in comment 0, I am looking at the number printed in the benchmark itself, not the overall time (the overall time would include startup, which isn't important in practice with this code).
Comment 27 Alex Keybl [:akeybl] 2011-12-01 14:27:53 PST
[Triage Comment]
This was nominated for tracking on FF9, and is now resolved/fixed. Is this still something that we want to take on aurorabeta? If so, please nominate the patch.

Note You need to log in before you can comment on or make changes to this bug.