Noticed that Firefox 7 on my local machine was significantly faster than Firefox 11 nightly at running:
http://stepheneb.github.com/webgl-matrix-benchmarks/matrix_benchmark.html

The pattern held on Linux and Windows, 32-bit and 64-bit: Firefox 7 and 8 (fast) vs. Firefox 9, 10 and 11 (slow).
Disabling TI did appear to improve scores slightly, but not to the level of the older versions.
Given that these matrix libraries are heavily used by most WebGL content out there, it seemed worth filing a bug about.

Browser  Library  Multiplication  Translation  Scaling  Rotation (Arbitrary axis)  Rotation (X axis)  Transpose  Inverse  Inverse 3x3  Vector Transformation  Average
FF7      closure  11.37           25.94        22.75    2.14                       2.10               35.74      12.84    27.92        26.28                  18.57
FF11     closure  6.90            14.08        9.10     1.74                       1.73               24.51      5.74     13.26        20.87                  10.88
FF7      TDLFast  10.24           24.27        29.46    2.01                       2.28               30.98      10.17                                        15.63
FF11     TDLFast  6.44            8.40         30.06    1.71                       2.29               29.76      4.86                                         11.93

... and so on.
From IRC:

<bhackett> I looked at one of the tests being done there
<bhackett> I wish it was easier to isolate subpieces of that page for profiling
<bhackett> it was a matrix multiplication in the closure library
<bhackett> basically a fully unrolled loop accessing constant indexes of a typed array
<bhackett> I suspect the problem is bad codegen due to a lack of CSE
<bhackett> but need to confirm
<dmandelin> do you know why it got slower? did it used to trace?
<bhackett> yeah
<dmandelin> ah, ok
<dmandelin> ok, i'll post in that bug
<dmandelin> thanks
<bhackett> ok
<bhackett> I'm hoping to look some more at that later this week
<bhackett> have also been thinking about an easy fix, but the real solution will be to just push forward on IonMonkey
<dmandelin> ok, i'll leave it on tracking status for now, in case you get the easy fix
<bhackett> the easy fix may be worth it if this unrolled-constant-index pattern is pervasive in the regressing bits of the page
<dmandelin> ok

So, this used to trace before TI, and it benefited from the tracer's CSE, which isn't present in JM or JM+TI. IM will have that optimization, so long-term, this should only get better. Thanks for the test cases and report--this should help us.

For now, Brian may have an easy fix, but otherwise, in the absence of specific programs that stop working because of the regression, this isn't urgent.
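For anyone trying to picture the pattern described above (a fully unrolled loop reading constant indexes of a typed array), here is a rough sketch. It is not the Closure code: the function names are made up, and the matrix is shrunk to 2x2 (column-major) to keep it short. The second function hoists each element into a local by hand, which is roughly what CSE would do automatically.

```javascript
// Hypothetical reduction of the pattern, NOT the Closure library source.
// 2x2 column-major matrices: element (row r, col c) lives at index c * 2 + r.

// Every term re-reads a constant index, so a[0], a[1], ... are each loaded twice.
function multiplyUnrolled(a, b, out) {
  out[0] = a[0] * b[0] + a[2] * b[1];
  out[1] = a[1] * b[0] + a[3] * b[1];
  out[2] = a[0] * b[2] + a[2] * b[3];
  out[3] = a[1] * b[2] + a[3] * b[3];
  return out;
}

// Same arithmetic with each element loaded exactly once into a local
// (manual common subexpression elimination).
function multiplyHoisted(a, b, out) {
  const a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
  const b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];
  out[0] = a0 * b0 + a2 * b1;
  out[1] = a1 * b0 + a3 * b1;
  out[2] = a0 * b2 + a2 * b3;
  out[3] = a1 * b2 + a3 * b3;
  return out;
}

const lhs = new Float32Array([1, 2, 3, 4]);
const rhs = new Float32Array([5, 6, 7, 8]);
console.log(Array.from(multiplyUnrolled(lhs, rhs, new Float32Array(4))));
console.log(Array.from(multiplyHoisted(lhs, rhs, new Float32Array(4))));
```

The two versions only agree when `out` is a different array from `a` and `b`; in the unrolled form a store to `out` can change a later load if the arrays alias. Whether the hoisted form is actually faster on a given build would need measuring.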
Does this need to be tracked any longer?
Since someone just subscribed to this bug, I figured I'd rerun it in FF19 (Ion), FF16 (no Ion) and FF7 (trace), as in the initial report.

Closure (FF7 was best w/ this lib)

      mult    trans   scale   rot     rotX    transp  inv     inv3x3  vectr   Average
FF7   11.42   26.21   22.65   2.17    2.18    36.34   13.22   28.22   26.35   18.75
FF16  7.03    14.87   9.23    1.63    1.74    26.77   5.83    13.16   21.60   11.32
FF19  12.53   20.54   14.90   2.09    2.07    40.19   15.87   24.82   32.13   18.35

TDLFast (FF19 was best at this, FF16 sucked least at this)

      mult    trans   scale   rot     rotX    transp  inv     inv3x3  vectr   Average
FF7   10.35   24.42   29.43   2.14    2.40    29.92   10.39                   15.58
FF16  6.41    7.75    31.56   1.73    2.27    30.64   5.03                    12.20
FF19  14.29   9.63    65.77   1.78    58.46   76.18   11.12                   33.89

FF19 is still quite a bit slower at some individual operations (FF7 is 2.5x faster at TDLFast translation and 1.5x faster at Closure scale), but overall it is now doing rather well. So unless you guys feel like filing IonSpeed bugs for those specific operations, the overall complaint about significant slowdowns after removal of tracing doesn't apply any more, and I guess this should be closed?
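In case it helps with picking which operations to look at, the per-operation ratios can be worked out mechanically. This is only a sketch over the Closure numbers quoted in this comment; the object layout, the `transp` label for the transpose column, and the `speedRatios` helper are my own naming, and higher scores are assumed to be better, as in the comment.

```javascript
// Closure scores copied from the comment above; the structure is illustrative only.
const closureScores = {
  ops:  ["mult", "trans", "scale", "rot", "rotX", "transp", "inv", "inv3x3", "vectr"],
  FF7:  [11.42, 26.21, 22.65, 2.17, 2.18, 36.34, 13.22, 28.22, 26.35],
  FF19: [12.53, 20.54, 14.90, 2.09, 2.07, 40.19, 15.87, 24.82, 32.13],
};

// Ratio > 1 means `base` scored higher than `other` on that operation.
function speedRatios(scores, base, other) {
  const result = {};
  scores.ops.forEach((op, i) => {
    result[op] = Number((scores[base][i] / scores[other][i]).toFixed(2));
  });
  return result;
}

const ff7VsFf19 = speedRatios(closureScores, "FF7", "FF19");
// List the operations where FF7 still comes out ahead of FF19.
for (const op of closureScores.ops) {
  if (ff7VsFf19[op] > 1) console.log(op, ff7VsFf19[op]);
}
```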
Filing individual bugs on the things that are still slow, blocking this one, seems like a great idea. Bonus points for reduced testcases showing the slowdown....
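A reduced testcase for one of the slow operations could look something like the sketch below. Everything here is a stand-in: `timeRuns` and `translateInPlace` are hypothetical names, the translation is written from scratch rather than copied from any of the benchmarked libraries, and `Date.now()` is used only because it is available in both browsers and shells.

```javascript
// Hypothetical harness, not part of the benchmark page.
// Runs fn `warmup` times untimed, then `iterations` times, and returns elapsed ms.
function timeRuns(fn, iterations, warmup = 1000) {
  for (let i = 0; i < warmup; i++) fn();
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return Date.now() - start;
}

// Stand-in operation: post-multiply a column-major 4x4 matrix by a translation.
// Only the last column changes: col3 += col0 * x + col1 * y + col2 * z.
function translateInPlace(m, x, y, z) {
  m[12] += m[0] * x + m[4] * y + m[8] * z;
  m[13] += m[1] * x + m[5] * y + m[9] * z;
  m[14] += m[2] * x + m[6] * y + m[10] * z;
  m[15] += m[3] * x + m[7] * y + m[11] * z;
  return m;
}

const matrix = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]);
const elapsedMs = timeRuns(() => translateInPlace(matrix, 1, 2, 3), 100000);
console.log("translate x100000: " + elapsedMs + " ms");
```

Running the same file in two builds and comparing the printed times is usually enough to confirm whether a single operation reproduces the slowdown on its own.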
I just did some comparisons with the competition, with the results screenshot'd and attached. The results are pretty interesting: overall, we trounce Chromium and Safari here, nice. Except for Inverse and Inverse 3x3 (where the top results are close), we also win every single benchmark, when only comparing the top results across frameworks. After that, it gets a bit more complicated, with results all over the map. For example, at TDLMath's Vector Transformation, we're the slowest, with Chromium more than 6x as fast. Having said all this, I don't actually think that we have that much to gain from tracking this, given that the test cases are hard to reduce and that there's nothing much wrong to begin with.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Wait. Why is being 6x slower on the non-microbenchmark not a problem?
Mmh, I guess you're right - we should at least make an attempt to look into the bad outliers. Will do that now. First guess: they create lots of objects using `new Klass()` (http://jsperf.com/new-vs-object-create/6)
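To make the first guess concrete: the two allocation styles compared in that jsperf look roughly like this. `Vec3` and both factory functions are hypothetical names for illustration; I have not checked how the library in question actually allocates its results.

```javascript
// Hypothetical constructor standing in for whatever `Klass` is in the library.
function Vec3(x, y, z) {
  this.x = x;
  this.y = y;
  this.z = z;
}
Vec3.prototype.lengthSquared = function () {
  return this.x * this.x + this.y * this.y + this.z * this.z;
};

// Style 1: allocate through the constructor.
function makeWithNew(x, y, z) {
  return new Vec3(x, y, z);
}

// Style 2: allocate with Object.create and fill in the fields by hand.
function makeWithCreate(x, y, z) {
  const v = Object.create(Vec3.prototype);
  v.x = x;
  v.y = y;
  v.z = z;
  return v;
}

console.log(makeWithNew(1, 2, 3).lengthSquared());
console.log(makeWithCreate(1, 2, 3).lengthSquared());
```

Both produce objects with the same prototype and fields, so any difference in a benchmark would come from how the engine handles each allocation path, not from the resulting objects.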
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---