Closed Bug 940062 Opened 11 years ago Closed 7 years ago

[WBGP] Principled Technologies's WebXPRT 2013 Face Detection regression

Categories

(Core :: JavaScript Engine, defect, P3)

25 Branch
defect

RESOLVED FIXED

People

(Reporter: nihsanullah, Assigned: jandem)

Details

(Keywords: perf, regression)

Attachments

(1 file)

Firefox 25 is 5x slower than Firefox 22 on the Face Detection subtest of this benchmark. This has a large impact on our overall score. See also bug 935299.
See Also: → 935299
Can we find a regression range?
Sean hypothesizes that the loss of JM is the cause of this regression. He looked at some of their code and there are several very large function bodies. IM would bail on these, but JM with chunk compilation could still JIT them. We lost JM between 22 and 23, so this seems likely. Could we increase our function size limit to allow this test to get JITed? What would be the consequences of doing so?
A profile shows we spend a lot of time in the VM because Baseline is not attaching a GETELEM stub. Fixing that should help a lot. I will investigate a bit more.
Attached file Testcase
Standalone HTML testcase. Runs the algorithm on the first image. Testcase is much slower than it was in Firefox 22. (Note: if you want to run this in Chrome, you have to use the --disable-web-security command line flag.)
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Some numbers for the attached benchmark:
Firefox 22: 240 ms
Safari 6.0.5: 571 ms
Chrome 33: 285-320 ms
Firefox 25, 28: 1189 ms
This matches the ~5x slowdown Naveed posted in comment 0.
Two problems:
(1) The slow GETELEM VM calls I mentioned in comment 3 are caused by typed array accesses with a double instead of int32 index. Baseline's typed array stubs should support double-to-int32 conversion of the index. I will post a patch for this. I think this should make us almost 2x faster.
(2) There are some functions that we Ion-compile a bunch of times. We should figure out why. These functions are pretty big but nothing extreme: bytecode size is 3400 bytes, we can compile that just fine with off-thread compilation.
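For illustration, the pattern in (1) might look like the sketch below. It is a hypothetical reduction, not code taken from the benchmark: the function names and data are invented, and whether the engine actually holds the index as a double is an internal detail that script cannot observe. The second function shows the common `| 0` idiom for coercing the index to int32 before the loop.

```javascript
// Hypothetical reduction; function and variable names are illustrative only.
function rowSum(pixels, width, y) {
  // Integral value, but derived from floating-point arithmetic on y.
  const rowStart = y * width;
  let total = 0;
  for (let x = 0; x < width; x++) {
    // Typed array element read whose index may be represented as a double.
    total += pixels[rowStart + x];
  }
  return total;
}

// Same loop, with the start index coerced to int32 up front via `| 0`.
function rowSumInt32(pixels, width, y) {
  const rowStart = (y * width) | 0;
  let total = 0;
  for (let x = 0; x < width; x++) {
    total += pixels[rowStart + x];
  }
  return total;
}

const pixels = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);
const y = 4.5 - 3.5; // exactly 1, produced by floating-point arithmetic

console.log(rowSum(pixels, 4, y));      // 26
console.log(rowSumInt32(pixels, 4, y)); // 26
```

Both functions return the same result; the difference would only show up in how the engine specializes the element access.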
Depends on: 940525
The main problem is that there's a JSOP_MUL with an operand that's sometimes |undefined|. Ion looks at the Baseline ICs to determine how to specialize this op, sees only an int32 stub, and specializes as int32. This will bail out for |undefined| though, and we don't enter the Ion code again. Ion shouldn't rely on the Baseline cache if it had unoptimizable inputs. A fix for this gets us to 290-300 ms. That is about 50 ms slower than Firefox 22 but about as fast as Chrome, and there's probably more we can optimize.
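A minimal sketch of the kind of code that could produce this, assuming the |undefined| operand comes from reading past the end of an array. This is an invented example, not the benchmark's code, and all names are hypothetical. Most iterations multiply two int32 values; an occasional iteration multiplies by |undefined|, which yields NaN.

```javascript
// Hypothetical example; names and data are illustrative only.
function weightedSum(values, weights) {
  let acc = 0;
  for (let i = 0; i < values.length; i++) {
    // weights[i] is undefined once i >= weights.length,
    // and number * undefined evaluates to NaN.
    const product = values[i] * weights[i];
    if (!Number.isNaN(product)) {
      acc += product;
    }
  }
  return acc;
}

// Three int32 multiplications, then one with an undefined operand.
const result = weightedSum([1, 2, 3, 4], [10, 10, 10]);
console.log(result); // 60
```

A multiplication site like this sees int32 inputs almost every time, so a specialization based only on the int32 case would not cover the rare |undefined| operand.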
Depends on: 940925
Blocks: 935299
No longer blocks: WBGP
After Jan fixed bug 940925 in Firefox 28, we're still slower (~330 ms) than Firefox 22 (~220 ms) but we're about as fast as Chrome.
Keywords: regression
OS: Windows 8.1 → All
Priority: P1 → P3
Hardware: x86 → All
For the attached benchmark (requires a webserver instead of file:// url):
Nightly: 343 ms
Chrome: 369 ms
Safari: 474 ms
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED