Bug 606897 (Open): Opened 13 years ago, updated 6 months ago

Profiling makes us much slower on the Celtic Kane Conway benchmark


(Core :: JavaScript Engine, defect)




Tracking Status
blocking2.0 --- -


(Reporter: bzbarsky, Unassigned)



(Keywords: perf)


(1 file)

Attached file Testcase
The attached shell testcase is more or less a copy of the Conway benchmark at <>.  The number it prints is the score; higher is better.
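For readers unfamiliar with the benchmark: it is a Conway's Game of Life simulation. A minimal sketch of one simulation step is below; the names and structure here are illustrative only and differ from the attached testcase in detail.

```javascript
// Sketch of one Conway Game of Life step over a dense 2-D array of 0/1.
// Hypothetical names; the real benchmark code is in the attachment.
function step(grid) {
  var h = grid.length, w = grid[0].length;
  var next = [];
  for (var y = 0; y < h; y++) {
    next.push(grid[y].slice()); // copy the row
    for (var x = 0; x < w; x++) {
      // Count the live neighbors of cell (y, x).
      var n = 0;
      for (var dy = -1; dy <= 1; dy++) {
        for (var dx = -1; dx <= 1; dx++) {
          if (dy === 0 && dx === 0) continue;
          var yy = y + dy, xx = x + dx;
          if (yy >= 0 && yy < h && xx >= 0 && xx < w && grid[yy][xx]) n++;
        }
      }
      // Standard Life rules: a live cell survives with 2 or 3 neighbors;
      // a dead cell becomes live with exactly 3.
      next[y][x] = grid[y][x] ? (n === 2 || n === 3 ? 1 : 0)
                              : (n === 3 ? 1 : 0);
    }
  }
  return next;
}
```

Note the deep loop nesting and branch-heavy inner body; that shape is what the later comments about tracing and profiling are reacting to.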

I see these numbers over here:

  -m: 26.6
  -j: 49.98
  -m -j: 53.19
  -m -j -p: 26.14

For reference, v8 and jsc both score about 45 on this testcase.  So we may be able to get there with JM only...

I think the loops on lines 55 and 56 (well, and 49 and 60) are the core of the benchmark; if I make sure we trace those I see scores around 44.  The loop on 56 gets blacklisted both because maybeShortLoop is true for it and because selfOpsMult is 16100 (presumably due to those error-checking if statements; in this case, unlike the cases with unreached error-check bodies, I think we do hit all 16 possible branches... but that's ok!).
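The blacklisting decision described above can be sketched roughly as follows. This is a simplification with a hypothetical threshold; the actual TraceMonkey oracle logic is in the C++ tracer, and maybeShortLoop / selfOpsMult are values it computes during profiling.

```javascript
// Rough sketch of the loop-blacklisting decision discussed above.
// SELF_OPS_MULT_LIMIT is a made-up cutoff chosen only to illustrate
// that a selfOpsMult of 16100 would trip it.
function shouldBlacklist(maybeShortLoop, selfOpsMult) {
  var SELF_OPS_MULT_LIMIT = 8000; // hypothetical threshold
  return maybeShortLoop || selfOpsMult > SELF_OPS_MULT_LIMIT;
}
```

Either condition alone is enough to keep the tracer away from the loop, which is why the loop on line 56 is rejected twice over.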

Also, the array copy loops (talk about slow ways to copy arrays!) don't get traced because the loop bodies are short; I assume JM optimizes dense arrays pretty well, though.  If I take out the loop on line 49 and everything inside it, JM ends up scoring 153 while TM scores 176... So the array copies are faster in TM, but not hugely.
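For contrast, here is the slow element-by-element copy pattern next to a more idiomatic dense-array copy. Both functions are illustrative sketches, not the benchmark's actual code.

```javascript
// Element-by-element nested copy, in the spirit of the benchmark's
// copy loops ("talk about slow ways to copy arrays!").
function copyGridSlow(src) {
  var dst = [];
  for (var y = 0; y < src.length; y++) {
    dst[y] = [];
    for (var x = 0; x < src[y].length; x++) {
      dst[y][x] = src[y][x];
    }
  }
  return dst;
}

// Idiomatic copy of a dense 2-D array: slice each row in one call.
function copyGridFast(src) {
  var dst = [];
  for (var y = 0; y < src.length; y++) dst[y] = src[y].slice();
  return dst;
}
```

The short inner loop body in the first version is exactly the shape that the tracer declines to trace, so its performance falls to whichever compiler handles dense-array element access best.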
Blocks: 580468
blocking2.0: --- → ?
Keywords: regression
I'm not having a lot of luck getting the profiler to trace this one. There are multiple issues. All the loops execute for only a few iterations. There's lots of loop nesting. And the instruction mix doesn't have a lot of math in it; it's mostly control-flow stuff. I tried adding array access and comparisons to the goodOps calculation, but even that wasn't enough (unless I used really big multipliers).

Getting this to trace without regressing other stuff seems hard.
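The goodOps tuning mentioned above amounts to a weighted score over the profiled instruction mix. The sketch below is hypothetical (names and weights are invented, not the profiler's real ones); it only illustrates why a control-flow-heavy loop scores poorly unless the array-access and comparison multipliers are made very large.

```javascript
// Hypothetical goodOps-style score: weight each profiled op category
// so that math-heavy loops look attractive to the tracer. Weights and
// category names here are illustrative, not SpiderMonkey's.
function goodOpsScore(counts, weights) {
  var score = 0;
  for (var op in counts) {
    score += counts[op] * (weights[op] || 0); // unweighted ops count 0
  }
  return score;
}
```

With a mix dominated by branches (which carry no weight), even crediting array accesses and comparisons leaves the score low relative to the branch count, matching the observation that only "really big multipliers" changed the decision.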
Hmm.  So I guess one question is why _is_ this faster with TM than with JM (or with other methodjits, though the difference there is 15%, not 2x)?  Can we address this by just fixing something in JM?
blocking2.0: ? → -
Interp: 1.89
TM: 1.95
JM: 25.19
JM+TI: 38.67
d8: 52.53

Looks like JM+TI got back some of the performance in the attached testcase, but v8 is about 1.4x faster. Obviously the profiling part of this bug is no longer relevant, but it appears that there are still improvements to be made.
js: 135-140
d8: 160-170

We still have some room for improvement.
NVM, this still didn't have --enable-threadsafe; going to remeasure.
Keywords: regression → perf
Assignee: general → nobody
Severity: normal → S3