Currently, JM+TM is 12 ms slower on access-fannkuch than JM alone. Also, our Dromaeo score is quite a bit worse than it used to be. We need to understand what determines fannkuch performance and what tuning will bring it to its optimum.
First findings:
- Since fannkuch runs better in JM-only mode, it seems that blacklisting everything would help. But if I blacklist every trace after it runs (on the assumption that some iteration-counting algorithm could do that), perf gets much worse. The reason is that, because of the way JM+TM works, we end up running in the interpreter a lot after exiting trace on fannkuch. There is a draft patch that fixes that, and with the draft patch, blacklisting everything is a win. So an algorithm that can do that without hurting anything else would be useful. I find that most loops in fannkuch run 8 or fewer iterations, so a low-iteration blacklister could catch them. There is one loop nest (starting at line 34) that we do speed up by tracing; the loop nest as a whole runs about 20 iterations each time.
- Running in Dromaeo, JM+TM is only a little better than TM-only, and JM-only is 2x better than that. So we are probably tracing too much of this benchmark in Dromaeo. But, as noted above, to win by blacklisting we need the patch that improves the trace integration. That patch currently causes a small regression, so we would need to figure that out first.
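For illustration, the low-iteration blacklister idea can be sketched roughly as below. This is a hypothetical sketch, not SpiderMonkey code: `LowIterationBlacklister`, `recordLoopRun`, `shouldTrace`, and the use of an integer loop ID (standing in for the loop header) are all invented names. The average-iterations threshold of 8 comes from the fannkuch observation above; the minimum number of samples before deciding is an assumption. As noted above, this only pays off once the trace-integration patch keeps us from getting stuck in the interpreter after a trace exit.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-loop statistics (not SpiderMonkey's real data structures).
struct LoopStats {
    uint32_t entries = 0;     // how many times the loop was entered
    uint64_t iterations = 0;  // total iterations summed over all entries
    bool blacklisted = false;
};

// Hypothetical heuristic: stop tracing loops whose average trip count is low.
class LowIterationBlacklister {
  public:
    LowIterationBlacklister(uint32_t minEntries, uint32_t maxAvgIterations)
        : minEntries_(minEntries), maxAvgIterations_(maxAvgIterations) {
        assert(minEntries > 0);
    }

    // Record one complete run of the loop `loopId` that took `iters` iterations.
    void recordLoopRun(uint32_t loopId, uint32_t iters) {
        LoopStats& s = stats_[loopId];
        if (s.blacklisted)
            return;
        s.entries++;
        s.iterations += iters;
        // Decide only after enough samples; compare total against
        // threshold * entries to avoid integer division.
        if (s.entries >= minEntries_ &&
            s.iterations <= uint64_t(maxAvgIterations_) * s.entries) {
            s.blacklisted = true;
        }
    }

    // Loops we have never seen, or that run long enough, stay traceable.
    bool shouldTrace(uint32_t loopId) const {
        auto it = stats_.find(loopId);
        return it == stats_.end() || !it->second.blacklisted;
    }

  private:
    uint32_t minEntries_;
    uint32_t maxAvgIterations_;
    std::unordered_map<uint32_t, LoopStats> stats_;
};
```

With a threshold of 8, a loop averaging 6 iterations per entry would be blacklisted after the sample window, while a nest like the one at line 34 (about 20 iterations per entry) would keep being traced.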
More on Dromaeo: In the shell, on access-fannkuch alone, we end up blacklisting the loops at lines 25 and 50. This apparently works reasonably well, although it is still slower than pure JM, as noted in comment 1. In Dromaeo, we end up blacklisting the loops at lines 21, 25, 34, and 50, which causes a big slowdown. If I permaban (never even try to trace) the same set in the shell, I get about the same slowdown. The only subset that gives good perf is permabanning line 21 alone. Note that this doesn't match the blacklisting set from the shell, probably because of how we can end up stuck in the interpreter after leaving trace. So blacklisting is going to be a minefield until we get the new trace integration patch working without regressions.
FWIW, bug 580752 speeds up fannkuch by about 1.25x when run with TM+JM.
(In reply to comment #3)
> FWIW, bug 580752 speeds up fannkuch by about 1.25x when run with TM+JM.

Cool. Anything that improves TM perf on the hard cases makes this easier.
Moot now that TM+JM is no more.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WONTFIX