Closed
Bug 463478
Opened 16 years ago
Closed 16 years ago
TM: 4-6x slowdown with jit on JS raytracer
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: graydon)
References
()
Details
(Keywords: fixed1.9.1, regression, testcase)
Attachments
(1 file)
35.55 KB,
text/plain
|
Details |
Spun off from bug 460964. Today's tracemonkey browser renders the page in the url in about 20 seconds with jit off, but takes 80 seconds with jit on. I stripped out enough stuff to get the testcase to run in the shell, after which we're looking at 15 seconds with jit off and 86 seconds with jit on.
Requesting blocking, since this is a serious perf regression over the no-jit case.
Flags: blocking1.9.1?
![]() |
Reporter | |
Comment 1•16 years ago
|
||
I tried running a verbose log of this testcase, but it grows very big very fast. I killed it at about a gig in size. At that point:
~% grep Abort ~/log.txt | sort | uniq -c | sort -nr
1560 Abort recording (line 993, pc 160): Inner tree not suitable for calling.
833 Abort recording (line 964, pc 39): Inner tree not suitable for calling.
768 Abort recording (line 964, pc 32): No compatible inner tree.
1 Abort recording (line 993, pc 143): No compatible inner tree.
1 Abort recording (line 875, pc 87): JSOP_GETARGPROP.
so it sounds to me like we're spending a lot of time here creating inner trees and then not using them or something....
![]() |
Reporter | |
Comment 2•16 years ago
|
||
Note that for that same log we've recorded 8659 trees (or at least we have:
~% grep "recording starting from" ~/log.txt | wc -l
8659
).
![]() |
Reporter | |
Comment 3•16 years ago
|
||
Oh, and the jitstats output for the full debug run through is:
recorder: started(15052), aborted(6038), completed(10515), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(1472)
monitor: triggered(242814), exits(242814), type mismatch(0), global mismatch(3)
Also particularly weird is that while the opt shell takes 80 seconds with jit on, debug shell only takes 28 seconds with jit on (and 53 with jit off).
Updated•16 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Comment 4•16 years ago
|
||
The debug-faster-than-opt part smells promising here. Are we resetting something or initializing something more fully in DEBUG?
![]() |
Reporter | |
Comment 5•16 years ago
|
||
That does seem like the place to start. For what it's worth, I just tried this again on current tip tracemonkey (with the better oracle), and opt no jit takes 15s, opt jit takes 88s, debug jit takes 29s.
![]() |
Reporter | |
Comment 6•16 years ago
|
||
OK, I tried removing all the debug ifdefs I could while still compiling in jstracer.cpp, but that didn't help. I _did_ manage to get an opt build with JS_JIT_SPEW going, so I can give the jit stats for the debug and opt versions.
Debug:
recorder: started(15053), aborted(6039), completed(10515), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(1472)
monitor: triggered(242814), exits(242814), type mismatch(0), global mismatch(4)
Opt:
recorder: started(192047), aborted(84441), completed(157792), different header(11206), trees trashed(31), slot promoted(0), unstable loop variable(0), breaks(0), returns(243), unstableInnerCalls(13458)
monitor: triggered(3760123), exits(3760123), type mismatch(0), global mismatch(22)
So that explains why the opt build is so slow, now doesn't it? Interesting question: why are there nonzero different header and trashed trees stats in the opt case but not the debug case?
![]() |
Reporter | |
Comment 7•16 years ago
|
||
For the opt build, here's |grep -i abort ~/log.txt | sort | uniq -c | sort -nr|:
16699 Abort recording (line 964, pc 39): Inner tree not suitable for calling.
15312 Abort recording (line 993, pc 160): Inner tree not suitable for calling.
11012 abort: 4887: failed to find property
8071 Abort recording (line 993, pc 143): Loop edge does not return to header.
7428 Abort recording (line 964, pc 32): No compatible inner tree.
6432 Abort recording (line 227, pc 166): JSOP_GETTHISPROP.
2738 Abort recording (line 993, pc 150): Inner tree is trying to grow, abort outer recording.
2353 Abort recording (line 175, pc 193): JSOP_GETTHISPROP.
1293 Abort recording (line 221, pc 41): JSOP_GETTHISPROP.
932 Abort recording (line 168, pc 41): JSOP_GETTHISPROP.
431 Abort recording (line 993, pc 143): No compatible inner tree.
45 Abort recording (line 174, pc 187): Inner tree is trying to grow, abort outer recording.
15 Abort recording (line 226, pc 160): Inner tree is trying to grow, abort outer recording.
14 Abort recording (line 221, pc 38): Inner tree is trying to grow, abort outer recording.
10 Abort recording (line 964, pc 32): No compatible inner tree (global demotions.
2 Abort recording (line 875, pc 87): JSOP_GETARGPROP.
1 Abort recording (line 242, pc 412): Inner tree is trying to grow, abort outer recording.
1 Abort recording (line 1137, pc 121): No compatible inner tree.
This is not a full verbose log; I killed it partway through when the log file got to be north of 5 gigs in size.
![]() |
Reporter | |
Comment 8•16 years ago
|
||
It looks like we're doing most of the tree-trashing due to type instability, for what that's worth.
I have no idea why multitrees is going nuts here, but that's certainly what it looks like is happening.
Comment 9•16 years ago
|
||
David: Andreas said you'd be a good candidate for this!
Assignee: general → danderson
Comment 10•16 years ago
|
||
Note that even in DEBUG mode we compile _WAY_ too many traces so I would start there.
Updated•16 years ago
|
Priority: -- → P2
Comment 11•16 years ago
|
||
Perf regression in shell is down to a 1.3x slowdown. Still a blocker?
![]() |
Reporter | |
Comment 12•16 years ago
|
||
That depends at least in part on whether we've really fixed the problem or whether it's just covered up slightly with this particular somewhat reduced testcase, imo. Also depends on whether the zero-tolerance perf regression policy from non-jit that Brendan was talking about at some point is actually happening.
I did some bisecting, and this testcase got sped up by changeset 8f8a4b924c32 (in tracemonkey repo). Was that expected to help cases like this one? Or is it that we're just no longer running the OOM handler in cases where we used to be and thus missing some codepath we used to hit?
Comment 13•16 years ago
|
||
We need to fix all non-noise slowdowns. Especially ones that are worse in browser than shell -- some bad bug(s) lurking there.
/be
Assignee | ||
Comment 14•16 years ago
|
||
Unfortunately this testcase was broken by revision 186e782c623d, which now traps on an assertion. I will need to shake that bug out first before picking up the underlying speed-of-raytracing issue.
Assignee: danderson → graydon
Assignee | ||
Comment 15•16 years ago
|
||
As with bug 463487, this appears to be fixed (both the crashy behavior noted in comment #14 and also the general speed instability / slowdown relative to non-JIT) as of the landing of bug 463478.
Also as in the former bug, I'm going to hold off closing until I complete multi-platform tests.
Assignee | ||
Comment 16•16 years ago
|
||
Er, the landing of bug 476653.
![]() |
Reporter | |
Comment 17•16 years ago
|
||
Graydon, do you happen to know the answer to my question from comment 12?
Assignee | ||
Comment 18•16 years ago
|
||
(In reply to comment #17)
> Graydon, do you happen to know the answer to my question from comment 12?
No. Nor have I looked at rev 8f8a4b924c32. I'll answer more fully later this afternoon when I've had a look at the verbose trace outputs and confirmed on multiple machines. I don't *think* we're papering over anything here, but best to be sure.
Assignee | ||
Comment 19•16 years ago
|
||
Good news is it "works" on all machines. As in: runs to completion in opt builds of browser, anyway.
Bad news is there is at least one, possibly 2 bugs left here.
First, it's still slower than non-JIT, in this case: I had the results mixed up, only bug 463487 recovered its speed.
Second: the debug build still faults due to a typemap mismatch, but in a different place. Filed as bug 480244, since it's different than the one previously blocking. Opt build also doesn't run to completion in the shell; random misbehavior, not sure why it raises an exception in the shell and not browser.
If I twiddle input parameters and/or comment out assertions enough to get the debug shell to run to completion, it reports aborting 99% of recordings as before, so this at least explains the "still a net slowdown". It hardly ever hits trace.
Unfortunately then, still mostly blocked investigating by the secondary typemap fault (though thankfully not a tertiary phantom bug I *thought* I was seeing due to a misconfigured oprofile on my end, sigh).
Assignee | ||
Comment 20•16 years ago
|
||
Turns out the typemap mismatch here was a somewhat serious flaw in the way we were accounting for argument ownership between frames. You just got unlucky and produced code that triggers it: a treecall that lines up with a function call, in which unexpected arguments are being passed *and* you have a specialization on the callee that differs from the improperly-aligned args on the caller stack.
Lucky day.
Anyway, fixing it in depth will take a nontrivial amount of work (filed as bug 481296), the initial symptom filed as bug 480244 is now bandaged over with an early abort. That's the best we can do for now.
After that fix, behavior picks up to "reasonable", though not stellar. A debug build is moderately slower (41s vs. 39s) jit vs. non-jit, whereas an opt build is moderately faster (19s vs. 20s) jit vs. non-jit.
IMO it's going to be a bit of a trick to get better speed than this given that the band-aid fix is aborting a particularly hot treecall, causing outer loops to unravel; the testcase is now running to completion in the shell, and we can see that it's losing most of its speed entering and leaving the innermost-loop miniature traces. It just can't knit them together into outer loops properly:
recorder: started(121), aborted(62), completed(711), different header(0), trees trashed(1), slot promoted(0), unstable loop variable(21), breaks(0), returns(15), unstableInnerCalls(13)
monitor: triggered(938714), exits(938714), type mismatch(0), global mismatch(0)
I think we're out of the "pathological" woods here; just stuck with "doesn't go as fast as we'd like" given that the example has an idiom that our tracer can't presently handle very well. We eventually need to revisit this when we're fixing bug 481296 for real, but that will be much more involved.
![]() |
Reporter | |
Comment 21•16 years ago
|
||
Graydon, thanks for sorting this out!
It sounds like we should mark this bug fixed and leave the remaining work in bug 481296 and bug 480244.
Assignee | ||
Comment 22•16 years ago
|
||
Fine by me. At least when bug 480244 closes; right now it's committed and fixed-in-tracemonkey, awaiting the diligent hand of a cross-branch merger to decide its final set of resting places.
(I was waiting for you to give an opinion on the matter :)
Status: NEW → ASSIGNED
Updated•16 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•