Closed Bug 463478 Opened 16 years ago Closed 15 years ago

TM: 4-6x slowdown with jit on JS raytracer

Categories

(Core :: JavaScript Engine, defect, P2)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: graydon)

References

()

Details

(Keywords: fixed1.9.1, regression, testcase)

Attachments

(1 file)

Attached file JS shell testcase
Spun off from bug 460964.  Today's tracemonkey browser renders the page in the url in about 20 seconds with jit off, but takes 80 seconds with jit on.  I stripped out enough stuff to get the testcase to run in the shell, after which we're looking at 15 seconds with jit off and 86 seconds with jit on.

Requesting blocking, since this is a serious perf regression over the no-jit case.
Flags: blocking1.9.1?
I tried running a verbose log of this testcase, but it grows very big very fast.  I killed it at about a gig in size.  At that point:

~% grep Abort ~/log.txt | sort | uniq -c | sort -nr
1560 Abort recording (line 993, pc 160): Inner tree not suitable for calling.
 833 Abort recording (line 964, pc 39): Inner tree not suitable for calling.
 768 Abort recording (line 964, pc 32): No compatible inner tree.
   1 Abort recording (line 993, pc 143): No compatible inner tree.
   1 Abort recording (line 875, pc 87): JSOP_GETARGPROP.

so it sounds to me like we're spending a lot of time here creating inner trees and then not using them or something....
Note that for that same log we've recorded 8659 trees (or at least we have:

~% grep "recording starting from" ~/log.txt | wc -l
    8659

).
Oh, and the jitstats output for the full debug run through is:

recorder: started(15052), aborted(6038), completed(10515), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(1472)
monitor: triggered(242814), exits(242814), type mismatch(0), global mismatch(3)

Also particularly weird is that while the opt shell takes 80 seconds with jit on, debug shell only takes 28 seconds with jit on (and 53 with jit off).
Flags: blocking1.9.1? → blocking1.9.1+
The debug-faster-than-opt part smells promising here.  Are we resetting something or initializing something more fully in DEBUG?
That does seem like the place to start.  For what it's worth, I just tried this again on current tip tracemonkey (with the better oracle), and opt no jit takes 15s, opt jit takes 88s, debug jit takes 29s.
OK, I tried removing all the debug ifdefs I could while still compiling in jstracer.cpp, but that didn't help.  I _did_ manage to get an opt build with JS_JIT_SPEW going, so I can give the jit stats for the debug and opt versions.

Debug:
recorder: started(15053), aborted(6039), completed(10515), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(1472)
monitor: triggered(242814), exits(242814), type mismatch(0), global mismatch(4)

Opt:
recorder: started(192047), aborted(84441), completed(157792), different header(11206), trees trashed(31), slot promoted(0), unstable loop variable(0), breaks(0), returns(243), unstableInnerCalls(13458)
monitor: triggered(3760123), exits(3760123), type mismatch(0), global mismatch(22)

So that explains why the opt build is so slow, now doesn't it?  Interesting question: why are there nonzero different header and trashed trees stats in the opt case but not the debug case?
For the opt build, here's |grep -i abort ~/log.txt | sort | uniq -c | sort -nr|:

16699 Abort recording (line 964, pc 39): Inner tree not suitable for calling.
15312 Abort recording (line 993, pc 160): Inner tree not suitable for calling.
11012 abort: 4887: failed to find property
8071 Abort recording (line 993, pc 143): Loop edge does not return to header.
7428 Abort recording (line 964, pc 32): No compatible inner tree.
6432 Abort recording (line 227, pc 166): JSOP_GETTHISPROP.
2738 Abort recording (line 993, pc 150): Inner tree is trying to grow, abort outer recording.
2353 Abort recording (line 175, pc 193): JSOP_GETTHISPROP.
1293 Abort recording (line 221, pc 41): JSOP_GETTHISPROP.
 932 Abort recording (line 168, pc 41): JSOP_GETTHISPROP.
 431 Abort recording (line 993, pc 143): No compatible inner tree.
  45 Abort recording (line 174, pc 187): Inner tree is trying to grow, abort outer recording.
  15 Abort recording (line 226, pc 160): Inner tree is trying to grow, abort outer recording.
  14 Abort recording (line 221, pc 38): Inner tree is trying to grow, abort outer recording.
  10 Abort recording (line 964, pc 32): No compatible inner tree (global demotions.
   2 Abort recording (line 875, pc 87): JSOP_GETARGPROP.
   1 Abort recording (line 242, pc 412): Inner tree is trying to grow, abort outer recording.
   1 Abort recording (line 1137, pc 121): No compatible inner tree.

This is not a full verbose log; I killed it partway through when the log file got to be north of 5 gigs in size.
It looks like we're doing most of the tree-trashing due to type instability, for what that's worth.

I have no idea why multitrees is going nuts here, but that's certainly what it looks like is happening.
David: Andreas said you'd be a good candidate for this!
Assignee: general → danderson
Note that even in DEBUG mode we compile _WAY_ too many traces so I would start there.
Priority: -- → P2
Perf regression in shell is down to a 1.3x slowdown. Still a blocker?
That depends at least in part on whether we've really fixed the problem or whether it's just covered up slightly with this particular somewhat reduced testcase, imo.  Also depends on whether the zero-tolerance perf regression policy from non-jit that Brendan was talking about at some point is actually happening.

I did some bisecting, and this testcase got sped up by changeset 8f8a4b924c32 (in tracemonkey repo).  Was that expected to help cases like this one?  Or is it that we're just no longer running the OOM handler in cases where we used to be and thus missing some codepath we used to hit?
We need to fix all non-noise slowdowns. Especially ones that are worse in browser than shell -- some bad bug(s) lurking there.

/be
Unfortunately this testcase was broken by revision 186e782c623d, which now traps on an assertion. I will need to shake that bug out first before picking up the underlying speed-of-raytracing issue.
Assignee: danderson → graydon
Depends on: 476653
As with bug 463487, this appears to be fixed (both the crashy behavior noted in comment #14 and also the general speed instability / slowdown relative to non-JIT) as of the landing of bug 463478. 

Also as in the former bug, I'm going to hold off closing until I complete multi-platform tests.
Er, the landing of bug 476653.
Graydon, do you happen to know the answer to my question from comment 12?
(In reply to comment #17)
> Graydon, do you happen to know the answer to my question from comment 12?

No. Nor have I looked at rev 8f8a4b924c32. I'll answer more fully later this afternoon when I've had a look at the verbose trace outputs and confirmed on multiple machines. I don't *think* we're papering over anything here, but best to be sure.
Depends on: 480244
Depends on: 480248
Good news is it "works" on all machines. As in: runs to completion in opt builds of browser, anyway.

Bad news is there is at least one, possibly 2 bugs left here. 

First, it's still slower than non-JIT, in this case: I had the results mixed up, only bug 463487 recovered its speed.

Second: the debug build still faults due to a typemap mismatch, but in a different place. Filed as bug 480244, since it's different than the one previously blocking. Opt build also doesn't run to completion in the shell; random misbehavior, not sure why it raises an exception in the shell and not browser.

If I twiddle input parameters and/or comment out assertions enough to get the debug shell to run to completion, it reports aborting 99% of recordings as before, so this at least explains the "still a net slowdown". It hardly ever hits trace.

Unfortunately then, still mostly blocked investigating by the secondary typemap fault (though thankfully not a tertiary phantom bug I *thought* I was seeing due to a misconfigured oprofile on my end, sigh).
No longer depends on: 480248
Turns out the typemap mismatch here was a somewhat serious flaw in the way we were accounting for argument ownership between frames. You just got unlucky and produced code that triggers it: a treecall that lines up with a function call, in which unexpected arguments are being passed *and* you have a specialization on the callee that differs from the improperly-aligned args on the caller stack.

Lucky day.

Anyway, fixing it in depth will take a nontrivial amount of work (filed as bug 481296), the initial symptom filed as bug 480244 is now bandaged over with an early abort. That's the best we can do for now.

After that fix, behavior picks up to "reasonable", though not stellar. A debug build is moderately slower (41s vs. 39s) jit vs. non-jit, whereas an opt build is moderately faster (19s vs. 20s) jit vs. non-jit. 

IMO it's going to be a bit of a trick to get better speed than this given that the band-aid fix is aborting a particularly hot treecall, causing outer loops to unravel; the testcase is now running to completion in the shell, and we can see that it's losing most of its speed entering and leaving the innermost-loop miniature traces. It just can't knit them together into outer loops properly:

recorder: started(121), aborted(62), completed(711), different header(0), trees trashed(1), slot promoted(0), unstable loop variable(21), breaks(0), returns(15), unstableInnerCalls(13)
monitor: triggered(938714), exits(938714), type mismatch(0), global mismatch(0)

I think we're out of the "pathological" woods here; just stuck with "doesn't go as fast as we'd like" given that the example has an idiom that our tracer can't presently handle very well. We eventually need to revisit this when we're fixing bug 481296 for real, but that will be much more involved.
Graydon, thanks for sorting this out!

It sounds like we should mark this bug fixed and leave the remaining work in bug 481296 and bug 480244.
Fine by me. At least when bug 480244 closes; right now it's committed and fixed-in-tracemonkey, awaiting the diligent hand of a cross-branch merger to decide its final set of resting places. 

(I was waiting for you to give an opinion on the matter :)
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Keywords: testcase
bug 480244 is fixed1.9.1
Keywords: fixed1.9.1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: