Closed Bug 463487 Opened 16 years ago Closed 15 years ago

TM: 20x slowdown with jit on JS raytracer

Categories

(Core :: JavaScript Engine, defect, P2)

x86
macOS
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bzbarsky, Assigned: graydon)

References

()

Details

(Keywords: regression)

Attachments

(1 file)

Testing with today's tracemonkey browser on the URL in the URL field, I get a 3500ms render time without jit, and a slow script dialog (so at least 10s) with jit.

I stripped out the HTML-dependent stuff, and the resulting js shell testcase runs in the following times, based on command-line args:

(none)           3200ms
-j               2600ms
-b 10000000      3500ms
-j -b 10000000   62000ms  (no, I did _not_ accidentally put an extra zero there!)

It's not a very minimal testcase, but it's a start, at least.

Clearly gc is screwing us over somehow...
Note: this was spun off of bug 460964.  I can't get a verbose trace without -b because of bug 463490.  With -b 10000000, I waited long enough to get a 300MB-some log file, at which point:

~% grep Abort ~/log.txt | sort | uniq -c | sort -nr
 443 Abort recording (line 164, pc 67): No compatible inner tree.
 432 Abort recording (line 185, pc 442): No compatible inner tree.
 431 Abort recording (line 91, pc 118): Inner tree not suitable for calling.
 102 Abort recording (line 65, pc 27): Inner tree not suitable for calling.
   2 Abort recording (line 229, pc 102): No compatible inner tree.
   1 Abort recording (line 88, pc 94): No compatible inner tree.
~% grep "recording starting from" ~/log.txt | wc -l
    5413

Which is not that different from what we see in bug 463478 in some ways.
Flags: blocking1.9.1?
Keywords: regression
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Can this be closed out?

crowdermac:src crowder$ ./js ~/raytrace-test.js
4345
crowdermac:src crowder$ ./js -j ~/raytrace-test.js
3022
That matches comment 0, yes.  Please either run the original testcase in the browser, or run this one in the shell with -b, as comment 0 says.
Ah, right.  This still reproduces identically.
I get these results now:

guest-225:opt dmandelin$ dist/bin/js -b 10000000 ../2ray.js 
3788
guest-225:opt dmandelin$ dist/bin/js -j -b 10000000 ../2ray.js 
6000
I still get long and unpredictable times in the browser, though.
Looks like in shell the testcase behavior changed as of changeset 8f8a4b924c32 (in tracemonkey).  This matches the range in which the behavior of bug 463478 changed.  Was this a real fix for this problem, or just a change in GC behavior that hides it? And if the latter, would that explain why browser still has a problem?
Assignee: general → graydon
Depends on: 476653
After the landing of bug 476653, this *appears* to no longer give wildly unpredictable time for me: I get a little instability, but the range is 400ms (!) to 1400 at the limits, with a pretty stable common case more around 900-1200.

For comparison, the most recent chrome I'm testing side-by-side here clocks 800-900, so we're pretty competitive on this one. Our non-JIT time is more like 4500. We're also beating chrome on the linked animation test.

This is win32 though. I'm going to wait to close this bug since I thought I saw perturbation on the linux system. Will want to do 3-platform tests to be sure.
It'd be really nice if someone could answer my question from comment 7.....
bz: that would be nice. We need to understand the variability in timing in comment 8, *and* any browser-only behavior change or slowdown. Fun.

/be
Some positive results: it runs at full speed on mac and linux in the browser as well, both the simple test and the animation test. At very good speed (~900ms simple, ~10ms/frame animation, regularly). 

In the shell, the statistics are also promising:

recorder: started(32), aborted(10), completed(95), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(20), breaks(3), returns(2), unstableInnerCalls(8)
monitor: triggered(50059), exits(50059), type mismatch(0), global mismatch(0)

Wrt. bz's comment #7, rev 8f8a4b924c32 was an OOM-handling bug; I have no idea why it would have affected performance, but I also don't really understand why you were passing the -b 10000000 flag in the first place, nor why either of those would cause the recorder to fail to find suitable inner trees too often. None of those are happening now, and what I was blocked on here until yesterday was yet another serious systemic bug, unrelated still. 

In general I tend to think what we had was a large stack (still nonempty, in the case of bug 463478) of *confounding* cross-the-JIT correctness errors, any one of which independently caused this test to fall down in a creatively different way, and so there is no single "thing to fix" here, nor any single thing we might be guilty of papering over. Each layer in the stack of confounding bugs was papering over the layer beneath it, and while it is difficult to prove when we've got to the bottom of that stack -- I'll admit, I can't prove that -- it is even more futile to look back up the stack and ask "was that the bug we were after?" We were after *all* of them. The story here is "keep knocking down correctness bugs until it starts behaving well". These larger raytracers seem to explore the breadth of the JIT much more than the microbenchmarks, and so quite a few systemic JIT bugs seem able to foul them up.

There *is* remaining time-variability in this case -- it's most stable on mac, big surprise, though on all platforms it tends to stick near a mean of 900ms when run repeatedly -- but I'd suggest if not closing this bug and declaring we reached the bottom of the stack, then at least downgrading it to a non-blocking state. The JIT makes performance on this case 10x faster than interp *in all builds* now, and it runs w/o error. It's just not perfectly noiseless when running in a browser. That alone seems low priority, as it's hardly something new. Many perf tests have noise in a browser.
If we want to use this bug to cover lower-priority variability/noise shown well by the raytracer, the summary should change.

/be
> I have no idea why it would have affected performance

Yes, that was my problem too.

> but I also don't really understand why you were passing the -b 10000000 flag
> in the first place

To trigger gc in the shell. Without that, the shell never GCed, and GC was key to triggering this bug in the first place.  That's why browser was showing the problem but shell without -b was not.

I assume current testing in the shell is using whatever the equivalent of -b is with the watchdog thread, right?

GC would affect ability to find inner trees because when we gc we flush all cached trees, no?

I'm fine with closing this bug out if we're reasonably sure that things are OK based on browser testing.  If we want a new bug on the noise, we should file a followup, in my opinion, but I bet that the noise is due to the exact timing of GC in relation to the test run.  Some experimentation in the shell might be able to prove or disprove that...
Please file appropriate followups, don't morph this bug.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: