Closed Bug 612019 Opened 15 years ago Closed 14 years ago

Tracing typed arrays is super fast, but trace/method heuristics lose most of it

Categories

Component: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WORKSFORME
blocking2.0: -

People

Reporter: azakai
Assignee: Unassigned

Attachments

(2 files)

The attached code runs almost 4 times faster with -j than with -m, but with -m -j -p it loses most of that speed. Data:

raytrace, with typed arrays:
  sm -j        0.699  :)
  sm -m        2.656
  sm -m -j -p  2.074
  v8           2.596

So -m -j -p is significantly faster than -m, which is very good, but most of the potential speedup appears to be lost. For comparison, here is the same code without typed arrays:

raytrace, no typed arrays:
  sm -j        3.515
  sm -m        3.812
  sm -m -j -p  3.674
  v8           2.522

Without typed arrays, there is not much of a difference. (The attached source code uses typed arrays by default. They can be disabled by making the checks for this.Int32Array and this.Float64Array evaluate to false.)
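To put numbers on "most of the potential speedup appears to be lost", the sketch below computes two ratios from the timings above: how much faster -j is than -m, and what fraction of the gap between -m and -j the -m -j -p run actually closes. The object and function names are illustrative only; this is plain JavaScript (it runs in Node, for example) and is not part of the attached testcase.

```javascript
// Timings copied from the raytrace runs above (same units as reported there).
const withTypedArrays = { j: 0.699, m: 2.656, mjp: 2.074 };
const withoutTypedArrays = { j: 3.515, m: 3.812, mjp: 3.674 };

// How many times faster the tracer alone (-j) is than the method JIT alone (-m).
function tracerSpeedup(t) {
  return t.m / t.j;
}

// Fraction of the possible gain (from the -m time down to the -j time) that
// the profiler-driven mode (-m -j -p) realizes: 0 means none, 1 means all of it.
function fractionRecovered(t) {
  return (t.m - t.mjp) / (t.m - t.j);
}

console.log(tracerSpeedup(withTypedArrays).toFixed(2));     // 3.80
console.log(fractionRecovered(withTypedArrays).toFixed(2)); // 0.30
console.log(tracerSpeedup(withoutTypedArrays).toFixed(2));  // 1.08
```

With typed arrays, -m -j -p recovers only about 30% of the time that pure tracing saves over -m; without typed arrays, -j is only about 1.08x faster than -m, so there is little to lose in the first place.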
Forgot to say: the numbers above are from running the benchmark with the arguments 5 64.
Blocks: 580468
I don't know if you are aware, but dvander is working on a patch that should make typed array use in the mjit much faster.
Thanks, made this bug depend on that one. Maybe this issue will be resolved with dvander's work there.
Depends on: 594247
Didn't I say to cc Bill? ;)
blocking2.0: --- → ?
Alon, how the heck do I run the attached script? Just running it gives "too much recursion" errors. There's no obvious place to pass the arguments from comment 1. Care to attach a testcase that can just be run in the shell? Fwiw, I seriously doubt that bug 594247 will help enough here; the resulting code is still a lot slower than TM typed array code....
Sorry about not cc'ing, I forgot... The arguments should be passed to the shell, for example ./js -m -j -p src.cpp.cc.js 5 64
Ah, I see. Thanks. And I assume your timing is with |time|, not self-timed?

With the patch for bug 594247 applied, I see numbers like so:
  -j:        0.21
  -m:        0.41
  -m -j -p:  0.33
  -m -j:     0.36

Without that patch, I get, on the same hardware:
  -m:        0.80
  -m -j -p:  0.66
  -m -j:     0.64

So we do get about 2x faster, but are still 1.5x slower than pure tracing...
Though note, this is 64-bit. It's possible that on 32-bit the gap is smaller.
blocking2.0: ? → -
Depends on: 626986
This regressed with -j because we don't trace labeled break anymore; filed bug 626986. Some numbers:
  js:           3.12s
  js -j:        3.21s
  js -m:        1.29s
  js -m -j -p:  1.30s
  d8:           1.21s
Alon, almost all labeled breaks here are of the form

  a:for(;;) {
    if(!(g < C[N])) {
      break a
    }
    // ...
  }

Is it possible to generate normal breaks here until bug 626986 is fixed? If I do that manually we're at 0.36s with -j (9x faster).
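The requested rewrite is safe because a labeled break that targets the innermost enclosing loop, with no nested loop or switch between the break and that loop, jumps to the same place as a plain break. A minimal sketch: C and N stand in for the names in the pattern above, and the loop body (summing g) is invented purely for illustration. Both functions compute the same result, one with the labeled form and one with the plain form.

```javascript
// Labeled form, as in the generated code quoted above.
function sumLabeled(C, N) {
  let total = 0;
  let g = 0;
  a: for (;;) {
    if (!(g < C[N])) {
      break a; // the label names the innermost loop, so this exits that loop
    }
    total += g; // hypothetical loop body
    g++;
  }
  return total;
}

// Plain form: no other loop or switch sits between the break and the loop,
// so an unlabeled break has exactly the same target.
function sumPlain(C, N) {
  let total = 0;
  let g = 0;
  for (;;) {
    if (!(g < C[N])) {
      break;
    }
    total += g; // hypothetical loop body
    g++;
  }
  return total;
}

console.log(sumLabeled([3, 10], 1), sumPlain([3, 10], 1)); // 45 45
```

A labeled break that targets an outer loop has no such plain equivalent, so a simple rewrite of this kind only covers cases like the innermost loops here.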
Here's the version without labeled breaks. I'm attaching it in case someone wants to investigate the profiling problem, which I can still reproduce.
I took a quick look at this. I think it might be fixable with a few small tweaks to the heuristics. I'll look at it after FF4.
Assignee: general → wmccloskey
(In reply to comment #10)
> Is it possible to generate normal breaks here until bug 626986 is fixed? If I
> do that manually we're at 0.36s with -j (9x faster).

Very interesting, thanks! I've now written a patch for emscripten to generate fewer labels, and I get 4-8x speedups on -j depending on the benchmark. However, running with -m -j -p has not changed; most of my benchmarks seem to be tied mainly to the method jit. (I guess perhaps there are other reasons the code doesn't trace well, or maybe just not enough labels were removed.)
No, that part is just this very bug.
Sorry, I wasn't clear: with typed arrays, I am now seeing -j being slower than -m, whereas in comment 0, -j is much faster. So something has changed since comment 0. However, I did not test the original code attached here, but a new, up-to-date version of the code, so it is possible that a change there has something to do with the slowdown since comment 0. But the patch that stops tracing labelled breaks has landed since then, and removing even just the labels from the innermost loops definitely has a huge effect, as mentioned above. So the untraced labelled breaks that Jan raised definitely seem like the important issue. (Aside from that, there remains the issue in this specific bug: even when tracing is much faster than -m, most of that gain is diluted in -m -p.)

tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be done on this bug, since -j is slower than -m anyhow (and this bug cares about the case where -j is much faster).
(In reply to comment #15)
> tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be
> done on this bug,

I attached a version without labeled breaks and it's much faster with -j than with -j -m -p. So the profiling bug is still reproducible, right?
Jan, you are absolutely right. I redid my tests from before:

old attachment, with labels:
  -j        2.73  (same as without any parameters)
  -m        0.88
  -m -j -p  0.88
  v8        0.81

new attachment, without labels:
  -j        0.27  :)
  -m        0.88
  -m -j -p  0.88
  v8        0.81

So removing labels gets us back to basically the same situation as before: -j is much faster, but that gain is diluted in -m -j -p. The newer "without labelled breaks" attachment can therefore be used to reproduce this issue, and my 'tl;dr' comment from before can be ignored. In comment #13, I was testing on code that I had freshly compiled. Probably some change in how emscripten works made the results different, or perhaps I just didn't remove enough labels (for simplicity, I only wrote code to remove them from the inner loops, to begin with).
Assignee: wmccloskey → general
Current js shell numbers for the first attached testcase (using |5 64|):
  Interp:  5332.826 ms
  -j:      5408.841 ms
  -m:       492.374 ms
  -m -n:    240.655 ms

Current js shell numbers for the second attached testcase (using |5 64|):
  Interp:  5329.897 ms
  -j:      5387.632 ms
  -m:       480.525 ms
  -m -n:    234.619 ms

No real difference between the two testcases, and JM+TI is about 2x faster than plain JM. I don't have a JSC or v8 shell handy to try, but the numbers look good compared to the above results. Safe to call this bug WORKSFORME?
Yes. Thanks for running those numbers.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Trying to run v8 on those testcases throws a "RangeError: Maximum call stack size exceeded" exception.
bz: Did you run v8 with "--" to separate the parameters? Without that, or without any parameters at all, I get that error. Running d8 raytrace.js -- 5 64 should work.
> bz: Did you run v8 with "--" to separate the parameters?

Ah, if I add that it does run. Overall runtime seems about the same, but that's external timing, not internal, so it includes engine startup...
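Because external timing with |time| includes engine startup and script parsing, a script can instead time only its own hot section. A minimal sketch, assuming a hypothetical workload() function that stands in for the benchmark's main loop (it is not the attached raytracer). Date.now() is standard ECMAScript, so it works in the js shell, d8 and Node alike; console.log is used for output here, and shells without a console object would use print() instead.

```javascript
// Hypothetical stand-in for the benchmark's hot loop; not the attached code.
function workload(n) {
  const data = new Float64Array(n);
  for (let i = 0; i < n; i++) data[i] = i * 0.5;
  let sum = 0;
  for (let i = 0; i < n; i++) sum += data[i];
  return sum;
}

// Measure only the call to fn, so engine startup and parsing are excluded.
function timeIt(fn, arg) {
  const start = Date.now();
  const result = fn(arg);
  const elapsedMs = Date.now() - start;
  return { result, elapsedMs };
}

const { result, elapsedMs } = timeIt(workload, 1000000);
console.log(result, elapsedMs + " ms");
```

Comparing this internal figure with the |time| result for the same run gives a rough idea of how much of the external measurement is startup overhead.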