Closed
Bug 612019
Opened 15 years ago
Closed 14 years ago
Tracing typed arrays is super fast, but trace/method heuristics lose most of it
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WORKSFORME
| Tracking | Status | |
|---|---|---|
| blocking2.0 | --- | - |
People
(Reporter: azakai, Unassigned)
Attachments
(2 files)
The attached code runs almost 4 times faster with -j than with -m, but with -m -j -p it loses most of that speed. Data:
raytrace - with typed arrays
sm -j 0.699 :)
sm -m 2.656
sm -m -j -p 2.074
v8 2.596
So -m -j -p is significantly faster than -m, which is very good, but most of the potential speedup appears to be lost.
For comparison, here is the same code without typed arrays:
raytrace - no typed arrays
sm -j 3.515
sm -m 3.812
sm -m -j -p 3.674
v8 2.522
Looks like without typed arrays there is not much of a difference.
(The attached source code uses typed arrays by default. To disable them, make the check for this.Int32Array and this.Float64Array evaluate to false.)
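A minimal sketch of the kind of check described above, assuming the benchmark picks its storage once at startup. The names `makeHeaps` and `zeroFilled` are hypothetical and do not come from the attached source; only the test for `Int32Array` and `Float64Array` on the global object is taken from the description.

```javascript
// Hypothetical helper: build a plain array pre-filled with zeros,
// so it behaves like a freshly allocated typed array.
function zeroFilled(size) {
  var a = new Array(size);
  for (var i = 0; i < size; i++) {
    a[i] = 0;
  }
  return a;
}

// Hypothetical helper: use typed arrays when the global object provides
// them and the caller allows it, otherwise fall back to plain arrays.
function makeHeaps(global, size, allowTypedArrays) {
  var haveTyped = allowTypedArrays &&
                  !!global.Int32Array && !!global.Float64Array;
  return {
    typed: haveTyped,
    ints: haveTyped ? new global.Int32Array(size) : zeroFilled(size),
    floats: haveTyped ? new global.Float64Array(size) : zeroFilled(size)
  };
}

// Passing false as the last argument forces the "no typed arrays" variant.
var root = (typeof globalThis !== 'undefined') ? globalThis : this;
var heaps = makeHeaps(root, 8, true);
```

Forcing the check to fail in one place like this keeps the rest of the code identical between the two variants, which is what makes the two sets of timings comparable.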
Comment 1•15 years ago (Reporter)
Forgot to say, the numbers above are when running the code with arguments
5 64
to the benchmark.
Blocks: 580468
Comment 2•15 years ago
I don't know if you are aware, but dvander is working on a patch that should make typed array use in the mjit much faster.
Comment 3•15 years ago (Reporter)
Thanks, made this bug depend on that one. Maybe this issue will be resolved with dvander's work there.
Depends on: 594247
Comment 5•15 years ago
Alon, how the heck do I run the attached script? Just running it gives "too much recursion" errors. There's no obvious place to pass the arguments from comment 1.
Care to attach a testcase that can just be run in the shell?
Fwiw, I seriously doubt that bug 594247 will help enough here; the resulting code is still a lot slower than TM typed array code....
Comment 6•15 years ago (Reporter)
Sorry about not cc'ing, I forgot...
The arguments should be passed to the shell, for example
./js -m -j -p src.cpp.cc.js 5 64
Comment 7•15 years ago
Ah, I see. Thanks. And I assume your timing is with |time|, not self-timed?
With the patch for bug 594247 applied, I see numbers like so:
-j: 0.21
-m: 0.41
-m -j -p: 0.33
-m -j: 0.36
Without that patch, I get, on the same hardware:
-m: 0.80
-m -j -p: 0.66
-m -j: 0.64
So we do get about 2x faster, but are still 1.5x slower than pure tracing...
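For reference, the ratios behind the "about 2x" and "1.5x" figures can be recomputed from the timings above. This is only a sketch of the arithmetic; the object and function names are made up for illustration.

```javascript
// Timings copied from the comment above (64-bit build).
var withPatch = { j: 0.21, m: 0.41, mjp: 0.33, mj: 0.36 };
var withoutPatch = { m: 0.80, mjp: 0.66, mj: 0.64 };

// How many times slower `slower` is than `faster`.
function slowdown(slower, faster) {
  return slower / faster;
}

// Effect of the bug 594247 patch on the method JIT and on profiling mode.
var patchSpeedupM = slowdown(withoutPatch.m, withPatch.m);
var patchSpeedupMJP = slowdown(withoutPatch.mjp, withPatch.mjp);

// Remaining gap between -m -j -p and pure tracing (-j), with the patch.
var gapToTracing = slowdown(withPatch.mjp, withPatch.j);
```

The first two ratios come out at roughly 1.95 and 2.0, and the remaining gap to pure tracing at roughly 1.57.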
Comment 8•15 years ago
Though note, this is 64-bit. It's possible that on 32-bit the gap is smaller.
Updated•15 years ago
blocking2.0: ? → -
Comment 9•14 years ago
This regressed with -j because we don't trace labeled breaks anymore; filed bug 626986.
Some numbers:
js: 3.12s
js -j: 3.21s
js -m: 1.29s
js -m -j -p: 1.30s
d8: 1.21s
Comment 10•14 years ago
Alon, almost all labeled breaks here are of the form
a:for(;;) {
if(!(g < C[N])) {
break a
}
// ...
}
Is it possible to generate normal breaks here until bug 626986 is fixed? If I do that manually we're at 0.36s with -j (9x faster).
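To illustrate the rewrite being asked for, here is a self-contained sketch of the same loop shape in both forms. The function names and the summing body are invented for the example; only the `a:for(;;)` with `break a` pattern comes from the generated code quoted above.

```javascript
// Loop in the shape the generator currently emits: a labeled
// infinite loop that is exited with a labeled break.
function sumWithLabeledBreak(values, limit) {
  var total = 0;
  var i = 0;
  a: for (;;) {
    if (!(i < limit)) {
      break a;
    }
    total += values[i];
    i++;
  }
  return total;
}

// Same loop with a plain break. This is equivalent only because the
// break sits directly inside the labeled loop, with no nested loop or
// switch in between that a plain break would target instead.
function sumWithPlainBreak(values, limit) {
  var total = 0;
  var i = 0;
  for (;;) {
    if (!(i < limit)) {
      break;
    }
    total += values[i];
    i++;
  }
  return total;
}
```

A generator could apply this substitution whenever the label named by the break belongs to the innermost enclosing loop; breaks that leave an outer loop from inside an inner one still need their label.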
Comment 11•14 years ago
Here's the version without labeled breaks. I'm attaching it in case someone wants to investigate the profiling problem, which I can still reproduce.
I took a quick look at this. I think it might be fixable with a few small tweaks to the heuristics. I'll look at it after FF4.
Assignee: general → wmccloskey
Comment 13•14 years ago (Reporter)
(In reply to comment #10)
> Is it possible to generate normal breaks here until bug 626986 is fixed? If I
> do that manually we're at 0.36s with -j (9x faster).
Very interesting, thanks! I wrote a patch now for emscripten to generate fewer labels, and I get 4-8x speedups on -j depending on the benchmark.
However, running with -m -j -p has not changed; most of my benchmarks seem to be tied mainly to the method JIT. (I guess perhaps there are other reasons the code doesn't trace well, or maybe just not enough labels were removed.)
Comment 14•14 years ago
No, that part is just this very bug.
Comment 15•14 years ago (Reporter)
Sorry, I wasn't clear: I am now seeing -j being slower (with typed arrays) than -m. But in comment 0, -j is much faster. So something has changed since comment 0.
However, I did not test on the original code attached here, but on a new up to date version of the code. So it is possible a change there has something to do with the slowdown since comment 0.
But, the patch that stops tracing labelled breaks landed since then, and removing even just labels from the innermost loops definitely has a huge effect, as mentioned above. So the labelled breaks not being traced issue that Jan raised definitely seems like the important thing.
(Aside from that, there remains the issue in this specific bug, that even when tracing is much faster than -m, most of that is diluted in -m -p.)
tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be done on this bug, since -j is slower than -m anyhow (and this bug cares about the case where -j is much faster).
Comment 16•14 years ago
(In reply to comment #15)
> tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be
> done on this bug,
I attached a version without labeled breaks and it's much faster with -j than with -j -m -p. So the profiling bug is still reproducible, right?
Comment 17•14 years ago (Reporter)
Jan, you are absolutely right. I redid my tests from before:
old attachment, with labels
-j 2.73 (same as without any parameters)
-m 0.88
-m -j -p 0.88
v8 0.81
new attachment, without labels
-j 0.27 :)
-m 0.88
-m -j -p 0.88
v8 0.81
So, removing labels gets back to basically the same situation as before, with -j being much faster, but diluted in -m -j -p. So the newer "without labelled breaks" attachment can be used to reproduce this issue. My 'tl;dr' comment from before can be ignored.
In comment #13, I was testing on code that I compiled from scratch now. Probably some change in how emscripten works made the results different, or perhaps I just didn't remove enough labels (for simplicity I just wrote code to remove them from the inner loops, to begin with).
Assignee: wmccloskey → general
Comment 18•14 years ago
Current js shell numbers for the first attached testcase (using |5 64|):
Interp: 5332.826 ms
-j: 5408.841 ms
-m: 492.374 ms
-m -n: 240.655 ms
Current js shell numbers for the second attached testcase (using |5 64|):
Interp: 5329.897 ms
-j: 5387.632 ms
-m: 480.525 ms
-m -n: 234.619 ms
No real difference between the two testcases, and JM+TI is about 2x faster than plain JM. I don't have a JSC or v8 shell handy to try, but the numbers look good compared to the above results. Safe to call this bug WORKSFORME?
Comment 19•14 years ago (Reporter)
Yes. Thanks for running those numbers.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Comment 20•14 years ago
Trying to run v8 on those testcases throws a "RangeError: Maximum call stack size exceeded" exception.
Comment 21•14 years ago (Reporter)
bz: Did you run v8 with "--" to separate the parameters? Without that or without parameters at all, I get that error.
d8 raytrace.js -- 5 64
should work.
Comment 22•14 years ago
> bz: Did you run v8 with "--" to separate the parameters?
Ah, if I add that it does run. Overall runtime seems about the same, but that's external timing, not internal, so includes engine startup...
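For what it's worth, one way to take engine startup out of the comparison is to time the work from inside the script. The sketch below assumes only that `Date.now()` and `Float64Array` are available; `timeInternally` and the small workload are hypothetical stand-ins, not part of the attached testcases.

```javascript
// Hypothetical helper: run a function and report how long the call
// itself took, excluding process and engine startup.
function timeInternally(run) {
  var start = Date.now();
  var result = run();
  var elapsedMs = Date.now() - start;
  return { result: result, elapsedMs: elapsedMs };
}

// Stand-in workload: fill a typed array and sum it.
var timed = timeInternally(function () {
  var heap = new Float64Array(1024);
  var sum = 0;
  for (var i = 0; i < heap.length; i++) {
    heap[i] = i * 0.5;
    sum += heap[i];
  }
  return sum;
});
```

With external timing (for example |time|), the measured figure also includes startup and script parsing, which matters more as the runs get shorter.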