Closed Bug 612019 Opened 15 years ago Closed 14 years ago

Tracing typed arrays is super fast, but trace/method heuristics lose most of it

Status:

RESOLVED WORKSFORME

Tracking Flags:

blocking2.0: ---

People

(Reporter: azakai, Unassigned)

Attachments

(2 files)

raytrace benchmark, with typed arrays by default (15 years ago, Alon Zakai (:azakai), 15.29 KB, application/javascript)
raytrace benchmark, with typed arrays, without labeled breaks (14 years ago, Jan de Mooij [:jandem], 15.24 KB, application/x-javascript)

Alon Zakai (:azakai)

Reporter

Description

•

15 years ago

Attached file: raytrace benchmark, with typed arrays by default

The attached code runs almost 4 times faster with -j than with -m, but with -m -j -p it loses most of that speed. Data:

raytrace - with typed arrays

sm -j        0.699  :)
sm -m        2.656
sm -m -j -p  2.074
v8           2.596

So -m -j -p is significantly faster than -m, which is very good, but most of the potential speedup appears to be lost.

For comparison, here is the same code without typed arrays:

raytrace - no typed arrays

sm -j        3.515
sm -m        3.812
sm -m -j -p  3.674
v8           2.522

Looks like without typed arrays there is not much of a difference.

(The attached source code uses typed arrays by default. They can be disabled by making the check for this.Int32Array and this.Float64Array turn out false.)
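For readers without the attachment open, the kind of check described above might look roughly like the sketch below. This is an illustration only: `makeStorage`, `scope`, and `wantFloat` are hypothetical names, not taken from the attached benchmark, and the attachment's real allocation code may be organized differently.

```javascript
// Hypothetical sketch of typed array feature detection with a plain Array fallback.
// `scope` stands in for the global object (`this` at the top level of a shell script).
function makeStorage(scope, length, wantFloat) {
  // Use typed arrays only when both constructors are present.
  var useTyped = !!(scope.Int32Array && scope.Float64Array);
  if (useTyped) {
    return wantFloat ? new scope.Float64Array(length)
                     : new scope.Int32Array(length);
  }
  // Fallback: a plain Array filled with zeros, matching typed array defaults.
  var plain = new Array(length);
  for (var i = 0; i < length; i++) {
    plain[i] = 0;
  }
  return plain;
}
```

Passing an object without the two constructors (for example `makeStorage({}, 8, true)`) forces the plain Array path, which is one way to make the check "turn out false" for a comparison run.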

Alon Zakai (:azakai)

Reporter

Comment 1

•

15 years ago

Forgot to say, the numbers above are when running the code with arguments 5 64 to the benchmark.

Blocks: 580468

Luke Wagner [:luke]

Comment 2

•

15 years ago

I don't know if you are aware, but dvander is working on a patch that should make typed array use in the mjit much faster.

Alon Zakai (:azakai)

Reporter

Comment 3

•

15 years ago

Thanks, made this bug depend on that one. Maybe this issue will be resolved with dvander's work there.

Depends on: 594247

Boris Zbarsky [:bzbarsky]

Comment 4

•

15 years ago

Didn't I say to cc Bill? ;)

blocking2.0: --- → ?

Boris Zbarsky [:bzbarsky]

Comment 5

•

15 years ago

Alon, how the heck do I run the attached script? Just running it gives "too much recursion" errors. There's no obvious place to pass the arguments from comment 1. Care to attach a testcase that can just be run in the shell?

Fwiw, I seriously doubt that bug 594247 will help enough here; the resulting code is still a lot slower than TM typed array code....

Alon Zakai (:azakai)

Reporter

Comment 6

•

15 years ago

Sorry about not cc'ing, I forgot...

The arguments should be passed to the shell, for example

./js -m -j -p src.cpp.cc.js 5 64

Boris Zbarsky [:bzbarsky]

Comment 7

•

15 years ago

Ah, I see. Thanks. And I assume your timing is with |time|, not self-timed?

With the patch for bug 594247 applied, I see numbers like so:

-j:        0.21
-m:        0.41
-m -j -p:  0.33
-m -j:     0.36

Without that patch, I get, on the same hardware:

-m:        0.80
-m -j -p:  0.66
-m -j:     0.64

So we do get about 2x faster, but are still 1.5x slower than pure tracing...
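The "about 2x" and "1.5x" figures follow from the timings in this comment. A small sketch of the arithmetic, with the numbers copied from the comment (the variable and function names are made up for illustration):

```javascript
// Timings copied from this comment; the comment does not state the unit.
var withPatch = { "-j": 0.21, "-m": 0.41, "-m -j -p": 0.33, "-m -j": 0.36 };
var withoutPatch = { "-m": 0.80, "-m -j -p": 0.66, "-m -j": 0.64 };

// How many times longer the slower run takes than the faster one.
function ratio(slower, faster) {
  return slower / faster;
}

// Effect of the patch on the profiling configuration: about 2.
var patchSpeedup = ratio(withoutPatch["-m -j -p"], withPatch["-m -j -p"]);

// Remaining gap between profiling and pure tracing: about 1.57.
var gapToTracing = ratio(withPatch["-m -j -p"], withPatch["-j"]);
```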

Boris Zbarsky [:bzbarsky]

Comment 8

•

15 years ago

Though note, this is 64-bit. It's possible that on 32-bit the gap is smaller.

Robert Sayre

Updated

•

15 years ago

blocking2.0: ? → -

Jan de Mooij [:jandem]

Updated

•

14 years ago

Depends on: 626986

Jan de Mooij [:jandem]

Comment 9

•

14 years ago

This regressed with -j because we don't trace labeled break anymore; filed bug 626986. Some numbers:

js:           3.12s
js -j:        3.21s
js -m:        1.29s
js -m -j -p:  1.30s
d8:           1.21s

Jan de Mooij [:jandem]

Comment 10

•

14 years ago

Alon, almost all labeled breaks here are of the form

a:for(;;) {
  if(!(g < C[N])) {
    break a
  }
  // ...
}

Is it possible to generate normal breaks here until bug 626986 is fixed? If I do that manually we're at 0.36s with -j (9x faster).
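To make the suggested rewrite concrete, here is a minimal sketch of the two loop shapes side by side. Only the loop structure mirrors the pattern quoted above. The function names and loop bodies are hypothetical, and the swap is only safe when the label names the innermost enclosing loop and the break is not nested inside a switch.

```javascript
// Shape emitted with a label: the break names the loop it exits.
function sumWithLabeledBreak(values, limit) {
  var total = 0, i = 0;
  a: for (;;) {
    if (!(i < limit)) { break a; }
    total += values[i];
    i++;
  }
  return total;
}

// Equivalent shape with a plain break: same exit point, no label.
function sumWithPlainBreak(values, limit) {
  var total = 0, i = 0;
  for (;;) {
    if (!(i < limit)) { break; }
    total += values[i];
    i++;
  }
  return total;
}
```

Both functions compute the same result for the same inputs; the difference discussed in this bug is only in how the tracer handled the labeled form.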

Jan de Mooij [:jandem]

Comment 11

•

14 years ago

Attached file: raytrace benchmark, with typed arrays, without labeled breaks

Here's the version without labeled breaks. I'm attaching it in case someone wants to investigate the profiling problem, which I can still reproduce.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 12

•

14 years ago

I took a quick look at this. I think it might be fixable with a few small tweaks to the heuristics. I'll look at it after FF4.

Assignee: general → wmccloskey

Alon Zakai (:azakai)

Reporter

Comment 13

•

14 years ago

(In reply to comment #10)
> Is it possible to generate normal breaks here until bug 626986 is fixed? If I
> do that manually we're at 0.36s with -j (9x faster).

Very interesting, thanks! I wrote a patch now for emscripten to generate fewer labels, and I get 4-8x speedups on -j depending on the benchmark.

However, running with -m -j -p has not changed; most of my benchmarks seem to be tied mainly to the method jit. (I guess perhaps there are other reasons the code doesn't trace well, or maybe just not enough labels were removed.)

Boris Zbarsky [:bzbarsky]

Comment 14

•

14 years ago

No, that part is just this very bug.

Alon Zakai (:azakai)

Reporter

Comment 15

•

14 years ago

Sorry, I wasn't clear: I am now seeing -j being slower (with typed arrays) than -m. But in comment 0, -j is much faster. So something has changed since comment 0.

However, I did not test on the original code attached here, but on a new up to date version of the code. So it is possible a change there has something to do with the slowdown since comment 0. But, the patch that stops tracing labelled breaks landed since then, and removing even just labels from the innermost loops definitely has a huge effect, as mentioned above. So the labelled breaks not being traced issue that Jan raised definitely seems like the important thing.

(Aside from that, there remains the issue in this specific bug, that even when tracing is much faster than -m, most of that is diluted in -m -p.)

tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be done on this bug, since -j is slower than -m anyhow (and this bug cares about the case where -j is much faster).

Jan de Mooij [:jandem]

Comment 16

•

14 years ago

(In reply to comment #15)
> tl;dr: Before labelled breaks are traced (bug 626986), there is no work to be
> done on this bug,

I attached a version without labeled breaks and it's much faster with -j than with -j -m -p. So the profiling bug is still reproducible, right?

Alon Zakai (:azakai)

Reporter

Comment 17

•

14 years ago

Jan, you are absolutely right. I redid my tests from before:

old attachment, with labels

-j        2.73  (same as without any parameters)
-m        0.88
-m -j -p  0.88
v8        0.81

new attachment, without labels

-j        0.27  :)
-m        0.88
-m -j -p  0.88
v8        0.81

So, removing labels gets back to basically the same situation as before, with -j being much faster, but diluted in -m -j -p. So the newer "without labelled breaks" attachment can be used to reproduce this issue. My 'tl;dr' comment from before can be ignored.

In comment #13, I was testing on code that I compiled from scratch now. Probably some change in how emscripten works made the results different, or perhaps I just didn't remove enough labels (for simplicity I just wrote code to remove them from the inner loops, to begin with).

Bill McCloskey [inactive unless it's an emergency] (:billm)

Updated

•

14 years ago

Assignee: wmccloskey → general

Ryan VanderMeulen [:RyanVM]

Comment 18

•

14 years ago

Current js shell numbers for the first attached testcase (using |5 64|):

Interp:  5332.826 ms
-j:      5408.841 ms
-m:       492.374 ms
-m -n:    240.655 ms

Current js shell numbers for the second attached testcase (using |5 64|):

Interp:  5329.897 ms
-j:      5387.632 ms
-m:       480.525 ms
-m -n:    234.619 ms

No real difference between the two testcases, and JM+TI is about 2x faster than plain JM. I don't have a JSC or v8 shell handy to try, but the numbers look good compared to the above results. Safe to call this bug WORKSFORME?

Alon Zakai (:azakai)

Reporter

Comment 19

•

14 years ago

Yes. Thanks for running those numbers.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WORKSFORME

Boris Zbarsky [:bzbarsky]

Comment 20

•

14 years ago

Trying to run v8 on those testcases throws a "RangeError: Maximum call stack size exceeded" exception.

Alon Zakai (:azakai)

Reporter

Comment 21

•

14 years ago

bz: Did you run v8 with "--" to separate the parameters? Without that or without parameters at all, I get that error.

d8 raytrace.js -- 5 64

should work.

Boris Zbarsky [:bzbarsky]

Comment 22

•

14 years ago

> bz: Did you run v8 with "--" to separate the parameters?

Ah, if I add that it does run. Overall runtime seems about the same, but that's external timing, not internal, so includes engine startup...
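For anyone wanting internal rather than external timing, a minimal self-timing wrapper could look like the sketch below. It is an assumption about how one might instrument a testcase, not something the attachments do; `timeInternally` and `work` are hypothetical names.

```javascript
// Measure only the time spent inside `work`, excluding engine startup.
function timeInternally(work) {
  var start = Date.now();
  var result = work();
  var elapsedMs = Date.now() - start;
  return { result: result, elapsedMs: elapsedMs };
}

// Usage: wrap the benchmark's entry point (a stand-in loop here).
var timed = timeInternally(function () {
  var sum = 0;
  for (var i = 0; i < 1000; i++) {
    sum += i;
  }
  return sum;
});
```

Comparing `timed.elapsedMs` across shells avoids attributing differences in process startup to the engine's execution speed, though `Date.now()` has only millisecond resolution.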
