Bug 1322561 (Closed) - Opened 8 years ago - Closed 8 years ago
Wasm: run embenchen benchmarks in DEBUG builds on integration (with low setting)
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla55

 | Tracking | Status
---|---|---
firefox55 | --- | fixed
People
(Reporter: lth, Assigned: lth)
Attachments
(2 files)

- bug1322561-embenchen-in-debug-builds.patch (186.39 KB, patch)
- bug1322561-box2d-test.patch (2.28 MB, patch) (bbouvier: review+)
The embenchen benchmarks exercise many aspects of wasm not exercised by our test suite, and in DEBUG builds I've found many issues in the baseline compiler with them. Some are more valuable than others; box2d, for example, is pretty tough.
We should run them with a low setting, e.g. "1", in debug builds, even on the arm simulator. (I'm not sure whether the setting applies to a couple of them, such as linpack, and if not, they can't really be run on the arm simulator. To be investigated.)
We should always check the output as well. I have code that does this, though it is for bash; some assembly required.
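A minimal sketch of what such an output check could look like in a JS driver (an assumption based on the standard emscripten Module conventions, not the actual bash code; expectedOutput is a hypothetical per-benchmark fixture):

    // Hypothetical sketch: capture benchmark output via the emscripten
    // Module.print hook, then compare it line by line against a fixture.
    var output = [];
    var Module = {
        arguments: ["1"],                            // low setting
        print: function (msg) { output.push(msg); }  // capture output
    };
    // ... load and run the benchmark here ...
    var expectedOutput = ["ok."];                    // assumed fixture
    assertEq(output.length, expectedOutput.length);
    for (var i = 0; i < output.length; i++)
        assertEq(output[i], expectedOutput[i]);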
Updated • 8 years ago
Priority: -- → P1
Comment 1 • 8 years ago (Assignee)
Even with low arguments the benchmarks take a very long time to run on the arm simulator, even in an opt-debug build. (I had some numbers but they are lost.) So this is not obviously practical.
Comment 2 • 8 years ago
You might want to disable the check at
http://searchfox.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#5319, as well as AssertBasicGraph and related functions.
Comment 3 • 8 years ago (Assignee)
Thanks, that's a good tip. I suspect that even without DEBUG some of these programs may take too long to be useful on integration, but TBD. It's easy enough to test; I will do so now.
Comment 4 • 8 years ago
Is it the running time or compilation time that's long? If the former, we could at least have tests that compile them.
Comment 5 • 8 years ago (Assignee)
Yes, it's the running time. Running with option "0" to just compile gets us part of the way there, for sure, but running at least once on the lowest setting would be nice too...
Comment 6 • 8 years ago
Some random thoughts:
- Is your harness running one benchmark per JS file, or all the benchmarks at once in one JS file? The former would significantly reduce running time (I had the same issue for wast tests imported from the spec repository).
- Could it be that the generated code on ARM is... less good, and perf just a bit bad, and we should actually try improving that? It's a tough one to answer, because we don't really have anything to compare against (real devices on Firefox for Android, maybe?). We could compare the ratio of {time spent running Octane on x86 / time spent running Octane on ARM} vs {time spent running embenchen on x86 / time spent running embenchen on ARM}, "all other things being equal".
Comment 7 • 8 years ago (Assignee)
One benchmark per js file, standard embenchen setup just wrapped in a shell script.
The ARM code is OK; I see running times on my ARM dev board that are no more than 4x worse than the x86 running times on my MacBook Pro, which is pretty good.
Anyway, some preliminary data:
Running an arm-simulator release-no-debug build on my MacBook Pro, I see 10 minutes 30 seconds wall time on one core (running with argument "1"). Obviously we don't care about the aggregate running time, so we can parallelize if we want, but we'll have to remove bullet, which takes five minutes by itself (times in ms):
box2d: 54931
bullet: 327044
conditionals: 4765
copy: 11836
corrections: 11405
fannkuch: 6647
fasta: 14986
ifs: 4205
linpack_float: 14149
lua_binarytrees: 22833
lua_scimark: 23906
memops: 37965
primes: 6591
skinning: 31725
zlib: 57005
(With argument "0" we finish in 40 seconds, most of that is setup time in a few of the benchmarks that don't bail directly after compilation - linpack, binarytrees, scimark.)
I'll do a release-debug build next, for a subset of these cases, to see how the land lies.
Comment 8 • 8 years ago (Assignee)
Oh, and those are baseline-compiled running times, Ion is presumably quite a lot faster.
Comment 9 • 8 years ago
In most of the benchmarks we can create running times in between "0" and "1", if that would be useful. There are only a few exceptions, like fannkuch, where that won't work.
Comment 10 • 8 years ago (Assignee)
Release-debug is only about twice as slow, which is promising. Here are times (in ms) for a subset of programs:
conditionals: 9275
copy: 18971
corrections: 26585
fannkuch: 16001
fasta: Assertion failure: has(reg), at .../js/src/jit/RegisterSets.h:750
ifs: 10023
linpack_float: Assertion failure: has(reg), at .../js/src/jit/RegisterSets.h:750
lua_binarytrees: 53444
lua_scimark: 41995
memops: 92099
primes: 12018
skinning: 66275
Time to get the regalloc bug fixed, I guess.
Updated • 8 years ago (Assignee)
Priority: P1 → P2
Comment 11 • 8 years ago (Assignee)
I've promised to get this done this iteration, since we removed asm.js support from Rabaldr and need better test coverage.
Priority: P2 → P1
Updated • 8 years ago (Assignee)
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Comment 12 • 8 years ago (Assignee)
Could use some feedback on the method here. The bulk of this patch is just the wasm box2d benchmark, copied over from the emscripten suite - ignore all of that. The interesting bits are in the driver, wasm_box2d.js, and maybe in the little patch to wasm.js.
What I'm trying to accomplish is to run (or only compile, if the system is slow) the benchmark on a low setting and check that it doesn't crash and that the output looks sane. This will probably be useful to catch some integration problems. I'm thinking that maybe four programs that don't take over a minute each on my computer might be appropriate.
(It will also be appropriate to do this for baseline.)
I could do this for all platforms, which is what currently happens, or only for some platforms, notably ARM and/or ARM-sim. Opinions, please.
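For reference, a condensed sketch of the driver shape being proposed (the Module setup mirrors the attached patch; isSlowComputer() and the "0"/"1" argument semantics, compile-only vs. lowest run setting, are as described above):

    load(libdir + "wasm.js");

    // Run-or-just-compile decision: on a slow system pass "0" (compile
    // only), otherwise "1" (run once at the lowest setting).
    var arg = isSlowComputer() ? "0" : "1";
    var output = [];
    var Module = { wasmBinaryFile: scriptdir + "wasm_box2d.wasm",
                   arguments: [ arg ],
                   print: function (msg) { output.push(msg); } };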
Attachment #8838053 - Flags: feedback?(bbouvier)
Comment 13 • 8 years ago
Comment on attachment 8838053 [details] [diff] [review]
bug1322561-embenchen-in-debug-builds.patch
Review of attachment 8838053 [details] [diff] [review]:
-----------------------------------------------------------------
Let's say the benchmark runs in one second; the isSlowComputer() call may take up to 1 second too, so now we have a total of 2 seconds, which seems OK. Note that if we run with --no-ion or --no-baseline, the initial isSlowComputer() call may be slow, and the test skipped, while it could run perfectly fine (= quickly) in wasm. What happens with GC zeal? With a CGC build? Is the test just skipped, or do we run it very slowly?
Also, I think testing for speed in JS is intrinsically hard, and under the assumption that we make the engine faster, we may not be able to rely on it. For instance, the fib call could be optimized and run very fast, but this would not be a good indicator that the machine is really fast. What if Ion gets Smart Enough © to memoize results of recursive calls? What if we implement tail-call optimization at some point? (I'm playing the devil's advocate, since this is not supposed to happen without new syntax, but you get the point.) For instance, I'm pretty sure you're saving the value of fib(26) to make sure it's not DCE'd. So the test is already crafted to prevent an optimization from kicking in. But we might not know which future optimizations may make this faster; there's no way we can prevent people from implementing them, or expect them to then check that isSlowComputer() is still slow. Even worse, this speed test would have to be done on *your machine*, since it's the reference.
I realize this is a tradeoff we have to make. I'd be all in for running these tests on all the platforms (including ARM), since they're likely to show platform-dependent issues. If they are too slow on slow platforms, maybe we could just introduce a new mode, between 0 and 1, as Alon suggested, and always run in this mode, not taking into account the speed of the current machine?
This would test for crashiness and high-level correct behavior. Are these tests also doing checks of the numeric results, as part of the Module main program, whenever it's possible?
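For context, here is a reconstruction of the kind of fib-based guard being criticized (an assumption about its shape; the actual patch code is not quoted in this comment):

    // Assumed shape of the isSlowComputer() guard under discussion.
    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
    function isSlowComputer() {
        var start = Date.now();
        var result = fib(26);             // value kept so the call can't be DCE'd
        assertEq(result, 121393);
        return Date.now() - start > 1000; // threshold calibrated on one machine
    }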
::: js/src/jit-test/tests/wasm/bench/wasm_box2d.js
@@ +1,4 @@
> +load(libdir + "wasm.js");
> +
> +// We want the assertions.
> +if (!getBuildConfiguration()["debug"])
For what it's worth, an optimized-only build would call the JS assertEq functions, at least.
@@ +8,5 @@
> +var output = [];
> +var Module = { wasmBinaryFile: scriptdir + "wasm_box2d.wasm",
> + arguments: [ arg ],
> + print: function (msg) {
> + output.push(msg);
Just in case print sometimes takes several args, I'd use ...rest params:
print: function (...args) {
    output.push(...args);
}
or even just
print: output.push.bind(output)
Attachment #8838053 - Flags: feedback?(bbouvier)
Comment 14 • 8 years ago (Assignee)
(In reply to Benjamin Bouvier [:bbouvier] from comment #13)
> Comment on attachment 8838053 [details] [diff] [review]
> bug1322561-embenchen-in-debug-builds.patch
>
> Review of attachment 8838053 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I realize this is a tradeoff we have to make. I'd be all in for running
> these tests on all the platforms (including ARM), since they're likely to
> show platform-dependent issues. If they are too slow on slow platforms,
> maybe we could just introduce a new mode, between 0 and 1, as Alon
> suggested, and always run in this mode, not taking into account the speed of
> the current machine?
I don't think the new mode is a good solution, because on sufficiently slow computers those programs will still be very slow. But I appreciate the pushback on the current method, and a new mode may be the best we can do.
> This would test for crashiness and high-level correct behavior. Are these
> tests also doing checks of the numeric results, as part of the Module main
> program, whenever it's possible?
They are not currently checking the output, but I want them to do that. (I do that in my benchmark runner and it works reasonably well.) Not all of the programs have checkable output -- some print "ok", some print a frame rate that varies -- but they often have patterns that can be checked.
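A small sketch of what such pattern-based checking could look like (the patterns here are illustrative assumptions, not the benchmarks' actual output):

    // Match output lines against patterns rather than exact strings, since
    // values such as frame rates vary from run to run.
    var patterns = [ /^frame averages: [0-9.]+/, /^ok\.?$/ ];
    assertEq(output.length, patterns.length);
    for (var i = 0; i < patterns.length; i++)
        assertEq(patterns[i].test(output[i]), true);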
Comment 15 • 8 years ago (Assignee)
OK, second attempt. This adds a very scaled-down version of the box2d wasm benchmark to tests/wasm/bench -- it computes one frame and exits. The sources, makefile, etc. for this will appear in my embenchen fork; most of it is already there, but this program is configured specially -- see the Makefile in src/box2d in that repo (those hacks have not landed yet).
This runs quickly enough that an arm-sim run on my laptop, built debug-no-optimize, takes about 40s. I consider this acceptable, and we don't need any is-this-a-slow-computer guard. For native, it runs in a couple of seconds.
There's really nothing to "see" in this patch: it has the .js, .wasm, and .wast output (do we really want the .wast here?) and then a directives.txt.
This program does not check its output, because it either prints a static string that is meaningless to check, or it crashes. For other programs it will be interesting to do better.
Attachment #8850528 - Flags: review?(bbouvier)
Comment 16 • 8 years ago
Comment on attachment 8850528 [details] [diff] [review]
bug1322561-box2d-test.patch
Review of attachment 8850528 [details] [diff] [review]:
-----------------------------------------------------------------
I worry that the build is hardly reproducible as of now. Including the box2d C++ source is a no-go, though. Would it make sense to have a README in the bench/ dir containing a few links to instructions explaining how to compile the benchmark?
The wast shouldn't be needed, so could we remove it, please?
Is there a need for a guard in the JS file for when wasm is disabled?
How much time does it take to run the benchmark on the ARM simulator, in the slowest mode (i.e. with --no-baseline --no-ion), on the try machines? The default timeout for jit-tests is 150 seconds, so if it's close to that, we might need to reduce the running time even more :/
Anyways, this is very nice to have, especially with a low setting, so thank you for the patch. Generalizing this to other tests (by keeping a very low setting) would be nice too.
::: js/src/jit-test/tests/wasm/bench/directives.txt
@@ +1,1 @@
> +|jit-test| test-also-wasm-baseline
Could you also add the test mode which emits all the redundant bounds checks?
Attachment #8850528 - Flags: review?(bbouvier) → review+
Comment 17 • 8 years ago (Assignee)

Comment 18 • 8 years ago (Assignee)

Comment 19 • 8 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/551bb806cebf60b7d421a045b807af9ec3c32dea
Bug 1322561 - Add scaled-down box2d wasm benchmark to test suite. r=bbouvier
Comment 20 • 8 years ago (bugherder)
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55