Closed Bug 1322561 Opened 8 years ago Closed 8 years ago

Wasm: run embenchen benchmarks in DEBUG builds on integration (with low setting)

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

Tracking

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: lth, Assigned: lth)

Attachments

(2 files)

The embenchen benchmarks exercise many aspects of wasm not exercised by our test suite, and in DEBUG builds I've found many issues in the baseline compiler with them. Some are more valuable than others; box2d, for example, is pretty tough. We should run them with a low setting, e.g. "1", in debug builds, even in the arm simulator. (I'm not sure the setting applies to a couple of them, such as linpack, and if not, they can't really be run on the arm simulator. To be investigated.) We should always check the output as well. I have code that does this, though it is for bash. Some assembly required.
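(For concreteness, a sketch of the run-and-check idea as it might look in a JS-shell driver. This is illustration only, not the bash harness mentioned above: the file names, the expected-output file, and the assumption that the emscripten glue runs main() synchronously are all placeholders. The real driver shows up later in this bug as wasm_box2d.js.)

    // Illustrative sketch only: run one embenchen program at its lowest
    // setting and compare the captured output against an expected transcript.
    // File names are hypothetical; assumes the glue runs main() when loaded.
    var output = [];
    var Module = {
        wasmBinaryFile: scriptdir + "wasm_fannkuch.wasm",   // placeholder benchmark
        arguments: [ "1" ],                                  // the low setting
        print: function (msg) { output.push(msg); }
    };
    load(scriptdir + "wasm_fannkuch.js");                    // emscripten glue
    var expected = read(scriptdir + "wasm_fannkuch.expected").trim().split("\n");
    assertEq(output.length, expected.length);
    for (var i = 0; i < expected.length; i++)
        assertEq(output[i], expected[i]);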
Priority: -- → P1
Even with low arguments the benchmarks take a very long time to run on the arm simulator, even in an opt-debug build. (I had some numbers but they are lost.) So this is not obviously practical.
You might want to disable this: http://searchfox.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#5319, as well as AssertBasicGraph and related functions.
Thanks, that's a good tip. I suspect that even without DEBUG some of these programs may take too long to be useful on integration, but TBD. It's easy enough to test; I will do so now.
Is it the running time or compilation time that's long? If the former, we could at least have tests that compile them.
Yes, it's the running time. Running with option "0" to just compile gets us part of the way there, for sure, but running at least once on the lowest setting would be nice too...
Some random thoughts:

- Is your harness running one benchmark per JS file, or all the benchmarks at once in one JS file? The former would significantly reduce running time (had the same issue for wast tests imported from the spec repository).

- Could it be that generated code on ARM is... less good, and perf just a bit bad, and we should actually try improving that? It's a tough one to answer, because we don't really have anything to compare against (real devices on Firefox for Android, maybe?). We could compare the ratio of {time spent running Octane on x86 / time spent running Octane on ARM} vs {time spent running embenchen on x86 / time spent running embenchen on ARM}, "all other things being equal".
One benchmark per js file, standard embenchen setup just wrapped in a shell script. The ARM code is OK; I see running times on my ARM dev board that are no more than 4x worse than the x86 running times on my MacBook Pro, which is pretty good.

Anyway, some preliminary data. Running an arm-simulator release-no-debug build on my MacBook Pro, I see 10 minutes 30 seconds wall time on one core (running with argument "1"). Obviously we don't care about the total running time since we can parallelize if we want, but we'll have to remove bullet, which takes five minutes by itself:

box2d: 54931
bullet: 327044
conditionals: 4765
copy: 11836
corrections: 11405
fannkuch: 6647
fasta: 14986
ifs: 4205
linpack_float: 14149
lua_binarytrees: 22833
lua_scimark: 23906
memops: 37965
primes: 6591
skinning: 31725
zlib: 57005

(With argument "0" we finish in 40 seconds; most of that is setup time in the few benchmarks that don't bail directly after compilation - linpack, binarytrees, scimark.)

I'll do a release-debug build next, for a subset of these cases, to see where the land lies.
Oh, and those are baseline-compiled running times; Ion is presumably quite a lot faster.
In most of the benchmarks we can create times in between "0" and "1", if that would be useful. There are only a few exceptions like fannkuch where that won't work.
Release-debug is only about twice as slow, which is promising. Here are times for a subset of programs:

conditionals: 9275
copy: 18971
corrections: 26585
fannkuch: 16001
fasta: Assertion failure: has(reg), at .../js/src/jit/RegisterSets.h:750
ifs: 10023
linpack_float: Assertion failure: has(reg), at .../js/src/jit/RegisterSets.h:750
lua_binarytrees: 53444
lua_scimark: 41995
memops: 92099
primes: 12018
skinning: 66275

Time to get the regalloc bug fixed, I guess.
Priority: P1 → P2
I've promised to get this done this iteration, since we removed asm.js support from Rabaldr and need better test coverage.
Priority: P2 → P1
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Could use some feedback on the method here. The bulk of this patch is just the wasm box2d benchmark, copied over from the emscripten suite - ignore all of that. The interesting bits are in the driver, wasm_box2d.js, and maybe in the little patch to wasm.js.

What I'm trying to accomplish is to run (or only compile, if the system is slow) the benchmark on a low setting and check that it doesn't crash and that the output looks sane. This will probably be useful for catching some integration problems. I'm thinking that maybe four programs that don't take over a minute each on my computer would be appropriate. (It will also be appropriate to do this for baseline.)

I could do this for all platforms, which is what currently happens, or only for some platforms, notably ARM and/or ARM-sim. Opinions, please.
Attachment #8838053 - Flags: feedback?(bbouvier)
Comment on attachment 8838053 [details] [diff] [review]
bug1322561-embenchen-in-debug-builds.patch

Review of attachment 8838053 [details] [diff] [review]:
-----------------------------------------------------------------

Let's say the benchmark runs in one second; the isSlowComputer() call may take up to 1 second too, so now we have a total of 2 seconds, which seems OK. Note that if we run with --no-ion or --no-baseline, the initial isSlowComputer() call may be slow and the test skipped, while it could run perfectly fine (= quickly) in wasm. What happens with GC zeal? With a CGC build? Is the test just skipped, or do we run it very slowly?

Also, I think testing for speed in JS is intrinsically hard, and under the assumption that we keep making the engine faster, we may not be able to rely on it. For instance, the fib call could be optimized and run very fast, but that would not be a good indicator that the machine is really slow. What if Ion gets Smart Enough © to memoize results of recursive calls? What if we implement tail-call optimization at some point? (I'm playing the devil's advocate since this is not supposed to happen without new syntax, but you get the point.) For instance, I'm pretty sure you're saving the value of fib(26) to make sure it's not DCE'd, so the test is already crafted to prevent an optimization from kicking in. But we don't know which future optimizations may make this faster, there's no way we can prevent people from implementing them, and nobody will think to re-check that isSlowComputer() is still slow when they do. Even worse, this speed test would have to be calibrated on *your machine*, since it's the reference.

I realize this is a tradeoff we have to make. I'd be all in for running these tests on all the platforms (including ARM), since they're likely to show platform-dependent issues. If they are too slow on slow platforms, maybe we could just introduce a new mode, between 0 and 1, as Alon suggested, and always run in that mode, not taking the speed of the current machine into account? This would test for crashiness and high-level correct behavior. Are these tests also checking the numeric results, as part of the Module main program, whenever that's possible?

::: js/src/jit-test/tests/wasm/bench/wasm_box2d.js
@@ +1,4 @@
> +load(libdir + "wasm.js");
> +
> +// We want the assertions.
> +if (!getBuildConfiguration()["debug"])

For what it's worth, an optimized-only build would call the JS assertEq functions, at least.

@@ +8,5 @@
> +var output = [];
> +var Module = { wasmBinaryFile: scriptdir + "wasm_box2d.wasm",
> +               arguments: [ arg ],
> +               print: function (msg) {
> +                 output.push(msg);

Just in case print sometimes takes several args, I'd use ...rest params:

    print: function(...args) { output.push(...args); }

or even just:

    print: output.push.bind(output)
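(For readers without the patch, the gate being discussed has roughly this shape. This is a sketch only, not the actual code: the fib(26) workload and the rough one-second cutoff come from the review above, the rest is invented for illustration.)

    // Sketch of an isSlowComputer() gate: time a fixed workload and only run
    // (rather than just compile) the benchmark on machines that finish quickly.
    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

    function isSlowComputer() {
        var start = Date.now();
        var r = fib(26);
        var elapsed = Date.now() - start;
        assertEq(r, 121393);      // keep the result live so the call isn't DCE'd
        return elapsed > 1000;    // cutoff calibrated on the author's machine
    }

    var runBenchmark = !isSlowComputer();   // on slow machines, compile only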
Attachment #8838053 - Flags: feedback?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #13)
> Comment on attachment 8838053 [details] [diff] [review]
> bug1322561-embenchen-in-debug-builds.patch
>
> Review of attachment 8838053 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I realize this is a tradeoff we have to make. I'd be all in for running
> these tests on all the platforms (including ARM), since they're likely to
> show platform-dependent issues. If they are too slow on slow platforms,
> maybe we could just introduce a new mode, between 0 and 1, as Alon
> suggested, and always run in that mode, not taking the speed of the
> current machine into account?

I don't think the new mode is a good solution, because on sufficiently slow computers those programs will still be very slow. But I appreciate the pushback on the current method, and a new mode may be the best we can do.

> This would test for crashiness and high-level correct behavior. Are these
> tests also checking the numeric results, as part of the Module main
> program, whenever that's possible?

They are not currently checking, but I want them to do that. (I do that in my benchmark runner and it works reasonably well.) Not all of the programs have checkable output -- some print "ok", some print a frame rate that varies -- but they often have patterns that can be checked.
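(To illustrate the "patterns that can be checked" point: the regexps below are made up, the real expected shapes would come from each program's actual output.)

    // Sketch: for programs whose output varies (frame rates, timings), check
    // the shape of each line rather than the exact text.  'output' is the
    // array filled by Module.print in the driver.
    var patterns = [
        /^frame averages: [0-9.]+ ms/,   // numbers vary from run to run
        /^ok\b/                          // some programs just print "ok"
    ];
    assertEq(output.length, patterns.length);
    for (var i = 0; i < patterns.length; i++)
        assertEq(patterns[i].test(output[i]), true);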
OK, second attempt. This adds a very scaled-down version of the box2d wasm benchmark to tests/wasm/bench -- it computes one frame and exits. The sources, makefile, etc. for this will appear in my embenchen fork; most of it is already there, but this program is configured specially -- see the Makefile in src/box2d in that repo (those hacks have not landed yet).

This runs quickly enough that an arm-sim run on my laptop, built debug-no-optimize, takes about 40s. I consider that acceptable, and we don't need any is-this-a-slow-computer guard. For native, it runs in a couple of seconds.

There's really nothing to "see" in this patch: it has the .js, .wasm, and .wast output (do we really want the .wast here?) and then a directives.txt. This program does not check its output, because it either prints a static string that is meaningless to check, or it crashes. For other programs it will be interesting to do better.
Attachment #8850528 - Flags: review?(bbouvier)
Comment on attachment 8850528 [details] [diff] [review]
bug1322561-box2d-test.patch

Review of attachment 8850528 [details] [diff] [review]:
-----------------------------------------------------------------

I worry that the build is not easily reproducible as it stands. Including the box2d C++ source is a no-go, though. Would it make sense to have a README in the bench/ dir containing a few links to the instructions explaining how to compile the benchmark?

The wast shouldn't be needed, so could we remove it, please?

Is there a need for a guard in the JS file for when wasm is disabled?

How much time does it take to run the benchmark on the ARM simulator, in the slowest mode (i.e. with --no-baseline --no-ion), on try machines? The default timeout for jit-tests is 150 seconds, so if it's close to that, we might need to reduce the running time even more :/

Anyway, this is very nice to have, especially with a low setting, so thank you for the patch. Generalizing this to other tests (while keeping a very low setting) would be nice too.

::: js/src/jit-test/tests/wasm/bench/directives.txt
@@ +1,1 @@
> +|jit-test| test-also-wasm-baseline

Could you also add the test mode which emits all the redundant bounds checks?
Attachment #8850528 - Flags: review?(bbouvier) → review+
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
