For something that should be CPU bound, we are wasting a lot of wall clock time.
For kicks, I hacked up a standalone C++ program that takes a list of JS files and executes them under separate contexts (each file is executed in 18 separate contexts for the various combinations of engine/context related options). So, instead of thousands of processes, we have 1 process running thousands of contexts.
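To illustrate why process-per-test hurts, here's a small Python sketch (not the C++ harness itself) comparing the fixed cost of spawning a fresh process per "test" against doing equivalent trivial work in-process. The absolute numbers are machine-dependent, but the gap is typically orders of magnitude:

```python
import subprocess
import sys
import time

N = 50  # far fewer than the thousands of tests, but enough to show the trend

# Cost of running a trivial script in a fresh process, once per "test".
start = time.perf_counter()
for _ in range(N):
    subprocess.run([sys.executable, "-c", "pass"], check=True)
per_process = (time.perf_counter() - start) / N

# Cost of doing the same trivial work in-process, once per "test".
start = time.perf_counter()
for _ in range(N):
    exec(compile("pass", "<test>", "exec"))
per_context = (time.perf_counter() - start) / N

print(f"per-process overhead: {per_process * 1e3:.2f} ms")
print(f"in-process overhead:  {per_context * 1e6:.2f} us")
```

Multiply that per-process overhead by thousands of tests (times 18 option combinations each) and the wasted wall time adds up quickly.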
The results are very promising! The (single-threaded) process consistently maxes out a full core. On my reasonably fast i7-2600K, wall time drops from 8:40 to 7:00, a healthy ~20% reduction. Of course, this is still a single core. Since the process is now CPU bound, I could theoretically get near-linear gains by making things multi-threaded and executing on multiple cores. Assuming linear scaling, going to 4 cores would yield an overall wall time of about 1:45, and 8 cores would be ~0:52. Not too shabby considering we started with 8:40 wall time.
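The back-of-the-envelope scaling above works out as follows (a quick sanity check, assuming perfectly linear scaling across cores):

```python
def mmss(seconds):
    """Format seconds as M:SS."""
    return f"{int(seconds // 60)}:{int(seconds % 60):02d}"

baseline = 8 * 60 + 40   # 8:40 wall time with process-per-test
single_core = 7 * 60     # 7:00 wall time with one process, one core

reduction = 1 - single_core / baseline
print(f"reduction: {reduction:.0%}")          # ~19%, i.e. "a healthy 20%"
print(f"4 cores:   {mmss(single_core / 4)}")  # 1:45
print(f"8 cores:   {mmss(single_core / 8)}")  # ~0:52
```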
I'm not sure how long it takes our current build machines to run the JIT test suite (I couldn't find good timing data in the logs). But my current-model MBP takes ~22 minutes. A 20% savings on a single core would net ~4.5 minutes of wall time. If I utilized 4 cores, I might be looking at ~17 minutes of wall time saved. Now that's going faster!
Now, this approach isn't all rosy. One gotcha is that some failures can segfault the process. The current execution method works around this by segregating every test in a new process. So, if we're serious about going faster and shaving minutes off of build times, we'll need to work around this drawback. There are various solutions: the jit_test.py driver could restart the master execution process from where it crashed; the C++ process could fork and do the work in children (so the parent process can catch a crash and recover gracefully); or Python could start up a C++ worker pool, dispatch tests via IPC, and respawn workers after a failure. Anyway, there are solutions. The new world would likely be a little more complicated than the current one, but I think the potential gains are worth it. For this type of solution, I made the assumption that misbehaving JIT code won't corrupt the underlying JSRuntime and that JSContext instances are completely isolated. I have no clue whether these are valid assumptions. (Can someone in JS land validate?)
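As a rough sketch of the fork-and-recover idea (in Python rather than C++, with a deliberately crashing stand-in for a real test), the parent forks per test, the child does the work, and the parent tells a crash apart from a normal exit via the wait status:

```python
import os
import signal

def run_test(test_id):
    """Stand-in for executing one JIT test in-process."""
    if test_id == 3:
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a segfaulting test
    return 0

def run_isolated(test_id):
    """Fork; the child runs the test, the parent survives any crash."""
    pid = os.fork()
    if pid == 0:  # child
        os._exit(run_test(test_id))
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("crashed", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))

results = [run_isolated(i) for i in range(5)]
print(results)
```

A real implementation wouldn't fork per test (that reintroduces much of the overhead we're trying to kill); it would more likely keep one long-lived worker per core and respawn a worker only after it crashes.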
My proof of concept code is located at https://github.com/indygreg/mozilla-central/tree/jit-test-speedup. The main diff from m-c can be found at https://github.com/indygreg/mozilla-central/commit/ddfd6caacb8c779cb0eea44af8a0a881689a0318. I'm fully aware that the code is crap and jit_test.py is horribly broken. My main objective in writing it was to prove my hunch that the current implementation was far from optimal and that we could go much faster. I think I've made the case on both points and now relinquish this bug to the JS and RelEng teams for further action.
I was looking at the test running code in js/src last night over a beer. It seems to me the "easy" solution here would be to refactor jit_test.py to use the same test running "framework" as jstests.py. The crazy single process model described in the initial comment could be deferred to a follow-up bug if things aren't fast enough.
FWIW, the timings for jit_test.py on my MBP are as follows:
We have ~13 minutes of CPU time running tests with one core. Assuming we could max out all 4 physical cores plus the 4 hyperthreading threads and yield 25% from hyperthreading, we'd get a nice 5x speedup and would execute tests in about 2.5 minutes!
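Spelling out that estimate (treating the 25% hyperthreading yield as an assumption, not a measurement):

```python
cpu_minutes = 13.0       # measured single-core CPU time for the suite
physical_cores = 4
ht_yield = 0.25          # assumed extra throughput from hyperthreading

speedup = physical_cores * (1 + ht_yield)  # 5.0x
wall_minutes = cpu_minutes / speedup       # ~2.6 minutes
print(f"speedup: {speedup:.1f}x, wall time: {wall_minutes:.1f} min")
```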
|make check| takes about 30 minutes on buildbot machines (this executes jit_test.py). Parallel jit_test.py execution would shave a *lot* of time off of |make check| and free up build machines to perform more builds.
FYI bug 638219 covers merging those two harnesses.
Also I'd love to get these harnesses packaged up with the rest of the tests and run on the test slaves instead of the build slaves. Then we could fix them to run on mobile as well and gain the ARM test coverage we're currently lacking.
*** This bug has been marked as a duplicate of bug 638219 ***