Closed Bug 845748 Opened 11 years ago Closed 5 years ago

Tracking bug for parallelizing testsuites to reduce end-to-end times

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [buildfaster:?][capacity])

During the A-Team work week in November, we discussed making as many of the harnesses as possible run in parallel, both to reduce infrastructure load by using more than one core and to cut testing time for devs locally. As far as I can tell, we don't have a tracking bug for this yet.

The lowest-hanging fruit is xpcshell, aiui (bug 660788), but I'd like us to look at as many of the harnesses as possible.
Depends on: 819048
Depends on: 827960
Summary: Tracking bug for running testsuites in parallel to reduce end-to-end times → Tracking bug for parallelizing testsuites to reduce end-to-end times
I want to emphasize that the various bugs tracked here cumulatively represent the biggest thing we can do to reduce load on the build infrastructure. The wall time spent running tests completely dwarfs the wall time spent actually building the tree. Since we build in parallel but run tests serially, the build system looks efficient compared to the test running infrastructure!

Because of all the benefits of reduced infra load and the general trend towards more cores in CPUs, I think making the tests run in parallel should be a project priority.
Quoting gps from dev.platform:
{
To attach some numbers to the importance of this, in that set of builds I posted in the other thread about an hour ago, reftests accounted for 68,535,661 seconds and crashtests accounted for 15,195,143. The sum of all times in that sample was 486,014,956s. Reftests and crashtests accounted for 14.1% and 3.1% of the total times of builds, respectively.

These are non-trivial percentages. Any gains in wall clock efficiency would net huge wins for automation capacity.

While I'm here, mochitests and xpcshell tests consume 37% and 7%, respectively. 
}
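For anyone who wants to re-check the quoted shares, here is a minimal sketch that recomputes them from the three totals given in the quote. The helper name `share` is made up for illustration; only the numbers come from the quote.

```python
# Totals quoted above, in seconds of job time.
TOTAL_S = 486_014_956
REFTEST_S = 68_535_661
CRASHTEST_S = 15_195_143


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(REFTEST_S, TOTAL_S))    # 14.1
print(share(CRASHTEST_S, TOTAL_S))  # 3.1
```

Both results agree with the 14.1% and 3.1% figures in the quote.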
Whiteboard: [buildfaster:?] → [buildfaster:?][capacity]
Depends on: parxpc
We have some early returns from system monitoring in mozharness (bug 859573). On OS X, it appears the overall system resource usage during xpcshell test execution is around 10-12%!

https://tbpl.mozilla.org/php/getParsedLog.php?id=25251657&tree=Cedar&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=25249673&tree=Cedar&full=1

Taking one example:

19:37:52     INFO - Total resource usage - Wall time: 874s; CPU: 10%; Read bytes: 28880896; Write bytes: 10214718464; Read time: 16594; Write time: 464182
19:37:52     INFO - install - Wall time: 18s; CPU: 13%; Read bytes: 215960576; Write bytes: 302470144; Read time: 17683; Write time: 30979
19:37:52     INFO - run-tests - Wall time: 856s; CPU: 10%; Read bytes: 21164032; Write bytes: 9909498880; Read time: 14153; Write time: 430615

Assuming we can get 100% CPU usage via parallel execution, we'd knock wall time down from 856s to 86s. Multiply 770s by the number of xpcshell test jobs over time and you get some monster savings, I reckon.
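A rough sketch of that back-of-the-envelope estimate, for reference. It pulls the wall time and CPU figures out of the run-tests line above and scales the wall time by the utilization ratio. The regex and both function names are hypothetical (this is not mozharness code), and the model is idealized: it assumes the total CPU work stays constant and ignores I/O waits and scheduling overhead.

```python
import re

# The run-tests line copied from the log excerpt above.
LINE = (
    "19:37:52     INFO - run-tests - Wall time: 856s; CPU: 10%; "
    "Read bytes: 21164032; Write bytes: 9909498880; "
    "Read time: 14153; Write time: 430615"
)

# Hypothetical pattern: matches only the two fields needed here,
# in the format they have in the excerpt.
USAGE_RE = re.compile(r"Wall time: (\d+)s; CPU: (\d+)%")


def parse_usage(line):
    """Return (wall_seconds, cpu_percent) from a resource usage line."""
    match = USAGE_RE.search(line)
    if match is None:
        raise ValueError("no resource usage fields found")
    return int(match.group(1)), int(match.group(2))


def projected_wall_time(wall_s, cpu_pct, target_pct=100):
    """Idealized wall time if CPU utilization rose from cpu_pct to target_pct."""
    return wall_s * cpu_pct / target_pct


wall, cpu = parse_usage(LINE)
ideal = projected_wall_time(wall, cpu)
print(round(ideal), round(wall - ideal))  # 86 770
```

Passing a lower `target_pct` gives the projected wall time for a less optimistic utilization target under the same assumptions.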
I don't know that we can ever hit 100% in some of these things; our xpcshell tests are very I/O heavy (storage, etc.), but I would bet we can do better than 12%.
Can we run the xpcshell tests on a RAM disk so that we don't have to deal with slow I/O?
No longer depends on: parxpc
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE