speed up unittests turnaround time by running each test suite concurrently

RESOLVED DUPLICATE of bug 452861

Status

Release Engineering
General
RESOLVED DUPLICATE of bug 452861
9 years ago
4 years ago

People

(Reporter: joduinn, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(This has been talked about for a while, but I can't find a tracking bug, so I'm filing this. If I've missed an existing bug, please close this as a dup.)

The end-to-end time from "code change being checked in" to "unittest completed" is broken into two parts:

1) reduce waiting time before starting:
Pending unittest jobs used to wait until the in-progress unittest run completed before they could even start. This idle waiting time has been removed by consolidating the unittest machines into the pool of build slaves. We now start a new pending unittest job on any available slave in the pool, even while an in-progress unittest run is still ongoing on another slave in the pool.

2) reduce unittest execution time:
Unittest is actually a handful of suites. Each of these suites can be run concurrently, so long as we gather the results together correctly afterwards. Doing this will reduce the time taken to actually run unittests. If we find that one specific suite takes longer than the others, we can split the long suite into multiple smaller suites, each to be run concurrently. This bug tracks splitting out the separate suites and figuring out how to collect the results as they become available. Requires bug 421611.
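For illustration only, here is a minimal sketch of the "run each suite concurrently, then gather the results" idea. It is not the buildbot implementation: the suite commands are hypothetical stand-ins (each just runs a tiny Python one-liner), and `run_suite` and `run_all` are names invented for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for the real harness invocations.
SUITES = {
    "reftest": [sys.executable, "-c", "print('reftest ok')"],
    "crashtest": [sys.executable, "-c", "print('crashtest ok')"],
    "mochitest": [sys.executable, "-c", "print('mochitest ok')"],
}


def run_suite(name, cmd):
    """Run one suite as a child process and return its name, exit code and output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return name, proc.returncode, proc.stdout.strip()


def run_all(suites):
    """Start every suite at once and collect results as each one finishes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        futures = [pool.submit(run_suite, n, c) for n, c in suites.items()]
        for future in as_completed(futures):
            name, returncode, output = future.result()
            results[name] = (returncode, output)
    return results


if __name__ == "__main__":
    for name, (returncode, output) in sorted(run_all(SUITES).items()):
        status = "PASS" if returncode == 0 else "FAIL"
        print(f"{name}: {status} ({output})")
```

Keying the results by suite name means the order in which suites finish does not matter when the results are merged for reporting.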

Triaged to Future, because of blocking dependency.
This is also an important issue for running the tests on Fennec. It would take over 1 day to run them all in series. I agree that we need to get the tests running on an arbitrary build before making this a reality. There are probably a handful of other items that need to be sorted out and that don't necessarily depend on the work that Ted is doing for bug 421611.
Duplicate of this bug: 452067
I think there are four additional prerequisites to running test suites in parallel. The first is breaking up (or speeding up) mochitest. Here are some elapsed times from a win32 unit-test build on mozilla-central:
      pull & compile  : 1 hr  4 min           (51%)
      Tunit           :       4 min           ( 3%)
      reftest         :       6 min  15 sec   ( 5%)
      crashtest       :       1 min  30 sec   ( 1%)
      mochitest       :      45 min  30 sec   (36%)
      mochichrome     :       2 min           ( 2%)
      browserchrome   :       3 min  30 sec   ( 3%)
      -------------------------------------
      Total run time  : 2 hr  5 min 
So it's clear mochitest is using up the lion's share of the non-build time, and you don't win as much by parallelising in that situation. Note that this is a "representative sampling", not an average or anything fancy.
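To make the point concrete, the following sketch does the arithmetic on the times listed above. It assumes an idealised case that is not stated in the bug: one free slave per suite and no overhead for packaging or transferring the build. The function names are invented for this example. Note that the listed steps sum to a little more than the quoted total, which fits the "representative sampling" caveat.

```python
# Elapsed times from the table above, in seconds.
BUILD = 64 * 60  # pull & compile
SUITES = {
    "Tunit": 4 * 60,
    "reftest": 6 * 60 + 15,
    "crashtest": 1 * 60 + 30,
    "mochitest": 45 * 60 + 30,
    "mochichrome": 2 * 60,
    "browserchrome": 3 * 60 + 30,
}


def sequential_total(build, suites):
    """Build, then run every suite one after another."""
    return build + sum(suites.values())


def parallel_total(build, suites):
    """Build, then run all suites at once.

    Assumption: one free slave per suite and zero packaging/transfer
    overhead, so the test phase lasts as long as the slowest suite.
    """
    return build + max(suites.values())


def fmt(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} hr {minutes} min {secs} sec"


if __name__ == "__main__":
    seq = sequential_total(BUILD, SUITES)
    par = parallel_total(BUILD, SUITES)
    print("sequential:", fmt(seq))   # 2 hr 6 min 45 sec
    print("parallel:  ", fmt(par))   # 1 hr 49 min 30 sec
    print("saved:     ", fmt(seq - par))  # 0 hr 17 min 15 sec
```

Under these assumptions the saving is only the combined time of the five short suites, because the parallel test phase still lasts as long as mochitest does.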

The second issue is compile time. Currently the win32 unit test build does not do PGO, while the main win32 builder does. If we move to a system where we build a normal PGO build (and a package of tests), and then start running tests, then we would increase the build time from (roughly) one to two hours, and the time from push to mochitest result from 2 hours to 2 hr 45 min, i.e., degrading the metric most developers care about when they're landing changes. We might be able to get better build times (bug 468554) to help with this, or we make that trade-off for the other benefits. The same applies to Mac, where we don't do universal builds for unit tests. Parallelism would be a small win on Linux, better still if point 1 is addressed.

Third, we have to add slave capacity _before_ this goes live, since speedups rely on slave availability. When we converted unit tests from dedicated machines to the buildbot pool we effectively lost responsiveness because we were assigning more work to an unchanged pool of machines.

Fourth, --enable-tests and Mac universal builds are currently incompatible due to failures running unify, see bug 445611 comment #28.

Not sure how the test results would be shown on tinderbox, but it would also be great if we could avoid splitting each unit column on the tinderbox waterfall several ways.
Mochitest can be parallelized on a per-directory basis, as long as we make sure to run all the directories.  I wonder whether there is a particular directory that takes a really long time....
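A rough sketch of the per-directory split, with a check that every directory ends up in exactly one chunk. The directory names are illustrative placeholders, and `split_directories` and `check_coverage` are names invented for this example, not part of the mochitest harness.

```python
# Illustrative placeholder list; the real list would come from the test tree.
DIRECTORIES = [
    "content/base/test",
    "docshell/test",
    "dom/tests/mochitest",
    "layout/generic/test",
    "toolkit/content/tests",
]


def split_directories(directories, workers):
    """Deal the directories out round-robin into `workers` chunks."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    ordered = sorted(directories)
    return [ordered[i::workers] for i in range(workers)]


def check_coverage(directories, chunks):
    """Return True if every directory appears in exactly one chunk."""
    seen = [d for chunk in chunks for d in chunk]
    return sorted(seen) == sorted(directories)


if __name__ == "__main__":
    chunks = split_directories(DIRECTORIES, 2)
    for index, chunk in enumerate(chunks):
        print(f"worker {index}: {chunk}")
    print("all directories covered:", check_coverage(DIRECTORIES, chunks))
```

A round-robin split ignores how long each directory takes; if one directory dominates the run time, balancing the chunks by measured duration would be the natural next step.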
Currently we run mochitest per directory on Fennec (because of memory issues). The problem is that when we do this on a desktop Firefox build, we get different results than when running it as a single test. I know ctalbert is looking into making this more stable. Just something to keep in mind.

Updated

9 years ago
Depends on: 383136
I think this is a dup of 452861.  Re-open and explain the differences if not.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 452861

Comment 7

8 years ago
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering

Updated

4 years ago
Product: mozilla.org → Release Engineering