Closed Bug 1309060 Opened 8 years ago Closed 8 years ago

Python unit tests take significantly longer to run

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox53 fixed)

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: glandium, Assigned: ahal)

References

Details

Attachments

(4 files)

Before bug 1304593, python unit tests were running in parallel. Bug 1304593 broke that, and they now run sequentially, making running them take significantly more time than they used to.
Specifically, on my machine, running mach build python/check with 27f488a1f9efa704afc98f2319aca94eaed4b32d backed out, takes 30s (and that's only because python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py takes 15s and runs last), while running mach python-test python takes 50s.
Giving |mach python-test| the ability to run in parallel shouldn't be too hard and would benefit the local case too. I'll see what I can do.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
With the above patches on Linux, running ./mach python-test -j8 python was about 20 seconds faster than with -j1. I see similar speedups in automation. Here's a try run with -j1: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6e159bf658d3a043f167fd96decf29e4d1300966 Here's a try run with -j8: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5bd0b383510b9afb5120bb90e2e2017762577c3 Is there a way to determine the number of cpus from a Makefile? Or is hardcoding 8 acceptable?
Why not just have the test runner use multiprocessing.cpu_count() to set the default value, and override it with -j1 if needed?
When more than 1 process is used, the output gets buffered and passed back to the main thread so test results don't get interleaved. The downside this is when you run into a long running test like test_toolchain_configure.py, or worse testing/mozbase/test.py (as far as |mach python-test| is concerned, the entire mozbase suite is a single test) it looks like the command has hung. I don't have a great solution to this, so I figured it could just default to a single process when running locally.
The output was not buffered when we were using make, and I don't see much reason to care more now. Especially when each line is essentially TEST-STATUS | test/file | test name It doesn't matter that much if those are interleaved.
Also note that for large tests like python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py, it would actually be beneficial to run each individual test in a separate process.
(In reply to Mike Hommey [:glandium] from comment #8) > The output was not buffered when we were using make, and I don't see much > reason to care more now. > Especially when each line is essentially > TEST-STATUS | test/file | test name > > It doesn't matter that much if those are interleaved. I disagree, especially for pytest based tests which don't have that format at all. Plus several tests dump stuff to stdout which could help with debugging, or we may want to enable more verbose logging in the tests to figure out what's failing. But um.. are you sure that we were running these tests in parallel before bug 1304593? Here's a log from before that landed, and there is definitely no interleaving going on: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux/1475472836/mozilla-central-linux-bm72-build1-build20.txt.gz I can't find any evidence of this being the case in automation.
(In reply to Andrew Halberstadt [:ahal] from comment #10) > (In reply to Mike Hommey [:glandium] from comment #8) > > The output was not buffered when we were using make, and I don't see much > > reason to care more now. > > Especially when each line is essentially > > TEST-STATUS | test/file | test name > > > > It doesn't matter that much if those are interleaved. > > I disagree, especially for pytest based tests which don't have that format > at all. Which is a problem in itself. > Plus several tests dump stuff to stdout which could help with > debugging, or we may want to enable more verbose logging in the tests to > figure out what's failing. > > But um.. are you sure that we were running these tests in parallel before > bug 1304593? Here's a log from before that landed, and there is definitely > no interleaving going on: > https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central- > linux/1475472836/mozilla-central-linux-bm72-build1-build20.txt.gz > > I can't find any evidence of this being the case in automation. Now that you mention it, we were running them manually with make -k, and without -j, so no. They would have if they had been run with mach, which I expect developers running them locally do (I do).
> Which is a problem in itself. Either way, I really don't want to sacrifice non-interleaved logs. I think we can perhaps improve the UX by drawing a progress bar or similar at the bottom a la |mach build|, then enable -j by default. Don't forget you can also add an alias in your ~/.mozbuild/machrc: [alias] python-test = python-test -j8
Comment on attachment 8800225 [details] Bug 1309060 - Give |mach python-test| the ability to run tests in parallel, https://reviewboard.mozilla.org/r/85212/#review86580 I'm a fan of running the Python tests in parallel. I'm not a fan of rolling our own a) thread pool b) python test harness. While I'd prefer we use a proper Python test harness (nose or pytest), I understand that's significant scope bloat. In lieux of doing that, let's refactor the custom thread pool to use futures.ThreadPoolExecutor or futures.ProcessPoolExecutor.
Attachment #8800225 - Flags: review?(gps) → review-
Comment on attachment 8800226 [details] Bug 1309060 - Run |mach python-test| in parallel from make check, https://reviewboard.mozilla.org/r/85214/#review86584 We shouldn't hard-code the concurrency because different machines have different numbers of cores. There is some code in client.mk that invokes Python's multiprocessing module to determine the number of CPU cores. Of course, this raises a larger problem: concurrency should ideally be handled by `make` running multiple targets, not running a single target that runs N processes. For this use case, it should be fine. Just something to think about. While I'm here, build tasks in TaskCluster must wait on `make check` (and Python unit tests) to complete before kicking off derived tasks (namely tests). We would ideally move Python unit tests out of the build task in automation. Although, we would likely retain some Python tests - notably related to the build system itself - because we want the build to fail and tests not to run if the build system fails its core tests.
Attachment #8800226 - Flags: review?(gps) → review-
(In reply to Gregory Szorc [:gps] from comment #14) > Of course, this raises a larger problem: concurrency should ideally be > handled by `make` running multiple targets, not running a single target that > runs N processes. For this use case, it should be fine. Just something to > think about. > > While I'm here, build tasks in TaskCluster must wait on `make check` (and > Python unit tests) to complete before kicking off derived tasks (namely > tests). We would ideally move Python unit tests out of the build task in > automation. Although, we would likely retain some Python tests - notably > related to the build system itself - because we want the build to fail and > tests not to run if the build system fails its core tests. Yes, I think your second paragraph is why the first paragraph is acceptable. Ideally most of these tests are their own separate tasks that only run when their related code gets modified.
Quick update, I ran into several issues when using the futures library, including a bug in futures itself that necessitated a version bump. That's fixed now, but currently I'm hitting an OSX only "No such process" error in mozprocess: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a771e263072481c0ffef29afb7b305b8d736820&selectedJob=29812784 This might stall out a bit as this is isn't a top priority.. But I'll see what I can do, hopefully it's easily resolved.
Attachment #8809924 - Flags: review?(gps) → review+
Comment on attachment 8809923 [details] Bug 1309060 - Ignore "No such process" when trying to find pgid in mozprocess, https://reviewboard.mozilla.org/r/92398/#review92732 ::: testing/mozbase/mozprocess/mozprocess/processhandler.py:748 (Diff revision 1) > - self.proc.pgid = os.getpgid(self.proc.pid) > + self.proc.pgid = os.getpgid(self.proc.pid) > + except OSError as e: > + # Do not consume errors except "No such process" > + if e.errno != 3: > + raise > + self.proc.pgid = self.proc.pid Maybe we should keep the pgid as None in case we cannot determine it. Something we do for the other code in this file. Maybe we should also consider to consolidate this duplicated code into a single function.
Attachment #8809923 - Flags: review?(hskupin)
Comment on attachment 8800225 [details] Bug 1309060 - Give |mach python-test| the ability to run tests in parallel, https://reviewboard.mozilla.org/r/85212/#review93210 This looks mostly good. I'd like to see the final series before formal r+. ::: python/mach_commands.py:155 (Diff revision 2) > + # https://gist.github.com/clchiou/f2608cbe54403edb0b13 > + executor._threads.clear() > + thread._threads_queues.clear() > + raise > + > + return 1 if return_code else 0 Can we not simply return `return_code` here? It defaults to 0 and should stay at 0 if nothing failed. ::: python/mach_commands.py:163 (Diff revision 2) > + # Buffer messages if more than one worker to avoid interleaving > + if self.jobs > 1: > + output.append(line) An unfortunate side-effect of this is that timestamps reported by mozharness will be incorrect. That can lead to confusion :( ::: python/mach_commands.py:183 (Diff revision 2) > - return_code += inner_return_code > > - if not file_displayed_test: > - self.log(logging.WARN, 'python-test', {'file': f}, > - 'TEST-UNEXPECTED-FAIL | No test output (missing mozunit.main() call?): {file}') > + _log(test_path) > + cmd = [self.virtualenv_manager.python_path, test_path] > + env = os.environ.copy() > + env[b'PYTHONDONTWRITEBYTECODE'] = '1' You need a `b''` for the value too, otherwise it could be a unicode type if unicode_literals are enabled.
Attachment #8800225 - Flags: review?(gps) → review-
Comment on attachment 8800226 [details] Bug 1309060 - Run |mach python-test| in parallel from make check, https://reviewboard.mozilla.org/r/85214/#review93212 I'm still not keen on invoking processes that take concurrency control away from `make`. But this is correct for the time being since `make check` doesn't do concurrent directory traversal, so there is likely only 1 Makefile being executed at a time. Plus, we'll be moving most Python tests out of `make check`. So this series prepares us for optimally executing Python tests concurrently in the future.
Attachment #8800226 - Flags: review?(gps) → review+
Comment on attachment 8800226 [details] Bug 1309060 - Run |mach python-test| in parallel from make check, https://reviewboard.mozilla.org/r/85214/#review93212 Yes, hopefully this is a very temporary solution and we get proper tasks running for python tests. I have a vague plan that goes something like: 1) Use pytest instead of mozunit 2) Create a thin wrapper around pytest that gives us manifestparser manifests (this would also allow us to stop special casing them in the build) 3) Use manifestparser subsuites to chunk tests out into separate tasks
Comment on attachment 8800225 [details] Bug 1309060 - Give |mach python-test| the ability to run tests in parallel, https://reviewboard.mozilla.org/r/85212/#review93210 > Can we not simply return `return_code` here? It defaults to 0 and should stay at 0 if nothing failed. I think my reasoning was if multiple tests fail with different return codes, then whichever one failed last would trump the first bad returncode. But always returning 1 isn't any better either.. > An unfortunate side-effect of this is that timestamps reported by mozharness will be incorrect. That can lead to confusion :( Yeah, this is the same problem as mochitest :(.. I don't have a good way around this. At least for now, this probably isn't a huge deal here (at least compared to mochitest), and is better than interleaved logs imo.
Comment on attachment 8800225 [details] Bug 1309060 - Give |mach python-test| the ability to run tests in parallel, https://reviewboard.mozilla.org/r/85212/#review93286
Attachment #8800225 - Flags: review?(gps) → review+
Comment on attachment 8809923 [details] Bug 1309060 - Ignore "No such process" when trying to find pgid in mozprocess, https://reviewboard.mozilla.org/r/92398/#review93288 The OS X man page for `getpgid()` states that errno is set to ``ESRCH`` when "there is no process whose process ID equals pid." In other words, the pid you are looking up info for isn't running. I reasonable explanation for this is a race condition between the process exiting and calling ``getpgid()``. ::: testing/mozbase/mozprocess/mozprocess/processhandler.py:859 (Diff revision 2) > + def _getpgid(cls, pid): > + try: > + return os.getpgid(pid) > + except OSError as e: > + # Do not raise for "No such process" > + if e.errno != 3: You should use an ``errno`` constant here. You want ``errno.ESRCH``.
Attachment #8809923 - Flags: review+
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a438c9e3a799 Ignore "No such process" when trying to find pgid in mozprocess, r=gps https://hg.mozilla.org/integration/autoland/rev/a0d07b344b2f Update futures to version 3.0.5, r=gps https://hg.mozilla.org/integration/autoland/rev/083a65f95471 Give |mach python-test| the ability to run tests in parallel, r=gps https://hg.mozilla.org/integration/autoland/rev/55a41c5e8e88 Run |mach python-test| in parallel from make check, r=gps
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: