Closed Bug 660788 Opened 9 years ago Closed 7 years ago

Run xpcshell tests in parallel

Categories

(Testing :: XPCShell Harness, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla26

People

(Reporter: ted, Assigned: mihneadb)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently the xpcshell harness runs each test in a new xpcshell process one-at-a-time. Previously this had to be done this way because component registration would write compreg.dat in the app dir. Nowadays we have manifest-driven component registration, so this should be feasible.
Do we want to run them in parallel or just use the same process?  I ask because parallel running tests would result in logs that would be difficult to read in times of failure.
I think we need to continue to use a fresh process for each test, to ensure that each test gets a clean environment. The harness should be able to run tests in parallel but serialize the output from them to make it less confusing. (It already buffers all test output anyway.)
Assignee: ted.mielczarek → nobody
Attached patch WIP (obsolete) — Splinter Review
I've been fooling around with some Python and made this. It's far from shippable but on my (2-core, non SSD) machine it's giving a ~40% speed up. If anyone can help finish it off please go for it.
Comment on attachment 654928 [details] [diff] [review]
WIP

Review of attachment 654928 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for doing this! It's definitely on the right road.

I would refactor this to use multiprocessing.Pool. That way, you have true separate processes and you won't run into the GIL (the threading module has GIL issues). It will also automagically use the number of processors your machine has, likely yielding optimal results.

That being said, I'm not sure if every xpcshell test runner in the build infra has Python 2.6+ and thus access to multiprocessing. You can always add an "import multiprocessing", push to try, and see what happens.

There will also be real-world issues with tests in some test suites running in parallel. e.g. I'm pretty sure you will get random failures in services/sync/tests/unit because many of those tests establish a HTTP server on a hard-coded port. We definitely need support in the xpcshell.ini manifests to control whether parallel execution is allowed. The optimist in me says to default to parallel and only disable if we see issues. Over time, we can identify pain points preventing parallel execution (e.g. we need a way for Python to tell the xpcshell environment what network ports it can listen/bind on and tests need to use that rather than hard-coded port numbers).

I'm not sure how a flag in xpcshell.ini will play with the master manifest file (e.g. |make xpcshell-tests| from topobjdir). We obviously want the "no parallel" flag to be as fine as we can make it, otherwise we'll be limiting execution too much. I'm not sure if the manifest parser has a way of intelligently representing this.

Since you will be touching a lot of logic in the test execution bits, I'd appreciate a part 0 that factors out the "execute an individual test process" code into a standalone method and have the for loop or your pool worker call into that. I think that will result in cleaner code in the end. runxpcshelltests.py is hard enough to read as it is...

::: testing/xpcshell/head.js
@@ +339,5 @@
> +      "  if (isHttps || isWebSocket || isWebSocketSSL)" +
> +      "    return 'PROXY localhost:" + port + "';" +
> +      "  return 'DIRECT';" +
> +      "}");
> +  }

Did this sneak in?
Attachment #654928 - Flags: feedback+
I would love for the xpcshell harness and the JS harnesses to be merged someday. The JS harness already has this capability, FWIW.

(In reply to Gregory Szorc [:gps] from comment #4)
> That being said, I'm not sure if every xpcshell test runner in the build
> infra has Python 2.6+ and thus access to multiprocessing. You can always add
> an "import multiprocessing", push to try, and see what happens.

We only require 2.5 to build, still. If the build infra is up to the task we could require 2.6, but I would bet against that.

> There will also be real-world issues with tests in some test suites running
> in parallel. e.g. I'm pretty sure you will get random failures in
> services/sync/tests/unit because many of those tests establish a HTTP server
> on a hard-coded port. We definitely need support in the xpcshell.ini
> manifests to control whether parallel execution is allowed. The optimist in
> me says to default to parallel and only disable if we see issues. Over time,
> we can identify pain points preventing parallel execution (e.g. we need a
> way for Python to tell the xpcshell environment what network ports it can
> listen/bind on and tests need to use that rather than hard-coded port
> numbers).

I'd like to see (as a separate bug) a way for all the test servers we use to bind to an arbitrary free port and report that to tests so that we don't have to use hardcoded ports at all.
> We only require 2.5 to build, still. If the build infra is up to the task we
> could require 2.6, but I would bet against that.

We now require Python 2.6 to build, see bug 800614.
Building != running tests.

We still have test runners running older Python than what the build itself requires.

Work towards running Python 2.7 everywhere is ongoing and tracked in bug 724191.
Blocks: 845748
According to bug 883983, httpd.js can choose its own port. This will go a long way towards abating the resource contention in xpcshell tests.
Depends on: 884421
Depends on: 887064
Depends on: parxpc
No longer depends on: 884421
No longer depends on: 887064
Depends on: 898658
Comment on attachment 654928 [details] [diff] [review]
WIP

Marking older WIP patch as obsolete, hope it's fine.

After bug 887054 lands, I'll upload a patch that turns on parallel runs on automation here.
Attachment #654928 - Attachment is obsolete: true
So this patch will turn on parxpc in automation. Putting it here so we'll have
it ready to try landing soon.

As discussed in bug 898658, I'll do some "stress test" runs locally trying to
identify broken tests and after that we can try landing this.
Attachment #790940 - Flags: review?(ted)
Assignee: nobody → mihneadb
Status: NEW → ASSIGNED
Attachment #790940 - Flags: review?(ted) → review+
Depends on: 906510
\o/

Please blog this. Break your arm doing it.
(In reply to Gregory Szorc [:gps] from comment #12)
> \o/
> 
> Please blog this. Break your arm doing it.

I'm waiting for it to reach central. But yes.
Ted, it seems that killPid in the windows version of automation.py hung[1] once. 
It's clearly intermittent (see the retrigger[2] and the runs here[3]), but I don't know what could cause it. Any ideas?

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27209828&tree=Mozilla-Inbound&full=1#error0
[2] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=14619f24a8a8
[3] https://tbpl.mozilla.org/?tree=Try&rev=6c50d2ebcada
Flags: needinfo?(ted)
I don't have any idea about that. I have a patch floating around to replace all that code with something less horrible, I should just finish that.
Flags: needinfo?(ted)
Depends on: 911249
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #14)
> Ted, it seems that killPid in the windows version of automation.py hung[1]
> once. 
> It's clearly intermittent (see the retrigger[2] and the runs here[3]), but I
> don't know what could cause it. Any ideas?
> 
> [1]
> https://tbpl.mozilla.org/php/getParsedLog.php?id=27209828&tree=Mozilla-
> Inbound&full=1#error0
> [2] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=14619f24a8a8
> [3] https://tbpl.mozilla.org/?tree=Try&rev=6c50d2ebcada

Filed as bug 911249.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #15)
> I don't have any idea about that. I have a patch floating around to replace
> all that code with something less horrible, I should just finish that.

Guessing this is bug 890026 ? :-)
https://hg.mozilla.org/mozilla-central/rev/14619f24a8a8
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
(In reply to Ed Morley [:edmorley UTC+1] from comment #16)
> 
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #15)
> > I don't have any idea about that. I have a patch floating around to replace
> > all that code with something less horrible, I should just finish that.
> 
> Guessing this is bug 890026 ? :-)

That one. Also, trying bug 911262 as a quick(er) fix.
(In reply to Gregory Szorc [:gps] from comment #12)
> \o/
> 
> Please blog this. Break your arm doing it.

Blog post with details about the process of parallelizing the harness:
http://www.mihneadb.net/post/parallelizing-a-test-harness/
Blocks: 912596
You need to log in before you can comment on or make changes to this bug.