660788 - Run xpcshell tests in parallel

Reporter

Description

•

14 years ago

Currently the xpcshell harness runs each test in a new xpcshell process one-at-a-time. Previously this had to be done this way because component registration would write compreg.dat in the app dir. Nowadays we have manifest-driven component registration, so this should be feasible.

Shawn Wilsher :sdwilsh

Comment 1

•

14 years ago

Do we want to run them in parallel or just use the same process? I ask because parallel running tests would result in logs that would be difficult to read in times of failure.

(not currently active) Ted Mielczarek

Reporter

Comment 2

•

14 years ago

I think we need to continue to use a fresh process for each test, to ensure that each test gets a clean environment. The harness should be able to run tests in parallel but serialize the output from them to make it less confusing. (It already buffers all test output anyway.)

(not currently active) Ted Mielczarek

Reporter

Updated

•

14 years ago

Assignee: ted.mielczarek → nobody

Geoff Lankow (:darktrojan)

Comment 3

•

13 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

I've been fooling around with some Python and made this. It's far from shippable but on my (2-core, non SSD) machine it's giving a ~40% speed up. If anyone can help finish it off please go for it.

Gregory Szorc [:gps]

Comment 4

•

13 years ago

Comment on attachment 654928 [details] [diff] [review] WIP Review of attachment 654928 [details] [diff] [review]: ----------------------------------------------------------------- Thank you for doing this! It's definitely on the right road. I would refactor this to use multiprocessing.Pool. That way, you have true separate processes and you won't run into the GIL (the threading module has GIL issues). It will also automagically use the number of processors your machine has, likely yielding optimal results. That being said, I'm not sure if every xpcshell test runner in the build infra has Python 2.6+ and thus access to multiprocessing. You can always add an "import multiprocessing", push to try, and see what happens. There will also be real-world issues with tests in some test suites running in parallel. e.g. I'm pretty sure you will get random failures in services/sync/tests/unit because many of those tests establish a HTTP server on a hard-coded port. We definitely need support in the xpcshell.ini manifests to control whether parallel execution is allowed. The optimist in me says to default to parallel and only disable if we see issues. Over time, we can identify pain points preventing parallel execution (e.g. we need a way for Python to tell the xpcshell environment what network ports it can listen/bind on and tests need to use that rather than hard-coded port numbers). I'm not sure how a flag in xpcshell.ini will play with the master manifest file (e.g. |make xpcshell-tests| from topobjdir). We obviously want the "no parallel" flag to be as fine as we can make it, otherwise we'll be limiting execution too much. I'm not sure if the manifest parser has a way of intelligently representing this. Since you will be touching a lot of logic in the test execution bits, I'd appreciate a part 0 that factors out the "execute an individual test process" code into a standalone method and have the for loop or your pool worker call into that. I think that will result in cleaner code in the end. runxpcshelltests.py is hard enough to read as it is... ::: testing/xpcshell/head.js @@ +339,5 @@ > + " if (isHttps || isWebSocket || isWebSocketSSL)" + > + " return 'PROXY localhost:" + port + "';" + > + " return 'DIRECT';" + > + "}"); > + } Did this sneak in?

Attachment #654928 - Flags: feedback+

(not currently active) Ted Mielczarek

Reporter

Comment 5

•

13 years ago

I would love for the xpcshell harness and the JS harnesses to be merged someday. The JS harness already has this capability, FWIW. (In reply to Gregory Szorc [:gps] from comment #4) > That being said, I'm not sure if every xpcshell test runner in the build > infra has Python 2.6+ and thus access to multiprocessing. You can always add > an "import multiprocessing", push to try, and see what happens. We only require 2.5 to build, still. If the build infra is up to the task we could require 2.6, but I would bet against that. > There will also be real-world issues with tests in some test suites running > in parallel. e.g. I'm pretty sure you will get random failures in > services/sync/tests/unit because many of those tests establish a HTTP server > on a hard-coded port. We definitely need support in the xpcshell.ini > manifests to control whether parallel execution is allowed. The optimist in > me says to default to parallel and only disable if we see issues. Over time, > we can identify pain points preventing parallel execution (e.g. we need a > way for Python to tell the xpcshell environment what network ports it can > listen/bind on and tests need to use that rather than hard-coded port > numbers). I'd like to see (as a separate bug) a way for all the test servers we use to bind to an arbitrary free port and report that to tests so that we don't have to use hardcoded ports at all.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 6

•

13 years ago

> We only require 2.5 to build, still. If the build infra is up to the task we > could require 2.6, but I would bet against that. We now require Python 2.6 to build, see bug 800614.

Gregory Szorc [:gps]

Comment 7

•

13 years ago

Building != running tests. We still have test runners running older Python than what the build itself requires. Work towards running Python 2.7 everywhere is ongoing and tracked in bug 724191.

Ed Morley [:emorley]

Updated

•

12 years ago

Blocks: 845748

Gregory Szorc [:gps]

Comment 8

•

12 years ago

According to bug 883983, httpd.js can choose its own port. This will go a long way towards abating the resource contention in xpcshell tests.

Gregory Szorc [:gps]

Updated

•

12 years ago

Depends on: 884421

Gregory Szorc [:gps]

Updated

•

12 years ago

Depends on: 887064

Gregory Szorc [:gps]

Updated

•

12 years ago

Depends on: parxpc

Gregory Szorc [:gps]

Updated

•

12 years ago

No longer depends on: 884421

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

12 years ago

No longer depends on: 887064

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

12 years ago

Depends on: 898658

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 9

•

12 years ago

Comment on attachment 654928 [details] [diff] [review] WIP Marking older WIP patch as obsolete, hope it's fine. After bug 887054 lands, I'll upload a patch that turns on parallel runs on automation here.

Attachment #654928 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 10

•

12 years ago

Attached patch Turn on parxpc in automation — Details — Splinter Review

So this patch will turn on parxpc in automation. Putting it here so we'll have it ready to try landing soon. As discussed in bug 898658, I'll do some "stress test" runs locally trying to identify broken tests and after that we can try landing this.

Attachment #790940 - Flags: review?(ted)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

12 years ago

Assignee: nobody → mihneadb

Status: NEW → ASSIGNED

(not currently active) Ted Mielczarek

Reporter

Updated

•

12 years ago

Attachment #790940 - Flags: review?(ted) → review+

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

12 years ago

Depends on: 906510

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Phil Ringnalda (:philor)

Comment 11

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/14619f24a8a8

Keywords: checkin-needed

Gregory Szorc [:gps]

Comment 12

•

12 years ago

\o/ Please blog this. Break your arm doing it.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 13

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #12) > \o/ > > Please blog this. Break your arm doing it. I'm waiting for it to reach central. But yes.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 14

•

12 years ago

Ted, it seems that killPid in the windows version of automation.py hung[1] once. It's clearly intermittent (see the retrigger[2] and the runs here[3]), but I don't know what could cause it. Any ideas? [1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27209828&tree=Mozilla-Inbound&full=1#error0 [2] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=14619f24a8a8 [3] https://tbpl.mozilla.org/?tree=Try&rev=6c50d2ebcada

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Reporter

Comment 15

•

12 years ago

I don't have any idea about that. I have a patch floating around to replace all that code with something less horrible, I should just finish that.

Flags: needinfo?(ted)

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 911249

Ed Morley [:emorley]

Comment 16

•

12 years ago

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #14) > Ted, it seems that killPid in the windows version of automation.py hung[1] > once. > It's clearly intermittent (see the retrigger[2] and the runs here[3]), but I > don't know what could cause it. Any ideas? > > [1] > https://tbpl.mozilla.org/php/getParsedLog.php?id=27209828&tree=Mozilla- > Inbound&full=1#error0 > [2] https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=14619f24a8a8 > [3] https://tbpl.mozilla.org/?tree=Try&rev=6c50d2ebcada Filed as bug 911249. (In reply to Ted Mielczarek [:ted.mielczarek] from comment #15) > I don't have any idea about that. I have a patch floating around to replace > all that code with something less horrible, I should just finish that. Guessing this is bug 890026 ? :-)

Ed Morley [:emorley]

Comment 17

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/14619f24a8a8

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla26

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 18

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #16) > > (In reply to Ted Mielczarek [:ted.mielczarek] from comment #15) > > I don't have any idea about that. I have a patch floating around to replace > > all that code with something less horrible, I should just finish that. > > Guessing this is bug 890026 ? :-) That one. Also, trying bug 911262 as a quick(er) fix.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 19

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #12) > \o/ > > Please blog this. Break your arm doing it. Blog post with details about the process of parallelizing the harness: http://www.mihneadb.net/post/parallelizing-a-test-harness/

Gregory Szorc [:gps]

Updated

•

12 years ago

Blocks: 912596

WIP 13 years ago Geoff Lankow (:darktrojan) 14.58 KB, patch	gps : feedback+	Details \| Diff \| Splinter Review
Turn on parxpc in automation 12 years ago Mihnea Dobrescu-Balaur (:mihneadb) 1006 bytes, patch	ted : review+	Details \| Diff \| Splinter Review