Bug 887054 - (parxpc) Refactor xpcshell test harness to support parallel test runs, disabled by default in automation
Status: RESOLVED FIXED
Product: Testing
Classification: Components
Component: XPCShell Harness
Version: unspecified
Hardware: x86_64 All
Importance: -- enhancement
Target Milestone: mozilla26
Assigned To: Mihnea Dobrescu-Balaur (:mihneadb)
Mentors:
Depends on: 887064 892021 892121 895542 896093
Blocks: 660788 896087 898816
 
Reported: 2013-06-25 16:38 PDT by Mihnea Dobrescu-Balaur (:mihneadb)
Modified: 2013-08-15 05:31 PDT (History)
9 users (show)
ryanvm: in-testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
first try (44.43 KB, patch)
2013-06-25 16:40 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: feedback+
Details | Diff | Splinter Review
run at most NUM_CPUS * 1.5 threads/tests (47.22 KB, patch)
2013-07-02 16:39 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
handle run-sequentially flag in ini and --sequential as argv (50.12 KB, patch)
2013-07-02 18:45 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: feedback+
Details | Diff | Splinter Review
mach command update to handle --sequential (3.94 KB, patch)
2013-07-02 18:46 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: review+
gps: review+
Details | Diff | Splinter Review
mach command update to handle --sequential (3.93 KB, patch)
2013-07-03 12:01 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
parallel xpcshelltest harness (50.47 KB, patch)
2013-07-03 14:03 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
parallel xpcshelltest harness (50.52 KB, patch)
2013-07-03 15:40 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: feedback+
Details | Diff | Splinter Review
Parallelize xpcshell (52.12 KB, patch)
2013-07-16 15:55 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (53.40 KB, patch)
2013-07-17 15:21 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (53.41 KB, patch)
2013-07-17 15:30 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (55.02 KB, patch)
2013-07-17 17:15 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (55.69 KB, patch)
2013-07-18 10:05 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (56.53 KB, patch)
2013-07-18 10:06 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (56.69 KB, patch)
2013-07-18 15:58 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (56.68 KB, patch)
2013-07-18 16:56 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (56.82 KB, patch)
2013-07-18 17:30 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (57.49 KB, patch)
2013-07-19 10:07 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.25 KB, patch)
2013-07-19 10:58 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.14 KB, patch)
2013-07-19 11:00 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.39 KB, patch)
2013-07-19 17:57 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.42 KB, patch)
2013-07-19 19:28 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
adapt the mobile harnesses (34.43 KB, patch)
2013-07-23 22:13 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: feedback+
Details | Diff | Splinter Review
adapt mobile harness (34.65 KB, patch)
2013-07-25 18:02 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (59.08 KB, patch)
2013-07-30 13:55 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (34.67 KB, patch)
2013-07-30 14:42 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
adapt mobile harnesses (35.20 KB, patch)
2013-07-30 14:49 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (57.89 KB, patch)
2013-07-30 15:04 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.05 KB, patch)
2013-08-01 20:03 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
adapt mobile harness (35.31 KB, patch)
2013-08-01 22:37 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (35.35 KB, patch)
2013-08-02 00:18 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell test harness (58.16 KB, patch)
2013-08-02 16:02 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell test harness (58.41 KB, patch)
2013-08-02 16:24 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell test harness (58.41 KB, patch)
2013-08-02 16:32 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; (5.87 KB, patch)
2013-08-05 15:33 PDT, Chris Manchester (:chmanchester)
mihneadb: feedback+
Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (36.93 KB, patch)
2013-08-05 15:52 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; f?mihneadb (6.36 KB, patch)
2013-08-05 16:27 PDT, Chris Manchester (:chmanchester)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; (6.45 KB, patch)
2013-08-06 00:13 PDT, Chris Manchester (:chmanchester)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel (6.36 KB, patch)
2013-08-06 10:29 PDT, Chris Manchester (:chmanchester)
ted: review-
Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (37.09 KB, patch)
2013-08-06 13:55 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: review+
Details | Diff | Splinter Review
Parallelize xpcshell (58.13 KB, patch)
2013-08-06 15:13 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell (58.22 KB, patch)
2013-08-07 11:29 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Add support for --sequential to mach (4.64 KB, patch)
2013-08-07 12:37 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Parallelize xpcshell (58.25 KB, patch)
2013-08-07 14:14 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Clean up slacker directories after the test run ends (4.54 KB, patch)
2013-08-08 13:35 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell harness (57.88 KB, patch)
2013-08-08 23:23 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize xpcshell harness (57.88 KB, patch)
2013-08-09 09:39 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ted: review-
Details | Diff | Splinter Review
Clean up slacker directories after the test run ends (5.61 KB, patch)
2013-08-09 09:58 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ted: review+
Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; r?ted (6.87 KB, patch)
2013-08-10 13:50 PDT, Chris Manchester (:chmanchester)
ted: review+
Details | Diff | Splinter Review
Parallelize xpcshell harness (58.77 KB, patch)
2013-08-12 12:34 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (37.42 KB, patch)
2013-08-12 12:52 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; (6.92 KB, patch)
2013-08-12 13:18 PDT, Chris Manchester (:chmanchester)
cmanchester: review+
Details | Diff | Splinter Review
Parallelize desktop xpcshell harness (58.98 KB, patch)
2013-08-12 14:45 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (37.45 KB, patch)
2013-08-12 14:58 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; (6.77 KB, patch)
2013-08-12 15:07 PDT, Chris Manchester (:chmanchester)
cmanchester: review+
Details | Diff | Splinter Review
Parallelize desktop xpcshell harness (59.50 KB, patch)
2013-08-12 16:59 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Parallelize desktop xpcshell harness (59.32 KB, patch)
2013-08-12 17:47 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ted: review+
Details | Diff | Splinter Review
Adapt mobile xpcshell harnesses to the changes from parxpc (37.46 KB, patch)
2013-08-12 17:49 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
no flags Details | Diff | Splinter Review
Add support for --sequential to mach, (5.27 KB, patch)
2013-08-12 18:06 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
gps: review+
Details | Diff | Splinter Review
Synchronize blocks of output when running xpcshell tests in parallel; (6.93 KB, patch)
2013-08-12 18:22 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation; (93.10 KB, patch)
2013-08-14 10:16 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Part 2 - Add parallel warning and support for --sequential to mach xpcshell-test, (5.31 KB, patch)
2013-08-14 10:17 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation; (93.10 KB, patch)
2013-08-14 10:38 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Part 3 - Synchronize blocks of output when running xpcshell tests in parallel; (6.94 KB, patch)
2013-08-14 10:40 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
mihneadb: review+
Details | Diff | Splinter Review
Part 4 - Mark dom/network/tests/unit/test_tcpserversocket.js to run sequentially; (1017 bytes, patch)
2013-08-14 11:31 PDT, Mihnea Dobrescu-Balaur (:mihneadb)
ahalberstadt: review+
Details | Diff | Splinter Review

Description Mihnea Dobrescu-Balaur (:mihneadb) 2013-06-25 16:38:14 PDT

    
Comment 1 Mihnea Dobrescu-Balaur (:mihneadb) 2013-06-25 16:40:28 PDT
Created attachment 767499 [details] [diff] [review]
first try

This is the first shot at it. As we discussed, this uses a Python thread for every test. I'm working on getting all tests to pass, and then I'll improve the threading model if necessary (the desired speedup seems to be there already).
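The thread-per-test model described here can be sketched roughly as follows (`run_single_test` and `run_all` are illustrative stand-ins, not the harness's actual API):

```python
import threading

def run_single_test(test_path, results):
    # Hypothetical stand-in for launching one xpcshell subprocess
    # and recording its outcome.
    results[test_path] = "PASS"

def run_all(tests):
    # One Python thread per test, as in this first patch.
    results = {}
    threads = [threading.Thread(target=run_single_test, args=(t, results))
               for t in tests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The later patches in this bug replace the unbounded thread-per-test approach with a capped pool.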
Comment 2 Gregory Szorc [:gps] 2013-06-25 21:15:04 PDT
How is this different from bug 660788?
Comment 3 Mihnea Dobrescu-Balaur (:mihneadb) 2013-06-25 21:38:20 PDT
As we discussed, this moves the test-running part out of the XPCShellTests class, which now basically does only test discovery and result aggregation.
Comment 4 Andrew Halberstadt [:ahal] 2013-06-26 10:20:03 PDT
Comment on attachment 767499 [details] [diff] [review]
first try

Review of attachment 767499 [details] [diff] [review]:
-----------------------------------------------------------------

I think the approach looks good! I'm not crazy about calling things "runners" though. Runner could refer to the harness as a whole (i.e. runxpcshelltests.py), or mozrunner, or ...? Maybe just dropping the Runner and calling the class XPCShellTest would be better.
Comment 5 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-02 16:39:34 PDT
Created attachment 770525 [details] [diff] [review]
run at most NUM_CPUS * 1.5 threads/tests
Comment 6 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-02 18:45:04 PDT
Created attachment 770567 [details] [diff] [review]
handle run-sequentially flag in ini and --sequential as argv
Comment 7 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-02 18:46:25 PDT
Created attachment 770568 [details] [diff] [review]
mach command update to handle --sequential
Comment 8 Andrew Halberstadt [:ahal] 2013-07-03 07:41:52 PDT
Comment on attachment 770567 [details] [diff] [review]
handle run-sequentially flag in ini and --sequential as argv

Review of attachment 770567 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, I think this is looking really good so far. f+ because the approach is good, but there are still some things that need fixing, though nothing major.

::: testing/xpcshell/runxpcshelltests.py
@@ +11,2 @@
>  from glob import glob
> +from multiprocessing import cpu_count

We'll need to do something about this. This SO answer is pretty detailed, http://stackoverflow.com/a/1006301/794460, though I don't know if going to that extreme is worth it (maybe we could file a separate bug to get a subset of that method into mozsystemresourcemonitor since it already uses psutil to some degree).

For now though I think we can do something like:
try:
    from multiprocessing import cpu_count
    NUM_THREADS = int(cpu_count() * 1.5)
except ImportError:
    NUM_THREADS = 1

@@ +22,5 @@
>  from automationutils import *
>  
>  HARNESS_TIMEOUT = 5 * 60
>  
> +NUM_THREADS = int(cpu_count() * 1.5)

Is 1.5 the magic number that gives the best times? Or is it just a placeholder until we figure out what that number is? Either way we should document what the number is and why spawning more threads than there are cores speeds up the tests.

@@ +52,5 @@
>  def markGotSIGINT(signum, stackFrame):
>      global gotSIGINT
>      gotSIGINT = True
>  
> +class XPCShellTest(Thread):

It took me a while to figure out that there were two separate classes here, one called XPCShellTest and one called XPCShellTests. I know I asked you to remove the Runner part (sorry for being so picky).. maybe XPCShellThread?

@@ +76,5 @@
> +        self.env = copy.deepcopy(self.harness.env)
> +        self.symbolsPath = self.harness.symbolsPath
> +        self.logfiles = self.harness.logfiles
> +        self.xpcshell = self.harness.xpcshell
> +        self.xpcsRunArgs = self.harness.xpcsRunArgs

Passing the whole harness class seems awkward. In XPCShellTests.run_tests() you could do something like:
    kwargs = locals() # before defining any other local vars
    thread = XPCShellTest(stuff, **kwargs)

Then here you could do kwargs.get('appPath'), etc.. You might need to build kwargs manually a bit still, but at least this way the two classes are less coupled.

@@ +1072,5 @@
> +        # create a queue of all tests that will run
> +        tests_queue = deque()
> +        # also a list for the tests that need to be run sequentially
> +        sequential_tests = []
> +        for test_file in self.alltests:

This is just an object right? test_obj might be better.
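The decoupling ahal suggests above (passing selected fields via kwargs instead of the whole harness object) might look like the following sketch; the class and attribute names are illustrative only:

```python
class XPCShellTestThread:
    def __init__(self, test_object, **kwargs):
        self.test_object = test_object
        # Pull out only the fields this thread needs instead of
        # keeping a reference to the whole harness.
        self.app_path = kwargs.get('appPath')
        self.env = dict(kwargs.get('env') or {})

class Harness:
    def run_tests(self, appPath=None, env=None):
        kwargs = locals()      # capture before defining any other locals
        kwargs.pop('self')
        return XPCShellTestThread('test_example.js', **kwargs)
```

With this shape the two classes only share a dict of named values, so changing harness internals doesn't ripple into the thread class.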
Comment 9 Andrew Halberstadt [:ahal] 2013-07-03 07:51:32 PDT
Comment on attachment 770568 [details] [diff] [review]
mach command update to handle --sequential

Review of attachment 770568 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, though fair warning: I'm not super familiar with mach targets. This looks simple enough that I'll r+ it, but you might want to get someone else to review if it gets more complicated than this.
Comment 10 Ted Mielczarek [:ted.mielczarek] 2013-07-03 09:36:54 PDT
Feel free to tag me for final review on this, I am very interested. I'll try to make time to take an initial look at it soon. For mach commands any build peer is fine for a review, I'm not sure exactly what module they fall into honestly.
Comment 11 Ted Mielczarek [:ted.mielczarek] 2013-07-03 09:37:22 PDT
re: multiprocessing.cpu_count, see bug 813742 comment 39, this should be fine to use.
Comment 12 Andrew Halberstadt [:ahal] 2013-07-03 10:07:36 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> re: multiprocessing.cpu_count, see bug 813742 comment 39, this should be
> fine to use.

We could also use psutil, which is already in-tree. I think it uses multiprocessing under the hood if available, and otherwise does something else.

The --sequential argument and whatever we use for reftests in bug 813742 should also be consistent.
Comment 13 Gregory Szorc [:gps] 2013-07-03 11:43:25 PDT
Comment on attachment 770568 [details] [diff] [review]
mach command update to handle --sequential

Review of attachment 770568 [details] [diff] [review]:
-----------------------------------------------------------------

This change to mach is sane.
Comment 14 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-03 12:01:59 PDT
Created attachment 770946 [details] [diff] [review]
mach command update to handle --sequential

Corrected the help message for --sequential. Keeping previous r+
Comment 15 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-03 14:03:55 PDT
Created attachment 771015 [details] [diff] [review]
parallel xpcshelltest harness

Updated as requested by :ahal.

Kept the cpu_count as discussed in IRC.

cpus * 1.5 makes the most sense for my machine at least. Will keep it at that for now, and we can further tweak it when we start doing try runs.
Comment 16 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-03 15:40:04 PDT
Created attachment 771072 [details] [diff] [review]
parallel xpcshelltest harness

I previously introduced a bug by performing the copy of the env dict before spawning the thread. Fixed that.
Comment 17 Andrew Halberstadt [:ahal] 2013-07-08 08:05:39 PDT
Comment on attachment 771072 [details] [diff] [review]
parallel xpcshelltest harness

Review of attachment 771072 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! :)

::: testing/xpcshell/runxpcshelltests.py
@@ +22,5 @@
>  from automationutils import *
>  
>  HARNESS_TIMEOUT = 5 * 60
>  
> +NUM_THREADS = int(cpu_count() * 1.5)

Would still like a comment explaining why NUM_THREADS=cpu_count*1.5 is faster than NUM_THREADS=cpu_count
Comment 18 Robert Strong [:rstrong] (use needinfo to contact me) 2013-07-15 14:54:12 PDT
Prior to this patch if I have the following in a xpcshell.ini file

[include:xpcshell_updater.ini]
skip-if = os == 'win'

The tests in xpcshell_updater.ini will not be executed for windows.

After this patch is applied with a contents of xpcshell_updater.ini as follows:
[test_0110_general.js]
[test_0111_general.js]
[test_0112_general.js]
[test_0113_general.js]
skip-if = os == "mac" || toolkit == "gonk"
reason = bug 820380
[test_0114_general.js]
skip-if = os == "mac"
[test_0115_general.js]
[test_0200_app_launch_apply_update.js]
skip-if = toolkit == "gonk"
reason = bug 820380
[test_0201_app_launch_apply_update.js]
skip-if = toolkit == "gonk"
reason = bug 820380

the following occurs

TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0110_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0111_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0112_general.js | skip-if: os == 'win'
TEST-INFO | skipping c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0115_general.js | skip-if: os == 'win'
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0114_general.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0113_general.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0200_app_launch_apply_update.js | running test ...
TEST-INFO | c:\moz\_1_mozilla-central\ff-x86-opt\_tests\xpcshell\toolkit\mozapps\update\test\unit\test_0201_app_launch_apply_update.js | running test ...


The expected result is these tests should not have been executed.
Comment 19 Robert Strong [:rstrong] (use needinfo to contact me) 2013-07-15 15:51:56 PDT
(In reply to Robert Strong [:rstrong] (do not email) from comment #18)
This appears to be bug 676876
Comment 20 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-16 14:45:04 PDT
:chmanchester spotted an issue with selftest.py not passing with the patch applied.

The main issue is with the testMissingHeadFile and testMissingTailFile tests, which expect to catch exceptions that are now thrown in the child threads, so they never reach the selftest.py thread.

Not sure how to address this.


I also encountered a failure with the xunit test in selftest.py.
Comment 21 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-16 15:55:12 PDT
Created attachment 776766 [details] [diff] [review]
Parallelize xpcshell

Improved so that:

* no more busy loop, using a semaphore
* handling exceptions now as well
Comment 22 Ted Mielczarek [:ted.mielczarek] 2013-07-17 04:55:10 PDT
Comment on attachment 771072 [details] [diff] [review]
parallel xpcshelltest harness

Review of attachment 771072 [details] [diff] [review]:
-----------------------------------------------------------------

I had a few comments sitting in draft here, so I might as well hit "Publish" since this patch is already out of date.

::: testing/xpcshell/runxpcshelltests.py
@@ +1016,4 @@
>  
>          # If we have an interactive debugger, disable ctrl-c.
> +        #if self.debuggerInfo and self.debuggerInfo["interactive"]:
> +            #signal.signal(signal.SIGINT, lambda signum, frame: None)

I assume this is just a hack, make sure it gets removed before you get a final patch ready.

@@ +1107,5 @@
> +        # keep a set of NUM_THREADS running tests and start running the
> +        # tests in the queue at most NUM_THREADS at a time
> +        running_tests = set()
> +        keep_going = True
> +        while len(tests_queue) or len(running_tests):

You don't need the len(), "while tests_queue or running_tests:" works fine.
Comment 23 Mark Côté [:mcote] 2013-07-17 10:12:41 PDT
Comment on attachment 776766 [details] [diff] [review]
Parallelize xpcshell

Review of attachment 776766 [details] [diff] [review]:
-----------------------------------------------------------------

Wasn't asked for a review, but mihneadb and I talked about this yesterday, so offering my two cents...

::: testing/xpcshell/runxpcshelltests.py
@@ +1122,5 @@
> +                test = tests_queue.popleft()
> +                running_tests.add(test)
> +                test.start()
> +                if len(tests_queue):
> +                    continue # try to keep as many running threads as possible

So this "if" block ends by repeating the loop if there are still entries in tests_queue.  You may as well just change the block from an "if" to a "while", since it will terminate if there is nothing in tests_queue.

@@ +1130,5 @@
> +
> +            # wait for at least one of the tests to finish
> +            self.sem.acquire()
> +            # add 1 back to the semaphore, will be subtracted in the loop
> +            self.sem.release()

This is a pretty interesting use of a semaphore!  It makes sense, but I think a comment explaining exactly what you are doing would be really beneficial down the road, namely, that the threads are releasing semaphores without acquiring them in order to raise the semaphore's count.

@@ +1141,5 @@
> +                    done_tests.add(test)
> +                    test.join()
> +                    # did the test encounter any exception?
> +                    if test.exception:
> +                        raise test.exception

I guess this aborts all the other tests?  Would it make more sense to collect all exceptions, log them, and then fail after they are all done?

@@ +1151,3 @@
>  
>              if not keep_going:
>                  break

Again, is it okay to just suddenly abort all other running tests?  Shouldn't this be used to bail out after all threads are done?

@@ +1151,5 @@
>  
>              if not keep_going:
>                  break
>  
> +            time.sleep(0) # tricking the scheduler to yield before other threads

Don't think you need this anymore.
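The semaphore idiom mcote calls out above — worker threads calling release() without a matching acquire() to signal completion, while the scheduler acquires to wait for a free slot — can be sketched like this (simplified, with a hypothetical run() body in place of an actual test launch):

```python
import threading
from collections import deque

NUM_THREADS = 4

def run_pool(tests):
    # The scheduler acquires a slot before starting each test; each
    # worker releases WITHOUT a prior acquire when it finishes, which
    # is how completion is signalled back to the scheduler.
    sem = threading.Semaphore(NUM_THREADS)
    done = []
    lock = threading.Lock()

    def run(name):
        with lock:
            done.append(name)      # stand-in for running one test
        sem.release()              # give the slot back

    queue = deque(tests)
    workers = []
    while queue:
        sem.acquire()              # blocks while NUM_THREADS tests run
        t = threading.Thread(target=run, args=(queue.popleft(),))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()
    return done
```

This keeps at most NUM_THREADS tests in flight without any busy-waiting, which is exactly what the semaphore was introduced to replace.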
Comment 24 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-17 11:44:51 PDT
Comment on attachment 776766 [details] [diff] [review]
Parallelize xpcshell

Obsoleting this so I can improve it per :mcote's comments.
Comment 25 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-17 15:21:27 PDT
Created attachment 777422 [details] [diff] [review]
Parallelize xpcshell

Rebased patch on changes made in bug 892021 and in bug 892121
Comment 26 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-17 15:30:58 PDT
Created attachment 777428 [details] [diff] [review]
Parallelize xpcshell

Forgot a stray print_stdout
Comment 27 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-17 17:15:50 PDT
Created attachment 777496 [details] [diff] [review]
Parallelize xpcshell

* raise the exception in the case of sequential tests as well
* run the tests in selftest.py sequentially (because they rely on ordering)
Comment 28 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-18 10:05:04 PDT
Created attachment 777895 [details] [diff] [review]
Parallelize xpcshell

Specifically request a sequential run for the remote harness.
Comment 29 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-18 10:06:01 PDT
Created attachment 777897 [details] [diff] [review]
Parallelize xpcshell

Specifically pass sequential=True for b2g as well.
Comment 30 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-18 15:58:57 PDT
Created attachment 778136 [details] [diff] [review]
Parallelize xpcshell

Properly override the sequential option for mobile tests.
Comment 31 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-18 16:56:22 PDT
Created attachment 778160 [details] [diff] [review]
Parallelize xpcshell

* log max number of threads used
* take out signal handling (does not work in a non-main thread)
* take out unnecessary len(queue) calls
Comment 32 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-18 17:30:09 PDT
Created attachment 778173 [details] [diff] [review]
Parallelize xpcshell

* log traceback of caught exception
Comment 33 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 10:07:20 PDT
Created attachment 778540 [details] [diff] [review]
Parallelize xpcshell

Rebased patch on latest m-c
Comment 34 Gregory Szorc [:gps] 2013-07-19 10:29:31 PDT
I think the number of threads should be max(4, int(cpu_count * 2))

(or 1.5). The reason is tests tend to use very little CPU and the current thread count doesn't seem to work well with machines with few cores.
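The heuristic gps proposes — a floor so machines with few cores still get useful parallelism, scaled by core count because the tests are largely I/O-bound — could be written as the following sketch (using the 1.5 multiplier from the patch; the exact numbers are the open question here):

```python
try:
    from multiprocessing import cpu_count
except ImportError:
    def cpu_count():
        return 1  # conservative fallback if multiprocessing is unavailable

# xpcshell tests spend most of their time waiting on I/O, so
# oversubscribing the CPUs pays off; the floor of 4 keeps machines
# with few cores running tests in parallel too.
NUM_THREADS = max(4, int(cpu_count() * 1.5))
```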
Comment 35 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 10:58:05 PDT
Created attachment 778566 [details] [diff] [review]
Parallelize xpcshell

* changed num threads to :gps's suggestion
* when an exception is caused, harness will wait for the currently running
threads to end so we don't end up with orphans/zombies;
* all caught exceptions(tracebacks) are logged and the first encountered
exception is raised
Comment 36 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 11:00:55 PDT
Created attachment 778569 [details] [diff] [review]
Parallelize xpcshell

Changed "keep the running pool full" if to a while as suggested by :mcote.
Comment 37 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 15:11:08 PDT
Right now this breaks the mobile (android + b2g) xpc harnesses.

The Android and B2G harnesses extend the desktop XPCShellTests class and override some methods that they *know* will be called during runTests.

The problem is that my patch moves the "run a test" part into a separate class, so the mobile harnesses end up never overriding those methods.


Not sure how to address this.
Comment 38 Andrew Halberstadt [:ahal] 2013-07-19 15:20:52 PDT
Personally I've found the "remote inherits from desktop, except not really" model to be a huge pain in general. If it's not a simple matter to get the remote tests also using this new class, I'd be ok with duplicating a bit of code.

Alternatively you could put just the shared bits into a mixin that your new desktop class and the remote classes inherit from.
Comment 39 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 15:32:33 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #38)
> Personally I've found the remote inherits from desktop except not really
> model to be a huge pain in general. If it's not a simple matter to get the
> remote tests also using this new class, I'd be ok with duplicating a bit of
> code.
> 
> Alternatively you could put just the shared bits into a mixin that your new
> desktop class and the remote classes inherit from.

I don't see how the mixin could help. What I *think* we have to do is make a RemoteXPCShellTestThread class that extends the new one I introduced.

Then, we could make the runTests method use that. This is the way I'd do it since it would not create code duplication. I mean we can add a kwargs for the runTests method with the TestClass to be used.

Does that make sense?
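The idea in this comment — letting the remote harnesses inject their own thread class via a runTests kwarg instead of overriding harness internals — might look like this sketch (class names follow the discussion; the bodies are hypothetical):

```python
class XPCShellTestThread:
    def __init__(self, test_object):
        self.test_object = test_object

    def launch(self):
        return 'desktop:' + self.test_object

class RemoteXPCShellTestThread(XPCShellTestThread):
    def launch(self):
        # Device-specific setup and launch logic would live here.
        return 'remote:' + self.test_object

def runTests(tests, testClass=XPCShellTestThread):
    # The harness instantiates whatever thread class it is handed,
    # so the remote harnesses never need to override runTests itself.
    return [testClass(t).launch() for t in tests]
```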
Comment 40 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 17:57:12 PDT
Created attachment 778757 [details] [diff] [review]
Parallelize xpcshell

* use psutil if available (for easy profiling)
* define stdout and stderr out of the try block so we won't get the
'undefined stdout' error anymore
* call shutdownNode first and then raise any exceptions
Comment 41 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-19 19:28:23 PDT
Created attachment 778778 [details] [diff] [review]
Parallelize xpcshell

Marking test threads as daemons to make sure the harness does not hang.
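Daemon threads let the interpreter exit even if a thread is wedged, which is the anti-hang property wanted here; a minimal illustration:

```python
import threading
import time

def worker():
    time.sleep(0.01)   # stand-in for one test run

t = threading.Thread(target=worker)
t.daemon = True        # process exit won't be blocked by this thread
t.start()
t.join(timeout=5.0)    # the harness still joins, but can't hang forever
```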
Comment 42 Andrew Halberstadt [:ahal] 2013-07-22 06:51:31 PDT
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #39)
> Then, we could make the runTests method use that. This is the way I'd do it
> since it would not create code duplication. I mean we can add a kwargs for
> the runTests method with the TestClass to be used.
> 
> Does that make sense?


Yep, makes sense.
Comment 43 Geoff Brown [:gbrown] 2013-07-22 09:11:41 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #42)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #39)
> > Does that make sense?
> 
> Yep, makes sense.

+1. Sounds like you are back on track.
Comment 44 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-23 22:13:55 PDT
Created attachment 780188 [details] [diff] [review]
adapt the mobile harnesses

[this applies over the parxpc patch]

This adapts the mobile harnesses by replicating the harness-testthread pattern.

I couldn't find a *nicer* way to get the relevant data from the harness to the
mobile test threads other than the mobileArgs trick.


This try run seems to be coming back green (only b2g done so far):
https://tbpl.mozilla.org/?tree=Try&rev=457c36eb497b
Comment 45 Andrew Halberstadt [:ahal] 2013-07-24 10:33:32 PDT
Comment on attachment 780188 [details] [diff] [review]
adapt the mobile harnesses

Review of attachment 780188 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me! It's a little hard to tell what changed due to the large refactors but seems like it's just moving code around anyway.

::: testing/xpcshell/remotexpcshelltests.py
@@ +17,5 @@
>  
> +def remoteJoin(path1, path2):
> +    joined = os.path.join(path1, path2)
> +    joined = joined.replace('\\', '/')
> +    return joined

I know this was here before, but you can do:
    import posixpath
    posixpath.join(...)
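For reference, the suggested `posixpath` version would look like this (the example paths in the test are illustrative; `posixpath.join` always emits forward slashes, so the manual replacement goes away):

```python
import posixpath

def remoteJoin(path1, path2):
    # posixpath.join builds the path with '/' separators on every
    # host OS, so the '\\' -> '/' replace() call is no longer needed.
    return posixpath.join(path1, path2)
```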
Comment 46 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-25 18:02:44 PDT
Created attachment 781376 [details] [diff] [review]
adapt mobile harness

Changed remoteJoin to use posixpath
Comment 47 Gregory Szorc [:gps] 2013-07-28 10:33:07 PDT
Did you ever perform a try push with shuffle enabled by default? I have a theory I/O load is concentrated around related tests in specific directories and shuffling test order will help spread it out (at least in the common case).

Also, since test order is no longer deterministic in a parallel-executing world, I'd like to toss out the idea of having shuffling enabled by default. We should still give people the option to run in order, of course.

I'm interested in what others think.
Comment 48 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-28 12:33:00 PDT
I remember suggesting that and the general idea was that we want automation to be as predictable/reproducible as possible.

I can/will schedule such a try push, and you can also check it out locally by passing the --shuffle parameter to mach xpcshell-test.
Comment 49 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-30 13:55:22 PDT
Created attachment 783339 [details] [diff] [review]
Parallelize xpcshell

Rebase over changes from bug 890098
Comment 50 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-30 14:42:46 PDT
Created attachment 783361 [details] [diff] [review]
Parallelize xpcshell

rebase mobile harness changes as well
Comment 51 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-30 14:49:59 PDT
Created attachment 783367 [details] [diff] [review]
adapt mobile harnesses

Took out leak logging from the remote harness
Comment 52 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-30 15:04:03 PDT
Created attachment 783376 [details] [diff] [review]
Parallelize xpcshell

Took out a comment about leak logging.
Comment 53 Mihnea Dobrescu-Balaur (:mihneadb) 2013-07-31 10:03:37 PDT
Comment on attachment 783376 [details] [diff] [review]
Parallelize xpcshell

Don't mind the mobile harness changes, there's a separate patch that addresses those.
Comment 54 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-01 20:03:44 PDT
Created attachment 784774 [details] [diff] [review]
Parallelize xpcshell

So... I had some fun with the Windows test slave. Turns out if we increase
the waiting time to 20s (experimented with the value) we won't encounter those
file removal hangs. From what I saw this happens only for ~30 tests out of
~1800, so I'd say this workaround is worth a shot.
Comment 55 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-01 22:37:07 PDT
Created attachment 784809 [details] [diff] [review]
adapt mobile harness

Added remoteAPK to the mobileArgs. (Android was not passing on tbpl)
Comment 56 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 00:18:57 PDT
Created attachment 784846 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Fix for self.mobileArgs
Comment 57 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 01:55:29 PDT
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #54)
> Created attachment 784774 [details] [diff] [review]
> Parallelize xpcshell
> 
> So... I had some fun with the Windows test slave. Turns out if we increase
> the waiting time to 20s (experimented with the value) we won't encounter
> those file removal hangs. From what I saw this happens only for ~30 tests
> out of ~1800, so I'd say this workaround is worth a shot.

Just wanted to confirm that I did multiple runs on the test slave and I encountered no more "file is locked" issues.
Comment 58 Andrew Halberstadt [:ahal] 2013-08-02 07:04:59 PDT
I don't think adding a 20s sleep is an acceptable solution. 20s * 30 tests / 4 cores = 150 seconds of wasted time per job.

Maybe we could spawn a thread that immediately goes to sleep and wakes up every second or so to see if the plugin process has terminated and then performs the cleanup. Or if we could figure out how to "sudo rm -rf" those files, that would also work.
Comment 59 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 09:52:41 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #58)
> I don't think adding a 20s sleep is an acceptable solution. 20*300/4cores =
> 150 seconds of wasted time per job.

Well, if we don't do that, we can hope that running those tests sequentially will fix the issue (which will probably cost us more time), and if that does not fix it, then we cannot land the parallel harness at all, which means even more wasted time.

It's also just on Windows, and only when it's needed. If you prefer we can do a busy loop with a timeout.

> 
> Maybe we could spawn a thread that immediately goes to sleep and wakes up
> every second or so to see if the plugin process has terminated and then
> performs the cleanup. Or if we could figure out how to "sudo rm -rf" those
> files, that would also work.

Two things :) :
1. I looked this up for quite a while; there seems to be no way of unlocking the file if it is indeed in use (there are some tools online that would inject some code and make the process crash).
2. It seems that the process is done, just that the Windows filesystem has a delay before figuring out that "hey, it's ok, we can unlock this file".


So.. should I try a busy loop with a 20s timeout?
Comment 60 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 16:02:34 PDT
Created attachment 785251 [details] [diff] [review]
Parallelize xpcshell test harness

Added locking for print_stdout
Comment 61 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 16:24:38 PDT
Created attachment 785257 [details] [diff] [review]
Parallelize xpcshell test harness

Acquiring the stdout lock in case of errors earlier to group all output together.
Comment 62 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-02 16:32:30 PDT
Created attachment 785263 [details] [diff] [review]
Parallelize xpcshell test harness

Was acquiring too early in one case.
Comment 63 Chris Manchester (:chmanchester) 2013-08-02 17:04:34 PDT
:mihneadb and I chatted a bit about this - The lock doesn't actually isolate buffered failure output because other threads calling log for routine output don't acquire the lock. I'm assuming acquiring the lock for each and every call to log would introduce significant overhead.

The issue is that this will result in failure output interleaved with routine output - this could be a real problem for debugging failures. We could reconstruct failure output at the end of a test run or at other synchronization points, but it seems whatever we do there will be a compromise between getting output as it's generated and getting output that isn't interleaved.
Comment 64 Gregory Szorc [:gps] 2013-08-05 10:43:55 PDT
(In reply to Chris Manchester [:chmanchester] from comment #63)
> :mihneadb and chatted a bit about this - The lock doesn't actually isolate
> buffered failure output because other threads calling log for routine output
> don't acquire the lock. I'm assuming acquiring the lock for each and every
> call to log would introduce significant overhead.

I'm not convinced the overhead would be that significant.

I believe it will be necessary to acquire the lock on stdout. Otherwise, you will run into issues like bug 886498. You may not see these problems, but trust me, others will.
Comment 65 Chris Manchester (:chmanchester) 2013-08-05 15:33:00 PDT
Created attachment 786001 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

I'm glad to have been wrong about my assumption - I ran some xpcshell time tests
locally for dom, network, and devtools directories and didn't observe any
performance penalty for acquiring a lock for each call to the logger. In fact,
for devtools, which produces a lot of failure output on my machine, sys time
_decreased_ significantly with this patch applied.
Comment 66 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-05 15:48:16 PDT
Comment on attachment 786001 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

Review of attachment 786001 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +576,5 @@
>      def __init__(self, log=None):
>          """ Init logging and node status """
>          if log:
>              resetGlobalLog(log)
> +

Let's add a comment here saying what we are doing and why.
Comment 67 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-05 15:52:26 PDT
Created attachment 786017 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Found an issue with the updated android harness. This should fix it.

https://tbpl.mozilla.org/?tree=Try&rev=cec9735a998c
Comment 68 Chris Manchester (:chmanchester) 2013-08-05 16:27:15 PDT
Created attachment 786039 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel; f?mihneadb

Added comments and exhaustive enumeration of log attributes.
Comment 69 Chris Manchester (:chmanchester) 2013-08-06 00:13:00 PDT
Created attachment 786166 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

This patch fixes an error in the previous patch by removing 'log' from
the list of wrapped methods.
Comment 70 Chris Manchester (:chmanchester) 2013-08-06 10:29:08 PDT
Created attachment 786370 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel
Comment 71 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-06 13:55:39 PDT
Created attachment 786485 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Ok, the mobile harness is working as expected now:

https://tbpl.mozilla.org/?tree=Try&rev=3747be01371d
Comment 72 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-06 15:13:06 PDT
Created attachment 786534 [details] [diff] [review]
Parallelize xpcshell

pro tip: don't name python vars "try"
Comment 73 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-07 11:29:00 PDT
Created attachment 787027 [details] [diff] [review]
Parallelize xpcshell

Had a bug in the recursive cleanupDir retry logic.
Comment 74 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-07 12:37:15 PDT
Created attachment 787071 [details] [diff] [review]
Add support for --sequential to mach

Added support for this when no dir is specified.
Comment 75 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-07 12:38:26 PDT
Comment on attachment 787071 [details] [diff] [review]
Add support for --sequential to mach

keeping r+
Comment 76 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-07 14:14:03 PDT
Created attachment 787128 [details] [diff] [review]
Parallelize xpcshell

Changed the check to 1 second and increased the timeout, as discussed IRL.
Comment 77 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-07 14:49:46 PDT
Comment on attachment 786485 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Can I get your final ok on this? It's the same as when you last reviewed it, just moved some code around to make sure everything works.
Comment 78 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-08 13:35:11 PDT
Created attachment 787754 [details] [diff] [review]
Clean up slacker directories after the test run ends

As discussed on IRC... miserable, but we kind of have to.
Comment 79 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-08 23:23:18 PDT
Created attachment 787970 [details] [diff] [review]
Parallelize xpcshell harness

Now that we have two consecutive green try runs [1][2], I'd love to get this reviewed, fold the patches, and try landing it on inbound as soon as possible.

[1] https://tbpl.mozilla.org/?tree=Try&rev=7104ec2d86e5
[2] https://tbpl.mozilla.org/?tree=Try&rev=0a913c0860a5
Comment 80 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-09 08:27:14 PDT
Two more consecutive all green runs :)
https://tbpl.mozilla.org/?tree=Try&rev=e3b02a523b74
https://tbpl.mozilla.org/?tree=Try&rev=cb4ced2feb16
Comment 81 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-09 09:39:36 PDT
Created attachment 788226 [details] [diff] [review]
Parallelize xpcshell harness

There's no need for such a big timeout now, since the "cleanup slacker dirs" patch.
Comment 82 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-09 09:48:28 PDT
Did a try run with Chris's patch applied as well. (had to fix some conflicts)
https://tbpl.mozilla.org/?tree=Try&rev=aaadd274c58c
Comment 83 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-09 09:58:59 PDT
Created attachment 788236 [details] [diff] [review]
Clean up slacker directories after the test run ends

Forgot to add self.cleanup_dir_list to the test thread class.
Comment 84 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-09 10:00:21 PDT
Corresponding new try run - https://tbpl.mozilla.org/?tree=Try&rev=a9d63b5d7b12
Comment 85 Ted Mielczarek [:ted.mielczarek] 2013-08-09 13:21:48 PDT
Comment on attachment 788226 [details] [diff] [review]
Parallelize xpcshell harness

Review of attachment 788226 [details] [diff] [review]:
-----------------------------------------------------------------

This is r- but only for the SIGINT changes. Otherwise it looks pretty good and there are just a few cleanup nits.

::: testing/xpcshell/runxpcshelltests.py
@@ +84,5 @@
> +        self.env = copy.deepcopy(kwargs.get('env'))
> +        self.symbolsPath = kwargs.get('symbolsPath')
> +        self.logfiles = kwargs.get('logfiles')
> +        self.xpcshell = kwargs.get('xpcshell')
> +        self.xpcsRunArgs = kwargs.get('xpcsRunArgs')

This pattern is a little odd. I'm not sure what ahal is trying to future-proof against here.

@@ +147,5 @@
> +            f.write(stdout)
> +
> +        finally:
> +            if f:
> +                f.close()

Can you switch this to use a with statement while you're moving the code?

@@ +168,5 @@
> +        """
> +          Simple wrapper to launch a process.
> +          On a remote system, this is more complex and we need to overload this function.
> +        """
> +        cmd = wrapCommand(cmd)

I think we can drop this now, this was for OS X 10.5 compat.

@@ +340,5 @@
> +                        # filesystem is slow to react to the changes
> +        try:
> +            self.removeDir(directory)
> +            return True
> +        except Exception:

I'd like to see this switched to catching specific exceptions (probably OSError, WindowsError) instead of a catch-all.

@@ +348,5 @@
> +            # little bit and try again.
> +            time.sleep(1)
> +
> +            try:
> +                self.removeDir(directory)

This is a little awkward. You could hoist the try_count < TRY_LIMIT block from below up to after the sleep here and get rid of the second try block. Recursion is not how I would have written this, but it's a legitimate way to implement this so that's fine.

@@ +1169,5 @@
> +            # wait for at least one of the tests to finish
> +            # (every test releases the semaphore (+1) when it's done)
> +            self.sem.acquire()
> +            # add 1 back to the semaphore, will be subtracted in the loop
> +            self.sem.release()

This is pretty confusing, but it does solve your use case. I might want input from someone who's more familiar with synchronization primitives if this is a legit way to use a semaphore.

@@ -970,5 @@
> -            # - don't move this line above launchProcess, or child will inherit the SIG_IGN
> -            signal.signal(signal.SIGINT, markGotSIGINT)
> -            # |stderr == None| as |pStderr| was either |None| or redirected to |stdout|.
> -            stdout, stderr = self.communicate(proc)
> -            signal.signal(signal.SIGINT, signal.SIG_DFL)

So this code got mostly removed in this patch, which is going to regress the "Ctrl-C to kill just xpcshell" feature.

I realize this is basically impossible in the parallel case, but I would like to preserve this behavior in the sequential case (or for running a single test). It should be fine to only do this if you're running sequentially.
Comment 86 Ted Mielczarek [:ted.mielczarek] 2013-08-09 13:24:01 PDT
Comment on attachment 788236 [details] [diff] [review]
Clean up slacker directories after the test run ends

Review of attachment 788236 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +1247,5 @@
> +        for directory in self.cleanup_dir_list:
> +            try:
> +                shutil.rmtree(directory)
> +            except:
> +                pass

This could use a comment and a log.info. We agreed to just let this slide instead of causing a failure, since $TEMP should be cleaned up on reboot on test slaves now and we're making a best effort to clean things up here, but I'd like it noted.
Comment 87 Ted Mielczarek [:ted.mielczarek] 2013-08-09 17:41:09 PDT
Comment on attachment 786370 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel

Review of attachment 786370 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +361,5 @@
>                  message = "TEST-UNEXPECTED-FAIL | %s | Failed to clean up directory: %s" % (name, sys.exc_info()[1])
>                  self.log.error(message)
>                  self.print_stdout(stdout)
>                  self.print_stdout(traceback.format_exc())
> +                LOG_MUTEX.release()

You can write this as:
with LOG_MUTEX:
  ...

http://docs.python.org/2/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement

@@ +591,5 @@
> +                def wrapped(*args, **kwargs):
> +                    LOG_MUTEX.acquire()
> +                    unwrapped(*args, **kwargs)
> +                    LOG_MUTEX.release()
> +                setattr(self.log, fun_name, wrapped)

I don't think this will actually do what you think it will. You're only defining one copy of wrapped, so all your log functions are going to get overwritten... You need to do something like

def wrap(fn):
  def wrapped(*args, **kwargs):
    ...
    fn(*args, **kwargs)
    ...
  return wrapped
setattr(self.log, fun_name, wrap(getattr(self.log, fun_name)))
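The late-binding pitfall described here, and the fix, can be demonstrated with a toy logger (the `Log` class below is a stand-in for illustration, not the harness's logger):

```python
import threading

LOG_MUTEX = threading.Lock()

class Log(object):
    def info(self, msg):
        return ("info", msg)
    def error(self, msg):
        return ("error", msg)

log = Log()

def wrap(fn):
    # fn is a parameter of wrap, so each call produces a wrapper
    # bound to its own function -- unlike closing over a bare loop
    # variable, where every wrapper would share (and later see only
    # the final value of) that variable.
    def wrapped(*args, **kwargs):
        with LOG_MUTEX:
            return fn(*args, **kwargs)
    return wrapped

for fun_name in ("info", "error"):
    setattr(log, fun_name, wrap(getattr(log, fun_name)))
```

After the loop, each attribute is a distinct lock-holding wrapper around the original bound method.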
Comment 88 Chris Manchester (:chmanchester) 2013-08-10 13:50:12 PDT
Created attachment 788569 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel; r?ted
Comment 89 Ted Mielczarek [:ted.mielczarek] 2013-08-12 05:06:47 PDT
Comment on attachment 788569 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel; r?ted

Review of attachment 788569 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +596,5 @@
> +                    def wrapped(*args, **kwargs):
> +                        with LOG_MUTEX:
> +                            fn(*args, **kwargs)
> +                    return wrapped
> +                setattr(self.log, fun_name, wrap(unwrapped))

Wish there was a less-verbose way to write this, but this will have to do.
Comment 90 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 12:34:29 PDT
Created attachment 789088 [details] [diff] [review]
Parallelize xpcshell harness

Folded the [updated according to review] cleanup dir patch into this.
Comment 91 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 12:52:08 PDT
Created attachment 789100 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Rebased patch on top of the current parxpc iteration.
Comment 92 Chris Manchester (:chmanchester) 2013-08-12 13:18:31 PDT
Created attachment 789129 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

Unbitrot. This applies on top of the base and mobile harness patches.
Comment 93 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 14:30:42 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #85)
> Comment on attachment 788226 [details] [diff] [review]
> Parallelize xpcshell harness
> 
> Review of attachment 788226 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +84,5 @@
> > +        self.env = copy.deepcopy(kwargs.get('env'))
> > +        self.symbolsPath = kwargs.get('symbolsPath')
> > +        self.logfiles = kwargs.get('logfiles')
> > +        self.xpcshell = kwargs.get('xpcshell')
> > +        self.xpcsRunArgs = kwargs.get('xpcsRunArgs')
> 
> This pattern is a little odd. I'm not sure what ahal is trying to
> future-proof from here.
> 

http://pastebin.mozilla.org/2841780 Here's Andrew's comment.

I feel it makes it a bit easier for the classes that extend this (mobile). I'm fine either way.
Comment 94 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 14:45:00 PDT
Created attachment 789186 [details] [diff] [review]
Parallelize desktop xpcshell harness

Updated patch according to the review comments. Still waiting on the "OK" for
my semaphore approach.
Comment 95 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 14:58:37 PDT
Created attachment 789192 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Reupload.
Comment 96 Chris Manchester (:chmanchester) 2013-08-12 15:07:30 PDT
Created attachment 789197 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

Unbitrot
Comment 97 Gregory Szorc [:gps] 2013-08-12 16:05:19 PDT
Comment on attachment 789186 [details] [diff] [review]
Parallelize desktop xpcshell harness

Review of attachment 789186 [details] [diff] [review]:
-----------------------------------------------------------------

Using a semaphore as a guard variable seems wrong. For the existing usage, I think a threading.Condition is the better primitive. Just have the tests notify the condition variable on completion and have the results "poller" wait/acquire the condition. Upon notification, just iterate over completed tests and reap them, just as you do now.

If you have your heart set on using semaphores, you can initialize a semaphore with NUM_THREADS and have each thread attempt to acquire the semaphore as a means of starting execution. That is traditionally how semaphores are used.
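The "traditional" semaphore usage described here — initialize with the thread budget and have each test acquire a slot before running — can be sketched as follows (names, counts, and the bookkeeping list are illustrative assumptions):

```python
import threading

NUM_THREADS = 4
slots = threading.Semaphore(NUM_THREADS)
results = []
results_lock = threading.Lock()

def run_test(name):
    # At most NUM_THREADS threads hold a slot at once; the rest
    # block inside acquire() until a running test releases one.
    with slots:
        with results_lock:
            results.append(name)

threads = [threading.Thread(target=run_test, args=("test_%d.js" % i,))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```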
Comment 98 Gregory Szorc [:gps] 2013-08-12 16:07:07 PDT
I ran this patch on my Macbook Pro and got the following:

:45.97 TEST-PASS | /Users/gps/src/firefox/obj-firefox.noindex/_tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell-unpack/test_blocklistchange.js | test passed (time: 13655.589ms)
 1:51.87 TEST-PASS | /Users/gps/src/firefox/obj-firefox.noindex/_tests/xpcshell/toolkit/components/url-classifier/tests/unit/test_partial.js | test passed (time: 27970.437ms)
 1:51.87 INFO | Following exceptions were raised:
 1:51.87 Traceback (most recent call last):
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 112, in run
    self.run_test()
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 538, in run_test
    self.cleanupDir(self.profileDir, name, stdout, self.xunit_result)
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 341, in cleanupDir
    except (OSError, WindowsError):
NameError: global name 'WindowsError' is not defined

Error running mach:

    ['xpcshell-test']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.

You should consider filing a bug for this issue.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

NameError: global name 'WindowsError' is not defined

  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 179, in run_xpcshell_test
    return xpcshell.run_test(**params)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 57, in run_test
    keep_going=keep_going, shuffle=shuffle)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 45, in run_suite
    return self._run_xpcshell_harness(manifest=manifest, **kwargs)
  File "/Users/gps/src/firefox/testing/xpcshell/mach_commands.py", line 148, in _run_xpcshell_harness
    result = xpcshell.runTests(**filtered_args)
  File "/Users/gps/src/firefox/testing/xpcshell/runxpcshelltests.py", line 1229, in runTests
    raise exceptions[0]


issubclass(WindowsError, OSError) -> True, so you can omit WindowsError from the except statement.
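A quick illustration of why catching `OSError` alone is portable: on Windows, `WindowsError` is a subclass of `OSError`, while on other platforms the name does not exist at all, which is exactly the NameError in the traceback above. The helper names here are hypothetical:

```python
import errno

def try_remove(remover, path):
    # Catching OSError also catches WindowsError on Windows; naming
    # WindowsError in an except clause raises NameError elsewhere.
    try:
        remover(path)
        return True
    except OSError:
        return False

def locked(path):
    # Simulates the "file is locked" failure mode.
    raise OSError(errno.EACCES, "file is locked", path)
```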
Comment 99 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 16:59:24 PDT
Created attachment 789270 [details] [diff] [review]
Parallelize desktop xpcshell harness

After the IRL talk with :gps we agreed to stick to the semaphore.
(Can easily change on request, I have a patch that switches to a Cond in my
queue.)

Fixed the WindowsError thing.

Also, set options.sequential to True in the __main__ function. Let's land it
this way, so we have all the code there, and then we can just "turn it on"
with a small patch.

Developers who use mach will get parallel tests, which is fine, because the
intermittents only seem to happen on TBPL. (They can pass --sequential if
needed.)


Here's a try run: https://tbpl.mozilla.org/?tree=Try&rev=1b8407f88d2f
Comment 100 Gregory Szorc [:gps] 2013-08-12 17:02:59 PDT
I meant to say you should use http://docs.python.org/2/library/threading.html#event-objects. But, I suppose Semaphore will work. It was certainly simpler than Condition variables.
Comment 101 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 17:47:29 PDT
Created attachment 789290 [details] [diff] [review]
Parallelize desktop xpcshell harness

Changed this to using an Event instead of the Semaphore.
Comment 102 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 17:49:53 PDT
Created attachment 789291 [details] [diff] [review]
Adapt mobile xpcshell harnesses to the changes from parxpc

Rebased.
Comment 103 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 17:58:27 PDT
https://tbpl.mozilla.org/?tree=Try&rev=383d7fb2988d
Comment 104 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 18:06:52 PDT
Created attachment 789301 [details] [diff] [review]
Add support for --sequential to mach,

Added a warning for eventual local failures.
Comment 105 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 18:22:24 PDT
Created attachment 789307 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

Rebased chmanchester's patch as well.
Comment 106 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-12 18:22:59 PDT
Comment on attachment 789307 [details] [diff] [review]
Synchronize blocks of output when running xpcshell tests in parallel;

keeping r+
Comment 107 Ted Mielczarek [:ted.mielczarek] 2013-08-14 05:20:48 PDT
Comment on attachment 789290 [details] [diff] [review]
Parallelize desktop xpcshell harness

Review of attachment 789290 [details] [diff] [review]:
-----------------------------------------------------------------

Just a few little things to fix, looks good otherwise.

::: testing/xpcshell/runxpcshelltests.py
@@ -759,5 @@
>          self.pluginsPath = pluginsPath
> -
> -        # If we have an interactive debugger, disable ctrl-c.
> -        if self.debuggerInfo and self.debuggerInfo["interactive"]:
> -            signal.signal(signal.SIGINT, lambda signum, frame: None)

We might need to preserve this behavior as well, but I'm not 100% sure how it overlaps with the other SIGINT handling. If you want to be thorough and test with --debugger and see how that works that'd be cool, but I won't hold your patch up for it.

@@ -970,5 @@
> -            # - don't move this line above launchProcess, or child will inherit the SIG_IGN
> -            signal.signal(signal.SIGINT, markGotSIGINT)
> -            # |stderr == None| as |pStderr| was either |None| or redirected to |stdout|.
> -            stdout, stderr = self.communicate(proc)
> -            signal.signal(signal.SIGINT, signal.SIG_DFL)

You lost this line, this wants to get moved to after you finish running tests to restore the default signal handler.
Comment 108 Ted Mielczarek [:ted.mielczarek] 2013-08-14 05:22:46 PDT
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #93)
> http://pastebin.mozilla.org/2841780 Here's Andrew's comment.
> 
> I feel it makes it a bit easier for the classes that extend this (mobile).
> I'm fine either way.

I'm still not wild about it, but since it was ahal's suggestion and it's not terribly important let's just go with it. It's not a big deal to change later if we want to.
Comment 109 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:16:52 PDT
Created attachment 790316 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;
Comment 110 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:17:19 PDT
Created attachment 790317 [details] [diff] [review]
Part 2 - Add parallel warning and support for --sequential to mach xpcshell-test,
Comment 111 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:18:50 PDT
Comment on attachment 790316 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;

keeping previous r+'s from :ahal and :ted.
Comment 112 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:19:18 PDT
Comment on attachment 790317 [details] [diff] [review]
Part 2 - Add parallel warning and support for --sequential to mach xpcshell-test,

keep previous r+ from :gps
Comment 113 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:38:56 PDT
Created attachment 790327 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;

Added a 1s timeout to the wait call to make sure no deadlocks occur.
Comment 114 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:40:45 PDT
Created attachment 790330 [details] [diff] [review]
Part 3 - Synchronize blocks of output when running xpcshell tests in parallel;

Updated commit message
Comment 115 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:41:44 PDT
Comment on attachment 790327 [details] [diff] [review]
Part 1 - Refactor xpcshell test harness to support parallel test runs, disabled in automation;

keeping r+'s
Comment 116 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 10:42:27 PDT
Comment on attachment 790330 [details] [diff] [review]
Part 3 - Synchronize blocks of output when running xpcshell tests in parallel;

keep r+
Comment 117 Mihnea Dobrescu-Balaur (:mihneadb) 2013-08-14 11:31:30 PDT
Created attachment 790354 [details] [diff] [review]
Part 4 - Mark dom/network/tests/unit/test_tcpserversocket.js to run sequentially;
Comment 119 Gregory Szorc [:gps] 2013-08-14 16:34:08 PDT
\o/

Awesome work everyone - especially Mihnea!

Note You need to log in before you can comment on or make changes to this bug.