Last Comment Bug 813742 - Parallelize the reftest and crashtest suites
: Parallelize the reftest and crashtest suites
Status: NEW
[buildfaster:1][leave open]
:
Product: Testing
Classification: Components
Component: Reftest (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
: 845472 (view as bug list)
Depends on: 877661 930135 858203 859339 877824 881242 883981 883983 899547 918419 920493
Blocks: 845748 961234 1271162
  Show dependency treegraph
 
Reported: 2012-11-20 14:13 PST by :Ehsan Akhgari (busy, don't ask for review please)
Modified: 2016-05-08 18:05 PDT (History)
24 users (show)
nfroyd: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
enable parallel running of reftests (10.64 KB, patch)
2013-04-19 12:55 PDT, Nathan Froyd [:froydnj]
dholbert: feedback+
ted: feedback-
Details | Diff | Review
enable parallel running of reftests (11.28 KB, patch)
2013-05-07 13:26 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
enable parallel running of reftests (11.28 KB, patch)
2013-05-07 14:34 PDT, Nathan Froyd [:froydnj]
ted: feedback-
Details | Diff | Review
add --run-tests-in-parallel option (11.05 KB, patch)
2013-06-17 12:54 PDT, Nathan Froyd [:froydnj]
ted: feedback+
Details | Diff | Review
add --run-tests-in-parallel option (13.12 KB, patch)
2013-07-12 05:13 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
add --run-tests-in-parallel option to reftests (12.76 KB, patch)
2013-07-31 08:33 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
add --run-tests-in-parallel option to reftests (13.37 KB, patch)
2013-10-22 18:21 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
add --run-tests-in-parallel option to reftests (13.08 KB, patch)
2013-10-23 06:00 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
add --run-tests-in-parallel option to reftests (14.32 KB, patch)
2013-10-23 13:45 PDT, Nathan Froyd [:froydnj]
ted: review+
Details | Diff | Review
add --parallel option for mach reftest-esque things (3.09 KB, patch)
2014-01-15 06:43 PST, Nathan Froyd [:froydnj]
ted: review+
Details | Diff | Review

Description :Ehsan Akhgari (busy, don't ask for review please) 2012-11-20 14:13:28 PST
We should be able to run these tests in parallel relatively easily.  The idea is to launch multiple Firefox instances on a desktop environment and run a subset of each of these tests in each one.

The only tests that actually need to be run non-parallel are the ones which rely on having focus from the OS, so this project requires us to annotate them.  We already have the needs-focus annotation in the manifest file which we use to specify those tests, but there are tests that require focus and we don't know about them.

One way to find those tests is to run crashtests/reftests locally and make sure that the window is not focused, and see which tests fail, and then look at the code for each one to see if the test does something which relies on having focus from the OS.
Comment 1 Ed Morley [:emorley] 2013-02-27 02:18:15 PST
*** Bug 845472 has been marked as a duplicate of this bug. ***
Comment 2 Ed Morley [:emorley] 2013-02-27 02:19:21 PST
From gps' bug:

(In reply to Gregory Szorc [:gps] from comment #0)
> This is a purposefully high-level bug requesting that reftests be run in
> parallel (on the same machine) or achieve the traditional net reduction in
> wall time that executing in parallel typically achieves. Ideally, the wall
> time of reftests decreases linearly with the number of CPU cores in a
> machine. The trend in CPUs is towards more cores: we should make use of them
> in our automation!
> 
> From a quick glance at a random m-c job on TBPL, it appears we spend 30+
> hours in wall time running reftests and variants (like crashtests). This is
> a substantial percentage of the total wall time spent for checkins (>20%).
> Reduction in the total wall time spent running reftests would allow the
> build infrastructure to process more jobs and this would benefit everybody.
> I imagine reduction in wall time would also allow developers running
> reftests locally to be more productive!
> 
> I just talked with dbaron and he seemed to think that this goal is loosely
> attainable and there are a number of avenues worth pursuing. However, they
> require some discussions between layout, graphics, and possibly tools/build
> for the automation components.
> 
> I don't expect immediate movement on this bug. I just figured we should have
> something on file.
Comment 3 Ed Morley [:emorley] 2013-02-27 02:22:41 PST
Moving CCs from the duped bug.

I've also created the meta bug 845748, for looking into this for other suites as well.
Comment 4 Nathan Froyd [:froydnj] 2013-04-04 07:43:09 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #0)
> One way to find those tests is to run crashtests/reftests locally and make
> sure that the window is not focused, and see which tests fail, and then look
> at the code for each one to see if the test does something which relies on
> having focus from the OS.

Hm, do you have a good idea how to accomplish this?  I tried doing this last night, running tests on an embedded X server sans window manager and got:

REFTEST TEST-UNEXPECTED-FAIL | file:///home/froydnj/src/mozilla-central-official/layout/reftests/bidi/779003-1-dynamic.html | image comparison (==), max difference: 10, number of differing pixels: 2
REFTEST TEST-UNEXPECTED-FAIL | file:///home/froydnj/src/mozilla-central-official/layout/reftests/bugs/621918-1.svg | image comparison (==), max difference: 44, number of differing pixels: 2
REFTEST TEST-UNEXPECTED-FAIL | file:///home/froydnj/src/mozilla-central-official/layout/reftests/text-overflow/aligned-baseline.html | image comparison (==), max difference: 255, number of differing pixels: 4070
REFTEST TEST-UNEXPECTED-PASS | file:///home/froydnj/src/mozilla-central-official/layout/reftests/forms/textarea-rtl.html | image comparison (!=), max difference: 110, number of differing pixels: 109
REFTEST TEST-UNEXPECTED-FAIL | file:///home/froydnj/src/mozilla-central-official/layout/reftests/mathml/mfenced-10.html | image comparison (==), max difference: 255, number of differing pixels: 519

which, AFAICT, is solely due to my machine not having the correct fonts.  Thinking that the initial window on such an X server may have received focus anyway, I ran reftests on a different virtual desktop that the one I was using.  Theoretically, this should have meant the window never got focus.  This led to a pile of failures (51 UNEXPECTED-FAIL, 2 UNEXPECTED-PASS), including:

REFTEST TEST-UNEXPECTED-FAIL | file:///home/froydnj/src/mozilla-central-official/layout/reftests/reftest-sanity/needs-focus.html | load failed: timed out waiting for reftest-wait to be removed

This test is marked as needs-focus.  Other needs-focus tests did pass.  This leads me to believe that my testing methodology was not quite right.
Comment 5 :Ehsan Akhgari (busy, don't ask for review please) 2013-04-04 09:58:31 PDT
I'm not sure why you're using different virtual desktops... That probably just complicates things.

I think for starters, you want to run all of the crashtests locally (let's start with crashtest as they're a smaller set, but everything applies exactly the same way to reftests as well) and make sure that they all pass.  If you see tests that are failing on a pristine trunk for you locally (which unfortunately would not surprise me), just remove them for now and move on.  Then, you want to modify the reftest framework to *never* focus the window, no matter what, and then re-run the tests.  The subset of tests which fail in the second run but not the first run are the one that need to be marked as needs-focus.  We cannot run those tests in parallel, but we should be able to run everything else in parallel.

Once we're there, we should actually modify the framework to split up the work into multiple Firefox processes.  You'll probably want to consult dbaron on what the best way of doing that is, since he owns the framework.  But I think for now it's extremely valuable to focus on finding tests that actually require focus in order to succeed.
Comment 6 Nathan Froyd [:froydnj] 2013-04-04 10:21:19 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> I'm not sure why you're using different virtual desktops... That probably
> just complicates things.

Why would that complicate things any?
Comment 7 Nathan Froyd [:froydnj] 2013-04-04 12:47:50 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> I think for starters, you want to run all of the crashtests locally (let's
> start with crashtest as they're a smaller set, but everything applies
> exactly the same way to reftests as well) and make sure that they all pass. 
> If you see tests that are failing on a pristine trunk for you locally (which
> unfortunately would not surprise me), just remove them for now and move on. 
> Then, you want to modify the reftest framework to *never* focus the window,
> no matter what, and then re-run the tests.  The subset of tests which fail
> in the second run but not the first run are the one that need to be marked
> as needs-focus.  We cannot run those tests in parallel, but we should be
> able to run everything else in parallel.

I opened bug 858203 for the reftests that need annotating.

At least as of a couple of days ago, all the crashtests were properly annotated (assuming my testing methodology in bug 858203 comment 1 is sound).
Comment 8 :Ehsan Akhgari (busy, don't ask for review please) 2013-04-08 09:20:09 PDT
(In reply to Nathan Froyd (:froydnj) from comment #6)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> > I'm not sure why you're using different virtual desktops... That probably
> > just complicates things.
> 
> Why would that complicate things any?

Because it adds more things which we should rely on.  What's wrong with having multiple windows coming from multiple Gecko's on the same desktop?
Comment 9 Nathan Froyd [:froydnj] 2013-04-09 09:26:32 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> (In reply to Nathan Froyd (:froydnj) from comment #6)
> > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> > > I'm not sure why you're using different virtual desktops... That probably
> > > just complicates things.
> > 
> > Why would that complicate things any?
> 
> Because it adds more things which we should rely on.  What's wrong with
> having multiple windows coming from multiple Gecko's on the same desktop?

Nothing.  I'm not saying that we should rely on virtual desktops for testing; it's merely what I was trying to use for making sure that the browser window never received focus and I could still get some work done on the same machine.  Running on a virtual X server worked just as well.
Comment 10 :Ehsan Akhgari (busy, don't ask for review please) 2013-04-09 11:28:58 PDT
(In reply to comment #9)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #8)
> > (In reply to Nathan Froyd (:froydnj) from comment #6)
> > > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> > > > I'm not sure why you're using different virtual desktops... That probably
> > > > just complicates things.
> > > 
> > > Why would that complicate things any?
> > 
> > Because it adds more things which we should rely on.  What's wrong with
> > having multiple windows coming from multiple Gecko's on the same desktop?
> 
> Nothing.  I'm not saying that we should rely on virtual desktops for testing;
> it's merely what I was trying to use for making sure that the browser window
> never received focus and I could still get some work done on the same machine. 
> Running on a virtual X server worked just as well.

I'd like to avoid any solution which won't scale to multiple platforms.  I don't think that we have anything like that on Windows for example...
Comment 11 Nathan Froyd [:froydnj] 2013-04-09 11:31:48 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #10)
> I'd like to avoid any solution which won't scale to multiple platforms.  I
> don't think that we have anything like that on Windows for example...

We are in violent agreement.
Comment 12 Nathan Froyd [:froydnj] 2013-04-19 12:55:55 PDT
Created attachment 739761 [details] [diff] [review]
enable parallel running of reftests

Here's a first cut to get discussion started.  I considered modifying
runreftest.py to automagically decide how many reftest processes to
run.  But doing that way would require some surgery on how automation.py
works, and I wasn't sure I was willing to dive into that work quite
yet.  It's worth noting that the log management still needs some work,
as testsuite-targets.mk expresses so well. :)

This patch expresses my personal preference for doing parallel task
management: let make handle it, which is far more hardened and
battle-tested than one-off things we can write in Python.  I realize
that other people (gps, at least) have advocated for doing things in
Python instead.  I'm happy to do whichever, but I think this way of
doing things requires many fewer changes.  I suppose infra changes might
be required to specify different testsuite targets, etc.?  (This way of
doing things *does* mean that we have to manually decide what the
parallelization factor is, which is surely a detriment to developers
with > 4 core machines.  I think it's possible to be smarter with the
makefile to overcome this, though.)

In any event, I think we need something like the reftest.js changes
regardless of where process management winds up, whether in makefiles or
Python code.
Comment 13 Nathan Froyd [:froydnj] 2013-04-19 12:57:52 PDT
Comment on attachment 739761 [details] [diff] [review]
enable parallel running of reftests

I'm not sure who to ask for feedback here; flagging a few likely vic^Wknowledgeable people
Comment 14 :Ehsan Akhgari (busy, don't ask for review please) 2013-04-19 13:08:17 PDT
Can you also make sure that the reftest window is never focused?  This way, if somebody adds a test which relies on focus without setting needs-focus, then their test will fail every single time, instead of some of the times.
Comment 15 Daniel Holbert [:dholbert] 2013-04-19 13:23:31 PDT
Comment on attachment 739761 [details] [diff] [review]
enable parallel running of reftests

Generally this seems like a good idea.  A few nits below:

>+++ b/layout/tools/reftest/reftest.js
>@@ -40,16 +40,20 @@ Components.utils.import("resource://gre/modules/FileUtils.jsm");
>+const ALL_TESTS = 0;
>+const ONLY_NEEDS_FOCUS_TESTS = 1;
>+const ONLY_NON_NEEDS_FOCUS_TESTS = 2;
>+var gOnlyRunTests = 0;

I think the naming could be clearer here.

Some other ideas: s/gOnlyRunTests/gTestFilterMode/?

and then:
 FILTER_MODE_ALL_TESTS
 FILTER_MODE_ONLY_NEEDS_FOCUS_TESTS
 FILTER_MODE_ONLY_NON_NEEDS_FOCUS_TESTS
or something like that?

>@@ -1128,22 +1141,28 @@ function StartCurrentTest()
>-        } else if (test.needsFocus && !Focus()) {
>+        } else if (test.needsFocus &&
>+                   (gOnlyRunTests == ONLY_NON_NEEDS_FOCUS_TESTS || !Focus())) {
>             // FIXME: Marking this as a known fail is dangerous!  What
>             // if it starts failing all the time?
>             ++gTestResults.Skip;
>             gDumpLog("REFTEST TEST-KNOWN-FAIL | " + test.url1.spec + " | (SKIPPED; COULDN'T GET FOCUS)\n");
>             gURLs.shift();

This doesn't look quite right.

If a test needs focus, and we're doing a non-focus run, we want to just skip the test.

But this also makes us print out:
 REFTEST TEST-KNOWN-FAIL [...] (SKIPPED; COULDN'T GET FOCUS)
which seems misleading and undesirable.  (It's not that we *couldn't* get focus -- this is just a run where we're *explicitly skipping* the focus tests.)

>+        } else if (!test.needsFocus &&
>+                   (gOnlyRunTests == ONLY_NEEDS_FOCUS_TESTS)) {
>+            ++gTestResults.Skip;
>+            gDumpLog("REFTEST TEST-KNOWN-FAIL | " + test.url1.spec + " | (SKIP)\n");
>+            gURLs.shift();

As noted above, it seems odd to be printing out a "known fail" message for tests that we're just deciding to skip.

>+      strmap = { "all" : ALL_TESTS,
>+                 "needs-focus" : ONLY_NEEDS_FOCUS_TESTS,
>+                 "non-needs-focus" : ONLY_NON_NEEDS_FOCUS_TESTS }
>+      if not value in strmap:
>+        raise optparse.OptionValueError("argument to %s must be one of %s"
>+                                        % (opt_str, ", ".join(strmap.keys())))
>+      parser.values.onlyRunTests = strmap[value]
>+    self.add_option("--only-run-tests", type = "string",
>+                    action = "callback", callback = convertOnlyRunTestsString,
>+                    help = "which tests to run (needs-focus or non-needs-focus)."
>+                           "Defaults to all tests.")
>+    defaults["onlyRunTests"] = ALL_TESTS

Probably worth mentioning "all" as user-passable option in the help text there, maybe inside the parenthesis.
Comment 16 Nathan Froyd [:froydnj] 2013-04-19 13:33:31 PDT
(In reply to Daniel Holbert [:dholbert] from comment #15)
> Comment on attachment 739761 [details] [diff] [review]
> enable parallel running of reftests
> 
> Generally this seems like a good idea.  A few nits below:
> 
> >+++ b/layout/tools/reftest/reftest.js
> >@@ -40,16 +40,20 @@ Components.utils.import("resource://gre/modules/FileUtils.jsm");
> >+const ALL_TESTS = 0;
> >+const ONLY_NEEDS_FOCUS_TESTS = 1;
> >+const ONLY_NON_NEEDS_FOCUS_TESTS = 2;
> >+var gOnlyRunTests = 0;
> 
> I think the naming could be clearer here.
> 
> Some other ideas: s/gOnlyRunTests/gTestFilterMode/?
> 
> and then:
>  FILTER_MODE_ALL_TESTS
>  FILTER_MODE_ONLY_NEEDS_FOCUS_TESTS
>  FILTER_MODE_ONLY_NON_NEEDS_FOCUS_TESTS
> or something like that?

Yeah, I think that does work better.

> >+        } else if (!test.needsFocus &&
> >+                   (gOnlyRunTests == ONLY_NEEDS_FOCUS_TESTS)) {
> >+            ++gTestResults.Skip;
> >+            gDumpLog("REFTEST TEST-KNOWN-FAIL | " + test.url1.spec + " | (SKIP)\n");
> >+            gURLs.shift();
> 
> As noted above, it seems odd to be printing out a "known fail" message for
> tests that we're just deciding to skip.

Yeah, I agree with this.  OTOH, TEST-KNOWN-FAIL is what reftest.js uses all over the place.  I'll have to look: we don't want skipped messages for parallelized reftest runs, so just ditching these messages in such cases may be the right thing to do anyway.

Thanks for the feedback!
Comment 17 Nathan Froyd [:froydnj] 2013-04-19 13:34:33 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> Can you also make sure that the reftest window is never focused?  This way,
> if somebody adds a test which relies on focus without setting needs-focus,
> then their test will fail every single time, instead of some of the times.

Yup, that's bug 859339, which should block this one.
Comment 18 Gregory Szorc [:gps] 2013-04-19 14:26:15 PDT
While I haven't looked at the patches, keep in mind testsuite-targets.mk is only used locally - automation has their own configuration for invoking test harnesses (mozharness these days) on builders. This is one reason why I'd like to neuter testsuite-targets.mk: move all the critical business logic elsewhere and have testsuite-targets.mk just invoke that. That doesn't prohibit you from using makefiles as a means to launch processes in parallel. It does likely mean not using testsuite-targets.mk for that purpose, however.
Comment 19 Nathan Froyd [:froydnj] 2013-04-19 14:37:28 PDT
(In reply to Gregory Szorc [:gps] from comment #18)
> While I haven't looked at the patches, keep in mind testsuite-targets.mk is
> only used locally - automation has their own configuration for invoking test
> harnesses (mozharness these days) on builders. This is one reason why I'd
> like to neuter testsuite-targets.mk: move all the critical business logic
> elsewhere and have testsuite-targets.mk just invoke that.

This is news to me.  Where do those bits live?
Comment 20 Gregory Szorc [:gps] 2013-04-19 15:14:18 PDT
(In reply to Nathan Froyd (:froydnj) from comment #19)
> (In reply to Gregory Szorc [:gps] from comment #18)
> > While I haven't looked at the patches, keep in mind testsuite-targets.mk is
> > only used locally - automation has their own configuration for invoking test
> > harnesses (mozharness these days) on builders. This is one reason why I'd
> > like to neuter testsuite-targets.mk: move all the critical business logic
> > elsewhere and have testsuite-targets.mk just invoke that.
> 
> This is news to me.  Where do those bits live?

"Not in the tree."

mozharness lives in https://hg.mozilla.org/build/mozharness/. But I don't believe all trees are yet migrated to mozharness. So, you may also need to change foo in https://hg.mozilla.org/build/buildbotcustom/. This is beyond my knowledge. You'll likely have to consult with someone in RelEng - I'd start with aki.

If it were up to me, as much logic as possible would live in m-c so duplication of effort is as minimal as possible. If you e.g. want to use make targets to invoke the test runner, you'll need to ship said make file as part of the tests package. Then you'll need to update mozharness (possibly other systems) to use the new method of invocation. Then you'll need to coordinate pushing your change to mozilla-central with a deployment of updated automation configs. This is why we try to do as much magic from within the Python test runners (runtests.py) as possible while preserving the external CLI interface that automation/testsuite-targets.mk uses.
Comment 21 Nathan Froyd [:froydnj] 2013-04-19 15:32:20 PDT
(In reply to Gregory Szorc [:gps] from comment #20)
> (In reply to Nathan Froyd (:froydnj) from comment #19)
> > This is news to me.  Where do those bits live?
> 
> "Not in the tree."
> 
> mozharness lives in https://hg.mozilla.org/build/mozharness/. But I don't
> believe all trees are yet migrated to mozharness. So, you may also need to
> change foo in https://hg.mozilla.org/build/buildbotcustom/. This is beyond
> my knowledge. You'll likely have to consult with someone in RelEng - I'd
> start with aki.

So tests are difficult enough to run locally; not we're going to make them run using an entirely different infrastructure that doesn't come with the tree?  This doesn't seem like a step forward. :(
Comment 22 :Ehsan Akhgari (busy, don't ask for review please) 2013-04-20 10:30:24 PDT
FWIW you can see how we run tests in our tinderbox machines by looking at the full logs and look for buildbot commands.
Comment 23 Ted Mielczarek [:ted.mielczarek] 2013-04-21 09:37:40 PDT
(In reply to Nathan Froyd (:froydnj) from comment #21)
> So tests are difficult enough to run locally; not we're going to make them
> run using an entirely different infrastructure that doesn't come with the
> tree?  This doesn't seem like a step forward. :(

This has always been the case. We have never run the tests using the in-tree targets. (Except for make check.)
Comment 24 Ed Morley [:emorley] 2013-04-22 02:01:37 PDT
Switching to using in-tree targets for automation would be ideal tbh - I've lost track of the times where we've had to make simultaneous buildbot and in-tree changes when adding test runner command line params (and then having to special-case them as the updated test runners make their way through the trains).
Comment 25 Nathan Froyd [:froydnj] 2013-04-22 06:35:30 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> (In reply to Nathan Froyd (:froydnj) from comment #21)
> > So tests are difficult enough to run locally; not we're going to make them
> > run using an entirely different infrastructure that doesn't come with the
> > tree?  This doesn't seem like a step forward. :(
> 
> This has always been the case. We have never run the tests using the in-tree
> targets. (Except for make check.)

*boggle*

How does one test out a different test setup on try, then?
Comment 26 Ted Mielczarek [:ted.mielczarek] 2013-04-22 09:08:51 PDT
You don't. :-/ It's a pain in the ass I agree. Just some random thoughts:
* We're moving away from makefiles, mach targets are replacing makefile targets for test invocation
* I filed bug 524130 ages ago which overlaps with this problem. One of my suggested ways to fix things there was to ship testsuite-targets.mk in the test package.
* I think I would ideally like the test harnesses to just "do the right thing" regardless of where you run them from without having to specify a ton of arguments. There are only two common ways to run the tests: from an objdir or from a test package. It doesn't seem unreasonable to make the harnesses able to determine which configuration they're being run from and locate all the necessary resources.
Comment 27 Phil Ringnalda (:philor) 2013-04-22 12:53:44 PDT
Well, you don't, or you do. We have a single API, runreftest.py, which is called from mozharness, called from pre-mozharness, and called from testsuite-targets. If you want to test how it will work to have automation call runreftest.py with --do-my-fancy-thing=123, you just stick a patch on top of your domyfancything patch which hacks up the commandline handling to always do your fancy thing 123 times even without --do-my-fancy-thing=123, and push them both to try.

And while "just put it in the tree!" sounds like perfection to us, "I'll have to push this change to mozilla-central, merge that to every trunk project branch and twig, and push it to mozilla-aurora and mozilla-beta and mozilla-release and mozilla-esr17!" isn't going to sound like perfection to releng, so you have to be a little careful about what things you move into the tree.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2013-05-07 09:50:16 PDT
Comment on attachment 739761 [details] [diff] [review]
enable parallel running of reftests

Review of attachment 739761 [details] [diff] [review]:
-----------------------------------------------------------------

So clearly the testsuite-targets.mk stuff isn't going to help. I think we need to bite the bullet and either change the Python harness to accomodate launching multiple browsers in parallel, or let the JS harness run multiple tests in parallel (I'm not sure how we'd do this usefully since we can't touch the DOM from workers, maybe this isn't possible.)

::: layout/tools/reftest/runreftest.py
@@ +215,5 @@
>  
> +    def convertOnlyRunTestsString(option, opt_str, value, parser):
> +      strmap = { "all" : ALL_TESTS,
> +                 "needs-focus" : ONLY_NEEDS_FOCUS_TESTS,
> +                 "non-needs-focus" : ONLY_NON_NEEDS_FOCUS_TESTS }

Is there any reason to not write a string pref and let reftest.js deal with strings?
Comment 29 Nathan Froyd [:froydnj] 2013-05-07 13:26:17 PDT
Created attachment 746607 [details] [diff] [review]
enable parallel running of reftests

OK, here's a different incomplete version where runreftests.py launches
additional separate instances of itself if --run-tests-in-parallel is
passed.

I realize this uses the multiprocessing module, which isn't supported on
all Python installations.  However, AFAICS, it's supported on all the
machines we run tests on and that's really all I care about.  And
there's a graceful fallback to non-parallel test running if
multiprocessing isn't enabled.

I did try launching things with just Python functions and
multiprocessing.Pool, but that ran into unpickleable things in
Automation.  I also was only mostly convinced about Automation's
robustness in the face of shared accesses, so fully separate processes
seemed like the right thing to do in any event.

There's no attempt to merge the log files in any manner, intelligent or
otherwise.  That clearly needs to be fixed up before commit.

Something also appears to be funky with needs-focus; I see a lot of
tests timing out here, even with my patch from bug 859339 applied.  Some
debugging is going to be needed, since the tests run fine serially (at
least on Linux) with that patch applied.

Some tests also take an awfully long time, I don't know if there's a
good way to load-balance them, because it looks like the longest chunk
takes 5-8x longer than the shortest chunk...
Comment 30 Nathan Froyd [:froydnj] 2013-05-07 13:27:30 PDT
Comment on attachment 746607 [details] [diff] [review]
enable parallel running of reftests

Ted, WDYT about this version as a starting point?
Comment 31 Nathan Froyd [:froydnj] 2013-05-07 14:34:11 PDT
Created attachment 746638 [details] [diff] [review]
enable parallel running of reftests

Actually, here's a patch that works much better (helps if you get the
proper pref kind in reftest.js).  Some stats from a parallel reftest run
on my dev machine:

 217074:INFO | automation.py | Application ran for: 0:02:37.924890
 217829:INFO | automation.py | Application ran for: 0:02:39.749120
 217951:INFO | automation.py | Application ran for: 0:02:40.157419
 223021:INFO | automation.py | Application ran for: 0:02:55.443944
 224170:INFO | automation.py | Application ran for: 0:02:59.374452
 225753:INFO | automation.py | Application ran for: 0:03:05.819389
 227918:INFO | automation.py | Application ran for: 0:03:11.971117
 228501:INFO | automation.py | Application ran for: 0:03:15.124982
 229440:INFO | automation.py | Application ran for: 0:03:17.264513
 230954:INFO | automation.py | Application ran for: 0:03:23.714728
 232079:INFO | automation.py | Application ran for: 0:03:35.250424
 234444:INFO | automation.py | Application ran for: 0:03:46.318096
 235078:INFO | automation.py | Application ran for: 0:03:50.525366
 235191:INFO | automation.py | Application ran for: 0:03:51.536971
 238430:INFO | automation.py | Application ran for: 0:01:42.985169
 238635:INFO | automation.py | Application ran for: 0:04:25.865078
 248861:INFO | automation.py | Application ran for: 0:05:45.423506

and the `time' output:

REFTEST INFO | runreftest.py | Running tests: end.
1366.29user 194.35system 5:45.95elapsed 451%CPU (0avgtext+0avgdata 930320maxresident)k
2693272inputs+17815504outputs (14927major+59981695minor)pagefaults 0swaps
Comment 32 Ted Mielczarek [:ted.mielczarek] 2013-05-10 07:49:22 PDT
Comment on attachment 746638 [details] [diff] [review]
enable parallel running of reftests

Review of attachment 746638 [details] [diff] [review]:
-----------------------------------------------------------------

This seems a lot more workable. I have a few comments. I'm hesitant to f+ since I still think it's a bit ugly, but I think we could definitely go with something close to this.

::: layout/tools/reftest/reftest.js
@@ +350,5 @@
>      } catch(e) {}
>  
> +    try {
> +        gTestFilterMode = prefs.getCharPref("reftest.testFilterMode");
> +    } catch(e) {}

Having both reftest.filter and reftest.testFilterMode is kind of confusing. :-/ Maybe we can make this clearer?

::: layout/tools/reftest/runreftest.py
@@ +252,5 @@
>  
> +    self.add_option("--run-tests-in-parallel",
> +                    dest = "runTestsInParallel", action = "store_true",
> +                    help = "run tests in parallel if possible")
> +    defaults["parallel"] = False

I think this should be on by default so that we give this benefit to everyone. It might make sense to have a --not-parallel option to assist in debugging weird failures.

Given your test below, we could default it to on unless chunking or filterMode is specified.

@@ +348,5 @@
> +                      "--this-chunk=%d" % chunkNumber]
> +
> +  for jobArgs in perProcessArgs:
> +    jobArgs.remove("--run-tests-in-parallel")
> +    jobArgs[0:0] = [sys.executable]

This is sort of horrible. I have a counter-proposal: we don't really need to use multiprocessing here, since the actual work is being farmed out to multiple Firefox processes. You could just use threads and spawn off a separate Firefox process in each thread to work around the multiprocessing/Automation nastiness.

@@ +375,5 @@
> +
> +  if options.runTestsInParallel:
> +    sys.exit(runParallelReftests(reftest, options, args))
> +  else:
> +    sys.exit(runReftestWithOptions(reftest, options, args))

Feels like you could push this down a level, and have the runReftest method determine how many processes to use, and just run the tests directly if only using one process.
Comment 33 Nathan Froyd [:froydnj] 2013-05-10 09:21:29 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #32)
> @@ +348,5 @@
> > +                      "--this-chunk=%d" % chunkNumber]
> > +
> > +  for jobArgs in perProcessArgs:
> > +    jobArgs.remove("--run-tests-in-parallel")
> > +    jobArgs[0:0] = [sys.executable]
> 
> This is sort of horrible. I have a counter-proposal: we don't really need to
> use multiprocessing here, since the actual work is being farmed out to
> multiple Firefox processes. You could just use threads and spawn off a
> separate Firefox process in each thread to work around the
> multiprocessing/Automation nastiness.

Yeah, multiprocessing is not going to work long term, because there really needs to be some sort of merging of the logs or smarts about how to handle the Automation output of the subprocesses.  try can cope with a parallel-by-default run, but it only sees ~1/N of the tests being successful (though it does see all the failures, oddly enough), and I think the elapsed time display is off as well.  Maybe other things, too (leaks?).
Comment 34 Gregory Szorc [:gps] 2013-05-10 10:33:07 PDT
And such is a consequence of parsing logs. Any time you have events that span multiple lines, you will get bitten. We really need all our automation to key off machine readable files, not logs. Logs should be for humans only. Every piece of important metadata in the logs should be available in a machine readable file format (such as JSON).

I hate saying this, but a possible solution is to hack up the reftest output coming from the browser itself to insert a "marker" identifying the process and/or current test. The Python test harness will then buffer output belonging to the same marker/group and will flush when it sees a special sequence or when the marker/group changes. If you are using threading (not multiprocessing), the flush to output should be atomic (without interleaving) because the GIL will only allow one Python thread to execute at once. Or, you could always use locking. This will emulate the current log file behavior without requiring downstream consumers to adapt to a new logging format. The only things that may need special care are the process start/end events. Are there any that are important?
Comment 35 Nathan Froyd [:froydnj] 2013-06-17 12:54:00 PDT
Created attachment 763744 [details] [diff] [review]
add --run-tests-in-parallel option

This patch implements a better run-tests-in-parallel experience: it knows something
about the reftest log output so we can summarize relevant things for tbpl at the
end of the run.  It also knows how to chunk output from each test, so we can get
the expected:

TEST-START
...stuff from that test..
TEST-END
TEST-START
...stuff from *that* test...
TEST-END

instead of some depends-on-timing interleaving of the test output.  Output *not*
related to individual tests, however, is permitted to interleave freely.

Thankfully, tbpl only does line-by-line matching of the output and
doesn't care about the relative placement of lines, so we can get away
with a pretty simple-minded approach to this.

Try run looks pretty sane; the oranges that you see are in the process
of getting fixed:

https://tbpl.mozilla.org/?tree=Try&rev=207818b43177

Still need to make sure everything works sanely with remotereftest.py
and runreftestb2g.py.
Comment 36 Ted Mielczarek [:ted.mielczarek] 2013-06-18 06:44:17 PDT
Comment on attachment 763744 [details] [diff] [review]
add --run-tests-in-parallel option

Review of attachment 763744 [details] [diff] [review]:
-----------------------------------------------------------------

In general I'm kind of reeling in horror from the regex nightmare here. I'm hopeful that our structured logging work (starting this summer as an intern project) will make this less horrible since you'll be able to deal with log messages as data structures at some point, so it's probably worthwhile to swallow the pill here and deal with it later.

I've nitpicked a bit, I tried not to be super nitpicky since this is just a feedback+ run, the general approach looks good here, thanks for doing all the grunt work to get this going!

::: layout/tools/reftest/runreftest.py
@@ +40,5 @@
> +
> +  def run(self):
> +    process = subprocess.Popen(self.cmdlineArgs, stdout=subprocess.PIPE)
> +    for chunk in self.chunkForMergedOutput(process.stdout):
> +      print chunk,

Apparently Python's print is not threadsafe. :-/
http://bramcohen.livejournal.com/70686.html

I think you have two options here:
a) Roll your own threadsafe print like the link above
b) Use a Queue, shuffle all output to the main thread and print it there

@@ +42,5 @@
> +    process = subprocess.Popen(self.cmdlineArgs, stdout=subprocess.PIPE)
> +    for chunk in self.chunkForMergedOutput(process.stdout):
> +      print chunk,
> +    process.wait()
> +    self.retcode = process.returncode

.wait returns returncode.

@@ +44,5 @@
> +      print chunk,
> +    process.wait()
> +    self.retcode = process.returncode
> +
> +  def chunkForMergedOutput(self, logsource):

This could totally use a docstring comment describing what it does.

@@ +55,5 @@
> +    summaryRegexes = [re.compile(regex) for regex in summaryRegexStrings]
> +
> +    for line in logsource:
> +      haveYieldedOutput = False
> +      for (beginRegex, endRegex) in chunkingLinePairs:

I don't really understand why you're iterating over this when there's only one pair of regexes.

@@ +221,5 @@
> +      # But if we don't have enough CPU cores, it won't help much.
> +      cpuCount = multiprocessing.cpu_count()
> +    except ImportError: # multiprocessing could not be imported.
> +      pass
> +    except NotImplementedError: # cpu_count doesn't work

You can combine these:
except (ImportError, NotImplementedError):
  pass

@@ +225,5 @@
> +    except NotImplementedError: # cpu_count doesn't work
> +      pass
> +
> +    if options.runTestsInParallel != "yes" or not multiprocessingIsSupported:
> +      return self.runTests(testPath, options)

I wonder if it isn't better to just use the same codepath even if we're only running a single process, just to keep the common code paths common?

@@ +232,5 @@
> +    jobsWithoutFocus = cpuCount
> +    totalJobs = cpuCount + 1
> +    perProcessArgs = [sys.argv[:] for i in range(0, totalJobs)]
> +
> +    # First job is only needs-focus tests.  Remaining jobs are non-needs-focus and chunked.

I wonder if this will become a pain in the future if we accumulate too many needs-focus tests?

@@ +259,5 @@
> +        t.join(10)
> +      nowRunningThreads = [t for t in runningThreads if t.is_alive()]
> +      if len(nowRunningThreads) == 0:
> +        break
> +      runningThreads = nowRunningThreads

This is a little clunky. I'd probably just maintain one list and use filter() to do the counts/iteration for joining.

@@ +281,5 @@
> +    errors = [t.retcode for t in threads if t.retcode != 0]
> +    if len(errors) == 0:
> +      return 0
> +    # Arbitrarily pick the first failing result as the result of the whole run.
> +    return errors[0]

I'd probably just rewrite this whole thing as:

if any(t.retcode != 0 for t in threads):
  return 1
return 0

Trying to preserve the return code doesn't seem that important as long as you exit with error or success properly.

@@ +384,5 @@
>  
> +    self.add_option("--run-tests-in-parallel",
> +                    action = "store", type = "string", dest = "runTestsInParallel",
> +                    help = "run tests in parallel if possible")
> +    defaults["runTestsInParallel"] = "yes"

This should be using store_true so it's a bool instead of a string.
Comment 37 Nathan Froyd [:froydnj] 2013-06-18 07:15:55 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #36)
> In general I'm kind of reeling in horror from the regex nightmare here.

I was actually kind of happy that the regex bits worked fairly cleanly. =/

> I'm
> hopeful that our structured logging work (starting this summer as an intern
> project) will make this less horrible since you'll be able to deal with log
> messages as data structures at some point, so it's probably worthwhile to
> swallow the pill here and deal with it later.

Hooray for avoiding scope creep!

> I've nitpicked a bit, I tried not to be super nitpicky since this is just a
> feedback+ run, the general approach looks good here, thanks for doing all
> the grunt work to get this going!

Thanks for the feedback!

> ::: layout/tools/reftest/runreftest.py
> @@ +40,5 @@
> > +
> > +  def run(self):
> > +    process = subprocess.Popen(self.cmdlineArgs, stdout=subprocess.PIPE)
> > +    for chunk in self.chunkForMergedOutput(process.stdout):
> > +      print chunk,
> 
> Apparently Python's print is not threadsafe. :-/
> http://bramcohen.livejournal.com/70686.html
> 
> I think you have two options here:
> a) Roll your own threadsafe print like the link above
> b) Use a Queue, shuffle all output to the main thread and print it there

Bleh.  Actually, I should probably hook up to whatever automation uses for logging so that the master runreftests.py actually respects --log-file.  And that's threadsafe, so it kills two birds with one module.

> @@ +55,5 @@
> > +    summaryRegexes = [re.compile(regex) for regex in summaryRegexStrings]
> > +
> > +    for line in logsource:
> > +      haveYieldedOutput = False
> > +      for (beginRegex, endRegex) in chunkingLinePairs:
> 
> I don't really understand why you're iterating over this when there's only
> one pair of regexes.

At one point I thought I would need more regexes, but never actually did so.  Will fix.

> @@ +225,5 @@
> > +    except NotImplementedError: # cpu_count doesn't work
> > +      pass
> > +
> > +    if options.runTestsInParallel != "yes" or not multiprocessingIsSupported:
> > +      return self.runTests(testPath, options)
> 
> I wonder if it isn't better to just use the same codepath even if we're only
> running a single process, just to keep the common code paths common?

I thought about this...structuring things this way means that we don't have to worry about parallel jobs on b2g and remotereftest.py because they never go through the parallel-options-processing path.  But it's not hard to change and I can shuffle things around if desired.

> @@ +232,5 @@
> > +    jobsWithoutFocus = cpuCount
> > +    totalJobs = cpuCount + 1
> > +    perProcessArgs = [sys.argv[:] for i in range(0, totalJobs)]
> > +
> > +    # First job is only needs-focus tests.  Remaining jobs are non-needs-focus and chunked.
> 
> I wonder if this will become a pain in the future if we accumulate too many
> needs-focus tests?

It could, but I don't see a good way around it.  There's also only ~200 needs-focus tests in-tree currently, and you'd need to run on a very beefy machine before a chunked non-needs-focus job started having close to that number of tests.

From looking at timing output, I'd be more worried about trying to load-balance tests before worrying about too-many needs-focus tests.  The longest parallel job takes a good while longer (2-3x) than other jobs, depending on how the tests are chunked.
> @@ +384,5 @@
> >  
> > +    self.add_option("--run-tests-in-parallel",
> > +                    action = "store", type = "string", dest = "runTestsInParallel",
> > +                    help = "run tests in parallel if possible")
> > +    defaults["runTestsInParallel"] = "yes"
> 
> This should be using store_true so it's a bool instead of a string.

Hm.  I went with a string because it was suggested that the default be parallel and there didn't seem to be any good way to invert a boolean default from the command-line.  Maybe I need to look closer.  Maybe just --run-tests-in-parallel and --no-run-tests-in-parallel?
Comment 38 Ted Mielczarek [:ted.mielczarek] 2013-06-18 07:32:34 PDT
If the default is going to be parallel, then presumably you want a "store_false" option like --not-parallel ?
Comment 39 Gregory Szorc [:gps] 2013-06-18 10:18:37 PDT
Comment on attachment 763744 [details] [diff] [review]
add --run-tests-in-parallel option

Review of attachment 763744 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/tools/reftest/runreftest.py
@@ +218,5 @@
> +      import multiprocessing
> +      multiprocessingIsSupported = True
> +
> +      # But if we don't have enough CPU cores, it won't help much.
> +      cpuCount = multiprocessing.cpu_count()

This will work everywhere on Python 2.6+ (where multiprocessing is available). multiprocessing starts to break on BSDs and some other operating systems when you try to import more advanced functionality of multiprocessing, notably the locking primitive.

I think you can ditch the try..except.
Comment 40 Nathan Froyd [:froydnj] 2013-07-12 05:13:06 PDT
Created attachment 774590 [details] [diff] [review]
add --run-tests-in-parallel option

I think this addresses most of Ted's comments; it's been a little while since I
touched this code, so not asking for review until I have a chance to confirm.
Comment 41 Nathan Froyd [:froydnj] 2013-07-31 08:33:41 PDT
Created attachment 783783 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

This script addresses all of Ted and gps's feedback.  I think it's ready for
a round of closer scrutiny.
Comment 42 Ted Mielczarek [:ted.mielczarek] 2013-08-14 13:06:47 PDT
Comment on attachment 783783 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

Review of attachment 783783 [details] [diff] [review]:
-----------------------------------------------------------------

I have a lot of comments, most of them are trivial. The only thing I'm not comfortable with as it stands is the "run the whole Python harness in each thread" part.

::: layout/tools/reftest/runreftest.py
@@ +5,5 @@
>  """
>  Runs the reftest test harness.
>  """
>  
> +import re, sys, shutil, os, os.path, threading, subprocess, collections, multiprocessing

Would you mind splitting these to separate lines while you're here?

@@ +33,5 @@
> +# Python's print is not threadsafe.
> +printLock = threading.Lock()
> +
> +class ReftestThread(threading.Thread):
> +  def __init__(self, cmdlineArgs):

Maybe we should take this opportunity of you rewriting 90% of this file to re-indent everything to 4 spaces?

@@ +38,5 @@
> +    threading.Thread.__init__(self)
> +    self.cmdlineArgs = cmdlineArgs
> +    self.summaryMatches = {}
> +    self.retcode = -1
> +    for (text, _) in summaryLines:

nit: unnecessary parens

@@ +42,5 @@
> +    for (text, _) in summaryLines:
> +      self.summaryMatches[text] = None
> +
> +  def run(self):
> +    process = subprocess.Popen(self.cmdlineArgs, stdout=subprocess.PIPE)

Running a whole new Python per-Thread seems pretty heavyweight. Is there a reason you went with this approach? It seems like you could just execute runSerialTests here (that's basically what mihneadb wound up doing in bug 887054). Clearly you'd have to instantiate a separate RefTest object per-Thread, but that doesn't seem horrible.

@@ +54,5 @@
> +    Individual test results--anything between 'REFTEST TEST-START' and
> +    'REFTEST TEST-END' lines--are an atomic unit.  Lines with data from
> +    summaries are parsed and the data stored for later aggregation.
> +    Other lines are considered their own atomic units and are permitted
> +    to intermix freely."""

I can't wait for someone to rewrite this using chmanchester's structured logging stuff. You should note here and in reftest.js that if someone changes the format that these will have to be updated in sync.

@@ +58,5 @@
> +    to intermix freely."""
> +    testStartRegex = re.compile("^REFTEST TEST-START")
> +    testEndRegex = re.compile("^REFTEST TEST-END")
> +    summaryHeadRegex = re.compile("^REFTEST INFO \\| Result summary:")
> +    global summaryLines

This shouldn't need to be declared global if you're only reading from it, not writing to it.

@@ +59,5 @@
> +    testStartRegex = re.compile("^REFTEST TEST-START")
> +    testEndRegex = re.compile("^REFTEST TEST-END")
> +    summaryHeadRegex = re.compile("^REFTEST INFO \\| Result summary:")
> +    global summaryLines
> +    summaryRegexStrings = ["^REFTEST INFO \\| (?P<message>" + text + "): (?P<total>\\d+) " +

I'd prefer this to be string-interpolated, using either % or str.format (it's already hard to read, it's a regex).

@@ +71,5 @@
> +          chunkedLines.append(lineToBeChunked)
> +          if testEndRegex.search(lineToBeChunked) is not None:
> +            break
> +        yield ''.join(chunkedLines)
> +        continue

You could probably rewrite a lot of this function using itertools.takewhile:
http://docs.python.org/2/library/itertools.html#itertools.takewhile

But I don't know that it would be appreciably better.

@@ +219,5 @@
> +    while True:
> +      # XXX do we really need the timeouts here?
> +      for t in threads:
> +        t.join(10)
> +      if not any([t.is_alive() for t in threads]):

I didn't know it was safe to join already-joined threads. This is still a little weird since you have a timeout but can wind up ilooping anyway if a thread hangs. I guess you're relying on the fact that the harness does timeout handling and so you shouldn't get there, but maybe it'd be good to at least mention that here?

@@ +220,5 @@
> +      # XXX do we really need the timeouts here?
> +      for t in threads:
> +        t.join(10)
> +      if not any([t.is_alive() for t in threads]):
> +        break

We should seriously factor out "wait for N worker threads to complete" into a library, since bug 887054 implemented something similar. Yours is slightly less complicated than his because you don't have to feed the work queue after you start

@@ +238,5 @@
> +    for (summaryObj, (text, categories)) in zip(summaryObjects, summaryLines):
> +      details = ', '.join(["%d %s" % (summaryObj[attribute], description) for (attribute, description) in categories])
> +      print 'REFTEST INFO | ' + text + ': ' + str(summaryObj['total']) + ' (' +  details + ')'
> +
> +    return int(any([t.recode != 0 for t in threads]))

You don't need the [] here, that just creates an unnecessary intermediate list.
Comment 43 Nathan Froyd [:froydnj] 2013-08-14 13:34:35 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #42)
> @@ +33,5 @@
> > +# Python's print is not threadsafe.
> > +printLock = threading.Lock()
> > +
> > +class ReftestThread(threading.Thread):
> > +  def __init__(self, cmdlineArgs):
> 
> Maybe we should take this opportunity of you rewriting 90% of this file to
> re-indent everything to 4 spaces?

I can do that, but I'll do that in a separate patch somewhere; I don't think I'm touching *that* much. :)

> @@ +42,5 @@
> > +    for (text, _) in summaryLines:
> > +      self.summaryMatches[text] = None
> > +
> > +  def run(self):
> > +    process = subprocess.Popen(self.cmdlineArgs, stdout=subprocess.PIPE)
> 
> Running a whole new Python per-Thread seems pretty heavyweight. Is there a
> reason you went with this approach? It seems like you could just execute
> runSerialTests here (that's basically what mihneadb wound up doing in bug
> 887054). Clearly you'd have to instantiate a separate RefTest object
> per-Thread, but that doesn't seem horrible.

I think I did this for two reasons:

- option parsing seemed easier.  I was not totally sure that deep-copying options objects would DTRT (since each thread needs its own copy).  This is a pretty minor concern.

- I didn't want to run things and find out that the automation code depended in deep, subtle ways upon only running in a single thread.  Running separate processes seemed like the easiest way of Not Having To Think About It.

> @@ +54,5 @@
> > +    Individual test results--anything between 'REFTEST TEST-START' and
> > +    'REFTEST TEST-END' lines--are an atomic unit.  Lines with data from
> > +    summaries are parsed and the data stored for later aggregation.
> > +    Other lines are considered their own atomic units and are permitted
> > +    to intermix freely."""
> 
> I can't wait for someone to rewrite this using chmanchester's structured
> logging stuff. You should note here and in reftest.js that if someone
> changes the format that these will have to be updated in sync.

I will do that.  Is that structured logging coming Real Soon Now?

> @@ +71,5 @@
> > +          chunkedLines.append(lineToBeChunked)
> > +          if testEndRegex.search(lineToBeChunked) is not None:
> > +            break
> > +        yield ''.join(chunkedLines)
> > +        continue
> 
> You could probably rewrite a lot of this function using itertools.takewhile:
> http://docs.python.org/2/library/itertools.html#itertools.takewhile
> 
> But I don't know that it would be appreciably better.

I can look into that, that seems like a small improvement.

> @@ +220,5 @@
> > +      # XXX do we really need the timeouts here?
> > +      for t in threads:
> > +        t.join(10)
> > +      if not any([t.is_alive() for t in threads]):
> > +        break
> 
> We should seriously factor out "wait for N worker threads to complete" into
> a library, since bug 887054 implemented something similar. Yours is slightly
> less complicated than his because you don't have to feed the work queue
> after you start

Will file a followup.
Comment 44 Gregory Szorc [:gps] 2013-08-14 17:24:11 PDT
chmanchester is landing structured logging for xpcshell tests in bug 896087. It should hopefully land tomorrow, so it's ready. He can give you some pointers.
Comment 45 Ted Mielczarek [:ted.mielczarek] 2013-08-14 17:57:42 PDT
I think there's enough going on in this patch to not add that on top (especially since his patches don't add anything to reftest). I'd be happy to punt that to a followup.
Comment 46 Chris Manchester (:chmanchester) 2013-08-19 09:53:18 PDT
I took a look at the Javascript and python sides of reftests, and what's happening in this bug. The Javascript side should be fairly straightforward with some work, while Python consumption of structured messages will require more significant changes as long as reftests use automation.py.

The issue with log consumption from Python is that by using automation.py, we get its log parsing and responses to output for free, but we're tied to its approach. I've been working on a way to extract this for mochitests in bug 886570, but I'm not sure there's a future proof and clean solution to this as long as we're using automation.py.

Once we are free to extend the parsing logic of whatever runs the tests, the python side of this is just a matter of parsing each JSON message into a dict and doing whatever formatting is needed for downstream consumers.

From the javascript side, the approximately 50 call sites of gDumpLog in reftest.js that will each need to be converted to an appropriate call to a structured log function. Much of this is more or less mechanical, but it requires thought to determine the set of log actions we'll find useful, and which parts of the generated messages should be structured.

Log4Moz supports structured output, however this adds a dependency. We decided in the case of xpcshell to isolate the logging activity in head.js to a simple formatting function that just adds a bit of metadata to an object and writes it to standard output. Performance was a factor there, so we'll have to keep this in mind.

A note on interleaved output:
I noticed when working on top of Mihnea's patches form bug 887054 that output was pretty confusing if the output from a failing test wasn't written in a continuous block, so I used a lock to maintain coherent blocks of output (and keep other other threads from writing within these blocks). While this seems like a pattern for harnesses running in parallel, and something we could think about implementing in mozlog, at the moment there's a lot of harness-specific logic that goes into how our output should be chunked, and any library implementation might not end up giving significant benefits over a simple lock.
Comment 47 Chris Manchester (:chmanchester) 2013-08-20 10:10:33 PDT
Filed bug 907270 for the issue of parsing output from a process managed by automation.py
Comment 48 Nathan Froyd [:froydnj] 2013-10-22 18:21:12 PDT
Created attachment 820738 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

I think I've addressed or responded to all your feedback in comment 42.  The
only thing I don't think I've addressed is using itertools.takewhile.  I
looked at takewhile and it didn't seemed like it fit, since the sentinel
element in the iterator is removed from the iterator and not available for
examination.  This isn't what we want, so I left the code as-is.
Comment 49 Nathan Froyd [:froydnj] 2013-10-23 06:00:10 PDT
Created attachment 820975 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

Patch with some small issues from Try runs fixed up.  Everything in comment 48
still stands.
Comment 50 Nathan Froyd [:froydnj] 2013-10-23 09:21:14 PDT
Apparently this patch is really good at turning intermittent failures into perma failures.  You can follow along with the try run retriggers:

https://tbpl.mozilla.org/?tree=Try&rev=11e2c6342cd6

but it looks like several Windows intermittent oranges will have to be fixed.  (I think I understand the needs-focus.html failures and will push a fix for that soon.)

I do not understand the test-displayport-2.html failure on Linux reftest-ipc.
Comment 51 Nathan Froyd [:froydnj] 2013-10-23 13:37:57 PDT
Additional results:

- The Linux VMs we use for testing appear to be pretty impoverished; parallel reftests appear to provide no improvement and/or make things somewhat worse compared to serial reftests.

- Windows and Mac get nice speedups, as expected.

- Parallel reftests cause all sorts of problems for Win7 and Win8; perma-timeouts seem to be the norm there.  This may just be due with trying to launch 16 (!) reftest processes and running out of RAM.  We may need to scale things back a bit on Windows.

Things have gotten a lot worse on the oranges than they were 2 months ago. :(

https://tbpl.mozilla.org/?tree=Try&rev=35b35cd63fe6
Comment 52 Nathan Froyd [:froydnj] 2013-10-23 13:45:17 PDT
Created attachment 821266 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

Minor tweaks to this version: a brief blurb of output about a thread
getting launched, both for debugging and informative purposes.  We also
flush stdout after printing so the output doesn't pause in the middle of
messages.

We also serialize the needs-focus tests to run after the non-needs-focus
tests have completed.  I don't really like doing this, but trying to
launch everything simultaneously and guarantee that the focus is going
to wind up with the needs-focus testing windows is a sure fire way to
end up with intermittent oranges.  Serializing things ensures that we
can guarantee the focus ends up where it's supposed to be.

Given the intermittents that are cropping up here (thanks, Windows layers
rewrite!...I think), this patch can't go in, but maybe we can at least
have an r+'d patch ready to go when all the intermittents have been ironed
out. =/
Comment 53 :Ehsan Akhgari (busy, don't ask for review please) 2013-10-23 14:22:25 PDT
Can we turn this on for one platform initially, e.g., Mac?
Comment 54 Nathan Froyd [:froydnj] 2013-10-23 18:06:20 PDT
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #53)
> Can we turn this on for one platform initially, e.g., Mac?

10.8 debug reftests look like they're hitting some set of weird issues and are pretty orange.

I wouldn't mind turning this on for a subset of platforms, but we've got to get a green subset that's not Linux first. :(
Comment 55 Ted Mielczarek [:ted.mielczarek] 2013-10-24 05:21:08 PDT
I'll review this shortly, but if nothing else we can land this with the default set to "not parallel", so your patch doesn't bitrot and it's easy to test on try.
Comment 56 Ted Mielczarek [:ted.mielczarek] 2013-12-12 12:39:03 PST
Comment on attachment 821266 [details] [diff] [review]
add --run-tests-in-parallel option to reftests

Review of attachment 821266 [details] [diff] [review]:
-----------------------------------------------------------------

Mea culpa for the extreme review lag here, that's unacceptable.

::: layout/tools/reftest/runreftest.py
@@ +225,5 @@
> +    totalJobs = jobsWithoutFocus + 1
> +    perProcessArgs = [sys.argv[:] for i in range(0, totalJobs)]
> +
> +    # First job is only needs-focus tests.  Remaining jobs are non-needs-focus and chunked.
> +    perProcessArgs[0][-1:-1] = ["--focus-filter-mode=needs-focus"]

You can do .insert(-1, "--focus-filter-mode=needs-focus").

@@ +229,5 @@
> +    perProcessArgs[0][-1:-1] = ["--focus-filter-mode=needs-focus"]
> +    for (chunkNumber, jobArgs) in enumerate(perProcessArgs[1:], start=1):
> +      jobArgs[-1:-1] = ["--focus-filter-mode=non-needs-focus",
> +                        "--total-chunks=%d" % jobsWithoutFocus,
> +                        "--this-chunk=%d" % chunkNumber]

I wonder if it wouldn't look nicer to just do this all up in the assignment to perProcessArgs instead, like:
perProcessArgs = [["--focus-filter-mode=needs-focus" + sys.argv[1:]] + [["--focus-filter-mode=non-needs-focus",
		"--total-chunks=%d" % jobsWithoutFocus,
		"--this-chunk=%d" % i] + sys.argv[1:] for i in range(1, totalJobs)]

@@ +237,5 @@
> +        jobArgs.remove("--run-tests-in-parallel")
> +      except:
> +        pass
> +      jobArgs[-1:-1] = ["--no-run-tests-in-parallel"]
> +      jobArgs[0:0] = [sys.executable, "-u"]

I still wish we weren't reinvoking Python here, but we can punt that to a followup.

@@ +247,5 @@
> +    while True:
> +      # The test harness in each individual thread will be doing timeout
> +      # handling on its own, so we shouldn't need to worry about any of
> +      # the threads hanging for arbitrarily long.
> +      # XXX do we really need the timeouts here?

Just ditch this comment.
Comment 57 Nathan Froyd [:froydnj] 2014-01-15 06:43:07 PST
Created attachment 8360424 [details] [diff] [review]
add --parallel option for mach reftest-esque things

And a mach piece, nothing complicated.

Note You need to log in before you can comment on or make changes to this bug.