Bug 827960 (Closed) - Opened 12 years ago, Closed 11 years ago

Run jit-tests in parallel

Categories

(Core :: JavaScript Engine, defect)

Priority: Not set
Severity: major

Tracking

RESOLVED FIXED
mozilla21

People

(Reporter: decoder, Assigned: decoder)

References

Details

(Whiteboard: [capacity][buildfaster:1])

Attachments

(1 file, 3 obsolete files)

Attached patch Patch (obsolete) — Splinter Review
The attached patch adds a --parallel option to jit_test.py which allows running the tests in parallel. The patch was tested on all platforms that the Try server offers, and all builds succeeded (Windows required a little more effort; see the comments in the patch).

With this patch, an optimized Linux64 build on Try runs the jit-tests in 6-7 minutes while the non-parallel version needs over 26 minutes.

I tried to refactor the code a bit to avoid too much duplication, but there is still some duplicated code in the run_test method. I'm not sure it's worth polishing this much further if the code is going to be merged with jstests eventually anyway.
Attachment #699366 - Flags: review?(terrence)
While I applaud the effort, this is already tracked in bug 638219. Please upload your patch there.

I will be cancelling the review momentarily.

From a cursory look at it, I wouldn't be surprised if there were pushback over the use of multiprocessing.Pool. There is already a generic parallel process execution "framework" in place in the existing JS tests. The latest discussion on bug 638219 implied that this would be used instead of multiprocessing. So I kinda expect this patch not to attain r+ on those grounds.

Anyway, read bug 638219 and please try to get this in the tree as soon as possible. This is one of the lowest hanging fruits to save build resources!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Attachment #699366 - Flags: review?(terrence)
This bug is not meant to replace bug 638219, but to solve the parallel execution problem for jit-tests *now*, until bug 638219 is eventually ready to land (which it isn't right now).

Terrence therefore suggested I open this as a separate bug, because even once this is on the tree, we still want bug 638219 to happen.

On a side note, I consider it quite rude to just close/dupe a bug with a working patch that is tested on all OSes, and to claim I did not read bug 638219, without asking about the background first.
Comment on attachment 699366 [details] [diff] [review]
Patch

Review of attachment 699366 [details] [diff] [review]:
-----------------------------------------------------------------

First, I want to re-apologize on the bug for duping without consultation. I'm atoning by giving a drive-by review!

I hope this lands in some form! My biggest concern is over use of multiprocessing.Pool. It has known major suckitude. You should definitely ensure that it responds to whatever mechanism buildbot uses to kill jobs.

General nit: trailing whitespace everywhere.

::: js/src/jit-test/jit_test.py
@@ +11,5 @@
>  from subprocess import *
>  from threading import Thread
> +import traceback
> +
> +from multiprocessing import Process, Pool, Queue, Manager

This will fail on BSDs. See bug 819048 and bug 808280.

You will need to handle the scenario where multiprocessing.{Pool, Queue} fail to import and fall back to serial execution.

Furthermore, multiprocessing.Pool doesn't handle SIGINT (KeyboardInterrupt) properly. See links in aforementioned bug. I see you are doing something with KeyboardInterrupt. I didn't have time to verify it is correct. The code I wrote at https://github.com/indygreg/hg-git/blob/performance-next/hggit/hg2git.py is AFAIK a robust process pool implementation using multiprocessing. Please steal code if needed!

The existing multi-process code for JS tests does seem to be quite robust, although it is somewhat difficult to integrate with in its current form.

@@ +282,5 @@
> +    # available in the childs, while on Linux and OSX this is the case (because of fork).
> +    global OPTIONS
> +    global JS
> +    OPTIONS = globOPTIONS
> +    JS = globJS

See http://docs.python.org/2/library/multiprocessing.html#windows

Another reason why I don't like multiprocessing.

@@ +615,5 @@
>                    help='Run tests with --ion flag (ignores --jitflags)')
>      op.add_option('--tbpl', dest='tbpl', action='store_true',
>                    help='Run tests with all IonMonkey option combinations (ignores --jitflags)')
> +    op.add_option('--parallel', dest='parallel', action='store_true',
> +                  help='Run tests in parallel.')

default=True?

@@ +617,5 @@
>                    help='Run tests with all IonMonkey option combinations (ignores --jitflags)')
> +    op.add_option('--parallel', dest='parallel', action='store_true',
> +                  help='Run tests in parallel.')
> +    op.add_option('--max-jobs', dest='max_jobs',  type=int, default=None,
> +                  help='Maximum number of jobs to run in parallel with --parallel. Defaults to number of CPU cores.')

Since these are short-lived processes (typically), I'd default to > #cores (assuming it is faster) to minimize unused cycles during new process init.
Attachment #699366 - Flags: feedback+
(In reply to Gregory Szorc [:gps] from comment #3)
> Comment on attachment 699366 [details] [diff] [review]
> Patch
> 
> Review of attachment 699366 [details] [diff] [review]:
> -----------------------------------------------------------------
> General nit: trailing whitespace everywhere.

Thanks. I'll take care of that.

> 
> ::: js/src/jit-test/jit_test.py
> @@ +11,5 @@
> >  from subprocess import *
> >  from threading import Thread
> > +import traceback
> > +
> > +from multiprocessing import Process, Pool, Queue, Manager
> 
> This will fail on BSDs. See bug 819048 and bug 808280.
> 
> You will need to handle the scenario where multiprocessing.{Pool, Queue}
> fail to import and fall back to serial execution.

Taking care of that too, thanks for pointing that out.

> 
> Furthermore, multiprocessing.Pool doesn't handle SIGINT (KeyboardInterrupt)
> properly. See links in aforementioned bug. I see you are doing something
> with KeyboardInterrupt. I didn't have time to verify it is correct. The code
> I wrote at
> https://github.com/indygreg/hg-git/blob/performance-next/hggit/hg2git.py is
> AFAIK a robust process pool implementation using multiprocessing. Please
> steal code if needed!

Yes, I hit exactly this problem with my original approach. The basic problem is that you cannot .join() on the pool and block, because during that blocking no KeyboardInterrupt will be processed (major suckage...). So I solved it by making sure that the .join() call never blocks, because all jobs are already done by the time it is called.
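A minimal sketch of that workaround, using hypothetical names (run_one, run_all) rather than the actual functions from the patch: the main process drains one result per submitted job from a Manager queue before calling join(), so join() returns immediately and KeyboardInterrupt is still caught while waiting on the queue.

from multiprocessing import Manager, Pool

def run_one(test, result_queue):
    # Stand-in for the real per-test worker: run the shell, report the result.
    result_queue.put((True, test))

def run_all(tests):
    manager = Manager()
    result_queue = manager.Queue()
    pool = Pool()
    try:
        for test in tests:
            pool.apply_async(run_one, (test, result_queue))
        pool.close()
        # Drain one result per submitted job. Once this loop finishes, every
        # worker is idle, so the join() below returns immediately instead of
        # blocking (and silently swallowing KeyboardInterrupt).
        results = [result_queue.get() for _ in range(len(tests))]
    except KeyboardInterrupt:
        pool.terminate()
        raise
    pool.join()
    return results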

> 
> The existing multi-process code for JS tests does seem to be quite robust.
> Although, it is somewhat difficult to integrate with in current form.

I didn't look at that code but I guess that's what bug 638219 will work towards then and replace my hackish approach.

> > +    op.add_option('--parallel', dest='parallel', action='store_true',
> > +                  help='Run tests in parallel.')
> 
> default=True?

I can do that; I just thought keeping the default behavior sequential would prevent bad surprises for developers who aren't used to these tests sucking up all their CPU cores.


> > +    op.add_option('--max-jobs', dest='max_jobs',  type=int, default=None,
> > +                  help='Maximum number of jobs to run in parallel with --parallel. Defaults to number of CPU cores.')
> 
> Since these are short-lived processes (typically), I'd default to > #cores
> (assuming it is faster) to minimize unused cycles during new process init.

Ok. What do you suggest? 2*#cores?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee: general → choller
Status: REOPENED → ASSIGNED
Comment on attachment 699366 [details] [diff] [review]
Patch

Review of attachment 699366 [details] [diff] [review]:
-----------------------------------------------------------------

General comments: (1) this is quite nice and we definitely want to take it as soon as possible; (2) remove all the trailing whitespace; (3) test cancelling a try run in-progress as Gregory suggested to make sure we don't break everything irreparably.

That said, as with any code in our tree there is the danger that it will still be here, driving our infrastructure in 5 years, despite our best intentions to replace it. For that reason, I've made some fairly extensive suggestions which I'd like to see again before I give final check-off on this.

::: js/src/jit-test/jit_test.py
@@ +9,5 @@
>  import datetime, os, re, sys, tempfile, traceback, time, shlex
>  import subprocess
>  from subprocess import *
>  from threading import Thread
> +import traceback

Remove: |traceback| is already included in the import list above.

@@ +11,5 @@
>  from subprocess import *
>  from threading import Thread
> +import traceback
> +
> +from multiprocessing import Process, Pool, Queue, Manager

As Gregory mentioned, you need to be prepared for this to fail.

The optimal technique is to make compatible interfaces available if multiprocessing is not present: e.g. from queue import Queue; from threading import Thread as Process, etc. This won't work in this case because we need Manager.

The slightly less optimal technique would be to move the import right next to the code that uses it, and exit early if the import fails. That also won't work in this case, because of how multiprocessing forces you to organize your code.

That leaves us with something like this:

try:
  from multiprocessing import Process, Pool, Queue, Manager
  HAVE_MULTIPROCESSING = True
except ImportError:
  HAVE_MULTIPROCESSING = False

And check for HAVE_MULTIPROCESSING before calling run_tests_parallel.
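A follow-up sketch of that check at the call site (run_tests as the name of the existing serial runner is an assumption here, not a name from the patch):

# Sketch only: fall back to the serial runner if multiprocessing is unusable.
if OPTIONS.parallel and HAVE_MULTIPROCESSING:
    ok = run_tests_parallel(tests, test_dir, lib_dir, shell_args)
else:
    ok = run_tests(tests, test_dir, lib_dir, shell_args)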

@@ +277,5 @@
>          result += ": " + message
>      print result
>  
> +def wrap_parallel_run_test(test, lib_dir, shell_args, resultQueue, globOPTIONS, globJS):
> +    # This is necessary because on Windows, global variables are not automatically

The comma in this line is unnecessary.

@@ +278,5 @@
>      print result
>  
> +def wrap_parallel_run_test(test, lib_dir, shell_args, resultQueue, globOPTIONS, globJS):
> +    # This is necessary because on Windows, global variables are not automatically
> +    # available in the childs, while on Linux and OSX this is the case (because of fork).

Childs should be children.

@@ +299,5 @@
> +    except KeyboardInterrupt:
> +        pass
> +
> +def run_tests_parallel(tests, test_dir, lib_dir, shell_args):
> +    # Refactor this part into setup_progress_bar

You might as well just do this in the reporter process once you've commoned the reporting code up (see below).

@@ +310,5 @@
> +            {'value': 'SKIP',    'color': 'brightgray'},
> +        ]
> +        pb = ProgressBar(len(tests), fmt)
> +    
> +    

Only one line break here.

@@ +335,5 @@
> +        
> +        # Wait until all remaining tests have run
> +        #
> +        # NOTE 1: This is required because an exception in the parallelized function
> +        #         will go unnoticed and won't be propagated which is undesired

This is missing a comma before which and missing a period at the end.

@@ +365,5 @@
> +        result_process.terminate()
> +    
> +    return False
> +        
> +def process_test_results_parallel(pb, async_test_result_queue, return_queue, globOPTIONS, globJS):

A generator is what you need here to avoid having to duplicate the test result processing. Also, just take in OPTIONS and JS as |options| and |js| and don't use the globals in this function.

I have in mind something like:

# The first argument |results| is a generator function that we can treat just like a generic list.
def process_test_results(results, options, js):
  pb = ... build progress bar ...
  failures = []
  ...
  try:
    for i, (ok, out, err, code, timed_out, test) in enumerate(results):
      ... process one result ...
  except KeyboardInterrupt:
    pass
  pb.finish()
  ... etc ...
  return print_test_summary(failures, complete, doing)

# Then we can encapsulate all the specialized parallel fetching into the generator.
def get_parallel_results(async_test_result_queue):
  while True:
    result = async_test_result_queue.get()
    if result is None:
      return
    yield (
        result['ok'],
        result['out'],
        ... etc ...
      )

# Then the Process just becomes a dispatch to the shared code with the right generator.
def process_test_results_parallel(async_test_result_queue, return_queue, options, js):
  gen = get_parallel_results(async_test_result_queue)
  ok = process_test_results(gen, options, js)
  return_queue.put(ok)

# And we can make the existing single-threaded runner use the new code by giving it a generator that just calls run_test.
def get_serial_results(tests, lib_dir, shell_args):
  for test in tests:
    yield run_test(test, lib_dir, shell_args)

Let me know if you have any questions.

@@ +617,5 @@
>                    help='Run tests with all IonMonkey option combinations (ignores --jitflags)')
> +    op.add_option('--parallel', dest='parallel', action='store_true',
> +                  help='Run tests in parallel.')
> +    op.add_option('--max-jobs', dest='max_jobs',  type=int, default=None,
> +                  help='Maximum number of jobs to run in parallel with --parallel. Defaults to number of CPU cores.')

Instead of having two options, let's do what jstests.py does here:
    harness_og.add_option('-j', '--worker-count', type=int,
                          default=max(1, get_cpu_count()),
                          help='Number of tests to run in parallel (default %default)')

Then disabling parallelism would just be -j1, and you get parallelism by default. This has the added benefit that people will not have to learn a new way to run tests.
(In reply to Christian Holler (:decoder) from comment #4)
> (In reply to Gregory Szorc [:gps] from comment #3)
> > The existing multi-process code for JS tests does seem to be quite robust.
> > Although, it is somewhat difficult to integrate with in current form.
> 
> I didn't look at that code but I guess that's what bug 638219 will work
> towards then and replace my hackish approach.

Yes, I rewrote the unix runner into a standard unix child dispatch loop because I was having issues with threads+subprocess freezing.
 
> > > +    op.add_option('--parallel', dest='parallel', action='store_true',
> > > +                  help='Run tests in parallel.')
> > 
> > default=True?
> 
> I can do that, I just thought keeping the default behavior sequential would
> prevent bad surprises for developers when they're not used to these tests
> sucking up all their cpu cores.

Trust me, that will be a /pleasant/ surprise!
 
> > > +    op.add_option('--max-jobs', dest='max_jobs',  type=int, default=None,
> > > +                  help='Maximum number of jobs to run in parallel with --parallel. Defaults to number of CPU cores.')
> > 
> > Since these are short-lived processes (typically), I'd default to > #cores
> > (assuming it is faster) to minimize unused cycles during new process init.
> 
> Ok. What do you suggest? 2*#cores?

I have run into problems in the ref-test runner with tests timing out when I push this up too far. Jit-tests are generally shorter, but it's still something to keep in mind.
> import datetime, os, re, sys, tempfile, traceback, time, shlex
> import subprocess
> from subprocess import *
> from threading import Thread

Why do we import subprocess then from subprocess import * again - isn't this a little redundant?

Also, should the imports be on one line for each import? For this style preference, I'd defer to gps / Terrence here.
(In reply to Gary Kwong [:gkw] from comment #7)
> > import datetime, os, re, sys, tempfile, traceback, time, shlex
> > import subprocess
> > from subprocess import *
> > from threading import Thread
> 
> Why do we import subprocess then from subprocess import * again - isn't this
> a little redundant?

Yeah, this is pretty suboptimal, but it works well enough as a horrible hack. The first one gets you the subprocess module name; the other gets you all of the fiddly subprocess flags. This way you don't have to write subprocess.FLAG_ONE|subprocess.FLAG_TWO|subprocess.FLAG_THREE|etc in your subprocess.Popen command, but you can still call Popen as subprocess.Popen.
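A tiny illustration of that pattern (the echo command here is just an example, not code from the patch):

import subprocess
from subprocess import *

# Bare names come from the wildcard import...
p = Popen(['echo', 'hi'], stdout=PIPE)
# ...while the qualified form still works thanks to "import subprocess".
q = subprocess.Popen(['echo', 'hi'], stdout=subprocess.PIPE)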
 
> Also, should the imports be on one line for each import? For this style
> preference, I'd defer to gps / Terrence here.

Yes and no. We should fix it when we replace this code, but there's no point churning working code until then. In particular, I'd like to avoid gratuitously conflicting with Dirkjan.
Attached patch Patch v2 (obsolete) — Splinter Review
New patch, with the review and feedback comments addressed. I also ensured that SIGTERM is handled properly (like SIGINT) and that SIGINT is ignored in the child processes (preventing the ugly exceptions that sometimes showed up on keyboard interrupt).

I'll also give this another try run on Linux, Mac and Windows to ensure it's still green there.
Attachment #699366 - Attachment is obsolete: true
Attachment #699506 - Flags: review?(terrence)
Generally Pythonic style frowns on "from foo import *", FWIW.
We generally adhere to PEP-8 style. I personally use flake8 (http://pypi.python.org/pypi/flake8/) to quickly audit my Python for style violations.
Comment on attachment 699506 [details] [diff] [review]
Patch v2

Review of attachment 699506 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jit-test/jit_test.py
@@ +294,5 @@
> +    ret['out'] = out
> +    ret['err'] = err
> +    ret['code'] = code
> +    ret['timed_out'] = timed_out
> +    ret['test'] = test

It may be better to keep this as a tuple and just put tuples in the list.

@@ +384,5 @@
> +            async_test_result['out'],
> +            async_test_result['err'],
> +            async_test_result['code'],
> +            async_test_result['timed_out'],
> +            async_test_result['test']

If you put raw tuples in the results queue, you can just yield them here and not have to re-type all the fields.

@@ +452,5 @@
> +    complete = False
> +    doing = 'before starting'
> +    try:
> +        for i, (ok, out, err, code, timed_out, test) in enumerate(results):
> +            #doing = 'on %s'%test.path

Just cut this line.

@@ +491,5 @@
> +
> +def get_serial_results(tests, lib_dir, shell_args):
> +    for test in tests:
> +        ok, out, err, code, timed_out = run_test(test, lib_dir, shell_args)
> +        yield (ok, out, err, code, timed_out, test)

You can just keep this tuple in tuple form, no need to unpack and repack it.  Something like this should work:

result = run_test(test, lib_dir, shell_args)
yield result + (test,)
Attachment #699506 - Flags: review?(terrence) → review+
Attached patch Patch v3 (obsolete) — Splinter Review
Minor bugfix for Windows, and the default number of jobs is reduced to the number of cores again (otherwise Windows debug builds fail). Keeping the r+ from the last review, and I will land this on inbound shortly :)
Attachment #699506 - Attachment is obsolete: true
Attachment #700083 - Flags: review+
Backed out: http://hg.mozilla.org/integration/mozilla-inbound/rev/f01f7b2cd99a

I asked bz to back it out again because it's causing purple on Windows debug (where it was green right before landing). The problem is not resource exhaustion, as we thought earlier, but something else that only happens on Windows debug, and reproducibly so: the jit-test run stops producing output for 5 minutes and then gets shut down. All other platforms (release and debug) are fine.

I'll set up a Windows VM here and try to reproduce locally with a debug build.
Some timings of the buildbot "check test" step from before this was backed out (obviously includes all of make check not just jit-tests):

Without patch...
Linux64 opt: 20mins
Linux64 debug: 60mins

With patch...
Linux64 opt: 11mins
Linux64 debug: 20mins

Pretty sweet given that the win above was only on a subset of make check :-)
Whiteboard: [capacity][buildfaster:1]
Attached patch Patch v4Splinter Review
Here's an improved (fixed) version of the patch. Multiple things were wrong, not all of them related to parallel execution:

1. gc/bug-787703.js is extremely slow on Windows debug (1 minute per test run), and running everything in parallel seems to exceed the 300-second time limit on try. I therefore decided to mark it as slow.

2. On Windows, the timeout for a test wasn't enforced: the code kills the JS binary only on Linux and Mac, and on Windows there was no code to do so. I added code for this (using ctypes) and verified that it works on Try and locally. This also affects sequential execution, and it may eliminate some of the intermittent Windows orange previously seen with jit-tests.

3. I finally removed Pool and replaced it with my own implementation. Pool wasn't responsible for the failure here, but it wasn't playing nicely with signals under Windows. My implementation basically spawns an initial set of processes (still using multiprocessing) and then spawns one new process per item it receives from a notification queue. Every finishing process adds something to this queue, so no visible polling is required. This makes it noticeably faster than explicit polling (a minimal sketch of this dispatch loop follows the list).

4. Child processes now use signal handlers to properly terminate the JS binary on SIGINT and SIGTERM (this also works on Windows). However, if buildbot immediately hits the script with SIGKILL, child processes will remain (there is no way to avoid that).

5. Small misc bugfixes.
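A minimal sketch of the dispatch loop from point 3, with hypothetical names (run_one, run_parallel) standing in for the real jit-test code: an initial batch of workers is started, and each result arriving on the notification queue triggers the start of one more worker, so the main loop blocks on the queue instead of polling.

from multiprocessing import Manager, Process

def run_one(test):
    # Stand-in for the real code that executes the JS shell for |test|.
    return (True, test)

def worker(test, notify_queue):
    # Every finishing worker puts its result on the queue, waking the dispatcher.
    notify_queue.put(run_one(test))

def run_parallel(tests, max_jobs):
    manager = Manager()
    notify_queue = manager.Queue()
    # Seed with the initial batch of workers.
    next_test = min(max_jobs, len(tests))
    workers = [Process(target=worker, args=(t, notify_queue))
               for t in tests[:next_test]]
    for w in workers:
        w.start()
    results = []
    while len(results) < len(tests):
        results.append(notify_queue.get())   # block until some worker finishes
        if next_test < len(tests):
            w = Process(target=worker, args=(tests[next_test], notify_queue))
            workers.append(w)
            w.start()
            next_test += 1
    for w in workers:
        w.join()
    return results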
Attachment #700083 - Attachment is obsolete: true
Attachment #700761 - Flags: review?(terrence)
Comment on attachment 700761 [details] [diff] [review]
Patch v4

Review of attachment 700761 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/jit_test.py
@@ +382,5 @@
> +            if (testcnt < len(tests)):
> +                # Start one new worker
> +                worker_process = Process(target=wrap_parallel_run_test, args=(tests[testcnt], lib_dir, shell_args, async_test_result_queue, OPTIONS, JS))
> +                worker_processes.append(worker_process)
> +                worker_process.start()

If I am reading this correctly, you create a new process for every test job to execute. This process then creates another process to run the actual test. Then, the first process exits.

I think we can do better. I'd like to see persistent worker processes. They sit inside a while loop waiting for new jobs or a shutdown signal. This significantly cuts down on the number of new processes. If nothing crashes, you will run N + # cores new processes instead of 2N. On Windows, this will make a world of difference due to new process overhead.

This approach is what multiprocessing.Pool uses internally. On IRC, you mentioned your patch was slower than multiprocessing.Pool. I reckon this is why.

That being said, even with this inefficiency, we're already faster than serial. More efficient process usage could be pushed to a follow-up bug.
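For comparison, a minimal sketch of the persistent-worker pattern described above (all names are hypothetical, not from the patch): each worker loops over a shared task queue and exits on a None sentinel, so roughly #cores processes are created in total regardless of how many tests there are.

from multiprocessing import Manager, Process, cpu_count

def run_one(test):
    # Stand-in for the real code that executes the JS shell for |test|.
    return (True, test)

def worker_loop(task_queue, result_queue):
    while True:
        test = task_queue.get()
        if test is None:             # sentinel: no more work, shut down
            return
        result_queue.put(run_one(test))

def run_parallel(tests, jobs=None):
    jobs = jobs or cpu_count()
    manager = Manager()
    task_queue = manager.Queue()
    result_queue = manager.Queue()
    for test in tests:
        task_queue.put(test)
    for _ in range(jobs):
        task_queue.put(None)         # one sentinel per worker
    workers = [Process(target=worker_loop, args=(task_queue, result_queue))
               for _ in range(jobs)]
    for w in workers:
        w.start()
    results = [result_queue.get() for _ in range(len(tests))]
    for w in workers:
        w.join()
    return results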
(In reply to Gregory Szorc [:gps] from comment #18)

> This approach is what multiprocessing.Pool uses internally. On IRC, you
> mentioned your patch was slower than multiprocessing.Pool. I reckon this is
> why.

Agreed that your approach will be faster :) I measured this approach vs. Pool and the difference got much smaller after I implemented it with the notification queue. So yes, there will be an additional gain, but it'll be less than expected I guess.

> That being said, even with this inefficiency, we're already faster than
> serial. More efficient process usage could be pushed to a follow-up bug.

I would appreciate that, because I think it would take a few more days to get this working properly on all OSes and I don't have much more time to invest right now (though I may be willing to rewrite this again at some point, if it hasn't already been superseded by the JS test harness merge by then).
Comment on attachment 700761 [details] [diff] [review]
Patch v4

Review of attachment 700761 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Christian Holler (:decoder) from comment #19)
> (In reply to Gregory Szorc [:gps] from comment #18)
> 
> > This approach is what multiprocessing.Pool uses internally. On IRC, you
> > mentioned your patch was slower than multiprocessing.Pool. I reckon this is
> > why.
> 
> Agreed that your approach will be faster :) I measured this approach vs.
> Pool and the difference got much smaller after I implemented it with the
> notification queue. So yes, there will be an additional gain, but it'll be
> less than expected I guess.
> 
> > That being said, even with this inefficiency, we're already faster than
> > serial. More efficient process usage could be pushed to a follow-up bug.
> 
> I would appreciate that because I think this would require a few more days
> to work properly on all OSes and I don't have much more time to invest right
> now (though maybe another time I'm willing to rewrite this again if it
> hasn't been superseeded already by the JS tests harness merge then).

What you have here should be perfect until Dirkjan integrates the two test suites. Once that is complete, we can use the reftests task system, which doesn't spawn any excess processes at all.

::: js/src/jit-test/jit_test.py
@@ +335,5 @@
> +def run_tests_parallel(tests, test_dir, lib_dir, shell_args):
> +    # This queue will contain the results of the various tests run.
> +    # We could make this queue a global variable instead of using
> +    # a manager to share, but this will not work on Windows.
> +    #async_test_result_queue_manager = Manager()

Remove the commented line.

@@ +356,5 @@
> +    # to terminate all child processes.
> +    sigint_handler = signal.getsignal(signal.SIGINT)
> +    signal.signal(signal.SIGTERM, sigint_handler)
> +
> +    #async_test_results = []

Remove.

@@ +384,5 @@
> +                worker_process = Process(target=wrap_parallel_run_test, args=(tests[testcnt], lib_dir, shell_args, async_test_result_queue, OPTIONS, JS))
> +                worker_processes.append(worker_process)
> +                worker_process.start()
> +                testcnt += 1
> +                

Remove the trailing whitespace.
Attachment #700761 - Flags: review?(terrence) → review+
Landed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/c8a9f356050e

I'm going to open a follow-up bug to implement what gps mentioned in comment 18. Maybe I'll even get time to do it tomorrow, if it isn't too much work :)
Blocks: 829412
https://hg.mozilla.org/mozilla-central/rev/c8a9f356050e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Blocks: 829787
Blocks: 845748