Closed Bug 890015 Opened 8 years ago Closed 8 years ago

Run test-masters.sh in parallel

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sfink, Assigned: sfink)

References

Details

Attachments

(1 file, 4 obsolete files)

Running test-masters.sh is boring.
Attached patch Run test-masters.sh in parallel (obsolete) — Splinter Review
Unless I'm doing something horribly wrong, running the tests in parallel drops my time from 415 seconds to 88 seconds.

Note that setup-master.py -l dumps out a bunch of masters that immediately succeed if passed to -t. That's probably because they aren't in the test masters_list, so I'm doing a test of an empty set of masters, and that likely means that the test master list is being filtered in some way that the -l list is not.

Good thing I'm not bothering to put this up for review, since then I'd need to figure it out. This is just a FYI patch that I'm going to keep in my queue.
Attached patch Run test-masters.sh in parallel (obsolete) — Splinter Review
Updating to the latest version I'm using, since I now find it indispensable (and it hasn't caused any issues. Yet.)
Attachment #770974 - Attachment is obsolete: true
Product: mozilla.org → Release Engineering
Attached patch Run test-masters.sh in parallel (obsolete) — Splinter Review
Finally pushing a bunch of related stuff for review.
Attachment #802523 - Flags: review?(bugspam.Callek)
Attachment #775760 - Attachment is obsolete: true
Blocks: 914807
Attached patch Run test-masters.sh in parallel (obsolete) — Splinter Review
Folded in the check for unknown masters, and more importantly, the restriction to the masters passed on the command line.
Attachment #806085 - Flags: review?(bugspam.Callek)
Attachment #802523 - Attachment is obsolete: true
Attachment #802523 - Flags: review?(bugspam.Callek)
Comment on attachment 806085 [details] [diff] [review]
Run test-masters.sh in parallel

Review of attachment 806085 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you answer/address my comments.

::: setup-master.py
@@ +368,5 @@
>      assert len(master_list) > 0, "No masters specified. Bad role?"
>  
> +    if options.list or options.test:
> +        masters = filter_masters(master_list)
> +        if len(args) > 0:

Nit: Please update the docstring:
http://mxr.mozilla.org/build/source/buildbot-configs/setup-master.py#2

e.g. add an extra line under setup-master.py master_dir master_name
setup-master.py [-t|-l] [wanted_masters]... or some such

::: test-masters.sh
@@ +3,5 @@
>  # the -t option.  We use that now
>  exit=0
>  
> +# Serial: 327 seconds
> +# Parallel: 72 seconds

please drop these comments from file, they vary too much from system to system to be useful here.

@@ +8,5 @@
> +
> +mkdir test-output 2>/dev/null
> +
> +actioning="Checking"
> +MASTERS_JSON_URL='http://hg.mozilla.org/build/tools/raw-file/tip/buildfarm/maintenance/production-masters.json'

can we do:
MASTERS_JSON_URL=${MASTERS_JSON_URL:-http://hg.mozilla.org/build/tools/raw-file/tip/buildfarm/maintenance/production-masters.json}

which allows someone who wants to pass to this a different master json the ability to do so :-)

@@ +28,5 @@
> +# Fire off all the tests in parallel.
> +for MASTER in ${MASTERS[*]}; do (
> +    OUTFILE=$(mktemp)
> +
> +    ./setup-master.py -t -j "$MASTERS_JSON" "$@" $MASTER > $OUTFILE 2>&1 || echo "$MASTER" >> $FAILFILE

doesn't this mean that FAILFILE will not get proper output if more than one fail at once? not sure how much that matters but worth calling out

@@ +53,5 @@
> +    echo "*** $(wc -l < $FAILFILE) master tests failed ***" >&2
> +    echo "Failed masters:" >&2
> +    sed -e 's/^/  /' "$FAILFILE" >&2
> +    exit=1
> +fi

we should do the test of FAILFILE above the trial block
Attachment #806085 - Flags: review?(bugspam.Callek) → review+
(In reply to Justin Wood (:Callek) from comment #5)
> Comment on attachment 806085 [details] [diff] [review]
> Run test-masters.sh in parallel
> 
> Review of attachment 806085 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ if you answer/address my comments.
> 
> ::: setup-master.py
> @@ +368,5 @@
> >      assert len(master_list) > 0, "No masters specified. Bad role?"
> >  
> > +    if options.list or options.test:
> > +        masters = filter_masters(master_list)
> > +        if len(args) > 0:
> 
> Nit: Please update the docstring:
> http://mxr.mozilla.org/build/source/buildbot-configs/setup-master.py#2
> 
> e.g. add an extra line under setup-master.py master_dir master_name
> setup-master.py [-t|-l] [wanted_masters]... or some such

Done

> ::: test-masters.sh
> @@ +3,5 @@
> >  # the -t option.  We use that now
> >  exit=0
> >  
> > +# Serial: 327 seconds
> > +# Parallel: 72 seconds
> 
> please drop these comments from file, they vary too much from system to
> system to be useful here.

I think I was using them as an example, for justifying all this complexity. Doesn't matter now.

> @@ +8,5 @@
> > +
> > +mkdir test-output 2>/dev/null
> > +
> > +actioning="Checking"
> > +MASTERS_JSON_URL='http://hg.mozilla.org/build/tools/raw-file/tip/buildfarm/maintenance/production-masters.json'
> 
> can we do:
> MASTERS_JSON_URL=${MASTERS_JSON_URL:-http://hg.mozilla.org/build/tools/raw-
> file/tip/buildfarm/maintenance/production-masters.json}
>
> which allows someone who wants to pass to this a different master json the
> ability to do so :-)

Hm, I'd always done ${foo-value}, which appears to be the same. But man bash, in addition to being a favorite leisure-time activity of groups of young women, says ${foo:-value}. TIL

> @@ +28,5 @@
> > +# Fire off all the tests in parallel.
> > +for MASTER in ${MASTERS[*]}; do (
> > +    OUTFILE=$(mktemp)
> > +
> > +    ./setup-master.py -t -j "$MASTERS_JSON" "$@" $MASTER > $OUTFILE 2>&1 || echo "$MASTER" >> $FAILFILE
> 
> doesn't this mean that FAILFILE will not get proper output if more than one
> fail at once? not sure how much that matters but worth calling out

Theoretically yes, sorta.

But in actual practice: the >> means it'll open in append mode. That goes all the way to the kernel, so it will guarantee that separate write() calls never get interleaved. The shell is almost certain to make that output line-buffered, at least up until a line hits a 1K or 4K boundary, which means that this echo will come out in a single write().

> @@ +53,5 @@
> > +    echo "*** $(wc -l < $FAILFILE) master tests failed ***" >&2
> > +    echo "Failed masters:" >&2
> > +    sed -e 's/^/  /' "$FAILFILE" >&2
> > +    exit=1
> > +fi
> 
> we should do the test of FAILFILE above the trial block

Ok. I made it do an immediate exit on FAILFILE, because when working through failures it's nice to not have that chunk of trial output between me and the log filename.
(In reply to Steve Fink [:sfink] from comment #7)
> http://hg.mozilla.org/build/buildbot-configs/rev/601d9ffd21c8

This was landed on the production branch (and so isn't on default) - intentional? :-)
Is this patch linux-only ?

(bb08)deathduck:/src/kill-aurora-localizer/buildbot-configs [13:48:14] (default)
539$ ./test-masters.sh
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
Hm. I *think* I should have fixed it for OSX, but I don't have an easy way to try it out. Can you test it, please?
Attachment #807627 - Flags: review?(aki)
Attachment #806085 - Attachment is obsolete: true
Comment on attachment 807627 [details] [diff] [review]
Run test-masters.sh in parallel

This works!
|time ./test-masters.sh| tells me it approximately halves the time to run test-masters.sh on my macbook air.  Cool.
Attachment #807627 - Flags: review?(aki) → review+
(In reply to Aki Sasaki [:aki] from comment #12)
> Comment on attachment 807627 [details] [diff] [review]
> Run test-masters.sh in parallel
> 
> This works!
> |time ./test-masters.sh| tells me it approximately halves the time to run
> test-masters.sh on my macbook air.  Cool.

Hm. Does that only have 2 cores or something? I get about a 4.5x speedup, but my laptop is quad core (plus hyperthreading.)

Relanded the fixed up version (to the default branch, this time!):

https://hg.mozilla.org/build/buildbot-configs/rev/24c750586403
Yup, 2 cores.
Live on production.

(Not CCing myself)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.