Closed Bug 620285 Opened 14 years ago Closed 5 months ago

Enable more parallelism in builds

Categories

(Firefox Build System :: General, defect)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Unassigned)

Details

Attachments

(3 files, 9 obsolete files)

I was looking at the result of running |make -C xpcom| under |perf timechart|, and it appears that increasing the parallelism of our builds would make them substantially faster.

I have a script which parses the result of running make under strace and determines which subdirectories can be safely run in parallel.  I've only used it on the xpcom directory so far, but the results are promising.
Attached patch Proof of concept v1 (obsolete) — Splinter Review
Proof of concept patch for building xpcom in parallel.

+PARALLEL_LIBS_DIRS = \
+		typelib \
+		string \
+		glue \
+		base \
+		ds \
+		io \
+		components \
+		threads \
+		reflect \
+		proxy \
+                system \
+		../chrome \
+		$(NULL)

I wonder if there's a better way to do this.  In particular, I'd like to say that PARALLEL_LIBS_DIRS is DIRS minus the build directory.
Will post benchmarks soon.
I'm not terribly fond of this approach.  I'd guess the reason most of these can't be built in parallel in all tiers is the .idl->.h transformation, which is a bug in its own right that isn't hard to fix.  xpcom/typelib probably needs to be moved to before tier_platform too.
To be a bit more precise, during IDL generation if nsIBar.idl includes nsIFoo.idl xpidl needs to find nsIFoo.idl.  Right now we copy all IDL files to dist/idl when we encounter them in the build system[1] and tell xpidl to get them from there[2].  Instead we should create XPIDL_INCLUDES or something and then do the same thing we do with LOCAL_INCLUDES to use headers from other directories.  For bonus points we could even stop copying the files to dist/idl (not sure what if anything depends on that).  This would let us parallelize (among other things) dom/ and its piles of idl.

I've been meaning to do this for a while but haven't had the time.

[1] http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/rules.mk#l1808
[2] http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/rules.mk#l1748 and http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/config.mk#l618
I realized halfway between Austin and Tampa that my script wasn't looking at symlink() calls.  That explained why I saw parallelism in the export step.  :)

> I'm not terribly fond of this approach.

What specifically don't you like?  My thought at this point is to use a tool to inspect the build, find those directories which can be built in parallel, and explicitly mark that parallelism in the Makefiles.  It would be fantastic if we could make the build more parallel in the meantime, but I don't think that's a prerequisite.

Perhaps this approach reflects my unfamiliarity with the build system.
(In reply to comment #5)
> > I'm not terribly fond of this approach.
> 
> What specifically don't you like?  My thought at this point is to use a tool to
> inspect the build, find those directories which can be built in parallel, and
> explicitly mark that parallelism in the Makefiles.  It would be fantastic if we
> could make the build more parallel in the meantime, but I don't think that's a
> prerequisite.

I don't like enabling parallelism for only one phase (libs) when I think we can do it for all three almost as easily.
(In reply to comment #6)
> I don't like enabling parallelism for only one phase (libs) when I think we can
> do it for all three almost as easily.

To be clear, the PoC I posted enables parallelism within all three build steps, although it's not correct to parallelize make export at this point.
(In reply to comment #7)
> (In reply to comment #6)
> > I don't like enabling parallelism for only one phase (libs) when I think we can
> > do it for all three almost as easily.
> 
> To be clear, the PoC I posted enables parallelism within all three build steps,
> although it's not correct to parallelize make export at this point.

Right, but if we fix the export issues we can parallelize all three tiers with the existing infrastructure.
But there's still benefit in parallelizing the other two tiers, right?  We could always go back and turn on parallelism for export step, especially if all subdirectories are usually independent of one another.
Attached patch Proof of concept v2 (obsolete) — Splinter Review
This patch makes |make tools| run entirely in parallel.  At least, that's what I think it does.

I was hoping that this change would cause |make tools| to run faster on a tree with no changes.  But I observed quite the opposite:

In root objdir with no changes, warm buffer cache, |time make -s -jX tools|, varying X.  Quad-core Xeon W3520 @ 2.67GHz with hyper-threading, 12GB RAM.

Unpatched (rev b84f528e2e10):
  j1:  4.7s
  j2:  3.9s
  j4:  3.7s
  j8:  3.6s

Patched:
  j1:  17.2s
  j2:  13.5s
  j4:  13.0s
  j8:  13.1s

I wonder why this is happening.
Attachment #498644 - Attachment is obsolete: true
Attachment #500212 - Attachment is obsolete: true
Attached patch Proof of concept v3 (obsolete) — Splinter Review
Adding back a hunk I shouldn't have removed.
With the patch, we make just a few extra execve() calls.

81 extra invocations each of

> /bin/sh -c hg -R ../src parent --template=\{node|short}\\n\ 2>/dev/null

and

> /usr/bin/python2.7 ../src/config/printconfigsetting.py ./dist/bin/application.ini App BuildID

And one extra invocation of:

> config/nsinstall -D ../../../_tests/reftest

The plot thickens...
It looks like this include in the root Makefile.in

> include $(topsrcdir)/toolkit/mozapps/installer/package-name.mk

is being processed 81 times in a patched build, whereas in an unpatched build, it's processed only once.  That file invokes hg and runs printconfigsetting.py.  I'll keep poking around, but khuey, do you have any idea why this might be happening?
Oh, I bet the problem is that BUILDID and MOZ_SOURCE_STAMP are recursively-expanded, not simply expanded.  That is, they're defined with = and not :=.
Attachment #500480 - Attachment is obsolete: true
Attached patch Proof of concept v4 (obsolete) — Splinter Review
New benchmark results for |time make -s -jX tools|, with patch:

 j1:  5.0s
 j2:  2.8s
 j4:  1.6s
 j8:  1.5s
 j16: 1.3s

I'm not sure why j1 is a bit slower after the patch than before, but this is certainly more promising than the results from comment 10.  :)
Attachment #500537 - Flags: feedback?(khuey)
Comment on attachment 500537 [details] [diff] [review]
Proof of concept v4

LGTM, but I'm going to pass to ted.
Attachment #500537 - Flags: feedback?(khuey) → feedback?(ted.mielczarek)
I think -j1 is a bit slower than unpatched because the two commands

> /bin/sh -c hg -R ../src parent --template=\{node|short}\\n\ 2>/dev/null

> /usr/bin/python2.7 ../src/config/printconfigsetting.py ./dist/bin/application.ini App BuildID

are being run two extra times.  But not 80 extra times, which is an improvement.  :)
Attachment #500537 - Attachment is obsolete: true
Attachment #500537 - Flags: feedback?(ted.mielczarek)
Attached patch Proof of concept v5 (obsolete) — Splinter Review
This patch builds xpcom's libs in parallel; at least, that's the hope.

> objdir/xpcom$ make clean && make export && time make libs -s -j16

  (With ccache, so all the gcc invocations are cache hits.)
  Unpatched: 1.7s
  Patched:   1.2s

> objdir/xpcom$ time make libs -s -j16

  (Up-to-date, so no need to invoke the compiler.)
  Unpatched: 1.4s
  Patched:   0.6s

I'm not a fan of duplicating the directory list, but we could certainly build the DIRS variable out of the PARALLEL_LIB_DIRS_X variables.
Attachment #502279 - Flags: feedback?(ted.mielczarek)
I suppose ideally, we'd parallelize at the root directory, potentially in addition to parallelizing in subdirectories.  I'd guess that parellism at the top level only would be sufficient and might be necessary to get most of the speedup.

This is the result of compiling under strace and analyzing the result using the tool I wrote [1].  It shows that the dependencies at the root aren't *too* complicated, although I'm not sure we'd want to encode even these by hand.

I'll post a filtered version of the output in a moment and describe how to read it.

[1] http://hg.mozilla.org/users/jlebar_mozilla.com/build-tracer/
This file is a subset of the previous attachment.  It shows which top-level libs build tasks depend on which directories.
To give an idea of how much complexity is encoded here:

> $ grep '\* Must run before' <previous-attachment> | sort | uniq | wc -l
14

Put another way, let S_d be the set of directories which must be built before directory d is built.  Then there are 14 unique sets S_d.
I really don't think this level of complexity is worth it. It's going to make both our Makefiles and rules.mk more complicated. Kyle and I have a plan in bug 623617 that we think will give us speedups as well as make things less complicated, which is what I'd rather pursue.
(In reply to comment #22)
> I really don't think this level of complexity is worth it. It's going to make
> both our Makefiles and rules.mk more complicated. Kyle and I have a plan in bug
> 623617 that we think will give us speedups as well as make things less
> complicated, which is what I'd rather pursue.

Well, if we parallelized only at the top level, I don't think we'd need to touch the Makefiles.  Certainly rules.mk and toolkit-tiers.mk would increase in complexity.

If we were willing or able to paralellize the export tier, I'd argue that this bug might be a useful stopgap before bug 623617, since all appearances are that we could get a n-times or better speedup with n cores on a do-nothing build and with substantially less effort than bug 623617.

I'd expect a substantial speedup on builds that do something, too; consider that when you build a directory with 9 source files on an eight-core machine, seven of your cores sit idle for half the time.

That said, make export takes 1/3 of the do-nothing build time, and that limits the improvements we could see with the approach in this bug.

Anyway, bug 623617 seems like the Right Approach to fixing the build.  I had no such aspirations in this bug.  :)
Comment on attachment 502279 [details] [diff] [review]
Proof of concept v5

This is too gross for my tastes.
Attachment #502279 - Flags: feedback?(ted.mielczarek) → feedback-
Let's resolve this WONTFIX and cross our fingers for bug 623617.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
I have a proof-of-concept patch implementing the approach described in comment 23; I think it's worth considering, so I'm re-opening this.

On my 4-core Linux-64 machine, the patch takes the time for a ccached build from 2:30 to 2:00.  I'm timing non-ccached builds now.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Attached patch Proof of concept v6 (obsolete) — Splinter Review
Attachment #502279 - Attachment is obsolete: true
The patch takes my clobber ccache-miss builds from 12m to 10m.
cc'ing Mounir, who said he'd try this out on his 8-core machine.
It is failing with:
make[4]: *** No rule to make target `media/libvpx_platform_libs', needed by `layout_platform_libs'.  Stop.
Do you have a nonstandard mozconfig that disables a bunch of stuff?  Can you try with a vanilla one?
(In reply to comment #31)
> Do you have a nonstandard mozconfig that disables a bunch of stuff? 

Indeed... Using system libraries mostly.

> Can you try with a vanilla one?

Goes from:
real    10m53.785s
user    60m55.314s
sys     5m28.643s

To:
real    7m22.727s
user    66m14.474s
sys     5m59.150s
The 1.5x speedup on Monuir's machine (654s / 443s) seems like pretty strong validation of this approach to me.  Not everyone has 8-core machines, but even 2 minutes on a 4-core machine (12m / 10m = 1.2x) is a significant improvement.

Alas this style of analysis is difficult (impossible?) to do on Mac because dtrace sucks [1].  It looks like we'll have to compute these dependencies manually.

[1] http://benjamin.smedbergs.us/blog/2008-11-20/282/
I hacked something together to parallelize the export step in a similar manner, but the speedup was pretty negligible; I think we should concentrate on libs, at least for now.
I promised Justin I would time this on my 8 physical/16 logical core box, and will do so before the week is out.
Let me post a new patch first!  There's a bug in the one here which causes tier_platform_libs to be built twice.  (The parallel one goes first, which is why we still see an improvement.)

Anyway, I'm testing the fixed one now.
I'll test whatever's current on Thursday, before I box up my apartment on Friday.
Attached patch Proof of concept v7 (obsolete) — Splinter Review
This doesn't have the bug described in comment 36, and it works on my mac with
a vanilla mozconfig.  It shouldn't be too much work to get it to work with
other mozconfigs.
Attachment #525565 - Attachment is obsolete: true
$ rm -rf ../debug && time make -f client.mk

Mac 10.6.  -j16 on a 4-core i7 with hyper-threading with SSD.  ccache disabled.

  unpatched:

  real	13m10.292s
  user	63m28.180s
  sys	6m31.702s

  patched:

  real	11m58.820s
  user	71m12.840s
  sys	7m29.769s

Ubuntu 10.10.  -j12 on 4-core i7 with hyper-threading with SSD.  ccache disabled.

  unpatched:

  real	9m59.382s
  user	45m49.110s
  sys	4m4.930s

  patched:

  real	8m1.331s
  user	48m56.840s
  sys	4m26.100s

I think these results are faster than before on the Ubuntu box because I actually disabled, rather than cleaned, ccache.  (The relative improvement is also larger; 10/8 = 1.25 > 1.2.  This is probably because I fixed the bug from comment 36.)

The higher usertime indicates that we might still be doing some extra work with the patch; I'll see if I can find it.
Attachment #528363 - Attachment is obsolete: true
Attached patch Proof of concept v9 (obsolete) — Splinter Review
khuey and I were able to come up with four possible arguments against taking this patch.

a) It makes adding a top-level directory harder, since you have to manually resolve its dependencies.  But we don't add top-level directories so often, so maybe this is OK.

b) A top-level directory might acquire a new dependency without us noticing, and so builds will start to fail randomly.  But since this patch only touches the tools and libs step, and since the tools step is dependency free afaict, we don't expect this to be a problem.

c) The patch might not capture all dependencies between folders, so builds will randomly fail.
Ack.  I meant to cancel that bzexport!  It's not quite ready; there's an (apparently harmless) make check failure I need to fix first.

One other argument against this is that it distracts from the Plan To Rewrite The Build.  But it's a small change and gives us a large improvement we can have Now, rather than RSN.

I'll push a new patch once I fix the make check issue.
Attachment #534859 - Attachment is obsolete: true
Attachment #534884 - Attachment description: Patch for review v1 → Proof of concept v9
Fixed the make check orange -- I just needed to sync config/rules.mk with js/src/config/rules.mk.

There are still a few ouststanding issues with this patch (marked with XXX's), but I could use some help with them.
Attachment #534884 - Attachment is obsolete: true
Attachment #535168 - Flags: review?(ted.mielczarek)
Joey has something that sounds similar in bug 623617, a script that works out dependencies and builds things in the right order. I'm still not a fan of encoding additional complexity into the build system, but if we could make this automated as part of the build it might be a nice intermediate step.
> I'm still not a fan of encoding additional complexity into the build system, 
> but if we could make this automated as part of the build it might be a nice 
> intermediate step.

Intermediate in the sense of "before properly rewriting the build"?
Yeah, that's what I mean. If your list of dependencies is auto-generated, can we just run that script during configure to generate them? That would seem to make this a lot less painful to maintain.
> If your list of dependencies is auto-generated, can we just run that script 
> during configure to generate them?

Not with the approach I took here.  The dependencies are generated by running a clobber build under strace.

They're also not 100% auto-generated.  dtruss on Mac doesn't give me enough information to play the same game I play on Linux, and I didn't even try on Windows.  So for those top-level folders built only on Mac or Windows, you'd need to build the dependencies by hand.

Because the parallelism is at the granularity of top-level directories and only in the libs step, where the dependencies aren't so complex, I'm not sure the maintenance burden would be so huge.  Looking at hg log, we don't seem to mess around with toolkit-tiers.mk too often.
Comment on attachment 535168 [details] [diff] [review]
Patch for review v1

I can't in good conscience r+ this. I think the maintenance is going to be too much of a burden.
Attachment #535168 - Flags: review?(ted.mielczarek) → review-
What if we left it off by default?

I've been building with this patch for months now, and updating it has been very easy.  Since I'm already doing the work to keep my patch in sync, I'd be happy to volunteer to keep the code in sync in the tree.

I've had to update the patch just three times since the end of May.  But of course I haven't gone through the work to prove that the patch's dependencies are still correct.
> What if we left it off by default?

To be clear, you can't do this with the current patch.  But I imagine it'd be possible to add a configure switch.
I feel like glandium fixed most of this with his build system work.
Product: Core → Firefox Build System
Severity: normal → S3

we are doing a lot already. closing

Status: REOPENED → RESOLVED
Closed: 13 years ago5 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: