Closed
Bug 620285
Opened 14 years ago
Closed 1 year ago
Enable more parallelism in builds
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: justin.lebar+bug, Unassigned)
Details
Attachments
(3 files, 9 obsolete files)
I was looking at the result of running |make -C xpcom| under |perf timechart|, and it appears that increasing the parallelism of our builds would make them substantially faster.
I have a script which parses the result of running make under strace and determines which subdirectories can be safely run in parallel. I've only used it on the xpcom directory so far, but the results are promising.
Reporter | ||
Comment 1•14 years ago
|
||
Proof of concept patch for building xpcom in parallel.
+PARALLEL_LIBS_DIRS = \
+ typelib \
+ string \
+ glue \
+ base \
+ ds \
+ io \
+ components \
+ threads \
+ reflect \
+ proxy \
+ system \
+ ../chrome \
+ $(NULL)
I wonder if there's a better way to do this. In particular, I'd like to say that PARALLEL_LIBS_DIRS is DIRS minus the build directory.
Reporter | ||
Comment 2•14 years ago
|
||
Will post benchmarks soon.
I'm not terribly fond of this approach. I'd guess the reason most of these can't be built in parallel in all tiers is the .idl->.h transformation, which is a bug in its own right that isn't hard to fix. xpcom/typelib probably needs to be moved to before tier_platform too.
To be a bit more precise, during IDL generation if nsIBar.idl includes nsIFoo.idl xpidl needs to find nsIFoo.idl. Right now we copy all IDL files to dist/idl when we encounter them in the build system[1] and tell xpidl to get them from there[2]. Instead we should create XPIDL_INCLUDES or something and then do the same thing we do with LOCAL_INCLUDES to use headers from other directories. For bonus points we could even stop copying the files to dist/idl (not sure what if anything depends on that). This would let us parallelize (among other things) dom/ and its piles of idl.
I've been meaning to do this for a while but haven't had the time.
[1] http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/rules.mk#l1808
[2] http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/rules.mk#l1748 and http://hg.mozilla.org/mozilla-central/annotate/e952221c3251/config/config.mk#l618
Reporter | ||
Comment 5•14 years ago
|
||
I realized halfway between Austin and Tampa that my script wasn't looking at symlink() calls. That explained why I saw parallelism in the export step. :)
> I'm not terribly fond of this approach.
What specifically don't you like? My thought at this point is to use a tool to inspect the build, find those directories which can be built in parallel, and explicitly mark that parallelism in the Makefiles. It would be fantastic if we could make the build more parallel in the meantime, but I don't think that's a prerequisite.
Perhaps this approach reflects my unfamiliarity with the build system.
(In reply to comment #5)
> > I'm not terribly fond of this approach.
>
> What specifically don't you like? My thought at this point is to use a tool to
> inspect the build, find those directories which can be built in parallel, and
> explicitly mark that parallelism in the Makefiles. It would be fantastic if we
> could make the build more parallel in the meantime, but I don't think that's a
> prerequisite.
I don't like enabling parallelism for only one phase (libs) when I think we can do it for all three almost as easily.
Reporter | ||
Comment 7•14 years ago
|
||
(In reply to comment #6)
> I don't like enabling parallelism for only one phase (libs) when I think we can
> do it for all three almost as easily.
To be clear, the PoC I posted enables parallelism within all three build steps, although it's not correct to parallelize make export at this point.
(In reply to comment #7)
> (In reply to comment #6)
> > I don't like enabling parallelism for only one phase (libs) when I think we can
> > do it for all three almost as easily.
>
> To be clear, the PoC I posted enables parallelism within all three build steps,
> although it's not correct to parallelize make export at this point.
Right, but if we fix the export issues we can parallelize all three tiers with the existing infrastructure.
Reporter | ||
Comment 9•14 years ago
|
||
But there's still benefit in parallelizing the other two tiers, right? We could always go back and turn on parallelism for export step, especially if all subdirectories are usually independent of one another.
Reporter | ||
Comment 10•14 years ago
|
||
This patch makes |make tools| run entirely in parallel. At least, that's what I think it does.
I was hoping that this change would cause |make tools| to run faster on a tree with no changes. But I observed quite the opposite:
In root objdir with no changes, warm buffer cache, |time make -s -jX tools|, varying X. Quad-core Xeon W3520 @ 2.67GHz with hyper-threading, 12GB RAM.
Unpatched (rev b84f528e2e10):
j1: 4.7s
j2: 3.9s
j4: 3.7s
j8: 3.6s
Patched:
j1: 17.2s
j2: 13.5s
j4: 13.0s
j8: 13.1s
I wonder why this is happening.
Attachment #498644 -
Attachment is obsolete: true
Reporter | ||
Updated•14 years ago
|
Attachment #500212 -
Attachment is obsolete: true
Reporter | ||
Comment 11•14 years ago
|
||
Adding back a hunk I shouldn't have removed.
Reporter | ||
Comment 12•14 years ago
|
||
With the patch, we make just a few extra execve() calls.
81 extra invocations each of
> /bin/sh -c hg -R ../src parent --template=\{node|short}\\n\ 2>/dev/null
and
> /usr/bin/python2.7 ../src/config/printconfigsetting.py ./dist/bin/application.ini App BuildID
And one extra invocation of:
> config/nsinstall -D ../../../_tests/reftest
The plot thickens...
Reporter | ||
Comment 13•14 years ago
|
||
It looks like this include in the root Makefile.in
> include $(topsrcdir)/toolkit/mozapps/installer/package-name.mk
is being processed 81 times in a patched build, whereas in an unpatched build, it's processed only once. That file invokes hg and runs printconfigsetting.py. I'll keep poking around, but khuey, do you have any idea why this might be happening?
Reporter | ||
Comment 14•14 years ago
|
||
Oh, I bet the problem is that BUILDID and MOZ_SOURCE_STAMP are recursively-expanded, not simply expanded. That is, they're defined with = and not :=.
Reporter | ||
Updated•14 years ago
|
Attachment #500480 -
Attachment is obsolete: true
Reporter | ||
Comment 15•14 years ago
|
||
New benchmark results for |time make -s -jX tools|, with patch:
j1: 5.0s
j2: 2.8s
j4: 1.6s
j8: 1.5s
j16: 1.3s
I'm not sure why j1 is a bit slower after the patch than before, but this is certainly more promising than the results from comment 10. :)
Reporter | ||
Updated•14 years ago
|
Attachment #500537 -
Flags: feedback?(khuey)
Comment on attachment 500537 [details] [diff] [review]
Proof of concept v4
LGTM, but I'm going to pass to ted.
Attachment #500537 -
Flags: feedback?(khuey) → feedback?(ted.mielczarek)
Reporter | ||
Comment 17•14 years ago
|
||
I think -j1 is a bit slower than unpatched because the two commands
> /bin/sh -c hg -R ../src parent --template=\{node|short}\\n\ 2>/dev/null
> /usr/bin/python2.7 ../src/config/printconfigsetting.py ./dist/bin/application.ini App BuildID
are being run two extra times. But not 80 extra times, which is an improvement. :)
Reporter | ||
Updated•14 years ago
|
Attachment #500537 -
Attachment is obsolete: true
Attachment #500537 -
Flags: feedback?(ted.mielczarek)
Reporter | ||
Comment 18•14 years ago
|
||
This patch builds xpcom's libs in parallel; at least, that's the hope.
> objdir/xpcom$ make clean && make export && time make libs -s -j16
(With ccache, so all the gcc invocations are cache hits.)
Unpatched: 1.7s
Patched: 1.2s
> objdir/xpcom$ time make libs -s -j16
(Up-to-date, so no need to invoke the compiler.)
Unpatched: 1.4s
Patched: 0.6s
I'm not a fan of duplicating the directory list, but we could certainly build the DIRS variable out of the PARALLEL_LIB_DIRS_X variables.
Reporter | ||
Updated•14 years ago
|
Attachment #502279 -
Flags: feedback?(ted.mielczarek)
Reporter | ||
Comment 19•14 years ago
|
||
I suppose ideally, we'd parallelize at the root directory, potentially in addition to parallelizing in subdirectories. I'd guess that parellism at the top level only would be sufficient and might be necessary to get most of the speedup.
This is the result of compiling under strace and analyzing the result using the tool I wrote [1]. It shows that the dependencies at the root aren't *too* complicated, although I'm not sure we'd want to encode even these by hand.
I'll post a filtered version of the output in a moment and describe how to read it.
[1] http://hg.mozilla.org/users/jlebar_mozilla.com/build-tracer/
Reporter | ||
Comment 20•14 years ago
|
||
This file is a subset of the previous attachment. It shows which top-level libs build tasks depend on which directories.
Reporter | ||
Comment 21•14 years ago
|
||
To give an idea of how much complexity is encoded here:
> $ grep '\* Must run before' <previous-attachment> | sort | uniq | wc -l
14
Put another way, let S_d be the set of directories which must be built before directory d is built. Then there are 14 unique sets S_d.
Comment 22•14 years ago
|
||
I really don't think this level of complexity is worth it. It's going to make both our Makefiles and rules.mk more complicated. Kyle and I have a plan in bug 623617 that we think will give us speedups as well as make things less complicated, which is what I'd rather pursue.
Reporter | ||
Comment 23•14 years ago
|
||
(In reply to comment #22)
> I really don't think this level of complexity is worth it. It's going to make
> both our Makefiles and rules.mk more complicated. Kyle and I have a plan in bug
> 623617 that we think will give us speedups as well as make things less
> complicated, which is what I'd rather pursue.
Well, if we parallelized only at the top level, I don't think we'd need to touch the Makefiles. Certainly rules.mk and toolkit-tiers.mk would increase in complexity.
If we were willing or able to paralellize the export tier, I'd argue that this bug might be a useful stopgap before bug 623617, since all appearances are that we could get a n-times or better speedup with n cores on a do-nothing build and with substantially less effort than bug 623617.
I'd expect a substantial speedup on builds that do something, too; consider that when you build a directory with 9 source files on an eight-core machine, seven of your cores sit idle for half the time.
That said, make export takes 1/3 of the do-nothing build time, and that limits the improvements we could see with the approach in this bug.
Anyway, bug 623617 seems like the Right Approach to fixing the build. I had no such aspirations in this bug. :)
Comment 24•14 years ago
|
||
Comment on attachment 502279 [details] [diff] [review]
Proof of concept v5
This is too gross for my tastes.
Attachment #502279 -
Flags: feedback?(ted.mielczarek) → feedback-
Reporter | ||
Comment 25•14 years ago
|
||
Let's resolve this WONTFIX and cross our fingers for bug 623617.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Reporter | ||
Comment 26•14 years ago
|
||
I have a proof-of-concept patch implementing the approach described in comment 23; I think it's worth considering, so I'm re-opening this.
On my 4-core Linux-64 machine, the patch takes the time for a ccached build from 2:30 to 2:00. I'm timing non-ccached builds now.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reporter | ||
Comment 27•14 years ago
|
||
Reporter | ||
Updated•14 years ago
|
Attachment #502279 -
Attachment is obsolete: true
Reporter | ||
Comment 28•14 years ago
|
||
The patch takes my clobber ccache-miss builds from 12m to 10m.
Reporter | ||
Comment 29•14 years ago
|
||
cc'ing Mounir, who said he'd try this out on his 8-core machine.
Comment 30•14 years ago
|
||
It is failing with:
make[4]: *** No rule to make target `media/libvpx_platform_libs', needed by `layout_platform_libs'. Stop.
Reporter | ||
Comment 31•14 years ago
|
||
Do you have a nonstandard mozconfig that disables a bunch of stuff? Can you try with a vanilla one?
Comment 32•14 years ago
|
||
(In reply to comment #31)
> Do you have a nonstandard mozconfig that disables a bunch of stuff?
Indeed... Using system libraries mostly.
> Can you try with a vanilla one?
Goes from:
real 10m53.785s
user 60m55.314s
sys 5m28.643s
To:
real 7m22.727s
user 66m14.474s
sys 5m59.150s
Reporter | ||
Comment 33•14 years ago
|
||
The 1.5x speedup on Monuir's machine (654s / 443s) seems like pretty strong validation of this approach to me. Not everyone has 8-core machines, but even 2 minutes on a 4-core machine (12m / 10m = 1.2x) is a significant improvement.
Alas this style of analysis is difficult (impossible?) to do on Mac because dtrace sucks [1]. It looks like we'll have to compute these dependencies manually.
[1] http://benjamin.smedbergs.us/blog/2008-11-20/282/
Reporter | ||
Comment 34•14 years ago
|
||
I hacked something together to parallelize the export step in a similar manner, but the speedup was pretty negligible; I think we should concentrate on libs, at least for now.
I promised Justin I would time this on my 8 physical/16 logical core box, and will do so before the week is out.
Reporter | ||
Comment 36•14 years ago
|
||
Let me post a new patch first! There's a bug in the one here which causes tier_platform_libs to be built twice. (The parallel one goes first, which is why we still see an improvement.)
Anyway, I'm testing the fixed one now.
I'll test whatever's current on Thursday, before I box up my apartment on Friday.
Reporter | ||
Comment 38•14 years ago
|
||
This doesn't have the bug described in comment 36, and it works on my mac with
a vanilla mozconfig. It shouldn't be too much work to get it to work with
other mozconfigs.
Reporter | ||
Updated•14 years ago
|
Attachment #525565 -
Attachment is obsolete: true
Reporter | ||
Comment 39•14 years ago
|
||
$ rm -rf ../debug && time make -f client.mk
Mac 10.6. -j16 on a 4-core i7 with hyper-threading with SSD. ccache disabled.
unpatched:
real 13m10.292s
user 63m28.180s
sys 6m31.702s
patched:
real 11m58.820s
user 71m12.840s
sys 7m29.769s
Ubuntu 10.10. -j12 on 4-core i7 with hyper-threading with SSD. ccache disabled.
unpatched:
real 9m59.382s
user 45m49.110s
sys 4m4.930s
patched:
real 8m1.331s
user 48m56.840s
sys 4m26.100s
I think these results are faster than before on the Ubuntu box because I actually disabled, rather than cleaned, ccache. (The relative improvement is also larger; 10/8 = 1.25 > 1.2. This is probably because I fixed the bug from comment 36.)
The higher usertime indicates that we might still be doing some extra work with the patch; I'll see if I can find it.
Reporter | ||
Comment 40•14 years ago
|
||
Reporter | ||
Updated•14 years ago
|
Attachment #528363 -
Attachment is obsolete: true
Reporter | ||
Comment 41•14 years ago
|
||
khuey and I were able to come up with four possible arguments against taking this patch.
a) It makes adding a top-level directory harder, since you have to manually resolve its dependencies. But we don't add top-level directories so often, so maybe this is OK.
b) A top-level directory might acquire a new dependency without us noticing, and so builds will start to fail randomly. But since this patch only touches the tools and libs step, and since the tools step is dependency free afaict, we don't expect this to be a problem.
c) The patch might not capture all dependencies between folders, so builds will randomly fail.
Reporter | ||
Comment 42•14 years ago
|
||
Ack. I meant to cancel that bzexport! It's not quite ready; there's an (apparently harmless) make check failure I need to fix first.
One other argument against this is that it distracts from the Plan To Rewrite The Build. But it's a small change and gives us a large improvement we can have Now, rather than RSN.
I'll push a new patch once I fix the make check issue.
Reporter | ||
Updated•14 years ago
|
Attachment #534859 -
Attachment is obsolete: true
Reporter | ||
Updated•14 years ago
|
Attachment #534884 -
Attachment description: Patch for review v1 → Proof of concept v9
Reporter | ||
Comment 43•14 years ago
|
||
Fixed the make check orange -- I just needed to sync config/rules.mk with js/src/config/rules.mk.
There are still a few ouststanding issues with this patch (marked with XXX's), but I could use some help with them.
Reporter | ||
Updated•14 years ago
|
Attachment #534884 -
Attachment is obsolete: true
Reporter | ||
Updated•14 years ago
|
Attachment #535168 -
Flags: review?(ted.mielczarek)
Comment 44•14 years ago
|
||
Joey has something that sounds similar in bug 623617, a script that works out dependencies and builds things in the right order. I'm still not a fan of encoding additional complexity into the build system, but if we could make this automated as part of the build it might be a nice intermediate step.
Reporter | ||
Comment 45•14 years ago
|
||
> I'm still not a fan of encoding additional complexity into the build system,
> but if we could make this automated as part of the build it might be a nice
> intermediate step.
Intermediate in the sense of "before properly rewriting the build"?
Comment 46•14 years ago
|
||
Yeah, that's what I mean. If your list of dependencies is auto-generated, can we just run that script during configure to generate them? That would seem to make this a lot less painful to maintain.
Reporter | ||
Comment 47•14 years ago
|
||
> If your list of dependencies is auto-generated, can we just run that script
> during configure to generate them?
Not with the approach I took here. The dependencies are generated by running a clobber build under strace.
They're also not 100% auto-generated. dtruss on Mac doesn't give me enough information to play the same game I play on Linux, and I didn't even try on Windows. So for those top-level folders built only on Mac or Windows, you'd need to build the dependencies by hand.
Because the parallelism is at the granularity of top-level directories and only in the libs step, where the dependencies aren't so complex, I'm not sure the maintenance burden would be so huge. Looking at hg log, we don't seem to mess around with toolkit-tiers.mk too often.
Comment 48•13 years ago
|
||
Comment on attachment 535168 [details] [diff] [review]
Patch for review v1
I can't in good conscience r+ this. I think the maintenance is going to be too much of a burden.
Attachment #535168 -
Flags: review?(ted.mielczarek) → review-
Reporter | ||
Comment 49•13 years ago
|
||
What if we left it off by default?
I've been building with this patch for months now, and updating it has been very easy. Since I'm already doing the work to keep my patch in sync, I'd be happy to volunteer to keep the code in sync in the tree.
I've had to update the patch just three times since the end of May. But of course I haven't gone through the work to prove that the patch's dependencies are still correct.
Reporter | ||
Comment 50•13 years ago
|
||
> What if we left it off by default?
To be clear, you can't do this with the current patch. But I imagine it'd be possible to add a configure switch.
Comment 51•10 years ago
|
||
I feel like glandium fixed most of this with his build system work.
Updated•7 years ago
|
Product: Core → Firefox Build System
Updated•2 years ago
|
Severity: normal → S3
Comment 52•1 year ago
|
||
we are doing a lot already. closing
Status: REOPENED → RESOLVED
Closed: 14 years ago → 1 year ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•