Closed Bug 11219 Opened 25 years ago Closed 23 years ago

Dependencies not strong enough for parallel builds

Categories

(SeaMonkey :: Build Config, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.0

People

(Reporter: tor, Assigned: cls)

Details

Attachments

(6 files)

Doing a parallel build (gmake MAKE="gmake -j12") reveals some weak
dependencies in the build system.  In the following directories,
an attempt is made to link the library together before all the
object files are ready.

	xpcom/build
	netwerk/build
	dom/src/build
	layout/build
	rdf/build
	extensions/wallet/build
I cleaned up several of these makefiles at just about the time you submitted
this bug.  Have you noticed any improvements since then?

I think this is going to be an on-going problem until AOL deigns to actually
put some money into my group's hands so we can buy the *Unix* machines we need
to test stuff like this.  I'd really like to set up a multi-processor SPARC
server as a Tinderbox machine using gmake -jN.
Status: NEW → ASSIGNED
Here's a list of directories with dependency problems in an 8/15 cvs pull:

	xpcom/build
	netwerk/build
	dom/src/build
	widget/src/gtk
	layout/build
	rdf/chrome/build
	extensions/wallet/build
	mailnews/db/mork/build
	mailnews/base/build
	mailnews/addrbook/build
I talked to tor about this one on #mozilla a while back.  It seems the problem
occurs when we extract the objects from the "static" libs to build the huge
shared lib.  We tried a few fixes but they only seemed to make the problem
intermittent.  Why do we bother making those "static" libs anyways?  Wouldn't it
be much easier (and faster) to just leave the objs in the subdirs and keep track
of the objs from the build dir?
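
For anyone following along, here is a generic illustration of how a missing prerequisite of this kind can stay hidden in a serial build and then bite under -j.  It is only a sketch and not the actual rules.mk: the names libfoo_s.a, libfoo.so, a.o, b.o and tmp_objs are made up, and it assumes GNU make with a gcc-style toolchain.

```make
# Generic illustration only; file and directory names are hypothetical.
CFLAGS += -fPIC
OBJS    = a.o b.o

all: libfoo_s.a libfoo.so

# The "static" sub-lib, archived from objects built by make's implicit rules.
libfoo_s.a: $(OBJS)
	$(AR) cr $@ $^

# Weak form (commented out): libfoo.so has no prerequisite on libfoo_s.a.
# A serial make builds libfoo_s.a first only because 'all' lists it first;
# with -j both recipes may start at once and the extraction can see a
# missing or half-written archive.
#
# libfoo.so:
#	rm -rf tmp_objs && mkdir tmp_objs
#	cd tmp_objs && $(AR) x ../libfoo_s.a
#	$(CXX) -shared -o $@ tmp_objs/*.o

# Stronger form: the archive is an explicit prerequisite, so make has to
# finish it before the extraction and link can start.
libfoo.so: libfoo_s.a
	rm -rf tmp_objs && mkdir tmp_objs
	cd tmp_objs && $(AR) x ../libfoo_s.a
	$(CXX) -shared -o $@ tmp_objs/*.o
```

Whether this is exactly what goes wrong in the directories listed above is not established here; it only shows the general class of problem.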
I argued for that when kipp first introduced the "composite" shared libs
made up of static libs.  I claimed it would save time and disk space if
we just built .o's and linked them into whatever .so needed them.  But
it was decided that that would add too much complexity to the makefiles
and too much local knowledge of "unrelated" modules.  We _still_ haven't
achieved real modularity, but apparently using static "sub-libs" allows
people to convince themselves that mozilla is a truly modular product.
brian: if you make these changes for mailnews, I'll approve them
Adding kipp & shaver to the cc:.  We are _never_ going to be completely modular
if people don't care who they link against.  In all of the cases I've checked so
far ** (extensions/wallet/build, db/build, netwerk/build, rdf/build), the other
"modules" that the shared lib depends upon are really submodules of the
shared lib's module or in some cases part of the shared lib's module.  Sure, now
we can say the msgmork library is "independent" of the mork library by building
a static lib.  But we should not make mork dissect msgmork just for the sake of
"independence."  Especially when it appears to be the cause of some build
breakage.

NSPR does something similar to this but handles it sanely.  Take a look at
pr/src/md/unix/objs.mk & pr/src/Makefile.in on the
AUTOCONF_NSPR_WIN32_XCOMPILE_19990621_BRANCH .

** I just ran into the exceptions:
xpcom/appshell/eventloop/photon/Makefile.in which uses xp...but shaver informed
me that xp was being removed from the build.
widget/src/$TOOLKIT which *each* dissect
widget/src/xpwidgets/libraptorbasewidget_s.a  and the primary toolkit is
in turn dissected in widget/src/build along with libraptorbasewidget_s.a again.
mailnews/addrbook/libaddrbook.so which includes rdf/util/src/librdfutil_s.a
mailnews/base/build/libmsgbase.so which includes rdf/util/src/librdfutil_s.a
mailnews/local/build/libmsglocal.so which includes rdf/util/src/librdfutil_s.a

So 7.5 exceptions to the 28 cases where this would be better handled by an
objs.mk like NSPR uses rather than the dissection rule.  All of which should be
thrown out as they are causing symbols to appear in multiple libraries.
mass reassigning briano's open bugs to me while he's on sabbatical.
accept bug.
mass move to M14.
Target Milestone: M14 → M18
Ok, I made what I think is some progress on this.   I managed to remove about
half of the SHARED_LIBRARY_LIBS usage from my tree.  Basically, for each
lib*_s.a, I create an objs.mk in the srcdir that creates the library.  Both the
Makefile.in that creates the lib*_s.a and the Makefile.in that links in that
lib*_s.a include the objs.mk file.  This gives us better dependency support for
the lib*_s.a's source files and we don't have to dissect the lib*_s.a.   

There is a caveat though. :(  Because I'm including every object file
individually on the link line with full relative paths from DEPTH, the link
lines can become fairly huge.  When I converted xpcom over, the link line for
libxpcom.so was over 3k!  I'm worried that we may hit some shell or process
argument size limit on some of our non-tier1 unix boxes.  I fully expect the
link line for layout to be twice the length of xpcom's.  Maybe I'll look into
incremental linking some more.
Also, because of the way we can potentially have additional CFLAGS and/or DEFINES
in each makefile, we cannot build the object files we depend upon from the
current directory as suggested in 'Recursive Make Considered Harmful'.  Instead,
we need to fork a make in the directory where the dependent objects need to be
built.
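
A minimal sketch of the objs.mk arrangement described above, for illustration only.  The module name foo, the file names, and the variables FOO_CPPSRCS and FOO_OBJS are hypothetical and not taken from the tree; it assumes GNU make, and plain makefiles are shown instead of Makefile.in to keep it short.

```make
# ---- foo/src/objs.mk (hypothetical) ----
# Single source of truth for what goes into libfoo_s.a.  foo/src/Makefile
# would include this too and build $(FOO_CPPSRCS:.cpp=.o) locally.
FOO_CPPSRCS = nsFooA.cpp nsFooB.cpp

# Object paths relative to DEPTH, which the including makefile sets first.
FOO_OBJS = $(addprefix $(DEPTH)/foo/src/, $(FOO_CPPSRCS:.cpp=.o))

# ---- foo/build/Makefile (hypothetical) ----
DEPTH = ../..
include $(DEPTH)/foo/src/objs.mk

# The shared lib depends on the objects directly, so no static lib has to
# be dissected and make knows what must exist before the link starts.
libfoo.so: $(FOO_OBJS)
	$(CXX) -shared -o $@ $^

# Each object is built by a sub-make in its own directory so that
# directory's CFLAGS and DEFINES apply.  FORCE makes the sub-make run
# every time, leaving the owning directory to decide what is stale.
$(FOO_OBJS): FORCE
	$(MAKE) -C $(dir $@) $(notdir $@)

FORCE:
```

Note that every object appears on the link line with its full relative path, which is where the long link lines come from.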
Great job, Chris.  Looks like you're all over this.  Do you want to reassign 
this bug to yourself and just put me on the Cc?

As for the 3k command, I had thought 2K was the typical line length limit for sh 
and csh which is why we need xargs, but I must be mistaken since the 3K command 
worked.  Even if it breaks some of the older non-tier 1 systems, it's a step in 
the right direction.
reassigning to cls per our conversation.
Assignee: granrose → cls
Status: ASSIGNED → NEW
Ok, so I underestimated just a tad.  Using --enable-mathml, the length of the
link line for libraptorhtml.so came to just under 13k.  I think the posix
standard length is 4k so this isn't going to work.  Do we know of any linkers
that don't do partial (or incremental) linking?
Status: NEW → ASSIGNED
mass re-assign of all bugs where i was listed as the qa contact
QA Contact: cyeh → chofmann
I applied the changes to the m16 tree so I would have a stable tree to work
from.  The link line for libraptorhtml.so is up to 15k now.  After a brief
conversation with Brad on irc, I'm not as concerned about the command line
length.  I realized that for non-gcc builds, some of our compile commands are
over 5k.  If a platform has a small shell line limit, chances are that they
cannot build mozilla anyways.  And if we can split up libraptorhtml.so (bug
#43142), then all of this should be a moot point.
Also, I forgot to mention that these changes significantly reduce the amount of
space needed to build mozilla since they remove the unneeded lib*_s.a files.  On
linux, I see a savings of about 350M and on solaris, I see a savings of 600M. 
Both sets of builds were configured with: --enable-nspr-autoconf --enable-mathml
--enable-svg --with-extensions

I have been informed by Colin that the proposed changes will not work on OpenVMS
as it has a 4k cmd line limit.  (Previous comment about 5k lines rescinded
...copy/paste error)  He suggested using a linker script which appears to be
supported by GNU ld but not Sun ld.  To make things more interesting, on a
number of platforms, we call $(CXX) or $(CC) to link, not $(LD).  Passing the
linker script options to the linker via the compiler flag -Wl does not work.  

To help us focus on the Sun/Solaris specific bugs when we do a bug query, I'm
moving this one to Platform/OS category of PC/Linux which is the tier-1
supported Unix platform. There needs to be an AllUnix Bugzilla platform
category.
OS: Solaris → Linux
Hardware: Sun → PC
adding myself to this one...
On the long drive home this weekend, I had an epiphany.  It was so simple.  Why
don't we just use symlinks?  As in, symlink the dependent obj from two
directories away into the current directory and actually link against the local
symlink.

In rules.mk, add:
LDEP_OBJS               = $(notdir $(DEP_OBJS))

and for each target that uses DEP_OBJS, add:
        @echo $(LDEP_OBJS) | xargs rm -f
        @$(foreach f, $(DEP_OBJS), ln -s $f $(notdir $f);)
...
        rm -f $(LDEP_OBJS)

Using the local symlinks causes the link lines to shrink by about 50%. 
Unfortunately, due to the number of files in layout this is still too large
(where's Jenny Craig when you need it?).

floating:obj> wc foo2
      1     362    6593 foo2

The other question is how do OpenVMS & OS/2 handle symlinks?  Will they be able
to take advantage of such a change or do I need to head back to the drawing
board?
symlinks are already used throughout the build, so using symlinks here shouldn't
be a problem. Now if you could just name those local libraries L1, L2, L3... we
might be able to get the size of the command line down to something reasonable
(unfortunately at the cost of readability).
Target Milestone: M18 → mozilla1.0
The original focus of this bug has been fixed as we've had -j4 tinderboxes &
nightly builds for a long time now.  We no longer allow the building of static &
non-static libs in the same tree & the "static" build uses a completely
different process so the bug that triggered this problem shouldn't occur again.
 I still want to get rid of those intermediate libs but that's for some
indeterminate future date. Marking fixed.

Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey