Closed Bug 290526 Opened 16 years ago Closed 2 months ago

Missing dependency information prevents parallelized building of NSS

Categories

(NSS :: Build, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: georg.schwarz, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted)

Attachments

(12 files, 6 obsolete files)

484 bytes, patch
Details | Diff | Splinter Review
588 bytes, patch
Details | Diff | Splinter Review
196 bytes, patch
Details | Diff | Splinter Review
5.02 KB, patch
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
User-Agent:       iCab/2.9.8 (Macintosh; U; PPC)
Build Identifier: 

When building nss 3.9.2 on a Dual-Pentium system running Debian Linux using make -j2 (GNU make 3.79.1) I end up like this:

schwarz@sendai:~/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/nss$ env BSD_LDOPTS="-L/usr/local/pkg/lib/nspr -Wl,-R/usr/local/pkg/lib/nspr -Wl,-R/usr/local/pkg/lib/nss" LIBRUNPATH=/usr/local/pkg/lib BUILD_OPT=1 make -j2 nss_build_all
cd ../coreconf ;  make
cd ../dbm ; make export libs
make[1]: Entering directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf'
make[1]: Entering directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/dbm'
cd nsinstall; make export
cd include; make export
make[2]: Entering directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf/nsinstall'
make[2]: Entering directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/dbm/include'
make[2]: Nothing to be done for `export'.
make[2]: Leaving directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf/nsinstall'
cd nsinstall; make libs
../../coreconf/nsinstall/Linux2.4_x86_glibc_PTH_OPT.OBJ/nsinstall -R -m 444 ../../../dbm/include/nsres.h ../../../dbm/include/cdefs.h ../../../dbm/include/mcom_db.h ../../../dbm/include/ncompat.h ../../../dbm/include/winfile.h ../../../dist/public/dbm
make[2]: ../../coreconf/nsinstall/Linux2.4_x86_glibc_PTH_OPT.OBJ/nsinstall: Command not found
make[2]: *** [export] Error 127
make[2]: Leaving directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/dbm/include'
make[1]: *** [export] Error 2
make[1]: Leaving directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/dbm'
make: *** [build_dbm] Error 2
make: *** Waiting for unfinished jobs....
make[2]: Entering directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf/nsinstall'
gcc -o Linux2.4_x86_glibc_PTH_OPT.OBJ/nsinstall.o -c -O2 -fPIC -DLINUX1_2 -Di386 -D_XOPEN_SOURCE -DLINUX2_1 -ansi -Wall -pipe -DLINUX -Dlinux -D_POSIX_SOURCE -D_BSD_SOURCE -DHAVE_STRERROR -DXP_UNIX -UDEBUG -DNDEBUG -D_REENTRANT -I../../../dist/Linux2.4_x86_glibc_PTH_OPT.OBJ/include  -I../../../dist/public/coreconf -I../../../dist/private/coreconf  nsinstall.c
mkdir: cannot create directory `Linux2.4_x86_glibc_PTH_OPT.OBJ': File exists
make[2]: *** [Linux2.4_x86_glibc_PTH_OPT.OBJ/pathsub.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf/nsinstall'
make[1]: *** [libs] Error 2
make[1]: Leaving directory `/usr/src/pkgsrc/pkgsrc/devel/nss/work/nss-3.9.2/mozilla/security/coreconf'
make: *** [build_coreconf] Error 2

omitting the -j option makes the build process work fine.

Reproducible: Always
This is a known problem.  We don't plan to fix this.
Our makefiles were written a long time ago, and it's
a lot of work to parallelize makefiles after they were
written.

We would appreciate any help though.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: wtchang → nobody
QA Contact: wtchang → build
Keywords: helpwanted
Priority: -- → P3
Summary: nss does not build when using GNU make with option -j2 on an SMP machine → Missing dependency information prevents parallelized building of NSS
Except for one hunk (which creates a symlink from .so.1 to .so), the changes have to do with dependency information:

`all' depends on `libs', which depends on `export'. If any PRIVATE_EXPORTS are defined, then `export' also depends on `private_export'.

Various files (such as objects) explicitly depend on the OBJDIR -- which has a rule of its own. This not only helps build-parallelization, but is faster -- because make can remember, that it already created the OBJDIR and does not need to check, whether it exists (and is a directory) for every target.
This patch ensures, coreconf bits (nsinstall) are built before `export' is attempted.
In addition to the four submitted patches, the explicit

   export:: private_export

needs to be removed from all Makefiles, where it is found, as it breaks things. Something like:

find mozilla/security/nss -name Makefile | xargs /usr/bin/grep -l '^export:: private_export' | xargs /usr/bin/sed -i.bak -e '/^export:: private_export/d'

After that I was able to build nss with -j2, -j4, and -j11 on two different FreeBSD machines.
Mikhail, thank you for your patches. I am surprised how simple they are. This weekend, I experimented on getting NSS to build in parallel and by using non-recursive make for NSS. My changes were much bigger than your changes (I completely replaced coreconf/rules.mk with my own).

I found that if I add dependencies on $(OBJDIR) then, at least on Windows, the next rebuild would rebuild everything that depended on it. I guess this is because the directory's timestamp changes during the build? For that reason, I had to write dependencies as $(OBJDIR)/directory.exists, and write:

$(OBJDIR)/directory.exists:
	mkdir $(OBJDIR)
	touch $@

Perhaps I was doing something wrong though.

How much faster is the build with your patches, "make -j1" vs "make -j4" vs "make j11"? Can you post the timings from the "time" command?
(In reply to Brian Smith (:bsmith) from comment #8)
> I found that if I add dependencies on $(OBJDIR) then, at least on Windows,
> the next rebuild would rebuild everything that depended on it.

You are right. I'm working with the FreeBSD port of nss -- where the code is only built once anyway so I did not notice this effect. But it, likely, is not acceptable for ongoing development. Your method (of depending on a file inside $(OBJDIR)) seems find, though.

> How much faster is the build with your patches, "make -j1" vs
> "make -j4" vs "make j11"?

Here is from my quad Xeon FreeBSD/amd64 machine:

mi@narawntapu:ports/security/nss (1067) rm -rf work/ ; time csh -c "make MAKE_JOBS_NUMBER=11 > & m11"
74.584u 17.719s 0:44.58 207.0%  6052+2127k 138+1195io 144pf+0w
mi@narawntapu:ports/security/nss (1068) rm -rf work/ ; time csh -c "make MAKE_JOBS_NUMBER=4 > & m4"
72.791u 16.854s 0:44.33 202.2%  6025+2121k 2+1189io 144pf+0w
mi@narawntapu:ports/security/nss (1069) rm -rf work/ ; time csh -c "make MAKE_JOBS_NUMBER=1 > & m1"
69.714u 14.179s 1:22.40 101.7%  5876+2084k 0+1191io 144pf+0w

Not much difference between 11 and 4, but m1 is much slower...
Can anyone elaborate on the reason, `nsinstall -D' is used to create directories instead of mkdir (or md on Windows)?
(In reply to Brian Smith (:bsmith) from comment #8)
> $(OBJDIR)/directory.exists:
> 	mkdir $(OBJDIR)
> 	touch $@

A better choice for a file inside $(OBJDIR) would've been $(OBJDIR)/depend.mk, which is supposed to be regenerated by mkdepend. Unfortunately, it seems, that content of coreconf/mkdepend has bit-rotted (does not build out-of-the-box, time-stamps are from Feb 2009)...

Does it make sense to fix the tool and make the build use it every time, or is no one relying on it anyway -- and it should be ripped-out from rules.mk to make the file more readable? Please, advise. Thanks!
This incorporates Brian's suggestion for depending on a file inside $(OBJDIR). Timings are the same as before, but subsequent invocations of gmake do not rebuild everything over again, because "directory.exists" files are already there.
Attachment #585540 - Attachment is obsolete: true
(In reply to Mikhail Teterin from comment #11)
> Does it make sense to fix the tool and make the build use it every time, or
> is no one relying on it anyway -- and it should be ripped-out from rules.mk
> to make the file more readable? Please, advise. Thanks!

I would say that we should not rely on the mkdepend tool for an initial patch, and that we should also avoid removing the rules for it in rules.mk. In my modified version of the build system, I simply make every *.obj and every *.res file depend on every header--i.e. it basically forces a full rebuild when any header changes. That isn't ideal, but it is basically what we have to do now when we touch coreconf/coreconf.dep after editing a header.

I will post my patches for the de-recursified build system for NSS soon. Then we can decide which path to go down, regarding optimizing the build system. If I understand your numbers correctly, you ended up making the full build finish in about half the time of the current build system. In my de-recursified version, for far, the build finishes in about one fourth the time of the current build system. But, my patch is much larger than yours and it isn't finished yet.
Attached file Bug 290526 Don't delete directories (obsolete) —

If these files exist and aren't directories, there might be other
problems. Trying to fix them just will break the build.

Depends on D68692

This way we can separate shared and static build much easier.
And IMPORT_LIB_SUFFIX if now really a WIN32 platform specific.

I have no real clue, why PROGRAMS is actually working in the
sequence build. There is no special make code really handling it,
except for the install target.

This patches code is inspired by the $(eval ...) example in the
GNU make documentation. It generates a program specific make
target and maps the programs objects based on the defined
variables.

Depends on D69015

Stuff actually depends on the generated target to be copied into
the dist/ directory. And order-depend on the SOURCE_* directories,
so they are created before any copy attempts.

Depends on D69016

If these files exist and aren't directories, there might be other
problems. Trying to "fix" them by removing will break the build.

Depends on D69017

This still serializes many targets, but at least these targets
themself run their build in parallel. The main serialization
happens in nss/Makefile and nss/coreconf/rules.mk's all target.

We can't add these as real dependencies, as all Makefile snippets
use the same variable names. I tried to always run sub-makes to
hack in the depndencies, but these don't know of each other, so
targets very often run twice, and this breaks the build.

Having a tests:: target and a tests directory leads to misery (and
doesn't work), so it's renamed to check.

Depends on D69018

Copying private headers is now simply included in the exports
target, as these headers use an extra directory anyway.

Depends on D69019

Gets rid of a lot of duplicate Makefile code.

Depends on D69021

Double-colon rule behaviour isn't really compatible with parallel
build. This gets rid of all of them, so we can codify the
directory dependencies.

This leaves just three problems, which aren't really fixable with
the current build system without completely replacing it:

  • everything depends on nsinstall
  • everything depends on installed headers
  • ckfw child directories depend on the build parent libs

This is handled by the prepare_build target.

Overall this allows most if the build to run in parallel.

Depends on D69022

Attachment #9137154 - Attachment description: Bug 290526 Use SHARED_LIBRARY_NAME and drop the "32" default → Bug 290526 Introduce SHARED_LIBRARY_NAME

In my build none of the dlls have the "32" included in their name.

Bob,

I've done a first pass through these patches but still need to rekindle my Make-fu. In the meantime, could you suggest someone at RH to add as a reviewer for these Makefile changes, too? They look to be quite valuable! Thanks!

Flags: needinfo?(rrelyea)

either Daiki or I would be fine. You can add me if you want. (serves me right for passing on a gmake patch review to you yesterday).
bob

Flags: needinfo?(rrelyea)
  • Drop "32" default DLL suffix
  • Generalize resource file handling
  • Generate IMPORT_LIBRARY based on IMPORT_LIB_SUFFIX

I just realized, that all the patches from Phabricator are mirrored in Bugzilla and there are some additional comments. Maybe I should have added the following info here, before I starting submitting my patches.

My main motivation for these patches is making the LibreOffice Windows CI builds a few minutes faster, because they have to run the NSS build with -j1, even if the CI VM has a multitude of CPUs / cores. For Linux and Mac it's a smaller problem, as these platforms use ccache normally, but the first build is still slow.

FWIW: the original work was done in https://gerrit.libreoffice.org/c/core/+/74751 and there even was my - also reverted - predecessor in https://gerrit.libreoffice.org/c/core/+/62011

LO tested gyp + ninja last month (some info in https://lists.freedesktop.org/archives/libreoffice/2020-February/084461.html), but had too many different failures on Windows, so that change was reverted in general after a week of fixes in https://gerrit.libreoffice.org/c/core/+/91199.
I didn't look into this in detail, but AFAIK we had some problems with crashing python binaries for Win32 builds, rpath problems, utf-8 encoding, etc.

And there are a few more NSS build patches in https://git.libreoffice.org/core/+/refs/heads/master/external/nss/, but I guess most are a result of the way LO builds NSS on all the different platforms.

Attachment #9136540 - Attachment description: Bug 290526 Force synchronous PDB writes → Bug 290526 Write separate PDB files

The google_test gtest build doesn't provide any exports for the
shared library on Windows and the gyp build also builds just a
static library. So build gtest and gtestutil libraries as static.

For whatever reason, the Windows linker doesn't find the main
function inside the gtestutil library, if we don't tell it to
build a console executable. But linking works fine, if the object
file is used directly. But since we can have different main()
objects based on build flags, we enforce building console
applications binaries.

Depends on D69021

I just pushed a stripped down patchset. I removed all the cleanup patches, except for the private_exports patch, which saves one complete walk of the whole NSS source tree. It now passes the LibreOffice CI build on Linux, Windows and MacOS with debug and release config, including the previously failing gtests build. I'll open a new bug for all the patches I've currently removed / couldn't push, because of the missing bug number.

My main "problem" is still the freebl Makefile. While the current solution in D69023 seem to work, I don't like it. OTOH it's the already existing solution without the double-colon libs target. I kind of understand why it works, but I don't like overwriting the libs: rule. What I don't understand is the fact, that using TARGETS += and .PHONY += instead didn't work for me in my LibreOffice environment, but it worked in my separate NSS only environment :-(

The missing patches remove a few thousand lines of Makefile comments, duplicate empty lines, a further reworked LIBRARY_NAME rule and remove a few now basically empty config.mk snippets. There is also a minor VERSION define change in the pk11wrap code, to get rid of the naming "clash" with the LIBRARY_VERSION in the Makefile (not really a clash, as the library is actually static, but it's confusing to read). Current diffstat of the remaining patches is "239 files changed, 213 insertions(+), 4757 deletions(-)".

Thanks for the review and comments so far.

Blocks: 1629553

Comment on attachment 9137161 [details]
Bug 290526 Use eval templates for export and freebl

Revision D69022 was moved to bug 1629553. Setting attachment 9137161 [details] to obsolete.

Attachment #9137161 - Attachment is obsolete: true

Comment on attachment 9139466 [details]
Bug 290526 Cleanup WIN shared library rules

Revision D70369 was moved to bug 1629553. Setting attachment 9139466 [details] to obsolete.

Attachment #9139466 - Attachment is obsolete: true
Attachment #9138026 - Attachment is obsolete: true
Attachment #9137154 - Attachment is obsolete: true
Attachment #9136540 - Attachment description: Bug 290526 Write separate PDB files → Bug 290526 Write separate PDBs for test OBJs
Attachment #9136539 - Attachment is obsolete: true
Attachment #9137156 - Attachment description: Bug 290526 Add dist-install to generating targets → Bug 290526 Handle empty install variables
Regressions: 1637083

While the dependency information is now fixed, the parallel debug build will still fail on Windows, because the NSPR patch isn't merged yet.

And will someone adapt the CI, so it'll test the parallel build? Should I file an additional bug for that?

and NSPR .. oops, thanks Jan-Marek!

https://hg.mozilla.org/projects/nspr/rev/23940b78e965762287b772a8b6e7acf314d1aa2f

For parallel building, I think we should file a companion bug to https://bugzilla.mozilla.org/show_bug.cgi?id=1637102. IMO we should just make all Make builds parallel for CI.

If you want to file the bug and take a crack at editing the build config - it should just be this line and the equiv ones to set MAKEOPTS to -j16 or similar: https://searchfox.org/nss/rev/81ca42b7138b63ec48eec02fea6f76d87a099c60/automation/taskcluster/graph/src/extend.js#196

How soon do you need to pick up the NSPR change? FF 78 or 79?

Flags: needinfo?(jjones)
See Also: → 1640515
Blocks: 1640515

AFAIK Firefox uses gyp to build NSS (from bug 1638289), so probably also NSPR. So while the NSS patches are in 3.53, the included NSPR version is still 4.25, which doesn't include the patch. In the end that NSPR change just matters for NSPR, configure + make, parallel, Windows, MS VS, debug builds. But since the FF builds aren't constantly failing and AFAIK nobody ever complained about this failure, I seem to be the only one running this kind of builds. Mainly - I guess - because everyone had to build NSS sequential since "ever". And I'll just patch LibreOffice's NSS build with this minimal change, until it lands upstream.

So from my POV the NSPR patch doesn't matter for FF at all.

Regressions: 1642153

The NSPR patch should be in this next release set. Clearing my needinfo -- and Ben made some progress in Bug 1640328, just have to figure out some gcc-4 issues.

Flags: needinfo?(jjones)
You need to log in before you can comment on or make changes to this bug.