Unable to build, "recipe commences before first target" during shlibsign export in coreconf/rules.mk
Categories: NSS :: Build, defect, P3
Tracking: Not tracked
People: Reporter: mark; Assignee: Unassigned
Details: Whiteboard: [nss-nofx]
I'm trying to upgrade the NSS version in our Mozilla fork from a custom-sec-patched 3.48 to 3.59 to pick up additional features and bugfixes, but I'm running into a build issue during the configuration stage. We held off on upgrading sooner because later versions of NSS no longer build dbm, which we still needed.
0:14.87 ../../coreconf/rules.mk:164: *** recipe commences before first target. Stop.
0:14.87 Makefile:451: recipe for target 'private_export-nss/cmd/shlibsign' failed
0:14.87 mozmake.EXE[5]: *** [private_export-nss/cmd/shlibsign] Error 2
The indicated line is the following:
$(eval $(call PROGRAM_template,$(PROGRAM)))
which seems to occur outside of any target rule...
I'm a bit at a loss as to what I'm missing and why others don't seem to have trouble building later versions of NSS (as I haven't found this problem mentioned elsewhere)?
Make is gnumake 3.81, using the Mozillabuild tools on Windows.
Comment 1•5 years ago
I don't have a Windows Make environment configured to test this. Given the change history [1], I expect this is something from 3.53. Would you be able to confirm that, or even better, bisect to one of those commits? Thanks!
[1] https://hg.mozilla.org/projects/nss/log/tip/coreconf/rules.mk
Comment 2•5 years ago (Reporter)
I can confirm that 3.53 is where the trouble starts.
3.52.1 is fine and builds.
3.53 throws the recipe error as well as the following, which is definitely incorrect too, since it's an x86_64 machine on an x64 build of Windows in an x64 shell. Of note, uname -m gives i686 and uname -p gives unknown in Mozillabuild Windows environments.
../../coreconf/arch.mk:150: CPU_ARCH is not x86_64, disabling -mavx2
(-mavx2 is also not valid for cl, by the way.)
I'm not sure how best I could bisect this further; I normally import nss/nspr updates from the RTM archives.
Comment 3•5 years ago (Reporter)
I happened to run across nss-dev on GitHub, a mirror of the NSS Mercurial repository, and used that to narrow down the problem to the following commit:
Bug 290526 Handle parallel PROGRAM and PROGRAMS r=rrelyea
The cpu architecture issues are unrelated (although obviously wrong) as they manifest before that commit already.
Comment 4•5 years ago
I've set up a mozilla-build environment (3.3, with GNU Make 3.81.90), and I'm unable to reproduce this: the build succeeds.
Are you able to use the GYP build system instead?
Comment 5•5 years ago (Reporter)
I'm not sure what exactly you tested; if it's a standalone build from mozilla-build, there may be differences to our environment.
Building of NSS is integrated in our source tree; as I already said, this is a Mozilla fork. A comparable system for building NSS/NSPR would be Firefox 52-ESR, as we haven't made significant changes to that part of the build system since forking, and it has always worked with NSS upgrades without any serious issues like this.
I'm open to suggestions for improving the surrounding build system mechanics so that it builds with the indicated culprit bug, or to suggestions for otherwise getting NSS improvements into our tree.
Comment 6•5 years ago
Jan-Marek, are you able to take a look into this issue? It seems to occur as a result of https://phabricator.services.mozilla.com/D69016.
Comment 7•5 years ago
(In reply to Kevin Jacobs [:kjacobs] from comment #6)
> Jan-Marek, are you able to take a look into this issue? It seems to occur as a result of https://phabricator.services.mozilla.com/D69016.
As a start, I would need to know how Mark tries to build NSS. I already learned the hard way that there are many build targets for partial builds and whatnot, with a multitude of make flags to specify. So please write down the steps to reproduce your problem.
From the error message, it looks like make considers the indented $(1)_OBJS lines from the start of PROGRAM_template not as valid make variable assignments, so it sees them as recipes and correctly complains about a missing target. That is just an almost blind guess, based on the error message, because
$(1): $$($(1)_OBJS) $$(EXTRA_LIBS) | $$$$(@D)/d
is the start of the real target, so make's complaint about a recipe without a preceding target could only apply to the lines before it, if my guess is correct.
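To make that guess concrete, here is a small Python toy model. It is not GNU make's actual parser; the regex and the function name `first_orphan_recipe` are made up for illustration. It assumes only what is described above: a tab-indented line is accepted if a rule is already open or if it still parses as a variable assignment, and is otherwise reported as a recipe with no target.

```python
import re

# Simplified pattern for "NAME =", "NAME :=", "NAME ?=" or "NAME +=".
ASSIGNMENT = re.compile(r"^\s*[^\s:#=]+\s*(?::=|\?=|\+=|=)")


def first_orphan_recipe(lines):
    """Return the 1-based number of the first tab-indented line that is
    neither inside a rule nor a variable assignment, or None."""
    in_rule = False
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments change nothing in this model
        is_assignment = bool(ASSIGNMENT.match(line))
        if line.startswith("\t"):
            if in_rule or is_assignment:
                continue  # part of a recipe, or still a valid assignment
            return number  # make would stop here: no target seen yet
        # A non-indented line either opens a rule or ends the rule context.
        in_rule = ":" in line and not is_assignment
    return None


# Hypothetical text as it might look after the template has been expanded.
expanded_ok = ["\tprog_OBJS := prog.o", "prog: prog.o", "\tcc -o prog prog.o"]
expanded_bad = ["\tprog_OBJS := prog.o", "\tprog.o extra.o", "prog: prog.o"]
print(first_orphan_recipe(expanded_ok))
print(first_orphan_recipe(expanded_bad))
```

The first input passes because every indented line before the rule is still an assignment; the second is flagged at its second line. Real make also accepts directives such as conditionals in that position, which this sketch ignores.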
That eval doesn't belong to a target, as it dynamically generates the rule to build the PROGRAM (or PROGRAMS).
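The dollar signs in that rule line are easier to follow one expansion pass at a time. The Python sketch below is a deliberately simplified stand-in for make's expansion (it only knows `$$` and `$(name)`), and the values `out/prog`, `out/prog.o` and `libfoo.a` are hypothetical. It also assumes the prerequisites get a secondary expansion, which is what the four dollar signs in front of `(@D)` suggest.

```python
def expand_once(text, variables):
    """One simplified expansion pass: '$$' becomes '$' and '$(name)' is
    replaced by its value (empty if undefined). Nothing else is handled."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("$$", i):
            out.append("$")  # an escaped dollar loses one level of escaping
            i += 2
        elif text.startswith("$(", i):
            end = text.index(")", i)
            out.append(variables.get(text[i + 2:end], ""))
            i = end + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


template = "$(1): $$($(1)_OBJS) $$(EXTRA_LIBS) | $$$$(@D)/d"

# Pass 1, $(call ...): the argument $(1) is substituted, every $$ becomes $.
after_call = expand_once(template, {"1": "out/prog"})
# Pass 2, $(eval ...): make reads the rule line and expands ordinary variables.
after_eval = expand_once(after_call, {"out/prog_OBJS": "out/prog.o",
                                      "EXTRA_LIBS": "libfoo.a"})
# Pass 3, secondary expansion: automatic variables such as $(@D) are known now.
final = expand_once(after_eval, {"@D": "out"})

for step in (after_call, after_eval, final):
    print(step)
```

Under those assumptions the three printed lines are:

    out/prog: $(out/prog_OBJS) $(EXTRA_LIBS) | $$(@D)/d
    out/prog: out/prog.o libfoo.a | $(@D)/d
    out/prog: out/prog.o libfoo.a | out/d

So the part after `|` ends up as an order-only prerequisite on a file named `d` in the target's directory.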
The whole patchset made a lot of changes to the NSS make-based build system, and some regressions were fixed later.
Comment 8•5 years ago (Reporter)
Jan-Marek: As stated I'm building NSS as part of our Mozilla-forked platform, UXP. Github mirror here in case that's easier for you. As stated we haven't made any significant changes to the way NSS is built since we forked off from ESR-52 so you could also use Firefox as a reference to likely reproduce the problem.
In fact, I don't understand the reason for that particular change to begin with, and the commit message stating "I have no real clue, why PROGRAMS is actually working in the sequence build." doesn't instill confidence that the build environments the library is used in were properly considered with this change -- regressions fixed were only fixed in environments known about; obviously this is yet another regression we didn't run into before now because we held off on upgrading NSS for a while to allow db migration for users. If not using this template and reverting this change that make complains about, would this cause issues with other environments?
As said I'm open to any suggested changes to our build environment as well if it's difficult in the NSS build process to keep compatibility with our environment, but it seems like if the make complaint is legitimate, that a different approach is needed.
Comment 9•5 years ago
(In reply to Mark Straver from comment #8)
> Jan-Marek: As stated I'm building NSS as part of our Mozilla-forked platform, UXP. Github mirror here in case that's easier for you.
That info wasn't in the previous comments. I assumed you just have an NSS fork with some custom build system around it, based on an old mozilla-build for some private project. But all the info didn't add up for me, so I simply asked.
> As stated we haven't made any significant changes to the way NSS is built since we forked off from ESR-52 so you could also use Firefox as a reference to likely reproduce the problem.
I'm not a Firefox developer, but I remember it as a rather massive setup, just like LibreOffice, where I'm a developer. I used to package (and patch) Firefox ESR releases from HG tags on Linux, and automating that for Jenkins was a larger effort. But I know even Firefox patches a part of the NSS Makefile-based build system, even if their build currently runs with gyp in the end. At least that is my understanding from a bug I got after the patches went in and NSS broke Firefox CI in some way. It's not in the regressions mentioned in bug 290526 and I couldn't find that info anymore, which might have helped you too.
> In fact, I don't understand the reason for that particular change to begin with, and the commit message stating "I have no real clue, why PROGRAMS is actually working in the sequence build." doesn't instill confidence that the build environments the library is used in were properly considered with this change -- regressions fixed were only fixed in environments known about; obviously this is yet another regression we didn't run into before now because we held off on upgrading NSS for a while to allow db migration for users. If not using this template and reverting this change that make complains about, would this cause issues with other environments?
Generally the merged patches start with "bug 290526", which references the problem they fix: it allows a parallel build of NSS with make.
This is important for the LibreOffice CI, where NSS (and Firebird DB, FWIW) took a long time to build and prevented the Windows CI build from fully utilizing all assigned cores, as they used "make -j1" to build these external dependencies.
If you mean just this particular patch of the series, I don't remember for sure whether it was needed, but I guess the parallel build would fail without it. The whole series was rather massive and turned out to be much larger in size and work than I originally anticipated. The "I have no real clue, why PROGRAMS is actually working in the sequence build." isn't really a point in this case, and rrelyea added a comment on the patch in Phabricator (https://phabricator.services.mozilla.com/D69016#inline-413732) with a possible explanation of why it worked this way at all.
Regarding the "doesn't instill confidence that the build environments the library is used in were properly considered with this change": I didn't care at all, because I simply can't. I tested quite a few variants using the LibreOffice CI, but even LibreOffice still applies ...
$ ls -1 external/nss/patch | wc -l
22
... patches, to build NSS on all platforms, with configurations like clang, asan, ubsan, cygwin, etc. And LibreOffice CI builds NSS daily a hundred times on many platforms.
> As said I'm open to any suggested changes to our build environment as well if it's difficult in the NSS build process to keep compatibility with our environment, but it seems like if the make complaint is legitimate, that a different approach is needed.
While I would try to reproduce the problem within NSS, I don't plan to debug an old Firefox build system to see, why current NSS fails the build in that setup. With the patch from bug 1668698, I haven't seen any more NSS build failures in months.
You can try to revert that patch, but as I stated it's part of a larger patchset, so I doubt a revert will just work for you.
Comment 10•5 years ago
Missed the additional escaping for:
ls -1 external/nss/*patch* | wc -l
Comment 11•5 years ago (Reporter)
For the record, we don't build NSS with -j1 and have been building parallel without issues without bug 290526 applied, so I'm still at a loss what that was trying to solve. (the tree is normally built with -j20 on a 24-thread system).
I'm not doubting that it builds successfully in a different environment so whatever LibreOffice is doing (which I don't know) clearly works for them, but that really doesn't help me get back to a building state. The whole point of using NSS as an external library is so we don't have to patch it to hell when we update the library. And we're also using a wrapping makefile to set it up for many targets by setting the correct make flags on it for each, so tossing in some patches to apply isn't trivial either.
As far as I can tell, the makefile format is simply wrong. Does it need specific environment variables to be set that we might not have? Does it rely on CI? Because we don't use CI to build. I'm trying to figure out what this "new" makefile needs that we aren't providing, or for a change to it so it provides what it needs itself. I'm not a makefile expert, and something like $(1): $$($(1)_OBJS) $$(EXTRA_LIBS) | $$$$(@D)/d is total gobbledygook to me. Please help.
Comment 12•5 years ago
@Jan-Marek. Thanks for interacting with us here! We build for Firefox only using gyp but we still have the Make build in our CI to prevent breaking it when accepting external contributions. The only place where I've seen the Make build fail in the Fx CI recently is FIPS build when trying to parallelize more: https://treeherder.mozilla.org/jobs?repo=nss-try&revision=460a47c980d9e067f0342f68bb3e81717cc7e61e
@Mark. Any chance you are building NSS in FIPS mode?
Comment 13•5 years ago (Reporter)
@Benjamin We do build NSS FIPS-compliant (i.e. we don't purposefully disable it with the export) and our applications explicitly support operating in FIPS mode.
I don't think we even get to the actual compilation stage though, as it fails very early in the build in the export phase; so would that even be a possible cause?
Comment 14•5 years ago (Reporter)
Disabling FIPS by defining NSS_FIPS_DISABLED=1 doesn't solve the problem.
Comment 15•5 years ago (Reporter)
Right, so, as far as I can tell from looking at the various bugs that are completely changing the build system of NSS using Make, the changes are very specific to dealing with an integration setup and totally incompatible with the way NSS has plugged effortlessly into our tree (and others') for many years. And I mean totally incompatible as it is pretty much a full rewrite. With Firefox using gyp to build, this was likely simply not noticed because there are not many other consumers of NSS using the supplied Make build system (at least for the most part) that use current versions of the lib.
While I appreciate the effort it must have taken for Jan-Marek to create these patch sets, it's unfortunately very destructive to the general purpose use of NSS as a library. It shouldn't have been necessary.
I've tried several options to restore functionality for us, but the only way I could get back to a working build for us was by fully rolling back the build system changes Jan-Marek made to serve the LibreOffice CI, i.e. reverting bug 1637083, bug 1629553, bug 1438431, bug 1638289 and bug 1642153, before finally backing out bug 290526 in its totality. No further changes were needed after that; NSS 3.59 builds happily and in parallel (-j20) in that state as-configured in our tree.
Many of the bugs mentioned seem to convert make rules to more cryptic templates using /d, @D and eval that neither seem to work outside a specific environment nor are easily understood by most people with a passing knowledge of makefiles, while removing a lot of the plumbing needed to build shared libraries in Mozilla fashion. Of course it's possible I'm misreading this because, as I said, I'm no makefile expert, but that's what it looks like to me having had to roll all of this back.
I'm incredibly nonplussed about what happened here. The code reversal I have had to do to get back to a sane state is very extensive (because it literally completely changes all of the makefiles) and it is going to be a maintenance nightmare if that needs to be done with every future version of NSS, especially if it was just done because it was "slow to build" in LibreOffice's specific CI setup having had an issue requiring -j1 -- while there's no real issue building it in jobserver mode with a lot of threads as we've been doing for years on quite a few platforms (i.e. there was no inherent flaw with the makefile build system as it was before these changes).
On top, NSS doesn't take that long to build; a few minutes maybe?
Please do note UXP is at the base of a whole range of applications published by different people, many of which have network connectivity as their core functionality, so the impact of this is quite significant.
I understand that this puts us at odds. LibreOffice would prefer to keep the changed makefiles since it makes their CI faster, and we would prefer them to be reverted since it is a hard blocker preventing us from building outright. I'm not sure who can make these kinds of directive decisions and what factors weigh in here, but I do hope that all things being considered, this can be reverted.
I can supply the patches of my work (which un-busts 3.59) if needed.
Comment 16•5 years ago
(In reply to Benjamin Beurdouche [:beurdouche] from comment #12)
> @Jan-Marek. Thanks for interacting with us here! We build for Firefox only using gyp but we still have the Make build in our CI to prevent breaking it when accepting external contributions. The only place where I've seen the Make build fail in the Fx CI recently is FIPS build when trying to parallelize more: https://treeherder.mozilla.org/jobs?repo=nss-try&revision=460a47c980d9e067f0342f68bb3e81717cc7e61e
I was referring to a problem with Firefox, when someone pulled in NSS 3.53 (RC?) initially. I don't remember, what the fix (or even bug) was. I guess it was an easy fix, quite probably on the Firefox side.
(In reply to Mark Straver from comment #15)
> Right, so, as far as I can tell from looking at the various bugs that are completely changing the build system of NSS using Make, the changes are very specific to dealing with an integration setup
These patches are not about any integration. They fix the stand-alone, parallel NSS (and NSPR) build with make, especially on Windows.
So I just read config/external/nss/Makefile.in from your source tree, and it has a few comments like "Work around NSS's export rule being racy..." or "Work around NSS build system race condition..." referring to bug 193164 or bug 836220 or bug 844880 or bug 1133073. It doesn't look like a normal "make all" build at all, and is probably the reason it worked. The default build doesn't.
> And I mean totally incompatible as it is pretty much a full rewrite.
Kind of "yes" for the 2nd half of that sentence :-( I guess some of your workarounds now break your build.
> With Firefox using gyp to build, this was likely simply not noticed because there are not many other consumers of NSS using the supplied Make build system (at least for the most part) that use current versions of the lib.
There was at least bug 1653975 early on with many different Linux distros in CC, so I assume many people still build NSS standalone with make. That's also what LibreOffice does. (LFS 9.1 states for the NSS 3.50 build, that "This package does not support parallel build." (http://www.linuxfromscratch.org/blfs/view/9.1/postlfs/nss.html) so it uses "make -j1", which was dropped for LFS 10.0 with NSS 3.55).
[more stuff about the palemoon / old Firefox NSS integration and parallel workarounds / fixes]
FWIW: I just used the LibreOffice CI because it's much more powerful and diverse than my laptop with Debian and a Windows KVM. With a max of 4 real cores and just building NSS, I simply didn't hit many parallel build problems locally, which the LibreOffice CI would catch within very few runs. LibreOffice CI just runs the default nss_build_all, which builds NSS including NSPR (albeit with additional patches for the integration into the LibreOffice build).
Comment 17•5 years ago (Reporter)
I'm fine with moving forward with this by changing the way we build (I already tried a few things but they were blind guesses and did not work) but please understand that this was not created or set up by us, but by Mozilla prior to our fork point. We simply don't have the in-house expertise to do this ourselves without assistance since we didn't write the code or the workarounds that allow us to build in parallel and now (allegedly) break our build.
Bug 1653975 is completely unrelated as that's not what we're having issues with since we don't build standalone (I'm not sure how that wasn't clear yet) and that was obviously already part of NSS 3.59 as-imported.
If we can move to building fully standalone as a subproject then that would be fine as well but once more, we don't have the knowledge to do so -- our build system was inherited from Mozilla and there's no documentation aside from the scattered bug references in the makefile that don't really help.