Closed Bug 464406 Opened 16 years ago Closed 15 years ago

Fix signtool regressions, make signtool work

Categories

(NSS :: Libraries, defect)

3.11.1
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
3.12.3

People

(Reporter: nelson, Assigned: nelson)

References

()

Details

(Keywords: regression, testcase)

Attachments

(4 files)

In NSS 3.11.x several changes were made to loader.c to resolve these bugs:
Bug 274984 -  libsoftokn3 fails to load libfreebl in setuid programs
Bug 323079 -  When libsoftoken loads the freebl library, we don't always 
              want to resolve the symbolic link.

The net effect is that loader now calls PR_GetLibraryFilePathname to 
try to get the path name for softoken, then takes the directory name
from that path name, and then tries to load the new library from that
same directory.  These patches had the intended effect on programs that 
link with and use the shared library version of softoken.  

Unfortunately, they introduced a regression.  In programs that are 
built with USE_STATIC_LIBS = 1, the code that attempts to get the 
patch for libsoftoken3.so actually gets the path name for the 
executable program file itself.  Consequently, unless the library 
to be loaded (e.g. freebl) exists in the same directory as the 
executable, it fails to load.  This usually causes NSS_Init to fail.

This can be plainly seen on Solaris with NSS 3.11.x installed. Any 
attempt to use the signtool program in a way that tries to initialize NSS 
(e.g. doing more than displaying the usage message) now fails.  The command "signtool -L" suffices to demonstrate this.  signtool is in /usr/sfw/bin, 
and freebl does not live in that directory.  

Our QA completely failed to detect this regression.  It was reported to us by 
a user TWO YEARS AGO in November 2006, but we still failed to identify the 
problem at that time.  See http://www.nabble.com/signtool.exe-td7260366.html
We definitely need to add QA tests for this.

There are workarounds for this.  Find the directory in which the 
actual signtool program file (not a symlink to it) lives and create
in that directory symlinks to the NSS shared libraries that are 
dynamically loaded, and to their corresponding .chk files.

Alternatively, create a new directory, put a copy (not a symlink) of
the signtool executable there, and also put in that directory symlinks
to the NSS shared libraries that are dynamically loaded, and to their corresponding .chk files.  Then execute the program from that new 
directory.

Yet another workaround that seems to work on Solaris is to create a 
lib directory that is a sister directory to the directory in which
the signtool executable lives (that is, in the signtool directory
create ../lib).  On Solaris, this might be /usr/sfw/lib.  Then create 
symlinks to the dynamically loaded NSS shared libraries in that directory. 
I'm not sure why that works, or even if it should, but it seems to work. 

This needs a real fix for NSS 3.11.x, and I'm not sure what it is.  
In NSS 3.12, signtool uses the NSS shared libraries, I believe, so this 
is probably not an issue for 3.12 for signtool.  But other programs that
are built with USE_STATIC_LIBS = 1 include
bltest, dbck, fipstest, ocspclnt, rsaperf, signtool & signver.

It may also be desirable to back port some other bug fixes for signtool 
from NSS 3.12, especially the ones for signtool seen at 
https://bugzilla.mozilla.org/attachment.cgi?id=332303&action=diff#mozilla/security/nss/cmd/signtool/certgen.c_sec1
Nelson,

freebl is a private part of NSS, not a public one. Even the freebl shared library names are private.

Thus, any program built with USE_STATIC_LIBS=1 that contains a static copy of loader will not work properly going forward, not only because the older loader can't locate the freebl libraries due to this bug, but also because the library names are free to be changed in any future of NSS .

I believe the loader code only tries to load freebl in the same path as the module that contains loader *first*. But if that first load fails, it falls back to loading the library without a pathname.

Thus, as long as the freebl library names don't change in a future version of NSS, I believe there should be a simple workaround - the user can set PATH / LD_LIBRARY_PATH manually, and that should be sufficient to make the 3.11.x signtool binary work with 3.12.x.

The same bug exists between 3.10 and 3.11 . We renamed the freebl shared libs on all platforms, but in 3.10 many platforms didn't have freebl shared libs. For ones that didn't, the signtool binary appeared to "work" with 3.11 shared libs, but that's only because it was statically linked with softoken. On the platforms that did, like Sparc, HP-UX, and AIX, the 3.10 signtool binary is also broken when using 3.11 shared libs - the freebl libs were renamed and the 3.10 loader will never find them in the 3.11 distribution.

Going forward, since the freebl library names may change, the correct fix is to either backport the fix we did to signtool to not have USE_STATIC_LIBS to 3.11.x, or to mark the bug as fixed in 3.12 .

We should also be careful never to distribute binaries for any other tools that still have USE_STATIC_LIBS set. I believe signtool is the only one.
I meant that signtool is the only binary with USE_STATIC_LIBS that we have distributed to customers, eg. with Solaris. We still have other tools that have USE_STATIC_LIBS, but they are internal QA tools which are not distributed, and binary compatibility is not relevant for them.
Note that if we are only talking about making the 3.11 signtool bits work with 3.11 shared libs bits, setting the LD_LIBRARY_PATH / PATH should do the trick too. I just verified that this simple workaround fixes the issue on a Solaris 10 system with 3.11.9 installed.

Setting LD_LIBRARY_PATH or equivalent would also be required to get signtool to work on any previous releases of NSS on platforms that had freebl shared libs, ie. 3.9.x and 3.10.x on Solaris Sparc, HP-UX, AIX.

This bug has been with us for 10 years. It is only more visible in 3.11 since all platforms have freebl shared libs vs just a few previously. But the workaround for the problem is simple enough for the user, and doesn't require any code change. I would recommend WONTFIX for 3.11.x, since we have already done the correct architectural fix in 3.12 .
OS: Windows XP → All
There is now code in loader.c that gets the path name of the shared lib or
executable in which it resides, and then uses that path name to try to load
freebl.  That code was new in 3.10 or 3.11 and then enhanced to handle 
symlinks in 3.11. (I think it was new in 3.11, and then back ported, but 
I'm not sure).  

When that code runs in softoken, it generally finds freebl.  But when it 
runs in the executable because of static linking, it doesn't.
Nelson,

Even in 3.11, loader always falls back to loading the shared lib with a simple filename if the absolute path method does not work. Thus, I believe the changes for symlinks are irrelevant. You are talking about something that has never worked before 3.12.

This is a very old bug. On Solaris sparc and other platforms with freebl shared libs, signtool in NSS 3.3, 3.9, or 3.10 would not be able to locate freebl any better than it can in 3.11 , unless LD_LIBRARY_PATH was set .

The only difference in 3.11 is that the set of platforms with freebl shared libs changed to be "all platforms". At that point, the signtool bug became visible on all platforms instead of just a few. For example, Windows, Solaris x86, Linux, OS/2, etc, which did not have any freebl shared libraries to load in 3.10 and before. You can call that a regression if you like, but I think of it as the same old signtool problem showing up on more platforms.

The workaround is simple enough - setting LD_LIBRARY_PATH, which is what our QA uses, and why signtool works in it and has worked all those years. We have now solved this problem in 3.12 so that doing this is no longer needed. 

Do we have a critical customer that wants a fix for having to set the LD_LIBRARY_PATH hand to use signtool ? If so, then let's backport the fix for bug 438876 (Notice how that fix doesn't include any loader change).
If not, then please, let's mark this WONTFIX already.
In answer to the preceding question, 
Yes, we have an internal customer for whom this bug is critical.
I don't know if the LD_LIBRARY_PATH solution will be acceptable to them.
I think it should be, but I am frequently surprised by what some groups
think are crucial details, so I will inquire.
This patch fixes (or at least improves) the code that lists certs.  
It eliminates memory leaks, cert reference leaks, and some bogus 
tests for cert validity that rejects certs that were actually valid.
This patch is written for the branch, but I expect it will apply cleanly 
to the trunk also.
Assignee: nobody → nelson
Status: NEW → ASSIGNED
Attachment #348146 - Flags: review?(alexei.volkov.bugs)
Oh, I meant to add that the preceding patch makes it unnecessary to 
set both trusted peer and trusted CA trust flags on the same cert.
This allows signtool to be fixed before bug 464834 is fixed.
This patch fixes a bug in lib/jar.  That bug caused signtool to report that 
a jar's signature was valid, even when the jar had no signature at all!  
This code makes lib/jar count the number of files of each type required 
for a signed jar, and complain if there are none of any required type.
Attachment #348147 - Flags: review?(alexei.volkov.bugs)
Attachment #348146 - Attachment description: patch, part 1, v1 - Fix certutil -l and -L commands → patch, part 1, v1 - Fix certutil -l and -L commands on branch
This is the trunk version of this patch
Attachment #348152 - Flags: review?
This patch is a rewrite of CERT_IsCACert.  Mostly it factors out parts 
into new subroutine functions, but it makes three changes.
a) it sets NS_CERT_TYPE flags for peer flags as well as CA flags
b) trust flags ALWAYS override other flags, no matter where those other flags
come from
c) It attempts to fix bug 454437, because it rewrites the offending line of 
code anyway.
Comment on attachment 348155 [details] [diff] [review]
patch, part 3, v1 - reenable peer trust flags - for trunk (checked in)

Bob, I think you're the most qualified to review this patch.
Attachment #348155 - Flags: review?(rrelyea)
Attachment #348152 - Flags: review? → review?(alexei.volkov.bugs)
Nelson, did you attach these patches to the right bug ? Comment 0 through 6 seems to be discussing different issues than comment 7 through 11 .
Yes, these are all issues that stand in the way of making signtool work
adequately in NSS 3.11.x
The names of the patches incorrectly say "certutil -L and -l".
Comment on attachment 348146 [details] [diff] [review]
patch, part 1, v1 - Fix signtool -l and -L commands on branch

Good patch.

There is a comment that can be moved closer to an actual line of code:

201	/* Display this name or email address */	
207	PR_fprintf(outputFD, "%s\n", name);
Attachment #348146 - Flags: review?(alexei.volkov.bugs) → review+
Attachment #348146 - Attachment description: patch, part 1, v1 - Fix certutil -l and -L commands on branch → patch, part 1, v1 - Fix signtool -l and -L commands on branch
Attachment #348152 - Attachment description: patch, part 1, v1 - Fix certutil -l and -L commands on trunk → patch, part 1, v1 - Fix signtool -l and -L commands on trunk
> The names of the patches incorrectly say "certutil -L and -l".
Fixed.
re patch3 -- The only possible issue I see with this patch is the direct use of
nsCertType in the IsCA calculation.

nsCertType used to be a lazily evaluated cached value. If that is no longer the case, then I'll give the patch an r+. Otherwise you'll either have to restore the explicit trust checks, or make the call which updates the nsCertType (I think CERT_GetCertType() or some such call).

bob
Comment on attachment 348155 [details] [diff] [review]
patch, part 3, v1 - reenable peer trust flags - for trunk (checked in)

r+ Nelson verified that we now always call CertGetCertType at the decode stage and update it appropriately, so the patch is correct.

bob
Attachment #348155 - Flags: review?(rrelyea) → review+
Attachment #348147 - Flags: review?(alexei.volkov.bugs) → review+
Attachment #348152 - Flags: review?(alexei.volkov.bugs) → review+
On trunk:

Bug 464406: Fix signtool regressions, make signtool work again
part 1 - Fix signtool -l and -L commands, r=Alexei

cmd/signtool/list.c; new revision: 1.11; previous revision: 1.10

Bug 464406: Fix signtool regressions, make signtool work again
part 2 - detect missing jar signatures, r=Alexei

lib/jar/jarfile.c; new revision: 1.10; previous revision: 1.9

Bug 464406: Fix signtool regressions, make signtool work again
part 3 - reenable peer trust flags, r=Alexei

lib/certdb/certdb.c; new revision: 1.93; previous revision: 1.92
Comment on attachment 348155 [details] [diff] [review]
patch, part 3, v1 - reenable peer trust flags - for trunk (checked in)

The checkin of this patch appears to have introduced a regression that is documented in bug 468532.
Attachment #348155 - Attachment description: patch, part 3, v1 - reenable peer trust flags - for trunk → patch, part 3, v1 - reenable peer trust flags - for trunk (checked in)
Attachment #348147 - Attachment description: patch, part 2, v1 - detect missing jar signatures → patch, part 2, v1 - detect missing jar signatures (checked in on trunk)
Attachment #348152 - Attachment description: patch, part 1, v1 - Fix signtool -l and -L commands on trunk → patch, part 1, v1 - Fix signtool -l and -L commands on trunk (checked in)
I'm changing this target to 3.12.3 again, because I think we will
not try to fix this in 3.11.x.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: Fix signtool regressions, make signtool work in NSS 3.11.x again → Fix signtool regressions, make signtool work
Target Milestone: 3.11.10 → 3.12.3
You need to log in before you can comment on or make changes to this bug.