Closed
Bug 240956
Opened 21 years ago
Closed 20 years ago
OS/2 VACPP and gcc builds of NSS use different PKCS#11 calling conventions
Categories
(NSS :: Libraries, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: julien.pierre, Assigned: wtc)
Details
When NSS was ported from the IBM Visual Age C++ compiler to the gcc compiler, no
attention was paid to the calling convention used for loading PKCS#11 modules,
and it got changed.
Unfortunately, the two calling conventions use the same symbol naming, but
different semantics for passing the arguments and returning values.
There is no obvious way to find out if the nssckbi.dll was built with one
calling convention or the other, until it's too late (ie. one of the C_
functions is called, and crashes).
The consequence is that if NSS code built with gcc tries to load a PKCS#11
module built with VACPP, or vice-versa, a crash results.
This affects a lot of OS/2 browser users, because secmod.db stores an absolute
reference to nssckbi.dll. The 2 common cases are:
a) users upgrading from a Mozilla VACPP build to a Mozilla gcc build, and using
their existing browser profile.
b) users switching back and forth between the two types of builds.
This is actually not uncommon. For example, the still-maintained Mozilla 1.4
branch is built with VACPP, but since 1.5 (or 1.6, I forget) it is built with gcc.
The known workaround is to delete secmod.db.
I think a necessary part of the fix is to reconcile the calling convention for
PKCS#11 in NSS on OS/2 regardless of the compiler, so that long-term, this
problem just disappears.
Fixing the browser issue that users are seeing in the short-term is more
complicated. I feel that at least the upgrade case should be fixed. I don't
feel that supporting switching back & forth is important, as we don't do that
with the other DBs either.
Here is a proposed fix for the upgrade case.
#ifdef XP_OS2_EMX
1) In SECMOD_LoadModule, check if the name of the DLL to load is nssckbi.
2) If so, check the DLLs it depends on (this requires some OS/2-specific
3) If one of them is the VACPP runtime DLL bundled with the browser, don't try
to load it, for fear it would crash due to the calling convention mismatch.
Simply return an error. I believe the PSM code will then find that it doesn't have
a root cert module loaded, and automatically load the DLL from the location
where Mozilla is running.
#endif
Other ideas welcome !
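A minimal sketch of the three steps proposed above, written in portable C so that it can be compiled anywhere. The function names are hypothetical and not part of NSS. The list of imported module names is assumed to be produced by OS/2-specific code that is not shown, and MOZRMI36 is assumed to be the name of the VACPP runtime DLL, as the later comments suggest.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a against the
 * NUL-terminated string b; b must be exactly n characters long. */
static int equals_nocase(const char *a, size_t n, const char *b)
{
    size_t i;
    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Step 1: is the DLL to load the built-in roots module (nssckbi)?
 * The directory and the extension are ignored. */
static int is_nssckbi(const char *path)
{
    const char *base, *p, *dot;
    size_t len;

    assert(path != NULL);
    base = path;
    for (p = path; *p != '\0'; p++) {
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;
    }
    dot = strrchr(base, '.');
    len = dot ? (size_t)(dot - base) : strlen(base);
    return equals_nocase(base, len, "nssckbi");
}

/* Steps 2 and 3: refuse the module if it is nssckbi and one of the DLLs it
 * imports is the VACPP runtime. The imports array is an assumption: it
 * stands for whatever the OS/2-specific dependency lookup returns. */
static int should_refuse_module(const char *path,
                                const char *const *imports,
                                size_t n_imports)
{
    size_t i;
    if (!is_nssckbi(path))
        return 0;
    for (i = 0; i < n_imports; i++) {
        if (equals_nocase(imports[i], strlen(imports[i]), "MOZRMI36"))
            return 1;
    }
    return 0;
}
```

A caller in the module loading path would return an error when should_refuse_module() is nonzero, instead of loading the DLL and calling into it.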
Comment 1•21 years ago
Anything we try to do to fix this will make things worse than they are now.
Your assertions are incorrect.
Installing GCC over VACPP works fine and the profile migrates fine.
The ONLY problem here is sharing profiles between GCC and VACPP which is
unsupported, or if you install a GCC build along side a VACPP build and try to
use the same profile.
Your suggestion about "detecting MOZRMI36" will still cause VACPP builds to fail
when accessing security.
We can't go back in time and fix this, and any calling convention change we make
will break even more builds.
GCC works, VACPP is in the past, this decision isn't worth revisiting now.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
Reporter
Comment 2•21 years ago
Mike,
Even the upgrade process isn't reliable. I never upgrade Mozilla by overwriting
the Mozilla code. Instead, I always install a new version of Mozilla to a new
directory, and use profile manager to point to the old profile, which is yet in
a third location.
When I went from a VACPP to a gcc build this way, I did get the crash. The
migration to a new version only works if you overwrite the Mozilla code
in-place, which is not necessarily what everybody does. So, I consider the
upgrade case still broken.
You are incorrect that we can only make the situation worse. We can actually
improve it if my proposed fix for this bug, and my proposed fix in
http://bugzilla.mozilla.org/show_bug.cgi?id=176501#c21 , are both implemented.
If you do that, you will only have to incorporate the fixes into the newer gcc
build. You will not have to touch any VAC code. At least as far as security is
concerned, that new gcc build would be able to share its profile with existing
VAC builds. Also, you could upgrade from existing gcc builds to the newest gcc
build without problem.
The one case that wouldn't work is if you try to share your profile between 3
builds, an existing VAC build, an existing gcc build, and a new gcc build. The
first 2 types just don't play together and that part can't be fixed.
Comment 3•21 years ago
Your idea about figuring out if the DLL loads MOZRMI36 won't work most of the time.
If the user has MOZRMI36 in their path, the DLL failure will happen with SYS2070 when
PLC4 is loaded. These cannot be handled with DosLoadModule - your app will crash.
Believe it or not, if you don't have MOZRMI36 in your path, everything works
just like you want - loading the 1.4 NSSCKBI.DLL fails gracefully, and we load
the one from GCC. Works fine.
The issue here is that if MOZRMI36 is available, the system tries to load the
NSPR DLLs and all heck breaks loose and there is no way to prevent it. Period.
Comment 4•21 years ago
This would be an issue if any vendors tried to support OS/2 for their tokens.
In this case they would have to have 2 different PKCS #11 modules depending
which compiler your application happens to use.
If a common Binary API is not defined for PKCS #11, it would guarantee that you
would never convince hardware vendors to support their tokens on OS/2
(including IBM). nssckbi is not the only issue here.
Maybe it's not a concern because no token vendors support or plan on supporting
OS/2, and no one cares if they ever do.
bob
Comment 5•21 years ago
Can someone tell me what APIs I should be looking at if I wanted to change the
calling convention?
It looks like NSS uses PR_CALLBACK and these are _Optlink on VACPP.
What would I actually need to change?
Comment 6•21 years ago
Bob, in your comment 4, I think you overlooked nssckbi. This is an issue
even for OS/2 users who have no hardware tokens, but who have two versions
of their browser, one built with gcc, the other not.
Having said that, as I wrote in bug 176501, the application that calls
NSS ultimately controls what filenames get into secmod.db. And they
control where secmod.db itself gets installed. Some applications control
this very well, others apparently do not.
Reporter
Comment 7•21 years ago
I think Bob and I are in agreement that we should resolve the calling
convention issue, even if it is only to standardize on the C calling convention
currently used in gcc builds of Mozilla/NSS and that doesn't affect the browser.
There are two PKCS#11 modules that come with NSS, nssckbi.dll and softokn3.dll .
This change would affect both. The pathname issue is irrelevant for softokn3.dll
since it doesn't get loaded from secmod.db, so changing its calling convention
(or more accurately, the way it selects the calling convention when building)
will not create additional problems.
In response to comment #5 : the PKCS#11 header definitions are under
lib/softoken . But it doesn't look like we currently use any macro similar to
PR_CALLBACK for the exportable symbols in the headers :-( !!! There are some
traces of CK_EXTERN which could do the job, but the macro isn't used. This means
that the issue may exist on other platforms as well when switching compilers
for NSS.
I think we haven't run into it because most compilers on other platforms don't
fool around with changing C calling conventions; mostly they have their own C++
calling conventions.
Bob, can you take a look at this ? If this is confirmed, we should at least
define the macros for calling conventions, and use them on all the exportable
PKCS#11 symbols.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 8•21 years ago
If you fix this, you will break migration of older GCC profiles...
Reporter
Comment 9•21 years ago
Mike,
If we stick to the same calling convention used now in gcc, and just formalize
it in the header files with explicit macros, it won't break anything with
existing or future gcc builds.
It would only affect any new NSS code built with VACPP, which would suddenly
switch over to the new calling convention. I understand you won't be building
any new VACPP NSS code, so that shouldn't be a problem for you.
As Bob said, we need to set the calling convention in stone to allow any future
smartcard vendors to know what to do when building their PKCS#11 libraries. This
is the main value of the proposed change.
The other value of this change for me is that I could share my security DBs
(which are not necessarily browser DBs, but server DBs) between NSS command-line
tools built with the 2 different compilers, for example when doing performance
tests. Right now, I must always make two different copies of the DBs and erase
secmod.db in those tests. I should be able to just point to the same directory
on the command-line with either version of the tools.
Comment 10•21 years ago
I don't understand why more people don't use extern "C" when creating interfaces
like this.
It sure would make things nice and compatible :)
Comment 11•21 years ago
Nelson: I wasn't ignoring nssckbi.dll, I just felt Julien had covered it pretty
well.
Mike, these functions are more than extern "C"... They are "C" functions called
from "C" code (both NSS and PKCS #11 are C APIs, not C++). If there is a calling
convention problem, it's because GCC and VAC have different "C" calling
conventions. PKCS #11 has special defines to allow us to put whatever magic
keywords we need to make gcc and vac++ generate the same calling sequence (if
it's possible for them to use the same calling sequence) for the PKCS #11
functions themselves. Those magic defines need to be specified for the OS/2
platform.
The PKCS #11 declarations are defined in nss/softoken/pkcs11t.h . This file is
a modified version of the file supplied by RSA on the pkcs#11 site. There's a
long comment in pkcs11.h describing how these various macros are used, and how
they should be defined for certain platforms. NSPR makes some basic defines
(which for most platforms match the PKCS #11 definitions) which
nss/softoken/pkcs11t.h uses:
#define CK_CALLBACK_FUNCTION(rv,func) rv (PR_CALLBACK * func)
#define CK_DECLARE_FUNCTION(rv,func) PR_EXTERN(rv) func
#define CK_DECLARE_FUNCTION_POINTER(rv,func) rv (PR_CALLBACK * func)
These are meant to be compiler-invariant values for defining C-function APIs
across shared libraries. If we decide that these values are not correct for
OS/2, we could use ifdef OS2 around these and supply the correct macros.
bob
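A minimal sketch of what an OS/2-specific branch for these macros could look like. CK_CALL_SPEC and the Demo_ names are hypothetical and not part of NSS, NSPR or PKCS #11. The choice of _System is only an assumption for illustration: it would have to be confirmed that both VACPP and the OS/2 gcc port accept the keyword in these positions and generate the same calling sequence. Plain extern stands in for PR_EXTERN so that the sketch compiles without NSPR headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: pins one calling convention on OS/2 and
 * expands to nothing elsewhere (compiler default). */
#if defined(XP_OS2)
#define CK_CALL_SPEC _System /* assumption: accepted by both compilers */
#else
#define CK_CALL_SPEC
#endif

/* Same three macros as above, with the convention spelled out explicitly
 * instead of being inherited from PR_CALLBACK. */
#define CK_CALLBACK_FUNCTION(rv, func) rv (CK_CALL_SPEC * func)
#define CK_DECLARE_FUNCTION(rv, func) extern rv CK_CALL_SPEC func
#define CK_DECLARE_FUNCTION_POINTER(rv, func) rv (CK_CALL_SPEC * func)

typedef unsigned long DEMO_RV; /* stand-in for CK_RV */

/* What a module header would declare. */
CK_DECLARE_FUNCTION(DEMO_RV, Demo_Initialize)(void *args);

/* What the loading application would use for the looked-up symbol. */
typedef CK_DECLARE_FUNCTION_POINTER(DEMO_RV, Demo_InitializePtr)(void *args);

/* Module side: the definition carries the same specifier. */
DEMO_RV CK_CALL_SPEC Demo_Initialize(void *args)
{
    (void)args;
    return 0; /* 0 stands for success in this sketch */
}

/* Application side: the call goes through the typed pointer, so caller
 * and callee agree on the convention regardless of compiler defaults. */
DEMO_RV call_initialize(Demo_InitializePtr fn)
{
    assert(fn != NULL);
    return fn(NULL);
}
```

The point of the design is that the convention is written in the header once, so a module vendor and the application cannot silently disagree when they use different compilers.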
Comment 12•21 years ago
There's no such thing as different "C" calling conventions.
If you are using PR_CALLBACK, it is definitely different between the two versions.
PR_CALLBACK does NOT equal "C" calling convention - PR_CALLBACK can be set to
anything you want - for VACPP it is _Optlink, for GCC I believe it is nothing.
To make things 100% compatible, "C" calling convention is the right thing to
use here, NOT PR_CALLBACK.
I repeat, PR_CALLBACK is NOT compiler invariant - it was never intended to be as
far as I know.
Reporter
Comment 13•21 years ago
Mike,
_Optlink, __cdecl, _Pascal, __Far32, _System are all different C calling
conventions, ie. different calling conventions that can be used to invoke C
functions.
For PKCS#11, we need to choose a C calling convention that is available in all
compilers, so that it is compiler-invariant.
extern "C" is only a means for C++ to not mangle the function name. It uses the
compiler's default C calling convention as far as argument passing goes. The
problem comes from the fact that VAC and gcc have different opinions of what the
default C calling convention is.
For VACPP I believe there is actually a compiler flag to make the default C
calling convention _System instead of _Optlink ...
I don't know what calling convention is used by gcc.
Comment 14•21 years ago
One way to look at this problem is that on OS/2 there are apparently two
builds of mozilla, built with different compilers that use different calling
conventions by default, and that cannot share each other's libraries (unless
something is done to force them to use a common calling convention).
But perhaps the only aspect of that that is unusual is that there are two
commonly used builds of mozilla on the same box. Other platforms have
similar problems, potentially. For example, solaris supports 3 different
ABIs on the same CPU. NSS's makefiles support separate 32-bit and 64-bit
builds (the 32-bit build covers both 32-bit ABIs). If someone built a
64-bit build of the mozilla browser, including a 64-bit NSS, and then
tried to use it in a profile that had previously been setup by a 32-bit
mozilla, I think they'd have the same problem that OS/2 is having.
Reporter
Comment 15•21 years ago
Nelson,
I would consider solaris 32-bits and 64-bits to be two different platforms. I'm
not sure the problem is the same as here. Can a 32-bit solaris .so even be
loaded by a 64-bit solaris process ? (or vice-versa) . I'd like to test this -
how do I force the NSS Solaris build to be 64 bits ? I'm not familiar with
the two 32-bit ABIs on solaris. Can you expand a little bit ? Thanks.
Comment 16•21 years ago
Julien,
Imagine that a user builds mozilla from source using the V9 ABI (LP64 model)
including the whole browser and NSS.
The user has previously run the normal 32-bit version of mozilla on his
ultrasparc box, and now has a profile with cert and key DBs and a secmod DB
that names the 32-bit nssckbi.so in some directory.
Then the user runs his 64-bit build, and it reads secmod.db, and it then tries
to load the 32-bit nssckbi.so. It will either fail in the load, or it will
fail later when the first attempt is made to call a function in that so.
Perhaps another solution is to have separate secmod.db files for different
ABIs (or calling conventions).
To build NSS in 64-bit mode, setenv USE_64 1 or export USE_64=1
then build normally.
Reporter
Comment 17•21 years ago
Nelson,
I did in fact propose as one of the solutions to rename the secmod.db to solve
this problem, which can be done by simply changing one argument to NSS_Init .
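A minimal sketch of the renaming idea: derive the module database file name from the compiler and word size, so that builds with incompatible ABIs never read each other's module list. The file names and the function names are hypothetical, and the compiler macros tested are assumptions about what each compiler predefines. The sketch does not call NSS; in NSS the resulting name would be handed over as the secmodName parameter of NSS_Initialize.

```c
#include <assert.h>
#include <string.h>

/* Tag for the compiler family (assumed predefined macros). */
#if defined(__IBMC__) || defined(__IBMCPP__)
#define SECMOD_ABI_TAG "vac"
#elif defined(__GNUC__)
#define SECMOD_ABI_TAG "gcc"
#else
#define SECMOD_ABI_TAG "cc"
#endif

/* Tag for the data model, to cover the 32-bit versus 64-bit case. */
#if defined(__LP64__) || defined(_LP64)
#define SECMOD_WORD_TAG "64"
#else
#define SECMOD_WORD_TAG "32"
#endif

/* Hypothetical per-build database name, for example "secmod-gcc32.db". */
const char *secmod_name_for_build(void)
{
    return "secmod-" SECMOD_ABI_TAG SECMOD_WORD_TAG ".db";
}

/* Nonzero if two builds would end up sharing one module database. */
int builds_share_secmod(const char *name_a, const char *name_b)
{
    assert(name_a != NULL && name_b != NULL);
    return strcmp(name_a, name_b) == 0;
}
```

The trade-off is that module registrations made by one build, such as a third-party token, are not visible to the other build and have to be added again there.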
I understand your case with loading 32-bit and 64-bit versions of Solaris.
However I think there is a big difference between not loading the .so and
loading the .so and then crashing . If the .so load fails, PSM will actually
look for the module somewhere else, and likely will find the correct version.
On the other hand, if the load succeeds and there is a crash, it would be the
same as the OS/2 situation. I don't think that is the case, but I'm trying to
confirm it. I just tried to build with USE_64=1 on my Blade 2000, but it failed
in NSPR in pr/src/md/unix/os_SunOS_sparcv9.s . Might have to file another bug
for that.
Comment 18•21 years ago
The crash should not be a normal situation. Most people would not have MOZRMI36
in their LIBPATH for the GCC build.
IF there is no MOZRMI36, we won't crash.
Comment 19•21 years ago
This problem is much worse than we think.
It also appears to happen when we switch LIBC versions (from LIBC04 to LIBC05)
See:
http://bugzilla.mozilla.org/show_bug.cgi?id=241722
Reporter
Comment 20•21 years ago
Mike,
Am I correct that when you switch libc versions, you have to recompile Mozilla
with an updated GCC, that puts references to libc05.dll instead of libc04.dll ?
Is it possible that gcc OS/2 itself switched the default calling convention that
it uses for libc between two different compiler updates ?
That would be a really good case for fixing this problem and selecting the OS/2
PKCS#11 calling convention once and for all. And even if it hasn't actually
happened yet and the problem is something else, you want to prevent that problem
from popping up again and set the OS/2 PKCS#11 convention in stone now.
Please see Bob's message in comment #11 for more details on how to do that.
Basically, you need to settle on a calling convention, find the right keywords
for it that work with both gcc and VACPP, and update the macros.
As long as the C calling convention is the same between the PKCS#11 application
(here, the Mozilla browser and NSS) and the PKCS#11 library (nssckbi.dll) it
should not matter which runtime the PKCS#11 library is built against, as the
principle of PKCS#11 is that you can load libraries from many sources, including
third-party vendor sources. So, you should not be affected by any C runtime DLL
naming changes.
Comment 21•20 years ago
We're not going to fix this.
VACPP build is dead. Everyone's using GCC and we're not going to change things now.
Status: REOPENED → RESOLVED
Closed: 21 years ago → 20 years ago
Resolution: --- → WONTFIX