Closed Bug 74822 Opened 23 years ago Closed 21 years ago

PK11_FindCertFromNickname returns certificate with unexpected slot if identical keys/certs exist within the key DB and a PKCS11 token

Categories

(NSS :: Libraries, defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: julien.pierre, Assigned: julien.pierre)

References

Details

This was found by Ken Chan in iWS QA and subsequently diagnosed by me.

What Ken did was to create a cert and key in the internal (perm) NSS DB.
Then we exported it to P12 with pk12util.
Then we imported the P12 file into a hardware token.

We tried to setup the server to use both certs. Basically one listen socket was 
using the cert from the DB - "Server-Cert", and another listen socket was using 
the one from the hardware token - with a nickname of "ISG 2.0 Cryptoki 
Interface:Server-Cert".

The server was properly authenticated to both the key DB and the hardware token.
However, it failed to come up with an error coming back from 
PK11_FindCertFromNickname . The error was that "Server-Cert" could not be found.
When we set the server to use only "ISG 2.0 Cryptoki Interface:Server-Cert", it 
found the cert.

I believe the problem is that NSS maintains only one cert structure for both of 
these certs because they are identical. This causes the lookup by nickname to 
subsequently fail.

The lookup for the DB cert only fails if both the key DB and the hardware tokens 
have been authenticated. Our admin only authenticates to the required token when 
you click on a cert to edit it through a CGI, so it never shows the problem and 
the PK11_FindCertFromNickname succeeds in it.

But the server always authenticates to all the tokens prior to looking for its 
certs, as it can use multiple server certs from the DB and a mix of PKCS#11 
tokens.

The only workaround I found to make the server come up using the DB cert was to 
remove the hardware token from the secmod.db . But of course at that point, only 
the DB cert can be used - the hardware token cert can no longer be used.

I know this defect is coming late but it is nevertheless a showstopper for us. 
This is especially true since we have a single shared secmod.db for all server 
instances and removing the PKCS#11 module from secmod.db is not acceptable.
OS: Windows NT → All
Priority: -- → P1
Bob
We need this fix for Hewitt associates and any oither such customer who is going 
to  migrate from the s/w ssl to a h/w accelerator.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Seems like deleting the cert from the software token once it has been 
installed in the hardware token should work around this problem. No?

This is not a regression, so how can it be a blocker?
Because we can have multiple lsiten sockets - one lsitening on the s/w token and 
the other on the h/w.
This is one way of gauranteeing failover from h/w to s/w using an additional 
load-balancer at the front end.
Target Milestone: --- → 3.2.1
You can work around that with two separate instances (two processes).
I think this is a fairly major change to NSS that you're asking for.
I doubt it can be done in just a few days.

I'm setting TFv to 3.2.1 so it will show up on a search for P1 
bugs for 3.2.1.  But I think that if we hold 3.2.1 for this, it 
will slip 3.2.1 a long time.  NSS has never done what you ask.
(This is just a guess, however.)
Never mind, I just had cookies disabled on my browser, that's why 
bugzilla wouldn't let me update before ;).

Anyway, we can't work around this bug by using several processes
(instances).

As I stated in the defect, we have a single shared secmod.db accross 
all
server instances. All server instances always authenticate to all
available tokens prior to coming up, so that all certs from all tokens
are available to them. Even if all server certs are not strictly
required for the server to come up initially, we support dynamic
reconfiguration and the server administrator could change a listen
socket's certificate on the fly from the software token to the hardware
token and vice-versa. Our dynamic reconfiguration process does not
include token (re) authentication - that part is currently always done
in the init before the config is even parsed - in fact, before we know
which cert nicknames are needed.
The reasoning for that decision I made is that it simplifies server
administration. You only need to authenticate to tokens when you do a
full start of the server, not when you make a dynamic config change. 
You
always authenticate to all tokens when you do a full start. If you add 
a
token to secmod.db, on some platforms (eg. NT) you have to shutdown all
your server instances anyway prior to doing it because the file is
locked, so adding a password transmission scheme in our dynamic 
reconfig
IPC didn't make the list. I considered this acceptable for a server
since usually the hardware has to be shutdown anyway at the time of
adding the crypto token.

The all-token authentication is done in the initialization of the 
server
process, before the configuration is even parsed; right after
NSS_Initialize is called. I simply enumerate all tokens and 
authenticate
to all of them. At that point, I don't know yet which server certs may
be needed (if any) because the config hasn't been parsed. That part
happens at a later stage.

The problem here is that once all tokens are authenticated - meaning 
the
internal DB + the extra token(s) - the DB cert becomes completely
inaccessible once there is a matching cert in one of the hardware
tokens. Creating additional server instances will not help because they
will still have to authenticate to the hardware token when they load
secmod.db, and therefore won't see the DB cert after that. We need a 
way
to be able to get to all server certs in all authenticated tokens, and
NSS breaks that currently.

I realize this may never have worked before in NSS, but that doesn't
mean it isn't a bug. The semantics of the PK11_FindCertByNickname are 
to
return a cert structure for a nickname, and in this case it returns
NULL. Regardless of past incorrect behavior of NSS, this is now a
requirement for us because of all the added security functionality we
provide in iWS 6.0, and the bundling of pk12util with the product.
pk12util is being documented in the iWS docs in the security chapter to
help customers migrate their certs. As it is right now, the migration
process seems a bit hazardous. Eg: you can do it one way, from the
software to the hardware tokens, by exporting a cert/key from the DB to
the hardware token. If you use the cert on the hardware token, then all
is fine. But if you try to go back to your software cert, it will fail.
The only workaround is to delete the hardware token from secmod.db and
this is not an acceptable solution.

FYI, Bob created an API called PK11_ListCerts that enumerates the 
server
certs.  That API has been able to enumerate all the server certs in all
tokens for me in the main cert admin page. It returns the nickname in a
special linked list field (appdata).

I figure that I could use that API to get a list of all the server 
certs
with the list of nicknames, and then search myself in the list to see 
if
one of the elements matches my nickname. Of course, I would get a cert
pointer that isn't unique - several elements in the list returned by
PK11_ListCerts will have the same cert pointer, but a different 
nickname
in the appData field. But if I use that approach, I will at least get 
to
a CERTCertificate structure, unlike the broken PK11_FindCertByNickname
which just returns NULL. It sounds to me like it would be a reasonable
approach to integrate into the PK11_FindCertByNickname code itself. It
would slow things down for misses since you would end up doing that
second lookup that calls PK11_ListCerts and searches the list of certs;
but at least it would be the correct behavior ! I could probably write
that code in an hour if I wasn't so sick at home at the moment.

However, getting the cert structure doesn't quite get me there as you
maintain a single cert structure for duplicate certs. It still doesn't
select the token. The next problem for me in the server then becomes
getting ahold of the proper key for that cert. Currently, in my core
server code that enables a particular cert/key on a socket, I see this 
:

serverkey = PK11_FindKeyByAnyCert(severcert, NULL);

What will the behavior of that call be when there are multiple key
objects (in different tokens) matching the cert that is passed in ? It
sounds like I would need another API to look for a key in a specific
token.

Browsing the NSS pk11func.h file, I see :

SECKEYPrivateKey * PK11_FindPrivateKeyFromCert(PK11SlotInfo
*slot,          
                                        CERTCertificate *cert, void
*wincx);

Does that API lookup the key for the cert in a particular token ? If 
so,
then I can just do a PK11_FindSlotByName based on the token prefix
string in the cert nickname to get the slot (or
PK11_GetInternalKeySlot() if there is no colon specified), then use 
this
call to get the right key and I'm all set.

On the other hand, if it looks for the key in any token and then 
returns
the token in the slot structure that's passed in, then it's not good.

I just looked at pk11cert.c and see that it does the former - so that's
good news.

I haven't tried it yet, but if what I wrote above works then we have a
solution. It would be best in my opinion to have the fix be part of
PK11_FindCertByNickname rather than me doing my own lookup separately.
You can probably optimize the lookup process better than I can - my
process only uses public NSS functions.
Summary: PK11_FindCertByNickname fails when identical keys/certs exist within the key DB and a PKCS11 token → PK11_FindCertFromNickname fails when identical keys/certs exist within the key DB and a PKCS11 token
My workaround works, so I was able to get this to work by bypassing 
PK11_FindCertFromNickname . I still say that this is a serious 
functional bug that needs to be addressed. I'm leaving it as a P1 but 
lowering severity to "major" from "blocker".
Severity: blocker → major
Since NSS 3.2.1 RTM'ed today, I'm changing the TFV to 3.3.
Target Milestone: 3.2.1 → 3.3
FYI, my workaround using PK11_ListCerts to find the certs by nickname, 
instead of PK11_FindCertFromNickname, has a performance impact on the server 
when coming up, since I need to constantly re-enumerate all the certs and walk 
the list over again. PK11_FindCertFromNickname should be fixed and would provide  
a much better lookup time. This is an issue when many secure listen sockets are 
used with different certs on them, or for a secure virtual server setup - eg. 
one listen socket on port 443, and different certs for each IP address.
We will probably require a new API to fix this correctly. The real issue here is
the current design of the Cert code only really handles certificates in one
token (this has always been true of NSS).

The solution is the new CERT/PKCS#11 reconciliation code scheduled for 3.3
Severity: major → enhancement
Target Milestone: 3.3 → 3.4
Bob,

How does this get addressed by the cert code changes in 3.4 ?
Priority: P1 → P2
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
I think there may still be an issue if you try to use 3.x API's to reference
certificates that live in multiple tokens. NSS 3.4, internally, does not get
confused anymore. (it keeps a list of all the tokens a given cert resides in).
Target Milestone: 3.4 → 4.0
*** Bug 213454 has been marked as a duplicate of this bug. ***
This was a problem for NES 2 years ago. It is also a problem for CMS now.
Unfortunately, this problem can only be truly resolved in NSS 4.0 (Stan). Until
then, we can only provide workarounds to the problem, unfortunately.

I had designed a workaround to PK11_FindCertFromNickname for NES 6.0 that used
to work well with NSS 3.2.1 . Somehow, later versions of NSS broke this workaround.

I have since provided a new patch for NES 6.1 that should work even with the NSS
regression, and should also resolve the problem for CMS. It needs to be tested
by the CMS team.
The behavior of PK11_FindCertFromNickname has changed in current versions of NSS
in this case.

Rather than failing, this function now always finds a certificate regardless of
which token was specified before the : in the nickname.

However, the slot in the CERTCertificate structure that is returned is
undetermined in this case. This is a problem because many applications rely on
this to subsequently find the private key. This causes applications to use the
wrong token.

I have changed the description of the bug accordingly. I have also raised the
priority since we have customers running into this problem.
Severity: enhancement → major
Summary: PK11_FindCertFromNickname fails when identical keys/certs exist within the key DB and a PKCS11 token → PK11_FindCertFromNickname returns certificate with unexpected slot if identical keys/certs exist within the key DB and a PKCS11 token
Target Milestone: 4.0 → 3.9
Taking bug.
Assignee: relyea → jpierre
Removing confidential flag.
Group: netscapeconfidential
According to Ian in http://bugzilla.mozilla.org/show_bug.cgi?id=72291#c27 , we
should not create multiple CERTCertificate structures for the same DER cert,
even if it exists in multiple slots. This is a design restriction of NSS 3.x.
Bob Relyea can provide more details when he comes back.

What this means is that we cannot fix this particular function in NSS 3.9, as we
only have a single CERTCertificate structure with a single slot and nickname.

It is however possible to add new NSS APIs to enumerate all the instances of a
CERTCertificate. Ian proposed the following APIs :

PK11SlotInfo **CERT_GetSlots(CERTCertificate *cert);
char *CERT_GetNameForSlot(CERTCertificate *cert, PK11SlotInfo *slot);

Steve, Christine, Thomas, please give your input on this proposal.
Can we extend the defintion of CERTCertificate structure to include
information about multiple slots?
Thomas,

The main problem with extending the CERTCertificate structure is breaking binary
compatibility with other 3.x DLLs and applications that use them. This would be
something to do in a major NSS release, but I don't think we can do it in a
patch release. The fact that the content of the CERTCertificate structure is
public is what has tied the hands of the NSS team for many years as far as
making changes to it. The more public fields we expose, the more legacy behavior
we have to emulate for all kinds of programs that access the public structure
fields. And we can never change the internal representation when the needs
arise. We can't just obsolete a field, such as the nickname and slot fields,
even though they should be replaced by lists.

In a way, CERTCertificate already contains the information about multiple slots,
in the form of a pointer to an NSSCertificate, which is a Stan object. That
object does have a list of all the instances, and there are internal Stan
functions to enumerate those instances. None of this is exposed however, and the
official word on Stan is that it will never become a public API because of lack
of development resources. However, the existence of these internal functions do
enable me to implemented the fix for PK11_ListCerts (72291), and they also allow
implementing the functions in comment #18.

We are discussing possible changes to the CERTCertificate structure for NSS 4.0
however, as we can make incompatible changes in that version. We may very well
add the multiple instance information to the public structure in that release. I
would personnally rather make the entire CERTCertificate structure opaque, with
accessor functions.
In our application, we have code to do the following with JSS:

    X509Certificate cert = CryptoManager.findCertByNickName("xxx:nickName");
    CryptoManager.findPrivKeyByCert(cert);

Couple Issues:

(1) X509Certificate.getOwningToken()

    X509Certificate object has an method to get the owning token. This function
    currently just return the first slot.

    <SUGGESTION #1: If NSS can make first cert->slot to point to the
                 slot that is requested in parameter (i.e. xxx). JSS
                 will return the correct token>

(2) findPrivKeyByCert() calls NSS's PK11_FindKeyByAnyCert(cert, NULL)


    <SUGGESTION #2: I did not look at how PK11_FindKeyByAnyCert works. But if
                 it returns the key from the first slot pointed by the cert,
                 then it should return the corresponding private correctly
                 given that suggestion #1 is implemented.>

I agree all this can also be fixed in JSS. Let me know if it will be
too much to implement suggestion #1 which involve reshuffule the slot
link.

    
Thomas,

In response to suggestion 1 :

NSS currently arbitrarily picks the slot that is stored in cert->slot when
multiple instances of a cert exist. The only decision it makes is to prefer the
hardware token instance vs the softoken instance. It is possible to change the
policy about the choice of the default instance.

However, I don't think that choice can be made in the way you demand. When you
call PK11_FindCertFromNickname, the CERTCertificate may already have been
created before, because some other thread or piece of code in your application
previously called other NSS functions and put in the cache. This is especially
true in a componentized process that includes a web server, Java servlet engine,
and certificate management system.

Changing the slot on an already existing CERTCertificate object would pose a
problem for concurrency. Other threads may already have the CERTCertificate
pointer. We don't have a lock around the CERTCertificate object, and anyone can
peek into the structure at any time. So we can't dynamically update the slot.

However, since you have a wrapper object in JSS called X509Certificate, it
should be easy enough to do the selection of the proper slot in your wrapper object.

You could do it like this :

cert = PK11_FindCertFromNickname(nickname);
/* now you have a CERTCertificate with an undetermined slot */

/* now get a pointer to the specific slot we want */

    PK11SlotInfo* slot = NULL;
    if (strchr(nickname, ':'))
    {
        char* tokenname = strdup(nickname);
        char* colon = strchr(tokenname, ':');
        if (colon)
        {
            *colon = 0;
            slot = PK11_FindSlotByName(tokenname);
            if (!slot)
            {
                /* slot not found . Should never happen, because we already
found the cert */
                /* process error here ... */
            };
        };
        free(tokenname);
    }
    else
    {
        slot = PK11_GetInternalKeySlot();
    };

then :

X509Certificate.cert = cert;
X509Certificate.slot = slot;

JSS can be modified to return that slot pointer outside of the CERTCertificate
to return the specific slot instance that's needed. Hopefully JSS has an
accessor method for returning the slot and there is only one place in the code
to change. cert->slot should only be referenced by JSS if that outside
X509Certificate.slot pointer wasn't set.

Note that we didn't even need the new APIs I proposed to enumerate the
instances. Because PK11_FindCertFromNickname found the cert, that means the slot
specified in the token:nickname string contained the certificate we were looking
for, even though cert->slot may not be the slot of the instance we were looking
for. So, all we need to do is get a handle to the slot we want to work with. We
already know that the slot contains an instance of the certificate.

Suggestion 2 :

To find the private key, don't use PK11_FindKeyByAnyCert, but instead :

serverkey = PK11_FindPrivateKeyFromCert(X509Certificate.slot,
X509Certificate.cert, NULL);

Hopefully this will meet your needs.
Thomas,

Have you considered not storing certs in NSS? It may be possible to maintain
your own database of certs, and then you wouldn't have these problems. Then you
would only use NSS for key management...
We can consider that. But as I know, a lot of the cryptopgraphic (encryption,
decryption) operations may require a NSS certificate structure. I may be wrong.
We have done all we can about this issue in NSS.
PK11_FindCertFromNickname always returns a CERTCertificate when spcecifying the
token prefix for any of the tokens in which the cert lives. However, the slot is
fixed because there can only be one CERTCertificate. This is a design issue.
Workarounds at the application level were provided that solve the problem.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.