Closed Bug 468279 Opened 16 years ago Closed 15 years ago

softoken crash importing email cert into newly upgraded DB

Categories

(NSS :: Libraries, defect, P1)

3.12
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.3

People

(Reporter: nelson, Assigned: rrelyea)

Details

(Whiteboard: FIPS)

Attachments

(4 files)

(Using a debug trunk tip build of NSS)
If a program upgrades a cert8 to cert9, and then imports a cert, softoken
imports the cert into the cert8 DB, and then crashes in a VERY confused state. 
 
After debugging this most of the day, I've concluded that after an upgrade,
softoken is really confused about whether it's using cert8 or cert9.
This MUST be fixed for Sun's 3.12 release.  

Steps to reproduce:

mkdir -p DB
# create new empty cert8 DB
certutil -N -d DB
# upgrade it to cert9 and import email cert
certutil -E -d sql:DB -n email@address -i email-cert.der -t ',,p'

Expected result:
email cert successfully imported into cert9.db.  cert8.db still empty.

Actual result:
cert is imported into cert8.db, then softoken crashes on an assertion.

I did this test using the cert attached to bug 185167.  The actual command
I used was:

certutil -E -d sql:DB -n ludek.rasek@centrum.cz -i /tmp/185167b.der -t ',,p'

Here is some info from debugging.  
Things get really interesting at the point in the NSC_CreateObject where it
tries to "handle" the token object, writing it to the DB.  The stack at that
point is:

softokn3.dll!sftkdb_write()  		Line 1106
softokn3.dll!sftk_handleCertObject()  	Line 701 + 0x14 bytes
softokn3.dll!sftk_handleObject()  	Line 1472 + 0xd bytes
softokn3.dll!NSC_CreateObject()  	Line 3681 + 0xd bytes
nss3.dll!import_object()  		Line 250 + 0x1b bytes
nss3.dll!nssToken_ImportCertificate()  	Line 579 + 0x18 bytes
nss3.dll!PK11_ImportCert()  		Line 907 + 0x3a bytes
certutil.exe!AddCert()  		Line 189 + 0x15 bytes
certutil.exe!certutil_main()  		Line 2803 + 0x3d bytes
certutil.exe!main()  			Line 2934 + 0xf bytes

When sftkdb_write calls db->sdb_Begin, 
    crv = (*db->sdb_Begin)(db);
the function it actually calls is lg_Begin.

When sftkdb_write calls sftkdb_lookupObject
    /* Find any copies that match this particular object */
    crv = sftkdb_lookupObject(db, object->objclass, &id, template, count);
that function finds the SMIME object (!?) as if it is already in the DB.

So, sftkdb_write calls:
	/* object already exists, modify it's attributes */
	*objectID = id;
        crv = sftkdb_setAttributeValue(arena, handle, db, id, template, count);

which eventually calls into lg_SetSingleAttribute.  The stack looks like:

lg_SetSingleAttribute()     Line 1729
lg_SetAttributeValue()      Line 1786 + 0x17 bytes
sftkdb_setAttributeValue()  Line 1070 + 0x18 bytes
sftkdb_write()              Line 1148 + 0x1d bytes

lg_SetSingleAttribute handles some classes of objects, but not SMIME objects.
For unhandled object classes, it returns CKR_ATTRIBUTE_READ_ONLY, and that's
just what it does for this SMIME object.  

So, back in sftkdb_write, it goes to loser, bypassing the call to 
    crv = (*db->sdb_Commit)(db);

At loser, it calls:
loser:
    if (inTransaction) {
	(*handle->db->sdb_Abort)(handle->db);

This is a call to sdb_Abort (not to lg_Abort).  Perhaps this is because
it calls handle->db->sdb_Abort instead of calling db->sdb_Abort.  
sdb_Abort calls sdb_complete, which crashes on this assertion:

    /* We must have a transation database, or we shouldn't have arrived here */
    PR_EnterMonitor(sdb_p->dbMon);
->  PORT_Assert(sdb_p->sqlXactDB);

Now, it's bad enough that sftkdb_write is confused about whether it's using
the legacy DB or the new one, and it would be an improvement just to make
it be consistent.  But it really should not be using the legacy DB at all
after NSS_Initialize completes the upgrade.  Yet it does.


Other discoveries:

PK11_SaveSMimeProfile adds 1 to the length of the string it passes for 
the CKA_NETSCAPE_EMAIL attribute, apparently to include the trailing NUL 
character (?!).  But the stored email address does not have a trailing NUL
character, so the string being stored always mismatches the string already
stored because their lengths differ by 1.  I think the "+1" in this line
    PK11_SETATTRS(attrs, CKA_NETSCAPE_EMAIL, 
				emailAddr, PORT_Strlen(emailAddr)+1); attrs++;
of PK11_SaveSMimeProfile is wrong.

Finally, I found one other major error in lg_mkHandle.  Here is patch for it;

     /* there is only one KRL, use a fixed handle for it */
-    if (handle != LG_TOKEN_KRL_HANDLE) {
+    if (handle == LG_TOKEN_KRL_HANDLE) {
        lg_XORHash(hashBuf,dbKey->data,dbKey->len);
        handle = (hashBuf[0] << 24) | (hashBuf[1] << 16) |
                                        (hashBuf[2] << 8)  | hashBuf[3];
        handle =  class | (handle & ~(LG_TOKEN_TYPE_MASK|LG_TOKEN_MASK));
In retrospect, several problems seem obvious.

1. At loser, it is definitely calling the wrong sdb_Abort.  Patch:
-    (*handle->db->sdb_Abort)(handle->db);
+    (*db->sdb_Abort)(db);

2. The value in local variable db should not be pointing at the legacy DB.
That value is returned by  a call to SFTK_GET_SDB(handle) which is defined as
       ((handle)->update ? (handle)->update : (handle)->db)
So, even though the update is done, handle->update still points to the 
legacy DB.  That's the second problem.  I don't have a patch for that. :(

3. lg_SetSingleAttribute really needs to handle all classes of objects found
in the legacy DB.
As an experiment, I set a breakpoint in sftkdb_write, just before the call 
to SFTK_GET_SDB(handle).  When I hit it, after checking that the stack was
not in NSS_Initialize, I modified handle->update, making it NULL.  Then
ran the program to completion without error, with the expected result that
the cert was stored in the new cert9 DB, not in the old cert8 DB.

It did demand that I login to do so, which surprised me, as I was not 
running in FIPS mode.
Nelson, in the "certutil -E -d sql:DB -n email@address -i email-cert.der -t ',,p'
" did you supply a password?

If not the update is only half complete. Even so, we should be able to run against the new database...


> PK11_SaveSMimeProfile adds 1 to the length of the string it passes for 
> the CKA_NETSCAPE_EMAIL attribute, apparently to include the trailing NUL 
> character (?!).  But the stored email address does not have a trailing NUL
> character, so the string being stored always mismatches the string already
> stored because their lengths differ by 1.  I think the "+1" in this line
>    PK11_SETATTRS(attrs, CKA_NETSCAPE_EMAIL, 
>                emailAddr, PORT_Strlen(emailAddr)+1); attrs++;
> of PK11_SaveSMimeProfile is wrong.

It may be, but we do know there is a NULL there., or the PORT_Strlen() is incorrect (or did I misunderstand your comment?).


> Finally, I found one other major error in lg_mkHandle.  Here is patch for it;
>
>     /* there is only one KRL, use a fixed handle for it */
>-    if (handle != LG_TOKEN_KRL_HANDLE) {
>+    if (handle == LG_TOKEN_KRL_HANDLE) {
>        lg_XORHash(hashBuf,dbKey->data,dbKey->len);
>        handle = (hashBuf[0] << 24) | (hashBuf[1] << 16) |
>                                        (hashBuf[2] << 8)  | hashBuf[3];
>        handle =  class | (handle & ~(LG_TOKEN_TYPE_MASK|LG_TOKEN_MASK));

The original code is correct. There is only one KRL handle, so don't go off and create a hash of the dbkey to find it (You'll find your system will not work very well if you try to run with this patch;).

> 2. The value in local variable db should not be pointing at the legacy DB.
> That value is returned by  a call to SFTK_GET_SDB(handle) which is defined as
>       ((handle)->update ? (handle)->update : (handle)->db)
> So, even though the update is done, handle->update still points to the
> legacy DB.  That's the second problem.  I don't have a patch for that. :(

This is probable the issue. Why isn't update cleared? BTW which handle are we looking at there, the Cert or the key datbase...


> 3. lg_SetSingleAttribute really needs to handle all classes of objects found
> in the legacy DB.

Not possible. We don't store single attributes in the legacy DB (part of the reason I wanted to move away from it. This has never worked before, the current code is attempting to keep this old semantic so that existing applications don't break.

> As an experiment, I set a breakpoint in sftkdb_write, just before the call 
> to SFTK_GET_SDB(handle).  When I hit it, after checking that the stack was
> not in NSS_Initialize, I modified handle->update, making it NULL.  Then
> ran the program to completion without error, with the expected result that
> the cert was stored in the new cert9 DB, not in the old cert8 DB.
> 
> It did demand that I login to do so, which surprised me, as I was not 
> running in FIPS mode.
> 

Documented in the New Database Design:

https://wiki.mozilla.org/NSS_Shared_DB#Special_coding_for_MACed_entries

You are trying to modify the trust entries...


bob
(In reply to some of the questions in comment #3)
> in the "certutil -E -d sql:DB -n email@address -i email-cert.der -t ',,p'
> did you supply a password?

no.  I was never prompted for one.  As shown in the steps to reproduce,
the old key DB is empty, in case that matters.

>> PK11_SaveSMimeProfile adds 1 to the length of the string it passes for 
>> the CKA_NETSCAPE_EMAIL attribute, apparently to include the trailing NUL 
>> character (?!).  But the stored email address does not have a trailing NUL
>> character, so the string being stored always mismatches the string already
>> stored because their lengths differ by 1.  I think the "+1" in this line
>>    PK11_SETATTRS(attrs, CKA_NETSCAPE_EMAIL, 
>>                emailAddr, PORT_Strlen(emailAddr)+1); attrs++;
>> of PK11_SaveSMimeProfile is wrong.
>
> It may be, but we do know there is a NULL there., or the PORT_Strlen() is
> incorrect (or did I misunderstand your comment?).

We must be consistent about whether this attribute does or does not include
a trailing NUL.  Otherwise, if we create it with the trailing NUL, but search
for it without the trailing NUL, or vice versa, we will never find a match,
because a match requires that the value lengths match.

I observed that in that same source file, there are two places that setup an 
attribute template for CKA_NETSCAPE_EMAIL.  Both use strlen.  One adds one,
the other does not.  


I agree that lg_mkHandle was OK.

>> 2. The value in local variable db should not be pointing at the legacy DB.
>> That value is returned by  a call to SFTK_GET_SDB(handle) which is defined as
>>       ((handle)->update ? (handle)->update : (handle)->db)
>> So, even though the update is done, handle->update still points to the
>> legacy DB.  That's the second problem.  I don't have a patch for that. :(
> 
> This is probable the issue. Why isn't update cleared? 

That's the question I hope you will answer.  Another you can answer is:
where (in what places) SHOULD it be cleared?

> BTW which handle are we looking at there, the Cert or the key datbase...

cert.  We are definitely past the end of NSS_Initialize.
When we reach sftk_DBInit, the stack is:

softokn3.dll!sftk_DBInit()          Line 2512
softokn3.dll!SFTK_SlotReInit()      Line 2090 + 0x9f bytes
softokn3.dll!SFTK_SlotInit()        Line 2195 + 0x1d bytes
softokn3.dll!nsc_CommonInitialize() Line 2569 + 0x1f bytes
softokn3.dll!NSC_Initialize()       Line 2629 + 0xb bytes
nss3.dll!secmod_ModuleInit()        Line 146 + 0xf bytes
nss3.dll!SECMOD_LoadPKCS11Module()  Line 378 + 0xd bytes
nss3.dll!SECMOD_LoadModule()        Line 323 + 0x9 bytes
nss3.dll!SECMOD_LoadModule()        Line 338 + 0x11 bytes
nss3.dll!nss_Init()                 Line 536 + 0xd bytes
nss3.dll!NSS_Initialize()           Line 663 + 0xc1 bytes
certutil.exe!certutil_main()        Line 2339 + 0x27 bytes
certutil.exe!main()                 Line 2934 + 0xf bytes

When we reach line 2626
    if (needUpdate) {
needUpdate is true.  sftkdbCall_open succeeeds.
*certDB and *keyDB are both non-NULL.

At line 2646, we see this curious double assignment:

		(*keyDB)->updateDBIsInit = PR_TRUE;
		(*keyDB)->updateDBIsInit = 
			(sftkdb_HasPasswordSet(*keyDB) == SECSuccess) ?
			 PR_TRUE : PR_FALSE;

I wonder if one of those statements is supposed to read;
                (*certDB)->updateDBIsInit = PR_TRUE;
because, as it is, (*certDB)->updateDBIsInit contains an uninitialized value.

Finally, at line 2650, we see 

		/* if the password on the key db is NULL, kick off our update
		 * chain of events */
		sftkdb_CheckPassword((*keyDB), "", &tokenRemoved);

stfkdb_CheckPassword calls sftk_getPWSDB(keydb) which returns keydb->update.
The call to (*db->sdb_GetMetaData) returns CRV_OK.
The call to sftkdb_passwordToKey return SECSuccess.
The call to sftkdb_DecryptAttribute returns SECFailure (no surprise), so we 
goto done where we return SECFailure to the caller, sftk_DBInit.  But 
sftk_DBInit ignores the return value from stfkdb_CheckPassword, and returns 
leaving handle->update in that state.  

Maybe that's OK, and maybe there's some subsequent function that is 
supposed to finish the upgrade, but it doesn't happen.
The shared DB wiki page says:

> The first time the new application version runs, when NSS first creates 
> the shareable databases, NSS will automatically detect instances of old 
> DBM databases and will upgrade those legacy databases to the new shareable 
> ones without user interaction.

But in fact, it seems that user interaction is necessary.  At a miniimum,
the user must authenticate to the DB to do the key DB upgrade.  

The problem here seems to be that the DB upgrade code assumes that, if an
old DB exists and had a non-empty password, then the user will authenticate
to the token before attempting to do other operations, such as adding new
certs to the DB.  certutil -E is a type 2A application that does not 
behave as the upgrade design seems to assume it will.

It appears that the code will attempt to initiate the DB upgrade during 
C_Initialize, but will only get as far as creating empty new DB files,
before returning from C_Initialize.  If the application authenticates 
to the token before attempting to make any change, then perhaps the 
upgrade will take place.  Otherwise, (as shown by certutil -E) the DB 
is not actually upgraded, and the application continues to use the old 
DB rather than the new one.  

This seems to be a design issue.  Perhaps, before returning from C_Initialize 
in this partially upgraded state, the softoken needs to remember that it 
is in the middle of an upgrade, and then, in subsequent PKCS#11 API calls,
it should not allow any changes to be made to the token until the user has 
authenticated to the token and the upgrade has been completed.  

Bob, I'm looking to you to figure out the solution to this problem.
I'm marking this as a FIPS bug because it almost certainly involves 
changes to the softoken.
Priority: -- → P1
Whiteboard: FIPS
Target Milestone: --- → 3.12.3
If the databases were not upgraded, then, even though the are created, the are not marked as initialized yet. Subsequent opens will treat the new databases 'as if they aren't there' if they haven't been initialized. The particular scenario has been tested.

I think the problem here is we need to make sure we authenticate in the update case before we try to load the certificate. I need to go back and look to see if I try to update the two databases separately.

bob
OK, I think I know what the issue is. We don't update either database until we have the password. Softoken opens both databases, running against the old one until we do the upgrade. Basically it means we can't do any writes to the database until we have a password. I'm testing a patch now that prevents softoken writes to the database handle until we have updated, returning 'user not logged in'.

bob
With this patch we no longer have any code trying to guess if we are using FIPS mode or some other token. This code will have the benefit that if the token can be written to without a password, that will now work automatically.

Ideally the relevant interfaces in pk11wrap would have automatically prompted, but these import interfaces do not take a password_arg (sigh).
Patches Option 1 and Option 2 will handle the case of silently writing to the old database before it's initialized.

The last patch makes certutil more friendly. Rather than trying to predict which tokens need to be logged-in certutil now logs into tokens only when required.

bob
Comment on attachment 352389 [details] [diff] [review]
Option 2: only pass back the 'not logged in' error

Bob, You have not asked for review for this patch, so I'm not giving
it r+ or r-.  Here are a couple of comments though.

1. This patch adds two calls of CERT_MapStanError() inside function
PK11_ImportCert.  If nssItemCreate fails, it calls CERT_MapStanError
then goes to loser, where it calls CERT_MapStanError again.
I think the second of those calls is both necessary and sufficient.
The first of them should be removed, I believe.

2. The crash in sftkdb_write was because that function is inconsistent
about which "db" pointer it uses.  In all cases except one, it uses
the local db variable, whose value was gotten from this statement.
1099     db = SFTK_GET_SDB(handle);
e.g., it uses 
1111     crv = (*db->sdb_Begin)(db);
1154     crv = (*db->sdb_Commit)(db);
  
But in one case, instead of using that db variable, it uses a different
value of db

1159         (*handle->db->sdb_Abort)(handle->db);

If it had been consistent, and used the db variable there, it would
not have crashed.  I believe that needs to be fixed, too, e.g. 

-         (*handle->db->sdb_Abort)(handle->db);
+         (*db->sdb_Abort)(db);
Comment on attachment 352390 [details] [diff] [review]
Certutil Fix. No longer try to guess if we should authenticate.

Bob, I have questions about this patch to certutil.

Did you find this patch to be NECESSARY?

Did you find, with the previous patch to softoken and pk11wrap 
applied, that the certutil -E command simply failed with an error 
about not being logged in, rather than prompting the user and 
succeeding?

If so, I think that's a problem.  It means that programs will not
necessarily be able to do an "automatic upgrade" without being 
modified to authenticate in places where they did not do so before.
That would be a problem for our backward compatibility story, IMO.
Re comment 13 

RE: review -- You gave exactly what I was looking for. Basically comments on the current patch.

issue 1: yeah I saw that when looking at the diffs. The topmost CERT_MapStanError() should go away. (part of the reason I didn't mark r?)

issue 2: OK, I'll take a look at that one... it just looks like a coding bug.

To write the cert, yes.

We have 2 possible semantics we could implement here....

Semantic 1 - Continue to use the old database in *ALL* cases (including writes) until the application does some operation the authenticates to the token.

Semantic 2 - All operations except writes will work until the application authenticates.

If I just fix abort, we will return to Semantic 1, which you thought confusing. If we take my patch we will operation with Semantic 2.

In some sense, Certutil was broken. It tried to know to much about the semantic of the given token. Whether or not we choose semantic 1 or 2, we should fix certutil (as well as pk11cert.c, which was probably the bug that requires the certutil hacks to begin with...).

Anyway this is the fundamental question I need answered to supply a proper patch. I can go either way on this (the issue is the short term issue between the time you start an upgrade and you authenticate to NSS is R/W mode).

bob
Bob, I vote for semantic 2 and the related certutil fix. 
I'd give r+ except for the issues in comment 13 and comment 15 above.
The 2 review issues were

1) remove redundant map call.
2) fixing abort to be consistant with it's corresponding Begin.
Attachment #367336 - Flags: review?(nelson)
Comment on attachment 352390 [details] [diff] [review]
Certutil Fix. No longer try to guess if we should authenticate.

certutil patch had no comments, so submitting for review.
Attachment #352390 - Flags: review?(nelson)
Comment on attachment 352390 [details] [diff] [review]
Certutil Fix. No longer try to guess if we should authenticate.

r=nelson for this patch.
Attachment #352390 - Flags: review?(nelson) → review+
Attachment #367336 - Flags: review?(nelson) → review+
Comment on attachment 367336 [details] [diff] [review]
Fix review issues for option 2

r=nelson
Checking in cmd/certutil/certutil.c;
/cvsroot/mozilla/security/nss/cmd/certutil/certutil.c,v  <--  certutil.c
new revision: 1.146; previous revision: 1.145
done



Checking in lib/pk11wrap/pk11akey.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11akey.c,v  <--  pk11akey.c
new revision: 1.24; previous revision: 1.23
done
Checking in lib/pk11wrap/pk11cert.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11cert.c,v  <--  pk11cert.c
new revision: 1.169; previous revision: 1.168
done
Checking in lib/softoken/sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v  <--  sftkdb.c
new revision: 1.21; previous revision: 1.20
done
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.